Re: [PATCH] drm/i915/gt: Report full vm address range
Hi Andi,

In Mesa we've been relying on I915_CONTEXT_PARAM_GTT_SIZE so as long as that is adjusted by the kernel, we should be able to continue working without issues.

Acked-by: Lionel Landwerlin

Thanks,
-Lionel

On 13/03/2024 21:39, Andi Shyti wrote:

Commit 9bb66c179f50 ("drm/i915: Reserve some kernel space per vm") has reserved an object for kernel space usage. Userspace, though, needs to know the full address range.

Fixes: 9bb66c179f50 ("drm/i915: Reserve some kernel space per vm")
Signed-off-by: Andi Shyti
Cc: Andrzej Hajda
Cc: Chris Wilson
Cc: Lionel Landwerlin
Cc: Michal Mrozek
Cc: Nirmoy Das
Cc: # v6.2+
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index fa46d2308b0e..d76831f50106 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -982,8 +982,9 @@ static int gen8_init_rsvd(struct i915_address_space *vm)
 	vm->rsvd.vma = i915_vma_make_unshrinkable(vma);
 	vm->rsvd.obj = obj;
-	vm->total -= vma->node.size;
+
 	return 0;
+
 unref:
 	i915_gem_object_put(obj);
 	return ret;
Re: [PATCH] drm/i915/perf: Clear out entire reports after reading if not power of 2 size
On 22/05/2023 23:17, Ashutosh Dixit wrote:

Clearing out report id and timestamp as means to detect unlanded reports only works if report size is power of 2. That is, only when report size is a sub-multiple of the OA buffer size can we be certain that reports will land at the same place each time in the OA buffer (after rewind). If report size is not a power of 2, we need to zero out the entire report to be able to detect unlanded reports reliably.

Cc: Umesh Nerlige Ramappa
Signed-off-by: Ashutosh Dixit

Sad but necessary unfortunately

Reviewed-by: Lionel Landwerlin

---
 drivers/gpu/drm/i915/i915_perf.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 19d5652300eeb..58284156428dc 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -877,12 +877,17 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 			stream->oa_buffer.last_ctx_id = ctx_id;
 		}
 
-		/*
-		 * Clear out the report id and timestamp as a means to detect unlanded
-		 * reports.
-		 */
-		oa_report_id_clear(stream, report32);
-		oa_timestamp_clear(stream, report32);
+		if (is_power_of_2(report_size)) {
+			/*
+			 * Clear out the report id and timestamp as a means
+			 * to detect unlanded reports.
+			 */
+			oa_report_id_clear(stream, report32);
+			oa_timestamp_clear(stream, report32);
+		} else {
+			/* Zero out the entire report */
+			memset(report32, 0, report_size);
+		}
 	}
 
 	if (start_offset != *offset) {
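
The rule in the patch above can be restated outside the driver as a small standalone sketch. It is only an illustration, not the i915 code: the helper names and the assumption that dword 0 holds the report id and dword 1 the timestamp are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: dword 0 holds the report id, dword 1 the timestamp. */
enum { REPORT_ID_DWORD = 0, TIMESTAMP_DWORD = 1 };

/* True when n is a non-zero power of two. */
static bool size_is_power_of_2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Mark a consumed report so that a later read can tell whether the
 * hardware has written a fresh report into the same memory.
 */
static void mark_report_consumed(uint32_t *report32, size_t report_size)
{
	if (size_is_power_of_2(report_size)) {
		/* Reports land at the same offsets after a rewind: two dwords suffice. */
		report32[REPORT_ID_DWORD] = 0;
		report32[TIMESTAMP_DWORD] = 0;
	} else {
		/* Offsets shift after each rewind: zero the whole report. */
		memset(report32, 0, report_size);
	}
}
```

With a 64-byte report only the first two dwords are cleared; with a 192-byte report (not a power of 2) the whole report is zeroed, which is the more expensive case the review calls "sad but necessary".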
Re: [Intel-gfx] [PATCH v8 6/8] drm/i915/uapi/pxp: Add a GET_PARAM for PXP
On 27/04/2023 21:19, Teres Alexis, Alan Previn wrote:

(fixed email addresses again - why is my Evolution client deteriorating??)

On Thu, 2023-04-27 at 17:18 +, Teres Alexis, Alan Previn wrote:
On Wed, 2023-04-26 at 15:35 -0700, Justen, Jordan L wrote:
On 2023-04-26 11:17:16, Teres Alexis, Alan Previn wrote:

alan:snip

Can you tell that pxp is in progress, but not ready yet, as a separate state from 'it will never work on this platform'? If so, maybe the status could return something like:

0: It's never going to work
1: It's ready to use
2: It's starting and should work soon

I could see an argument for treating that as a case where we could still advertise protected content support, but if we try to use it we might be in for a nasty delay.

alan: IIRC Lionel seemed okay with any permutation that would allow it to not get blocked. Daniele did ask for something similar to what you mentioned above but he said that is non-blocking. But since both you AND Daniele have mentioned the same thing, I shall re-rev this and send that change out today. I notice most GET_PARAMS use -ENODEV for "never gonna work" so I will stick with that, but 1 = ready to use and 2 = starting and should work sounds good. So '0' will never be returned - we just look for a positive value (from user space). I will also make a PR for the Mesa side as soon as I get it tested. Thanks for reviewing btw.

alan: I also realize with these final touch-ups, we can go back to the original pxp-context-creation timeout of 250 milliseconds like it was on ADL since the user space component will have this new param to check on (so even farther down from 1 sec on the last couple of revs). Jordan, Lionel - I am thinking of creating the PR on the Mesa side to take advantage of GET_PARAM on both get-caps AND runtime creation (the latter will be useful to ensure no unnecessary delay is experienced by Mesa stuck in a kernel call - which practically never happened on ADL AFAIK):

1. MESA PXP get caps:
   - use GET_PARAM (any positive number shall mean it's supported).
2. MESA app-triggered PXP context creation (i.e. if caps was supported):
   - use GET_PARAM to wait until the positive number switches from "2" to "1".
   - now call context creation. So at this point if it fails, we know it's an actual failure.

You guys okay with the above? (I'll re-rev this kernel series first and wait on your ack or feedback before I create/test/submit a PR for the Mesa side.)

Sounds good.

Thanks,
-Lionel
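
The two-step userspace flow agreed on in this message could look roughly like the sketch below. It is not Mesa code: the enum names, the `pxp_query_fn` callback standing in for a GET_PARAM wrapper, and the `fake_pxp_query` stand-in for the kernel are all hypothetical, and only the status values 1 and 2 and the -ENODEV convention come from the thread.

```c
#include <errno.h>

/* Status values discussed in the thread; the names are illustrative. */
enum pxp_status {
	PXP_STATUS_READY = 1,    /* ready to use */
	PXP_STATUS_STARTING = 2, /* starting, should work soon */
};

/* Hypothetical callback standing in for a GET_PARAM ioctl wrapper. */
typedef int (*pxp_query_fn)(void *ctx);

/* Caps query: any positive status means PXP is supported. */
static int pxp_is_supported(pxp_query_fn query, void *ctx)
{
	return query(ctx) > 0;
}

/*
 * Poll until the status switches from "starting" to "ready".
 * Returns 0 when ready, a negative errno if PXP can never work,
 * or -ETIMEDOUT after max_polls attempts.
 */
static int pxp_wait_ready(pxp_query_fn query, void *ctx, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		int status = query(ctx);

		if (status == PXP_STATUS_READY)
			return 0;
		if (status != PXP_STATUS_STARTING)
			return status < 0 ? status : -ENODEV;
		/* A real caller would sleep between polls. */
	}
	return -ETIMEDOUT;
}

/* Stand-in for the kernel: reports "starting" until *remaining reaches 0. */
static int fake_pxp_query(void *ctx)
{
	int *remaining = ctx;

	if (*remaining < 0)
		return -ENODEV;
	if (*remaining == 0)
		return PXP_STATUS_READY;
	(*remaining)--;
	return PXP_STATUS_STARTING;
}
```

Only after `pxp_wait_ready()` returns 0 would the caller attempt context creation, so a failure at that point can be treated as a real failure rather than a setup delay.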
Re: [PATCH v7 6/8] drm/i915/uapi/pxp: Fix UAPI spec comments and add GET_PARAM for PXP
On 14/04/2023 18:17, Teres Alexis, Alan Previn wrote:

Hi Lionel, does this patch work for you?

Hi,

Sorry for the late answer. That looks good:

Acked-by: Lionel Landwerlin

Thanks,
-Lionel

On Mon, 2023-04-10 at 10:22 -0700, Ceraolo Spurio, Daniele wrote:
On 4/6/2023 10:44 AM, Alan Previn wrote:

alan:snip

+/*
+ * Query the status of PXP support in i915.
+ *
+ * The query can fail in the following scenarios with the listed error codes:
+ *     -ENODEV = PXP support is not available on the GPU device or in the kernel
+ *               due to missing component drivers or kernel configs.
+ * If the IOCTL is successful, the returned parameter will be set to one of the
+ * following values:
+ *     0 = PXP support may be available but underlying SOC fusing, BIOS or firmware
+ *         configuration is unknown and a PXP-context-creation would be required
+ *         for final verification of feature availability.

Would it be useful to add:

1 = PXP support is available

And start returning that after we've successfully created our first session? Not sure if userspace would use this though, since they still need to handle the 0 case anyway.

I'm also ok with this patch as-is, as long as you get an ack from the userspace drivers for this interface behavior:

Reviewed-by: Daniele Ceraolo Spurio

Daniele

alan:snip
Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
On 04/04/2023 19:04, Yang, Fei wrote:

Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

On 01/04/2023 09:38, fei.y...@intel.com wrote:

From: Fei Yang

To comply with the design that buffer objects shall have an immutable cache setting throughout their life cycle, {set, get}_caching ioctls are no longer supported from MTL onward. With that change the caching policy can only be set at object creation time. The current code applies a default (platform dependent) cache setting for all objects. However this is not optimal for performance tuning. The patch extends the existing gem_create uAPI to let the user set the PAT index for the object at creation time. The new extension is platform independent, so UMDs can switch to using this extension for older platforms as well, while {set, get}_caching are still supported on these legacy platforms for compatibility reasons.

Cc: Chris Wilson
Cc: Matt Roper
Signed-off-by: Fei Yang
Reviewed-by: Andi Shyti

Just like the protected content uAPI, there is no way for userspace to tell this feature is available other than trying to use it. Given the issues with protected content, is that not something we would want to add?

Sorry I'm not aware of the issues with protected content, could you elaborate? There was a long discussion on the Teams uAPI channel, could you comment there if you have any concerns?

https://teams.microsoft.com/l/message/19:f1767bda6734476ba0a9c7d147b928d1@thread.skype/1675860924675?tenantId=46c98d88-e344-4ed4-8496-4ed7712e255d&groupId=379f3ae1-d138-4205-bb65-d4c7d38cb481&parentMessageId=1675860924675&teamName=GSE%20OSGC&channelName=i915%20uAPI%20changes&createdTime=1675860924675&allowXTenantAccess=false

Thanks,
-Fei

We wanted to have a getparam to detect protected support and were told to detect it by trying to create a context with it. Now it appears that trying to create a protected context can block for several seconds. Since we have to report capabilities to the user even before it creates protected contexts, any app is at risk of blocking.

-Lionel

Thanks,
-Lionel

---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 33
 include/uapi/drm/i915_drm.h                | 36 ++
 tools/include/uapi/drm/i915_drm.h          | 36 ++
 3 files changed, 105 insertions(+)
Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
On 01/04/2023 09:38, fei.y...@intel.com wrote:

From: Fei Yang

To comply with the design that buffer objects shall have an immutable cache setting throughout their life cycle, {set, get}_caching ioctls are no longer supported from MTL onward. With that change the caching policy can only be set at object creation time. The current code applies a default (platform dependent) cache setting for all objects. However this is not optimal for performance tuning. The patch extends the existing gem_create uAPI to let the user set the PAT index for the object at creation time. The new extension is platform independent, so UMDs can switch to using this extension for older platforms as well, while {set, get}_caching are still supported on these legacy platforms for compatibility reasons.

Cc: Chris Wilson
Cc: Matt Roper
Signed-off-by: Fei Yang
Reviewed-by: Andi Shyti

Just like the protected content uAPI, there is no way for userspace to tell this feature is available other than trying to use it. Given the issues with protected content, is that not something we would want to add?

Thanks, -Lionel --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 include/uapi/drm/i915_drm.h| 36 ++ tools/include/uapi/drm/i915_drm.h | 36 ++ 3 files changed, 105 insertions(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index e76c9703680e..1c6e2034d28e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -244,6 +244,7 @@ struct create_ext { unsigned int n_placements; unsigned int placement_mask; unsigned long flags; + unsigned int pat_index; }; static void repr_placements(char *buf, size_t size, @@ -393,11 +394,39 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data return 0; } +static int ext_set_pat(struct i915_user_extension __user *base, void *data) +{ + struct create_ext *ext_data = data; + struct drm_i915_private *i915 = ext_data->i915; + struct drm_i915_gem_create_ext_set_pat ext; + unsigned int max_pat_index; + + BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) != +offsetofend(struct drm_i915_gem_create_ext_set_pat, rsvd)); + + if (copy_from_user(&ext, base, sizeof(ext))) + return -EFAULT; + + max_pat_index = INTEL_INFO(i915)->max_pat_index; + + if (ext.pat_index > max_pat_index) { + drm_dbg(&i915->drm, "PAT index is invalid: %u\n", + ext.pat_index); + return -EINVAL; + } + + ext_data->pat_index = ext.pat_index; + + return 0; +} + static const i915_user_extension_fn create_extensions[] = { [I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements, [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected, + [I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat, }; +#define PAT_INDEX_NOT_SET 0x /** * Creates a new mm object and returns a handle to it. 
* @dev: drm device pointer @@ -417,6 +446,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) return -EINVAL; + ext_data.pat_index = PAT_INDEX_NOT_SET; ret = i915_user_extensions(u64_to_user_ptr(args->extensions), create_extensions, ARRAY_SIZE(create_extensions), @@ -453,5 +483,8 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, if (IS_ERR(obj)) return PTR_ERR(obj); + if (ext_data.pat_index != PAT_INDEX_NOT_SET) + i915_gem_object_set_pat_index(obj, ext_data.pat_index); + return i915_gem_publish(obj, file, &args->size, &args->handle); } diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index dba7c5a5b25e..03c5c314846e 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -3630,9 +3630,13 @@ struct drm_i915_gem_create_ext { * * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see * struct drm_i915_gem_create_ext_protected_content. +* +* For I915_GEM_CREATE_EXT_SET_PAT usage see +* struct drm_i915_gem_create_ext_set_pat. */ #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1 +#define I915_GEM_CREATE_EXT_SET_PAT 2 __u64 extensions; }; @@ -3747,6 +3751,38 @@ struct drm_i915_gem_create_ext_protected_content { __u32 flags; }; +/** + * struct drm_i915_gem_create_ext_set_pat - The + * I915_GEM_CREATE_EXT_SET_PAT extension. + * + * If this extension is provided, the specified caching policy (PAT index) is + * applied to the buffer object. + * + * Below is an example on how to create an object with
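
The validate-then-apply pattern in the quoted patch (a sentinel for "no PAT index requested", a range check against the platform maximum, and a fallback to the default) can be sketched in isolation as follows. This is not the kernel code: the struct and function names are hypothetical, and because the sentinel value is truncated in the quoted patch, UINT_MAX is used here purely as a placeholder assumption.

```c
#include <errno.h>
#include <limits.h>

/* Assumed sentinel: the value in the quoted patch is truncated, UINT_MAX is a placeholder. */
#define PAT_INDEX_NOT_SET_SKETCH UINT_MAX

/* Hypothetical stand-in for the per-ioctl extension state. */
struct create_ext_sketch {
	unsigned int max_pat_index; /* platform limit */
	unsigned int pat_index;     /* requested index, or the sentinel */
};

static void create_ext_init(struct create_ext_sketch *ext,
			    unsigned int max_pat_index)
{
	ext->max_pat_index = max_pat_index;
	ext->pat_index = PAT_INDEX_NOT_SET_SKETCH;
}

/* Mirrors the validation step of ext_set_pat(): reject out-of-range indices. */
static int create_ext_set_pat(struct create_ext_sketch *ext,
			      unsigned int pat_index)
{
	if (pat_index > ext->max_pat_index)
		return -EINVAL;
	ext->pat_index = pat_index;
	return 0;
}

/* Returns the PAT index to apply: the user's choice or the platform default. */
static unsigned int create_ext_effective_pat(const struct create_ext_sketch *ext,
					     unsigned int default_pat_index)
{
	if (ext->pat_index != PAT_INDEX_NOT_SET_SKETCH)
		return ext->pat_index;
	return default_pat_index;
}
```

The sentinel keeps "extension not supplied" distinct from "index 0 requested", which matters because 0 is a valid PAT index.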
Re: [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup
On 26/03/2023 14:18, Rodrigo Vivi wrote: On Sat, Mar 25, 2023 at 02:19:21AM -0400, Teres Alexis, Alan Previn wrote: alan:snip @@ -353,8 +367,20 @@ int intel_pxp_start(struct intel_pxp *pxp) alan:snip + if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) { + /* +* GSC-fw loading, GSC-proxy init (requiring an mei component driver) and +* HuC-fw loading must all occur first before we start requesting for PXP +* sessions. Checking HuC authentication (the last dependency) will suffice. +* Let's use a much larger 8 second timeout considering all the types of +* dependencies prior to that. +*/ + if (wait_for(intel_huc_is_authenticated(&pxp->ctrl_gt->uc.huc), 8000)) This big timeout needs an ack from userspace drivers, as intel_pxp_start is called during context creation and the current way to query if the feature is supported is to create a protected context. Unfortunately, we do need to wait to confirm that PXP is available (although in most cases it shouldn't take even close to 8 secs), because until everything is setup we're not sure if things will work as expected. I see 2 potential mitigations in case the timeout doesn't work as-is: 1) we return -EAGAIN (or another dedicated error code) to userspace if the prerequisite steps aren't done yet. This would indicate that the feature is there, but that we haven't completed the setup yet. The caller can then decide if they want to retry immediately or later. Pro: more flexibility for userspace; Cons: new interface return code. 2) we add a getparam to say if PXP is supported in HW and the support is compiled in i915. Userspace can query this as a way to check the feature support and only create the context if they actually need it for PXP operations. Pro: simpler kernel implementation; Cons: new getparam, plus even if the getparam returns true the pxp_start could later fail, so userspace needs to handle that case. alan: I've cc'd Rodrigo, Joonas and Lionel. Folks - what are your thoughts on above issue? 
Recap: On MTL, only when creating a GEM Protected (PXP) context for the very first time after a driver load, it will be dependent on (1) loading the GSC firmware, (2) GuC loading the HuC firmware and (3) GSC authenticating the HuC fw. But step 3 also depends on additional GSC-proxy-init steps that depend on a new mei-gsc-proxy component driver. I'd used the 8 second number based on offline conversations with Daniele but that is a worst case. Alternatively, should we change the UAPI instead to return -EAGAIN as per Daniele's proposal? I believe we've had the get-param conversation offline recently and the direction was to stick with attempting to create the context, as is normal in 3D UMDs when it comes to testing capabilities for other features too. Thoughts?

I like the option 1 more. This extra return handling won't break compatibility.

I like option 2 better because we have to report support as fast as we can when enumerating devices on the system, for example. If I understand correctly, with the get param, most apps won't ever be blocking on any PXP stuff if they don't use it. Only the ones that require protected support might block.

-Lionel
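
For comparison with the getparam approach, option 1 (the kernel returning -EAGAIN while the prerequisites are still loading) would put a retry loop like the one below in userspace. This is only a sketch under stated assumptions: `create_ctx_fn` is a hypothetical wrapper around protected-context creation, and `fake_create_ctx` merely simulates a kernel that is not ready yet.

```c
#include <errno.h>

/* Hypothetical wrapper around protected-context creation; returns 0 or -errno. */
typedef int (*create_ctx_fn)(void *ctx);

/*
 * Option 1 from the thread: the kernel returns -EAGAIN while the
 * prerequisites are still loading, and the caller decides how long to retry.
 * Returns the last result, which is -EAGAIN if all attempts were used up.
 */
static int create_protected_ctx_retry(create_ctx_fn create, void *ctx,
				      int max_attempts)
{
	int ret = -EAGAIN;

	for (int i = 0; i < max_attempts && ret == -EAGAIN; i++)
		ret = create(ctx);
	return ret;
}

/* Stand-in for the kernel: fails with -EAGAIN until *pending reaches 0. */
static int fake_create_ctx(void *ctx)
{
	int *pending = ctx;

	if (*pending > 0) {
		(*pending)--;
		return -EAGAIN;
	}
	return 0;
}
```

The sketch also shows the concern raised in the reply: a caller that only wants to enumerate capabilities still has to go through context creation, whereas a getparam keeps that path non-blocking.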
Re: [PATCH v2] drm/syncobj: Fix sync syncobj issue
I'll let Lucas comment. I've only looked a little at it. From what I remember just enabling sw_signaling was enough to fix the issue. -Lionel On 12/07/2022 13:26, Christian König wrote: Ping to the Intel guys here. Especially Lucas/Nirmoy/Lionel. IIRC you stumbled over that problem as well, have you found any solution? Regards, Christian. Am 07.07.22 um 12:29 schrieb jie1zhan: enable signaling after flatten dma_fence_chains on transfer Signed-off-by: jie1zhan --- drivers/gpu/drm/drm_syncobj.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 7e48dcd1bee4..0d9d3577325f 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -920,6 +920,7 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, if (ret) goto err_free_fence; + dma_fence_enable_sw_signaling(fence); chain = dma_fence_chain_alloc(); if (!chain) { ret = -ENOMEM;
Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition
On 30/06/2022 20:12, Zanoni, Paulo R wrote: Can you please explain what happens when we try to write to a range that's bound as read-only? It will be mapped as read-only in device page table. Hence any write access will fail. I would expect a CAT error reported. What's a CAT error? Does this lead to machine freeze or a GPU hang? Let's make sure we document this. Catastrophic error. Reading the documentation, it seems the behavior depends on the context type. With the Legacy 64bit context type, writes are ignored (BSpec 531) : - "For legacy context, the access rights are not applicable and should not be considered during page walk." For Advanced 64bit context type, I think the HW will generate a pagefault. -Lionel
Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 23/06/2022 14:05, Tvrtko Ursulin wrote:
On 23/06/2022 09:57, Lionel Landwerlin wrote:
On 23/06/2022 11:27, Tvrtko Ursulin wrote:

After a vm_unbind, UMD can re-bind to the same VA range against an active VM. Though I am not sure with the Mesa use case if that new mapping is required for a running GPU job or if it will be for the next submission. But by ensuring the TLB flush upon unbind, KMD can ensure correctness.

Isn't that their problem? If they re-bind for submitting _new_ work then they get the flush as part of the batch buffer pre-amble.

In the non-sparse case, if a VA range is unbound, it is invalid to use that range for anything until it has been rebound by something else. We'll take the fence provided by vm_bind and put it as a wait fence on the next execbuffer. It might be safer in case of memory over-fetching? A TLB flush will have to happen at some point, right? What's the alternative to doing it in unbind?

Currently the TLB flush happens from the ring before every BB_START and also when i915 returns the backing store pages to the system. For the former, I haven't seen any mention that for execbuf3 there are plans to stop doing it? Anyway, as long as this is kept, a sequence of bind[1..N]+execbuf is safe and correctly sees all the preceding binds.

Hence about the alternative to doing it in unbind - first I think let's state the problem that this is trying to solve. For instance, is it just for the compute "append work to the running batch" use case? I honestly don't remember how that was supposed to work, so maybe the TLB flush on bind was supposed to deal with that scenario? Or do you see a problem even for Mesa with the current model?

Regards,
Tvrtko

As far as I can tell, all the binds should have completed before execbuf starts if you follow the Vulkan sparse binding rules. For non-sparse, the UMD will take care of it. I think we're fine.

-Lionel
Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 22/06/2022 18:12, Niranjana Vishwanathapura wrote: On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote: On 22/06/2022 04:56, Niranjana Vishwanathapura wrote: VM_BIND and related uapi definitions v2: Reduce the scope to simple Mesa use case. v3: Expand VM_UNBIND documentation and add I915_GEM_VM_BIND/UNBIND_FENCE_VALID and I915_GEM_VM_BIND_TLB_FLUSH flags. Signed-off-by: Niranjana Vishwanathapura --- Documentation/gpu/rfc/i915_vm_bind.h | 243 +++ 1 file changed, 243 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h new file mode 100644 index ..fa23b2d7ec6f --- /dev/null +++ b/Documentation/gpu/rfc/i915_vm_bind.h @@ -0,0 +1,243 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2022 Intel Corporation + */ + +/** + * DOC: I915_PARAM_HAS_VM_BIND + * + * VM_BIND feature availability. + * See typedef drm_i915_getparam_t param. + */ +#define I915_PARAM_HAS_VM_BIND 57 + +/** + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND + * + * Flag to opt-in for VM_BIND mode of binding during VM creation. + * See struct drm_i915_gem_vm_control flags. + * + * The older execbuf2 ioctl will not support VM_BIND mode of operation. + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any + * execlist (See struct drm_i915_gem_execbuffer3 for more details). 
+ * + */ +#define I915_VM_CREATE_FLAGS_USE_VM_BIND (1 << 0) + +/* VM_BIND related ioctls */ +#define DRM_I915_GEM_VM_BIND 0x3d +#define DRM_I915_GEM_VM_UNBIND 0x3e +#define DRM_I915_GEM_EXECBUFFER3 0x3f + +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind) +#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind) +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3) + +/** + * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion notification. + * + * A timeline out fence for vm_bind/unbind completion notification. + */ +struct drm_i915_gem_vm_bind_fence { + /** @handle: User's handle for a drm_syncobj to signal. */ + __u32 handle; + + /** @rsvd: Reserved, MBZ */ + __u32 rsvd; + + /** + * @value: A point in the timeline. + * Value must be 0 for a binary drm_syncobj. A Value of 0 for a + * timeline drm_syncobj is invalid as it turns a drm_syncobj into a + * binary one. + */ + __u64 value; +}; + +/** + * struct drm_i915_gem_vm_bind - VA to object mapping to bind. + * + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU + * virtual address (VA) range to the section of an object that should be bound + * in the device page table of the specified address space (VM). + * The VA range specified must be unique (ie., not currently bound) and can + * be mapped to whole object or a section of the object (partial binding). + * Multiple VA mappings can be created to the same section of the object + * (aliasing). + * + * The @start, @offset and @length should be 4K page aligned. However the DG2 + * and XEHPSDV has 64K page size for device local-memory and has compact page + * table. On those platforms, for binding device local-memory objects, the + * @start should be 2M aligned, @offset and @length should be 64K aligned. 
Should some error codes be documented and has the ability to programmatically probe the alignment restrictions been considered?

Currently what we have internally is that -EINVAL is returned if the start, offset and length are not aligned. If the specified mapping already exists, we return -EEXIST. If there are conflicts in the VA range and the VA range can't be reserved, then -ENOSPC is returned. I can add this documentation here. But I am worried that there will be more suggestions/feedback about error codes while reviewing the code patch series, and we will have to revisit it again.

That's not really a good excuse to not document.

+ * Also, on those platforms, it is not allowed to bind a device local-memory
+ * object and a system memory object in a single 2M section of VA range.

Text should be clear whether "not allowed" means there will be an error returned, or it will appear to work but bad things will happen.

Yah, error returned, will fix.

+ */
+struct drm_i915_gem_vm_bind {
+	/** @vm_id: VM (address space) id to bind */
+	__u32 vm_id;
+
+	/** @handle: Object handle */
+	__u32 handle;
+
+	/** @start: Virtual Address start to bind */
+	__u64 start;
+
+	/** @offset: Offset in object to bind */
+	__u64 offset;
+
+	/** @length: Length of mapping to bind */
+	__u64 length;
+
+	/**
+	 * @flags: Supported flags are:
+	 *
+	 * I915_GEM_VM_BIND_FENCE_VALID:
+	 * @fence is valid, needs bind completion notificati
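
The alignment rules quoted in this thread (4K for start, offset and length in general; 2M for start and 64K for offset and length when binding device local-memory on platforms with a compact page table) together with the -EINVAL behaviour described in the reply can be written down as a small check. This is only a sketch of the documented rule, not the driver's implementation; the function name and the `compact_lmem` parameter are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZ_4K  (UINT64_C(1) << 12)
#define SKETCH_SZ_64K (UINT64_C(1) << 16)
#define SKETCH_SZ_2M  (UINT64_C(1) << 21)

/* True when v is a multiple of the power-of-two alignment a. */
static bool is_aligned(uint64_t v, uint64_t a)
{
	return (v & (a - 1)) == 0;
}

/*
 * compact_lmem: true when binding a device local-memory object on a
 * platform with 64K pages and a compact page table (DG2 and XEHPSDV in
 * the quoted text). Returns 0 if the request is aligned, -EINVAL otherwise.
 */
static int vm_bind_check_alignment(uint64_t start, uint64_t offset,
				   uint64_t length, bool compact_lmem)
{
	uint64_t start_align = compact_lmem ? SKETCH_SZ_2M : SKETCH_SZ_4K;
	uint64_t range_align = compact_lmem ? SKETCH_SZ_64K : SKETCH_SZ_4K;

	if (!is_aligned(start, start_align) ||
	    !is_aligned(offset, range_align) ||
	    !is_aligned(length, range_align))
		return -EINVAL;
	return 0;
}
```

The -EEXIST and -ENOSPC cases mentioned in the reply depend on the state of the address space and are not modelled here.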
Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 23/06/2022 11:27, Tvrtko Ursulin wrote:

After a vm_unbind, UMD can re-bind to the same VA range against an active VM. Though I am not sure with the Mesa use case if that new mapping is required for a running GPU job or if it will be for the next submission. But by ensuring the TLB flush upon unbind, KMD can ensure correctness.

Isn't that their problem? If they re-bind for submitting _new_ work then they get the flush as part of the batch buffer pre-amble.

In the non-sparse case, if a VA range is unbound, it is invalid to use that range for anything until it has been rebound by something else. We'll take the fence provided by vm_bind and put it as a wait fence on the next execbuffer. It might be safer in case of memory over-fetching? A TLB flush will have to happen at some point, right? What's the alternative to doing it in unbind?

-Lionel
Re: [PATCH v2 01/12] drm/doc: add rfc section for small BAR uapi
On 21/06/2022 13:44, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
  - Some spelling fixes and other small tweaks. (Akeem & Thomas)
  - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
  - Add probed_cpu_visible_size. (Lionel)
v3:
  - Drop the vma query for now.
  - Add unallocated_cpu_visible_size as part of the region query.
  - Improve the docs some more, including documenting the expected behaviour on older kernels, since this came up in some offline discussion.
v4:
  - Various improvements all over. (Tvrtko)
v5:
  - Include newer integrated platforms when applying the non-recoverable context and error capture restriction. (Thomas)

Signed-off-by: Matthew Auld
Cc: Thomas Hellström
Cc: Lionel Landwerlin
Cc: Tvrtko Ursulin
Cc: Jon Bloomfield
Cc: Daniel Vetter
Cc: Jordan Justen
Cc: Kenneth Graunke
Cc: Akeem G Abodunrin
Cc: mesa-...@lists.freedesktop.org
Acked-by: Tvrtko Ursulin
Acked-by: Akeem G Abodunrin

With Jordan we have changes for Anv/Iris: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739

Acked-by: Lionel Landwerlin

---
 Documentation/gpu/rfc/i915_small_bar.h   | 189 +++
 Documentation/gpu/rfc/i915_small_bar.rst |  47 ++
 Documentation/gpu/rfc/index.rst          |   4 +
 3 files changed, 240 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
 create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..752bb2ceb399
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,189 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** +* @probed_size: Memory probed by the driver (-1 = unknown) +* +* Note that it should not be possible to ever encounter a zero value +* here, also note that no current region type will ever return -1 here. +* Although for future region types, this might be a possibility. The +* same applies to the other size fields. +*/ + __u64 probed_size; + + /** +* @unallocated_size: Estimate of memory remaining (-1 = unknown) +* +* Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. +* Without this (or if this is an older kernel) the value here will +* always equal the @probed_size. Note this is only currently tracked +* for I915_MEMORY_CLASS_DEVICE regions (for other types the value here +* will always equal the @probed_size). +*/ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** +* @probed_cpu_visible_size: Memory probed by the driver +* that is CPU accessible. (-1 = unknown). +* +* This will be always be <= @probed_size, and the +* remainder (if there is any) will not be CPU +* accessible. +* +* On systems without small BAR, the @probed_size will +* always equal the @probed_cpu_visible_size, since all +* of it will be CPU accessible. +* +* Note this is only tracked for +* I915_MEMORY_CLASS_DEVICE regions (for other types the +* value here will always equal the @probed_size). +* +* Note that if the value returned here is zero, then +* this must be an old kernel which lacks the relevant +* small-bar uAPI support (including +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on +* such systems we should never actually end up with a +* small BAR configuration, assuming we are able to load +* the kernel module. Hence it should be safe to treat +* this the same as when @probed_cpu_visible_size == +* @probed_size. 
+*/ + __u64 probed_cpu_v
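
The kernel-doc above tells userspace to treat a zero @probed_cpu_visible_size as "old kernel, everything is CPU visible". A minimal sketch of that interpretation, using a hypothetical struct that only carries the two fields involved (it is not the uAPI struct itself):

```c
#include <stdint.h>

/* Hypothetical subset of the region info, just the two sizes discussed above. */
struct region_sizes_sketch {
	uint64_t probed_size;
	uint64_t probed_cpu_visible_size;
};

/*
 * A zero CPU-visible size means an older kernel without the small-BAR uAPI,
 * which the documentation says can be treated as fully CPU visible.
 */
static uint64_t effective_cpu_visible_size(const struct region_sizes_sketch *r)
{
	if (r->probed_cpu_visible_size == 0)
		return r->probed_size;
	return r->probed_cpu_visible_size;
}

/* Non-zero when only part of the region can be reached by the CPU. */
static int region_has_small_bar(const struct region_sizes_sketch *r)
{
	return effective_cpu_visible_size(r) < r->probed_size;
}
```

An "unknown" value of -1 reads back as UINT64_MAX in these unsigned fields, so this sketch never reports a small BAR for it; a real caller may want to handle that case explicitly.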
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 13/06/2022 21:02, Niranjana Vishwanathapura wrote:
On Mon, Jun 13, 2022 at 06:33:07AM -0700, Zeng, Oak wrote:

Regards,
Oak

-Original Message-
From: Intel-gfx On Behalf Of Niranjana Vishwanathapura
Sent: June 10, 2022 1:43 PM
To: Landwerlin, Lionel G
Cc: Intel GFX; Maling list - DRI developers de...@lists.freedesktop.org; Hellstrom, Thomas; Wilson, Chris P; Vetter, Daniel; Christian König
Subject: Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

On Fri, Jun 10, 2022 at 11:18:14AM +0300, Lionel Landwerlin wrote:
>On 10/06/2022 10:54, Niranjana Vishwanathapura wrote:
>>On Fri, Jun 10, 2022 at 09:53:24AM +0300, Lionel Landwerlin wrote:
>>>On 09/06/2022 22:31, Niranjana Vishwanathapura wrote:
>>>>On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote:
>>>>> On 09/06/2022 00:55, Jason Ekstrand wrote:
>>>>> On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura wrote:
>>>>> On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
>>>>> >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
>>>>> >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote:
>>>>> >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
>>>>> >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura wrote:
>>>>> >>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
>>>>> >>>> > On 02/06/2022 23:35, Jason Ekstrand wrote:
>>>>> >>>> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura wrote:
>>>>> >>>> > On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
>>>>> >>>> > >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
>>>>> >>>> > >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
>>>>> >>>> > >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an
>>>>> >>>> > >> > +async worker. The binding and unbinding will work like a special GPU engine.
>>>>> >>>> > >> > +The binding and unbinding operations are serialized and will wait on specified
>>>>> >>>> > >> > +input fences before the operation and will signal the output fences upon the
>>>>> >>>> > >> > +completion of the operation. Due to serialization, completion of an operation
>>>>> >>>> > >> > +will also indicate that all previous operations are also complete.
>>>>> >>>> > >>
>>>>> >>>> > >> I guess we should avoid saying "will immediately start binding/unbinding" if
>>>>> >>>> > >> there are fences involved.
>>>>> >>>> > >>
>>>>> >>>> > >> And the fact that i
Re: [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition
On 10/06/2022 11:53, Matthew Brost wrote:
On Fri, Jun 10, 2022 at 12:07:11AM -0700, Niranjana Vishwanathapura wrote:
VM_BIND and related uapi definitions

Signed-off-by: Niranjana Vishwanathapura
---
 Documentation/gpu/rfc/i915_vm_bind.h | 490 +++
 1 file changed, 490 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
new file mode 100644
index ..9fc854969cfb
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,490 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ * bit[0]: If set, VM_BIND is supported, otherwise not.
+ * bits[8-15]: VM_BIND implementation version.
+ * version 0 will not have VM_BIND/UNBIND timeline fence array support.
+ */
+#define I915_PARAM_HAS_VM_BIND 57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND (1 << 0)
+
+/**
+ * DOC: I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING
+ *
+ * Flag to declare context as long running.
+ * See struct drm_i915_gem_context_create_ext flags.
+ *
+ * Usage of dma-fence expects that they complete in reasonable amount of time.
+ * Compute on the other hand can be long running. Hence it is not appropriate
+ * for compute contexts to export request completion dma-fence to user.
+ * The dma-fence usage will be limited to in-kernel consumption only.
+ * Compute contexts need to use user/memory fence.
+ *
+ * So, long running contexts do not support output fences. Hence,
+ * I915_EXEC_FENCE_SIGNAL (See &drm_i915_gem_exec_fence.flags) is expected
+ * to be not used. DRM_I915_GEM_WAIT ioctl call is also not supported for
+ * objects mapped to long running contexts.
+ */
+#define I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING (1u << 2)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND 0x3d
+#define DRM_I915_GEM_VM_UNBIND 0x3e
+#define DRM_I915_GEM_EXECBUFFER3 0x3f
+#define DRM_I915_GEM_WAIT_USER_FENCE 0x40
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
+#define DRM_IOCTL_I915_GEM_WAIT_USER_FENCE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT_USER_FENCE, struct drm_i915_gem_wait_user_fence)
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @queue_idx specifies the queue to use for binding. Same queue can be
+ * used for both VM_BIND and VM_UNBIND calls. All submitted bind and unbind
+ * operations in a queue are performed in the order of submission.
+ *
+ * The @start, @offset and @length should be 4K page aligned. However the DG2
+ * and XEHPSDV has 64K page size for device local-memory and has compact page
+ * table. On those platforms, for binding device local-memory objects, the
+ * @start should be 2M aligned, @offset and @length should be 64K aligned.
+ * Also, on those platforms, it is not allowed to bind an device local-memory
+ * object and a system memory object in a single 2M section of VA range.
+ */
+struct drm_i915_gem_vm_bind {
+        /** @vm_id: VM (address space) id to bind */
+        __u32 vm_id;
+
+        /** @queue_idx: Index of queue for binding */
+        __u32 queue_idx;
+
+        /** @rsvd: Reserved, MBZ */
+        __u32 rsvd;
+
+        /** @handle: Object handle */
+        __u32 handle;
+
+        /** @start: Virtual Address start to bind */
+        __u64 start;
+
+        /** @offset: Offset in object to bind */
+        __u64 offset;
+
+        /** @length: Length of mapping to bind */
+        __u64 length;

This probably isn't needed. We are never going to unbind a subset of a VMA are we? That being said it can't hurt as a sanity check (e.g. internal vma->le
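
For illustration only (this is not part of the patch): the alignment rules in the kernel-doc of struct drm_i915_gem_vm_bind could be checked along these lines. The helper names, the SKETCH_SZ_* constants and the compact_pt_lmem parameter are all hypothetical; the sketch only covers the @start/@offset/@length alignment and does not attempt the "no local-memory and system-memory object in the same 2M section" rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; not taken from any kernel header. */
#define SKETCH_SZ_4K  (UINT64_C(1) << 12)
#define SKETCH_SZ_64K (UINT64_C(1) << 16)
#define SKETCH_SZ_2M  (UINT64_C(1) << 21)

/* Hypothetical helper: true if v is a multiple of align (a power of two). */
static bool sketch_is_aligned(uint64_t v, uint64_t align)
{
	return (v & (align - 1)) == 0;
}

/*
 * Hypothetical check of the documented alignment rules.
 *
 * compact_pt_lmem: the caller is binding a device local-memory object on
 * a platform with 64K local-memory pages and compact page tables (DG2 and
 * XEHPSDV in the quoted text). How that is detected is out of scope here.
 */
static bool sketch_vm_bind_range_ok(uint64_t start, uint64_t offset,
				    uint64_t length, bool compact_pt_lmem)
{
	if (compact_pt_lmem)
		return sketch_is_aligned(start, SKETCH_SZ_2M) &&
		       sketch_is_aligned(offset, SKETCH_SZ_64K) &&
		       sketch_is_aligned(length, SKETCH_SZ_64K);

	/* Default case: everything 4K page aligned. */
	return sketch_is_aligned(start, SKETCH_SZ_4K) &&
	       sketch_is_aligned(offset, SKETCH_SZ_4K) &&
	       sketch_is_aligned(length, SKETCH_SZ_4K);
}
```

With this sketch, a 4K-aligned range such as start=0x1000, length=0x2000 passes in the default case but is rejected when compact_pt_lmem is set, because @start is not 2M aligned.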
Re: [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition
On 10/06/2022 13:37, Tvrtko Ursulin wrote:
On 10/06/2022 08:07, Niranjana Vishwanathapura wrote:
VM_BIND and related uapi definitions

Signed-off-by: Niranjana Vishwanathapura
---
 Documentation/gpu/rfc/i915_vm_bind.h | 490 +++
 1 file changed, 490 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
new file mode 100644
index ..9fc854969cfb
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,490 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ * bit[0]: If set, VM_BIND is supported, otherwise not.
+ * bits[8-15]: VM_BIND implementation version.
+ * version 0 will not have VM_BIND/UNBIND timeline fence array support.
+ */
+#define I915_PARAM_HAS_VM_BIND 57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND (1 << 0)
+
+/**
+ * DOC: I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING
+ *
+ * Flag to declare context as long running.
+ * See struct drm_i915_gem_context_create_ext flags.
+ *
+ * Usage of dma-fence expects that they complete in reasonable amount of time.
+ * Compute on the other hand can be long running. Hence it is not appropriate
+ * for compute contexts to export request completion dma-fence to user.
+ * The dma-fence usage will be limited to in-kernel consumption only.
+ * Compute contexts need to use user/memory fence.
+ *
+ * So, long running contexts do not support output fences. Hence,
+ * I915_EXEC_FENCE_SIGNAL (See &drm_i915_gem_exec_fence.flags) is expected
+ * to be not used. DRM_I915_GEM_WAIT ioctl call is also not supported for
+ * objects mapped to long running contexts.
+ */
+#define I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING (1u << 2)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND 0x3d
+#define DRM_I915_GEM_VM_UNBIND 0x3e
+#define DRM_I915_GEM_EXECBUFFER3 0x3f
+#define DRM_I915_GEM_WAIT_USER_FENCE 0x40
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
+#define DRM_IOCTL_I915_GEM_WAIT_USER_FENCE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT_USER_FENCE, struct drm_i915_gem_wait_user_fence)
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @queue_idx specifies the queue to use for binding. Same queue can be
+ * used for both VM_BIND and VM_UNBIND calls. All submitted bind and unbind
+ * operations in a queue are performed in the order of submission.
+ *
+ * The @start, @offset and @length should be 4K page aligned. However the DG2
+ * and XEHPSDV has 64K page size for device local-memory and has compact page
+ * table. On those platforms, for binding device local-memory objects, the
+ * @start should be 2M aligned, @offset and @length should be 64K aligned.
+ * Also, on those platforms, it is not allowed to bind an device local-memory
+ * object and a system memory object in a single 2M section of VA range.
+ */
+struct drm_i915_gem_vm_bind {
+        /** @vm_id: VM (address space) id to bind */
+        __u32 vm_id;
+
+        /** @queue_idx: Index of queue for binding */
+        __u32 queue_idx;

I have a question here to which I did not find an answer by browsing the old threads.

Queue index appears to be an implicit synchronisation mechanism, right? Operations on the same index are executed/complete in order of ioctl submission?

Do we _have_ to implement this on the kernel side and could just allow in/out fence and let userspace deal with it?

It orders operations like in a queue. Which is kind of what happens with existing queues/engines. If I understood correctly, it's going to be a kthread + a linked list righ
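
To make the ordering question concrete, here is a rough userspace model of what "operations in a queue are performed in the order of submission" would mean for a single @queue_idx. This is only a sketch under my own assumptions: the struct and function names are invented, the fixed-size ring stands in for whatever list the kernel would actually use, and nothing here reflects how i915 implements it.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_OPS 16 /* arbitrary capacity for the sketch */

/* Hypothetical record of one submitted bind or unbind. */
struct sketch_bind_op {
	uint64_t start;  /* VA start, as in @start */
	int is_unbind;   /* 0 = bind, 1 = unbind */
};

/* Hypothetical per-queue_idx state: a ring of pending operations. */
struct sketch_bind_queue {
	struct sketch_bind_op ops[SKETCH_MAX_OPS];
	size_t head;        /* index of the next op to execute */
	size_t tail;        /* index of the next free slot */
	uint64_t completed; /* number of ops finished so far */
};

/* Queue an op; returns its 1-based submission number, or 0 if full. */
static uint64_t sketch_queue_submit(struct sketch_bind_queue *q,
				    struct sketch_bind_op op)
{
	if (q->tail - q->head == SKETCH_MAX_OPS)
		return 0;
	q->ops[q->tail % SKETCH_MAX_OPS] = op;
	q->tail++;
	return q->tail;
}

/* Worker step: run the oldest pending op. Returns 1 if one ran, 0 if idle. */
static int sketch_queue_run_one(struct sketch_bind_queue *q,
				struct sketch_bind_op *out)
{
	if (q->head == q->tail)
		return 0;
	*out = q->ops[q->head % SKETCH_MAX_OPS];
	q->head++;
	q->completed = q->head;
	return 1;
}

/*
 * Because ops only ever complete oldest-first, op number n being complete
 * implies that every earlier op on the same queue is complete as well.
 */
static int sketch_op_is_complete(const struct sketch_bind_queue *q, uint64_t n)
{
	return n != 0 && q->completed >= n;
}
```

In this model binds and unbinds share one FIFO, so an out fence attached to operation n could be signaled as soon as sketch_op_is_complete() holds for n, and that also covers all earlier operations on the queue.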
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 10/06/2022 10:54, Niranjana Vishwanathapura wrote:
On Fri, Jun 10, 2022 at 09:53:24AM +0300, Lionel Landwerlin wrote:
On 09/06/2022 22:31, Niranjana Vishwanathapura wrote:
On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote:
On 09/06/2022 00:55, Jason Ekstrand wrote:
On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura wrote:
On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
>
>
>On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
>>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote:
>>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
>>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura wrote:
>>>>
>>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
>>>> > On 02/06/2022 23:35, Jason Ekstrand wrote:
>>>> >
>>>> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura wrote:
>>>> >
>>>> > On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
>>>> > >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
>>>> > >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
>>>> > >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an
>>>> > >> > +async worker. The binding and unbinding will work like a special GPU engine.
>>>> > >> > +The binding and unbinding operations are serialized and will wait on specified
>>>> > >> > +input fences before the operation and will signal the output fences upon the
>>>> > >> > +completion of the operation. Due to serialization, completion of an operation
>>>> > >> > +will also indicate that all previous operations are also complete.
>>>> > >>
>>>> > >> I guess we should avoid saying "will immediately start binding/unbinding" if
>>>> > >> there are fences involved.
>>>> > >>
>>>> > >> And the fact that it's happening in an async worker seems to imply it's not
>>>> > >> immediate.
>>>> > >>
>>>> >
>>>> > Ok, will fix.
>>>> > This was added because in earlier design binding was deferred until next execbuff.
>>>> > But now it is non-deferred (immediate in that sense). But yah, this is confusing
>>>> > and will fix it.
>>>> >
>>>> > >>
>>>> > >> I have a question on the behavior of the bind operation when no input fence
>>>> > >> is provided. Let's say I do:
>>>> > >>
>>>> > >> VM_BIND (out_fence=fence1)
>>>> > >>
>>>> > >> VM_BIND (out_fence=fence2)
>>>> > >>
>>>> > >> VM_BIND (out_fence=fence3)
>>>> > >>
>>>> > >>
>>>> > >> In what order are the fences going to be signaled?
>>>> > >>
>>>> > >> In the order of VM_BIND ioctls? Or out of order?
>>>> >
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 09/06/2022 22:31, Niranjana Vishwanathapura wrote:
On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote:
On 09/06/2022 00:55, Jason Ekstrand wrote:
On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura wrote:
On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
>
>
>On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
>>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote:
>>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
>>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura wrote:
>>>>
>>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
>>>> > On 02/06/2022 23:35, Jason Ekstrand wrote:
>>>> >
>>>> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura wrote:
>>>> >
>>>> > On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
>>>> > >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
>>>> > >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
>>>> > >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an
>>>> > >> > +async worker. The binding and unbinding will work like a special GPU engine.
>>>> > >> > +The binding and unbinding operations are serialized and will wait on specified
>>>> > >> > +input fences before the operation and will signal the output fences upon the
>>>> > >> > +completion of the operation. Due to serialization, completion of an operation
>>>> > >> > +will also indicate that all previous operations are also complete.
>>>> > >>
>>>> > >> I guess we should avoid saying "will immediately start binding/unbinding" if
>>>> > >> there are fences involved.
>>>> > >>
>>>> > >> And the fact that it's happening in an async worker seems to imply it's not
>>>> > >> immediate.
>>>> > >>
>>>> >
>>>> > Ok, will fix.
>>>> > This was added because in earlier design binding was deferred until next execbuff.
>>>> > But now it is non-deferred (immediate in that sense). But yah, this is confusing
>>>> > and will fix it.
>>>> >
>>>> > >>
>>>> > >> I have a question on the behavior of the bind operation when no input fence
>>>> > >> is provided. Let's say I do:
>>>> > >>
>>>> > >> VM_BIND (out_fence=fence1)
>>>> > >>
>>>> > >> VM_BIND (out_fence=fence2)
>>>> > >>
>>>> > >> VM_BIND (out_fence=fence3)
>>>> > >>
>>>> > >>
>>>> > >> In what order are the fences going to be signaled?
>>>> > >>
>>>> > >> In the order of VM_BIND ioctls? Or out of order?
>>>> > >>
>>>> > >> Because you wrote "serialized", I assume it's: in order
>
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 09/06/2022 00:55, Jason Ekstrand wrote:
On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura wrote:
On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
>
>
>On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
>>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura wrote:
>>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
>>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura wrote:
>>>>
>>>> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
>>>> > On 02/06/2022 23:35, Jason Ekstrand wrote:
>>>> >
>>>> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura wrote:
>>>> >
>>>> > On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
>>>> > >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
>>>> > >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
>>>> > >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an
>>>> > >> > +async worker. The binding and unbinding will work like a special GPU engine.
>>>> > >> > +The binding and unbinding operations are serialized and will wait on specified
>>>> > >> > +input fences before the operation and will signal the output fences upon the
>>>> > >> > +completion of the operation. Due to serialization, completion of an operation
>>>> > >> > +will also indicate that all previous operations are also complete.
>>>> > >>
>>>> > >> I guess we should avoid saying "will immediately start binding/unbinding" if
>>>> > >> there are fences involved.
>>>> > >>
>>>> > >> And the fact that it's happening in an async worker seems to imply it's not
>>>> > >> immediate.
>>>> > >>
>>>> >
>>>> > Ok, will fix.
>>>> > This was added because in earlier design binding was deferred until next execbuff.
>>>> > But now it is non-deferred (immediate in that sense). But yah, this is confusing
>>>> > and will fix it.
>>>> >
>>>> > >>
>>>> > >> I have a question on the behavior of the bind operation when no input fence
>>>> > >> is provided. Let's say I do:
>>>> > >>
>>>> > >> VM_BIND (out_fence=fence1)
>>>> > >>
>>>> > >> VM_BIND (out_fence=fence2)
>>>> > >>
>>>> > >> VM_BIND (out_fence=fence3)
>>>> > >>
>>>> > >>
>>>> > >> In what order are the fences going to be signaled?
>>>> > >>
>>>> > >> In the order of VM_BIND ioctls? Or out of order?
>>>> > >>
>>>> > >> Because you wrote "serialized", I assume it's: in order
>>>> > >>
>>>> >
>>>> > Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and unbind will use
>>>> > the same queue and hence are ord
Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 08/06/2022 11:36, Tvrtko Ursulin wrote:
On 08/06/2022 07:40, Lionel Landwerlin wrote:
On 03/06/2022 09:53, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote:
On Wed, 1 Jun 2022 at 11:03, Dave Airlie wrote:
On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura wrote:
On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote:
>On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND and related uapi definitions
>>
>> v2: Ensure proper kernel-doc formatting with cross references.
>> Also add new uapi and documentation as per review comments
>> from Daniel.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> ---
>> Documentation/gpu/rfc/i915_vm_bind.h | 399 +++
>> 1 file changed, 399 insertions(+)
>> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>> new file mode 100644
>> index ..589c0a009107
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,399 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_HAS_VM_BIND
>> + *
>> + * VM_BIND feature availability.
>> + * See typedef drm_i915_getparam_t param.
>> + */
>> +#define I915_PARAM_HAS_VM_BIND 57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * A VM in VM_BIND mode will not support the older execbuff mode of binding.
>> + * In VM_BIND mode, execbuff ioctl will not accept any execlist (ie., the
>> + * &drm_i915_gem_execbuffer2.buffer_count must be 0).
>> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and
>> + * &drm_i915_gem_execbuffer2.batch_len must be 0.
>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must be provided
>> + * to pass in the batch buffer addresses.
>> + *
>> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and
>> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags must be 0
>> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag must always be
>> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses).
>> + * The buffers_ptr, buffer_count, batch_start_offset and batch_len fields
>> + * of struct drm_i915_gem_execbuffer2 are also not used and must be 0.
>> + */
>
>From that description, it seems we have:
>
>struct drm_i915_gem_execbuffer2 {
>   __u64 buffers_ptr; -> must be 0 (new)
>   __u32 buffer_count; -> must be 0 (new)
>   __u32 batch_start_offset; -> must be 0 (new)
>   __u32 batch_len; -> must be 0 (new)
>   __u32 DR1; -> must be 0 (old)
>   __u32 DR4; -> must be 0 (old)
>   __u32 num_cliprects; (fences) -> must be 0 since using extensions
>   __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer!
>   __u64 flags; -> some flags must be 0 (new)
>   __u64 rsvd1; (context info) -> repurposed field (old)
>   __u64 rsvd2; -> unused
>};
>
>Based on that, why can't we just get drm_i915_gem_execbuffer3 instead
>of adding even more complexity to an already abused interface? While
>the Vulkan-like extension thing is really nice, I don't think what
>we're doing here is extending the ioctl usage, we're completely
>changing how the base struct should be interpreted based on how the VM
>was created (which is an entirely different ioctl).
>
>From Rusty Russell's API Design grading, drm_i915_gem_execbuffer2 is
>already at -6 without these changes. I think after vm_bind we'll need
>to create a -11 entry just to deal with this ioctl.
>

The only change here is removing the execlist support for VM_BIND mode (other than natural extensions). Adding a new execbuffer3 was considered, but I think we need to be careful with that as that goes beyond the VM_BIND support, including any future requirements (as we don't want an execbuffer4 after VM_BIND).

Why not? It's not like adding extensions here is really that different than adding new ioctls. I definitely think this deserves an execbuffer3 without even considering future req
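
As a side note for readers following the execbuffer2 versus execbuffer3 argument: the v2 kernel-doc quoted in this thread amounts to roughly the following validation when execbuffer2 is used on a VM_BIND VM. This is only a sketch of those stated rules. The struct is a local stand-in that copies the field list from the quoted mail, the function name is made up, and the flag bit positions are what I believe i915_drm.h uses, so please treat them as assumptions rather than as the uapi.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as I understand them from i915_drm.h; treat as assumptions. */
#define SKETCH_EXEC_NO_RELOC       (UINT64_C(1) << 11)
#define SKETCH_EXEC_HANDLE_LUT     (UINT64_C(1) << 12)
#define SKETCH_EXEC_BATCH_FIRST    (UINT64_C(1) << 18)
#define SKETCH_EXEC_USE_EXTENSIONS (UINT64_C(1) << 21)

/* Local stand-in mirroring the field list quoted in the mail above. */
struct sketch_execbuffer2 {
	uint64_t buffers_ptr;
	uint32_t buffer_count;
	uint32_t batch_start_offset;
	uint32_t batch_len;
	uint32_t DR1;
	uint32_t DR4;
	uint32_t num_cliprects;
	uint64_t cliprects_ptr;
	uint64_t flags;
	uint64_t rsvd1;
	uint64_t rsvd2;
};

/*
 * Hypothetical check of the v2 wording only: the four execlist/batch
 * fields must be 0, three legacy flags must be clear, and
 * I915_EXEC_USE_EXTENSIONS must be set.
 */
static bool sketch_execbuf2_ok_for_vm_bind(const struct sketch_execbuffer2 *eb)
{
	const uint64_t forbidden = SKETCH_EXEC_NO_RELOC |
				   SKETCH_EXEC_HANDLE_LUT |
				   SKETCH_EXEC_BATCH_FIRST;

	if (eb->buffers_ptr || eb->buffer_count ||
	    eb->batch_start_offset || eb->batch_len)
		return false;
	if (eb->flags & forbidden)
		return false;
	return (eb->flags & SKETCH_EXEC_USE_EXTENSIONS) != 0;
}
```

The amount of "must be 0" special-casing needed even in this small sketch is more or less the point being made in favour of a separate execbuffer3.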
Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 03/06/2022 09:53, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote:
On Wed, 1 Jun 2022 at 11:03, Dave Airlie wrote:
On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura wrote:
On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote:
>On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND and related uapi definitions
>>
>> v2: Ensure proper kernel-doc formatting with cross references.
>> Also add new uapi and documentation as per review comments
>> from Daniel.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> ---
>> Documentation/gpu/rfc/i915_vm_bind.h | 399 +++
>> 1 file changed, 399 insertions(+)
>> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>> new file mode 100644
>> index ..589c0a009107
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,399 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_HAS_VM_BIND
>> + *
>> + * VM_BIND feature availability.
>> + * See typedef drm_i915_getparam_t param.
>> + */
>> +#define I915_PARAM_HAS_VM_BIND 57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * A VM in VM_BIND mode will not support the older execbuff mode of binding.
>> + * In VM_BIND mode, execbuff ioctl will not accept any execlist (ie., the
>> + * &drm_i915_gem_execbuffer2.buffer_count must be 0).
>> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and
>> + * &drm_i915_gem_execbuffer2.batch_len must be 0.
>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must be provided
>> + * to pass in the batch buffer addresses.
>> + *
>> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and
>> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags must be 0
>> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag must always be
>> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses).
>> + * The buffers_ptr, buffer_count, batch_start_offset and batch_len fields
>> + * of struct drm_i915_gem_execbuffer2 are also not used and must be 0.
>> + */
>
>From that description, it seems we have:
>
>struct drm_i915_gem_execbuffer2 {
>   __u64 buffers_ptr; -> must be 0 (new)
>   __u32 buffer_count; -> must be 0 (new)
>   __u32 batch_start_offset; -> must be 0 (new)
>   __u32 batch_len; -> must be 0 (new)
>   __u32 DR1; -> must be 0 (old)
>   __u32 DR4; -> must be 0 (old)
>   __u32 num_cliprects; (fences) -> must be 0 since using extensions
>   __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer!
>   __u64 flags; -> some flags must be 0 (new)
>   __u64 rsvd1; (context info) -> repurposed field (old)
>   __u64 rsvd2; -> unused
>};
>
>Based on that, why can't we just get drm_i915_gem_execbuffer3 instead
>of adding even more complexity to an already abused interface? While
>the Vulkan-like extension thing is really nice, I don't think what
>we're doing here is extending the ioctl usage, we're completely
>changing how the base struct should be interpreted based on how the VM
>was created (which is an entirely different ioctl).
>
>From Rusty Russell's API Design grading, drm_i915_gem_execbuffer2 is
>already at -6 without these changes. I think after vm_bind we'll need
>to create a -11 entry just to deal with this ioctl.
>

The only change here is removing the execlist support for VM_BIND mode (other than natural extensions). Adding a new execbuffer3 was considered, but I think we need to be careful with that as that goes beyond the VM_BIND support, including any future requirements (as we don't want an execbuffer4 after VM_BIND).

Why not? It's not like adding extensions here is really that different than adding new ioctls. I definitely think this deserves an execbuffer3 without even considering future requirements. Just to burn down the old requirements and pointless fields. Make execbuffer3 be vm bind only, no relocs, no legacy bits, leave the older sw on execbuf2 for ever.

I guess another point in favour of execbuf3 would be that it's less midlayer. If we share the entry point then there's quite a few vfuncs needed to cleanly split out the vm_bind paths from the legacy reloc/softpin paths. If we invert this and do execbuf3, then there's the existing ioctl vfunc, and then we share code (where it even makes sense, probably request setup/submit need to be shared, a
Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 08/06/2022 09:40, Lionel Landwerlin wrote:
On 03/06/2022 09:53, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote:
On Wed, 1 Jun 2022 at 11:03, Dave Airlie wrote:
On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura wrote:
On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote:
>On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND and related uapi definitions
>>
>> v2: Ensure proper kernel-doc formatting with cross references.
>> Also add new uapi and documentation as per review comments
>> from Daniel.
>>
>> Signed-off-by: Niranjana Vishwanathapura
>> ---
>> Documentation/gpu/rfc/i915_vm_bind.h | 399 +++
>> 1 file changed, 399 insertions(+)
>> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>> new file mode 100644
>> index ..589c0a009107
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,399 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_HAS_VM_BIND
>> + *
>> + * VM_BIND feature availability.
>> + * See typedef drm_i915_getparam_t param.
>> + */
>> +#define I915_PARAM_HAS_VM_BIND 57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * A VM in VM_BIND mode will not support the older execbuff mode of binding.
>> + * In VM_BIND mode, execbuff ioctl will not accept any execlist (ie., the
>> + * &drm_i915_gem_execbuffer2.buffer_count must be 0).
>> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and
>> + * &drm_i915_gem_execbuffer2.batch_len must be 0.
>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must be provided
>> + * to pass in the batch buffer addresses.
>> + *
>> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and
>> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags must be 0
>> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag must always be
>> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses).
>> + * The buffers_ptr, buffer_count, batch_start_offset and batch_len fields
>> + * of struct drm_i915_gem_execbuffer2 are also not used and must be 0.
>> + */
>
>From that description, it seems we have:
>
>struct drm_i915_gem_execbuffer2 {
>   __u64 buffers_ptr; -> must be 0 (new)
>   __u32 buffer_count; -> must be 0 (new)
>   __u32 batch_start_offset; -> must be 0 (new)
>   __u32 batch_len; -> must be 0 (new)
>   __u32 DR1; -> must be 0 (old)
>   __u32 DR4; -> must be 0 (old)
>   __u32 num_cliprects; (fences) -> must be 0 since using extensions
>   __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer!
>   __u64 flags; -> some flags must be 0 (new)
>   __u64 rsvd1; (context info) -> repurposed field (old)
>   __u64 rsvd2; -> unused
>};
>
>Based on that, why can't we just get drm_i915_gem_execbuffer3 instead
>of adding even more complexity to an already abused interface? While
>the Vulkan-like extension thing is really nice, I don't think what
>we're doing here is extending the ioctl usage, we're completely
>changing how the base struct should be interpreted based on how the VM
>was created (which is an entirely different ioctl).
>
>From Rusty Russell's API Design grading, drm_i915_gem_execbuffer2 is
>already at -6 without these changes. I think after vm_bind we'll need
>to create a -11 entry just to deal with this ioctl.
>

The only change here is removing the execlist support for VM_BIND mode (other than natural extensions). Adding a new execbuffer3 was considered, but I think we need to be careful with that as that goes beyond the VM_BIND support, including any future requirements (as we don't want an execbuffer4 after VM_BIND).

Why not? It's not like adding extensions here is really that different than adding new ioctls. I definitely think this deserves an execbuffer3 without even considering future requirements. Just to burn down the old requiremen
Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition
On 03/06/2022 09:53, Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote: On Wed, 1 Jun 2022 at 11:03, Dave Airlie wrote: On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura wrote: On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote: >On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote: >> VM_BIND and related uapi definitions >> >> v2: Ensure proper kernel-doc formatting with cross references. >> Also add new uapi and documentation as per review comments >> from Daniel. >> >> Signed-off-by: Niranjana Vishwanathapura >> --- >> Documentation/gpu/rfc/i915_vm_bind.h | 399 +++ >> 1 file changed, 399 insertions(+) >> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h >> >> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h >> new file mode 100644 >> index ..589c0a009107 >> --- /dev/null >> +++ b/Documentation/gpu/rfc/i915_vm_bind.h >> @@ -0,0 +1,399 @@ >> +/* SPDX-License-Identifier: MIT */ >> +/* >> + * Copyright © 2022 Intel Corporation >> + */ >> + >> +/** >> + * DOC: I915_PARAM_HAS_VM_BIND >> + * >> + * VM_BIND feature availability. >> + * See typedef drm_i915_getparam_t param. >> + */ >> +#define I915_PARAM_HAS_VM_BIND 57 >> + >> +/** >> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND >> + * >> + * Flag to opt-in for VM_BIND mode of binding during VM creation. >> + * See struct drm_i915_gem_vm_control flags. >> + * >> + * A VM in VM_BIND mode will not support the older execbuff mode of binding. >> + * In VM_BIND mode, execbuff ioctl will not accept any execlist (ie., the >> + * &drm_i915_gem_execbuffer2.buffer_count must be 0). >> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and >> + * &drm_i915_gem_execbuffer2.batch_len must be 0. >> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must be provided >> + * to pass in the batch buffer addresses. 
>> + * >> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and >> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags must be 0 >> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag must always be >> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses). >> + * The buffers_ptr, buffer_count, batch_start_offset and batch_len fields >> + * of struct drm_i915_gem_execbuffer2 are also not used and must be 0. >> + */ > >From that description, it seems we have: > >struct drm_i915_gem_execbuffer2 { > __u64 buffers_ptr; -> must be 0 (new) > __u32 buffer_count; -> must be 0 (new) > __u32 batch_start_offset; -> must be 0 (new) > __u32 batch_len; -> must be 0 (new) > __u32 DR1; -> must be 0 (old) > __u32 DR4; -> must be 0 (old) > __u32 num_cliprects; (fences) -> must be 0 since using extensions > __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer! > __u64 flags; -> some flags must be 0 (new) > __u64 rsvd1; (context info) -> repurposed field (old) > __u64 rsvd2; -> unused >}; > >Based on that, why can't we just get drm_i915_gem_execbuffer3 instead >of adding even more complexity to an already abused interface? While >the Vulkan-like extension thing is really nice, I don't think what >we're doing here is extending the ioctl usage, we're completely >changing how the base struct should be interpreted based on how the VM >was created (which is an entirely different ioctl). > >From Rusty Russel's API Design grading, drm_i915_gem_execbuffer2 is >already at -6 without these changes. I think after vm_bind we'll need >to create a -11 entry just to deal with this ioctl. > The only change here is removing the execlist support for VM_BIND mode (other than natual extensions). Adding a new execbuffer3 was considered, but I think we need to be careful with that as that goes beyond the VM_BIND support, including any future requirements (as we don't want an execbuffer4 after VM_BIND). Why not? 
it's not like adding extensions here is really that different than adding new ioctls. I definitely think this deserves an execbuffer3 without even considering future requirements. Just to burn down the old requirements and pointless fields. Make execbuffer3 be vm bind only, no relocs, no legacy bits, leave the older sw on execbuf2 for ever. I guess another point in favour of execbuf3 would be that it's less midlayer. If we share the entry point then there's quite a few vfuncs needed to cleanly split out the vm_bind paths from the legacy reloc/softping paths. If we invert this and do execbuf3, then there's the existing ioctl vfunc, and then we share code (where it even makes sense, probably request setup/submit need to be shared, a
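For reference, the VM_BIND-mode rules quoted above from the RFC kernel-doc can be sketched as a validation helper. This is only an illustration: `struct sketch_execbuffer2` is a local stand-in whose field names follow the listing in the thread, the `SKETCH_EXEC_*` bit positions are placeholders rather than the values from the uapi header, and `sketch_vm_bind_execbuf_check()` is a hypothetical name, not a function from the patch series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local stand-in for struct drm_i915_gem_execbuffer2 (fields as listed above). */
struct sketch_execbuffer2 {
	uint64_t buffers_ptr;
	uint32_t buffer_count;
	uint32_t batch_start_offset;
	uint32_t batch_len;
	uint32_t DR1;
	uint32_t DR4;
	uint32_t num_cliprects;
	uint64_t cliprects_ptr;
	uint64_t flags;
	uint64_t rsvd1;
	uint64_t rsvd2;
};

/* Placeholder bit positions for illustration only, NOT the uapi values. */
#define SKETCH_EXEC_NO_RELOC       (1ull << 0)
#define SKETCH_EXEC_HANDLE_LUT     (1ull << 1)
#define SKETCH_EXEC_BATCH_FIRST    (1ull << 2)
#define SKETCH_EXEC_USE_EXTENSIONS (1ull << 3)

/* Returns 0 if the struct obeys the VM_BIND-mode rules from the RFC text. */
static int sketch_vm_bind_execbuf_check(const struct sketch_execbuffer2 *eb)
{
	const uint64_t forbidden = SKETCH_EXEC_NO_RELOC |
				   SKETCH_EXEC_HANDLE_LUT |
				   SKETCH_EXEC_BATCH_FIRST;

	/* No execlist and no batch offset/length in VM_BIND mode. */
	if (eb->buffers_ptr || eb->buffer_count ||
	    eb->batch_start_offset || eb->batch_len)
		return -EINVAL;

	/* Legacy relocation-related flags must be clear. */
	if (eb->flags & forbidden)
		return -EINVAL;

	/* Batch addresses arrive via an extension, so this flag is mandatory. */
	if (!(eb->flags & SKETCH_EXEC_USE_EXTENSIONS))
		return -EINVAL;

	return 0;
}
```

The number of base-struct fields that end up as "must be 0" checks is the point made in the review: most of the struct is dead weight in this mode.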
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 02/06/2022 23:35, Jason Ekstrand wrote: On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura wrote: On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote: >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote: >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote: >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an >> > +async worker. The binding and unbinding will work like a special GPU engine. >> > +The binding and unbinding operations are serialized and will wait on specified >> > +input fences before the operation and will signal the output fences upon the >> > +completion of the operation. Due to serialization, completion of an operation >> > +will also indicate that all previous operations are also complete. >> >> I guess we should avoid saying "will immediately start binding/unbinding" if >> there are fences involved. >> >> And the fact that it's happening in an async worker seem to imply it's not >> immediate. >> Ok, will fix. This was added because in earlier design binding was deferred until next execbuff. But now it is non-deferred (immediate in that sense). But yah, this is confusing and will fix it. >> >> I have a question on the behavior of the bind operation when no input fence >> is provided. Let say I do : >> >> VM_BIND (out_fence=fence1) >> >> VM_BIND (out_fence=fence2) >> >> VM_BIND (out_fence=fence3) >> >> >> In what order are the fences going to be signaled? >> >> In the order of VM_BIND ioctls? Or out of order? >> >> Because you wrote "serialized I assume it's : in order >> Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and unbind will use the same queue and hence are ordered. >> >> One thing I didn't realize is that because we only get one "VM_BIND" engine, >> there is a disconnect from the Vulkan specification. >> >> In Vulkan VM_BIND operations are serialized but per engine. 
>> >> So you could have something like this : >> >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2) >> >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4) >> >> >> fence1 is not signaled >> >> fence3 is signaled >> >> So the second VM_BIND will proceed before the first VM_BIND. >> >> >> I guess we can deal with that scenario in userspace by doing the wait >> ourselves in one thread per engines. >> >> But then it makes the VM_BIND input fences useless. >> >> >> Daniel : what do you think? Should be rework this or just deal with wait >> fences in userspace? >> > >My opinion is rework this but make the ordering via an engine param optional. > >e.g. A VM can be configured so all binds are ordered within the VM > >e.g. A VM can be configured so all binds accept an engine argument (in >the case of the i915 likely this is a gem context handle) and binds >ordered with respect to that engine. > >This gives UMDs options as the later likely consumes more KMD resources >so if a different UMD can live with binds being ordered within the VM >they can use a mode consuming less resources. > I think we need to be careful here if we are looking for some out of (submission) order completion of vm_bind/unbind. In-order completion means, in a batch of binds and unbinds to be completed in-order, user only needs to specify in-fence for the first bind/unbind call and the our-fence for the last bind/unbind call. Also, the VA released by an unbind call can be re-used by any subsequent bind call in that in-order batch. These things will break if binding/unbinding were to be allowed to go out of order (of submission) and user need to be extra careful not to run into pre-mature triggereing of out-fence and bind failing as VA is still in use etc. Also, VM_BIND binds the provided mapping on the specified address space (VM). So, the uapi is not engine/context specific. 
We can however add a 'queue' to the uapi which can be one from the pre-defined queues, I915_VM_BIND_QUEUE_0 I915_VM_BIND_QUEUE_1 ... I915_VM_BIND_QUEUE_(N-1) KMD will spawn an async work queue for each queue which will only bind
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 02/06/2022 00:18, Matthew Brost wrote: On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote: On 17/05/2022 21:32, Niranjana Vishwanathapura wrote: +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an +async worker. The binding and unbinding will work like a special GPU engine. +The binding and unbinding operations are serialized and will wait on specified +input fences before the operation and will signal the output fences upon the +completion of the operation. Due to serialization, completion of an operation +will also indicate that all previous operations are also complete. I guess we should avoid saying "will immediately start binding/unbinding" if there are fences involved. And the fact that it's happening in an async worker seem to imply it's not immediate. I have a question on the behavior of the bind operation when no input fence is provided. Let say I do : VM_BIND (out_fence=fence1) VM_BIND (out_fence=fence2) VM_BIND (out_fence=fence3) In what order are the fences going to be signaled? In the order of VM_BIND ioctls? Or out of order? Because you wrote "serialized I assume it's : in order One thing I didn't realize is that because we only get one "VM_BIND" engine, there is a disconnect from the Vulkan specification. In Vulkan VM_BIND operations are serialized but per engine. So you could have something like this : VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2) VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4) Question - let's say this done after the above operations: EXEC (engine=ccs0, in_fence=NULL, out_fence=NULL) Is the exec ordered with respected to bind (i.e. would fence3 & 4 be signaled before the exec starts)? Matt Hi Matt, From the vulkan point of view, everything is serialized within an engine (we map that to a VkQueue). So with : EXEC (engine=ccs0, in_fence=NULL, out_fence=NULL) VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4) EXEC completes first then VM_BIND executes. 
To be even clearer:

EXEC (engine=ccs0, in_fence=fence2, out_fence=NULL)
VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)

EXEC will wait until fence2 is signaled. Once fence2 is signaled, EXEC proceeds and finishes, and only after it is done does VM_BIND execute.

It would be kind of like having the VM_BIND operation be another batch executed from the ring buffer.

-Lionel

fence1 is not signaled
fence3 is signaled

So the second VM_BIND will proceed before the first VM_BIND.

I guess we can deal with that scenario in userspace by doing the wait ourselves in one thread per engine.

But then it makes the VM_BIND input fences useless.

Daniel: what do you think? Should we rework this or just deal with wait fences in userspace?

Sorry I noticed this late.

-Lionel
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:

+VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an
+async worker. The binding and unbinding will work like a special GPU engine.
+The binding and unbinding operations are serialized and will wait on specified
+input fences before the operation and will signal the output fences upon the
+completion of the operation. Due to serialization, completion of an operation
+will also indicate that all previous operations are also complete.

I guess we should avoid saying "will immediately start binding/unbinding" if there are fences involved.

And the fact that it's happening in an async worker seems to imply it's not immediate.

I have a question on the behavior of the bind operation when no input fence is provided. Let's say I do:

VM_BIND (out_fence=fence1)
VM_BIND (out_fence=fence2)
VM_BIND (out_fence=fence3)

In what order are the fences going to be signaled? In the order of the VM_BIND ioctls? Or out of order?

Because you wrote "serialized", I assume it's in order.

One thing I didn't realize is that because we only get one "VM_BIND" engine, there is a disconnect from the Vulkan specification.

In Vulkan, VM_BIND operations are serialized, but per engine.

So you could have something like this:

VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)

fence1 is not signaled
fence3 is signaled

So the second VM_BIND will proceed before the first VM_BIND.

I guess we can deal with that scenario in userspace by doing the wait ourselves in one thread per engine.

But then it makes the VM_BIND input fences useless.

Daniel: what do you think? Should we rework this or just deal with wait fences in userspace?

Sorry I noticed this late.

-Lionel
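The difference between a single VM_BIND queue and per-engine queues can be illustrated with a small model. This is a userspace sketch under simplifying assumptions, not kernel code: `struct sketch_bind_op`, its `queue` field and `sketch_run_binds()` are hypothetical names, and fences are reduced to a boolean "already signaled" state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One bind operation in submission order. */
struct sketch_bind_op {
	int queue;              /* hypothetical queue id the bind is ordered on */
	bool in_fence_signaled; /* state of the input fence */
	bool done;              /* output: could this bind complete? */
};

/*
 * In-order completion per queue: an op completes only if its input fence
 * is signaled and every earlier op on the same queue has completed.
 */
static void sketch_run_binds(struct sketch_bind_op *ops, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		bool blocked = !ops[i].in_fence_signaled;

		for (size_t j = 0; j < i && !blocked; j++) {
			if (ops[j].queue == ops[i].queue && !ops[j].done)
				blocked = true;
		}
		ops[i].done = !blocked;
	}
}
```

With both binds on the same queue, the unsignaled fence1 holds back the second bind even though fence3 is signaled; with one queue per engine, the second bind proceeds, which is the Vulkan-style behavior described in the message.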
Re: [Intel-gfx] [PATCH v2 2/6] drm/i915/xehp: Drop GETPARAM lookups of I915_PARAM_[SUB]SLICE_MASK
On 17/05/2022 06:20, Matt Roper wrote:

Slice/subslice/EU information should be obtained via the topology queries provided by the I915_QUERY interface; let's turn off support for the old GETPARAM lookups on Xe_HP and beyond where we can't return meaningful values.

The slice mask lookup is meaningless since Xe_HP doesn't support traditional slices (and we make no attempt to return the various new units like gslices, cslices, mslices, etc.) here.

The subslice mask lookup is even more problematic; given the distinct masks for geometry vs compute purposes, the combined mask returned here is likely not what userspace would want to act upon anyway. The value is also limited to 32-bits by the nature of the GETPARAM ioctl which is sufficient for the initial Xe_HP platforms, but is unable to convey the larger masks that will be needed on other upcoming platforms. Finally, the value returned here becomes even less meaningful when used on multi-tile platforms where each tile will have its own masks.

Signed-off-by: Matt Roper

Sounds fair. We've been relying on the topology query in Mesa since it's available and it's a requirement for Gfx10+.

FYI, we're also not using I915_PARAM_EU_TOTAL on Gfx10+ for the same reason.

Acked-by: Lionel Landwerlin

---
 drivers/gpu/drm/i915/i915_getparam.c | 8
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
index c12a0adefda5..ac9767c56619 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -148,11 +148,19 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
 		value = intel_engines_has_context_isolation(i915);
 		break;
 	case I915_PARAM_SLICE_MASK:
+		/* Not supported from Xe_HP onward; use topology queries */
+		if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+			return -EINVAL;
+
 		value = sseu->slice_mask;
 		if (!value)
 			return -ENODEV;
 		break;
 	case I915_PARAM_SUBSLICE_MASK:
+		/* Not supported from Xe_HP onward; use topology queries */
+		if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+			return -EINVAL;
+
 		/* Only copy bits from the first slice */
 		memcpy(&value, sseu->subslice_mask,
 		       min(sseu->ss_stride, (u8)sizeof(value)));
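The return-code behavior the patch gives I915_PARAM_SLICE_MASK can be modelled in a few lines. This is a standalone sketch, not driver code: `SKETCH_IP_VER()` assumes a packing with the major version in the high bits (an assumption made for illustration), and `sketch_get_slice_mask()` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed version packing for illustration: major in the high bits. */
#define SKETCH_IP_VER(ver, rel) (((ver) << 8) | (rel))

/*
 * Models the patched I915_PARAM_SLICE_MASK case:
 *   -EINVAL on graphics version 12.50 and later,
 *   -ENODEV when the mask is empty,
 *   0 with *value filled in otherwise.
 */
static int sketch_get_slice_mask(unsigned int ver, unsigned int rel,
				 uint32_t slice_mask, uint32_t *value)
{
	if (SKETCH_IP_VER(ver, rel) >= SKETCH_IP_VER(12u, 50u))
		return -EINVAL;

	if (!slice_mask)
		return -ENODEV;

	*value = slice_mask;
	return 0;
}
```

A userspace caller seeing -EINVAL here would presumably switch to the topology query, as Mesa is described as doing already.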
Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer
On 30/05/2022 14:40, Christian König wrote: Am 30.05.22 um 12:09 schrieb Lionel Landwerlin: On 30/05/2022 12:52, Christian König wrote: Am 25.05.22 um 23:59 schrieb Lucas De Marchi: On Wed, May 25, 2022 at 12:38:51PM +0200, Christian König wrote: Am 25.05.22 um 11:35 schrieb Lionel Landwerlin: [SNIP] Err... Let's double check with my colleagues. It seems we're running into a test failure in IGT with this patch, but now I have doubts that it's where the problem lies. Yeah, exactly that's what I couldn't understand as well. What you describe above should still work fine. Thanks for taking a look into this, Christian. With some additional prints: [ 210.742634] Console: switching to colour dummy device 80x25 [ 210.742686] [IGT] syncobj_timeline: executing [ 210.756988] [IGT] syncobj_timeline: starting subtest transfer-timeline-point [ 210.757364] [drm:drm_syncobj_transfer_ioctl] *ERROR* adding fence0 signaled=1 [ 210.764543] [drm:drm_syncobj_transfer_ioctl] *ERROR* resulting array fence signaled=0 [ 210.800469] [IGT] syncobj_timeline: exiting, ret=98 [ 210.825426] Console: switching to colour frame buffer device 240x67 still learning this part of the code but AFAICS the problem is because when we are creating the array, the 'signaled' doesn't propagate to the array. Yeah, but that is intentionally. The array should only signal when requested. I still don't get what the test case here is checking. There must be something I don't know about fence arrays. You seem to say that creating an array of signaled fences will not make the array signaled. Exactly that, yes. The array delays it's signaling until somebody asks for it. In other words the fences inside the array are check only after someone calls dma_fence_enable_sw_signaling() which in turn calls dma_fence_array_enable_signaling(). It is certainly possible that nobody does that in the drm_syncobj and because of this the array never signals. Regards, Christian. 
Thanks, Yeah I guess dma_fence_enable_sw_signaling() is never called for sw_sync. Don't we also want to call it right at the end of drm_syncobj_flatten_chain() ? -Lionel This is the situation with this IGT test. We started with a syncobj with point 1 & 2 signaled. We take point 2 and import it as a new point 3 on the same syncobj. We expect point 3 to be signaled as well and it's not. Thanks, -Lionel Regards, Christian. dma_fence_array_create() { ... atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences); ... } This is not considering the fact that some of the fences could already have been signaled as is the case in the igt@syncobj_timeline@transfer-timeline-point test. See https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11693/shard-dg1-12/igt@syncobj_timel...@transfer-timeline-point.html Quick patch on this function fixes it for me: -8< Subject: [PATCH] dma-buf: Honor already signaled fences on array creation When creating an array, array->num_pending is marked with the number of fences. However the fences could alredy have been signaled. Propagate num_pending to the array by looking at each individual fence the array contains. Signed-off-by: Lucas De Marchi --- drivers/dma-buf/dma-fence-array.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c index 5c8a7084577b..32f491c32fa0 100644 --- a/drivers/dma-buf/dma-fence-array.c +++ b/drivers/dma-buf/dma-fence-array.c @@ -158,6 +158,8 @@ struct dma_fence_array *dma_fence_array_create(int num_fences, { struct dma_fence_array *array; size_t size = sizeof(*array); + unsigned num_pending = 0; + struct dma_fence **f; WARN_ON(!num_fences || !fences); @@ -173,7 +175,14 @@ struct dma_fence_array *dma_fence_array_create(int num_fences, init_irq_work(&array->work, irq_dma_fence_array_work); array->num_fences = num_fences; - atomic_set(&array->num_pending, signal_on_any ? 
1 : num_fences); + + for (f = fences; f < fences + num_fences; f++) + num_pending += !dma_fence_is_signaled(*f); + + if (signal_on_any) + num_pending = !!num_pending; + + atomic_set(&array->num_pending, num_pending); array->fences = fences; array->base.error = PENDING_ERROR;
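The counting logic in the quick patch above can be checked in isolation with a small model. This sketch replaces the fences with an array of booleans and uses a hypothetical helper name, `sketch_num_pending()`; it only mirrors the arithmetic of the proposed change and says nothing about whether that change is the right fix, which the replies in this thread dispute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Mirrors the proposed num_pending computation: count the fences that are
 * not yet signaled, then clamp to at most 1 when signal_on_any is set.
 */
static unsigned int sketch_num_pending(const bool *signaled, size_t n,
				       bool signal_on_any)
{
	unsigned int pending = 0;

	for (size_t i = 0; i < n; i++)
		pending += !signaled[i];

	if (signal_on_any)
		pending = !!pending;

	return pending;
}
```

With every input already signaled the result is 0, which is the case exercised by the transfer-timeline-point subtest; the unpatched code would start at `num_fences` (or 1) regardless.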
Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer
On 30/05/2022 12:52, Christian König wrote: Am 25.05.22 um 23:59 schrieb Lucas De Marchi: On Wed, May 25, 2022 at 12:38:51PM +0200, Christian König wrote: Am 25.05.22 um 11:35 schrieb Lionel Landwerlin: [SNIP] Err... Let's double check with my colleagues. It seems we're running into a test failure in IGT with this patch, but now I have doubts that it's where the problem lies. Yeah, exactly that's what I couldn't understand as well. What you describe above should still work fine. Thanks for taking a look into this, Christian. With some additional prints: [ 210.742634] Console: switching to colour dummy device 80x25 [ 210.742686] [IGT] syncobj_timeline: executing [ 210.756988] [IGT] syncobj_timeline: starting subtest transfer-timeline-point [ 210.757364] [drm:drm_syncobj_transfer_ioctl] *ERROR* adding fence0 signaled=1 [ 210.764543] [drm:drm_syncobj_transfer_ioctl] *ERROR* resulting array fence signaled=0 [ 210.800469] [IGT] syncobj_timeline: exiting, ret=98 [ 210.825426] Console: switching to colour frame buffer device 240x67 still learning this part of the code but AFAICS the problem is because when we are creating the array, the 'signaled' doesn't propagate to the array. Yeah, but that is intentionally. The array should only signal when requested. I still don't get what the test case here is checking. There must be something I don't know about fence arrays. You seem to say that creating an array of signaled fences will not make the array signaled. This is the situation with this IGT test. We started with a syncobj with point 1 & 2 signaled. We take point 2 and import it as a new point 3 on the same syncobj. We expect point 3 to be signaled as well and it's not. Thanks, -Lionel Regards, Christian. dma_fence_array_create() { ... atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences); ... } This is not considering the fact that some of the fences could already have been signaled as is the case in the igt@syncobj_timeline@transfer-timeline-point test. 
See https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11693/shard-dg1-12/igt@syncobj_timel...@transfer-timeline-point.html Quick patch on this function fixes it for me: -8< Subject: [PATCH] dma-buf: Honor already signaled fences on array creation When creating an array, array->num_pending is marked with the number of fences. However the fences could alredy have been signaled. Propagate num_pending to the array by looking at each individual fence the array contains. Signed-off-by: Lucas De Marchi --- drivers/dma-buf/dma-fence-array.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c index 5c8a7084577b..32f491c32fa0 100644 --- a/drivers/dma-buf/dma-fence-array.c +++ b/drivers/dma-buf/dma-fence-array.c @@ -158,6 +158,8 @@ struct dma_fence_array *dma_fence_array_create(int num_fences, { struct dma_fence_array *array; size_t size = sizeof(*array); + unsigned num_pending = 0; + struct dma_fence **f; WARN_ON(!num_fences || !fences); @@ -173,7 +175,14 @@ struct dma_fence_array *dma_fence_array_create(int num_fences, init_irq_work(&array->work, irq_dma_fence_array_work); array->num_fences = num_fences; - atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences); + + for (f = fences; f < fences + num_fences; f++) + num_pending += !dma_fence_is_signaled(*f); + + if (signal_on_any) + num_pending = !!num_pending; + + atomic_set(&array->num_pending, num_pending); array->fences = fences; array->base.error = PENDING_ERROR;
Re: [2/2] dma-buf: Add an API for importing sync files (v9)
Just noticed a small nit on this one:

ordering via these fences, it is the respnosibility of userspace to use

respnosibility -> responsibility

Acked-by: Lionel Landwerlin

Cheers,
-Lionel
Re: [1/2] dma-buf: Add an API for exporting sync files (v14)
Acked-by: Lionel Landwerlin
Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer
On 25/05/2022 12:26, Lionel Landwerlin wrote: On 25/05/2022 11:24, Christian König wrote: Am 25.05.22 um 08:47 schrieb Lionel Landwerlin: On 09/02/2022 20:26, Christian König wrote: It is illegal to add a dma_fence_chain as timeline point. Flatten out the fences into a dma_fence_array instead. Signed-off-by: Christian König --- drivers/gpu/drm/drm_syncobj.c | 61 --- 1 file changed, 56 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index c313a5b4549c..7e48dcd1bee4 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -853,12 +853,57 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data, &args->handle); } + +/* + * Try to flatten a dma_fence_chain into a dma_fence_array so that it can be + * added as timeline fence to a chain again. + */ +static int drm_syncobj_flatten_chain(struct dma_fence **f) +{ + struct dma_fence_chain *chain = to_dma_fence_chain(*f); + struct dma_fence *tmp, **fences; + struct dma_fence_array *array; + unsigned int count; + + if (!chain) + return 0; + + count = 0; + dma_fence_chain_for_each(tmp, &chain->base) + ++count; + + fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL); + if (!fences) + return -ENOMEM; + + count = 0; + dma_fence_chain_for_each(tmp, &chain->base) + fences[count++] = dma_fence_get(tmp); + + array = dma_fence_array_create(count, fences, + dma_fence_context_alloc(1), Hi Christian, Sorry for the late answer to this. It appears this commit is trying to remove the warnings added by "dma-buf: Warn about dma_fence_chain container rules" Yes, correct. We are now enforcing some rules with warnings and this here bubbled up. But the context allocation you added just above is breaking some tests. In particular igt@syncobj_timeline@transfer-timeline-point That test transfer points into the timeline at point 3 and expects that we'll still on the previous points to complete. Hui what? 
I don't understand the problem you are seeing here. What exactly is the test doing? In my opinion we should be reusing the previous context number if there is one and only allocate if we don't have a point. Scratching my head what you mean with that. The functionality transfers a synchronization fence from one timeline to another. So as far as I can see the new point should be part of the timeline of the syncobj we are transferring to. If the application wants to not depend on previous points for wait operations, it can reset the syncobj prior to adding a new point. Well we should never lose synchronization. So what happens is that when we do the transfer all the fences of the source are flattened out into an array. And that array is then added as new point into the destination timeline. In this case would be broken : syncobjA <- signal point 1 syncobjA <- import syncobjB point 1 into syncobjA point 2 syncobjA <- query returns 0 -Lionel Err... Let's double check with my colleagues. It seems we're running into a test failure in IGT with this patch, but now I have doubts that it's where the problem lies. -Lionel Where exactly is the problem? Regards, Christian. 
Cheers, -Lionel + 1, false); + if (!array) + goto free_fences; + + dma_fence_put(*f); + *f = &array->base; + return 0; + +free_fences: + while (count--) + dma_fence_put(fences[count]); + + kfree(fences); + return -ENOMEM; +} + static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, struct drm_syncobj_transfer *args) { struct drm_syncobj *timeline_syncobj = NULL; - struct dma_fence *fence; struct dma_fence_chain *chain; + struct dma_fence *fence; int ret; timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle); @@ -869,16 +914,22 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) - goto err; + goto err_put_timeline; + + ret = drm_syncobj_flatten_chain(&fence); + if (ret) + goto err_free_fence; + chain = dma_fence_chain_alloc(); if (!chain) { ret = -ENOMEM; - goto err1; + goto err_free_fence; } + drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point); -err1: +err_free_fence: dma_fence_put(fence); -err: +err_put_timeline: drm_syncobj_put(timeline_syncobj); return ret;
Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer
On 25/05/2022 11:24, Christian König wrote:
> On 25.05.22 at 08:47, Lionel Landwerlin wrote:
>> On 09/02/2022 20:26, Christian König wrote:
>>> It is illegal to add a dma_fence_chain as timeline point. Flatten out
>>> the fences into a dma_fence_array instead.
>>>
>>> Signed-off-by: Christian König
>>> ---
>>>  drivers/gpu/drm/drm_syncobj.c | 61 ---
>>>  1 file changed, 56 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>>> index c313a5b4549c..7e48dcd1bee4 100644
>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>> @@ -853,12 +853,57 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
>>>  					&args->handle);
>>>  }
>>>
>>> +
>>> +/*
>>> + * Try to flatten a dma_fence_chain into a dma_fence_array so that it can be
>>> + * added as timeline fence to a chain again.
>>> + */
>>> +static int drm_syncobj_flatten_chain(struct dma_fence **f)
>>> +{
>>> +	struct dma_fence_chain *chain = to_dma_fence_chain(*f);
>>> +	struct dma_fence *tmp, **fences;
>>> +	struct dma_fence_array *array;
>>> +	unsigned int count;
>>> +
>>> +	if (!chain)
>>> +		return 0;
>>> +
>>> +	count = 0;
>>> +	dma_fence_chain_for_each(tmp, &chain->base)
>>> +		++count;
>>> +
>>> +	fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
>>> +	if (!fences)
>>> +		return -ENOMEM;
>>> +
>>> +	count = 0;
>>> +	dma_fence_chain_for_each(tmp, &chain->base)
>>> +		fences[count++] = dma_fence_get(tmp);
>>> +
>>> +	array = dma_fence_array_create(count, fences,
>>> +				       dma_fence_context_alloc(1),
>>
>> Hi Christian,
>>
>> Sorry for the late answer to this.
>>
>> It appears this commit is trying to remove the warnings added by
>> "dma-buf: Warn about dma_fence_chain container rules"
>
> Yes, correct. We are now enforcing some rules with warnings and this
> here bubbled up.
>
>> But the context allocation you added just above is breaking some tests.
>> In particular igt@syncobj_timeline@transfer-timeline-point
>>
>> That test transfers points into the timeline at point 3 and expects that
>> we'll still wait on the previous points to complete.
>
> Hui what? I don't understand the problem you are seeing here. What
> exactly is the test doing?
>
>> In my opinion we should be reusing the previous context number if there
>> is one and only allocate if we don't have a point.
>
> Scratching my head what you mean with that. The functionality transfers
> a synchronization fence from one timeline to another.
>
> So as far as I can see the new point should be part of the timeline of
> the syncobj we are transferring to.
>
>> If the application wants to not depend on previous points for wait
>> operations, it can reset the syncobj prior to adding a new point.
>
> Well we should never lose synchronization. So what happens is that when
> we do the transfer all the fences of the source are flattened out into
> an array. And that array is then added as new point into the destination
> timeline.

This case would be broken:

syncobjA <- signal point 1
syncobjA <- import syncobjB point 1 into syncobjA point 2
syncobjA <- query returns 0

-Lionel

> Where exactly is the problem?
>
> Regards,
> Christian.
>
>> Cheers,
>>
>> -Lionel
>>
>>> +				       1, false);
>>> +	if (!array)
>>> +		goto free_fences;
>>> +
>>> +	dma_fence_put(*f);
>>> +	*f = &array->base;
>>> +	return 0;
>>> +
>>> +free_fences:
>>> +	while (count--)
>>> +		dma_fence_put(fences[count]);
>>> +
>>> +	kfree(fences);
>>> +	return -ENOMEM;
>>> +}
>>> +
>>>  static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>>  					    struct drm_syncobj_transfer *args)
>>>  {
>>>  	struct drm_syncobj *timeline_syncobj = NULL;
>>> -	struct dma_fence *fence;
>>>  	struct dma_fence_chain *chain;
>>> +	struct dma_fence *fence;
>>>  	int ret;
>>>
>>>  	timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle);
>>> @@ -869,16 +914,22 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>>  				     args->src_point, args->flags,
>>>  				     &fence);
>>>  	if (ret)
>>> -		goto err;
>>> +		goto err_put_timeline;
>>> +
>>> +	ret = drm_syncobj_flatten_chain(&fence);
>>> +	if (ret)
>>> +		goto err_free_fence;
>>> +
>>>  	chain = dma_fence_chain_alloc();
>>>  	if (!chain) {
>>>  		ret = -ENOMEM;
>>> -		goto err1;
>>> +		goto err_free_fence;
>>>  	}
>>> +
>>>  	drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point);
>>> -err1:
>>> +err_free_fence:
>>>  	dma_fence_put(fence);
>>> -err:
>>> +err_put_timeline:
>>>  	drm_syncobj_put(timeline_syncobj);
>>>
>>>  	return ret;
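For readers following the "flattened out into an array" step without the kernel tree at hand, here is a minimal userspace sketch of the same two-pass pattern (count the links, then take a reference on each). It is only a model: struct model_fence and the model_* helpers are made-up names, not the dma_fence API, and the context allocation that the thread is arguing about is deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted fence linked into a chain. */
struct model_fence {
	unsigned int refcount;
	struct model_fence *prev;	/* older link in the chain, or NULL */
};

static struct model_fence *model_fence_get(struct model_fence *f)
{
	f->refcount++;
	return f;
}

static void model_fence_put(struct model_fence *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/*
 * Flatten the chain starting at @head into a newly allocated array.
 * Pass 1 counts the links, pass 2 takes a reference on each of them.
 * Returns NULL for an empty chain or on allocation failure; on success
 * *out_count holds the number of entries.
 */
static struct model_fence **model_flatten_chain(struct model_fence *head,
						size_t *out_count)
{
	struct model_fence **fences;
	struct model_fence *tmp;
	size_t count = 0;

	for (tmp = head; tmp; tmp = tmp->prev)
		++count;

	*out_count = 0;
	if (!count)
		return NULL;

	fences = calloc(count, sizeof(*fences));
	if (!fences)
		return NULL;

	count = 0;
	for (tmp = head; tmp; tmp = tmp->prev)
		fences[count++] = model_fence_get(tmp);

	*out_count = count;
	return fences;
}

/* Drop the references taken by model_flatten_chain() and free the array. */
static void model_release_array(struct model_fence **fences, size_t count)
{
	while (count--)
		model_fence_put(fences[count]);
	free(fences);
}
```

A caller would pass the newest link as @head, use the returned array in place of the chain, and hand it to model_release_array() when done. Whether the resulting container should get a fresh context or reuse the destination timeline's one is exactly the open question above, so the model takes no position on it.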
Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer
On 09/02/2022 20:26, Christian König wrote:
> It is illegal to add a dma_fence_chain as timeline point. Flatten out
> the fences into a dma_fence_array instead.
>
> Signed-off-by: Christian König
> ---
>  drivers/gpu/drm/drm_syncobj.c | 61 ---
>  1 file changed, 56 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index c313a5b4549c..7e48dcd1bee4 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -853,12 +853,57 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
>  					&args->handle);
>  }
>
> +
> +/*
> + * Try to flatten a dma_fence_chain into a dma_fence_array so that it can be
> + * added as timeline fence to a chain again.
> + */
> +static int drm_syncobj_flatten_chain(struct dma_fence **f)
> +{
> +	struct dma_fence_chain *chain = to_dma_fence_chain(*f);
> +	struct dma_fence *tmp, **fences;
> +	struct dma_fence_array *array;
> +	unsigned int count;
> +
> +	if (!chain)
> +		return 0;
> +
> +	count = 0;
> +	dma_fence_chain_for_each(tmp, &chain->base)
> +		++count;
> +
> +	fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
> +	if (!fences)
> +		return -ENOMEM;
> +
> +	count = 0;
> +	dma_fence_chain_for_each(tmp, &chain->base)
> +		fences[count++] = dma_fence_get(tmp);
> +
> +	array = dma_fence_array_create(count, fences,
> +				       dma_fence_context_alloc(1),

Hi Christian,

Sorry for the late answer to this.

It appears this commit is trying to remove the warnings added by
"dma-buf: Warn about dma_fence_chain container rules"

But the context allocation you added just above is breaking some tests.
In particular igt@syncobj_timeline@transfer-timeline-point

That test transfers points into the timeline at point 3 and expects that
we'll still wait on the previous points to complete.

In my opinion we should be reusing the previous context number if there
is one and only allocate if we don't have a point.

If the application wants to not depend on previous points for wait
operations, it can reset the syncobj prior to adding a new point.

Cheers,

-Lionel

> +				       1, false);
> +	if (!array)
> +		goto free_fences;
> +
> +	dma_fence_put(*f);
> +	*f = &array->base;
> +	return 0;
> +
> +free_fences:
> +	while (count--)
> +		dma_fence_put(fences[count]);
> +
> +	kfree(fences);
> +	return -ENOMEM;
> +}
> +
>  static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>  					    struct drm_syncobj_transfer *args)
>  {
>  	struct drm_syncobj *timeline_syncobj = NULL;
> -	struct dma_fence *fence;
>  	struct dma_fence_chain *chain;
> +	struct dma_fence *fence;
>  	int ret;
>
>  	timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle);
> @@ -869,16 +914,22 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>  				     args->src_point, args->flags,
>  				     &fence);
>  	if (ret)
> -		goto err;
> +		goto err_put_timeline;
> +
> +	ret = drm_syncobj_flatten_chain(&fence);
> +	if (ret)
> +		goto err_free_fence;
> +
>  	chain = dma_fence_chain_alloc();
>  	if (!chain) {
>  		ret = -ENOMEM;
> -		goto err1;
> +		goto err_free_fence;
>  	}
> +
>  	drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point);
> -err1:
> +err_free_fence:
>  	dma_fence_put(fence);
> -err:
> +err_put_timeline:
>  	drm_syncobj_put(timeline_syncobj);
>
>  	return ret;
Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document
On 20/05/2022 01:52, Zanoni, Paulo R wrote: On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote: VM_BIND design document with description of intended use cases. v2: Add more documentation and format as per review comments from Daniel. Signed-off-by: Niranjana Vishwanathapura --- diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst b/Documentation/gpu/rfc/i915_vm_bind.rst new file mode 100644 index ..f1be560d313c --- /dev/null +++ b/Documentation/gpu/rfc/i915_vm_bind.rst @@ -0,0 +1,304 @@ +== +I915 VM_BIND feature design and use cases +== + +VM_BIND feature + +DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer +objects (BOs) or sections of a BOs at specified GPU virtual addresses on a +specified address space (VM). These mappings (also referred to as persistent +mappings) will be persistent across multiple GPU submissions (execbuff calls) +issued by the UMD, without user having to provide a list of all required +mappings during each submission (as required by older execbuff mode). + +VM_BIND/UNBIND ioctls will support 'in' and 'out' fences to allow userpace +to specify how the binding/unbinding should sync with other operations +like the GPU job submission. These fences will be timeline 'drm_syncobj's +for non-Compute contexts (See struct drm_i915_vm_bind_ext_timeline_fences). +For Compute contexts, they will be user/memory fences (See struct +drm_i915_vm_bind_ext_user_fence). + +VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND. +User has to opt-in for VM_BIND mode of binding for an address space (VM) +during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension. + +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an +async worker. The binding and unbinding will work like a special GPU engine. +The binding and unbinding operations are serialized and will wait on specified +input fences before the operation and will signal the output fences upon the +completion of the operation. 
Due to serialization, completion of an operation +will also indicate that all previous operations are also complete. + +VM_BIND features include: + +* Multiple Virtual Address (VA) mappings can map to the same physical pages + of an object (aliasing). +* VA mapping can map to a partial section of the BO (partial binding). +* Support capture of persistent mappings in the dump upon GPU error. +* TLB is flushed upon unbind completion. Batching of TLB flushes in some + use cases will be helpful. +* Asynchronous vm_bind and vm_unbind support with 'in' and 'out' fences. +* Support for userptr gem objects (no special uapi is required for this). + +Execbuff ioctl in VM_BIND mode +--- +The execbuff ioctl handling in VM_BIND mode differs significantly from the +older method. A VM in VM_BIND mode will not support older execbuff mode of +binding. In VM_BIND mode, execbuff ioctl will not accept any execlist. Hence, +no support for implicit sync. It is expected that the below work will be able +to support requirements of object dependency setting in all use cases: + +"dma-buf: Add an API for exporting sync files" +(https://lwn.net/Articles/859290/) I would really like to have more details here. The link provided points to new ioctls and we're not very familiar with those yet, so I think you should really clarify the interaction between the new additions here. Having some sample code would be really nice too. For Mesa at least (and I believe for the other drivers too) we always have a few exported buffers in every execbuf call, and we rely on the implicit synchronization provided by execbuf to make sure everything works. The execbuf ioctl also has some code to flush caches during implicit synchronization AFAIR, so I would guess we rely on it too and whatever else the Kernel does. Is that covered by the new ioctls? In addition, as far as I remember, one of the big improvements of vm_bind was that it would help reduce ioctl latency and cpu overhead. 
But if making execbuf faster comes at the cost of requiring additional ioctls calls for implicit synchronization, which is required on ever execbuf call, then I wonder if we'll even get any faster at all. Comparing old execbuf vs plain new execbuf without the new required ioctls won't make sense. But maybe I'm wrong and we won't need to call these new ioctls around every single execbuf ioctl we submit? Again, more clarification and some code examples here would be really nice. This is a big change on an important part of the API, we should clarify the new expected usage. Hey Paulo, I think in the case of X11/Wayland, we'll be doing 1 or 2 extra ioctls per frame which seems pretty reasonable. Essentially we need to set the dependencies on the buffer we´re going to tell the display engine (gnome-shell/kde/bare-display-hw) to use. In the Vulkan case, we're t
Re: [PATCH v3] drm/doc: add rfc section for small BAR uapi
On 17/05/2022 12:23, Tvrtko Ursulin wrote: On 17/05/2022 09:55, Lionel Landwerlin wrote: On 17/05/2022 11:29, Tvrtko Ursulin wrote: On 16/05/2022 19:11, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) v3: - Drop the vma query for now. - Add unallocated_cpu_visible_size as part of the region query. - Improve the docs some more, including documenting the expected behaviour on older kernels, since this came up in some offline discussion. Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jon Bloomfield Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 164 +++ Documentation/gpu/rfc/i915_small_bar.rst | 47 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 215 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..4079d287750b --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,164 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; Is -1 possible today or when it will be? For system memory it appears zeroes are returned today so that has to stay I think. Does it effectively mean userspace has to consider both 0 and -1 as unknown is the question. I raised this on v2. As far as I can tell there are no situation where we would get -1. Is it really probed_size=0 on smem?? It's not the case on the internal branch. My bad, I misread the arguments to intel_memory_region_create while grepping: struct intel_memory_region *i915_gem_shmem_setup(struct drm_i915_private *i915, u16 type, u16 instance) { return intel_memory_region_create(i915, 0, totalram_pages() << PAGE_SHIFT, PAGE_SIZE, 0, 0, type, instance, &shmem_region_ops); I saw "0, 0" and wrongly assumed that would be the data, since it matched with my mental model and the comment against unallocated_size saying it's only tracked for device memory. Although I'd say it is questionable for i915 to return this data. I wonder it use case is possible where it would even be wrong but don't know. I guess the cat is out of the bag now. Not sure how questionable that is. There are a bunch of tools reporting the amount of memory available (free, top, htop, etc...). It might not be totalram_pages() but probably something close to it. Having a non 0 & non -1 value is useful. -Lionel If the situation is -1 for unknown and some valid size (not zero) I don't think there is a problem here. Regards, Tvrtko Anv is not currently handling that case. I would very much like to not deal with 0 for smem. It really makes it easier for userspace rather than having to fish information from 2 different places and on top of dealing with multiple kernel versions. 
-Lionel + + /** + * @unallocated_size: Estimate of memory remaining (-1 = unknown) + * + * Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE + * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get + * reliable accounting. Without this(or if this an older kernel) the s/if this an/if this is an/ Also same question as above about -1. + * value here will always match the @probed_size. + */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). Also question about -1. In this case this could be done since the field is yet to be added but I am curious if it ever can be -1. + * + * This will be always be <= @probed_s
Re: [PATCH v3] drm/doc: add rfc section for small BAR uapi
On 17/05/2022 11:29, Tvrtko Ursulin wrote: On 16/05/2022 19:11, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) v3: - Drop the vma query for now. - Add unallocated_cpu_visible_size as part of the region query. - Improve the docs some more, including documenting the expected behaviour on older kernels, since this came up in some offline discussion. Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jon Bloomfield Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 164 +++ Documentation/gpu/rfc/i915_small_bar.rst | 47 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 215 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..4079d287750b --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,164 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; Is -1 possible today or when it will be? 
For system memory it appears zeroes are returned today so that has to stay I think. Does it effectively mean userspace has to consider both 0 and -1 as unknown is the question. I raised this on v2. As far as I can tell there are no situation where we would get -1. Is it really probed_size=0 on smem?? It's not the case on the internal branch. Anv is not currently handling that case. I would very much like to not deal with 0 for smem. It really makes it easier for userspace rather than having to fish information from 2 different places and on top of dealing with multiple kernel versions. -Lionel + + /** + * @unallocated_size: Estimate of memory remaining (-1 = unknown) + * + * Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE + * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get + * reliable accounting. Without this(or if this an older kernel) the s/if this an/if this is an/ Also same question as above about -1. + * value here will always match the @probed_size. + */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). Also question about -1. In this case this could be done since the field is yet to be added but I am curious if it ever can be -1. + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + * + * On systems without small BAR, the @probed_size will + * always equal the @probed_cpu_visible_size, since all + * of it will be CPU accessible. + * + * Note that if the value returned here is zero, then + * this must be an old kernel which lacks the relevant + * small-bar uAPI support(including I have noticed you prefer no space before parentheses throughout the text so I guess it's just my preference to have it. Very nitpicky even if I am right so up to you. 
+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on + * such systems we should never actually end up with a + * small BAR configuration, assuming we are able to load + * the kernel module. Hence it should be safe to treat + * this the same as when @probed_cpu_visible_size == + * @probed_size. + */ + __u64 probed_cpu_visible_size; + + /** + * @unallocated_cpu_visible_size: Estimate of CPU + * visible memory remaining (-1 = unknown). + * + * Note this is only
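The zero and -1 cases discussed in this review can be sketched from the userspace side roughly as follows. This is only an illustration under stated assumptions: struct ex_region_sizes and the ex_* helpers are made-up names holding copies of the two fields, and whether the kernel ever returns -1 was still an open question in the thread, so that branch is purely defensive.

```c
#include <stdbool.h>
#include <stdint.h>

/* The "-1 = unknown" encoding from the RFC, as seen through a __u64 field. */
#define EX_SIZE_UNKNOWN UINT64_MAX

/* Hypothetical local copy of the two fields under discussion. */
struct ex_region_sizes {
	uint64_t probed_size;
	uint64_t probed_cpu_visible_size;
};

/*
 * Work out how much of a region is CPU visible.
 * Returns false when the size cannot be determined.
 */
static bool ex_cpu_visible_size(const struct ex_region_sizes *r, uint64_t *out)
{
	if (r->probed_size == EX_SIZE_UNKNOWN)
		return false;

	/* Zero: older kernel without the small-BAR uAPI, treat all of it as visible. */
	if (r->probed_cpu_visible_size == 0) {
		*out = r->probed_size;
		return true;
	}

	if (r->probed_cpu_visible_size == EX_SIZE_UNKNOWN)
		return false;

	*out = r->probed_cpu_visible_size;
	return true;
}

/* A region counts as small BAR only when part of it is known to be hidden. */
static bool ex_is_small_bar(const struct ex_region_sizes *r)
{
	uint64_t visible;

	return ex_cpu_visible_size(r, &visible) && visible < r->probed_size;
}
```

Folding the zero case into "everything is visible" follows the wording of the proposed kernel-doc; a driver that prefers to distinguish old kernels explicitly would keep the raw value around instead.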
Re: [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj
On 14/05/2022 00:06, Jordan Justen wrote:
> On 2022-05-13 05:31:00, Lionel Landwerlin wrote:
>> On 02/05/2022 17:15, Ramalingam C wrote:
>>> Capture the impact of memory region preference list of the objects, on
>>> their memory residency and Flat-CCS capability.
>>>
>>> v2:
>>>   Fix the Flat-CCS capability of an obj with {lmem, smem} preference
>>>   list [Thomas]
>>> v3:
>>>   Reworded the doc [Matt]
>>>
>>> Signed-off-by: Ramalingam C
>>> cc: Matthew Auld
>>> cc: Thomas Hellstrom
>>> cc: Daniel Vetter
>>> cc: Jon Bloomfield
>>> cc: Lionel Landwerlin
>>> cc: Kenneth Graunke
>>> cc: mesa-...@lists.freedesktop.org
>>> cc: Jordan Justen
>>> cc: Tony Ye
>>> Reviewed-by: Matthew Auld
>>> ---
>>>  include/uapi/drm/i915_drm.h | 16
>>>  1 file changed, 16 insertions(+)
>>>
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index a2def7b27009..b7e1c2fe08dc 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext {
>>>   * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
>>>   * along with the final object size in &drm_i915_gem_create_ext.size, which
>>>   * should account for any rounding up, if required.
>>> + *
>>> + * Note that userspace has no means of knowing the current backing region
>>> + * for objects where @num_regions is larger than one. The kernel will only
>>> + * ensure that the priority order of the @regions array is honoured, either
>>> + * when initially placing the object, or when moving memory around due to
>>> + * memory pressure
>>> + *
>>> + * On Flat-CCS capable HW, compression is supported for the objects residing
>>> + * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) has other
>>> + * memory class in @regions and migrated (by I915, due to memory
>>> + * constrain) to the non I915_MEMORY_CLASS_DEVICE region, then I915 needs to
>>> + * decompress the content. But I915 dosen't have the required information to
>>> + * decompress the userspace compressed objects.
>>> + *
>>> + * So I915 supports Flat-CCS, only on the objects which can reside only on
>>> + * I915_MEMORY_CLASS_DEVICE regions.
>>
>> I think it's fine to assume Flat-CCS surfaces will always be in lmem.
>>
>> I see no issue for the Anv Vulkan driver.
>>
>> Maybe Nanley or Ken can speak for the Iris GL driver?
>
> Acked-by: Jordan Justen
>
> I think Nanley has accounted for this on iris with:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/commit/42a865730ef72574e179b56a314f30fdccc6cba8
>
> -Jordan

Thanks Jordan,

We might want to throw in an additional:

assert((flags & BO_ALLOC_SMEM) == 0);

in the CCS case.

-Lionel
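As a rough illustration of the assert suggested above, here is a self-contained sketch. The EX_BO_ALLOC_* bits and the ex_* helpers are hypothetical stand-ins defined here only so the example compiles on its own; the real BO_ALLOC_* flags live in Mesa and their values may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, standing in for the driver's own BO_ALLOC_* flags. */
#define EX_BO_ALLOC_SMEM	(1u << 0)
#define EX_BO_ALLOC_LMEM	(1u << 1)

/*
 * A buffer that relies on Flat-CCS must never be requested in system
 * memory, since the kernel only supports Flat-CCS for objects that can
 * reside solely in device-local regions.
 */
static bool ex_ccs_alloc_flags_valid(uint32_t flags, bool wants_flat_ccs)
{
	if (!wants_flat_ccs)
		return true;

	return (flags & EX_BO_ALLOC_SMEM) == 0;
}

/* Debug-build guard in the spirit of the suggested assert. */
static uint32_t ex_checked_alloc_flags(uint32_t flags, bool wants_flat_ccs)
{
	assert(ex_ccs_alloc_flags_valid(flags, wants_flat_ccs));
	return flags;
}
```

Splitting the predicate from the assert keeps the check usable in release builds, where assert() compiles away.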
Re: [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj
On 02/05/2022 17:15, Ramalingam C wrote:
> Capture the impact of memory region preference list of the objects, on
> their memory residency and Flat-CCS capability.
>
> v2:
>   Fix the Flat-CCS capability of an obj with {lmem, smem} preference
>   list [Thomas]
> v3:
>   Reworded the doc [Matt]
>
> Signed-off-by: Ramalingam C
> cc: Matthew Auld
> cc: Thomas Hellstrom
> cc: Daniel Vetter
> cc: Jon Bloomfield
> cc: Lionel Landwerlin
> cc: Kenneth Graunke
> cc: mesa-...@lists.freedesktop.org
> cc: Jordan Justen
> cc: Tony Ye
> Reviewed-by: Matthew Auld
> ---
>  include/uapi/drm/i915_drm.h | 16
>  1 file changed, 16 insertions(+)
>
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a2def7b27009..b7e1c2fe08dc 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext {
>   * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
>   * along with the final object size in &drm_i915_gem_create_ext.size, which
>   * should account for any rounding up, if required.
> + *
> + * Note that userspace has no means of knowing the current backing region
> + * for objects where @num_regions is larger than one. The kernel will only
> + * ensure that the priority order of the @regions array is honoured, either
> + * when initially placing the object, or when moving memory around due to
> + * memory pressure
> + *
> + * On Flat-CCS capable HW, compression is supported for the objects residing
> + * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) has other
> + * memory class in @regions and migrated (by I915, due to memory
> + * constrain) to the non I915_MEMORY_CLASS_DEVICE region, then I915 needs to
> + * decompress the content. But I915 dosen't have the required information to
> + * decompress the userspace compressed objects.
> + *
> + * So I915 supports Flat-CCS, only on the objects which can reside only on
> + * I915_MEMORY_CLASS_DEVICE regions.

I think it's fine to assume Flat-CCS surfaces will always be in lmem.

I see no issue for the Anv Vulkan driver.

Maybe Nanley or Ken can speak for the Iris GL driver?

-Lionel

>   */
>  struct drm_i915_gem_create_ext_memory_regions {
>  	/** @base: Extension link. See struct i915_user_extension. */
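The rule in the proposed comment (Flat-CCS only for objects whose placement list contains nothing but device-local regions) can be sketched as a small userspace check. The types here are local stand-ins that only mirror the shape of the uAPI's class:instance pair; the names ex_memory_class, ex_memory_class_instance and ex_regions_allow_flat_ccs are assumptions made for this example, and the real definitions should be taken from include/uapi/drm/i915_drm.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the uAPI memory classes (not the real header). */
enum ex_memory_class {
	EX_MEMORY_CLASS_SYSTEM = 0,
	EX_MEMORY_CLASS_DEVICE = 1,
};

struct ex_memory_class_instance {
	uint16_t memory_class;
	uint16_t memory_instance;
};

/*
 * Returns true only if every entry of the placement list is device-local
 * memory, i.e. the object can never be migrated to system memory.
 */
static bool ex_regions_allow_flat_ccs(const struct ex_memory_class_instance *regions,
				      size_t num_regions)
{
	size_t i;

	if (num_regions == 0)
		return false;

	for (i = 0; i < num_regions; i++) {
		if (regions[i].memory_class != EX_MEMORY_CLASS_DEVICE)
			return false;
	}

	return true;
}
```

A driver would run a check like this before deciding to enable compression for a buffer, and fall back to an uncompressed layout for any {lmem, smem} placement list.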
Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 03/05/2022 17:27, Matthew Auld wrote: On 03/05/2022 11:39, Lionel Landwerlin wrote: On 03/05/2022 13:22, Matthew Auld wrote: On 02/05/2022 09:53, Lionel Landwerlin wrote: On 02/05/2022 10:54, Lionel Landwerlin wrote: On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I'm guessing before there is support for this value it'll be 0 (MBZ). After after it should either be the entire lmem or something smaller. -Lionel Other pain point of this new uAPI, previously we could query the unallocated size for each heap. unallocated_size should always give the same value as probed_size. We have the avail tracking, but we don't currently expose that through unallocated_size, due to lack of real userspace/user etc. Now lmem is effectively divided into 2 heaps, but unallocated_size is tracking allocation from both parts of lmem. Yeah, if we ever properly expose the unallocated_size, then we could also just add unallocated_cpu_visible_size. Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question? I don't think it's out of the question... I guess user-space should be able to get the current flag behaviour just by specifying: device, system. And it does give more flexibly to allow something like: device, device-nm, smem. We can also drop the probed_cpu_visible_size, which would now just be the probed_size with device/device-nm. 
And if we lack device-nm, then the entire thing must be CPU mappable. One of the downsides though, is that we can no longer easily mix object pages from both device + device-nm, which we could previously do when we didn't specify the flag. At least according to the current design/behaviour for @regions that would not be allowed. I guess some kind of new flag like ALLOC_MIXED or so? Although currently that is only possible with device + device-nm in ttm/i915. Thanks, I wasn't aware of the restrictions. Adding unallocated_cpu_visible_size would be great. So do we want this in the next version? i.e we already have a current real use case in mind for unallocated_size where probed_size is not good enough? Yeah in the next iteration. We're using unallocated_size to implement VK_EXT_memory_budget and since I'm going to expose lmem mappable/unmappable as 2 different heaps on Vulkan, I would use that there too. -Lionel -Lionel -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user
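To make the "two heaps" idea from this exchange concrete, here is one way a userspace driver might derive per-heap sizes and budgets for lmem once both unallocated_size and unallocated_cpu_visible_size are reported. This is a sketch under assumptions, not how Anv does it: struct ex_lmem_info, struct ex_heap_budget and ex_split_lmem_heaps are hypothetical names, and splitting the region by simple subtraction is an assumption of this example.

```c
#include <stdint.h>

/* Hypothetical copy of the region query fields discussed in this thread. */
struct ex_lmem_info {
	uint64_t probed_size;
	uint64_t unallocated_size;
	uint64_t probed_cpu_visible_size;
	uint64_t unallocated_cpu_visible_size;
};

/* Size and remaining budget of one heap as a driver might expose it. */
struct ex_heap_budget {
	uint64_t size;
	uint64_t available;
};

/* Subtract without wrapping, since the kernel values are only estimates. */
static uint64_t ex_sub_clamped(uint64_t a, uint64_t b)
{
	return a > b ? a - b : 0;
}

/*
 * Split one lmem region into a CPU-mappable heap and a non-mappable heap.
 * The non-mappable part is whatever remains after removing the CPU
 * visible portion from the region totals.
 */
static void ex_split_lmem_heaps(const struct ex_lmem_info *info,
				struct ex_heap_budget *mappable,
				struct ex_heap_budget *unmappable)
{
	mappable->size = info->probed_cpu_visible_size;
	mappable->available = info->unallocated_cpu_visible_size;

	unmappable->size = ex_sub_clamped(info->probed_size,
					  info->probed_cpu_visible_size);
	unmappable->available = ex_sub_clamped(info->unallocated_size,
					       info->unallocated_cpu_visible_size);
}
```

The clamped subtraction is there because the two unallocated values are sampled estimates and may be momentarily inconsistent with each other.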
Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 03/05/2022 13:22, Matthew Auld wrote: On 02/05/2022 09:53, Lionel Landwerlin wrote: On 02/05/2022 10:54, Lionel Landwerlin wrote: On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I'm guessing before there is support for this value it'll be 0 (MBZ). After after it should either be the entire lmem or something smaller. -Lionel Other pain point of this new uAPI, previously we could query the unallocated size for each heap. unallocated_size should always give the same value as probed_size. We have the avail tracking, but we don't currently expose that through unallocated_size, due to lack of real userspace/user etc. Now lmem is effectively divided into 2 heaps, but unallocated_size is tracking allocation from both parts of lmem. Yeah, if we ever properly expose the unallocated_size, then we could also just add unallocated_cpu_visible_size. Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question? I don't think it's out of the question... I guess user-space should be able to get the current flag behaviour just by specifying: device, system. And it does give more flexibly to allow something like: device, device-nm, smem. We can also drop the probed_cpu_visible_size, which would now just be the probed_size with device/device-nm. 
And if we lack device-nm, then the entire thing must be CPU mappable. One of the downsides though, is that we can no longer easily mix object pages from both device + device-nm, which we could previously do when we didn't specify the flag. At least according to the current design/behaviour for @regions that would not be allowed. I guess some kind of new flag like ALLOC_MIXED or so? Although currently that is only possible with device + device-nm in ttm/i915. Thanks, I wasn't aware of the restrictions. Adding unallocated_cpu_visible_size would be great. -Lionel -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested siz
Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 03/05/2022 12:07, Matthew Auld wrote: On 02/05/2022 19:03, Lionel Landwerlin wrote: On 02/05/2022 20:58, Abodunrin, Akeem G wrote: -Original Message- From: Landwerlin, Lionel G Sent: Monday, May 2, 2022 12:55 AM To: Auld, Matthew ; intel-...@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org; Thomas Hellström ; Bloomfield, Jon ; Daniel Vetter ; Justen, Jordan L ; Kenneth Graunke ; Abodunrin, Akeem G ; mesa-...@lists.freedesktop.org Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as +known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id +DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I believe it is set to -1 if it is unknown, and/or not cpu accessible... Cheers! ~Akeem So what should I expect on system memory? I guess just probed_cpu_visible_size == probed_size. Or maybe we can just use -1 here? What value is returned when all of probed_size is CPU visible on local memory? probed_size == probed_cpu_visible_size. Thanks, looks good to me. Then maybe we should update the comment to say that. Looks like there are no cases where we'll get -1. -Lionel Thanks, -Lionel I'm guessing before there is support for this value it'll be 0 (MBZ). After after it should either be the entire lmem or something smaller. -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, +with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the +stuff that + * is immutable. Previously we would have two ioctls, one to create +the object + * with gem_create, and another to apply various parameters, however +this + * creates some ambiguity for the params which are considered +immutable. 
Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + * + * Note that for some devices we might have further minimum + * page-size restrictions (larger than 4K), like for device local-memory. + * However in general the final size here should always reflect any + * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS + * extension to place the object in device local-memory. + */ + __u64 size; + /** + * @handle: Returned handle for the object. + * + * Object handles are no
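To make the outcome of the probed_cpu_visible_size discussion above concrete, here is a minimal sketch of how userspace might interpret the value. It assumes what the thread only guesses at: 0 means the kernel predates the field (the bytes were MBZ), -1 means unknown, and a value equal to probed_size means the whole region is CPU visible. The struct and enum names are hypothetical stand-ins, not the real uapi definitions.

```c
#include <stdint.h>

/* Hypothetical mirror of the two RFC fields used here; not the uapi struct. */
struct region_sizes {
	uint64_t probed_size;
	uint64_t probed_cpu_visible_size;
};

enum bar_kind {
	BAR_UNKNOWN, /* 0 (assumed: old kernel, field was MBZ) or -1 (unknown) */
	BAR_FULL,    /* all of probed_size is CPU visible */
	BAR_SMALL,   /* only part of probed_size is CPU visible */
};

/* Classify a region from its probed sizes, under the assumptions above. */
static enum bar_kind classify_bar(const struct region_sizes *r)
{
	if (r->probed_cpu_visible_size == 0 ||
	    r->probed_cpu_visible_size == UINT64_MAX)
		return BAR_UNKNOWN;
	if (r->probed_cpu_visible_size >= r->probed_size)
		return BAR_FULL;
	return BAR_SMALL;
}
```

Treating 0 as "unknown" rather than "nothing is mappable" is a design choice of this sketch; it keeps old kernels on the existing code path.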
Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 02/05/2022 20:58, Abodunrin, Akeem G wrote: -Original Message- From: Landwerlin, Lionel G Sent: Monday, May 2, 2022 12:55 AM To: Auld, Matthew ; intel-...@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org; Thomas Hellström ; Bloomfield, Jon ; Daniel Vetter ; Justen, Jordan L ; Kenneth Graunke ; Abodunrin, Akeem G ; mesa-...@lists.freedesktop.org Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as +known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id +DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** +* @probed_cpu_visible_size: Memory probed by the driver +* that is CPU accessible. (-1 = unknown). +* +* This will be always be <= @probed_size, and the +* remainder(if there is any) will not be CPU +* accessible. +*/ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I believe it is set to -1 if it is unknown, and/or not cpu accessible... Cheers! ~Akeem So what should I expect on system memory? What value is returned when all of probed_size is CPU visible on local memory? Thanks, -Lionel I'm guessing before there is support for this value it'll be 0 (MBZ). After after it should either be the entire lmem or something smaller. -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, +with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the +stuff that + * is immutable. Previously we would have two ioctls, one to create +the object + * with gem_create, and another to apply various parameters, however +this + * creates some ambiguity for the params which are considered +immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** +* @size: Requested size for the object. +* +* The (page-aligned) allocated size for the object will be returned. 
+* +* Note that for some devices we might have further minimum +* page-size restrictions (larger than 4K), like for device local-memory. +* However in general the final size here should always reflect any +* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS +* extension to place the object in device local-memory. +*/ + __u64 size; + /** +* @handle: Returned handle for the object. +* +* Object handles are nonzero. +*/ + __u32 handle; + /** +* @flags: Optional flags. +* +* Supported values: +* +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that +* the object will need to be accessed via the CPU. +* +* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and +* only strictly required on platforms where only some of t
Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 02/05/2022 10:54, Lionel Landwerlin wrote: On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). 
+ * + * This will always be <= @probed_size, and the + * remainder (if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I'm guessing before there is support for this value it'll be 0 (MBZ). After that it should either be the entire lmem or something smaller. -Lionel Another pain point of this new uAPI: previously we could query the unallocated size for each heap. Now lmem is effectively divided into 2 heaps, but unallocated_size is tracking allocation from both parts of lmem. Is adding a new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of the question? -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + * + * Note that for some devices we might have further minimum + * page-size restrictions (larger than 4K), like for device local-memory. + * However in general the final size here should always reflect any + * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS + * extension to place the object in device local-memory. + */ + __u64 size; + /** + * @handle: Returned handle for the object. + * + * Object handles are nonzero. + */ + __u32 handle; + /** + * @flags: Optional flags.
+ * + * Supported values: + * + * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that + * the object will need to be accessed via the CPU. + * + * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and + * only strictly required on platforms where only some of the device + * memory is directly visible or mappable through the CPU, like on DG2+. + * + * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to + * ensure we can always spill the allocation to system memory, if we + * can't place the object in the mappable part of + * I915_MEMORY_CLASS_DEVICE. + * + * Note that since th
Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** +* @probed_cpu_visible_size: Memory probed by the driver +* that is CPU accessible. (-1 = unknown). 
+* +* This will be always be <= @probed_size, and the +* remainder(if there is any) will not be CPU +* accessible. +*/ + __u64 probed_cpu_visible_size; + }; Trying to implement userspace support in Vulkan for this, I have an additional question about the value of probed_cpu_visible_size. When is it set to -1? I'm guessing before there is support for this value it'll be 0 (MBZ). After after it should either be the entire lmem or something smaller. -Lionel + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** +* @size: Requested size for the object. +* +* The (page-aligned) allocated size for the object will be returned. +* +* Note that for some devices we have might have further minimum +* page-size restrictions(larger than 4K), like for device local-memory. +* However in general the final size here should always reflect any +* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS +* extension to place the object in device local-memory. +*/ + __u64 size; + /** +* @handle: Returned handle for the object. +* +* Object handles are nonzero. +*/ + __u32 handle; + /** +* @flags: Optional flags. +* +* Supported values: +* +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that +* the object will need to be accessed via the CPU. 
+* +* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and +* only strictly required on platforms where only some of the device +* memory is directly visible or mappable through the CPU, like on DG2+. +* +* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to +* ensure we can always spill the allocation to system memory, if we +* can't place the object in the mappable part of +* I915_MEMORY_CLASS_DEVICE. +* +* Note that since the kernel only supports flat-CCS on objects that can +* *only* be placed in I915_MEMORY_C
Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi
On 27/04/2022 18:18, Matthew Auld wrote: On 27/04/2022 07:48, Lionel Landwerlin wrote: One question though, how do we detect that this flag (I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given kernel? I assume older kernels are going to reject object creation if we use this flag? From some offline discussion with Lionel, the plan here is to just do a dummy gem_create_ext to check if the kernel throws an error with the new flag or not. I didn't plan to use __drm_i915_query_vma_info, but isn't it inconsistent to select the placement on the GEM object and then query whether it's mappable by address? You made a comment stating this is racy, wouldn't querying on the GEM object prevent this? Since mesa at this time doesn't currently have a use for this one, then I guess we should maybe just drop this part of the uapi, in this version at least, if no objections. Just repeating what we discussed (maybe I missed some other discussion and that's why I was confused) : The way I was planning to use this is to have 3 heaps in Vulkan : - heap0: local only, no cpu visible - heap1: system, cpu visible - heap2: local & cpu visible With heap2 having the reported probed_cpu_visible_size size. It is an error for the application to map from heap0 [1]. With that said, it means if we created a GEM BO without I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS, we'll never mmap it. So why the query? I guess it would be useful when we import a buffer from another application. But in that case, why not have the query on the BO? -Lionel [1] : https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/vkMapMemory.html (VUID-vkMapMemory-memory-00682) Thanks, -Lionel On 27/04/2022 09:35, Lionel Landwerlin wrote: Hi Matt, The proposal looks good to me. Looking forward to try it on drm-tip. -Lionel On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. 
(Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. 
+ */ + __u64 probed_cpu_visible_size; + }; + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + * +
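The three-heap layout described in this message could be sized roughly as sketched below. Only the heap2 size (probed_cpu_visible_size) comes from the thread; giving heap0 the remainder of local memory and heap1 the probed system memory size is an assumption made for illustration, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three Vulkan heap sizes. */
struct heap_sizes {
	uint64_t local_only;    /* heap0: local memory, not CPU visible */
	uint64_t system;        /* heap1: system memory, CPU visible */
	uint64_t local_visible; /* heap2: local memory, CPU visible */
};

/*
 * Split local memory into a CPU-visible part and the remainder.
 * The visible size is clamped in case it exceeds the probed size.
 */
static struct heap_sizes split_heaps(uint64_t lmem_probed,
				     uint64_t lmem_cpu_visible,
				     uint64_t smem_probed)
{
	struct heap_sizes h;

	if (lmem_cpu_visible > lmem_probed)
		lmem_cpu_visible = lmem_probed;

	h.local_visible = lmem_cpu_visible;
	h.local_only = lmem_probed - lmem_cpu_visible; /* assumption */
	h.system = smem_probed;
	return h;
}
```

On a full-BAR system this split would leave heap0 empty, so a driver would presumably collapse heap0 and heap2 into one heap in that case.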
Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi
One question though, how do we detect that this flag (I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given kernel? I assume older kernels are going to reject object creation if we use this flag? I didn't plan to use __drm_i915_query_vma_info, but isn't it inconsistent to select the placement on the GEM object and then query whether it's mappable by address? You made a comment stating this is racy, wouldn't querying on the GEM object prevent this? Thanks, -Lionel On 27/04/2022 09:35, Lionel Landwerlin wrote: Hi Matt, The proposal looks good to me. Looking forward to try it on drm-tip. -Lionel On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** + * @probed_cpu_visible_size: Memory probed by the driver + * that is CPU accessible. (-1 = unknown). + * + * This will be always be <= @probed_size, and the + * remainder(if there is any) will not be CPU + * accessible. + */ + __u64 probed_cpu_visible_size; + }; + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** + * @size: Requested size for the object. + * + * The (page-aligned) allocated size for the object will be returned. + * + * Note that for some devices we have might have further minimum + * page-size restrictions(larger than 4K), like for device local-memory. + * However in general the final size here should always reflect any + * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS + * extension to place the object in device local-memory. + */ + __u64 size; + /** + * @handle: Returned handle for the object. + * + * Object handles are nonzero. + */ + __u32 handle; + /** + * @flags: Optional flags. 
+ * + * Supported values: + * + * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that + * the object will need to be accessed via the CPU. + * + * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and + * only strictly required on platforms where only some of the device + * memory is directly visible or mappable through the CPU, like on DG2+. + * + * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to + * ensure we can always spill the allocation to system memory, if we + * can't place the object in the mappable part of + * I915_MEMORY_CLASS_DEVICE. + * + * Note that since the kernel only supports fla
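On the question of detecting whether a kernel accepts I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS, one option is a dummy object creation at startup, checking whether the flag is rejected. The sketch below keeps the ioctl behind a callback so the logic can be exercised without a device; the callback type, the helper name and the two fake kernels are all hypothetical. In a real driver the callback would presumably wrap the gem_create_ext ioctl with both a device and a system placement, and close the returned handle on success.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag value as defined in the RFC header: (1 << 0). */
#define NEEDS_CPU_ACCESS_FLAG (1u << 0)

/* Hypothetical probe callback: returns 0 on success, -errno on failure. */
typedef int (*create_fn)(uint32_t flags, void *ctx);

/* One dummy creation tells us whether the kernel knows the flag. */
static bool kernel_accepts_cpu_access_flag(create_fn try_create, void *ctx)
{
	return try_create(NEEDS_CPU_ACCESS_FLAG, ctx) == 0;
}

/* Stand-in for an older kernel: any nonzero flags value is rejected. */
static int fake_old_kernel(uint32_t flags, void *ctx)
{
	(void)ctx;
	return flags ? -EINVAL : 0;
}

/* Stand-in for a newer kernel: only unknown flag bits are rejected. */
static int fake_new_kernel(uint32_t flags, void *ctx)
{
	(void)ctx;
	return (flags & ~NEEDS_CPU_ACCESS_FLAG) ? -EINVAL : 0;
}
```

The result would normally be cached per device, since the answer cannot change while the driver is loaded.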
Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi
Hi Matt, The proposal looks good to me. Looking forward to try it on drm-tip. -Lionel On 20/04/2022 20:13, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 190 +++ Documentation/gpu/rfc/i915_small_bar.rst | 58 +++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 252 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..7bfd0cf44d35 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,190 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at &drm_i915_query_item.query_id. + */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** @probed_size: Memory probed by the driver (-1 = unknown) */ + __u64 probed_size; + + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** +* @probed_cpu_visible_size: Memory probed by the driver +* that is CPU accessible. (-1 = unknown). 
+* +* This will be always be <= @probed_size, and the +* remainder(if there is any) will not be CPU +* accessible. +*/ + __u64 probed_cpu_visible_size; + }; + }; +}; + +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** +* @size: Requested size for the object. +* +* The (page-aligned) allocated size for the object will be returned. +* +* Note that for some devices we have might have further minimum +* page-size restrictions(larger than 4K), like for device local-memory. +* However in general the final size here should always reflect any +* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS +* extension to place the object in device local-memory. +*/ + __u64 size; + /** +* @handle: Returned handle for the object. +* +* Object handles are nonzero. +*/ + __u32 handle; + /** +* @flags: Optional flags. +* +* Supported values: +* +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that +* the object will need to be accessed via the CPU. +* +* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and +* only strictly required on platforms where only some of the device +* memory is directly visible or mappable through the CPU, like on DG2+. +* +* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to +* ensure we can always spill the allocation to system memory, if we +* can't place the object in the mappable part of +* I915_MEMORY_CLASS_DEVICE. 
+* +* Note that since the kernel only supports flat-CCS on objects that can +* *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore don't +* support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with +* flat-CCS. +* +* Without this hint, the kernel will assume that non-mappable +* I915_MEMORY_CLASS_DE
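The placement rules quoted above for I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (a device placement is required, and a system placement MUST also be present so the object can spill) can be checked in userspace before calling into the kernel. This is only a sketch of that check: the enum is a local stand-in for the i915 memory classes, the function name is hypothetical, and accepting any list when the flag is absent is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for I915_MEMORY_CLASS_SYSTEM / I915_MEMORY_CLASS_DEVICE. */
enum mem_class {
	MEM_CLASS_SYSTEM,
	MEM_CLASS_DEVICE,
};

/*
 * With the flag set, the placement list must contain both a device
 * placement (the flag is only valid there) and a system placement
 * (the spill target). Without the flag nothing is checked here.
 */
static bool placements_valid(const enum mem_class *placements, size_t count,
			     bool needs_cpu_access)
{
	bool has_system = false;
	bool has_device = false;

	for (size_t i = 0; i < count; i++) {
		if (placements[i] == MEM_CLASS_SYSTEM)
			has_system = true;
		else if (placements[i] == MEM_CLASS_DEVICE)
			has_device = true;
	}

	if (!needs_cpu_access)
		return true;
	return has_device && has_system;
}
```

An object placed only in device memory fails this check when the flag is set, which lines up with the flat-CCS note: objects that can only live in device memory cannot use the flag.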
Re: [Intel-gfx] [PATCH 2/2] drm/doc: add rfc section for small BAR uapi
Hey Matthew, all, This sounds like a good thing to have. There are a number of DG2 machines where we have a small BAR and this is causing more apps to fail. Anv currently reports 3 memory heaps to the app: - local device only (not host visible) -> mapped to lmem - device/cpu -> mapped to smem - local device but also host visible -> mapped to lmem So we could use this straight away, by just not putting the I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag on the allocation of the first heap. One thing I don't see in this proposal is how we can get the sizes of the two lmem heaps: CPU visible and not CPU visible. We could use that to report the appropriate size to the app. We probably want to report a new drm_i915_memory_region_info and either: - put one of the reserved fields to use to indicate CPU visibility - or define a new enum value in drm_i915_gem_memory_class Cheers, -Lionel On 18/02/2022 13:22, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: mesa-...@lists.freedesktop.org --- Documentation/gpu/rfc/i915_small_bar.h | 153 +++ Documentation/gpu/rfc/i915_small_bar.rst | 40 ++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 197 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..fa65835fd608 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,153 @@ +/** + * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added + * extension support using struct i915_user_extension. + * + * Note that in the future we want to have our buffer flags here, at least for + * the stuff that is immutable.
Previously we would have two ioctls, one to + * create the object with gem_create, and another to apply various parameters, + * however this creates some ambiguity for the params which are considered + * immutable. Also in general we're phasing out the various SET/GET ioctls. + */ +struct __drm_i915_gem_create_ext { + /** +* @size: Requested size for the object. +* +* The (page-aligned) allocated size for the object will be returned. +* +* Note that for some devices we have might have further minimum +* page-size restrictions(larger than 4K), like for device local-memory. +* However in general the final size here should always reflect any +* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS +* extension to place the object in device local-memory. +*/ + __u64 size; + /** +* @handle: Returned handle for the object. +* +* Object handles are nonzero. +*/ + __u32 handle; + /** +* @flags: Optional flags. +* +* Supported values: +* +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that +* the object will need to be accessed via the CPU. +* +* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and +* only strictly required on platforms where only some of the device +* memory is directly visible or mappable through the CPU, like on DG2+. +* +* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to +* ensure we can always spill the allocation to system memory, if we +* can't place the object in the mappable part of +* I915_MEMORY_CLASS_DEVICE. +* +* Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE, +* will need to enable this hint, if the object can also be placed in +* I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will +* throw an error otherwise. This also means that such objects will need +* I915_MEMORY_CLASS_SYSTEM set as a possible placement. +* +* Without this hint, the kernel will assume that non-mappable +* I915_MEMORY_CLASS_DEVICE is preferred for this object. 
Note that the +* kernel can still migrate the object to the mappable part, as a last +* resort, if userspace ever CPU faults this object, but this might be +* expensive, and so ideally should be avoided. +*/ +#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0) + __u32 flags; + /** +* @extensions: The chain of extensions to apply to this object. +* +* This will be useful in the future when we need to support several +* different extensions, and we need to apply more than one when +* creating the object. See struct i915_user_extension. +* +* If we d
Re: [Intel-gfx] [PATCH v4 12/16] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 09/12/2021 17:45, Ramalingam C wrote: From: Mika Kahola DG2 clear color render compression uses Tile4 layout. Therefore, we need to define a new format modifier for uAPI to support clear color rendering. Signed-off-by: Mika Kahola cc: Anshuman Gupta Signed-off-by: Juha-Pekka Heikkilä Signed-off-by: Ramalingam C --- drivers/gpu/drm/i915/display/intel_fb.c| 8 drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 - include/uapi/drm/drm_fourcc.h | 8 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index e15216f1cb82..f10e77cb5b4a 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] = { .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, .display_ver = { 13, 14 }, .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC, + }, { + .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC, + .display_ver = { 13, 14 }, + .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC, + + .ccs.cc_planes = BIT(1), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 14 }, @@ -559,6 +565,7 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) else return 512; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /* @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb, case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: return 16 * 1024; default: diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index d80424194c75..9a89df9c0243 100644 --- 
a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_4 | PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | PLANE_CTL_CLEAR_COLOR_DISABLE; + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: + return PLANE_CTL_TILED_4 | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2337,10 +2339,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) { - if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) + u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE | + PLANE_CTL_CLEAR_COLOR_DISABLE; + + if ((val & rc_mask) == rc_mask) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS; + else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) + fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; else fb->modifier = I915_FORMAT_MOD_4_TILED; } else { diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index 51fdda26844a..b155f69f2344 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -598,6 +598,14 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)

My colleague Nanley (Cc) had some requests for clarification on this new modifier, in particular: in which plane is the clear color located? I guess it wouldn't hurt to also state, for each of the new modifiers defined in this series, how many planes there are and what data they contain.

Thanks,
-Lionel

+/* + * Intel color control surfaces (CCS) for DG2 clear color render compression. + * + * DG2 uses a unified compression format for clear color render compression.
+ * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout. + */ +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12) + /* * Tiled, NV12MT, grouped in 64 (pi
Re: [PATCH v2] drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence.
On 08/12/2021 11:28, Christian König wrote: Am 08.12.21 um 03:39 schrieb Bas Nieuwenhuizen: dma_fence_chain_find_seqno only ever returns the top fence in the chain or an unsignalled fence. Hence if we request a seqno that is already signalled it returns a NULL fence. Some callers are not prepared to handle this, like the syncobj transfer functions for example. This behavior is "new" with timeline syncobj and it looks like not all callers were updated. To fix this behavior make sure that a successful drm_sync_find_fence always returns a non-NULL fence. v2: Move the fix to drm_syncobj_find_fence from the transfer functions. Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: sta...@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen Reviewed-by: Christian König Thanks! Acked-by: Lionel Landwerlin --- drivers/gpu/drm/drm_syncobj.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..11be91b5709b 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -404,8 +404,17 @@ int drm_syncobj_find_fence(struct drm_file *file_private, if (*fence) { ret = dma_fence_chain_find_seqno(fence, point); - if (!ret) + if (!ret) { + /* If the requested seqno is already signaled + * drm_syncobj_find_fence may return a NULL + * fence. To make sure the recipient gets + * signalled, use a new fence instead. + */ + if (!*fence) + *fence = dma_fence_get_stub(); + goto out; + } dma_fence_put(*fence); } else { ret = -EINVAL;
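The guarantee introduced by v2 (a successful lookup never hands back a NULL fence) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_fence`, `model_chain_find`, `model_find_fence` and `model_stub_fence` are hypothetical names invented for this example, and the lookup is a deliberately simplified stand-in for the kernel's chain walk, not the actual implementation. The one thing taken from the patch is the pattern itself: when the lookup succeeds but yields no fence, substitute an already-signaled stub (the patch uses dma_fence_get_stub() for that).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for one point on a timeline. */
struct model_fence {
	uint64_t seqno;
	bool signaled;
};

/* Always-signaled placeholder, playing the role of the kernel's stub fence. */
static struct model_fence model_stub_fence = { .seqno = 0, .signaled = true };

/*
 * Simplified lookup (assumption: points are sorted by ascending seqno).
 * Returns -EINVAL if the timeline has not reached `point` yet. Otherwise
 * returns 0 and stores the fence covering `point`, or NULL if that fence
 * has already signaled.
 */
static int model_chain_find(struct model_fence *points, size_t count,
			    uint64_t point, struct model_fence **out)
{
	assert(out != NULL);
	*out = NULL;

	if (count == 0 || points[count - 1].seqno < point)
		return -EINVAL;

	for (size_t i = 0; i < count; i++) {
		if (points[i].seqno >= point) {
			if (!points[i].signaled)
				*out = &points[i];
			break;
		}
	}
	return 0;
}

/* Wrapper with the v2 guarantee: success implies a non-NULL fence. */
static int model_find_fence(struct model_fence *points, size_t count,
			    uint64_t point, struct model_fence **out)
{
	int ret = model_chain_find(points, count, point, out);

	if (ret == 0 && *out == NULL)
		*out = &model_stub_fence;
	return ret;
}
```

With a two-point timeline where point 1 has signaled and point 2 has not, `model_chain_find()` succeeds with a NULL fence for point 1, while `model_find_fence()` returns the stub instead, so a caller that dereferences the result without a NULL check stays safe.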
Re: [PATCH] drm/syncobj: Deal with signalled fences in transfer.
On 07/12/2021 13:00, Christian König wrote: Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen: On Tue, Dec 7, 2021 at 8:21 AM Christian König wrote: Am 07.12.21 um 08:10 schrieb Lionel Landwerlin: On 07/12/2021 03:32, Bas Nieuwenhuizen wrote: See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses. Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL. Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: sta...@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen --- drivers/gpu/drm/drm_syncobj.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err; + + /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + } + Shouldn't we fix drm_syncobj_find_fence() instead? Mhm, now that you mention it. Bas, why do you think that dma_fence_chain_find_seqno() may return NULL when the fence is already signaled? Double checking the code that should never ever happen. 
Well, I tested the patch with https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14097/diffs?commit_id=d4c5c840f4e3839f9f5c1747a9034eb2b565f5c0 so I'm pretty sure it happens, and this patch fixes it, though I may have misidentified what the code should do. My reading is that the dma_fence_chain_for_each in dma_fence_chain_find_seqno will never visit a signalled fence (unless the top one is signalled), as dma_fence_chain_walk will never return a signalled fence (it only returns on NULL or !signalled).

Ah, yes that suddenly makes more sense.

Happy to move this to drm_syncobj_find_fence.

No, I think that your current patch is fine. That drm_syncobj_find_fence() only returns NULL when it can't find anything !signaled is correct behavior I think.

We should probably update the docs then :

* Returns 0 on success or a negative error value on failure. On success @fence * contains a reference to the fence, which must be released by calling * dma_fence_put().

Looking at some of the kernel drivers, it looks like they don't all protect themselves against NULL pointers :
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_gem.c#L1195
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1020

-Lionel

Going to push your original patch if nobody has any more objections. But somebody might want to take care of the IGT as well.

Regards, Christian.

Regards, Christian.

By returning a stub fence for the timeline case if there isn't one. Because the same NULL fence check appears missing in amdgpu (and probably other drivers).
Also we should have tests for this in IGT. AMD contributed some tests when this code was written but they never got reviewed :( -Lionel chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM; @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err; + + /* If the requested seqno is already signaled drm_syncobj_find_fence may + * return a NULL fence. To make sure the recipient gets signalled, use + * a new fence instead. + */ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + } + drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence); err:
Re: [PATCH] drm/syncobj: Deal with signalled fences in transfer.
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote: See the comments in the code. Basically if the seqno is already signalled then we get a NULL fence. If we then put the NULL fence in a binary syncobj it counts as unsignalled, making that syncobj pretty much useless for all expected uses. Not 100% sure about the transfer to a timeline syncobj but I believe it is needed there too, as AFAICT the add_point function assumes the fence isn't NULL. Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2") Cc: sta...@vger.kernel.org Signed-off-by: Bas Nieuwenhuizen --- drivers/gpu/drm/drm_syncobj.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index fdd2ec87cdd1..eb28a40400d2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private, &fence); if (ret) goto err; + + /* If the requested seqno is already signaled drm_syncobj_find_fence may +* return a NULL fence. To make sure the recipient gets signalled, use +* a new fence instead. +*/ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + } + Shouldn't we fix drm_syncobj_find_fence() instead? By returning a stub fence for the timeline case if there isn't one. Because the same NULL fence check appears missing in amdgpu (and probably other drivers). Also we should have tests for this in IGT. AMD contributed some tests when this code was written but they never got reviewed :( -Lionel chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL); if (!chain) { ret = -ENOMEM; @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, args->src_point, args->flags, &fence); if (ret) goto err; + + /* If the requested seqno is already signaled drm_syncobj_find_fence may +* return a NULL fence. 
To make sure the recipient gets signalled, use +* a new fence instead. +*/ + if (!fence) { + fence = dma_fence_allocate_private_stub(); + if (!fence) { + ret = -ENOMEM; + goto err; + } + } + drm_syncobj_replace_fence(binary_syncobj, fence); dma_fence_put(fence); err:
Re: [PATCH v5 00/15] drm/i915: Introduce Intel PXP
On 16/07/2021 07:10, Daniele Ceraolo Spurio wrote:

PXP (Protected Xe Path) is an i915 component, available on GEN12+, that helps to establish the hardware protected session and manage the status of the alive software session, as well as its life cycle.

The main changes in v5 are:
- Rebased to new proto_ctx implementation.
- Squashed all uapi changes in a single patch and slightly updated docs.
- Now handling mei_pxp loading after i915

Tested with: https://patchwork.freedesktop.org/series/87570/

Cc: Gaurav Kumar
Cc: Chris Wilson
Cc: Rodrigo Vivi
Cc: Joonas Lahtinen
Cc: Juston Li
Cc: Alan Previn
Cc: Lionel Landwerlin
Cc: Jason Ekstrand
Cc: Daniel Vetter

Updated the Mesa series for GL/Vulkan. UAPI looks good:

Acked-by: Lionel Landwerlin

Cheers,
-Lionel
Re: [PATCH 31/53] drm/i915/dg2: Report INSTDONE_GEOM values in error state
On 01/07/2021 23:24, Matt Roper wrote: Xe_HPG adds some additional INSTDONE_GEOM debug registers; the Mesa team has indicated that having these reported in the error state would be useful for debugging GPU hangs. These registers are replicated per-DSS with gslice steering. Cc: Lionel Landwerlin Signed-off-by: Matt Roper Thanks, Acked-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gt/intel_engine_cs.c| 7 +++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 3 +++ drivers/gpu/drm/i915/i915_gpu_error.c| 10 -- drivers/gpu/drm/i915/i915_reg.h | 1 + 4 files changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index e1302e9c168b..b3c002e4ae9f 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1220,6 +1220,13 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine, GEN7_ROW_INSTDONE); } } + + if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55)) { + for_each_instdone_gslice_dss_xehp(i915, sseu, iter, slice, subslice) + instdone->geom_svg[slice][subslice] = + read_subslice_reg(engine, slice, subslice, + XEHPG_INSTDONE_GEOM_SVG); + } } else if (GRAPHICS_VER(i915) >= 7) { instdone->instdone = intel_uncore_read(uncore, RING_INSTDONE(mmio_base)); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index e917b7519f2b..93609d797ac2 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -80,6 +80,9 @@ struct intel_instdone { u32 slice_common_extra[2]; u32 sampler[GEN_MAX_GSLICES][I915_MAX_SUBSLICES]; u32 row[GEN_MAX_GSLICES][I915_MAX_SUBSLICES]; + + /* Added in XeHPG */ + u32 geom_svg[GEN_MAX_GSLICES][I915_MAX_SUBSLICES]; }; /* diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index c1e744b5ab47..4de7edc451ef 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ 
b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -431,6 +431,7 @@ static void error_print_instdone(struct drm_i915_error_state_buf *m, const struct sseu_dev_info *sseu = &ee->engine->gt->info.sseu; int slice; int subslice; + int iter; err_printf(m, " INSTDONE: 0x%08x\n", ee->instdone.instdone); @@ -445,8 +446,6 @@ static void error_print_instdone(struct drm_i915_error_state_buf *m, return; if (GRAPHICS_VER_FULL(m->i915) >= IP_VER(12, 50)) { - int iter; - for_each_instdone_gslice_dss_xehp(m->i915, sseu, iter, slice, subslice) err_printf(m, " SAMPLER_INSTDONE[%d][%d]: 0x%08x\n", slice, subslice, @@ -471,6 +470,13 @@ static void error_print_instdone(struct drm_i915_error_state_buf *m, if (GRAPHICS_VER(m->i915) < 12) return; + if (GRAPHICS_VER_FULL(m->i915) >= IP_VER(12, 55)) { + for_each_instdone_gslice_dss_xehp(m->i915, sseu, iter, slice, subslice) + err_printf(m, " GEOM_SVGUNIT_INSTDONE[%d][%d]: 0x%08x\n", + slice, subslice, + ee->instdone.geom_svg[slice][subslice]); + } + err_printf(m, " SC_INSTDONE_EXTRA: 0x%08x\n", ee->instdone.slice_common_extra[0]); err_printf(m, " SC_INSTDONE_EXTRA2: 0x%08x\n", diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 35a42df1f2aa..d58864c7adc6 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -2686,6 +2686,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN12_SC_INSTDONE_EXTRA2 _MMIO(0x7108) #define GEN7_SAMPLER_INSTDONE _MMIO(0xe160) #define GEN7_ROW_INSTDONE _MMIO(0xe164) +#define XEHPG_INSTDONE_GEOM_SVG_MMIO(0x666c) #define MCFG_MCR_SELECTOR _MMIO(0xfd0) #define SF_MCR_SELECTOR _MMIO(0xfd8) #define GEN8_MCR_SELECTOR _MMIO(0xfdc)
Re: [Intel-gfx] [PATCH 3/3] drm/i915/uapi: Add query for L3 bank count
On 10/06/2021 23:46, john.c.harri...@intel.com wrote: From: John Harrison Various UMDs need to know the L3 bank count. So add a query API for it. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_gt.c | 15 +++ drivers/gpu/drm/i915/gt/intel_gt.h | 1 + drivers/gpu/drm/i915/i915_query.c | 22 ++ drivers/gpu/drm/i915/i915_reg.h| 1 + include/uapi/drm/i915_drm.h| 1 + 5 files changed, 40 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 2161bf01ef8b..708bb3581d83 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -704,3 +704,18 @@ void intel_gt_info_print(const struct intel_gt_info *info, intel_sseu_dump(&info->sseu, p); } + +int intel_gt_get_l3bank_count(struct intel_gt *gt) +{ + struct drm_i915_private *i915 = gt->i915; + intel_wakeref_t wakeref; + u32 fuse3; + + if (GRAPHICS_VER(i915) < 12) + return -ENODEV; + + with_intel_runtime_pm(gt->uncore->rpm, wakeref) + fuse3 = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3); + + return hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK, ~fuse3)); +} diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 7ec395cace69..46aa1cf4cf30 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -77,6 +77,7 @@ static inline bool intel_gt_is_wedged(const struct intel_gt *gt) void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p); +int intel_gt_get_l3bank_count(struct intel_gt *gt); void intel_gt_watchdog_work(struct work_struct *work); diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index 96bd8fb3e895..0e92bb2d21b2 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -10,6 +10,7 @@ #include "i915_perf.h" #include "i915_query.h" #include +#include "gt/intel_gt.h" static int copy_query_item(void *query_hdr, size_t query_sz, u32 total_length, @@ -502,6 +503,26 @@ static 
int query_hwconfig_table(struct drm_i915_private *i915, return hwconfig->size; } +static int query_l3banks(struct drm_i915_private *i915, +struct drm_i915_query_item *query_item) +{ + u32 banks; + + if (query_item->length == 0) + return sizeof(banks); + + if (query_item->length < sizeof(banks)) + return -EINVAL; + + banks = intel_gt_get_l3bank_count(&i915->gt); + + if (copy_to_user(u64_to_user_ptr(query_item->data_ptr), +&banks, sizeof(banks))) + return -EFAULT; + + return sizeof(banks); +} + static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, struct drm_i915_query_item *query_item) = { query_topology_info, @@ -509,6 +530,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, query_perf_config, query_memregion_info, query_hwconfig_table, + query_l3banks, }; int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index eb13c601d680..e9ba88fe3db7 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -3099,6 +3099,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN10_MIRROR_FUSE3 _MMIO(0x9118) #define GEN10_L3BANK_PAIR_COUNT 4 #define GEN10_L3BANK_MASK 0x0F +#define GEN12_GT_L3_MODE_MASK 0xFF #define GEN8_EU_DISABLE0 _MMIO(0x9134) #define GEN8_EU_DIS0_S0_MASK0xff diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 87d369cae22a..20d18cca5066 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -2234,6 +2234,7 @@ struct drm_i915_query_item { #define DRM_I915_QUERY_PERF_CONFIG 3 #define DRM_I915_QUERY_MEMORY_REGIONS 4 #define DRM_I915_QUERY_HWCONFIG_TABLE 5 +#define DRM_I915_QUERY_L3_BANK_COUNT6 A little bit of documentation about the format of the return data would be nice :) -Lionel /* Must be kept compact -- no holes and well documented */ /**
Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 29/04/2021 03:34, Umesh Nerlige Ramappa wrote:

Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately, and the calculated delta between these timestamps lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible.

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.

v12: (Jason)
- Split cpu timestamp array into timestamp and delta for cleaner API

Signed-off-by: Umesh Nerlige Ramappa
Reviewed-by: Lionel Landwerlin

Thanks for the update:

Reviewed-by: Lionel Landwerlin

--- drivers/gpu/drm/i915/i915_query.c | 148 ++ include/uapi/drm/i915_drm.h | 52 +++ 2 files changed, 200 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..357c44e8177c 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -6,6 +6,8 @@ #include +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_user.h" #include "i915_drv.h" #include "i915_perf.h" #include "i915_query.h" @@ -90,6 +92,151 @@ static int query_topology_info(struct drm_i915_private *dev_priv, return total_length; } +typedef u64 (*__ktime_func_t)(void); +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) +{ + /* +* Use logic same as the perf subsystem to allow user to select the +* reference clock id to be used for timestamps.
+*/ + switch (clk_id) { + case CLOCK_MONOTONIC: + return &ktime_get_ns; + case CLOCK_MONOTONIC_RAW: + return &ktime_get_raw_ns; + case CLOCK_REALTIME: + return &ktime_get_real_ns; + case CLOCK_BOOTTIME: + return &ktime_get_boottime_ns; + case CLOCK_TAI: + return &ktime_get_clocktai_ns; + default: + return NULL; + } +} + +static inline int +__read_timestamps(struct intel_uncore *uncore, + i915_reg_t lower_reg, + i915_reg_t upper_reg, + u64 *cs_ts, + u64 *cpu_ts, + u64 *cpu_delta, + __ktime_func_t cpu_clock) +{ + u32 upper, lower, old_upper, loop = 0; + + upper = intel_uncore_read_fw(uncore, upper_reg); + do { + *cpu_delta = local_clock(); + *cpu_ts = cpu_clock(); + lower = intel_uncore_read_fw(uncore, lower_reg); + *cpu_delta = local_clock() - *cpu_delta; + old_upper = upper; + upper = intel_uncore_read_fw(uncore, upper_reg); + } while (upper != old_upper && loop++ < 2); + + *cs_ts = (u64)upper << 32 | lower; + + return 0; +} + +static int +__query_cs_cycles(struct intel_engine_cs *engine, + u64 *cs_ts, u64 *cpu_ts, u64 *cpu_delta, + __ktime_func_t cpu_clock) +{ + struct intel_uncore *uncore = engine->uncore; + enum forcewake_domains fw_domains; + u32 base = engine->mmio_base; + intel_wakeref_t wakeref; + int ret; + + fw_domains = intel_uncore_forcewake_for_reg(uncore, + RING_TIMESTAMP(base), + FW_REG_READ); + + with_intel_runtime_pm(uncore->rpm, wakeref) { + spin_lock_irq(&uncore->lock); + intel_uncore_forcewake_get__locked(uncore, fw_domains); + + ret = __read_timestamps(uncore, + RING_TIMESTAMP(base), +
Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 28/04/2021 23:45, Jason Ekstrand wrote:
On Wed, Apr 28, 2021 at 3:14 PM Lionel Landwerlin wrote:
On 28/04/2021 22:54, Jason Ekstrand wrote:
On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin wrote:
On 28/04/2021 22:24, Jason Ekstrand wrote:
On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula wrote:
On Tue, 27 Apr 2021, Umesh Nerlige Ramappa wrote:

Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately, and the calculated delta between these timestamps lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible.

Cc: dri-devel, Jason and Daniel for review. Thanks!

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.
Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_query.c | 145 ++ include/uapi/drm/i915_drm.h | 48 ++ 2 files changed, 193 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..2594b93901ac 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -6,6 +6,8 @@ #include +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_user.h" #include "i915_drv.h" #include "i915_perf.h" #include "i915_query.h" @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv, return total_length; } +typedef u64 (*__ktime_func_t)(void); +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) +{ + /* + * Use logic same as the perf subsystem to allow user to select the + * reference clock id to be used for timestamps. + */ + switch (clk_id) { + case CLOCK_MONOTONIC: + return &ktime_get_ns; + case CLOCK_MONOTONIC_RAW: + return &ktime_get_raw_ns; + case CLOCK_REALTIME: + return &ktime_get_real_ns; + case CLOCK_BOOTTIME: + return &ktime_get_boottime_ns; + case CLOCK_TAI: + return &ktime_get_clocktai_ns; + default: + return NULL; + } +} + +static inline int +__read_timestamps(struct intel_uncore *uncore, + i915_reg_t lower_reg, + i915_reg_t upper_reg, + u64 *cs_ts, + u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + u32 upper, lower, old_upper, loop = 0; + + upper = intel_uncore_read_fw(uncore, upper_reg); + do { + cpu_ts[1] = local_clock(); + cpu_ts[0] = cpu_clock(); + lower = intel_uncore_read_fw(uncore, lower_reg); + cpu_ts[1] = local_clock() - cpu_ts[1]; + old_upper = upper; + upper = intel_uncore_read_fw(uncore, upper_reg); + } while (upper != old_upper && loop++ < 2); + + *cs_ts = (u64)upper << 32 | lower; + + return 0; +} + +static int +__query_cs_cycles(struct intel_engine_cs *engine, + u64 *cs_ts, u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + struct intel_uncore *uncore = engine->uncore; + enum forcewake_domains fw_domains; + u32 base = 
engine->mmio_base; + intel_wakeref_t wakeref; + int ret; + + fw_domains = intel_uncore_forcewake_for_reg(uncore, + RING_TIMESTAMP(base), + FW_REG_READ); + + with_intel_runtime_pm(uncore->rpm, wakeref) { + spin_lock_irq(&uncore->lock); + intel_uncore_forcewake_get__locked(uncore, fw_domains); + + ret = __read_timestamps(uncore, + RING_TIMESTAMP(base), +
Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 28/04/2021 23:14, Lionel Landwerlin wrote:
On 28/04/2021 22:54, Jason Ekstrand wrote:
On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin wrote:
On 28/04/2021 22:24, Jason Ekstrand wrote:
On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula wrote:
On Tue, 27 Apr 2021, Umesh Nerlige Ramappa wrote:

Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately, and the calculated delta between these timestamps lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible.

Cc: dri-devel, Jason and Daniel for review. Thanks!

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.
Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_query.c | 145 ++ include/uapi/drm/i915_drm.h | 48 ++ 2 files changed, 193 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..2594b93901ac 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -6,6 +6,8 @@ #include +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_user.h" #include "i915_drv.h" #include "i915_perf.h" #include "i915_query.h" @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv, return total_length; } +typedef u64 (*__ktime_func_t)(void); +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) +{ + /* + * Use logic same as the perf subsystem to allow user to select the + * reference clock id to be used for timestamps. + */ + switch (clk_id) { + case CLOCK_MONOTONIC: + return &ktime_get_ns; + case CLOCK_MONOTONIC_RAW: + return &ktime_get_raw_ns; + case CLOCK_REALTIME: + return &ktime_get_real_ns; + case CLOCK_BOOTTIME: + return &ktime_get_boottime_ns; + case CLOCK_TAI: + return &ktime_get_clocktai_ns; + default: + return NULL; + } +} + +static inline int +__read_timestamps(struct intel_uncore *uncore, + i915_reg_t lower_reg, + i915_reg_t upper_reg, + u64 *cs_ts, + u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + u32 upper, lower, old_upper, loop = 0; + + upper = intel_uncore_read_fw(uncore, upper_reg); + do { + cpu_ts[1] = local_clock(); + cpu_ts[0] = cpu_clock(); + lower = intel_uncore_read_fw(uncore, lower_reg); + cpu_ts[1] = local_clock() - cpu_ts[1]; + old_upper = upper; + upper = intel_uncore_read_fw(uncore, upper_reg); + } while (upper != old_upper && loop++ < 2); + + *cs_ts = (u64)upper << 32 | lower; + + return 0; +} + +static int +__query_cs_cycles(struct intel_engine_cs *engine, + u64 *cs_ts, u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + struct intel_uncore *uncore = engine->uncore; + enum forcewake_domains fw_domains; + u32 base = 
engine->mmio_base; + intel_wakeref_t wakeref; + int ret; + + fw_domains = intel_uncore_forcewake_for_reg(uncore, + RING_TIMESTAMP(base), + FW_REG_READ); + + with_intel_runtime_pm(uncore->rpm, wakeref) { + spin_lock_irq(&uncore->lock); + intel_uncore_forcewake_get__locked(uncore, fw_domains); + + ret = __read_timestamps(uncore, + RING_TIMESTAMP(base), + RING_TIMESTAMP_UDW(base), + cs_ts, + cpu_ts, + cpu_clock)
Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 28/04/2021 22:54, Jason Ekstrand wrote:
On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin wrote:
On 28/04/2021 22:24, Jason Ekstrand wrote:
On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula wrote:
On Tue, 27 Apr 2021, Umesh Nerlige Ramappa wrote:

Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lacks enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible.

Cc: dri-devel, Jason and Daniel for review. Thanks!

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.

Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_query.c | 145 ++ include/uapi/drm/i915_drm.h | 48 ++ 2 files changed, 193 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..2594b93901ac 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -6,6 +6,8 @@ #include +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_user.h" #include "i915_drv.h" #include "i915_perf.h" #include "i915_query.h" @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv, return total_length; } +typedef u64 (*__ktime_func_t)(void); +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) +{ + /* + * Use logic same as the perf subsystem to allow user to select the + * reference clock id to be used for timestamps. + */ + switch (clk_id) { + case CLOCK_MONOTONIC: + return &ktime_get_ns; + case CLOCK_MONOTONIC_RAW: + return &ktime_get_raw_ns; + case CLOCK_REALTIME: + return &ktime_get_real_ns; + case CLOCK_BOOTTIME: + return &ktime_get_boottime_ns; + case CLOCK_TAI: + return &ktime_get_clocktai_ns; + default: + return NULL; + } +} + +static inline int +__read_timestamps(struct intel_uncore *uncore, + i915_reg_t lower_reg, + i915_reg_t upper_reg, + u64 *cs_ts, + u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + u32 upper, lower, old_upper, loop = 0; + + upper = intel_uncore_read_fw(uncore, upper_reg); + do { + cpu_ts[1] = local_clock(); + cpu_ts[0] = cpu_clock(); + lower = intel_uncore_read_fw(uncore, lower_reg); + cpu_ts[1] = local_clock() - cpu_ts[1]; + old_upper = upper; + upper = intel_uncore_read_fw(uncore, upper_reg); + } while (upper != old_upper && loop++ < 2); + + *cs_ts = (u64)upper << 32 | lower; + + return 0; +} + +static int +__query_cs_cycles(struct intel_engine_cs *engine, + u64 *cs_ts, u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + struct intel_uncore *uncore = engine->uncore; + enum forcewake_domains fw_domains; + u32 base = 
engine->mmio_base; + intel_wakeref_t wakeref; + int ret; + + fw_domains = intel_uncore_forcewake_for_reg(uncore, + RING_TIMESTAMP(base), + FW_REG_READ); + + with_intel_runtime_pm(uncore->rpm, wakeref) { + spin_lock_irq(&uncore->lock); + intel_uncore_forcewake_get__locked(uncore, fw_domains); + + ret = __read_timestamps(uncore, + RING_TIMESTAMP(base), + RING_TIMESTAMP_UDW(base), +
Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy
On 28/04/2021 22:24, Jason Ekstrand wrote:
On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula wrote:
On Tue, 27 Apr 2021, Umesh Nerlige Ramappa wrote:

Perf measurements rely on CPU and engine timestamps to correlate events of interest across these time domains. Current mechanisms get these timestamps separately and the calculated delta between these timestamps lacks enough accuracy. To improve the accuracy of these time measurements to within a few us, add a query that returns the engine and cpu timestamps captured as close to each other as possible.

Cc: dri-devel, Jason and Daniel for review. Thanks!

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used in perf_event_open. This clock id is used by the perf subsystem to return the appropriate cpu timestamp in perf events. Similarly, let the user pass the clockid to this query so that cpu timestamp corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.

Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/i915_query.c | 145 ++ include/uapi/drm/i915_drm.h | 48 ++ 2 files changed, 193 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fed337ad7b68..2594b93901ac 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -6,6 +6,8 @@ #include +#include "gt/intel_engine_pm.h" +#include "gt/intel_engine_user.h" #include "i915_drv.h" #include "i915_perf.h" #include "i915_query.h" @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv, return total_length; } +typedef u64 (*__ktime_func_t)(void); +static __ktime_func_t __clock_id_to_func(clockid_t clk_id) +{ + /* + * Use logic same as the perf subsystem to allow user to select the + * reference clock id to be used for timestamps. + */ + switch (clk_id) { + case CLOCK_MONOTONIC: + return &ktime_get_ns; + case CLOCK_MONOTONIC_RAW: + return &ktime_get_raw_ns; + case CLOCK_REALTIME: + return &ktime_get_real_ns; + case CLOCK_BOOTTIME: + return &ktime_get_boottime_ns; + case CLOCK_TAI: + return &ktime_get_clocktai_ns; + default: + return NULL; + } +} + +static inline int +__read_timestamps(struct intel_uncore *uncore, + i915_reg_t lower_reg, + i915_reg_t upper_reg, + u64 *cs_ts, + u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + u32 upper, lower, old_upper, loop = 0; + + upper = intel_uncore_read_fw(uncore, upper_reg); + do { + cpu_ts[1] = local_clock(); + cpu_ts[0] = cpu_clock(); + lower = intel_uncore_read_fw(uncore, lower_reg); + cpu_ts[1] = local_clock() - cpu_ts[1]; + old_upper = upper; + upper = intel_uncore_read_fw(uncore, upper_reg); + } while (upper != old_upper && loop++ < 2); + + *cs_ts = (u64)upper << 32 | lower; + + return 0; +} + +static int +__query_cs_cycles(struct intel_engine_cs *engine, + u64 *cs_ts, u64 *cpu_ts, + __ktime_func_t cpu_clock) +{ + struct intel_uncore *uncore = engine->uncore; + enum forcewake_domains fw_domains; + u32 base = 
engine->mmio_base; + intel_wakeref_t wakeref; + int ret; + + fw_domains = intel_uncore_forcewake_for_reg(uncore, + RING_TIMESTAMP(base), + FW_REG_READ); + + with_intel_runtime_pm(uncore->rpm, wakeref) { + spin_lock_irq(&uncore->lock); + intel_uncore_forcewake_get__locked(uncore, fw_domains); + + ret = __read_timestamps(uncore, + RING_TIMESTAMP(base), + RING_TIMESTAMP_UDW(base), + cs_ts, + cpu_ts, + cpu_clock); + + intel_uncore_forcewake_put__locked(uncore, fw_domains); + spin_unlock_irq(&uncore->lock); + } + +
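
The loop in __read_timestamps() quoted above reads the upper dword, then the lower dword, then the upper dword again, and retries a bounded number of times if the upper dword changed in between. A minimal userspace sketch of that pattern follows. It assumes a simulated counter: fake_counter, read_lower(), read_upper(), read_counter64() and read_counter64_demo() are hypothetical names for illustration and are not part of i915, and no CPU timestamp capture is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a free-running 64-bit hardware counter. */
static uint64_t fake_counter;

/*
 * Simulated register reads: every access advances the counter by one
 * tick, so a carry into the upper dword can happen between two reads.
 */
static uint32_t read_lower(void)
{
	fake_counter += 1;
	return (uint32_t)fake_counter;
}

static uint32_t read_upper(void)
{
	fake_counter += 1;
	return (uint32_t)(fake_counter >> 32);
}

/*
 * Read upper, lower, then upper again. If the upper dword changed while
 * the lower dword was being read, the pair may be torn, so retry. The
 * retry count is bounded, mirroring the "loop++ < 2" in the patch.
 */
static uint64_t read_counter64(void)
{
	uint32_t upper, lower, old_upper;
	unsigned int loop = 0;

	upper = read_upper();
	do {
		lower = read_lower();
		old_upper = upper;
		upper = read_upper();
	} while (upper != old_upper && loop++ < 2);

	return ((uint64_t)upper << 32) | lower;
}

/*
 * Usage: start just below a 32-bit carry. The first pass sees the upper
 * dword change from 1 to 2, so the loop retries and returns a value
 * whose two halves were sampled on the same side of the carry.
 */
static void read_counter64_demo(void)
{
	fake_counter = 0x00000001fffffffeULL;
	assert(read_counter64() == 0x0000000200000002ULL);
}
```

Without the second read of the upper dword, the same starting state would combine the stale upper value 1 with the wrapped lower value 0, which is well behind the real counter.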
Re: [PATCH] drm: fix drm_mode_create_blob comment
On 02/03/2021 20:48, Simon Ser wrote:
On Tuesday, March 2nd, 2021 at 7:47 PM, Lionel Landwerlin wrote:

Thanks Simon. Do you have the rights to push this patch?

Ah, since you're asking about this, it probably means you don't have the rights. I'll push the patch now to drm-misc-next.

Thanks a bunch!

-Lionel

___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm: fix drm_mode_create_blob comment
Thanks Simon. Do you have the rights to push this patch? -Lionel On 02/03/2021 20:46, Simon Ser wrote: Good catch! Reviewed-by: Simon Ser
[PATCH] drm: fix drm_mode_create_blob comment
Just a silly mistake

Signed-off-by: Lionel Landwerlin
Suggested-by: Ben Widawsky
---
 include/uapi/drm/drm_mode.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index b49fbf2bdc408..93b494f704b91 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -993,7 +993,7 @@ struct drm_format_modifier {
 };
 
 /**
- * struct drm_mode_create_blob - Create New block property
+ * struct drm_mode_create_blob - Create New blob property
  *
  * Create a new 'blob' data property, copying length bytes from data pointer,
  * and returning new blob ID.
-- 
2.30.1
Re: [PATCH v6 44/80] docs: gpu: i915.rst: Fix several C duplication warnings
On 13/10/2020 14:53, Mauro Carvalho Chehab wrote: As reported by Sphinx: ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1147: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_wait_unlocked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1169: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_poll_wait'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1189: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_read'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2669: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_stream_enable'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2734: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_stream_disable'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2820: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_stream_init'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3010: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_read'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3098: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_poll_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3129: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_poll'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3152: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_enable_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3181: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_disable_locked'. 
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3273: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_ioctl'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3296: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_destroy_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3321: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_release'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3379: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_open_ioctl_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3534: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'read_properties_unlocked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3717: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_open_ioctl'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3760: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_register'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3789: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_unregister'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4009: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_add_config_ioctl'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4162: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_remove_config_ioctl'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4260: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_init'. 
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4423: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_fini'. With Sphinx 3, C declarations can't be duplicated anymore, so let's exclude those from the other internals
Re: [PATCH v6 44/80] docs: gpu: i915.rst: Fix several C duplication warnings
On 16/10/2020 14:50, Jani Nikula wrote:
On Fri, 16 Oct 2020, Lionel Landwerlin wrote:
On 16/10/2020 14:37, Mauro Carvalho Chehab wrote:
On Fri, 16 Oct 2020 14:01:07 +0300, Joonas Lahtinen wrote:

+ Lionel

Can you please take a look at best resolving the below problem. Maybe we should eliminate the duplicate declarations? Updating such a list manually seems error prone to me.

For Kernel 5.10, IMO the best is to apply this patch as-is, as any other thing would need to be postponed, and we want 5.10 free of doc warnings.

That's odd... Most of the functions are documented. Is it that we're missing the "()" after the function name maybe?

The problem is we first include named functions, and then go on to include everything again, duplicating the documentation for the named functions.

BR, Jani.

Thanks, now the patch makes sense.

-Lionel

-Lionel

Yet, when I wrote this one, I almost took a different approach: to implement something like the @*group (or \*group) directives that exist in doxygen:

https://www.doxygen.nl/manual/grouping.html

If something like that gets added to kernel-doc syntax, then one could do something like:

/**
 * DOC: some foo description
 * @group foo
 */

/**
 * foo1 - do some foo things
 * @group foo
 ...
 */

/**
 * foo2 - do some other foo things
 * @group foo
 ...
 */

/**
 * bar - do bar things
 * @group bar
 ...
 */

And then, at kernel-doc markup:

FOO
===

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
   :group: foo

BAR
===

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
   :group: bar

I suspect that something like that would be a lot easier to maintain. Once something like that is implemented, it should be easy to also have something like this:

OTHERS
======

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
   :export:
   :not-grouped:

in order to pick other functions that aren't grouped.

I suspect that implementing something like that at kernel-doc.pl won't be hard.

Regards, Mauro Regards, Joonas Quoting Mauro Carvalho Chehab (2020-10-13 14:53:59) As reported by Sphinx: ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1147: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_wait_unlocked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1169: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_poll_wait'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1189: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_read'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2669: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_stream_enable'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2734: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_stream_disable'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2820: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_stream_init'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3010: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_read'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3098: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_poll_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3129: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_poll'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3152: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_enable_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3181: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_disable_locked'. 
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3273: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_ioctl'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i
Re: [PATCH v6 44/80] docs: gpu: i915.rst: Fix several C duplication warnings
On 16/10/2020 14:37, Mauro Carvalho Chehab wrote:
On Fri, 16 Oct 2020 14:01:07 +0300, Joonas Lahtinen wrote:

+ Lionel

Can you please take a look at best resolving the below problem. Maybe we should eliminate the duplicate declarations? Updating such a list manually seems error prone to me.

For Kernel 5.10, IMO the best is to apply this patch as-is, as any other thing would need to be postponed, and we want 5.10 free of doc warnings.

That's odd... Most of the functions are documented. Is it that we're missing the "()" after the function name maybe?

-Lionel

Yet, when I wrote this one, I almost took a different approach: to implement something like the @*group (or \*group) directives that exist in doxygen:

https://www.doxygen.nl/manual/grouping.html

If something like that gets added to kernel-doc syntax, then one could do something like:

/**
 * DOC: some foo description
 * @group foo
 */

/**
 * foo1 - do some foo things
 * @group foo
 ...
 */

/**
 * foo2 - do some other foo things
 * @group foo
 ...
 */

/**
 * bar - do bar things
 * @group bar
 ...
 */

And then, at kernel-doc markup:

FOO
===

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
   :group: foo

BAR
===

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
   :group: bar

I suspect that something like that would be a lot easier to maintain. Once something like that is implemented, it should be easy to also have something like this:

OTHERS
======

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
   :export:
   :not-grouped:

in order to pick other functions that aren't grouped.

I suspect that implementing something like that at kernel-doc.pl won't be hard.

Regards,
Mauro

Regards,
Joonas

Quoting Mauro Carvalho Chehab (2020-10-13 14:53:59)

As reported by Sphinx:

./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1147: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_wait_unlocked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1169: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_poll_wait'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1189: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_read'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2669: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_stream_enable'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2734: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_stream_disable'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2820: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_oa_stream_init'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3010: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_read'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3098: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_poll_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3129: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_poll'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3152: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_enable_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3181: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_disable_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3273: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_ioctl'. 
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3296: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_destroy_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3321: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_release'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3379: WARNING: Duplicate C declaration, also defined in 'gpu/i915'. Declaration is 'i915_perf_open_ioctl_locked'. ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3534: WARN
Re: [PATCH] drm/syncobj: Tune down unordered timeline DRM_ERROR
On 01/08/2020 12:26, Daniel Vetter wrote:

Userspace can provoke this, we generally don't allow userspace to spam dmesg. Tune it down to debug. Unfortunately we don't have easy access to the drm_device here (not at all without changing a few things), so leave it as old style dmesg output for now.

References: https://patchwork.freedesktop.org/series/80146/
Signed-off-by: Daniel Vetter
Cc: Chris Wilson
Cc: Lionel Landwerlin
Cc: "Christian König"
---
 drivers/gpu/drm/drm_syncobj.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 3bf73971daf3..6e74e6745eca 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -297,7 +297,7 @@ void drm_syncobj_add_point(struct drm_syncobj *syncobj,
 	prev = drm_syncobj_fence_get(syncobj);
 	/* You are adding an unorder point to timeline, which could cause payload returned from query_ioctl is 0! */
 	if (prev && prev->seqno >= point)
-		DRM_ERROR("You are adding an unorder point to timeline!\n");
+		DRM_DEBUG("You are adding an unorder point to timeline!\n");
 	dma_fence_chain_init(chain, prev, fence, point);
 	rcu_assign_pointer(syncobj->fence, &chain->base);

Thanks,

Acked-by: Lionel Landwerlin
Re: [PATCH 1/2] Revert "dma-buf: Report signaled links inside dma-fence-chain"
On 25/06/2020 15:43, Christian König wrote:
On 25.06.20 at 14:34, Lionel Landwerlin wrote:

This reverts commit 5de376bb434f80a13138f0ebedc8351ab73d8b0d.

This change breaks synchronization of a timeline. dma_fence_chain_find_seqno() might be a bit of a confusing name, but this function is not trying to find a particular seqno; it is supposed to give a fence to wait on for a particular point in the timeline.

In a timeline, a particular value is reached when all the points up to and including that value have signaled.

Signed-off-by: Lionel Landwerlin

Reviewed-by: Christian König

Now that you are a maintainer, feel free to merge this and the test changes.

Thanks,

-Lionel

---
 drivers/dma-buf/dma-fence-chain.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index c435bbba851c..3d123502ff12 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -99,12 +99,6 @@ int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)
 		return -EINVAL;
 
 	dma_fence_chain_for_each(*pfence, &chain->base) {
-		if ((*pfence)->seqno < seqno) { /* already signaled */
-			dma_fence_put(*pfence);
-			*pfence = NULL;
-			break;
-		}
-
 		if ((*pfence)->context != chain->base.context ||
 		    to_dma_fence_chain(*pfence)->prev_seqno < seqno)
 			break;
@@ -228,7 +222,6 @@ EXPORT_SYMBOL(dma_fence_chain_ops);
  * @chain: the chain node to initialize
  * @prev: the previous fence
  * @fence: the current fence
- * @seqno: the sequence number (syncpt) of the fence within the chain
  *
  * Initialize a new chain node and either start a new chain or add the node to
  * the existing chain of the previous fence.
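
The rule stated in the revert message, that a timeline value is reached only when all points up to and including it have signaled, can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions: the timeline is a plain array ordered by increasing seqno, and struct timeline_point, timeline_value(), timeline_reached() and timeline_demo() are hypothetical names, not the dma-fence-chain API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one timeline point; not a kernel structure. */
struct timeline_point {
	uint64_t seqno;	/* assumed strictly increasing along the array */
	bool signaled;	/* whether this point's own fence has signaled */
};

/*
 * Return the seqno of the last point for which it and every earlier
 * point have signaled, or 0 if the very first point is still pending.
 */
static uint64_t timeline_value(const struct timeline_point *pts, size_t n)
{
	uint64_t value = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!pts[i].signaled)
			break;	/* a pending point blocks everything after it */
		value = pts[i].seqno;
	}

	return value;
}

/* A wait on @seqno completes only once the timeline value has caught up. */
static bool timeline_reached(const struct timeline_point *pts, size_t n,
			     uint64_t seqno)
{
	return timeline_value(pts, n) >= seqno;
}

/* Usage: point 3 has signaled on its own, but point 2 is still pending. */
static void timeline_demo(void)
{
	struct timeline_point pts[] = {
		{ 1, true }, { 2, false }, { 3, true },
	};

	assert(timeline_value(pts, 3) == 1);
	assert(!timeline_reached(pts, 3, 3));

	pts[1].signaled = true;
	assert(timeline_reached(pts, 3, 3));
}
```

In this model a point that signals out of order does not advance the timeline, which is the behaviour the revert message describes a waiter as relying on.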
Re: [PATCH] dma-buf: document dma-fence-chain purpose/behavior
On 26/06/2020 17:22, Chris Wilson wrote: Quoting Lionel Landwerlin (2020-06-26 13:21:00) Trying to explain a bit how this thing works. In my opinion diagrams are a bit easier to understand than words. Signed-off-by: Lionel Landwerlin --- drivers/dma-buf/dma-fence-chain.c | 37 +++ 1 file changed, 37 insertions(+) diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c index 3d123502ff12..ac90ddf37b55 100644 --- a/drivers/dma-buf/dma-fence-chain.c +++ b/drivers/dma-buf/dma-fence-chain.c @@ -9,6 +9,43 @@ #include +/** + * DOC: DMA fence chains overview + * + * DMA fence chains, represented by &struct dma_fence_chain, are a kernel + * internal synchronization primitive providing a wrapping mechanism of other + * DMA fences in the form a single link list. + * + * One of the use case of this primitive is to implement Vulkan timeline + * semaphores (see VK_KHR_timeline_semaphore extension or Vulkan specification + * 1.2). + * + * Each DMA fence chain item wraps 2 items : + * + * - A previous DMA fence. + * + * - A DMA fence associated to the current &struct dma_fence_chain. + * + * A DMA fence chain becomes signaled when its previous fence as well as its + * associated fence are signaled. If a chain of dma fence chains is created, + * this property recurses, meaning that any dma fence chain element in the + * list becomes signaled only if its associated fence and all the previous + * fences in the chain are also signaled. + * + * A DMA fence chain's seqno is specified through dma_fence_chain_init(). This + * value is lower bound to the seqno of the previous fence to ensure the chain + * is monotically increasing. + * + * By traversing the chain's linked list, one can compute a seqno number + * associated with the chain such that is the highest number for which all + * previous fences have signaled. Next fence - 1 == highest seqno for all previous fences. Ok, what about the end point then? If you ask for a seqno higher than the last fence. 
Since that is not yet defined, it is an error, right? Correct, find_seqno() will return -EINVAL in that case. -Lionel Otherwise, we could interpret the highest possible seqno for the last fence as meaning U64_MAX. -Chris ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
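For readers following the end-point question above, here is a minimal sketch of the range check being discussed. It is a toy model under stated assumptions, not the kernel code: `struct toy_link` and `toy_check_seqno()` are hypothetical names, and only the "point past the newest link is an error" rule from the exchange is modeled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chain node; not struct dma_fence_chain. */
struct toy_link {
	uint64_t seqno;        /* timeline point this link represents */
	struct toy_link *prev; /* older link, or NULL at the chain start */
};

/*
 * Toy version of the range check only: a point higher than the newest
 * link has not been defined yet, so asking for it is reported as -EINVAL.
 */
int toy_check_seqno(const struct toy_link *newest, uint64_t seqno)
{
	if (!newest || newest->seqno < seqno)
		return -EINVAL;
	return 0;
}
```

With a chain [1, 3], asking for point 2 or 3 passes this check, while asking for point 4 is rejected.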
Re: [PATCH] dma-buf: document dma-fence-chain purpose/behavior
On 26/06/2020 15:43, Daniel Vetter wrote: On Fri, Jun 26, 2020 at 2:21 PM Lionel Landwerlin wrote: Trying to explain a bit how this thing works. In my opinion diagrams are a bit easier to understand than words. kerneldoc supports in-line DOT graphs, see e.g. https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#overview If that doesn't work, then you can include a full-blown svg too. And yes, for this a quick DOT graph that explains how things connect sounds like the perfect use of a diagram. Cheers, Daniel Thanks! Though I'm thinking I need a few to show the signaling behavior. Not sure how tractable that is with DOT/SVG. My last attempt was a series of slides... -Lionel
[PATCH] dma-buf: document dma-fence-chain purpose/behavior
Trying to explain a bit how this thing works. In my opinion diagrams are a bit easier to understand than words. Signed-off-by: Lionel Landwerlin --- drivers/dma-buf/dma-fence-chain.c | 37 +++ 1 file changed, 37 insertions(+) diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c index 3d123502ff12..ac90ddf37b55 100644 --- a/drivers/dma-buf/dma-fence-chain.c +++ b/drivers/dma-buf/dma-fence-chain.c @@ -9,6 +9,43 @@ #include +/** + * DOC: DMA fence chains overview + * + * DMA fence chains, represented by &struct dma_fence_chain, are a kernel + * internal synchronization primitive providing a wrapping mechanism of other + * DMA fences in the form a single link list. + * + * One of the use case of this primitive is to implement Vulkan timeline + * semaphores (see VK_KHR_timeline_semaphore extension or Vulkan specification + * 1.2). + * + * Each DMA fence chain item wraps 2 items : + * + * - A previous DMA fence. + * + * - A DMA fence associated to the current &struct dma_fence_chain. + * + * A DMA fence chain becomes signaled when its previous fence as well as its + * associated fence are signaled. If a chain of dma fence chains is created, + * this property recurses, meaning that any dma fence chain element in the + * list becomes signaled only if its associated fence and all the previous + * fences in the chain are also signaled. + * + * A DMA fence chain's seqno is specified through dma_fence_chain_init(). This + * value is lower bound to the seqno of the previous fence to ensure the chain + * is monotically increasing. + * + * By traversing the chain's linked list, one can compute a seqno number + * associated with the chain such that is the highest number for which all + * previous fences have signaled. + * + * One can also traverse the chain's linked list to find a &struct + * dma_fence_chain that when signaled guarantees that all previous fences in + * the chain are signaled. 
dma_fence_chain_find_seqno() provides this + * functionality. + */ + static bool dma_fence_chain_enable_signaling(struct dma_fence *fence); /** -- 2.27.0 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Intel-gfx] [PATCH 2/2] dma-buf: fix dma-fence-chain out of order test
On 25/06/2020 16:47, Chris Wilson wrote: Quoting Lionel Landwerlin (2020-06-25 14:23:25) On 25/06/2020 16:18, Chris Wilson wrote: Quoting Lionel Landwerlin (2020-06-25 13:34:43) There was probably a misunderstanding about how the dma-fence-chain is supposed to work or what dma_fence_chain_find_seqno() is supposed to return. dma_fence_chain_find_seqno() is here to give us the fence to wait upon for a particular point in the timeline. The timeline progresses only when all the points prior to a given number have completed. Hmm, the question was what point is it supposed to wait for. For the simple chain of [1, 3], does 1 being signaled imply that all points up to 3 are signaled, or does 3 not being signaled imply that all points after 1 are not? If that's mentioned already somewhere, my bad. If not, could you put the answer somewhere? -Chris In [1, 3], if 1 is signaled, the timeline value is 1. And find_seqno(2) should return NULL. In the out_of_order selftest the chain was [1, 2, 3], 2 was signaled and the test was expecting no fence to be returned by find_seqno(2). But we still have to wait on 1 to complete before find_seqno(2) can return NULL (as in you don't have to wait on anything). * scratches head I thought it was meant to be expecting fc.chain[1] to still be present as the chain at that point was not yet signaled. You're right that the point is not yet signaled. But it doesn't need to stay in the chain if you can wait on a previous point. chain[1] gets removed as we walk the chain backward in dma_fence_chain_walk. -Lionel Oh well, a mistake compounded. :| -Chris
Re: [Intel-gfx] [PATCH 2/2] dma-buf: fix dma-fence-chain out of order test
On 25/06/2020 16:18, Chris Wilson wrote: Quoting Lionel Landwerlin (2020-06-25 13:34:43) There was probably a misunderstanding about how the dma-fence-chain is supposed to work or what dma_fence_chain_find_seqno() is supposed to return. dma_fence_chain_find_seqno() is here to give us the fence to wait upon for a particular point in the timeline. The timeline progresses only when all the points prior to a given number have completed. Hmm, the question was what point is it supposed to wait for. For the simple chain of [1, 3], does 1 being signaled imply that all points up to 3 are signaled, or does 3 not being signaled imply that all points after 1 are not? If that's mentioned already somewhere, my bad. If not, could you put the answer somewhere? -Chris In [1, 3], if 1 is signaled, the timeline value is 1. And find_seqno(2) should return NULL. In the out_of_order selftest the chain was [1, 2, 3], 2 was signaled and the test was expecting no fence to be returned by find_seqno(2). But we still have to wait on 1 to complete before find_seqno(2) can return NULL (as in you don't have to wait on anything). Hope that answers the question. -Lionel
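The "timeline value" used in the answer above can be illustrated with a small sketch. This is only a toy model of the rule "a value is reached when all the points up to and including it have signaled"; `struct toy_point` and `toy_timeline_value()` are hypothetical names, and nothing here reflects how the kernel stores or walks a dma-fence-chain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timeline point: a seqno plus whether its fence signaled. */
struct toy_point {
	uint64_t seqno;
	bool signaled;
};

/*
 * Points are assumed sorted by increasing seqno. The timeline value is
 * the seqno of the last point in the longest fully signaled prefix,
 * or 0 when even the first point has not signaled.
 */
uint64_t toy_timeline_value(const struct toy_point *points, size_t count)
{
	uint64_t value = 0;

	for (size_t i = 0; i < count; i++) {
		if (!points[i].signaled)
			break; /* a later signaled point cannot advance the timeline */
		value = points[i].seqno;
	}
	return value;
}
```

In this model [1, 3] with only 1 signaled gives a value of 1, and [1, 2, 3] with only 2 signaled gives 0, because point 1 still blocks the timeline.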
[PATCH 1/2] Revert "dma-buf: Report signaled links inside dma-fence-chain"
This reverts commit 5de376bb434f80a13138f0ebedc8351ab73d8b0d. This change breaks synchronization of a timeline. dma_fence_chain_find_seqno() might be a bit of a confusing name, but this function is not trying to find a particular seqno; it is supposed to give a fence to wait on for a particular point in the timeline. In a timeline, a particular value is reached when all the points up to and including that value have signaled. Signed-off-by: Lionel Landwerlin --- drivers/dma-buf/dma-fence-chain.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c index c435bbba851c..3d123502ff12 100644 --- a/drivers/dma-buf/dma-fence-chain.c +++ b/drivers/dma-buf/dma-fence-chain.c @@ -99,12 +99,6 @@ int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno) return -EINVAL; dma_fence_chain_for_each(*pfence, &chain->base) { - if ((*pfence)->seqno < seqno) { /* already signaled */ - dma_fence_put(*pfence); - *pfence = NULL; - break; - } - if ((*pfence)->context != chain->base.context || to_dma_fence_chain(*pfence)->prev_seqno < seqno) break; @@ -228,7 +222,6 @@ EXPORT_SYMBOL(dma_fence_chain_ops); * @chain: the chain node to initialize * @prev: the previous fence * @fence: the current fence - * @seqno: the sequence number (syncpt) of the fence within the chain * * Initialize a new chain node and either start a new chain or add the node to * the existing chain of the previous fence. -- 2.27.0
[PATCH 2/2] dma-buf: fix dma-fence-chain out of order test
There was probably a misunderstand on how the dma-fence-chain is supposed to work or what dma_fence_chain_find_seqno() is supposed to return. dma_fence_chain_find_seqno() is here to give us the fence to wait upon for a particular point in the timeline. The timeline progresses only when all the points prior to a given number have completed. Signed-off-by: Lionel Landwerlin Fixes: dc2f7e67a28a5c ("dma-buf: Exercise dma-fence-chain under selftests") --- drivers/dma-buf/st-dma-fence-chain.c | 43 ++-- 1 file changed, 21 insertions(+), 22 deletions(-) diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c index 5d45ba7ba3cd..9525f7f56119 100644 --- a/drivers/dma-buf/st-dma-fence-chain.c +++ b/drivers/dma-buf/st-dma-fence-chain.c @@ -318,15 +318,16 @@ static int find_out_of_order(void *arg) goto err; } - if (fence && fence != fc.chains[1]) { + /* +* We signaled the middle fence (2) of the 1-2-3 chain. The behavior +* of the dma-fence-chain is to make us wait for all the fences up to +* the point we want. Since fence 1 is still not signaled, this what +* we should get as fence to wait upon (fence 2 being garbage +* collected during the traversal of the chain). +*/ + if (fence != fc.chains[0]) { pr_err("Incorrect chain-fence.seqno:%lld reported for completed seqno:2\n", - fence->seqno); - - dma_fence_get(fence); - err = dma_fence_chain_find_seqno(&fence, 2); - dma_fence_put(fence); - if (err) - pr_err("Reported %d for finding self!\n", err); + fence ? 
fence->seqno : 0); err = -EINVAL; } @@ -415,20 +416,18 @@ static int __find_race(void *arg) if (!fence) goto signal; - err = dma_fence_chain_find_seqno(&fence, seqno); - if (err) { - pr_err("Reported an invalid fence for find-self:%d\n", - seqno); - dma_fence_put(fence); - break; - } - - if (fence->seqno < seqno) { - pr_err("Reported an earlier fence.seqno:%lld for seqno:%d\n", - fence->seqno, seqno); - err = -EINVAL; - dma_fence_put(fence); - break; + /* +* We can only find ourselves if we are on fence we were +* looking for. +*/ + if (fence->seqno == seqno) { + err = dma_fence_chain_find_seqno(&fence, seqno); + if (err) { + pr_err("Reported an invalid fence for find-self:%d\n", + seqno); + dma_fence_put(fence); + break; + } } dma_fence_put(fence); -- 2.27.0 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Intel-gfx] [PATCH 2/2] RFC drm/i915: Export per-client debug tracing
On 01/03/2020 17:52, Chris Wilson wrote: Rather than put sensitive, and often voluminous, user details into a global dmesg, report the error and debug messages directly back to the user via the kernel tracing mechanism. Sounds really nice. Don't you want the existing global tracing to be the default at least until a client does a get_trace? -Lionel Signed-off-by: Chris Wilson Cc: Steven Rostedt (VMware) --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 104 ++- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 124 ++ drivers/gpu/drm/i915/gem/i915_gem_pages.c | 6 +- drivers/gpu/drm/i915/i915_drv.h | 4 + drivers/gpu/drm/i915/i915_gem.c | 5 +- include/uapi/drm/i915_drm.h | 7 + 6 files changed, 156 insertions(+), 94 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index e525ead073f7..c136a8c90e27 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -81,6 +81,8 @@ #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1 +#define CTX_TRACE(ctx, ...) 
TRACE((ctx)->file_priv->trace, __VA_ARGS__) + static struct i915_global_gem_context { struct i915_global base; struct kmem_cache *slab_luts; @@ -158,8 +160,12 @@ lookup_user_engine(struct i915_gem_context *ctx, engine = intel_engine_lookup_user(ctx->i915, ci->engine_class, ci->engine_instance); - if (!engine) + if (!engine) { + CTX_TRACE(ctx, + "Unknown engine {class:%d, instance:%d}\n", + ci->engine_class, ci->engine_instance); return ERR_PTR(-EINVAL); + } idx = engine->legacy_idx; } else { @@ -762,8 +768,6 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags) ppgtt = i915_ppgtt_create(&i915->gt); if (IS_ERR(ppgtt)) { - drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n", - PTR_ERR(ppgtt)); context_close(ctx); return ERR_CAST(ppgtt); } @@ -1461,14 +1465,15 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data) return -EFAULT; if (idx >= set->engines->num_engines) { - drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n", - idx, set->engines->num_engines); + CTX_TRACE(set->ctx, + "Invalid placement value, %d >= %d\n", + idx, set->engines->num_engines); return -EINVAL; } idx = array_index_nospec(idx, set->engines->num_engines); if (set->engines->engines[idx]) { - drm_dbg(&i915->drm, + CTX_TRACE(set->ctx, "Invalid placement[%d], already occupied\n", idx); return -EEXIST; } @@ -1505,9 +1510,9 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data) ci.engine_class, ci.engine_instance); if (!siblings[n]) { - drm_dbg(&i915->drm, - "Invalid sibling[%d]: { class:%d, inst:%d }\n", - n, ci.engine_class, ci.engine_instance); + CTX_TRACE(set->ctx, + "Invalid sibling[%d]: { class:%d, inst:%d }\n", + n, ci.engine_class, ci.engine_instance); err = -EINVAL; goto out_siblings; } @@ -1551,15 +1556,15 @@ set_engines__bond(struct i915_user_extension __user *base, void *data) return -EFAULT; if (idx >= set->engines->num_engines) { - drm_dbg(&i915->drm, - "Invalid index for virtual engine: %d >= %d\n", - idx, 
set->engines->num_engines); + CTX_TRACE(set->ctx, + "Invalid index for virtual engine: %d >= %d\n", + idx, set->engines->num_engines); return -EINVAL; } idx = array_index_nospec(idx, set->engines->num_engines); if (!set->engines->engines[idx]) { - drm_dbg(&i915->drm, "Invalid engine at %d\n", idx); + CTX_TRACE(set->ctx, "Invalid engine at %d\n", idx); return -EINVAL; } virtual = set->engines->engines[idx]->engine; @@ -1580,9 +1585,9 @@ set_engines__bond(struct i915_user_extension __user *base, void *data
Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services
On 28/02/2020 13:46, Michel Dänzer wrote: On 2020-02-28 12:02 p.m., Erik Faye-Lund wrote: On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote: On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund wrote: On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote: Yeah, changes on vulkan drivers or backend compilers should be fairly sandboxed. We also have tools that only work for intel stuff, that should never trigger anything on other people's HW. Could something be worked out using the tags? I think so! We have the pre-defined environment variable CI_MERGE_REQUEST_LABELS, and we can do variable conditions: https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables That sounds like a pretty neat middle-ground to me. I just hope that new pipelines are triggered if new labels are added, because not everyone is allowed to set labels, and sometimes people forget... There's also this which is somewhat more robust: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569 I'm not sure it's more robust, but yeah, that's a useful tool too. The reason I'm skeptical about the robustness is that we'll miss testing if this misses a path. Surely missing a path will happen less often than an MR missing a label. (Users who aren't members of the project can't even set labels for an MR) Sounds like a good alternative to tags. -Lionel
Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services
On 28/02/2020 11:28, Erik Faye-Lund wrote: On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote: On Fri, 28 Feb 2020 at 07:27, Daniel Vetter wrote: Hi all, You might have read the short take in the X.org board meeting minutes already, here's the long version. The good news: gitlab.fd.o has become very popular with our communities, and is used extensively. This especially includes all the CI integration. Modern development process and tooling, yay! The bad news: The cost in growth has also been tremendous, and it's breaking our bank account. With reasonable estimates for continued growth we're expecting hosting expenses totalling 75k USD this year, and 90k USD next year. With the current sponsors we've set up we can't sustain that. We estimate that hosting expenses for gitlab.fd.o without any of the CI features enabled would total 30k USD, which is within X.org's ability to support through various sponsorships, mostly through XDC. Note that X.org does no longer sponsor any CI runners themselves, we've stopped that. The huge additional expenses are all just in storing and serving build artifacts and images to outside CI runners sponsored by various companies. A related topic is that with the growth in fd.o it's becoming infeasible to maintain it all on volunteer admin time. X.org is therefore also looking for admin sponsorship, at least medium term. Assuming that we want cash flow reserves for one year of gitlab.fd.o (without CI support) and a trimmed XDC and assuming no sponsor payment meanwhile, we'd have to cut CI services somewhere between May and June this year. The board is of course working on acquiring sponsors, but filling a shortfall of this magnitude is neither easy nor quick work, and we therefore decided to give an early warning as soon as possible. Any help in finding sponsors for fd.o is very much appreciated. a) Ouch. b) we probably need to take a large step back here. I kinda agree, but maybe the step doesn't have to be *too* large? 
I wonder if we could solve this by restructuring the project a bit. I'm talking purely from a Mesa point of view here, so it might not solve the full problem, but: 1. It feels silly that we need to test changes to e.g the i965 driver on dragonboards. We only have a big "do not run CI at all" escape- hatch. Yeah, changes on vulkan drivers or backend compilers should be fairly sandboxed. We also have tools that only work for intel stuff, that should never trigger anything on other people's HW. Could something be worked out using the tags? -Lionel ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2] drm/syncobj: Add documentation for timeline syncobj
On 14/01/2020 16:25, Christian König wrote: Am 14.01.20 um 13:19 schrieb Lionel Landwerlin: We've added a set of new APIs to manipulate syncobjs holding timelines of dma_fence. This adds a bit of documentation about how this works. v2: Small language nits (Lionel) Signed-off-by: Lionel Landwerlin Cc: Christian Koenig Cc: Jason Ekstrand Cc: David(ChunMing) Zhou Reviewed-by: Christian König Thanks for the review Christian. Feel free to merge this commit whenever, I don't think I have commit rights. Cheers, -Lionel ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2] drm/syncobj: Add documentation for timeline syncobj
We've added a set of new APIs to manipulate syncobjs holding timelines of dma_fence. This adds a bit of documentation about how this works. v2: Small language nits (Lionel) Signed-off-by: Lionel Landwerlin Cc: Christian Koenig Cc: Jason Ekstrand Cc: David(ChunMing) Zhou --- drivers/gpu/drm/drm_syncobj.c | 87 +-- 1 file changed, 74 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 669c93fe2500..42d46414f767 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -43,27 +43,66 @@ * - Signal a syncobj (set a trivially signaled fence) * - Wait for a syncobj's fence to appear and be signaled * + * The syncobj userspace API also provides operations to manipulate a syncobj + * in terms of a timeline of struct &dma_fence_chain rather than a single + * struct &dma_fence, through the following operations: + * + * - Signal a given point on the timeline + * - Wait for a given point to appear and/or be signaled + * - Import and export from/to a given point of a timeline + * * At it's core, a syncobj is simply a wrapper around a pointer to a struct * &dma_fence which may be NULL. * When a syncobj is first created, its pointer is either NULL or a pointer * to an already signaled fence depending on whether the * &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to * &DRM_IOCTL_SYNCOBJ_CREATE. - * When GPU work which signals a syncobj is enqueued in a DRM driver, - * the syncobj fence is replaced with a fence which will be signaled by the - * completion of that work. - * When GPU work which waits on a syncobj is enqueued in a DRM driver, the - * driver retrieves syncobj's current fence at the time the work is enqueued - * waits on that fence before submitting the work to hardware. - * If the syncobj's fence is NULL, the enqueue operation is expected to fail. 
- * All manipulation of the syncobjs's fence happens in terms of the current - * fence at the time the ioctl is called by userspace regardless of whether - * that operation is an immediate host-side operation (signal or reset) or - * or an operation which is enqueued in some driver queue. - * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to - * manipulate a syncobj from the host by resetting its pointer to NULL or + * + * If the syncobj is considered as a binary (its state is either signaled or + * unsignaled) primitive, when GPU work is enqueued in a DRM driver to signal + * the syncobj, the syncobj's fence is replaced with a fence which will be + * signaled by the completion of that work. + * If the syncobj is considered as a timeline primitive, when GPU work is + * enqueued in a DRM driver to signal the a given point of the syncobj, a new + * struct &dma_fence_chain pointing to the DRM driver's fence and also + * pointing to the previous fence that was in the syncobj. The new struct + * &dma_fence_chain fence replace the syncobj's fence and will be signaled by + * completion of the DRM driver's work and also any work associated with the + * fence previously in the syncobj. + * + * When GPU work which waits on a syncobj is enqueued in a DRM driver, at the + * time the work is enqueued, it waits on the syncobj's fence before + * submitting the work to hardware. That fence is either : + * + *- The syncobj's current fence if the syncobj is considered as a binary + * primitive. + *- The struct &dma_fence associated with a given point if the syncobj is + * considered as a timeline primitive. + * + * If the syncobj's fence is NULL or not present in the syncobj's timeline, + * the enqueue operation is expected to fail. 
+ * + * With binary syncobj, all manipulation of the syncobjs's fence happens in + * terms of the current fence at the time the ioctl is called by userspace + * regardless of whether that operation is an immediate host-side operation + * (signal or reset) or or an operation which is enqueued in some driver + * queue. &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used + * to manipulate a syncobj from the host by resetting its pointer to NULL or * setting its pointer to a fence which is already signaled. * + * With a timeline syncobj, all manipulation of the synobj's fence happens in + * terms of a u64 value referring to point in the timeline. See + * dma_fence_chain_find_seqno() to see how a given point is found in the + * timeline. + * + * Note that applications should be careful to always use timeline set of + * ioctl() when dealing with syncobj considered as timeline. Using a binary + * set of ioctl() with a syncobj considered as timeline could result incorrect + * synchronization. The use of binary syncobj is supported through the + * timeline set of ioctl() by using a point value of 0, this will reproduce + * the behavior o
Re: [PATCH 1/1] drm/syncobj: add sideband payload
Following earlier discussions, in particular with James Jones at Nvidia, I think we established this patch/feature is not needed. This feature was intended to fix a failing test on our implementation. I've just submitted an MR to delete that test : https://gitlab.freedesktop.org/mesa/crucible/merge_requests/55 I think it is invalid. We should be able to work around the submission thread race condition issue by just resetting a binary semaphore to be signaled in vkQueueSubmit before submitting the workload, so that further waits happen on the right dma-fence. This might be a bit more costly (more ioctls) than the feature in this patch, so I'm looking for your feedback on this. Thanks a lot, -Lionel On 17/09/2019 16:06, Lionel Landwerlin wrote: Thanks David, I'll try to fix the test to match AMD's restrictions. The v7 here was to fix another existing test : dEQP-VK.api.external.fence.sync_fd.transference_temporary Cheers, -Lionel On 17/09/2019 15:36, Zhou, David(ChunMing) wrote: Hi Lionel, The update looks good to me. I tried your signal-order test, but it seems it isn't ready to run, so I'm not sure if I can reproduce this issue. -David ---- *From:* Lionel Landwerlin *Sent:* Tuesday, September 17, 2019 7:03 PM *To:* dri-devel@lists.freedesktop.org *Cc:* Lionel Landwerlin ; Zhou, David(ChunMing) ; Koenig, Christian ; Jason Ekstrand *Subject:* [PATCH 1/1] drm/syncobj: add sideband payload The Vulkan timeline semaphores allow signaling to happen on a point of the timeline before all of its dependencies have been created. The current 2 implementations (AMD/Intel) of the Vulkan spec on top of the Linux kernel are using a thread to wait on the dependencies of a given point to materialize and delay actual submission to the kernel driver until the wait completes. 
If a binary semaphore is submitted for signaling along the side of a timeline semaphore waiting for completion that means that the drm syncobj associated with that binary semaphore will not have a DMA fence associated with it by the time vkQueueSubmit() returns. This and the fact that a binary semaphore can be signaled and unsignaled as before its DMA fences materialize mean that we cannot just rely on the fence within the syncobj but we also need a sideband payload verifying that the fence in the syncobj matches the last submission from the Vulkan API point of view. This change adds a sideband payload that is incremented with signaled syncobj when vkQueueSubmit() is called. The next vkQueueSubmit() waiting on a the syncobj will read the sideband payload and wait for a fence chain element with a seqno superior or equal to the sideband payload value to be added into the fence chain and use that fence to trigger the submission on the kernel driver. v2: Use a separate ioctl to get/set the sideband value (Christian) v3: Use 2 ioctls for get/set (Christian) v4: Use a single new ioctl v5: a bunch of blattant mistakes Store payload atomically (Chris) v6: Only touch atomic value once (Jason) v7: Updated atomic value when importing sync file Signed-off-by: Lionel Landwerlin Reviewed-by: David Zhou (v6) Cc: Christian Koenig Cc: Jason Ekstrand Cc: David(ChunMing) Zhou --- drivers/gpu/drm/drm_internal.h | 2 ++ drivers/gpu/drm/drm_ioctl.c | 3 ++ drivers/gpu/drm/drm_syncobj.c | 64 -- include/drm/drm_syncobj.h | 9 + include/uapi/drm/drm.h | 17 + 5 files changed, 93 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 51a2055c8f18..e297dfd85019 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); int drm_syncobj_query_ioctl(struct drm_device *dev, void *data, struct 
drm_file *file_private); +int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_private); /* drm_framebuffer.c */ void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent, diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index f675a3bb2c88..644d0bc800a4 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_RENDER_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0), DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crt
Re: [PATCH] drm/drm_syncobj: Dead code removal
On 04/10/2019 15:16, Zbigniew Kempczyński wrote: Remove dead code, likely overlooked during the review process. Signed-off-by: Zbigniew Kempczyński Cc: Chunming Zhou Cc: Daniel Vetter Cc: Jason Ekstrand --- drivers/gpu/drm/drm_syncobj.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 4b5c7b0ed714..21a22e39c9fa 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -192,8 +192,6 @@ static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj, if (!fence || dma_fence_chain_find_seqno(&fence, wait->point)) { dma_fence_put(fence); list_add_tail(&wait->node, &syncobj->cb_list); - } else if (!fence) { - wait->fence = dma_fence_get_stub(); } else { wait->fence = fence; } @@ -856,8 +854,6 @@ static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj, if (!fence || dma_fence_chain_find_seqno(&fence, wait->point)) { dma_fence_put(fence); return; - } else if (!fence) { - wait->fence = dma_fence_get_stub(); } else { wait->fence = fence; } Like Chris said, dma_fence_chain_find_seqno() will update the fence pointer, so a subsequent check might not be dealing with the same value. A bit cheeky, but... -Lionel
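To make the objection above concrete, here is a small sketch of why the second `!fence` test is not dead when the lookup takes the pointer by address. Everything in it is hypothetical: `toy_find_point()` only mimics the in/out pointer shape of dma_fence_chain_find_seqno(), and the `already_signaled` flag is an assumption used to force the interesting case.

```c
#include <stddef.h>

struct toy_fence { int id; };

enum toy_outcome { TOY_QUEUED, TOY_STUB, TOY_FENCE };

/*
 * Hypothetical lookup: returns 0 on success and may reset *pfence to
 * NULL to mean "nothing left to wait on", nonzero on failure.
 */
int toy_find_point(struct toy_fence **pfence, int already_signaled)
{
	if (!*pfence)
		return -1;
	if (already_signaled)
		*pfence = NULL; /* the caller's pointer changes here */
	return 0;
}

/* Same if / else if / else shape as the code the patch touches. */
enum toy_outcome toy_add_wait(struct toy_fence *fence, int already_signaled)
{
	if (!fence || toy_find_point(&fence, already_signaled))
		return TOY_QUEUED; /* no fence yet, or lookup failed */
	else if (!fence)
		return TOY_STUB;   /* reachable: the lookup cleared the pointer */
	else
		return TOY_FENCE;
}
```

The first `!fence` sees the value before the call and the second sees the value after it, so the two tests are not redundant in this model.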
Re: [PATCH 1/1] drm/syncobj: add sideband payload
Thanks David, I'll try to fix the test to match AMD's restrictions. The v7 here was to fix another existing test: dEQP-VK.api.external.fence.sync_fd.transference_temporary Cheers, -Lionel On 17/09/2019 15:36, Zhou, David(ChunMing) wrote: Hi Lionel, The update looks good to me. I tried your signal-order test, but it seems it isn't ready to run, so I'm not sure if I can reproduce this issue. -David ---- *From:* Lionel Landwerlin *Sent:* Tuesday, September 17, 2019 7:03 PM *To:* dri-devel@lists.freedesktop.org *Cc:* Lionel Landwerlin ; Zhou, David(ChunMing) ; Koenig, Christian ; Jason Ekstrand *Subject:* [PATCH 1/1] drm/syncobj: add sideband payload Vulkan timeline semaphores allow a point on the timeline to be signaled before all of its dependencies have been created. The two current implementations (AMD/Intel) of the Vulkan spec on top of the Linux kernel use a thread to wait for the dependencies of a given point to materialize, and delay the actual submission to the kernel driver until the wait completes. If a binary semaphore is submitted for signaling alongside a timeline semaphore that is still waiting for completion, the drm syncobj associated with that binary semaphore will not have a DMA fence associated with it by the time vkQueueSubmit() returns. This, and the fact that a binary semaphore can be signaled and unsignaled before its DMA fences materialize, means that we cannot rely only on the fence within the syncobj; we also need a sideband payload verifying that the fence in the syncobj matches the last submission from the Vulkan API point of view. This change adds a sideband payload that is incremented for each signaled syncobj when vkQueueSubmit() is called. The next vkQueueSubmit() waiting on the syncobj will read the sideband payload, wait for a fence chain element with a seqno greater than or equal to the sideband payload value to be added to the fence chain, and use that fence to trigger the submission to the kernel driver. 
v2: Use a separate ioctl to get/set the sideband value (Christian) v3: Use 2 ioctls for get/set (Christian) v4: Use a single new ioctl v5: a bunch of blattant mistakes Store payload atomically (Chris) v6: Only touch atomic value once (Jason) v7: Updated atomic value when importing sync file Signed-off-by: Lionel Landwerlin Reviewed-by: David Zhou (v6) Cc: Christian Koenig Cc: Jason Ekstrand Cc: David(ChunMing) Zhou --- drivers/gpu/drm/drm_internal.h | 2 ++ drivers/gpu/drm/drm_ioctl.c | 3 ++ drivers/gpu/drm/drm_syncobj.c | 64 -- include/drm/drm_syncobj.h | 9 + include/uapi/drm/drm.h | 17 + 5 files changed, 93 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 51a2055c8f18..e297dfd85019 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); int drm_syncobj_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); +int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_private); /* drm_framebuffer.c */ void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent, diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index f675a3bb2c88..644d0bc800a4 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_RENDER_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0), DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0), DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER), diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c index 4b5c7b0ed714..2de8f1380890 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -418,8 +418,10 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) drm_syncobj_assign_null_handle(syncobj); - if (fence) + if (fence) { drm_syncobj_replace_fence(syncobj, fence); + atomic64_set(&syncobj->binary_payload, fence->seqno); + } *out_syncobj = syncobj; return 0; @
[PATCH 1/1] drm/syncobj: add sideband payload
Vulkan timeline semaphores allow a point on the timeline to be signaled before all of its dependencies have been created. The two current implementations (AMD/Intel) of the Vulkan spec on top of the Linux kernel use a thread to wait for the dependencies of a given point to materialize, and delay the actual submission to the kernel driver until the wait completes. If a binary semaphore is submitted for signaling alongside a timeline semaphore that is still waiting for completion, the drm syncobj associated with that binary semaphore will not have a DMA fence associated with it by the time vkQueueSubmit() returns. This, and the fact that a binary semaphore can be signaled and unsignaled before its DMA fences materialize, means that we cannot rely only on the fence within the syncobj; we also need a sideband payload verifying that the fence in the syncobj matches the last submission from the Vulkan API point of view. This change adds a sideband payload that is incremented for each signaled syncobj when vkQueueSubmit() is called. The next vkQueueSubmit() waiting on the syncobj will read the sideband payload, wait for a fence chain element with a seqno greater than or equal to the sideband payload value to be added to the fence chain, and use that fence to trigger the submission to the kernel driver. 
v2: Use a separate ioctl to get/set the sideband value (Christian) v3: Use 2 ioctls for get/set (Christian) v4: Use a single new ioctl v5: a bunch of blattant mistakes Store payload atomically (Chris) v6: Only touch atomic value once (Jason) v7: Updated atomic value when importing sync file Signed-off-by: Lionel Landwerlin Reviewed-by: David Zhou (v6) Cc: Christian Koenig Cc: Jason Ekstrand Cc: David(ChunMing) Zhou --- drivers/gpu/drm/drm_internal.h | 2 ++ drivers/gpu/drm/drm_ioctl.c| 3 ++ drivers/gpu/drm/drm_syncobj.c | 64 -- include/drm/drm_syncobj.h | 9 + include/uapi/drm/drm.h | 17 + 5 files changed, 93 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 51a2055c8f18..e297dfd85019 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); int drm_syncobj_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); +int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data, +struct drm_file *file_private); /* drm_framebuffer.c */ void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent, diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index f675a3bb2c88..644d0bc800a4 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_RENDER_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0), DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0), DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER), diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c index 4b5c7b0ed714..2de8f1380890 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -418,8 +418,10 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) drm_syncobj_assign_null_handle(syncobj); - if (fence) + if (fence) { drm_syncobj_replace_fence(syncobj, fence); + atomic64_set(&syncobj->binary_payload, fence->seqno); + } *out_syncobj = syncobj; return 0; @@ -604,6 +606,7 @@ static int drm_syncobj_import_sync_file_fence(struct drm_file *file_private, } drm_syncobj_replace_fence(syncobj, fence); + atomic64_set(&syncobj->binary_payload, fence->seqno); dma_fence_put(fence); drm_syncobj_put(syncobj); return 0; @@ -1224,8 +1227,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, if (ret < 0) return ret; - for (i = 0; i < args->count_handles; i++) + for (i = 0; i < args->count_handles; i++) { drm_syncobj_replace_fence(syncobjs[i], NULL); + atomic64_set(&syncobjs[i]->binary_payload, 0); + } drm_syncobj_array_free(syncobjs, args->count_handles); @
[PATCH 0/1] drm/syncobj: add sideband payload
Hi all, Just explaining what is being changed here compared to v6: We just noticed that some of our CTS runs are flaky because, when importing a dma fence into a drm syncobj, we do not update the atomic binary payload. This leads to issues when userspace drivers try to add new points to the timeline, because the atomic binary payload may then have a value lower than the seqno of the newly installed fence. Cheers, Lionel Landwerlin (1): drm/syncobj: add sideband payload drivers/gpu/drm/drm_internal.h | 2 ++ drivers/gpu/drm/drm_ioctl.c| 3 ++ drivers/gpu/drm/drm_syncobj.c | 64 -- include/drm/drm_syncobj.h | 9 + include/uapi/drm/drm.h | 17 + 5 files changed, 93 insertions(+), 2 deletions(-) -- 2.23.0
Re: [PATCH] drm/syncobj: Add documentation for timeline syncobj
On 27/08/2019 19:27, Daniel Vetter wrote: On Mon, Aug 26, 2019 at 07:30:08AM +0300, Lionel Landwerlin wrote: On 26/08/2019 00:01, Daniel Vetter wrote: On Fri, Aug 23, 2019 at 8:53 PM Jason Ekstrand wrote: On Thu, Aug 22, 2019 at 5:28 PM Lionel Landwerlin wrote: On 22/08/2019 21:24, Jason Ekstrand wrote: On Thu, Aug 22, 2019 at 9:55 AM Lionel Landwerlin wrote: We've added a set of new APIs to manipulate syncobjs holding timelines of dma_fence. This adds a bit of documentation about how this works. Signed-off-by: Lionel Landwerlin Cc: Christian Koenig Cc: Jason Ekstrand Cc: David(ChunMing) Zhou --- drivers/gpu/drm/drm_syncobj.c | 87 +-- 1 file changed, 74 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index b5ad73330a48..32ffded6d2c0 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -43,27 +43,66 @@ * - Signal a syncobj (set a trivially signaled fence) * - Wait for a syncobj's fence to appear and be signaled * + * The syncobj userspace API also provides operations to manipulate a syncobj + * in terms of a timeline of struct &dma_fence rather than a single struct + * &dma_fence, through the following operations: + * + * - Signal a given point on the timeline + * - Wait for a given point to appear and/or be signaled + * - Import and export from/to a given point of a timeline + * * At it's core, a syncobj is simply a wrapper around a pointer to a struct * &dma_fence which may be NULL. * When a syncobj is first created, its pointer is either NULL or a pointer * to an already signaled fence depending on whether the * &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to * &DRM_IOCTL_SYNCOBJ_CREATE. - * When GPU work which signals a syncobj is enqueued in a DRM driver, - * the syncobj fence is replaced with a fence which will be signaled by the - * completion of that work. 
- * When GPU work which waits on a syncobj is enqueued in a DRM driver, the - * driver retrieves syncobj's current fence at the time the work is enqueued - * waits on that fence before submitting the work to hardware. - * If the syncobj's fence is NULL, the enqueue operation is expected to fail. - * All manipulation of the syncobjs's fence happens in terms of the current - * fence at the time the ioctl is called by userspace regardless of whether - * that operation is an immediate host-side operation (signal or reset) or - * or an operation which is enqueued in some driver queue. - * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to - * manipulate a syncobj from the host by resetting its pointer to NULL or + * + * If the syncobj is considered as a binary (signal/unsignaled) primitive, What does "considered as a binary" mean? Is it an inherent property of the syncobj given at create time? Is it a state the syncobj can be in? Or is it a property of how the submit ioctl in the DRM driver references it? I'm really hoping it's either 1 or 3 3: you either use it binary/legacy apis, or timeline apis. timeline apis also provide some binary compatibility with the point 0 (in particular for wait). Right. Maybe we should say something like "When GPU work is enqueued which signals a non-zero time point" or something like that? I guess that implies a certain unification across drivers that maybe we don't want [Just jumping in on this comment here] I thought the point of syncobj is that you can share them across drivers (not just within drivers)? Otherwise not much sense in the common infrastructure. Hence I'd say we should spec all these things. Concern from someone who's seen way too many cross-driver apis that turned out the be decidedly cross-driver than planned ... The sharing of a timeline semaphore/syncobj between 2 apps/drivers implies that they both know they're dealing with a timeline semaphore. 
I see that at the same level as sharing a file descriptor and knowing it represents a syncfd or a syncobj. There has to be some kind of understanding, otherwise nothing works. If the shared semantic between the 2 clients is a binary (signal/unsignaled) semaphore, then both drivers should share the existing syncobj type, that is a syncobj that will only ever contain a single dma-fence. You can build that out of the timeline by exporting a particular point into another syncobj (transfer ioctl). Oh this is just stating that apps need to agree on old syncobj or timeline syncobj mode? I guess if it's all there is that should be a given, still worth maybe putting in words. -Daniel Thanks, there is a note a bit further down in this patch. It was worded along the lines of with a semaphore within single app, but it applies to shared semaphores too. -Lionel -Lionel Cheers, Daniel + * when GPU work is
[PATCH 2/3] drm/amd/amdgpu: disallow replacing fences in timeline syncobjs
Similarly to the host path in drm_syncobj.c, we would like to disallow those operations to help applications figure out where they are using the wrong kind of ioctl. Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 2e53feed40e2..d9bbc31e97d0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1159,6 +1159,8 @@ static int amdgpu_cs_process_syncobj_out_dep(struct amdgpu_cs_parser *p, drm_syncobj_find(p->filp, deps[i].handle); if (!p->post_deps[i].syncobj) return -EINVAL; + if (p->post_deps[i].syncobj->is_timeline) + return -EINVAL; p->post_deps[i].chain = NULL; p->post_deps[i].point = 0; p->num_post_deps++; -- 2.23.0
[PATCH 1/3] drm/syncobj: protect timeline syncobjs
Binary/legacy signal operations on a syncobj work by replacing the dma_fence held within the syncobj. When dealing with timeline semaphores we would like to avoid this, as it would effectively lead to looser synchronization (by discarding the dma_fence_chain mechanism, which waits for all previous dma_fences to signal before signaling itself). This change adds a flag that can be used at creation of the syncobj to indicate that the syncobj will hold a timeline of dma_fence (using dma_fence_chain). When flagged as such, the dma_fence held by the syncobj should not be replaced; instead we should always add to the timeline. Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/drm_syncobj.c | 30 +- include/drm/drm_syncobj.h | 8 include/uapi/drm/drm.h| 1 + 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 72d083acd388..69d43c791a42 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -476,6 +476,8 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) drm_syncobj_assign_null_handle(syncobj); + if (flags & DRM_SYNCOBJ_CREATE_TIMELINE) + syncobj->is_timeline = true; if (fence) drm_syncobj_replace_fence(syncobj, fence); @@ -661,6 +663,10 @@ static int drm_syncobj_import_sync_file_fence(struct drm_file *file_private, dma_fence_put(fence); return -ENOENT; } + if (syncobj->is_timeline) { + dma_fence_put(fence); + return -EINVAL; + } drm_syncobj_replace_fence(syncobj, fence); dma_fence_put(fence); @@ -749,7 +755,13 @@ drm_syncobj_create_ioctl(struct drm_device *dev, void *data, return -EOPNOTSUPP; /* no valid flags yet */ - if (args->flags & ~DRM_SYNCOBJ_CREATE_SIGNALED) + if (args->flags & ~(DRM_SYNCOBJ_CREATE_SIGNALED | + DRM_SYNCOBJ_CREATE_TIMELINE)) + return -EINVAL; + + /* Creating a signaled timeline makes no sense. 
*/ + if ((args->flags & DRM_SYNCOBJ_CREATE_SIGNALED) && + (args->flags & DRM_SYNCOBJ_CREATE_TIMELINE)) return -EINVAL; return drm_syncobj_create_as_handle(file_private, @@ -862,6 +874,10 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private, binary_syncobj = drm_syncobj_find(file_private, args->dst_handle); if (!binary_syncobj) return -ENOENT; + if (binary_syncobj->is_timeline) { + ret = -EINVAL; + goto err; + } ret = drm_syncobj_find_fence(file_private, args->src_handle, args->src_point, args->flags, &fence); if (ret) @@ -1137,6 +1153,7 @@ static int drm_syncobj_array_wait(struct drm_device *dev, static int drm_syncobj_array_find(struct drm_file *file_private, void __user *user_handles, uint32_t count_handles, + bool no_timeline, struct drm_syncobj ***syncobjs_out) { uint32_t i, *handles; @@ -1165,6 +1182,10 @@ static int drm_syncobj_array_find(struct drm_file *file_private, ret = -ENOENT; goto err_put_syncobjs; } + if (no_timeline && syncobjs[i]->is_timeline) { + ret = -EINVAL; + goto err_put_syncobjs; + } } kfree(handles); @@ -1211,6 +1232,7 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data, ret = drm_syncobj_array_find(file_private, u64_to_user_ptr(args->handles), args->count_handles, +false, &syncobjs); if (ret < 0) return ret; @@ -1245,6 +1267,7 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data, ret = drm_syncobj_array_find(file_private, u64_to_user_ptr(args->handles), args->count_handles, +false, &syncobjs); if (ret < 0) return ret; @@ -1279,6 +1302,7 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, ret = drm_syncobj_array_find(file_private, u64_to_user_ptr(args->handles), args->count_handles, +false,
[PATCH 3/3] drm/i915: disallow replacing fences of timeline syncobjs
Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 09248398fa7b..f1af3490f96b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -2494,6 +2494,14 @@ get_legacy_fence_array(struct i915_execbuffer *eb, goto err; } + if ((user_fence.flags & I915_EXEC_FENCE_SIGNAL) && + syncobj->is_timeline) { + DRM_DEBUG("Cannot replace fence in timeline syncobj\n"); + drm_syncobj_put(syncobj); + err = -EINVAL; + goto err; + } + if (user_fence.flags & I915_EXEC_FENCE_WAIT) { fence = drm_syncobj_fence_get(syncobj); if (!fence) { -- 2.23.0
[PATCH 0/3] drm/syncobj: add protection against timeline resets
Hi all, Following Jason's suggestion on another thread adding timeline documentation [1], here is a small series adding a creation flag to syncobjs so that users are prevented from dropping the existing timeline fences in the syncobj, effectively ensuring a user always adds to the dma_fence_chain instead of replacing it. We still allow explicit reset. Apart from the fact that we need to enforce this policy in each driver's submission path, I haven't run into odd things yet. Cheers, [1] : https://lists.freedesktop.org/archives/dri-devel/2019-August/232700.html Lionel Landwerlin (3): drm/syncobj: protect timeline syncobjs drm/amd/amdgpu: disallow replacing fences in timeline syncobjs drm/i915: disallow replacing fences of timeline syncobjs drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 2 ++ drivers/gpu/drm/drm_syncobj.c | 30 ++- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 8 + include/drm/drm_syncobj.h | 8 + include/uapi/drm/drm.h| 1 + 5 files changed, 48 insertions(+), 1 deletion(-) -- 2.23.0
Re: [PATCH] drm/syncobj: Add documentation for timeline syncobj
On 26/08/2019 00:01, Daniel Vetter wrote: On Fri, Aug 23, 2019 at 8:53 PM Jason Ekstrand wrote: On Thu, Aug 22, 2019 at 5:28 PM Lionel Landwerlin wrote: On 22/08/2019 21:24, Jason Ekstrand wrote: On Thu, Aug 22, 2019 at 9:55 AM Lionel Landwerlin wrote: We've added a set of new APIs to manipulate syncobjs holding timelines of dma_fence. This adds a bit of documentation about how this works. Signed-off-by: Lionel Landwerlin Cc: Christian Koenig Cc: Jason Ekstrand Cc: David(ChunMing) Zhou --- drivers/gpu/drm/drm_syncobj.c | 87 +-- 1 file changed, 74 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index b5ad73330a48..32ffded6d2c0 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -43,27 +43,66 @@ * - Signal a syncobj (set a trivially signaled fence) * - Wait for a syncobj's fence to appear and be signaled * + * The syncobj userspace API also provides operations to manipulate a syncobj + * in terms of a timeline of struct &dma_fence rather than a single struct + * &dma_fence, through the following operations: + * + * - Signal a given point on the timeline + * - Wait for a given point to appear and/or be signaled + * - Import and export from/to a given point of a timeline + * * At it's core, a syncobj is simply a wrapper around a pointer to a struct * &dma_fence which may be NULL. * When a syncobj is first created, its pointer is either NULL or a pointer * to an already signaled fence depending on whether the * &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to * &DRM_IOCTL_SYNCOBJ_CREATE. - * When GPU work which signals a syncobj is enqueued in a DRM driver, - * the syncobj fence is replaced with a fence which will be signaled by the - * completion of that work. - * When GPU work which waits on a syncobj is enqueued in a DRM driver, the - * driver retrieves syncobj's current fence at the time the work is enqueued - * waits on that fence before submitting the work to hardware. 
- * If the syncobj's fence is NULL, the enqueue operation is expected to fail. - * All manipulation of the syncobjs's fence happens in terms of the current - * fence at the time the ioctl is called by userspace regardless of whether - * that operation is an immediate host-side operation (signal or reset) or - * or an operation which is enqueued in some driver queue. - * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to - * manipulate a syncobj from the host by resetting its pointer to NULL or + * + * If the syncobj is considered as a binary (signal/unsignaled) primitive, What does "considered as a binary" mean? Is it an inherent property of the syncobj given at create time? Is it a state the syncobj can be in? Or is it a property of how the submit ioctl in the DRM driver references it? I'm really hoping it's either 1 or 3 3: you either use it binary/legacy apis, or timeline apis. timeline apis also provide some binary compatibility with the point 0 (in particular for wait). Right. Maybe we should say something like "When GPU work is enqueued which signals a non-zero time point" or something like that? I guess that implies a certain unification across drivers that maybe we don't want [Just jumping in on this comment here] I thought the point of syncobj is that you can share them across drivers (not just within drivers)? Otherwise not much sense in the common infrastructure. Hence I'd say we should spec all these things. Concern from someone who's seen way too many cross-driver apis that turned out the be decidedly cross-driver than planned ... The sharing of a timeline semaphore/syncobj between 2 apps/drivers implies that they both know they're dealing with a timeline semaphore. I see that at the same level as sharing a file descriptor and knowing it represents a syncfd or a syncobj. There has to be some kind of understanding, otherwise nothing works. 
If the shared semantic between the 2 clients is a binary (signal/unsignaled) semaphore, then both drivers should share the existing syncobj type, that is a syncobj that will only ever contain a single dma-fence. You can build that out of the timeline by exporting a particular point into another syncobj (transfer ioctl). -Lionel Cheers, Daniel + * when GPU work is enqueued in a DRM driver to signal the syncobj, the fence + * is replaced with a fence which will be signaled by the completion of that + * work. + * If the syncobj is considered as a timeline primitive, when GPU work is + * enqueued in a DRM driver to signal the a given point of the syncobj, a new + * struct &dma_fence_chain pointing to the DRM driver's fence and also + * pointing to the previous fence that was in the syncobj. The new struct + * &dma_fence_chain fen
Re: [PATCH] drm/syncobj: Add documentation for timeline syncobj
On 22/08/2019 21:24, Jason Ekstrand wrote: On Thu, Aug 22, 2019 at 9:55 AM Lionel Landwerlin mailto:lionel.g.landwer...@intel.com>> wrote: We've added a set of new APIs to manipulate syncobjs holding timelines of dma_fence. This adds a bit of documentation about how this works. Signed-off-by: Lionel Landwerlin mailto:lionel.g.landwer...@intel.com>> Cc: Christian Koenig mailto:christian.koe...@amd.com>> Cc: Jason Ekstrand mailto:ja...@jlekstrand.net>> Cc: David(ChunMing) Zhou mailto:david1.z...@amd.com>> --- drivers/gpu/drm/drm_syncobj.c | 87 +-- 1 file changed, 74 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index b5ad73330a48..32ffded6d2c0 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -43,27 +43,66 @@ * - Signal a syncobj (set a trivially signaled fence) * - Wait for a syncobj's fence to appear and be signaled * + * The syncobj userspace API also provides operations to manipulate a syncobj + * in terms of a timeline of struct &dma_fence rather than a single struct + * &dma_fence, through the following operations: + * + * - Signal a given point on the timeline + * - Wait for a given point to appear and/or be signaled + * - Import and export from/to a given point of a timeline + * * At it's core, a syncobj is simply a wrapper around a pointer to a struct * &dma_fence which may be NULL. * When a syncobj is first created, its pointer is either NULL or a pointer * to an already signaled fence depending on whether the * &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to * &DRM_IOCTL_SYNCOBJ_CREATE. - * When GPU work which signals a syncobj is enqueued in a DRM driver, - * the syncobj fence is replaced with a fence which will be signaled by the - * completion of that work. 
- * When GPU work which waits on a syncobj is enqueued in a DRM driver, the - * driver retrieves syncobj's current fence at the time the work is enqueued - * waits on that fence before submitting the work to hardware. - * If the syncobj's fence is NULL, the enqueue operation is expected to fail. - * All manipulation of the syncobjs's fence happens in terms of the current - * fence at the time the ioctl is called by userspace regardless of whether - * that operation is an immediate host-side operation (signal or reset) or - * or an operation which is enqueued in some driver queue. - * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to - * manipulate a syncobj from the host by resetting its pointer to NULL or + * + * If the syncobj is considered as a binary (signal/unsignaled) primitive, What does "considered as a binary" mean? Is it an inherent property of the syncobj given at create time? Is it a state the syncobj can be in? Or is it a property of how the submit ioctl in the DRM driver references it? I'm really hoping it's either 1 or 3 3: you either use it binary/legacy apis, or timeline apis. timeline apis also provide some binary compatibility with the point 0 (in particular for wait). + * when GPU work is enqueued in a DRM driver to signal the syncobj, the fence + * is replaced with a fence which will be signaled by the completion of that + * work. + * If the syncobj is considered as a timeline primitive, when GPU work is + * enqueued in a DRM driver to signal the a given point of the syncobj, a new + * struct &dma_fence_chain pointing to the DRM driver's fence and also + * pointing to the previous fence that was in the syncobj. The new struct + * &dma_fence_chain fence put into the syncobj will be signaled by completion + * of the DRM driver's work and also any work associated with the fence + * previously in the syncobj. 
+ * + * When GPU work which waits on a syncobj is enqueued in a DRM driver, at the + * time the work is enqueued, it waits on the fence coming from the syncobj + * before submitting the work to hardware. That fence is either : + * + * - The syncobj's current fence if the syncobj is considered as a binary + * primitive. + * - The struct &dma_fence associated with a given point if the syncobj is + * considered as a timeline primitive. + * + * If the syncobj's fence is NULL or not present in the syncobj's timeline, + * the enqueue operation is expected to fail. + * + * With binary syncobj, all manipulation of the syncobjs's fence happens in + * terms of t
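The timeline behaviour discussed in this thread (a new chain node points at the driver's fence and at the previous fence, and a wait resolves a point to a node of the chain) can be sketched in user space as follows. The struct and helpers are hypothetical simplifications and do not mirror struct dma_fence_chain or dma_fence_chain_find_seqno() in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chain node: one timeline point plus a link to the previous. */
struct mock_chain_node {
	uint64_t seqno;			/* timeline point of this node */
	bool work_done;			/* the driver's own work completed */
	struct mock_chain_node *prev;	/* fence previously in the syncobj */
};

/* A node signals only once its own work and all earlier work are done. */
static bool mock_chain_signaled(const struct mock_chain_node *node)
{
	for (; node != NULL; node = node->prev)
		if (!node->work_done)
			return false;
	return true;
}

/*
 * Resolve a point to the oldest node whose seqno still covers it.
 * Returns NULL when the point is not present in the timeline yet.
 */
static const struct mock_chain_node *
mock_chain_find(const struct mock_chain_node *head, uint64_t point)
{
	const struct mock_chain_node *found = NULL;
	const struct mock_chain_node *node;

	if (head == NULL || head->seqno < point)
		return NULL;
	for (node = head; node != NULL && node->seqno >= point; node = node->prev)
		found = node;
	return found;
}
```

In this sketch a wait on a point that was never signaled directly resolves to the next higher point in the chain, and a wait on a point beyond the head of the chain fails, which corresponds to the "not present in the syncobj's timeline" case in the quoted documentation.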
[PATCH] drm/syncobj: Add documentation for timeline syncobj
We've added a set of new APIs to manipulate syncobjs holding timelines of dma_fence. This adds a bit of documentation about how this works.

Signed-off-by: Lionel Landwerlin
Cc: Christian Koenig
Cc: Jason Ekstrand
Cc: David(ChunMing) Zhou
---
 drivers/gpu/drm/drm_syncobj.c | 87 +--
 1 file changed, 74 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b5ad73330a48..32ffded6d2c0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -43,27 +43,66 @@
  *  - Signal a syncobj (set a trivially signaled fence)
  *  - Wait for a syncobj's fence to appear and be signaled
  *
+ * The syncobj userspace API also provides operations to manipulate a syncobj
+ * in terms of a timeline of struct &dma_fence rather than a single struct
+ * &dma_fence, through the following operations:
+ *
+ *  - Signal a given point on the timeline
+ *  - Wait for a given point to appear and/or be signaled
+ *  - Import and export from/to a given point of a timeline
+ *
  * At it's core, a syncobj is simply a wrapper around a pointer to a struct
  * &dma_fence which may be NULL.
  * When a syncobj is first created, its pointer is either NULL or a pointer
  * to an already signaled fence depending on whether the
  * &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to
  * &DRM_IOCTL_SYNCOBJ_CREATE.
- * When GPU work which signals a syncobj is enqueued in a DRM driver,
- * the syncobj fence is replaced with a fence which will be signaled by the
- * completion of that work.
- * When GPU work which waits on a syncobj is enqueued in a DRM driver, the
- * driver retrieves syncobj's current fence at the time the work is enqueued
- * waits on that fence before submitting the work to hardware.
- * If the syncobj's fence is NULL, the enqueue operation is expected to fail.
- * All manipulation of the syncobjs's fence happens in terms of the current
- * fence at the time the ioctl is called by userspace regardless of whether
- * that operation is an immediate host-side operation (signal or reset) or
- * or an operation which is enqueued in some driver queue.
- * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to
- * manipulate a syncobj from the host by resetting its pointer to NULL or
+ *
+ * If the syncobj is considered as a binary (signaled/unsignaled) primitive,
+ * when GPU work is enqueued in a DRM driver to signal the syncobj, the fence
+ * is replaced with a fence which will be signaled by the completion of that
+ * work.
+ * If the syncobj is considered as a timeline primitive, when GPU work is
+ * enqueued in a DRM driver to signal a given point of the syncobj, a new
+ * struct &dma_fence_chain is created, pointing to the DRM driver's fence and
+ * also pointing to the previous fence that was in the syncobj. The new struct
+ * &dma_fence_chain fence put into the syncobj will be signaled by completion
+ * of the DRM driver's work and also any work associated with the fence
+ * previously in the syncobj.
+ *
+ * When GPU work which waits on a syncobj is enqueued in a DRM driver, at the
+ * time the work is enqueued, it waits on the fence coming from the syncobj
+ * before submitting the work to hardware. That fence is either:
+ *
+ *  - The syncobj's current fence if the syncobj is considered as a binary
+ *    primitive.
+ *  - The struct &dma_fence associated with a given point if the syncobj is
+ *    considered as a timeline primitive.
+ *
+ * If the syncobj's fence is NULL or not present in the syncobj's timeline,
+ * the enqueue operation is expected to fail.
+ *
+ * With binary syncobj, all manipulation of the syncobj's fence happens in
+ * terms of the current fence at the time the ioctl is called by userspace
+ * regardless of whether that operation is an immediate host-side operation
+ * (signal or reset) or an operation which is enqueued in some driver
+ * queue. &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used
+ * to manipulate a syncobj from the host by resetting its pointer to NULL or
  * setting its pointer to a fence which is already signaled.
  *
+ * With timeline syncobj, all manipulation of the timeline fences happens in
+ * terms of the fence referred to in the timeline. See
+ * dma_fence_chain_find_seqno() to see how a given point is found in the
+ * timeline.
+ *
+ * Note that applications should be careful to always use the timeline set of
+ * ioctl() when dealing with a syncobj considered as a timeline. Using a binary
+ * set of ioctl() with a syncobj considered as a timeline could result in
+ * incorrect synchronization. The use of binary syncobj is supported through
+ * the timeline set of ioctl() by using a point value of 0; this will reproduce
+ * the behavior of the binary set of ioctl() (for example replace the
+ * syncobj's fence when signaling).
+ *
  *
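To make the binary/timeline distinction described in the comment above concrete, here is a small userspace toy model. It is only a sketch: none of the toy_* names exist in the kernel, the fixed-size array stands in for the real struct &dma_fence_chain list, and the "first point with seqno >= requested point" lookup is my reading of what dma_fence_chain_find_seqno() does, not a copy of it. Rejecting non-increasing points is a simplification of the toy, not a statement about the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_POINTS 16

/* Toy stand-in for a struct dma_fence: just a completion flag. */
struct toy_fence {
	bool signaled;
};

/* Toy stand-in for a syncobj holding a binary fence and a timeline. */
struct toy_syncobj {
	struct toy_fence *binary;                /* current fence, may be NULL */
	uint64_t seqno[TOY_MAX_POINTS];          /* timeline points, ascending */
	struct toy_fence *fence[TOY_MAX_POINTS]; /* fence attached to each point */
	size_t count;
};

/*
 * Attach a fence to the syncobj. Point 0 reproduces the binary behaviour
 * (the current fence is replaced); any other point is appended to the
 * timeline. Returns 0 on success, -1 if the point cannot be added.
 */
static int toy_signal(struct toy_syncobj *s, uint64_t point, struct toy_fence *f)
{
	assert(s != NULL);

	if (point == 0) {
		s->binary = f;
		return 0;
	}
	if (s->count == TOY_MAX_POINTS)
		return -1;
	/* Toy simplification: points must be strictly increasing. */
	if (s->count > 0 && point <= s->seqno[s->count - 1])
		return -1;

	s->seqno[s->count] = point;
	s->fence[s->count] = f;
	s->count++;
	return 0;
}

/*
 * Find the fence a waiter would wait on. NULL means "no fence yet", which
 * is the case where the comment says the enqueue is expected to fail.
 */
static struct toy_fence *toy_find(const struct toy_syncobj *s, uint64_t point)
{
	assert(s != NULL);

	if (point == 0)
		return s->binary;
	for (size_t i = 0; i < s->count; i++)
		if (s->seqno[i] >= point)
			return s->fence[i];
	return NULL;
}
```

In this model, signaling point 0 twice leaves only the second fence reachable, while signaling points 2 and 5 lets a waiter on point 3 pick up the fence attached to point 5.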
[PATCH v6] drm/syncobj: add sideband payload
Vulkan timeline semaphores allow a point on the timeline to be signaled before all of its dependencies have been created.

The two current implementations (AMD/Intel) of the Vulkan spec on top of the Linux kernel use a thread to wait for the dependencies of a given point to materialize, and delay the actual submission to the kernel driver until the wait completes.

If a binary semaphore is submitted for signaling alongside a timeline semaphore that is still waiting for completion, the drm syncobj associated with that binary semaphore will not have a DMA fence associated with it by the time vkQueueSubmit() returns. This, and the fact that a binary semaphore can be signaled and unsignaled before its DMA fences materialize, means that we cannot rely on the fence within the syncobj alone; we also need a sideband payload verifying that the fence in the syncobj matches the last submission from the Vulkan API point of view.

This change adds a sideband payload that is incremented for each signaled syncobj when vkQueueSubmit() is called. The next vkQueueSubmit() waiting on the syncobj will read the sideband payload, wait for a fence chain element with a seqno greater than or equal to the sideband payload value to be added into the fence chain, and use that fence to trigger the submission on the kernel driver.

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: Fix a bunch of blatant mistakes
    Store payload atomically (Chris)

v6: Only touch atomic value once (Jason)

Signed-off-by: Lionel Landwerlin
Reviewed-by: David Zhou (v5)
Cc: Christian Koenig
Cc: Jason Ekstrand
Cc: David(ChunMing) Zhou
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c    |  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 59 +-
 include/drm/drm_syncobj.h      |  9 ++
 include/uapi/drm/drm.h         | 17 ++
 5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
 				      struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
 		      DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
 		      DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+		      DRM_RENDER_ALLOW),
+
 	DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 4b5c7b0ed714..732310b2b367 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1224,8 +1224,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
 	if (ret < 0)
 		return ret;
 
-	for (i = 0; i < args->count_handles; i++)
+	for (i = 0; i < args->count_handles; i++) {
 		drm_syncobj_replace_fence(syncobjs[i], NULL);
+		atomic64_set(&syncobjs[i]->binary_payload, 0);
+	}
 
 	drm_syncobj_array_free(syncobjs, args->count_handles);
 
@@ -1395,6 +1397,61 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 		if (ret)
 			break;
 	}
+
+	drm_syncobj_array_free(syncobjs, args->count_handles);
+
+	return ret;
+}
+
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_private)
+{
+	struct drm_syncobj_binary_array *args = data;
+	struct drm_syncobj **syncobjs;
+	u32 __user *access_flags = u64_to_user_ptr(args->access_flags);
+	u64 __user *values = u64_to_user_ptr(args->values);
+	u32 i;
+	int ret;
+
+	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
+		return -EOPNOTSUPP;
+
+	if (args->pad != 0)
+		return -EINVAL;
+
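The v6 changelog entry ("Only touch atomic value once") can be illustrated with a userspace sketch using C11 atomics. This is an assumption about the intent, since the body of the ioctl is cut off above: the toy_* and TOY_* names are made up for illustration, and atomic_fetch_add() stands in for whatever atomic64 helper the real patch uses. The point is that the read and the increment are a single atomic operation that yields the value from before the increment, matching the "read before it is incremented" rule in the uapi comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names, modelled on the READ/INC flags in the patch. */
#define TOY_BINARY_VALUE_READ (1u << 0)
#define TOY_BINARY_VALUE_INC  (1u << 1)

/* Toy stand-in for the sideband payload stored in a syncobj. */
struct toy_payload_obj {
	_Atomic uint64_t binary_payload;
};

/*
 * Apply the access flags for one syncobj. The payload is touched by exactly
 * one atomic operation: fetch-add when incrementing, plain load otherwise.
 * *out is only written when the READ flag is set.
 */
static void toy_binary_access(struct toy_payload_obj *s, uint32_t flags,
			      uint64_t *out)
{
	uint64_t value;

	assert(s != NULL && out != NULL);

	if (flags & TOY_BINARY_VALUE_INC)
		value = atomic_fetch_add(&s->binary_payload, 1); /* old value */
	else
		value = atomic_load(&s->binary_payload);

	if (flags & TOY_BINARY_VALUE_READ)
		*out = value;
}

/* Mirrors the reset hunk above: the payload goes back to 0. */
static void toy_reset(struct toy_payload_obj *s)
{
	assert(s != NULL);
	atomic_store(&s->binary_payload, 0);
}
```

With READ|INC a caller gets the pre-increment value back, so two threads racing on the same syncobj can never observe the same value from an increment.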
[PATCH v5 1/1] drm/syncobj: add sideband payload
Vulkan timeline semaphores allow a point on the timeline to be signaled before all of its dependencies have been created.

The two current implementations (AMD/Intel) of the Vulkan spec on top of the Linux kernel use a thread to wait for the dependencies of a given point to materialize, and delay the actual submission to the kernel driver until the wait completes.

If a binary semaphore is submitted for signaling alongside a timeline semaphore that is still waiting for completion, the drm syncobj associated with that binary semaphore will not have a DMA fence associated with it by the time vkQueueSubmit() returns. This, and the fact that a binary semaphore can be signaled and unsignaled before its DMA fences materialize, means that we cannot rely on the fence within the syncobj alone; we also need a sideband payload verifying that the fence in the syncobj matches the last submission from the Vulkan API point of view.

This change adds a sideband payload that is incremented for each signaled syncobj when vkQueueSubmit() is called. The next vkQueueSubmit() waiting on the syncobj will read the sideband payload, wait for a fence chain element with a seqno greater than or equal to the sideband payload value to be added into the fence chain, and use that fence to trigger the submission on the kernel driver.

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: Fix a bunch of blatant mistakes
    Store payload atomically (Chris)

Signed-off-by: Lionel Landwerlin
Cc: Christian Koenig
Cc: Jason Ekstrand
Cc: David(ChunMing) Zhou
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c    |  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 58 +-
 include/drm/drm_syncobj.h      |  9 ++
 include/uapi/drm/drm.h         | 17 ++
 5 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
 				      struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
 		      DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
 		      DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+		      DRM_RENDER_ALLOW),
+
 	DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b927e482e554..d2d3a8d1374d 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1150,8 +1150,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
 	if (ret < 0)
 		return ret;
 
-	for (i = 0; i < args->count_handles; i++)
+	for (i = 0; i < args->count_handles; i++) {
 		drm_syncobj_replace_fence(syncobjs[i], NULL);
+		atomic64_set(&syncobjs[i]->binary_payload, 0);
+	}
 
 	drm_syncobj_array_free(syncobjs, args->count_handles);
 
@@ -1321,6 +1323,60 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 		if (ret)
 			break;
 	}
+
+	drm_syncobj_array_free(syncobjs, args->count_handles);
+
+	return ret;
+}
+
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file_private)
+{
+	struct drm_syncobj_binary_array *args = data;
+	struct drm_syncobj **syncobjs;
+	u32 __user *access_flags = u64_to_user_ptr(args->access_flags);
+	u64 __user *values = u64_to_user_ptr(args->values);
+	u32 i;
+	int ret;
+
+	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
+		return -EOPNOTSUPP;
+
+	if (args->pad != 0)
+		return -EINVAL;
+
+	if (args->count_handles == 0)
+		return -EINVAL
[PATCH v5 0/1] drm/syncobj: add syncobj sideband payload for threaded submission
A bunch of fixes :)

Lionel Landwerlin (1):
  drm/syncobj: add sideband payload

 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c    |  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 58 +-
 include/drm/drm_syncobj.h      |  9 ++
 include/uapi/drm/drm.h         | 17 ++
 5 files changed, 88 insertions(+), 1 deletion(-)

--
2.23.0.rc1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 1/1] drm/syncobj: add sideband payload
On 09/08/2019 15:27, Koenig, Christian wrote:
> On 09.08.19 at 14:26, Lionel Landwerlin wrote:
> > On 09/08/2019 14:44, Chris Wilson wrote:
> > > Quoting Lionel Landwerlin (2019-08-09 12:30:30)
> > > > diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
> > > > index 8a5b2f8f8eb9..1ce83853f997 100644
> > > > --- a/include/uapi/drm/drm.h
> > > > +++ b/include/uapi/drm/drm.h
> > > > @@ -785,6 +785,22 @@ struct drm_syncobj_timeline_array {
> > > >  	__u32 pad;
> > > >  };
> > > >
> > > > +struct drm_syncobj_binary_array {
> > > > +	/* A pointer to an array of u32 syncobj handles. */
> > > > +	__u64 handles;
> > > > +	/* A pointer to an array of u32 access flags for each handle. */
> > > > +	__u64 access_flags;
> > > > +	/* The binary value of a syncobj is read before it is incremented. */
> > > > +#define I915_DRM_SYNCOBJ_BINARY_ITEM_VALUE_READ (1u << 0)
> > > > +#define I915_DRM_SYNCOBJ_BINARY_ITEM_VALUE_INC (1u << 1)
> > >
> > > You're not in Kansas anymore ;)
> > > -Chris
> >
> > Which means? :)
>
> You are in common DRM code, but the new defines start with I915_
>
> Cheers,
> Christian.

Oh dear...

-Lionel
Re: [PATCH v4 1/1] drm/syncobj: add sideband payload
On 09/08/2019 14:58, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-08-09 12:30:30)
> > +int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
> > +			     struct drm_file *file_private)
> > +{
> > +	struct drm_syncobj_binary_array *args = data;
> > +	struct drm_syncobj **syncobjs;
> > +	u32 __user *access_flags = u64_to_user_ptr(args->access_flags);
> > +	u64 __user *values = u64_to_user_ptr(args->values);
> > +	u32 i;
> > +	int ret;
> > +
> > +	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
> > +		return -EOPNOTSUPP;
> > +
> > +	if (args->pad != 0)
> > +		return -EINVAL;
> > +
> > +	if (args->count_handles == 0)
> > +		return -EINVAL;
>
> You may find it easier to just return success for 0 handles. Slightly
> less obnoxious error handling?

All the other ioctls in this file return EINVAL in that case. I'm just going for consistency.

It's also a good indication to the application that it can save itself an ioctl really :)

> > +	ret = drm_syncobj_array_find(file_private,
> > +				     u64_to_user_ptr(args->handles),
> > +				     args->count_handles,
> > +				     &syncobjs);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	for (i = 0; i < args->count_handles; i++) {
> > +		u32 flags;
> > +
> > +		copy_from_user(&flags, &access_flags[i], sizeof(flags));
> > +		ret = ret ? -EFAULT : 0;
>
> Magic?
>
> if (get_user(flags, &access_flags[i]))
> 	return -EFAULT;

I gave this no testing, I'm just trying to get some feedback about the direction.

Thanks though :)

> > +		if (ret)
> > +			break;
> > +
> > +		if (flags & I915_DRM_SYNCOBJ_BINARY_ITEM_VALUE_READ) {
> > +			copy_to_user(&values[i], &syncobjs[i]->binary_payload, sizeof(values[i]));
> > +			ret = ret ? -EFAULT : 0;
>
> More magic.
>
> if (put_user(&syncobjs[i]->binary_payload, &values[i]))
> 	return -EFAULT;
>
> > +			if (ret)
> > +				break;
> > +		}
> > +
> > +		if (flags & I915_DRM_SYNCOBJ_BINARY_ITEM_VALUE_INC)
> > +			syncobjs[i]->binary_payload++;
>
> So if an error occurs how does the user know which syncobj were advanced
> before the error? (Or explain why it doesn't actually matter) The clue I
> guess is with read/inc, but confirmation of design would be nice.

I guess we could toggle the access flag bits to notify that the actions were completed.

> Not atomic (the u64 write should really be to avoid total corruption)
> and nothing prevents userspace from racing. How safe is that in the
> overall design?

Would atomic prevent issues related to two processes/threads seeing different values because of caching?

If not then it's not really interesting for the use case.

The increment should happen during the vkQueueSubmit() call and the value is only valid upon returning.

The application is responsible for not having vkQueueSubmit()/vkWaitForFences() race.

Not opposed to switching to atomic though.

> What would happen if the binary_payload was initialised to -1?

The 0 value is problematic because it's also used for "whatever fence in the syncobj".

I think we need to stick to the same rules as the timeline values: 0 is always signaled.

Thanks,

-Lionel

> > +	}
> > +
> >  	drm_syncobj_array_free(syncobjs, args->count_handles);
> >
> >  	return ret;
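For readers following the "Magic?" remarks above: in the quoted v4 code the return value of the copy is dropped, so ret never reflects a failed copy. Below is a userspace sketch of the checked pattern. Everything here is hypothetical: toy_copy() only imitates the copy_from_user() convention of returning the number of bytes not copied (a NULL source plays the role of a faulting user pointer), and the *done counter is just one possible answer to the question of how userspace learns how far the loop got, not something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Stand-in for copy_from_user(): returns the number of bytes that could
 * NOT be copied, so 0 means success. A NULL pointer simulates a fault.
 */
static size_t toy_copy(void *dst, const void *src, size_t len)
{
	if (dst == NULL || src == NULL)
		return len;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Read one flags word per item, stopping at the first failed copy.
 * *done reports how many items were fully handled before the error.
 */
static int toy_read_flags(const uint32_t *const user_ptrs[], size_t count,
			  uint32_t *out, size_t *done)
{
	assert(out != NULL && done != NULL);

	*done = 0;
	for (size_t i = 0; i < count; i++) {
		/* The return value is the only thing that reports a failure. */
		if (toy_copy(&out[i], user_ptrs[i], sizeof(out[i])))
			return -EFAULT;
		(*done)++;
	}
	return 0;
}
```

A caller that gets -EFAULT back can look at *done to see which items were processed, which is roughly the information the "toggle the access flag bits" idea would expose per item.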