Re: [PATCH] drm/i915/gt: Report full vm address range

2024-03-14 Thread Lionel Landwerlin

Hi Andi,

In Mesa we've been relying on I915_CONTEXT_PARAM_GTT_SIZE so as long as 
that is adjusted by the kernel, we should be able to continue working 
without issues.


Acked-by: Lionel Landwerlin 

Thanks,

-Lionel

On 13/03/2024 21:39, Andi Shyti wrote:

Commit 9bb66c179f50 ("drm/i915: Reserve some kernel space per
vm") has reserved an object for kernel space usage.

Userspace, though, needs to know the full address range.

Fixes: 9bb66c179f50 ("drm/i915: Reserve some kernel space per vm")
Signed-off-by: Andi Shyti 
Cc: Andrzej Hajda 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
Cc: Michal Mrozek 
Cc: Nirmoy Das 
Cc:  # v6.2+
---
  drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index fa46d2308b0e..d76831f50106 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -982,8 +982,9 @@ static int gen8_init_rsvd(struct i915_address_space *vm)
 
 	vm->rsvd.vma = i915_vma_make_unshrinkable(vma);
 	vm->rsvd.obj = obj;
-	vm->total -= vma->node.size;
+
 	return 0;
+
 unref:
 	i915_gem_object_put(obj);
 	return ret;





Re: [PATCH] drm/i915/perf: Clear out entire reports after reading if not power of 2 size

2023-05-23 Thread Lionel Landwerlin

On 22/05/2023 23:17, Ashutosh Dixit wrote:

Clearing out report id and timestamp as means to detect unlanded reports
only works if report size is power of 2. That is, only when report size is
a sub-multiple of the OA buffer size can we be certain that reports will
land at the same place each time in the OA buffer (after rewind). If report
size is not a power of 2, we need to zero out the entire report to be able
to detect unlanded reports reliably.

Cc: Umesh Nerlige Ramappa 
Signed-off-by: Ashutosh Dixit 


Sad but necessary unfortunately


Reviewed-by:  Lionel Landwerlin 
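
The power-of-2 argument in the commit message above can be checked with a small standalone model. This is only a sketch and not i915 code: the helper name reports_land_at_fixed_offsets() and the treatment of the OA buffer as a plain byte ring that reports wrap around are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: reports of report_size bytes are written back to
 * back into a ring of buf_size bytes, wrapping modulo buf_size.
 * Returns true if the second lap starts at offset 0 again, i.e. every
 * report reuses an offset already used during the first lap.
 */
static bool reports_land_at_fixed_offsets(uint64_t buf_size, uint64_t report_size)
{
	uint64_t offset = 0;
	uint64_t written = 0;

	assert(report_size > 0 && report_size <= buf_size);

	/* Write just enough reports to cover one full lap of the ring. */
	while (written < buf_size) {
		offset = (offset + report_size) % buf_size;
		written += report_size;
	}

	/* Offset 0 here means report_size divides buf_size exactly. */
	return offset == 0;
}
```

With a 16 MiB ring, a report size of 256 bytes returns true and a report size of 192 bytes returns false (both sizes are illustrative). In the second case a stale report id or timestamp from an earlier lap can sit anywhere inside the slot being read, which is why the patch zeroes the whole report.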



---
  drivers/gpu/drm/i915/i915_perf.c | 17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 19d5652300eeb..58284156428dc 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -877,12 +877,17 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 				stream->oa_buffer.last_ctx_id = ctx_id;
 		}
 
-		/*
-		 * Clear out the report id and timestamp as a means to detect unlanded
-		 * reports.
-		 */
-		oa_report_id_clear(stream, report32);
-		oa_timestamp_clear(stream, report32);
+		if (is_power_of_2(report_size)) {
+			/*
+			 * Clear out the report id and timestamp as a means
+			 * to detect unlanded reports.
+			 */
+			oa_report_id_clear(stream, report32);
+			oa_timestamp_clear(stream, report32);
+		} else {
+			/* Zero out the entire report */
+			memset(report32, 0, report_size);
+		}
 	}
 
 	if (start_offset != *offset) {





Re: [Intel-gfx] [PATCH v8 6/8] drm/i915/uapi/pxp: Add a GET_PARAM for PXP

2023-04-27 Thread Lionel Landwerlin

On 27/04/2023 21:19, Teres Alexis, Alan Previn wrote:

(fixed email addresses again - why is my Evolution client deteriorating??)

On Thu, 2023-04-27 at 17:18 +, Teres Alexis, Alan Previn wrote:

On Wed, 2023-04-26 at 15:35 -0700, Justen, Jordan L wrote:

On 2023-04-26 11:17:16, Teres Alexis, Alan Previn wrote:

alan:snip

Can you tell that pxp is in progress, but not ready yet, as a separate
state from 'it will never work on this platform'? If so, maybe the
status could return something like:

0: It's never going to work
1: It's ready to use
2: It's starting and should work soon

I could see an argument for treating that as a case where we could
still advertise protected content support, but if we try to use it we
might be in for a nasty delay.


alan: IIRC Lionel seemed okay with any permutation that would allow it to not
get blocked. Daniele did ask for something similar to what you mentioned above
but he said that is non-blocking. But since both you AND Daniele have mentioned
the same thing, I shall re-rev this and send that change out today.
I notice most GET_PARAMS use -ENODEV for "never gonna work" so I will stick
with that, but 1 = ready to use and 2 = starting and should work soon sounds
good. So '0' will never be returned - we just look for a positive value (from
user space). I will also make a PR for the Mesa side as soon as I get it tested.
Thanks for reviewing btw.

alan: I also realize with these final touch-ups, we can go back to the original
pxp-context-creation timeout of 250 millisecs like it was on ADL since the user
space component will have this new param to check on (so even further down from
1 sec on the last couple of revs).

Jordan, Lionel - I am thinking of creating the PR on the Mesa side to take
advantage of GET_PARAM on both get-caps AND runtime creation (the latter will be
useful to ensure no unnecessary delay is experienced by Mesa stuck in a kernel
call - which practically never happened on ADL AFAIK):

1. MESA PXP get caps:
- use GET_PARAM (any positive number shall mean it's supported).
2. MESA app-triggered PXP context creation (i.e. if caps was supported):
- use GET_PARAM to wait until the positive number switches from "2" to "1".
- now call context creation. So at this point if it fails, we know it's
  an actual failure.

you guys okay with above? (i'll re-rev this kernel series first and wait on your
ack or feedback before i create/ test/ submit a PR for Mesa side).
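
The two-step flow proposed above could look roughly like this on the userspace side. This is a minimal sketch under stated assumptions: the status values 1 and 2 and -ENODEV come from the thread, while the enum names, the pxp_query_fn callback standing in for the GET_PARAM call, and the fake_* test doubles are hypothetical and not Mesa or i915 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Values taken from the thread; the enum names are hypothetical. */
enum pxp_status {
	PXP_STATUS_READY = 1,		/* ready to use */
	PXP_STATUS_STARTING = 2,	/* starting, should work soon */
};

/*
 * Hypothetical callback standing in for the GET_PARAM query: returns a
 * positive status, or a negative errno such as -ENODEV.
 */
typedef int (*pxp_query_fn)(void *data);

/* Step 1, get-caps: any positive value means PXP is supported. */
static bool pxp_is_supported(pxp_query_fn query, void *data)
{
	return query(data) > 0;
}

/* Step 2, before context creation: poll until the status reads 1. */
static int pxp_wait_ready(pxp_query_fn query, void *data, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		int status = query(data);

		if (status < 0)
			return status;	/* e.g. -ENODEV */
		if (status == PXP_STATUS_READY)
			return 0;	/* safe to create the context now */
		/* A real caller would sleep here between polls. */
	}
	return -ETIMEDOUT;
}

/* Test double: reports "starting" for a few polls, then "ready". */
struct fake_pxp {
	int polls_until_ready;
};

static int fake_pxp_query(void *data)
{
	struct fake_pxp *pxp = data;

	if (pxp->polls_until_ready > 0) {
		pxp->polls_until_ready--;
		return PXP_STATUS_STARTING;
	}
	return PXP_STATUS_READY;
}

/* Test double: a device or kernel without PXP support. */
static int fake_pxp_absent(void *data)
{
	(void)data;
	return -ENODEV;
}
```

Once pxp_wait_ready() returns 0, a failing context creation can be treated as a real failure rather than as "not ready yet", which is the point of step 2 in the list above.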



Sounds good.

Thanks,


-Lionel




Re: [PATCH v7 6/8] drm/i915/uapi/pxp: Fix UAPI spec comments and add GET_PARAM for PXP

2023-04-18 Thread Lionel Landwerlin

On 14/04/2023 18:17, Teres Alexis, Alan Previn wrote:

Hi Lionel, does this patch work for you?



Hi, Sorry for the late answer.

That looks good :


Acked-by: Lionel Landwerlin 


Thanks,


-Lionel




On Mon, 2023-04-10 at 10:22 -0700, Ceraolo Spurio, Daniele wrote:

On 4/6/2023 10:44 AM, Alan Previn wrote:

alan:snip


+/*
+ * Query the status of PXP support in i915.
+ *
+ * The query can fail in the following scenarios with the listed error codes:
+ *  -ENODEV = PXP support is not available on the GPU device or in the kernel
+ *            due to missing component drivers or kernel configs.
+ * If the IOCTL is successful, the returned parameter will be set to one of the
+ * following values:
+ *   0 = PXP support may be available but underlying SOC fusing, BIOS or firmware
+ *       configuration is unknown and a PXP-context-creation would be required
+ *       for final verification of feature availability.

Would it be useful to add:

1 = PXP support is available

And start returning that after we've successfully created our first
session? Not sure if userspace would use this though, since they still
need to handle the 0 case anyway.
I'm also ok with this patch as-is, as long as you get an ack from the
userspace drivers for this interface behavior:

Reviewed-by: Daniele Ceraolo Spurio 

Daniele

alan:snip





Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-05 Thread Lionel Landwerlin

On 04/04/2023 19:04, Yang, Fei wrote:

Subject: Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

On 01/04/2023 09:38, fei.y...@intel.com wrote:

From: Fei Yang 

To comply with the design that buffer objects shall have an immutable
cache setting throughout their life cycle, the {set, get}_caching ioctls
are no longer supported from MTL onward. With that change the caching
policy can only be set at object creation time. The current code
applies a default (platform dependent) cache setting for all objects.
However this is not optimal for performance tuning. The patch extends
the existing gem_create uAPI to let the user set the PAT index for the
object at creation time.
The new extension is platform independent, so UMDs can switch to
using this extension for older platforms as well, while {set,
get}_caching are still supported on these legacy platforms for
compatibility reasons.

Cc: Chris Wilson 
Cc: Matt Roper 
Signed-off-by: Fei Yang 
Reviewed-by: Andi Shyti 


Just like the protected content uAPI, there is no way for userspace to tell
this feature is available other than trying to use it.

Given the issues with protected content, is it not something we could want to add?

Sorry I'm not aware of the issues with protected content, could you elaborate?
There was a long discussion on teams uAPI channel, could you comment there if
any concerns?

https://teams.microsoft.com/l/message/19:f1767bda6734476ba0a9c7d147b928d1@thread.skype/1675860924675?tenantId=46c98d88-e344-4ed4-8496-4ed7712e255d&groupId=379f3ae1-d138-4205-bb65-d4c7d38cb481&parentMessageId=1675860924675&teamName=GSE%20OSGC&channelName=i915%20uAPI%20changes&createdTime=1675860924675&allowXTenantAccess=false

Thanks,
-Fei



We wanted to have a getparam to detect protected support and were told 
to detect it by trying to create a context with it.


Now it appears trying to create a protected context can block for 
several seconds.


Since we have to report capabilities to the user even before it creates 
protected contexts, any app is at risk of blocking.



-Lionel





Thanks,

-Lionel



---
   drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 
   include/uapi/drm/i915_drm.h| 36 ++
   tools/include/uapi/drm/i915_drm.h  | 36 ++
   3 files changed, 105 insertions(+)





Re: [Intel-gfx] [PATCH 7/7] drm/i915: Allow user to set cache at BO creation

2023-04-04 Thread Lionel Landwerlin

On 01/04/2023 09:38, fei.y...@intel.com wrote:

From: Fei Yang 

To comply with the design that buffer objects shall have an immutable
cache setting throughout their life cycle, the {set, get}_caching ioctls
are no longer supported from MTL onward. With that change the caching
policy can only be set at object creation time. The current code
applies a default (platform dependent) cache setting for all objects.
However this is not optimal for performance tuning. The patch extends
the existing gem_create uAPI to let the user set the PAT index for the
object at creation time.
The new extension is platform independent, so UMDs can switch to using
this extension for older platforms as well, while {set, get}_caching are
still supported on these legacy platforms for compatibility reasons.

Cc: Chris Wilson 
Cc: Matt Roper 
Signed-off-by: Fei Yang 
Reviewed-by: Andi Shyti 



Just like the protected content uAPI, there is no way for userspace to
tell this feature is available other than trying to use it.


Given the issues with protected content, is it not something we could
want to add?



Thanks,


-Lionel



---
  drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 
  include/uapi/drm/i915_drm.h| 36 ++
  tools/include/uapi/drm/i915_drm.h  | 36 ++
  3 files changed, 105 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index e76c9703680e..1c6e2034d28e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -244,6 +244,7 @@ struct create_ext {
 	unsigned int n_placements;
 	unsigned int placement_mask;
 	unsigned long flags;
+	unsigned int pat_index;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -393,11 +394,39 @@ static int ext_set_protected(struct i915_user_extension __user *base, void *data
 	return 0;
 }
 
+static int ext_set_pat(struct i915_user_extension __user *base, void *data)
+{
+	struct create_ext *ext_data = data;
+	struct drm_i915_private *i915 = ext_data->i915;
+	struct drm_i915_gem_create_ext_set_pat ext;
+	unsigned int max_pat_index;
+
+	BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) !=
+		     offsetofend(struct drm_i915_gem_create_ext_set_pat, rsvd));
+
+	if (copy_from_user(&ext, base, sizeof(ext)))
+		return -EFAULT;
+
+	max_pat_index = INTEL_INFO(i915)->max_pat_index;
+
+	if (ext.pat_index > max_pat_index) {
+		drm_dbg(&i915->drm, "PAT index is invalid: %u\n",
+			ext.pat_index);
+		return -EINVAL;
+	}
+
+	ext_data->pat_index = ext.pat_index;
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements,
 	[I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected,
+	[I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat,
 };
 
+#define PAT_INDEX_NOT_SET	0x
 /**
  * Creates a new mm object and returns a handle to it.
  * @dev: drm device pointer
@@ -417,6 +446,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS)
 		return -EINVAL;
 
+	ext_data.pat_index = PAT_INDEX_NOT_SET;
 	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
 				   create_extensions,
 				   ARRAY_SIZE(create_extensions),
@@ -453,5 +483,8 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
+	if (ext_data.pat_index != PAT_INDEX_NOT_SET)
+		i915_gem_object_set_pat_index(obj, ext_data.pat_index);
+
 	return i915_gem_publish(obj, file, &args->size, &args->handle);
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dba7c5a5b25e..03c5c314846e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3630,9 +3630,13 @@ struct drm_i915_gem_create_ext {
 	 *
 	 * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see
 	 * struct drm_i915_gem_create_ext_protected_content.
+	 *
+	 * For I915_GEM_CREATE_EXT_SET_PAT usage see
+	 * struct drm_i915_gem_create_ext_set_pat.
 	 */
 #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
 #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
+#define I915_GEM_CREATE_EXT_SET_PAT 2
 	__u64 extensions;
 };
 
@@ -3747,6 +3751,38 @@ struct drm_i915_gem_create_ext_protected_content {
 	__u32 flags;
 };
 
+/**
+ * struct drm_i915_gem_create_ext_set_pat - The
+ * I915_GEM_CREATE_EXT_SET_PAT extension.
+ *
+ * If this extension is provided, the specified caching policy (PAT index) is
+ * applied to the buffer object.
+ *
+ * Below is an example on how to create an object with 

Re: [PATCH v6 5/8] drm/i915/pxp: Add ARB session creation and cleanup

2023-03-27 Thread Lionel Landwerlin

On 26/03/2023 14:18, Rodrigo Vivi wrote:

On Sat, Mar 25, 2023 at 02:19:21AM -0400, Teres Alexis, Alan Previn wrote:

alan:snip

@@ -353,8 +367,20 @@ int intel_pxp_start(struct intel_pxp *pxp)
alan:snip

+	if (HAS_ENGINE(pxp->ctrl_gt, GSC0)) {
+		/*
+		 * GSC-fw loading, GSC-proxy init (requiring an mei component driver) and
+		 * HuC-fw loading must all occur first before we start requesting for PXP
+		 * sessions. Checking HuC authentication (the last dependency) will suffice.
+		 * Let's use a much larger 8 second timeout considering all the types of
+		 * dependencies prior to that.
+		 */
+		if (wait_for(intel_huc_is_authenticated(&pxp->ctrl_gt->uc.huc), 8000))

This big timeout needs an ack from userspace drivers, as intel_pxp_start
is called during context creation and the current way to query if the
feature is supported is to create a protected context. Unfortunately, we
do need to wait to confirm that PXP is available (although in most cases
it shouldn't take even close to 8 secs), because until everything is
setup we're not sure if things will work as expected. I see 2 potential
mitigations in case the timeout doesn't work as-is:

1) we return -EAGAIN (or another dedicated error code) to userspace if
the prerequisite steps aren't done yet. This would indicate that the
feature is there, but that we haven't completed the setup yet. The
caller can then decide if they want to retry immediately or later. Pro:
more flexibility for userspace; Cons: new interface return code.

2) we add a getparam to say if PXP is supported in HW and the support is
compiled in i915. Userspace can query this as a way to check the feature
support and only create the context if they actually need it for PXP
operations. Pro: simpler kernel implementation; Cons: new getparam, plus
even if the getparam returns true the pxp_start could later fail, so
userspace needs to handle that case.


alan: I've cc'd Rodrigo, Joonas and Lionel. Folks - what are your thoughts on
the above issue?
Recap: On MTL, only when creating a GEM Protected (PXP) context for the very
first time after a driver load, it will be dependent on (1) loading the GSC
firmware, (2) GuC loading the HuC firmware and (3) GSC authenticating the HuC
fw. But step 3 also depends on additional GSC-proxy-init steps that depend on
a new mei-gsc-proxy component driver. I'd used the 8 second number based on
offline conversations with Daniele but that is a worst case.
Alternatively, should we change UAPI instead to return -EAGAIN as per Daniele's
proposal? I believe we've had the get-param conversation offline recently and
the direction was to stick with attempting to create the context as it is
normal in 3D UMD when it comes to testing capabilities for other features too.

Thoughts?

I like the option 1 more. This extra return handling won't break compatibility.



I like option 2 better because we have to report support as fast as we 
can when enumerating devices on the system for example.


If I understand correctly, with the get param, most apps won't ever be 
blocking on any PXP stuff if they don't use it.


Only the ones that require protected support might block.


-Lionel
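
For comparison, option 1 from the quoted proposal (returning -EAGAIN while the prerequisites are still loading) would leave the retry policy to the caller. The following is a minimal sketch of such a bounded retry, assuming a hypothetical ctx_create_fn callback in place of the real context-creation call; the names and the fake_* test double are illustrative and not part of any existing interface.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical callback standing in for protected-context creation:
 * returns 0 on success or a negative errno.
 */
typedef int (*ctx_create_fn)(void *data);

/* Retry creation while the kernel reports "not ready yet" (-EAGAIN). */
static int create_protected_ctx_retry(ctx_create_fn create, void *data,
				      int max_attempts)
{
	int ret = -EAGAIN;

	for (int i = 0; i < max_attempts && ret == -EAGAIN; i++)
		ret = create(data);	/* a real caller might sleep between attempts */

	return ret;
}

/* Test double: not ready for `busy` attempts, then succeeds. */
struct fake_ctx {
	int busy;
};

static int fake_ctx_create(void *data)
{
	struct fake_ctx *ctx = data;

	if (ctx->busy > 0) {
		ctx->busy--;
		return -EAGAIN;
	}
	return 0;
}
```

Note that this still requires a creation attempt just to learn whether the feature exists, which is the cost raised above for device enumeration.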





Re: [PATCH v2] drm/syncobj: Fix sync syncobj issue

2022-07-12 Thread Lionel Landwerlin

I'll let Lucas comment. I've only looked a little at it.
From what I remember just enabling sw_signaling was enough to fix the 
issue.


-Lionel

On 12/07/2022 13:26, Christian König wrote:

Ping to the Intel guys here. Especially Lucas/Nirmoy/Lionel.

IIRC you stumbled over that problem as well, have you found any solution?

Regards,
Christian.

Am 07.07.22 um 12:29 schrieb jie1zhan:

enable signaling after flatten dma_fence_chains on transfer

Signed-off-by: jie1zhan 
---
  drivers/gpu/drm/drm_syncobj.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 7e48dcd1bee4..0d9d3577325f 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -920,6 +920,7 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
 	if (ret)
 		goto err_free_fence;
+	dma_fence_enable_sw_signaling(fence);
 	chain = dma_fence_chain_alloc();
 	if (!chain) {
 		ret = -ENOMEM;






Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-07-04 Thread Lionel Landwerlin

On 30/06/2022 20:12, Zanoni, Paulo R wrote:

Can you please explain what happens when we try to write to a range
that's bound as read-only?


It will be mapped as read-only in device page table. Hence any
write access will fail. I would expect a CAT error reported.

What's a CAT error? Does this lead to machine freeze or a GPU hang?
Let's make sure we document this.


Catastrophic error.

Reading the documentation, it seems the behavior depends on the context 
type.


With the Legacy 64bit context type, writes are ignored (BSpec 531) :

    - "For legacy context, the access rights are not applicable and 
should not be considered during page walk."


For Advanced 64bit context type, I think the HW will generate a pagefault.


-Lionel


Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Lionel Landwerlin

On 23/06/2022 14:05, Tvrtko Ursulin wrote:


On 23/06/2022 09:57, Lionel Landwerlin wrote:

On 23/06/2022 11:27, Tvrtko Ursulin wrote:


After a vm_unbind, UMD can re-bind to same VA range against an active VM.
Though I am not sure with the Mesa usecase if that new mapping is required for
running GPU job or it will be for the next submission. But by ensuring the
tlb flush upon unbind, KMD can ensure correctness.


Isn't that their problem? If they re-bind for submitting _new_ work 
then they get the flush as part of batch buffer pre-amble. 


In the non sparse case, if a VA range is unbound, it is invalid to 
use that range for anything until it has been rebound by something else.


We'll take the fence provided by vm_bind and put it as a wait fence 
on the next execbuffer.


It might be safer in case of memory over fetching?


TLB flush will have to happen at some point right?

What's the alternative to do it in unbind?


Currently TLB flush happens from the ring before every BB_START and 
also when i915 returns the backing store pages to the system.


For the former, I haven't seen any mention that for execbuf3 there are 
plans to stop doing it? Anyway, as long as this is kept and sequence 
of bind[1..N]+execbuf is safe and correctly sees all the preceding binds.
Hence about the alternative to doing it in unbind - first I think lets 
state the problem that is trying to solve.


For instance is it just for the compute "append work to the running 
batch" use case? I honestly don't remember how was that supposed to 
work so maybe the tlb flush on bind was supposed to deal with that 
scenario?


Or you see a problem even for Mesa with the current model?

Regards,

Tvrtko



As far as I can tell, all the binds should have completed before execbuf 
starts if you follow the vulkan sparse binding rules.


For non-sparse, the UMD will take care of it.

I think we're fine.


-Lionel




Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Lionel Landwerlin

On 22/06/2022 18:12, Niranjana Vishwanathapura wrote:

On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote:


On 22/06/2022 04:56, Niranjana Vishwanathapura wrote:

VM_BIND and related uapi definitions

v2: Reduce the scope to simple Mesa use case.
v3: Expand VM_UNBIND documentation and add
    I915_GEM_VM_BIND/UNBIND_FENCE_VALID
    and I915_GEM_VM_BIND_TLB_FLUSH flags.

Signed-off-by: Niranjana Vishwanathapura 


---
 Documentation/gpu/rfc/i915_vm_bind.h | 243 +++
 1 file changed, 243 insertions(+)
 create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

new file mode 100644
index ..fa23b2d7ec6f
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,243 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ */
+#define I915_PARAM_HAS_VM_BIND    57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND    (1 << 0)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND    0x3d
+#define DRM_I915_GEM_VM_UNBIND    0x3e
+#define DRM_I915_GEM_EXECBUFFER3    0x3f
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
+
+/**
+ * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion notification.
+ *
+ * A timeline out fence for vm_bind/unbind completion notification.
+ */
+struct drm_i915_gem_vm_bind_fence {
+    /** @handle: User's handle for a drm_syncobj to signal. */
+    __u32 handle;
+
+    /** @rsvd: Reserved, MBZ */
+    __u32 rsvd;
+
+    /**
+     * @value: A point in the timeline.
+     * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
+     * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
+     * binary one.
+     */
+    __u64 value;
+};
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (i.e., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @start, @offset and @length should be 4K page aligned. However the DG2
+ * and XEHPSDV have 64K page size for device local-memory and have compact page
+ * table. On those platforms, for binding device local-memory objects, the
+ * @start should be 2M aligned, @offset and @length should be 64K aligned.


Should some error codes be documented and has the ability to 
programmatically probe the alignment restrictions been considered?




Currently what we have internally is that -EINVAL is returned if the start,
offset and length are not aligned. If the specified mapping already exists, we
return -EEXIST. If there are conflicts in the VA range and the VA range can't
be reserved, then -ENOSPC is returned. I can add this documentation here. But
I am worried that there will be more suggestions/feedback about error codes
while reviewing the code patch series, and we have to revisit it again.



That's not really a good excuse to not document.
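
The alignment rules quoted above, together with the -EINVAL behaviour described in the reply, can be sketched as a userspace pre-check. This is a minimal sketch under assumptions: vm_bind_check_alignment(), is_aligned() and the lmem_64k flag are hypothetical names, the SZ_* macros are defined locally, and the -EEXIST and -ENOSPC cases are not modelled because they depend on the state of the address space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K	(UINT64_C(1) << 12)
#define SZ_64K	(UINT64_C(1) << 16)
#define SZ_2M	(UINT64_C(1) << 21)

/* align must be a power of two. */
static bool is_aligned(uint64_t value, uint64_t align)
{
	return (value & (align - 1)) == 0;
}

/*
 * Hypothetical pre-check mirroring the documented rules. lmem_64k is
 * true when binding a device local-memory object on a platform with
 * 64K pages and compact page tables (DG2 and XEHPSDV in the thread).
 */
static int vm_bind_check_alignment(uint64_t start, uint64_t offset,
				   uint64_t length, bool lmem_64k)
{
	uint64_t start_align = lmem_64k ? SZ_2M : SZ_4K;
	uint64_t range_align = lmem_64k ? SZ_64K : SZ_4K;

	if (!is_aligned(start, start_align) ||
	    !is_aligned(offset, range_align) ||
	    !is_aligned(length, range_align))
		return -EINVAL;

	return 0;
}
```

A check like this only catches misalignment early; it does not answer the question of how userspace would probe the required alignment programmatically, which the thread leaves open.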




+ * Also, on those platforms, it is not allowed to bind a device local-memory
+ * object and a system memory object in a single 2M section of VA range.


Text should be clear whether "not allowed" means there will be an 
error returned, or it will appear to work but bad things will happen.




Yah, error returned, will fix.


+ */
+struct drm_i915_gem_vm_bind {
+    /** @vm_id: VM (address space) id to bind */
+    __u32 vm_id;
+
+    /** @handle: Object handle */
+    __u32 handle;
+
+    /** @start: Virtual Address start to bind */
+    __u64 start;
+
+    /** @offset: Offset in object to bind */
+    __u64 offset;
+
+    /** @length: Length of mapping to bind */
+    __u64 length;
+
+    /**
+ * @flags: Supported flags are:
+ *
+ * I915_GEM_VM_BIND_FENCE_VALID:
+ * @fence is valid, needs bind completion notificati

Re: [Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-23 Thread Lionel Landwerlin

On 23/06/2022 11:27, Tvrtko Ursulin wrote:


After a vm_unbind, UMD can re-bind to same VA range against an active VM.
Though I am not sure with the Mesa usecase if that new mapping is required for
running GPU job or it will be for the next submission. But by ensuring the
tlb flush upon unbind, KMD can ensure correctness.

Isn't that their problem? If they re-bind for submitting _new_ work 
then they get the flush as part of batch buffer pre-amble. 


In the non sparse case, if a VA range is unbound, it is invalid to use 
that range for anything until it has been rebound by something else.


We'll take the fence provided by vm_bind and put it as a wait fence on 
the next execbuffer.


It might be safer in case of memory over fetching?


TLB flush will have to happen at some point right?

What's the alternative to do it in unbind?


-Lionel



Re: [PATCH v2 01/12] drm/doc: add rfc section for small BAR uapi

2022-06-21 Thread Lionel Landwerlin

On 21/06/2022 13:44, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)
v3:
   - Drop the vma query for now.
   - Add unallocated_cpu_visible_size as part of the region query.
   - Improve the docs some more, including documenting the expected
 behaviour on older kernels, since this came up in some offline
 discussion.
v4:
   - Various improvements all over. (Tvrtko)

v5:
   - Include newer integrated platforms when applying the non-recoverable
 context and error capture restriction. (Thomas)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
Acked-by: Tvrtko Ursulin 
Acked-by: Akeem G Abodunrin 



With Jordan we have changes for Anv/Iris:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739


Acked-by: Lionel Landwerlin 
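
The fallback rule documented for @probed_cpu_visible_size in the patch quoted below (a value of zero means an old kernel, to be treated as fully CPU visible) can be sketched as follows. This is a minimal sketch: both helper names are hypothetical, only the two size fields are passed in rather than the full region info struct, and the -1 (unknown) case is not handled specially.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: zero indicates a kernel without the small-BAR
 * uAPI, which the documentation says is safe to treat the same as
 * probed_cpu_visible_size == probed_size.
 */
static uint64_t effective_cpu_visible_size(uint64_t probed_size,
					   uint64_t probed_cpu_visible_size)
{
	if (probed_cpu_visible_size == 0)
		return probed_size;

	return probed_cpu_visible_size;
}

/* A region is "small BAR" when only part of it is CPU accessible. */
static bool is_small_bar(uint64_t probed_size,
			 uint64_t probed_cpu_visible_size)
{
	return effective_cpu_visible_size(probed_size,
					  probed_cpu_visible_size) < probed_size;
}
```

For example, an 8 GiB region reporting 256 MiB as CPU visible counts as small BAR here, while the same region reporting 0 or 8 GiB does not.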



---
  Documentation/gpu/rfc/i915_small_bar.h   | 189 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  47 ++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 240 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..752bb2ceb399
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,189 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /**
+* @probed_size: Memory probed by the driver (-1 = unknown)
+*
+* Note that it should not be possible to ever encounter a zero value
+* here, also note that no current region type will ever return -1 here.
+* Although for future region types, this might be a possibility. The
+* same applies to the other size fields.
+*/
+   __u64 probed_size;
+
+   /**
+* @unallocated_size: Estimate of memory remaining (-1 = unknown)
+*
+* Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting.
+* Without this (or if this is an older kernel) the value here will
+* always equal the @probed_size. Note this is only currently tracked
+* for I915_MEMORY_CLASS_DEVICE regions (for other types the value here
+* will always equal the @probed_size).
+*/
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will always be <= @probed_size, and the
+* remainder (if there is any) will not be CPU
+* accessible.
+*
+* On systems without small BAR, the @probed_size will
+* always equal the @probed_cpu_visible_size, since all
+* of it will be CPU accessible.
+*
+* Note this is only tracked for
+* I915_MEMORY_CLASS_DEVICE regions (for other types the
+* value here will always equal the @probed_size).
+*
+* Note that if the value returned here is zero, then
+* this must be an old kernel which lacks the relevant
+* small-bar uAPI support (including
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on
+* such systems we should never actually end up with a
+* small BAR configuration, assuming we are able to load
+* the kernel module. Hence it should be safe to treat
+* this the same as when @probed_cpu_visible_size ==
+* @probed_size.
+*/
+   __u64 probed_cpu_v

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-14 Thread Lionel Landwerlin

On 13/06/2022 21:02, Niranjana Vishwanathapura wrote:

On Mon, Jun 13, 2022 at 06:33:07AM -0700, Zeng, Oak wrote:



Regards,
Oak


-Original Message-
From: Intel-gfx  On Behalf 
Of Niranjana

Vishwanathapura
Sent: June 10, 2022 1:43 PM
To: Landwerlin, Lionel G 
Cc: Intel GFX ; Maling list - DRI 
developers de...@lists.freedesktop.org>; Hellstrom, Thomas 
;

Wilson, Chris P ; Vetter, Daniel
; Christian König 
Subject: Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature 
design

document

On Fri, Jun 10, 2022 at 11:18:14AM +0300, Lionel Landwerlin wrote:
>On 10/06/2022 10:54, Niranjana Vishwanathapura wrote:
>>On Fri, Jun 10, 2022 at 09:53:24AM +0300, Lionel Landwerlin wrote:
>>>On 09/06/2022 22:31, Niranjana Vishwanathapura wrote:
>>>>On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote:
>>>>>  On 09/06/2022 00:55, Jason Ekstrand wrote:
>>>>>
>>>>>    On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura
>>>>>  wrote:
>>>>>
>>>>>  On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin 
wrote:

>>>>>  >
>>>>>  >
>>>>>  >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
>>>>>  >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana
>>>>>Vishwanathapura
>>>>>  wrote:
>>>>>  >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason
>>>>>Ekstrand wrote:
>>>>>  >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana 
Vishwanathapura

>>>>>  >>>>  wrote:
>>>>>  >>>>
>>>>>  >>>>   On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel
>>>>>Landwerlin
>>>>>      wrote:
>>>>>  >>>>   >   On 02/06/2022 23:35, Jason Ekstrand wrote:
>>>>>  >>>>   >
>>>>>  >>>>   > On Thu, Jun 2, 2022 at 3:11 PM Niranjana
>>>>>Vishwanathapura
>>>>>  >>>>   >  wrote:
>>>>>  >>>>   >
>>>>>  >>>>   >   On Wed, Jun 01, 2022 at 01:28:36PM -0700, 
Matthew

>>>>>  >>>>Brost wrote:
>>>>>  >>>>   >   >On Wed, Jun 01, 2022 at 05:25:49PM +0300, 
Lionel

>>>>>  Landwerlin
>>>>>  >>>>   wrote:
>>>>>  >>>>   > >> On 17/05/2022 21:32, Niranjana Vishwanathapura
>>>>>  wrote:
>>>>>  >>>>   > >> > +VM_BIND/UNBIND ioctl will immediately start
>>>>>  >>>>   binding/unbinding
>>>>>  >>>>   >   the mapping in an
>>>>>  >>>>   > >> > +async worker. The binding and
>>>>>unbinding will
>>>>>  >>>>work like a
>>>>>  >>>>   special
>>>>>  >>>>   >   GPU engine.
>>>>>  >>>>   > >> > +The binding and unbinding operations are
>>>>>  serialized and
>>>>>  >>>>   will
>>>>>  >>>>   >   wait on specified
>>>>>  >>>>   > >> > +input fences before the operation
>>>>>and will signal
>>>>>  the
>>>>>  >>>>   output
>>>>>  >>>>   >   fences upon the
>>>>>  >>>>   > >> > +completion of the operation. Due to
>>>>>  serialization,
>>>>>  >>>>   completion of
>>>>>  >>>>   >   an operation
>>>>>  >>>>   > >> > +will also indicate that all
>>>>>previous operations
>>>>>  >>>>are also
>>>>>  >>>>   > complete.
>>>>>  >>>>   > >>
>>>>>  >>>>   > >> I guess we should avoid saying "will
>>>>>immediately
>>>>>  start
>>>>>  >>>>   > binding/unbinding" if
>>>>>  >>>>   > >> there are fences involved.
>>>>>  >>>>   > >>
>>>>>  >>>>   > >> And the fact that i

Re: [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-13 Thread Lionel Landwerlin

On 10/06/2022 11:53, Matthew Brost wrote:

On Fri, Jun 10, 2022 at 12:07:11AM -0700, Niranjana Vishwanathapura wrote:

VM_BIND and related uapi definitions

Signed-off-by: Niranjana Vishwanathapura 
---
  Documentation/gpu/rfc/i915_vm_bind.h | 490 +++
  1 file changed, 490 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
new file mode 100644
index ..9fc854969cfb
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,490 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ * bit[0]: If set, VM_BIND is supported, otherwise not.
+ * bits[8-15]: VM_BIND implementation version.
+ * version 0 will not have VM_BIND/UNBIND timeline fence array support.
+ */
+#define I915_PARAM_HAS_VM_BIND 57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND   (1 << 0)
+
+/**
+ * DOC: I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING
+ *
+ * Flag to declare context as long running.
+ * See struct drm_i915_gem_context_create_ext flags.
+ *
+ * Usage of dma-fence expects that they complete in reasonable amount of time.
+ * Compute on the other hand can be long running. Hence it is not appropriate
+ * for compute contexts to export request completion dma-fence to user.
+ * The dma-fence usage will be limited to in-kernel consumption only.
+ * Compute contexts need to use user/memory fence.
+ *
+ * So, long running contexts do not support output fences. Hence,
+ * I915_EXEC_FENCE_SIGNAL (See &drm_i915_gem_exec_fence.flags) is expected
+ * to be not used. DRM_I915_GEM_WAIT ioctl call is also not supported for
+ * objects mapped to long running contexts.
+ */
+#define I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING   (1u << 2)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND   0x3d
+#define DRM_I915_GEM_VM_UNBIND 0x3e
+#define DRM_I915_GEM_EXECBUFFER3   0x3f
+#define DRM_I915_GEM_WAIT_USER_FENCE   0x40
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND   DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
+#define DRM_IOCTL_I915_GEM_WAIT_USER_FENCE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT_USER_FENCE, struct drm_i915_gem_wait_user_fence)
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @queue_idx specifies the queue to use for binding. Same queue can be
+ * used for both VM_BIND and VM_UNBIND calls. All submitted bind and unbind
+ * operations in a queue are performed in the order of submission.
+ *
+ * The @start, @offset and @length should be 4K page aligned. However, DG2
+ * and XEHPSDV have a 64K page size for device local-memory and use a compact
+ * page table. On those platforms, for binding device local-memory objects,
+ * @start should be 2M aligned, and @offset and @length should be 64K aligned.
+ * Also, on those platforms, it is not allowed to bind a device local-memory
+ * object and a system memory object in a single 2M section of VA range.
+ */
+struct drm_i915_gem_vm_bind {
+   /** @vm_id: VM (address space) id to bind */
+   __u32 vm_id;
+
+   /** @queue_idx: Index of queue for binding */
+   __u32 queue_idx;
+
+   /** @rsvd: Reserved, MBZ */
+   __u32 rsvd;
+
+   /** @handle: Object handle */
+   __u32 handle;
+
+   /** @start: Virtual Address start to bind */
+   __u64 start;
+
+   /** @offset: Offset in object to bind */
+   __u64 offset;
+
+   /** @length: Length of mapping to bind */
+   __u64 length;

This probably isn't needed. We are never going to unbind a subset of a
VMA are we? That being said it can't hurt as a sanity check (e.g.
internal vma->le
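
The alignment rules stated in the quoted struct drm_i915_gem_vm_bind comment (4K for @start, @offset and @length in general; 2M for @start and 64K for @offset and @length when binding device local-memory on DG2/XEHPSDV) lend themselves to a userspace pre-check. This is only a sketch under those stated rules: the macro and function names are hypothetical, the caller is assumed to know whether the object lives in 64K-page local memory, and the rule about not mixing local and system memory within one 2M VA section is not checked here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the alignments quoted in the RFC comment. */
#define VM_BIND_ALIGN_4K  (UINT64_C(1) << 12)
#define VM_BIND_ALIGN_64K (UINT64_C(1) << 16)
#define VM_BIND_ALIGN_2M  (UINT64_C(1) << 21)

/* alignment must be a power of two */
static bool is_aligned(uint64_t value, uint64_t alignment)
{
	return (value & (alignment - 1)) == 0;
}

/*
 * Returns true if (start, offset, length) satisfy the documented
 * alignment. lmem_64k is true when binding a device local-memory object
 * on a platform with 64K local-memory pages and a compact page table.
 */
static bool vm_bind_range_aligned(uint64_t start, uint64_t offset,
				  uint64_t length, bool lmem_64k)
{
	if (lmem_64k)
		return is_aligned(start, VM_BIND_ALIGN_2M) &&
		       is_aligned(offset, VM_BIND_ALIGN_64K) &&
		       is_aligned(length, VM_BIND_ALIGN_64K);

	return is_aligned(start, VM_BIND_ALIGN_4K) &&
	       is_aligned(offset, VM_BIND_ALIGN_4K) &&
	       is_aligned(length, VM_BIND_ALIGN_4K);
}
```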

Re: [Intel-gfx] [PATCH 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-10 Thread Lionel Landwerlin

On 10/06/2022 13:37, Tvrtko Ursulin wrote:


On 10/06/2022 08:07, Niranjana Vishwanathapura wrote:

VM_BIND and related uapi definitions

Signed-off-by: Niranjana Vishwanathapura 


---
  Documentation/gpu/rfc/i915_vm_bind.h | 490 +++
  1 file changed, 490 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h

diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h

new file mode 100644
index ..9fc854969cfb
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.h
@@ -0,0 +1,490 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+/**
+ * DOC: I915_PARAM_HAS_VM_BIND
+ *
+ * VM_BIND feature availability.
+ * See typedef drm_i915_getparam_t param.
+ * bit[0]: If set, VM_BIND is supported, otherwise not.
+ * bits[8-15]: VM_BIND implementation version.
+ * version 0 will not have VM_BIND/UNBIND timeline fence array support.
+ */
+#define I915_PARAM_HAS_VM_BIND    57
+
+/**
+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
+ *
+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
+ * See struct drm_i915_gem_vm_control flags.
+ *
+ * The older execbuf2 ioctl will not support VM_BIND mode of operation.
+ * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
+ * execlist (See struct drm_i915_gem_execbuffer3 for more details).
+ *
+ */
+#define I915_VM_CREATE_FLAGS_USE_VM_BIND    (1 << 0)
+
+/**
+ * DOC: I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING
+ *
+ * Flag to declare context as long running.
+ * See struct drm_i915_gem_context_create_ext flags.
+ *
+ * Usage of dma-fence expects that they complete in reasonable amount of time.
+ * Compute on the other hand can be long running. Hence it is not appropriate
+ * for compute contexts to export request completion dma-fence to user.
+ * The dma-fence usage will be limited to in-kernel consumption only.
+ * Compute contexts need to use user/memory fence.
+ *
+ * So, long running contexts do not support output fences. Hence,
+ * I915_EXEC_FENCE_SIGNAL (See &drm_i915_gem_exec_fence.flags) is expected
+ * to be not used. DRM_I915_GEM_WAIT ioctl call is also not supported for
+ * objects mapped to long running contexts.
+ */
+#define I915_CONTEXT_CREATE_FLAGS_LONG_RUNNING   (1u << 2)
+
+/* VM_BIND related ioctls */
+#define DRM_I915_GEM_VM_BIND    0x3d
+#define DRM_I915_GEM_VM_UNBIND    0x3e
+#define DRM_I915_GEM_EXECBUFFER3    0x3f
+#define DRM_I915_GEM_WAIT_USER_FENCE    0x40
+
+#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
+#define DRM_IOCTL_I915_GEM_WAIT_USER_FENCE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT_USER_FENCE, struct drm_i915_gem_wait_user_fence)
+
+/**
+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
+ *
+ * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
+ * virtual address (VA) range to the section of an object that should be bound
+ * in the device page table of the specified address space (VM).
+ * The VA range specified must be unique (ie., not currently bound) and can
+ * be mapped to whole object or a section of the object (partial binding).
+ * Multiple VA mappings can be created to the same section of the object
+ * (aliasing).
+ *
+ * The @queue_idx specifies the queue to use for binding. Same queue can be
+ * used for both VM_BIND and VM_UNBIND calls. All submitted bind and unbind
+ * operations in a queue are performed in the order of submission.
+ *
+ * The @start, @offset and @length should be 4K page aligned. However, DG2
+ * and XEHPSDV have a 64K page size for device local-memory and use a compact
+ * page table. On those platforms, for binding device local-memory objects,
+ * @start should be 2M aligned, and @offset and @length should be 64K aligned.
+ * Also, on those platforms, it is not allowed to bind a device local-memory
+ * object and a system memory object in a single 2M section of VA range.
+ */
+struct drm_i915_gem_vm_bind {
+    /** @vm_id: VM (address space) id to bind */
+    __u32 vm_id;
+
+    /** @queue_idx: Index of queue for binding */
+    __u32 queue_idx;


I have a question here to which I did not find an answer by browsing 
the old threads.


Queue index appears to be an implicit synchronisation mechanism, 
right? Operations on the same index are executed/complete in order of 
ioctl submission?


Do we _have_ to implement this on the kernel side and could just allow 
in/out fence and let userspace deal with it?



It orders operations like in a queue. Which is kind of what happens with 
existing queues/engines.


If I understood correctly, it's going to be a kthread + a linked list righ
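
The ordering property under discussion (operations submitted on the same queue complete in submission order) can be illustrated with a plain linked-list FIFO. This is a userspace toy and not the kernel implementation: the struct and function names are hypothetical, there is no worker thread or fence handling, and "completing" an operation is reduced to recording its id.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one queued bind or unbind operation. */
struct bind_op {
	unsigned int id;
	struct bind_op *next;
};

/* Hypothetical per-queue_idx FIFO: append at tail, consume from head. */
struct bind_queue {
	struct bind_op *head;
	struct bind_op *tail;
};

/* Append an operation; returns 0 on success, -1 if allocation fails. */
static int bind_queue_submit(struct bind_queue *q, unsigned int id)
{
	struct bind_op *op = malloc(sizeof(*op));

	if (!op)
		return -1;
	op->id = id;
	op->next = NULL;
	if (q->tail)
		q->tail->next = op;
	else
		q->head = op;
	q->tail = op;
	return 0;
}

/*
 * Process queued operations strictly in submission order, writing up to
 * max completed ids into done[]. Returns how many were processed.
 */
static size_t bind_queue_drain(struct bind_queue *q, unsigned int *done,
			       size_t max)
{
	size_t n = 0;

	while (q->head && n < max) {
		struct bind_op *op = q->head;

		q->head = op->next;
		if (!q->head)
			q->tail = NULL;
		done[n++] = op->id;
		free(op);
	}
	return n;
}
```

With one such list per queue index, ordering holds within a queue while separate queues stay independent, which is the behaviour the @queue_idx comment describes.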

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-10 Thread Lionel Landwerlin

On 10/06/2022 10:54, Niranjana Vishwanathapura wrote:

On Fri, Jun 10, 2022 at 09:53:24AM +0300, Lionel Landwerlin wrote:

On 09/06/2022 22:31, Niranjana Vishwanathapura wrote:

On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote:

  On 09/06/2022 00:55, Jason Ekstrand wrote:

    On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura
     wrote:

  On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
  >
  >
  >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
  >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana 
Vishwanathapura

  wrote:
  >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand 
wrote:

  >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura
  >>>>  wrote:
  >>>>
      >>>>   On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel 
Landwerlin

  wrote:
  >>>>   >   On 02/06/2022 23:35, Jason Ekstrand wrote:
  >>>>   >
  >>>>   > On Thu, Jun 2, 2022 at 3:11 PM Niranjana 
Vishwanathapura

  >>>>   >  wrote:
  >>>>   >
  >>>>   >   On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew
  >>>>Brost wrote:
  >>>>   >   >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel
  Landwerlin
  >>>>   wrote:
  >>>>   >   >> On 17/05/2022 21:32, Niranjana Vishwanathapura
  wrote:
  >>>>   >   >> > +VM_BIND/UNBIND ioctl will immediately start
  >>>>   binding/unbinding
  >>>>   >   the mapping in an
  >>>>   >   >> > +async worker. The binding and unbinding 
will

  >>>>work like a
  >>>>   special
  >>>>   >   GPU engine.
  >>>>   >   >> > +The binding and unbinding operations are
  serialized and
  >>>>   will
  >>>>   >   wait on specified
  >>>>   >   >> > +input fences before the operation and 
will signal

  the
  >>>>   output
  >>>>   >   fences upon the
  >>>>   >   >> > +completion of the operation. Due to
  serialization,
  >>>>   completion of
  >>>>   >   an operation
  >>>>   >   >> > +will also indicate that all previous 
operations

  >>>>are also
  >>>>   >   complete.
  >>>>   >   >>
  >>>>   >   >> I guess we should avoid saying "will 
immediately

  start
  >>>>   >   binding/unbinding" if
  >>>>   >   >> there are fences involved.
  >>>>   >   >>
  >>>>   >   >> And the fact that it's happening in an async
  >>>>worker seem to
  >>>>   imply
  >>>>   >   it's not
  >>>>   >   >> immediate.
  >>>>   >   >>
  >>>>   >
  >>>>   >   Ok, will fix.
  >>>>   >   This was added because in earlier design 
binding was

  deferred
  >>>>   until
  >>>>   >   next execbuff.
  >>>>   >   But now it is non-deferred (immediate in that 
sense).

  >>>>But yah,
  >>>>   this is
  >>>>   >   confusing
  >>>>   >   and will fix it.
  >>>>   >
  >>>>   >   >>
  >>>>   >   >> I have a question on the behavior of the bind
  >>>>operation when
  >>>>   no
  >>>>   >   input fence
  >>>>   >   >> is provided. Let say I do :
  >>>>   >   >>
  >>>>   >   >> VM_BIND (out_fence=fence1)
  >>>>   >   >>
  >>>>   >   >> VM_BIND (out_fence=fence2)
  >>>>   >   >>
  >>>>   >   >> VM_BIND (out_fence=fence3)
  >>>>   >   >>
  >>>>   >   >>
  >>>>   >   >> In what order are the fences going to be 
signaled?

  >>>>   >   >>
  >>>>   >   >> In the order of VM_BIND ioctls? Or out of 
order?

  >>>>   >  

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-09 Thread Lionel Landwerlin

On 09/06/2022 22:31, Niranjana Vishwanathapura wrote:

On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin wrote:

  On 09/06/2022 00:55, Jason Ekstrand wrote:

    On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura
     wrote:

  On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
  >
  >
  >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
  >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana 
Vishwanathapura

  wrote:
  >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
  >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura
  >>>>  wrote:
  >>>>
      >>>>   On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin
  wrote:
  >>>>   >   On 02/06/2022 23:35, Jason Ekstrand wrote:
  >>>>   >
  >>>>   > On Thu, Jun 2, 2022 at 3:11 PM Niranjana 
Vishwanathapura

  >>>>   >  wrote:
  >>>>   >
  >>>>   >   On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew
  >>>>Brost wrote:
  >>>>   >   >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel
  Landwerlin
  >>>>   wrote:
  >>>>   >   >> On 17/05/2022 21:32, Niranjana Vishwanathapura
  wrote:
  >>>>   >   >> > +VM_BIND/UNBIND ioctl will immediately start
  >>>>   binding/unbinding
  >>>>   >   the mapping in an
  >>>>   >   >> > +async worker. The binding and unbinding will
  >>>>work like a
  >>>>   special
  >>>>   >   GPU engine.
  >>>>   >   >> > +The binding and unbinding operations are
  serialized and
  >>>>   will
  >>>>   >   wait on specified
  >>>>   >   >> > +input fences before the operation and will 
signal

  the
  >>>>   output
  >>>>   >   fences upon the
  >>>>   >   >> > +completion of the operation. Due to
  serialization,
  >>>>   completion of
  >>>>   >   an operation
  >>>>   >   >> > +will also indicate that all previous 
operations

  >>>>are also
  >>>>   >   complete.
  >>>>   >   >>
  >>>>   >   >> I guess we should avoid saying "will immediately
  start
  >>>>   >   binding/unbinding" if
  >>>>   >   >> there are fences involved.
  >>>>   >   >>
  >>>>   >   >> And the fact that it's happening in an async
  >>>>worker seem to
  >>>>   imply
  >>>>   >   it's not
  >>>>   >   >> immediate.
  >>>>   >   >>
  >>>>   >
  >>>>   >   Ok, will fix.
  >>>>   >   This was added because in earlier design binding 
was

  deferred
  >>>>   until
  >>>>   >   next execbuff.
  >>>>   >   But now it is non-deferred (immediate in that 
sense).

  >>>>But yah,
  >>>>   this is
  >>>>   >   confusing
  >>>>   >   and will fix it.
  >>>>   >
  >>>>   >   >>
  >>>>   >   >> I have a question on the behavior of the bind
  >>>>operation when
  >>>>   no
  >>>>   >   input fence
  >>>>   >   >> is provided. Let say I do :
  >>>>   >   >>
  >>>>   >   >> VM_BIND (out_fence=fence1)
  >>>>   >   >>
  >>>>   >   >> VM_BIND (out_fence=fence2)
  >>>>   >   >>
  >>>>   >   >> VM_BIND (out_fence=fence3)
  >>>>   >   >>
  >>>>   >   >>
  >>>>   >   >> In what order are the fences going to be 
signaled?

  >>>>   >   >>
  >>>>   >   >> In the order of VM_BIND ioctls? Or out of order?
  >>>>   >   >>
  >>>>   >   >> Because you wrote "serialized I assume it's : in
  order
  >

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-09 Thread Lionel Landwerlin

On 09/06/2022 00:55, Jason Ekstrand wrote:
On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura 
 wrote:


On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
>
>
>On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
>>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana
Vishwanathapura wrote:
>>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
>>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura
>>>>  wrote:
>>>>
>>>>   On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin
wrote:
>>>>   >   On 02/06/2022 23:35, Jason Ekstrand wrote:
>>>>   >
>>>>   > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura
>>>>   >  wrote:
>>>>   >
>>>>   >   On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew
>>>>Brost wrote:
>>>>   >   >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel
Landwerlin
>>>>   wrote:
>>>>   >   >> On 17/05/2022 21:32, Niranjana Vishwanathapura
wrote:
>>>>   >   >> > +VM_BIND/UNBIND ioctl will immediately start
>>>>   binding/unbinding
>>>>   >   the mapping in an
>>>>   >   >> > +async worker. The binding and unbinding will
>>>>work like a
>>>>   special
>>>>   >   GPU engine.
>>>>   >   >> > +The binding and unbinding operations are
serialized and
>>>>   will
>>>>   >   wait on specified
>>>>   >   >> > +input fences before the operation and will
signal the
>>>>   output
>>>>   >   fences upon the
>>>>   >   >> > +completion of the operation. Due to
serialization,
>>>>   completion of
>>>>   >   an operation
>>>>   >   >> > +will also indicate that all previous operations
>>>>are also
>>>>   >   complete.
>>>>   >   >>
>>>>   >   >> I guess we should avoid saying "will immediately
start
>>>>   >   binding/unbinding" if
>>>>   >   >> there are fences involved.
>>>>   >   >>
>>>>   >   >> And the fact that it's happening in an async
>>>>worker seem to
>>>>   imply
>>>>   >   it's not
>>>>   >   >> immediate.
>>>>   >   >>
>>>>   >
>>>>   >   Ok, will fix.
>>>>   >   This was added because in earlier design binding
was deferred
>>>>   until
>>>>   >   next execbuff.
>>>>   >   But now it is non-deferred (immediate in that sense).
>>>>But yah,
>>>>   this is
>>>>   >   confusing
>>>>   >   and will fix it.
>>>>   >
>>>>   >   >>
>>>>   >   >> I have a question on the behavior of the bind
>>>>operation when
>>>>   no
>>>>   >   input fence
>>>>   >   >> is provided. Let say I do :
>>>>   >   >>
>>>>   >   >> VM_BIND (out_fence=fence1)
>>>>   >   >>
>>>>   >   >> VM_BIND (out_fence=fence2)
>>>>   >   >>
>>>>   >   >> VM_BIND (out_fence=fence3)
>>>>   >   >>
>>>>   >   >>
>>>>   >   >> In what order are the fences going to be signaled?
>>>>   >   >>
>>>>   >   >> In the order of VM_BIND ioctls? Or out of order?
>>>>   >   >>
>>>>   >   >> Because you wrote "serialized I assume it's : in
order
>>>>   >   >>
>>>>   >
>>>>   >   Yes, in the order of VM_BIND/UNBIND ioctls. Note that
>>>>bind and
>>>>   unbind
>>>>   >   will use
>>>>   >   the same queue and hence are ord

Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-08 Thread Lionel Landwerlin

On 08/06/2022 11:36, Tvrtko Ursulin wrote:


On 08/06/2022 07:40, Lionel Landwerlin wrote:

On 03/06/2022 09:53, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura 
wrote:

On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote:

On Wed, 1 Jun 2022 at 11:03, Dave Airlie  wrote:


On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura
 wrote:


On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote:
>On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura 
wrote:

>> VM_BIND and related uapi definitions
>>
>> v2: Ensure proper kernel-doc formatting with cross references.
>> Also add new uapi and documentation as per review comments
>> from Daniel.
>>
>> Signed-off-by: Niranjana Vishwanathapura 


>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.h | 399 +++
>>  1 file changed, 399 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h

>> new file mode 100644
>> index ..589c0a009107
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,399 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_HAS_VM_BIND
>> + *
>> + * VM_BIND feature availability.
>> + * See typedef drm_i915_getparam_t param.
>> + */
>> +#define I915_PARAM_HAS_VM_BIND 57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * A VM in VM_BIND mode will not support the older execbuff mode of binding.
>> + * In VM_BIND mode, execbuff ioctl will not accept any execlist (ie., the
>> + * &drm_i915_gem_execbuffer2.buffer_count must be 0).
>> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and
>> + * &drm_i915_gem_execbuffer2.batch_len must be 0.
>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must be provided
>> + * to pass in the batch buffer addresses.
>> + *
>> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and
>> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags must be 0
>> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag must always be
>> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses).
>> + * The buffers_ptr, buffer_count, batch_start_offset and batch_len fields
>> + * of struct drm_i915_gem_execbuffer2 are also not used and must be 0.
>> + */
>
>From that description, it seems we have:
>
>struct drm_i915_gem_execbuffer2 {
>    __u64 buffers_ptr;                        -> must be 0 (new)
>    __u32 buffer_count;                       -> must be 0 (new)
>    __u32 batch_start_offset;                 -> must be 0 (new)
>    __u32 batch_len;                          -> must be 0 (new)
>    __u32 DR1;                                -> must be 0 (old)
>    __u32 DR4;                                -> must be 0 (old)
>    __u32 num_cliprects; (fences)             -> must be 0 since using extensions
>    __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer!
>    __u64 flags;                              -> some flags must be 0 (new)
>    __u64 rsvd1; (context info)               -> repurposed field (old)
>    __u64 rsvd2;                              -> unused
>};
>
>Based on that, why can't we just get drm_i915_gem_execbuffer3 instead
>of adding even more complexity to an already abused interface? While
>the Vulkan-like extension thing is really nice, I don't think what
>we're doing here is extending the ioctl usage, we're completely
>changing how the base struct should be interpreted based on how the VM
>was created (which is an entirely different ioctl).
>
>From Rusty Russell's API Design grading, drm_i915_gem_execbuffer2 is
>already at -6 without these changes. I think after vm_bind we'll need
>to create a -11 entry just to deal with this ioctl.
>

The only change here is removing the execlist support for VM_BIND
mode (other than natural extensions).
Adding a new execbuffer3 was considered, but I think we need to be careful
with that as that goes beyond the VM_BIND support, including any future
requirements (as we don't want an execbuffer4 after VM_BIND).


Why not? it's not like adding extensions here is really that different
than adding new ioctls.

I definitely think this deserves an execbuffer3 without even
considering future req
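
The constraints that the quoted v2 text places on execbuffer2 in VM_BIND mode can be written down as a single predicate, which also makes visible how much of the struct would become "must be zero". This is a sketch of that proposed rule only (the thread goes on to argue for a separate execbuffer3 instead). The struct below is a hypothetical cut-down mirror of the fields Paulo lists, and the EX_FLAG_* bit positions are placeholders chosen for this example, not the real I915_EXEC_* values from the uAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits for illustration; NOT the real I915_EXEC_* values. */
#define EX_FLAG_NO_RELOC       (UINT64_C(1) << 0)
#define EX_FLAG_HANDLE_LUT     (UINT64_C(1) << 1)
#define EX_FLAG_BATCH_FIRST    (UINT64_C(1) << 2)
#define EX_FLAG_USE_EXTENSIONS (UINT64_C(1) << 3)

/* Hypothetical mirror of the execbuffer2 fields affected by VM_BIND mode. */
struct execbuf2_view {
	uint64_t buffers_ptr;
	uint32_t buffer_count;
	uint32_t batch_start_offset;
	uint32_t batch_len;
	uint64_t flags;
};

/*
 * Returns true if the fields obey the VM_BIND-mode rules quoted above:
 * no execlist, no batch offset/length, the three legacy flags clear,
 * and the extensions flag set.
 */
static bool execbuf2_ok_for_vm_bind(const struct execbuf2_view *eb)
{
	const uint64_t must_be_clear = EX_FLAG_NO_RELOC |
				       EX_FLAG_HANDLE_LUT |
				       EX_FLAG_BATCH_FIRST;

	if (eb->buffers_ptr || eb->buffer_count)
		return false;
	if (eb->batch_start_offset || eb->batch_len)
		return false;
	if (eb->flags & must_be_clear)
		return false;
	return (eb->flags & EX_FLAG_USE_EXTENSIONS) != 0;
}
```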

Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-08 Thread Lionel Landwerlin

On 03/06/2022 09:53, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura 
wrote:

On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote:

On Wed, 1 Jun 2022 at 11:03, Dave Airlie  wrote:


On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura
 wrote:


On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote:
>On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND and related uapi definitions
>>
>> v2: Ensure proper kernel-doc formatting with cross references.
>> Also add new uapi and documentation as per review comments
>> from Daniel.
>>
>> Signed-off-by: Niranjana Vishwanathapura 


>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.h | 399 
+++

>>  1 file changed, 399 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

>> new file mode 100644
>> index ..589c0a009107
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,399 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_HAS_VM_BIND
>> + *
>> + * VM_BIND feature availability.
>> + * See typedef drm_i915_getparam_t param.
>> + */
>> +#define I915_PARAM_HAS_VM_BIND   57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * A VM in VM_BIND mode will not support the older execbuff 
mode of binding.
>> + * In VM_BIND mode, execbuff ioctl will not accept any 
execlist (ie., the

>> + * &drm_i915_gem_execbuffer2.buffer_count must be 0).
>> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and
>> + * &drm_i915_gem_execbuffer2.batch_len must be 0.
>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must 
be provided

>> + * to pass in the batch buffer addresses.
>> + *
>> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and
>> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags 
must be 0
>> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag 
must always be

>> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses).
>> + * The buffers_ptr, buffer_count, batch_start_offset and 
batch_len fields
>> + * of struct drm_i915_gem_execbuffer2 are also not used and 
must be 0.

>> + */
>
>From that description, it seems we have:
>
>struct drm_i915_gem_execbuffer2 {
>    __u64 buffers_ptr;                        -> must be 0 (new)
>    __u32 buffer_count;                       -> must be 0 (new)
>    __u32 batch_start_offset;                 -> must be 0 (new)
>    __u32 batch_len;                          -> must be 0 (new)
>    __u32 DR1;                                -> must be 0 (old)
>    __u32 DR4;                                -> must be 0 (old)
>    __u32 num_cliprects; (fences)             -> must be 0 since using extensions
>    __u64 cliprects_ptr; (fences, extensions) -> contains an actual pointer!
>    __u64 flags;                              -> some flags must be 0 (new)
>    __u64 rsvd1; (context info)               -> repurposed field (old)
>    __u64 rsvd2;                              -> unused
>};
>
>Based on that, why can't we just get drm_i915_gem_execbuffer3 instead
>of adding even more complexity to an already abused interface? While
>the Vulkan-like extension thing is really nice, I don't think what
>we're doing here is extending the ioctl usage, we're completely
>changing how the base struct should be interpreted based on how the VM
>was created (which is an entirely different ioctl).
>
>From Rusty Russell's API Design grading, drm_i915_gem_execbuffer2 is
>already at -6 without these changes. I think after vm_bind we'll need
>to create a -11 entry just to deal with this ioctl.
>

The only change here is removing the execlist support for VM_BIND
mode (other than natural extensions).
Adding a new execbuffer3 was considered, but I think we need to be careful
with that as that goes beyond the VM_BIND support, including any future
requirements (as we don't want an execbuffer4 after VM_BIND).


Why not? it's not like adding extensions here is really that different
than adding new ioctls.

I definitely think this deserves an execbuffer3 without even
considering future requirements. Just  to burn down the old
requirements and pointless fields.

Make execbuffer3 be vm bind only, no relocs, no legacy bits, leave the
older sw on execbuf2 for ever.


I guess another point in favour of execbuf3 would be that it's less
midlayer. If we share the entry point then there's quite a few vfuncs
needed to cleanly split out the vm_bind paths from the legacy
reloc/softpin paths.

If we invert this and do execbuf3, then there's the existing ioctl
vfunc, and then we share code (where it even makes sense, probably
request setup/submit need to be shared, a

Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-07 Thread Lionel Landwerlin

On 08/06/2022 09:40, Lionel Landwerlin wrote:

On 03/06/2022 09:53, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura 
wrote:

On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote:

On Wed, 1 Jun 2022 at 11:03, Dave Airlie  wrote:


On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura
 wrote:


On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote:
>On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND and related uapi definitions
>>
>> v2: Ensure proper kernel-doc formatting with cross references.
>> Also add new uapi and documentation as per review comments
>> from Daniel.
>>
>> Signed-off-by: Niranjana Vishwanathapura 


>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.h | 399 
+++

>>  1 file changed, 399 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

>> new file mode 100644
>> index ..589c0a009107
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,399 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_HAS_VM_BIND
>> + *
>> + * VM_BIND feature availability.
>> + * See typedef drm_i915_getparam_t param.
>> + */
>> +#define I915_PARAM_HAS_VM_BIND 57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM 
creation.

>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * A VM in VM_BIND mode will not support the older execbuff 
mode of binding.
>> + * In VM_BIND mode, execbuff ioctl will not accept any 
execlist (ie., the

>> + * &drm_i915_gem_execbuffer2.buffer_count must be 0).
>> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and
>> + * &drm_i915_gem_execbuffer2.batch_len must be 0.
>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must 
be provided

>> + * to pass in the batch buffer addresses.
>> + *
>> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and
>> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags 
must be 0
>> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag 
must always be

>> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses).
>> + * The buffers_ptr, buffer_count, batch_start_offset and 
batch_len fields
>> + * of struct drm_i915_gem_execbuffer2 are also not used and 
must be 0.

>> + */
>
>From that description, it seems we have:
>
>struct drm_i915_gem_execbuffer2 {
>    __u64 buffers_ptr;  -> must be 0 (new)
>    __u32 buffer_count; -> must be 0 (new)
>    __u32 batch_start_offset;   -> must be 0 (new)
>    __u32 batch_len;    -> must be 0 (new)
>    __u32 DR1;  -> must be 0 (old)
>    __u32 DR4;  -> must be 0 (old)
>    __u32 num_cliprects; (fences)   -> must be 0 since using 
extensions
>    __u64 cliprects_ptr; (fences, extensions) -> contains an 
actual pointer!
>    __u64 flags;    -> some flags must be 0 
(new)

>    __u64 rsvd1; (context info) -> repurposed field (old)
>    __u64 rsvd2;    -> unused
>};
>
>Based on that, why can't we just get drm_i915_gem_execbuffer3 instead
>of adding even more complexity to an already abused interface? While
>the Vulkan-like extension thing is really nice, I don't think what
>we're doing here is extending the ioctl usage, we're completely
>changing how the base struct should be interpreted based on how the VM
>was created (which is an entirely different ioctl).
>
>From Rusty Russell's API Design grading, drm_i915_gem_execbuffer2 is
>already at -6 without these changes. I think after vm_bind we'll need
>to create a -11 entry just to deal with this ioctl.
>
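The "must be 0" rules listed in the quoted RFC comment can be sketched as a single validation helper. This is only an illustration of the constraints as described in the thread: the struct is a stand-in that copies the field names from the list above, the SKETCH_* flag bits are made-up values (not the real uapi values), and requiring a non-zero cliprects_ptr is an assumption drawn from the statement that the batch addresses extension must be provided.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions only; the real values live in the i915 uapi header. */
#define SKETCH_EXEC_NO_RELOC       (1ull << 0)
#define SKETCH_EXEC_HANDLE_LUT     (1ull << 1)
#define SKETCH_EXEC_BATCH_FIRST    (1ull << 2)
#define SKETCH_EXEC_USE_EXTENSIONS (1ull << 3)

/* Hypothetical stand-in mirroring the field list quoted above. */
struct sketch_execbuffer2 {
	uint64_t buffers_ptr;
	uint32_t buffer_count;
	uint32_t batch_start_offset;
	uint32_t batch_len;
	uint32_t DR1;
	uint32_t DR4;
	uint32_t num_cliprects;
	uint64_t cliprects_ptr;	/* carries the extension chain in this mode */
	uint64_t flags;
	uint64_t rsvd1;
	uint64_t rsvd2;
};

/* Returns 0 if the struct obeys the VM_BIND-mode rules, -EINVAL otherwise. */
int sketch_check_vm_bind_execbuf(const struct sketch_execbuffer2 *eb)
{
	const uint64_t banned = SKETCH_EXEC_NO_RELOC |
				SKETCH_EXEC_HANDLE_LUT |
				SKETCH_EXEC_BATCH_FIRST;

	assert(eb != NULL);

	/* No execlist and no batch offset/length in VM_BIND mode. */
	if (eb->buffers_ptr || eb->buffer_count ||
	    eb->batch_start_offset || eb->batch_len)
		return -EINVAL;

	/* Legacy fields that already had to be zero. */
	if (eb->DR1 || eb->DR4 || eb->num_cliprects)
		return -EINVAL;

	/* Relocation-era flags are not used in VM_BIND mode. */
	if (eb->flags & banned)
		return -EINVAL;

	/* Extensions are mandatory: batch addresses arrive through them. */
	if (!(eb->flags & SKETCH_EXEC_USE_EXTENSIONS) || !eb->cliprects_ptr)
		return -EINVAL;

	return 0;
}
```

Seen this way, nearly every field of the base struct ends up either zeroed or repurposed, which is the point being made in favour of a separate execbuffer3.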

The only change here is removing the execlist support for VM_BIND
mode (other than natural extensions).
Adding a new execbuffer3 was considered, but I think we need to be careful
with that as that goes beyond the VM_BIND support, including any future
requirements (as we don't want an execbuffer4 after VM_BIND).


Why not? it's not like adding extensions here is really that different
than adding new ioctls.

I definitely think this deserves an execbuffer3 without even
considering future requirements. Just  to burn down the old
requiremen

Re: [Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-07 Thread Lionel Landwerlin

On 03/06/2022 09:53, Niranjana Vishwanathapura wrote:
On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura 
wrote:

On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote:

On Wed, 1 Jun 2022 at 11:03, Dave Airlie  wrote:


On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura
 wrote:


On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote:
>On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:
>> VM_BIND and related uapi definitions
>>
>> v2: Ensure proper kernel-doc formatting with cross references.
>> Also add new uapi and documentation as per review comments
>> from Daniel.
>>
>> Signed-off-by: Niranjana Vishwanathapura 


>> ---
>>  Documentation/gpu/rfc/i915_vm_bind.h | 399 
+++

>>  1 file changed, 399 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>
>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

>> new file mode 100644
>> index ..589c0a009107
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>> @@ -0,0 +1,399 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * Copyright © 2022 Intel Corporation
>> + */
>> +
>> +/**
>> + * DOC: I915_PARAM_HAS_VM_BIND
>> + *
>> + * VM_BIND feature availability.
>> + * See typedef drm_i915_getparam_t param.
>> + */
>> +#define I915_PARAM_HAS_VM_BIND   57
>> +
>> +/**
>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>> + *
>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>> + * See struct drm_i915_gem_vm_control flags.
>> + *
>> + * A VM in VM_BIND mode will not support the older execbuff 
mode of binding.
>> + * In VM_BIND mode, execbuff ioctl will not accept any 
execlist (ie., the

>> + * &drm_i915_gem_execbuffer2.buffer_count must be 0).
>> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and
>> + * &drm_i915_gem_execbuffer2.batch_len must be 0.
>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension must 
be provided

>> + * to pass in the batch buffer addresses.
>> + *
>> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and
>> + * I915_EXEC_BATCH_FIRST of &drm_i915_gem_execbuffer2.flags 
must be 0
>> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS flag 
must always be

>> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses).
>> + * The buffers_ptr, buffer_count, batch_start_offset and 
batch_len fields
>> + * of struct drm_i915_gem_execbuffer2 are also not used and 
must be 0.

>> + */
>
>From that description, it seems we have:
>
>struct drm_i915_gem_execbuffer2 {
>    __u64 buffers_ptr;  -> must be 0 (new)
>    __u32 buffer_count; -> must be 0 (new)
>    __u32 batch_start_offset;   -> must be 0 (new)
>    __u32 batch_len;    -> must be 0 (new)
>    __u32 DR1;  -> must be 0 (old)
>    __u32 DR4;  -> must be 0 (old)
>    __u32 num_cliprects; (fences)   -> must be 0 since using 
extensions
>    __u64 cliprects_ptr; (fences, extensions) -> contains an 
actual pointer!
>    __u64 flags;    -> some flags must be 0 
(new)

>    __u64 rsvd1; (context info) -> repurposed field (old)
>    __u64 rsvd2;    -> unused
>};
>
>Based on that, why can't we just get drm_i915_gem_execbuffer3 instead
>of adding even more complexity to an already abused interface? While
>the Vulkan-like extension thing is really nice, I don't think what
>we're doing here is extending the ioctl usage, we're completely
>changing how the base struct should be interpreted based on how the VM
>was created (which is an entirely different ioctl).
>
>From Rusty Russell's API Design grading, drm_i915_gem_execbuffer2 is
>already at -6 without these changes. I think after vm_bind we'll need
>to create a -11 entry just to deal with this ioctl.
>

The only change here is removing the execlist support for VM_BIND
mode (other than natural extensions).
Adding a new execbuffer3 was considered, but I think we need to be careful
with that as that goes beyond the VM_BIND support, including any future
requirements (as we don't want an execbuffer4 after VM_BIND).


Why not? it's not like adding extensions here is really that different
than adding new ioctls.

I definitely think this deserves an execbuffer3 without even
considering future requirements. Just  to burn down the old
requirements and pointless fields.

Make execbuffer3 be vm bind only, no relocs, no legacy bits, leave the
older sw on execbuf2 for ever.


I guess another point in favour of execbuf3 would be that it's less
midlayer. If we share the entry point then there's quite a few vfuncs
needed to cleanly split out the vm_bind paths from the legacy
reloc/softpin paths.

If we invert this and do execbuf3, then there's the existing ioctl
vfunc, and then we share code (where it even makes sense, probably
request setup/submit need to be shared, a

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-03 Thread Lionel Landwerlin

On 02/06/2022 23:35, Jason Ekstrand wrote:
On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura 
 wrote:


On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
>On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
>> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
>> > +VM_BIND/UNBIND ioctl will immediately start
binding/unbinding the mapping in an
>> > +async worker. The binding and unbinding will work like a
special GPU engine.
>> > +The binding and unbinding operations are serialized and will
wait on specified
>> > +input fences before the operation and will signal the output
fences upon the
>> > +completion of the operation. Due to serialization,
completion of an operation
>> > +will also indicate that all previous operations are also
complete.
>>
>> I guess we should avoid saying "will immediately start
binding/unbinding" if
>> there are fences involved.
>>
>> And the fact that it's happening in an async worker seem to
imply it's not
>> immediate.
>>

Ok, will fix.
This was added because in earlier design binding was deferred
until next execbuff.
But now it is non-deferred (immediate in that sense). But yah,
this is confusing
and will fix it.

>>
>> I have a question on the behavior of the bind operation when no
input fence
>> is provided. Let's say I do:
>>
>> VM_BIND (out_fence=fence1)
>>
>> VM_BIND (out_fence=fence2)
>>
>> VM_BIND (out_fence=fence3)
>>
>>
>> In what order are the fences going to be signaled?
>>
>> In the order of VM_BIND ioctls? Or out of order?
>>
>> Because you wrote "serialized", I assume it's in order.
>>

Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and
unbind will use
the same queue and hence are ordered.

>>
>> One thing I didn't realize is that because we only get one
"VM_BIND" engine,
>> there is a disconnect from the Vulkan specification.
>>
>> In Vulkan VM_BIND operations are serialized but per engine.
>>
>> So you could have something like this :
>>
>> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
>>
>> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)
>>
>>
>> fence1 is not signaled
>>
>> fence3 is signaled
>>
>> So the second VM_BIND will proceed before the first VM_BIND.
>>
>>
>> I guess we can deal with that scenario in userspace by doing the wait
>> ourselves in one thread per engine.
>>
>> But then it makes the VM_BIND input fences useless.
>>
>>
>> Daniel : what do you think? Should we rework this or just deal with wait
>> fences in userspace?
>>
>
>My opinion is rework this but make the ordering via an engine
param optional.
>
>e.g. A VM can be configured so all binds are ordered within the VM
>
>e.g. A VM can be configured so all binds accept an engine
argument (in
>the case of the i915 likely this is a gem context handle) and binds
>ordered with respect to that engine.
>
>This gives UMDs options as the latter likely consumes more KMD resources
>so if a different UMD can live with binds being ordered within the VM
>they can use a mode consuming less resources.
>

I think we need to be careful here if we are looking for some out of
(submission) order completion of vm_bind/unbind.
In-order completion means, in a batch of binds and unbinds to be
completed in-order, user only needs to specify in-fence for the
first bind/unbind call and the out-fence for the last bind/unbind
call. Also, the VA released by an unbind call can be re-used by
any subsequent bind call in that in-order batch.

These things will break if binding/unbinding were to be allowed to
go out of order (of submission) and user need to be extra careful
not to run into premature triggering of out-fence and bind failing
as VA is still in use etc.
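
The in-order argument above can be illustrated with a small userspace model. It is not kernel code: the fence and bind-op types are hypothetical stand-ins, and the single queue simply refuses to run any op until every earlier op has completed. With that rule, an in-fence on the first op and an out-fence on the last op are enough to bracket the whole batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a fence and a queued bind/unbind operation. */
struct sketch_fence {
	bool signaled;
};

struct sketch_bind_op {
	struct sketch_fence *in_fence;	/* may be NULL */
	struct sketch_fence *out_fence;	/* may be NULL */
	bool done;
};

/*
 * Run the queue strictly in submission order and stop at the first op
 * whose in-fence has not signaled yet. Returns how many leading ops
 * have completed.
 */
size_t sketch_run_in_order(struct sketch_bind_op *ops, size_t n)
{
	size_t i;

	assert(ops != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (ops[i].done)
			continue;
		if (ops[i].in_fence && !ops[i].in_fence->signaled)
			break;	/* everything behind this op waits too */
		ops[i].done = true;
		if (ops[i].out_fence)
			ops[i].out_fence->signaled = true;
	}
	return i;
}
```

In this model the out-fence of the last op can only signal once all earlier ops are done. If ops were allowed to complete out of order, that guarantee would no longer hold and each op would need its own fences.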

Also, VM_BIND binds the provided mapping on the specified address
space
(VM). So, the uapi is not engine/context specific.

We can however add a 'queue' to the uapi which can be one from the
pre-defined queues,
I915_VM_BIND_QUEUE_0
I915_VM_BIND_QUEUE_1
...
I915_VM_BIND_QUEUE_(N-1)

KMD will spawn an async work queue for each queue which will only
bind 

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-01 Thread Lionel Landwerlin

On 02/06/2022 00:18, Matthew Brost wrote:

On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:

On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:

+VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an
+async worker. The binding and unbinding will work like a special GPU engine.
+The binding and unbinding operations are serialized and will wait on specified
+input fences before the operation and will signal the output fences upon the
+completion of the operation. Due to serialization, completion of an operation
+will also indicate that all previous operations are also complete.

I guess we should avoid saying "will immediately start binding/unbinding" if
there are fences involved.

And the fact that it's happening in an async worker seem to imply it's not
immediate.


I have a question on the behavior of the bind operation when no input fence
is provided. Let's say I do:

VM_BIND (out_fence=fence1)

VM_BIND (out_fence=fence2)

VM_BIND (out_fence=fence3)


In what order are the fences going to be signaled?

In the order of VM_BIND ioctls? Or out of order?

Because you wrote "serialized", I assume it's in order.


One thing I didn't realize is that because we only get one "VM_BIND" engine,
there is a disconnect from the Vulkan specification.

In Vulkan VM_BIND operations are serialized but per engine.

So you could have something like this :

VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)

VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)


Question - let's say this done after the above operations:

EXEC (engine=ccs0, in_fence=NULL, out_fence=NULL)

Is the exec ordered with respect to bind (i.e. would fence3 & 4 be
signaled before the exec starts)?

Matt



Hi Matt,

From the vulkan point of view, everything is serialized within an 
engine (we map that to a VkQueue).


So with :

EXEC (engine=ccs0, in_fence=NULL, out_fence=NULL)
VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)

EXEC completes first then VM_BIND executes.


To be even clearer :

EXEC (engine=ccs0, in_fence=fence2, out_fence=NULL)
VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)


EXEC will wait until fence2 is signaled.
Once fence2 is signaled, EXEC proceeds, finishes and only after it is done, 
VM_BIND executes.

It would be kind of like having the VM_BIND operation be another batch executed 
from the ring buffer.

-Lionel





fence1 is not signaled

fence3 is signaled

So the second VM_BIND will proceed before the first VM_BIND.


I guess we can deal with that scenario in userspace by doing the wait
ourselves in one thread per engine.

But then it makes the VM_BIND input fences useless.


Daniel : what do you think? Should we rework this or just deal with wait
fences in userspace?


Sorry I noticed this late.


-Lionel






Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-01 Thread Lionel Landwerlin

On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:

+VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an
+async worker. The binding and unbinding will work like a special GPU engine.
+The binding and unbinding operations are serialized and will wait on specified
+input fences before the operation and will signal the output fences upon the
+completion of the operation. Due to serialization, completion of an operation
+will also indicate that all previous operations are also complete.


I guess we should avoid saying "will immediately start 
binding/unbinding" if there are fences involved.


And the fact that it's happening in an async worker seem to imply it's 
not immediate.



I have a question on the behavior of the bind operation when no input 
fence is provided. Let's say I do:


VM_BIND (out_fence=fence1)

VM_BIND (out_fence=fence2)

VM_BIND (out_fence=fence3)


In what order are the fences going to be signaled?

In the order of VM_BIND ioctls? Or out of order?

Because you wrote "serialized", I assume it's in order.


One thing I didn't realize is that because we only get one "VM_BIND" 
engine, there is a disconnect from the Vulkan specification.


In Vulkan VM_BIND operations are serialized but per engine.

So you could have something like this :

VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)

VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)


fence1 is not signaled

fence3 is signaled

So the second VM_BIND will proceed before the first VM_BIND.
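
The difference between one global bind queue and per-engine ordering can be modelled in a few lines. This is a toy simulation under stated assumptions: the types are hypothetical, engines are plain integers (0 standing for rcs0, 1 for ccs0), and a bind is considered blocked if its own in-fence is unsignaled or if an earlier unfinished bind shares its ordering domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_fence {
	bool signaled;
};

/* Hypothetical bind request tagged with the engine it is ordered against. */
struct sketch_bind {
	int engine;
	const struct sketch_fence *in_fence;	/* may be NULL */
	struct sketch_fence *out_fence;		/* may be NULL */
	bool done;
};

/*
 * One scheduling pass. With per_engine == false every bind is ordered
 * behind all earlier binds (single VM_BIND queue). With per_engine ==
 * true a bind is only ordered behind earlier binds on the same engine.
 */
void sketch_bind_pass(struct sketch_bind *b, size_t n, bool per_engine)
{
	assert(b != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		bool blocked = b[i].in_fence && !b[i].in_fence->signaled;

		/* An earlier unfinished bind in the same domain blocks us. */
		for (size_t j = 0; j < i && !blocked; j++)
			if (!b[j].done &&
			    (!per_engine || b[j].engine == b[i].engine))
				blocked = true;

		if (!blocked && !b[i].done) {
			b[i].done = true;
			if (b[i].out_fence)
				b[i].out_fence->signaled = true;
		}
	}
}
```

Feeding it the two binds from the example (fence1 unsignaled, fence3 signaled) leaves fence4 unsignaled in single-queue mode, while in per-engine mode the ccs0 bind completes and signals fence4 ahead of the rcs0 bind.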


I guess we can deal with that scenario in userspace by doing the wait 
ourselves in one thread per engine.


But then it makes the VM_BIND input fences useless.


Daniel : what do you think? Should we rework this or just deal with wait 
fences in userspace?



Sorry I noticed this late.


-Lionel




Re: [Intel-gfx] [PATCH v2 2/6] drm/i915/xehp: Drop GETPARAM lookups of I915_PARAM_[SUB]SLICE_MASK

2022-05-31 Thread Lionel Landwerlin

On 17/05/2022 06:20, Matt Roper wrote:

Slice/subslice/EU information should be obtained via the topology
queries provided by the I915_QUERY interface; let's turn off support for
the old GETPARAM lookups on Xe_HP and beyond where we can't return
meaningful values.

The slice mask lookup is meaningless since Xe_HP doesn't support
traditional slices (and we make no attempt to return the various new
units like gslices, cslices, mslices, etc.) here.

The subslice mask lookup is even more problematic; given the distinct
masks for geometry vs compute purposes, the combined mask returned here
is likely not what userspace would want to act upon anyway.  The value
is also limited to 32-bits by the nature of the GETPARAM ioctl which is
sufficient for the initial Xe_HP platforms, but is unable to convey the
larger masks that will be needed on other upcoming platforms.  Finally,
the value returned here becomes even less meaningful when used on
multi-tile platforms where each tile will have its own masks.

Signed-off-by: Matt Roper 



Sounds fair. We've been relying on the topology query in Mesa since it's 
available and it's a requirement for Gfx10+.


FYI, we're also not using I915_PARAM_EU_TOTAL on Gfx10+ for the same reason.


Acked-by: Lionel Landwerlin 
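
For reference, the gating this patch adds to the slice mask lookup can be sketched outside the kernel as follows. The SKETCH_IP_VER packing (major in the high byte, release in the low byte) is an assumption modelled on how such version macros are commonly built, and the helper name and signature are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed packing: major version in the high bits, release in the low byte. */
#define SKETCH_IP_VER(ver, rel) (((ver) << 8) | (rel))

/*
 * Mirrors the shape of the patched slice mask case: reject the legacy
 * lookup from version 12.50 onward, report -ENODEV for an empty mask,
 * otherwise hand back the mask.
 */
int sketch_legacy_mask_getparam(unsigned int ip_ver_full,
				unsigned int mask, int *value)
{
	assert(value != NULL);

	/* Not supported from Xe_HP onward; callers should use topology queries. */
	if (ip_ver_full >= SKETCH_IP_VER(12, 50))
		return -EINVAL;

	if (!mask)
		return -ENODEV;

	*value = (int)mask;
	return 0;
}
```

Userspace that already relies on the topology query never reaches the -EINVAL path, which is why the change is harmless for Mesa.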



---
  drivers/gpu/drm/i915/i915_getparam.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_getparam.c 
b/drivers/gpu/drm/i915/i915_getparam.c
index c12a0adefda5..ac9767c56619 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -148,11 +148,19 @@ int i915_getparam_ioctl(struct drm_device *dev, void 
*data,
value = intel_engines_has_context_isolation(i915);
break;
case I915_PARAM_SLICE_MASK:
+   /* Not supported from Xe_HP onward; use topology queries */
+   if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+   return -EINVAL;
+
value = sseu->slice_mask;
if (!value)
return -ENODEV;
break;
case I915_PARAM_SUBSLICE_MASK:
+   /* Not supported from Xe_HP onward; use topology queries */
+   if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+   return -EINVAL;
+
/* Only copy bits from the first slice */
memcpy(&value, sseu->subslice_mask,
   min(sseu->ss_stride, (u8)sizeof(value)));





Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer

2022-05-30 Thread Lionel Landwerlin

On 30/05/2022 14:40, Christian König wrote:

Am 30.05.22 um 12:09 schrieb Lionel Landwerlin:

On 30/05/2022 12:52, Christian König wrote:

Am 25.05.22 um 23:59 schrieb Lucas De Marchi:

On Wed, May 25, 2022 at 12:38:51PM +0200, Christian König wrote:

Am 25.05.22 um 11:35 schrieb Lionel Landwerlin:

[SNIP]

Err... Let's double check with my colleagues.

It seems we're running into a test failure in IGT with this 
patch, but now I have doubts that it's where the problem lies.


Yeah, exactly that's what I couldn't understand as well.

What you describe above should still work fine.

Thanks for taking a look into this,
Christian.


With some additional prints:

[  210.742634] Console: switching to colour dummy device 80x25
[  210.742686] [IGT] syncobj_timeline: executing
[  210.756988] [IGT] syncobj_timeline: starting subtest 
transfer-timeline-point
[  210.757364] [drm:drm_syncobj_transfer_ioctl] *ERROR* adding 
fence0 signaled=1
[  210.764543] [drm:drm_syncobj_transfer_ioctl] *ERROR* resulting 
array fence signaled=0

[  210.800469] [IGT] syncobj_timeline: exiting, ret=98
[  210.825426] Console: switching to colour frame buffer device 240x67


still learning this part of the code but AFAICS the problem is because
when we are creating the array, the 'signaled' doesn't propagate to the
array.


Yeah, but that is intentional. The array should only signal when 
requested.


I still don't get what the test case here is checking.



There must be something I don't know about fence arrays.

You seem to say that creating an array of signaled fences will not 
make the array signaled.


Exactly that, yes. The array delays its signaling until somebody asks 
for it.


In other words the fences inside the array are checked only after 
someone calls dma_fence_enable_sw_signaling() which in turn calls 
dma_fence_array_enable_signaling().


It is certainly possible that nobody does that in the drm_syncobj and 
because of this the array never signals.
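
A toy model of that behaviour, for readers following along. It is a simplified userspace sketch and not the dma-buf implementation: the types and helper names are hypothetical, and fences that signal after signaling has been enabled (which the real code handles with callbacks) are not modelled. The pending count is set from the fence count at creation, and already-signaled members are only accounted for once signaling is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_fence {
	bool signaled;
};

/* Hypothetical, heavily simplified stand-in for a fence array. */
struct sketch_fence_array {
	struct sketch_fence base;
	struct sketch_fence **fences;
	size_t num_fences;
	size_t num_pending;
	bool signaling_enabled;
};

void sketch_array_init(struct sketch_fence_array *a,
		       struct sketch_fence **fences, size_t n,
		       bool signal_on_any)
{
	assert(a != NULL && fences != NULL && n > 0);

	a->base.signaled = false;
	a->fences = fences;
	a->num_fences = n;
	/* Creation does not look at the members' current state. */
	a->num_pending = signal_on_any ? 1 : n;
	a->signaling_enabled = false;
}

/* Only here are already-signaled members noticed and counted down. */
void sketch_array_enable_signaling(struct sketch_fence_array *a)
{
	assert(a != NULL);

	if (a->signaling_enabled)
		return;
	a->signaling_enabled = true;

	for (size_t i = 0; i < a->num_fences; i++) {
		if (!a->fences[i]->signaled)
			continue;
		if (a->num_pending > 0 && --a->num_pending == 0) {
			a->base.signaled = true;
			return;
		}
	}
}
```

In this model an array built from two signaled fences still reports unsignaled until sketch_array_enable_signaling() runs, which matches what the debug prints show if nothing in the transfer path enables signaling.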


Regards,
Christian.



Thanks,


Yeah I guess dma_fence_enable_sw_signaling() is never called for sw_sync.

Don't we also want to call it right at the end of 
drm_syncobj_flatten_chain() ?



-Lionel







This is the situation with this IGT test.

We started with a syncobj with point 1 & 2 signaled.

We take point 2 and import it as a new point 3 on the same syncobj.

We expect point 3 to be signaled as well and it's not.


Thanks,


-Lionel




Regards,
Christian.



dma_fence_array_create() {
...
atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
...
}

This is not considering the fact that some of the fences could already
have been signaled as is the case in the 
igt@syncobj_timeline@transfer-timeline-point
test. See 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11693/shard-dg1-12/igt@syncobj_timel...@transfer-timeline-point.html


Quick patch on this function fixes it for me:

-8<
Subject: [PATCH] dma-buf: Honor already signaled fences on array 
creation


When creating an array, array->num_pending is marked with the number of
fences. However the fences could already have been signaled. Propagate
num_pending to the array by looking at each individual fence the array
contains.

Signed-off-by: Lucas De Marchi 
---
 drivers/dma-buf/dma-fence-array.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence-array.c 
b/drivers/dma-buf/dma-fence-array.c

index 5c8a7084577b..32f491c32fa0 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -158,6 +158,8 @@ struct dma_fence_array 
*dma_fence_array_create(int num_fences,

 {
 struct dma_fence_array *array;
 size_t size = sizeof(*array);
+    unsigned num_pending = 0;
+    struct dma_fence **f;

 WARN_ON(!num_fences || !fences);

@@ -173,7 +175,14 @@ struct dma_fence_array 
*dma_fence_array_create(int num_fences,

 init_irq_work(&array->work, irq_dma_fence_array_work);

 array->num_fences = num_fences;
-    atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
+
+    for (f = fences; f < fences + num_fences; f++)
+    num_pending += !dma_fence_is_signaled(*f);
+
+    if (signal_on_any)
+    num_pending = !!num_pending;
+
+    atomic_set(&array->num_pending, num_pending);
 array->fences = fences;

 array->base.error = PENDING_ERROR;










Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer

2022-05-30 Thread Lionel Landwerlin

On 30/05/2022 12:52, Christian König wrote:

Am 25.05.22 um 23:59 schrieb Lucas De Marchi:

On Wed, May 25, 2022 at 12:38:51PM +0200, Christian König wrote:

Am 25.05.22 um 11:35 schrieb Lionel Landwerlin:

[SNIP]

Err... Let's double check with my colleagues.

It seems we're running into a test failure in IGT with this patch, 
but now I have doubts that it's where the problem lies.


Yeah, exactly that's what I couldn't understand as well.

What you describe above should still work fine.

Thanks for taking a look into this,
Christian.


With some additional prints:

[  210.742634] Console: switching to colour dummy device 80x25
[  210.742686] [IGT] syncobj_timeline: executing
[  210.756988] [IGT] syncobj_timeline: starting subtest 
transfer-timeline-point
[  210.757364] [drm:drm_syncobj_transfer_ioctl] *ERROR* adding fence0 
signaled=1
[  210.764543] [drm:drm_syncobj_transfer_ioctl] *ERROR* resulting 
array fence signaled=0

[  210.800469] [IGT] syncobj_timeline: exiting, ret=98
[  210.825426] Console: switching to colour frame buffer device 240x67


still learning this part of the code but AFAICS the problem is because
when we are creating the array, the 'signaled' doesn't propagate to the
array.


Yeah, but that is intentional. The array should only signal when 
requested.


I still don't get what the test case here is checking.



There must be something I don't know about fence arrays.

You seem to say that creating an array of signaled fences will not make 
the array signaled.



This is the situation with this IGT test.

We started with a syncobj with point 1 & 2 signaled.

We take point 2 and import it as a new point 3 on the same syncobj.

We expect point 3 to be signaled as well and it's not.


Thanks,


-Lionel




Regards,
Christian.



dma_fence_array_create() {
...
atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
...
}

This is not considering the fact that some of the fences could already
have been signaled as is the case in the 
igt@syncobj_timeline@transfer-timeline-point
test. See 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11693/shard-dg1-12/igt@syncobj_timel...@transfer-timeline-point.html


Quick patch on this function fixes it for me:

-8<
Subject: [PATCH] dma-buf: Honor already signaled fences on array 
creation


When creating an array, array->num_pending is marked with the number of
fences. However the fences could already have been signaled. Propagate
num_pending to the array by looking at each individual fence the array
contains.

Signed-off-by: Lucas De Marchi 
---
 drivers/dma-buf/dma-fence-array.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence-array.c 
b/drivers/dma-buf/dma-fence-array.c

index 5c8a7084577b..32f491c32fa0 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -158,6 +158,8 @@ struct dma_fence_array 
*dma_fence_array_create(int num_fences,

 {
 struct dma_fence_array *array;
 size_t size = sizeof(*array);
+    unsigned num_pending = 0;
+    struct dma_fence **f;

 WARN_ON(!num_fences || !fences);

@@ -173,7 +175,14 @@ struct dma_fence_array 
*dma_fence_array_create(int num_fences,

 init_irq_work(&array->work, irq_dma_fence_array_work);

 array->num_fences = num_fences;
-    atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
+
+    for (f = fences; f < fences + num_fences; f++)
+    num_pending += !dma_fence_is_signaled(*f);
+
+    if (signal_on_any)
+    num_pending = !!num_pending;
+
+    atomic_set(&array->num_pending, num_pending);
 array->fences = fences;

 array->base.error = PENDING_ERROR;






Re: [2/2] dma-buf: Add an API for importing sync files (v9)

2022-05-26 Thread Lionel Landwerlin

Just noticed a small nit on this one :

ordering via these fences, it is the respnosibility of userspace to use

-> responsibility


Acked-by: Lionel Landwerlin 


Cheers,


-Lionel



Re: [1/2] dma-buf: Add an API for exporting sync files (v14)

2022-05-26 Thread Lionel Landwerlin

Acked-by: Lionel Landwerlin 



Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer

2022-05-25 Thread Lionel Landwerlin

On 25/05/2022 12:26, Lionel Landwerlin wrote:

On 25/05/2022 11:24, Christian König wrote:

Am 25.05.22 um 08:47 schrieb Lionel Landwerlin:

On 09/02/2022 20:26, Christian König wrote:

It is illegal to add a dma_fence_chain as timeline point. Flatten out
the fences into a dma_fence_array instead.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 61 
---

  1 file changed, 56 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index c313a5b4549c..7e48dcd1bee4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -853,12 +853,57 @@ drm_syncobj_fd_to_handle_ioctl(struct 
drm_device *dev, void *data,

  &args->handle);
  }
  +
+/*
+ * Try to flatten a dma_fence_chain into a dma_fence_array so that 
it can be

+ * added as timeline fence to a chain again.
+ */
+static int drm_syncobj_flatten_chain(struct dma_fence **f)
+{
+    struct dma_fence_chain *chain = to_dma_fence_chain(*f);
+    struct dma_fence *tmp, **fences;
+    struct dma_fence_array *array;
+    unsigned int count;
+
+    if (!chain)
+    return 0;
+
+    count = 0;
+    dma_fence_chain_for_each(tmp, &chain->base)
+    ++count;
+
+    fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+    if (!fences)
+    return -ENOMEM;
+
+    count = 0;
+    dma_fence_chain_for_each(tmp, &chain->base)
+    fences[count++] = dma_fence_get(tmp);
+
+    array = dma_fence_array_create(count, fences,
+   dma_fence_context_alloc(1),



Hi Christian,


Sorry for the late answer to this.


It appears this commit is trying to remove the warnings added by 
"dma-buf: Warn about dma_fence_chain container rules"


Yes, correct. We are now enforcing some rules with warnings and this 
here bubbled up.




But the context allocation you added just above is breaking some 
tests. In particular igt@syncobj_timeline@transfer-timeline-point


That test transfers points into the timeline at point 3 and expects 
that we'll still wait on the previous points to complete.


Hui what? I don't understand the problem you are seeing here. What 
exactly is the test doing?




In my opinion we should be reusing the previous context number if 
there is one and only allocate if we don't have a point.


Scratching my head what you mean with that. The functionality 
transfers a synchronization fence from one timeline to another.


So as far as I can see the new point should be part of the timeline 
of the syncobj we are transferring to.


If the application wants to not depend on previous points for wait 
operations, it can reset the syncobj prior to adding a new point.


Well we should never lose synchronization. So what happens is that 
when we do the transfer all the fences of the source are flattened 
out into an array. And that array is then added as new point into the 
destination timeline.



This case would be broken:


syncobjA <- signal point 1

syncobjA <- import syncobjB point 1 into syncobjA point 2

syncobjA <- query returns 0


-Lionel



Err... Let's double check with my colleagues.

It seems we're running into a test failure in IGT with this patch, but 
now I have doubts that it's where the problem lies.



-Lionel







Where exactly is the problem?

Regards,
Christian.





Cheers,


-Lionel




+   1, false);
+    if (!array)
+    goto free_fences;
+
+    dma_fence_put(*f);
+    *f = &array->base;
+    return 0;
+
+free_fences:
+    while (count--)
+    dma_fence_put(fences[count]);
+
+    kfree(fences);
+    return -ENOMEM;
+}
+
  static int drm_syncobj_transfer_to_timeline(struct drm_file 
*file_private,

  struct drm_syncobj_transfer *args)
  {
  struct drm_syncobj *timeline_syncobj = NULL;
-    struct dma_fence *fence;
  struct dma_fence_chain *chain;
+    struct dma_fence *fence;
  int ret;
    timeline_syncobj = drm_syncobj_find(file_private, 
args->dst_handle);
@@ -869,16 +914,22 @@ static int 
drm_syncobj_transfer_to_timeline(struct drm_file *file_private,

   args->src_point, args->flags,
   &fence);
  if (ret)
-    goto err;
+    goto err_put_timeline;
+
+    ret = drm_syncobj_flatten_chain(&fence);
+    if (ret)
+    goto err_free_fence;
+
  chain = dma_fence_chain_alloc();
  if (!chain) {
  ret = -ENOMEM;
-    goto err1;
+    goto err_free_fence;
  }
+
  drm_syncobj_add_point(timeline_syncobj, chain, fence, 
args->dst_point);

-err1:
+err_free_fence:
  dma_fence_put(fence);
-err:
+err_put_timeline:
  drm_syncobj_put(timeline_syncobj);
    return ret;











Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer

2022-05-25 Thread Lionel Landwerlin

On 25/05/2022 11:24, Christian König wrote:

Am 25.05.22 um 08:47 schrieb Lionel Landwerlin:

On 09/02/2022 20:26, Christian König wrote:

It is illegal to add a dma_fence_chain as timeline point. Flatten out
the fences into a dma_fence_array instead.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 61 ---

  1 file changed, 56 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index c313a5b4549c..7e48dcd1bee4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -853,12 +853,57 @@ drm_syncobj_fd_to_handle_ioctl(struct 
drm_device *dev, void *data,

  &args->handle);
  }
  +
+/*
+ * Try to flatten a dma_fence_chain into a dma_fence_array so that 
it can be

+ * added as timeline fence to a chain again.
+ */
+static int drm_syncobj_flatten_chain(struct dma_fence **f)
+{
+    struct dma_fence_chain *chain = to_dma_fence_chain(*f);
+    struct dma_fence *tmp, **fences;
+    struct dma_fence_array *array;
+    unsigned int count;
+
+    if (!chain)
+    return 0;
+
+    count = 0;
+    dma_fence_chain_for_each(tmp, &chain->base)
+    ++count;
+
+    fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+    if (!fences)
+    return -ENOMEM;
+
+    count = 0;
+    dma_fence_chain_for_each(tmp, &chain->base)
+    fences[count++] = dma_fence_get(tmp);
+
+    array = dma_fence_array_create(count, fences,
+   dma_fence_context_alloc(1),



Hi Christian,


Sorry for the late answer to this.


It appears this commit is trying to remove the warnings added by 
"dma-buf: Warn about dma_fence_chain container rules"


Yes, correct. We are now enforcing some rules with warnings and this 
here bubbled up.




But the context allocation you added just above is breaking some 
tests. In particular igt@syncobj_timeline@transfer-timeline-point


That test transfers points into the timeline at point 3 and expects 
that we'll still wait on the previous points to complete.


Hui what? I don't understand the problem you are seeing here. What 
exactly is the test doing?




In my opinion we should be reusing the previous context number if 
there is one and only allocate if we don't have a point.


Scratching my head what you mean with that. The functionality 
transfers a synchronization fence from one timeline to another.


So as far as I can see the new point should be part of the timeline of 
the syncobj we are transferring to.


If the application wants to not depend on previous points for wait 
operations, it can reset the syncobj prior to adding a new point.


Well we should never lose synchronization. So what happens is that 
when we do the transfer all the fences of the source are flattened out 
into an array. And that array is then added as new point into the 
destination timeline.
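
The flattening described above (count the links, allocate an array, then fill it in a second walk) can be modelled outside the kernel. This is only a userspace sketch under stated assumptions: `model_link` and `model_flatten` are hypothetical names, plain integer ids stand in for fence references, and none of the dma_fence reference counting or context handling from the patch is reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one chain link; each link points at the previous one. */
struct model_link {
	int fence_id;
	struct model_link *prev;
};

/*
 * Flatten a chain, walked from the newest link to the oldest, into a newly
 * allocated array. Returns the number of entries written. On an empty chain
 * or allocation failure it returns 0 and leaves *out set to NULL.
 */
static size_t model_flatten(const struct model_link *head, int **out)
{
	const struct model_link *it;
	size_t count = 0, i = 0;
	int *ids;

	assert(out != NULL);
	*out = NULL;

	/* First walk: count the links so the array can be sized exactly. */
	for (it = head; it; it = it->prev)
		++count;
	if (!count)
		return 0;

	ids = malloc(count * sizeof(*ids));
	if (!ids)
		return 0;

	/* Second walk: copy every entry into the flat array. */
	for (it = head; it; it = it->prev)
		ids[i++] = it->fence_id;

	*out = ids;
	return count;
}
```

The caller owns the returned array and releases it with `free()`. In the model, the flat array is what would then be attached as a single new point on the destination timeline.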



In this case the following would be broken:


syncobjA <- signal point 1

syncobjA <- import syncobjB point 1 into syncobjA point 2

syncobjA <- query returns 0


-Lionel




Where exactly is the problem?

Regards,
Christian.





Cheers,


-Lionel




+   1, false);
+    if (!array)
+    goto free_fences;
+
+    dma_fence_put(*f);
+    *f = &array->base;
+    return 0;
+
+free_fences:
+    while (count--)
+    dma_fence_put(fences[count]);
+
+    kfree(fences);
+    return -ENOMEM;
+}
+
  static int drm_syncobj_transfer_to_timeline(struct drm_file 
*file_private,

  struct drm_syncobj_transfer *args)
  {
  struct drm_syncobj *timeline_syncobj = NULL;
-    struct dma_fence *fence;
  struct dma_fence_chain *chain;
+    struct dma_fence *fence;
  int ret;
    timeline_syncobj = drm_syncobj_find(file_private, 
args->dst_handle);
@@ -869,16 +914,22 @@ static int 
drm_syncobj_transfer_to_timeline(struct drm_file *file_private,

   args->src_point, args->flags,
   &fence);
  if (ret)
-    goto err;
+    goto err_put_timeline;
+
+    ret = drm_syncobj_flatten_chain(&fence);
+    if (ret)
+    goto err_free_fence;
+
  chain = dma_fence_chain_alloc();
  if (!chain) {
  ret = -ENOMEM;
-    goto err1;
+    goto err_free_fence;
  }
+
  drm_syncobj_add_point(timeline_syncobj, chain, fence, 
args->dst_point);

-err1:
+err_free_fence:
  dma_fence_put(fence);
-err:
+err_put_timeline:
  drm_syncobj_put(timeline_syncobj);
    return ret;









Re: [Intel-gfx] [PATCH] drm/syncobj: flatten dma_fence_chains on transfer

2022-05-24 Thread Lionel Landwerlin

On 09/02/2022 20:26, Christian König wrote:

It is illegal to add a dma_fence_chain as timeline point. Flatten out
the fences into a dma_fence_array instead.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/drm_syncobj.c | 61 ---
  1 file changed, 56 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index c313a5b4549c..7e48dcd1bee4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -853,12 +853,57 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, 
void *data,
&args->handle);
  }
  
+

+/*
+ * Try to flatten a dma_fence_chain into a dma_fence_array so that it can be
+ * added as timeline fence to a chain again.
+ */
+static int drm_syncobj_flatten_chain(struct dma_fence **f)
+{
+   struct dma_fence_chain *chain = to_dma_fence_chain(*f);
+   struct dma_fence *tmp, **fences;
+   struct dma_fence_array *array;
+   unsigned int count;
+
+   if (!chain)
+   return 0;
+
+   count = 0;
+   dma_fence_chain_for_each(tmp, &chain->base)
+   ++count;
+
+   fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+   if (!fences)
+   return -ENOMEM;
+
+   count = 0;
+   dma_fence_chain_for_each(tmp, &chain->base)
+   fences[count++] = dma_fence_get(tmp);
+
+   array = dma_fence_array_create(count, fences,
+  dma_fence_context_alloc(1),



Hi Christian,


Sorry for the late answer to this.


It appears this commit is trying to remove the warnings added by 
"dma-buf: Warn about dma_fence_chain container rules"


But the context allocation you added just above is breaking some tests. 
In particular igt@syncobj_timeline@transfer-timeline-point


That test transfers points into the timeline at point 3 and expects that 
we'll still wait on the previous points to complete.



In my opinion we should be reusing the previous context number if there 
is one and only allocate if we don't have a point.


If the application wants to not depend on previous points for wait 
operations, it can reset the syncobj prior to adding a new point.



Cheers,


-Lionel




+  1, false);
+   if (!array)
+   goto free_fences;
+
+   dma_fence_put(*f);
+   *f = &array->base;
+   return 0;
+
+free_fences:
+   while (count--)
+   dma_fence_put(fences[count]);
+
+   kfree(fences);
+   return -ENOMEM;
+}
+
  static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
struct drm_syncobj_transfer *args)
  {
struct drm_syncobj *timeline_syncobj = NULL;
-   struct dma_fence *fence;
struct dma_fence_chain *chain;
+   struct dma_fence *fence;
int ret;
  
  	timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle);

@@ -869,16 +914,22 @@ static int drm_syncobj_transfer_to_timeline(struct 
drm_file *file_private,
 args->src_point, args->flags,
 &fence);
if (ret)
-   goto err;
+   goto err_put_timeline;
+
+   ret = drm_syncobj_flatten_chain(&fence);
+   if (ret)
+   goto err_free_fence;
+
chain = dma_fence_chain_alloc();
if (!chain) {
ret = -ENOMEM;
-   goto err1;
+   goto err_free_fence;
}
+
drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point);
-err1:
+err_free_fence:
dma_fence_put(fence);
-err:
+err_put_timeline:
drm_syncobj_put(timeline_syncobj);
  
  	return ret;





Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-05-24 Thread Lionel Landwerlin

On 20/05/2022 01:52, Zanoni, Paulo R wrote:

On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:

VM_BIND design document with description of intended use cases.

v2: Add more documentation and format as per review comments
 from Daniel.

Signed-off-by: Niranjana Vishwanathapura 
---

diff --git a/Documentation/gpu/rfc/i915_vm_bind.rst 
b/Documentation/gpu/rfc/i915_vm_bind.rst
new file mode 100644
index ..f1be560d313c
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_vm_bind.rst
@@ -0,0 +1,304 @@
+==
+I915 VM_BIND feature design and use cases
+==
+
+VM_BIND feature
+
+DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM buffer
+objects (BOs) or sections of a BOs at specified GPU virtual addresses on a
+specified address space (VM). These mappings (also referred to as persistent
+mappings) will be persistent across multiple GPU submissions (execbuff calls)
+issued by the UMD, without user having to provide a list of all required
+mappings during each submission (as required by older execbuff mode).
+
+VM_BIND/UNBIND ioctls will support 'in' and 'out' fences to allow userpace
+to specify how the binding/unbinding should sync with other operations
+like the GPU job submission. These fences will be timeline 'drm_syncobj's
+for non-Compute contexts (See struct drm_i915_vm_bind_ext_timeline_fences).
+For Compute contexts, they will be user/memory fences (See struct
+drm_i915_vm_bind_ext_user_fence).
+
+VM_BIND feature is advertised to user via I915_PARAM_HAS_VM_BIND.
+User has to opt-in for VM_BIND mode of binding for an address space (VM)
+during VM creation time via I915_VM_CREATE_FLAGS_USE_VM_BIND extension.
+
+VM_BIND/UNBIND ioctl will immediately start binding/unbinding the mapping in an
+async worker. The binding and unbinding will work like a special GPU engine.
+The binding and unbinding operations are serialized and will wait on specified
+input fences before the operation and will signal the output fences upon the
+completion of the operation. Due to serialization, completion of an operation
+will also indicate that all previous operations are also complete.
+
+VM_BIND features include:
+
+* Multiple Virtual Address (VA) mappings can map to the same physical pages
+  of an object (aliasing).
+* VA mapping can map to a partial section of the BO (partial binding).
+* Support capture of persistent mappings in the dump upon GPU error.
+* TLB is flushed upon unbind completion. Batching of TLB flushes in some
+  use cases will be helpful.
+* Asynchronous vm_bind and vm_unbind support with 'in' and 'out' fences.
+* Support for userptr gem objects (no special uapi is required for this).
+
+Execbuff ioctl in VM_BIND mode
+---
+The execbuff ioctl handling in VM_BIND mode differs significantly from the
+older method. A VM in VM_BIND mode will not support older execbuff mode of
+binding. In VM_BIND mode, execbuff ioctl will not accept any execlist. Hence,
+no support for implicit sync. It is expected that the below work will be able
+to support requirements of object dependency setting in all use cases:
+
+"dma-buf: Add an API for exporting sync files"
+(https://lwn.net/Articles/859290/)

I would really like to have more details here. The link provided points
to new ioctls and we're not very familiar with those yet, so I think
you should really clarify the interaction between the new additions
here. Having some sample code would be really nice too.

For Mesa at least (and I believe for the other drivers too) we always
have a few exported buffers in every execbuf call, and we rely on the
implicit synchronization provided by execbuf to make sure everything
works. The execbuf ioctl also has some code to flush caches during
implicit synchronization AFAIR, so I would guess we rely on it too and
whatever else the Kernel does. Is that covered by the new ioctls?

In addition, as far as I remember, one of the big improvements of
vm_bind was that it would help reduce ioctl latency and cpu overhead.
But if making execbuf faster comes at the cost of requiring additional
ioctls calls for implicit synchronization, which is required on ever
execbuf call, then I wonder if we'll even get any faster at all.
Comparing old execbuf vs plain new execbuf without the new required
ioctls won't make sense.
But maybe I'm wrong and we won't need to call these new ioctls around
every single execbuf ioctl we submit? Again, more clarification and
some code examples here would be really nice. This is a big change on
an important part of the API, we should clarify the new expected usage.



Hey Paulo,


I think in the case of X11/Wayland, we'll be doing 1 or 2 extra ioctls 
per frame which seems pretty reasonable.


Essentially we need to set the dependencies on the buffer we're going to 
tell the display engine (gnome-shell/kde/bare-display-hw) to use.



In the Vulkan case, we're t

Re: [PATCH v3] drm/doc: add rfc section for small BAR uapi

2022-05-17 Thread Lionel Landwerlin

On 17/05/2022 12:23, Tvrtko Ursulin wrote:


On 17/05/2022 09:55, Lionel Landwerlin wrote:

On 17/05/2022 11:29, Tvrtko Ursulin wrote:


On 16/05/2022 19:11, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)
v3:
   - Drop the vma query for now.
   - Add unallocated_cpu_visible_size as part of the region query.
   - Improve the docs some more, including documenting the expected
 behaviour on older kernels, since this came up in some offline
 discussion.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jon Bloomfield 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 164 +++

  Documentation/gpu/rfc/i915_small_bar.rst |  47 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 215 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..4079d287750b
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,164 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;


Is -1 possible today or when it will be? For system memory it 
appears zeroes are returned today so that has to stay I think. Does 
it effectively mean userspace has to consider both 0 and -1 as 
unknown is the question.



I raised this on v2. As far as I can tell there is no situation 
where we would get -1.


Is it really probed_size=0 on smem?? It's not the case on the 
internal branch.


My bad, I misread the arguments to intel_memory_region_create while 
grepping:


struct intel_memory_region *i915_gem_shmem_setup(struct 
drm_i915_private *i915,

 u16 type, u16 instance)
{
return intel_memory_region_create(i915, 0,
  totalram_pages() << PAGE_SHIFT,
  PAGE_SIZE, 0, 0,
  type, instance,
  &shmem_region_ops);

I saw "0, 0" and wrongly assumed that would be the data, since it 
matched with my mental model and the comment against unallocated_size 
saying it's only tracked for device memory.


Although I'd say it is questionable for i915 to return this data. I 
wonder if a use case is possible where it would even be wrong, but I 
don't know. I guess the cat is out of the bag now.



Not sure how questionable that is. There are a bunch of tools reporting 
the amount of memory available (free, top, htop, etc...).


It might not be totalram_pages() but probably something close to it.

Having a non 0 & non -1 value is useful.


-Lionel




If the situation is -1 for unknown and some valid size (not zero) I 
don't think there is a problem here.


Regards,

Tvrtko


Anv is not currently handling that case.


I would very much like to not deal with 0 for smem.

It really makes it easier for userspace rather than having to fish 
information from 2 different places and on top of dealing with 
multiple kernel versions.



-Lionel





+
+    /**
+ * @unallocated_size: Estimate of memory remaining (-1 = unknown)
+ *
+ * Note this is only currently tracked for 
I915_MEMORY_CLASS_DEVICE

+ * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get
+ * reliable accounting. Without this(or if this an older 
kernel) the


s/if this an/if this is an/

Also same question as above about -1.


+ * value here will always match the @probed_size.
+ */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).


Also question about -1. In this case this could be done since the 
field is yet to be added but I am curious if it ever can be -1.



+ *
+ * This will be always be <= @probed_s

Re: [PATCH v3] drm/doc: add rfc section for small BAR uapi

2022-05-17 Thread Lionel Landwerlin

On 17/05/2022 11:29, Tvrtko Ursulin wrote:


On 16/05/2022 19:11, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)
v3:
   - Drop the vma query for now.
   - Add unallocated_cpu_visible_size as part of the region query.
   - Improve the docs some more, including documenting the expected
 behaviour on older kernels, since this came up in some offline
 discussion.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Tvrtko Ursulin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jon Bloomfield 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 164 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  47 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 215 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..4079d287750b
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,164 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as 
known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;


Is -1 possible today or when it will be? For system memory it appears 
zeroes are returned today so that has to stay I think. Does it 
effectively mean userspace has to consider both 0 and -1 as unknown is 
the question.



I raised this on v2. As far as I can tell there is no situation where 
we would get -1.


Is it really probed_size=0 on smem?? It's not the case on the internal 
branch.


Anv is not currently handling that case.


I would very much like to not deal with 0 for smem.

It really makes it easier for userspace rather than having to fish 
information from 2 different places and on top of dealing with multiple 
kernel versions.
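
For reference, if userspace did have to treat both 0 and -1 as "unknown" for @probed_size, the check would look roughly like the sketch below. The names `MODEL_SIZE_UNKNOWN` and `model_region_size_known` are hypothetical and not part of any uAPI; the only assumption taken from the thread is that -1 is reported through the unsigned 64-bit field, so it appears as the all-ones value.

```c
#include <stdbool.h>
#include <stdint.h>

/* "-1 = unknown" as it would appear in an unsigned 64-bit field. */
#define MODEL_SIZE_UNKNOWN UINT64_MAX

/*
 * Returns true only when the reported size is usable as-is, i.e. it is
 * neither 0 nor the -1 "unknown" marker.
 */
static bool model_region_size_known(uint64_t probed_size)
{
	return probed_size != 0 && probed_size != MODEL_SIZE_UNKNOWN;
}
```

With a non-zero, non -1 value always reported for smem, this helper and the fallback path behind it would not be needed.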



-Lionel





+
+    /**
+ * @unallocated_size: Estimate of memory remaining (-1 = unknown)
+ *
+ * Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE
+ * regions, and also requires CAP_PERFMON or CAP_SYS_ADMIN to get
+ * reliable accounting. Without this(or if this an older kernel) 
the


s/if this an/if this is an/

Also same question as above about -1.


+ * value here will always match the @probed_size.
+ */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).


Also question about -1. In this case this could be done since the 
field is yet to be added but I am curious if it ever can be -1.



+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ *
+ * On systems without small BAR, the @probed_size will
+ * always equal the @probed_cpu_visible_size, since all
+ * of it will be CPU accessible.
+ *
+ * Note that if the value returned here is zero, then
+ * this must be an old kernel which lacks the relevant
+ * small-bar uAPI support(including


I have noticed you prefer no space before parentheses throughout the 
text so I guess it's just my preference to have it. Very nitpicky even 
if I am right so up to you.



+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on
+ * such systems we should never actually end up with a
+ * small BAR configuration, assuming we are able to load
+ * the kernel module. Hence it should be safe to treat
+ * this the same as when @probed_cpu_visible_size ==
+ * @probed_size.
+ */
+    __u64 probed_cpu_visible_size;
+
+    /**
+ * @unallocated_cpu_visible_size: Estimate of CPU
+ * visible memory remaining (-1 = unknown).
+ *
+ * Note this is only 

Re: [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj

2022-05-16 Thread Lionel Landwerlin

On 14/05/2022 00:06, Jordan Justen wrote:

On 2022-05-13 05:31:00, Lionel Landwerlin wrote:

On 02/05/2022 17:15, Ramalingam C wrote:

Capture the impact of memory region preference list of the objects, on
their memory residency and Flat-CCS capability.

v2:
Fix the Flat-CCS capability of an obj with {lmem, smem} preference
list [Thomas]
v3:
Reworded the doc [Matt]

Signed-off-by: Ramalingam C
cc: Matthew Auld
cc: Thomas Hellstrom
cc: Daniel Vetter
cc: Jon Bloomfield
cc: Lionel Landwerlin
cc: Kenneth Graunke
cc:mesa-...@lists.freedesktop.org
cc: Jordan Justen
cc: Tony Ye
Reviewed-by: Matthew Auld
---
   include/uapi/drm/i915_drm.h | 16 
   1 file changed, 16 insertions(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a2def7b27009..b7e1c2fe08dc 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext {
* At which point we get the object handle in 
&drm_i915_gem_create_ext.handle,
* along with the final object size in &drm_i915_gem_create_ext.size, which
* should account for any rounding up, if required.
+ *
+ * Note that userspace has no means of knowing the current backing region
+ * for objects where @num_regions is larger than one. The kernel will only
+ * ensure that the priority order of the @regions array is honoured, either
+ * when initially placing the object, or when moving memory around due to
+ * memory pressure
+ *
+ * On Flat-CCS capable HW, compression is supported for the objects residing
+ * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) has other
+ * memory class in @regions and migrated (by I915, due to memory
+ * constrain) to the non I915_MEMORY_CLASS_DEVICE region, then I915 needs to
+ * decompress the content. But I915 dosen't have the required information to
+ * decompress the userspace compressed objects.
+ *
+ * So I915 supports Flat-CCS, only on the objects which can reside only on
+ * I915_MEMORY_CLASS_DEVICE regions.

I think it's fine to assume Flat-CCS surfaces will always be in lmem.

I see no issue for the Anv Vulkan driver.

Maybe Nanley or Ken can speak for the Iris GL driver?


Acked-by: Jordan Justen

I think Nanley has accounted for this on iris with:

https://gitlab.freedesktop.org/mesa/mesa/-/commit/42a865730ef72574e179b56a314f30fdccc6cba8

-Jordan


Thanks Jordan,


We might want to throw in an additional: assert((flags & 
BO_ALLOC_SMEM) == 0); in the CCS case.


-Lionel
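
The rule in the quoted doc comment (Flat-CCS only for objects that can reside only in I915_MEMORY_CLASS_DEVICE regions) can be expressed as a small userspace check. This is a hedged sketch: `model_memory_class`, `model_region` and `model_can_use_flat_ccs` are hypothetical names that only mirror the class/instance pair idea, they are not the real uAPI structures, and treating an empty region list as not eligible is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two memory classes discussed in the thread. */
enum model_memory_class {
	MODEL_MEMORY_CLASS_SYSTEM = 0,
	MODEL_MEMORY_CLASS_DEVICE = 1,
};

/* Hypothetical class:instance pair for one entry of a placement list. */
struct model_region {
	uint16_t memory_class;
	uint16_t memory_instance;
};

/*
 * An object is eligible for Flat-CCS in this model only if every region in
 * its placement list is device memory. An empty list is treated as not
 * eligible (assumption).
 */
static bool model_can_use_flat_ccs(const struct model_region *regions,
				   size_t num_regions)
{
	size_t i;

	if (num_regions == 0)
		return false;

	for (i = 0; i < num_regions; i++) {
		if (regions[i].memory_class != MODEL_MEMORY_CLASS_DEVICE)
			return false;
	}

	return true;
}
```

A driver could assert on a check like this before enabling compression for a buffer, which is the same idea as the assert on the allocation flags suggested above.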


Re: [PATCH v3] uapi/drm/i915: Document memory residency and Flat-CCS capability of obj

2022-05-13 Thread Lionel Landwerlin

On 02/05/2022 17:15, Ramalingam C wrote:

Capture the impact of memory region preference list of the objects, on
their memory residency and Flat-CCS capability.

v2:
   Fix the Flat-CCS capability of an obj with {lmem, smem} preference
   list [Thomas]
v3:
   Reworded the doc [Matt]

Signed-off-by: Ramalingam C 
cc: Matthew Auld 
cc: Thomas Hellstrom 
cc: Daniel Vetter 
cc: Jon Bloomfield 
cc: Lionel Landwerlin 
cc: Kenneth Graunke 
cc: mesa-...@lists.freedesktop.org
cc: Jordan Justen 
cc: Tony Ye 
Reviewed-by: Matthew Auld 
---
  include/uapi/drm/i915_drm.h | 16 
  1 file changed, 16 insertions(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a2def7b27009..b7e1c2fe08dc 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext {
   * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
   * along with the final object size in &drm_i915_gem_create_ext.size, which
   * should account for any rounding up, if required.
+ *
+ * Note that userspace has no means of knowing the current backing region
+ * for objects where @num_regions is larger than one. The kernel will only
+ * ensure that the priority order of the @regions array is honoured, either
+ * when initially placing the object, or when moving memory around due to
+ * memory pressure
+ *
+ * On Flat-CCS capable HW, compression is supported for the objects residing
+ * in I915_MEMORY_CLASS_DEVICE. When such objects (compressed) has other
+ * memory class in @regions and migrated (by I915, due to memory
+ * constrain) to the non I915_MEMORY_CLASS_DEVICE region, then I915 needs to
+ * decompress the content. But I915 dosen't have the required information to
+ * decompress the userspace compressed objects.
+ *
+ * So I915 supports Flat-CCS, only on the objects which can reside only on
+ * I915_MEMORY_CLASS_DEVICE regions.



I think it's fine to assume Flat-CCS surfaces will always be in lmem.

I see no issue for the Anv Vulkan driver.


Maybe Nanley or Ken can speak for the Iris GL driver?


-Lionel



   */
  struct drm_i915_gem_create_ext_memory_regions {
/** @base: Extension link. See struct i915_user_extension. */





Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-03 Thread Lionel Landwerlin

On 03/05/2022 17:27, Matthew Auld wrote:

On 03/05/2022 11:39, Lionel Landwerlin wrote:

On 03/05/2022 13:22, Matthew Auld wrote:

On 02/05/2022 09:53, Lionel Landwerlin wrote:

On 02/05/2022 10:54, Lionel Landwerlin wrote:

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++

  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h

new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region 
as known to the

+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct 
drm_i915_query.
+ * For this new query we are adding the new query id 
DRM_I915_QUERY_MEMORY_REGIONS

+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = 
unknown) */

+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the 
driver

+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will be always be <= @probed_size, and the
+ * remainder(if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };



Trying to implement userspace support in Vulkan for this, I have 
an additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After that it should either be the entire lmem or something smaller.


-Lionel



Other pain point of this new uAPI, previously we could query the 
unallocated size for each heap.


unallocated_size should always give the same value as probed_size. 
We have the avail tracking, but we don't currently expose that 
through unallocated_size, due to lack of real userspace/user etc.




Now lmem is effectively divided into 2 heaps, but unallocated_size 
is tracking allocation from both parts of lmem.


Yeah, if we ever properly expose the unallocated_size, then we could 
also just add unallocated_cpu_visible_size.




Is adding new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of question?


I don't think it's out of the question...

I guess user-space should be able to get the current flag behaviour 
just by specifying: device, system. And it does give more flexibility 
to allow something like: device, device-nm, smem.


We can also drop the probed_cpu_visible_size, which would now just 
be the probed_size with device/device-nm. And if we lack device-nm, 
then the entire thing must be CPU mappable.


One of the downsides though, is that we can no longer easily mix 
object pages from both device + device-nm, which we could previously 
do when we didn't specify the flag. At least according to the 
current design/behaviour for @regions that would not be allowed. I 
guess some kind of new flag like ALLOC_MIXED or so? Although 
currently that is only possible with device + device-nm in ttm/i915.



Thanks, I wasn't aware of the restrictions.

Adding unallocated_cpu_visible_size would be great.


So do we want this in the next version? i.e. do we already have a 
real use case in mind for unallocated_size where probed_size is not 
good enough?



Yeah, in the next iteration.

We're using unallocated_size to implement VK_EXT_memory_budget and since 
I'm going to expose lmem mappable/unmappable as 2 different heaps on 
Vulkan, I would use that there too.
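
To make the budget idea concrete, here is a minimal sketch of how the two lmem heap budgets could be derived. It assumes the kernel ends up reporting both unallocated_size and the proposed unallocated_cpu_visible_size, and that unallocated_size keeps covering both parts of lmem as described above. The struct and function names are hypothetical and are not taken from the i915 uAPI or from Mesa.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of what userspace read back for the lmem region. */
struct lmem_avail {
    uint64_t unallocated_size;             /* free lmem, mappable + non-mappable */
    uint64_t unallocated_cpu_visible_size; /* proposed field: free mappable lmem */
};

/* Budgets for the two lmem heaps exposed to the application. */
struct lmem_budget {
    uint64_t mappable;     /* heap that is local and CPU visible */
    uint64_t non_mappable; /* heap that is local only */
};

static struct lmem_budget lmem_split_budget(const struct lmem_avail *avail)
{
    struct lmem_budget budget;
    uint64_t visible;

    assert(avail != NULL);

    /* The mappable part can never be larger than the whole region, so
     * clamp in case the two values were sampled at different times. */
    visible = avail->unallocated_cpu_visible_size;
    if (visible > avail->unallocated_size)
        visible = avail->unallocated_size;

    budget.mappable = visible;
    budget.non_mappable = avail->unallocated_size - visible;
    return budget;
}
```

Whatever is left after subtracting the mappable part is what the non-mappable heap would report, so the two budgets always add up to unallocated_size.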



-Lionel







+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-03 Thread Lionel Landwerlin

On 03/05/2022 13:22, Matthew Auld wrote:

On 02/05/2022 09:53, Lionel Landwerlin wrote:

On 02/05/2022 10:54, Lionel Landwerlin wrote:

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will always be <= @probed_size, and the
+ * remainder (if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };



Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After that it should either be the entire lmem or something smaller.


-Lionel



Another pain point of this new uAPI: previously we could query the 
unallocated size for each heap.


unallocated_size currently always gives the same value as probed_size. We 
have the avail tracking, but we don't currently expose that through 
unallocated_size, due to the lack of a real userspace user.




Now lmem is effectively divided into 2 heaps, but unallocated_size is 
tracking allocation from both parts of lmem.


Yeah, if we ever properly expose the unallocated_size, then we could 
also just add unallocated_cpu_visible_size.




Is adding a new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of the question?


I don't think it's out of the question...

I guess user-space should be able to get the current flag behaviour 
just by specifying: device, system. And it does give more flexibility to 
allow something like: device, device-nm, smem.


We can also drop the probed_cpu_visible_size, which would now just be 
the probed_size with device/device-nm. And if we lack device-nm, then 
the entire thing must be CPU mappable.


One of the downsides though, is that we can no longer easily mix 
object pages from both device + device-nm, which we could previously 
do when we didn't specify the flag. At least according to the current 
design/behaviour for @regions that would not be allowed. I guess some 
kind of new flag like ALLOC_MIXED or so? Although currently that is 
only possible with device + device-nm in ttm/i915.



Thanks, I wasn't aware of the restrictions.

Adding unallocated_cpu_visible_size would be great.


-Lionel







+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested siz

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-03 Thread Lionel Landwerlin

On 03/05/2022 12:07, Matthew Auld wrote:

On 02/05/2022 19:03, Lionel Landwerlin wrote:

On 02/05/2022 20:58, Abodunrin, Akeem G wrote:



-Original Message-
From: Landwerlin, Lionel G 
Sent: Monday, May 2, 2022 12:55 AM
To: Auld, Matthew ; 
intel-...@lists.freedesktop.org

Cc: dri-devel@lists.freedesktop.org; Thomas Hellström
; Bloomfield, Jon
; Daniel Vetter ; 
Justen,

Jordan L ; Kenneth Graunke
; Abodunrin, Akeem G
; mesa-...@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
    - Some spelling fixes and other small tweaks. (Akeem & Thomas)
    - Rework error capture interactions, including no longer needing
  NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
    - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
   Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
   Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
   Documentation/gpu/rfc/index.rst  |   4 +
   3 files changed, 252 insertions(+)
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+    * @probed_cpu_visible_size: Memory probed by the driver
+    * that is CPU accessible. (-1 = unknown).
+    *
+    * This will always be <= @probed_size, and the
+    * remainder (if there is any) will not be CPU
+    * accessible.
+    */
+   __u64 probed_cpu_visible_size;
+   };


Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.

When is it set to -1?
I believe it is set to -1 if it is unknown, and/or not CPU accessible...


Cheers!
~Akeem



So what should I expect on system memory?


I guess just probed_cpu_visible_size == probed_size. Or maybe we can 
just use -1 here?




What value is returned when all of probed_size is CPU visible on 
local memory?


probed_size == probed_cpu_visible_size.



Thanks, looks good to me.

Then maybe we should update the comment to say that.

Looks like there are no cases where we'll get -1.
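
The cases discussed above can be summarised in a small userspace helper. This is only a sketch under the assumptions made in this thread: 0 means the kernel predates the field (still MBZ), a value equal to probed_size means everything is CPU visible, and -1 is kept as a separate case even though it looks like it will not be returned. The struct, enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the two fields read from one memory region. */
struct region_probe {
    uint64_t probed_size;
    uint64_t probed_cpu_visible_size;
};

enum bar_kind {
    BAR_NOT_REPORTED, /* field is 0: assumed to be a kernel without the field */
    BAR_UNKNOWN,      /* field is -1: documented as "unknown" */
    BAR_FULL,         /* all of probed_size is CPU visible */
    BAR_SMALL,        /* only part of the region is CPU visible */
};

static enum bar_kind classify_bar(const struct region_probe *region)
{
    assert(region != NULL);

    if (region->probed_cpu_visible_size == 0)
        return BAR_NOT_REPORTED;
    if (region->probed_cpu_visible_size == UINT64_MAX)
        return BAR_UNKNOWN;
    if (region->probed_cpu_visible_size >= region->probed_size)
        return BAR_FULL;
    return BAR_SMALL;
}
```

System memory would be expected to land in BAR_FULL, given the answer above that probed_cpu_visible_size == probed_size there.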


-Lionel







Thanks,


-Lionel



I'm guessing before there is support for this value it'll be 0 (MBZ).

After that it should either be the entire lmem or something smaller.


-Lionel



+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+    * @size: Requested size for the object.
+    *
+    * The (page-aligned) allocated size for the object will be returned.
+    *
+    * Note that for some devices we might have further minimum
+    * page-size restrictions (larger than 4K), like for device local-memory.
+    * However in general the final size here should always reflect any
+    * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+    * extension to place the object in device local-memory.
+    */
+   __u64 size;
+   /**
+    * @handle: Returned handle for the object.
+    *
+    * Object handles are no

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-02 Thread Lionel Landwerlin

On 02/05/2022 20:58, Abodunrin, Akeem G wrote:



-Original Message-
From: Landwerlin, Lionel G 
Sent: Monday, May 2, 2022 12:55 AM
To: Auld, Matthew ; intel-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org; Thomas Hellström
; Bloomfield, Jon
; Daniel Vetter ; Justen,
Jordan L ; Kenneth Graunke
; Abodunrin, Akeem G
; mesa-...@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
- Some spelling fixes and other small tweaks. (Akeem & Thomas)
- Rework error capture interactions, including no longer needing
  NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
- Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
   Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
   Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
   Documentation/gpu/rfc/index.rst  |   4 +
   3 files changed, 252 insertions(+)
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
   create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will always be <= @probed_size, and the
+* remainder (if there is any) will not be CPU
+* accessible.
+*/
+   __u64 probed_cpu_visible_size;
+   };


Trying to implement userspace support in Vulkan for this, I have an additional
question about the value of probed_cpu_visible_size.

When is it set to -1?

I believe it is set to -1 if it is unknown, and/or not cpu accessible...

Cheers!
~Akeem



So what should I expect on system memory?

What value is returned when all of probed_size is CPU visible on local 
memory?



Thanks,


-Lionel



I'm guessing before there is support for this value it'll be 0 (MBZ).

After that it should either be the entire lmem or something smaller.


-Lionel



+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we might have further minimum
+* page-size restrictions (larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of t

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-02 Thread Lionel Landwerlin

On 02/05/2022 10:54, Lionel Landwerlin wrote:

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will always be <= @probed_size, and the
+ * remainder (if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };



Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After that it should either be the entire lmem or something smaller.


-Lionel



Another pain point of this new uAPI: previously we could query the 
unallocated size for each heap.


Now lmem is effectively divided into 2 heaps, but unallocated_size is 
tracking allocation from both parts of lmem.


Is adding a new I915_MEMORY_CLASS_DEVICE_NON_MAPPABLE out of the question?


-Lionel






+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be returned.
+ *
+ * Note that for some devices we might have further minimum
+ * page-size restrictions (larger than 4K), like for device local-memory.
+ * However in general the final size here should always reflect any
+ * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+ * extension to place the object in device local-memory.
+ */
+    __u64 size;
+    /**
+ * @handle: Returned handle for the object.
+ *
+ * Object handles are nonzero.
+ */
+    __u32 handle;
+    /**
+ * @flags: Optional flags.
+ *
+ * Supported values:
+ *
+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+ * the object will need to be accessed via the CPU.
+ *
+ * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+ * only strictly required on platforms where only some of the device
+ * memory is directly visible or mappable through the CPU, like on DG2+.
+ *
+ * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+ * ensure we can always spill the allocation to system memory, if we
+ * can't place the object in the mappable part of
+ * I915_MEMORY_CLASS_DEVICE.
+ *
+ * Note that since th

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-05-02 Thread Lionel Landwerlin

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will always be <= @probed_size, and the
+* remainder (if there is any) will not be CPU
+* accessible.
+*/
+   __u64 probed_cpu_visible_size;
+   };



Trying to implement userspace support in Vulkan for this, I have an 
additional question about the value of probed_cpu_visible_size.


When is it set to -1?

I'm guessing before there is support for this value it'll be 0 (MBZ).

After that it should either be the entire lmem or something smaller.


-Lionel



+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we might have further minimum
+* page-size restrictions (larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that since the kernel only supports flat-CCS on objects that can
+* *only* be placed in I915_MEMORY_C

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-04-27 Thread Lionel Landwerlin

On 27/04/2022 18:18, Matthew Auld wrote:

On 27/04/2022 07:48, Lionel Landwerlin wrote:
One question though, how do we detect that this flag 
(I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given 
kernel?
I assume older kernels are going to reject object creation if we use 
this flag?


From some offline discussion with Lionel, the plan here is to just do 
a dummy gem_create_ext to check if the kernel throws an error with the 
new flag or not.
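
A minimal sketch of that probing logic follows. It is written against two hypothetical callbacks rather than the real ioctls so that it stays self-contained; in a real driver the create callback would wrap DRM_IOCTL_I915_GEM_CREATE_EXT (with device + system placements, since the flag is only valid for those) and the close callback would wrap DRM_IOCTL_GEM_CLOSE. The flag constant, the callback types and the fake kernel are placeholders for illustration, and the sketch assumes an older kernel fails the creation with an error such as -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit; real code would take the definition of
 * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS from the uAPI header. */
#define EXAMPLE_FLAG_NEEDS_CPU_ACCESS (1u << 0)

/* Hypothetical wrappers around object creation and destruction.
 * create returns 0 on success or a negative errno value. */
typedef int (*bo_create_fn)(void *dev, uint64_t size, uint32_t flags,
                            uint32_t *handle);
typedef void (*bo_close_fn)(void *dev, uint32_t handle);

/* Create a throwaway object with the flag set; if that works the kernel
 * knows the flag, and the object is released again straight away. */
static bool kernel_accepts_flag(void *dev, bo_create_fn create,
                                bo_close_fn close_bo, uint32_t flag)
{
    uint32_t handle = 0;

    assert(create != NULL && close_bo != NULL);

    if (create(dev, 4096, flag, &handle) != 0)
        return false;

    close_bo(dev, handle);
    return true;
}

/* Stand-in for a kernel that rejects flags it does not know about. */
struct fake_kernel {
    uint32_t known_flags;
    unsigned int live_objects;
};

static int fake_create(void *dev, uint64_t size, uint32_t flags,
                       uint32_t *handle)
{
    struct fake_kernel *kernel = dev;

    (void)size;
    if (flags & ~kernel->known_flags)
        return -EINVAL;

    kernel->live_objects++;
    *handle = 1; /* object handles are nonzero */
    return 0;
}

static void fake_close(void *dev, uint32_t handle)
{
    struct fake_kernel *kernel = dev;

    (void)handle;
    kernel->live_objects--;
}
```

The result would normally be cached at device creation time so the dummy allocation only happens once.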




I didn't plan to use __drm_i915_query_vma_info, but isn't it 
inconsistent to select the placement on the GEM object and then query 
whether it's mappable by address?
You made a comment stating this is racy, wouldn't querying on the GEM 
object prevent this?


Since Mesa doesn't currently have a use for this one, I guess we 
should just drop this part of the uapi, in this version at least, if 
there are no objections.



Just repeating what we discussed (maybe I missed some other discussion 
and that's why I was confused) :



The way I was planning to use this is to have 3 heaps in Vulkan :

    - heap0: local only, not CPU visible

    - heap1: system, cpu visible

    - heap2: local & cpu visible


With heap2 having the reported probed_cpu_visible_size size.

It is an error for the application to map from heap0 [1].
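
As a rough sketch of the sizing for those three heaps, assuming the sizes come straight from the region query: heap2 takes probed_cpu_visible_size and heap0 takes whatever is left of lmem. Treating a value of 0 (field not filled in) or a value larger than probed_size (including -1) as "everything is CPU visible" is my assumption here, not something the uAPI states. All names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the two size fields of one queried region. */
struct region_sizes {
    uint64_t probed_size;
    uint64_t probed_cpu_visible_size;
};

struct vk_heap_sizes {
    uint64_t heap0_local_only;    /* local, not CPU visible */
    uint64_t heap1_system;        /* system, CPU visible */
    uint64_t heap2_local_visible; /* local and CPU visible */
};

static struct vk_heap_sizes split_heaps(const struct region_sizes *lmem,
                                        const struct region_sizes *smem)
{
    struct vk_heap_sizes heaps;
    uint64_t visible;

    assert(lmem != NULL && smem != NULL);

    visible = lmem->probed_cpu_visible_size;

    /* Assumption: 0 means the field was never filled in, and anything
     * above probed_size (including -1) is not a usable size, so fall
     * back to treating the whole region as CPU visible. */
    if (visible == 0 || visible > lmem->probed_size)
        visible = lmem->probed_size;

    heaps.heap2_local_visible = visible;
    heaps.heap0_local_only = lmem->probed_size - visible;
    heaps.heap1_system = smem->probed_size;
    return heaps;
}
```

On a full BAR system heap0 ends up with size 0 and could simply not be advertised.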


With that said, it means if we created a GEM BO without 
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS, we'll never mmap it.


So why the query?

I guess it would be useful when we import a buffer from another 
application. But in that case, why not have the query on the BO?



-Lionel


[1] : 
https://www.khronos.org/registry/vulkan/specs/1.3-extensions/man/html/vkMapMemory.html 
(VUID-vkMapMemory-memory-00682)






Thanks,

-Lionel

On 27/04/2022 09:35, Lionel Landwerlin wrote:

Hi Matt,


The proposal looks good to me.

Looking forward to trying it on drm-tip.


-Lionel

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will always be <= @probed_size, and the
+ * remainder (if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };
+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be returned.
+ *
+ 

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-04-26 Thread Lionel Landwerlin
One question though, how do we detect that this flag 
(I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) is accepted on a given kernel?
I assume older kernels are going to reject object creation if we use 
this flag?


I didn't plan to use __drm_i915_query_vma_info, but isn't it 
inconsistent to select the placement on the GEM object and then query 
whether it's mappable by address?
You made a comment stating this is racy, wouldn't querying on the GEM 
object prevent this?


Thanks,

-Lionel

On 27/04/2022 09:35, Lionel Landwerlin wrote:

Hi Matt,


The proposal looks good to me.

Looking forward to trying it on drm-tip.


-Lionel

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+    /** @region: The class:instance pair encoding */
+    struct drm_i915_gem_memory_class_instance region;
+
+    /** @rsvd0: MBZ */
+    __u32 rsvd0;
+
+    /** @probed_size: Memory probed by the driver (-1 = unknown) */
+    __u64 probed_size;
+
+    /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+    __u64 unallocated_size;
+
+    union {
+    /** @rsvd1: MBZ */
+    __u64 rsvd1[8];
+    struct {
+    /**
+ * @probed_cpu_visible_size: Memory probed by the driver
+ * that is CPU accessible. (-1 = unknown).
+ *
+ * This will always be <= @probed_size, and the
+ * remainder (if there is any) will not be CPU
+ * accessible.
+ */
+    __u64 probed_cpu_visible_size;
+    };
+    };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+    /**
+ * @size: Requested size for the object.
+ *
+ * The (page-aligned) allocated size for the object will be returned.
+ *
+ * Note that for some devices we might have further minimum
+ * page-size restrictions (larger than 4K), like for device local-memory.
+ * However in general the final size here should always reflect any
+ * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+ * extension to place the object in device local-memory.
+ */
+    __u64 size;
+    /**
+ * @handle: Returned handle for the object.
+ *
+ * Object handles are nonzero.
+ */
+    __u32 handle;
+    /**
+ * @flags: Optional flags.
+ *
+ * Supported values:
+ *
+ * I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+ * the object will need to be accessed via the CPU.
+ *
+ * Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+ * only strictly required on platforms where only some of the device
+ * memory is directly visible or mappable through the CPU, like on DG2+.
+ *
+ * One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+ * ensure we can always spill the allocation to system memory, if we
+ * can't place the object in the mappable part of
+ * I915_MEMORY_CLASS_DEVICE.
+ *
+ * Note that since the kernel only supports fla

Re: [PATCH v2] drm/doc: add rfc section for small BAR uapi

2022-04-26 Thread Lionel Landwerlin

Hi Matt,


The proposal looks good to me.

Looking forward to trying it on drm-tip.


-Lionel

On 20/04/2022 20:13, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

v2:
   - Some spelling fixes and other small tweaks. (Akeem & Thomas)
   - Rework error capture interactions, including no longer needing
 NEEDS_CPU_ACCESS for objects marked for capture. (Thomas)
   - Add probed_cpu_visible_size. (Lionel)

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Lionel Landwerlin 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: Akeem G Abodunrin 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 190 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  58 +++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 252 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..7bfd0cf44d35
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,190 @@
+/**
+ * struct __drm_i915_memory_region_info - Describes one region as known to the
+ * driver.
+ *
+ * Note this is using both struct drm_i915_query_item and struct drm_i915_query.
+ * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS
+ * at &drm_i915_query_item.query_id.
+ */
+struct __drm_i915_memory_region_info {
+   /** @region: The class:instance pair encoding */
+   struct drm_i915_gem_memory_class_instance region;
+
+   /** @rsvd0: MBZ */
+   __u32 rsvd0;
+
+   /** @probed_size: Memory probed by the driver (-1 = unknown) */
+   __u64 probed_size;
+
+   /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */
+   __u64 unallocated_size;
+
+   union {
+   /** @rsvd1: MBZ */
+   __u64 rsvd1[8];
+   struct {
+   /**
+* @probed_cpu_visible_size: Memory probed by the driver
+* that is CPU accessible. (-1 = unknown).
+*
+* This will always be <= @probed_size, and the
+* remainder (if there is any) will not be CPU
+* accessible.
+*/
+   __u64 probed_cpu_visible_size;
+   };
+   };
+};
+
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that new buffer flags should be added here, at least for the stuff that
+ * is immutable. Previously we would have two ioctls, one to create the object
+ * with gem_create, and another to apply various parameters, however this
+ * creates some ambiguity for the params which are considered immutable. Also in
+ * general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we might have further minimum
+* page-size restrictions (larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that since the kernel only supports flat-CCS on objects that can
+* *only* be placed in I915_MEMORY_CLASS_DEVICE, we therefore don't
+* support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with
+* flat-CCS.
+*
+* Without this hint, the kernel will assume that non-mappable
+* I915_MEMORY_CLASS_DE

Re: [Intel-gfx] [PATCH 2/2] drm/doc: add rfc section for small BAR uapi

2022-03-18 Thread Lionel Landwerlin

Hey Matthew, all,

This sounds like a good thing to have.
There are a number of DG2 machines where we have a small BAR and this is 
causing more apps to fail.


Anv currently reports 3 memory heaps to the app :

    - local device only (not host visible) -> mapped to lmem
    - device/cpu -> mapped to smem
    - local device but also host visible -> mapped to lmem

So we could use this straight away, by just not putting the 
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag on the allocation of the 
first heap.
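
As a rough illustration of the mapping described above, here is a minimal C sketch. It is not Anv code: the heap enum, the params struct and the function name are hypothetical, and the flag macro is a local stand-in for the value proposed in the RFC header. The only rule taken from the RFC text is that the flag requires system memory as an additional placement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the flag value proposed in the RFC header (1 << 0).
 * Real code would use the definition from the i915 uapi header. */
#define SKETCH_NEEDS_CPU_ACCESS (1u << 0)

/* Hypothetical names for the three heaps listed above. */
enum sketch_heap {
	SKETCH_HEAP_LMEM_ONLY,    /* local device only, not host visible */
	SKETCH_HEAP_SMEM,         /* device/cpu, backed by system memory */
	SKETCH_HEAP_LMEM_VISIBLE, /* local device, also host visible */
};

/* Hypothetical summary of what would be requested at object creation. */
struct sketch_create_params {
	bool place_lmem; /* I915_MEMORY_CLASS_DEVICE listed as a placement */
	bool place_smem; /* I915_MEMORY_CLASS_SYSTEM listed as a placement */
	uint32_t flags;
};

static struct sketch_create_params sketch_params_for_heap(enum sketch_heap heap)
{
	struct sketch_create_params p = { false, false, 0 };

	switch (heap) {
	case SKETCH_HEAP_LMEM_ONLY:
		/* No flag: the kernel may prefer the non-mappable part. */
		p.place_lmem = true;
		break;
	case SKETCH_HEAP_SMEM:
		p.place_smem = true;
		break;
	case SKETCH_HEAP_LMEM_VISIBLE:
		/* The flag needs system memory as a spill placement. */
		p.place_lmem = true;
		p.place_smem = true;
		p.flags = SKETCH_NEEDS_CPU_ACCESS;
		break;
	}

	/* RFC rule: the flag is only valid with both placements present. */
	assert(!(p.flags & SKETCH_NEEDS_CPU_ACCESS) ||
	       (p.place_lmem && p.place_smem));
	return p;
}
```

Only the host-visible lmem heap sets the flag here, which matches the idea of leaving it off for the first heap.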


One thing I don't see in this proposal is how we can get the sizes of the 
two lmem heaps: CPU visible and not CPU visible.

We could use that to report the appropriate size to the app.
We probably want to report a new drm_i915_memory_region_info and either:
    - use one of the reserved fields to indicate CPU visibility
    - or define a new enum value in drm_i915_gem_memory_class
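
For reference, v2 of the RFC (quoted elsewhere in this thread) went with a probed_cpu_visible_size field next to probed_size. Assuming those two values have already been read from the region query, the two heap sizes could be derived roughly as below. The struct and function names are hypothetical, and treating an out-of-range value as unknown is a choice of this sketch, not something the RFC specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The RFC documents -1 as "unknown" for the probed sizes. */
#define SKETCH_SIZE_UNKNOWN UINT64_MAX

/* Hypothetical result type: the two lmem heap sizes to report. */
struct sketch_lmem_heaps {
	bool known;             /* false if either input was unknown */
	uint64_t cpu_visible;   /* part of lmem reachable through the BAR */
	uint64_t cpu_invisible; /* remainder, not CPU accessible */
};

static struct sketch_lmem_heaps
sketch_split_lmem(uint64_t probed_size, uint64_t probed_cpu_visible_size)
{
	struct sketch_lmem_heaps h = { false, 0, 0 };

	if (probed_size == SKETCH_SIZE_UNKNOWN ||
	    probed_cpu_visible_size == SKETCH_SIZE_UNKNOWN)
		return h;

	/* Documented to be <= probed_size; anything else is treated as
	 * unknown here rather than guessed at. */
	if (probed_cpu_visible_size > probed_size)
		return h;

	h.known = true;
	h.cpu_visible = probed_cpu_visible_size;
	h.cpu_invisible = probed_size - probed_cpu_visible_size;
	return h;
}
```

On a full-BAR system the two inputs are equal and the invisible heap comes out as zero.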

Cheers,

-Lionel


On 18/02/2022 13:22, Matthew Auld wrote:

Add an entry for the new uapi needed for small BAR on DG2+.

Signed-off-by: Matthew Auld 
Cc: Thomas Hellström 
Cc: Jon Bloomfield 
Cc: Daniel Vetter 
Cc: Jordan Justen 
Cc: Kenneth Graunke 
Cc: mesa-...@lists.freedesktop.org
---
  Documentation/gpu/rfc/i915_small_bar.h   | 153 +++
  Documentation/gpu/rfc/i915_small_bar.rst |  40 ++
  Documentation/gpu/rfc/index.rst  |   4 +
  3 files changed, 197 insertions(+)
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.h
  create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst

diff --git a/Documentation/gpu/rfc/i915_small_bar.h 
b/Documentation/gpu/rfc/i915_small_bar.h
new file mode 100644
index ..fa65835fd608
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_small_bar.h
@@ -0,0 +1,153 @@
+/**
+ * struct __drm_i915_gem_create_ext - Existing gem_create behaviour, with added
+ * extension support using struct i915_user_extension.
+ *
+ * Note that in the future we want to have our buffer flags here, at least for
+ * the stuff that is immutable. Previously we would have two ioctls, one to
+ * create the object with gem_create, and another to apply various parameters,
+ * however this creates some ambiguity for the params which are considered
+ * immutable. Also in general we're phasing out the various SET/GET ioctls.
+ */
+struct __drm_i915_gem_create_ext {
+   /**
+* @size: Requested size for the object.
+*
+* The (page-aligned) allocated size for the object will be returned.
+*
+* Note that for some devices we might have further minimum
+* page-size restrictions (larger than 4K), like for device local-memory.
+* However in general the final size here should always reflect any
+* rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS
+* extension to place the object in device local-memory.
+*/
+   __u64 size;
+   /**
+* @handle: Returned handle for the object.
+*
+* Object handles are nonzero.
+*/
+   __u32 handle;
+   /**
+* @flags: Optional flags.
+*
+* Supported values:
+*
+* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that
+* the object will need to be accessed via the CPU.
+*
+* Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and
+* only strictly required on platforms where only some of the device
+* memory is directly visible or mappable through the CPU, like on DG2+.
+*
+* One of the placements MUST also be I915_MEMORY_CLASS_SYSTEM, to
+* ensure we can always spill the allocation to system memory, if we
+* can't place the object in the mappable part of
+* I915_MEMORY_CLASS_DEVICE.
+*
+* Note that buffers that need to be captured with EXEC_OBJECT_CAPTURE,
+* will need to enable this hint, if the object can also be placed in
+* I915_MEMORY_CLASS_DEVICE, starting from DG2+. The execbuf call will
+* throw an error otherwise. This also means that such objects will need
+* I915_MEMORY_CLASS_SYSTEM set as a possible placement.
+*
+* Without this hint, the kernel will assume that non-mappable
+* I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the
+* kernel can still migrate the object to the mappable part, as a last
+* resort, if userspace ever CPU faults this object, but this might be
+* expensive, and so ideally should be avoided.
+*/
+#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
+   __u32 flags;
+   /**
+* @extensions: The chain of extensions to apply to this object.
+*
+* This will be useful in the future when we need to support several
+* different extensions, and we need to apply more than one when
+* creating the object. See struct i915_user_extension.
+*
+* If we d

Re: [Intel-gfx] [PATCH v4 12/16] uapi/drm/dg2: Introduce format modifier for DG2 clear color

2021-12-15 Thread Lionel Landwerlin

On 09/12/2021 17:45, Ramalingam C wrote:

From: Mika Kahola 

DG2 clear color render compression uses Tile4 layout. Therefore, we need
to define a new format modifier for uAPI to support clear color rendering.

Signed-off-by: Mika Kahola 
cc: Anshuman Gupta 
Signed-off-by: Juha-Pekka Heikkilä 
Signed-off-by: Ramalingam C 
---
  drivers/gpu/drm/i915/display/intel_fb.c| 8 
  drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 -
  include/uapi/drm/drm_fourcc.h  | 8 
  3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_fb.c 
b/drivers/gpu/drm/i915/display/intel_fb.c
index e15216f1cb82..f10e77cb5b4a 100644
--- a/drivers/gpu/drm/i915/display/intel_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fb.c
@@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] 
= {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
.display_ver = { 13, 14 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
+   }, {
+   .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
+   .display_ver = { 13, 14 },
+   .plane_caps = INTEL_PLANE_CAP_TILING_4 | 
INTEL_PLANE_CAP_CCS_RC_CC,
+
+   .ccs.cc_planes = BIT(1),
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
.display_ver = { 13, 14 },
@@ -559,6 +565,7 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, 
int color_plane)
else
return 512;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
+   case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
case I915_FORMAT_MOD_4_TILED:
/*
@@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct 
drm_framebuffer *fb,
case I915_FORMAT_MOD_Yf_TILED:
return 1 * 1024 * 1024;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
+   case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
return 16 * 1024;
default:
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c 
b/drivers/gpu/drm/i915/display/skl_universal_plane.c
index d80424194c75..9a89df9c0243 100644
--- a/drivers/gpu/drm/i915/display/skl_universal_plane.c
+++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c
@@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier)
return PLANE_CTL_TILED_4 |
PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE;
+   case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
+   return PLANE_CTL_TILED_4 | 
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
case I915_FORMAT_MOD_Y_TILED_CCS:
case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC:
return PLANE_CTL_TILED_Y | 
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2337,10 +2339,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc,
break;
case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */
if (HAS_4TILE(dev_priv)) {
-   if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+   u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
+ PLANE_CTL_CLEAR_COLOR_DISABLE;
+
+   if ((val & rc_mask) == rc_mask)
fb->modifier = 
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
fb->modifier = 
I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
+   else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
+   fb->modifier = 
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC;
else
fb->modifier = I915_FORMAT_MOD_4_TILED;
} else {
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 51fdda26844a..b155f69f2344 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -598,6 +598,14 @@ extern "C" {
   */
  #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
  


My colleague Nanley (Cc) had some requests for clarification on this 
new modifier.


In particular, which plane holds the clear color?


I guess it wouldn't hurt to also state, for each of the new modifiers 
defined in this series, how many planes there are and what data they contain.


Thanks,

-Lionel



+/*
+ * Intel color control surfaces (CCS) for DG2 clear color render compression.
+ *
+ * DG2 uses a unified compression format for clear color render compression.
+ * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
+ */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
+
  /*
   * Tiled, NV12MT, grouped in 64 (pi

Re: [PATCH v2] drm/syncobj: Deal with signalled fences in drm_syncobj_find_fence.

2021-12-08 Thread Lionel Landwerlin

On 08/12/2021 11:28, Christian König wrote:

Am 08.12.21 um 03:39 schrieb Bas Nieuwenhuizen:

dma_fence_chain_find_seqno only ever returns the top fence in the
chain or an unsignalled fence. Hence if we request a seqno that
is already signalled it returns a NULL fence. Some callers are
not prepared to handle this, like the syncobj transfer functions
for example.

This behavior is "new" with timeline syncobj and it looks like
not all callers were updated. To fix this behavior make sure
that a successful drm_syncobj_find_fence always returns a non-NULL
fence.

v2: Move the fix to drm_syncobj_find_fence from the transfer
 functions.

Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between 
binary and timeline v2")

Cc: sta...@vger.kernel.org
Signed-off-by: Bas Nieuwenhuizen 


Reviewed-by: Christian König 



Thanks!


Acked-by: Lionel Landwerlin 





---
  drivers/gpu/drm/drm_syncobj.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index fdd2ec87cdd1..11be91b5709b 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -404,8 +404,17 @@ int drm_syncobj_find_fence(struct drm_file 
*file_private,

    if (*fence) {
  ret = dma_fence_chain_find_seqno(fence, point);
-    if (!ret)
+    if (!ret) {
+    /* If the requested seqno is already signaled
+ * drm_syncobj_find_fence may return a NULL
+ * fence. To make sure the recipient gets
+ * signalled, use a new fence instead.
+ */
+    if (!*fence)
+    *fence = dma_fence_get_stub();
+
  goto out;
+    }
  dma_fence_put(*fence);
  } else {
  ret = -EINVAL;






Re: [PATCH] drm/syncobj: Deal with signalled fences in transfer.

2021-12-07 Thread Lionel Landwerlin

On 07/12/2021 13:00, Christian König wrote:

Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
On Tue, Dec 7, 2021 at 8:21 AM Christian König 
 wrote:

Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:

On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:

See the comments in the code. Basically if the seqno is already
signalled then we get a NULL fence. If we then put the NULL fence
in a binary syncobj it counts as unsignalled, making that syncobj
pretty much useless for all expected uses.

Not 100% sure about the transfer to a timeline syncobj but I
believe it is needed there too, as AFAICT the add_point function
assumes the fence isn't NULL.

Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between
binary and timeline v2")
Cc: sta...@vger.kernel.org
Signed-off-by: Bas Nieuwenhuizen 
---
   drivers/gpu/drm/drm_syncobj.c | 26 ++
   1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c
b/drivers/gpu/drm/drm_syncobj.c
index fdd2ec87cdd1..eb28a40400d2 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -861,6 +861,19 @@ static int
drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
    &fence);
   if (ret)
   goto err;
+
+    /* If the requested seqno is already signaled
drm_syncobj_find_fence may
+ * return a NULL fence. To make sure the recipient gets
signalled, use
+ * a new fence instead.
+ */
+    if (!fence) {
+    fence = dma_fence_allocate_private_stub();
+    if (!fence) {
+    ret = -ENOMEM;
+    goto err;
+    }
+    }
+


Shouldn't we fix drm_syncobj_find_fence() instead?

Mhm, now that you mention it. Bas, why do you think that
dma_fence_chain_find_seqno() may return NULL when the fence is already
signaled?

Double checking the code that should never ever happen.

Well, I tested the patch with
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14097/diffs?commit_id=d4c5c840f4e3839f9f5c1747a9034eb2b565f5c0


so I'm pretty sure it happens, and this patch fixes it, though I may
have misidentified what the code should do.

My reading is that the dma_fence_chain_for_each in
dma_fence_chain_find_seqno will never visit a signalled fence (unless
the top one is signalled), as dma_fence_chain_walk will never return a
signalled fence (it only returns on NULL or !signalled).
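
The reading above can be checked against a small userspace model. This is only a sketch under simplifying assumptions: the node type and both functions are hypothetical stand-ins, not the kernel's dma_fence_chain code, and the model ignores reference counting and fence contexts. It only reproduces the two properties described: the walk steps over signalled nodes, and the lookup stops at the first node whose predecessor seqno is below the requested point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one link of a fence chain. */
struct sketch_node {
	uint64_t seqno;
	bool signaled;
	struct sketch_node *prev; /* next older link, NULL at the end */
};

/* Model of the walk: step to the nearest older link that is not yet
 * signalled, or NULL if every older link is signalled. */
static struct sketch_node *sketch_walk(struct sketch_node *n)
{
	struct sketch_node *prev = n->prev;

	while (prev && prev->signaled)
		prev = prev->prev;
	return prev;
}

/* Model of the lookup: returns 0 on success and stores the link found in
 * *out, which can be NULL. Returns -1 if the point is not in the chain. */
static int sketch_find_seqno(struct sketch_node *top, uint64_t seqno,
			     struct sketch_node **out)
{
	struct sketch_node *cur;

	assert(out != NULL);
	if (!top || top->seqno < seqno)
		return -1;

	for (cur = top; cur; cur = sketch_walk(cur)) {
		uint64_t prev_seqno = cur->prev ? cur->prev->seqno : 0;

		if (prev_seqno < seqno)
			break; /* cur is the link covering this point */
	}

	*out = cur; /* NULL when the walk ran off the end */
	return 0;
}
```

With a chain of points 1, 2, 3 where 1 and 2 have signalled, asking for point 1 succeeds in this model but leaves a NULL pointer behind, which is the case the patch guards against.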


Ah, yes that suddenly makes more sense.


Happy to move this to drm_syncobj_find_fence.


No, I think that your current patch is fine.

That drm_syncobj_find_fence() only returns NULL when it can't find 
anything !signaled is correct behavior I think.



We should probably update the docs then :


 * Returns 0 on success or a negative error value on failure. On 
success @fence

 * contains a reference to the fence, which must be released by calling
 * dma_fence_put().


Looking at some of the kernel drivers, it looks like they don't all 
protect themselves against NULL pointers :



https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_gem.c#L1195

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1020
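
To make the caller-side concern concrete, here is a minimal sketch of the kind of guard a caller would need if a successful lookup can still hand back NULL. The types and the function are hypothetical and only model the decision; they are not taken from vc4 or amdgpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence. */
struct sketch_fence {
	bool signaled;
};

/* Hypothetical result of a fence lookup: return code plus pointer. */
struct sketch_lookup {
	int ret;                    /* 0 on success, negative on failure */
	struct sketch_fence *fence; /* may be NULL even when ret == 0 */
};

/* Decide whether the caller still has to wait on the looked-up fence.
 * A successful lookup with a NULL fence is treated as "already
 * signalled" instead of being dereferenced. */
static bool sketch_must_wait(const struct sketch_lookup *res)
{
	assert(res != NULL);

	if (res->ret)
		return false; /* lookup failed, error path handles it */
	if (!res->fence)
		return false; /* nothing unsignalled was found */
	return !res->fence->signaled;
}
```

Fixing the lookup so that it never returns NULL on success would make this guard unnecessary in every driver.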


-Lionel




Going to push your original patch if nobody has any more objections.

But somebody might want to take care of the IGT as well.

Regards,
Christian.


Regards,
Christian.


By returning a stub fence for the timeline case if there isn't one.


Because the same NULL fence check appears missing in amdgpu (and
probably other drivers).


Also we should have tests for this in IGT.

AMD contributed some tests when this code was written but they never
got reviewed :(


-Lionel



   chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
   if (!chain) {
   ret = -ENOMEM;
@@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file
*file_private,
    args->src_point, args->flags, &fence);
   if (ret)
   goto err;
+
+    /* If the requested seqno is already signaled
drm_syncobj_find_fence may
+ * return a NULL fence. To make sure the recipient gets
signalled, use
+ * a new fence instead.
+ */
+    if (!fence) {
+    fence = dma_fence_allocate_private_stub();
+    if (!fence) {
+    ret = -ENOMEM;
+    goto err;
+    }
+    }
+
   drm_syncobj_replace_fence(binary_syncobj, fence);
   dma_fence_put(fence);
   err:








Re: [PATCH] drm/syncobj: Deal with signalled fences in transfer.

2021-12-06 Thread Lionel Landwerlin

On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:

See the comments in the code. Basically if the seqno is already
signalled then we get a NULL fence. If we then put the NULL fence
in a binary syncobj it counts as unsignalled, making that syncobj
pretty much useless for all expected uses.

Not 100% sure about the transfer to a timeline syncobj but I
believe it is needed there too, as AFAICT the add_point function
assumes the fence isn't NULL.

Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline 
v2")
Cc: sta...@vger.kernel.org
Signed-off-by: Bas Nieuwenhuizen 
---
  drivers/gpu/drm/drm_syncobj.c | 26 ++
  1 file changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index fdd2ec87cdd1..eb28a40400d2 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct 
drm_file *file_private,
 &fence);
if (ret)
goto err;
+
+   /* If the requested seqno is already signaled drm_syncobj_find_fence may
+* return a NULL fence. To make sure the recipient gets signalled, use
+* a new fence instead.
+*/
+   if (!fence) {
+   fence = dma_fence_allocate_private_stub();
+   if (!fence) {
+   ret = -ENOMEM;
+   goto err;
+   }
+   }
+



Shouldn't we fix drm_syncobj_find_fence() instead?

By returning a stub fence for the timeline case if there isn't one.


Because the same NULL fence check appears missing in amdgpu (and 
probably other drivers).



Also we should have tests for this in IGT.

AMD contributed some tests when this code was written but they never got 
reviewed :(



-Lionel



chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
if (!chain) {
ret = -ENOMEM;
@@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file 
*file_private,
 args->src_point, args->flags, &fence);
if (ret)
goto err;
+
+   /* If the requested seqno is already signaled drm_syncobj_find_fence may
+* return a NULL fence. To make sure the recipient gets signalled, use
+* a new fence instead.
+*/
+   if (!fence) {
+   fence = dma_fence_allocate_private_stub();
+   if (!fence) {
+   ret = -ENOMEM;
+   goto err;
+   }
+   }
+
drm_syncobj_replace_fence(binary_syncobj, fence);
dma_fence_put(fence);
  err:





Re: [PATCH v5 00/15] drm/i915: Introduce Intel PXP

2021-07-27 Thread Lionel Landwerlin

On 16/07/2021 07:10, Daniele Ceraolo Spurio wrote:

PXP (Protected Xe Path) is an i915 component, available on
GEN12+, that helps to establish the hardware protected session
and manage the status of the alive software session, as well
as its life cycle.

The main changes in v5 are:

- Rebased to new proto_ctx implementation.

- Squashed all uapi changes in a single patch and slightly updated docs.

- Now handling mei_pxp loading after i915

Tested with: https://patchwork.freedesktop.org/series/87570/

Cc: Gaurav Kumar 
Cc: Chris Wilson 
Cc: Rodrigo Vivi 
Cc: Joonas Lahtinen 
Cc: Juston Li 
Cc: Alan Previn 
Cc: Lionel Landwerlin 
Cc: Jason Ekstrand 
Cc: Daniel Vetter 



Updated the Mesa series for GL/Vulkan.
UAPI looks good :

Acked-by: Lionel Landwerlin 

Cheers,

-Lionel


Re: [PATCH 31/53] drm/i915/dg2: Report INSTDONE_GEOM values in error state

2021-07-02 Thread Lionel Landwerlin

On 01/07/2021 23:24, Matt Roper wrote:

Xe_HPG adds some additional INSTDONE_GEOM debug registers; the Mesa team
has indicated that having these reported in the error state would be
useful for debugging GPU hangs.  These registers are replicated per-DSS
with gslice steering.

Cc: Lionel Landwerlin 
Signed-off-by: Matt Roper 



Thanks,


Acked-by: Lionel Landwerlin 



---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c|  7 +++
  drivers/gpu/drm/i915/gt/intel_engine_types.h |  3 +++
  drivers/gpu/drm/i915/i915_gpu_error.c| 10 --
  drivers/gpu/drm/i915/i915_reg.h  |  1 +
  4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index e1302e9c168b..b3c002e4ae9f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1220,6 +1220,13 @@ void intel_engine_get_instdone(const struct 
intel_engine_cs *engine,
  GEN7_ROW_INSTDONE);
}
}
+
+   if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55)) {
+   for_each_instdone_gslice_dss_xehp(i915, sseu, iter, 
slice, subslice)
+   instdone->geom_svg[slice][subslice] =
+   read_subslice_reg(engine, slice, 
subslice,
+ 
XEHPG_INSTDONE_GEOM_SVG);
+   }
} else if (GRAPHICS_VER(i915) >= 7) {
instdone->instdone =
intel_uncore_read(uncore, RING_INSTDONE(mmio_base));
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e917b7519f2b..93609d797ac2 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -80,6 +80,9 @@ struct intel_instdone {
u32 slice_common_extra[2];
u32 sampler[GEN_MAX_GSLICES][I915_MAX_SUBSLICES];
u32 row[GEN_MAX_GSLICES][I915_MAX_SUBSLICES];
+
+   /* Added in XeHPG */
+   u32 geom_svg[GEN_MAX_GSLICES][I915_MAX_SUBSLICES];
  };
  
  /*

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
b/drivers/gpu/drm/i915/i915_gpu_error.c
index c1e744b5ab47..4de7edc451ef 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -431,6 +431,7 @@ static void error_print_instdone(struct 
drm_i915_error_state_buf *m,
const struct sseu_dev_info *sseu = &ee->engine->gt->info.sseu;
int slice;
int subslice;
+   int iter;
  
  	err_printf(m, "  INSTDONE: 0x%08x\n",

   ee->instdone.instdone);
@@ -445,8 +446,6 @@ static void error_print_instdone(struct 
drm_i915_error_state_buf *m,
return;
  
  	if (GRAPHICS_VER_FULL(m->i915) >= IP_VER(12, 50)) {

-   int iter;
-
for_each_instdone_gslice_dss_xehp(m->i915, sseu, iter, slice, 
subslice)
err_printf(m, "  SAMPLER_INSTDONE[%d][%d]: 0x%08x\n",
   slice, subslice,
@@ -471,6 +470,13 @@ static void error_print_instdone(struct 
drm_i915_error_state_buf *m,
if (GRAPHICS_VER(m->i915) < 12)
return;
  
+	if (GRAPHICS_VER_FULL(m->i915) >= IP_VER(12, 55)) {

+   for_each_instdone_gslice_dss_xehp(m->i915, sseu, iter, slice, 
subslice)
+   err_printf(m, "  GEOM_SVGUNIT_INSTDONE[%d][%d]: 
0x%08x\n",
+  slice, subslice,
+  ee->instdone.geom_svg[slice][subslice]);
+   }
+
err_printf(m, "  SC_INSTDONE_EXTRA: 0x%08x\n",
   ee->instdone.slice_common_extra[0]);
err_printf(m, "  SC_INSTDONE_EXTRA2: 0x%08x\n",
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 35a42df1f2aa..d58864c7adc6 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2686,6 +2686,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
  #define GEN12_SC_INSTDONE_EXTRA2  _MMIO(0x7108)
  #define GEN7_SAMPLER_INSTDONE _MMIO(0xe160)
  #define GEN7_ROW_INSTDONE _MMIO(0xe164)
+#define XEHPG_INSTDONE_GEOM_SVG   _MMIO(0x666c)
  #define MCFG_MCR_SELECTOR _MMIO(0xfd0)
  #define SF_MCR_SELECTOR   _MMIO(0xfd8)
  #define GEN8_MCR_SELECTOR _MMIO(0xfdc)





Re: [Intel-gfx] [PATCH 3/3] drm/i915/uapi: Add query for L3 bank count

2021-06-10 Thread Lionel Landwerlin

On 10/06/2021 23:46, john.c.harri...@intel.com wrote:

From: John Harrison 

Various UMDs need to know the L3 bank count. So add a query API for it.

Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/intel_gt.c | 15 +++
  drivers/gpu/drm/i915/gt/intel_gt.h |  1 +
  drivers/gpu/drm/i915/i915_query.c  | 22 ++
  drivers/gpu/drm/i915/i915_reg.h|  1 +
  include/uapi/drm/i915_drm.h|  1 +
  5 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index 2161bf01ef8b..708bb3581d83 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -704,3 +704,18 @@ void intel_gt_info_print(const struct intel_gt_info *info,
  
  	intel_sseu_dump(&info->sseu, p);

  }
+
+int intel_gt_get_l3bank_count(struct intel_gt *gt)
+{
+   struct drm_i915_private *i915 = gt->i915;
+   intel_wakeref_t wakeref;
+   u32 fuse3;
+
+   if (GRAPHICS_VER(i915) < 12)
+   return -ENODEV;
+
+   with_intel_runtime_pm(gt->uncore->rpm, wakeref)
+   fuse3 = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3);
+
+   return hweight32(REG_FIELD_GET(GEN12_GT_L3_MODE_MASK, ~fuse3));
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index 7ec395cace69..46aa1cf4cf30 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -77,6 +77,7 @@ static inline bool intel_gt_is_wedged(const struct intel_gt 
*gt)
  
  void intel_gt_info_print(const struct intel_gt_info *info,

 struct drm_printer *p);
+int intel_gt_get_l3bank_count(struct intel_gt *gt);
  
  void intel_gt_watchdog_work(struct work_struct *work);
  
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c

index 96bd8fb3e895..0e92bb2d21b2 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -10,6 +10,7 @@
  #include "i915_perf.h"
  #include "i915_query.h"
  #include 
+#include "gt/intel_gt.h"
  
  static int copy_query_item(void *query_hdr, size_t query_sz,

   u32 total_length,
@@ -502,6 +503,26 @@ static int query_hwconfig_table(struct drm_i915_private 
*i915,
return hwconfig->size;
  }
  
+static int query_l3banks(struct drm_i915_private *i915,

+struct drm_i915_query_item *query_item)
+{
+   u32 banks;
+
+   if (query_item->length == 0)
+   return sizeof(banks);
+
+   if (query_item->length < sizeof(banks))
+   return -EINVAL;
+
+   banks = intel_gt_get_l3bank_count(&i915->gt);
+
+   if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
+&banks, sizeof(banks)))
+   return -EFAULT;
+
+   return sizeof(banks);
+}
+
  static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
struct drm_i915_query_item *query_item) = {
query_topology_info,
@@ -509,6 +530,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
query_perf_config,
query_memregion_info,
query_hwconfig_table,
+   query_l3banks,
  };
  
  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index eb13c601d680..e9ba88fe3db7 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3099,6 +3099,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
  #define   GEN10_MIRROR_FUSE3  _MMIO(0x9118)
  #define GEN10_L3BANK_PAIR_COUNT 4
  #define GEN10_L3BANK_MASK   0x0F
+#define GEN12_GT_L3_MODE_MASK 0xFF
  
  #define GEN8_EU_DISABLE0		_MMIO(0x9134)
  #define   GEN8_EU_DIS0_S0_MASK	0xff
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 87d369cae22a..20d18cca5066 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -2234,6 +2234,7 @@ struct drm_i915_query_item {
  #define DRM_I915_QUERY_PERF_CONFIG  3
  #define DRM_I915_QUERY_MEMORY_REGIONS   4
  #define DRM_I915_QUERY_HWCONFIG_TABLE   5
+#define DRM_I915_QUERY_L3_BANK_COUNT    6



A little bit of documentation about the format of the return data would 
be nice :)



-Lionel



  /* Must be kept compact -- no holes and well documented */
  
  	/**
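
As a rough illustration of what such documentation might pin down, the
sketch below models the query in plain userspace C. It assumes the
returned data is a single u32 holding the bank count, and that a zero
length only asks for the required size, as query_l3banks() in the patch
does. struct query_item, popcount32() and query_l3banks_sketch() are
hypothetical stand-ins for illustration, not the i915 uAPI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct drm_i915_query_item; not the real uAPI. */
struct query_item {
	int32_t length;	/* 0 means "report the size", otherwise the buffer size */
	void *data;	/* destination buffer supplied by the caller */
};

/* Population count, a userspace equivalent of the kernel's hweight32(). */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Enabled banks are the zero bits of the fuse value inside the 0xFF mask. */
static unsigned int l3bank_count_from_fuse(uint32_t fuse3)
{
	return popcount32(~fuse3 & 0xFF);
}

/* Models the two-call protocol: probe the size first, then fetch the data. */
static int query_l3banks_sketch(uint32_t fuse3, struct query_item *item)
{
	uint32_t banks;

	assert(item != NULL);

	if (item->length == 0)
		return (int)sizeof(banks);

	if (item->length < (int32_t)sizeof(banks))
		return -EINVAL;

	banks = l3bank_count_from_fuse(fuse3);
	memcpy(item->data, &banks, sizeof(banks));

	return (int)sizeof(banks);
}
```

A caller would first pass length 0 to learn the size, allocate that many
bytes, and call again with the buffer. Whether the count is per GT or
per device is exactly the kind of detail the documentation would need to
state.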





Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-04-29 Thread Lionel Landwerlin

On 29/04/2021 03:34, Umesh Nerlige Ramappa wrote:

Perf measurements rely on CPU and engine timestamps to correlate
events of interest across these time domains. Current mechanisms get
these timestamps separately and the calculated delta between these
timestamps lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us,
add a query that returns the engine and cpu timestamps captured as
close to each other as possible.

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used
   in perf_event_open. This clock id is used by the perf subsystem to
   return the appropriate cpu timestamp in perf events. Similarly, let
   the user pass the clockid to this query so that cpu timestamp
   corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the
   register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of
   register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.

v12: (Jason)
- Split cpu timestamp array into timestamp and delta for cleaner API

Signed-off-by: Umesh Nerlige Ramappa 
Reviewed-by: Lionel Landwerlin 



Thanks for the update :)


Reviewed-by: Lionel Landwerlin 



---
  drivers/gpu/drm/i915/i915_query.c | 148 ++
  include/uapi/drm/i915_drm.h   |  52 +++
  2 files changed, 200 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index fed337ad7b68..357c44e8177c 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -6,6 +6,8 @@
  
  #include 
  
+#include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_user.h"
  #include "i915_drv.h"
  #include "i915_perf.h"
  #include "i915_query.h"
@@ -90,6 +92,151 @@ static int query_topology_info(struct drm_i915_private 
*dev_priv,
return total_length;
  }
  
+typedef u64 (*__ktime_func_t)(void);
+static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
+{
+   /*
+* Use logic same as the perf subsystem to allow user to select the
+* reference clock id to be used for timestamps.
+*/
+   switch (clk_id) {
+   case CLOCK_MONOTONIC:
+   return &ktime_get_ns;
+   case CLOCK_MONOTONIC_RAW:
+   return &ktime_get_raw_ns;
+   case CLOCK_REALTIME:
+   return &ktime_get_real_ns;
+   case CLOCK_BOOTTIME:
+   return &ktime_get_boottime_ns;
+   case CLOCK_TAI:
+   return &ktime_get_clocktai_ns;
+   default:
+   return NULL;
+   }
+}
+
+static inline int
+__read_timestamps(struct intel_uncore *uncore,
+ i915_reg_t lower_reg,
+ i915_reg_t upper_reg,
+ u64 *cs_ts,
+ u64 *cpu_ts,
+ u64 *cpu_delta,
+ __ktime_func_t cpu_clock)
+{
+   u32 upper, lower, old_upper, loop = 0;
+
+   upper = intel_uncore_read_fw(uncore, upper_reg);
+   do {
+   *cpu_delta = local_clock();
+   *cpu_ts = cpu_clock();
+   lower = intel_uncore_read_fw(uncore, lower_reg);
+   *cpu_delta = local_clock() - *cpu_delta;
+   old_upper = upper;
+   upper = intel_uncore_read_fw(uncore, upper_reg);
+   } while (upper != old_upper && loop++ < 2);
+
+   *cs_ts = (u64)upper << 32 | lower;
+
+   return 0;
+}
+
+static int
+__query_cs_cycles(struct intel_engine_cs *engine,
+ u64 *cs_ts, u64 *cpu_ts, u64 *cpu_delta,
+ __ktime_func_t cpu_clock)
+{
+   struct intel_uncore *uncore = engine->uncore;
+   enum forcewake_domains fw_domains;
+   u32 base = engine->mmio_base;
+   intel_wakeref_t wakeref;
+   int ret;
+
+   fw_domains = intel_uncore_forcewake_for_reg(uncore,
+   RING_TIMESTAMP(base),
+   FW_REG_READ);
+
+   with_intel_runtime_pm(uncore->rpm, wakeref) {
+   spin_lock_irq(&uncore->lock);
+   intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+   ret = __read_timestamps(uncore,
+   RING_TIMESTAMP(base),
+   
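
For readers wondering how userspace might consume these values, here is
a minimal sketch of the correlation arithmetic. It assumes the query
hands back the engine cycle count, the engine cycle frequency in Hz, a
CPU timestamp in ns and the CPU delta in ns, as the changelog above
describes. struct cs_cycles_sample and both helpers are hypothetical
names, not the uAPI from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the values the changelog says the query returns. */
struct cs_cycles_sample {
	uint64_t cs_cycles;	/* engine cycle counter at capture time */
	uint64_t cs_frequency;	/* engine cycle frequency in Hz */
	uint64_t cpu_timestamp;	/* CPU time in ns, taken just before the read */
	uint64_t cpu_delta;	/* ns spent around the lower dword read */
};

/*
 * Map a later engine cycle count onto the CPU clock. Assumes
 * event_cycles >= s->cs_cycles. The whole seconds and the remainder are
 * converted separately so the multiplication by 1e9 does not overflow
 * for realistic frequencies.
 */
static uint64_t gpu_cycles_to_cpu_ns(const struct cs_cycles_sample *s,
				     uint64_t event_cycles)
{
	uint64_t delta, secs, rem;

	assert(s != NULL && s->cs_frequency != 0);

	delta = event_cycles - s->cs_cycles;
	secs = delta / s->cs_frequency;
	rem = delta % s->cs_frequency;

	return s->cpu_timestamp + secs * 1000000000ull +
	       rem * 1000000000ull / s->cs_frequency;
}

/* Latest plausible CPU time: the capture may have landed up to cpu_delta later. */
static uint64_t gpu_cycles_to_cpu_ns_latest(const struct cs_cycles_sample *s,
					    uint64_t event_cycles)
{
	return gpu_cycles_to_cpu_ns(s, event_cycles) + s->cpu_delta;
}
```

The pair of results brackets the event on the CPU timeline, so a small
cpu_delta translates directly into a tight correlation.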

Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-04-28 Thread Lionel Landwerlin

On 28/04/2021 23:45, Jason Ekstrand wrote:

On Wed, Apr 28, 2021 at 3:14 PM Lionel Landwerlin
 wrote:

On 28/04/2021 22:54, Jason Ekstrand wrote:

On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
 wrote:

On 28/04/2021 22:24, Jason Ekstrand wrote:

On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula  wrote:

On Tue, 27 Apr 2021, Umesh Nerlige Ramappa  
wrote:

Perf measurements rely on CPU and engine timestamps to correlate
events of interest across these time domains. Current mechanisms get
these timestamps separately and the calculated delta between these
timestamps lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us,
add a query that returns the engine and cpu timestamps captured as
close to each other as possible.

Cc: dri-devel, Jason and Daniel for review.

Thanks!

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used
in perf_event_open. This clock id is used by the perf subsystem to
return the appropriate cpu timestamp in perf events. Similarly, let
the user pass the clockid to this query so that cpu timestamp
corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the
register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of
register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.

Signed-off-by: Umesh Nerlige Ramappa 
---
   drivers/gpu/drm/i915/i915_query.c | 145 ++
   include/uapi/drm/i915_drm.h   |  48 ++
   2 files changed, 193 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index fed337ad7b68..2594b93901ac 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -6,6 +6,8 @@

   #include 

+#include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_user.h"
   #include "i915_drv.h"
   #include "i915_perf.h"
   #include "i915_query.h"
@@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private 
*dev_priv,
return total_length;
   }

+typedef u64 (*__ktime_func_t)(void);
+static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
+{
+ /*
+  * Use logic same as the perf subsystem to allow user to select the
+  * reference clock id to be used for timestamps.
+  */
+ switch (clk_id) {
+ case CLOCK_MONOTONIC:
+ return &ktime_get_ns;
+ case CLOCK_MONOTONIC_RAW:
+ return &ktime_get_raw_ns;
+ case CLOCK_REALTIME:
+ return &ktime_get_real_ns;
+ case CLOCK_BOOTTIME:
+ return &ktime_get_boottime_ns;
+ case CLOCK_TAI:
+ return &ktime_get_clocktai_ns;
+ default:
+ return NULL;
+ }
+}
+
+static inline int
+__read_timestamps(struct intel_uncore *uncore,
+   i915_reg_t lower_reg,
+   i915_reg_t upper_reg,
+   u64 *cs_ts,
+   u64 *cpu_ts,
+   __ktime_func_t cpu_clock)
+{
+ u32 upper, lower, old_upper, loop = 0;
+
+ upper = intel_uncore_read_fw(uncore, upper_reg);
+ do {
+ cpu_ts[1] = local_clock();
+ cpu_ts[0] = cpu_clock();
+ lower = intel_uncore_read_fw(uncore, lower_reg);
+ cpu_ts[1] = local_clock() - cpu_ts[1];
+ old_upper = upper;
+ upper = intel_uncore_read_fw(uncore, upper_reg);
+ } while (upper != old_upper && loop++ < 2);
+
+ *cs_ts = (u64)upper << 32 | lower;
+
+ return 0;
+}
+
+static int
+__query_cs_cycles(struct intel_engine_cs *engine,
+   u64 *cs_ts, u64 *cpu_ts,
+   __ktime_func_t cpu_clock)
+{
+ struct intel_uncore *uncore = engine->uncore;
+ enum forcewake_domains fw_domains;
+ u32 base = engine->mmio_base;
+ intel_wakeref_t wakeref;
+ int ret;
+
+ fw_domains = intel_uncore_forcewake_for_reg(uncore,
+ RING_TIMESTAMP(base),
+ FW_REG_READ);
+
+ with_intel_runtime_pm(uncore->rpm, wakeref) {
+ spin_lock_irq(&uncore->lock);
+ intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+ ret = __read_timestamps(uncore,
+ RING_TIMESTAMP(base),
+   
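
The clock id dispatch in __clock_id_to_func() has a straightforward
userspace analogue, sketched below under the assumption of a Linux/glibc
environment. read_clock_ns(), clock_func_t and clock_id_to_func() are
hypothetical helpers built on clock_gettime(); they are not part of the
patch. As in the kernel switch, an unsupported clock id yields NULL so
the caller can reject the request early.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*clock_func_t)(void);

/* Read one clock and flatten the timespec into nanoseconds. */
static uint64_t read_clock_ns(clockid_t id)
{
	struct timespec ts;
	int err = clock_gettime(id, &ts);

	/* Every clock in the table below is expected to exist on Linux. */
	assert(err == 0);
	(void)err;

	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static uint64_t monotonic_ns(void) { return read_clock_ns(CLOCK_MONOTONIC); }
static uint64_t realtime_ns(void) { return read_clock_ns(CLOCK_REALTIME); }
#ifdef CLOCK_MONOTONIC_RAW
static uint64_t monotonic_raw_ns(void) { return read_clock_ns(CLOCK_MONOTONIC_RAW); }
#endif
#ifdef CLOCK_BOOTTIME
static uint64_t boottime_ns(void) { return read_clock_ns(CLOCK_BOOTTIME); }
#endif
#ifdef CLOCK_TAI
static uint64_t tai_ns(void) { return read_clock_ns(CLOCK_TAI); }
#endif

/* Resolve the clock once, outside any critical section; NULL if unsupported. */
static clock_func_t clock_id_to_func(clockid_t id)
{
	switch (id) {
	case CLOCK_MONOTONIC:
		return monotonic_ns;
	case CLOCK_REALTIME:
		return realtime_ns;
#ifdef CLOCK_MONOTONIC_RAW
	case CLOCK_MONOTONIC_RAW:
		return monotonic_raw_ns;
#endif
#ifdef CLOCK_BOOTTIME
	case CLOCK_BOOTTIME:
		return boottime_ns;
#endif
#ifdef CLOCK_TAI
	case CLOCK_TAI:
		return tai_ns;
#endif
	default:
		return NULL;
	}
}
```

Resolving the function pointer up front mirrors the v6 note about moving
the switch out of the spinlock: the timed section then only makes one
indirect call.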

Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-04-28 Thread Lionel Landwerlin

On 28/04/2021 23:14, Lionel Landwerlin wrote:

On 28/04/2021 22:54, Jason Ekstrand wrote:

On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
 wrote:

On 28/04/2021 22:24, Jason Ekstrand wrote:

On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula 
 wrote:


On Tue, 27 Apr 2021, Umesh Nerlige Ramappa 
 wrote:


Perf measurements rely on CPU and engine timestamps to correlate
events of interest across these time domains. Current mechanisms get
these timestamps separately and the calculated delta between these
timestamps lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us,
add a query that returns the engine and cpu timestamps captured as
close to each other as possible.

Cc: dri-devel, Jason and Daniel for review.

Thanks!

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used
   in perf_event_open. This clock id is used by the perf subsystem to
   return the appropriate cpu timestamp in perf events. Similarly, let
   the user pass the clockid to this query so that cpu timestamp
   corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the
   register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of
   register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.

Signed-off-by: Umesh Nerlige Ramappa 
---
  drivers/gpu/drm/i915/i915_query.c | 145 ++
  include/uapi/drm/i915_drm.h   |  48 ++
  2 files changed, 193 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index fed337ad7b68..2594b93901ac 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -6,6 +6,8 @@

  #include 

+#include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_user.h"
  #include "i915_drv.h"
  #include "i915_perf.h"
  #include "i915_query.h"
@@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private *dev_priv,
   return total_length;
  }

+typedef u64 (*__ktime_func_t)(void);
+static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
+{
+ /*
+  * Use logic same as the perf subsystem to allow user to select the
+  * reference clock id to be used for timestamps.
+  */
+ switch (clk_id) {
+ case CLOCK_MONOTONIC:
+ return &ktime_get_ns;
+ case CLOCK_MONOTONIC_RAW:
+ return &ktime_get_raw_ns;
+ case CLOCK_REALTIME:
+ return &ktime_get_real_ns;
+ case CLOCK_BOOTTIME:
+ return &ktime_get_boottime_ns;
+ case CLOCK_TAI:
+ return &ktime_get_clocktai_ns;
+ default:
+ return NULL;
+ }
+}
+
+static inline int
+__read_timestamps(struct intel_uncore *uncore,
+   i915_reg_t lower_reg,
+   i915_reg_t upper_reg,
+   u64 *cs_ts,
+   u64 *cpu_ts,
+   __ktime_func_t cpu_clock)
+{
+ u32 upper, lower, old_upper, loop = 0;
+
+ upper = intel_uncore_read_fw(uncore, upper_reg);
+ do {
+ cpu_ts[1] = local_clock();
+ cpu_ts[0] = cpu_clock();
+ lower = intel_uncore_read_fw(uncore, lower_reg);
+ cpu_ts[1] = local_clock() - cpu_ts[1];
+ old_upper = upper;
+ upper = intel_uncore_read_fw(uncore, upper_reg);
+ } while (upper != old_upper && loop++ < 2);
+
+ *cs_ts = (u64)upper << 32 | lower;
+
+ return 0;
+}
+
+static int
+__query_cs_cycles(struct intel_engine_cs *engine,
+   u64 *cs_ts, u64 *cpu_ts,
+   __ktime_func_t cpu_clock)
+{
+ struct intel_uncore *uncore = engine->uncore;
+ enum forcewake_domains fw_domains;
+ u32 base = engine->mmio_base;
+ intel_wakeref_t wakeref;
+ int ret;
+
+ fw_domains = intel_uncore_forcewake_for_reg(uncore,
+ RING_TIMESTAMP(base),
+ FW_REG_READ);
+
+ with_intel_runtime_pm(uncore->rpm, wakeref) {
+ spin_lock_irq(&uncore->lock);
+ intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+ ret = __read_timestamps(uncore,
+ RING_TIMESTAMP(base),
+ RING_TIMESTAMP_UDW(base),
+ cs_ts,
+ cpu_ts,
+ cpu_clock)
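
The upper/lower/upper read in __read_timestamps() guards against the
lower dword wrapping between the two halves of the read. The sketch
below reproduces that loop against a simulated counter so the behaviour
can be checked in userspace. struct fake_counter and the read_*()
helpers are hypothetical and stand in for the real register accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit counter exposed as two 32-bit halves. */
struct fake_counter {
	uint64_t value;
	uint64_t step;	/* how far the counter advances on every register read */
};

static uint32_t read_lower(struct fake_counter *c)
{
	c->value += c->step;
	return (uint32_t)c->value;
}

static uint32_t read_upper(struct fake_counter *c)
{
	c->value += c->step;
	return (uint32_t)(c->value >> 32);
}

/*
 * Read upper, then lower, then upper again. If the two upper reads
 * differ, the lower dword wrapped in between, so retry a bounded number
 * of times, as the loop in the patch does.
 */
static uint64_t read_counter64(struct fake_counter *c)
{
	uint32_t upper, lower, old_upper, loop = 0;

	assert(c != NULL);

	upper = read_upper(c);
	do {
		lower = read_lower(c);
		old_upper = upper;
		upper = read_upper(c);
	} while (upper != old_upper && loop++ < 2);

	return (uint64_t)upper << 32 | lower;
}
```

Starting the simulated counter just below a 32-bit boundary shows the
point: without the retry, a stale lower dword would be paired with the
new upper dword and the result would land almost 2^32 cycles in the
future.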

Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-04-28 Thread Lionel Landwerlin

On 28/04/2021 22:54, Jason Ekstrand wrote:

On Wed, Apr 28, 2021 at 2:50 PM Lionel Landwerlin
 wrote:

On 28/04/2021 22:24, Jason Ekstrand wrote:

On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula  wrote:

On Tue, 27 Apr 2021, Umesh Nerlige Ramappa  
wrote:

Perf measurements rely on CPU and engine timestamps to correlate
events of interest across these time domains. Current mechanisms get
these timestamps separately and the calculated delta between these
timestamps lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us,
add a query that returns the engine and cpu timestamps captured as
close to each other as possible.

Cc: dri-devel, Jason and Daniel for review.

Thanks!

v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used
   in perf_event_open. This clock id is used by the perf subsystem to
   return the appropriate cpu timestamp in perf events. Similarly, let
   the user pass the clockid to this query so that cpu timestamp
   corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the
   register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of
   register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.

Signed-off-by: Umesh Nerlige Ramappa 
---
  drivers/gpu/drm/i915/i915_query.c | 145 ++
  include/uapi/drm/i915_drm.h   |  48 ++
  2 files changed, 193 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index fed337ad7b68..2594b93901ac 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -6,6 +6,8 @@

  #include 

+#include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_user.h"
  #include "i915_drv.h"
  #include "i915_perf.h"
  #include "i915_query.h"
@@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private 
*dev_priv,
   return total_length;
  }

+typedef u64 (*__ktime_func_t)(void);
+static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
+{
+ /*
+  * Use logic same as the perf subsystem to allow user to select the
+  * reference clock id to be used for timestamps.
+  */
+ switch (clk_id) {
+ case CLOCK_MONOTONIC:
+ return &ktime_get_ns;
+ case CLOCK_MONOTONIC_RAW:
+ return &ktime_get_raw_ns;
+ case CLOCK_REALTIME:
+ return &ktime_get_real_ns;
+ case CLOCK_BOOTTIME:
+ return &ktime_get_boottime_ns;
+ case CLOCK_TAI:
+ return &ktime_get_clocktai_ns;
+ default:
+ return NULL;
+ }
+}
+
+static inline int
+__read_timestamps(struct intel_uncore *uncore,
+   i915_reg_t lower_reg,
+   i915_reg_t upper_reg,
+   u64 *cs_ts,
+   u64 *cpu_ts,
+   __ktime_func_t cpu_clock)
+{
+ u32 upper, lower, old_upper, loop = 0;
+
+ upper = intel_uncore_read_fw(uncore, upper_reg);
+ do {
+ cpu_ts[1] = local_clock();
+ cpu_ts[0] = cpu_clock();
+ lower = intel_uncore_read_fw(uncore, lower_reg);
+ cpu_ts[1] = local_clock() - cpu_ts[1];
+ old_upper = upper;
+ upper = intel_uncore_read_fw(uncore, upper_reg);
+ } while (upper != old_upper && loop++ < 2);
+
+ *cs_ts = (u64)upper << 32 | lower;
+
+ return 0;
+}
+
+static int
+__query_cs_cycles(struct intel_engine_cs *engine,
+   u64 *cs_ts, u64 *cpu_ts,
+   __ktime_func_t cpu_clock)
+{
+ struct intel_uncore *uncore = engine->uncore;
+ enum forcewake_domains fw_domains;
+ u32 base = engine->mmio_base;
+ intel_wakeref_t wakeref;
+ int ret;
+
+ fw_domains = intel_uncore_forcewake_for_reg(uncore,
+ RING_TIMESTAMP(base),
+ FW_REG_READ);
+
+ with_intel_runtime_pm(uncore->rpm, wakeref) {
+ spin_lock_irq(&uncore->lock);
+ intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+ ret = __read_timestamps(uncore,
+ RING_TIMESTAMP(base),
+ RING_TIMESTAMP_UDW(base),
+  

Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-04-28 Thread Lionel Landwerlin

On 28/04/2021 22:24, Jason Ekstrand wrote:

On Wed, Apr 28, 2021 at 3:43 AM Jani Nikula  wrote:

On Tue, 27 Apr 2021, Umesh Nerlige Ramappa  
wrote:

Perf measurements rely on CPU and engine timestamps to correlate
events of interest across these time domains. Current mechanisms get
these timestamps separately and the calculated delta between these
timestamps lacks sufficient accuracy.

To improve the accuracy of these time measurements to within a few us,
add a query that returns the engine and cpu timestamps captured as
close to each other as possible.

Cc: dri-devel, Jason and Daniel for review.

Thanks!


v2: (Tvrtko)
- document clock reference used
- return cpu timestamp always
- capture cpu time just before lower dword of cs timestamp

v3: (Chris)
- use uncore-rpm
- use __query_cs_timestamp helper

v4: (Lionel)
- Kernel perf subsystem allows users to specify the clock id to be used
   in perf_event_open. This clock id is used by the perf subsystem to
   return the appropriate cpu timestamp in perf events. Similarly, let
   the user pass the clockid to this query so that cpu timestamp
   corresponds to the clock id requested.

v5: (Tvrtko)
- Use normal ktime accessors instead of fast versions
- Add more uApi documentation

v6: (Lionel)
- Move switch out of spinlock

v7: (Chris)
- cs_timestamp is a misnomer, use cs_cycles instead
- return the cs cycle frequency as well in the query

v8:
- Add platform and engine specific checks

v9: (Lionel)
- Return 2 cpu timestamps in the query - captured before and after the
   register read

v10: (Chris)
- Use local_clock() to measure time taken to read lower dword of
   register and return it to user.

v11: (Jani)
- IS_GEN deprecated. Use GRAPHICS_VER instead.

Signed-off-by: Umesh Nerlige Ramappa 
---
  drivers/gpu/drm/i915/i915_query.c | 145 ++
  include/uapi/drm/i915_drm.h   |  48 ++
  2 files changed, 193 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index fed337ad7b68..2594b93901ac 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -6,6 +6,8 @@

  #include 

+#include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_user.h"
  #include "i915_drv.h"
  #include "i915_perf.h"
  #include "i915_query.h"
@@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private 
*dev_priv,
   return total_length;
  }

+typedef u64 (*__ktime_func_t)(void);
+static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
+{
+ /*
+  * Use logic same as the perf subsystem to allow user to select the
+  * reference clock id to be used for timestamps.
+  */
+ switch (clk_id) {
+ case CLOCK_MONOTONIC:
+ return &ktime_get_ns;
+ case CLOCK_MONOTONIC_RAW:
+ return &ktime_get_raw_ns;
+ case CLOCK_REALTIME:
+ return &ktime_get_real_ns;
+ case CLOCK_BOOTTIME:
+ return &ktime_get_boottime_ns;
+ case CLOCK_TAI:
+ return &ktime_get_clocktai_ns;
+ default:
+ return NULL;
+ }
+}
+
+static inline int
+__read_timestamps(struct intel_uncore *uncore,
+   i915_reg_t lower_reg,
+   i915_reg_t upper_reg,
+   u64 *cs_ts,
+   u64 *cpu_ts,
+   __ktime_func_t cpu_clock)
+{
+ u32 upper, lower, old_upper, loop = 0;
+
+ upper = intel_uncore_read_fw(uncore, upper_reg);
+ do {
+ cpu_ts[1] = local_clock();
+ cpu_ts[0] = cpu_clock();
+ lower = intel_uncore_read_fw(uncore, lower_reg);
+ cpu_ts[1] = local_clock() - cpu_ts[1];
+ old_upper = upper;
+ upper = intel_uncore_read_fw(uncore, upper_reg);
+ } while (upper != old_upper && loop++ < 2);
+
+ *cs_ts = (u64)upper << 32 | lower;
+
+ return 0;
+}
+
+static int
+__query_cs_cycles(struct intel_engine_cs *engine,
+   u64 *cs_ts, u64 *cpu_ts,
+   __ktime_func_t cpu_clock)
+{
+ struct intel_uncore *uncore = engine->uncore;
+ enum forcewake_domains fw_domains;
+ u32 base = engine->mmio_base;
+ intel_wakeref_t wakeref;
+ int ret;
+
+ fw_domains = intel_uncore_forcewake_for_reg(uncore,
+ RING_TIMESTAMP(base),
+ FW_REG_READ);
+
+ with_intel_runtime_pm(uncore->rpm, wakeref) {
+ spin_lock_irq(&uncore->lock);
+ intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+ ret = __read_timestamps(uncore,
+ RING_TIMESTAMP(base),
+ RING_TIMESTAMP_UDW(base),
+ cs_ts,
+ cpu_ts,
+ cpu_clock);
+
+ intel_uncore_forcewake_put__locked(uncore, fw_domains);
+ spin_unlock_irq(&uncore->lock);
+ }
+
+

Re: [PATCH] drm: fix drm_mode_create_blob comment

2021-03-03 Thread Lionel Landwerlin

On 02/03/2021 20:48, Simon Ser wrote:

On Tuesday, March 2nd, 2021 at 7:47 PM, Lionel Landwerlin 
 wrote:


Thanks Simon. Do you have the rights to push this patch?

Ah, since you're asking about this, it probably means you don't have the
rights. I'll push the patch now to drm-misc-next.


Thanks a bunch!


-Lionel

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm: fix drm_mode_create_blob comment

2021-03-02 Thread Lionel Landwerlin

Thanks Simon. Do you have the rights to push this patch?

-Lionel

On 02/03/2021 20:46, Simon Ser wrote:

Good catch!

Reviewed-by: Simon Ser 





[PATCH] drm: fix drm_mode_create_blob comment

2021-03-02 Thread Lionel Landwerlin
Just a silly mistake

Signed-off-by: Lionel Landwerlin 
Suggested-by: Ben Widawsky 
---
 include/uapi/drm/drm_mode.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index b49fbf2bdc408..93b494f704b91 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -993,7 +993,7 @@ struct drm_format_modifier {
 };
 
 /**
- * struct drm_mode_create_blob - Create New block property
+ * struct drm_mode_create_blob - Create New blob property
  *
  * Create a new 'blob' data property, copying length bytes from data pointer,
  * and returning new blob ID.
-- 
2.30.1



Re: [PATCH v6 44/80] docs: gpu: i915.rst: Fix several C duplication warnings

2020-10-16 Thread Lionel Landwerlin

On 13/10/2020 14:53, Mauro Carvalho Chehab wrote:

As reported by Sphinx:

./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1147: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_wait_unlocked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1169: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_poll_wait'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1189: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_read'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2669: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_stream_enable'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2734: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_stream_disable'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2820: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_stream_init'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3010: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_read'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3098: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_poll_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3129: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_poll'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3152: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_enable_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3181: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_disable_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3273: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_ioctl'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3296: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_destroy_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3321: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_release'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3379: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_open_ioctl_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3534: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'read_properties_unlocked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3717: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_open_ioctl'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3760: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_register'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3789: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_unregister'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4009: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_add_config_ioctl'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4162: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_remove_config_ioctl'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4260: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_init'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4423: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_fini'.

With Sphinx 3, C declarations can't be duplicated anymore,
so let's exclude those from the other internals 

Re: [PATCH v6 44/80] docs: gpu: i915.rst: Fix several C duplication warnings

2020-10-16 Thread Lionel Landwerlin

On 16/10/2020 14:50, Jani Nikula wrote:

On Fri, 16 Oct 2020, Lionel Landwerlin  wrote:

On 16/10/2020 14:37, Mauro Carvalho Chehab wrote:

On Fri, 16 Oct 2020 14:01:07 +0300
Joonas Lahtinen  wrote:


+ Lionel

Can you please take a look at best resolving the below problem.

Maybe we should eliminate the duplicate declarations? Updating such
a list manually seems error prone to me.

For Kernel 5.10, IMO the best is to apply this patch as-is, as any
other thing would need to be postponed, and we want 5.10 free of
doc warnings.


That's odd... Most of the functions are documented. Is it that we're
missing the "()" after the function name maybe?

The problem is we first include named functions, and then go on to
include everything again, duplicating the documentation for the named
functions.

BR,
Jani.



Thanks, now the patch makes sense.


-Lionel







-Lionel



Yet, when I wrote this one, I almost took a different approach:
to implement something like the @*group (or \*group) directives that
exist in doxygen:

https://www.doxygen.nl/manual/grouping.html

If something like that gets added to kernel-doc syntax, then
one could do something like:

/**
 * DOC: some foo description
 * @group foo
 */
   
/**
 * foo1 - do some foo things
 * @group foo
...
 */

/**
 * foo2 - do some other foo things
 * @group foo
...
 */

/**
 * bar - do bar things
 * @group bar
...
 */


And then, at kernel-doc markup:

FOO
===

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
:group: foo


BAR
===
.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
:group: bar


I suspect that something like that would be a lot easier to maintain.

Once something like that is implemented, it should be easy to also
have something like this:

OTHERS
==
.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
:export:
:not-grouped:

in order to pick other functions that aren't grouped.

I suspect that implementing something like that in kernel-doc.pl
won't be hard.

Regards,
Mauro


Regards, Joonas

Quoting Mauro Carvalho Chehab (2020-10-13 14:53:59)

As reported by Sphinx:

  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:1147: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_oa_wait_unlocked'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:1169: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_oa_poll_wait'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:1189: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_oa_read'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:2669: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_oa_stream_enable'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:2734: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_oa_stream_disable'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:2820: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_oa_stream_init'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:3010: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_perf_read'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:3098: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_perf_poll_locked'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:3129: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_perf_poll'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:3152: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_perf_enable_locked'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:3181: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_perf_disable_locked'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i915/i915_perf.c:3273: WARNING: Duplicate C declaration, also 
defined in 'gpu/i915'.
  Declaration is 'i915_perf_ioctl'.
  ./Documentation/gpu/i915:646: 
./drivers/gpu/drm/i

Re: [PATCH v6 44/80] docs: gpu: i915.rst: Fix several C duplication warnings

2020-10-16 Thread Lionel Landwerlin

On 13/10/2020 14:53, Mauro Carvalho Chehab wrote:

As reported by Sphinx:

./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1147: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_wait_unlocked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1169: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_poll_wait'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1189: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_read'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2669: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_stream_enable'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2734: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_stream_disable'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2820: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_oa_stream_init'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3010: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_read'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3098: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_poll_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3129: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_poll'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3152: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_enable_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3181: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_disable_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3273: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_ioctl'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3296: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_destroy_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3321: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_release'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3379: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_open_ioctl_locked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3534: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'read_properties_unlocked'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3717: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_open_ioctl'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3760: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_register'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3789: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_unregister'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4009: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_add_config_ioctl'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4162: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_remove_config_ioctl'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4260: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_init'.
./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:4423: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
Declaration is 'i915_perf_fini'.

With Sphinx 3, C declarations can't be duplicated anymore,
so let's exclude those from the other internals 

Re: [PATCH v6 44/80] docs: gpu: i915.rst: Fix several C duplication warnings

2020-10-16 Thread Lionel Landwerlin

On 16/10/2020 14:37, Mauro Carvalho Chehab wrote:

On Fri, 16 Oct 2020 14:01:07 +0300, Joonas Lahtinen wrote:


+ Lionel

Can you please take a look at best resolving the below problem.

Maybe we should eliminate the duplicate declarations? Updating such
a list manually seems error prone to me.

For Kernel 5.10, IMO the best is to apply this patch as-is, as any
other thing would need to be postponed, and we want 5.10 free of
doc warnings.



That's odd... Most of the functions are documented. Is it that we're 
missing the "()" after the function name maybe?



-Lionel




Yet, when I wrote this one, I almost took a different approach:
to implement something like the @*group (or \*group) directives that
exist in doxygen:

https://www.doxygen.nl/manual/grouping.html

If something like that gets added to kernel-doc syntax, then
one could do something like:

/**
 * DOC: some foo description
 * @group foo
 */

/**
 * foo1 - do some foo things
 * @group foo
...
 */

/**
 * foo2 - do some other foo things
 * @group foo
...
 */

/**
 * bar - do bar things
 * @group bar
...
 */


And then, at kernel-doc markup:

FOO
===

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
   :group: foo


BAR
===

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
   :group: bar


I suspect that something like that would be a lot easier to maintain.

Once something like that is implemented, it should be easy to also
have something like this:

OTHERS
======

.. kernel-doc:: drivers/gpu/drm/i915/i915_perf.c
   :export:
   :not-grouped:

in order to pick other functions that aren't grouped.

I suspect that implementing something like that at kernel-doc.pl
won't be hard.

Regards,
Mauro


Regards, Joonas

Quoting Mauro Carvalho Chehab (2020-10-13 14:53:59)

As reported by Sphinx:

 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1147: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_oa_wait_unlocked'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1169: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_oa_poll_wait'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:1189: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_oa_read'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2669: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_oa_stream_enable'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2734: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_oa_stream_disable'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:2820: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_oa_stream_init'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3010: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_perf_read'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3098: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_perf_poll_locked'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3129: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_perf_poll'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3152: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_perf_enable_locked'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3181: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_perf_disable_locked'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3273: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_perf_ioctl'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3296: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_perf_destroy_locked'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3321: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_perf_release'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3379: 
WARNING: Duplicate C declaration, also defined in 'gpu/i915'.
 Declaration is 'i915_perf_open_ioctl_locked'.
 ./Documentation/gpu/i915:646: ./drivers/gpu/drm/i915/i915_perf.c:3534: 
WARN

Re: [PATCH] drm/syncobj: Tune down unordered timeline DRM_ERROR

2020-08-01 Thread Lionel Landwerlin

On 01/08/2020 12:26, Daniel Vetter wrote:

Userspace can provoke this, we generally don't allow userspace to spam
dmesg. Tune it down to debug. Unfortunately we don't have easy access
to the drm_device here (not at all without changing a few things), so
leave it as old style dmesg output for now.

References: https://patchwork.freedesktop.org/series/80146/
Signed-off-by: Daniel Vetter 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
Cc: "Christian König" 
---
  drivers/gpu/drm/drm_syncobj.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 3bf73971daf3..6e74e6745eca 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -297,7 +297,7 @@ void drm_syncobj_add_point(struct drm_syncobj *syncobj,
prev = drm_syncobj_fence_get(syncobj);
/* You are adding an unorder point to timeline, which could cause 
payload returned from query_ioctl is 0! */
if (prev && prev->seqno >= point)
-   DRM_ERROR("You are adding an unorder point to timeline!\n");
+   DRM_DEBUG("You are adding an unorder point to timeline!\n");
dma_fence_chain_init(chain, prev, fence, point);
rcu_assign_pointer(syncobj->fence, &chain->base);
  


Thanks,

Acked-by: Lionel Landwerlin 

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 1/2] Revert "dma-buf: Report signaled links inside dma-fence-chain"

2020-07-02 Thread Lionel Landwerlin

On 25/06/2020 15:43, Christian König wrote:

On 25.06.20 at 14:34, Lionel Landwerlin wrote:

This reverts commit 5de376bb434f80a13138f0ebedc8351ab73d8b0d.

This change breaks synchronization of a timeline.
dma_fence_chain_find_seqno() might be a bit of a confusing name but
this function is not trying to find a particular seqno; it is supposed to
give a fence to wait on for a particular point in the timeline.

In a timeline, a particular value is reached when all the points up to
and including that value have signaled.

Signed-off-by: Lionel Landwerlin 


Reviewed-by: Christian König 



Now that you are a maintainer, feel free to merge this and the test changes.


Thanks,


-Lionel





---
  drivers/dma-buf/dma-fence-chain.c | 7 ---
  1 file changed, 7 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c

index c435bbba851c..3d123502ff12 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -99,12 +99,6 @@ int dma_fence_chain_find_seqno(struct dma_fence 
**pfence, uint64_t seqno)

  return -EINVAL;
    dma_fence_chain_for_each(*pfence, &chain->base) {
-    if ((*pfence)->seqno < seqno) { /* already signaled */
-    dma_fence_put(*pfence);
-    *pfence = NULL;
-    break;
-    }
-
  if ((*pfence)->context != chain->base.context ||
  to_dma_fence_chain(*pfence)->prev_seqno < seqno)
  break;
@@ -228,7 +222,6 @@ EXPORT_SYMBOL(dma_fence_chain_ops);
   * @chain: the chain node to initialize
   * @prev: the previous fence
   * @fence: the current fence
- * @seqno: the sequence number (syncpt) of the fence within the chain
   *
   * Initialize a new chain node and either start a new chain or add 
the node to

   * the existing chain of the previous fence.






Re: [PATCH] dma-buf: document dma-fence-chain purpose/behavior

2020-06-26 Thread Lionel Landwerlin

On 26/06/2020 17:22, Chris Wilson wrote:

Quoting Lionel Landwerlin (2020-06-26 13:21:00)

Trying to explain a bit how this thing works. In my opinion diagrams
are a bit easier to understand than words.

Signed-off-by: Lionel Landwerlin 
---
  drivers/dma-buf/dma-fence-chain.c | 37 +++
  1 file changed, 37 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
index 3d123502ff12..ac90ddf37b55 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -9,6 +9,43 @@
  
  #include 
  
+/**

+ * DOC: DMA fence chains overview
+ *
+ * DMA fence chains, represented by &struct dma_fence_chain, are a kernel
+ * internal synchronization primitive providing a wrapping mechanism of other
+ * DMA fences in the form a single link list.
+ *
+ * One of the use case of this primitive is to implement Vulkan timeline
+ * semaphores (see VK_KHR_timeline_semaphore extension or Vulkan specification
+ * 1.2).
+ *
+ * Each DMA fence chain item wraps 2 items :
+ *
+ * - A previous DMA fence.
+ *
+ * - A DMA fence associated to the current &struct dma_fence_chain.
+ *
+ * A DMA fence chain becomes signaled when its previous fence as well as its
+ * associated fence are signaled. If a chain of dma fence chains is created,
+ * this property recurses, meaning that any dma fence chain element in the
+ * list becomes signaled only if its associated fence and all the previous
+ * fences in the chain are also signaled.
+ *
+ * A DMA fence chain's seqno is specified through dma_fence_chain_init(). This
+ * value is lower bound to the seqno of the previous fence to ensure the chain
+ * is monotically increasing.
+ *
+ * By traversing the chain's linked list, one can compute a seqno number
+ * associated with the chain such that is the highest number for which all
+ * previous fences have signaled.

Next fence - 1 == highest seqno for all previous fences.

Ok, what about the end point then? If you ask for a seqno higher than
the last fence. Since that is not yet defined, it is an error, right?



Correct, find_seqno() will return -EINVAL in that case.
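
To make the error case concrete, here is a minimal userspace sketch of the bound check being discussed. It is a model, not the kernel implementation: `model_chain` and `model_find_seqno_check()` are hypothetical names introduced for illustration, only the range check is modelled, and treating seqno 0 as "nothing to wait on" is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the newest link of a chain. */
struct model_chain {
	uint64_t last_seqno;	/* seqno of the most recently added point */
};

/*
 * Returns 0 when @seqno names a point the chain already covers, and
 * -EINVAL when it lies beyond the last point, i.e. when no fence that
 * could be waited upon has been added yet.
 */
static int model_find_seqno_check(const struct model_chain *chain,
				  uint64_t seqno)
{
	if (seqno == 0)
		return 0;	/* assumption: nothing to wait for */

	if (chain == NULL || chain->last_seqno < seqno)
		return -EINVAL;

	return 0;
}
```

With a chain whose last point is 3, asking for 2 or 3 succeeds in this model, while asking for 4 yields -EINVAL because that point has not been defined yet.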


-Lionel



Otherwise, we could interpret the highest possible seqno for the last
fence as meaning U64_MAX.
-Chris





Re: [PATCH] dma-buf: document dma-fence-chain purpose/behavior

2020-06-26 Thread Lionel Landwerlin

On 26/06/2020 15:43, Daniel Vetter wrote:

On Fri, Jun 26, 2020 at 2:21 PM Lionel Landwerlin wrote:

Trying to explain a bit how this thing works. In my opinion diagrams
are a bit easier to understand than words.

kerneldoc supports in-line DOT graphs, see e.g.

https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#overview

If that doesn't work, then you can include a full-blown svg too.

And yes, for this a quick DOT graph that explains how things connect
sounds like the perfect use of a diagram.

Cheers, Daniel


Thanks!

Though I'm thinking I need a few to show the signaling behavior.

Not sure how tractable that is with DOT/SVG.

My last attempt was a series of slides...


-Lionel



[PATCH] dma-buf: document dma-fence-chain purpose/behavior

2020-06-26 Thread Lionel Landwerlin
Trying to explain a bit how this thing works. In my opinion diagrams
are a bit easier to understand than words.

Signed-off-by: Lionel Landwerlin 
---
 drivers/dma-buf/dma-fence-chain.c | 37 +++
 1 file changed, 37 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
index 3d123502ff12..ac90ddf37b55 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -9,6 +9,43 @@
 
 #include 
 
+/**
+ * DOC: DMA fence chains overview
+ *
+ * DMA fence chains, represented by &struct dma_fence_chain, are a kernel
+ * internal synchronization primitive providing a wrapping mechanism of other
+ * DMA fences in the form a single link list.
+ *
+ * One of the use case of this primitive is to implement Vulkan timeline
+ * semaphores (see VK_KHR_timeline_semaphore extension or Vulkan specification
+ * 1.2).
+ *
+ * Each DMA fence chain item wraps 2 items :
+ *
+ * - A previous DMA fence.
+ *
+ * - A DMA fence associated to the current &struct dma_fence_chain.
+ *
+ * A DMA fence chain becomes signaled when its previous fence as well as its
+ * associated fence are signaled. If a chain of dma fence chains is created,
+ * this property recurses, meaning that any dma fence chain element in the
+ * list becomes signaled only if its associated fence and all the previous
+ * fences in the chain are also signaled.
+ *
+ * A DMA fence chain's seqno is specified through dma_fence_chain_init(). This
+ * value is lower bound to the seqno of the previous fence to ensure the chain
+ * is monotically increasing.
+ *
+ * By traversing the chain's linked list, one can compute a seqno number
+ * associated with the chain such that is the highest number for which all
+ * previous fences have signaled.
+ *
+ * One can also traverse the chain's linked list to find a &struct
+ * dma_fence_chain that when signaled guarantees that all previous fences in
+ * the chain are signaled. dma_fence_chain_find_seqno() provides this
+ * functionality.
+ */
+
 static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
 
 /**
-- 
2.27.0



Re: [Intel-gfx] [PATCH 2/2] dma-buf: fix dma-fence-chain out of order test

2020-06-25 Thread Lionel Landwerlin

On 25/06/2020 16:47, Chris Wilson wrote:

Quoting Lionel Landwerlin (2020-06-25 14:23:25)

On 25/06/2020 16:18, Chris Wilson wrote:

Quoting Lionel Landwerlin (2020-06-25 13:34:43)

There was probably a misunderstanding of how the dma-fence-chain is
supposed to work or what dma_fence_chain_find_seqno() is supposed to
return.

dma_fence_chain_find_seqno() is here to give us the fence to wait upon
for a particular point in the timeline. The timeline progresses only
when all the points prior to a given number have completed.

Hmm, the question was what point is it supposed to wait for.

For the simple chain of [1, 3], does 1 being signaled imply that all
points up to 3 are signaled, or does 3 not being signaled imply that all
points after 1 are not. If that's mentioned already somewhere, my bad.
If not, could you put the answer somewhere.
-Chris

In [1, 3], if 1 is signaled, the timeline value is 1. And find_seqno(2)
should return NULL.


In the out_of_order selftest the chain was [1, 2, 3], 2 was signaled and
the test was expecting no fence to be returned by find_seqno(2).

But we still have to wait on 1 to complete before find_seqno(2) can
return NULL (as in you don't have to wait on anything).

* scratches head

I thought it was meant to be expecting fc.chain[1] to still be present
as the chain at that point was not yet signaled.



You're right that the point is not yet signaled.

But it doesn't need to stay in the chain if you can wait on a previous 
point.



chain[1] gets removed as we walk the chain backward in dma_fence_chain_walk.


-Lionel




Oh well, a mistake compounded. :|
-Chris





Re: [Intel-gfx] [PATCH 2/2] dma-buf: fix dma-fence-chain out of order test

2020-06-25 Thread Lionel Landwerlin

On 25/06/2020 16:18, Chris Wilson wrote:

Quoting Lionel Landwerlin (2020-06-25 13:34:43)

There was probably a misunderstanding of how the dma-fence-chain is
supposed to work or what dma_fence_chain_find_seqno() is supposed to
return.

dma_fence_chain_find_seqno() is here to give us the fence to wait upon
for a particular point in the timeline. The timeline progresses only
when all the points prior to a given number have completed.

Hmm, the question was what point is it supposed to wait for.

For the simple chain of [1, 3], does 1 being signaled imply that all
points up to 3 are signaled, or does 3 not being signaled imply that all
points after 1 are not. If that's mentioned already somewhere, my bad.
If not, could you put the answer somewhere.
-Chris


In [1, 3], if 1 is signaled, the timeline value is 1. And find_seqno(2) 
should return NULL.



In the out_of_order selftest the chain was [1, 2, 3], 2 was signaled and 
the test was expecting no fence to be returned by find_seqno(2).


But we still have to wait on 1 to complete before find_seqno(2) can 
return NULL (as in you don't have to wait on anything).



Hope that answers the question.
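
The [1, 2, 3] case above can be sketched with a small userspace model. This is not the kernel code: `model_link`, `model_init()`, `model_walk()` and `model_find_seqno()` are hypothetical names, `signaled` stands for the state of the fence a link wraps, and reference counting, RCU and the seqno 0 special case are left out. It assumes the walk drops links whose fence has already signaled, as described for the selftest.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one chain link (one timeline point). */
struct model_link {
	uint64_t seqno;		/* timeline point of this link */
	uint64_t prev_seqno;	/* seqno of the link it was chained to, 0 if none */
	bool signaled;		/* state of the fence wrapped by this link */
	struct model_link *prev;
};

/* Chain @link after @prev, recording the seqno it was chained to. */
static void model_init(struct model_link *link, struct model_link *prev,
		       uint64_t seqno)
{
	link->seqno = seqno;
	link->prev_seqno = prev ? prev->seqno : 0;
	link->signaled = false;
	link->prev = prev;
}

/* Step to the previous link, unlinking links whose fence has signaled. */
static struct model_link *model_walk(struct model_link *link)
{
	while (link->prev && link->prev->signaled)
		link->prev = link->prev->prev;

	return link->prev;
}

/*
 * Store in @out the link to wait upon for @seqno, or NULL when there is
 * nothing left to wait for. Returns -EINVAL if @seqno is beyond the
 * newest point.
 */
static int model_find_seqno(struct model_link *head, uint64_t seqno,
			    struct model_link **out)
{
	struct model_link *it;

	assert(head != NULL && out != NULL);

	if (head->seqno < seqno)
		return -EINVAL;

	for (it = head; it; it = model_walk(it)) {
		/* This link is the oldest one still covering @seqno. */
		if (it->prev_seqno < seqno)
			break;
	}

	*out = it;
	return 0;
}
```

Building 1-2-3 with model_init(), signaling only point 2 and asking for seqno 2 yields the link for point 1 in this model. Once point 1 signals as well, the same query yields NULL, meaning there is nothing left to wait on.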


-Lionel



[PATCH 1/2] Revert "dma-buf: Report signaled links inside dma-fence-chain"

2020-06-25 Thread Lionel Landwerlin
This reverts commit 5de376bb434f80a13138f0ebedc8351ab73d8b0d.

This change breaks synchronization of a timeline.
dma_fence_chain_find_seqno() might be a bit of a confusing name but
this function is not trying to find a particular seqno; it is supposed to
give a fence to wait on for a particular point in the timeline.

In a timeline, a particular value is reached when all the points up to
and including that value have signaled.

Signed-off-by: Lionel Landwerlin 
---
 drivers/dma-buf/dma-fence-chain.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
index c435bbba851c..3d123502ff12 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -99,12 +99,6 @@ int dma_fence_chain_find_seqno(struct dma_fence **pfence, 
uint64_t seqno)
return -EINVAL;
 
dma_fence_chain_for_each(*pfence, &chain->base) {
-   if ((*pfence)->seqno < seqno) { /* already signaled */
-   dma_fence_put(*pfence);
-   *pfence = NULL;
-   break;
-   }
-
if ((*pfence)->context != chain->base.context ||
to_dma_fence_chain(*pfence)->prev_seqno < seqno)
break;
@@ -228,7 +222,6 @@ EXPORT_SYMBOL(dma_fence_chain_ops);
  * @chain: the chain node to initialize
  * @prev: the previous fence
  * @fence: the current fence
- * @seqno: the sequence number (syncpt) of the fence within the chain
  *
  * Initialize a new chain node and either start a new chain or add the node to
  * the existing chain of the previous fence.
-- 
2.27.0



[PATCH 2/2] dma-buf: fix dma-fence-chain out of order test

2020-06-25 Thread Lionel Landwerlin
There was probably a misunderstanding of how the dma-fence-chain is
supposed to work or what dma_fence_chain_find_seqno() is supposed to
return.

dma_fence_chain_find_seqno() is here to give us the fence to wait upon
for a particular point in the timeline. The timeline progresses only
when all the points prior to a given number have completed.

Signed-off-by: Lionel Landwerlin 
Fixes: dc2f7e67a28a5c ("dma-buf: Exercise dma-fence-chain under selftests")
---
 drivers/dma-buf/st-dma-fence-chain.c | 43 ++--
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c 
b/drivers/dma-buf/st-dma-fence-chain.c
index 5d45ba7ba3cd..9525f7f56119 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -318,15 +318,16 @@ static int find_out_of_order(void *arg)
goto err;
}
 
-   if (fence && fence != fc.chains[1]) {
+   /*
+* We signaled the middle fence (2) of the 1-2-3 chain. The behavior
+* of the dma-fence-chain is to make us wait for all the fences up to
+* the point we want. Since fence 1 is still not signaled, this what
+* we should get as fence to wait upon (fence 2 being garbage
+* collected during the traversal of the chain).
+*/
+   if (fence != fc.chains[0]) {
pr_err("Incorrect chain-fence.seqno:%lld reported for completed 
seqno:2\n",
-  fence->seqno);
-
-   dma_fence_get(fence);
-   err = dma_fence_chain_find_seqno(&fence, 2);
-   dma_fence_put(fence);
-   if (err)
-   pr_err("Reported %d for finding self!\n", err);
+  fence ? fence->seqno : 0);
 
err = -EINVAL;
}
@@ -415,20 +416,18 @@ static int __find_race(void *arg)
if (!fence)
goto signal;
 
-   err = dma_fence_chain_find_seqno(&fence, seqno);
-   if (err) {
-   pr_err("Reported an invalid fence for find-self:%d\n",
-  seqno);
-   dma_fence_put(fence);
-   break;
-   }
-
-   if (fence->seqno < seqno) {
-   pr_err("Reported an earlier fence.seqno:%lld for 
seqno:%d\n",
-  fence->seqno, seqno);
-   err = -EINVAL;
-   dma_fence_put(fence);
-   break;
+   /*
+* We can only find ourselves if we are on fence we were
+* looking for.
+*/
+   if (fence->seqno == seqno) {
+   err = dma_fence_chain_find_seqno(&fence, seqno);
+   if (err) {
+   pr_err("Reported an invalid fence for 
find-self:%d\n",
+  seqno);
+   dma_fence_put(fence);
+   break;
+   }
}
 
dma_fence_put(fence);
-- 
2.27.0



Re: [Intel-gfx] [PATCH 2/2] RFC drm/i915: Export per-client debug tracing

2020-03-01 Thread Lionel Landwerlin

On 01/03/2020 17:52, Chris Wilson wrote:

Rather than put sensitive, and often voluminous, user details into a
global dmesg, report the error and debug messages directly back to the
user via the kernel tracing mechanism.



Sounds really nice. Don't you want the existing global tracing to be the 
default at least until a client does a get_trace?



-Lionel




Signed-off-by: Chris Wilson 
Cc: Steven Rostedt (VMware) 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 104 ++-
  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 124 ++
  drivers/gpu/drm/i915/gem/i915_gem_pages.c |   6 +-
  drivers/gpu/drm/i915/i915_drv.h   |   4 +
  drivers/gpu/drm/i915/i915_gem.c   |   5 +-
  include/uapi/drm/i915_drm.h   |   7 +
  6 files changed, 156 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index e525ead073f7..c136a8c90e27 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -81,6 +81,8 @@
  
  #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
  
+#define CTX_TRACE(ctx, ...) TRACE((ctx)->file_priv->trace, __VA_ARGS__)

+
  static struct i915_global_gem_context {
struct i915_global base;
struct kmem_cache *slab_luts;
@@ -158,8 +160,12 @@ lookup_user_engine(struct i915_gem_context *ctx,
engine = intel_engine_lookup_user(ctx->i915,
  ci->engine_class,
  ci->engine_instance);
-   if (!engine)
+   if (!engine) {
+   CTX_TRACE(ctx,
+ "Unknown engine {class:%d, instance:%d}\n",
+ ci->engine_class, ci->engine_instance);
return ERR_PTR(-EINVAL);
+   }
  
  		idx = engine->legacy_idx;

} else {
@@ -762,8 +768,6 @@ i915_gem_create_context(struct drm_i915_private *i915, 
unsigned int flags)
  
  		ppgtt = i915_ppgtt_create(&i915->gt);

if (IS_ERR(ppgtt)) {
-   drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n",
-   PTR_ERR(ppgtt));
context_close(ctx);
return ERR_CAST(ppgtt);
}
@@ -1461,14 +1465,15 @@ set_engines__load_balance(struct i915_user_extension 
__user *base, void *data)
return -EFAULT;
  
  	if (idx >= set->engines->num_engines) {

-   drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
-   idx, set->engines->num_engines);
+   CTX_TRACE(set->ctx,
+ "Invalid placement value, %d >= %d\n",
+ idx, set->engines->num_engines);
return -EINVAL;
}
  
  	idx = array_index_nospec(idx, set->engines->num_engines);

if (set->engines->engines[idx]) {
-   drm_dbg(&i915->drm,
+   CTX_TRACE(set->ctx,
"Invalid placement[%d], already occupied\n", idx);
return -EEXIST;
}
@@ -1505,9 +1510,9 @@ set_engines__load_balance(struct i915_user_extension 
__user *base, void *data)
   ci.engine_class,
   ci.engine_instance);
if (!siblings[n]) {
-   drm_dbg(&i915->drm,
-   "Invalid sibling[%d]: { class:%d, inst:%d }\n",
-   n, ci.engine_class, ci.engine_instance);
+   CTX_TRACE(set->ctx,
+ "Invalid sibling[%d]: { class:%d, inst:%d 
}\n",
+ n, ci.engine_class, ci.engine_instance);
err = -EINVAL;
goto out_siblings;
}
@@ -1551,15 +1556,15 @@ set_engines__bond(struct i915_user_extension __user 
*base, void *data)
return -EFAULT;
  
  	if (idx >= set->engines->num_engines) {

-   drm_dbg(&i915->drm,
-   "Invalid index for virtual engine: %d >= %d\n",
-   idx, set->engines->num_engines);
+   CTX_TRACE(set->ctx,
+ "Invalid index for virtual engine: %d >= %d\n",
+ idx, set->engines->num_engines);
return -EINVAL;
}
  
  	idx = array_index_nospec(idx, set->engines->num_engines);

if (!set->engines->engines[idx]) {
-   drm_dbg(&i915->drm, "Invalid engine at %d\n", idx);
+   CTX_TRACE(set->ctx, "Invalid engine at %d\n", idx);
return -EINVAL;
}
virtual = set->engines->engines[idx]->engine;
@@ -1580,9 +1585,9 @@ set_engines__bond(struct i915_user_extension __user 
*base, void *data

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lionel Landwerlin

On 28/02/2020 13:46, Michel Dänzer wrote:

On 2020-02-28 12:02 p.m., Erik Faye-Lund wrote:

On Fri, 2020-02-28 at 10:43 +, Daniel Stone wrote:

On Fri, 28 Feb 2020 at 10:06, Erik Faye-Lund wrote:

On Fri, 2020-02-28 at 11:40 +0200, Lionel Landwerlin wrote:

Yeah, changes on vulkan drivers or backend compilers should be fairly
sandboxed.

We also have tools that only work for intel stuff, that should never
trigger anything on other people's HW.

Could something be worked out using the tags?

I think so! We have the pre-defined environment variable
CI_MERGE_REQUEST_LABELS, and we can do variable conditions:

https://docs.gitlab.com/ee/ci/yaml/#onlyvariablesexceptvariables

That sounds like a pretty neat middle-ground to me. I just hope
that
new pipelines are triggered if new labels are added, because not
everyone is allowed to set labels, and sometimes people forget...

There's also this which is somewhat more robust:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2569

I'm not sure it's more robust, but yeah, that's a useful tool too.

The reason I'm skeptical about the robustness is that we'll miss
testing if this misses a path.

Surely missing a path will happen less often than an MR missing a
label. (Users who aren't members of the project can't even set labels
for an MR.)



Sounds like a good alternative to tags.


-Lionel

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

2020-02-28 Thread Lionel Landwerlin

On 28/02/2020 11:28, Erik Faye-Lund wrote:

On Fri, 2020-02-28 at 13:37 +1000, Dave Airlie wrote:

On Fri, 28 Feb 2020 at 07:27, Daniel Vetter 
wrote:

Hi all,

You might have read the short take in the X.org board meeting
minutes
already, here's the long version.

The good news: gitlab.fd.o has become very popular with our
communities, and is used extensively. This especially includes all
the
CI integration. Modern development process and tooling, yay!

The bad news: The cost in growth has also been tremendous, and it's
breaking our bank account. With reasonable estimates for continued
growth we're expecting hosting expenses totalling 75k USD this
year,
and 90k USD next year. With the current sponsors we've set up we
can't
sustain that. We estimate that hosting expenses for gitlab.fd.o
without any of the CI features enabled would total 30k USD, which
is
within X.org's ability to support through various sponsorships,
mostly
through XDC.

Note that X.org no longer sponsors any CI runners itself,
we've stopped that. The huge additional expenses are all just in
storing and serving build artifacts and images to outside CI
runners
sponsored by various companies. A related topic is that with the
growth in fd.o it's becoming infeasible to maintain it all on
volunteer admin time. X.org is therefore also looking for admin
sponsorship, at least medium term.

Assuming that we want cash flow reserves for one year of
gitlab.fd.o
(without CI support) and a trimmed XDC and assuming no sponsor
payment
meanwhile, we'd have to cut CI services somewhere between May and
June
this year. The board is of course working on acquiring sponsors,
but
filling a shortfall of this magnitude is neither easy nor quick
work,
and we therefore decided to give an early warning as soon as
possible.
Any help in finding sponsors for fd.o is very much appreciated.

a) Ouch.

b) we probably need to take a large step back here.


I kinda agree, but maybe the step doesn't have to be *too* large?

I wonder if we could solve this by restructuring the project a bit. I'm
talking purely from a Mesa point of view here, so it might not solve
the full problem, but:

1. It feels silly that we need to test changes to e.g. the i965 driver
on dragonboards. We only have a big "do not run CI at all" escape
hatch.



Yeah, changes on vulkan drivers or backend compilers should be fairly 
sandboxed.


We also have tools that only work for intel stuff, that should never 
trigger anything on other people's HW.


Could something be worked out using the tags?


-Lionel



Re: [PATCH v2] drm/syncobj: Add documentation for timeline syncobj

2020-01-20 Thread Lionel Landwerlin

On 14/01/2020 16:25, Christian König wrote:

Am 14.01.20 um 13:19 schrieb Lionel Landwerlin:

We've added a set of new APIs to manipulate syncobjs holding timelines
of dma_fence. This adds a bit of documentation about how this works.

v2: Small language nits (Lionel)

Signed-off-by: Lionel Landwerlin 
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 


Reviewed-by: Christian König 


Thanks for the review Christian.

Feel free to merge this commit whenever, I don't think I have commit rights.


Cheers,


-Lionel



[PATCH v2] drm/syncobj: Add documentation for timeline syncobj

2020-01-14 Thread Lionel Landwerlin
We've added a set of new APIs to manipulate syncobjs holding timelines
of dma_fence. This adds a bit of documentation about how this works.

v2: Small language nits (Lionel)

Signed-off-by: Lionel Landwerlin 
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
 drivers/gpu/drm/drm_syncobj.c | 87 +--
 1 file changed, 74 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 669c93fe2500..42d46414f767 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -43,27 +43,66 @@
  *  - Signal a syncobj (set a trivially signaled fence)
  *  - Wait for a syncobj's fence to appear and be signaled
  *
+ * The syncobj userspace API also provides operations to manipulate a syncobj
+ * in terms of a timeline of struct &dma_fence_chain rather than a single
+ * struct &dma_fence, through the following operations:
+ *
+ *   - Signal a given point on the timeline
+ *   - Wait for a given point to appear and/or be signaled
+ *   - Import and export from/to a given point of a timeline
+ *
  * At it's core, a syncobj is simply a wrapper around a pointer to a struct
  * &dma_fence which may be NULL.
  * When a syncobj is first created, its pointer is either NULL or a pointer
  * to an already signaled fence depending on whether the
  * &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to
  * &DRM_IOCTL_SYNCOBJ_CREATE.
- * When GPU work which signals a syncobj is enqueued in a DRM driver,
- * the syncobj fence is replaced with a fence which will be signaled by the
- * completion of that work.
- * When GPU work which waits on a syncobj is enqueued in a DRM driver, the
- * driver retrieves syncobj's current fence at the time the work is enqueued
- * waits on that fence before submitting the work to hardware.
- * If the syncobj's fence is NULL, the enqueue operation is expected to fail.
- * All manipulation of the syncobjs's fence happens in terms of the current
- * fence at the time the ioctl is called by userspace regardless of whether
- * that operation is an immediate host-side operation (signal or reset) or
- * or an operation which is enqueued in some driver queue.
- * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to
- * manipulate a syncobj from the host by resetting its pointer to NULL or
+ *
+ * If the syncobj is considered as a binary (its state is either signaled or
+ * unsignaled) primitive, when GPU work is enqueued in a DRM driver to signal
+ * the syncobj, the syncobj's fence is replaced with a fence which will be
+ * signaled by the completion of that work.
+ * If the syncobj is considered as a timeline primitive, when GPU work is
+ * enqueued in a DRM driver to signal a given point of the syncobj, a new
+ * struct &dma_fence_chain is created, pointing to the DRM driver's fence and
+ * also to the previous fence that was in the syncobj. The new struct
+ * &dma_fence_chain fence replaces the syncobj's fence and will be signaled by
+ * completion of the DRM driver's work and also any work associated with the
+ * fence previously in the syncobj.
+ *
+ * When GPU work which waits on a syncobj is enqueued in a DRM driver, at the
+ * time the work is enqueued, it waits on the syncobj's fence before
+ * submitting the work to hardware. That fence is either:
+ *
+ *- The syncobj's current fence if the syncobj is considered as a binary
+ *  primitive.
+ *- The struct &dma_fence associated with a given point if the syncobj is
+ *  considered as a timeline primitive.
+ *
+ * If the syncobj's fence is NULL or not present in the syncobj's timeline,
+ * the enqueue operation is expected to fail.
+ *
+ * With binary syncobj, all manipulation of the syncobj's fence happens in
+ * terms of the current fence at the time the ioctl is called by userspace
+ * regardless of whether that operation is an immediate host-side operation
+ * (signal or reset) or an operation which is enqueued in some driver
+ * queue. &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used
+ * to manipulate a syncobj from the host by resetting its pointer to NULL or
  * setting its pointer to a fence which is already signaled.
  *
+ * With a timeline syncobj, all manipulation of the syncobj's fence happens in
+ * terms of a u64 value referring to a point in the timeline. See
+ * dma_fence_chain_find_seqno() to see how a given point is found in the
+ * timeline.
+ *
+ * Note that applications should be careful to always use the timeline set of
+ * ioctl() when dealing with a syncobj considered as a timeline. Using the
+ * binary set of ioctl() with a syncobj considered as a timeline could result
+ * in incorrect synchronization. The use of binary syncobj is supported through
+ * the timeline set of ioctl() by using a point value of 0; this will reproduce
+ * the behavior o

Re: [PATCH 1/1] drm/syncobj: add sideband payload

2019-10-18 Thread Lionel Landwerlin
Following earlier discussions in particular with James Jones at Nvidia, 
I think we established this patch/feature is not needed.


This feature was intended to fix a failing test on our implementation.
I've just submitted an MR to delete that test:
https://gitlab.freedesktop.org/mesa/crucible/merge_requests/55

I think it is invalid.

We should be able to work around the submission thread race condition 
issue by just resetting a binary semaphore to be signaled in 
vkQueueSubmit before submitting the workload, so that further waits 
happen on the right dma-fence.
This might be a bit more costly (more ioctls) than the feature in this 
patch, so I'm looking for your feedback on this.


Thanks a lot,

-Lionel

On 17/09/2019 16:06, Lionel Landwerlin wrote:

Thanks David,

I'll try to fix the test to match AMD's restrictions.

The v7 here was to fix another existing test : 
dEQP-VK.api.external.fence.sync_fd.transference_temporary


Cheers,

-Lionel

On 17/09/2019 15:36, Zhou, David(ChunMing) wrote:

Hi Lionel,
The update looks good to me.
I tried your signal-order test, but it seems it isn't ready to run, so
I'm not sure I can reproduce this issue.


-David
----
*From:* Lionel Landwerlin 
*Sent:* Tuesday, September 17, 2019 7:03 PM
*To:* dri-devel@lists.freedesktop.org 
*Cc:* Lionel Landwerlin ; Zhou, 
David(ChunMing) ; Koenig, Christian 
; Jason Ekstrand 

*Subject:* [PATCH 1/1] drm/syncobj: add sideband payload
Vulkan timeline semaphores allow signaling to happen on a point of the
timeline before all of its dependencies have been created.

The 2 current implementations (AMD/Intel) of the Vulkan spec on top of
the Linux kernel use a thread to wait for the dependencies of a given
point to materialize, and delay actual submission to the kernel driver
until the wait completes.

If a binary semaphore is submitted for signaling alongside a timeline
semaphore that is waiting for completion, the drm syncobj associated
with that binary semaphore will not have a DMA fence associated with it
by the time vkQueueSubmit() returns. This, and the fact that a binary
semaphore can be signaled and unsignaled before its DMA fences
materialize, means that we cannot rely on the fence within the syncobj
alone; we also need a sideband payload verifying that the fence in the
syncobj matches the last submission from the Vulkan API point of view.

This change adds a sideband payload that is incremented for a signaled
syncobj when vkQueueSubmit() is called. The next vkQueueSubmit()
waiting on the syncobj will read the sideband payload, wait for a
fence chain element with a seqno greater than or equal to the sideband
payload value to be added to the fence chain, and use that fence to
trigger the submission on the kernel driver.

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: Fix a bunch of blatant mistakes
    Store payload atomically (Chris)

v6: Only touch atomic value once (Jason)

v7: Update atomic value when importing sync file

Signed-off-by: Lionel Landwerlin 
Reviewed-by: David Zhou  (v6)
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c    |  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 64 --
 include/drm/drm_syncobj.h  |  9 +
 include/uapi/drm/drm.h | 17 +
 5 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h 
b/drivers/gpu/drm/drm_internal.h

index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct 
drm_device *dev, void *data,

   struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+    struct drm_file *file_private);

 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int 
indent,

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
   DRM_RENDER_ALLOW),
 DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
   DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+ DRM_RENDER_ALLOW),
+
 DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, 
drm_crtc_get_sequence_ioctl, 0),
 DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crt

Re: [PATCH] drm/drm_syncobj: Dead code removal

2019-10-04 Thread Lionel Landwerlin

On 04/10/2019 15:16, Zbigniew Kempczyński wrote:

Remove dead code, likely overlooked during the review process.

Signed-off-by: Zbigniew Kempczyński 
Cc: Chunming Zhou 
Cc: Daniel Vetter 
Cc: Jason Ekstrand 
---
  drivers/gpu/drm/drm_syncobj.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 4b5c7b0ed714..21a22e39c9fa 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -192,8 +192,6 @@ static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
if (!fence || dma_fence_chain_find_seqno(&fence, wait->point)) {
dma_fence_put(fence);
list_add_tail(&wait->node, &syncobj->cb_list);
-   } else if (!fence) {
-   wait->fence = dma_fence_get_stub();
} else {
wait->fence = fence;
}
@@ -856,8 +854,6 @@ static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
if (!fence || dma_fence_chain_find_seqno(&fence, wait->point)) {
dma_fence_put(fence);
return;
-   } else if (!fence) {
-   wait->fence = dma_fence_get_stub();
} else {
wait->fence = fence;
}


Like Chris said, dma_fence_chain_find_seqno() will update the fence 
pointer, so a subsequent check might not be dealing with the same value.
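
To make that concrete, here is a minimal standalone sketch of the pattern. It is not kernel code: toy_fence, toy_find_seqno() and toy_classify() are hypothetical names made up for this example, and the sketch simply assumes a lookup helper that can return success while clearing the pointer it was given. Under that assumption, the second "!fence" check is reachable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fence in a chain; not struct dma_fence. */
struct toy_fence {
	uint64_t seqno;
	int signaled;            /* non-zero once the work has completed */
	struct toy_fence *prev;  /* older fence in the chain, or NULL */
};

/*
 * Toy lookup: returns 0 on success and -1 if the point is not in the
 * chain yet. On success it rewrites *pfence, and sets it to NULL when
 * the fence covering the point has already signaled (assumption made
 * for this sketch only).
 */
static int toy_find_seqno(struct toy_fence **pfence, uint64_t point)
{
	struct toy_fence *f = *pfence;

	if (f == NULL || f->seqno < point)
		return -1;

	/* Walk back to the oldest fence that still covers the point. */
	while (f->prev != NULL && f->prev->seqno >= point)
		f = f->prev;

	*pfence = f->signaled ? NULL : f;
	return 0;
}

enum toy_outcome { TOY_DEFERRED, TOY_STUB, TOY_FENCE };

/* Same branch structure as the code touched by the patch. */
static enum toy_outcome toy_classify(struct toy_fence *fence, uint64_t point)
{
	if (!fence || toy_find_seqno(&fence, point))
		return TOY_DEFERRED;
	else if (!fence)        /* reachable: the callee cleared the pointer */
		return TOY_STUB;
	else
		return TOY_FENCE;
}
```

With a two-element chain whose older fence has already signaled, asking for the older point lands in the middle branch, which is the case the patch would have removed.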


A bit cheeky, but...


-Lionel


Re: [PATCH 1/1] drm/syncobj: add sideband payload

2019-09-17 Thread Lionel Landwerlin

Thanks David,

I'll try to fix the test to match AMD's restrictions.

The v7 here was to fix another existing test : 
dEQP-VK.api.external.fence.sync_fd.transference_temporary


Cheers,

-Lionel

On 17/09/2019 15:36, Zhou, David(ChunMing) wrote:

Hi Lionel,
The update looks good to me.
I tried your signal-order test, but it seems it isn't ready to run, so
I'm not sure I can reproduce this issue.


-David
----
*From:* Lionel Landwerlin 
*Sent:* Tuesday, September 17, 2019 7:03 PM
*To:* dri-devel@lists.freedesktop.org 
*Cc:* Lionel Landwerlin ; Zhou, 
David(ChunMing) ; Koenig, Christian 
; Jason Ekstrand 

*Subject:* [PATCH 1/1] drm/syncobj: add sideband payload
Vulkan timeline semaphores allow signaling to happen on a point of the
timeline before all of its dependencies have been created.

The 2 current implementations (AMD/Intel) of the Vulkan spec on top of
the Linux kernel use a thread to wait for the dependencies of a given
point to materialize, and delay actual submission to the kernel driver
until the wait completes.

If a binary semaphore is submitted for signaling alongside a timeline
semaphore that is waiting for completion, the drm syncobj associated
with that binary semaphore will not have a DMA fence associated with it
by the time vkQueueSubmit() returns. This, and the fact that a binary
semaphore can be signaled and unsignaled before its DMA fences
materialize, means that we cannot rely on the fence within the syncobj
alone; we also need a sideband payload verifying that the fence in the
syncobj matches the last submission from the Vulkan API point of view.

This change adds a sideband payload that is incremented for a signaled
syncobj when vkQueueSubmit() is called. The next vkQueueSubmit()
waiting on the syncobj will read the sideband payload, wait for a
fence chain element with a seqno greater than or equal to the sideband
payload value to be added to the fence chain, and use that fence to
trigger the submission on the kernel driver.

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: Fix a bunch of blatant mistakes
    Store payload atomically (Chris)

v6: Only touch atomic value once (Jason)

v7: Update atomic value when importing sync file

Signed-off-by: Lionel Landwerlin 
Reviewed-by: David Zhou  (v6)
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c    |  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 64 --
 include/drm/drm_syncobj.h  |  9 +
 include/uapi/drm/drm.h | 17 +
 5 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h 
b/drivers/gpu/drm/drm_internal.h

index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct 
drm_device *dev, void *data,

   struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+    struct drm_file *file_private);

 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int 
indent,

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
   DRM_RENDER_ALLOW),
 DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
   DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+ DRM_RENDER_ALLOW),
+
 DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, 
drm_crtc_get_sequence_ioctl, 0),
 DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, 0),
 DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, 
drm_mode_create_lease_ioctl, DRM_MASTER),

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 4b5c7b0ed714..2de8f1380890 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -418,8 +418,10 @@ int drm_syncobj_create(struct drm_syncobj 
**out_syncobj, uint32_t flags,

 if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)
 drm_syncobj_assign_null_handle(syncobj);

-   if (fence)
+   if (fence) {
 drm_syncobj_replace_fence(syncobj, fence);
+ atomic64_set(&syncobj->binary_payload, fence->seqno);
+   }

 *out_syncobj = syncobj;
 return 0;
@

[PATCH 1/1] drm/syncobj: add sideband payload

2019-09-17 Thread Lionel Landwerlin
Vulkan timeline semaphores allow signaling to happen on a point of the
timeline before all of its dependencies have been created.

The 2 current implementations (AMD/Intel) of the Vulkan spec on top of
the Linux kernel use a thread to wait for the dependencies of a given
point to materialize, and delay actual submission to the kernel driver
until the wait completes.

If a binary semaphore is submitted for signaling alongside a timeline
semaphore that is waiting for completion, the drm syncobj associated
with that binary semaphore will not have a DMA fence associated with it
by the time vkQueueSubmit() returns. This, and the fact that a binary
semaphore can be signaled and unsignaled before its DMA fences
materialize, means that we cannot rely on the fence within the syncobj
alone; we also need a sideband payload verifying that the fence in the
syncobj matches the last submission from the Vulkan API point of view.

This change adds a sideband payload that is incremented for a signaled
syncobj when vkQueueSubmit() is called. The next vkQueueSubmit()
waiting on the syncobj will read the sideband payload, wait for a
fence chain element with a seqno greater than or equal to the sideband
payload value to be added to the fence chain, and use that fence to
trigger the submission on the kernel driver.

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: Fix a bunch of blatant mistakes
    Store payload atomically (Chris)

v6: Only touch atomic value once (Jason)

v7: Update atomic value when importing sync file

Signed-off-by: Lionel Landwerlin 
Reviewed-by: David Zhou  (v6)
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 64 --
 include/drm/drm_syncobj.h  |  9 +
 include/uapi/drm/drm.h | 17 +
 5 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
  struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
  DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+ DRM_RENDER_ALLOW),
+
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 4b5c7b0ed714..2de8f1380890 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -418,8 +418,10 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)
drm_syncobj_assign_null_handle(syncobj);
 
-   if (fence)
+   if (fence) {
drm_syncobj_replace_fence(syncobj, fence);
+   atomic64_set(&syncobj->binary_payload, fence->seqno);
+   }
 
*out_syncobj = syncobj;
return 0;
@@ -604,6 +606,7 @@ static int drm_syncobj_import_sync_file_fence(struct drm_file *file_private,
}
 
drm_syncobj_replace_fence(syncobj, fence);
+   atomic64_set(&syncobj->binary_payload, fence->seqno);
dma_fence_put(fence);
drm_syncobj_put(syncobj);
return 0;
@@ -1224,8 +1227,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
if (ret < 0)
return ret;
 
-   for (i = 0; i < args->count_handles; i++)
+   for (i = 0; i < args->count_handles; i++) {
drm_syncobj_replace_fence(syncobjs[i], NULL);
+   atomic64_set(&syncobjs[i]->binary_payload, 0);
+   }
 
drm_syncobj_array_free(syncobjs, args->count_handles);
 
@

[PATCH 0/1] drm/syncobj: add sideband payload

2019-09-17 Thread Lionel Landwerlin
Hi all,

Just explaining what is being changed here compared to v6 :

We just noticed that some of our CTS runs are flaky because when
importing a dma fence into a drm syncobj we do not update the atomic
binary payload. This leads to issues when the userspace driver tries
to add new points to the timeline, because the atomic binary payload
may then have a value lower than the seqno of the newly installed
fence.
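
As a rough illustration of the mismatch described above, the following standalone C11 sketch models a syncobj as a fence seqno plus an atomic payload. All names here (toy_syncobj, toy_syncobj_init(), toy_import_fence(), toy_payload_is_stale()) are hypothetical and only mimic the shape of the problem; this is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a syncobj: an installed fence plus a payload. */
struct toy_syncobj {
	uint64_t fence_seqno;            /* seqno of the installed fence, 0 if none */
	_Atomic uint64_t binary_payload; /* sideband value read by userspace */
};

static void toy_syncobj_init(struct toy_syncobj *s)
{
	s->fence_seqno = 0;
	atomic_init(&s->binary_payload, 0);
}

/*
 * Install an imported fence. With update_payload == false this models
 * the v6 behaviour (payload left untouched); with true it models the
 * v7 fix (payload set to the seqno of the imported fence).
 */
static void toy_import_fence(struct toy_syncobj *s, uint64_t seqno,
			     bool update_payload)
{
	assert(seqno != 0);
	s->fence_seqno = seqno;
	if (update_payload)
		atomic_store(&s->binary_payload, seqno);
}

/* True when the payload lags behind the installed fence's seqno. */
static bool toy_payload_is_stale(struct toy_syncobj *s)
{
	return atomic_load(&s->binary_payload) < s->fence_seqno;
}
```

Importing a fence without touching the payload leaves toy_payload_is_stale() returning true, which is the situation that made later timeline points misbehave; importing with the update brings the two back in step.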

Cheers,

Lionel Landwerlin (1):
  drm/syncobj: add sideband payload

 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 64 --
 include/drm/drm_syncobj.h  |  9 +
 include/uapi/drm/drm.h | 17 +
 5 files changed, 93 insertions(+), 2 deletions(-)

--
2.23.0

Re: [PATCH] drm/syncobj: Add documentation for timeline syncobj

2019-08-27 Thread Lionel Landwerlin

On 27/08/2019 19:27, Daniel Vetter wrote:

On Mon, Aug 26, 2019 at 07:30:08AM +0300, Lionel Landwerlin wrote:

On 26/08/2019 00:01, Daniel Vetter wrote:

On Fri, Aug 23, 2019 at 8:53 PM Jason Ekstrand  wrote:

On Thu, Aug 22, 2019 at 5:28 PM Lionel Landwerlin 
 wrote:

On 22/08/2019 21:24, Jason Ekstrand wrote:

On Thu, Aug 22, 2019 at 9:55 AM Lionel Landwerlin 
 wrote:

We've added a set of new APIs to manipulate syncobjs holding timelines
of dma_fence. This adds a bit of documentation about how this works.

Signed-off-by: Lionel Landwerlin 
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
   drivers/gpu/drm/drm_syncobj.c | 87 +--
   1 file changed, 74 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b5ad73330a48..32ffded6d2c0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -43,27 +43,66 @@
*  - Signal a syncobj (set a trivially signaled fence)
*  - Wait for a syncobj's fence to appear and be signaled
*
+ * The syncobj userspace API also provides operations to manipulate a syncobj
+ * in terms of a timeline of struct &dma_fence rather than a single struct
+ * &dma_fence, through the following operations:
+ *
+ *   - Signal a given point on the timeline
+ *   - Wait for a given point to appear and/or be signaled
+ *   - Import and export from/to a given point of a timeline
+ *
* At it's core, a syncobj is simply a wrapper around a pointer to a struct
* &dma_fence which may be NULL.
* When a syncobj is first created, its pointer is either NULL or a pointer
* to an already signaled fence depending on whether the
* &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to
* &DRM_IOCTL_SYNCOBJ_CREATE.
- * When GPU work which signals a syncobj is enqueued in a DRM driver,
- * the syncobj fence is replaced with a fence which will be signaled by the
- * completion of that work.
- * When GPU work which waits on a syncobj is enqueued in a DRM driver, the
- * driver retrieves syncobj's current fence at the time the work is enqueued
- * waits on that fence before submitting the work to hardware.
- * If the syncobj's fence is NULL, the enqueue operation is expected to fail.
- * All manipulation of the syncobjs's fence happens in terms of the current
- * fence at the time the ioctl is called by userspace regardless of whether
- * that operation is an immediate host-side operation (signal or reset) or
- * or an operation which is enqueued in some driver queue.
- * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to
- * manipulate a syncobj from the host by resetting its pointer to NULL or
+ *
+ * If the syncobj is considered as a binary (signal/unsignaled) primitive,

What does "considered as a binary" mean?  Is it an inherent property of the 
syncobj given at create time?  Is it a state the syncobj can be in?  Or is it a property 
of how the submit ioctl in the DRM driver references it?  I'm really hoping it's either 1 
or 3


3: you either use it with the binary/legacy APIs or with the timeline APIs.
The timeline APIs also provide some binary compatibility through point 0
(in particular for wait).

Right.  Maybe we should say something like  "When GPU work is enqueued which signals 
a non-zero time point" or something like that?  I guess that implies a certain 
unification across drivers that maybe we don't want

[Just jumping in on this comment here]

I thought the point of syncobj is that you can share them across
drivers (not just within drivers)? Otherwise not much sense in the
common infrastructure. Hence I'd say we should spec all these things.
Concern from someone who's seen way too many cross-driver APIs that
turned out to be decidedly less cross-driver than planned ...


The sharing of a timeline semaphore/syncobj between 2 apps/drivers implies
that they both know they're dealing with a timeline semaphore.

I see that at the same level as sharing a file descriptor and knowing it
represents a syncfd or a syncobj.

There has to be some kind of understanding, otherwise nothing works.


If the shared semantic between the 2 clients is a binary (signal/unsignaled)
semaphore, then both drivers should share the existing syncobj type, that is
a syncobj that will only ever contain a single dma-fence.

You can build that out of the timeline by exporting a particular point into
another syncobj (transfer ioctl).

Oh this is just stating that apps need to agree on old syncobj or timeline
syncobj mode? I guess if it's all there is that should be a given, still
worth maybe putting in words.
-Daniel



Thanks, there is a note a bit further down in this patch.

It is worded in terms of a semaphore within a single app, but 
it applies to shared semaphores too.



-Lionel







-Lionel


Cheers, Daniel



+ * when GPU work is

[PATCH 2/3] drm/amd/amdgpu: disallow replacing fences in timeline syncobjs

2019-08-26 Thread Lionel Landwerlin
Similarly to the host path from drm_syncobj.c, we would like to
disallow those operations to help applications figure out where they
are using the wrong kind of ioctl.

Signed-off-by: Lionel Landwerlin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 2e53feed40e2..d9bbc31e97d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1159,6 +1159,8 @@ static int amdgpu_cs_process_syncobj_out_dep(struct amdgpu_cs_parser *p,
drm_syncobj_find(p->filp, deps[i].handle);
if (!p->post_deps[i].syncobj)
return -EINVAL;
+   if (p->post_deps[i].syncobj->is_timeline)
+   return -EINVAL;
p->post_deps[i].chain = NULL;
p->post_deps[i].point = 0;
p->num_post_deps++;
-- 
2.23.0


[PATCH 1/3] drm/syncobj: protect timeline syncobjs

2019-08-26 Thread Lionel Landwerlin
Binary/legacy signal operations on a syncobj work by replacing the
dma_fence held within the syncobj. When dealing with timeline
semaphores we would like to avoid this, as it would effectively lead
to looser synchronization (by discarding the dma_fence_chain mechanism
of waiting on all previous dma_fences to signal before signaling itself).

This change adds a flag that can be used at creation of the syncobj
to mean that the syncobj will hold a timeline of dma_fence (using
dma_fence_chain). When flagged as such, the dma_fence held by the
syncobj should not be replaced; instead we should always add to
the timeline.
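
To illustrate why replacing the fence loosens synchronization, here is a
minimal userspace sketch. It is only a model: struct model_fence and the
model_* helpers are hypothetical names, not the kernel's struct dma_fence or
dma_fence_chain API. It assumes a timeline link is considered signaled only
once its own work and every earlier link have completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a fence; not the kernel's struct dma_fence. */
struct model_fence {
	bool signaled;                  /* has this fence's own work completed? */
	const struct model_fence *prev; /* previous link in the timeline, or NULL */
};

/* Timeline behaviour: the new link keeps a pointer to what was there before. */
static void model_timeline_add(const struct model_fence **head,
			       struct model_fence *link)
{
	assert(head && link);
	link->prev = *head;
	*head = link;
}

/* Binary/legacy behaviour: the previous fence is simply dropped. */
static void model_binary_replace(const struct model_fence **head,
				 struct model_fence *link)
{
	assert(head && link);
	link->prev = NULL;
	*head = link;
}

/* A link counts as signaled only when it and all earlier links are done. */
static bool model_chain_signaled(const struct model_fence *f)
{
	for (; f; f = f->prev)
		if (!f->signaled)
			return false;
	return true;
}
```

With an unsignaled fence already in place, adding a signaled link through
model_timeline_add() still reports the head as unsignaled, whereas
model_binary_replace() reports it as signaled because the earlier work has
been forgotten.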

Signed-off-by: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_syncobj.c | 30 +-
 include/drm/drm_syncobj.h |  8 
 include/uapi/drm/drm.h|  1 +
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 72d083acd388..69d43c791a42 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -476,6 +476,8 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, 
uint32_t flags,
 
if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)
drm_syncobj_assign_null_handle(syncobj);
+   if (flags & DRM_SYNCOBJ_CREATE_TIMELINE)
+   syncobj->is_timeline = true;
 
if (fence)
drm_syncobj_replace_fence(syncobj, fence);
@@ -661,6 +663,10 @@ static int drm_syncobj_import_sync_file_fence(struct 
drm_file *file_private,
dma_fence_put(fence);
return -ENOENT;
}
+   if (syncobj->is_timeline) {
+   dma_fence_put(fence);
+   return -EINVAL;
+   }
 
drm_syncobj_replace_fence(syncobj, fence);
dma_fence_put(fence);
@@ -749,7 +755,13 @@ drm_syncobj_create_ioctl(struct drm_device *dev, void 
*data,
return -EOPNOTSUPP;
 
/* no valid flags yet */
-   if (args->flags & ~DRM_SYNCOBJ_CREATE_SIGNALED)
+   if (args->flags & ~(DRM_SYNCOBJ_CREATE_SIGNALED |
+   DRM_SYNCOBJ_CREATE_TIMELINE))
+   return -EINVAL;
+
+   /* Creating a signaled timeline makes no sense. */
+   if ((args->flags & DRM_SYNCOBJ_CREATE_SIGNALED) &&
+   (args->flags & DRM_SYNCOBJ_CREATE_TIMELINE))
return -EINVAL;
 
return drm_syncobj_create_as_handle(file_private,
@@ -862,6 +874,10 @@ drm_syncobj_transfer_to_binary(struct drm_file 
*file_private,
binary_syncobj = drm_syncobj_find(file_private, args->dst_handle);
if (!binary_syncobj)
return -ENOENT;
+   if (binary_syncobj->is_timeline) {
+   ret = -EINVAL;
+   goto err;
+   }
ret = drm_syncobj_find_fence(file_private, args->src_handle,
 args->src_point, args->flags, &fence);
if (ret)
@@ -1137,6 +1153,7 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
 static int drm_syncobj_array_find(struct drm_file *file_private,
  void __user *user_handles,
  uint32_t count_handles,
+ bool no_timeline,
  struct drm_syncobj ***syncobjs_out)
 {
uint32_t i, *handles;
@@ -1165,6 +1182,10 @@ static int drm_syncobj_array_find(struct drm_file 
*file_private,
ret = -ENOENT;
goto err_put_syncobjs;
}
+   if (no_timeline && syncobjs[i]->is_timeline) {
+   ret = -EINVAL;
+   goto err_put_syncobjs;
+   }
}
 
kfree(handles);
@@ -1211,6 +1232,7 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
ret = drm_syncobj_array_find(file_private,
 u64_to_user_ptr(args->handles),
 args->count_handles,
+false,
 &syncobjs);
if (ret < 0)
return ret;
@@ -1245,6 +1267,7 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, 
void *data,
ret = drm_syncobj_array_find(file_private,
 u64_to_user_ptr(args->handles),
 args->count_handles,
+false,
 &syncobjs);
if (ret < 0)
return ret;
@@ -1279,6 +1302,7 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void 
*data,
ret = drm_syncobj_array_find(file_private,
 u64_to_user_ptr(args->handles),
 args->count_handles,
+false,

[PATCH 3/3] drm/i915: disallow replacing fences of timeline syncobjs

2019-08-26 Thread Lionel Landwerlin
Signed-off-by: Lionel Landwerlin 
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 09248398fa7b..f1af3490f96b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2494,6 +2494,14 @@ get_legacy_fence_array(struct i915_execbuffer *eb,
goto err;
}
 
+   if ((user_fence.flags & I915_EXEC_FENCE_SIGNAL) &&
+   syncobj->is_timeline) {
+   DRM_DEBUG("Cannot replace fence in timeline syncobj\n");
+   drm_syncobj_put(syncobj);
+   err = -EINVAL;
+   goto err;
+   }
+
if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
fence = drm_syncobj_fence_get(syncobj);
if (!fence) {
-- 
2.23.0


[PATCH 0/3] drm/syncobj: add protection against timeline resets

2019-08-26 Thread Lionel Landwerlin
Hi all,

Following Jason's suggestion on another thread adding timeline
documentation [1], here is a small series adding a creation flag to
syncobjs so that users are prevented from dropping the existing timeline
fences in the syncobj, effectively ensuring a user always adds to the
dma_fence_chain instead of replacing it.

We still allow explicit reset.
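
As a rough sketch of the creation-time policy, the flag validation can be
modelled in plain userspace C as below. The MODEL_* names are hypothetical,
and the bit chosen for the timeline flag is an assumption, since the uapi
value is not shown in this cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uapi flags; the TIMELINE bit is assumed. */
#define MODEL_CREATE_SIGNALED (1u << 0)
#define MODEL_CREATE_TIMELINE (1u << 1)

static_assert((MODEL_CREATE_SIGNALED & MODEL_CREATE_TIMELINE) == 0,
	      "creation flags must not overlap");

/* Returns 0 when the flag combination is acceptable, -EINVAL otherwise. */
static int model_check_create_flags(uint32_t flags)
{
	/* Reject any bit we do not know about. */
	if (flags & ~(MODEL_CREATE_SIGNALED | MODEL_CREATE_TIMELINE))
		return -EINVAL;

	/* Creating a signaled timeline makes no sense. */
	if ((flags & MODEL_CREATE_SIGNALED) && (flags & MODEL_CREATE_TIMELINE))
		return -EINVAL;

	return 0;
}
```

Either flag alone (or no flag) is accepted; both together, or any unknown
bit, gives -EINVAL.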

Apart from the fact we need to enforce this policy in each driver's
submission path, I haven't run into odd things yet.

Cheers,

[1] : https://lists.freedesktop.org/archives/dri-devel/2019-August/232700.html

Lionel Landwerlin (3):
  drm/syncobj: protect timeline syncobjs
  drm/amd/amdgpu: disallow replacing fences in timeline syncobjs
  drm/i915: disallow replacing fences of timeline syncobjs

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 ++
 drivers/gpu/drm/drm_syncobj.c | 30 ++-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  8 +
 include/drm/drm_syncobj.h |  8 +
 include/uapi/drm/drm.h|  1 +
 5 files changed, 48 insertions(+), 1 deletion(-)

--
2.23.0

Re: [PATCH] drm/syncobj: Add documentation for timeline syncobj

2019-08-25 Thread Lionel Landwerlin

On 26/08/2019 00:01, Daniel Vetter wrote:

On Fri, Aug 23, 2019 at 8:53 PM Jason Ekstrand  wrote:


On Thu, Aug 22, 2019 at 5:28 PM Lionel Landwerlin 
 wrote:

On 22/08/2019 21:24, Jason Ekstrand wrote:

On Thu, Aug 22, 2019 at 9:55 AM Lionel Landwerlin 
 wrote:

We've added a set of new APIs to manipulate syncobjs holding timelines
of dma_fence. This adds a bit of documentation about how this works.

Signed-off-by: Lionel Landwerlin 
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
  drivers/gpu/drm/drm_syncobj.c | 87 +--
  1 file changed, 74 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b5ad73330a48..32ffded6d2c0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -43,27 +43,66 @@
   *  - Signal a syncobj (set a trivially signaled fence)
   *  - Wait for a syncobj's fence to appear and be signaled
   *
+ * The syncobj userspace API also provides operations to manipulate a syncobj
+ * in terms of a timeline of struct &dma_fence rather than a single struct
+ * &dma_fence, through the following operations:
+ *
+ *   - Signal a given point on the timeline
+ *   - Wait for a given point to appear and/or be signaled
+ *   - Import and export from/to a given point of a timeline
+ *
   * At it's core, a syncobj is simply a wrapper around a pointer to a struct
   * &dma_fence which may be NULL.
   * When a syncobj is first created, its pointer is either NULL or a pointer
   * to an already signaled fence depending on whether the
   * &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to
   * &DRM_IOCTL_SYNCOBJ_CREATE.
- * When GPU work which signals a syncobj is enqueued in a DRM driver,
- * the syncobj fence is replaced with a fence which will be signaled by the
- * completion of that work.
- * When GPU work which waits on a syncobj is enqueued in a DRM driver, the
- * driver retrieves syncobj's current fence at the time the work is enqueued
- * waits on that fence before submitting the work to hardware.
- * If the syncobj's fence is NULL, the enqueue operation is expected to fail.
- * All manipulation of the syncobjs's fence happens in terms of the current
- * fence at the time the ioctl is called by userspace regardless of whether
- * that operation is an immediate host-side operation (signal or reset) or
- * or an operation which is enqueued in some driver queue.
- * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to
- * manipulate a syncobj from the host by resetting its pointer to NULL or
+ *
+ * If the syncobj is considered as a binary (signal/unsignaled) primitive,


What does "considered as a binary" mean?  Is it an inherent property of the 
syncobj given at create time?  Is it a state the syncobj can be in?  Or is it a property 
of how the submit ioctl in the DRM driver references it?  I'm really hoping it's either 1 
or 3


3: you either use it with the binary/legacy APIs or with the timeline APIs. The 
timeline APIs also provide some binary compatibility through point 0 (in particular for wait).


Right.  Maybe we should say something like  "When GPU work is enqueued which signals 
a non-zero time point" or something like that?  I guess that implies a certain 
unification across drivers that maybe we don't want

[Just jumping in on this comment here]

I thought the point of syncobj is that you can share them across
drivers (not just within drivers)? Otherwise not much sense in the
common infrastructure. Hence I'd say we should spec all these things.
Concern from someone who's seen way too many cross-driver APIs that
turned out to be decidedly less cross-driver than planned ...



The sharing of a timeline semaphore/syncobj between 2 apps/drivers 
implies that they both know they're dealing with a timeline semaphore.


I see that at the same level as sharing a file descriptor and knowing it 
represents a syncfd or a syncobj.


There has to be some kind of understanding, otherwise nothing works.


If the shared semantic between the 2 clients is a binary 
(signal/unsignaled) semaphore, then both drivers should share the 
existing syncobj type, that is a syncobj that will only ever contain a 
single dma-fence.


You can build that out of the timeline by exporting a particular point 
into another syncobj (transfer ioctl).



-Lionel



Cheers, Daniel





+ * when GPU work is enqueued in a DRM driver to signal the syncobj, the fence
+ * is replaced with a fence which will be signaled by the completion of that
+ * work.
+ * If the syncobj is considered as a timeline primitive, when GPU work is
+ * enqueued in a DRM driver to signal the a given point of the syncobj, a new
+ * struct &dma_fence_chain pointing to the DRM driver's fence and also
+ * pointing to the previous fence that was in the syncobj. The new struct
+ * &dma_fence_chain fen

Re: [PATCH] drm/syncobj: Add documentation for timeline syncobj

2019-08-22 Thread Lionel Landwerlin

On 22/08/2019 21:24, Jason Ekstrand wrote:
On Thu, Aug 22, 2019 at 9:55 AM Lionel Landwerlin wrote:


We've added a set of new APIs to manipulate syncobjs holding timelines
of dma_fence. This adds a bit of documentation about how this works.

Signed-off-by: Lionel Landwerlin
Cc: Christian Koenig
Cc: Jason Ekstrand
Cc: David(ChunMing) Zhou
---
 drivers/gpu/drm/drm_syncobj.c | 87 +--
 1 file changed, 74 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b5ad73330a48..32ffded6d2c0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -43,27 +43,66 @@
  *  - Signal a syncobj (set a trivially signaled fence)
  *  - Wait for a syncobj's fence to appear and be signaled
  *
+ * The syncobj userspace API also provides operations to manipulate a syncobj
+ * in terms of a timeline of struct &dma_fence rather than a single struct
+ * &dma_fence, through the following operations:
+ *
+ *   - Signal a given point on the timeline
+ *   - Wait for a given point to appear and/or be signaled
+ *   - Import and export from/to a given point of a timeline
+ *
  * At it's core, a syncobj is simply a wrapper around a pointer to a struct
  * &dma_fence which may be NULL.
  * When a syncobj is first created, its pointer is either NULL or a pointer
  * to an already signaled fence depending on whether the
  * &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to
  * &DRM_IOCTL_SYNCOBJ_CREATE.
- * When GPU work which signals a syncobj is enqueued in a DRM driver,
- * the syncobj fence is replaced with a fence which will be signaled by the
- * completion of that work.
- * When GPU work which waits on a syncobj is enqueued in a DRM driver, the
- * driver retrieves syncobj's current fence at the time the work is enqueued
- * waits on that fence before submitting the work to hardware.
- * If the syncobj's fence is NULL, the enqueue operation is expected to fail.
- * All manipulation of the syncobjs's fence happens in terms of the current
- * fence at the time the ioctl is called by userspace regardless of whether
- * that operation is an immediate host-side operation (signal or reset) or
- * or an operation which is enqueued in some driver queue.
- * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to
- * manipulate a syncobj from the host by resetting its pointer to NULL or
+ *
+ * If the syncobj is considered as a binary (signal/unsignaled) primitive,


What does "considered as a binary" mean?  Is it an inherent property 
of the syncobj given at create time?  Is it a state the syncobj can be 
in?  Or is it a property of how the submit ioctl in the DRM driver 
references it?  I'm really hoping it's either 1 or 3



3: you either use it with the binary/legacy APIs or with the timeline APIs. 
The timeline APIs also provide some binary compatibility through point 0 
(in particular for wait).




+ * when GPU work is enqueued in a DRM driver to signal the syncobj, the fence
+ * is replaced with a fence which will be signaled by the completion of that
+ * work.
+ * If the syncobj is considered as a timeline primitive, when GPU work is
+ * enqueued in a DRM driver to signal the a given point of the syncobj, a new
+ * struct &dma_fence_chain pointing to the DRM driver's fence and also
+ * pointing to the previous fence that was in the syncobj. The new struct
+ * &dma_fence_chain fence put into the syncobj will be signaled by completion
+ * of the DRM driver's work and also any work associated with the fence
+ * previously in the syncobj.
+ *
+ * When GPU work which waits on a syncobj is enqueued in a DRM driver, at the
+ * time the work is enqueued, it waits on the fence coming from the syncobj
+ * before submitting the work to hardware. That fence is either :
+ *
+ *    - The syncobj's current fence if the syncobj is considered as a binary
+ *      primitive.
+ *    - The struct &dma_fence associated with a given point if the syncobj is
+ *      considered as a timeline primitive.
+ *
+ * If the syncobj's fence is NULL or not present in the syncobj's timeline,
+ * the enqueue operation is expected to fail.
+ *
+ * With binary syncobj, all manipulation of the syncobjs's fence happens in
+ * terms of t

[PATCH] drm/syncobj: Add documentation for timeline syncobj

2019-08-22 Thread Lionel Landwerlin
We've added a set of new APIs to manipulate syncobjs holding timelines
of dma_fence. This adds a bit of documentation about how this works.

Signed-off-by: Lionel Landwerlin 
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
 drivers/gpu/drm/drm_syncobj.c | 87 +--
 1 file changed, 74 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b5ad73330a48..32ffded6d2c0 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -43,27 +43,66 @@
  *  - Signal a syncobj (set a trivially signaled fence)
  *  - Wait for a syncobj's fence to appear and be signaled
  *
+ * The syncobj userspace API also provides operations to manipulate a syncobj
+ * in terms of a timeline of struct &dma_fence rather than a single struct
+ * &dma_fence, through the following operations:
+ *
+ *   - Signal a given point on the timeline
+ *   - Wait for a given point to appear and/or be signaled
+ *   - Import and export from/to a given point of a timeline
+ *
  * At it's core, a syncobj is simply a wrapper around a pointer to a struct
  * &dma_fence which may be NULL.
  * When a syncobj is first created, its pointer is either NULL or a pointer
  * to an already signaled fence depending on whether the
  * &DRM_SYNCOBJ_CREATE_SIGNALED flag is passed to
  * &DRM_IOCTL_SYNCOBJ_CREATE.
- * When GPU work which signals a syncobj is enqueued in a DRM driver,
- * the syncobj fence is replaced with a fence which will be signaled by the
- * completion of that work.
- * When GPU work which waits on a syncobj is enqueued in a DRM driver, the
- * driver retrieves syncobj's current fence at the time the work is enqueued
- * waits on that fence before submitting the work to hardware.
- * If the syncobj's fence is NULL, the enqueue operation is expected to fail.
- * All manipulation of the syncobjs's fence happens in terms of the current
- * fence at the time the ioctl is called by userspace regardless of whether
- * that operation is an immediate host-side operation (signal or reset) or
- * or an operation which is enqueued in some driver queue.
- * &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used to
- * manipulate a syncobj from the host by resetting its pointer to NULL or
+ *
+ * If the syncobj is considered as a binary (signal/unsignaled) primitive,
+ * when GPU work is enqueued in a DRM driver to signal the syncobj, the fence
+ * is replaced with a fence which will be signaled by the completion of that
+ * work.
+ * If the syncobj is considered as a timeline primitive, when GPU work is
+ * enqueued in a DRM driver to signal the a given point of the syncobj, a new
+ * struct &dma_fence_chain pointing to the DRM driver's fence and also
+ * pointing to the previous fence that was in the syncobj. The new struct
+ * &dma_fence_chain fence put into the syncobj will be signaled by completion
+ * of the DRM driver's work and also any work associated with the fence
+ * previously in the syncobj.
+ *
+ * When GPU work which waits on a syncobj is enqueued in a DRM driver, at the
+ * time the work is enqueued, it waits on the fence coming from the syncobj
+ * before submitting the work to hardware. That fence is either :
+ *
+ *- The syncobj's current fence if the syncobj is considered as a binary
+ *  primitive.
+ *- The struct &dma_fence associated with a given point if the syncobj is
+ *  considered as a timeline primitive.
+ *
+ * If the syncobj's fence is NULL or not present in the syncobj's timeline,
+ * the enqueue operation is expected to fail.
+ *
+ * With binary syncobj, all manipulation of the syncobjs's fence happens in
+ * terms of the current fence at the time the ioctl is called by userspace
+ * regardless of whether that operation is an immediate host-side operation
+ * (signal or reset) or or an operation which is enqueued in some driver
+ * queue. &DRM_IOCTL_SYNCOBJ_RESET and &DRM_IOCTL_SYNCOBJ_SIGNAL can be used
+ * to manipulate a syncobj from the host by resetting its pointer to NULL or
  * setting its pointer to a fence which is already signaled.
  *
+ * With timeline syncobj, all manipulation of the timeline fences happens in
+ * terms of the fence referred to in the timeline. See
+ * dma_fence_chain_find_seqno() to see how a given point is found in the
+ * timeline.
+ *
+ * Note that applications should be careful to always use timeline set of
+ * ioctl() when dealing with syncobj considered as timeline. Using a binary
+ * set of ioctl() with a syncobj considered as timeline could result incorrect
+ * synchronization. The use of binary syncobj is supported through the
+ * timeline set of ioctl() by using a point value of 0, this will reproduce
+ * the behavior of the binary set of ioctl() (for example replace the
+ * syncobj's fence when signaling).
+ *
  *

[PATCH v6] drm/syncobj: add sideband payload

2019-08-22 Thread Lionel Landwerlin
The Vulkan timeline semaphores allow signaling to happen on a point
of the timeline without all of its dependencies having been created.

The current 2 implementations (AMD/Intel) of the Vulkan spec on top of
the Linux kernel are using a thread to wait on the dependencies of a
given point to materialize and delay actual submission to the kernel
driver until the wait completes.

If a binary semaphore is submitted for signaling alongside a
timeline semaphore waiting for completion, that means that the drm
syncobj associated with that binary semaphore will not have a DMA
fence associated with it by the time vkQueueSubmit() returns. This and
the fact that a binary semaphore can be signaled and unsignaled
before its DMA fences materialize mean that we cannot just rely on the
fence within the syncobj but we also need a sideband payload verifying
that the fence in the syncobj matches the last submission from the
Vulkan API point of view.

This change adds a sideband payload that is incremented for signaled
syncobjs when vkQueueSubmit() is called. The next vkQueueSubmit()
waiting on the syncobj will read the sideband payload and wait for a
fence chain element with a seqno greater than or equal to the sideband
payload value to be added into the fence chain and use that fence to
trigger the submission on the kernel driver.
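
The intended flow can be sketched as a small userspace model. Everything
here is hypothetical (struct model_syncobj, the model_* helpers, the fixed
array standing in for the fence chain), and it assumes the signaling side
uses the incremented payload value as the seqno of the fence it will add
later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CHAIN_MAX 8

/* Hypothetical model of a syncobj with a sideband payload. */
struct model_syncobj {
	uint64_t binary_payload;               /* sideband counter */
	uint64_t chain_seqno[MODEL_CHAIN_MAX]; /* seqnos of materialized fences */
	size_t chain_len;
};

/* Signaling vkQueueSubmit(): bump the payload and return the new value. */
static uint64_t model_submit_signal(struct model_syncobj *s)
{
	return ++s->binary_payload;
}

/* Submission thread: the fence for this seqno finally reaches the chain. */
static void model_fence_materialize(struct model_syncobj *s, uint64_t seqno)
{
	assert(s->chain_len < MODEL_CHAIN_MAX);
	s->chain_seqno[s->chain_len++] = seqno;
}

/* Waiting side: ready once a fence with seqno >= wanted is in the chain. */
static bool model_wait_ready(const struct model_syncobj *s, uint64_t wanted)
{
	for (size_t i = 0; i < s->chain_len; i++)
		if (s->chain_seqno[i] >= wanted)
			return true;
	return false;
}
```

A waiter that reads the payload right after the signaling submit returns is
not ready until the submission thread materializes the matching fence, which
is the gap the sideband payload is meant to cover.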

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: a bunch of blatant mistakes
Store payload atomically (Chris)

v6: Only touch atomic value once (Jason)

Signed-off-by: Lionel Landwerlin 
Reviewed-by: David Zhou  (v5)
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 59 +-
 include/drm/drm_syncobj.h  |  9 ++
 include/uapi/drm/drm.h | 17 ++
 5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device 
*dev, void *data,
  struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
  DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+ DRM_RENDER_ALLOW),
+
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 
0),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, 0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 
DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 4b5c7b0ed714..732310b2b367 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1224,8 +1224,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void 
*data,
if (ret < 0)
return ret;
 
-   for (i = 0; i < args->count_handles; i++)
+   for (i = 0; i < args->count_handles; i++) {
drm_syncobj_replace_fence(syncobjs[i], NULL);
+   atomic64_set(&syncobjs[i]->binary_payload, 0);
+   }
 
drm_syncobj_array_free(syncobjs, args->count_handles);
 
@@ -1395,6 +1397,61 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void 
*data,
if (ret)
break;
}
+
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
+
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+struct drm_file *file_private)
+{
+   struct drm_syncobj_binary_array *args = data;
+   struct drm_syncobj **syncobjs;
+   u32 __user *access_flags = u64_to_user_ptr(args->access_flags);
+   u64 __user *values = u64_to_user_ptr(args->values);
+   u32 i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
+   return -EOPNOTSUPP;
+
+   if (args->pad != 0)
+   return -EINVAL;
+

[PATCH v5 1/1] drm/syncobj: add sideband payload

2019-08-09 Thread Lionel Landwerlin
The Vulkan timeline semaphores allow signaling to happen on a point
of the timeline without all of its dependencies having been created.

The current 2 implementations (AMD/Intel) of the Vulkan spec on top of
the Linux kernel are using a thread to wait on the dependencies of a
given point to materialize and delay actual submission to the kernel
driver until the wait completes.

If a binary semaphore is submitted for signaling alongside a
timeline semaphore waiting for completion, that means that the drm
syncobj associated with that binary semaphore will not have a DMA
fence associated with it by the time vkQueueSubmit() returns. This and
the fact that a binary semaphore can be signaled and unsignaled
before its DMA fences materialize mean that we cannot just rely on the
fence within the syncobj but we also need a sideband payload verifying
that the fence in the syncobj matches the last submission from the
Vulkan API point of view.

This change adds a sideband payload that is incremented for signaled
syncobjs when vkQueueSubmit() is called. The next vkQueueSubmit()
waiting on the syncobj will read the sideband payload and wait for a
fence chain element with a seqno greater than or equal to the sideband
payload value to be added into the fence chain and use that fence to
trigger the submission on the kernel driver.

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: a bunch of blatant mistakes
Store payload atomically (Chris)

Signed-off-by: Lionel Landwerlin 
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 58 +-
 include/drm/drm_syncobj.h  |  9 ++
 include/uapi/drm/drm.h | 17 ++
 5 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device 
*dev, void *data,
  struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
  DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+ DRM_RENDER_ALLOW),
+
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 
0),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, 0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 
DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b927e482e554..d2d3a8d1374d 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1150,8 +1150,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void 
*data,
if (ret < 0)
return ret;
 
-   for (i = 0; i < args->count_handles; i++)
+   for (i = 0; i < args->count_handles; i++) {
drm_syncobj_replace_fence(syncobjs[i], NULL);
+   atomic64_set(&syncobjs[i]->binary_payload, 0);
+   }
 
drm_syncobj_array_free(syncobjs, args->count_handles);
 
@@ -1321,6 +1323,60 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void 
*data,
if (ret)
break;
}
+
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
+
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+struct drm_file *file_private)
+{
+   struct drm_syncobj_binary_array *args = data;
+   struct drm_syncobj **syncobjs;
+   u32 __user *access_flags = u64_to_user_ptr(args->access_flags);
+   u64 __user *values = u64_to_user_ptr(args->values);
+   u32 i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
+   return -EOPNOTSUPP;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL

[PATCH v5 0/1] drm/syncobj: add syncobj sideband payload for threaded submission

2019-08-09 Thread Lionel Landwerlin
A bunch of fixes :)

Lionel Landwerlin (1):
  drm/syncobj: add sideband payload

 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 58 +-
 include/drm/drm_syncobj.h  |  9 ++
 include/uapi/drm/drm.h | 17 ++
 5 files changed, 88 insertions(+), 1 deletion(-)

--
2.23.0.rc1

Re: [PATCH v4 1/1] drm/syncobj: add sideband payload

2019-08-09 Thread Lionel Landwerlin

On 09/08/2019 15:27, Koenig, Christian wrote:

Am 09.08.19 um 14:26 schrieb Lionel Landwerlin:

On 09/08/2019 14:44, Chris Wilson wrote:

Quoting Lionel Landwerlin (2019-08-09 12:30:30)

diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 8a5b2f8f8eb9..1ce83853f997 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -785,6 +785,22 @@ struct drm_syncobj_timeline_array {
  __u32 pad;
   };
+struct drm_syncobj_binary_array {
+   /* A pointer to an array of u32 syncobj handles. */
+   __u64 handles;
+   /* A pointer to an array of u32 access flags for each handle. */
+   __u64 access_flags;
+   /* The binary value of a syncobj is read before it is incremented. */
+#define I915_DRM_SYNCOBJ_BINARY_ITEM_VALUE_READ (1u << 0)
+#define I915_DRM_SYNCOBJ_BINARY_ITEM_VALUE_INC  (1u << 1)

You're not in Kansas anymore ;)
-Chris


Which means? :)

You are in common DRM code, but the new defines start with I915_

Cheers,
Christian.



Oh dear...


-Lionel






-Lionel




Re: [PATCH v4 1/1] drm/syncobj: add sideband payload

2019-08-09 Thread Lionel Landwerlin

On 09/08/2019 14:58, Chris Wilson wrote:

Quoting Lionel Landwerlin (2019-08-09 12:30:30)

+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+struct drm_file *file_private)
+{
+   struct drm_syncobj_binary_array *args = data;
+   struct drm_syncobj **syncobjs;
+   u32 __user *access_flags = u64_to_user_ptr(args->access_flags);
+   u64 __user *values = u64_to_user_ptr(args->values);
+   u32 i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
+   return -EOPNOTSUPP;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;

You may find it easier to just return success for 0 handles. Slightly less
obnoxious error handling?



All the other ioctls in this file return EINVAL in that case. I'm just 
going for consistency.


It's also a good indication to the application that it can save itself an 
ioctl really :)






+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   u32 flags;
+
+   copy_from_user(&flags, &access_flags[i], sizeof(flags));
+   ret = ret ? -EFAULT : 0;

Magic?

if (get_user(flags, &access_flags[i]))
return -EFAULT;



I haven't tested this at all, I'm just trying to get some feedback on the 
direction.


Thanks though :)





+   if (ret)
+   break;
+
+   if (flags & I915_DRM_SYNCOBJ_BINARY_ITEM_VALUE_READ) {
+   copy_to_user(&values[i], &syncobjs[i]->binary_payload, sizeof(values[i]));
+   ret = ret ? -EFAULT : 0;

More magic.

if (put_user(syncobjs[i]->binary_payload, &values[i]))
return -EFAULT;


+   if (ret)
+   break;
+   }
+
+   if (flags & I915_DRM_SYNCOBJ_BINARY_ITEM_VALUE_INC)
+   syncobjs[i]->binary_payload++;

So if an error occurs how does the user know which syncobj were
advanced before the error? (Or explain why it doesn't actually matter)
The clue I guess is with read/inc, but confirmation of design would be
nice.



I guess we could toggle the access flag bits to notify that the actions 
were completed.
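
A rough userspace model of that idea, not kernel code: each bit in access_flags[i] is cleared once its action has been carried out, so after an error the caller can scan the array and see which entries were processed. The model_* names, the shortened flag names and the fail_at parameter (which simulates a fault at a given index) are all made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ITEM_VALUE_READ (1u << 0)
#define ITEM_VALUE_INC  (1u << 1)

/* Stand-in for the syncobj; only the sideband payload is modelled. */
struct model_syncobj {
	uint64_t binary_payload;
};

/*
 * Walk the entries in order. For each one, read the payload before
 * incrementing it, and clear the matching flag bit once the action is
 * done. A simulated fault at index fail_at returns -1 and leaves that
 * entry and all later ones untouched. Pass fail_at >= count for no fault.
 */
static int model_binary_ioctl(struct model_syncobj *objs,
			      uint32_t *access_flags, uint64_t *values,
			      size_t count, size_t fail_at)
{
	assert(objs && access_flags && values);

	for (size_t i = 0; i < count; i++) {
		uint32_t flags = access_flags[i];

		if (i == fail_at)
			return -1; /* stands in for -EFAULT */

		if (flags & ITEM_VALUE_READ) {
			values[i] = objs[i].binary_payload;
			access_flags[i] &= ~ITEM_VALUE_READ;
		}

		if (flags & ITEM_VALUE_INC) {
			objs[i].binary_payload++;
			access_flags[i] &= ~ITEM_VALUE_INC;
		}
	}

	return 0;
}
```

With this scheme an entry whose flags come back as 0 was fully processed, and an entry whose flags are unchanged was not touched at all.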





Not atomic (the u64 write should really be to avoid total corruption)
and nothing prevents userspace from racing. How safe is that in the
overall design?



Would atomic prevent issues related to 2 processes/threads seeing 
different values because of caching?



If not then it's not really interesting for the use case. The increment 
should happen during the vkQueueSubmit() call and the value is only 
valid upon returning.


The application is responsible for making sure that vkQueueSubmit() and 
vkWaitForFences() do not race.



I'm not opposed to switching to atomic though.
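
As a sketch of what "atomic" would buy here, written with userspace C11 atomics rather than the kernel's atomic64_t: a fetch-and-add returns the old value and bumps the payload in one indivisible step, so two racing callers can never observe the same pre-increment value. The model_syncobj type and the helper names are invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the syncobj; only the sideband payload is modelled. */
struct model_syncobj {
	_Atomic uint64_t binary_payload;
};

/* Read the payload and increment it as a single atomic operation. */
static uint64_t payload_read_then_inc(struct model_syncobj *obj)
{
	assert(obj);
	return atomic_fetch_add(&obj->binary_payload, 1);
}

/* Plain atomic read, with no increment. */
static uint64_t payload_read(struct model_syncobj *obj)
{
	assert(obj);
	return atomic_load(&obj->binary_payload);
}
```

This only makes the individual read and increment well defined. Ordering between a submit and a wait would still be up to the application, as described above.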




What would happen if the binary_payload was initialised to -1?



The 0 value is problematic because it's also used to mean "whatever 
fence is in the syncobj".


I think we need to stick to the same rules as the timeline values: 0 is 
always signaled.



Thanks,


-Lionel





+   }
+
 drm_syncobj_array_free(syncobjs, args->count_handles);
  
 return ret;


