Re: [PATCH v8 5/8] mm: Device exclusive memory access

2021-05-19 Thread Peter Xu
On Wed, May 19, 2021 at 08:49:01PM +1000, Alistair Popple wrote:
> On Wednesday, 19 May 2021 7:16:38 AM AEST Peter Xu wrote:
> > 
> > 
> > On Wed, Apr 07, 2021 at 06:42:35PM +1000, Alistair Popple wrote:
> > 
> > [...]
> > 
> > > +static bool try_to_protect(struct page *page, struct mm_struct *mm,
> > > +unsigned long address, void *arg)
> > > +{
> > > + struct ttp_args ttp = {
> > > + .mm = mm,
> > > + .address = address,
> > > + .arg = arg,
> > > + .valid = false,
> > > + };
> > > + struct rmap_walk_control rwc = {
> > > + .rmap_one = try_to_protect_one,
> > > + .done = page_not_mapped,
> > > + .anon_lock = page_lock_anon_vma_read,
> > > + .arg = &ttp,
> > > + };
> > > +
> > > + /*
> > > +  * Restrict to anonymous pages for now to avoid potential writeback
> > > +  * issues.
> > > +  */
> > > + if (!PageAnon(page))
> > > + return false;
> > > +
> > > + /*
> > > +  * During exec, a temporary VMA is setup and later moved.
> > > +  * The VMA is moved under the anon_vma lock but not the
> > > +  * page tables leading to a race where migration cannot
> > > +  * find the migration ptes. Rather than increasing the
> > > +  * locking requirements of exec(), migration skips
> > > +  * temporary VMAs until after exec() completes.
> > > +  */
> > > + if (!PageKsm(page) && PageAnon(page))
> > > + rwc.invalid_vma = invalid_migration_vma;
> > > +
> > > + rmap_walk(page, &rwc);
> > > +
> > > + return ttp.valid && !page_mapcount(page);
> > > +}
> > 
> > I raised a question in the other thread regarding fork():
> > 
> > https://lore.kernel.org/lkml/YKQjmtMo+YQGx%2FwZ@t490s/
> > 
> > While I suddenly noticed that we may have similar issues even if we fork()
> > before creating the ptes.
> > 
> > In that case, we may see multiple read-only ptes pointing to the same page. 
> > We will convert all of them into device exclusive read ptes in rmap_walk()
> > above, however how do we guarantee that, after all the COW is done in the
> > parent and all the child processes, the device-owned page will be returned
> > to the parent?
> 
> I assume you are talking about a fork() followed by a call to 
> make_device_exclusive()? I think this should be ok because 
> make_device_exclusive() always calls GUP with FOLL_WRITE both to break COW 
> and 
> because a device performing atomic operations needs to write to the page. I 
> suppose a comment here highlighting the need to break COW to avoid this 
> scenario would be useful though.

Indeed, sorry for the false alarm!  Yes it would be great to mention that too.
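
For reference, a minimal sketch of the COW-breaking GUP call being discussed
(simplified; argument names follow make_device_exclusive_range() in the series,
error handling trimmed):

	/*
	 * Pin the range with FOLL_WRITE so that any shared COW page is
	 * copied before we try to mark it device exclusive.
	 */
	npages = get_user_pages_remote(mm, start, npages,
				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
				       pages, NULL, NULL);
	if (npages < 0)
		return npages;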

-- 
Peter Xu



Re: [PATCH v8 5/8] mm: Device exclusive memory access

2021-05-19 Thread Peter Xu
On Wed, May 19, 2021 at 09:35:10PM +1000, Alistair Popple wrote:
> I think the approach you are describing is similar to what 
> migrate_vma_collect()/migrate_vma_unmap() does now and I think it could be 
> made to work. I ended up going with the GUP+unmap approach in part because 
> Christoph suggested it but primarily because it required less code especially 
> given we needed to call something to fault the page in/break COW anyway (or 
> extend what was there to call handle_mm_fault(), etc.).

I see, thanks. Would you mind adding a short paragraph to the commit message
talking about these two solutions and why we chose the GUP way?

-- 
Peter Xu



Re: [PATCH] drivers/video/fbdev/core/fbmem.c: add pointer judgment

2021-05-19 Thread Matthew Wilcox
On Wed, May 19, 2021 at 08:00:28PM +0800, songqiang wrote:
> Signed-off-by: songqiang 
> ---

You need to explain:

 - Why you think this patch is needed
   - Did you observe a problem at runtime?
   - Is this the output from some checking tool?
 - Why this is the right way to address the problem



Re: [PATCH v8 5/8] mm: Device exclusive memory access

2021-05-19 Thread Peter Xu
On Wed, May 19, 2021 at 09:04:53PM +1000, Alistair Popple wrote:
> Failing fork() because we couldn't take a lock doesn't seem like the right 
> approach though, especially as there is already existing code that retries. I 
> get this adds complexity though, so would be happy to take a look at cleaning 
> copy_pte_range() up in future.

Yes, I proposed that since it won't affect any existing applications (unlike
the existing code paths), only new userspace driver apps that will use this new
atomic feature.

IMHO it'll be a pity to add extra complexity and maintenance burden into
fork() only for keeping the "logical correctness of fork()" when the code
never actually triggers. If we start with trylock we'll know whether people will
use it, since people will complain with a reason when needed; however I still
doubt whether a sane userspace device driver should fork() in the middle of busy
interaction with the device underneath...

In all cases, please still consider keeping them in copy_nonpresent_pte() (and
if reworking, separate patches would be great).

Thanks,

-- 
Peter Xu



Re: [PATCH 1/2] drm/doc: document how userspace should find out CRTC index

2021-05-19 Thread Leandro Ribeiro



On 5/12/21 10:04 AM, Pekka Paalanen wrote:
> On Wed, 12 May 2021 09:50:14 -0300
> Leandro Ribeiro  wrote:
> 
>> On 5/6/21 5:50 AM, Pekka Paalanen wrote:
>>> On Wed, 28 Apr 2021 18:36:50 -0300
>>> Leandro Ribeiro  wrote:
>>>   
 In this patch we add a section to document what userspace should do to
 find out the CRTC index. This is important as there are multiple places
 in the documentation that need this, so it's better to just point to
 this section and avoid repetition.

 Signed-off-by: Leandro Ribeiro 
 ---
  Documentation/gpu/drm-uapi.rst| 14 ++
  drivers/gpu/drm/drm_debugfs_crc.c |  9 +
  include/uapi/drm/drm.h|  3 ++-
  3 files changed, 21 insertions(+), 5 deletions(-)

 diff --git a/Documentation/gpu/drm-uapi.rst 
 b/Documentation/gpu/drm-uapi.rst
 index 04bdc7a91d53..1aa52a6ac567 100644
 --- a/Documentation/gpu/drm-uapi.rst
 +++ b/Documentation/gpu/drm-uapi.rst
 @@ -457,6 +457,20 @@ Userspace API Structures
  .. kernel-doc:: include/uapi/drm/drm_mode.h
 :doc: overview

 +.. _crtc_index:
 +
 +CRTC index
 +--
 +
 +In some situations, it is important for userspace to find out the index 
 of a  
>>>
>>> That could be said about everything, so this sentence has no
>>> information value. Instead, you could start by stating that CRTCs have
>>> both an object ID and an index, and they are not the same thing. CRTC
>>> index is used in cases where a densely packed identifier for a CRTC is
>>> needed, e.g. in bit-for-crtc masks, where using the object ID would not
>>> work.
>>>  
 +CRTC. The CRTC index should not be confused with its object id.
 +
 +In order to do this, userspace should first query the resources object  
>>>
>>> Instead of saying what userspace must do, you could just explain where
>>> it can be observed.
>>>   
 +from the device that owns the CRTC (using the DRM_IOCTL_MODE_GETRESOURCES 
  
>>>
>>> So here you might start with: DRM_IOCTL_MODE_GETRESOURCES populates a
>>> structure with an array of CRTC IDs. CRTC's index is its index in that
>>> array.
>>>   
 +ioctl). The resources object contains a pointer to an array of CRTC's, 
 and also
 +the number of entries of the array. The index of the CRTC is the same as 
 its
 +position in this array.  
>>>
>>> Anyway, the idea here is right.
>>>   
>>
>> So what about:
>>
>> CRTC's have both an object ID and an index, and they should not be
>> confused. The index is used in cases where a densely packed identifier
>> for a CRTC is needed, for instance in a bitmask of CRTC's. (maybe a link
>> to the possible_crtcs field of struct drm_mode_get_plane? as example)
>>
>> DRM_IOCTL_MODE_GETRESOURCES populates a structure with an array of CRTC
>> id's, and the CRTC index is its position in this array.
> 
> Sure, sounds good.
> 
> Capitalized 'ID'?
> 

Sure, I'll add this change.

Thanks,
Leandro
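
As an illustration, the lookup could look roughly like this in userspace (a
minimal sketch assuming libdrm, an already-open DRM fd, and a known crtc_id):

	/* The CRTC index is simply the position of its ID in res->crtcs. */
	drmModeRes *res = drmModeGetResources(fd);
	int crtc_index = -1;

	for (int i = 0; i < res->count_crtcs; i++) {
		if (res->crtcs[i] == crtc_id) {
			crtc_index = i;
			break;
		}
	}
	drmModeFreeResources(res);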


Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-19 Thread Christian König

On 19.05.21 at 13:51, Andrey Grodzovsky wrote:



On 2021-05-19 7:46 a.m., Christian König wrote:

On 19.05.21 at 13:03, Andrey Grodzovsky wrote:



On 2021-05-19 6:57 a.m., Christian König wrote:

On 18.05.21 at 20:48, Andrey Grodzovsky wrote:

[SNIP]


Would this be the right way to do it ?


Yes, it is at least a start. Question is if we can wait blocking 
here or not.


We install a callback a bit lower to avoid blocking, so I'm 
pretty sure that won't work as expected.


Christian.


I can't see why this would create problems. As long as the dependencies
complete, or are force-completed if they are from the same device
(extracted) but on a different ring, then it looks to me like it should
work. I will give it a try.


Ok, but please also test the case for a killed process.

Christian.


You mean something like running glxgears and then simply
terminating it? Because I've done that. Or something more?


Well glxgears is a bit too lightweight for that.

You need at least some test which is limited by the rendering pipeline.

Christian.


You mean something that fills the entity queue faster than the sched thread
empties it, so when we kill the process we actually need to explicitly go
through terminating the remaining jobs? I've done that too by inserting an
artificial delay in drm_sched_main.


Yeah, something like that.

Ok in that case I would say that this should work then.

Christian.



Andrey





Andrey






Andrey


___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx







Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-19 Thread Andrey Grodzovsky




On 2021-05-19 7:46 a.m., Christian König wrote:

On 19.05.21 at 13:03, Andrey Grodzovsky wrote:



On 2021-05-19 6:57 a.m., Christian König wrote:

On 18.05.21 at 20:48, Andrey Grodzovsky wrote:

[SNIP]


Would this be the right way to do it ?


Yes, it is at least a start. Question is if we can wait blocking 
here or not.


We install a callback a bit lower to avoid blocking, so I'm pretty 
sure that won't work as expected.


Christian.


I can't see why this would create problems. As long as the dependencies
complete, or are force-completed if they are from the same device
(extracted) but on a different ring, then it looks to me like it should
work. I will give it a try.


Ok, but please also test the case for a killed process.

Christian.


You mean something like running glxgears and then simply
terminating it? Because I've done that. Or something more?


Well glxgears is a bit too lightweight for that.

You need at least some test which is limited by the rendering pipeline.

Christian.


You mean something that fills the entity queue faster than the sched thread
empties it, so when we kill the process we actually need to explicitly go
through terminating the remaining jobs? I've done that too by inserting an
artificial delay in drm_sched_main.

Andrey





Andrey






Andrey


___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx





Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-19 Thread Christian König

On 19.05.21 at 13:03, Andrey Grodzovsky wrote:



On 2021-05-19 6:57 a.m., Christian König wrote:

On 18.05.21 at 20:48, Andrey Grodzovsky wrote:

[SNIP]


Would this be the right way to do it ?


Yes, it is at least a start. Question is if we can wait blocking 
here or not.


We install a callback a bit lower to avoid blocking, so I'm pretty 
sure that won't work as expected.


Christian.


I can't see why this would create problems. As long as the dependencies
complete, or are force-completed if they are from the same device
(extracted) but on a different ring, then it looks to me like it should
work. I will give it a try.


Ok, but please also test the case for a killed process.

Christian.


You mean something like running glxgears and then simply
terminating it? Because I've done that. Or something more?


Well glxgears is a bit too lightweight for that.

You need at least some test which is limited by the rendering pipeline.

Christian.



Andrey






Andrey


___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx





Re: [Mesa-dev] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-19 Thread Christian König

Oh, yeah we call that gang submit on the AMD side.

We already had some internal discussions on how to implement this, but so far
couldn't figure out how to cleanly introduce that into the DRM scheduler.


Can you briefly describe in a few words how that is supposed to work on 
the Intel side?


Thanks,
Christian.

On 19.05.21 at 01:58, Matthew Brost wrote:

Add entry for i915 new parallel submission uAPI plan.

v2:
  (Daniel Vetter):
   - Expand logical order explanation
   - Add dummy header
   - Only allow N BBs in execbuf IOCTL
   - Configure parallel submission per slot not per gem context

Cc: Tvrtko Ursulin 
Cc: Tony Ye 
CC: Carl Zhang 
Cc: Daniel Vetter 
Cc: Jason Ekstrand 
Signed-off-by: Matthew Brost 
---
  Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 ++
  Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
  2 files changed, 196 insertions(+), 1 deletion(-)
  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h

diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
b/Documentation/gpu/rfc/i915_parallel_execbuf.h
new file mode 100644
index ..8c64b983ccad
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
@@ -0,0 +1,144 @@
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
i915_context_engines_parallel_submit */
+
+/*
+ * i915_context_engines_parallel_submit:
+ *
+ * Setup a slot to allow multiple BBs to be submitted in a single execbuf 
IOCTL.
+ * Those BBs will then be scheduled to run on the GPU in parallel. Multiple
+ * hardware contexts are created internally in the i915 to run these BBs. Once a
+ * slot is configured for N BBs only N BBs can be submitted in each execbuf
+ * IOCTL and this is implicit behavior (e.g. the user doesn't tell the execbuf
+ * IOCTL there are N BBs, the execbuf IOCTL knows how many BBs there are based on
+ * the slots configuration).
+ *
+ * There are two currently defined ways to control the placement of the
+ * hardware contexts on physical engines: default behavior (no flags) and
+ * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added in the
+ * future as new hardware / use cases arise. Details of how to use this
+ * interface are below, above the flags.
+ *
+ * Returns -EINVAL if the hardware context placement configuration is invalid
+ * or if the placement configuration isn't supported on the platform /
+ * submission interface.
+ * Returns -ENODEV if the extension isn't supported on the platform / submission
+ * interface.
+ */
+struct i915_context_engines_parallel_submit {
+   struct i915_user_extension base;
+
+   __u16 engine_index; /* slot for parallel engine */
+   __u16 width;/* number of contexts per parallel engine */
+   __u16 num_siblings; /* number of siblings per context */
+   __u16 mbz16;
+/*
+ * Default placement behavior (currently unsupported):
+ *
+ * Rather than restricting parallel submission to a single class with a
+ * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a mode 
that
+ * enables parallel submission across multiple engine classes. In this case 
each
+ * context's logical engine mask indicates where that context can be placed. It is
+ * implied in this mode that all contexts have mutually exclusive placement (e.g.
+ * if one context is running CS0 no other contexts can run on CS0).
+ *
+ * Example 1 pseudo code:
+ * CSX[Y] = engine class X, logical instance Y
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ * engines=CS0[0],CS0[1],CS1[0],CS1[1])
+ *
+ * Results in the following valid placements:
+ * CS0[0], CS1[0]
+ * CS0[0], CS1[1]
+ * CS0[1], CS1[0]
+ * CS0[1], CS1[1]
+ *
+ * This can also be thought of as 2 virtual engines:
+ * VE[0] = CS0[0], CS0[1]
+ * VE[1] = CS1[0], CS1[1]
+ *
+ * Example 2 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=3,
+ * engines=CS[0],CS[1],CS[2],CS[0],CS[1],CS[2])
+ *
+ * Results in the following valid placements:
+ * CS[0], CS[1]
+ * CS[0], CS[2]
+ * CS[1], CS[0]
+ * CS[1], CS[2]
+ * CS[2], CS[0]
+ * CS[2], CS[1]
+ *
+ *
+ * This can also be thought of as 2 virtual engines:
+ * VE[0] = CS[0], CS[1], CS[2]
+ * VE[1] = CS[0], CS[1], CS[2]
+
+ * This enables a use case where all engines are created equally, we don't care
+ * where they are scheduled, we just want a certain number of resources, for
+ * those resources to be scheduled in parallel, and possibly across multiple
+ * engine classes.
+ */
+
+/*
+ * I915_PARALLEL_IMPLICT_BONDS - Create implicit bonds between each context.
+ * Each context must have the same number of siblings and bonds are implicitly
+ * created between the siblings.
+ *
+ * All of the below examples are in logical space.
+ *
+ * Example 1 pseudo code:

Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-19 Thread Christian König

On 19.05.21 at 00:06, Jason Ekstrand wrote:

[SNIP]

E.g. we can't add a fence which doesn't wait for the exclusive one as
shared.

Ok I think that's a real problem, and I guess it's also related to all
the ttm privatization tricks and all that. So essentially we'd need
the opposite of ttm_bo->moving, as in you can't ignore it, but
otherwise it completely ignores all the userspace implicit fence
stuff.

Would you mind explaining it to the rest of the class?  I get the need
to do a TLB flush after a BO is removed from the processes address
space and I get that it may be super-heavy and that it has to be
delayed.  I also get that the driver needs to hold a reference to the
underlying pages until that TLB flush is done.  What I don't get is
what this has to do with the exclusive fence.  Why can't the driver
just gather up all the dma_resv fences on the current object (or,
better yet, just the ones from the current amdgpu process) and wait on
them all?  Why does it need to insert an exclusive fence that then
clogs up the whole works?


Because we have mixed up resource management with implicit syncing.

When I sum up all fences in (for example) a dma_fence_array container
and add that as the explicit fence to the dma_resv object, resource
management will do what I want and wait for everything to finish before
moving or freeing the buffer. But implicit sync will just horribly
over-sync and wait for stuff it shouldn't wait for in the first place.


When I add the fence as a shared fence I can run into the problem that the
TLB flush might finish before the exclusive fence. Which is not allowed
according to the DMA-buf fencing rules.


We currently have some rather crude workarounds to make use cases like 
this work as expected. E.g. by using a 
dma_fence_chain()/dma_fence_array() and/or adding the exclusive fence to
the shared fences etc etc...
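
As a rough sketch of that kind of workaround (hypothetical code, not taken
from the series; resv is the buffer's dma_resv, assumed locked, error paths
omitted):

	struct dma_fence **fences;
	struct dma_fence_array *array;
	unsigned int count;

	/* Collect the current fences and bundle them into one container. */
	if (dma_resv_get_fences_rcu(resv, NULL, &count, &fences))
		return;

	array = dma_fence_array_create(count, fences,
				       dma_fence_context_alloc(1), 1, false);
	if (array)
		/* Resource management now waits for all of them. */
		dma_resv_add_excl_fence(resv, &array->base);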



Let's say that you have a buffer which is shared between two drivers A
and B and let's say driver A has thrown a fence on it just to ensure
that the BO doesn't get swapped out to disk until it's at a good
stopping point.  Then driver B comes along and wants to throw a
write-fence on it.  Suddenly, your memory fence from driver A causes
driver B to have to stall waiting for a "good" time to throw in a
fence.  It sounds like this is the sort of scenario that Christian is
running into.  And, yes, with certain Vulkan drivers being a bit
sloppy about exactly when they throw in write fences, I could see it
being a real problem.

Yes this is a potential problem, and on the i915 side we need to do
some shuffling here most likely. Especially due to discrete, but the
problem is pre-existing. tbh I forgot about the implications here
until I pondered this again yesterday evening.

But afaiui the amdgpu code and winsys in mesa, this isn't (yet) the
problem amd vk drivers have. The issue is that with amdgpu, all you
supply are the following bits at CS time:
- list of always mapped private buffers, which is implicit and O(1) in
the kernel fastpath
- additional list of shared buffers that are used by the current CS

I didn't check how exactly that works wrt winsys buffer ownership, but
the thing is that on the kernel side _any_ buffer in there is treated
as a implicit sync'ed write. Which means if you render your winsys
with a bunch of command submission split over 3d and compute pipes,
you end up with horrendous amounts of oversync.

What are you talking about? We have no sync at all for submissions from
the same client.

Yes. Except when the buffer is shared with another driver, at which
point you sync a _lot_ and feel the pain.

Yes, exactly that's the problem.

We basically don't know during CS if a BO is shared or not.

We do know that during importing or exporting the BO though.

No you don't. Or at least that's massively awkward, see Jason's reply.

Please.  In Vulkan, we know explicitly whether or not any BO will ever
be shared and, if a BO is ever flagged as shared even though it's not,
that's the app being stupid and they can eat the perf hit.


Yeah, that's not a problem at all. We already have the per BO flag in 
amdgpu for this as well.



In GL, things are more wishy-washy but GL has so many stupid cases where we
have to throw a buffer away and re-allocate that one more isn't going
to be all that bad.  Even there, you could do something where you add
an in-fence to the BO export operation so that the driver knows when
to switch from the shared internal dma_resv to the external one
without having to create a new BO and copy.


Hui what? What do you mean with in-fence here?


[SNIP]

Yeah but why does your userspace not know when a bo is used?

We always know when a BO is exported because we're the ones doing the
export call.  Always.  Of course, we don't know if that BO is shared
with another driver or re-imported back into the same one but is that
really the case we're optimizing for?


Yes, unfortunately. Exactly that's one of the reasons we couldn't go 
with the per 

Re: [PATCH v8 5/8] mm: Device exclusive memory access

2021-05-19 Thread Alistair Popple
On Wednesday, 19 May 2021 3:27:42 AM AEST Peter Xu wrote:
> > > The odd part is the remote GUP should have walked the page table
> > > already, so since the target here is the vaddr to replace, the 1st page
> > > table walk should be able to both trylock/lock the page, then modify
> > > the pte with pgtable lock held, return the locked page, then walk the
> > > rmap again to remove all the rest of the ptes that are mapping to this
> > > page.  In that case before we call the rmap_walk() we know this must be
> > > the page we want to take care of, and no one will be able to restore
> > > the original mm pte either (as we're with the page lock).  Then we
> > > don't need this check, neither do we need ttp->address.
> > 
> > If I am understanding you correctly I think this would be similar to the
> > approach that was taken in v2. However it pretty much ended up being just
> > an open-coded version of gup which is useful anyway to fault the page in.
> I see.  For easier reference this is v2 patch 1:
> 
> https://lore.kernel.org/lkml/20210219020750.16444-2-apop...@nvidia.com/

Sorry, I should have been clearer and just included that reference for you.

> Indeed that looks like it, it's just that instead of grabbing the page only
> in hmm_exclusive_pmd() we can do the pte modification along the way to seal
> the whole thing (address/pte & page).  I saw Christoph and Jason commented
> in that patch, but not regarding to this approach.  So is there a reason
> that you switched?  Do you think it'll work?

I think the approach you are describing is similar to what 
migrate_vma_collect()/migrate_vma_unmap() does now and I think it could be 
made to work. I ended up going with the GUP+unmap approach in part because 
Christoph suggested it but primarily because it required less code especially 
given we needed to call something to fault the page in/break COW anyway (or 
extend what was there to call handle_mm_fault(), etc.).

> I have no strong opinion either, it's just not crystal clear why we'd need
> that ttp->address at all for a rmap walk along with that "valid" field.

It's purely to ensure the PTE pointing to the GUP page was replaced with an 
exclusive swap entry and that the mapping didn't change between calls.

> Meanwhile it should be slightly less efficient too to go with current
> approach, especially when the page array gets huge, I think: since there'll
> be longer time we do GUP before doing the rmap walk, so higher possibility
> that the GUPed pages got replaced for whatever reason.  Then the call to
> make_device_exclusive_range() will fail as a whole just for a single page
> replacement within the range.






Re: [Intel-gfx] [PATCH v2 10/15] drm/i915/ttm: Introduce a TTM i915 gem object backend

2021-05-19 Thread Thomas Hellström

Thanks a lot for reviewing, Matthew!

On 5/19/21 11:53 AM, Matthew Auld wrote:

On Tue, 18 May 2021 at 09:28, Thomas Hellström
 wrote:

Most logical place to introduce TTM buffer objects is as an i915
gem object backend. We need to add some ops to account for added
functionality like delayed delete and LRU list manipulation.

Initially we support only LMEM and SYSTEM memory, but SYSTEM
(which in this case means evicted LMEM objects) is not
visible to i915 GEM yet. The plan is to move the i915 gem system region
over to the TTM system memory type in upcoming patches.

We set up GPU bindings directly both from LMEM and from the system region,
as there is no need to use the legacy TTM_TT memory type. We reserve
that for future porting of GGTT bindings to TTM.

Remove the old lmem backend.

Signed-off-by: Thomas Hellström 

+/**
+ * i915_gem_object_evictable - Whether object is likely evictable after unbind.
+ * @obj: The object to check
+ *
+ * This function checks whether the object is likely evictable after unbind.
+ * If the object is not locked when checking, the result is only advisory.
+ * If the object is locked when checking, and the function returns true,
+ * then an eviction should indeed be possible. But since unlocked vma
+ * unpinning and unbinding is currently possible, the object can actually
+ * become evictable even if this function returns false.
+ *
+ * Return: true if the object may be evictable. False otherwise.
+ */
+bool i915_gem_object_evictable(struct drm_i915_gem_object *obj)
+{
+   struct i915_vma *vma;
+   int pin_count = atomic_read(&obj->mm.pages_pin_count);
+
+   if (!pin_count)
+   return true;
+
+   spin_lock(&obj->vma.lock);
+   list_for_each_entry(vma, &obj->vma.list, obj_link) {
+   if (i915_vma_is_pinned(vma)) {
+   spin_unlock(&obj->vma.lock);
+   return false;
+   }
+   if (atomic_read(&vma->pages_count))
+   pin_count--;

Can't pages_count be > 1, which would also be reflected in
pages_pin_count? The vma_pin path looks very complex.


Yes, and Maarten pointed out a bug in it as well. We now only take a 
pages_pin_count when vma->pages_count transitions from 0->1 so the above 
code should be correct, I think.
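
Purely to illustrate the invariant described above (a hypothetical sketch, not
the actual i915 code): the object-level pin is only taken on a vma's 0 -> 1
pages_count transition, so subtracting one pin per vma that holds pages is
enough in the evictable check.

	if (atomic_fetch_inc(&vma->pages_count) == 0)
		atomic_inc(&obj->mm.pages_pin_count);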





+   }
+   spin_unlock(&obj->vma.lock);
+   GEM_WARN_ON(pin_count < 0);
+
+   return pin_count == 0;
+}
+
  void i915_gem_init__objects(struct drm_i915_private *i915)
  {
  INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 2ebd79537aea..ae5930e307d5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -200,6 +200,9 @@ static inline bool i915_gem_object_trylock(struct 
drm_i915_gem_object *obj)

  static inline void i915_gem_object_unlock(struct drm_i915_gem_object *obj)
  {
+   if (obj->ops->adjust_lru)
+   obj->ops->adjust_lru(obj);

Interesting, so we bump the lru even when we just drop the lock?
Yes, as an initial approximation. TTM has historically done this and it
may not be the best choice, but dropping the lock means we've just used
the object for something, typically CS, hence we bump the LRU.

+
 dma_resv_unlock(obj->base.resv);
  }

@@ -587,6 +590,12 @@ int i915_gem_object_read_from_page(struct 
drm_i915_gem_object *obj, u64 offset,

  bool i915_gem_object_is_shmem(const struct drm_i915_gem_object *obj);

+void __i915_gem_free_object_rcu(struct rcu_head *head);
+
+void __i915_gem_free_object(struct drm_i915_gem_object *obj);
+
+bool i915_gem_object_evictable(struct drm_i915_gem_object *obj);
+
  #ifdef CONFIG_MMU_NOTIFIER
  static inline bool
  i915_gem_object_is_userptr(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 98f69d8fd37d..b350765e1935 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -63,6 +63,20 @@ struct drm_i915_gem_object_ops {
   const struct drm_i915_gem_pwrite *arg);

 int (*dmabuf_export)(struct drm_i915_gem_object *obj);
+
+   /**
+* adjust_lru - notify that the madvise value was updated
+* @obj: The gem object
+*
+* The madvise value may have been updated, or object was recently
+* referenced so act accordingly (Perhaps changing an LRU list etc).
+*/
+   void (*adjust_lru)(struct drm_i915_gem_object *obj);
+
+   /**
+* delayed_free - Override the default delayed free implementation
+*/
+   void (*delayed_free)(struct drm_i915_gem_object *obj);
 void (*release)(struct drm_i915_gem_object *obj);

 const char *name; /* friendly name for debug, e.g. lockdep classes */
@@ -307,6 +321,10 @@ struct drm_i915_gem_object {
 

Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-19 Thread Christian König

On 18.05.21 at 23:17, Daniel Vetter wrote:

[SNIP]

The problem in this case is not starting a new CS, but synchronizing to
the existing ones.

See a heavy TLB flush is made completely out of sync. E.g. it doesn't
want to wait for any previous operation.

In other words imagine the following example:
1. Both process A and B have a BO mapped.
2. Process A is heavily using the BO and doing all kind of rendering.
3. Process B is unmapping the BO.

Now when process B unmaps the BO, we need to trigger page table updates and
a heavy TLB flush, but since this can take really long we want to do it
asynchronously on the hardware.

With the current approach you basically can't do that because you can't
note that a fence should not participate in synchronization at all.

E.g. we can't add a fence which doesn't wait for the exclusive one as
shared.

Ok I think that's a real problem, and I guess it's also related to all
the ttm privatization tricks and all that. So essentially we'd need
the opposite of ttm_bo->moving, as in you can't ignore it, but
otherwise it completely ignores all the userspace implicit fence
stuff.


It goes into that direction, but doesn't sound like the full solution
either.


[SNIP]

Can we please stop with the "amdgpu is right, everyone else is wrong" approach?


Well, the approach I'm taking here is not "amdgpu is right, everyone else is
wrong". Rather, we had DRM uAPI for i915, nouveau and radeon and
unfortunately leaked that into DMA-buf without much thinking about it.


I'm also not saying that the amdgpu approach is right. It's just what
amdgpu needs in its CS interface.


What I'm saying is that DMA-buf is a device driver independent subsystem
and we shouldn't base assumptions about its implicit sync implementation on
just a handful of DRM drivers.



Like I'm pretty much going to type up the patch that does a full drm
subsystem audit of everything and whack amdgpu into compliance. Perf
hit be damned, you had a few years to fix this with better uapi. Or I
find out that there's a giant inconsistent mess, but at least we'd
gain some clarity about where exactly we are here and maybe what to do
next.


Ok, to let us move forward, please take a look at the first patches of the
set. They clean up quite a bunch of the mess we have in there before even
coming to adding flags to the shared slots.


I think you will agree that what we should do is clean up the use cases
further and separate implicit sync from resource management.


In other words we forbid touching the exclusive and shared fences 
directly and have separate APIs for resource management and implicit sync.


This makes sense anyway, no matter what implicit synchronization 
framework we will install underneath.


Regards,
Christian.


-Daniel


Regards,
Christian.


After that I think we can look at what exact oversync issue remains
and why and solve it, but until we have this, this just feels like
another rehash of "amdgpu insists its own dma_resv interpretation is the
right one and everyone else should move over".

Or maybe I've just become real garbage at reading random driver code,
wouldn't be the first time :-)

Cheers, Daniel


Regards,
Christian.


Cheers, Daniel


--Jason



That's also the reason the Valve guys came up with a solution where each
BO gets a flag for explicit sync, but that only works for exports and
not for imports.


I915 and iirc msm has explicit flags for this, panfrost was designed to
support this correctly from the start (also with flags I think). That's at
least what I remember from all the discussions at XDC and #dri-devel, but
didn't check the code again to give you the list of uapi flags you need
for each driver.

The other piece is making sure you're only picking up implicit fences when
you should, and not any later ones, for which Jason has a solution:

https://lore.kernel.org/dri-devel/20210317221940.2146688-1-ja...@jlekstrand.net/

Yes, I helped with that as well. But I think that this is just another
workaround without really addressing the underlying problem.


If amdgpu isn't using those, then you will suffer from
over-synchronization in vulkan and pay a price. The entire point of vulkan
is that you pick up sync points very explicitly, and we also need to have
very explicit uapi for userspace to pick up/set the implicit fences.

Trying to paper over this with more implicit magic is imo just wrong, and
definitely not the long term explicit sync model we want.

I completely disagree.

In my opinion the implicit sync model we have for dma_resv currently is
just not well designed at all, since it always requires cooperation from
userspace.

In other words you need to know when to enable implicit sync in
userspace and that information is simply not present all of the time.

What we have done here is just keeping the old reader/writer flags i915,
radeon and nouveau once had and pushed that out to everybody else making
the assumption that everybody would follow that without documenting 

Re: [V2][PATCH 2/2] drm: xlnx: consolidate the functions which programming AUDIO_VIDEO_SELECT register

2021-05-19 Thread Paul Cercueil

Hi,

On Tue, May 18 2021 at 17:50:19 +0800, quanyang.w...@windriver.com
wrote:

From: Quanyang Wang 

For now, the functions zynqmp_disp_avbuf_enable/disable_audio and
zynqmp_disp_avbuf_enable/disable_video are all programming the 
register
AV_BUF_OUTPUT_AUDIO_VIDEO_SELECT to select the output for audio or 
video.

And in the future, many drm properties (like video_tpg, audio_tpg,
audio_pl, etc.) will also need to access it. So let's introduce some
variables of enum type and consolidate the code to unify handling of this.

Signed-off-by: Quanyang Wang 


Acked-by: Paul Cercueil 

Cheers,
-Paul



---
 drivers/gpu/drm/xlnx/zynqmp_disp.c  | 168 
++--

 drivers/gpu/drm/xlnx/zynqmp_disp_regs.h |  23 +---
 2 files changed, 106 insertions(+), 85 deletions(-)

diff --git a/drivers/gpu/drm/xlnx/zynqmp_disp.c 
b/drivers/gpu/drm/xlnx/zynqmp_disp.c

index eefb278e24c6..3672d2f5665b 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_disp.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_disp.c
@@ -102,12 +102,39 @@ enum zynqmp_disp_layer_id {

 /**
  * enum zynqmp_disp_layer_mode - Layer mode
- * @ZYNQMP_DISP_LAYER_NONLIVE: non-live (memory) mode
+ * @ZYNQMP_DISP_LAYER_MEM: memory mode
  * @ZYNQMP_DISP_LAYER_LIVE: live (stream) mode
+ * @ZYNQMP_DISP_LAYER_TPG: tpg mode (only for video layer)
+ * @ZYNQMP_DISP_LAYER_DISABLE: disable mode
  */
 enum zynqmp_disp_layer_mode {
-   ZYNQMP_DISP_LAYER_NONLIVE,
-   ZYNQMP_DISP_LAYER_LIVE
+   ZYNQMP_DISP_LAYER_MEM,
+   ZYNQMP_DISP_LAYER_LIVE,
+   ZYNQMP_DISP_LAYER_TPG,
+   ZYNQMP_DISP_LAYER_DISABLE
+};
+
+enum avbuf_vid_mode {
+   VID_MODE_LIVE,
+   VID_MODE_MEM,
+   VID_MODE_TPG,
+   VID_MODE_NONE
+};
+
+enum avbuf_gfx_mode {
+   GFX_MODE_DISABLE,
+   GFX_MODE_MEM,
+   GFX_MODE_LIVE,
+   GFX_MODE_NONE
+};
+
+enum avbuf_aud_mode {
+   AUD1_MODE_LIVE,
+   AUD1_MODE_MEM,
+   AUD1_MODE_TPG,
+   AUD1_MODE_DISABLE,
+   AUD2_MODE_DISABLE,
+   AUD2_MODE_ENABLE
 };

 /**
@@ -542,92 +569,102 @@ static void 
zynqmp_disp_avbuf_disable_channels(struct zynqmp_disp_avbuf *avbuf)

 }

 /**
- * zynqmp_disp_avbuf_enable_audio - Enable audio
+ * zynqmp_disp_avbuf_output_select - Select the buffer manager 
outputs

  * @avbuf: Audio/video buffer manager
+ * @layer: The layer
+ * @mode: The mode for this layer
  *
- * Enable all audio buffers with a non-live (memory) source.
+ * Select the buffer manager outputs for @layer.
  */
-static void zynqmp_disp_avbuf_enable_audio(struct zynqmp_disp_avbuf 
*avbuf)
+static void zynqmp_disp_avbuf_output_select(struct zynqmp_disp_avbuf 
*avbuf,

+  struct zynqmp_disp_layer *layer,
+  u32 mode)
 {
-   u32 val;
+   u32 reg;

-   val = zynqmp_disp_avbuf_read(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT);
-   val &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MASK;
-   val |= ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MEM;
-   val |= ZYNQMP_DISP_AV_BUF_OUTPUT_AUD2_EN;
-   zynqmp_disp_avbuf_write(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT, val);
+   reg = zynqmp_disp_avbuf_read(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT);
+
+   /* Select audio mode when the layer is NULL */
+   if (layer == NULL) {
+   if (mode >= AUD2_MODE_DISABLE) {
+   reg &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD2_MASK;
+   reg |= FIELD_PREP(ZYNQMP_DISP_AV_BUF_OUTPUT_AUD2_MASK,
+   (mode - AUD2_MODE_DISABLE));
+   } else {
+   reg &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MASK;
+   reg |= FIELD_PREP(ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MASK, 
mode);
+   }
+   } else if (is_layer_vid(layer)) {
+   reg &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_VID1_MASK;
+   reg |= FIELD_PREP(ZYNQMP_DISP_AV_BUF_OUTPUT_VID1_MASK, mode);
+   } else {
+   reg &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_VID2_MASK;
+   reg |= FIELD_PREP(ZYNQMP_DISP_AV_BUF_OUTPUT_VID2_MASK, mode);
+   }
+
+   zynqmp_disp_avbuf_write(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT, reg);
 }

 /**
- * zynqmp_disp_avbuf_disable_audio - Disable audio
+ * zynqmp_disp_avbuf_enable_audio - Enable audio
  * @avbuf: Audio/video buffer manager
  *
- * Disable all audio buffers.
+ * Enable all audio buffers.
  */
-static void zynqmp_disp_avbuf_disable_audio(struct zynqmp_disp_avbuf 
*avbuf)
+static void zynqmp_disp_avbuf_enable_audio(struct zynqmp_disp_avbuf 
*avbuf)

 {
-   u32 val;
-
-   val = zynqmp_disp_avbuf_read(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT);
-   val &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MASK;
-   val |= ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_DISABLE;
-   val &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD2_EN;
-   zynqmp_disp_avbuf_write(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT, val);
+   zynqmp_disp_avbuf_output_select(avbuf, NULL, AUD1_MODE_MEM);
+   zynqmp_disp_avbuf_output_select(avbuf, NULL, AUD2_MODE_ENABLE);
 }

 /**
- * 

Re: [GIT PULL] drm/imx: fixes, dma-fence annotation, and color encoding/range plane properties

2021-05-19 Thread Philipp Zabel
On Wed, 2021-05-19 at 09:19 +1000, Dave Airlie wrote:
> On Wed, 12 May 2021 at 23:33, Philipp Zabel  wrote:
> > Hi Dave, Daniel,
> > 
> > The following changes since commit 6efb943b8616ec53a5e444193dccf1af9ad627b5:
> > 
> >   Linux 5.13-rc1 (2021-05-09 14:17:44 -0700)
> > 
> > are available in the Git repository at:
> > 
> >   git://git.pengutronix.de/git/pza/linux.git tags/imx-drm-next-2021-05-12
> 
> Is this for 5.14 or 5.13-rc3?

This is for 5.14, I'd like these to have a full round of testing.

regards
Philipp


Re: [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-19 Thread Daniel Vetter
On Tue, May 18, 2021 at 04:58:30PM -0700, Matthew Brost wrote:
> Add entry for i915 new parallel submission uAPI plan.
> 
> v2:
>  (Daniel Vetter):
>   - Expand logical order explanation
>   - Add dummy header
>   - Only allow N BBs in execbuf IOCTL
>   - Configure parallel submission per slot not per gem context
> 
> Cc: Tvrtko Ursulin 
> Cc: Tony Ye 
> CC: Carl Zhang 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Signed-off-by: Matthew Brost 
> ---
>  Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 ++
>  Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
>  2 files changed, 196 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> 
> diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> new file mode 100644
> index ..8c64b983ccad
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> @@ -0,0 +1,144 @@
> +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> i915_context_engines_parallel_submit */
> +
> +/*
> + * i915_context_engines_parallel_submit:
> + *
> + * Setup a slot to allow multiple BBs to be submitted in a single execbuf 
> IOCTL.
> + * Those BBs will then be scheduled to run on the GPU in parallel. Multiple
> + * hardware contexts are created internally in the i915 to run these BBs. Once a
> + * slot is configured for N BBs only N BBs can be submitted in each execbuf
> + * IOCTL and this is implicit behavior (e.g. the user doesn't tell the execbuf
> + * IOCTL there are N BBs, the execbuf IOCTL knows how many BBs there are based on
> + * the slots configuration).
> + *
> + * There are two currently defined ways to control the placement of the
> + * hardware contexts on physical engines: default behavior (no flags) and
> + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added in the
> + * future as new hardware / use cases arise. Details of how to use this
> + * interface are below, above the flags.
> + *
> + * Returns -EINVAL if the hardware context placement configuration is invalid
> + * or if the placement configuration isn't supported on the platform /
> + * submission interface.
> + * Returns -ENODEV if the extension isn't supported on the platform / submission
> + * interface.
> + */
> +struct i915_context_engines_parallel_submit {
> + struct i915_user_extension base;
> +
> + __u16 engine_index; /* slot for parallel engine */
> + __u16 width;/* number of contexts per parallel engine */
> + __u16 num_siblings; /* number of siblings per context */
> + __u16 mbz16;

Ok the big picture looks reasonable now, the flags still confuse me.

> +/*
> + * Default placement behavior (currently unsupported):
> + *
> + * Rather than restricting parallel submission to a single class with a
> + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a mode 
> that
> + * enables parallel submission across multiple engine classes. In this case 
> each
> + * context's logical engine mask indicates where that context can be placed. It is
> + * implied in this mode that all contexts have mutually exclusive placement (e.g.
> + * if one context is running CS0 no other contexts can run on CS0).
> + *
> + * Example 1 pseudo code:
> + * CSX[Y] = engine class X, logical instance Y
> + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> + * set_engines(INVALID)
> + * set_parallel(engine_index=0, width=2, num_siblings=2,
> + *   engines=CS0[0],CS0[1],CS1[0],CS1[1])
> + *
> + * Results in the following valid placements:
> + * CS0[0], CS1[0]
> + * CS0[0], CS1[1]
> + * CS0[1], CS1[0]
> + * CS0[1], CS1[1]
> + *
> + * This can also be thought of as 2 virtual engines:
> + * VE[0] = CS0[0], CS0[1]
> + * VE[1] = CS1[0], CS1[1]
> + *
> + * Example 2 pseudo code:
> + * CS[X] = generic engine of same class, logical instance X
> + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> + * set_engines(INVALID)
> + * set_parallel(engine_index=0, width=2, num_siblings=3,
> + *   engines=CS[0],CS[1],CS[2],CS[0],CS[1],CS[2])
> + *
> + * Results in the following valid placements:
> + * CS[0], CS[1]
> + * CS[0], CS[2]
> + * CS[1], CS[0]
> + * CS[1], CS[2]
> + * CS[2], CS[0]
> + * CS[2], CS[1]
> + *
> + *
> + * This can also be thought of as 2 virtual engines:
> + * VE[0] = CS[0], CS[1], CS[2]
> + * VE[1] = CS[0], CS[1], CS[2]
> +
> + * This enables a use case where all engines are created equally, we don't 
> care
> + * where they are scheduled, we just want a certain number of resources, for
> + * those resources to be scheduled in parallel, and possibly across multiple
> + * engine classes.
> + */

So I don't really get what this does compared to setting the flag below.
Is this just about running the batchbuffers the wrong way round, i.e. if
you have (simplest case)

width=2, num_sibglings=1, engines=CS[0], CS[1]

Then both
CS[0], CS[1]
and
CS[1], CS[0]
are possible 

Re: [PATCH v8 5/8] mm: Device exclusive memory access

2021-05-19 Thread Alistair Popple
On Wednesday, 19 May 2021 9:45:05 AM AEST Peter Xu wrote:
> 
> On Tue, May 18, 2021 at 08:03:27PM -0300, Jason Gunthorpe wrote:
> > Logically during fork all these device exclusive pages should be
> > reverted back to their CPU pages, write protected and the CPU page PTE
> > copied to the fork.
> > 
> > We should not copy the device exclusive page PTE to the fork. I think
> > I pointed to this on an earlier rev..
> 
> Agreed.  Though please see the question I posted in the other thread: now I
> am not very sure whether we'll be able to mark a page as device exclusive
> if that page has mapcount>1.
>
> > We can optimize this into the various variants above, but logically
> > device exclusive stop existing during fork.
> 
> Makes sense, I think that's indeed what this patch did at least for the COW
> case, so I think Alistair did address that comment.  It's just that I think
> we need to drop the other !COW case (imho that should correspond to the
> changes in copy_nonpresent_pte()) in this patch to guarantee it.

Right. The main change from v7 -> v8 was to remove device exclusive entries on 
fork instead of copying them. The change in copy_nonpresent_pte() is for the
!COW case. I think what you are getting at is that, given exclusive entries are
(currently) only supported for PageAnon pages, is_cow_mapping() will always be
true and therefore the change to copy_nonpresent_pte() can be dropped. That
logic seems reasonable, so I will change the exclusive case in
copy_nonpresent_pte() to a VM_WARN_ON.
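
To illustrate the direction (a hypothetical sketch only, not the actual patch,
assuming the device exclusive swap entry check and src_vma are available in
copy_nonpresent_pte()):

	} else if (is_device_exclusive_entry(entry)) {
		/*
		 * Exclusive entries are only installed for anonymous pages,
		 * which are always COW mappings, so we should never get here.
		 */
		VM_WARN_ON_ONCE(!is_cow_mapping(src_vma->vm_flags));
	}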

> I also hope we don't make copy_pte_range() even more complicated just to do
> the lock_page() right, so we could fail the fork() if the lock is hard to
> take.

Failing fork() because we couldn't take a lock doesn't seem like the right 
approach though, especially as there is already existing code that retries. I 
get this adds complexity though, so would be happy to take a look at cleaning 
copy_pte_range() up in future.

> --
> Peter Xu






Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-19 Thread Andrey Grodzovsky




On 2021-05-19 6:57 a.m., Christian König wrote:

On 18.05.21 at 20:48, Andrey Grodzovsky wrote:

[SNIP]


Would this be the right way to do it ?


Yes, it is at least a start. Question is if we can wait blocking here 
or not.


We install a callback a bit lower to avoid blocking, so I'm pretty 
sure that won't work as expected.


Christian.


I can't see why this would create problems. As long as the dependencies
complete, or are force-completed if they are from the same device
(extracted) but on a different ring, then it looks to me like it should
work. I will give it a try.


Ok, but please also test the case for a killed process.

Christian.


You mean something like running glxgears and then simply
terminating it? Because I've done that. Or something more?

Andrey






Andrey


___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



[PATCH v2] backlight: Kconfig whitespace and indentation cleanups

2021-05-19 Thread Juerg Haefliger
Remove leading whitespace, replace multiple spaces with tabs, and fix help
text indentation.

Signed-off-by: Juerg Haefliger 
---
 drivers/video/backlight/Kconfig | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/video/backlight/Kconfig b/drivers/video/backlight/Kconfig
index d83c87b902c1..c887338de386 100644
--- a/drivers/video/backlight/Kconfig
+++ b/drivers/video/backlight/Kconfig
@@ -128,12 +128,12 @@ config LCD_HX8357
  If you have a HX-8357 LCD panel, say Y to enable its LCD control
  driver.
 
-  config LCD_OTM3225A
-   tristate "ORISE Technology OTM3225A support"
-   depends on SPI
-   help
- If you have a panel based on the OTM3225A controller
- chip then say y to include a driver for it.
+config LCD_OTM3225A
+   tristate "ORISE Technology OTM3225A support"
+   depends on SPI
+   help
+ If you have a panel based on the OTM3225A controller
+ chip then say y to include a driver for it.
 
 endif # LCD_CLASS_DEVICE
 
@@ -269,11 +269,11 @@ config BACKLIGHT_MAX8925
  WLED output, say Y here to enable this driver.
 
 config BACKLIGHT_APPLE
-   tristate "Apple Backlight Driver"
-   depends on X86 && ACPI
-   help
-If you have an Intel-based Apple say Y to enable a driver for its
-backlight.
+   tristate "Apple Backlight Driver"
+   depends on X86 && ACPI
+   help
+ If you have an Intel-based Apple say Y to enable a driver for its
+ backlight.
 
 config BACKLIGHT_TOSA
tristate "Sharp SL-6000 Backlight Driver"
-- 
2.27.0



Re: [PATCH v7 13/16] drm/scheduler: Fix hang when sched_entity released

2021-05-19 Thread Christian König

On 18.05.21 at 20:48, Andrey Grodzovsky wrote:

[SNIP]


Would this be the right way to do it ?


Yes, it is at least a start. Question is if we can wait blocking here 
or not.


We install a callback a bit lower to avoid blocking, so I'm pretty 
sure that won't work as expected.


Christian.


I can't see why this would create problems. As long as the dependencies
complete, or are force-completed if they are from the same device
(extracted) but on a different ring, then it looks to me like it should
work. I will give it a try.


Ok, but please also test the case for a killed process.

Christian.



Andrey




Re: [PATCH] drm/ttm: Explain why ttm_bo_add_move_fence uses a shared slot

2021-05-19 Thread Huang Rui
On Wed, May 19, 2021 at 04:24:09PM +0800, Daniel Vetter wrote:
> Motivated because I got confused and Christian confirmed why this
> works. I think this is non-obvious enough that it merits a slightly
> longer comment.
> 
> Cc: Christian König 
> Cc: Christian Koenig 
> Cc: Huang Rui 
> Cc: Thomas Hellström 
> Signed-off-by: Daniel Vetter 

Acked-by: Huang Rui 

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index ca1b098b6a56..51a94fd63bd7 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -682,7 +682,9 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>  }
>  
>  /*
> - * Add the last move fence to the BO and reserve a new shared slot.
> + * Add the last move fence to the BO and reserve a new shared slot. We only
> + * use a shared slot to avoid unnecessary sync and rely on the subsequent bo
> + * move to either stall or use an exclusive fence, respectively setting
> + * bo->moving.
>   */
>  static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
>struct ttm_resource_manager *man,
> -- 
> 2.31.0
> 


Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-19 Thread Michel Dänzer
On 2021-05-19 12:06 a.m., Jason Ekstrand wrote:
> On Tue, May 18, 2021 at 4:17 PM Daniel Vetter  wrote:
>>
>> On Tue, May 18, 2021 at 7:40 PM Christian König
>>  wrote:
>>>
>>> Am 18.05.21 um 18:48 schrieb Daniel Vetter:
 On Tue, May 18, 2021 at 2:49 PM Christian König
  wrote:

> And as long as we are all inside amdgpu we also don't have any oversync,
> the issue only happens when we share dma-bufs with i915 (radeon and
> AFAIK nouveau does the right thing as well).
 Yeah because then you can't use the amdgpu dma_resv model anymore and
 have to use the one atomic helpers use. Which is also the one that
 e.g. Jason is threatening to bake in as uapi with his dma_buf ioctl,
 so as soon as that lands and someone starts using it, something has to
 adapt _anytime_ you have a dma-buf hanging around. Not just when it's
 shared with another device.
>>>
>>> Yeah, and that is exactly the reason why I will NAK this uAPI change.
>>>
>>> This doesn't works for amdgpu at all for the reasons outlined above.
>>
>> Uh that's really not how uapi works. "my driver is right, everyone
>> else is wrong" is not how cross driver contracts are defined. If that
>> means a perf impact until you've fixed your rules, that's on you.
>>
>> Also you're a few years too late with nacking this, it's already uapi
>> in the form of the dma-buf poll() support.
> 
> ^^  My fancy new ioctl doesn't expose anything that isn't already
> there.  It just lets you take a snap-shot of a wait instead of doing
> an active wait which might end up with more fences added depending on
> interrupts and retries.  The dma-buf poll waits on all fences for
> POLLOUT and only the exclusive fence for POLLIN.  It's already uAPI.

Note that the dma-buf poll support could be useful to Wayland compositors for 
the same purpose as Jason's new ioctl (only using client buffers which have 
finished drawing for an output frame, to avoid missing a refresh cycle due to 
client drawing), *if* it didn't work differently with amdgpu.

Am I understanding correctly that Jason's new ioctl would also work differently 
with amdgpu as things stand currently? If so, that would be a real bummer and 
might hinder adoption of the ioctl by Wayland compositors.
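
For reference, a minimal sketch of the existing dma-buf poll() usage being
referred to (buf_fd is assumed to be the client buffer's dma-buf file
descriptor):

	#include <poll.h>

	/*
	 * Non-blocking snapshot: POLLOUT only signals once all fences on the
	 * buffer (reads and writes) have completed, per the semantics above.
	 */
	struct pollfd pfd = { .fd = buf_fd, .events = POLLOUT };

	if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLOUT)) {
		/* buffer is idle; safe to pick it up for this output frame */
	}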


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH v8 5/8] mm: Device exclusive memory access

2021-05-19 Thread Alistair Popple
On Wednesday, 19 May 2021 7:16:38 AM AEST Peter Xu wrote:
> 
> 
> On Wed, Apr 07, 2021 at 06:42:35PM +1000, Alistair Popple wrote:
> 
> [...]
> 
> > +static bool try_to_protect(struct page *page, struct mm_struct *mm,
> > +unsigned long address, void *arg)
> > +{
> > + struct ttp_args ttp = {
> > + .mm = mm,
> > + .address = address,
> > + .arg = arg,
> > + .valid = false,
> > + };
> > + struct rmap_walk_control rwc = {
> > + .rmap_one = try_to_protect_one,
> > + .done = page_not_mapped,
> > + .anon_lock = page_lock_anon_vma_read,
> > + .arg = &ttp,
> > + };
> > +
> > + /*
> > +  * Restrict to anonymous pages for now to avoid potential writeback
> > +  * issues.
> > +  */
> > + if (!PageAnon(page))
> > + return false;
> > +
> > + /*
> > +  * During exec, a temporary VMA is setup and later moved.
> > +  * The VMA is moved under the anon_vma lock but not the
> > +  * page tables leading to a race where migration cannot
> > +  * find the migration ptes. Rather than increasing the
> > +  * locking requirements of exec(), migration skips
> > +  * temporary VMAs until after exec() completes.
> > +  */
> > + if (!PageKsm(page) && PageAnon(page))
> > + rwc.invalid_vma = invalid_migration_vma;
> > +
> > + rmap_walk(page, &rwc);
> > +
> > + return ttp.valid && !page_mapcount(page);
> > +}
> 
> I raised a question in the other thread regarding fork():
> 
> https://lore.kernel.org/lkml/YKQjmtMo+YQGx%2FwZ@t490s/
> 
> While I suddenly noticed that we may have similar issues even if we fork()
> before creating the ptes.
> 
> In that case, we may see multiple read-only ptes pointing to the same page. 
> We will convert all of them into device exclusive read ptes in rmap_walk()
> above, however how do we guarantee after all COW done in the parent and all
> the childs processes, the device owned page will be returned to the parent?

I assume you are talking about a fork() followed by a call to 
make_device_exclusive()? I think this should be ok because 
make_device_exclusive() always calls GUP with FOLL_WRITE both to break COW and 
because a device performing atomic operations needs to write to the page. I 
suppose a comment here highlighting the need to break COW to avoid this 
scenario would be useful though.
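
(Roughly, the step being described looks like the sketch below; illustrative only, 
not the actual patch. 5.13-era GUP signature, and the caller is assumed to hold 
mmap_read_lock(mm).)

static int fault_in_and_break_cow(struct mm_struct *mm, unsigned long addr,
				  struct page **page)
{
	long ret;

	/* FOLL_WRITE forces a write fault, which breaks any COW sharing
	 * left over from fork() before the exclusive entry is installed. */
	ret = get_user_pages_remote(mm, addr, 1,
				    FOLL_WRITE | FOLL_SPLIT_PMD,
				    page, NULL, NULL);
	return ret == 1 ? 0 : (ret < 0 ? ret : -EFAULT);
}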

> E.g., if parent accesses the page earlier than the children processes
> (actually, as long as not the last one), do_wp_page() will do COW for parent
> on this page because refcount(page)>1, then the page seems to get lost to a
> random child too..
>
> To resolve all these complexity, not sure whether try_to_protect() could
> enforce VM_DONTCOPY (needs madvise MADV_DONTFORK in the user app), meanwhile
> make sure mapcount(page)==1 before granting the page to the device, so that
> this will guarantee this mm owns this page forever, I think?  It'll
> simplify fork() too as a side effect, since VM_DONTCOPY vma go away when
> fork.
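
(Purely illustrative of the MADV_DONTFORK idea above, from the userspace side; 
not part of this patch:)

#include <sys/mman.h>

/* children never inherit the range, so a device-exclusive page in it cannot
 * be pulled back from the parent by COW happening in a child */
static int mark_no_fork(void *addr, size_t len)
{
	return madvise(addr, len, MADV_DONTFORK);
}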
> 
> --
> Peter Xu






Re: [PATCH] drm/ttm: Explain why ttm_bo_add_move_fence uses a shared slot

2021-05-19 Thread Christian König

Am 19.05.21 um 10:24 schrieb Daniel Vetter:

Motivated because I got confused and Christian confirmed why this
works. I think this is non-obvious enough that it merits a slightly
longer comment.

Cc: Christian König 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: Thomas Hellström 
Signed-off-by: Daniel Vetter 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/ttm/ttm_bo.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ca1b098b6a56..51a94fd63bd7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -682,7 +682,9 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
  }
  
  /*

- * Add the last move fence to the BO and reserve a new shared slot.
+ * Add the last move fence to the BO and reserve a new shared slot. We only use
+ * a shared slot to avoid unecessary sync and rely on the subsequent bo move to
+ * either stall or use an exclusive fence respectively set bo->moving.
   */
  static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 struct ttm_resource_manager *man,




Re: [PATCH v2 09/15] drm/ttm, drm/amdgpu: Allow the driver some control over swapping

2021-05-19 Thread Christian König

Am 19.05.21 um 08:27 schrieb Thomas Hellström:


On 5/18/21 6:30 PM, Christian König wrote:

Am 18.05.21 um 18:07 schrieb Thomas Hellström:


On 5/18/21 5:42 PM, Christian König wrote:

Am 18.05.21 um 17:38 schrieb Thomas Hellström:


On 5/18/21 5:28 PM, Christian König wrote:

Am 18.05.21 um 17:20 schrieb Thomas Hellström:


On 5/18/21 5:18 PM, Christian König wrote:



Am 18.05.21 um 17:15 schrieb Thomas Hellström:


On 5/18/21 10:26 AM, Thomas Hellström wrote:
We are calling the eviction_valuable driver callback at 
eviction time to

determine whether we actually can evict a buffer object.
The upcoming i915 TTM backend needs the same functionality 
for swapout,

and that might actually be beneficial to other drivers as well.

Add an eviction_valuable call also in the swapout path. Try 
to keep the
current behaviour for all drivers by returning true if the 
buffer object
is already in the TTM_PL_SYSTEM placement. We change 
behaviour for the
case where a buffer object is in a TT backed placement when 
swapped out,

in which case the drivers normal eviction_valuable path is run.

Finally export ttm_tt_unpopulate() and don't swap out bos
that are not populated. This allows a driver to purge a bo at
swapout time if its content is no longer valuable rather than to
have TTM swap the contents out.

Cc: Christian König 
Signed-off-by: Thomas Hellström 



Christian,

Here we have a ttm_tt_unpopulate() export as well at the end. 
I figure you will push back on that one. What we really need 
is a functionality to just drop the bo contents and end up in 
system memory unpopulated. Should I perhaps add a utility 
function to do that instead? like ttm_bo_purge()?


We already have that. Just call ttm_bo_validate() without any 
place to put the buffer.


See how ttm_bo_pipeline_gutting() is used.

Christian.
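
(A minimal sketch of that, using the 5.13-era TTM API; illustrative only, not 
the i915 or amdgpu code:)

static int purge_bo(struct ttm_buffer_object *bo)
{
	struct ttm_operation_ctx ctx = { .interruptible = true };
	struct ttm_placement empty = {};	/* zero placements */

	dma_resv_assert_held(bo->base.resv);

	/* with no placements, ttm_bo_validate() takes the
	 * ttm_bo_pipeline_gutting() path and drops the backing store */
	return ttm_bo_validate(bo, &empty, &ctx);
}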


OK, so is that reentrant from the move() or swap_notify() callback?


That sounds like a design bug to me since you should never need 
to do this.


When you want to destroy the backing store of a buffer during 
eviction you should just do this by returning an empty placement 
from the evict_flags callback.


So this is for the functionality where the user has indicated that 
the contents is no longer of value, but the buffer itself
is cached until evicted or swapped out for performance reasons. So 
the above would work for eviction, but what about swapout. Could 
we add some similar functionality there?


Amdgpu has the same functionality and you don't need to handle swap 
at all.


Just return from the evict_flags that you want to drop the backing 
store as soon as the BO leaves the GTT domain.
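
(Sketch of that evict_flags approach; the names marked hypothetical are not 
real driver code:)

static void my_evict_flags(struct ttm_buffer_object *bo,
			   struct ttm_placement *placement)
{
	/* hypothetical driver-side check for "contents no longer valuable" */
	if (bo_contents_discardable(bo)) {
		/* zero placements: TTM drops the backing store on eviction */
		placement->num_placement = 0;
		placement->num_busy_placement = 0;
		return;
	}

	*placement = my_system_placement;	/* hypothetical fallback */
}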


Hmm, the pipeline_gutting function seems ok, but overly complex if 
the bo is already idle. Am I allowed to optimize it slightly for the 
latter case?


Yeah, sure. We just never had that use case so far.


One thing about the code here that makes me worried is that the 
"destination" ttm_tt is allocated *after* pipeline_gutting. We're not 
really allowed to fail here because that would leave the BO in a state 
where codepaths (fault for example) try to access a NULL ttm_tt. While 
the idle case can get away with ttm_tt_unpopulate, for the async case, 
ttm_tt really needs to be pre-allocated, so that we can leave the bo 
in a consistent state.


Well the original plan was to make tt allocation purely optional.

But I didn't have the time so far to actually fix that.

Christian.



/Thomas




Christian.



/Thomas




Christian.



/Thomas



Regards,
Christian.



/Thomas







Thanks,

Thomas














Re: [PATCH 3/3] drm: document minimum kernel version for DRM_CLIENT_CAP_*

2021-05-19 Thread Daniel Stone
Hi Simon,

On Tue, 18 May 2021 at 12:15, Simon Ser  wrote:
> The kernel versions including the following commits are referenced:
> [...]

Thanks a lot for writing this up! I trust your ability to drive git
log/blame/describe so didn't bother double-checking the version
numbers. So this is:
Acked-by: Daniel Stone 

Cheers,
Daniel


Re: [PATCH -next] drm: Fix missing unlock and free on error in drm_legacy_addbufs_pci()

2021-05-19 Thread Daniel Vetter
On Tue, May 18, 2021 at 08:35:02PM +0800, Zou Wei wrote:
> Add the missing unlock and free before return from function
> drm_legacy_addbufs_pci() in the error handling case.
> 
> Fixes: 70556e24e18e ("drm: remove usage of drm_pci_alloc/free")
> Reported-by: Hulk Robot 
> Signed-off-by: Zou Wei 

Applied to drm-misc-next, thanks for your patch.
-Daniel

> ---
>  drivers/gpu/drm/drm_bufs.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
> index 4805726..c23d7f7 100644
> --- a/drivers/gpu/drm/drm_bufs.c
> +++ b/drivers/gpu/drm/drm_bufs.c
> @@ -984,8 +984,16 @@ int drm_legacy_addbufs_pci(struct drm_device *dev,
>  
>   while (entry->buf_count < count) {
>   dmah = kmalloc(sizeof(drm_dma_handle_t), GFP_KERNEL);
> - if (!dmah)
> + if (!dmah) {
> + /* Set count correctly so we free the proper amount. */
> + entry->buf_count = count;
> + entry->seg_count = count;
> + drm_cleanup_buf_error(dev, entry);
> + kfree(temp_pagelist);
> + mutex_unlock(&dev->struct_mutex);
> + atomic_dec(&dev->buf_alloc);
>   return -ENOMEM;
> + }
>  
>   dmah->size = total;
>   dmah->vaddr = dma_alloc_coherent(dev->dev,
> -- 
> 2.6.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 3/3] drm: document minimum kernel version for DRM_CLIENT_CAP_*

2021-05-19 Thread Daniel Vetter
On Tue, May 18, 2021 at 11:14:52AM +, Simon Ser wrote:
> The kernel versions including the following commits are referenced:
> 
> DRM_CLIENT_CAP_STEREO_3D
> 61d8e3282541 ("drm: Add a STEREO_3D capability to the SET_CLIENT_CAP ioctl")
> 
> DRM_CLIENT_CAP_UNIVERSAL_PLANES
> 681e7ec73044 ("drm: Allow userspace to ask for universal plane list (v2)")
> c7dbc6c9ae5c ("drm: Remove command line guard for universal planes")
> 
> DRM_CLIENT_CAP_ATOMIC
> 88a48e297b3a ("drm: add atomic properties")
> 8b72ce158cf0 ("drm: Always enable atomic API")
> 
> DRM_CLIENT_CAP_ASPECT_RATIO
> 7595bda2fb43 ("drm: Add DRM client cap for aspect-ratio")
> 
> DRM_CLIENT_CAP_WRITEBACK_CONNECTORS
> d67b6a206507 ("drm: writeback: Add client capability for exposing writeback 
> connectors")
> 
> Signed-off-by: Simon Ser 
> Cc: Daniel Vetter 
> Cc: Daniel Stone 
> Cc: Pekka Paalanen 

On the series:

Reviewed-by: Daniel Vetter 

btw your threading is busted.
-Daniel
> ---
>  include/uapi/drm/drm.h | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
> index 87878aea4526..ec2b122cdcc5 100644
> --- a/include/uapi/drm/drm.h
> +++ b/include/uapi/drm/drm.h
> @@ -780,6 +780,8 @@ struct drm_get_cap {
>   * If set to 1, the DRM core will expose the stereo 3D capabilities of the
>   * monitor by advertising the supported 3D layouts in the flags of struct
>   * drm_mode_modeinfo. See ``DRM_MODE_FLAG_3D_*``.
> + *
> + * This capability is always supported starting from kernel version 3.13.
>   */
>  #define DRM_CLIENT_CAP_STEREO_3D 1
>  
> @@ -788,6 +790,9 @@ struct drm_get_cap {
>   *
>   * If set to 1, the DRM core will expose all planes (overlay, primary, and
>   * cursor) to userspace.
> + *
> + * This capability has been introduced in kernel version 3.15. Starting from
> + * kernel version 3.17, this capability is always supported.
>   */
>  #define DRM_CLIENT_CAP_UNIVERSAL_PLANES  2
>  
> @@ -797,6 +802,13 @@ struct drm_get_cap {
>   * If set to 1, the DRM core will expose atomic properties to userspace. This
>   * implicitly enables _CLIENT_CAP_UNIVERSAL_PLANES and
>   * _CLIENT_CAP_ASPECT_RATIO.
> + *
> + * If the driver doesn't support atomic mode-setting, enabling this 
> capability
> + * will fail with -EOPNOTSUPP.
> + *
> + * This capability has been introduced in kernel version 4.0. Starting from
> + * kernel version 4.2, this capability is always supported for atomic-capable
> + * drivers.
>   */
>  #define DRM_CLIENT_CAP_ATOMIC3
>  
> @@ -805,6 +817,8 @@ struct drm_get_cap {
>   *
>   * If set to 1, the DRM core will provide aspect ratio information in modes.
>   * See ``DRM_MODE_FLAG_PIC_AR_*``.
> + *
> + * This capability is always supported starting from kernel version 4.18.
>   */
>  #define DRM_CLIENT_CAP_ASPECT_RATIO4
>  
> @@ -814,6 +828,9 @@ struct drm_get_cap {
>   * If set to 1, the DRM core will expose special connectors to be used for
>   * writing back to memory the scene setup in the commit. The client must 
> enable
>   * _CLIENT_CAP_ATOMIC first.
> + *
> + * This capability is always supported for atomic-capable drivers starting 
> from
> + * kernel version 4.19.
>   */
>  #define DRM_CLIENT_CAP_WRITEBACK_CONNECTORS  5
>  
> -- 
> 2.31.1
> 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH] Revert "drm/i915: Propagate errors on awaiting already signaled fences"

2021-05-19 Thread Daniel Vetter
From: Jason Ekstrand 

This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7.  Ever
since that commit, we've been having issues where a hang in one client
can propagate to another.  In particular, a hang in an app can propagate
to the X server which causes the whole desktop to lock up.

Error propagation along fences sounds like a good idea, but as your bug
shows, it has surprising consequences, since propagating errors across
security boundaries is not a good thing.

What we do have is track the hangs on the ctx, and report information to
userspace using RESET_STATS. That's how arb_robustness works. Also, if my
understanding is still correct, the EIO from execbuf is when your context
is banned (because not recoverable or too many hangs). And in all these
cases it's up to userspace to figure out what is all impacted and should
be reported to the application, that's not on the kernel to guess and
automatically propagate.

What's more, we're also building more features on top of ctx error
reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the
userspace fence wait also relies on that mechanism. So it is the path
going forward for reporting gpu hangs and resets to userspace.
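
(For reference only, not part of this patch: the RESET_STATS query referred to
above, roughly as a robustness-aware userspace driver would issue it. Header
path may differ per setup.)

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int context_saw_reset(int fd, uint32_t ctx_id)
{
	struct drm_i915_reset_stats stats = { .ctx_id = ctx_id };

	if (ioctl(fd, DRM_IOCTL_I915_GET_RESET_STATS, &stats))
		return -1;

	/* batch_active: hangs this context caused;
	 * batch_pending: innocent batches lost to someone else's hang */
	return stats.batch_active || stats.batch_pending;
}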

So all together that's why I think we should just bury this idea again as
not quite the direction we want to go, hence why I think the revert is
the right option here.

Signed-off-by: Jason Ekstrand 

v2: Augment commit message. Also restore Jason's sob that I
accidentally lost.

Signed-off-by: Jason Ekstrand  (v1)
Reported-by: Marcin Slusarz 
Cc:  # v5.6+
Cc: Jason Ekstrand 
Cc: Marcin Slusarz 
Cc: Jon Bloomfield 
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled 
fences")
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_request.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 970d8f4986bb..b796197c0772 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1426,10 +1426,8 @@ i915_request_await_execution(struct i915_request *rq,
 
do {
fence = *child++;
-   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
-   i915_sw_fence_set_error_once(&rq->submit, fence->error);
+   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
continue;
-   }
 
if (fence->context == rq->fence.context)
continue;
@@ -1527,10 +1525,8 @@ i915_request_await_dma_fence(struct i915_request *rq, 
struct dma_fence *fence)
 
do {
fence = *child++;
-   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
-   i915_sw_fence_set_error_once(&rq->submit, fence->error);
+   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
continue;
-   }
 
/*
 * Requests on the same timeline are explicitly ordered, along
-- 
2.31.0



Re: [drm-rerere PATCH] nightly.conf: drop amd branches from drm-tip

2021-05-19 Thread Daniel Vetter
On Wed, May 19, 2021 at 11:49:32AM +0300, Jani Nikula wrote:
> We've had a stale repo for amd in drm-tip since around v4.15 i.e. for
> more than three years. Nobody seems to notice or care. Drop the amd
> branches from drm-tip.
> 
> Having the current amd branches in drm-tip would be nice to have, if
> only to have a common drm integration tree. However, maintaining that
> has a cost due to the inevitable conflicts. We can add the branches back
> if and when there's interest in sharing the burden.
> 
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: Pan, Xinhui 
> Cc: Daniel Vetter 
> Signed-off-by: Jani Nikula 

Acked-by: Daniel Vetter 
> ---
>  nightly.conf | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/nightly.conf b/nightly.conf
> index 9211550ef75c..35fb1d9ba600 100644
> --- a/nightly.conf
> +++ b/nightly.conf
> @@ -40,12 +40,6 @@ git://anongit.freedesktop.org/drm-misc
>  https://anongit.freedesktop.org/git/drm/drm-misc
>  https://anongit.freedesktop.org/git/drm/drm-misc.git
>  "
> -drm_tip_repos[drm-amd]="
> -ssh://git.freedesktop.org/git/drm/drm-amd
> -git://anongit.freedesktop.org/drm/drm-amd
> -https://anongit.freedesktop.org/git/drm/drm-amd
> -https://anongit.freedesktop.org/git/drm/drm-amd.git
> -"
>  drm_tip_repos[drm]="
>  ssh://git.freedesktop.org/git/drm/drm
>  git://anongit.freedesktop.org/drm/drm
> @@ -76,17 +70,14 @@ drm_tip_config=(
>   "drmdrm-fixes"
>   "drm-misc   drm-misc-fixes"
>   "drm-intel  drm-intel-fixes"
> - "drm-amddrm-amd-fixes"
>  
>   "drmdrm-next"
>   "drm-misc   drm-misc-next-fixes"
>   "drm-intel  drm-intel-next-fixes"
> - "drm-amddrm-amd-next-fixes"
>  
>   "drm-misc   drm-misc-next"
>   "drm-intel  drm-intel-next"
>   "drm-intel  drm-intel-gt-next"
> - "drm-amddrm-amd-next"
>  
>   "sound-upstream for-linus"
>   "sound-upstream for-next"
> -- 
> 2.20.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH v2 11/15] drm/i915/lmem: Verify checks for lmem residency

2021-05-19 Thread Matthew Auld
On Tue, 18 May 2021 at 09:28, Thomas Hellström
 wrote:
>
> Since objects can be migrated or evicted when not pinned or locked,
> update the checks for lmem residency or future residency so that
> the value returned is not immediately stale.
>
> Signed-off-by: Thomas Hellström 
Reviewed-by: Matthew Auld 


Re: [RFC PATCH 0/3] A drm_plane API to support HDR planes

2021-05-19 Thread Pekka Paalanen
On Wed, 19 May 2021 11:53:37 +0300
Pekka Paalanen  wrote:

...

> TL;DR:
> 
> I would summarise my comments so far into these:
> 
> - Telling the kernel the color spaces and letting it come up with
>   whatever color transformation formula from those is not enough,
>   because it puts the render intent policy decision in the kernel.
> 
> - Telling the kernel what color transformations need to be done is
>   good, if it is clearly defined.
> 
> - Using an enum-based UAPI to tell the kernel what color
>   transformations needs to be done (e.g. which EOTF or EOTF^-1 to apply
>   at a step in the abstract pipeline) is very likely ok for many
>   Wayland compositors in most cases, but may not be sufficient for all
>   use cases. Of course, one is always bound by what hardware can do, so
>   not a big deal.
> 
> - You may need to define mutually exclusive KMS properties (referring
>   to my email in another branch of this email tree).
> 
> - I'm not sure I (we?) can meaningfully review things like "SDR boost"
>   property until we know ourselves how to composite different types of
>   content together. Maybe someone else could.
> 
> Does this help or raise thoughts?
> 
> The work on Weston CM right now is aiming to get it up to a point
> where we can start nicely testing different compositing approaches and
> methods and parameters, and I expect that will also feed back into the
> Wayland CM protocol design as well.

I have forgot to mention one important thing:

Generic Wayland compositors will be using KMS planes opportunistically.
The compositor will be switching between GL and KMS compositing
on-demand, refresh by refresh. This means that both GL and KMS
compositing must produce identical results, or users will be seeing
"color flicks" on switch.

This is a practical reason why we really want to know in full detail
how the KMS pipeline processes pixels.


Thanks,
pq




Re: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and swapin

2021-05-19 Thread Christian König

I'm scratching my head how that is even possible.

See, when a BO is created in the system domain it is just an empty hull, 
i.e. without backing store or allocated pages.


So the swapout function will just ignore it.

Christian.
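
(Illustrative of that check, not the exact ttm_bo_swapout() code: the swapout
walk has nothing to do for a BO whose ttm_tt isn't populated.)

static bool bo_has_pages_to_swap(struct ttm_buffer_object *bo)
{
	/* an "empty hull" BO has no ttm_tt, or no populated pages yet */
	return bo->ttm && ttm_tt_is_populated(bo->ttm) &&
	       !(bo->ttm->page_flags & TTM_PAGE_FLAG_SG);
}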

Am 19.05.21 um 07:07 schrieb Pan, Xinhui:

[AMD Official Use Only]

I have reverted Chris's patch and still hit this failure.
Just look at these two lines in Chris's patch: any BO in the CPU domain would be swapped out 
first. That is why we hit this issue frequently now. But the bug has been there for a long 
time.

-   for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
-   list_for_each_entry(bo, >swap_lru[i], swap) {
[snip]
+   for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) {
+   for (j = 0; j < TTM_MAX_BO_PRIORITY; ++j) {



From: Pan, Xinhui 
Sent: May 19, 2021 12:09
To: Kuehling, Felix; amd-...@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian; dri-devel@lists.freedesktop.org; 
dan...@ffwll.ch
Subject: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and 
swapin

yes, we really dont swapout SG BOs.
The problems is that before we validate a userptr BO, we create this BO in CPU 
domain by default. So this BO has chance to swapout.

we set flag TTM_PAGE_FLAG_SG on userptr BO in popluate() which is too late.
I have not try to revert Chris' patch as I think it desnt help. Or I can have a 
try later.


From: Kuehling, Felix 
Sent: May 19, 2021 11:29
To: Pan, Xinhui; amd-...@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian; dri-devel@lists.freedesktop.org; 
dan...@ffwll.ch
Subject: Re: [RFC PATCH 1/2] drm/amdgpu: Fix memory corruption due to swapout and 
swapin

Swapping SG BOs makes no sense, because TTM doesn't own the pages of
this type of BO.

Last I checked, userptr BOs (and other SG BOs) were protected from
swapout by the fact that they would not be added to the swap-LRU. But it
looks like Christian just removed the swap-LRU. I guess this broke that
protection:

commit 2cb51d22d70b18eaf339abf9758bf0b7608da65c
Author: Christian König 
Date:   Tue Oct 6 16:30:09 2020 +0200

  drm/ttm: remove swap LRU v3

  Instead evict round robin from each devices SYSTEM and TT domain.

  v2: reorder num_pages access reported by Dan's script
  v3: fix rebase fallout, num_pages should be 32bit

  Signed-off-by: Christian König 
  Tested-by: Nirmoy Das 
  Reviewed-by: Huang Rui 
  Reviewed-by: Matthew Auld 
  Link: https://patchwork.freedesktop.org/patch/424009/

Regards,
Felix


On 2021-05-18 10:28 p.m., xinhui pan wrote:

cpu 1                                   cpu 2
kfd alloc BO A(userptr)                 alloc BO B(GTT)
    -> init -> validate                 -> init -> validate -> populate
    init_user_pages                     -> swapout BO A // hit ttm pages limit
        -> get_user_pages (fill up ttm->pages)
                                        -> validate -> populate
                                        -> swapin BO A // Now hit the BUG

We know that get_user_pages may race with swapout on the same BO.
There are some issues I have met.
1) memory corruption.
This is because we do a swap before memory is set up. ttm_tt_swapout()
just creates a swap_storage with its content being 0x0. So when we set up
memory after the swapout, the following swapin leaves the memory
corrupted.

2) panic
When swapout happens together with get_user_pages, they touch ttm->pages without
any lock. It causes memory corruption too. But I mostly hit a page fault.

Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16 +++-
   1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 928e8d57cd08..42460e4480f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -835,6 +835,7 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
   struct amdkfd_process_info *process_info = mem->process_info;
   struct amdgpu_bo *bo = mem->bo;
   struct ttm_operation_ctx ctx = { true, false };
+ struct page **pages;
   int ret = 0;

    mutex_lock(&process_info->lock);
@@ -852,7 +853,13 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
   goto out;
   }

- ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
+ pages = kvmalloc_array(bo->tbo.ttm->num_pages,
+ sizeof(struct page *),
+ GFP_KERNEL | __GFP_ZERO);
+ if (!pages)
+ goto unregister_out;
+
+ ret = amdgpu_ttm_tt_get_user_pages(bo, pages);
   if (ret) {
   pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
   goto unregister_out;
@@ -863,6 +870,12 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t 
user_addr)
   pr_err("%s: Failed 

Re: [Intel-gfx] [PATCH v2 10/15] drm/i915/ttm: Introduce a TTM i915 gem object backend

2021-05-19 Thread Matthew Auld
On Tue, 18 May 2021 at 09:28, Thomas Hellström
 wrote:
>
> Most logical place to introduce TTM buffer objects is as an i915
> gem object backend. We need to add some ops to account for added
> functionality like delayed delete and LRU list manipulation.
>
> Initially we support only LMEM and SYSTEM memory, but SYSTEM
> (which in this case means evicted LMEM objects) is not
> visible to i915 GEM yet. The plan is to move the i915 gem system region
> over to the TTM system memory type in upcoming patches.
>
> We set up GPU bindings directly both from LMEM and from the system region,
> as there is no need to use the legacy TTM_TT memory type. We reserve
> that for future porting of GGTT bindings to TTM.
>
> Remove the old lmem backend.
>
> Signed-off-by: Thomas Hellström 
> ---
> v2:
> - Break out needed TTM functionality to a separate patch (Reported by
> Christian König).
> - Fix an unhandled error (Reported by Matthew Auld and Maarten Lankhorst)
> - Remove a stray leftover sg_table allocation (Reported by Matthew Auld)
> - Use ttm_tt_unpopulate() rather than ttm_tt_destroy() in the purge path
>   as some TTM functionality relies on having a ttm_tt present for !is_iomem.
> ---
>  drivers/gpu/drm/i915/Makefile |   1 +
>  drivers/gpu/drm/i915/gem/i915_gem_lmem.c  |  84 ---
>  drivers/gpu/drm/i915/gem/i915_gem_lmem.h  |   5 -
>  drivers/gpu/drm/i915/gem/i915_gem_object.c| 125 +++--
>  drivers/gpu/drm/i915/gem/i915_gem_object.h|   9 +
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +
>  drivers/gpu/drm/i915/gem/i915_gem_region.c|   6 +-
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 519 ++
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |  48 ++
>  drivers/gpu/drm/i915/gt/intel_region_lmem.c   |   3 +-
>  drivers/gpu/drm/i915/i915_gem.c   |   5 +-
>  drivers/gpu/drm/i915/intel_memory_region.c|   1 -
>  drivers/gpu/drm/i915/intel_memory_region.h|   1 -
>  drivers/gpu/drm/i915/intel_region_ttm.c   |   5 +-
>  drivers/gpu/drm/i915/intel_region_ttm.h   |   7 +-
>  15 files changed, 696 insertions(+), 141 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.h
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 958ccc1edfed..ef0d884a9e2d 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -155,6 +155,7 @@ gem-y += \
> gem/i915_gem_stolen.o \
> gem/i915_gem_throttle.o \
> gem/i915_gem_tiling.o \
> +   gem/i915_gem_ttm.o \
> gem/i915_gem_ttm_bo_util.o \
> gem/i915_gem_userptr.o \
> gem/i915_gem_wait.o \
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
> index 3b4aa28a076d..2b8cd15de1d9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
> @@ -4,74 +4,10 @@
>   */
>
>  #include "intel_memory_region.h"
> -#include "intel_region_ttm.h"
>  #include "gem/i915_gem_region.h"
>  #include "gem/i915_gem_lmem.h"
>  #include "i915_drv.h"
>
> -static void lmem_put_pages(struct drm_i915_gem_object *obj,
> -  struct sg_table *pages)
> -{
> -   intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node);
> -   obj->mm.dirty = false;
> -   sg_free_table(pages);
> -   kfree(pages);
> -}
> -
> -static int lmem_get_pages(struct drm_i915_gem_object *obj)
> -{
> -   unsigned int flags;
> -   struct sg_table *pages;
> -
> -   flags = I915_ALLOC_MIN_PAGE_SIZE;
> -   if (obj->flags & I915_BO_ALLOC_CONTIGUOUS)
> -   flags |= I915_ALLOC_CONTIGUOUS;
> -
> -   obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region,
> -obj->base.size,
> -flags);
> -   if (IS_ERR(obj->mm.st_mm_node))
> -   return PTR_ERR(obj->mm.st_mm_node);
> -
> -   /* Range manager is always contigous */
> -   if (obj->mm.region->is_range_manager)
> -   obj->flags |= I915_BO_ALLOC_CONTIGUOUS;
> -   pages = intel_region_ttm_node_to_st(obj->mm.region, 
> obj->mm.st_mm_node);
> -   if (IS_ERR(pages)) {
> -   intel_region_ttm_node_free(obj->mm.region, 
> obj->mm.st_mm_node);
> -   return PTR_ERR(pages);
> -   }
> -
> -   __i915_gem_object_set_pages(obj, pages, 
> i915_sg_dma_sizes(pages->sgl));
> -
> -   if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) {
> -   void __iomem *vaddr =
> -   i915_gem_object_lmem_io_map(obj, 0, obj->base.size);
> -
> -   if (!vaddr) {
> -   struct sg_table *pages =
> -   __i915_gem_object_unset_pages(obj);
> -
> -   if (!IS_ERR_OR_NULL(pages))
> -   

Re: New uAPI for color management proposal and feedback request

2021-05-19 Thread Pekka Paalanen
On Wed, 12 May 2021 16:04:16 +0300
Ville Syrjälä  wrote:

> On Wed, May 12, 2021 at 02:06:56PM +0200, Werner Sembach wrote:
> > Hello,
> > 
> > In addition to the existing "max bpc", and "Broadcast RGB/output_csc" drm 
> > properties I propose 4 new properties:
> > "preferred pixel encoding", "active color depth", "active color range", and 
> > "active pixel encoding"
> > 
> > 
> > Motivation:
> > 
> > Current monitors have a variety pixel encodings available: RGB, YCbCr 
> > 4:4:4, YCbCr 4:2:2, YCbCr 4:2:0.
> > 
> > In addition they might be full or limited RGB range and the monitors accept 
> > different bit depths.
> > 
> > Currently the kernel drivers for AMD and Intel GPUs configure 
> > the color settings automatically with little
> > to no influence from the user. However there are several real world scenarios 
> > where the user might disagree with the
> > default chosen by the drivers and wants to set his or her own preference.
> > 
> > Some examples:
> > 
> > 1. While RGB and YCbCr 4:4:4 in theory carry the same amount of color 
> > information, some screens might look better on one
> > than the other because of bad internal conversion. The driver currently 
> > however has a fixed default that is chosen if
> > available (RGB for Intel and YCbCr 4:4:4 for AMD). The only way to change 
> > this currently is by editing and overloading
> > the edid reported by the monitor to the kernel.
> > 
> > 2. RGB and YCbCr 4:4:4 need a higher port clock then YCbCr 4:2:0. Some 
> > hardware might report that it supports the higher
> > port clock, but because of bad shielding on the PC, the cable, or the 
> > monitor the screen cuts out every few seconds when
> > RGB or YCbCr 4:4:4 encoding is used, while YCbCr 4:2:0 might just work fine 
> > without changing hardware. The drivers
> > currently however always default to the "best available" option even if it 
> > might be broken.
> > 
> > 3. Some screens natively supporting only 8-bit color simulate 10-bit color 
> > by rapidly switching between 2 adjacent
> > colors. They advertise themselves to the kernel as 10-bit monitors but the 
> > user might not like the "fake" 10-bit effect
> > and prefer running at the native 8-bit per color.
> > 
> > 4. Some screens are falsely classified as full RGB range while they actually 
> > use limited RGB range. This results in
> > washed out colors in dark and bright scenes. A user override can be helpful 
> > to manually fix this issue when it occurs.
> > 
> > There already exist several requests, discussion, and patches regarding the 
> > thematic:
> > 
> > - https://gitlab.freedesktop.org/drm/amd/-/issues/476
> > 
> > - https://gitlab.freedesktop.org/drm/amd/-/issues/1548
> > 
> > - https://lkml.org/lkml/2021/5/7/695
> > 
> > - https://lkml.org/lkml/2021/5/11/416
> > 

...

> > Adoption:
> > 
> > A KDE dev wants to implement the settings in the KDE settings GUI:
> > https://gitlab.freedesktop.org/drm/amd/-/issues/476#note_912370
> > 
> > Tuxedo Computers (my employer) wants to implement the settings desktop 
> > environment agnostic in Tuxedo Control Center. I
> > will start work on this in parallel to implementing the new kernel code.  
> 
> I suspect everyone would be happier to accept new uapi if we had
> multiple compositors signed up to implement it.

I think having Weston support for these would be good, but for now it
won't be much of an UI: just weston.ini to set, and the log to see what
happened.

However, knowing what happened is going to be important for color
calibration auditing:
https://gitlab.freedesktop.org/wayland/weston/-/issues/467

Yes, please, very much for read-only properties for the feedback part.
Properties that both userspace and kernel will write are hard to deal
with in general.

Btw. "max bpc" I can kind of guess that conversion from framebuffer
format to the wire bpc happens automatically and only as the final
step, but "Broadcast RGB" is more complicated: is the output from the
abstract pixel pipeline sent as-is and "Broadcast RGB" is just another
inforframe bit to the monitor, or does "Broadcast RGB" setting actually
change what happens in the pixel pipeline *and* set infoframe bits?

My vague recollection is that framebuffer was always assumed to be in
full range, and then if "Broadcast RGB" was set to limited range, the
driver would mangle the pixel pipeline to convert from full to limited
range. This means that it would be impossible to have limited range
data in a framebuffer, or there might be a double-conversion by
userspace programming a LUT for limited->full and then the driver
adding full->limited. I'm also confused how full/limited works when
framebuffer is in RGB/YCbCr and the monitor wire format is in RGB/YCbCr
and there may be RGB->YCbCR or YCbCR->RGB conversions going on - or
maybe even FB YCbCR -> RGB -> DEGAMMA -> CTM -> GAMMA -> YCbCR.
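
(For concreteness, this is the classic full-to-limited mapping that "Broadcast
RGB = limited" implies for 8-bit RGB; whether the driver applies it in the
pixel pipeline or merely signals it in the infoframe is exactly the open
question:)

static inline unsigned int full_to_limited_8bit(unsigned int v)
{
	/* 0..255 squeezed into the 16..235 broadcast/limited range, rounded */
	return 16 + (v * 219 + 127) / 255;
}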

I wish someone drew a picture of the KMS abstract pixel pipeline with
all the existing KMS properties in it. :-)


Thanks,
pq



Re: [RFC PATCH 0/3] A drm_plane API to support HDR planes

2021-05-19 Thread Pekka Paalanen
On Tue, 18 May 2021 10:19:25 -0400
Harry Wentland  wrote:

> On 2021-05-18 3:56 a.m., Pekka Paalanen wrote:
> > On Mon, 17 May 2021 15:39:03 -0400
> > Vitaly Prosyak  wrote:
> >   
> >> On 2021-05-17 12:48 p.m., Sebastian Wick wrote:  

...

> >>> I suspect that this is not about tone mapping at all. The use cases
> >>> listed always have the display in PQ mode and just assume that no
> >>> content exceeds the PQ limitations. Then you can simply bring all
> >>> content to the color space with a matrix multiplication and then map the
> >>> linear light content somewhere into the PQ range. Tone mapping is
> >>> performed in the display only.  
> > 
> > The use cases do use the word "desktop" though. Harry, could you expand
> > on this, are you seeking a design that is good for generic desktop
> > compositors too, or one that is more tailored to "embedded" video
> > player systems taking the most advantage of (potentially
> > fixed-function) hardware?
> >   
> 
> The goal is to enable this on a generic desktop, such as generic Wayland
> implementations or ChromeOS. We're not looking for a custom solution for
> some embedded systems, though the solution we end up with should obviously
> not prevent an implementation on embedded video players.

(There is a TL;DR: at the end.)

Echoing a little bit what Sebastian already said, I believe there are
two sides to this again:
- color management in the traditional sense
- modern standardised display technology

It was perhaps too harsh to say that generic Wayland compositors cannot
use enum-based color-related UAPI. Sometimes they could, sometimes it
won't be good enough.

Traditional color management assumes that no two monitors are the same,
even if they are the same make, model, and manufacturing batch, and are
driven exactly the same way. Hence, all monitors may require
calibration (adjusting monitor knobs), and/or they may require
profiling (measuring the light emission with a special hardware device
designed for that). Also the viewing environment has an effect.

For profiling to be at all meaningful, calibration must be fixed. This
means that there must be no dynamic on-the-fly adaptation done in the
monitor, in the display hardware, or in the kernel. That is a tall
order that I guess is going to be less and less achievable, especially
with HDR monitors.

The other side is where the end user trusts the standards, and trusts
that the drivers and the hardware do what they are specified to do.
This is where you can trust that the monitor does the tone-mapping magic
right.

Weston needs to support both approaches, because we want to prove our
new approach to traditional color management, but we also want to
support HDR, and if possible, do both at the same time. Doing both at
the same time is what we think foremost, because it's also the hardest
thing to achieve. If that can be done, then everything else works out
too.

However, this should not exclude the possibility to trust standards and
monitor magic, when the end user wants it.

It's also possible that a monitor simply doesn't support a mode that
would enable fully color managed HDR, so Weston will need to be able to
drive monitors with e.g. BT.2020/PQ data eventually. It's just not the
first goal we have.

This debate is a little bit ironic. The Wayland approach to traditional
color management is that end users should trust the display server to
do the right thing, where before people only trusted the individual
apps using a specific CMS implementation. The display server was the
untrusted one that should just get out of the way and not touch
anything. Now I'm arguing that I don't want to trust monitor magic, who
knows what atrocities it does to my picture! But take the next logical
step, and one would be arguing that end users should trust also
monitors to do the right thing. :-)

The above has two catches:

- Do you actually trust hardware manufacturers and marketers and EDID?
  Monitors have secret sauce you can't inspect nor change.

- You feed a single video stream to a monitor, in a single format,
  encoding and color space. The display server OTOH gets an arbitrary
  number of input video streams in arbitrary formats, encodings, and
  color spaces, and it needs to composite them into one.

Composition is hard. It's not enough to know what kind of signals you
take in and what kind of signal you must output. You also need to know
what the end user wants from the result: the render intent.

Even if we trust the monitor magic to do the right thing in
interpreting and displaying our output signal, we still need to know
what the end user wants from the composition, and we need to control
the composition formula to achieve that.

TL;DR:

I would summarise my comments so far into these:

- Telling the kernel the color spaces and letting it come up with
  whatever color transformation formula from those is not enough,
  because it puts the render intent policy decision in the kernel.

- Telling the kernel what 

[PATCH 00/10] Documentation build warning fixes

2021-05-19 Thread Mauro Carvalho Chehab
Hi Jon,

This small series contain a series of fixes for the documentation. it is
against your docs-next branch.

Three of the patches fix duplicated symbols at the ABI documents.
There are still some ABI warnings from IIO, but all but one were
already fixed at linux-next. So, hopefully, after having everything
merged, the ABI warnings will be solved.

Mauro Carvalho Chehab (10):
  docs: update sysfs-platform_profile.rst reference
  docs: vcpu-requests.rst: fix reference for atomic ops
  docs: translations/zh_CN: fix a typo at 8.Conclusion.rst
  docs: sched-bwc.rst: fix a typo on a doc name
  docs: update pin-control.rst references
  docs: virt: api.rst: fix a pointer to SGX documentation
  docs: ABI: iommu: remove duplicated definition for
sysfs-kernel-iommu_groups
  docs: ABI: sysfs-class-backlight: unify ambient light zone nodes
  docs: ABI: sysfs-class-led-trigger-pattern: remove repeat duplication
  iio: documentation: fix a typo

 Documentation/ABI/testing/sysfs-bus-iio   |   4 +-
 .../ABI/testing/sysfs-class-backlight | 100 ++
 .../ABI/testing/sysfs-class-backlight-adp5520 |  31 --
 .../ABI/testing/sysfs-class-backlight-adp8860 |  37 ---
 .../sysfs-class-backlight-driver-adp8870  |  32 --
 .../testing/sysfs-class-led-driver-el15203000 |   9 --
 .../testing/sysfs-class-led-trigger-pattern   |   3 +
 .../ABI/testing/sysfs-kernel-iommu_groups |  12 +--
 Documentation/scheduler/sched-bwc.rst |   2 +-
 .../zh_CN/process/8.Conclusion.rst|   2 +-
 Documentation/virt/kvm/api.rst|   2 +-
 Documentation/virt/kvm/vcpu-requests.rst  |   2 +-
 include/linux/device.h|   2 +-
 include/linux/mfd/madera/pdata.h  |   2 +-
 include/linux/pinctrl/pinconf-generic.h   |   2 +-
 include/linux/platform_profile.h  |   2 +-
 16 files changed, 117 insertions(+), 127 deletions(-)
 delete mode 100644 Documentation/ABI/testing/sysfs-class-backlight-adp5520
 delete mode 100644 Documentation/ABI/testing/sysfs-class-backlight-adp8860
 delete mode 100644 
Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870
 delete mode 100644 Documentation/ABI/testing/sysfs-class-led-driver-el15203000

-- 
2.31.1




[PATCH 08/10] docs: ABI: sysfs-class-backlight: unify ambient light zone nodes

2021-05-19 Thread Mauro Carvalho Chehab
./scripts/get_abi.pl is warning about duplicated symbol
definition:

Warning: /sys/class/backlight//l1_daylight_max is defined 2 
times:  ./Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870:4  
./Documentation/ABI/testing/sysfs-class-backlight-adp8860:12

What happens is that 3 drivers use the same pattern to report
max and dim settings for different ambient light zones.

It should be noted that the adp8870 doc was missing an
entry for l1_daylight_dim, which is fixed in this patch.

While the ambient light zone is device-specific, the sysfs
definition is actually common. So, unify them at:

Documentation/ABI/testing/sysfs-class-backlight

and use as the contact point the e-mail reported by
get_maintainers.pl for the subsystem.

Signed-off-by: Mauro Carvalho Chehab 
---
 .../ABI/testing/sysfs-class-backlight | 100 ++
 .../ABI/testing/sysfs-class-backlight-adp5520 |  31 --
 .../ABI/testing/sysfs-class-backlight-adp8860 |  37 ---
 .../sysfs-class-backlight-driver-adp8870  |  32 --
 4 files changed, 100 insertions(+), 100 deletions(-)
 delete mode 100644 Documentation/ABI/testing/sysfs-class-backlight-adp5520
 delete mode 100644 Documentation/ABI/testing/sysfs-class-backlight-adp8860
 delete mode 100644 
Documentation/ABI/testing/sysfs-class-backlight-driver-adp8870

diff --git a/Documentation/ABI/testing/sysfs-class-backlight 
b/Documentation/ABI/testing/sysfs-class-backlight
index 1fc86401bf95..c453646b06e2 100644
--- a/Documentation/ABI/testing/sysfs-class-backlight
+++ b/Documentation/ABI/testing/sysfs-class-backlight
@@ -84,3 +84,103 @@ Description:
It can be enabled by writing the value stored in
/sys/class/backlight//max_brightness to
/sys/class/backlight//brightness.
+
+What:  /sys/class/backlight//_max
+Date:  Sep, 2009
+KernelVersion: v2.6.32
+Contact:   device-drivers-de...@blackfin.uclinux.org
+Description:
+   Control the maximum brightness for 
+   on this . Values are between 0 and 127. This file
+   will also show the brightness level stored for this
+   .
+
+   The  is device-driver specific:
+
+   For ADP5520 and ADP5501,  can be:
+
+   ===  
+   Ambient  sysfs entry
+   light zone
+   ===  
+   daylight /sys/class/backlight//daylight_max
+   office   /sys/class/backlight//office_max
+   dark /sys/class/backlight//dark_max
+   ===  
+
+   For ADP8860,  can be:
+
+   ===  
+   Ambient  sysfs entry
+   light zone
+   ===  
+   l1_daylight  /sys/class/backlight//l1_daylight_max
+   l2_office/sys/class/backlight//l2_office_max
+   l3_dark  /sys/class/backlight//l3_dark_max
+   ===  
+
+   For ADP8870,  can be:
+
+   ===  
+   Ambient  sysfs entry
+   light zone
+   ===  
+   l1_daylight  /sys/class/backlight//l1_daylight_max
+   l2_bright/sys/class/backlight//l2_bright_max
+   l3_office/sys/class/backlight//l3_office_max
+   l4_indoor/sys/class/backlight//l4_indoor_max
+   l5_dark  /sys/class/backlight//l5_dark_max
+   ===  
+
+   See also: /sys/class/backlight//ambient_light_zone.
+
+What:  /sys/class/backlight//_dim
+Date:  Sep, 2009
+KernelVersion: v2.6.32
+Contact:   device-drivers-de...@blackfin.uclinux.org
+Description:
+   Control the dim brightness for 
+   on this . Values are between 0 and 127, typically
+   set to 0. Full off when the backlight is disabled.
+   This file will also show the dim brightness level stored for
+   this .
+
+   The  is device-driver specific:
+
+   For ADP5520 and ADP5501,  can be:
+
+   ===  
+   Ambient  sysfs entry
+   light zone
+   ===  
+   daylight /sys/class/backlight//daylight_dim
+   office   /sys/class/backlight//office_dim
+   dark 

[drm-rerere PATCH] nightly.conf: drop amd branches from drm-tip

2021-05-19 Thread Jani Nikula
We've had a stale repo for amd in drm-tip since around v4.15 i.e. for
more than three years. Nobody seems to notice or care. Drop the amd
branches from drm-tip.

Having the current amd branches in drm-tip would be nice to have, if
only to have a common drm integration tree. However, maintaining that
has a cost due to the inevitable conflicts. We can add the branches back
if and when there's interest in sharing the burden.

Cc: Alex Deucher 
Cc: Christian König 
Cc: Pan, Xinhui 
Cc: Daniel Vetter 
Signed-off-by: Jani Nikula 
---
 nightly.conf | 9 -
 1 file changed, 9 deletions(-)

diff --git a/nightly.conf b/nightly.conf
index 9211550ef75c..35fb1d9ba600 100644
--- a/nightly.conf
+++ b/nightly.conf
@@ -40,12 +40,6 @@ git://anongit.freedesktop.org/drm-misc
 https://anongit.freedesktop.org/git/drm/drm-misc
 https://anongit.freedesktop.org/git/drm/drm-misc.git
 "
-drm_tip_repos[drm-amd]="
-ssh://git.freedesktop.org/git/drm/drm-amd
-git://anongit.freedesktop.org/drm/drm-amd
-https://anongit.freedesktop.org/git/drm/drm-amd
-https://anongit.freedesktop.org/git/drm/drm-amd.git
-"
 drm_tip_repos[drm]="
 ssh://git.freedesktop.org/git/drm/drm
 git://anongit.freedesktop.org/drm/drm
@@ -76,17 +70,14 @@ drm_tip_config=(
"drmdrm-fixes"
"drm-misc   drm-misc-fixes"
"drm-intel  drm-intel-fixes"
-   "drm-amddrm-amd-fixes"
 
"drmdrm-next"
"drm-misc   drm-misc-next-fixes"
"drm-intel  drm-intel-next-fixes"
-   "drm-amddrm-amd-next-fixes"
 
"drm-misc   drm-misc-next"
"drm-intel  drm-intel-next"
"drm-intel  drm-intel-gt-next"
-   "drm-amddrm-amd-next"
 
"sound-upstream for-linus"
"sound-upstream for-next"
-- 
2.20.1



Re: [PATCH] drm/i915/gvt: remove local storage of debugfs file

2021-05-19 Thread Zhenyu Wang
On 2021.05.19 10:31:18 +0200, Greg Kroah-Hartman wrote:
> On Wed, May 19, 2021 at 04:03:13PM +0800, Zhenyu Wang wrote:
> > On 2021.05.18 18:28:53 +0200, Greg Kroah-Hartman wrote:
> > > On Tue, May 18, 2021 at 06:17:05PM +0200, Greg Kroah-Hartman wrote:
> > > > There is no need to keep the dentry around for the debugfs kvmgt cache
> > > > file, as we can just look it up when we want to remove it later on.
> > > > Simplify the structure by removing the dentry and relying on debugfs
> > > > to find the dentry to remove when we want to.
> > > > 
> > > > By doing this change, we remove the last in-kernel user that was storing
> > > > the result of debugfs_create_long(), so that api can be cleaned up.
> > > > 
> > > > Cc: Zhenyu Wang 
> > > > Cc: Zhi Wang 
> > > > Cc: Jani Nikula 
> > > > Cc: Joonas Lahtinen 
> > > > Cc: Rodrigo Vivi 
> > > > Cc: David Airlie 
> > > > Cc: Daniel Vetter 
> > > > Cc: intel-gvt-...@lists.freedesktop.org
> > > > Cc: intel-...@lists.freedesktop.org
> > > > Cc: dri-devel@lists.freedesktop.org
> > > > Signed-off-by: Greg Kroah-Hartman 
> > > > ---
> > > >  drivers/gpu/drm/i915/gvt/kvmgt.c | 11 +--
> > > >  1 file changed, 5 insertions(+), 6 deletions(-)
> > > 
> > > Note, I can take this through my debugfs tree if wanted, that way I can
> > > clean up the debugfs_create_long() api at the same time.  Otherwise it's
> > > fine, I can wait until next -rc1 for that to happen.
> > > 
> > 
> > It's fine with me to go through debugfs tree. Just double check that recent
> > kvmgt change would not cause conflict with this as well.
> 
> How can I check that?  I'll be glad to take this through my tree, we can
> handle the merge issues later for 5.14-rc1 :)
> 

Current kvmgt change in merge queue is just 
https://patchwork.freedesktop.org/patch/433536/?series=89995=2
It applies fine with debugfs change.





Re: [PATCH] drm/i915/gvt: remove local storage of debugfs file

2021-05-19 Thread Greg Kroah-Hartman
On Wed, May 19, 2021 at 04:03:13PM +0800, Zhenyu Wang wrote:
> On 2021.05.18 18:28:53 +0200, Greg Kroah-Hartman wrote:
> > On Tue, May 18, 2021 at 06:17:05PM +0200, Greg Kroah-Hartman wrote:
> > > There is no need to keep the dentry around for the debugfs kvmgt cache
> > > file, as we can just look it up when we want to remove it later on.
> > > Simplify the structure by removing the dentry and relying on debugfs
> > > to find the dentry to remove when we want to.
> > > 
> > > By doing this change, we remove the last in-kernel user that was storing
> > > the result of debugfs_create_long(), so that api can be cleaned up.
> > > 
> > > Cc: Zhenyu Wang 
> > > Cc: Zhi Wang 
> > > Cc: Jani Nikula 
> > > Cc: Joonas Lahtinen 
> > > Cc: Rodrigo Vivi 
> > > Cc: David Airlie 
> > > Cc: Daniel Vetter 
> > > Cc: intel-gvt-...@lists.freedesktop.org
> > > Cc: intel-...@lists.freedesktop.org
> > > Cc: dri-devel@lists.freedesktop.org
> > > Signed-off-by: Greg Kroah-Hartman 
> > > ---
> > >  drivers/gpu/drm/i915/gvt/kvmgt.c | 11 +--
> > >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > Note, I can take this through my debugfs tree if wanted, that way I can
> > clean up the debugfs_create_long() api at the same time.  Otherwise it's
> > fine, I can wait until next -rc1 for that to happen.
> > 
> 
> It's fine with me to go through debugfs tree. Just double check that recent
> kvmgt change would not cause conflict with this as well.

How can I check that?  I'll be glad to take this through my tree, we can
handle the merge issues later for 5.14-rc1 :)

thanks,

greg k-h


[PATCH] drm/ttm: Explain why ttm_bo_add_move_fence uses a shared slot

2021-05-19 Thread Daniel Vetter
Motivated because I got confused and Christian confirmed why this
works. I think this is non-obvious enough that it merits a slightly
longer comment.

Cc: Christian König 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: Thomas Hellström 
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ca1b098b6a56..51a94fd63bd7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -682,7 +682,9 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
 }
 
 /*
- * Add the last move fence to the BO and reserve a new shared slot.
+ * Add the last move fence to the BO and reserve a new shared slot. We only use
+ * a shared slot to avoid unecessary sync and rely on the subsequent bo move to
+ * either stall or use an exclusive fence respectively set bo->moving.
  */
 static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 struct ttm_resource_manager *man,
-- 
2.31.0



Re: [PATCH] drm/i915/gvt: remove local storage of debugfs file

2021-05-19 Thread Zhenyu Wang
On 2021.05.18 18:28:53 +0200, Greg Kroah-Hartman wrote:
> On Tue, May 18, 2021 at 06:17:05PM +0200, Greg Kroah-Hartman wrote:
> > There is no need to keep the dentry around for the debugfs kvmgt cache
> > file, as we can just look it up when we want to remove it later on.
> > Simplify the structure by removing the dentry and relying on debugfs
> > to find the dentry to remove when we want to.
> > 
> > By doing this change, we remove the last in-kernel user that was storing
> > the result of debugfs_create_long(), so that api can be cleaned up.
> > 
> > Cc: Zhenyu Wang 
> > Cc: Zhi Wang 
> > Cc: Jani Nikula 
> > Cc: Joonas Lahtinen 
> > Cc: Rodrigo Vivi 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: intel-gvt-...@lists.freedesktop.org
> > Cc: intel-...@lists.freedesktop.org
> > Cc: dri-devel@lists.freedesktop.org
> > Signed-off-by: Greg Kroah-Hartman 
> > ---
> >  drivers/gpu/drm/i915/gvt/kvmgt.c | 11 +--
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> Note, I can take this through my debugfs tree if wanted, that way I can
> clean up the debugfs_create_long() api at the same time.  Otherwise it's
> fine, I can wait until next -rc1 for that to happen.
> 

It's fine with me to go through debugfs tree. Just double check that recent
kvmgt change would not cause conflict with this as well.

Thanks




[PATCH 1/3] gpu: drm: replace occurrences of invalid character

2021-05-19 Thread Mauro Carvalho Chehab
There are some places in drm that ended up receiving a
REPLACEMENT CHARACTER U+fffd ('�'), probably because of
some bad charset conversion.

Fix them by using what seems to be the proper
character.

Signed-off-by: Mauro Carvalho Chehab 
---
 drivers/gpu/drm/amd/include/atombios.h   | 10 +-
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  2 +-
 drivers/gpu/drm/i915/i915_gpu_error.h|  2 +-
 drivers/gpu/drm/r128/r128_drv.h  |  2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/atombios.h 
b/drivers/gpu/drm/amd/include/atombios.h
index 47eb84598b96..6a505d1b82a5 100644
--- a/drivers/gpu/drm/amd/include/atombios.h
+++ b/drivers/gpu/drm/amd/include/atombios.h
@@ -5178,11 +5178,11 @@ typedef struct  _ATOM_LEAKAGE_VOLTAGE_OBJECT_V3
 typedef struct  _ATOM_SVID2_VOLTAGE_OBJECT_V3
 {
ATOM_VOLTAGE_OBJECT_HEADER_V3 sHeader;// voltage mode = 
VOLTAGE_OBJ_SVID2
-// 14:7 � PSI0_VID
-// 6 � PSI0_EN
-// 5 � PSI1
-// 4:2 � load line slope trim.
-// 1:0 � offset trim,
+// 14:7 - PSI0_VID
+// 6 - PSI0_EN
+// 5 - PSI1
+// 4:2 - load line slope trim.
+// 1:0 - offset trim,
USHORT   usLoadLine_PSI;
 // GPU GPIO pin Id to SVID2 regulator VRHot pin. possible value 0~31. 0 means 
GPIO0, 31 means GPIO31
UCHARucSVDGpioId; //0~31 indicate GPIO0~31
diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h 
b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index 14e2ffb6c0e5..2694dbb9967e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: MIT*/
 /*
- * Copyright � 2003-2018 Intel Corporation
+ * Copyright © 2003-2018 Intel Corporation
  */
 
 #ifndef _INTEL_GPU_COMMANDS_H_
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h 
b/drivers/gpu/drm/i915/i915_gpu_error.h
index 16bc42de4b84..4df24c737e13 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -1,7 +1,7 @@
 /*
  * SPDX-License-Identifier: MIT
  *
- * Copyright � 2008-2018 Intel Corporation
+ * Copyright © 2008-2018 Intel Corporation
  */
 
 #ifndef _I915_GPU_ERROR_H_
diff --git a/drivers/gpu/drm/r128/r128_drv.h b/drivers/gpu/drm/r128/r128_drv.h
index 8b256123cf2b..c4d0e21280b9 100644
--- a/drivers/gpu/drm/r128/r128_drv.h
+++ b/drivers/gpu/drm/r128/r128_drv.h
@@ -29,7 +29,7 @@
  *Rickard E. (Rik) Faith 
  *Kevin E. Martin 
  *Gareth Hughes 
- *Michel D�zer 
+ *Michel Däzer 
  */
 
 #ifndef __R128_DRV_H__
-- 
2.31.1



Re: [RFC PATCH 1/3] drm/color: Add RGB Color encodings

2021-05-19 Thread Pekka Paalanen
On Tue, 18 May 2021 10:32:48 -0400
Harry Wentland  wrote:

> On 2021-05-17 4:34 a.m., Pekka Paalanen wrote:
> > On Fri, 14 May 2021 17:04:51 -0400
> > Harry Wentland  wrote:
> >   
> >> On 2021-04-30 8:53 p.m., Sebastian Wick wrote:  
> >>> On 2021-04-26 20:56, Harry Wentland wrote:
> > 
> > ...
> >   
>  Another reason I'm proposing to define the color space (and gamma) of
>  a plane is to make this explicit. Up until the color space and gamma
>  of a plane or framebuffer are not well defined, which leads to drivers
>  assuming the color space and gamma of a buffer (for blending and other
>  purposes) and might lead to sub-optimal outcomes.
> >>>
> >>> Blending only is "correct" with linear light so that property of the
> >>> color space is important. However, why does the kernel have to be
> >>> involved here? As long as user space knows that for correct blending the
> >>> data must represent linear light and knows when in the pipeline blending
> >>> happens it can make sure that the data at that point in the pipeline
> >>> contains linear light.
> >>> 
> >>
> >> The only reason the kernel needs to be involved is to take full advantage
> >> of the available HW without requiring KMS clients to be aware of
> >> the difference in display HW.  
> > 
> > Can you explain in more tangible examples, why you think so, please?
> > 
> > Is it because hardware does not fit the KMS UAPI model of the abstract
> > pixel pipeline?
> >   
> 
> I'd wager no HW is designed to meet KMS UAPI, rather KMS UAPI is designed
> to abstract HW.

Of course, but you are in big trouble in any case if there is a
fundamental mismatch. You may have to declare that all existing KMS
properties for this stuff will be mutually exclusive with your new
properties, so that you can introduce a new generic abstract pipeline
in KMS.

By mutually exclusive I mean that a driver must advertise only one or
the other set of properties and never both. If you want to support
userspace that doesn't understand the alternative set, maybe you also
need a drm client cap to switch to the alternative set per-drm-client.
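For illustration, a minimal sketch of what such a per-client opt-in could look
like, modelled on the existing DRM_CLIENT_CAP_WRITEBACK_CONNECTORS handling in
drm_setclientcap(); the cap name and the drm_file field below are assumptions
for the sake of the example, not existing UAPI:

	/*
	 * Hypothetical case in the drm_setclientcap() switch: a client that
	 * sets this cap is shown only the new colour-pipeline properties;
	 * clients that don't keep the legacy GAMMA/CTM/DEGAMMA set.
	 */
	case DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE:
		if (!file_priv->atomic)
			return -EINVAL;
		if (req->value > 1)
			return -EINVAL;
		file_priv->plane_color_pipeline = req->value;
		break;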

> > Or is it because you have fixed-function hardware elements that you can
> > only make use of when userspace uses an enum-based UAPI?
> >   
> 
> One example is our degamma on our latest generation HW, where we have
> fixed-function "degamma" (rather de-EOTF):
> 
> https://gitlab.freedesktop.org/agd5f/linux/-/blob/amd-staging-drm-next/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp.c#L166

Ok.

> > I would totally agree that the driver does not want to be analysing LUT
> > entries to decipher if it could use a fixed-function element or not. It
> > would introduce uncertainty in the UAPI. So fixed-function elements
> > would need their own properties, but I don't know if that is feasible
> > as generic UAPI or if it should be driver-specific (and so left unused
> > by generic userspace).
> >   
> 
> 
> For the CRTC LUTs we actually do a linearity check to program the
> HW into bypass when the LUT is linear since the KMS LUT definition
> doesn't map well onto the LUT definition used by our HW and leads
> to rounding errors and failing IGT kms_color tests (if I remember
> this correctly).
> 
> https://gitlab.freedesktop.org/agd5f/linux/-/blob/amd-staging-drm-next/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c#L330
> 
> Hence the suggestion to define pre-defined TFs right at a KMS level
> for usecases where we can assume the display will tonemap the 
> content.

Right. Explaining this would have been a good introduction in your
cover letter.

Maybe you want to define new KMS properties that shall be mutually
exclusive with the existing KMS GAMMA/CTM/DEGAMMA properties and
clearly document them as such.


Thanks,
pq
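To make the linearity check Harry describes above concrete, here is a minimal
sketch of that kind of test; the helper name and the tolerance are arbitrary
choices for illustration, and the amdgpu code linked above differs in detail:

/*
 * Returns true if the LUT is close enough to an identity ramp that the
 * hardware block can be programmed into bypass instead of loading it.
 */
static bool lut_is_identity(const struct drm_color_lut *lut, u32 size)
{
	u32 i;

	if (size < 2)
		return false;

	for (i = 0; i < size; i++) {
		/* Ideal 16-bit ramp value for entry i. */
		u32 expected = i * 0xffff / (size - 1);

		if (abs((int)lut[i].red   - (int)expected) > 0x100 ||
		    abs((int)lut[i].green - (int)expected) > 0x100 ||
		    abs((int)lut[i].blue  - (int)expected) > 0x100)
			return false;
	}

	return true;
}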




[PATCH 1/2] drm/i915/cmdparser: No-op failed batches on all platforms

2021-05-19 Thread Daniel Vetter
On gen9 for blt cmd parser we relied on the magic fence error
propagation which:
- doesn't work on gen7, because there's no scheduler with ringbuffers
  there yet
- fence error propagation can be weaponized to attack other things, so
  not a good design idea

Instead of magic, do the same thing on gen9 as on gen7.

Kudos to Jason for figuring this out.

Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
Cc:  # v5.6+
Cc: Jason Ekstrand 
Cc: Marcin Slusarz 
Cc: Jon Bloomfield 
Relates: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 5b4b2bd46e7c..2d3336ab7ba3 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1509,6 +1509,12 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
}
}
 
+   /* Batch unsafe to execute with privileges, cancel! */
+   if (ret) {
+   cmd = page_mask_bits(shadow->obj->mm.mapping);
+   *cmd = MI_BATCH_BUFFER_END;
+   }
+
if (trampoline) {
/*
 * With the trampoline, the shadow is executed twice.
@@ -1524,26 +1530,20 @@ int intel_engine_cmd_parser(struct intel_engine_cs *engine,
 */
*batch_end = MI_BATCH_BUFFER_END;
 
-   if (ret) {
-   /* Batch unsafe to execute with privileges, cancel! */
-   cmd = page_mask_bits(shadow->obj->mm.mapping);
-   *cmd = MI_BATCH_BUFFER_END;
+   /* If batch is unsafe but valid, jump to the original */
+   if (ret == -EACCES) {
+   unsigned int flags;
 
-   /* If batch is unsafe but valid, jump to the original */
-   if (ret == -EACCES) {
-   unsigned int flags;
+   flags = MI_BATCH_NON_SECURE_I965;
+   if (IS_HASWELL(engine->i915))
+   flags = MI_BATCH_NON_SECURE_HSW;
 
-   flags = MI_BATCH_NON_SECURE_I965;
-   if (IS_HASWELL(engine->i915))
-   flags = MI_BATCH_NON_SECURE_HSW;
+   GEM_BUG_ON(!IS_GEN_RANGE(engine->i915, 6, 7));
+   __gen6_emit_bb_start(batch_end,
+batch_addr,
+flags);
 
-   GEM_BUG_ON(!IS_GEN_RANGE(engine->i915, 6, 7));
-   __gen6_emit_bb_start(batch_end,
-batch_addr,
-flags);
-
-   ret = 0; /* allow execution */
-   }
+   ret = 0; /* allow execution */
}
}
 
-- 
2.31.0



[PATCH 2/2] Revert "drm/i915: Propagate errors on awaiting already signaled fences"

2021-05-19 Thread Daniel Vetter
From: Jason Ekstrand 

This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7.  Ever
since that commit, we've been having issues where a hang in one client
can propagate to another.  In particular, a hang in an app can propagate
to the X server which causes the whole desktop to lock up.

Signed-off-by: Jason Ekstrand 
Reported-by: Marcin Slusarz 
Cc:  # v5.6+
Cc: Jason Ekstrand 
Cc: Marcin Slusarz 
Cc: Jon Bloomfield 
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/i915/i915_request.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 970d8f4986bb..b796197c0772 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1426,10 +1426,8 @@ i915_request_await_execution(struct i915_request *rq,
 
do {
fence = *child++;
-   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
-   i915_sw_fence_set_error_once(&rq->submit, fence->error);
+   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
continue;
-   }
 
if (fence->context == rq->fence.context)
continue;
@@ -1527,10 +1525,8 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 
do {
fence = *child++;
-   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
-   i915_sw_fence_set_error_once(&rq->submit, fence->error);
+   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
continue;
-   }
 
/*
 * Requests on the same timeline are explicitly ordered, along
-- 
2.31.0



Re: [PATCH] drm/i915/gvt: remove local storage of debugfs file

2021-05-19 Thread Jani Nikula
On Wed, 19 May 2021, Zhenyu Wang  wrote:
> Reviewed-by: Zhenyu Wang 
>
> Thanks!

Thanks for the review. Please also let Greg know whether he can pick
this up via the debugfs tree; I don't care either way.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [PATCH v2 09/15] drm/ttm, drm/amdgpu: Allow the driver some control over swapping

2021-05-19 Thread Thomas Hellström



On 5/18/21 6:30 PM, Christian König wrote:

On 18.05.21 at 18:07, Thomas Hellström wrote:


On 5/18/21 5:42 PM, Christian König wrote:

On 18.05.21 at 17:38, Thomas Hellström wrote:


On 5/18/21 5:28 PM, Christian König wrote:

On 18.05.21 at 17:20, Thomas Hellström wrote:


On 5/18/21 5:18 PM, Christian König wrote:



On 18.05.21 at 17:15, Thomas Hellström wrote:


On 5/18/21 10:26 AM, Thomas Hellström wrote:
We are calling the eviction_valuable driver callback at eviction time to
determine whether we actually can evict a buffer object. The upcoming i915
TTM backend needs the same functionality for swapout, and that might
actually be beneficial to other drivers as well.

Add an eviction_valuable call also in the swapout path. Try to keep the
current behaviour for all drivers by returning true if the buffer object
is already in the TTM_PL_SYSTEM placement. We change behaviour for the
case where a buffer object is in a TT backed placement when swapped out,
in which case the drivers normal eviction_valuable path is run.

Finally export ttm_tt_unpopulate() and don't swap out bos
that are not populated. This allows a driver to purge a bo at
swapout time if its content is no longer valuable rather than to
have TTM swap the contents out.

Cc: Christian König 
Signed-off-by: Thomas Hellström 



Christian,

Here we have a ttm_tt_unpopulate() export as well at the end. I 
figure you will push back on that one. What we really need is a 
functionality to just drop the bo contents and end up in system 
memory unpopulated. Should I perhaps add a utility function to 
do that instead? like ttm_bo_purge()?


We already have that. Just call ttm_bo_validate() without any 
place to put the buffer.


See how ttm_bo_pipeline_gutting() is used.

Christian.


OK, so is that reentrant from the move() or swap_notify() callback.


That sounds like a design bug to me since you should never need to 
do this.


When you want to destroy the backing store of a buffer during 
eviction you should just do this by returning an empty placement 
from the evict_flags callback.


So this is for the functionality where the user has indicated that 
the contents is no longer of value, but the buffer itself
is cached until evicted or swapped out for performance reasons. So 
the above would work for eviction, but what about swapout. Could we 
add some similar functionality there?


Amdgpu has the same functionality and you don't need to handle swap 
at all.


Just return from the evict_flags that you want to drop the backing 
store as soon as the BO leaves the GTT domain.

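As a minimal sketch of the evict_flags approach described above (the
"purgeable" flag, the wrapper type and the fallback placement are assumptions
for illustration, not existing TTM or driver code):

static void my_evict_flags(struct ttm_buffer_object *bo,
			   struct ttm_placement *placement)
{
	struct my_bo *mbo = to_my_bo(bo);	/* driver wrapper, assumed */

	if (mbo->purgeable) {
		/*
		 * Empty placement: TTM drops the backing store via
		 * ttm_bo_pipeline_gutting() instead of copying it out.
		 */
		placement->num_placement = 0;
		placement->num_busy_placement = 0;
		return;
	}

	/* Otherwise evict to system memory as usual. */
	*placement = my_sys_placement;	/* driver-defined placement, assumed */
}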

Hmm, the pipeline_gutting function seems ok, but overly complex if the bo
is already idle. Am I allowed to optimize it slightly for the latter case?


Yeah, sure. We just never hat that use case so far.


One thing about the code here that worries me is that the "destination"
ttm_tt is allocated *after* pipeline_gutting. We're not really allowed to
fail here because that would leave the BO in a state where codepaths
(fault, for example) try to access a NULL ttm_tt. While the idle case can
get away with ttm_tt_unpopulate, for the async case the ttm_tt really
needs to be pre-allocated so that we can leave the bo in a consistent
state.


/Thomas



