RE: [PATCH] drm/amd/pm: update driver if version for navy_flounder

2020-11-25 Thread Zhou1, Tao
[AMD Public Use]

Reviewed-by: Tao Zhou 

> -----Original Message-----
> From: Jiansong Chen 
> Sent: Thursday, November 26, 2020 12:22 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhou1, Tao ; Chen, Jiansong (Simon)
> 
> Subject: [PATCH] drm/amd/pm: update driver if version for navy_flounder
> 
> It's in accordance with pmfw 65.18.0 for navy_flounder.
> 
> Signed-off-by: Jiansong Chen 
> Change-Id: Ia96b6bf276f4a99a931a1203e3314a2ff407e924
> ---
>  drivers/gpu/drm/amd/pm/inc/smu_v11_0.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
> b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
> index eff396c7a281..78eb99962bab 100644
> --- a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
> +++ b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
> @@ -31,7 +31,7 @@
>  #define SMU11_DRIVER_IF_VERSION_NV12 0x36
>  #define SMU11_DRIVER_IF_VERSION_NV14 0x36
>  #define SMU11_DRIVER_IF_VERSION_Sienna_Cichlid 0x3B
> -#define SMU11_DRIVER_IF_VERSION_Navy_Flounder 0x5
> +#define SMU11_DRIVER_IF_VERSION_Navy_Flounder 0xC
>  #define SMU11_DRIVER_IF_VERSION_VANGOGH 0x02
>  #define SMU11_DRIVER_IF_VERSION_Dimgrey_Cavefish 0xD
> 
> --
> 2.25.1


[PATCH] drm/amd/pm: update driver if version for navy_flounder

2020-11-25 Thread Jiansong Chen
It's in accordance with pmfw 65.18.0 for navy_flounder.

Signed-off-by: Jiansong Chen 
Change-Id: Ia96b6bf276f4a99a931a1203e3314a2ff407e924
---
 drivers/gpu/drm/amd/pm/inc/smu_v11_0.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
index eff396c7a281..78eb99962bab 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
@@ -31,7 +31,7 @@
 #define SMU11_DRIVER_IF_VERSION_NV12 0x36
 #define SMU11_DRIVER_IF_VERSION_NV14 0x36
 #define SMU11_DRIVER_IF_VERSION_Sienna_Cichlid 0x3B
-#define SMU11_DRIVER_IF_VERSION_Navy_Flounder 0x5
+#define SMU11_DRIVER_IF_VERSION_Navy_Flounder 0xC
 #define SMU11_DRIVER_IF_VERSION_VANGOGH 0x02
 #define SMU11_DRIVER_IF_VERSION_Dimgrey_Cavefish 0xD
 
-- 
2.25.1
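
For reference, these constants are consumed at SMU init time, where the
driver compares its expectation against the interface version reported by
the PMFW. A rough sketch of that check, modeled on
smu_v11_0_check_fw_version() (abridged, not the verbatim code):

	uint32_t if_version, smu_version;
	int ret;

	ret = smu_cmn_get_smc_version(smu, &if_version, &smu_version);
	if (ret)
		return ret;

	/* a mismatch is only a warning; the driver may still work */
	if (if_version != smu->smc_driver_if_version)
		dev_warn(smu->adev->dev,
			 "SMU driver if version not matched\n");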



Re: [PATCH 3/4] drm/amdgpu: fix mode2 reset sequence for vangogh

2020-11-25 Thread Huang Rui
On Wed, Nov 25, 2020 at 11:21:31AM -0500, Alex Deucher wrote:
> We need to save and restore PCI config space.
> 
> Signed-off-by: Alex Deucher 

Series are Reviewed-by: Huang Rui 

Hi Xiaomeng, let's verify this patch set on your platform.

> ---
>  drivers/gpu/drm/amd/amdgpu/nv.c | 34 -
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 221a29cdc0aa..70d6556cd01d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -336,6 +336,38 @@ static int nv_asic_mode1_reset(struct amdgpu_device 
> *adev)
>   return ret;
>  }
>  
> +static int nv_asic_mode2_reset(struct amdgpu_device *adev)
> +{
> + u32 i;
> + int ret = 0;
> +
> + amdgpu_atombios_scratch_regs_engine_hung(adev, true);
> +
> + /* disable BM */
> + pci_clear_master(adev->pdev);
> +
> + amdgpu_device_cache_pci_state(adev->pdev);
> +
> + ret = amdgpu_dpm_mode2_reset(adev);
> + if (ret)
> + dev_err(adev->dev, "GPU mode2 reset failed\n");
> +
> + amdgpu_device_load_pci_state(adev->pdev);
> +
> + /* wait for asic to come out of reset */
> + for (i = 0; i < adev->usec_timeout; i++) {
> + u32 memsize = adev->nbio.funcs->get_memsize(adev);
> +
> + if (memsize != 0xffffffff)
> + break;
> + udelay(1);
> + }
> +
> + amdgpu_atombios_scratch_regs_engine_hung(adev, false);
> +
> + return ret;
> +}
> +
>  static bool nv_asic_supports_baco(struct amdgpu_device *adev)
>  {
>   struct smu_context *smu = &adev->smu;
> @@ -392,7 +424,7 @@ static int nv_asic_reset(struct amdgpu_device *adev)
>   break;
>   case AMD_RESET_METHOD_MODE2:
>   dev_info(adev->dev, "MODE2 reset\n");
> - ret = amdgpu_dpm_mode2_reset(adev);
> + ret = nv_asic_mode2_reset(adev);
>   break;
>   default:
>   dev_info(adev->dev, "MODE1 reset\n");
> -- 
> 2.25.4
> 


Re: [PATCH 6/6] drm/sched: Make use of a "done" thread

2020-11-25 Thread Luben Tuikov
On 2020-11-25 06:09, Steven Price wrote:
> On 25/11/2020 03:17, Luben Tuikov wrote:
>> Add a "done" list to which all completed jobs are added
>> to be freed. The drm_sched_job_done() callback is the
>> producer of jobs to this list.
>>
>> Add a "done" thread which consumes from the done list
>> and frees up jobs. Now, the main scheduler thread only
>> pushes jobs to the GPU and the "done" thread frees them
>> up, on the way out of the GPU when they've completed
>> execution.
> 
> Generally I'd be in favour of a "done thread" as I think there are some 
> murky corners of Panfrost's locking that would be helped by deferring 
> the free_job() callback.

Check my response to his email.

It seems you're okay with a separate thread, when both threads
could be working concurrently, and Christian wants
a single thread doing all this. You should probably address
this in a follow-up to his email, so this can be hashed out.

> 
> But I think you're trying to do too much in one patch here. And as 
> Christian has pointed out there's some dodgy looking changes to locking 
> which aren't explained.

I've addressed this in my response to his email, check it out.

So, if you're in favour of a separate thread working concurrently,
please follow up to his email, so this can be hashed out.

Thanks and Regards,
Luben

> 
> Steve
> 
>>
>> Make use of the status returned by the GPU driver
>> timeout handler to decide whether to leave the job in
>> the pending list, or to send it off to the done list.
>> If a job is done, it is added to the done list and the
>> done thread woken up. If a job needs more time, it is
>> left on the pending list and the timeout timer
>> restarted.
>>
>> Eliminate the polling mechanism of picking out done
>> jobs from the pending list, i.e. eliminate
>> drm_sched_get_cleanup_job(). Now the main scheduler
>> thread only pushes jobs down to the GPU.
>>
>> Various other optimizations to the GPU scheduler
>> and job recovery are possible with this format.
>>
>> Signed-off-by: Luben Tuikov 
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 173 +
>>   include/drm/gpu_scheduler.h|  14 ++
>>   2 files changed, 101 insertions(+), 86 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 3eb7618a627d..289ae68cd97f 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -164,7 +164,8 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>* drm_sched_job_done - complete a job
>>* @s_job: pointer to the job which is done
>>*
>> - * Finish the job's fence and wake up the worker thread.
>> + * Finish the job's fence, move it to the done list,
>> + * and wake up the done thread.
>>*/
>>   static void drm_sched_job_done(struct drm_sched_job *s_job)
>>   {
>> @@ -179,7 +180,12 @@ static void drm_sched_job_done(struct drm_sched_job 
>> *s_job)
>>  dma_fence_get(&s_fence->finished);
>>  drm_sched_fence_finished(s_fence);
>>  dma_fence_put(&s_fence->finished);
>> -wake_up_interruptible(&sched->wake_up_worker);
>> +
>> +spin_lock(&sched->job_list_lock);
>> +list_move(&s_job->list, &sched->done_list);
>> +spin_unlock(&sched->job_list_lock);
>> +
>> +wake_up_interruptible(&sched->done_wait_q);
>>   }
>>   
>>   /**
>> @@ -221,11 +227,10 @@ bool drm_sched_dependency_optimized(struct dma_fence* 
>> fence,
>>   EXPORT_SYMBOL(drm_sched_dependency_optimized);
>>   
>>   /**
>> - * drm_sched_start_timeout - start timeout for reset worker
>> - *
>> - * @sched: scheduler instance to start the worker for
>> + * drm_sched_start_timeout - start a timeout timer
>> + * @sched: scheduler instance whose job we're timing
>>*
>> - * Start the timeout for the given scheduler.
>> + * Start a timeout timer for the given scheduler.
>>*/
>>   static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
>>   {
>> @@ -305,8 +310,8 @@ static void drm_sched_job_begin(struct drm_sched_job 
>> *s_job)
>>   
>>  spin_lock(&sched->job_list_lock);
>>  list_add_tail(&s_job->list, &sched->pending_list);
>> -drm_sched_start_timeout(sched);
>>  spin_unlock(&sched->job_list_lock);
>> +drm_sched_start_timeout(sched);
>>   }
>>   
>>   static void drm_sched_job_timedout(struct work_struct *work)
>> @@ -316,37 +321,30 @@ static void drm_sched_job_timedout(struct work_struct 
>> *work)
>>   
>>  sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>   
>> -/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>>  spin_lock(&sched->job_list_lock);
>>  job = list_first_entry_or_null(&sched->pending_list,
>> struct drm_sched_job, list);
>> +spin_unlock(&sched->job_list_lock);
>>   
>>  if (job) {
>> -/*
>> - * Remove the bad job so it cannot be freed by concurrent
>> - * drm_sched_cleanup_jobs. It will be reinserted back after 
>> sched->thread
>> - * is parked at 

Re: [PATCH 6/6] drm/sched: Make use of a "done" thread

2020-11-25 Thread Luben Tuikov
On 2020-11-25 05:10, Christian König wrote:
> Am 25.11.20 um 04:17 schrieb Luben Tuikov:
>> Add a "done" list to which all completed jobs are added
>> to be freed. The drm_sched_job_done() callback is the
>> producer of jobs to this list.
>>
>> Add a "done" thread which consumes from the done list
>> and frees up jobs. Now, the main scheduler thread only
>> pushes jobs to the GPU and the "done" thread frees them
>> up, on the way out of the GPU when they've completed
>> execution.
> 
> Well there are quite a number of problems in this patch.
> 
>  From the design I think we should be getting rid of the linked list and

Sure, we can do this in a separate future patch. I'd imagine it'll
touch a lot of places and I didn't want this patch and this series
of patches to get out of hand, by changing too many things.

Here in this patch I wanted to change as little as possible.

> not extend its use. And we also don't want to offload the freeing of 
> jobs into a different thread because that could potentially mean that 
> this is executed on a different CPU.

Yes, of course it could.

From my experience working with hardware, I always envision work
being done by small units, in a pipeline, concurrently, while all
of them working concurrently, all the time.

It's hard to go back to unitary processing. :-)

> 
> Then one obvious problem seems to be that you don't take into account 
> that we moved the job freeing into the scheduler thread to make sure 
> that this is suspended while the scheduler thread is stopped. 

I don't understand what "this" refers to in "that this is suspended
while the scheduler thread is stopped."

> This 
> behavior is now completely gone, e.g. the delete thread keeps running 
> while the scheduler thread is stopped.

Yes, indeed, that is the case and intentional.

There seems to be no requirement to have to stop the main
scheduler thread, which pushes tasks down to the GPU,
so that we can free jobs. In other words, both
threads can work concurrently, one pushing jobs down
to the GPU, while the other freeing done jobs coming
out of the GPU.

If this concurrency is something you don't like,
then no problem, we can keep them interlocked in one
thread as before.
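
For illustration, the consumer side described here could look roughly like
the sketch below, reusing the field names from the patch (this is not the
literal series code):

	static int drm_sched_done_main(void *param)
	{
		struct drm_gpu_scheduler *sched = param;

		while (!kthread_should_stop()) {
			struct drm_sched_job *job;

			wait_event_interruptible(sched->done_wait_q,
					kthread_should_stop() ||
					!list_empty(&sched->done_list));

			spin_lock(&sched->job_list_lock);
			job = list_first_entry_or_null(&sched->done_list,
						       struct drm_sched_job,
						       list);
			if (job)
				list_del_init(&job->list);
			spin_unlock(&sched->job_list_lock);

			if (job)
				sched->ops->free_job(job);
		}
		return 0;
	}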

> 
> A few more comments below.
> 
>> Make use of the status returned by the GPU driver
>> timeout handler to decide whether to leave the job in
>> the pending list, or to send it off to the done list.
>> If a job is done, it is added to the done list and the
>> done thread woken up. If a job needs more time, it is
>> left on the pending list and the timeout timer
>> restarted.
>>
>> Eliminate the polling mechanism of picking out done
>> jobs from the pending list, i.e. eliminate
>> drm_sched_get_cleanup_job(). Now the main scheduler
>> thread only pushes jobs down to the GPU.
>>
>> Various other optimizations to the GPU scheduler
>> and job recovery are possible with this format.
>>
>> Signed-off-by: Luben Tuikov 
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 173 +
>>   include/drm/gpu_scheduler.h|  14 ++
>>   2 files changed, 101 insertions(+), 86 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index 3eb7618a627d..289ae68cd97f 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -164,7 +164,8 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>* drm_sched_job_done - complete a job
>>* @s_job: pointer to the job which is done
>>*
>> - * Finish the job's fence and wake up the worker thread.
>> + * Finish the job's fence, move it to the done list,
>> + * and wake up the done thread.
>>*/
>>   static void drm_sched_job_done(struct drm_sched_job *s_job)
>>   {
>> @@ -179,7 +180,12 @@ static void drm_sched_job_done(struct drm_sched_job 
>> *s_job)
>>  dma_fence_get(&s_fence->finished);
>>  drm_sched_fence_finished(s_fence);
>>  dma_fence_put(&s_fence->finished);
>> -wake_up_interruptible(&sched->wake_up_worker);
>> +
>> +spin_lock(&sched->job_list_lock);
>> +list_move(&s_job->list, &sched->done_list);
>> +spin_unlock(&sched->job_list_lock);
>> +
>> +wake_up_interruptible(&sched->done_wait_q);
> 
> How is the worker thread then woken up to push new jobs to the hardware?

A-ha! Thank you Christian for bringing this up--perhaps that is what
the problem is I was seeing on my test machine, which I described
in the cover letter 0/6, that X/GDM just sleeping in wait.

So, I'd imagined that whomever pushed jobs down to DRM, i.e.
the producer of jobs, also did a "up"/"wake-up" of the main
scheduler thread, so that the main scheduler thread would
then wake up and "schedule" tasks down into the GPU. It seems
I've only "imagined" :-) such concurrency, and the main scheduler
thread needs to be woken up to poll? I'll try this next.
Thanks for the tip Christian!
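
For reference, the producer-side wake-up in the current code is
drm_sched_wakeup(), called when an entity pushes its first job; the main
scheduler thread sleeps on sched->wake_up_worker until then:

	void drm_sched_wakeup(struct drm_gpu_scheduler *sched)
	{
		if (drm_sched_ready(sched))
			wake_up_interruptible(&sched->wake_up_worker);
	}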

> 
>>   }
>>   
>>   /**
>> @@ -221,11 +227,10 @@ bool drm_sched_dependency_optimized(struct dma_fence* 
>> 

Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-25 Thread Andrey Grodzovsky


On 11/25/20 11:36 AM, Daniel Vetter wrote:

On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:

Am 25.11.20 um 11:40 schrieb Daniel Vetter:

On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:

Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:

On 11/24/20 2:41 AM, Christian König wrote:

Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:

On 11/23/20 3:41 PM, Christian König wrote:

Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:

On 11/23/20 3:20 PM, Christian König wrote:

Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:

On 11/25/20 5:42 AM, Christian König wrote:

Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:

It's needed to drop iommu backed pages on device unplug
before device's IOMMU group is released.

It would be cleaner if we could do the whole
handling in TTM. I also need to double check
what you are doing with this function.

Christian.

Check patch "drm/amdgpu: Register IOMMU topology
notifier per device." to see
how i use it. I don't see why this should go
into TTM mid-layer - the stuff I do inside
is vendor specific and also I don't think TTM is
explicitly aware of IOMMU ?
Do you mean you prefer the IOMMU notifier to be
registered from within TTM
and then use a hook to call into vendor specific handler ?

No, that is really vendor specific.

What I meant is to have a function like
ttm_resource_manager_evict_all() which you only need
to call and all tt objects are unpopulated.

So instead of this BO list i create and later iterate in
amdgpu from the IOMMU patch you just want to do it
within
TTM with a single function ? Makes much more sense.

Yes, exactly.

The list_empty() checks we have in TTM for the LRU are
actually not the best idea, we should now check the
pin_count instead. This way we could also have a list of the
pinned BOs in TTM.

So from my IOMMU topology handler I will iterate the TTM LRU for
the unpinned BOs and this new function for the pinned ones  ?
It's probably a good idea to combine both iterations into this
new function to cover all the BOs allocated on the device.

Yes, that's what I had in my mind as well.
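
A pseudocode-level sketch of the helper being discussed; the name and the
pinned-BO list are illustrative only, not existing TTM API:

	void ttm_device_unpopulate_all(struct ttm_bo_device *bdev)
	{
		struct ttm_resource_manager *man;
		unsigned int i;

		/* unpinned BOs are reachable through the per-type LRUs */
		for (i = 0; i < TTM_NUM_MEM_TYPES; i++) {
			man = ttm_manager_type(bdev, i);
			if (!man)
				continue;
			/* for each bo on man's LRU lists:
			 *	ttm_tt_unpopulate(bdev, bo->ttm); */
		}

		/* pinned BOs (pin_count > 0) sit on no LRU, so a separate
		 * list of pinned BOs would be needed, as discussed above */
	}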


BTW: Have you thought about what happens when we unpopulate
a BO while we still try to use a kernel mapping for it? That
could have unforeseen consequences.

Are you asking what happens to kmap or vmap style mapped CPU
accesses once we drop all the DMA backing pages for a particular
BO ? Because for user mappings
(mmap) we took care of this with dummy page reroute but indeed
nothing was done for in kernel CPU mappings.

Yes exactly that.

In other words what happens if we free the ring buffer while the
kernel still writes to it?

Christian.

While we can't control user application accesses to the mapped buffers
explicitly and hence we use page fault rerouting
I am thinking that in this case we may be able to sprinkle
drm_dev_enter/exit in any such sensitive place where we might
CPU-access a DMA buffer from the kernel?

Yes, I fear we are going to need that.

Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
could stuff this into begin/end_cpu_access



Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access
driver specific hook ?



(but only for the kernel, so a
bit tricky)?



Why only kernel? Why is it a problem to do it if it comes from dma_buf_ioctl
by some user process? And if we do need this distinction, I think we should
be able to differentiate by looking at the current->mm (i.e. mm_struct)
pointer being NULL for a kernel thread.
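
A rough sketch combining both ideas (ddev, vaddr, offset, data and size are
placeholders, not driver code):

	void *vaddr;	/* kernel mapping of the BO (placeholder) */
	int idx;

	if (!current->mm && drm_dev_enter(ddev, &idx)) {
		/* kernel thread and device still present: do the access */
		memcpy(vaddr + offset, data, size);
		drm_dev_exit(idx);
	}
	/* otherwise the device is unplugged (or this is a user-process
	 * path already covered by the page-fault rerouting): skip it */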




Oh very very good point! I haven't thought about DMA-buf mmaps in this
context yet.



btw the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part isn't
yet solved in your series here.

Well thinking more about this, it seems to be a another really good argument
why mapping pages from DMA-bufs into application address space directly is a
very bad idea :)

But yes, we essentially can't remove the device as long as there is a
DMA-buf with mappings. No idea how to clean that one up.

drm_dev_get/put in drm_prime helpers should get us like 90% there I think.



What are the other 10% ?




The even more worrying thing is random dma_fence attached to the dma_resv
object. We could try to clean all of ours up, but they could have escaped
already into some other driver. And since we're talking about egpu
hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case.

I have no idea how to fix that one :-/
-Daniel



I assume you are referring to sync_file_create/sync_file_get_fence API  for 
dma_fence export/import ?

So with DMA-bufs we have the drm_gem_object as exporter-specific private
data, and so we can do drm_dev_get and put at the drm_gem_object layer to
bind the device life cycle to that of each GEM object; but we don't have
such a mid-layer for dma_fence which could allow us to increment the device
reference for 

Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Nick Desaulniers
On Tue, Nov 24, 2020 at 11:05 PM James Bottomley wrote:
>
> On Tue, 2020-11-24 at 13:32 -0800, Kees Cook wrote:
> > We already enable -Wimplicit-fallthrough globally, so that's not the
> > discussion. The issue is that Clang is (correctly) even more strict
> > than GCC for this, so these are the remaining ones to fix for full
> > Clang coverage too.
> >
> > People have spent more time debating this already than it would have
> > taken to apply the patches. :)
>
> You mean we've already spent 90% of the effort to come this far so we
> might as well go the remaining 10% because then at least we get some
> return? It's certainly a clinching argument in defence procurement ...

So developers and distributions using Clang can't have
-Wimplicit-fallthrough enabled because GCC is less strict (which has
been shown in this thread to lead to bugs)?  We'd like to have nice
things too, you know.

I even agree that most of the churn comes from

case 0:
  ++x;
default:
  break;

which I have a patch for: https://reviews.llvm.org/D91895.  I agree
that can never lead to bugs.  But that's not the sole case of this
series, just most of them.
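
For readers outside the thread: the treewide churn consists of making the
intent explicit, either with a break or with the kernel's 'fallthrough'
pseudo-keyword, e.g.:

	switch (x) {
	case 0:
		++x;
		fallthrough;	/* explicit: silences -Wimplicit-fallthrough */
	default:
		break;
	}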

Though, note how the reviewer (C++ spec editor and clang front end
owner) in https://reviews.llvm.org/D91895 even asks in that review how
maybe a new flag would be more appropriate for a watered
down/stylistic variant of the existing behavior.  And if the current
wording of Documentation/process/deprecated.rst around "fallthrough"
is a straightforward rule of thumb, I kind of agree with him.

>
> > This is about robustness and language wrangling. It's a big code-
> > base, and this is the price of our managing technical debt for
> > permanent robustness improvements. (The numbers I ran from Gustavo's
> > earlier patches were that about 10% of the places adjusted were
> > identified as legitimate bugs being fixed. This final series may be
> > lower, but there are still bugs being found from it -- we need to
> > finish this and shut the door on it for good.)
>
> I got my six patches by analyzing the lwn.net report of the fixes that
> was cited which had 21 of which 50% didn't actually change the emitted
> code, and 25% didn't have a user visible effect.
>
> But the broader point I'm making is just because the compiler people
> come up with a shiny new warning doesn't necessarily mean the problem

That's not what this is though; you're attacking a strawman.  I'd
encourage you to bring that up when that actually occurs, unlike this
case since it's actively hindering getting -Wimplicit-fallthrough
enabled for Clang.  This is not a shiny new warning; it's already on
for GCC and has existed in both compilers for multiple releases.

And I'll also note that warnings are warnings and not errors because
they cannot be proven to be bugs in 100% of cases, but they have led
to bugs in the past.  They require a human to review their intent and
remove ambiguities.  If 97% of cases would end in a break ("Expert C
Programming: Deep C Secrets" - Peter van der Linden), then it starts
to look to me like a language defect; certainly an incorrectly chosen
default.  But the compiler can't know those 3% were intentional,
unless you're explicit for those exceptional cases.

> it's detecting is one that causes us actual problems in the code base.
> I'd really be happier if we had a theory about what classes of CVE or
> bug we could eliminate before we embrace the next new warning.

We don't generally file CVEs and waiting for them to occur might be
too reactive, but I agree that pointing to some additional
documentation in commit messages about how a warning could lead to a
bug would make it clearer to reviewers why being able to enable it
treewide, even if there's no bug in their particular subsystem, is in
the general interest of the commons.

On Mon, Nov 23, 2020 at 7:58 AM James Bottomley wrote:
>
> We're also complaining about the inability to recruit maintainers:
>
> https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/
>
> And burn out:
>
> http://antirez.com/news/129
>
> The whole crux of your argument seems to be maintainers' time isn't
> important so we should accept all trivial patches ... I'm pushing back
> on that assumption in two places, firstly the valulessness of the time
> and secondly that all trivial patches are valuable.

It's critical to the longevity of any open source project that there
are not single points of failure.  If someone is not expendable or
replaceable (or claims to be) then that's a risk to the project and a
bottleneck.  Not having a replacement in training or some form of
redundancy is short sighted.

If trivial patches are adding too much to your workload, consider
training a co-maintainer or asking for help from one of your reviewers
whom you trust.  I don't doubt it's hard to find maintainers, but
existing maintainers should go out of their way to entrust
co-maintainers especially when they find their workload becomes too
high.  And 

Re: [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late

2020-11-25 Thread Andrey Grodzovsky



On 11/25/20 5:41 AM, Daniel Vetter wrote:

On Tue, Nov 24, 2020 at 10:51:57AM -0500, Andrey Grodzovsky wrote:

On 11/24/20 9:53 AM, Daniel Vetter wrote:

On Sat, Nov 21, 2020 at 12:21:18AM -0500, Andrey Grodzovsky wrote:

Some of the stuff in amdgpu_device_fini, such as disabling HW interrupts
and finalizing pending fences, must be done right away on
pci_remove, while most of the stuff which relates to finalizing and
releasing driver data structures can be kept until the
drm_driver.release hook is called, i.e. when the last device
reference is dropped.


Uh, fini_late and fini_early are rather meaningless namings, since it's not
clear why there's a split. If you used drm_connector_funcs as inspiration,
that's kinda not good because 'register' itself is a reserved keyword.
That's why we had to add late_ prefix, could as well have used
C_sucks_ as prefix :-) And then the early_unregister for consistency.

I think fini_hw and fini_sw (or maybe fini_drm) would be a lot clearer
about what they're doing.

I still strongly recommend that you cut over as much as possible of the
fini_hw work to devm_ and for the fini_sw/drm stuff there's drmm_
-Daniel


Definitely, and I put it in a TODO list in the RFC patch. Also, as I
mentioned before, I just prefer to leave it for follow-up work because it's
non-trivial and requires shuffling a lot of stuff around in the driver. I
was thinking of committing the work in incremental steps, so it's easier to
merge it and control for breakages.

Yeah doing devm/drmm conversion later on makes sense. It'd still try to
have better names than what you're currently going with. A few of these
will likely stick around for very long, not just interim.
-Daniel


Will do.

Andrey
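
For reference, the drmm_ direction ties software teardown to the drm_device
lifetime instead of an explicit fini path; a minimal sketch, where
amdgpu_device_fini_sw_cb is a hypothetical name and not part of this series:

	static void amdgpu_device_fini_sw_cb(struct drm_device *dev, void *arg)
	{
		struct amdgpu_device *adev = arg;

		amdgpu_device_ip_fini(adev);	/* SW-side teardown */
	}

	/* registered once at init; the action runs when the last
	 * drm_device reference is dropped: */
	ret = drmm_add_action_or_reset(adev_to_drm(adev),
				       amdgpu_device_fini_sw_cb, adev);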





Andrey



Signed-off-by: Andrey Grodzovsky 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 
   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  7 ++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 24 +++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h|  1 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 12 +++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c|  3 +++
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
   9 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 83ac06a..6243f6d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1063,7 +1063,9 @@ static inline struct amdgpu_device 
*amdgpu_ttm_adev(struct ttm_bo_device *bdev)
   int amdgpu_device_init(struct amdgpu_device *adev,
   uint32_t flags);
-void amdgpu_device_fini(struct amdgpu_device *adev);
+void amdgpu_device_fini_early(struct amdgpu_device *adev);
+void amdgpu_device_fini_late(struct amdgpu_device *adev);
+
   int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
   void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
@@ -1275,6 +1277,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
   int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file 
*file_priv);
   void amdgpu_driver_postclose_kms(struct drm_device *dev,
 struct drm_file *file_priv);
+void amdgpu_driver_release_kms(struct drm_device *dev);
+
   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
   int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2f60b70..797d94d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3557,14 +3557,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
* Tear down the driver info (all asics).
* Called at driver shutdown.
*/
-void amdgpu_device_fini(struct amdgpu_device *adev)
+void amdgpu_device_fini_early(struct amdgpu_device *adev)
   {
dev_info(adev->dev, "amdgpu: finishing device.\n");
flush_delayed_work(&adev->delayed_init_work);
adev->shutdown = true;
-   kfree(adev->pci_state);
-
/* make sure IB test finished before entering exclusive mode
 * to avoid preemption on IB test
 * */
@@ -3581,11 +3579,18 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
else
drm_atomic_helper_shutdown(adev_to_drm(adev));
}
-   amdgpu_fence_driver_fini(adev);
+   amdgpu_fence_driver_fini_early(adev);
if (adev->pm_sysfs_en)
amdgpu_pm_sysfs_fini(adev);
amdgpu_fbdev_fini(adev);
+
+   amdgpu_irq_fini_early(adev);
+}
+
+void amdgpu_device_fini_late(struct amdgpu_device *adev)
+{
amdgpu_device_ip_fini(adev);
+   amdgpu_fence_driver_fini_late(adev);

Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug

2020-11-25 Thread Andrey Grodzovsky


On 11/25/20 4:04 AM, Daniel Vetter wrote:

On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky wrote:


On 11/24/20 9:49 AM, Daniel Vetter wrote:

On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:

Avoids NULL ptr due to kobj->sd being unset on device removal.

Signed-off-by: Andrey Grodzovsky 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
   2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index caf828a..812e592 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -27,6 +27,7 @@
   #include 
   #include 
   #include 
+#include 

   #include "amdgpu.h"
   #include "amdgpu_ras.h"
@@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct 
amdgpu_device *adev)
  .attrs = attrs,
  };

-sysfs_remove_group(&adev->dev->kobj, &group);
+if (!drm_dev_is_unplugged(&adev->ddev))
+sysfs_remove_group(&adev->dev->kobj, &group);

This looks wrong. sysfs, like any other interface, should be
unconditionally thrown out when we do the drm_dev_unregister. Whether
hotunplugged or not should matter at all. Either this isn't needed at all,
or something is wrong with the ordering here. But definitely fishy.
-Daniel


So technically this is needed because the kobject's sysfs directory entry
kobj->sd is set to NULL on device removal (from sysfs_remove_dir), but
because we don't finalize the device until the last reference to the drm
file is dropped (which can happen later), we end up calling
sysfs_remove_file/dir after this pointer is NULL. sysfs_remove_file checks
for NULL and aborts, while sysfs_remove_dir does not, and that's why I
guard against calls to sysfs_remove_dir.
But indeed the whole approach in the driver is incorrect; as Greg pointed
out, we should use default group attributes instead of explicit calls to
the sysfs interface, and this would save those troubles. But again, the
issue here is scope of work: converting all of amdgpu to default group
attributes is a somewhat lengthy process with extra testing, as the entire
driver is papered with sysfs references, and seems to me more of a
standalone cleanup, just like the switch to devm_ and drmm_. To me at least
it seems to make more sense to finalize and push the hot-unplug patches so
that this new functionality can be part of the driver sooner, and then
incrementally improve it by working on those other topics. Just as with
devm_/drmm_, I also added the sysfs cleanup to my TODO list in the RFC
patch.

Hm, whether you solve this with the default group stuff to
auto-remove, or remove explicitly at the right time doesn't matter
much. The underlying problem you have here is that it's done way too
late.


As far as I understood the default group attrs correctly, from reading this
article by Greg - https://www.linux.com/news/how-create-sysfs-file-correctly/ -
they will be removed together with the device, and not too late like now,
and I quote from the last paragraph there:

"By setting this value, you don’t have to do anything in your
probe() or release() functions at all in order for the
sysfs files to be properly created and destroyed whenever your
device is added or removed from the system. And you will, most
importantly, do it in a race-free manner, which is always a good thing."

To me this seems like the best solution to the late remove issue. What do
you think ?
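
A minimal sketch of that pattern (the attribute and driver names are
illustrative, not amdgpu's):

	static ssize_t features_show(struct device *dev,
				     struct device_attribute *attr, char *buf)
	{
		return sysfs_emit(buf, "example\n");
	}
	static DEVICE_ATTR_RO(features);

	static struct attribute *my_dev_attrs[] = {
		&dev_attr_features.attr,
		NULL,
	};
	ATTRIBUTE_GROUPS(my_dev);

	static struct pci_driver my_pci_driver = {
		.name			= "mydrv",
		/* files are created/removed with the device, race-free,
		 * with no explicit sysfs_create/remove calls: */
		.driver.dev_groups	= my_dev_groups,
	};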



  sysfs removal (like all uapi interfaces) needs to happen as
part of drm_dev_unregister.



Do you mean we need to trace and aggregate all sysfs file creation within
the low-level drivers and then call some sysfs release function inside
drm_dev_unregister to iterate and release them all?



  I guess aside from the split into fini_hw
and fini_sw, you also need an unregister_late callback (like we have
already for drm_connector, so that e.g. backlight and similar stuff
can be unregistered).



Is this the callback you suggest to call from within drm_dev_unregister and
it will be responsible to release all sysfs files created within the driver ?

Andrey




Papering over the underlying bug like this doesn't really fix much,
the lifetimes are still wrong.
-Daniel


Andrey



  return 0;
   }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index 2b7c90b..54331fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -24,6 +24,7 @@
   #include 
   #include 
   #include 
+#include 

   #include "amdgpu.h"
   #include "amdgpu_ucode.h"
@@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)

   void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
   {
-sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
+if (!drm_dev_is_unplugged(&adev->ddev))
+sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
   }

   static int amdgpu_ucode_init_single_fw(struct 

Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-25 Thread Daniel Vetter
On Wed, Nov 25, 2020 at 5:56 PM Michel Dänzer  wrote:
>
> On 2020-11-25 1:57 p.m., Christian König wrote:
> >
> > Well thinking more about this, it seems to be a another really good
> > argument why mapping pages from DMA-bufs into application address space
> > directly is a very bad idea :)
>
> Apologies for going off on a tangent here...
>
> Since allowing userspace mmap with dma-buf fds seems to be a trap in
> general[0], I wonder if there's any way we could stop supporting that?
>
>
> [0] E.g. mutter had to disable handing out dma-bufs for screen capture
> by default with non-i915 for now, because in particular with discrete
> GPUs, direct CPU reads can be unusably slow (think single-digit frames
> per second), and of course there's other userspace which goes "ooh,
> dma-buf, let's map and read!".

I think a pile of applications (cros included) use it to do uploads
across process boundaries. Think locked down jpeg decoder and stuff
like that. For that use-case it seems to work ok.

But yeah don't read from dma-buf. I'm pretty sure it's dead slow on
almost everything, except integrated gpu which have A) a coherent
fabric with the gpu and B) that fabric is actually faster for
rendering in general, not just for dedicated buffers allocated for
down/upload.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 5/6] drm/amdgpu: Don't hardcode thread name length

2020-11-25 Thread Luben Tuikov
On 2020-11-25 04:55, Christian König wrote:
> Am 25.11.20 um 04:17 schrieb Luben Tuikov:
>> Introduce a macro DRM_THREAD_NAME_LEN
>> and use that to define ring name size,
>> instead of hardcoding it to 16.
>>
>> Signed-off-by: Luben Tuikov 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +-
>>   include/drm/gpu_scheduler.h  | 2 ++
>>   2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index 7112137689db..bbd46c6dec65 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -230,7 +230,7 @@ struct amdgpu_ring {
>>  unsigned		wptr_offs;
>>  unsigned		fence_offs;
>>  uint64_t		current_ctx;
>> -char			name[16];
>> +char			name[DRM_THREAD_NAME_LEN];
>>  u32			trail_seq;
>>  unsigned		trail_fence_offs;
>>  u64			trail_fence_gpu_addr;
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index 61f7121e1c19..3a5686c3b5e9 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -30,6 +30,8 @@
>>   
>>   #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
>>   
>> +#define DRM_THREAD_NAME_LEN TASK_COMM_LEN
>> +
> 
> The thread name is an amdgpu specific thing. I don't think we should 
> have that in the scheduler.

I need it in DRM when creating the done thread from the name
of the main scheduler thread. Since DRM creates threads,
the main scheduler thread and the done thread, it would
be good to have a preliminary limit to the name string.

> 
> And why do you use TASK_COMM_LEN here? That is completely unrelated stuff.

If you trace down into the kernel, TASK_COMM_LEN seems to be used in
snprintf() when naming a kernel thread, and its value is 16--same
as the one used in amdgpu.

So the size of the name string transitions from amdgpu to DRM to kernel
proper, where amdgpu and kernel proper set it to max 16, but DRM doesn't
give it a limit.

Sure, I can remove it from DRM, and just use a local limit
when snprintf()-ing the name when creating a thread, possibly
using TASK_COMM_LEN. (That's in the next patch.)

Would that be better? I can do that in v2 of this patchset.

Thanks,
Luben
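
For reference, that local-limit variant would look roughly like this
(a sketch; the done_thread field and drm_sched_done_main are assumptions
based on this series, not merged API):

	char done_name[TASK_COMM_LEN];

	snprintf(done_name, sizeof(done_name), "%s-done", sched->name);
	sched->done_thread = kthread_run(drm_sched_done_main, sched,
					 "%s", done_name);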

> 
> Regards,
> Christian.
> 
>>   struct drm_gpu_scheduler;
>>   struct drm_sched_rq;
>>   
> 



Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-25 Thread Michel Dänzer

On 2020-11-25 1:57 p.m., Christian König wrote:


Well thinking more about this, it seems to be a another really good 
argument why mapping pages from DMA-bufs into application address space 
directly is a very bad idea :)


Apologies for going off on a tangent here...

Since allowing userspace mmap with dma-buf fds seems to be a trap in 
general[0], I wonder if there's any way we could stop supporting that?



[0] E.g. mutter had to disable handing out dma-bufs for screen capture 
by default with non-i915 for now, because in particular with discrete 
GPUs, direct CPU reads can be unusably slow (think single-digit frames 
per second), and of course there's other userspace which goes "ooh, 
dma-buf, let's map and read!".



--
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Luben Tuikov
On 2020-11-25 04:50, Christian König wrote:
> Am 25.11.20 um 04:17 schrieb Luben Tuikov:
>> The job timeout handler now returns status
>> indicating back to the DRM layer whether the job
>> was successfully cancelled or whether more time
>> should be given to the job to complete.
>>
>> Signed-off-by: Luben Tuikov 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
>>   include/drm/gpu_scheduler.h | 13 ++---
>>   2 files changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index ff48101bab55..81b73790ecc6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -28,7 +28,7 @@
>>   #include "amdgpu.h"
>>   #include "amdgpu_trace.h"
>>   
>> -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>> +static int amdgpu_job_timedout(struct drm_sched_job *s_job)
>>   {
>>  struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>>  struct amdgpu_job *job = to_amdgpu_job(s_job);
>> @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job 
>> *s_job)
>>  amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
>> {
>>  DRM_ERROR("ring %s timeout, but soft recovered\n",
>>s_job->sched->name);
>> -return;
>> +return 0;
>>  }
>>   
>>  amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
>> @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job 
>> *s_job)
>>   
>>  if (amdgpu_device_should_recover_gpu(ring->adev)) {
>>  amdgpu_device_gpu_recover(ring->adev, job);
>> +return 0;
>>  } else {
>>  drm_sched_suspend_timeout(&ring->sched);
>>  if (amdgpu_sriov_vf(adev))
>>  adev->virt.tdr_debug = true;
>> +return 1;
>>  }
>>   }
>>   
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index 2e0c368e19f6..61f7121e1c19 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
>>  struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
>>   
>>  /**
>> - * @timedout_job: Called when a job has taken too long to execute,
>> - * to trigger GPU recovery.
>> + * @timedout_job: Called when a job has taken too long to execute,
>> + * to trigger GPU recovery.
>> + *
>> + * Return 0, if the job has been aborted successfully and will
>> + * never be heard of from the device. Return non-zero if the
>> + * job wasn't able to be aborted, i.e. if more time should be
>> + * given to this job. The result is not "bool" as this
>> + * function is not a predicate, although its result may seem
>> + * as one.
> 
> I think the whole approach of timing out a job needs to be rethought. 
> What's timing out here is the hardware engine, not the job.
> 
> So we should also not have the job as parameter here. Maybe we should 
> make that the fence we are waiting for instead.

Yes, I wanted this patch to be minimal, and not to disrupt
too many things.

Yes, in the future we can totally revamp this, but this
is a minimal patch.

> 
>>   */
>> -void (*timedout_job)(struct drm_sched_job *sched_job);
>> +int (*timedout_job)(struct drm_sched_job *sched_job);
> 
> I would either return an error code, boolean or enum here. But not use a 
> number without a define.

Yes, that's a great idea--I'll make the change now, and resubmit.

Regards,
Luben
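
One way to encode that suggestion as an enum (illustrative only; this was
not merged API at the time of this thread):

	enum drm_gpu_sched_stat {
		DRM_GPU_SCHED_STAT_NONE,	/* reserved */
		DRM_GPU_SCHED_STAT_NOMINAL,	/* device nominal after handling */
		DRM_GPU_SCHED_STAT_ENODEV,	/* device is no longer there */
	};

	enum drm_gpu_sched_stat (*timedout_job)(struct drm_sched_job *sched_job);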

> 
> Regards,
> Christian.
> 
>>   
>>  /**
>>* @free_job: Called once the job's finished fence has been 
>> signaled
> 



[pull] amdgpu, radeon drm-next-5.11

2020-11-25 Thread Alex Deucher
Hi Dave, Daniel,

More updates for 5.11.

The following changes since commit 178631700f9dc40df754acbe766b55753ddcbfec:

  drm/amd/pm: fix spelling mistakes in dev_warn messages (2020-11-17 14:07:26 -0500)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux tags/amd-drm-next-5.11-2020-11-25

for you to fetch changes up to beaff108e1bf1e38c9def60dd09f7a4ed7910481:

  drm/amd/powerplay: fix spelling mistake "smu_state_memroy_block" -> 
"smu_state_memory_block" (2020-11-24 12:09:54 -0500)


amd-drm-next-5.11-2020-11-25:

amdgpu:
- Updates for Navy Flounder
- Updates for Dimgrey Cavefish
- Updates for Vangogh
- Add experimental support for VCN decode software ring
- Only register VGA devices with the VGA arbiter
- Clang warning fixes
- Add software IH handing
- Add cursor validation
- More W=1 fixes

radeon:
- More W=1 fixes


Alex Deucher (1):
  drm/amdgpu: only register VGA devices with the VGA arbiter

Aric Cyr (1):
  drm/amd/display: 3.2.113

Ashley Thomas (1):
  drm/amd/display: Source minimum HBlank support

Bernard Zhao (2):
  amdgpu/amdgpu_ids: fix kmalloc_array not uses number as first arg
  amd/amdgpu: use kmalloc_array to replace kmalloc with multiply

Bhawanpreet Lakha (3):
  drm/amd/display: Add display only once.
  drm/amd/display: Add comments to hdcp property change code
  drm/amd/display: Add DPCS regs for dcn302 link encoder

Camille Cho (1):
  drm/amd/display: To update backlight restore mechanism

Charlene Liu (1):
  drm/amd/display: add i2c speed arbitration for dc_i2c and hdcp_i2c

Chris Park (1):
  drm/amd/display: Update panel register

Christian König (7):
  drm/amdgpu: drop leading zeros from the gmc9 fault address
  drm/amdgpu: cleanup gmc_v10_0_process_interrupt a bit
  drm/amdgpu: add infrastructure for soft IH ring
  drm/amdgpu: enabled software IH ring for Vega
  drm/amdgpu: make sure retry faults are handled in a work item on Vega
  drm/amdgpu: enabled software IH ring for Navi
  drm/amdgpu: implement retry fault handling for Navi

Colin Ian King (1):
  drm/amd/powerplay: fix spelling mistake "smu_state_memroy_block" -> 
"smu_state_memory_block"

Eric Yang (1):
  drm/amd/display: expose clk_mgr functions for reuse

Gustavo A. R. Silva (4):
  drm/amdgpu: Fix fall-through warnings for Clang
  drm/radeon: Fix fall-through warnings for Clang
  drm/amd/display: Fix fall-through warnings for Clang
  drm/amd/pm: Fix fall-through warnings for Clang

Jacky Liao (3):
  drm/amd/display: Add DMCU memory low power support
  drm/amd/display: Add BLNDGAM memory shutdown support
  drm/amd/display: Add GAMCOR memory shutdown support

James Zhu (5):
  drm/amdgpu/vcn: refactor dec message functions
  drm/amdgpu/vcn: update header to support dec software ring
  drm/amdgpu/vcn: add test for dec software ring
  drm/amdgpu/vcn3.0: add dec software ring vm functions to support
  drm/amdgpu/vcn3.0: add software ring share memory support

Jiansong Chen (1):
  drm/amdgpu: update GC golden setting for navy_flounder

Jinzhou Su (1):
  drm/amdgpu: Add gfx doorbell setting for Vangogh

Kenneth Feng (2):
  drm/amd/amdgpu: fix null pointer in runtime pm
  drm/amd/amdgpu: skip unload message in reset

Lee Jones (27):
  drm/radeon/radeon_device: Consume our own header where the prototypes are 
located
  drm/amd/amdgpu/amdgpu_ttm: Add description for 'page_flags'
  drm/amd/amdgpu/amdgpu_ib: Provide docs for 'amdgpu_ib_schedule()'s 'job' 
param
  drm/amd/amdgpu/cik_ih: Supply description for 'ih' in 'cik_ih_{get, 
set}_wptr()'
  drm/amd/amdgpu/amdgpu_virt: Correct possible copy/paste or doc-rot 
misnaming issue
  drm/amd/amdgpu/uvd_v4_2: Fix some kernel-doc misdemeanours
  drm/amd/amdgpu/dce_v8_0: Supply description for 'async'
  drm/amd/amdgpu/cik_sdma: Supply some missing function param descriptions
  drm/amd/amdgpu/gfx_v7_0: Clean-up a bunch of kernel-doc related issues
  drm/amd/amdgpu/si_dma: Fix a bunch of function documentation issues
  drm/amd/amdgpu/gfx_v6_0: Supply description for 
'gfx_v6_0_ring_test_ib()'s 'timeout' param
  drm/amd/amdgpu/uvd_v3_1: Fix-up some documentation issues
  drm/amd/amdgpu/dce_v6_0: Fix formatting and missing parameter description 
issues
  drm/amd/include/vega20_ip_offset: Mark top-level IP_BASE definition as 
__maybe_unused
  drm/amd/include/navi10_ip_offset: Mark top-level IP_BASE as __maybe_unused
  drm/amd/include/arct_ip_offset: Mark top-level IP_BASE definition as 
__maybe_unused
  drm/amd/include/navi14_ip_offset: Mark top-level IP_BASE as __maybe_unused
  drm/amd/include/navi12_ip_offset: Mark top-level IP_BASE as __maybe_unused
  drm/amd/include/sienna_cichlid_ip_offset: Mark top-level IP_BASE 

Re: [PATCH 2/6] gpu/drm: ring_mirror_list --> pending_list

2020-11-25 Thread Luben Tuikov
On 2020-11-25 04:47, Christian König wrote:
> Am 25.11.20 um 04:17 schrieb Luben Tuikov:
>> Rename "ring_mirror_list" to "pending_list",
>> to describe what something is, not what it does,
>> how it's used, or how the hardware implements it.
>>
>> This also abstracts the actual hardware
>> implementation, i.e. how the low-level driver
>> communicates with the device it drives, ring, CAM,
>> etc., shouldn't be exposed to DRM.
>>
>> The pending_list keeps jobs submitted, which are
>> out of our control. Usually this means they are
>> pending execution status in hardware, but the
>> latter definition is a more general (inclusive)
>> definition.
>>
>> Signed-off-by: Luben Tuikov 
> 
> In general the rename is a good idea, but I think we should try to 
> remove this linked list in general.
> 
> As the original name described this is essentially a ring buffer, the is 
> no reason I can see to use a linked list here except for the add/remove 
> madness we currently have.
> 
> Anyway patch is Acked-by: Christian König  for 
> now.

Thanks for the Ack Christian.

Well this list is there now and I don't want to change too many
things or this patch would get out of hand.

Yeah, in the future, perhaps we can overhaul and change this. For now
this is a minimal rename-only patch.

Thanks,
Luben

> 
> Regards,
> Christian.
> 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  4 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  2 +-
>>   drivers/gpu/drm/scheduler/sched_main.c  | 34 ++---
>>   include/drm/gpu_scheduler.h | 10 +++---
>>   5 files changed, 27 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> index 8358cae0b5a4..db77a5bdfa45 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> @@ -1427,7 +1427,7 @@ static void amdgpu_ib_preempt_job_recovery(struct 
>> drm_gpu_scheduler *sched)
>>  struct dma_fence *fence;
>>   
>>  spin_lock(&sched->job_list_lock);
>> -list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
>> +list_for_each_entry(s_job, &sched->pending_list, list) {
>>  fence = sched->ops->run_job(s_job);
>>  dma_fence_put(fence);
>>  }
>> @@ -1459,7 +1459,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct 
>> amdgpu_ring *ring)
>>   
>>   no_preempt:
>>  spin_lock(&sched->job_list_lock);
>> -list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, list) {
>> +list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
>>  if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
>>  /* remove job from ring_mirror_list */
>>  list_del_init(&s_job->list);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 4df6de81cd41..fbae600aa5f9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4127,8 +4127,8 @@ bool amdgpu_device_has_job_running(struct 
>> amdgpu_device *adev)
>>  continue;
>>   
>>  spin_lock(&ring->sched.job_list_lock);
>> -job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
>> -struct drm_sched_job, list);
>> +job = list_first_entry_or_null(&ring->sched.pending_list,
>> +   struct drm_sched_job, list);
>>  spin_unlock(&ring->sched.job_list_lock);
>>  if (job)
>>  return true;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index aca52a46b93d..ff48101bab55 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -271,7 +271,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
>> drm_gpu_scheduler *sched)
>>  }
>>   
>>  /* Signal all jobs already scheduled to HW */
>> -list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
>> +list_for_each_entry(s_job, &sched->pending_list, list) {
>>  struct drm_sched_fence *s_fence = s_job->s_fence;
>>   
>>  dma_fence_set_error(&s_fence->finished, -EHWPOISON);
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index c52eba407ebd..b694df12aaba 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -198,7 +198,7 @@ EXPORT_SYMBOL(drm_sched_dependency_optimized);
>>   static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
>>   {
>>  if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>> -!list_empty(&sched->ring_mirror_list))
>> +!list_empty(&sched->pending_list))
>>  schedule_delayed_work(&sched->work_tdr, sched->timeout);
>>   }
>>   
>> @@ -258,7 +258,7 @@ void 

Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-25 Thread Daniel Vetter
On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
> Am 25.11.20 um 11:40 schrieb Daniel Vetter:
> > On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> > > Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > > > On 11/24/20 2:41 AM, Christian König wrote:
> > > > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > > > what you are doing with this function.
> > > > > > > > > > > 
> > > > > > > > > > > Christian.
> > > > > > > > > > 
> > > > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > > > notifier per device." to see
> > > > > > > > > > how i use it. I don't see why this should go
> > > > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > > > registered from within TTM
> > > > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > > > No, that is really vendor specific.
> > > > > > > > > 
> > > > > > > > > What I meant is to have a function like
> > > > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > > > to call and all tt objects are unpopulated.
> > > > > > > > 
> > > > > > > > So instead of this BO list i create and later iterate in
> > > > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > > > within
> > > > > > > > TTM with a single function ? Makes much more sense.
> > > > > > > Yes, exactly.
> > > > > > > 
> > > > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > > > actually not the best idea, we should now check the
> > > > > > > pin_count instead. This way we could also have a list of the
> > > > > > > pinned BOs in TTM.
> > > > > > 
> > > > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > > > It's probably a good idea to combine both iterations into this
> > > > > > new function to cover all the BOs allocated on the device.
> > > > > Yes, that's what I had in my mind as well.
> > > > > 
> > > > > > 
> > > > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > > > could have unforeseen consequences.
> > > > > > 
> > > > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > > > accesses once we drop all the DMA backing pages for a particular
> > > > > > BO ? Because for user mappings
> > > > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > > > nothing was done for in kernel CPU mappings.
> > > > > Yes exactly that.
> > > > > 
> > > > > In other words what happens if we free the ring buffer while the
> > > > > kernel still writes to it?
> > > > > 
> > > > > Christian.
> > > > 
> > > > While we can't explicitly control user application accesses to the
> > > > mapped buffers (hence the page fault rerouting), I am thinking that in
> > > > this case we may be able to sprinkle drm_dev_enter/exit in any such
> > > > sensitive place where we might CPU-access a DMA buffer from the kernel?
> > > Yes, I fear we are going to need that.
> > Uh ... problem is that dma_buf_vmap mappings are usually permanent things. Maybe we
> > could stuff this into begin/end_cpu_access (but only for the kernel, so a
> > bit tricky)?
> 
> Oh very very good point! I haven't thought about DMA-buf mmaps in this
> context yet.
> 
> 
> > btw the other issue with dma-buf (and even worse with dma_fence) is
> > refcounting of the underlying drm_device. I'd expect that all your
> > callbacks go boom if the dma_buf outlives your drm_device. That part isn't
> > yet solved in your series here.
> 
> Well, thinking more about this, it seems to be another really good argument
> why mapping pages from DMA-bufs into application address space directly is a
> very bad idea :)
> 
> But yes, we essentially can't remove the device as long as there is a
> DMA-buf with mappings. No idea how to clean that one up.

drm_dev_get/put in drm_prime helpers should get us like 90% there I think.
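A rough sketch of that idea; drm_gem_prime_export() and drm_dev_get/put() are
the real helpers, but where exactly the reference is taken and dropped is an
assumption for illustration, not something this series implements yet:

struct dma_buf *example_prime_export(struct drm_gem_object *obj, int flags)
{
	struct dma_buf *buf = drm_gem_prime_export(obj, flags);

	/* Keep the drm_device alive for as long as the dma-buf exists;
	 * the matching drm_dev_put() would go into the dma-buf release path.
	 */
	if (!IS_ERR(buf))
		drm_dev_get(obj->dev);
	return buf;
}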

The even more worrying thing is random dma_fence attached to the dma_resv
object. We could try to 

Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Daniel Vetter
On Wed, Nov 25, 2020 at 12:38:01PM +0100, Thomas Zimmermann wrote:
> Hi
> 
> On 25.11.20 at 11:36, Daniel Vetter wrote:
> > On Wed, Nov 25, 2020 at 11:13:13AM +0100, Christian König wrote:
> > > On 25.11.20 at 09:37, Thomas Zimmermann wrote:
> > > > Hi
> > > > 
> > > > On 24.11.20 at 15:09, Daniel Vetter wrote:
> > > > > On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:
> > > > > > Hi
> > > > > > 
> > > > > > On 24.11.20 at 14:36, Christian König wrote:
> > > > > > > On 24.11.20 at 13:15, Thomas Zimmermann wrote:
> > > > > > > > [SNIP]
> > > > > > > > > > > > First I wanted to put this into
> > > > > > > > > > > > drm_gem_ttm_vmap/vunmap(), but then wondered why
> > > > > > > > > > > > ttm_bo_vmap() doe not acquire the lock internally?
> > > > > > > > > > > > I'd expect that vmap/vunmap are close together and
> > > > > > > > > > > > do not overlap for the same BO.
> > > > > > > > > > > 
> > > > > > > > > > > We have use cases like the following during command 
> > > > > > > > > > > submission:
> > > > > > > > > > > 
> > > > > > > > > > > 1. lock
> > > > > > > > > > > 2. map
> > > > > > > > > > > 3. copy parts of the BO content somewhere else or patch
> > > > > > > > > > > it with additional information
> > > > > > > > > > > 4. unmap
> > > > > > > > > > > 5. submit BO to the hardware
> > > > > > > > > > > 6. add hardware fence to the BO to make sure it doesn't 
> > > > > > > > > > > move
> > > > > > > > > > > 7. unlock
> > > > > > > > > > > 
> > > > > > > > > > > That use case won't be possible with vmap/vunmap if we
> > > > > > > > > > > move the lock/unlock into it and I hope to replace the
> > > > > > > > > > > kmap/kunmap functions with them in the near term.
> > > > > > > > > > > 
> > > > > > > > > > > > Otherwise, acquiring the reservation lock would
> > > > > > > > > > > > require another ref-counting variable or per-driver
> > > > > > > > > > > > code.
> > > > > > > > > > > 
> > > > > > > > > > > Hui, why that? Just put this into
> > > > > > > > > > > drm_gem_ttm_vmap/vunmap() helper as you initially
> > > > > > > > > > > planned.
> > > > > > > > > > 
> > > > > > > > > > Given your example above, step one would acquire the lock,
> > > > > > > > > > and step two would also acquire the lock as part of the vmap
> > > > > > > > > > implementation. Wouldn't this fail (At least during unmap or
> > > > > > > > > > unlock steps) ?
> > > > > > > > > 
> > > > > > > > > Oh, so you want to nest them? No, that is a rather bad no-go.
> > > > > > > > 
> > > > > > > > I don't want to nest/overlap them. My question was whether that
> > > > > > > > would be required. Apparently not.
> > > > > > > > 
> > > > > > > > While the console's BO is being set for scanout, it's protected 
> > > > > > > > from
> > > > > > > > movement via the pin/unpin implementation, right?
> > > > > > > 
> > > > > > > Yes, correct.
> > > > > > > 
> > > > > > > > The driver does not acquire the resv lock for longer periods. 
> > > > > > > > I'm
> > > > > > > > asking because this would prevent any console-buffer updates 
> > > > > > > > while
> > > > > > > > the console is being displayed.
> > > > > > > 
> > > > > > > Correct as well, we only hold the lock for things like command
> > > > > > > submission, pinning, unpinning etc etc
> > > > > > > 
> > > > > > 
> > > > > > Thanks for answering my questions.
> > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > You need to make sure that the lock is only taken from the FB
> > > > > > > > > path which wants to vmap the object.
> > > > > > > > > 
> > > > > > > > > Why don't you lock the GEM object from the caller in the 
> > > > > > > > > generic
> > > > > > > > > FB implementation?
> > > > > > > > 
> > > > > > > > With the current blitter code, it breaks abstraction. if 
> > > > > > > > vmap/vunmap
> > > > > > > > hold the lock implicitly, things would be easier.
> > > > > > > 
> > > > > > > Do you have a link to the code?
> > > > > > 
> > > > > > It's the damage blitter in the fbdev code. [1] While it flushes
> > > > > > the shadow
> > > > > > buffer into the BO, the BO has to be kept in place. I already
> > > > > > changed it to
> > > > > > lock struct drm_fb_helper.lock, but I don't think this is
> > > > > > enough. TTM could
> > > > > > still evict the BO concurrently.
> > > > > 
> > > > > So I'm not sure this is actually a problem: ttm could try to
> > > > > concurrently
> > > > > evict the buffer we pinned into vram, and then just skip to the next
> > > > > one.
> > > > > 
> > > > > Plus atm generic fbdev isn't used on any chip where we really care 
> > > > > about
> > > > > that last few mb of vram being useable for command submission (well 
> > > > > atm
> > > > > there's no driver using it).
> > > > 
> > > > Well, this is the patchset for radeon. If it works out, amdgpu and
> > > > nouveau are natural next choices. Especially radeon and nouveau support
> > > > cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly
> > > > matter.
> 

Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Jakub Kicinski
On Wed, 25 Nov 2020 04:24:27 -0800 Nick Desaulniers wrote:
> I even agree that most of the churn comes from
> 
> case 0:
>   ++x;
> default:
>   break;

And just to spell it out,

case ENUM_VALUE1:
bla();
break;
case ENUM_VALUE2:
bla();
default:
break;

is a fairly idiomatic way of indicating that not all values of the enum
are expected to be handled by the switch statement. 
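When a fall-through really is intentional, the usual fix in these series is
the kernel's fallthrough pseudo-keyword (from
include/linux/compiler_attributes.h), which satisfies the warning on both
GCC and Clang; applied to the idiom above it would look like this (a sketch,
not a patch from the series):

case ENUM_VALUE1:
	bla();
	break;
case ENUM_VALUE2:
	bla();
	fallthrough;	/* documents the intentional fall into default */
default:
	break;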

I really hope the Clang folks are reasonable and merge your patch.

> If trivial patches are adding too much to your workload, consider
> training a co-maintainer or asking for help from one of your reviewers
> whom you trust.  I don't doubt it's hard to find maintainers, but
> existing maintainers should go out of their way to entrust
> co-maintainers especially when they find their workload becomes too
> high.  And reviewing/picking up trivial patches is probably a great
> way to get started.  If we allow too much knowledge of any one
> subsystem to collect with one maintainer, what happens when that
> maintainer leaves the community (which, given a finite lifespan, is an
> inevitability)?

The burn out point is about enjoying your work and feeling that it
matters. It really doesn't make much difference if you're doing
something you don't like for 12 hours every day or only in shifts with
another maintainer. You'll dislike it either way.

Applying a real patch set and then getting a few follow ups the next day
for trivial coding things like fallthrough missing or static missing,
just because I didn't have the full range of compilers to check with
before applying makes me feel pretty shitty, like I'm not doing a good
job. YMMV.


[PATCH 4/4] drm/amdgpu: Enable GPU reset for vangogh

2020-11-25 Thread Alex Deucher
Enable GPU reset when we encounter a hang.

Acked-by: Evan Quan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 79dd85f71fab..355fa0057c26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4189,6 +4189,7 @@ bool amdgpu_device_should_recover_gpu(struct 
amdgpu_device *adev)
case CHIP_NAVI14:
case CHIP_NAVI12:
case CHIP_SIENNA_CICHLID:
+   case CHIP_VANGOGH:
break;
default:
goto disabled;
-- 
2.25.4



[PATCH 3/4] drm/amdgpu: fix mode2 reset sequence for vangogh

2020-11-25 Thread Alex Deucher
We need to save and restore PCI config space.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 34 -
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 221a29cdc0aa..70d6556cd01d 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -336,6 +336,38 @@ static int nv_asic_mode1_reset(struct amdgpu_device *adev)
return ret;
 }
 
+static int nv_asic_mode2_reset(struct amdgpu_device *adev)
+{
+   u32 i;
+   int ret = 0;
+
+   amdgpu_atombios_scratch_regs_engine_hung(adev, true);
+
+   /* disable BM */
+   pci_clear_master(adev->pdev);
+
+   amdgpu_device_cache_pci_state(adev->pdev);
+
+   ret = amdgpu_dpm_mode2_reset(adev);
+   if (ret)
+   dev_err(adev->dev, "GPU mode2 reset failed\n");
+
+   amdgpu_device_load_pci_state(adev->pdev);
+
+   /* wait for asic to come out of reset */
+   for (i = 0; i < adev->usec_timeout; i++) {
+   u32 memsize = adev->nbio.funcs->get_memsize(adev);
+
+   if (memsize != 0xffffffff)
+   break;
+   udelay(1);
+   }
+
+   amdgpu_atombios_scratch_regs_engine_hung(adev, false);
+
+   return ret;
+}
+
 static bool nv_asic_supports_baco(struct amdgpu_device *adev)
 {
	struct smu_context *smu = &adev->smu;
@@ -392,7 +424,7 @@ static int nv_asic_reset(struct amdgpu_device *adev)
break;
case AMD_RESET_METHOD_MODE2:
dev_info(adev->dev, "MODE2 reset\n");
-   ret = amdgpu_dpm_mode2_reset(adev);
+   ret = nv_asic_mode2_reset(adev);
break;
default:
dev_info(adev->dev, "MODE1 reset\n");
-- 
2.25.4
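For what it's worth, the memsize poll added above relies on the usual amdgpu
convention: the register reads back as 0xffffffff while the ASIC is still
held in reset, so the loop exits once a real value is returned and the ASIC
has come back.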



[PATCH 2/4] drm/amdgpu/nv: add mode2 reset handling

2020-11-25 Thread Alex Deucher
Vangogh will use mode2 reset, so plumb it through the nv
soc driver.

Acked-by: Evan Quan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/nv.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index ac02dd707c44..221a29cdc0aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -352,6 +352,7 @@ nv_asic_reset_method(struct amdgpu_device *adev)
	struct smu_context *smu = &adev->smu;
 
if (amdgpu_reset_method == AMD_RESET_METHOD_MODE1 ||
+   amdgpu_reset_method == AMD_RESET_METHOD_MODE2 ||
amdgpu_reset_method == AMD_RESET_METHOD_BACO)
return amdgpu_reset_method;
 
@@ -360,6 +361,8 @@ nv_asic_reset_method(struct amdgpu_device *adev)
  amdgpu_reset_method);
 
switch (adev->asic_type) {
+   case CHIP_VANGOGH:
+   return AMD_RESET_METHOD_MODE2;
case CHIP_SIENNA_CICHLID:
case CHIP_NAVY_FLOUNDER:
return AMD_RESET_METHOD_MODE1;
@@ -376,7 +379,8 @@ static int nv_asic_reset(struct amdgpu_device *adev)
int ret = 0;
	struct smu_context *smu = &adev->smu;
 
-   if (nv_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) {
+   switch (nv_asic_reset_method(adev)) {
+   case AMD_RESET_METHOD_BACO:
dev_info(adev->dev, "BACO reset\n");
 
ret = smu_baco_enter(smu);
@@ -385,9 +389,15 @@ static int nv_asic_reset(struct amdgpu_device *adev)
ret = smu_baco_exit(smu);
if (ret)
return ret;
-   } else {
+   break;
+   case AMD_RESET_METHOD_MODE2:
+   dev_info(adev->dev, "MODE2 reset\n");
+   ret = amdgpu_dpm_mode2_reset(adev);
+   break;
+   default:
dev_info(adev->dev, "MODE1 reset\n");
ret = nv_asic_mode1_reset(adev);
+   break;
}
 
return ret;
-- 
2.25.4



[PATCH 1/4] drm/amdgpu: add mode2 reset support for vangogh

2020-11-25 Thread Alex Deucher
GPU reset is handled via the SMU, similar to previous APUs.

Acked-by: Evan Quan 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 1c1af7483dfe..3d4d27a304e9 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -780,6 +780,11 @@ static int 
vangogh_set_fine_grain_gfx_freq_parameters(struct smu_context *smu)
return 0;
 }
 
+static int vangogh_mode2_reset(struct smu_context *smu)
+{
+   return smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_2, NULL);
+}
+
 static const struct pptable_funcs vangogh_ppt_funcs = {
 
.check_fw_status = smu_v11_0_check_fw_status,
@@ -808,6 +813,7 @@ static const struct pptable_funcs vangogh_ppt_funcs = {
.print_clk_levels = vangogh_print_fine_grain_clk,
.set_default_dpm_table = vangogh_set_default_dpm_tables,
	.set_fine_grain_gfx_freq_parameters = vangogh_set_fine_grain_gfx_freq_parameters,
+   .mode2_reset = vangogh_mode2_reset,
 };
 
 void vangogh_set_ppt_funcs(struct smu_context *smu)
-- 
2.25.4



[pull] amdgpu drm-fixes-5.10

2020-11-25 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.10.

The following changes since commit 6600f9d52213b5c3455481b5c9e61cf5e305c0e6:

  Merge tag 'drm-intel-fixes-2020-11-19' of 
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes (2020-11-20 11:21:54 
+1000)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux tags/amd-drm-fixes-5.10-2020-11-25

for you to fetch changes up to 60734bd54679d7998a24a257b0403f7644005572:

  drm/amdgpu: update golden setting for sienna_cichlid (2020-11-24 12:33:07 
-0500)


amd-drm-fixes-5.10-2020-11-25:

amdgpu:
- Runtime pm fix
- SI UVD suspend/resume fix
- HDCP fix for headless cards
- Sienna Cichlid golden register update


Kenneth Feng (1):
  drm/amd/amdgpu: fix null pointer in runtime pm

Likun Gao (1):
  drm/amdgpu: update golden setting for sienna_cichlid

Rodrigo Siqueira (1):
  drm/amd/display: Avoid HDCP initialization in devices without output

Sonny Jiang (2):
  drm/amdgpu: fix SI UVD firmware validate resume fail
  drm/amdgpu: fix a page fault

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c|  2 ++
 drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c | 20 +++-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 +-
 5 files changed, 17 insertions(+), 12 deletions(-)


RE: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo kmalloc_array creation

2020-11-25 Thread Chen, Guchun
[AMD Public Use]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Colin King  
Sent: Wednesday, November 25, 2020 10:18 PM
To: Deucher, Alexander ; Koenig, Christian 
; David Airlie ; Daniel Vetter 
; Zhou1, Tao ; Chen, Guchun 
; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org
Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo 
kmalloc_array creation

From: Colin Ian King 

An incorrect sizeof() is being used: sizeof((*data)->bps_bo) is not correct;
it should be sizeof(*(*data)->bps_bo). It just so happens to work because
the sizes are the same. Fix it.

Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 5278a159cf35 ("drm/amdgpu: support reserve bad page for virt (v3)")
Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 2d51b7694d1f..df15d33e3c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -283,7 +283,7 @@ static int amdgpu_virt_init_ras_err_handler_data(struct 
amdgpu_device *adev)
return -ENOMEM;
 
bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL);
-   bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), GFP_KERNEL);
+   bps_bo = kmalloc_array(align_space, sizeof(*(*data)->bps_bo), GFP_KERNEL);
 
if (!bps || !bps_bo) {
kfree(bps);
--
2.29.2


[PATCH] drm/radeon: fix check order in radeon_bo_move

2020-11-25 Thread Christian König
Reorder the code so that the check for whether blitting is available is done
before requesting a multihop move.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 54 +
 1 file changed, 24 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0ca381b95d3d..2b598141225f 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -216,27 +216,15 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
	struct ttm_resource *old_mem = &bo->mem;
int r;
 
-   if ((old_mem->mem_type == TTM_PL_SYSTEM &&
-new_mem->mem_type == TTM_PL_VRAM) ||
-   (old_mem->mem_type == TTM_PL_VRAM &&
-new_mem->mem_type == TTM_PL_SYSTEM)) {
-   hop->fpfn = 0;
-   hop->lpfn = 0;
-   hop->mem_type = TTM_PL_TT;
-   hop->flags = 0;
-   return -EMULTIHOP;
-   }
-
if (new_mem->mem_type == TTM_PL_TT) {
r = radeon_ttm_tt_bind(bo->bdev, bo->ttm, new_mem);
if (r)
return r;
}
-   radeon_bo_move_notify(bo, evict, new_mem);
 
r = ttm_bo_wait_ctx(bo, ctx);
if (r)
-   goto fail;
+   return r;
 
/* Can't move a pinned BO */
rbo = container_of(bo, struct radeon_bo, tbo);
@@ -246,12 +234,12 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
rdev = radeon_get_rdev(bo->bdev);
if (old_mem->mem_type == TTM_PL_SYSTEM && bo->ttm == NULL) {
ttm_bo_move_null(bo, new_mem);
-   return 0;
+   goto out;
}
if (old_mem->mem_type == TTM_PL_SYSTEM &&
new_mem->mem_type == TTM_PL_TT) {
ttm_bo_move_null(bo, new_mem);
-   return 0;
+   goto out;
}
 
if (old_mem->mem_type == TTM_PL_TT &&
@@ -259,31 +247,37 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
radeon_ttm_tt_unbind(bo->bdev, bo->ttm);
	ttm_resource_free(bo, &bo->mem);
ttm_bo_assign_mem(bo, new_mem);
-   return 0;
+   goto out;
}
-   if (!rdev->ring[radeon_copy_ring_index(rdev)].ready ||
-   rdev->asic->copy.copy == NULL) {
-   /* use memcpy */
-   goto memcpy;
+   if (rdev->ring[radeon_copy_ring_index(rdev)].ready &&
+   rdev->asic->copy.copy != NULL) {
+   if ((old_mem->mem_type == TTM_PL_SYSTEM &&
+new_mem->mem_type == TTM_PL_VRAM) ||
+   (old_mem->mem_type == TTM_PL_VRAM &&
+new_mem->mem_type == TTM_PL_SYSTEM)) {
+   hop->fpfn = 0;
+   hop->lpfn = 0;
+   hop->mem_type = TTM_PL_TT;
+   hop->flags = 0;
+   return -EMULTIHOP;
+   }
+
+   r = radeon_move_blit(bo, evict, new_mem, old_mem);
+   } else {
+   r = -ENODEV;
}
 
-   r = radeon_move_blit(bo, evict, new_mem, old_mem);
if (r) {
-memcpy:
r = ttm_bo_move_memcpy(bo, ctx, new_mem);
-   if (r) {
-   goto fail;
-   }
+   if (r)
+   return r;
}
 
+out:
/* update statistics */
	atomic64_add((u64)bo->num_pages << PAGE_SHIFT, &rdev->num_bytes_moved);
+   radeon_bo_move_notify(bo, evict, new_mem);
return 0;
-fail:
-   swap(*new_mem, bo->mem);
-   radeon_bo_move_notify(bo, false, new_mem);
-   swap(*new_mem, bo->mem);
-   return r;
 }
 
 static int radeon_ttm_io_mem_reserve(struct ttm_bo_device *bdev, struct 
ttm_resource *mem)
-- 
2.25.1
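The resulting flow, condensed from the diff above (pseudocode, for
orientation only):

	if (copy ring ready && copy callback present) {
		if (move is SYSTEM <-> VRAM)
			return -EMULTIHOP;	/* bounce through GTT first */
		r = radeon_move_blit(...);
	} else {
		r = -ENODEV;		/* no blit path, force fallback */
	}
	if (r)
		r = ttm_bo_move_memcpy(...);	/* CPU fallback */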



Re: [PATCH] drm/amdgpu/vcn3.0: fix compilation warning

2020-11-25 Thread Nirmoy

Reviewed-by: Nirmoy Das 

On 11/25/20 3:09 PM, James Zhu wrote:

Fix 'no previous prototype' warnings by marking these functions static.

Signed-off-by: James Zhu 
Reported-by: kernel test robot 
---
  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index 71e10bf..0c97322 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -1701,7 +1701,7 @@ static void vcn_v3_0_dec_ring_set_wptr(struct amdgpu_ring 
*ring)
}
  }
  
-void vcn_v3_0_dec_sw_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
+static void vcn_v3_0_dec_sw_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
u64 seq, uint32_t flags)
  {
WARN_ON(flags & AMDGPU_FENCE_FLAG_64BIT);
@@ -1713,12 +1713,12 @@ void vcn_v3_0_dec_sw_ring_emit_fence(struct amdgpu_ring 
*ring, u64 addr,
amdgpu_ring_write(ring, VCN_DEC_SW_CMD_TRAP);
  }
  
-void vcn_v3_0_dec_sw_ring_insert_end(struct amdgpu_ring *ring)
+static void vcn_v3_0_dec_sw_ring_insert_end(struct amdgpu_ring *ring)
  {
amdgpu_ring_write(ring, VCN_DEC_SW_CMD_END);
  }
  
-void vcn_v3_0_dec_sw_ring_emit_ib(struct amdgpu_ring *ring,
+static void vcn_v3_0_dec_sw_ring_emit_ib(struct amdgpu_ring *ring,
   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
@@ -1732,7 +1732,7 @@ void vcn_v3_0_dec_sw_ring_emit_ib(struct amdgpu_ring 
*ring,
amdgpu_ring_write(ring, ib->length_dw);
  }
  
-void vcn_v3_0_dec_sw_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg,
+static void vcn_v3_0_dec_sw_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg,
uint32_t val, uint32_t mask)
  {
amdgpu_ring_write(ring, VCN_DEC_SW_CMD_REG_WAIT);
@@ -1741,7 +1741,7 @@ void vcn_v3_0_dec_sw_ring_emit_reg_wait(struct 
amdgpu_ring *ring, uint32_t reg,
amdgpu_ring_write(ring, val);
  }
  
-void vcn_v3_0_dec_sw_ring_emit_vm_flush(struct amdgpu_ring *ring,
+static void vcn_v3_0_dec_sw_ring_emit_vm_flush(struct amdgpu_ring *ring,
uint32_t vmid, uint64_t pd_addr)
  {
	struct amdgpu_vmhub *hub = &ring->adev->vmhub[ring->funcs->vmhub];
@@ -1756,7 +1756,7 @@ void vcn_v3_0_dec_sw_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
vcn_v3_0_dec_sw_ring_emit_reg_wait(ring, data0, data1, mask);
  }
  
-void vcn_v3_0_dec_sw_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg, uint32_t val)
+static void vcn_v3_0_dec_sw_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg, uint32_t val)
  {
amdgpu_ring_write(ring, VCN_DEC_SW_CMD_REG_WRITE);
amdgpu_ring_write(ring, reg << 2);



[PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo kmalloc_array creation

2020-11-25 Thread Colin King
From: Colin Ian King 

An incorrect sizeof() is being used: sizeof((*data)->bps_bo) is not
correct; it should be sizeof(*(*data)->bps_bo). It just so happens
to work because the sizes are the same. Fix it.

Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 5278a159cf35 ("drm/amdgpu: support reserve bad page for virt (v3)")
Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 2d51b7694d1f..df15d33e3c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -283,7 +283,7 @@ static int amdgpu_virt_init_ras_err_handler_data(struct 
amdgpu_device *adev)
return -ENOMEM;
 
bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL);
-   bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), GFP_KERNEL);
+   bps_bo = kmalloc_array(align_space, sizeof(*(*data)->bps_bo), GFP_KERNEL);
 
if (!bps || !bps_bo) {
kfree(bps);
-- 
2.29.2
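For reference, the reason the two spellings happen to agree here is that
bps_bo is an array of pointers, so the element type is itself pointer-sized;
a minimal illustration (simplified locals, not code from the patch):

	struct amdgpu_bo **bps_bo;

	sizeof(bps_bo);   /* size of the pointer-to-pointer variable */
	sizeof(*bps_bo);  /* size of one element: still a pointer, same size */

kmalloc_array() needs the element size, so sizeof(*(*data)->bps_bo) is the
portable form even though both values coincide on common architectures.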



Re: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo kmalloc_array creation

2020-11-25 Thread Christian König

On 25.11.20 at 15:18, Colin King wrote:

From: Colin Ian King 

An incorrect sizeof() is being used: sizeof((*data)->bps_bo) is not
correct; it should be sizeof(*(*data)->bps_bo). It just so happens
to work because the sizes are the same. Fix it.

Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 5278a159cf35 ("drm/amdgpu: support reserve bad page for virt (v3)")
Signed-off-by: Colin Ian King 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 2d51b7694d1f..df15d33e3c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -283,7 +283,7 @@ static int amdgpu_virt_init_ras_err_handler_data(struct 
amdgpu_device *adev)
return -ENOMEM;
  
  	bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL);

-   bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), GFP_KERNEL);
+   bps_bo = kmalloc_array(align_space, sizeof(*(*data)->bps_bo), GFP_KERNEL);
  
  	if (!bps || !bps_bo) {

kfree(bps);




[PATCH] drm/amdgpu/SRIOV: Extend VF reset request wait period

2020-11-25 Thread jianzh
From: Jiange Zhao 

In the virtualization case, when one VF sends too many
FLR requests, the hypervisor stops responding to that
VF's requests for a long period of time. This is called
the event guard. During this cooling-off period, the
guest driver should wait instead of doing other things;
afterwards it can resume the reset process and return
to normal.

Currently, the guest driver waits 12 seconds and returns
failure if it doesn't get a response from the host.

Solution: extend the waiting time in the guest driver and
poll for the response periodically.

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 ++-
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 ++-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index f5ce9a9f4cf5..d8d8c623bb74 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -187,7 +187,16 @@ static int xgpu_ai_send_access_requests(struct 
amdgpu_device *adev,
 
 static int xgpu_ai_request_reset(struct amdgpu_device *adev)
 {
-   return xgpu_ai_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);
+   int ret, i = 0;
+
+   while (i < 11) {
+   ret = xgpu_ai_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);
+   if (!ret)
+   break;
+   i++;
+   }
+
+   return ret;
 }
 
 static int xgpu_ai_request_full_gpu_access(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
index 83b453f5d717..20ee2142f9ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
@@ -25,7 +25,7 @@
 #define __MXGPU_AI_H__
 
 #define AI_MAILBOX_POLL_ACK_TIMEDOUT   500
-#define AI_MAILBOX_POLL_MSG_TIMEDOUT   12000
+#define AI_MAILBOX_POLL_MSG_TIMEDOUT   6000
 #define AI_MAILBOX_POLL_FLR_TIMEDOUT   5000
 
 enum idh_request {
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index 666ed99cc14b..0147dfe21a39 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -200,7 +200,16 @@ static int xgpu_nv_send_access_requests(struct 
amdgpu_device *adev,
 
 static int xgpu_nv_request_reset(struct amdgpu_device *adev)
 {
-   return xgpu_nv_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);
+   int ret, i = 0;
+
+   while (i < 11) {
+   ret = xgpu_nv_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);
+   if (!ret)
+   break;
+   i++;
+   }
+
+   return ret;
 }
 
 static int xgpu_nv_request_full_gpu_access(struct amdgpu_device *adev,
-- 
2.25.1
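Combined with the mxgpu_ai.h change above, each of the up to 11 attempts can
wait AI_MAILBOX_POLL_MSG_TIMEDOUT = 6000 ms, so the worst case is roughly
11 * 6 s = 66 s of total waiting, compared with the previous single 12 s
window.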



[PATCH] drm/amdgpu/vcn3.0: fix compilation warning

2020-11-25 Thread James Zhu
Fix 'no previous prototype' warnings by marking these functions static.

Signed-off-by: James Zhu 
Reported-by: kernel test robot 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index 71e10bf..0c97322 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -1701,7 +1701,7 @@ static void vcn_v3_0_dec_ring_set_wptr(struct amdgpu_ring 
*ring)
}
 }
 
-void vcn_v3_0_dec_sw_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
+static void vcn_v3_0_dec_sw_ring_emit_fence(struct amdgpu_ring *ring, u64 addr,
u64 seq, uint32_t flags)
 {
WARN_ON(flags & AMDGPU_FENCE_FLAG_64BIT);
@@ -1713,12 +1713,12 @@ void vcn_v3_0_dec_sw_ring_emit_fence(struct amdgpu_ring 
*ring, u64 addr,
amdgpu_ring_write(ring, VCN_DEC_SW_CMD_TRAP);
 }
 
-void vcn_v3_0_dec_sw_ring_insert_end(struct amdgpu_ring *ring)
+static void vcn_v3_0_dec_sw_ring_insert_end(struct amdgpu_ring *ring)
 {
amdgpu_ring_write(ring, VCN_DEC_SW_CMD_END);
 }
 
-void vcn_v3_0_dec_sw_ring_emit_ib(struct amdgpu_ring *ring,
+static void vcn_v3_0_dec_sw_ring_emit_ib(struct amdgpu_ring *ring,
   struct amdgpu_job *job,
   struct amdgpu_ib *ib,
   uint32_t flags)
@@ -1732,7 +1732,7 @@ void vcn_v3_0_dec_sw_ring_emit_ib(struct amdgpu_ring 
*ring,
amdgpu_ring_write(ring, ib->length_dw);
 }
 
-void vcn_v3_0_dec_sw_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg,
+static void vcn_v3_0_dec_sw_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg,
uint32_t val, uint32_t mask)
 {
amdgpu_ring_write(ring, VCN_DEC_SW_CMD_REG_WAIT);
@@ -1741,7 +1741,7 @@ void vcn_v3_0_dec_sw_ring_emit_reg_wait(struct 
amdgpu_ring *ring, uint32_t reg,
amdgpu_ring_write(ring, val);
 }
 
-void vcn_v3_0_dec_sw_ring_emit_vm_flush(struct amdgpu_ring *ring,
+static void vcn_v3_0_dec_sw_ring_emit_vm_flush(struct amdgpu_ring *ring,
uint32_t vmid, uint64_t pd_addr)
 {
	struct amdgpu_vmhub *hub = &ring->adev->vmhub[ring->funcs->vmhub];
@@ -1756,7 +1756,7 @@ void vcn_v3_0_dec_sw_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
vcn_v3_0_dec_sw_ring_emit_reg_wait(ring, data0, data1, mask);
 }
 
-void vcn_v3_0_dec_sw_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg, uint32_t val)
+static void vcn_v3_0_dec_sw_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg, uint32_t val)
 {
amdgpu_ring_write(ring, VCN_DEC_SW_CMD_REG_WRITE);
amdgpu_ring_write(ring, reg << 2);
-- 
2.7.4
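For context, the warning in question comes from -Wmissing-prototypes: any
external (non-static) function definition without a prior declaration
triggers it, and file-local functions are expected to be static; a minimal
illustration (not from the patch):

	void helper(void) { }          /* warning: no previous prototype for 'helper' */
	static void helper2(void) { }  /* file-local, no warning */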



Re: [PATCH 11/15] drm/radeon: Remove references to struct drm_device.pdev

2020-11-25 Thread Alex Deucher
On Tue, Nov 24, 2020 at 6:39 AM Thomas Zimmermann  wrote:
>
> Using struct drm_device.pdev is deprecated. Convert radeon to struct
> drm_device.dev. No functional changes.
>
> Signed-off-by: Thomas Zimmermann 
> Cc: Alex Deucher 
> Cc: Christian König 

There are a few unrelated whitespace changes.  Other than that, patch is:
Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/radeon/atombios_encoders.c|  6 +-
>  drivers/gpu/drm/radeon/r100.c | 27 +++---
>  drivers/gpu/drm/radeon/radeon.h   | 32 +++
>  drivers/gpu/drm/radeon/radeon_atombios.c  | 89 ++-
>  drivers/gpu/drm/radeon/radeon_bios.c  |  6 +-
>  drivers/gpu/drm/radeon/radeon_combios.c   | 55 ++--
>  drivers/gpu/drm/radeon/radeon_cs.c|  3 +-
>  drivers/gpu/drm/radeon/radeon_device.c| 17 ++--
>  drivers/gpu/drm/radeon/radeon_display.c   |  2 +-
>  drivers/gpu/drm/radeon/radeon_drv.c   |  3 +-
>  drivers/gpu/drm/radeon/radeon_fb.c|  2 +-
>  drivers/gpu/drm/radeon/radeon_gem.c   |  6 +-
>  drivers/gpu/drm/radeon/radeon_i2c.c   |  2 +-
>  drivers/gpu/drm/radeon/radeon_irq_kms.c   |  2 +-
>  drivers/gpu/drm/radeon/radeon_kms.c   | 20 ++---
>  .../gpu/drm/radeon/radeon_legacy_encoders.c   |  6 +-
>  drivers/gpu/drm/radeon/rs780_dpm.c|  7 +-
>  17 files changed, 144 insertions(+), 141 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c 
> b/drivers/gpu/drm/radeon/atombios_encoders.c
> index cc5ee1b3af84..a9ae8b6c5991 100644
> --- a/drivers/gpu/drm/radeon/atombios_encoders.c
> +++ b/drivers/gpu/drm/radeon/atombios_encoders.c
> @@ -2065,9 +2065,9 @@ atombios_apply_encoder_quirks(struct drm_encoder 
> *encoder,
> struct radeon_crtc *radeon_crtc = to_radeon_crtc(encoder->crtc);
>
> /* Funky macbooks */
> -   if ((dev->pdev->device == 0x71C5) &&
> -   (dev->pdev->subsystem_vendor == 0x106b) &&
> -   (dev->pdev->subsystem_device == 0x0080)) {
> +   if ((rdev->pdev->device == 0x71C5) &&
> +   (rdev->pdev->subsystem_vendor == 0x106b) &&
> +   (rdev->pdev->subsystem_device == 0x0080)) {
> if (radeon_encoder->devices & ATOM_DEVICE_LCD1_SUPPORT) {
> uint32_t lvtma_bit_depth_control = 
> RREG32(AVIVO_LVTMA_BIT_DEPTH_CONTROL);
>
> diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
> index 24c8db673931..984eeb893d76 100644
> --- a/drivers/gpu/drm/radeon/r100.c
> +++ b/drivers/gpu/drm/radeon/r100.c
> @@ -2611,7 +2611,6 @@ int r100_asic_reset(struct radeon_device *rdev, bool 
> hard)
>
>  void r100_set_common_regs(struct radeon_device *rdev)
>  {
> -   struct drm_device *dev = rdev->ddev;
> bool force_dac2 = false;
> u32 tmp;
>
> @@ -2629,7 +2628,7 @@ void r100_set_common_regs(struct radeon_device *rdev)
>  * don't report it in the bios connector
>  * table.
>  */
> -   switch (dev->pdev->device) {
> +   switch (rdev->pdev->device) {
> /* RN50 */
> case 0x515e:
> case 0x5969:
> @@ -2639,17 +2638,17 @@ void r100_set_common_regs(struct radeon_device *rdev)
> case 0x5159:
> case 0x515a:
> /* DELL triple head servers */
> -   if ((dev->pdev->subsystem_vendor == 0x1028 /* DELL */) &&
> -   ((dev->pdev->subsystem_device == 0x016c) ||
> -(dev->pdev->subsystem_device == 0x016d) ||
> -(dev->pdev->subsystem_device == 0x016e) ||
> -(dev->pdev->subsystem_device == 0x016f) ||
> -(dev->pdev->subsystem_device == 0x0170) ||
> -(dev->pdev->subsystem_device == 0x017d) ||
> -(dev->pdev->subsystem_device == 0x017e) ||
> -(dev->pdev->subsystem_device == 0x0183) ||
> -(dev->pdev->subsystem_device == 0x018a) ||
> -(dev->pdev->subsystem_device == 0x019a)))
> +   if ((rdev->pdev->subsystem_vendor == 0x1028 /* DELL */) &&
> +   ((rdev->pdev->subsystem_device == 0x016c) ||
> +(rdev->pdev->subsystem_device == 0x016d) ||
> +(rdev->pdev->subsystem_device == 0x016e) ||
> +(rdev->pdev->subsystem_device == 0x016f) ||
> +(rdev->pdev->subsystem_device == 0x0170) ||
> +(rdev->pdev->subsystem_device == 0x017d) ||
> +(rdev->pdev->subsystem_device == 0x017e) ||
> +(rdev->pdev->subsystem_device == 0x0183) ||
> +(rdev->pdev->subsystem_device == 0x018a) ||
> +(rdev->pdev->subsystem_device == 0x019a)))
> force_dac2 = true;
> break;
> }
> @@ -2797,7 +2796,7 @@ void r100_vram_init_sizes(struct radeon_device *rdev)
> 

Re: [PATCH 01/15] drm/amdgpu: Remove references to struct drm_device.pdev

2020-11-25 Thread Alex Deucher
On Tue, Nov 24, 2020 at 6:38 AM Thomas Zimmermann  wrote:
>
> Using struct drm_device.pdev is deprecated. Convert amdgpu to struct
> drm_device.dev. No functional changes.
>
> Signed-off-by: Thomas Zimmermann 
> Cc: Alex Deucher 
> Cc: Christian König 

There are a few unrelated whitespace changes.  Other than that, patch is:
Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 23 ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c |  3 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 10 -
>  7 files changed, 25 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7560b05e4ac1..d61715133825 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1404,9 +1404,9 @@ static void amdgpu_switcheroo_set_state(struct pci_dev 
> *pdev,
> /* don't suspend or resume card normally */
> dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
>
> -   pci_set_power_state(dev->pdev, PCI_D0);
> -   amdgpu_device_load_pci_state(dev->pdev);
> -   r = pci_enable_device(dev->pdev);
> +   pci_set_power_state(pdev, PCI_D0);
> +   amdgpu_device_load_pci_state(pdev);
> +   r = pci_enable_device(pdev);
> if (r)
> DRM_WARN("pci_enable_device failed (%d)\n", r);
> amdgpu_device_resume(dev, true);
> @@ -1418,10 +1418,10 @@ static void amdgpu_switcheroo_set_state(struct 
> pci_dev *pdev,
> drm_kms_helper_poll_disable(dev);
> dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
> amdgpu_device_suspend(dev, true);
> -   amdgpu_device_cache_pci_state(dev->pdev);
> +   amdgpu_device_cache_pci_state(pdev);
> /* Shut down the device */
> -   pci_disable_device(dev->pdev);
> -   pci_set_power_state(dev->pdev, PCI_D3cold);
> +   pci_disable_device(pdev);
> +   pci_set_power_state(pdev, PCI_D3cold);
> dev->switch_power_state = DRM_SWITCH_POWER_OFF;
> }
>  }
> @@ -1684,8 +1684,7 @@ static void amdgpu_device_enable_virtual_display(struct 
> amdgpu_device *adev)
> adev->enable_virtual_display = false;
>
> if (amdgpu_virtual_display) {
> -   struct drm_device *ddev = adev_to_drm(adev);
> -   const char *pci_address_name = pci_name(ddev->pdev);
> +   const char *pci_address_name = pci_name(adev->pdev);
> char *pciaddstr, *pciaddstr_tmp, *pciaddname_tmp, *pciaddname;
>
> pciaddstr = kstrdup(amdgpu_virtual_display, GFP_KERNEL);
> @@ -3375,7 +3374,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> }
> }
>
> -   pci_enable_pcie_error_reporting(adev->ddev.pdev);
> +   pci_enable_pcie_error_reporting(adev->pdev);
>
> /* Post card if necessary */
> if (amdgpu_device_need_post(adev)) {
> @@ -4922,8 +4921,8 @@ pci_ers_result_t amdgpu_pci_error_detected(struct 
> pci_dev *pdev, pci_channel_sta
> case pci_channel_io_normal:
> return PCI_ERS_RESULT_CAN_RECOVER;
> /* Fatal error, prepare for slot reset */
> -   case pci_channel_io_frozen:
> -   /*
> +   case pci_channel_io_frozen:
> +   /*
>  * Cancel and wait for all TDRs in progress if failing to
>  * set  adev->in_gpu_reset in amdgpu_device_lock_adev
>  *
> @@ -5014,7 +5013,7 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev 
> *pdev)
> goto out;
> }
>
> -   adev->in_pci_err_recovery = true;
> +   adev->in_pci_err_recovery = true;
> r = amdgpu_device_pre_asic_reset(adev, NULL, &need_full_reset);
> adev->in_pci_err_recovery = false;
> if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index 2e8a8b57639f..77974c3981fa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -721,13 +721,14 @@ amdgpu_display_user_framebuffer_create(struct 
> drm_device *dev,
>struct drm_file *file_priv,
>const struct drm_mode_fb_cmd2 
> *mode_cmd)
>  {
> +   struct amdgpu_device *adev = drm_to_adev(dev);
> struct drm_gem_object *obj;
> struct amdgpu_framebuffer *amdgpu_fb;
> int ret;
>
> obj = drm_gem_object_lookup(file_priv, mode_cmd->handles[0]);
>  

Re: [PATCH] drm/amdgpu: set LDS_CONFIG=0x20 on VanGogh to fix MGCG hang

2020-11-25 Thread Alex Deucher
Acked-by: Alex Deucher 

On Wed, Nov 25, 2020 at 1:45 AM Marek Olšák  wrote:
>
> Please review. This fixes an LDS hang that occurs with NGG, but may occur 
> with other shader stages too.
>
> Thanks,
> Marek
>


Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Andy Shevchenko
On Mon, Nov 23, 2020 at 10:39 PM James Bottomley
 wrote:
> On Mon, 2020-11-23 at 19:56 +0100, Miguel Ojeda wrote:
> > On Mon, Nov 23, 2020 at 4:58 PM James Bottomley
> >  wrote:

...

> > But if we do the math, for an author, at even 1 minute per line
> > change and assuming nothing can be automated at all, it would take 1
> > month of work. For maintainers, a couple of trivial lines is noise
> > compared to many other patches.
>
> So you think a one line patch should take one minute to produce ... I
> really don't think that's grounded in reality.  I suppose a one line
> patch only takes a minute to merge with b4 if no-one reviews or tests
> it, but that's not really desirable.

In my practice most of the one line patches were either to fix or to
introduce quite interesting issues.
1 minute is 2-3 orders of magnitude less than is usually needed for such
patches. That's why I don't like churn produced by people who often haven't
even compiled their useful contributions.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Sean Young
On Mon, Nov 23, 2020 at 07:58:06AM -0800, James Bottomley wrote:
> On Mon, 2020-11-23 at 15:19 +0100, Miguel Ojeda wrote:
> > On Sun, Nov 22, 2020 at 11:36 PM James Bottomley
> >  wrote:
> > > It's not about the risk of the changes it's about the cost of
> > > implementing them.  Even if you discount the producer time (which
> > > someone gets to pay for, and if I were the engineering manager, I'd
> > > be unhappy about), the review/merge/rework time is pretty
> > > significant in exchange for six minor bug fixes.  Fine, when a new
> > > compiler warning comes along it's certainly reasonable to see if we
> > > can benefit from it and the fact that the compiler people think
> > > it's worthwhile is enough evidence to assume this initially.  But
> > > at some point you have to ask whether that assumption is supported
> > > by the evidence we've accumulated over the time we've been using
> > > it.  And if the evidence doesn't support it perhaps it is time to
> > > stop the experiment.
> > 
> > Maintainers routinely review 1-line trivial patches, not to mention
> > internal API changes, etc.
> 
> We're also complaining about the inability to recruit maintainers:
> 
> https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/
> 
> And burn out:
> 
> http://antirez.com/news/129
> 
> The whole crux of your argument seems to be maintainers' time isn't
> important so we should accept all trivial patches ... I'm pushing back
> on that assumption in two places, firstly the valuelessness of the time
> and secondly that all trivial patches are valuable.

You're assuming burnout or recruitment problems are due to patch workload
or too many "trivial" patches.

In my experience, "other maintainers" is by far the biggest cause of
burn out for my kernel maintenance work.

Certainly arguing with a maintainer about some obviously-correct patch
series must be a good example of this.


Sean


RE: [PATCH] drm/amdgpu: increase reserved VRAM size to 8MB

2020-11-25 Thread Chen, Guchun
[AMD Public Use]

/* Reserve 4MB VRAM for page tables */

This comment needs to be corrected to 8MB as well. With this addressed, the patch:
Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Likun Gao
Sent: Wednesday, November 25, 2020 9:12 PM
To: amd-gfx@lists.freedesktop.org
Cc: Gao, Likun ; Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: increase reserved VRAM size to 8MB

From: Likun Gao 

The 4MB of VRAM reserved for page tables was not enough in some cases;
increase it to 8MB to reduce page table contention.

Signed-off-by: Likun Gao 
Change-Id: Ibbc0c14a75bd0e57d77e30b7140a144f4030114a
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index fa7d181934e5..1ed130d518a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -104,7 +104,7 @@ struct amdgpu_bo_list_entry;
 #define AMDGPU_VM_FAULT_STOP_ALWAYS  2
 
 /* Reserve 4MB VRAM for page tables */
-#define AMDGPU_VM_RESERVED_VRAM  (4ULL << 20)
+#define AMDGPU_VM_RESERVED_VRAM  (8ULL << 20)
 
 /* max number of VMHUB */
 #define AMDGPU_MAX_VMHUBS  3
--
2.25.1



Re: [PATCH] drm/amdgpu: increase reserved VRAM size to 8MB

2020-11-25 Thread Christian König

On 25.11.20 at 14:12, Likun Gao wrote:

From: Likun Gao 

The 4MB of VRAM reserved for page tables was not enough in some cases;
increase it to 8MB to reduce page table contention.


What's the use case here? 8MB is already pretty extensive, I don't want 
to run into problems with APUs.



Signed-off-by: Likun Gao 
Change-Id: Ibbc0c14a75bd0e57d77e30b7140a144f4030114a


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index fa7d181934e5..1ed130d518a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -104,7 +104,7 @@ struct amdgpu_bo_list_entry;
  #define AMDGPU_VM_FAULT_STOP_ALWAYS   2
  
  /* Reserve 4MB VRAM for page tables */

-#define AMDGPU_VM_RESERVED_VRAM  (4ULL << 20)
+#define AMDGPU_VM_RESERVED_VRAM  (8ULL << 20)
  
  /* max number of VMHUB */

  #define AMDGPU_MAX_VMHUBS 3




RE: [PATCH] drm/amdgpu: increase reserved VRAM size to 8MB

2020-11-25 Thread Zhang, Hawking
[AMD Public Use]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Gao, Likun  
Sent: Wednesday, November 25, 2020 21:12
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Gao, Likun 
Subject: [PATCH] drm/amdgpu: increase reserved VRAM size to 8MB

From: Likun Gao 

The 4MB of VRAM reserved for page tables was not enough in some cases;
increase it to 8MB to reduce page table contention.

Signed-off-by: Likun Gao 
Change-Id: Ibbc0c14a75bd0e57d77e30b7140a144f4030114a
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index fa7d181934e5..1ed130d518a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -104,7 +104,7 @@ struct amdgpu_bo_list_entry;
 #define AMDGPU_VM_FAULT_STOP_ALWAYS  2
 
 /* Reserve 4MB VRAM for page tables */
-#define AMDGPU_VM_RESERVED_VRAM  (4ULL << 20)
+#define AMDGPU_VM_RESERVED_VRAM  (8ULL << 20)
 
 /* max number of VMHUB */
 #define AMDGPU_MAX_VMHUBS  3
--
2.25.1


[PATCH] drm/amdgpu: increase reserved VRAM size to 8MB

2020-11-25 Thread Likun Gao
From: Likun Gao 

The 4MB of VRAM reserved for page tables was not enough in some cases;
increase it to 8MB to reduce page table contention.

Signed-off-by: Likun Gao 
Change-Id: Ibbc0c14a75bd0e57d77e30b7140a144f4030114a
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index fa7d181934e5..1ed130d518a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -104,7 +104,7 @@ struct amdgpu_bo_list_entry;
 #define AMDGPU_VM_FAULT_STOP_ALWAYS  2
 
 /* Reserve 4MB VRAM for page tables */
-#define AMDGPU_VM_RESERVED_VRAM  (4ULL << 20)
+#define AMDGPU_VM_RESERVED_VRAM  (8ULL << 20)
 
 /* max number of VMHUB */
 #define AMDGPU_MAX_VMHUBS  3
-- 
2.25.1



Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-25 Thread Christian König

On 25.11.20 at 11:40, Daniel Vetter wrote:

On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:

On 24.11.20 at 17:22, Andrey Grodzovsky wrote:

On 11/24/20 2:41 AM, Christian König wrote:

On 23.11.20 at 22:08, Andrey Grodzovsky wrote:

On 11/23/20 3:41 PM, Christian König wrote:

On 23.11.20 at 21:38, Andrey Grodzovsky wrote:

On 11/23/20 3:20 PM, Christian König wrote:

On 23.11.20 at 21:05, Andrey Grodzovsky wrote:

On 11/25/20 5:42 AM, Christian König wrote:

On 21.11.20 at 06:21, Andrey Grodzovsky wrote:

It's needed to drop IOMMU-backed pages on device unplug
before the device's IOMMU group is released.

It would be cleaner if we could do the whole
handling in TTM. I also need to double check
what you are doing with this function.

Christian.


Check patch "drm/amdgpu: Register IOMMU topology
notifier per device." to see
how I use it. I don't see why this should go
into TTM mid-layer - the stuff I do inside
is vendor specific and also I don't think TTM is
explicitly aware of IOMMU ?
Do you mean you prefer the IOMMU notifier to be
registered from within TTM
and then use a hook to call into vendor specific handler ?

No, that is really vendor specific.

What I meant is to have a function like
ttm_resource_manager_evict_all() which you only need
to call and all tt objects are unpopulated.


So instead of this BO list i create and later iterate in
amdgpu from the IOMMU patch you just want to do it
within
TTM with a single function ? Makes much more sense.

Yes, exactly.

The list_empty() checks we have in TTM for the LRU are
actually not the best idea, we should now check the
pin_count instead. This way we could also have a list of the
pinned BOs in TTM.


So from my IOMMU topology handler I will iterate the TTM LRU for
the unpinned BOs and this new function for the pinned ones  ?
It's probably a good idea to combine both iterations into this
new function to cover all the BOs allocated on the device.

Yes, that's what I had in my mind as well.
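A helper along those lines might look roughly like this; ttm_tt_unpopulate()
is the function this series exposes, but the device-wide BO list walked here
(device_list/device_entry) is an assumption for illustration, not existing
TTM infrastructure:

static void example_unpopulate_all(struct ttm_bo_device *bdev)
{
	struct ttm_buffer_object *bo;

	/* Assumed: a list of every BO on this device, pinned ones
	 * included, so buffers the LRU never sees are covered too.
	 */
	list_for_each_entry(bo, &bdev->device_list, device_entry) {
		if (bo->ttm)
			ttm_tt_unpopulate(bdev, bo->ttm);
	}
}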




BTW: Have you thought about what happens when we unpopulate
a BO while we still try to use a kernel mapping for it? That
could have unforeseen consequences.


Are you asking what happens to kmap or vmap style mapped CPU
accesses once we drop all the DMA backing pages for a particular
BO ? Because for user mappings
(mmap) we took care of this with dummy page reroute but indeed
nothing was done for in kernel CPU mappings.

Yes exactly that.

In other words what happens if we free the ring buffer while the
kernel still writes to it?

Christian.


While we can't explicitly control user application accesses to the mapped
buffers (hence the page fault rerouting), I am thinking that in this case
we may be able to sprinkle drm_dev_enter/exit in any such sensitive place
where we might CPU-access a DMA buffer from the kernel?

Yes, I fear we are going to need that.
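At each such spot the pattern would look roughly like this;
drm_dev_enter()/drm_dev_exit() are the real API, while the ring write is
just a stand-in for any kernel-side CPU access to DMA-backed memory:

	int idx;

	if (drm_dev_enter(ddev, &idx)) {
		/* device not unplugged: safe to touch the buffer */
		memcpy(ring->ring_cpu_addr, data, size);
		drm_dev_exit(idx);
	}
	/* otherwise the device is gone and the access is skipped */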

Uh ... problem is that dma_buf_vmap mappings are usually permanent things. Maybe we
could stuff this into begin/end_cpu_access (but only for the kernel, so a
bit tricky)?


Oh very very good point! I haven't thought about DMA-buf mmaps in this 
context yet.




btw the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part isn't
yet solved in your series here.


Well, thinking more about this, it seems to be another really good 
argument why mapping pages from DMA-bufs into application address space 
directly is a very bad idea :)


But yes, we essentially can't remove the device as long as there is a 
DMA-buf with mappings. No idea how to clean that one up.


Christian.


-Daniel


Things like CPU page table updates, ring buffer accesses and FW memcpy ?
Is there other places ?

Puh, good question. I have no idea.


Another point is that at this stage the driver shouldn't access any such
buffers, as we are in the process of tearing down the device.
AFAIK there is no page fault mechanism for kernel mappings, so I don't
think there is anything else to do?

Well there is a page fault handler for kernel mappings, but that one just
prints the stack trace into the system log and calls BUG(); :)

Long story short we need to avoid any access to released pages after unplug.
No matter if it's from the kernel or userspace.

Regards,
Christian.


Andrey




Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Christian König

On 25.11.20 at 12:04, Steven Price wrote:

On 25/11/2020 03:17, Luben Tuikov wrote:

The job timeout handler now returns status
indicating back to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.


I'm not sure I understand in what circumstances you would want to give 
the job more time to complete. Could you expand on that?


One thing we're missing at the moment in Panfrost is the ability to 
suspend ("soft stop" is the Mali jargon) a job and pick something else 
to run. The proprietary driver stack uses this to avoid timing out 
long running jobs while still allowing other processes to have time on 
the GPU. But this interface as it stands doesn't seem to provide that.


On AMD hardware we call this IB preemption and it is indeed not handled 
very well by the scheduler at the moment.


See how the amdgpu code messes with the preempted IBs to restart them 
for example.


Christian.



As the kernel test robot has already pointed out - you'll need to at 
the very least update the other uses of this interface.


Steve



Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
  include/drm/gpu_scheduler.h | 13 ++---
  2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
  {
  struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
  struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
  amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {

  DRM_ERROR("ring %s timeout, but soft recovered\n",
    s_job->sched->name);
-    return;
+    return 0;
  }
    amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)

    if (amdgpu_device_should_recover_gpu(ring->adev)) {
  amdgpu_device_gpu_recover(ring->adev, job);
+    return 0;
  } else {
  drm_sched_suspend_timeout(&ring->sched);
  if (amdgpu_sriov_vf(adev))
  adev->virt.tdr_debug = true;
+    return 1;
  }
  }
  diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
  struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
    /**
- * @timedout_job: Called when a job has taken too long to 
execute,

- * to trigger GPU recovery.
+ * @timedout_job: Called when a job has taken too long to execute,
+ * to trigger GPU recovery.
+ *
+ * Return 0, if the job has been aborted successfully and will
+ * never be heard of from the device. Return non-zero if the
+ * job wasn't able to be aborted, i.e. if more time should be
+ * given to this job. The result is not "bool" as this
+ * function is not a predicate, although its result may seem
+ * like one.
   */
-    void (*timedout_job)(struct drm_sched_job *sched_job);
+    int (*timedout_job)(struct drm_sched_job *sched_job);
    /**
   * @free_job: Called once the job's finished fence has been 
signaled








Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Lucas Stach
Am Mittwoch, den 25.11.2020, 11:22 + schrieb Steven Price:
> On 25/11/2020 11:15, Lucas Stach wrote:
> > Am Mittwoch, den 25.11.2020, 11:04 + schrieb Steven Price:
> > > On 25/11/2020 03:17, Luben Tuikov wrote:
> > > > The job timeout handler now returns status
> > > > indicating back to the DRM layer whether the job
> > > > was successfully cancelled or whether more time
> > > > should be given to the job to complete.
> > > 
> > > I'm not sure I understand in what circumstances you would want to give
> > > the job more time to complete. Could you expand on that?
> > 
> > On etnaviv we don't have the ability to preempt a running job, but we
> > can look at the GPU state to determine if it's still making progress
> > with the current job, so we want to extend the timeout in that case to
> > not kill a long running but valid job.
> 
> Ok, fair enough. Although from my experience (on Mali) jobs very rarely 
> "get stuck" it's just that their run time can be excessive[1] causing 
> other processes to not make forward progress. So I'd expect the timeout 
> to be set based on how long a job can run before you need to stop it to 
> allow other processes to run their jobs.

Yea, we might want to kill the job eventually, but people tend to get
very angry if their use-case gets broken just because the userspace
driver manages to put enough blits in one job to run over the 500ms
timeout we allow for a job and the kernel then just hard-kills the job.

In an ideal world we would just preempt the job and allow something
else to run for a while, but without proper preemption support in HW
that's not an option right now.

> But I'm not familiar with etnaviv so perhaps stuck jobs are actually a 
> thing there.

It happens from time to time when our understanding of the HW isn't
complete and the userspace driver manages to create command streams
with missing semaphores between HW engines. ;)

Regards,
Lucas
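
For illustration, a progress-aware handler under the proposed interface
could look roughly like the sketch below. Everything prefixed my_ is
hypothetical and stands in for driver-specific code; the readout would be
whatever address or counter register indicates forward progress:

static int my_job_timedout(struct drm_sched_job *sched_job)
{
	struct my_gpu_job *job = to_my_gpu_job(sched_job);	/* hypothetical */
	struct my_gpu *gpu = job->gpu;
	u32 pos = my_gpu_read_progress(gpu);	/* hypothetical readout */

	if (pos != job->last_progress) {
		/* Still advancing: remember where we got to and ask the
		 * scheduler for more time (non-zero return). */
		job->last_progress = pos;
		return 1;
	}

	/* Genuinely stuck: recover and report the job as gone. */
	my_gpu_recover(gpu, job);
	return 0;
}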

> Thanks,
> 
> Steve
> 
> [1] Also on Mali it's quite possible to create an infinite duration job 
> which appears to be making forward progress, so in that case our measure 
> of progress isn't useful against these malicious jobs.
> 
> > Regards,
> > Lucas
> > 
> > > One thing we're missing at the moment in Panfrost is the ability to
> > > suspend ("soft stop" is the Mali jargon) a job and pick something else
> > > to run. The proprietary driver stack uses this to avoid timing out long
> > > running jobs while still allowing other processes to have time on the
> > > GPU. But this interface as it stands doesn't seem to provide that.
> > > 
> > > As the kernel test robot has already pointed out - you'll need to at the
> > > very least update the other uses of this interface.
> > > 
> > > Steve
> > > 
> > > > Signed-off-by: Luben Tuikov 
> > > > ---
> > > >drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
> > > >include/drm/gpu_scheduler.h | 13 ++---
> > > >2 files changed, 14 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > index ff48101bab55..81b73790ecc6 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > @@ -28,7 +28,7 @@
> > > >#include "amdgpu.h"
> > > >#include "amdgpu_trace.h"
> > > >
> > > > -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > > > +static int amdgpu_job_timedout(struct drm_sched_job *s_job)
> > > >{
> > > > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > > > struct amdgpu_job *job = to_amdgpu_job(s_job);
> > > > @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > > > *s_job)
> > > > amdgpu_ring_soft_recovery(ring, job->vmid, 
> > > > s_job->s_fence->parent)) {
> > > > DRM_ERROR("ring %s timeout, but soft recovered\n",
> > > >   s_job->sched->name);
> > > > -   return;
> > > > +   return 0;
> > > > }
> > > >
> > > > 	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
> > > > @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct 
> > > > drm_sched_job *s_job)
> > > >
> > > > if (amdgpu_device_should_recover_gpu(ring->adev)) {
> > > > amdgpu_device_gpu_recover(ring->adev, job);
> > > > +   return 0;
> > > > } else {
> > > > 	drm_sched_suspend_timeout(&ring->sched);
> > > > if (amdgpu_sriov_vf(adev))
> > > > adev->virt.tdr_debug = true;
> > > > +   return 1;
> > > > }
> > > >}
> > > >
> > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > index 2e0c368e19f6..61f7121e1c19 100644
> > > > --- a/include/drm/gpu_scheduler.h
> > > > +++ b/include/drm/gpu_scheduler.h
> > > > @@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
> > > > struct 

Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Thomas Zimmermann

Hi

Am 25.11.20 um 11:36 schrieb Daniel Vetter:

On Wed, Nov 25, 2020 at 11:13:13AM +0100, Christian König wrote:

Am 25.11.20 um 09:37 schrieb Thomas Zimmermann:

Hi

Am 24.11.20 um 15:09 schrieb Daniel Vetter:

On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:

Hi

Am 24.11.20 um 14:36 schrieb Christian König:

Am 24.11.20 um 13:15 schrieb Thomas Zimmermann:

[SNIP]

First I wanted to put this into
drm_gem_ttm_vmap/vunmap(), but then wondered why
ttm_bo_vmap() does not acquire the lock internally?
I'd expect that vmap/vunmap are close together and
do not overlap for the same BO.


We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch
it with additional information
4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock

That use case won't be possible with vmap/vunmap if we
move the lock/unlock into it and I hope to replace the
kmap/kunmap functions with them in the near term.


Otherwise, acquiring the reservation lock would
require another ref-counting variable or per-driver
code.


Hui, why that? Just put this into
drm_gem_ttm_vmap/vunmap() helper as you initially
planned.


Given your example above, step one would acquire the lock,
and step two would also acquire the lock as part of the vmap
implementation. Wouldn't this fail (At least during unmap or
unlock steps) ?


Oh, so you want to nest them? No, that is a rather bad no-go.


I don't want to nest/overlap them. My question was whether that
would be required. Apparently not.

While the console's BO is being set for scanout, it's protected from
movement via the pin/unpin implementation, right?


Yes, correct.


The driver does not acquire the resv lock for longer periods. I'm
asking because this would prevent any console-buffer updates while
the console is being displayed.


Correct as well, we only hold the lock for things like command
submission, pinning, unpinning etc etc



Thanks for answering my questions.





You need to make sure that the lock is only taken from the FB
path which wants to vmap the object.

Why don't you lock the GEM object from the caller in the generic
FB implementation?


With the current blitter code, it breaks abstraction. If vmap/vunmap
hold the lock implicitly, things would be easier.


Do you have a link to the code?


It's the damage blitter in the fbdev code. [1] While it flushes
the shadow
buffer into the BO, the BO has to be kept in place. I already
changed it to
lock struct drm_fb_helper.lock, but I don't think this is
enough. TTM could
still evict the BO concurrently.


So I'm not sure this is actually a problem: ttm could try to
concurrently
evict the buffer we pinned into vram, and then just skip to the next
one.

Plus atm generic fbdev isn't used on any chip where we really care about
that last few mb of vram being useable for command submission (well atm
there's no driver using it).


Well, this is the patchset for radeon. If it works out, amdgpu and
nouveau are natural next choices. Especially radeon and nouveau support
cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly
matter.



Having the buffer pinned into system memory and trying to do a
concurrent
modeset that tries to pull it in is the hard failure mode. And holding
fb_helper.lock fully prevents that.

So not really clear on what failure mode you're seeing here?


Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or Wayland
is running.) The fbdev BO is a few MiBs and not in use, so TTM would
want to evict it if memory gets tight.

What I have in mind is a concurrent modeset that requires the memory. If
we do a concurrent damage blit without protecting against eviction,
things go boom. Same for concurrent 3d graphics with textures, model
data, etc.


Completely agree.

This needs proper lock protection of the memory mapped buffer. Relying on
that some other code isn't run because we have some third part locks taken
is not sufficient here.


We are still protected by the pin count in this scenario. Plus, with
current drivers we always pin the fbdev buffer into vram, so occasionally
failing to move it out isn't a regression.


Why is this protected by the pin count? The counter should be zero in 
this scenario. Otherwise, we could not evict the fbdev BO on all those 
systems where that's a hard requirement (e.g., ast).


The pin count is currently maintained by the vmap implementation in vram 
helpers. Calling vmap is an implicit pin; calling vunmap is an implicit 
unpin. This prevents eviction in the damage worker. But now I was told 
that pinning is only for BOs that are controlled by userspace and 
internal users should acquire the resv lock. So vram helpers have to be 
fixed, actually.
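
As a rough sketch of the implicit-pin pattern described above (names are
simplified and hypothetical, not the actual drm_gem_vram_* code):

static void *my_vram_bo_vmap(struct my_vram_bo *bo)
{
	void *vaddr;
	int ret;

	ret = my_vram_bo_pin(bo);	/* vmap implies pin: no eviction */
	if (ret)
		return ERR_PTR(ret);

	vaddr = my_vram_bo_kmap(bo);
	if (IS_ERR(vaddr))
		my_vram_bo_unpin(bo);
	return vaddr;
}

static void my_vram_bo_vunmap(struct my_vram_bo *bo, void *vaddr)
{
	my_vram_bo_kunmap(bo, vaddr);
	my_vram_bo_unpin(bo);	/* BO becomes evictable again */
}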


In vram helpers, unmapping does not mean eviction. The unmap operation 
only marks the BO as unmappable. The real unmap happens when the 
eviction takes place. This avoids 

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Steven Price

On 25/11/2020 11:15, Lucas Stach wrote:

Am Mittwoch, den 25.11.2020, 11:04 + schrieb Steven Price:

On 25/11/2020 03:17, Luben Tuikov wrote:

The job timeout handler now returns status
indicating back to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.


I'm not sure I understand in what circumstances you would want to give
the job more time to complete. Could you expand on that?


On etnaviv we don't have the ability to preempt a running job, but we
can look at the GPU state to determine if it's still making progress
with the current job, so we want to extend the timeout in that case to
not kill a long running but valid job.


Ok, fair enough. Although from my experience (on Mali) jobs very rarely 
"get stuck" it's just that their run time can be excessive[1] causing 
other processes to not make forward progress. So I'd expect the timeout 
to be set based on how long a job can run before you need to stop it to 
allow other processes to run their jobs.


But I'm not familiar with etnaviv so perhaps stuck jobs are actually a 
thing there.


Thanks,

Steve

[1] Also on Mali it's quite possible to create an infinite duration job 
which appears to be making forward progress, so in that case our measure 
of progress isn't useful against these malicious jobs.



Regards,
Lucas


One thing we're missing at the moment in Panfrost is the ability to
suspend ("soft stop" is the Mali jargon) a job and pick something else
to run. The proprietary driver stack uses this to avoid timing out long 
running jobs while still allowing other processes to have time on the
GPU. But this interface as it stands doesn't seem to provide that.

As the kernel test robot has already pointed out - you'll need to at the
very least update the other uses of this interface.

Steve


Signed-off-by: Luben Tuikov 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
   include/drm/gpu_scheduler.h | 13 ++---
   2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
   #include "amdgpu.h"
   #include "amdgpu_trace.h"
   
-static void amdgpu_job_timedout(struct drm_sched_job *s_job)

+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
   {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
DRM_ERROR("ring %s timeout, but soft recovered\n",
  s_job->sched->name);
-   return;
+   return 0;
}
   
	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);

@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
   
   	if (amdgpu_device_should_recover_gpu(ring->adev)) {

amdgpu_device_gpu_recover(ring->adev, job);
+   return 0;
} else {
		drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
adev->virt.tdr_debug = true;
+   return 1;
}
   }
   
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h

index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
   
   	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return 0, if the job has been aborted successfully and will
+* never be heard of from the device. Return non-zero if the
+* job wasn't able to be aborted, i.e. if more time should be
+* given to this job. The result is not "bool" as this
+* function is not a predicate, although its result may seem
+* like one.
 */
-   void (*timedout_job)(struct drm_sched_job *sched_job);
+   int (*timedout_job)(struct drm_sched_job *sched_job);
   
   	/**

* @free_job: Called once the job's finished fence has been signaled







Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Lucas Stach
Am Mittwoch, den 25.11.2020, 11:04 + schrieb Steven Price:
> On 25/11/2020 03:17, Luben Tuikov wrote:
> > The job timeout handler now returns status
> > indicating back to the DRM layer whether the job
> > was successfully cancelled or whether more time
> > should be given to the job to complete.
> 
> I'm not sure I understand in what circumstances you would want to give 
> the job more time to complete. Could you expand on that?

On etnaviv we don't have the ability to preempt a running job, but we
can look at the GPU state to determine if it's still making progress
with the current job, so we want to extend the timeout in that case to
not kill a long running but valid job.

Regards,
Lucas

> One thing we're missing at the moment in Panfrost is the ability to 
> suspend ("soft stop" is the Mali jargon) a job and pick something else 
> to run. The proprietary driver stack uses this to avoid timing out long 
> running jobs while still allowing other processes to have time on the 
> GPU. But this interface as it stands doesn't seem to provide that.
> 
> As the kernel test robot has already pointed out - you'll need to at the 
> very least update the other uses of this interface.
> 
> Steve
> 
> > Signed-off-by: Luben Tuikov 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
> >   include/drm/gpu_scheduler.h | 13 ++---
> >   2 files changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index ff48101bab55..81b73790ecc6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -28,7 +28,7 @@
> >   #include "amdgpu.h"
> >   #include "amdgpu_trace.h"
> >   
> > -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > +static int amdgpu_job_timedout(struct drm_sched_job *s_job)
> >   {
> > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > struct amdgpu_job *job = to_amdgpu_job(s_job);
> > @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > *s_job)
> > amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
> > {
> > DRM_ERROR("ring %s timeout, but soft recovered\n",
> >   s_job->sched->name);
> > -   return;
> > +   return 0;
> > }
> >   
> > 	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
> > @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > *s_job)
> >   
> > if (amdgpu_device_should_recover_gpu(ring->adev)) {
> > amdgpu_device_gpu_recover(ring->adev, job);
> > +   return 0;
> > } else {
> > 	drm_sched_suspend_timeout(&ring->sched);
> > if (amdgpu_sriov_vf(adev))
> > adev->virt.tdr_debug = true;
> > +   return 1;
> > }
> >   }
> >   
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 2e0c368e19f6..61f7121e1c19 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
> > struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
> >   
> > /**
> > - * @timedout_job: Called when a job has taken too long to execute,
> > - * to trigger GPU recovery.
> > +* @timedout_job: Called when a job has taken too long to execute,
> > +* to trigger GPU recovery.
> > +*
> > +* Return 0, if the job has been aborted successfully and will
> > +* never be heard of from the device. Return non-zero if the
> > +* job wasn't able to be aborted, i.e. if more time should be
> > +* given to this job. The result is not "bool" as this
> > +* function is not a predicate, although its result may seem
> > +* like one.
> >  */
> > -   void (*timedout_job)(struct drm_sched_job *sched_job);
> > +   int (*timedout_job)(struct drm_sched_job *sched_job);
> >   
> > /**
> >* @free_job: Called once the job's finished fence has been 
> > signaled
> > 



Re: [PATCH 6/6] drm/sched: Make use of a "done" thread

2020-11-25 Thread Steven Price

On 25/11/2020 03:17, Luben Tuikov wrote:

Add a "done" list to which all completed jobs are added
to be freed. The drm_sched_job_done() callback is the
producer of jobs to this list.

Add a "done" thread which consumes from the done list
and frees up jobs. Now, the main scheduler thread only
pushes jobs to the GPU and the "done" thread frees them
up, on the way out of the GPU when they've completed
execution.


Generally I'd be in favour of a "done thread" as I think there are some 
murky corners of Panfrost's locking that would be helped by deferring 
the free_job() callback.


But I think you're trying to do too much in one patch here. And as 
Christian has pointed out there's some dodgy looking changes to locking 
which aren't explained.


Steve
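
As a minimal sketch of the deferral idea (a worker draining a "done"
list; the free_work and done_list fields are illustrative here, not the
patch's actual implementation):

static void my_sched_free_work(struct work_struct *work)
{
	struct drm_gpu_scheduler *sched =
		container_of(work, struct drm_gpu_scheduler, free_work);
	struct drm_sched_job *job, *next;
	LIST_HEAD(done);

	/* Detach the whole list under the lock, free outside of it. */
	spin_lock(&sched->job_list_lock);
	list_splice_init(&sched->done_list, &done);
	spin_unlock(&sched->job_list_lock);

	list_for_each_entry_safe(job, next, &done, list)
		sched->ops->free_job(job);
}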



Make use of the status returned by the GPU driver
timeout handler to decide whether to leave the job in
the pending list, or to send it off to the done list.
If a job is done, it is added to the done list and the
done thread woken up. If a job needs more time, it is
left on the pending list and the timeout timer
restarted.

Eliminate the polling mechanism of picking out done
jobs from the pending list, i.e. eliminate
drm_sched_get_cleanup_job(). Now the main scheduler
thread only pushes jobs down to the GPU.

Various other optimizations to the GPU scheduler
and job recovery are possible with this format.

Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/scheduler/sched_main.c | 173 +
  include/drm/gpu_scheduler.h            |  14 ++
  2 files changed, 101 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 3eb7618a627d..289ae68cd97f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -164,7 +164,8 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
   * drm_sched_job_done - complete a job
   * @s_job: pointer to the job which is done
   *
- * Finish the job's fence and wake up the worker thread.
+ * Finish the job's fence, move it to the done list,
+ * and wake up the done thread.
   */
  static void drm_sched_job_done(struct drm_sched_job *s_job)
  {
@@ -179,7 +180,12 @@ static void drm_sched_job_done(struct drm_sched_job *s_job)
	dma_fence_get(&s_fence->finished);
	drm_sched_fence_finished(s_fence);
	dma_fence_put(&s_fence->finished);
-	wake_up_interruptible(&sched->wake_up_worker);
+
+	spin_lock(&sched->job_list_lock);
+	list_move(&s_job->list, &sched->done_list);
+	spin_unlock(&sched->job_list_lock);
+
+	wake_up_interruptible(&sched->done_wait_q);
  }
  
  /**

@@ -221,11 +227,10 @@ bool drm_sched_dependency_optimized(struct dma_fence* 
fence,
  EXPORT_SYMBOL(drm_sched_dependency_optimized);
  
  /**

- * drm_sched_start_timeout - start timeout for reset worker
- *
- * @sched: scheduler instance to start the worker for
+ * drm_sched_start_timeout - start a timeout timer
+ * @sched: scheduler instance whose job we're timing
   *
- * Start the timeout for the given scheduler.
+ * Start a timeout timer for the given scheduler.
   */
  static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
  {
@@ -305,8 +310,8 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
  
 	spin_lock(&sched->job_list_lock);
 	list_add_tail(&s_job->list, &sched->pending_list);
-	drm_sched_start_timeout(sched);
 	spin_unlock(&sched->job_list_lock);
+	drm_sched_start_timeout(sched);
  }
  
  static void drm_sched_job_timedout(struct work_struct *work)

@@ -316,37 +321,30 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
  
  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  
-	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */

	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
+	spin_unlock(&sched->job_list_lock);
  
  	if (job) {

-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-	list_del_init(&job->list);
-	spin_unlock(&sched->job_list_lock);
+   int res;
  
-		job->sched->ops->timedout_job(job);

+   job->job_status |= DRM_JOB_STATUS_TIMEOUT;
+   res = job->sched->ops->timedout_job(job);
+   if (res == 0) {
+   /* The job is out of the device.
+*/
+	spin_lock(&sched->job_list_lock);
+	list_move(&job->list, &sched->done_list);
+	spin_unlock(&sched->job_list_lock);
  
-		/*

-* Guilty job did complete and hence needs to be manually 
removed
-* See drm_sched_stop doc.
-*/
-   if (sched->free_guilty) {
-   

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Steven Price

On 25/11/2020 03:17, Luben Tuikov wrote:

The job timeout handler now returns status
indicating back to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.


I'm not sure I understand in what circumstances you would want to give 
the job more time to complete. Could you expand on that?


One thing we're missing at the moment in Panfrost is the ability to 
suspend ("soft stop" is the Mali jargon) a job and pick something else 
to run. The proprietary driver stack uses this to avoid timing out long 
running jobs while still allowing other processes to have time on the 
GPU. But this interface as it stands doesn't seem to provide that.


As the kernel test robot has already pointed out - you'll need to at the 
very least update the other uses of this interface.


Steve



Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
  include/drm/gpu_scheduler.h | 13 ++---
  2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  
-static void amdgpu_job_timedout(struct drm_sched_job *s_job)

+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
  {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
DRM_ERROR("ring %s timeout, but soft recovered\n",
  s_job->sched->name);
-   return;
+   return 0;
}
  
	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);

@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
  
  	if (amdgpu_device_should_recover_gpu(ring->adev)) {

amdgpu_device_gpu_recover(ring->adev, job);
+   return 0;
} else {
		drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
adev->virt.tdr_debug = true;
+   return 1;
}
  }
  
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h

index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
  
  	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return 0, if the job has been aborted successfully and will
+* never be heard of from the device. Return non-zero if the
+* job wasn't able to be aborted, i.e. if more time should be
+* given to this job. The result is not "bool" as this
+* function is not a predicate, although its result may seem
+* like one.
 */
-   void (*timedout_job)(struct drm_sched_job *sched_job);
+   int (*timedout_job)(struct drm_sched_job *sched_job);
  
  	/**

   * @free_job: Called once the job's finished fence has been signaled





Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Christian König

Am 25.11.20 um 11:36 schrieb Daniel Vetter:

On Wed, Nov 25, 2020 at 11:13:13AM +0100, Christian König wrote:

Am 25.11.20 um 09:37 schrieb Thomas Zimmermann:

Hi

Am 24.11.20 um 15:09 schrieb Daniel Vetter:

On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:

Hi

Am 24.11.20 um 14:36 schrieb Christian König:

Am 24.11.20 um 13:15 schrieb Thomas Zimmermann:

[SNIP]

First I wanted to put this into
drm_gem_ttm_vmap/vunmap(), but then wondered why
> > > > > > > > > > ttm_bo_vmap() does not acquire the lock internally?
I'd expect that vmap/vunmap are close together and
do not overlap for the same BO.

We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch
it with additional information
4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock

That use case won't be possible with vmap/vunmap if we
move the lock/unlock into it and I hope to replace the
kmap/kunmap functions with them in the near term.


Otherwise, acquiring the reservation lock would
require another ref-counting variable or per-driver
code.

Hui, why that? Just put this into
drm_gem_ttm_vmap/vunmap() helper as you initially
planned.

Given your example above, step one would acquire the lock,
and step two would also acquire the lock as part of the vmap
implementation. Wouldn't this fail (At least during unmap or
unlock steps) ?

Oh, so you want to nest them? No, that is a rather bad no-go.

I don't want to nest/overlap them. My question was whether that
would be required. Apparently not.

While the console's BO is being set for scanout, it's protected from
movement via the pin/unpin implementation, right?

Yes, correct.


The driver does not acquire the resv lock for longer periods. I'm
asking because this would prevent any console-buffer updates while
the console is being displayed.

Correct as well, we only hold the lock for things like command
submission, pinning, unpinning etc etc


Thanks for answering my questions.


You need to make sure that the lock is only taken from the FB
path which wants to vmap the object.

Why don't you lock the GEM object from the caller in the generic
FB implementation?

> > > > > > With the current blitter code, it breaks abstraction. If vmap/vunmap
hold the lock implicitly, things would be easier.

Do you have a link to the code?

It's the damage blitter in the fbdev code. [1] While it flushes
the shadow
buffer into the BO, the BO has to be kept in place. I already
changed it to
lock struct drm_fb_helper.lock, but I don't think this is
enough. TTM could
still evict the BO concurrently.

So I'm not sure this is actually a problem: ttm could try to
concurrently
evict the buffer we pinned into vram, and then just skip to the next
one.

Plus atm generic fbdev isn't used on any chip where we really care about
that last few mb of vram being useable for command submission (well atm
there's no driver using it).

Well, this is the patchset for radeon. If it works out, amdgpu and
nouveau are natural next choices. Especially radeon and nouveau support
cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly
matter.


Having the buffer pinned into system memory and trying to do a
concurrent
modeset that tries to pull it in is the hard failure mode. And holding
fb_helper.lock fully prevents that.

So not really clear on what failure mode you're seeing here?

Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or Wayland
is running.) The fbdev BO is a few MiBs and not in use, so TTM would
want to evict it if memory gets tight.

What I have in mind is a concurrent modeset that requires the memory. If
we do a concurrent damage blit without protecting against eviction,
things go boom. Same for concurrent 3d graphics with textures, model
data, etc.

Completely agree.

This needs proper lock protection of the memory mapped buffer. Relying on
that some other code isn't run because we have some third part locks taken
is not sufficient here.

We are still protected by the pin count in this scenario. Plus, with
current drivers we always pin the fbdev buffer into vram, so occasionally
failing to move it out isn't a regression.

So I'm still not seeing how this can go boom.


Well as far as I understand it the pin count is zero for this buffer in 
this case here :)


I might be wrong on this because I don't know the FB code at all, but 
Thomas seems to be pretty clear that this is the shadow buffer which is 
not scanned out from.


Regards,
Christian.
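
To make this concrete, the kind of protection being asked for would look
roughly like the sketch below in the damage-blit path, assuming the
helper can reach the GEM object; my_flush_shadow_to_bo() is a
hypothetical stand-in for the vmap + memcpy of the damaged region:

static void my_fb_damage_blit(struct drm_fb_helper *helper,
			      struct drm_clip_rect *clip)
{
	struct drm_gem_object *obj = helper->buffer->gem;

	/* Hold the reservation lock across vmap + memcpy so TTM
	 * cannot evict the BO in the middle of the flush. */
	dma_resv_lock(obj->resv, NULL);
	my_flush_shadow_to_bo(helper, clip);	/* hypothetical */
	dma_resv_unlock(obj->resv);
}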



Now long term it'd be nice to cut everything over to dma_resv locking, but
the issue there is that beyond ttm, none of the helpers (and few of the
drivers) use dma_resv. So this is a fairly big uphill battle. Quick
interim fix seems like the right solution to me.
-Daniel


Regards,
Christian.


Best regards
Thomas


There's no recursion taking place, so I guess the reservation
lock could be
acquired/released in 

Re: [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late

2020-11-25 Thread Daniel Vetter
On Tue, Nov 24, 2020 at 10:51:57AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/24/20 9:53 AM, Daniel Vetter wrote:
> > On Sat, Nov 21, 2020 at 12:21:18AM -0500, Andrey Grodzovsky wrote:
> > > Some of the stuff in amdgpu_device_fini such as HW interrupts
> > > disable and pending fences finalization must be done right away on
> > > pci_remove while most of the stuff which relates to finalizing and
> > > releasing driver data structures can be kept until
> > > drm_driver.release hook is called, i.e. when the last device
> > > reference is dropped.
> > > 
> > Uh fini_late and fini_early are rather meaningless namings, since it's not
> > clear why there's a split. If you used drm_connector_funcs as inspiration,
> > that's kinda not good because 'register' itself is a reserved keyword.
> > That's why we had to add late_ prefix, could as well have used
> > C_sucks_ as prefix :-) And then the early_unregister for consistency.
> > 
> > I think fini_hw and fini_sw (or maybe fini_drm) would be a lot clearer
> > about what they're doing.
> > 
> > I still strongly recommend that you cut over as much as possible of the
> > fini_hw work to devm_ and for the fini_sw/drm stuff there's drmm_
> > -Daniel
> 
> 
> Definitely, and I put it in a TODO list in the RFC patch. Also, as I
> mentioned before -
> I just prefer to leave it for a follow up work because it's non trivial and
> requires shuffling
> a lot of stuff around in the driver. I was thinking of committing the work
> in incremental steps -
> so it's easier to merge it and control for breakages.

Yeah doing devm/drmm conversion later on makes sense. I'd still try to
have better names than what you're currently going with. A few of these
will likely stick around for very long, not just interim.
-Daniel
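
For reference, the drmm_ direction would roughly mean registering the
software teardown as a managed action, so it runs when the last
drm_device reference is dropped. A sketch with illustrative names, not
the actual conversion:

static void amdgpu_fini_sw_action(struct drm_device *dev, void *arg)
{
	struct amdgpu_device *adev = arg;

	/* Structure/software teardown, safe after the last reference. */
	amdgpu_device_fini_late(adev);
}

static int amdgpu_register_fini_action(struct amdgpu_device *adev)
{
	return drmm_add_action_or_reset(adev_to_drm(adev),
					amdgpu_fini_sw_action, adev);
}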

> 
> Andrey
> 
> 
> > 
> > > Signed-off-by: Andrey Grodzovsky 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  7 ++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 24 +++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h|  1 +
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 12 +++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c|  3 +++
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
> > >   9 files changed, 65 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > index 83ac06a..6243f6d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > @@ -1063,7 +1063,9 @@ static inline struct amdgpu_device 
> > > *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> > >   int amdgpu_device_init(struct amdgpu_device *adev,
> > >  uint32_t flags);
> > > -void amdgpu_device_fini(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_early(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_late(struct amdgpu_device *adev);
> > > +
> > >   int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
> > >   void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> > > @@ -1275,6 +1277,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device 
> > > *dev);
> > >   int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file 
> > > *file_priv);
> > >   void amdgpu_driver_postclose_kms(struct drm_device *dev,
> > >struct drm_file *file_priv);
> > > +void amdgpu_driver_release_kms(struct drm_device *dev);
> > > +
> > >   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
> > >   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
> > >   int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index 2f60b70..797d94d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -3557,14 +3557,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> > >* Tear down the driver info (all asics).
> > >* Called at driver shutdown.
> > >*/
> > > -void amdgpu_device_fini(struct amdgpu_device *adev)
> > > +void amdgpu_device_fini_early(struct amdgpu_device *adev)
> > >   {
> > >   dev_info(adev->dev, "amdgpu: finishing device.\n");
> > >   flush_delayed_work(>delayed_init_work);
> > >   adev->shutdown = true;
> > > - kfree(adev->pci_state);
> > > -
> > >   /* make sure IB test finished before entering exclusive mode
> > >* to avoid preemption on IB test
> > >* */
> > > @@ -3581,11 +3579,18 @@ void amdgpu_device_fini(struct amdgpu_device 
> > > *adev)
> > >   else
> > >   

Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-25 Thread Daniel Vetter
On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > 
> > On 11/24/20 2:41 AM, Christian König wrote:
> > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > 
> > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > 
> > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > 
> > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > 
> > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > what you are doing with this function.
> > > > > > > > > 
> > > > > > > > > Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > notifier per device." to see
> > > > > > > > how i use it. I don't see why this should go
> > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > registered from within TTM
> > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > 
> > > > > > > No, that is really vendor specific.
> > > > > > > 
> > > > > > > What I meant is to have a function like
> > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > to call and all tt objects are unpopulated.
> > > > > > 
> > > > > > 
> > > > > > So instead of this BO list i create and later iterate in
> > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > within
> > > > > > TTM with a single function ? Makes much more sense.
> > > > > 
> > > > > Yes, exactly.
> > > > > 
> > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > actually not the best idea, we should now check the
> > > > > pin_count instead. This way we could also have a list of the
> > > > > pinned BOs in TTM.
> > > > 
> > > > 
> > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > It's probably a good idea to combine both iterations into this
> > > > new function to cover all the BOs allocated on the device.
> > > 
> > > Yes, that's what I had in my mind as well.
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > could have unforeseen consequences.
> > > > 
> > > > 
> > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > accesses once we drop all the DMA backing pages for a particular
> > > > BO ? Because for user mappings
> > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > nothing was done for in kernel CPU mappings.
> > > 
> > > Yes exactly that.
> > > 
> > > In other words what happens if we free the ring buffer while the
> > > kernel still writes to it?
> > > 
> > > Christian.
> > 
> > 
> > While we can't control user application accesses to the mapped buffers
> > explicitly and hence we use page fault rerouting
> > I am thinking that in this  case we may be able to sprinkle
> > drm_dev_enter/exit in any such sensitive place were we might
> > CPU access a DMA buffer from the kernel ?
> 
> Yes, I fear we are going to need that.

Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
could stuff this into begin/end_cpu_access (but only for the kernel, so a
bit tricky)?

btw the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part isn't
yet solved in your series here.
-Daniel
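
To illustrate the drm_dev_enter()/drm_dev_exit() idea for kernel-side
accesses (a sketch, not actual amdgpu code; the ring write is just an
example of touching a DMA-backed buffer from the kernel):

static void my_ring_write(struct amdgpu_ring *ring, u32 v)
{
	struct drm_device *ddev = adev_to_drm(ring->adev);
	int idx;

	if (!drm_dev_enter(ddev, &idx))
		return;	/* unplugged: backing pages may be gone */

	ring->ring[ring->wptr++ & ring->buf_mask] = v;

	drm_dev_exit(idx);
}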

> 
> > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > Are there other places?
> 
> Puh, good question. I have no idea.
> 
> > Another point is that at this point the driver shouldn't access any such
> > buffers as we are in the process of finishing the device.
> > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > think there is anything else to do ?
> 
> Well there is a page fault handler for kernel mappings, but that one just
> prints the stack trace into the system log and calls BUG(); :)
> 
> Long story short we need to avoid any access to released pages after unplug.
> No matter if it's from the kernel or userspace.
> 
> Regards,
> Christian.
> 
> > 
> > Andrey
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Daniel Vetter
On Wed, Nov 25, 2020 at 11:13:13AM +0100, Christian König wrote:
> Am 25.11.20 um 09:37 schrieb Thomas Zimmermann:
> > Hi
> > 
> > Am 24.11.20 um 15:09 schrieb Daniel Vetter:
> > > On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:
> > > > Hi
> > > > 
> > > > Am 24.11.20 um 14:36 schrieb Christian König:
> > > > > Am 24.11.20 um 13:15 schrieb Thomas Zimmermann:
> > > > > > [SNIP]
> > > > > > > > > > First I wanted to put this into
> > > > > > > > > > drm_gem_ttm_vmap/vunmap(), but then wondered why
> > > > > > > > > > ttm_bo_vmap() does not acquire the lock internally?
> > > > > > > > > > I'd expect that vmap/vunmap are close together and
> > > > > > > > > > do not overlap for the same BO.
> > > > > > > > > 
> > > > > > > > > We have use cases like the following during command 
> > > > > > > > > submission:
> > > > > > > > > 
> > > > > > > > > 1. lock
> > > > > > > > > 2. map
> > > > > > > > > 3. copy parts of the BO content somewhere else or patch
> > > > > > > > > it with additional information
> > > > > > > > > 4. unmap
> > > > > > > > > 5. submit BO to the hardware
> > > > > > > > > 6. add hardware fence to the BO to make sure it doesn't move
> > > > > > > > > 7. unlock
> > > > > > > > > 
> > > > > > > > > That use case won't be possible with vmap/vunmap if we
> > > > > > > > > move the lock/unlock into it and I hope to replace the
> > > > > > > > > kmap/kunmap functions with them in the near term.
> > > > > > > > > 
> > > > > > > > > > Otherwise, acquiring the reservation lock would
> > > > > > > > > > require another ref-counting variable or per-driver
> > > > > > > > > > code.
> > > > > > > > > 
> > > > > > > > > Hui, why that? Just put this into
> > > > > > > > > drm_gem_ttm_vmap/vunmap() helper as you initially
> > > > > > > > > planned.
> > > > > > > > 
> > > > > > > > Given your example above, step one would acquire the lock,
> > > > > > > > and step two would also acquire the lock as part of the vmap
> > > > > > > > implementation. Wouldn't this fail (At least during unmap or
> > > > > > > > unlock steps) ?
> > > > > > > 
> > > > > > > Oh, so you want to nest them? No, that is a rather bad no-go.
> > > > > > 
> > > > > > I don't want to nest/overlap them. My question was whether that
> > > > > > would be required. Apparently not.
> > > > > > 
> > > > > > While the console's BO is being set for scanout, it's protected from
> > > > > > movement via the pin/unpin implementation, right?
> > > > > 
> > > > > Yes, correct.
> > > > > 
> > > > > > The driver does not acquire the resv lock for longer periods. I'm
> > > > > > asking because this would prevent any console-buffer updates while
> > > > > > the console is being displayed.
> > > > > 
> > > > > Correct as well, we only hold the lock for things like command
> > > > > submission, pinning, unpinning etc etc
> > > > > 
> > > > 
> > > > Thanks for answering my questions.
> > > > 
> > > > > > 
> > > > > > > 
> > > > > > > You need to make sure that the lock is only taken from the FB
> > > > > > > path which wants to vmap the object.
> > > > > > > 
> > > > > > > Why don't you lock the GEM object from the caller in the generic
> > > > > > > FB implementation?
> > > > > > 
> > > > > > > > With the current blitter code, it breaks abstraction. If vmap/vunmap
> > > > > > hold the lock implicitly, things would be easier.
> > > > > 
> > > > > Do you have a link to the code?
> > > > 
> > > > It's the damage blitter in the fbdev code. [1] While it flushes
> > > > the shadow
> > > > buffer into the BO, the BO has to be kept in place. I already
> > > > changed it to
> > > > lock struct drm_fb_helper.lock, but I don't think this is
> > > > enough. TTM could
> > > > still evict the BO concurrently.
> > > 
> > > So I'm not sure this is actually a problem: ttm could try to
> > > concurrently
> > > evict the buffer we pinned into vram, and then just skip to the next
> > > one.
> > > 
> > > Plus atm generic fbdev isn't used on any chip where we really care about
> > > that last few mb of vram being useable for command submission (well atm
> > > there's no driver using it).
> > 
> > Well, this is the patchset for radeon. If it works out, amdgpu and
> > nouveau are natural next choices. Especially radeon and nouveau support
> > cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly
> > matter.
> > 
> > > 
> > > Having the buffer pinned into system memory and trying to do a
> > > concurrent
> > > modeset that tries to pull it in is the hard failure mode. And holding
> > > fb_helper.lock fully prevents that.
> > > 
> > > So not really clear on what failure mode you're seeing here?
> > 
> > Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or Wayland
> > is running.) The fbdev BO is a few MiBs and not in use, so TTM would
> > want to evict it if memory gets tight.
> > 
> > What I have in mind is a concurrent modeset that requires the memory. If
> > we do a concurrent damage blit without protecting against eviction,
> > things 

Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Christian König

Am 25.11.20 um 09:37 schrieb Thomas Zimmermann:

Hi

Am 24.11.20 um 15:09 schrieb Daniel Vetter:

On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:

Hi

Am 24.11.20 um 14:36 schrieb Christian König:

Am 24.11.20 um 13:15 schrieb Thomas Zimmermann:

[SNIP]

First I wanted to put this into
drm_gem_ttm_vmap/vunmap(), but then wondered why
ttm_bo_vmap() does not acquire the lock internally?
I'd expect that vmap/vunmap are close together and
do not overlap for the same BO.


We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch
it with additional information
4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock

That use case won't be possible with vmap/vunmap if we
move the lock/unlock into it and I hope to replace the
kmap/kunmap functions with them in the near term.


Otherwise, acquiring the reservation lock would
require another ref-counting variable or per-driver
code.


Hui, why that? Just put this into
drm_gem_ttm_vmap/vunmap() helper as you initially
planned.


Given your example above, step one would acquire the lock,
and step two would also acquire the lock as part of the vmap
implementation. Wouldn't this fail (At least during unmap or
unlock steps) ?


Oh, so you want to nest them? No, that is a rather bad no-go.


I don't want to nest/overlap them. My question was whether that
would be required. Apparently not.

While the console's BO is being set for scanout, it's protected from
movement via the pin/unpin implementation, right?


Yes, correct.


The driver does not acquire the resv lock for longer periods. I'm
asking because this would prevent any console-buffer updates while
the console is being displayed.


Correct as well, we only hold the lock for things like command
submission, pinning, unpinning etc etc



Thanks for answering my questions.





You need to make sure that the lock is only taken from the FB
path which wants to vmap the object.

Why don't you lock the GEM object from the caller in the generic
FB implementation?


With the current blitter code, it breaks abstraction. If vmap/vunmap
hold the lock implicitly, things would be easier.


Do you have a link to the code?


It's the damage blitter in the fbdev code. [1] While it flushes the 
shadow
buffer into the BO, the BO has to be kept in place. I already 
changed it to
lock struct drm_fb_helper.lock, but I don't think this is enough. 
TTM could

still evict the BO concurrently.


So I'm not sure this is actually a problem: ttm could try to 
concurrently
evict the buffer we pinned into vram, and then just skip to the next 
one.


Plus atm generic fbdev isn't used on any chip where we really care about
that last few mb of vram being useable for command submission (well atm
there's no driver using it).


Well, this is the patchset for radeon. If it works out, amdgpu and 
nouveau are natural next choices. Especially radeon and nouveau 
support cards with low- to medium-sized VRAM. The MiBs wasted on fbdev 
certainly matter.




Having the buffer pinned into system memory and trying to do a 
concurrent

modeset that tries to pull it in is the hard failure mode. And holding
fb_helper.lock fully prevents that.

So not really clear on what failure mode you're seeing here?


Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or 
Wayland is running.) The fbdev BO is a few MiBs and not in use, so TTM 
would want to evict it if memory gets tight.


What I have in mind is a concurrent modeset that requires the memory. 
If we do a concurrent damage blit without protecting against eviction, 
things go boom. Same for concurrent 3d graphics with textures, model 
data, etc.


Completely agree.

This needs proper lock protection of the memory mapped buffer. Relying 
on that some other code isn't run because we have some third part locks 
taken is not sufficient here.


Regards,
Christian.



Best regards
Thomas



There's no recursion taking place, so I guess the reservation lock 
could be
acquired/released in drm_client_buffer_vmap/vunmap(), or a separate 
pair of

DRM client functions could do the locking.


Given how this "do the right locking" is a can of worms (and I think 
it's

worse than what you dug out already) I think the fb_helper.lock hack is
perfectly good enough.

I'm also somewhat worried that starting to use dma_resv lock in generic
code, while many helpers/drivers still have their hand-rolled locking,
will make conversion over to dma_resv needlessly more complicated.
-Daniel



Best regards
Thomas

[1] 
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/drm_fb_helper.c?id=ac60f3f3090115d21f028bffa2dcfb67f695c4f2#n394




Please note that the reservation lock you need to take here is part of
the GEM object.

Usually we design things in the way that the code needs to take a lock
which protects an object, then do some operations with the object and
then 

Re: [PATCH 6/6] drm/sched: Make use of a "done" thread

2020-11-25 Thread Christian König

Am 25.11.20 um 04:17 schrieb Luben Tuikov:

Add a "done" list to which all completed jobs are added
to be freed. The drm_sched_job_done() callback is the
producer of jobs to this list.

Add a "done" thread which consumes from the done list
and frees up jobs. Now, the main scheduler thread only
pushes jobs to the GPU and the "done" thread frees them
up, on the way out of the GPU when they've completed
execution.


Well there are quite a number of problems in this patch.

From the design I think we should be getting rid of the linked list and 
not extend its use. And we also don't want to offload the freeing of 
jobs into a different thread because that could potentially mean that 
this is executed on a different CPU.


Then one obvious problem seems to be that you don't take into account 
that we moved the job freeing into the scheduler thread to make sure 
that this is suspended while the scheduler thread is stopped. This 
behavior is now completely gone, e.g. the delete thread keeps running 
while the scheduler thread is stopped.


A few more comments below.


Make use of the status returned by the GPU driver
timeout handler to decide whether to leave the job in
the pending list, or to send it off to the done list.
If a job is done, it is added to the done list and the
done thread woken up. If a job needs more time, it is
left on the pending list and the timeout timer
restarted.

Eliminate the polling mechanism of picking out done
jobs from the pending list, i.e. eliminate
drm_sched_get_cleanup_job(). Now the main scheduler
thread only pushes jobs down to the GPU.

Various other optimizations to the GPU scheduler
and job recovery are possible with this format.

Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/scheduler/sched_main.c | 173 +
  include/drm/gpu_scheduler.h            |  14 ++
  2 files changed, 101 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 3eb7618a627d..289ae68cd97f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -164,7 +164,8 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
   * drm_sched_job_done - complete a job
   * @s_job: pointer to the job which is done
   *
- * Finish the job's fence and wake up the worker thread.
+ * Finish the job's fence, move it to the done list,
+ * and wake up the done thread.
   */
  static void drm_sched_job_done(struct drm_sched_job *s_job)
  {
@@ -179,7 +180,12 @@ static void drm_sched_job_done(struct drm_sched_job *s_job)
	dma_fence_get(&s_fence->finished);
	drm_sched_fence_finished(s_fence);
	dma_fence_put(&s_fence->finished);
-	wake_up_interruptible(&sched->wake_up_worker);
+
+	spin_lock(&sched->job_list_lock);
+	list_move(&s_job->list, &sched->done_list);
+	spin_unlock(&sched->job_list_lock);
+
+	wake_up_interruptible(&sched->done_wait_q);


How is the worker thread then woken up to push new jobs to the hardware?


  }
  
  /**

@@ -221,11 +227,10 @@ bool drm_sched_dependency_optimized(struct dma_fence* 
fence,
  EXPORT_SYMBOL(drm_sched_dependency_optimized);
  
  /**

- * drm_sched_start_timeout - start timeout for reset worker
- *
- * @sched: scheduler instance to start the worker for
+ * drm_sched_start_timeout - start a timeout timer
+ * @sched: scheduler instance whose job we're timing
   *
- * Start the timeout for the given scheduler.
+ * Start a timeout timer for the given scheduler.
   */
  static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
  {
@@ -305,8 +310,8 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
  
 	spin_lock(&sched->job_list_lock);
 	list_add_tail(&s_job->list, &sched->pending_list);
-	drm_sched_start_timeout(sched);
 	spin_unlock(&sched->job_list_lock);
+	drm_sched_start_timeout(sched);


This looks wrong, the drm_sched_start_timeout() function used to need 
the lock. Why should that have changed?



  }
  
  static void drm_sched_job_timedout(struct work_struct *work)

@@ -316,37 +321,30 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
  
  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  
-	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */

	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
+	spin_unlock(&sched->job_list_lock);
  
  	if (job) {

-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-		list_del_init(&job->list);
-		spin_unlock(&sched->job_list_lock);
+   int res;
  
-		job->sched->ops->timedout_job(job);

+   job->job_status |= DRM_JOB_STATUS_TIMEOUT;
+	res = job->sched->ops->timedout_job(job);

Re: [PATCH 5/6] drm/amdgpu: Don't hardcode thread name length

2020-11-25 Thread Christian König

On 25.11.20 at 04:17, Luben Tuikov wrote:

Introduce a macro DRM_THREAD_NAME_LEN
and use that to define ring name size,
instead of hardcoding it to 16.

Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +-
  include/drm/gpu_scheduler.h  | 2 ++
  2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7112137689db..bbd46c6dec65 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -230,7 +230,7 @@ struct amdgpu_ring {
	unsigned		wptr_offs;
	unsigned		fence_offs;
	uint64_t		current_ctx;
-	char			name[16];
+	char			name[DRM_THREAD_NAME_LEN];
	u32			trail_seq;
	unsigned		trail_fence_offs;
	u64			trail_fence_gpu_addr;
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 61f7121e1c19..3a5686c3b5e9 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -30,6 +30,8 @@
  
  #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
  
+#define DRM_THREAD_NAME_LEN TASK_COMM_LEN

+


The thread name is an amdgpu specific thing. I don't think we should 
have that in the scheduler.


And why do you use TASK_COMM_LEN here? That is completely unrelated stuff.

Regards,
Christian.


  struct drm_gpu_scheduler;
  struct drm_sched_rq;
  


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/6] drm/scheduler: Essentialize the job done callback

2020-11-25 Thread Christian König

On 25.11.20 at 04:17, Luben Tuikov wrote:

The job done callback is called from various
places, in two ways: in job done role, and
as a fence callback role.

Essentialize the callback into an atomic
function which just completes the job, and
a second function with the fence-callback
prototype which simply calls the first to
complete the job.

This is used in later patches by the completion
code.

Signed-off-by: Luben Tuikov 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/scheduler/sched_main.c | 73 ++
  1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index b694df12aaba..3eb7618a627d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -60,8 +60,6 @@
  #define to_drm_sched_job(sched_job)   \
container_of((sched_job), struct drm_sched_job, queue_node)
  
-static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb);

-
  /**
   * drm_sched_rq_init - initialize a given run queue struct
   *
@@ -162,6 +160,40 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
return NULL;
  }
  
+/**

+ * drm_sched_job_done - complete a job
+ * @s_job: pointer to the job which is done
+ *
+ * Finish the job's fence and wake up the worker thread.
+ */
+static void drm_sched_job_done(struct drm_sched_job *s_job)
+{
+   struct drm_sched_fence *s_fence = s_job->s_fence;
+   struct drm_gpu_scheduler *sched = s_fence->sched;
+
+	atomic_dec(&sched->hw_rq_count);
+	atomic_dec(&sched->score);
+
+	trace_drm_sched_process_job(s_fence);
+
+	dma_fence_get(&s_fence->finished);
+	drm_sched_fence_finished(s_fence);
+	dma_fence_put(&s_fence->finished);
+	wake_up_interruptible(&sched->wake_up_worker);
+}
+
+/**
+ * drm_sched_job_done_cb - the callback for a done job
+ * @f: fence
+ * @cb: fence callbacks
+ */
+static void drm_sched_job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+   struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, 
cb);
+
+   drm_sched_job_done(s_job);
+}
+
  /**
   * drm_sched_dependency_optimized
   *
@@ -473,14 +505,14 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, 
bool full_recovery)
  
  		if (fence) {

			r = dma_fence_add_callback(fence, &s_job->cb,
-						   drm_sched_process_job);
+						   drm_sched_job_done_cb);
			if (r == -ENOENT)
-				drm_sched_process_job(fence, &s_job->cb);
+   drm_sched_job_done(s_job);
else if (r)
DRM_ERROR("fence add callback failed (%d)\n",
  r);
} else
-		drm_sched_process_job(NULL, &s_job->cb);
+   drm_sched_job_done(s_job);
}
  
  	if (full_recovery) {

@@ -635,31 +667,6 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
return entity;
  }
  
-/**

- * drm_sched_process_job - process a job
- *
- * @f: fence
- * @cb: fence callbacks
- *
- * Called after job has finished execution.
- */
-static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
-{
-   struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, 
cb);
-   struct drm_sched_fence *s_fence = s_job->s_fence;
-   struct drm_gpu_scheduler *sched = s_fence->sched;
-
-	atomic_dec(&sched->hw_rq_count);
-	atomic_dec(&sched->score);
-
-	trace_drm_sched_process_job(s_fence);
-
-	dma_fence_get(&s_fence->finished);
-	drm_sched_fence_finished(s_fence);
-	dma_fence_put(&s_fence->finished);
-	wake_up_interruptible(&sched->wake_up_worker);
-}
-
  /**
   * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
   *
@@ -809,9 +816,9 @@ static int drm_sched_main(void *param)
if (!IS_ERR_OR_NULL(fence)) {
s_fence->parent = dma_fence_get(fence);
			r = dma_fence_add_callback(fence, &sched_job->cb,
-						   drm_sched_process_job);
+						   drm_sched_job_done_cb);
			if (r == -ENOENT)
-				drm_sched_process_job(fence, &sched_job->cb);
+   drm_sched_job_done(sched_job);
else if (r)
DRM_ERROR("fence add callback failed (%d)\n",
  r);
@@ -820,7 +827,7 @@ static int drm_sched_main(void *param)
if (IS_ERR(fence))
				dma_fence_set_error(&s_fence->finished,
						    PTR_ERR(fence));
  
-			drm_sched_process_job(NULL, &sched_job->cb);

+   drm_sched_job_done(sched_job);
}
  
  		

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Christian König

On 25.11.20 at 04:17, Luben Tuikov wrote:

The job timeout handler now returns status
indicating back to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.

Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
  include/drm/gpu_scheduler.h | 13 ++---
  2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  
-static void amdgpu_job_timedout(struct drm_sched_job *s_job)

+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
  {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
DRM_ERROR("ring %s timeout, but soft recovered\n",
  s_job->sched->name);
-   return;
+   return 0;
}
  
	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);

@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
  
  	if (amdgpu_device_should_recover_gpu(ring->adev)) {

amdgpu_device_gpu_recover(ring->adev, job);
+   return 0;
} else {
		drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
adev->virt.tdr_debug = true;
+   return 1;
}
  }
  
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h

index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
  
  	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return 0, if the job has been aborted successfully and will
+* never be heard of from the device. Return non-zero if the
+* job wasn't able to be aborted, i.e. if more time should be
+* given to this job. The result is not "bool" as this
+* function is not a predicate, although its result may seem
+* as one.


I think the whole approach of timing out a job needs to be rethought. 
What's timing out here is the hardware engine, not the job.


So we should also not have the job as parameter here. Maybe we should 
make that the fence we are waiting for instead.
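
I.e. roughly something like this (an illustrative prototype only,
not a worked-out interface):

	/* Time out the fence the scheduler is waiting on, not the job. */
	int (*timedout_fence)(struct drm_gpu_scheduler *sched,
			      struct dma_fence *fence);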



 */
-   void (*timedout_job)(struct drm_sched_job *sched_job);
+   int (*timedout_job)(struct drm_sched_job *sched_job);


I would either return an error code, boolean or enum here. But not use a 
number without a define.
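
For example (the names here are illustrative only):

	enum drm_sched_timedout_status {
		DRM_SCHED_TIMEDOUT_ABORTED,	/* job gone, will never signal */
		DRM_SCHED_TIMEDOUT_RUNNING,	/* give the job more time */
	};

	enum drm_sched_timedout_status (*timedout_job)(struct drm_sched_job *sched_job);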


Regards,
Christian.

  
  	/**

   * @free_job: Called once the job's finished fence has been signaled


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/6] gpu/drm: ring_mirror_list --> pending_list

2020-11-25 Thread Christian König

On 25.11.20 at 04:17, Luben Tuikov wrote:

Rename "ring_mirror_list" to "pending_list",
to describe what something is, not what it does,
how it's used, or how the hardware implements it.

This also abstracts the actual hardware
implementation, i.e. how the low-level driver
communicates with the device it drives, ring, CAM,
etc., shouldn't be exposed to DRM.

The pending_list keeps submitted jobs, which are
out of our control. Usually this means they are
pending execution in hardware, but "out of our
control" is the more general (inclusive) meaning.

Signed-off-by: Luben Tuikov 


In general the rename is a good idea, but I think we should try to 
remove this linked list in general.


As the original name described, this is essentially a ring buffer; there is 
no reason I can see to use a linked list here except for the add/remove 
madness we currently have.
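
For example, since jobs complete in submission order, two indices over
a fixed array would be enough (illustrative sketch only;
DRM_SCHED_MAX_PENDING is a hypothetical bound):

	struct drm_sched_job	*pending[DRM_SCHED_MAX_PENDING];
	unsigned int		pending_head;	/* next free slot */
	unsigned int		pending_tail;	/* oldest job in flight */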


Anyway patch is Acked-by: Christian König  for 
now.


Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  2 +-
  drivers/gpu/drm/scheduler/sched_main.c  | 34 ++---
  include/drm/gpu_scheduler.h | 10 +++---
  5 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 8358cae0b5a4..db77a5bdfa45 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1427,7 +1427,7 @@ static void amdgpu_ib_preempt_job_recovery(struct 
drm_gpu_scheduler *sched)
struct dma_fence *fence;
  
	spin_lock(&sched->job_list_lock);

-	list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
+	list_for_each_entry(s_job, &sched->pending_list, list) {
fence = sched->ops->run_job(s_job);
dma_fence_put(fence);
}
@@ -1459,7 +1459,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct 
amdgpu_ring *ring)
  
  no_preempt:

	spin_lock(&sched->job_list_lock);
-	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, list) {
+	list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
		if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
			/* remove job from ring_mirror_list */
			list_del_init(&s_job->list);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 4df6de81cd41..fbae600aa5f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4127,8 +4127,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device 
*adev)
continue;
  
		spin_lock(&ring->sched.job_list_lock);

-		job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
-					       struct drm_sched_job, list);
+		job = list_first_entry_or_null(&ring->sched.pending_list,
+					       struct drm_sched_job, list);
		spin_unlock(&ring->sched.job_list_lock);
if (job)
return true;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index aca52a46b93d..ff48101bab55 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -271,7 +271,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
drm_gpu_scheduler *sched)
}
  
  	/* Signal all jobs already scheduled to HW */

-	list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
+	list_for_each_entry(s_job, &sched->pending_list, list) {
		struct drm_sched_fence *s_fence = s_job->s_fence;

		dma_fence_set_error(&s_fence->finished, -EHWPOISON);

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index c52eba407ebd..b694df12aaba 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -198,7 +198,7 @@ EXPORT_SYMBOL(drm_sched_dependency_optimized);
  static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
  {
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-	    !list_empty(&sched->ring_mirror_list))
+	    !list_empty(&sched->pending_list))
		schedule_delayed_work(&sched->work_tdr, sched->timeout);
  }
  
@@ -258,7 +258,7 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,

  {
	spin_lock(&sched->job_list_lock);

-	if (list_empty(&sched->ring_mirror_list))
+	if (list_empty(&sched->pending_list))
		cancel_delayed_work(&sched->work_tdr);
	else
		mod_delayed_work(system_wq, &sched->work_tdr, remaining);
@@ -272,7 +272,7 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
struct drm_gpu_scheduler *sched = s_job->sched;
  
	spin_lock(&sched->job_list_lock);

-	list_add_tail(&s_job->list, &sched->ring_mirror_list);
+	list_add_tail(&s_job->list, &sched->pending_list);

Re: [PATCH 1/6] drm/scheduler: "node" --> "list"

2020-11-25 Thread Christian König

On 25.11.20 at 04:17, Luben Tuikov wrote:

Rename "node" to "list" in struct drm_sched_job,
in order to make it consistent with what we see
being used throughout gpu_scheduler.h, for
instance in struct drm_sched_entity, as well as
the rest of DRM and the kernel.

Signed-off-by: Luben Tuikov 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  6 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  2 +-
  drivers/gpu/drm/scheduler/sched_main.c  | 23 +++--
  include/drm/gpu_scheduler.h |  4 ++--
  5 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 5c1f3725c741..8358cae0b5a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1427,7 +1427,7 @@ static void amdgpu_ib_preempt_job_recovery(struct 
drm_gpu_scheduler *sched)
struct dma_fence *fence;
  
	spin_lock(&sched->job_list_lock);

-	list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
+	list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
fence = sched->ops->run_job(s_job);
dma_fence_put(fence);
}
@@ -1459,10 +1459,10 @@ static void amdgpu_ib_preempt_mark_partial_job(struct 
amdgpu_ring *ring)
  
  no_preempt:

	spin_lock(&sched->job_list_lock);
-	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
+	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, list) {
		if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
			/* remove job from ring_mirror_list */
-			list_del_init(&s_job->node);
+			list_del_init(&s_job->list);
sched->ops->free_job(s_job);
continue;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7560b05e4ac1..4df6de81cd41 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4128,7 +4128,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device 
*adev)
  
		spin_lock(&ring->sched.job_list_lock);

		job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
-					       struct drm_sched_job, node);
+					       struct drm_sched_job, list);
		spin_unlock(&ring->sched.job_list_lock);
if (job)
return true;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index dcfe8a3b03ff..aca52a46b93d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -271,7 +271,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
drm_gpu_scheduler *sched)
}
  
  	/* Signal all jobs already scheduled to HW */

-	list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
+	list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
		struct drm_sched_fence *s_fence = s_job->s_fence;

		dma_fence_set_error(&s_fence->finished, -EHWPOISON);

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index c6332d75025e..c52eba407ebd 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -272,7 +272,7 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
struct drm_gpu_scheduler *sched = s_job->sched;
  
	spin_lock(&sched->job_list_lock);

-	list_add_tail(&s_job->node, &sched->ring_mirror_list);
+	list_add_tail(&s_job->list, &sched->ring_mirror_list);
	drm_sched_start_timeout(sched);
	spin_unlock(&sched->job_list_lock);
  }
@@ -287,7 +287,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->ring_mirror_list,
-				       struct drm_sched_job, node);
+				       struct drm_sched_job, list);
  
  	if (job) {

/*
@@ -295,7 +295,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 * drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
 * is parked at which point it's safe.
 */
-		list_del_init(&job->node);
+		list_del_init(&job->list);
		spin_unlock(&sched->job_list_lock);
  
  		job->sched->ops->timedout_job(job);

@@ -392,7 +392,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct 
drm_sched_job *bad)
 * Add at the head of the queue to reflect it was the earliest
 * job extracted.
 */
-	list_add(&bad->node, &sched->ring_mirror_list);
+	list_add(&bad->list, &sched->ring_mirror_list);

Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug

2020-11-25 Thread Daniel Vetter
On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky
 wrote:
>
>
> On 11/24/20 9:49 AM, Daniel Vetter wrote:
> > On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
> >> Avoids NULL ptr due to kobj->sd being unset on device removal.
> >>
> >> Signed-off-by: Andrey Grodzovsky 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
> >>   2 files changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> index caf828a..812e592 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> @@ -27,6 +27,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>
> >>   #include "amdgpu.h"
> >>   #include "amdgpu_ras.h"
> >> @@ -1043,7 +1044,8 @@ static int 
> >> amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
> >>  .attrs = attrs,
> >>  };
> >>
> >> -	sysfs_remove_group(&adev->dev->kobj, &group);
> >> +	if (!drm_dev_is_unplugged(&adev->ddev))
> >> +		sysfs_remove_group(&adev->dev->kobj, &group);
> > This looks wrong. sysfs, like any other interface, should be
> > unconditionally thrown out when we do the drm_dev_unregister. Whether
> > hotunplugged or not should matter at all. Either this isn't needed at all,
> > or something is wrong with the ordering here. But definitely fishy.
> > -Daniel
>
>
> So technically this is needed because the kobject's sysfs directory entry
> kobj->sd is set to NULL on device removal (from sysfs_remove_dir), but because
> we don't finalize the device until the last reference to the drm file is
> dropped (which can happen later), we end up calling sysfs_remove_file/dir
> after this pointer is NULL. sysfs_remove_file checks for NULL and aborts,
> while sysfs_remove_dir does not, and that is why I guard against calls to
> sysfs_remove_dir.
> But indeed the whole approach in the driver is incorrect, as Greg pointed
> out - we should use default groups attributes instead of explicit calls to
> the sysfs interface, and this would save those troubles.
> But again, the issue here is scope of work: converting all of amdgpu to
> default groups attributes is a somewhat lengthy process with extra testing,
> as the entire driver is papered with sysfs references, and it seems to me
> more of a standalone cleanup, just like the switch to devm_ and drmm_. To me
> at least it makes more sense to finalize and push the hot unplug patches so
> that this new functionality can be part of the driver sooner, and then
> incrementally improve it by working on those other topics. Just as with
> devm_/drmm_, I also added sysfs cleanup to my TODO list in the RFC patch.

Hm, whether you solve this with the default group stuff to
auto-remove, or remove explicitly at the right time doesn't matter
much. The underlying problem you have here is that it's done way too
late. sysfs interfaces (like all uapi interfaces) need to be removed as
part of drm_dev_unregister. I guess aside from the split into fini_hw
and fini_sw, you also need an unregister_late callback (like we have
already for drm_connector, so that e.g. backlight and similar stuff
can be unregistered).

Papering over the underlying bug like this doesn't really fix much,
the lifetimes are still wrong.
-Daniel
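
For reference, the default-groups approach mentioned above looks
roughly like this (sketch; the attribute names are placeholders).
The core then removes the group automatically in device_del(), so no
explicit sysfs_remove_group() call is needed:

	static struct attribute *amdgpu_ras_attrs[] = {
		&dev_attr_features.attr,	/* placeholder attribute */
		NULL
	};

	static const struct attribute_group amdgpu_ras_attr_group = {
		.attrs = amdgpu_ras_attrs,
	};

	static const struct attribute_group *amdgpu_ras_groups[] = {
		&amdgpu_ras_attr_group,
		NULL
	};

	/* before registration: dev->groups = amdgpu_ras_groups; */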

>
> Andrey
>
>
> >
> >>
> >>  return 0;
> >>   }
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> index 2b7c90b..54331fc 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> @@ -24,6 +24,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>
> >>   #include "amdgpu.h"
> >>   #include "amdgpu_ucode.h"
> >> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
> >>
> >>   void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
> >>   {
> >> -	sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> >> +	if (!drm_dev_is_unplugged(&adev->ddev))
> >> +		sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> >>   }
> >>
> >>   static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
> >> --
> >> 2.7.4
> >>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Thomas Zimmermann

Hi

On 24.11.20 at 15:09, Daniel Vetter wrote:

On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:

Hi

On 24.11.20 at 14:36, Christian König wrote:

On 24.11.20 at 13:15, Thomas Zimmermann wrote:

[SNIP]

First I wanted to put this into
drm_gem_ttm_vmap/vunmap(), but then wondered why
ttm_bo_vmap() does not acquire the lock internally?
I'd expect that vmap/vunmap are close together and
do not overlap for the same BO.


We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch
it with additional information
4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock
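
In rough code, that sequence is (sketch only; patch_ib() and
submit_to_hw() are hypothetical helpers standing in for the real
command-submission code):

	struct dma_buf_map map;
	int ret;

	dma_resv_lock(bo->base.resv, NULL);			/* 1. lock */
	ret = ttm_bo_vmap(bo, &map);				/* 2. map */
	if (!ret) {
		patch_ib(&map);					/* 3. patch contents */
		ttm_bo_vunmap(bo, &map);			/* 4. unmap */
		submit_to_hw(bo);				/* 5. submit */
		dma_resv_add_excl_fence(bo->base.resv, fence);	/* 6. fence */
	}
	dma_resv_unlock(bo->base.resv);				/* 7. unlock */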

That use case won't be possible with vmap/vunmap if we
move the lock/unlock into it and I hope to replace the
kmap/kunmap functions with them in the near term.


Otherwise, acquiring the reservation lock would
require another ref-counting variable or per-driver
code.


Hui, why that? Just put this into
drm_gem_ttm_vmap/vunmap() helper as you initially
planned.


Given your example above, step one would acquire the lock,
and step two would also acquire the lock as part of the vmap
implementation. Wouldn't this fail (At least during unmap or
unlock steps) ?


Oh, so you want to nest them? No, that is a rather bad no-go.


I don't want to nest/overlap them. My question was whether that
would be required. Apparently not.

While the console's BO is being set for scanout, it's protected from
movement via the pin/unpin implementation, right?


Yes, correct.


The driver does not acquire the resv lock for longer periods. I'm
asking because this would prevent any console-buffer updates while
the console is being displayed.


Correct as well, we only hold the lock for things like command
submission, pinning, unpinning etc etc



Thanks for answering my questions.





You need to make sure that the lock is only taken from the FB
path which wants to vmap the object.

Why don't you lock the GEM object from the caller in the generic
FB implementation?


With the current blitter code, it breaks abstraction. if vmap/vunmap
hold the lock implicitly, things would be easier.


Do you have a link to the code?


It's the damage blitter in the fbdev code. [1] While it flushes the shadow
buffer into the BO, the BO has to be kept in place. I already changed it to
lock struct drm_fb_helper.lock, but I don't think this is enough. TTM could
still evict the BO concurrently.


So I'm not sure this is actually a problem: ttm could try to concurrently
evict the buffer we pinned into vram, and then just skip to the next one.

Plus atm generic fbdev isn't used on any chip where we really care about
that last few mb of vram being useable for command submission (well atm
there's no driver using it).


Well, this is the patchset for radeon. If it works out, amdgpu and 
nouveau are natural next choices. Especially radeon and nouveau support 
cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly 
matter.




Having the buffer pinned into system memory and trying to do a concurrent
modeset that tries to pull it in is the hard failure mode. And holding
fb_helper.lock fully prevents that.

So not really clear on what failure mode you're seeing here?


Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or Wayland 
is running.) The fbdev BO is a few MiBs and not in use, so TTM would 
want to evict it if memory gets tight.


What I have in mind is a concurrent modeset that requires the memory. 
If we do a concurrent damage blit without protecting against eviction, 
things go boom. Same for concurrent 3d graphics with textures, model 
data, etc.
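
That is exactly what pinning around the mapping guards against; the
idea, roughly (simplified sketch, not the literal patch; domain choice
and error handling are illustrative):

	ret = radeon_bo_reserve(bo, false);
	if (ret)
		return ret;
	ret = radeon_bo_pin(bo, RADEON_GEM_DOMAIN_GTT, NULL);
	radeon_bo_unreserve(bo);
	if (ret)
		return ret;
	ret = drm_gem_ttm_vmap(obj, map);	/* BO can't be evicted now */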


Best regards
Thomas




There's no recursion taking place, so I guess the reservation lock could be
acquired/released in drm_client_buffer_vmap/vunmap(), or a separate pair of
DRM client functions could do the locking.


Given how this "do the right locking" is a can of worms (and I think it's
worse than what you dug out already) I think the fb_helper.lock hack is
perfectly good enough.

I'm also somewhat worried that starting to use dma_resv lock in generic
code, while many helpers/drivers still have their hand-rolled locking,
will make conversion over to dma_resv needlessly more complicated.
-Daniel



Best regards
Thomas

[1] 
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/drm_fb_helper.c?id=ac60f3f3090115d21f028bffa2dcfb67f695c4f2#n394



Please note that the reservation lock you need to take here is part of
the GEM object.

Usually we design things in the way that the code needs to take a lock
which protects an object, then do some operations with the object and
then release the lock again.

Having the lock inside the operation can be done as well, but
returning with it is kind of unusual design.


Sorry for the noob questions. I'm still trying to understand the
implications of acquiring these locks.


Well this is the reservation lock of the GEM object we are talking 
about here. We need to take that for a couple of different operations; 
vmap/vunmap doesn't sound like a special case to me.

Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Thomas Zimmermann

Hi

On 24.11.20 at 15:06, Christian König wrote:

On 24.11.20 at 14:56, Thomas Zimmermann wrote:

Hi

On 24.11.20 at 14:36, Christian König wrote:

On 24.11.20 at 13:15, Thomas Zimmermann wrote:

[SNIP]
First I wanted to put this into drm_gem_ttm_vmap/vunmap(), but 
then wondered why ttm_bo_vmap() does not acquire the lock 
internally? I'd expect that vmap/vunmap are close together and 
do not overlap for the same BO. 


We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch it with 
additional information

4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock

That use case won't be possible with vmap/vunmap if we move the 
lock/unlock into it and I hope to replace the kmap/kunmap 
functions with them in the near term.


Otherwise, acquiring the reservation lock would require another 
ref-counting variable or per-driver code.


Hui, why that? Just put this into drm_gem_ttm_vmap/vunmap() 
helper as you initially planned.


Given your example above, step one would acquire the lock, and 
step two would also acquire the lock as part of the vmap 
implementation. Wouldn't this fail (At least during unmap or 
unlock steps) ?


Oh, so you want to nest them? No, that is a rather bad no-go.


I don't want to nest/overlap them. My question was whether that 
would be required. Apparently not.


While the console's BO is being set for scanout, it's protected from 
movement via the pin/unpin implementation, right?


Yes, correct.

The driver does not acquire the resv lock for longer periods. I'm 
asking because this would prevent any console-buffer updates while 
the console is being displayed.


Correct as well, we only hold the lock for things like command 
submission, pinning, unpinning etc etc




Thanks for answering my questions.





You need to make sure that the lock is only taken from the FB path 
which wants to vmap the object.


Why don't you lock the GEM object from the caller in the generic FB 
implementation?


With the current blitter code, it breaks abstraction. if vmap/vunmap 
hold the lock implicitly, things would be easier.


Do you have a link to the code?


It's the damage blitter in the fbdev code. [1] While it flushes the 
shadow buffer into the BO, the BO has to be kept in place. I already 
changed it to lock struct drm_fb_helper.lock, but I don't think this 
is enough. TTM could still evict the BO concurrently.


Yeah, that's correct.

But I still don't fully understand the problem. You just need to change 
the code like this:


     mutex_lock(&fb_helper->lock);
     dma_resv_lock(buffer->gem->resv, NULL);

     ret = drm_client_buffer_vmap(buffer, &map);
     if (ret)
             goto out;

     dst = map;
     drm_fb_helper_damage_blit_real(fb_helper, clip, &dst);

     drm_client_buffer_vunmap(buffer);

out:
     dma_resv_unlock(buffer->gem->resv);
     mutex_unlock(&fb_helper->lock);



Yes, that's the code I had in mind.



You could abstract that in drm_client functions as well, but I don't 
really see the value in that.


The fbdev code tries hard to not use GEM directly, but to wrap 
everything behind client interfaces. I'm not sure if I like that, but 
for now I'd stick to this design.


Best regards
Thomas



Regards,
Christian.

There's no recursion taking place, so I guess the reservation lock 
could be acquired/released in drm_client_buffer_vmap/vunmap(), or a 
separate pair of DRM client functions could do the locking.


Best regards
Thomas

[1] 
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/drm_fb_helper.c?id=ac60f3f3090115d21f028bffa2dcfb67f695c4f2#n394 





Please note that the reservation lock you need to take here is part 
of the GEM object.


Usually we design things in the way that the code needs to take a 
lock which protects an object, then do some operations with the 
object and then release the lock again.


Having the lock inside the operation can be done as well, but 
returning with it is kind of unusual design.


Sorry for the noob questions. I'm still trying to understand the 
implications of acquiring these locks.


Well this is the reservation lock of the GEM object we are talking 
about here. We need to take that for a couple of different 
operations, vmap/vunmap doesn't sound like a special case to me.


Regards,
Christian.



Best regards
Thomas


___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer





[PATCH 07/40] drm/amd/amdgpu/amdgpu_psp: Make local function 'parse_ta_bin_descriptor' static

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c:2576:5: warning: no previous prototype 
for ‘parse_ta_bin_descriptor’ [-Wmissing-prototypes]

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 74cbaf2126982..910e89dc324b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -2573,9 +2573,9 @@ int psp_init_sos_microcode(struct psp_context *psp,
return err;
 }
 
-int parse_ta_bin_descriptor(struct psp_context *psp,
-   const struct ta_fw_bin_desc *desc,
-   const struct ta_firmware_header_v2_0 *ta_hdr)
+static int parse_ta_bin_descriptor(struct psp_context *psp,
+  const struct ta_fw_bin_desc *desc,
+  const struct ta_firmware_header_v2_0 *ta_hdr)
 {
uint8_t *ucode_start_addr  = NULL;
 
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 34/40] drm/amd/amdgpu/amdgpu_acp: Fix doc-rot issues pertaining to a couple of 'handle' params

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:183: warning: Function parameter or 
member 'handle' not described in 'acp_hw_init'
 drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:183: warning: Excess function 
parameter 'adev' description in 'acp_hw_init'
 drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:412: warning: Function parameter or 
member 'handle' not described in 'acp_hw_fini'
 drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c:412: warning: Excess function 
parameter 'adev' description in 'acp_hw_fini'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
index 1400957034a12..b8655ff73a658 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c
@@ -176,7 +176,7 @@ static struct device *get_mfd_cell_dev(const char 
*device_name, int r)
 /**
  * acp_hw_init - start and test ACP block
  *
- * @adev: amdgpu_device pointer
+ * @handle: handle used to pass amdgpu_device pointer
  *
  */
 static int acp_hw_init(void *handle)
@@ -405,7 +405,7 @@ static int acp_hw_init(void *handle)
 /**
  * acp_hw_fini - stop the hardware block
  *
- * @adev: amdgpu_device pointer
+ * @handle: handle used to pass amdgpu_device pointer
  *
  */
 static int acp_hw_fini(void *handle)
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 33/40] drm/amd/amdgpu/vcn_v3_0: Remove unused variable 'direct_poll' from 'vcn_v3_0_start_sriov()'

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c: In function ‘vcn_v3_0_start_sriov’:
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c:1242:3: warning: variable ‘direct_poll’ 
set but not used [-Wunused-but-set-variable]

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index c5e0a531cabaf..e05af69651723 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -1238,8 +1238,6 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device 
*adev)
direct_wt = { {0} };
struct mmsch_v3_0_cmd_direct_read_modify_write
direct_rd_mod_wt = { {0} };
-   struct mmsch_v3_0_cmd_direct_polling
-   direct_poll = { {0} };
struct mmsch_v3_0_cmd_end end = { {0} };
struct mmsch_v3_0_init_header header;
 
@@ -1247,8 +1245,6 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device 
*adev)
MMSCH_COMMAND__DIRECT_REG_WRITE;
direct_rd_mod_wt.cmd_header.command_type =
MMSCH_COMMAND__DIRECT_REG_READ_MODIFY_WRITE;
-   direct_poll.cmd_header.command_type =
-   MMSCH_COMMAND__DIRECT_REG_POLLING;
end.cmd_header.command_type =
MMSCH_COMMAND__END;
 
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 39/40] drm/amd/pm/powerplay/smumgr/fiji_smumgr: Remove unused variable 'result'

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/fiji_smumgr.c: In function 
‘fiji_populate_smc_boot_level’:
 drivers/gpu/drm/amd/amdgpu/../pm/powerplay/smumgr/fiji_smumgr.c:1603:6: 
warning: variable ‘result’ set but not used [-Wunused-but-set-variable]

Cc: Evan Quan 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 .../gpu/drm/amd/pm/powerplay/smumgr/fiji_smumgr.c   | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/smumgr/fiji_smumgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/smumgr/fiji_smumgr.c
index fef9d3906fccd..fea008cc1f254 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/smumgr/fiji_smumgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/smumgr/fiji_smumgr.c
@@ -1600,20 +1600,19 @@ static int fiji_populate_smc_uvd_level(struct pp_hwmgr 
*hwmgr,
 static int fiji_populate_smc_boot_level(struct pp_hwmgr *hwmgr,
struct SMU73_Discrete_DpmTable *table)
 {
-   int result = 0;
struct smu7_hwmgr *data = (struct smu7_hwmgr *)(hwmgr->backend);
 
table->GraphicsBootLevel = 0;
table->MemoryBootLevel = 0;
 
/* find boot level from dpm table */
-   result = phm_find_boot_level(&(data->dpm_table.sclk_table),
-   data->vbios_boot_state.sclk_bootup_value,
-   (uint32_t *)&(table->GraphicsBootLevel));
+   phm_find_boot_level(&(data->dpm_table.sclk_table),
+   data->vbios_boot_state.sclk_bootup_value,
+   (uint32_t *)&(table->GraphicsBootLevel));
 
-   result = phm_find_boot_level(&(data->dpm_table.mclk_table),
-   data->vbios_boot_state.mclk_bootup_value,
-   (uint32_t *)&(table->MemoryBootLevel));
+   phm_find_boot_level(&(data->dpm_table.mclk_table),
+   data->vbios_boot_state.mclk_bootup_value,
+   (uint32_t *)&(table->MemoryBootLevel));
 
table->BootVddc  = data->vbios_boot_state.vddc_bootup_value *
VOLTAGE_SCALE;
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 27/40] drm/amd/amdgpu/uvd_v7_0: Fix a bunch of kernel-doc function documentation issues

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:219: warning: Function parameter or 
member 'bo' not described in 'uvd_v7_0_enc_get_create_msg'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:219: warning: Excess function parameter 
'adev' description in 'uvd_v7_0_enc_get_create_msg'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:282: warning: Function parameter or 
member 'bo' not described in 'uvd_v7_0_enc_get_destroy_msg'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:282: warning: Excess function parameter 
'adev' description in 'uvd_v7_0_enc_get_destroy_msg'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:339: warning: Function parameter or 
member 'timeout' not described in 'uvd_v7_0_enc_ring_test_ib'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:527: warning: Function parameter or 
member 'handle' not described in 'uvd_v7_0_hw_init'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:527: warning: Excess function parameter 
'adev' description in 'uvd_v7_0_hw_init'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:605: warning: Function parameter or 
member 'handle' not described in 'uvd_v7_0_hw_fini'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:605: warning: Excess function parameter 
'adev' description in 'uvd_v7_0_hw_fini'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1156: warning: Function parameter or 
member 'addr' not described in 'uvd_v7_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1156: warning: Function parameter or 
member 'seq' not described in 'uvd_v7_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1156: warning: Function parameter or 
member 'flags' not described in 'uvd_v7_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1156: warning: Excess function parameter 
'fence' description in 'uvd_v7_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1195: warning: Function parameter or 
member 'addr' not described in 'uvd_v7_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1195: warning: Function parameter or 
member 'seq' not described in 'uvd_v7_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1195: warning: Function parameter or 
member 'flags' not described in 'uvd_v7_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1195: warning: Excess function parameter 
'fence' description in 'uvd_v7_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1293: warning: Function parameter or 
member 'job' not described in 'uvd_v7_0_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1293: warning: Function parameter or 
member 'flags' not described in 'uvd_v7_0_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1324: warning: Function parameter or 
member 'job' not described in 'uvd_v7_0_enc_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1324: warning: Function parameter or 
member 'flags' not described in 'uvd_v7_0_enc_ring_emit_ib'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
index b44c8677ce8d5..9911ff80a6776 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
@@ -206,9 +206,9 @@ static int uvd_v7_0_enc_ring_test_ring(struct amdgpu_ring 
*ring)
 /**
  * uvd_v7_0_enc_get_create_msg - generate a UVD ENC create msg
  *
- * @adev: amdgpu_device pointer
  * @ring: ring we should submit the msg to
  * @handle: session handle to use
+ * @bo: amdgpu object for which we query the offset
  * @fence: optional fence to return
  *
  * Open up a stream for HW test
@@ -269,9 +269,9 @@ static int uvd_v7_0_enc_get_create_msg(struct amdgpu_ring 
*ring, uint32_t handle
 /**
  * uvd_v7_0_enc_get_destroy_msg - generate a UVD ENC destroy msg
  *
- * @adev: amdgpu_device pointer
  * @ring: ring we should submit the msg to
  * @handle: session handle to use
+ * @bo: amdgpu object for which we query the offset
  * @fence: optional fence to return
  *
  * Close up a stream for HW test or if userspace failed to do so
@@ -333,6 +333,7 @@ static int uvd_v7_0_enc_get_destroy_msg(struct amdgpu_ring 
*ring, uint32_t handl
  * uvd_v7_0_enc_ring_test_ib - test if UVD ENC IBs are working
  *
  * @ring: the engine to test on
+ * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
  *
  */
 static int uvd_v7_0_enc_ring_test_ib(struct amdgpu_ring *ring, long timeout)
@@ -519,7 +520,7 @@ static int uvd_v7_0_sw_fini(void *handle)
 /**
  * uvd_v7_0_hw_init - start and test UVD block
  *
- * @adev: amdgpu_device pointer
+ * @handle: handle used to pass amdgpu_device pointer
  *
  * Initialize the hardware, boot up the VCPU and do some testing
  */
@@ -597,7 +598,7 @@ static int uvd_v7_0_hw_init(void *handle)
 /**

[PATCH 32/40] drm/amd/amdgpu/vcn_v2_0: Fix a few kernel-doc misdemeanours

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:483: warning: Excess function parameter 
'sw' description in 'vcn_v2_0_disable_clock_gating'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:644: warning: Excess function parameter 
'sw' description in 'vcn_v2_0_enable_clock_gating'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1404: warning: Function parameter or 
member 'count' not described in 'vcn_v2_0_dec_ring_insert_nop'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1426: warning: Function parameter or 
member 'addr' not described in 'vcn_v2_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1426: warning: Function parameter or 
member 'seq' not described in 'vcn_v2_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1426: warning: Function parameter or 
member 'flags' not described in 'vcn_v2_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1426: warning: Excess function parameter 
'fence' description in 'vcn_v2_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1465: warning: Function parameter or 
member 'job' not described in 'vcn_v2_0_dec_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1465: warning: Function parameter or 
member 'flags' not described in 'vcn_v2_0_dec_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1609: warning: Function parameter or 
member 'addr' not described in 'vcn_v2_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1609: warning: Function parameter or 
member 'seq' not described in 'vcn_v2_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1609: warning: Function parameter or 
member 'flags' not described in 'vcn_v2_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1609: warning: Excess function parameter 
'fence' description in 'vcn_v2_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1636: warning: Function parameter or 
member 'job' not described in 'vcn_v2_0_enc_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c:1636: warning: Function parameter or 
member 'flags' not described in 'vcn_v2_0_enc_ring_emit_ib'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
index e285f9c9d460e..5687c5ed0fb99 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c
@@ -475,7 +475,6 @@ static void vcn_v2_0_mc_resume_dpg_mode(struct 
amdgpu_device *adev, bool indirec
  * vcn_v2_0_disable_clock_gating - disable VCN clock gating
  *
  * @adev: amdgpu_device pointer
- * @sw: enable SW clock gating
  *
  * Disable clock gating for VCN block
  */
@@ -636,7 +635,6 @@ static void vcn_v2_0_clock_gating_dpg_mode(struct 
amdgpu_device *adev,
  * vcn_v2_0_enable_clock_gating - enable VCN clock gating
  *
  * @adev: amdgpu_device pointer
- * @sw: enable SW clock gating
  *
  * Enable clock gating for VCN block
  */
@@ -1397,6 +1395,7 @@ void vcn_v2_0_dec_ring_insert_end(struct amdgpu_ring 
*ring)
  * vcn_v2_0_dec_ring_insert_nop - insert a nop command
  *
  * @ring: amdgpu_ring pointer
+ * @count: the number of NOP packets to insert
  *
  * Write a nop command to the ring.
  */
@@ -1417,7 +1416,9 @@ void vcn_v2_0_dec_ring_insert_nop(struct amdgpu_ring 
*ring, uint32_t count)
  * vcn_v2_0_dec_ring_emit_fence - emit an fence & trap command
  *
  * @ring: amdgpu_ring pointer
- * @fence: fence to emit
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Write a fence and a trap command to the ring.
  */
@@ -1454,7 +1455,9 @@ void vcn_v2_0_dec_ring_emit_fence(struct amdgpu_ring 
*ring, u64 addr, u64 seq,
  * vcn_v2_0_dec_ring_emit_ib - execute indirect buffer
  *
  * @ring: amdgpu_ring pointer
+ * @job: job to retrieve vmid from
  * @ib: indirect buffer to execute
+ * @flags: unused
  *
  * Write ring commands to execute the indirect buffer
  */
@@ -1600,7 +1603,9 @@ static void vcn_v2_0_enc_ring_set_wptr(struct amdgpu_ring 
*ring)
  * vcn_v2_0_enc_ring_emit_fence - emit an enc fence & trap command
  *
  * @ring: amdgpu_ring pointer
- * @fence: fence to emit
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Write enc a fence and a trap command to the ring.
  */
@@ -1625,7 +1630,9 @@ void vcn_v2_0_enc_ring_insert_end(struct amdgpu_ring 
*ring)
  * vcn_v2_0_enc_ring_emit_ib - enc execute indirect buffer
  *
  * @ring: amdgpu_ring pointer
+ * @job: job to retrieve vmid from
  * @ib: indirect buffer to execute
+ * @flags: unused
  *
  * Write enc ring commands to execute the indirect buffer
  */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Miguel Ojeda
On Wed, Nov 25, 2020 at 12:53 AM Finn Thain  wrote:
>
> I'm saying that supporting the official language spec makes more sense
> than attempting to support a multitude of divergent interpretations of the
> spec (i.e. gcc, clang, coverity etc.)

Making the kernel strictly conforming is a ship that sailed long ago,
for several reasons. Anyway, supporting several compilers and other
tools, regardless of extensions, is valuable.

> I'm also saying that the reason why we use -std=gnu89 is that existing
> code was written in that language, not in ad hoc languages comprised of
> collections of extensions that change with every release.

No, we aren't particularly tied to `gnu89` or anything like that. We
could actually go for `gnu11` already, since the minimum GCC and Clang
support it. Even if a bit of code needs fixing, that shouldn't be a
problem if someone puts in the work.

In other words, the kernel code is not frozen, nor are the features it
uses from compilers. They do, in fact, change from time to time.

> Thank you for checking. I found a free version that's only 6 weeks old:

You're welcome! There are quite a few new attributes coming, mostly
following C++ ones.

> It will be interesting to see whether 6.7.11.5 changes once the various
> implementations reach agreement.

Not sure what you mean. The standard does not evolve through
implementations' agreement (although standardizing existing practice
is one of the best arguments to back a change).

Cheers,
Miguel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 38/40] drm/amd/pm/swsmu/smu11/navi10_ppt: Remove unused 'struct i2c_algorithm navi10_i2c_algo'

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Evan Quan 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 204 --
 1 file changed, 204 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index ef1a62e86a0ee..59bd7cd3ca8df 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -2325,210 +2325,6 @@ static int navi10_run_umc_cdr_workaround(struct 
smu_context *smu)
return 0;
 }
 
-static void navi10_fill_i2c_req(SwI2cRequest_t  *req, bool write,
- uint8_t address, uint32_t numbytes,
- uint8_t *data)
-{
-   int i;
-
-   req->I2CcontrollerPort = 0;
-   req->I2CSpeed = 2;
-   req->SlaveAddress = address;
-   req->NumCmds = numbytes;
-
-   for (i = 0; i < numbytes; i++) {
-		SwI2cCmd_t *cmd = &req->SwI2cCmds[i];
-
-   /* First 2 bytes are always write for lower 2b EEPROM address */
-   if (i < 2)
-   cmd->Cmd = 1;
-   else
-   cmd->Cmd = write;
-
-
-   /* Add RESTART for read  after address filled */
-   cmd->CmdConfig |= (i == 2 && !write) ? CMDCONFIG_RESTART_MASK : 
0;
-
-   /* Add STOP in the end */
-   cmd->CmdConfig |= (i == (numbytes - 1)) ? CMDCONFIG_STOP_MASK : 
0;
-
-   /* Fill with data regardless if read or write to simplify code 
*/
-   cmd->RegisterAddr = data[i];
-   }
-}
-
-static int navi10_i2c_read_data(struct i2c_adapter *control,
-  uint8_t address,
-  uint8_t *data,
-  uint32_t numbytes)
-{
-   uint32_t  i, ret = 0;
-   SwI2cRequest_t req;
-   struct amdgpu_device *adev = to_amdgpu_device(control);
-	struct smu_table_context *smu_table = &adev->smu.smu_table;
-	struct smu_table *table = &smu_table->driver_table;
-
-   if (numbytes > MAX_SW_I2C_COMMANDS) {
-   dev_err(adev->dev, "numbytes requested %d is over max allowed 
%d\n",
-   numbytes, MAX_SW_I2C_COMMANDS);
-   return -EINVAL;
-   }
-
-	memset(&req, 0, sizeof(req));
-	navi10_fill_i2c_req(&req, false, address, numbytes, data);
-
-	mutex_lock(&adev->smu.mutex);
-	/* Now read data starting with that address */
-	ret = smu_cmn_update_table(&adev->smu, SMU_TABLE_I2C_COMMANDS, 0, &req,
-				   true);
-	mutex_unlock(&adev->smu.mutex);
-
-   if (!ret) {
-   SwI2cRequest_t *res = (SwI2cRequest_t *)table->cpu_addr;
-
-   /* Assume SMU  fills res.SwI2cCmds[i].Data with read bytes */
-   for (i = 0; i < numbytes; i++)
-   data[i] = res->SwI2cCmds[i].Data;
-
-   dev_dbg(adev->dev, "navi10_i2c_read_data, address = %x, bytes = 
%d, data :",
- (uint16_t)address, numbytes);
-
-   print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE,
-  8, 1, data, numbytes, false);
-   } else
-   dev_err(adev->dev, "navi10_i2c_read_data - error occurred :%x", 
ret);
-
-   return ret;
-}
-
-static int navi10_i2c_write_data(struct i2c_adapter *control,
-   uint8_t address,
-   uint8_t *data,
-   uint32_t numbytes)
-{
-   uint32_t ret;
-   SwI2cRequest_t req;
-   struct amdgpu_device *adev = to_amdgpu_device(control);
-
-   if (numbytes > MAX_SW_I2C_COMMANDS) {
-   dev_err(adev->dev, "numbytes requested %d is over max allowed 
%d\n",
-   numbytes, MAX_SW_I2C_COMMANDS);
-   return -EINVAL;
-   }
-
-	memset(&req, 0, sizeof(req));
-	navi10_fill_i2c_req(&req, true, address, numbytes, data);
-
-	mutex_lock(&adev->smu.mutex);
-	ret = smu_cmn_update_table(&adev->smu, SMU_TABLE_I2C_COMMANDS, 0, &req, true);
-	mutex_unlock(&adev->smu.mutex);
-
-   if (!ret) {
-   dev_dbg(adev->dev, "navi10_i2c_write(), address = %x, bytes = 
%d , data: ",
-(uint16_t)address, numbytes);
-
-   print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE,
-  8, 1, data, numbytes, false);
-   /*
-* According to EEPROM spec there is a MAX of 10 ms required for
-* EEPROM to flush internal RX buffer after STOP was issued at 
the
-* end of write transaction. During this time the EEPROM will 
not 

[PATCH 29/40] drm/amd/amdgpu/vcn_v1_0: Fix a few kernel-doc misdemeanours

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:439: warning: Excess function parameter 
'sw' description in 'vcn_v1_0_disable_clock_gating'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:566: warning: Excess function parameter 
'sw' description in 'vcn_v1_0_enable_clock_gating'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1454: warning: Function parameter or 
member 'addr' not described in 'vcn_v1_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1454: warning: Function parameter or 
member 'seq' not described in 'vcn_v1_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1454: warning: Function parameter or 
member 'flags' not described in 'vcn_v1_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1454: warning: Excess function parameter 
'fence' description in 'vcn_v1_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1495: warning: Function parameter or 
member 'job' not described in 'vcn_v1_0_dec_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1495: warning: Function parameter or 
member 'flags' not described in 'vcn_v1_0_dec_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1628: warning: Function parameter or 
member 'addr' not described in 'vcn_v1_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1628: warning: Function parameter or 
member 'seq' not described in 'vcn_v1_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1628: warning: Function parameter or 
member 'flags' not described in 'vcn_v1_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1628: warning: Excess function parameter 
'fence' description in 'vcn_v1_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1655: warning: Function parameter or 
member 'job' not described in 'vcn_v1_0_enc_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c:1655: warning: Function parameter or 
member 'flags' not described in 'vcn_v1_0_enc_ring_emit_ib'
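
The convention kernel-doc enforces in W=1 builds is simple: one @name
line per parameter in the signature, and no stale line left behind when
a signature changes. A generic sketch of a conforming header (not taken
from the driver):

/**
 * example_ring_emit_fence - sketch of a well-formed kernel-doc header
 * @ring: ring identifier
 * @addr: GPU address to write the fence value to
 * @seq: sequence number to write
 * @flags: fence related flags
 *
 * Every parameter has a matching @name entry, in signature order.
 */
static void example_ring_emit_fence(int ring, unsigned long long addr,
				    unsigned long long seq,
				    unsigned int flags)
{
	(void)ring; (void)addr; (void)seq; (void)flags;
}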

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index 86e1ef732ebec..72148f3b27d04 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -431,7 +431,6 @@ static void vcn_v1_0_mc_resume_dpg_mode(struct 
amdgpu_device *adev)
  * vcn_v1_0_disable_clock_gating - disable VCN clock gating
  *
  * @adev: amdgpu_device pointer
- * @sw: enable SW clock gating
  *
  * Disable clock gating for VCN block
  */
@@ -558,7 +557,6 @@ static void vcn_v1_0_disable_clock_gating(struct 
amdgpu_device *adev)
  * vcn_v1_0_enable_clock_gating - enable VCN clock gating
  *
  * @adev: amdgpu_device pointer
- * @sw: enable SW clock gating
  *
  * Enable clock gating for VCN block
  */
@@ -1445,7 +1443,9 @@ static void vcn_v1_0_dec_ring_insert_end(struct 
amdgpu_ring *ring)
  * vcn_v1_0_dec_ring_emit_fence - emit an fence & trap command
  *
  * @ring: amdgpu_ring pointer
- * @fence: fence to emit
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Write a fence and a trap command to the ring.
  */
@@ -1484,7 +1484,9 @@ static void vcn_v1_0_dec_ring_emit_fence(struct 
amdgpu_ring *ring, u64 addr, u64
  * vcn_v1_0_dec_ring_emit_ib - execute indirect buffer
  *
  * @ring: amdgpu_ring pointer
+ * @job: job to retrieve vmid from
  * @ib: indirect buffer to execute
+ * @flags: unused
  *
  * Write ring commands to execute the indirect buffer
  */
@@ -1619,7 +1621,9 @@ static void vcn_v1_0_enc_ring_set_wptr(struct amdgpu_ring 
*ring)
  * vcn_v1_0_enc_ring_emit_fence - emit an enc fence & trap command
  *
  * @ring: amdgpu_ring pointer
- * @fence: fence to emit
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Write enc a fence and a trap command to the ring.
  */
@@ -1644,7 +1648,9 @@ static void vcn_v1_0_enc_ring_insert_end(struct 
amdgpu_ring *ring)
  * vcn_v1_0_enc_ring_emit_ib - enc execute indirect buffer
  *
  * @ring: amdgpu_ring pointer
+ * @job: job to retrieve vmid from
  * @ib: indirect buffer to execute
+ * @flags: unused
  *
  * Write enc ring commands to execute the indirect buffer
  */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Miguel Ojeda
On Tue, Nov 24, 2020 at 1:58 AM Finn Thain  wrote:
>
> What I meant was that you've used pessimism as if it was fact.

"future mistakes that it might prevent" is neither pessimism nor states a fact.

> For example, "There is no way to guess what the effect would be if the
> compiler trained programmers to add a knee-jerk 'break' statement to avoid
> a warning".

It is only knee-jerk if you think you are infallible.

> Moreover, what I meant was that preventing programmer mistakes is a
> problem to be solved by development tools

This warning comes from a development tool -- the compiler.

> The idea that retro-fitting new
> language constructs onto mature code is somehow necessary to "prevent
> future mistakes" is entirely questionable.

The kernel is not a frozen codebase.

Further, "mature code vs. risk of change" arguments don't apply here
because the semantics of the program and binary output isn't changing.

> Sure. And if you put -Wimplicit-fallthrough into the Makefile and if that
> leads to well-intentioned patches that cause regressions, it is partly on
> you.

Again: adding a `fallthrough` does not change the program semantics.
If you are a maintainer and want to cross-check, compare the codegen.
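
A stand-alone sketch makes the point (the kernel gets `fallthrough` from
<linux/compiler_attributes.h>; the macro below is an equivalent for
gcc >= 7 / clang). Build it with -O2 -Wimplicit-fallthrough, delete the
annotation, rebuild, and "objdump -d" shows the two objects are
identical:

#define fallthrough __attribute__((__fallthrough__))

int classify(int x)
{
	int weight = 0;

	switch (x) {
	case 0:
		weight += 2;	/* deliberately continues into case 1 */
		fallthrough;	/* silences the warning, emits nothing */
	case 1:
		weight += 1;
		return weight;
	default:
		return -1;
	}
}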

> Have you ever considered the overall cost of the countless
> -Wpresume-incompetence flags?

Yeah: negative. On the other hand, the overall cost of the countless
-fI-am-infallible flags is very noticeable.

> Perhaps you pay the power bill for a build farm that produces logs that
> no-one reads? Perhaps you've run git bisect, knowing that the compiler
> messages are not interesting? Or compiled software in using a language
> that generates impenetrable messages? If so, here's a tip:
>
> # grep CFLAGS /etc/portage/make.conf
> CFLAGS="... -Wno-all -Wno-extra ..."
> CXXFLAGS="${CFLAGS}"
>
> Now allow me some pessimism: the hardware upgrades, gigawatt hours and
> wait time attributable to obligatory static analyses are a net loss.

If you really believe compiler warnings and static analysis are
useless and costly, I think there is not much point in continuing the
discussion.

> No, it's not for me to prove that such patches don't affect code
> generation. That's for the patch author and (unfortunately) for reviewers.

I was not asking you to prove it. I am stating that proving it is very easy.

Cheers,
Miguel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 35/40] drm/amd/pm/swsmu/smu11/vangogh_ppt: Make local function 'vangogh_set_default_dpm_tables()' static

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/vangogh_ppt.c: At top level:
 drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/vangogh_ppt.c:764:5: warning: no 
previous prototype for ‘vangogh_set_default_dpm_tables’ [-Wmissing-prototypes]
 764 | int vangogh_set_default_dpm_tables(struct smu_context *smu)
 | ^~

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Xiaojian Du 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 9a2f72f21ed86..05c32be3a7496 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -400,16 +400,13 @@ static int vangogh_get_current_activity_percent(struct 
smu_context *smu,
   enum amd_pp_sensors sensor,
   uint32_t *value)
 {
-   int ret = 0;
-
if (!value)
return -EINVAL;
 
switch (sensor) {
case AMDGPU_PP_SENSOR_GPU_LOAD:
-   ret = vangogh_get_smu_metrics_data(smu,
- METRICS_AVERAGE_GFXACTIVITY,
- value);
+   vangogh_get_smu_metrics_data(smu, METRICS_AVERAGE_GFXACTIVITY,
+value);
break;
default:
dev_err(smu->adev->dev, "Invalid sensor for retrieving clock 
activity\n");
@@ -761,7 +758,7 @@ static int vangogh_od_edit_dpm_table(struct smu_context 
*smu, enum PP_OD_DPM_TAB
return ret;
 }
 
-int vangogh_set_default_dpm_tables(struct smu_context *smu)
+static int vangogh_set_default_dpm_tables(struct smu_context *smu)
 {
struct smu_table_context *smu_table = >smu_table;
 
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 31/40] drm/amd/amdgpu/jpeg_v2_0: Add some missing kernel-doc descriptions

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c:498: warning: Function parameter or 
member 'addr' not described in 'jpeg_v2_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c:498: warning: Function parameter or 
member 'seq' not described in 'jpeg_v2_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c:498: warning: Function parameter or 
member 'flags' not described in 'jpeg_v2_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c:498: warning: Excess function parameter 
'fence' description in 'jpeg_v2_0_dec_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c:549: warning: Function parameter or 
member 'job' not described in 'jpeg_v2_0_dec_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c:549: warning: Function parameter or 
member 'flags' not described in 'jpeg_v2_0_dec_ring_emit_ib'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c
index 6b80dcea80ec8..15c0224d48b06 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c
@@ -489,7 +489,9 @@ void jpeg_v2_0_dec_ring_insert_end(struct amdgpu_ring *ring)
  * jpeg_v2_0_dec_ring_emit_fence - emit an fence & trap command
  *
  * @ring: amdgpu_ring pointer
- * @fence: fence to emit
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Write a fence and a trap command to the ring.
  */
@@ -538,7 +540,9 @@ void jpeg_v2_0_dec_ring_emit_fence(struct amdgpu_ring 
*ring, u64 addr, u64 seq,
  * jpeg_v2_0_dec_ring_emit_ib - execute indirect buffer
  *
  * @ring: amdgpu_ring pointer
+ * @job: job to retrieve vmid from
  * @ib: indirect buffer to execute
+ * @flags: unused
  *
  * Write ring commands to execute the indirect buffer.
  */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Miguel Ojeda
On Mon, Nov 23, 2020 at 9:38 PM James Bottomley
 wrote:
>
> So you think a one line patch should take one minute to produce ... I
> really don't think that's grounded in reality.

No, I have not said that. Please don't put words in my mouth (again).

I have said *authoring* lines of *this* kind takes a minute per line.
Specifically: lines fixing the fallthrough warning mechanically and
repeatedly where the compiler tells you to, and doing so full-time for
a month.

For instance, take the following one from Gustavo. Are you really
saying it takes 12 minutes (your number) to write that `break;`?

diff --git a/drivers/gpu/drm/via/via_irq.c b/drivers/gpu/drm/via/via_irq.c
index 24cc445169e2..a3e0fb5b8671 100644
--- a/drivers/gpu/drm/via/via_irq.c
+++ b/drivers/gpu/drm/via/via_irq.c
@@ -364,6 +364,7 @@ int via_wait_irq(struct drm_device *dev, void *data, struct drm_file *file_priv)
 		irqwait->request.sequence +=
 			atomic_read(&cur_irq->irq_received);
irqwait->request.type &= ~_DRM_VBLANK_RELATIVE;
+   break;
case VIA_IRQ_ABSOLUTE:
break;
default:

>  I suppose a one line
> patch only takes a minute to merge with b4 if no-one reviews or tests
> it, but that's not really desirable.

I have not said that either. I said reviewing and merging those are
noise compared to any complex patch. Testing should be done by the
author comparing codegen.

> Part of what I'm trying to measure is the "and useful" bit because
> that's not a given.

It is useful since it makes intent clear. It also catches actual bugs,
which is even more valuable.

> Well, you know, subsystems are very different in terms of the amount of
> patches a maintainer has to process per release cycle of the kernel.
> If a maintainer is close to capacity, additional patches, however
> trivial, become a problem.  If a maintainer has spare cycles, trivial
> patches may look easy.

First of all, voluntary maintainers choose their own workload.
Furthermore, we already measure capacity in the `MAINTAINERS` file:
maintainers can state they can only handle a few patches. Finally, if
someone does not have time for a trivial patch, they are very unlikely
to have any time to review big ones.

> You seem to be saying that because you find it easy to merge trivial
> patches, everyone should.

Again, I have not said anything of the sort.

Cheers,
Miguel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 30/40] drm/amd/amdgpu/jpeg_v1_0: Add some missing function param descriptions

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c:219: warning: Function parameter or 
member 'addr' not described in 'jpeg_v1_0_decode_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c:219: warning: Function parameter or 
member 'seq' not described in 'jpeg_v1_0_decode_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c:219: warning: Function parameter or 
member 'flags' not described in 'jpeg_v1_0_decode_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c:219: warning: Excess function parameter 
'fence' description in 'jpeg_v1_0_decode_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c:293: warning: Function parameter or 
member 'job' not described in 'jpeg_v1_0_decode_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c:293: warning: Function parameter or 
member 'flags' not described in 'jpeg_v1_0_decode_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c:518: warning: Function parameter or 
member 'mode' not described in 'jpeg_v1_0_start'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Veerabadhran G 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c
index c600b61b5f45d..c87102b238e48 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v1_0.c
@@ -210,7 +210,9 @@ static void jpeg_v1_0_decode_ring_insert_end(struct 
amdgpu_ring *ring)
  * jpeg_v1_0_decode_ring_emit_fence - emit an fence & trap command
  *
  * @ring: amdgpu_ring pointer
- * @fence: fence to emit
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Write a fence and a trap command to the ring.
  */
@@ -282,7 +284,9 @@ static void jpeg_v1_0_decode_ring_emit_fence(struct 
amdgpu_ring *ring, u64 addr,
  * jpeg_v1_0_decode_ring_emit_ib - execute indirect buffer
  *
  * @ring: amdgpu_ring pointer
+ * @job: job to retrieve vmid from
  * @ib: indirect buffer to execute
+ * @flags: unused
  *
  * Write ring commands to execute the indirect buffer.
  */
@@ -511,6 +515,7 @@ void jpeg_v1_0_sw_fini(void *handle)
  * jpeg_v1_0_start - start JPEG block
  *
  * @adev: amdgpu_device pointer
+ * @mode: SPG or DPG mode
  *
  * Setup and start the JPEG block
  */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 09/40] drm/amd/amdgpu/vega10_ih: Add descriptions for 'ih' and 'entry'

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/vega10_ih.c:377: warning: Function parameter or 
member 'ih' not described in 'vega10_ih_get_wptr'
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c:440: warning: Function parameter or 
member 'ih' not described in 'vega10_ih_decode_iv'
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c:440: warning: Function parameter or 
member 'entry' not described in 'vega10_ih_decode_iv'
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c:480: warning: Function parameter or 
member 'ih' not described in 'vega10_ih_irq_rearm'
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c:513: warning: Function parameter or 
member 'ih' not described in 'vega10_ih_set_rptr'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Zhigang Luo 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 407c6093c2ec0..578fdee5b9758 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -366,6 +366,7 @@ static void vega10_ih_irq_disable(struct amdgpu_device 
*adev)
  * vega10_ih_get_wptr - get the IH ring buffer wptr
  *
  * @adev: amdgpu_device pointer
+ * @ih: IH ring buffer to fetch wptr
  *
  * Get the IH ring buffer wptr from either the register
  * or the writeback memory buffer (VEGA10).  Also check for
@@ -430,6 +431,8 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,
  * vega10_ih_decode_iv - decode an interrupt vector
  *
  * @adev: amdgpu_device pointer
+ * @ih: IH ring buffer to decode
+ * @entry: IV entry to place decoded information into
  *
  * Decodes the interrupt vector at the current rptr
  * position and also advance the position.
@@ -473,6 +476,7 @@ static void vega10_ih_decode_iv(struct amdgpu_device *adev,
  * vega10_ih_irq_rearm - rearm IRQ if lost
  *
  * @adev: amdgpu_device pointer
+ * @ih: IH ring to match
  *
  */
 static void vega10_ih_irq_rearm(struct amdgpu_device *adev,
@@ -505,6 +509,7 @@ static void vega10_ih_irq_rearm(struct amdgpu_device *adev,
  * vega10_ih_set_rptr - set the IH ring buffer rptr
  *
  * @adev: amdgpu_device pointer
+ * @ih: IH ring buffer to set rptr
  *
  * Set the IH ring buffer rptr.
  */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 37/40] drm/amd/pm/swsmu/smu12/renoir_ppt: Demote kernel-doc formatting abuse

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index 46c44f0abdfb8..d3641a8ed99c0 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -170,7 +170,7 @@ static int renoir_init_smc_tables(struct smu_context *smu)
return -ENOMEM;
 }
 
-/**
+/*
  * This interface just for getting uclk ultimate freq and should't introduce
  * other likewise function result in overmuch callback.
  */
@@ -656,7 +656,7 @@ static int renoir_get_power(struct smu_context *smu, 
uint32_t *value)
return 0;
 }
 
-/**
+/*
  * This interface get dpm clock table for dc
  */
 static int renoir_get_dpm_clock_table(struct smu_context *smu, struct 
dpm_clocks *clock_table)
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 28/40] drm/amd/amdgpu/gfx_v10_0: Make local function 'gfx_v10_0_rlc_stop()' static

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:5008:6: warning: no previous prototype 
for ‘gfx_v10_0_rlc_stop’ [-Wmissing-prototypes]

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index d4760f4e269a1..9eb886ae5a35e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -5005,7 +5005,7 @@ static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
return 0;
 }
 
-void gfx_v10_0_rlc_stop(struct amdgpu_device *adev)
+static void gfx_v10_0_rlc_stop(struct amdgpu_device *adev)
 {
u32 tmp = RREG32_SOC15(GC, 0, mmRLC_CNTL);
 
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Miguel Ojeda
On Tue, Nov 24, 2020 at 11:24 PM Finn Thain  wrote:
>
> These statements are not "missing" unless you presume that code written
> before the latest de facto language spec was written should somehow be
> held to that spec.

There is no "language spec" the kernel adheres to. Even if it did,
kernel code is not frozen. If an improvement is found, it should be
applied.

> If the 'fallthrough' statement is not part of the latest draft spec then
> we should ask why not before we embrace it. Being that the kernel still
> prefers -std=gnu89 you might want to consider what has prevented
> -std=gnu99 or -std=gnu2x etc.

The C standard has nothing to do with this. We use compiler extensions
of several kinds, for many years. Even discounting those extensions,
the kernel is not even conforming to C due to e.g. strict aliasing. I
am not sure what you are trying to argue here.

But, since you insist: yes, the `fallthrough` attribute is in the
current C2x draft.

Cheers,
Miguel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Finn Thain
On Tue, 24 Nov 2020, Kees Cook wrote:

> On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote:
> > Really, no ... something which produces no improvement has no value at 
> > all ... we really shouldn't be wasting maintainer time with it because 
> > it has a cost to merge.  I'm not sure we understand where the balance 
> > lies in value vs cost to merge but I am confident in the zero value 
> > case.
> 
> What? We can't measure how many future bugs aren't introduced because 
> the kernel requires explicit case flow-control statements for all new 
> code.
> 

These statements are not "missing" unless you presume that code written 
before the latest de facto language spec was written should somehow be 
held to that spec.

If the 'fallthrough' statement is not part of the latest draft spec then 
we should ask why not before we embrace it. Being that the kernel still 
prefers -std=gnu89 you might want to consider what has prevented 
-std=gnu99 or -std=gnu2x etc.

> We already enable -Wimplicit-fallthrough globally, so that's not the 
> discussion. The issue is that Clang is (correctly) even more strict than 
> GCC for this, so these are the remaining ones to fix for full Clang 
> coverage too.
> 

Seems to me you should be patching the compiler.

When you have consensus among the language lawyers you'll have more 
credibility with those being subjected to enforcement.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 19/40] drm/amd/amdgpu/sdma_v3_0: Fix incorrect param doc-rot issue

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1651: warning: Function parameter or 
member 'ib' not described in 'sdma_v3_0_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1651: warning: Excess function 
parameter 'ring' description in 'sdma_v3_0_emit_copy_buffer'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index 43410a7bccc25..8ca7fba9c035f 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -1633,7 +1633,7 @@ static void sdma_v3_0_set_irq_funcs(struct amdgpu_device 
*adev)
 /**
  * sdma_v3_0_emit_copy_buffer - copy buffer using the sDMA engine
  *
- * @ring: amdgpu_ring structure holding ring information
+ * @ib: indirect buffer to copy to
  * @src_offset: src GPU address
  * @dst_offset: dst GPU address
  * @byte_count: number of bytes to xfer
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 24/40] drm/amd/amdgpu/sdma_v5_2: Provide some missing and repair other function params

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:367: warning: Function parameter or 
member 'job' not described in 'sdma_v5_2_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:367: warning: Function parameter or 
member 'flags' not described in 'sdma_v5_2_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:429: warning: Function parameter or 
member 'addr' not described in 'sdma_v5_2_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:429: warning: Function parameter or 
member 'seq' not described in 'sdma_v5_2_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:429: warning: Function parameter or 
member 'flags' not described in 'sdma_v5_2_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:429: warning: Excess function parameter 
'fence' description in 'sdma_v5_2_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:924: warning: Function parameter or 
member 'timeout' not described in 'sdma_v5_2_ring_test_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1030: warning: Function parameter or 
member 'value' not described in 'sdma_v5_2_vm_write_pte'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1030: warning: Excess function 
parameter 'addr' description in 'sdma_v5_2_vm_write_pte'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1030: warning: Excess function 
parameter 'flags' description in 'sdma_v5_2_vm_write_pte'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1083: warning: Function parameter or 
member 'ring' not described in 'sdma_v5_2_ring_pad_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1137: warning: Function parameter or 
member 'vmid' not described in 'sdma_v5_2_ring_emit_vm_flush'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1137: warning: Function parameter or 
member 'pd_addr' not described in 'sdma_v5_2_ring_emit_vm_flush'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1137: warning: Excess function 
parameter 'vm' description in 'sdma_v5_2_ring_emit_vm_flush'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1717: warning: Function parameter or 
member 'ib' not described in 'sdma_v5_2_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1717: warning: Function parameter or 
member 'tmz' not described in 'sdma_v5_2_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1717: warning: Excess function 
parameter 'ring' description in 'sdma_v5_2_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1743: warning: Function parameter or 
member 'ib' not described in 'sdma_v5_2_emit_fill_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c:1743: warning: Excess function 
parameter 'ring' description in 'sdma_v5_2_emit_fill_buffer'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index cb5a6f1437f81..e8736360d1a9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -356,7 +356,9 @@ static void sdma_v5_2_ring_insert_nop(struct amdgpu_ring 
*ring, uint32_t count)
  * sdma_v5_2_ring_emit_ib - Schedule an IB on the DMA engine
  *
  * @ring: amdgpu ring pointer
+ * @job: job to retrieve vmid from
  * @ib: IB object to schedule
+ * @flags: unused
  *
  * Schedule an IB in the DMA ring.
  */
@@ -418,7 +420,9 @@ static void sdma_v5_2_ring_emit_hdp_flush(struct 
amdgpu_ring *ring)
  * sdma_v5_2_ring_emit_fence - emit a fence on the DMA ring
  *
  * @ring: amdgpu ring pointer
- * @fence: amdgpu fence object
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Add a DMA fence packet to the ring to write
  * the fence seq number and DMA trap packet to generate
@@ -916,6 +920,7 @@ static int sdma_v5_2_ring_test_ring(struct amdgpu_ring 
*ring)
  * sdma_v5_2_ring_test_ib - test an IB on the DMA engine
  *
  * @ring: amdgpu_ring structure holding ring information
+ * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
  *
  * Test a simple IB in the DMA ring.
  * Returns 0 on success, error on failure.
@@ -1017,10 +1022,9 @@ static void sdma_v5_2_vm_copy_pte(struct amdgpu_ib *ib,
  *
  * @ib: indirect buffer to fill with commands
  * @pe: addr of the page entry
- * @addr: dst addr to write into pe
+ * @value: dst addr to write into pe
  * @count: number of page entries to update
  * @incr: increase next addr by incr bytes
- * @flags: access flags
  *
  * Update PTEs by writing them manually using sDMA.
  */
@@ -1076,6 +1080,7 @@ static void sdma_v5_2_vm_set_pte_pde(struct amdgpu_ib *ib,
  * sdma_v5_2_ring_pad_ib - pad the IB
  *
  * @ib: indirect buffer to fill with padding
+ * @ring: amdgpu_ring structure holding ring information
  *
  * Pad the IB with NOPs to a boundary multiple of 8.

Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Kees Cook
On Mon, Nov 23, 2020 at 05:32:51PM -0800, Nick Desaulniers wrote:
> On Sun, Nov 22, 2020 at 8:17 AM Kees Cook  wrote:
> >
> > On Fri, Nov 20, 2020 at 11:51:42AM -0800, Jakub Kicinski wrote:
> > > If none of the 140 patches here fix a real bug, and there is no change
> > > to machine code then it sounds to me like a W=2 kind of a warning.
> >
> > FWIW, this series has found at least one bug so far:
> > https://lore.kernel.org/lkml/CAFCwf11izHF=g1mGry1fE5kvFFFrxzhPSM6qKAO8gxSp=kr...@mail.gmail.com/
> 
> So looks like the bulk of these are:
> switch (x) {
>   case 0:
> ++x;
>   default:
> break;
> }
> 
> I have a patch that fixes those up for clang:
> https://reviews.llvm.org/D91895

I still think this isn't right -- it's a case statement that runs off
the end without an explicit flow control determination. I think Clang is
right to warn for these, and GCC should also warn.
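
The two spellings side by side, as a generic sketch (not one of the
patched drivers):

int adjust_implicit(int x)
{
	switch (x) {
	case 0:
		++x;	/* runs off the end of the case: Clang warns */
	default:
		break;
	}
	return x;
}

int adjust_explicit(int x)
{
	switch (x) {
	case 0:
		++x;
		break;	/* explicit flow control: no warning, same codegen */
	default:
		break;
	}
	return x;
}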

-- 
Kees Cook
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 12/40] drm/amd/amdgpu/dce_v10_0: Supply description for function param 'async'

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/dce_v10_0.c:237: warning: Function parameter or 
member 'async' not described in 'dce_v10_0_page_flip'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Luben Tuikov 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/dce_v10_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
index da240f8fafcf8..7944781e1086b 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
@@ -228,6 +228,7 @@ static void dce_v10_0_pageflip_interrupt_fini(struct 
amdgpu_device *adev)
  * @adev: amdgpu_device pointer
  * @crtc_id: crtc to cleanup pageflip on
  * @crtc_base: new address of the crtc (GPU MC address)
+ * @async: asynchronous flip
  *
  * Triggers the actual pageflip by updating the primary
  * surface base address.
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Kees Cook
On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote:
> Really, no ... something which produces no improvement has no value at
> all ... we really shouldn't be wasting maintainer time with it because
> it has a cost to merge.  I'm not sure we understand where the balance
> lies in value vs cost to merge but I am confident in the zero value
> case.

What? We can't measure how many future bugs aren't introduced because the
kernel requires explicit case flow-control statements for all new code.

We already enable -Wimplicit-fallthrough globally, so that's not the
discussion. The issue is that Clang is (correctly) even more strict
than GCC for this, so these are the remaining ones to fix for full Clang
coverage too.

People have spent more time debating this already than it would have
taken to apply the patches. :)

This is about robustness and language wrangling. It's a big code-base,
and this is the price of our managing technical debt for permanent
robustness improvements. (The numbers I ran from Gustavo's earlier
patches were that about 10% of the places adjusted were identified as
legitimate bugs being fixed. This final series may be lower, but there
are still bugs being found from it -- we need to finish this and shut
the door on it for good.)

-- 
Kees Cook
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 20/40] drm/amd/amdgpu/uvd_v5_0: Fix a bunch of kernel-doc function documentation issues

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:153: warning: Function parameter or 
member 'handle' not described in 'uvd_v5_0_hw_init'
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:153: warning: Excess function parameter 
'adev' description in 'uvd_v5_0_hw_init'
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:210: warning: Function parameter or 
member 'handle' not described in 'uvd_v5_0_hw_fini'
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:210: warning: Excess function parameter 
'adev' description in 'uvd_v5_0_hw_fini'
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:463: warning: Function parameter or 
member 'addr' not described in 'uvd_v5_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:463: warning: Function parameter or 
member 'seq' not described in 'uvd_v5_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:463: warning: Function parameter or 
member 'flags' not described in 'uvd_v5_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:463: warning: Excess function parameter 
'fence' description in 'uvd_v5_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:529: warning: Function parameter or 
member 'job' not described in 'uvd_v5_0_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c:529: warning: Function parameter or 
member 'flags' not described in 'uvd_v5_0_ring_emit_ib'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Nirmoy Das 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
index 6e57001f6d0ac..3a748ec58bec5 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
@@ -145,7 +145,7 @@ static int uvd_v5_0_sw_fini(void *handle)
 /**
  * uvd_v5_0_hw_init - start and test UVD block
  *
- * @adev: amdgpu_device pointer
+ * @handle: handle used to pass amdgpu_device pointer
  *
  * Initialize the hardware, boot up the VCPU and do some testing
  */
@@ -202,7 +202,7 @@ static int uvd_v5_0_hw_init(void *handle)
 /**
  * uvd_v5_0_hw_fini - stop the hardware block
  *
- * @adev: amdgpu_device pointer
+ * @handle: handle used to pass amdgpu_device pointer
  *
  * Stop the UVD block, mark ring as not ready any more
  */
@@ -454,7 +454,9 @@ static void uvd_v5_0_stop(struct amdgpu_device *adev)
  * uvd_v5_0_ring_emit_fence - emit an fence & trap command
  *
  * @ring: amdgpu_ring pointer
- * @fence: fence to emit
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Write a fence and a trap command to the ring.
  */
@@ -518,7 +520,9 @@ static int uvd_v5_0_ring_test_ring(struct amdgpu_ring *ring)
  * uvd_v5_0_ring_emit_ib - execute indirect buffer
  *
  * @ring: amdgpu_ring pointer
+ * @job: job to retrive vmid from
  * @ib: indirect buffer to execute
+ * @flags: unused
  *
  * Write ring commands to execute the indirect buffer
  */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 11/40] drm/amd/amdgpu/psp_v11_0: Make local function 'psp_v11_0_wait_for_bootloader()' static

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c:223:5: warning: no previous prototype 
for ‘psp_v11_0_wait_for_bootloader’ [-Wmissing-prototypes]

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Hawking Zhang 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index edd2d6bd1d86a..bd4248c93c49f 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -220,7 +220,7 @@ static int psp_v11_0_init_microcode(struct psp_context *psp)
return err;
 }
 
-int psp_v11_0_wait_for_bootloader(struct psp_context *psp)
+static int psp_v11_0_wait_for_bootloader(struct psp_context *psp)
 {
struct amdgpu_device *adev = psp->adev;
 
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Finn Thain


On Wed, 25 Nov 2020, Miguel Ojeda wrote:

> 
> The C standard has nothing to do with this. We use compiler extensions 
> of several kinds, for many years. Even discounting those extensions, the 
> kernel is not even conforming to C due to e.g. strict aliasing. I am not 
> sure what you are trying to argue here.
> 

I'm saying that supporting the official language spec makes more sense 
than attempting to support a multitude of divergent interpretations of the 
spec (e.g. gcc, clang, coverity etc.)

I'm also saying that the reason why we use -std=gnu89 is that existing 
code was written in that language, not in ad hoc languages comprised of 
collections of extensions that change with every release.

> But, since you insist: yes, the `fallthrough` attribute is in the 
> current C2x draft.
> 

Thank you for checking. I found a free version that's only 6 weeks old:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2583.pdf

It will be interesting to see whether 6.7.11.5 changes once the various 
implementations reach agreement.
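
For comparison, the draft's spelling of the attribute, sketched below;
the kernel instead wraps the GNU-style attribute in its `fallthrough`
macro, and C++17 already uses the same syntax:

/* c2x-fallthrough.c - attribute per the N2583 draft, 6.7.11.5;
 * build with -std=c2x on a recent gcc or clang.
 */
int step(int n)
{
	switch (n) {
	case 0:
		n++;
		[[fallthrough]];	/* explicit: control continues into case 1 */
	case 1:
		return n + 10;
	default:
		return -1;
	}
}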
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread James Bottomley
On Tue, 2020-11-24 at 13:32 -0800, Kees Cook wrote:
> On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote:
> > Really, no ... something which produces no improvement has no value
> > at all ... we really shouldn't be wasting maintainer time with it
> > because it has a cost to merge.  I'm not sure we understand where
> > the balance lies in value vs cost to merge but I am confident in
> > the zero value case.
> 
> What? We can't measure how many future bugs aren't introduced because
> the kernel requires explicit case flow-control statements for all new
> code.

No, but we can measure how vulnerable our current coding habits are to
the mistake this warning would potentially prevent.  I don't think it's
wrong to extrapolate that if we had no instances at all of prior coding
problems we likely wouldn't have any in future either making adopting
the changes needed to enable the warning valueless ... that's the zero
value case I was referring to above.

Now, what we have seems to be about 6 cases (at least what's been shown
in this thread) where a missing break would cause potentially user
visible issues.  That means the value of this isn't zero, but it's not
a no-brainer massive win either.  That's why I think asking what we've
invested vs the return isn't a useless exercise.

> We already enable -Wimplicit-fallthrough globally, so that's not the
> discussion. The issue is that Clang is (correctly) even more strict
> than GCC for this, so these are the remaining ones to fix for full
> Clang coverage too.
> 
> People have spent more time debating this already than it would have
> taken to apply the patches. :)

You mean we've already spent 90% of the effort to come this far so we
might as well go the remaining 10% because then at least we get some
return? It's certainly a clinching argument in defence procurement ...

> This is about robustness and language wrangling. It's a big code-
> base, and this is the price of our managing technical debt for
> permanent robustness improvements. (The numbers I ran from Gustavo's
> earlier patches were that about 10% of the places adjusted were
> identified as legitimate bugs being fixed. This final series may be
> lower, but there are still bugs being found from it -- we need to
> finish this and shut the door on it for good.)

I got my six patches by analyzing the lwn.net report of the fixes that
was cited, which had 21, of which 50% didn't actually change the emitted
code and 25% didn't have a user-visible effect.

But the broader point I'm making is just because the compiler people
come up with a shiny new warning doesn't necessarily mean the problem
it's detecting is one that causes us actual problems in the code base. 
I'd really be happier if we had a theory about what classes of CVE or
bug we could eliminate before we embrace the next new warning.

James



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 17/40] drm/amd/amdgpu/sdma_v2_4: Fix a bunch of kernel-doc function documentation issues

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:254: warning: Function parameter or 
member 'job' not described in 'sdma_v2_4_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:254: warning: Function parameter or 
member 'flags' not described in 'sdma_v2_4_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:310: warning: Function parameter or 
member 'addr' not described in 'sdma_v2_4_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:310: warning: Function parameter or 
member 'seq' not described in 'sdma_v2_4_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:310: warning: Function parameter or 
member 'flags' not described in 'sdma_v2_4_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:310: warning: Excess function parameter 
'fence' description in 'sdma_v2_4_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:598: warning: Function parameter or 
member 'timeout' not described in 'sdma_v2_4_ring_test_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:747: warning: Function parameter or 
member 'ring' not described in 'sdma_v2_4_ring_pad_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:799: warning: Function parameter or 
member 'vmid' not described in 'sdma_v2_4_ring_emit_vm_flush'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:799: warning: Function parameter or 
member 'pd_addr' not described in 'sdma_v2_4_ring_emit_vm_flush'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:799: warning: Excess function parameter 
'vm' description in 'sdma_v2_4_ring_emit_vm_flush'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:1205: warning: Function parameter or 
member 'ib' not described in 'sdma_v2_4_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:1205: warning: Function parameter or 
member 'tmz' not described in 'sdma_v2_4_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:1205: warning: Excess function 
parameter 'ring' description in 'sdma_v2_4_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:1230: warning: Function parameter or 
member 'ib' not described in 'sdma_v2_4_emit_fill_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c:1230: warning: Excess function 
parameter 'ring' description in 'sdma_v2_4_emit_fill_buffer'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
index 5f304d61999eb..22e9e4fe561d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
@@ -243,7 +243,9 @@ static void sdma_v2_4_ring_insert_nop(struct amdgpu_ring 
*ring, uint32_t count)
  * sdma_v2_4_ring_emit_ib - Schedule an IB on the DMA engine
  *
  * @ring: amdgpu ring pointer
+ * @job: job to retrieve vmid from
  * @ib: IB object to schedule
+ * @flags: unused
  *
  * Schedule an IB in the DMA ring (VI).
  */
@@ -299,7 +301,9 @@ static void sdma_v2_4_ring_emit_hdp_flush(struct 
amdgpu_ring *ring)
  * sdma_v2_4_ring_emit_fence - emit a fence on the DMA ring
  *
  * @ring: amdgpu ring pointer
- * @fence: amdgpu fence object
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Add a DMA fence packet to the ring to write
  * the fence seq number and DMA trap packet to generate
@@ -590,6 +594,7 @@ static int sdma_v2_4_ring_test_ring(struct amdgpu_ring 
*ring)
  * sdma_v2_4_ring_test_ib - test an IB on the DMA engine
  *
  * @ring: amdgpu_ring structure holding ring information
+ * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
  *
  * Test a simple IB in the DMA ring (VI).
  * Returns 0 on success, error on failure.
@@ -740,6 +745,7 @@ static void sdma_v2_4_vm_set_pte_pde(struct amdgpu_ib *ib, 
uint64_t pe,
 /**
  * sdma_v2_4_ring_pad_ib - pad the IB to the required number of dw
  *
+ * @ring: amdgpu_ring structure holding ring information
  * @ib: indirect buffer to fill with padding
  *
  */
@@ -789,7 +795,8 @@ static void sdma_v2_4_ring_emit_pipeline_sync(struct 
amdgpu_ring *ring)
  * sdma_v2_4_ring_emit_vm_flush - cik vm flush using sDMA
  *
  * @ring: amdgpu_ring pointer
- * @vm: amdgpu_vm pointer
+ * @vmid: vmid number to use
+ * @pd_addr: address
  *
  * Update the page table base and flush the VM TLB
  * using sDMA (VI).
@@ -1188,10 +1195,11 @@ static void sdma_v2_4_set_irq_funcs(struct 
amdgpu_device *adev)
 /**
  * sdma_v2_4_emit_copy_buffer - copy buffer using the sDMA engine
  *
- * @ring: amdgpu_ring structure holding ring information
+ * @ib: indirect buffer to copy to
  * @src_offset: src GPU address
  * @dst_offset: dst GPU address
  * @byte_count: number of bytes to xfer
+ * @tmz: unused
  *
  * Copy GPU buffers using the DMA engine (VI).
  * Used by the amdgpu ttm 

[PATCH 36/40] drm/amd/pm/inc/smu_v11_0: Mark 'smu11_thermal_policy' as __maybe_unused

2020-11-25 Thread Lee Jones
It's used in some, but not all, source files which include 'smu_v11_0.h'.

Fixes the following W=1 kernel build warning(s):

 In file included from 
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu11/smu_v11_0.c:36:
 drivers/gpu/drm/amd/amdgpu/../pm/inc/smu_v11_0.h:61:43: warning: 
‘smu11_thermal_policy’ defined but not used [-Wunused-const-variable=]
 61 | static const struct smu_temperature_range smu11_thermal_policy[] =
 | ^~~~
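
The rule at work, sketched generically (not the real header): a const
object defined in a header is duplicated into every translation unit
that includes it, and any unit that never references its copy trips
-Wunused-const-variable; __maybe_unused marks the non-use as
intentional.

/* limits.h - sketch; in the kernel the attribute comes from
 * <linux/compiler_attributes.h>
 */
#define __maybe_unused __attribute__((__unused__))

static const int __maybe_unused default_limits[] = {
	120000, 120000, 120000,	/* every includer gets its own copy;  */
};				/* unreferenced copies no longer warn */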

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/pm/inc/smu_v11_0.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
index eff396c7a281f..9742a02e7b16b 100644
--- a/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/pm/inc/smu_v11_0.h
@@ -58,7 +58,8 @@
 #define CTF_OFFSET_HOTSPOT 5
 #define CTF_OFFSET_MEM 5
 
-static const struct smu_temperature_range smu11_thermal_policy[] =
+static const
+struct smu_temperature_range __maybe_unused smu11_thermal_policy[] =
 {
{-273150,  99000, 99000, -273150, 99000, 99000, -273150, 99000, 99000},
	{ 120000, 120000, 120000, 120000, 120000, 120000, 120000, 120000, 120000},
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 10/40] drm/amd/amdgpu/navi10_ih: Add descriptions for 'ih' and 'entry'

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/navi10_ih.c:453: warning: Function parameter or 
member 'ih' not described in 'navi10_ih_get_wptr'
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c:512: warning: Function parameter or 
member 'ih' not described in 'navi10_ih_decode_iv'
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c:512: warning: Function parameter or 
member 'entry' not described in 'navi10_ih_decode_iv'
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c:552: warning: Function parameter or 
member 'ih' not described in 'navi10_ih_irq_rearm'
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c:585: warning: Function parameter or 
member 'ih' not described in 'navi10_ih_set_rptr'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Alex Sierra 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index 837769fcb35b7..3338052b080b6 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -442,6 +442,7 @@ static void navi10_ih_irq_disable(struct amdgpu_device 
*adev)
  * navi10_ih_get_wptr - get the IH ring buffer wptr
  *
  * @adev: amdgpu_device pointer
+ * @ih: IH ring buffer to fetch wptr
  *
  * Get the IH ring buffer wptr from either the register
  * or the writeback memory buffer (NAVI10).  Also check for
@@ -502,6 +503,8 @@ static u32 navi10_ih_get_wptr(struct amdgpu_device *adev,
  * navi10_ih_decode_iv - decode an interrupt vector
  *
  * @adev: amdgpu_device pointer
+ * @ih: IH ring buffer to decode
+ * @entry: IV entry to place decoded information into
  *
  * Decodes the interrupt vector at the current rptr
  * position and also advance the position.
@@ -545,6 +548,7 @@ static void navi10_ih_decode_iv(struct amdgpu_device *adev,
  * navi10_ih_irq_rearm - rearm IRQ if lost
  *
  * @adev: amdgpu_device pointer
+ * @ih: IH ring to match
  *
  */
 static void navi10_ih_irq_rearm(struct amdgpu_device *adev,
@@ -578,6 +582,7 @@ static void navi10_ih_irq_rearm(struct amdgpu_device *adev,
  *
  * @adev: amdgpu_device pointer
  *
+ * @ih: IH ring buffer to set rptr
  * Set the IH ring buffer rptr.
  */
 static void navi10_ih_set_rptr(struct amdgpu_device *adev,
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 14/40] drm/amd/amdgpu/gfx_v9_0: Make called-by-reference only function static

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2998:6: warning: no previous prototype 
for ‘gfx_v9_0_rlc_stop’ [-Wmissing-prototypes]

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 8a6c050cb6caf..eae81fc412556 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2995,7 +2995,7 @@ static void gfx_v9_0_init_pg(struct amdgpu_device *adev)
}
 }
 
-void gfx_v9_0_rlc_stop(struct amdgpu_device *adev)
+static void gfx_v9_0_rlc_stop(struct amdgpu_device *adev)
 {
WREG32_FIELD15(GC, 0, RLC_CNTL, RLC_ENABLE_F32, 0);
gfx_v9_0_enable_gui_idle_interrupt(adev, false);
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 26/40] drm/amd/amdgpu/uvd_v6_0: Fix a bunch of kernel-doc function documentation issues

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:211: warning: Function parameter or 
member 'bo' not described in 'uvd_v6_0_enc_get_create_msg'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:211: warning: Excess function parameter 
'adev' description in 'uvd_v6_0_enc_get_create_msg'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:275: warning: Function parameter or 
member 'bo' not described in 'uvd_v6_0_enc_get_destroy_msg'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:275: warning: Excess function parameter 
'adev' description in 'uvd_v6_0_enc_get_destroy_msg'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:332: warning: Function parameter or 
member 'timeout' not described in 'uvd_v6_0_enc_ring_test_ib'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:472: warning: Function parameter or 
member 'handle' not described in 'uvd_v6_0_hw_init'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:472: warning: Excess function parameter 
'adev' description in 'uvd_v6_0_hw_init'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:541: warning: Function parameter or 
member 'handle' not described in 'uvd_v6_0_hw_fini'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:541: warning: Excess function parameter 
'adev' description in 'uvd_v6_0_hw_fini'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:900: warning: Function parameter or 
member 'addr' not described in 'uvd_v6_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:900: warning: Function parameter or 
member 'seq' not described in 'uvd_v6_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:900: warning: Function parameter or 
member 'flags' not described in 'uvd_v6_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:900: warning: Excess function parameter 
'fence' description in 'uvd_v6_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:930: warning: Function parameter or 
member 'addr' not described in 'uvd_v6_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:930: warning: Function parameter or 
member 'seq' not described in 'uvd_v6_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:930: warning: Function parameter or 
member 'flags' not described in 'uvd_v6_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:930: warning: Excess function parameter 
'fence' description in 'uvd_v6_0_enc_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:997: warning: Function parameter or 
member 'job' not described in 'uvd_v6_0_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:997: warning: Function parameter or 
member 'flags' not described in 'uvd_v6_0_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:1023: warning: Function parameter or 
member 'job' not described in 'uvd_v6_0_enc_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:1023: warning: Function parameter or 
member 'flags' not described in 'uvd_v6_0_enc_ring_emit_ib'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
index 666bfa4a0b8ea..69cf7edf4cc61 100644
--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
@@ -198,9 +198,9 @@ static int uvd_v6_0_enc_ring_test_ring(struct amdgpu_ring 
*ring)
 /**
  * uvd_v6_0_enc_get_create_msg - generate a UVD ENC create msg
  *
- * @adev: amdgpu_device pointer
  * @ring: ring we should submit the msg to
  * @handle: session handle to use
+ * @bo: amdgpu object for which we query the offset
  * @fence: optional fence to return
  *
  * Open up a stream for HW test
@@ -261,9 +261,9 @@ static int uvd_v6_0_enc_get_create_msg(struct amdgpu_ring 
*ring, uint32_t handle
 /**
  * uvd_v6_0_enc_get_destroy_msg - generate a UVD ENC destroy msg
  *
- * @adev: amdgpu_device pointer
  * @ring: ring we should submit the msg to
  * @handle: session handle to use
+ * @bo: amdgpu object for which we query the offset
  * @fence: optional fence to return
  *
  * Close up a stream for HW test or if userspace failed to do so
@@ -326,6 +326,7 @@ static int uvd_v6_0_enc_get_destroy_msg(struct amdgpu_ring 
*ring,
  * uvd_v6_0_enc_ring_test_ib - test if UVD ENC IBs are working
  *
  * @ring: the engine to test on
+ * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
  *
  */
 static int uvd_v6_0_enc_ring_test_ib(struct amdgpu_ring *ring, long timeout)
@@ -464,7 +465,7 @@ static int uvd_v6_0_sw_fini(void *handle)
 /**
  * uvd_v6_0_hw_init - start and test UVD block
  *
- * @adev: amdgpu_device pointer
+ * @handle: handle used to pass amdgpu_device pointer
  *
  * Initialize the hardware, boot up the VCPU and do some testing
  */
@@ -533,7 +534,7 @@ static int uvd_v6_0_hw_init(void *handle)
 /**
  * uvd_v6_0_hw_fini - 
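
The kernel-doc format these patches converge on gives each function a one-line
summary, one "@name:" line per actual parameter, and an optional Return:
section. A minimal sketch with a hypothetical function (not taken from any of
these patches; the ring struct is left opaque):

struct amdgpu_ring;

/**
 * demo_ring_test_ib - test IB execution on a ring
 * @ring: amdgpu_ring pointer
 * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
 *
 * Submit a simple indirect buffer and wait for it to complete.
 *
 * Return: 0 on success, negative errno on failure.
 */
static int demo_ring_test_ib(struct amdgpu_ring *ring, long timeout)
{
        (void)ring;     /* stub body; only the comment layout matters here */
        (void)timeout;
        return 0;
}

kernel-doc matches each @-line against the real parameter list, so the
comment has to track every signature change.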

[PATCH 02/40] drm/amd/amdgpu/gmc_v10_0: Supply some missing function doc descriptions

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:278: warning: Function parameter or 
member 'vmhub' not described in 'gmc_v10_0_flush_gpu_tlb'
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:278: warning: Function parameter or 
member 'flush_type' not described in 'gmc_v10_0_flush_gpu_tlb'
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:371: warning: Function parameter or 
member 'flush_type' not described in 'gmc_v10_0_flush_gpu_tlb_pasid'
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:371: warning: Function parameter or 
member 'all_hub' not described in 'gmc_v10_0_flush_gpu_tlb_pasid'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index d9399324be474..4887b0e66e975 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -270,6 +270,8 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device 
*adev, uint32_t vmid,
  *
  * @adev: amdgpu_device pointer
  * @vmid: vm instance to flush
+ * @vmhub: vmhub type
+ * @flush_type: the flush type
  *
  * Flush the TLB for the requested page table.
  */
@@ -362,6 +364,8 @@ static void gmc_v10_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
  *
  * @adev: amdgpu_device pointer
  * @pasid: pasid to be flush
+ * @flush_type: the flush type
+ * @all_hub: Used with PACKET3_INVALIDATE_TLBS_ALL_HUB()
  *
  * Flush the TLB for the requested pasid.
  */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
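
These warnings only appear in a W=1 build. Assuming an in-tree kernel build,
they can be reproduced per object file, and a single file's comments can also
be checked directly with the kernel-doc script's warnings-only mode:

  make W=1 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.o
  ./scripts/kernel-doc -none drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c

The -none mode parses the kernel-doc comments and prints only the
diagnostics, without generating any documentation output.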


[PATCH 40/40] drm/amd/amdgpu/amdgpu_uvd: Add description for amdgpu_uvd_cs_msg_decode()'s 'buf_sizes' param

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:555: warning: Function parameter or 
member 'buf_sizes' not described in 'amdgpu_uvd_cs_msg_decode'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index f8f0384a8d9ad..7c5b60e534822 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -547,6 +547,7 @@ static int amdgpu_uvd_cs_pass1(struct amdgpu_uvd_cs_ctx 
*ctx)
  *
  * @adev: amdgpu_device pointer
  * @msg: pointer to message structure
+ * @buf_sizes: placeholder to put the different buffer lengths
  *
  * Peek into the decode message and calculate the necessary buffer sizes.
  */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 25/40] drm/amd/amdgpu/amdgpu_vce: Provide some missing and repair other function params

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:97: warning: Function parameter or 
member 'size' not described in 'amdgpu_vce_sw_init'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:441: warning: Function parameter or 
member 'bo' not described in 'amdgpu_vce_get_create_msg'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:441: warning: Excess function 
parameter 'adev' description in 'amdgpu_vce_get_create_msg'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:521: warning: Function parameter or 
member 'direct' not described in 'amdgpu_vce_get_destroy_msg'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:521: warning: Excess function 
parameter 'adev' description in 'amdgpu_vce_get_destroy_msg'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:588: warning: Function parameter or 
member 'ib_idx' not described in 'amdgpu_vce_validate_bo'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:636: warning: Function parameter or 
member 'ib_idx' not described in 'amdgpu_vce_cs_reloc'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:636: warning: Function parameter or 
member 'index' not described in 'amdgpu_vce_cs_reloc'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:720: warning: Function parameter or 
member 'ib_idx' not described in 'amdgpu_vce_ring_parse_cs'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:956: warning: Function parameter or 
member 'ib_idx' not described in 'amdgpu_vce_ring_parse_cs_vm'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:1050: warning: Function parameter or 
member 'job' not described in 'amdgpu_vce_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:1050: warning: Function parameter or 
member 'flags' not described in 'amdgpu_vce_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:1066: warning: Function parameter or 
member 'addr' not described in 'amdgpu_vce_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:1066: warning: Function parameter or 
member 'seq' not described in 'amdgpu_vce_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:1066: warning: Function parameter or 
member 'flags' not described in 'amdgpu_vce_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:1066: warning: Excess function 
parameter 'fence' description in 'amdgpu_vce_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:1122: warning: Function parameter or 
member 'timeout' not described in 'amdgpu_vce_ring_test_ib'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index ecaa2d7483b20..1d8db318b0758 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -90,6 +90,7 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring 
*ring, uint32_t handle,
  * amdgpu_vce_init - allocate memory, load vce firmware
  *
  * @adev: amdgpu_device pointer
+ * @size: size for the new BO
  *
  * First step to get VCE online, allocate memory and load the firmware
  */
@@ -428,9 +429,9 @@ void amdgpu_vce_free_handles(struct amdgpu_device *adev, 
struct drm_file *filp)
 /**
  * amdgpu_vce_get_create_msg - generate a VCE create msg
  *
- * @adev: amdgpu_device pointer
  * @ring: ring we should submit the msg to
  * @handle: VCE session handle to use
+ * @bo: amdgpu object for which we query the offset
  * @fence: optional fence to return
  *
  * Open up a stream for HW test
@@ -509,9 +510,9 @@ static int amdgpu_vce_get_create_msg(struct amdgpu_ring 
*ring, uint32_t handle,
 /**
  * amdgpu_vce_get_destroy_msg - generate a VCE destroy msg
  *
- * @adev: amdgpu_device pointer
  * @ring: ring we should submit the msg to
  * @handle: VCE session handle to use
+ * @direct: direct or delayed pool
  * @fence: optional fence to return
  *
  * Close up a stream for HW test or if userspace failed to do so
@@ -576,6 +577,7 @@ static int amdgpu_vce_get_destroy_msg(struct amdgpu_ring 
*ring, uint32_t handle,
  * amdgpu_vce_cs_validate_bo - make sure not to cross 4GB boundary
  *
  * @p: parser context
+ * @ib_idx: indirect buffer to use
  * @lo: address of lower dword
  * @hi: address of higher dword
  * @size: minimum size
@@ -625,9 +627,11 @@ static int amdgpu_vce_validate_bo(struct amdgpu_cs_parser 
*p, uint32_t ib_idx,
  * amdgpu_vce_cs_reloc - command submission relocation
  *
  * @p: parser context
+ * @ib_idx: indirect buffer to use
  * @lo: address of lower dword
  * @hi: address of higher dword
  * @size: minimum size
+ * @index: bs/fb index
  *
  * Patch relocation inside command stream with real buffer address
  */
@@ -714,7 +718,7 @@ static int amdgpu_vce_validate_handle(struct 
amdgpu_cs_parser *p,
  * amdgpu_vce_cs_parse - parse and validate the command 
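
Most of the warnings fixed here come in pairs: a parameter was renamed (adev
became bo in amdgpu_vce_get_create_msg, for instance) while the comment kept
the old name, so kernel-doc reports the stale name as excess and the new one
as missing. A hypothetical reduction of the pattern:

struct ring;

/**
 * demo_get_create_msg - generate a create msg
 * @adev: device pointer
 * @handle: session handle to use
 */
static int demo_get_create_msg(struct ring *ring, unsigned int handle);

kernel-doc flags 'ring' as not described and 'adev' as an excess parameter
description; fixing one without the other still leaves a warning behind.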

[PATCH 22/40] drm/amd/amdgpu/amdgpu_uvd: Fix some function documentation headers

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:95: warning: cannot understand 
function prototype: 'struct amdgpu_uvd_cs_ctx '
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:555: warning: Function parameter or 
member 'adev' not described in 'amdgpu_uvd_cs_msg_decode'
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:1012: warning: Function parameter or 
member 'ib_idx' not described in 'amdgpu_uvd_ring_parse_cs'
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:1286: warning: Function parameter or 
member 'timeout' not described in 'amdgpu_uvd_ring_test_ib'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index f8bebf18ee362..f8f0384a8d9ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -87,7 +87,7 @@
 #define UVD_NO_OP  0x03ff
 #define UVD_BASE_SI0x3800
 
-/**
+/*
  * amdgpu_uvd_cs_ctx - Command submission parser context
  *
  * Used for emulating virtual memory support on UVD 4.2.
@@ -545,8 +545,8 @@ static int amdgpu_uvd_cs_pass1(struct amdgpu_uvd_cs_ctx 
*ctx)
 /**
  * amdgpu_uvd_cs_msg_decode - handle UVD decode message
  *
+ * @adev: amdgpu_device pointer
  * @msg: pointer to message structure
- * @buf_sizes: returned buffer sizes
  *
  * Peek into the decode message and calculate the necessary buffer sizes.
  */
@@ -1005,6 +1005,7 @@ static int amdgpu_uvd_cs_packets(struct amdgpu_uvd_cs_ctx 
*ctx,
  * amdgpu_uvd_ring_parse_cs - UVD command submission parser
  *
  * @parser: Command submission parser context
+ * @ib_idx: Which indirect buffer to use
  *
  * Parse the command stream, patch in addresses as necessary.
  */
@@ -1279,6 +1280,7 @@ void amdgpu_uvd_ring_end_use(struct amdgpu_ring *ring)
  * amdgpu_uvd_ring_test_ib - test ib execution
  *
  * @ring: amdgpu_ring pointer
+ * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
  *
  * Test if we can successfully execute an IB
  */
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
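
The first hunk works because scripts/kernel-doc only treats comments opening
with "/**" as kernel-doc; demoting the amdgpu_uvd_cs_ctx block to a plain
"/*" comment stops the script from trying (and failing) to parse it as
structure documentation. Had proper struct documentation been wanted instead,
it would name each member, roughly like this hypothetical sketch:

/**
 * struct demo_cs_ctx - command submission parser context
 * @ib_idx: indirect buffer index being parsed
 * @data: opaque parser state
 */
struct demo_cs_ctx {
        int ib_idx;
        void *data;
};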


[PATCH 16/40] drm/amd/amdgpu/gfx_v10_0: Remove a bunch of set but unused variables

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c: In function ‘gfx_v10_rlcg_wreg’:
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:1416:18: warning: variable ‘grbm_idx’ 
set but not used [-Wunused-but-set-variable]
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:1415:18: warning: variable ‘grbm_cntl’ 
set but not used [-Wunused-but-set-variable]
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:1413:15: warning: variable 
‘scratch_reg3’ set but not used [-Wunused-but-set-variable]
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:1412:15: warning: variable 
‘scratch_reg2’ set but not used [-Wunused-but-set-variable]

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index a6d03931f7fa4..d4760f4e269a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1409,23 +1409,14 @@ static void gfx_v10_rlcg_wreg(struct amdgpu_device 
*adev, u32 offset, u32 v)
 {
static void *scratch_reg0;
static void *scratch_reg1;
-   static void *scratch_reg2;
-   static void *scratch_reg3;
static void *spare_int;
-   static uint32_t grbm_cntl;
-   static uint32_t grbm_idx;
uint32_t i = 0;
uint32_t retries = 5;
 
scratch_reg0 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG0_BASE_IDX] + mmSCRATCH_REG0)*4;
scratch_reg1 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG1)*4;
-   scratch_reg2 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG2)*4;
-   scratch_reg3 = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmSCRATCH_REG1_BASE_IDX] + mmSCRATCH_REG3)*4;
spare_int = adev->rmmio + 
(adev->reg_offset[GC_HWIP][0][mmRLC_SPARE_INT_BASE_IDX] + mmRLC_SPARE_INT)*4;
 
-   grbm_cntl = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_CNTL_BASE_IDX] + 
mmGRBM_GFX_CNTL;
-   grbm_idx = adev->reg_offset[GC_HWIP][0][mmGRBM_GFX_INDEX_BASE_IDX] + 
mmGRBM_GFX_INDEX;
-
if (amdgpu_sriov_runtime(adev)) {
pr_err("shouldn't call rlcg write register during runtime\n");
return;
-- 
2.25.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
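
The warning class silenced here is easy to reproduce in isolation; a
hypothetical function that triggers it (the quoted warnings show W=1 enables
-Wunused-but-set-variable):

static int demo(void)
{
        unsigned int grbm_cntl; /* assigned below but never read */

        grbm_cntl = 42;
        return 0;               /* gcc: variable 'grbm_cntl' set but not used */
}

Deleting the dead stores, as the patch does, is the right fix when the
computed values were never consumed.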


[PATCH 18/40] drm/amd/amdgpu/sdma_v3_0: Fix a bunch of kernel-doc function documentation issues

2020-11-25 Thread Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:428: warning: Function parameter or 
member 'job' not described in 'sdma_v3_0_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:428: warning: Function parameter or 
member 'flags' not described in 'sdma_v3_0_ring_emit_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:484: warning: Function parameter or 
member 'addr' not described in 'sdma_v3_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:484: warning: Function parameter or 
member 'seq' not described in 'sdma_v3_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:484: warning: Function parameter or 
member 'flags' not described in 'sdma_v3_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:484: warning: Excess function parameter 
'fence' description in 'sdma_v3_0_ring_emit_fence'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:870: warning: Function parameter or 
member 'timeout' not described in 'sdma_v3_0_ring_test_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1018: warning: Function parameter or 
member 'ring' not described in 'sdma_v3_0_ring_pad_ib'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1070: warning: Function parameter or 
member 'vmid' not described in 'sdma_v3_0_ring_emit_vm_flush'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1070: warning: Function parameter or 
member 'pd_addr' not described in 'sdma_v3_0_ring_emit_vm_flush'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1070: warning: Excess function 
parameter 'vm' description in 'sdma_v3_0_ring_emit_vm_flush'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1643: warning: Function parameter or 
member 'ib' not described in 'sdma_v3_0_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1643: warning: Function parameter or 
member 'tmz' not described in 'sdma_v3_0_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1643: warning: Excess function 
parameter 'ring' description in 'sdma_v3_0_emit_copy_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1668: warning: Function parameter or 
member 'ib' not described in 'sdma_v3_0_emit_fill_buffer'
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c:1668: warning: Excess function 
parameter 'ring' description in 'sdma_v3_0_emit_fill_buffer'

Cc: Alex Deucher 
Cc: "Christian König" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Lee Jones 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index c59f6f6f4c091..43410a7bccc25 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -417,7 +417,9 @@ static void sdma_v3_0_ring_insert_nop(struct amdgpu_ring 
*ring, uint32_t count)
  * sdma_v3_0_ring_emit_ib - Schedule an IB on the DMA engine
  *
  * @ring: amdgpu ring pointer
 + * @job: job to retrieve vmid from
  * @ib: IB object to schedule
+ * @flags: unused
  *
  * Schedule an IB in the DMA ring (VI).
  */
@@ -473,7 +475,9 @@ static void sdma_v3_0_ring_emit_hdp_flush(struct 
amdgpu_ring *ring)
  * sdma_v3_0_ring_emit_fence - emit a fence on the DMA ring
  *
  * @ring: amdgpu ring pointer
- * @fence: amdgpu fence object
+ * @addr: address
+ * @seq: sequence number
+ * @flags: fence related flags
  *
  * Add a DMA fence packet to the ring to write
  * the fence seq number and DMA trap packet to generate
@@ -862,6 +866,7 @@ static int sdma_v3_0_ring_test_ring(struct amdgpu_ring 
*ring)
  * sdma_v3_0_ring_test_ib - test an IB on the DMA engine
  *
  * @ring: amdgpu_ring structure holding ring information
+ * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
  *
  * Test a simple IB in the DMA ring (VI).
  * Returns 0 on success, error on failure.
@@ -1011,6 +1016,7 @@ static void sdma_v3_0_vm_set_pte_pde(struct amdgpu_ib 
*ib, uint64_t pe,
 /**
  * sdma_v3_0_ring_pad_ib - pad the IB to the required number of dw
  *
+ * @ring: amdgpu_ring structure holding ring information
  * @ib: indirect buffer to fill with padding
  *
  */
@@ -1060,7 +1066,8 @@ static void sdma_v3_0_ring_emit_pipeline_sync(struct 
amdgpu_ring *ring)
  * sdma_v3_0_ring_emit_vm_flush - cik vm flush using sDMA
  *
  * @ring: amdgpu_ring pointer
- * @vm: amdgpu_vm pointer
+ * @vmid: vmid number to use
+ * @pd_addr: address
  *
  * Update the page table base and flush the VM TLB
  * using sDMA (VI).
@@ -1630,6 +1637,7 @@ static void sdma_v3_0_set_irq_funcs(struct amdgpu_device 
*adev)
  * @src_offset: src GPU address
  * @dst_offset: dst GPU address
  * @byte_count: number of bytes to xfer
+ * @tmz: unused
  *
  * Copy GPU buffers using the DMA engine (VI).
  * Used by the amdgpu ttm implementation to move pages if
@@ -1654,7 +1662,7 @@ static void sdma_v3_0_emit_copy_buffer(struct amdgpu_ib 
*ib,
 /**
  * sdma_v3_0_emit_fill_buffer - fill buffer 
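
A recurring pattern in this patch is documenting parameters as "unused"
rather than deleting them: these callbacks are installed in shared
function-pointer tables, so every implementation must keep the full signature
even when it ignores an argument. A hypothetical reduction:

struct demo_ib;

/**
 * demo_emit_fill_buffer - fill buffer with the demo engine
 * @ib: indirect buffer to fill with commands
 * @src_data: value to write to the buffer
 * @dst_offset: dst GPU address
 * @byte_count: number of bytes to xfer
 * @tmz: unused
 */
static void demo_emit_fill_buffer(struct demo_ib *ib, unsigned int src_data,
                                  unsigned long dst_offset,
                                  unsigned int byte_count, int tmz)
{
        (void)tmz;      /* this engine has no TMZ path; the slot stays for the table */
        /* command emission elided */
}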
