Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Chunming Zhou

We can have both of yours, I think.

Even switch to use spin_trylock, I think we are ok to have 
cond_resched() Xinhui added in this patch. That can give more chance to 
urgent task to use cpu.



-David

在 2020/4/9 22:59, Christian König 写道:

Why we break out the loops when there are pending bos to be released?


We do this anyway if we can't acquire the necessary locks. Freeing 
already deleted BOs is just a very lazy background work.



So it did not break anything with this patch I think.


Oh, the patch will certainly work. I'm just not sure if it's the ideal 
behavior.



https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026

This is another example of the usage of  cond_sched.


Yes, and that is also a good example of what I mean here:

	if  (!mutex_trylock 
<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fident%2Fmutex_trylock=02%7C01%7CDavid1.Zhou%40amd.com%7Cfcae774489544a033c0608d7dc969f31%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220411795246517=XACA%2BgpBkJEgPva9c7Wf6Ca1bAOrNaXARf%2B4ze1Mqyw%3D=0>(_mutex 
<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fident%2Fslab_mutex=02%7C01%7CDavid1.Zhou%40amd.com%7Cfcae774489544a033c0608d7dc969f31%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220411795256512=z9uAZriS46hBXuYeVaYkAWb%2BPLYwLcCrK%2BWa4DRO0zw%3D=0>))

/* Give up. Setup the next iteration. */
goto  out;


If the function can't acquire the lock immediately it gives up and 
waits for the next iteration.


I think it would be better if we do this in TTM as well if we spend to 
much time cleaning up old BOs.


On the other hand you are right that cond_resched() has the advantage 
that we could spend more time on cleaning up old BOs if there is 
nothing else for the CPU TODO.


Regards,
Christian.

Am 09.04.20 um 16:24 schrieb Pan, Xinhui:

https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026

This is another example of the usage of  cond_sched.

*From:* Pan, Xinhui 
*Sent:* Thursday, April 9, 2020 10:11:08 PM
*To:* Lucas Stach ; 
amd-gfx@lists.freedesktop.org ; 
Koenig, Christian 

*Cc:* dri-de...@lists.freedesktop.org 
*Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed 
delete worker
I think it doesn't matter if workitem schedule out. Even we did not 
schedule out, the workqueue itself will schedule out later.

So it did not break anything with this patch I think.

*From:* Pan, Xinhui 
*Sent:* Thursday, April 9, 2020 10:07:09 PM
*To:* Lucas Stach ; 
amd-gfx@lists.freedesktop.org ; 
Koenig, Christian 

*Cc:* dri-de...@lists.freedesktop.org 
*Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed 
delete worker

Why we break out the loops when there are pending bos to be released?

And I just checked the process_one_work. Right after the work item 
callback is called,  the workqueue itself will call cond_resched. So 
I think


*From:* Koenig, Christian 
*Sent:* Thursday, April 9, 2020 9:38:24 PM
*To:* Lucas Stach ; Pan, Xinhui 
; amd-gfx@lists.freedesktop.org 


*Cc:* dri-de...@lists.freedesktop.org 
*Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed 
delete worker

Am 09.04.20 um 15:25 schrieb Lucas Stach:
> Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> Am 09.04.20 um 03:31 schrieb xinhui pan:
>>> The delayed delete list is per device which might be very huge. 
And in

>>> a heavy workload test, the list might always not be empty. That will
>>> trigger any RCU stall warnings or softlockups in non-preemptible 
kernels

>>> Lets do schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.

Let me refine the sentence: Work items are not really supposed to be
scheduled purposely. E.g. you shouldn't call schedule() or
cond_resched() like in the case here.

Getting scheduled away because we wait for a lock is of course perfectly
fine.

>   The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.

Yeah, and exactly that's what I would say we should avoid here :)

In other words work items can be scheduled away, but they should not if
not really necessary (e.g. waiting for a lock).

Otherwise as you said new threads for work item processing are started
up and I don't think we want that.

Just returning from the work item 

Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Pan, Xinhui


> 2020年4月9日 22:59,Koenig, Christian  写道:
> 
>> Why we break out the loops when there are pending bos to be released?
> 
> We do this anyway if we can't acquire the necessary locks. Freeing already 
> deleted BOs is just a very lazy background work.

That is true. eviction will reclaim the BO resource too.

> 
>> So it did not break anything with this patch I think.
> 
> Oh, the patch will certainly work. I'm just not sure if it's the ideal 
> behavior.
> 
>> https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026
>> 
>> This is another example of the usage of  cond_sched.
> 
> Yes, and that is also a good example of what I mean here:
> 
>>  if (!mutex_trylock(_mutex))
>> 
>>  
>> /* Give up. Setup the next iteration. */
>> 
>>  
>> goto out;
> 
> If the function can't acquire the lock immediately it gives up and waits for 
> the next iteration.
> 
> I think it would be better if we do this in TTM as well if we spend to much 
> time cleaning up old BOs.

fair enough.

> 
> On the other hand you are right that cond_resched() has the advantage that we 
> could spend more time on cleaning up old BOs if there is nothing else for the 
> CPU TODO.
> 
> Regards,
> Christian.
> 
> Am 09.04.20 um 16:24 schrieb Pan, Xinhui:
>> https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026
>> 
>> This is another example of the usage of  cond_sched.
>> From: Pan, Xinhui 
>> Sent: Thursday, April 9, 2020 10:11:08 PM
>> To: Lucas Stach ; amd-gfx@lists.freedesktop.org 
>> ; Koenig, Christian 
>> Cc: dri-de...@lists.freedesktop.org 
>> Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete 
>> worker
>>  
>> I think it doesn't matter if workitem schedule out. Even we did not schedule 
>> out, the workqueue itself will schedule out later.
>> So it did not break anything with this patch I think.
>> From: Pan, Xinhui 
>> Sent: Thursday, April 9, 2020 10:07:09 PM
>> To: Lucas Stach ; amd-gfx@lists.freedesktop.org 
>> ; Koenig, Christian 
>> Cc: dri-de...@lists.freedesktop.org 
>> Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete 
>> worker
>>  
>> Why we break out the loops when there are pending bos to be released?
>> 
>> And I just checked the process_one_work. Right after the work item callback 
>> is called,  the workqueue itself will call cond_resched. So I think
>> From: Koenig, Christian 
>> Sent: Thursday, April 9, 2020 9:38:24 PM
>> To: Lucas Stach ; Pan, Xinhui ; 
>> amd-gfx@lists.freedesktop.org 
>> Cc: dri-de...@lists.freedesktop.org 
>> Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete 
>> worker
>>  
>> Am 09.04.20 um 15:25 schrieb Lucas Stach:
>> > Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> >> Am 09.04.20 um 03:31 schrieb xinhui pan:
>> >>> The delayed delete list is per device which might be very huge. And in
>> >>> a heavy workload test, the list might always not be empty. That will
>> >>> trigger any RCU stall warnings or softlockups in non-preemptible kernels
>> >>> Lets do schedule out if possible in that case.
>> >> Mhm, I'm not sure if that is actually allowed. This is called from a
>> >> work item and those are not really supposed to be scheduled away.
>> > Huh? Workitems can schedule out just fine, otherwise they would be
>> > horribly broken when it comes to sleeping locks.
>> 
>> Let me refine the sentence: Work items are not really supposed to be 
>> scheduled purposely. E.g. you shouldn't call schedule() or 
>> cond_resched() like in the case here.
>> 
>> Getting scheduled away because we wait for a lock is of course perfectly 
>> fine.
>> 
>> >   The workqueue code
>> > even has measures to keep the workqueues at the expected concurrency
>> > level by starting other workitems when one of them goes to sleep.
>> 
>> Yeah, and exactly that's what I would say we should avoid here :)
>> 
>> In other words work items can be scheduled away, but they should not if 
>> not really necessary (e.g. waiting for a lock).
>> 
>> Otherwise as you said new threads for work item processing are started 
>> up and I don't think we want that.
>> 
>> Just returning from the work item and waiting for the next cycle is most 
>> likely the better option.
>> 
>> Regards,
>> Christian.
>> 
>> >
>> > Regards,
>&g

Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Christian König

Why we break out the loops when there are pending bos to be released?


We do this anyway if we can't acquire the necessary locks. Freeing 
already deleted BOs is just a very lazy background work.



So it did not break anything with this patch I think.


Oh, the patch will certainly work. I'm just not sure if it's the ideal 
behavior.



https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026

This is another example of the usage of  cond_sched.


Yes, and that is also a good example of what I mean here:

	if  (!mutex_trylock 
<https://elixir.bootlin.com/linux/latest/ident/mutex_trylock>(_mutex 
<https://elixir.bootlin.com/linux/latest/ident/slab_mutex>))

/* Give up. Setup the next iteration. */
goto  out;


If the function can't acquire the lock immediately it gives up and waits 
for the next iteration.


I think it would be better if we do this in TTM as well if we spend to 
much time cleaning up old BOs.


On the other hand you are right that cond_resched() has the advantage 
that we could spend more time on cleaning up old BOs if there is nothing 
else for the CPU TODO.


Regards,
Christian.

Am 09.04.20 um 16:24 schrieb Pan, Xinhui:

https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026

This is another example of the usage of  cond_sched.

*From:* Pan, Xinhui 
*Sent:* Thursday, April 9, 2020 10:11:08 PM
*To:* Lucas Stach ; 
amd-gfx@lists.freedesktop.org ; Koenig, 
Christian 

*Cc:* dri-de...@lists.freedesktop.org 
*Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed 
delete worker
I think it doesn't matter if workitem schedule out. Even we did not 
schedule out, the workqueue itself will schedule out later.

So it did not break anything with this patch I think.

*From:* Pan, Xinhui 
*Sent:* Thursday, April 9, 2020 10:07:09 PM
*To:* Lucas Stach ; 
amd-gfx@lists.freedesktop.org ; Koenig, 
Christian 

*Cc:* dri-de...@lists.freedesktop.org 
*Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed 
delete worker

Why we break out the loops when there are pending bos to be released?

And I just checked the process_one_work. Right after the work item 
callback is called,  the workqueue itself will call cond_resched. So I 
think


*From:* Koenig, Christian 
*Sent:* Thursday, April 9, 2020 9:38:24 PM
*To:* Lucas Stach ; Pan, Xinhui 
; amd-gfx@lists.freedesktop.org 


*Cc:* dri-de...@lists.freedesktop.org 
*Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed 
delete worker

Am 09.04.20 um 15:25 schrieb Lucas Stach:
> Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> Am 09.04.20 um 03:31 schrieb xinhui pan:
>>> The delayed delete list is per device which might be very huge. And in
>>> a heavy workload test, the list might always not be empty. That will
>>> trigger any RCU stall warnings or softlockups in non-preemptible 
kernels

>>> Lets do schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.

Let me refine the sentence: Work items are not really supposed to be
scheduled purposely. E.g. you shouldn't call schedule() or
cond_resched() like in the case here.

Getting scheduled away because we wait for a lock is of course perfectly
fine.

>   The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.

Yeah, and exactly that's what I would say we should avoid here :)

In other words work items can be scheduled away, but they should not if
not really necessary (e.g. waiting for a lock).

Otherwise as you said new threads for work item processing are started
up and I don't think we want that.

Just returning from the work item and waiting for the next cycle is most
likely the better option.

Regards,
Christian.

>
> Regards,
> Lucas
>
>> Maybe rather change the while into while (!list_empty(>ddestroy)
>> && !should_reschedule(0)).
>>
>> Christian.
>>
>>> Signed-off-by: xinhui pan 
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
b/drivers/gpu/drm/ttm/ttm_bo.c

>>> index 9e07c3f75156..b8d853cab33b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -541,6 +541,7 @@ static bool ttm_

Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Pan, Xinhui
https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026

This is another example of the usage of  cond_sched.

From: Pan, Xinhui 
Sent: Thursday, April 9, 2020 10:11:08 PM
To: Lucas Stach ; amd-gfx@lists.freedesktop.org 
; Koenig, Christian 
Cc: dri-de...@lists.freedesktop.org 
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete 
worker

I think it doesn't matter if workitem schedule out. Even we did not schedule 
out, the workqueue itself will schedule out later.
So it did not break anything with this patch I think.

From: Pan, Xinhui 
Sent: Thursday, April 9, 2020 10:07:09 PM
To: Lucas Stach ; amd-gfx@lists.freedesktop.org 
; Koenig, Christian 
Cc: dri-de...@lists.freedesktop.org 
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete 
worker

Why we break out the loops when there are pending bos to be released?

And I just checked the process_one_work. Right after the work item callback is 
called,  the workqueue itself will call cond_resched. So I think

From: Koenig, Christian 
Sent: Thursday, April 9, 2020 9:38:24 PM
To: Lucas Stach ; Pan, Xinhui ; 
amd-gfx@lists.freedesktop.org 
Cc: dri-de...@lists.freedesktop.org 
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete 
worker

Am 09.04.20 um 15:25 schrieb Lucas Stach:
> Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> Am 09.04.20 um 03:31 schrieb xinhui pan:
>>> The delayed delete list is per device which might be very huge. And in
>>> a heavy workload test, the list might always not be empty. That will
>>> trigger any RCU stall warnings or softlockups in non-preemptible kernels
>>> Lets do schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.

Let me refine the sentence: Work items are not really supposed to be
scheduled purposely. E.g. you shouldn't call schedule() or
cond_resched() like in the case here.

Getting scheduled away because we wait for a lock is of course perfectly
fine.

>   The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.

Yeah, and exactly that's what I would say we should avoid here :)

In other words work items can be scheduled away, but they should not if
not really necessary (e.g. waiting for a lock).

Otherwise as you said new threads for work item processing are started
up and I don't think we want that.

Just returning from the work item and waiting for the next cycle is most
likely the better option.

Regards,
Christian.

>
> Regards,
> Lucas
>
>> Maybe rather change the while into while (!list_empty(>ddestroy)
>> && !should_reschedule(0)).
>>
>> Christian.
>>
>>> Signed-off-by: xinhui pan 
>>> ---
>>>drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 9e07c3f75156..b8d853cab33b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device 
>>> *bdev, bool remove_all)
>>>  }
>>>
>>>  ttm_bo_put(bo);
>>> +   cond_resched();
>>>  spin_lock(>lru_lock);
>>>  }
>>>  list_splice_tail(, >ddestroy);
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-develdata=02%7C01%7Cchristian.koenig%40amd.com%7C0a47486676a74702f05408d7dc89839c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220355504145868sdata=wbRkYBPI6mYuZjKBtQN3AGLDOwqJlWY3XUtwwSiUQHg%3Dreserved=0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Pan, Xinhui
I think it doesn't matter if workitem schedule out. Even we did not schedule 
out, the workqueue itself will schedule out later.
So it did not break anything with this patch I think.

From: Pan, Xinhui 
Sent: Thursday, April 9, 2020 10:07:09 PM
To: Lucas Stach ; amd-gfx@lists.freedesktop.org 
; Koenig, Christian 
Cc: dri-de...@lists.freedesktop.org 
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete 
worker

Why we break out the loops when there are pending bos to be released?

And I just checked the process_one_work. Right after the work item callback is 
called,  the workqueue itself will call cond_resched. So I think

From: Koenig, Christian 
Sent: Thursday, April 9, 2020 9:38:24 PM
To: Lucas Stach ; Pan, Xinhui ; 
amd-gfx@lists.freedesktop.org 
Cc: dri-de...@lists.freedesktop.org 
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete 
worker

Am 09.04.20 um 15:25 schrieb Lucas Stach:
> Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> Am 09.04.20 um 03:31 schrieb xinhui pan:
>>> The delayed delete list is per device which might be very huge. And in
>>> a heavy workload test, the list might always not be empty. That will
>>> trigger any RCU stall warnings or softlockups in non-preemptible kernels
>>> Lets do schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.

Let me refine the sentence: Work items are not really supposed to be
scheduled purposely. E.g. you shouldn't call schedule() or
cond_resched() like in the case here.

Getting scheduled away because we wait for a lock is of course perfectly
fine.

>   The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.

Yeah, and exactly that's what I would say we should avoid here :)

In other words work items can be scheduled away, but they should not if
not really necessary (e.g. waiting for a lock).

Otherwise as you said new threads for work item processing are started
up and I don't think we want that.

Just returning from the work item and waiting for the next cycle is most
likely the better option.

Regards,
Christian.

>
> Regards,
> Lucas
>
>> Maybe rather change the while into while (!list_empty(>ddestroy)
>> && !should_reschedule(0)).
>>
>> Christian.
>>
>>> Signed-off-by: xinhui pan 
>>> ---
>>>drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 9e07c3f75156..b8d853cab33b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device 
>>> *bdev, bool remove_all)
>>>  }
>>>
>>>  ttm_bo_put(bo);
>>> +   cond_resched();
>>>  spin_lock(>lru_lock);
>>>  }
>>>  list_splice_tail(, >ddestroy);
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-develdata=02%7C01%7Cchristian.koenig%40amd.com%7C0a47486676a74702f05408d7dc89839c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220355504145868sdata=wbRkYBPI6mYuZjKBtQN3AGLDOwqJlWY3XUtwwSiUQHg%3Dreserved=0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Pan, Xinhui
[AMD Official Use Only - Internal Distribution Only]

Why we break out the loops when there are pending bos to be released?

And I just checked the process_one_work. Right after the work item callback is 
called,  the workqueue itself will call cond_resched. So I think

From: Koenig, Christian 
Sent: Thursday, April 9, 2020 9:38:24 PM
To: Lucas Stach ; Pan, Xinhui ; 
amd-gfx@lists.freedesktop.org 
Cc: dri-de...@lists.freedesktop.org 
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete 
worker

Am 09.04.20 um 15:25 schrieb Lucas Stach:
> Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> Am 09.04.20 um 03:31 schrieb xinhui pan:
>>> The delayed delete list is per device which might be very huge. And in
>>> a heavy workload test, the list might always not be empty. That will
>>> trigger any RCU stall warnings or softlockups in non-preemptible kernels
>>> Lets do schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.

Let me refine the sentence: Work items are not really supposed to be
scheduled purposely. E.g. you shouldn't call schedule() or
cond_resched() like in the case here.

Getting scheduled away because we wait for a lock is of course perfectly
fine.

>   The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.

Yeah, and exactly that's what I would say we should avoid here :)

In other words work items can be scheduled away, but they should not if
not really necessary (e.g. waiting for a lock).

Otherwise as you said new threads for work item processing are started
up and I don't think we want that.

Just returning from the work item and waiting for the next cycle is most
likely the better option.

Regards,
Christian.

>
> Regards,
> Lucas
>
>> Maybe rather change the while into while (!list_empty(>ddestroy)
>> && !should_reschedule(0)).
>>
>> Christian.
>>
>>> Signed-off-by: xinhui pan 
>>> ---
>>>drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 9e07c3f75156..b8d853cab33b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device 
>>> *bdev, bool remove_all)
>>>  }
>>>
>>>  ttm_bo_put(bo);
>>> +   cond_resched();
>>>  spin_lock(>lru_lock);
>>>  }
>>>  list_splice_tail(, >ddestroy);
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-develdata=02%7C01%7Cchristian.koenig%40amd.com%7C0a47486676a74702f05408d7dc89839c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220355504145868sdata=wbRkYBPI6mYuZjKBtQN3AGLDOwqJlWY3XUtwwSiUQHg%3Dreserved=0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Christian König

Am 09.04.20 um 15:25 schrieb Lucas Stach:

Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:

Am 09.04.20 um 03:31 schrieb xinhui pan:

The delayed delete list is per device which might be very huge. And in
a heavy workload test, the list might always not be empty. That will
trigger any RCU stall warnings or softlockups in non-preemptible kernels
Lets do schedule out if possible in that case.

Mhm, I'm not sure if that is actually allowed. This is called from a
work item and those are not really supposed to be scheduled away.

Huh? Workitems can schedule out just fine, otherwise they would be
horribly broken when it comes to sleeping locks.


Let me refine the sentence: Work items are not really supposed to be 
scheduled purposely. E.g. you shouldn't call schedule() or 
cond_resched() like in the case here.


Getting scheduled away because we wait for a lock is of course perfectly 
fine.



  The workqueue code
even has measures to keep the workqueues at the expected concurrency
level by starting other workitems when one of them goes to sleep.


Yeah, and exactly that's what I would say we should avoid here :)

In other words work items can be scheduled away, but they should not if 
not really necessary (e.g. waiting for a lock).


Otherwise as you said new threads for work item processing are started 
up and I don't think we want that.


Just returning from the work item and waiting for the next cycle is most 
likely the better option.


Regards,
Christian.



Regards,
Lucas


Maybe rather change the while into while (!list_empty(>ddestroy)
&& !should_reschedule(0)).

Christian.


Signed-off-by: xinhui pan 
---
   drivers/gpu/drm/ttm/ttm_bo.c | 1 +
   1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 9e07c3f75156..b8d853cab33b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device 
*bdev, bool remove_all)
}
   
   		ttm_bo_put(bo);

+   cond_resched();
spin_lock(>lru_lock);
}
list_splice_tail(, >ddestroy);

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-develdata=02%7C01%7Cchristian.koenig%40amd.com%7C0a47486676a74702f05408d7dc89839c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637220355504145868sdata=wbRkYBPI6mYuZjKBtQN3AGLDOwqJlWY3XUtwwSiUQHg%3Dreserved=0


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Lucas Stach
Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
> Am 09.04.20 um 03:31 schrieb xinhui pan:
> > The delayed delete list is per device which might be very huge. And in
> > a heavy workload test, the list might always not be empty. That will
> > trigger any RCU stall warnings or softlockups in non-preemptible kernels
> > Lets do schedule out if possible in that case.
> 
> Mhm, I'm not sure if that is actually allowed. This is called from a 
> work item and those are not really supposed to be scheduled away.

Huh? Workitems can schedule out just fine, otherwise they would be
horribly broken when it comes to sleeping locks. The workqueue code
even has measures to keep the workqueues at the expected concurrency
level by starting other workitems when one of them goes to sleep.

Regards,
Lucas

> Maybe rather change the while into while (!list_empty(>ddestroy) 
> && !should_reschedule(0)).
> 
> Christian.
> 
> > Signed-off-by: xinhui pan 
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index 9e07c3f75156..b8d853cab33b 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device 
> > *bdev, bool remove_all)
> > }
> >   
> > ttm_bo_put(bo);
> > +   cond_resched();
> > spin_lock(>lru_lock);
> > }
> > list_splice_tail(, >ddestroy);
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Christian König

Am 09.04.20 um 03:31 schrieb xinhui pan:

The delayed delete list is per device which might be very huge. And in
a heavy workload test, the list might always not be empty. That will
trigger any RCU stall warnings or softlockups in non-preemptible kernels
Lets do schedule out if possible in that case.


Mhm, I'm not sure if that is actually allowed. This is called from a 
work item and those are not really supposed to be scheduled away.


Maybe rather change the while into while (!list_empty(>ddestroy) 
&& !should_reschedule(0)).


Christian.



Signed-off-by: xinhui pan 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 9e07c3f75156..b8d853cab33b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device 
*bdev, bool remove_all)
}
  
  		ttm_bo_put(bo);

+   cond_resched();
spin_lock(>lru_lock);
}
list_splice_tail(, >ddestroy);


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx