Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-14 Thread Andrey Grodzovsky



On 08/14/2018 11:26 AM, Christian König wrote:

Am 14.08.2018 um 17:17 schrieb Andrey Grodzovsky:


I assume that this is the only code change and no locks are taken in 
drm_sched_entity_push_job -




What are you talking about? You surely now take locks in
drm_sched_entity_push_job():

+    spin_lock(&entity->rq_lock);
+    entity->last_user = current->group_leader;
+    if (list_empty(&entity->list))


Oh, so your code in drm_sched_entity_flush still relies on my code in 
drm_sched_entity_push_job, OK.




What happens if process A runs drm_sched_entity_push_job after this code
was executed by the (dying) process B, and there are still jobs in the
queue (the wait_event terminated prematurely)? The entity has already
been removed from the rq, but bool 'first' in drm_sched_entity_push_job
will return false, so the entity will not be reinserted into the rq
entity list and no wake-up will be triggered for process A pushing a new
job.




Thought about this as well, but in this case I would say: Shit happens!

The dying process did some command submission and because of this the 
entity was killed as well when the process died and that is legitimate.




Another issue below -

Andrey


On 08/14/2018 03:05 AM, Christian König wrote:

I would rather like to avoid taking the lock in the hot path.

How about this:

    /* For killed process disable any more IBs enqueue right now */
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
            grab_lock();
            drm_sched_rq_remove_entity(entity->rq, entity);
            if (READ_ONCE(entity->last_user) != NULL)


This condition is true because process A has just now done
drm_sched_entity_push_job->WRITE_ONCE(entity->last_user,
current->group_leader);
so the line below executes and the entity is reinserted into the rq.
Let's also say that the entity job queue is empty now. For process A
bool 'first' will be true, and hence
drm_sched_entity_push_job->drm_sched_rq_add_entity(entity->rq, entity)
will also take place, causing double insertion of the entity into the
rq list.


Calling drm_sched_rq_add_entity() is harmless, it is protected against 
double insertion.
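
For reference, the guard in question looks roughly like this (a sketch
reconstructed from the gpu_scheduler.c of that time, shown only to
illustrate why a second add is harmless - not quoted from the tree):

static void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
				    struct drm_sched_entity *entity)
{
	if (!list_empty(&entity->list))
		return;	/* already on the run queue list: adding again is a no-op */

	spin_lock(&rq->lock);
	list_add_tail(&entity->list, &rq->entities);
	spin_unlock(&rq->lock);
}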


Missed that one, right...


But thinking more about it, your idea of adding a killed or finished
flag becomes more and more appealing for consistent handling here.


Christian.


So to be clear - you would like something like:

removing entity->last_user, adding a 'stopped' flag to drm_sched_entity
that gets set in drm_sched_entity_flush, and having
drm_sched_entity_push_job check 'if (entity->stopped)' and, when true,
just return some error back to user space instead of pushing the job?
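
For illustration, a minimal sketch of that idea - the 'stopped' field,
where it is checked, and the error reporting are assumptions here, not
taken from an actual patch:

/* In drm_sched_entity_flush(), for a process killed with SIGKILL: */
	if ((current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
		spin_lock(&entity->rq_lock);
		entity->stopped = true;
		drm_sched_rq_remove_entity(entity->rq, entity);
		spin_unlock(&entity->rq_lock);
	}

/* In drm_sched_entity_push_job(), inside the existing 'first' path that
 * already takes the rq lock, so the hot path stays as it is:
 */
	if (first) {
		spin_lock(&entity->rq_lock);
		if (entity->stopped) {
			spin_unlock(&entity->rq_lock);
			DRM_ERROR("Trying to push to a stopped entity\n");
			return;	/* or report an error code back to user space */
		}
		drm_sched_rq_add_entity(entity->rq, entity);
		spin_unlock(&entity->rq_lock);
		drm_sched_wakeup(entity->rq->sched);
	}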


Andrey





Andrey


                    drm_sched_rq_add_entity(entity->rq, entity);
            drop_lock();
    }

Christian.

Am 13.08.2018 um 18:43 schrieb Andrey Grodzovsky:


Attached.

If the general idea in the patch is OK I can think of a test (and maybe
add it to the libdrm amdgpu tests) to actually simulate this scenario
with 2 forked concurrent processes working on the same entity's job
queue, one dying while the other keeps pushing to the same queue. For
now I only tested it with a normal boot and running multiple glxgears
concurrently - which doesn't really exercise this code path, since I
think each of them works on its own FD.


Andrey


On 08/10/2018 09:27 AM, Christian König wrote:

Crap, yeah indeed that needs to be protected by some lock.

Going to prepare a patch for that,
Christian.

Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:


Reviewed-by: Andrey Grodzovsky 


But I still have questions about entity->last_user (didn't notice this
before) -

Looks to me there is a race condition with its current usage. Let's say
process A was preempted after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same entity (forked), is inside
drm_sched_entity_push_job, writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B from its
scheduler rq.

Looks to me like instead we should lock entity->last_user accesses
together with adds/removals of the entity to the rq.


Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible 
scenarios of processes and threads.


In any case, this check is redundant. Acked-by: Nayan Deshmukh
<nayan26deshm...@gmail.com>


Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:


Ping. Any objections to that?

Christian.

Am 03.08.2018 um 13:08 schrieb Christian König:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
>   1 file changed, 5 deletions(-)

Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-14 Thread Christian König

Am 14.08.2018 um 17:17 schrieb Andrey Grodzovsky:


I assume that this is the only code change and no locks are taken in 
drm_sched_entity_push_job -




What are you talking about? You surely now take locks in
drm_sched_entity_push_job():

+    spin_lock(&entity->rq_lock);
+    entity->last_user = current->group_leader;
+    if (list_empty(&entity->list))


What happens if process A runs drm_sched_entity_push_job after this code
was executed by the (dying) process B, and there are still jobs in the
queue (the wait_event terminated prematurely)? The entity has already
been removed from the rq, but bool 'first' in drm_sched_entity_push_job
will return false, so the entity will not be reinserted into the rq
entity list and no wake-up will be triggered for process A pushing a new
job.




Thought about this as well, but in this case I would say: Shit happens!

The dying process did some command submission and because of this the 
entity was killed as well when the process died and that is legitimate.




Another issue below -

Andrey


On 08/14/2018 03:05 AM, Christian König wrote:

I would rather like to avoid taking the lock in the hot path.

How about this:

    /* For killed process disable any more IBs enqueue right now */
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
            grab_lock();
            drm_sched_rq_remove_entity(entity->rq, entity);
            if (READ_ONCE(entity->last_user) != NULL)


This condition is true because process A has just now done
drm_sched_entity_push_job->WRITE_ONCE(entity->last_user,
current->group_leader);
so the line below executes and the entity is reinserted into the rq.
Let's also say that the entity job queue is empty now. For process A
bool 'first' will be true, and hence
drm_sched_entity_push_job->drm_sched_rq_add_entity(entity->rq, entity)
will also take place, causing double insertion of the entity into the
rq list.


Calling drm_sched_rq_add_entity() is harmless, it is protected against 
double insertion.


But thinking more about it, your idea of adding a killed or finished
flag becomes more and more appealing for consistent handling here.


Christian.



Andrey


                    drm_sched_rq_add_entity(entity->rq, entity);
            drop_lock();
    }

Christian.

Am 13.08.2018 um 18:43 schrieb Andrey Grodzovsky:


Attached.

If the general idea in the patch is OK I can think of a test (and maybe
add it to the libdrm amdgpu tests) to actually simulate this scenario
with 2 forked concurrent processes working on the same entity's job
queue, one dying while the other keeps pushing to the same queue. For
now I only tested it with a normal boot and running multiple glxgears
concurrently - which doesn't really exercise this code path, since I
think each of them works on its own FD.


Andrey


On 08/10/2018 09:27 AM, Christian König wrote:

Crap, yeah indeed that needs to be protected by some lock.

Going to prepare a patch for that,
Christian.

Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:


Reviewed-by: Andrey Grodzovsky 


But I still have questions about entity->last_user (didn't notice this
before) -

Looks to me there is a race condition with its current usage. Let's say
process A was preempted after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same entity (forked), is inside
drm_sched_entity_push_job, writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B from its
scheduler rq.

Looks to me like instead we should lock entity->last_user accesses
together with adds/removals of the entity to the rq.


Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible 
scenarios of processes and threads.


In any case, this check is redundant. Acked-by: Nayan Deshmukh
<nayan26deshm...@gmail.com>


Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:


Ping. Any objections to that?

Christian.

Am 03.08.2018 um 13:08 schrieb Christian König:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job,
>       if (first) {
>               /* Add the entity to the run queue */
>               spin_lock(&entity->rq_lock);
> -             if 

Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-14 Thread Andrey Grodzovsky
I assume that this is the only code change and no locks are taken in 
drm_sched_entity_push_job -


What happens if process A runs drm_sched_entity_push_job after this code
was executed by the (dying) process B, and there are still jobs in the
queue (the wait_event terminated prematurely)? The entity has already
been removed from the rq, but bool 'first' in drm_sched_entity_push_job
will return false, so the entity will not be reinserted into the rq
entity list and no wake-up will be triggered for process A pushing a new
job.



Another issue below -

Andrey


On 08/14/2018 03:05 AM, Christian König wrote:

I would rather like to avoid taking the lock in the hot path.

How about this:

    /* For killed process disable any more IBs enqueue right now */
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
            grab_lock();
            drm_sched_rq_remove_entity(entity->rq, entity);
            if (READ_ONCE(entity->last_user) != NULL)


This condition is true because process A has just now done
drm_sched_entity_push_job->WRITE_ONCE(entity->last_user,
current->group_leader);
so the line below executes and the entity is reinserted into the rq.
Let's also say that the entity job queue is empty now. For process A
bool 'first' will be true, and hence
drm_sched_entity_push_job->drm_sched_rq_add_entity(entity->rq, entity)
will also take place, causing double insertion of the entity into the
rq list.


Andrey


                    drm_sched_rq_add_entity(entity->rq, entity);
            drop_lock();
    }

Christian.

Am 13.08.2018 um 18:43 schrieb Andrey Grodzovsky:


Attached.

If the general idea in the patch is OK I can think of a test (and maybe
add it to the libdrm amdgpu tests) to actually simulate this scenario
with 2 forked concurrent processes working on the same entity's job
queue, one dying while the other keeps pushing to the same queue. For
now I only tested it with a normal boot and running multiple glxgears
concurrently - which doesn't really exercise this code path, since I
think each of them works on its own FD.


Andrey


On 08/10/2018 09:27 AM, Christian König wrote:

Crap, yeah indeed that needs to be protected by some lock.

Going to prepare a patch for that,
Christian.

Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:


Reviewed-by: Andrey Grodzovsky 


But I still have questions about entity->last_user (didn't notice this
before) -

Looks to me there is a race condition with its current usage. Let's say
process A was preempted after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same entity (forked), is inside
drm_sched_entity_push_job, writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B from its
scheduler rq.

Looks to me like instead we should lock entity->last_user accesses
together with adds/removals of the entity to the rq.


Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios 
of processes and threads.


In any case, this check is redundant. Acked-by: Nayan Deshmukh
<nayan26deshm...@gmail.com>


Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:


Ping. Any objections to that?

Christian.

Am 03.08.2018 um 13:08 schrieb Christian König:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job,
>       if (first) {
>               /* Add the entity to the run queue */
>               spin_lock(&entity->rq_lock);
> -             if (!entity->rq) {
> -                     DRM_ERROR("Trying to push to a killed
entity\n");
> -                     spin_unlock(&entity->rq_lock);
> -                     return;
> -             }
>               drm_sched_rq_add_entity(entity->rq, entity);
>               spin_unlock(&entity->rq_lock);
>               drm_sched_wakeup(entity->rq->sched);









___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-14 Thread Christian König

I would rather like to avoid taking the lock in the hot path.

How about this:

    /* For killed process disable any more IBs enqueue right now */
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
            grab_lock();
            drm_sched_rq_remove_entity(entity->rq, entity);
            if (READ_ONCE(entity->last_user) != NULL)
                    drm_sched_rq_add_entity(entity->rq, entity);
            drop_lock();
    }

Christian.
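
The logic above relies on the cmpxchg() semantics: it swaps only when
the current value matches the expected one and it always returns the
value that was there before the call, so last_user tells us whether
somebody else submitted after us. A standalone userspace sketch of that
pattern (plain C with a GCC builtin, purely an illustration - not the
kernel macro):

#include <assert.h>
#include <stddef.h>

static void *last_user;

/* Emulate cmpxchg(ptr, old, new): if (*ptr == old) *ptr = new;
 * always return the prior value of *ptr. */
static void *cmpxchg_ptr(void **ptr, void *old, void *new)
{
	return __sync_val_compare_and_swap(ptr, old, new);
}

int main(void)
{
	void *a = (void *)0x1;	/* stand-in for the dying submitter A */
	void *b = (void *)0x2;	/* stand-in for another submitter B   */

	/* A really was the last submitter: swap succeeds, the returned
	 * value equals a and last_user is now NULL. */
	last_user = a;
	assert(cmpxchg_ptr(&last_user, a, NULL) == a);
	assert(last_user == NULL);

	/* B submitted after A: swap fails, the returned value is b (so
	 * the "last_user == current" test fails) and last_user keeps B. */
	last_user = b;
	assert(cmpxchg_ptr(&last_user, a, NULL) == b);
	assert(last_user == b);

	return 0;
}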

Am 13.08.2018 um 18:43 schrieb Andrey Grodzovsky:


Attached.

If the general idea in the patch is OK I can think of a test (and maybe
add it to the libdrm amdgpu tests) to actually simulate this scenario
with 2 forked concurrent processes working on the same entity's job
queue, one dying while the other keeps pushing to the same queue. For
now I only tested it with a normal boot and running multiple glxgears
concurrently - which doesn't really exercise this code path, since I
think each of them works on its own FD.


Andrey


On 08/10/2018 09:27 AM, Christian König wrote:

Crap, yeah indeed that needs to be protected by some lock.

Going to prepare a patch for that,
Christian.

Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:


Reviewed-by: Andrey Grodzovsky 


But I still have questions about entity->last_user (didn't notice this
before) -

Looks to me there is a race condition with its current usage. Let's say
process A was preempted after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same entity (forked), is inside
drm_sched_entity_push_job, writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B from its
scheduler rq.

Looks to me like instead we should lock entity->last_user accesses
together with adds/removals of the entity to the rq.


Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios 
of processes and threads.


In any case, this check is redundant. Acked-by: Nayan Deshmukh
<nayan26deshm...@gmail.com>


Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:


Ping. Any objections to that?

Christian.

Am 03.08.2018 um 13:08 schrieb Christian König:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job,
>       if (first) {
>               /* Add the entity to the run queue */
>               spin_lock(&entity->rq_lock);
> -             if (!entity->rq) {
> -                     DRM_ERROR("Trying to push to a killed
entity\n");
> -                     spin_unlock(&entity->rq_lock);
> -                     return;
> -             }
>               drm_sched_rq_add_entity(entity->rq, entity);
>               spin_unlock(&entity->rq_lock);
>               drm_sched_wakeup(entity->rq->sched);









___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-13 Thread Andrey Grodzovsky

Attached.

If the general idea in the patch is OK I can think of a test (and maybe
add it to the libdrm amdgpu tests) to actually simulate this scenario
with 2 forked concurrent processes working on the same entity's job
queue, one dying while the other keeps pushing to the same queue. For
now I only tested it with a normal boot and running multiple glxgears
concurrently - which doesn't really exercise this code path, since I
think each of them works on its own FD.


Andrey


On 08/10/2018 09:27 AM, Christian König wrote:

Crap, yeah indeed that needs to be protected by some lock.

Going to prepare a patch for that,
Christian.

Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:


Reviewed-by: Andrey Grodzovsky 


But I still have questions about entity->last_user (didn't notice this
before) -

Looks to me there is a race condition with its current usage. Let's say
process A was preempted after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same entity (forked), is inside
drm_sched_entity_push_job, writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B from its
scheduler rq.

Looks to me like instead we should lock entity->last_user accesses
together with adds/removals of the entity to the rq.


Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios 
of processes and threads.


In any case, this check is redundant. Acked-by: Nayan Deshmukh
<nayan26deshm...@gmail.com>


Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:


Ping. Any objections to that?

Christian.

Am 03.08.2018 um 13:08 schrieb Christian König:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job,
>       if (first) {
>               /* Add the entity to the run queue */
>               spin_lock(&entity->rq_lock);
> -             if (!entity->rq) {
> -                     DRM_ERROR("Trying to push to a killed
entity\n");
> -                     spin_unlock(&entity->rq_lock);
> -                     return;
> -             }
>               drm_sched_rq_add_entity(entity->rq, entity);
>               spin_unlock(&entity->rq_lock);
>               drm_sched_wakeup(entity->rq->sched);







>From c581a410e1b3f82de2bb14746d8484db2162f82c Mon Sep 17 00:00:00 2001
From: Andrey Grodzovsky 
Date: Mon, 13 Aug 2018 12:33:29 -0400
Subject: drm/scheduler: Fix possible race condition.

Problem:
The possible race scenario is as follows -
Process A was preempted after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same entity (forked), is inside
drm_sched_entity_push_job, writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B
from its scheduler rq.

Fix:
Lock together entity->last_user accesses and adds/removals
of entity to the rq.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index f566405..5649c3d 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -311,10 +311,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
 
 
 	/* For killed process disable any more IBs enqueue right now */
+	spin_lock(&entity->rq_lock);
 	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
 	if ((!last_user || last_user == current->group_leader) &&
 	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
 		drm_sched_rq_remove_entity(entity->rq, entity);
+	spin_unlock(&entity->rq_lock);
 
 	return ret;
 }
@@ -596,17 +598,23 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 
 	trace_drm_sched_job(sched_job, entity);
 	atomic_inc(&entity->rq->sched->num_jobs);
-	WRITE_ONCE(entity->last_user, current->group_leader);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
 
-	/* first job wakes up scheduler */
-	if (first) {
-		/* Add the entity to the run queue */
-		spin_lock(&entity->rq_lock);
+	/*
+	 * entity might not be attached to rq because it's the first time we use it
+	 * or because another process removed the 

Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-10 Thread Andrey Grodzovsky

I can take care of this.

Andrey


On 08/10/2018 09:27 AM, Christian König wrote:

Crap, yeah indeed that needs to be protected by some lock.

Going to prepare a patch for that,
Christian.

Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:


Reviewed-by: Andrey Grodzovsky 


But I still have questions about entity->last_user (didn't notice this
before) -

Looks to me there is a race condition with its current usage. Let's say
process A was preempted after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same entity (forked), is inside
drm_sched_entity_push_job, writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B from its
scheduler rq.

Looks to me like instead we should lock entity->last_user accesses
together with adds/removals of the entity to the rq.


Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios 
of processes and threads.


In any case, this check is redundant. Acked-by: Nayan Deshmukh
<nayan26deshm...@gmail.com>


Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:


Ping. Any objections to that?

Christian.

Am 03.08.2018 um 13:08 schrieb Christian König:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job,
>       if (first) {
>               /* Add the entity to the run queue */
>               spin_lock(&entity->rq_lock);
> -             if (!entity->rq) {
> -                     DRM_ERROR("Trying to push to a killed
entity\n");
> -                     spin_unlock(&entity->rq_lock);
> -                     return;
> -             }
>               drm_sched_rq_add_entity(entity->rq, entity);
>               spin_unlock(&entity->rq_lock);
>               drm_sched_wakeup(entity->rq->sched);







___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-10 Thread Christian König

Crap, yeah indeed that needs to be protected by some lock.

Going to prepare a patch for that,
Christian.

Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:


Reviewed-by: Andrey Grodzovsky 


But I still have questions about entity->last_user (didn't notice this
before) -

Looks to me there is a race condition with its current usage. Let's say
process A was preempted after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same entity (forked), is inside
drm_sched_entity_push_job, writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B from its
scheduler rq.

Looks to me like instead we should lock entity->last_user accesses
together with adds/removals of the entity to the rq.


Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios of 
processes and threads.


In any case, this check is redundant. Acked-by: Nayan Deshmukh
<nayan26deshm...@gmail.com>


Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:


Ping. Any objections to that?

Christian.

Am 03.08.2018 um 13:08 schrieb Christian König:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job,
>       if (first) {
>               /* Add the entity to the run queue */
>               spin_lock(&entity->rq_lock);
> -             if (!entity->rq) {
> -                     DRM_ERROR("Trying to push to a killed
entity\n");
> -                     spin_unlock(&entity->rq_lock);
> -                     return;
> -             }
>               drm_sched_rq_add_entity(entity->rq, entity);
>               spin_unlock(&entity->rq_lock);
>               drm_sched_wakeup(entity->rq->sched);





___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-09 Thread Andrey Grodzovsky

Reviewed-by: Andrey Grodzovsky 


But I still have questions about entity->last_user (didn't notice this
before) -

Looks to me there is a race condition with its current usage. Let's say
process A was preempted after doing drm_sched_entity_flush->cmpxchg(...);
now process B, working on the same entity (forked), is inside
drm_sched_entity_push_job, writes its PID to entity->last_user and also
executes drm_sched_rq_add_entity. Now process A runs again and executes
drm_sched_rq_remove_entity, inadvertently removing process B from its
scheduler rq.

Looks to me like instead we should lock entity->last_user accesses
together with adds/removals of the entity to the rq.


Andrey
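
Spelled out as an interleaving (a reconstruction of the scenario above,
not a trace):

/* Process A (dying), in drm_sched_entity_flush(): */
	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
	/* matches A's group_leader, so A will go on to remove the entity
	 * from the rq... but A is preempted right here. */

/* Process B (forked, same entity), in drm_sched_entity_push_job(): */
	WRITE_ONCE(entity->last_user, current->group_leader);	/* now B */
	drm_sched_rq_add_entity(entity->rq, entity);	/* B expects to stay on the rq */

/* Process A resumes and finishes drm_sched_entity_flush(): */
	drm_sched_rq_remove_entity(entity->rq, entity);	/* B silently loses its rq slot */

/* B's queued jobs now sit on an entity that no run queue will pick
 * again - hence the suggestion to take one lock around both the
 * last_user access and the rq add/remove. */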


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios of 
processes and threads.


In any case, this check is redundant. Acked-by: Nayan Deshmukh
<nayan26deshm...@gmail.com>


Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:


Ping. Any objections to that?

Christian.

Am 03.08.2018 um 13:08 schrieb Christian König:
> That is superfluous now.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
>   1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 85908c7f913e..65078dd3c82c 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job,
>       if (first) {
>               /* Add the entity to the run queue */
>               spin_lock(&entity->rq_lock);
> -             if (!entity->rq) {
> -                     DRM_ERROR("Trying to push to a killed
entity\n");
> -                     spin_unlock(&entity->rq_lock);
> -                     return;
> -             }
>               drm_sched_rq_add_entity(entity->rq, entity);
>               spin_unlock(&entity->rq_lock);
>               drm_sched_wakeup(entity->rq->sched);



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-06 Thread Nayan Deshmukh
I forgot about this since we started discussing possible scenarios of
processes and threads.

In any case, this check is redundant. Acked-by: Nayan Deshmukh <
nayan26deshm...@gmail.com>

Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Ping. Any objections to that?
>
> Christian.
>
> Am 03.08.2018 um 13:08 schrieb Christian König:
> > That is superfluous now.
> >
> > Signed-off-by: Christian König 
> > ---
> >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
> >   1 file changed, 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > index 85908c7f913e..65078dd3c82c 100644
> > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job
> *sched_job,
> >   if (first) {
> >   /* Add the entity to the run queue */
> >   spin_lock(&entity->rq_lock);
> > - if (!entity->rq) {
> > - DRM_ERROR("Trying to push to a killed entity\n");
> > - spin_unlock(&entity->rq_lock);
> > - return;
> > - }
> >   drm_sched_rq_add_entity(entity->rq, entity);
> >   spin_unlock(&entity->rq_lock);
> >   drm_sched_wakeup(entity->rq->sched);
>
>
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/scheduler: Remove entity->rq NULL check

2018-08-06 Thread Christian König

Ping. Any objections to that?

Christian.

Am 03.08.2018 um 13:08 schrieb Christian König:

That is superfluous now.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -
  1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 85908c7f913e..65078dd3c82c 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job,
 	if (first) {
 		/* Add the entity to the run queue */
 		spin_lock(&entity->rq_lock);
-		if (!entity->rq) {
-			DRM_ERROR("Trying to push to a killed entity\n");
-			spin_unlock(&entity->rq_lock);
-			return;
-		}
 		drm_sched_rq_add_entity(entity->rq, entity);
 		spin_unlock(&entity->rq_lock);
 		drm_sched_wakeup(entity->rq->sched);


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel