Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Andrey Grodzovsky



On 05/16/2018 09:16 AM, Lucas Stach wrote:

Am Mittwoch, den 16.05.2018, 15:10 +0200 schrieb Christian König:

Am 16.05.2018 um 15:00 schrieb Lucas Stach:

Am Mittwoch, den 16.05.2018, 14:32 +0200 schrieb Christian König:

Am 16.05.2018 um 14:28 schrieb Lucas Stach:

Am Mittwoch, den 16.05.2018, 14:08 +0200 schrieb Christian König:

Yes, exactly.

For normal user space command submission we should have tons of
locks
guaranteeing that (e.g. just the VM lock should do).

For kernel moves we have the mutex for the GTT windows which
protects
it.

There could be problems with the UVD/VCE queues cleaning up the handles when an application crashes.

FWIW, etnaviv is currently completely unlocked in this path, but I
believe that this isn't an issue as the sched fence seq numbers are
per
entity. So to actually end up with reversed seqnos one context has
to
preempt itself to do another submit, while the current one hasn't
returned from kernel space, which I believe is a fairly theoretical
issue. Is my understanding correct?

Yes. The problem is with the right timing this can be used to access
freed up memory.

If you then manage to place a page table in that freed up memory
taking
over the system is just a typing exercise.

Thanks. I believe we don't have this problem in etnaviv, as memory
referencing is tied to the job and will only be unreferenced on
free_job, but I'll re-check this when I've got some time.

Well that depends on how you use the sequence numbers.

If you use them for a job completion check somewhere then you certainly have a problem if job A gets seqno 1 and B gets seqno 2, but B completes before A.

We don't do this anymore. All the etnaviv internals are driven by the
fence signal callbacks.


At bare minimum that's still a bug we need to fix because it confuses
functions like dma_fence_is_later() and dma_fence_later().

Agreed. Also amending the documentation to state that
drm_sched_job_init() and drm_sched_entity_push_job() need to be called
under a common lock seems like a good idea.


I will respin the patch with added documentation.

Andrey


Regards,
Lucas


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Lucas Stach
Am Mittwoch, den 16.05.2018, 15:10 +0200 schrieb Christian König:
> Am 16.05.2018 um 15:00 schrieb Lucas Stach:
> > Am Mittwoch, den 16.05.2018, 14:32 +0200 schrieb Christian König:
> > > Am 16.05.2018 um 14:28 schrieb Lucas Stach:
> > > > Am Mittwoch, den 16.05.2018, 14:08 +0200 schrieb Christian König:
> > > > > Yes, exactly.
> > > > > 
> > > > > For normal user space command submission we should have tons of
> > > > > locks
> > > > > guaranteeing that (e.g. just the VM lock should do).
> > > > > 
> > > > > For kernel moves we have the mutex for the GTT windows which
> > > > > protects
> > > > > it.
> > > > > 
> > > > > There could be problems with the UVD/VCE queues cleaning up the handles when an application crashes.
> > > > 
> > > > FWIW, etnaviv is currently completely unlocked in this path, but I
> > > > believe that this isn't an issue as the sched fence seq numbers are
> > > > per
> > > > entity. So to actually end up with reversed seqnos one context has
> > > > to
> > > > preempt itself to do another submit, while the current one hasn't
> > > > returned from kernel space, which I believe is a fairly theoretical
> > > > issue. Is my understanding correct?
> > > 
> > > Yes. The problem is with the right timing this can be used to access
> > > freed up memory.
> > > 
> > > If you then manage to place a page table in that freed up memory
> > > taking
> > > over the system is just a typing exercise.
> > 
> > Thanks. I believe we don't have this problem in etnaviv, as memory
> > referencing is tied to the job and will only be unreferenced on
> > free_job, but I'll re-check this when I've got some time.
> 
> Well that depends on how you use the sequence numbers.
> 
> If you use them for a job completion check somewhere then you certainly have a problem if job A gets seqno 1 and B gets seqno 2, but B completes before A.

We don't do this anymore. All the etnaviv internals are driven by the
fence signal callbacks.

> At bare minimum that's still a bug we need to fix because it confuses 
> functions like dma_fence_is_later() and dma_fence_later().

Agreed. Also amending the documentation to state that
drm_sched_job_init() and drm_sched_entity_push_job() need to be called
under a common lock seems like a good idea.

Regards,
Lucas


Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Christian König

Am 16.05.2018 um 15:00 schrieb Lucas Stach:

Am Mittwoch, den 16.05.2018, 14:32 +0200 schrieb Christian König:

Am 16.05.2018 um 14:28 schrieb Lucas Stach:

Am Mittwoch, den 16.05.2018, 14:08 +0200 schrieb Christian König:

Yes, exactly.

For normal user space command submission we should have tons of
locks
guaranteeing that (e.g. just the VM lock should do).

For kernel moves we have the mutex for the GTT windows which
protects
it.

There could be problems with the UVD/VCE queues cleaning up the handles when an application crashes.

FWIW, etnaviv is currently completely unlocked in this path, but I
believe that this isn't an issue as the sched fence seq numbers are
per
entity. So to actually end up with reversed seqnos one context has
to
preempt itself to do another submit, while the current one hasn't
returned from kernel space, which I believe is a fairly theoretical
issue. Is my understanding correct?

Yes. The problem is with the right timing this can be used to access
freed up memory.

If you then manage to place a page table in that freed up memory
taking
over the system is just a typing exercise.

Thanks. I believe we don't have this problem in etnaviv, as memory
referencing is tied to the job and will only be unreferenced on
free_job, but I'll re-check this when I've got some time.


Well that depends on how you use the sequence numbers.

If you use them for a job completion check somewhere then you certainly have a problem if job A gets seqno 1 and B gets seqno 2, but B completes before A.


At bare minimum that's still a bug we need to fix because it confuses 
functions like dma_fence_is_later() and dma_fence_later().


Christian.



Regards,
Lucas




Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Lucas Stach
Am Mittwoch, den 16.05.2018, 14:32 +0200 schrieb Christian König:
> Am 16.05.2018 um 14:28 schrieb Lucas Stach:
> > Am Mittwoch, den 16.05.2018, 14:08 +0200 schrieb Christian König:
> > > Yes, exactly.
> > > 
> > > For normal user space command submission we should have tons of
> > > locks
> > > guaranteeing that (e.g. just the VM lock should do).
> > > 
> > > For kernel moves we have the mutex for the GTT windows which
> > > protects
> > > it.
> > > 
> > > > > There could be problems with the UVD/VCE queues cleaning up the handles when an application crashes.
> > 
> > FWIW, etnaviv is currently completely unlocked in this path, but I
> > believe that this isn't an issue as the sched fence seq numbers are
> > per
> > entity. So to actually end up with reversed seqnos one context has
> > to
> > preempt itself to do another submit, while the current one hasn't
> > returned from kernel space, which I believe is a fairly theoretical
> > issue. Is my understanding correct?
> 
> Yes. The problem is with the right timing this can be used to access 
> freed up memory.
> 
> If you then manage to place a page table in that freed up memory
> taking 
> over the system is just a typing exercise.

Thanks. I believe we don't have this problem in etnaviv, as memory
referencing is tied to the job and will only be unreferenced on
free_job, but I'll re-check this when I've got some time.

Regards,
Lucas


Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Christian König

Am 16.05.2018 um 14:28 schrieb Lucas Stach:

Am Mittwoch, den 16.05.2018, 14:08 +0200 schrieb Christian König:

Yes, exactly.

For normal user space command submission we should have tons of
locks
guaranteeing that (e.g. just the VM lock should do).

For kernel moves we have the mutex for the GTT windows which protects
it.

There could be problems with the UVD/VCE queues cleaning up the handles when an application crashes.

FWIW, etnaviv is currently completely unlocked in this path, but I
believe that this isn't an issue as the sched fence seq numbers are per
entity. So to actually end up with reversed seqnos one context has to
preempt itself to do another submit, while the current one hasn't
returned from kernel space, which I believe is a fairly theoretical
issue. Is my understanding correct?


Yes. The problem is with the right timing this can be used to access 
freed up memory.


If you then manage to place a page table in that freed up memory taking 
over the system is just a typing exercise.


Regards,
Christian.



Regards,
Lucas


Am 16.05.2018 um 13:47 schrieb Andrey Grodzovsky:

So are you saying that you expect this to already be in the code for any usage of drm_sched_fence_create() and drm_sched_entity_push_job()?

lock()

drm_sched_fence_create()

... (some code)

drm_sched_entity_push_job()

unlock()


Andrey

On 05/16/2018 07:23 AM, Christian König wrote:

drm_sched_fence_create() assigns a sequence number to the fence
it
creates.

Now drm_sched_fence_create() is called by drm_sched_job_init()
to
initialize the jobs we want to push on the scheduler queue.

When you now call drm_sched_entity_push_job() without a protecting lock it can happen that you push two (or even more) jobs with reversed sequence numbers.

Since the sequence numbers are used to determine job completion
order
reversing them can seriously mess things up.

So the spin lock should be superfluous, if it isn't we have a
much
larger bug we need to fix.

Christian.

Am 16.05.2018 um 13:15 schrieb Andrey Grodzovsky:

Can you please elaborate more, maybe give an example - I don't
understand yet the problematic scenario.

Andrey


On 05/16/2018 02:50 AM, Christian König wrote:

No, that spinlock is indeed incorrect.

See even when we protect the spsc queue with a spinlock that
doesn't make it correct. It can happen that the jobs pushed
to the
queue are reversed in their sequence order and that can
cause
severe problems in the memory management.

Christian.

Am 16.05.2018 um 05:33 schrieb Grodzovsky, Andrey:

Yeah, that's what I am not sure about... It's lockless in a
sense of
single producer single consumer but not for multiple
concurrent
producers... So now I think this spinlock should stay
there... It
just looked useless to me at first sight...

Andrey


From: Zhou, David(ChunMing)
Sent: 15 May 2018 23:04:44
To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org;
dri-de...@lists.freedesktop.org
Cc: Koenig, Christian
Subject: Re: [PATCH 2/2] drm/scheduler: Remove obsolete
spinlock.



On 2018年05月16日 03:31, Andrey Grodzovsky wrote:

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
    drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
    include/drm/gpu_scheduler.h               | 1 -
    2 files changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 1f1dd70..2569a63 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
 	entity->last_scheduled = NULL;
 
 	spin_lock_init(&entity->rq_lock);
-	spin_lock_init(&entity->queue_lock);
 	spsc_queue_init(&entity->job_queue);
 
 	atomic_set(&entity->fence_seq, 0);
@@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 
 	trace_drm_sched_job(sched_job, entity);
 
-	spin_lock(&entity->queue_lock);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
-	spin_unlock(&entity->queue_lock);

Is your spsc queue safe to be pushed to simultaneously?

Regards,
David Zhou

-
 	/* first job wakes up scheduler */
 	if (first) {
 		/* Add the entity to the run queue */
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 350a62c..683eb65 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -56,7 +56,6 @@ struct drm_sched_entity {
 	spinlock_t			rq_lock;
 	struct drm_gpu_scheduler	*sched;
 
-	spinlock_t			queue_lock;
 	struct spsc_queue		job_queue;
 
 	atomic_t			fence_seq;


Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Lucas Stach
Am Mittwoch, den 16.05.2018, 14:08 +0200 schrieb Christian König:
> Yes, exactly.
> 
> For normal user space command submission we should have tons of
> locks 
> guaranteeing that (e.g. just the VM lock should do).
> 
> For kernel moves we have the mutex for the GTT windows which protects
> it.
> 
> There could be problems with the UVD/VCE queues cleaning up the handles when an application crashes.

FWIW, etnaviv is currently completely unlocked in this path, but I
believe that this isn't an issue as the sched fence seq numbers are per
entity. So to actually end up with reversed seqnos one context has to
preempt itself to do another submit, while the current one hasn't
returned from kernel space, which I believe is a fairly theoretical
issue. Is my understanding correct?

Regards,
Lucas

> 
> Am 16.05.2018 um 13:47 schrieb Andrey Grodzovsky:
> > So are you saying that you expect this to already be in the code for any usage of drm_sched_fence_create() and drm_sched_entity_push_job()?
> > 
> > lock()
> > 
> > drm_sched_fence_create()
> > 
> > ... (some code)
> > 
> > drm_sched_entity_push_job()
> > 
> > unlock()
> > 
> > 
> > Andrey
> > 
> > On 05/16/2018 07:23 AM, Christian König wrote:
> > > drm_sched_fence_create() assigns a sequence number to the fence
> > > it 
> > > creates.
> > > 
> > > Now drm_sched_fence_create() is called by drm_sched_job_init()
> > > to 
> > > initialize the jobs we want to push on the scheduler queue.
> > > 
> > > When you now call drm_sched_entity_push_job() without a
> > > protecting 
> > > lock it can happen that you push two (or even more) jobs with reversed sequence numbers.
> > > 
> > > Since the sequence numbers are used to determine job completion
> > > order 
> > > reversing them can seriously mess things up.
> > > 
> > > So the spin lock should be superfluous, if it isn't we have a
> > > much 
> > > larger bug we need to fix.
> > > 
> > > Christian.
> > > 
> > > Am 16.05.2018 um 13:15 schrieb Andrey Grodzovsky:
> > > > Can you please elaborate more, maybe give an example - I don't 
> > > > understand yet the problematic scenario.
> > > > 
> > > > Andrey
> > > > 
> > > > 
> > > > On 05/16/2018 02:50 AM, Christian König wrote:
> > > > > No, that spinlock is indeed incorrect.
> > > > > 
> > > > > See even when we protect the spsc queue with a spinlock that 
> > > > > doesn't make it correct. It can happen that the jobs pushed
> > > > > to the 
> > > > > queue are reversed in their sequence order and that can
> > > > > cause 
> > > > > severe problems in the memory management.
> > > > > 
> > > > > Christian.
> > > > > 
> > > > > Am 16.05.2018 um 05:33 schrieb Grodzovsky, Andrey:
> > > > > > Yeah, that's what I am not sure about... It's lockless in a
> > > > > > sense of 
> > > > > > single producer single consumer but not for multiple
> > > > > > concurrent 
> > > > > > producers... So now I think this spinlock should stay
> > > > > > there... It 
> > > > > > just looked useless to me at first sight...
> > > > > > 
> > > > > > Andrey
> > > > > > 
> > > > > > 
> > > > > > From: Zhou, David(ChunMing)
> > > > > > Sent: 15 May 2018 23:04:44
> > > > > > To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org; 
> > > > > > dri-de...@lists.freedesktop.org
> > > > > > Cc: Koenig, Christian
> > > > > > Subject: Re: [PATCH 2/2] drm/scheduler: Remove obsolete
> > > > > > spinlock.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On 2018年05月16日 03:31, Andrey Grodzovsky wrote:
> > > > > > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > > > > ---
> > > > > > >    drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
> > > > > > >    include/drm/gpu_scheduler.h               | 1 -
> > > > > > >    2 files changed, 5 deletions(-)

Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Christian König

Yes, exactly.

For normal user space command submission we should have tons of locks 
guaranteeing that (e.g. just the VM lock should do).


For kernel moves we have the mutex for the GTT windows which protects it.

There could be problems with the UVD/VCE queues cleaning up the handles when an application crashes.


Christian.

Am 16.05.2018 um 13:47 schrieb Andrey Grodzovsky:
So are you saying that you expect this to already be in the code for any usage of drm_sched_fence_create() and drm_sched_entity_push_job()?


lock()

drm_sched_fence_create()

... (some code)

drm_sched_entity_push_job()

unlock()


Andrey

On 05/16/2018 07:23 AM, Christian König wrote:
drm_sched_fence_create() assigns a sequence number to the fence it 
creates.


Now drm_sched_fence_create() is called by drm_sched_job_init() to 
initialize the jobs we want to push on the scheduler queue.


When you now call drm_sched_entity_push_job() without a protecting lock it can happen that you push two (or even more) jobs with reversed sequence numbers.


Since the sequence numbers are used to determine job completion order 
reversing them can seriously mess things up.


So the spin lock should be superfluous, if it isn't we have a much 
larger bug we need to fix.


Christian.

Am 16.05.2018 um 13:15 schrieb Andrey Grodzovsky:
Can you please elaborate more, maybe give an example - I don't 
understand yet the problematic scenario.


Andrey


On 05/16/2018 02:50 AM, Christian König wrote:

No, that spinlock is indeed incorrect.

See even when we protect the spsc queue with a spinlock that 
doesn't make it correct. It can happen that the jobs pushed to the 
queue are reversed in their sequence order and that can cause 
severe problems in the memory management.


Christian.

Am 16.05.2018 um 05:33 schrieb Grodzovsky, Andrey:
Yeah, that's what I am not sure about... It's lockless in a sense of 
single producer single consumer but not for multiple concurrent 
producers... So now I think this spinlock should stay there... It 
just looked useless to me at first sight...


Andrey


From: Zhou, David(ChunMing)
Sent: 15 May 2018 23:04:44
To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org

Cc: Koenig, Christian
Subject: Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.



On 2018年05月16日 03:31, Andrey Grodzovsky wrote:

Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
---
   drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
   include/drm/gpu_scheduler.h               | 1 -
   2 files changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 1f1dd70..2569a63 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
 	entity->last_scheduled = NULL;
 
 	spin_lock_init(&entity->rq_lock);
-	spin_lock_init(&entity->queue_lock);
 	spsc_queue_init(&entity->job_queue);
 
 	atomic_set(&entity->fence_seq, 0);
@@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 
 	trace_drm_sched_job(sched_job, entity);
 
-	spin_lock(&entity->queue_lock);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
-	spin_unlock(&entity->queue_lock);

Is your spsc queue safe to be pushed to simultaneously?

Regards,
David Zhou

-
 	/* first job wakes up scheduler */
 	if (first) {
 		/* Add the entity to the run queue */
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 350a62c..683eb65 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -56,7 +56,6 @@ struct drm_sched_entity {
 	spinlock_t			rq_lock;
 	struct drm_gpu_scheduler	*sched;
 
-	spinlock_t			queue_lock;
 	struct spsc_queue		job_queue;
 
 	atomic_t			fence_seq;






Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Andrey Grodzovsky
So are you saying that you expect this to already be in the code for any usage of drm_sched_fence_create() and drm_sched_entity_push_job()?


lock()

drm_sched_fence_create()

... (some code)

drm_sched_entity_push_job()

unlock()


Andrey

On 05/16/2018 07:23 AM, Christian König wrote:
drm_sched_fence_create() assigns a sequence number to the fence it 
creates.


Now drm_sched_fence_create() is called by drm_sched_job_init() to 
initialize the jobs we want to push on the scheduler queue.


When you now call drm_sched_entity_push_job() without a protecting lock it can happen that you push two (or even more) jobs with reversed sequence numbers.


Since the sequence numbers are used to determine job completion order 
reversing them can seriously mess things up.


So the spin lock should be superfluous, if it isn't we have a much 
larger bug we need to fix.


Christian.

Am 16.05.2018 um 13:15 schrieb Andrey Grodzovsky:
Can you please elaborate more, maybe give an example - I don't 
understand yet the problematic scenario.


Andrey


On 05/16/2018 02:50 AM, Christian König wrote:

No, that spinlock is indeed incorrect.

See even when we protect the spsc queue with a spinlock that doesn't 
make it correct. It can happen that the jobs pushed to the queue are 
reversed in their sequence order and that can cause severe problems 
in the memory management.


Christian.

Am 16.05.2018 um 05:33 schrieb Grodzovsky, Andrey:
Yeah, that's what I am not sure about... It's lockless in a sense of 
single producer single consumer but not for multiple concurrent 
producers... So now I think this spinlock should stay there... It 
just looked useless to me at first sight...


Andrey


From: Zhou, David(ChunMing)
Sent: 15 May 2018 23:04:44
To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org

Cc: Koenig, Christian
Subject: Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.



On 2018年05月16日 03:31, Andrey Grodzovsky wrote:

Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
---
   drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
   include/drm/gpu_scheduler.h               | 1 -
   2 files changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 1f1dd70..2569a63 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
 	entity->last_scheduled = NULL;
 
 	spin_lock_init(&entity->rq_lock);
-	spin_lock_init(&entity->queue_lock);
 	spsc_queue_init(&entity->job_queue);
 
 	atomic_set(&entity->fence_seq, 0);
@@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 
 	trace_drm_sched_job(sched_job, entity);
 
-	spin_lock(&entity->queue_lock);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
-	spin_unlock(&entity->queue_lock);

Is your spsc queue safe to be pushed to simultaneously?

Regards,
David Zhou

-
 	/* first job wakes up scheduler */
 	if (first) {
 		/* Add the entity to the run queue */
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 350a62c..683eb65 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -56,7 +56,6 @@ struct drm_sched_entity {
 	spinlock_t			rq_lock;
 	struct drm_gpu_scheduler	*sched;
 
-	spinlock_t			queue_lock;
 	struct spsc_queue		job_queue;
 
 	atomic_t			fence_seq;






Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Christian König

drm_sched_fence_create() assigns a sequence number to the fence it creates.

Now drm_sched_fence_create() is called by drm_sched_job_init() to 
initialize the jobs we want to push on the scheduler queue.


When you now call drm_sched_entity_push_job() without a protecting lock it can happen that you push two (or even more) jobs with reversed sequence numbers.


Since the sequence numbers are used to determine job completion order 
reversing them can seriously mess things up.


So the spin lock should be superfluous, if it isn't we have a much 
larger bug we need to fix.


Christian.

Am 16.05.2018 um 13:15 schrieb Andrey Grodzovsky:
Can you please elaborate more, maybe give an example - I don't 
understand yet the problematic scenario.


Andrey


On 05/16/2018 02:50 AM, Christian König wrote:

No, that spinlock is indeed incorrect.

See even when we protect the spsc queue with a spinlock that doesn't 
make it correct. It can happen that the jobs pushed to the queue are 
reversed in their sequence order and that can cause severe problems 
in the memory management.


Christian.

Am 16.05.2018 um 05:33 schrieb Grodzovsky, Andrey:
Yeah, that's what I am not sure about... It's lockless in a sense of 
single producer single consumer but not for multiple concurrent 
producers... So now I think this spinlock should stay there... It 
just looked useless to me at first sight...


Andrey


From: Zhou, David(ChunMing)
Sent: 15 May 2018 23:04:44
To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org

Cc: Koenig, Christian
Subject: Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.



On 2018年05月16日 03:31, Andrey Grodzovsky wrote:

Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
---
   drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
   include/drm/gpu_scheduler.h               | 1 -
   2 files changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 1f1dd70..2569a63 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
 	entity->last_scheduled = NULL;
 
 	spin_lock_init(&entity->rq_lock);
-	spin_lock_init(&entity->queue_lock);
 	spsc_queue_init(&entity->job_queue);
 
 	atomic_set(&entity->fence_seq, 0);
@@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 
 	trace_drm_sched_job(sched_job, entity);
 
-	spin_lock(&entity->queue_lock);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
-	spin_unlock(&entity->queue_lock);

Is your spsc queue safe to be pushed to simultaneously?

Regards,
David Zhou

-
 	/* first job wakes up scheduler */
 	if (first) {
 		/* Add the entity to the run queue */
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 350a62c..683eb65 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -56,7 +56,6 @@ struct drm_sched_entity {
 	spinlock_t			rq_lock;
 	struct drm_gpu_scheduler	*sched;
 
-	spinlock_t			queue_lock;
 	struct spsc_queue		job_queue;
 
 	atomic_t			fence_seq;






Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Andrey Grodzovsky
Can you please elaborate more, maybe give an example - I don't 
understand yet the problematic scenario.


Andrey


On 05/16/2018 02:50 AM, Christian König wrote:

No, that spinlock is indeed incorrect.

See even when we protect the spsc queue with a spinlock that doesn't 
make it correct. It can happen that the jobs pushed to the queue are 
reversed in their sequence order and that can cause severe problems in 
the memory management.


Christian.

Am 16.05.2018 um 05:33 schrieb Grodzovsky, Andrey:
Yeah, that's what I am not sure about... It's lockless in a sense of 
single producer single consumer but not for multiple concurrent 
producers... So now I think this spinlock should stay there... It 
just looked useless to me at first sight...


Andrey


From: Zhou, David(ChunMing)
Sent: 15 May 2018 23:04:44
To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org

Cc: Koenig, Christian
Subject: Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.



On 2018年05月16日 03:31, Andrey Grodzovsky wrote:

Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
---
   drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
   include/drm/gpu_scheduler.h               | 1 -
   2 files changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 1f1dd70..2569a63 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
 	entity->last_scheduled = NULL;
 
 	spin_lock_init(&entity->rq_lock);
-	spin_lock_init(&entity->queue_lock);
 	spsc_queue_init(&entity->job_queue);
 
 	atomic_set(&entity->fence_seq, 0);
@@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 
 	trace_drm_sched_job(sched_job, entity);
 
-	spin_lock(&entity->queue_lock);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
-	spin_unlock(&entity->queue_lock);

Is your spsc queue safe to be pushed to simultaneously?

Regards,
David Zhou

-
 	/* first job wakes up scheduler */
 	if (first) {
 		/* Add the entity to the run queue */
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 350a62c..683eb65 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -56,7 +56,6 @@ struct drm_sched_entity {
 	spinlock_t			rq_lock;
 	struct drm_gpu_scheduler	*sched;
 
-	spinlock_t			queue_lock;
 	struct spsc_queue		job_queue;
 
 	atomic_t			fence_seq;






Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-16 Thread Christian König

No, that spinlock is indeed incorrect.

See even when we protect the spsc queue with a spinlock that doesn't  
make it correct. It can happen that the jobs pushed to the queue are  
reversed in their sequence order and that can cause severe problems in  
the memory management.


Christian.

Am 16.05.2018 um 05:33 schrieb Grodzovsky, Andrey:

Yeah, that's what I am not sure about... It's lockless in a sense of single 
producer single consumer but not for multiple concurrent producers... So now I 
think this spinlock should stay there... It just looked useless to me at first 
sight...

Andrey


From: Zhou, David(ChunMing)
Sent: 15 May 2018 23:04:44
To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org
Cc: Koenig, Christian
Subject: Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.



On 2018年05月16日 03:31, Andrey Grodzovsky wrote:

Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
---
   drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
   include/drm/gpu_scheduler.h               | 1 -
   2 files changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 1f1dd70..2569a63 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
 	entity->last_scheduled = NULL;

 	spin_lock_init(&entity->rq_lock);
-	spin_lock_init(&entity->queue_lock);
 	spsc_queue_init(&entity->job_queue);

 	atomic_set(&entity->fence_seq, 0);
@@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,

 	trace_drm_sched_job(sched_job, entity);

-	spin_lock(&entity->queue_lock);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);

-	spin_unlock(&entity->queue_lock);

Is your spsc queue safe to be pushed to concurrently?

Regards,
David Zhou

-
   /* first job wakes up scheduler */
   if (first) {
   /* Add the entity to the run queue */
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 350a62c..683eb65 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -56,7 +56,6 @@ struct drm_sched_entity {
 	spinlock_t			rq_lock;
 	struct drm_gpu_scheduler	*sched;

-	spinlock_t			queue_lock;
 	struct spsc_queue		job_queue;

 	atomic_t			fence_seq;




Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-15 Thread Grodzovsky, Andrey
Yeah, that what I am not sure about... It's lockless in a sense of single 
producer single consumer but not for multiple concurrent producers... So now I 
think this spinlock should stay there... It just looked useless to me at first 
sight...

Andrey


From: Zhou, David(ChunMing)
Sent: 15 May 2018 23:04:44
To: Grodzovsky, Andrey; amd-gfx@lists.freedesktop.org; 
dri-de...@lists.freedesktop.org
Cc: Koenig, Christian
Subject: Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.



On 2018年05月16日 03:31, Andrey Grodzovsky wrote:
> Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
>   include/drm/gpu_scheduler.h               | 1 -
>   2 files changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 1f1dd70..2569a63 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
>   	entity->last_scheduled = NULL;
>
>   	spin_lock_init(&entity->rq_lock);
> -	spin_lock_init(&entity->queue_lock);
>   	spsc_queue_init(&entity->job_queue);
>
>   	atomic_set(&entity->fence_seq, 0);
> @@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>
>   	trace_drm_sched_job(sched_job, entity);
>
> -	spin_lock(&entity->queue_lock);
>   	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
>
> -	spin_unlock(&entity->queue_lock);
Is your spsc queue safe to be pushed to concurrently?

Regards,
David Zhou
> -
>   	/* first job wakes up scheduler */
>   	if (first) {
>   		/* Add the entity to the run queue */
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 350a62c..683eb65 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -56,7 +56,6 @@ struct drm_sched_entity {
>   	spinlock_t			rq_lock;
>   	struct drm_gpu_scheduler	*sched;
>
> -	spinlock_t			queue_lock;
>   	struct spsc_queue		job_queue;
>
>   	atomic_t			fence_seq;



Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-15 Thread zhoucm1



On 2018年05月16日 03:31, Andrey Grodzovsky wrote:

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
  include/drm/gpu_scheduler.h               | 1 -
  2 files changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 1f1dd70..2569a63 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
 	entity->last_scheduled = NULL;

 	spin_lock_init(&entity->rq_lock);
-	spin_lock_init(&entity->queue_lock);
 	spsc_queue_init(&entity->job_queue);

 	atomic_set(&entity->fence_seq, 0);
@@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,

 	trace_drm_sched_job(sched_job, entity);

-	spin_lock(&entity->queue_lock);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);

-	spin_unlock(&entity->queue_lock);

Is your spsc queue safe to be pushed to concurrently?

Regards,
David Zhou

-
 	/* first job wakes up scheduler */
 	if (first) {
 		/* Add the entity to the run queue */
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 350a62c..683eb65 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -56,7 +56,6 @@ struct drm_sched_entity {
 	spinlock_t			rq_lock;
 	struct drm_gpu_scheduler	*sched;

-	spinlock_t			queue_lock;
 	struct spsc_queue		job_queue;

 	atomic_t			fence_seq;




Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-15 Thread Andrey Grodzovsky
Yeah, I might need to give another thought to whether this spinlock can
actually be removed.


Andrey

On 05/15/2018 03:38 PM, Alex Deucher wrote:

On Tue, May 15, 2018 at 3:31 PM, Andrey Grodzovsky
 wrote:

Signed-off-by: Andrey Grodzovsky 

Please provide a better patch description.

Alex


---
  drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
  include/drm/gpu_scheduler.h               | 1 -
  2 files changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 1f1dd70..2569a63 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
 	entity->last_scheduled = NULL;

 	spin_lock_init(&entity->rq_lock);
-	spin_lock_init(&entity->queue_lock);
 	spsc_queue_init(&entity->job_queue);

 	atomic_set(&entity->fence_seq, 0);
@@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,

 	trace_drm_sched_job(sched_job, entity);

-	spin_lock(&entity->queue_lock);
 	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);

-	spin_unlock(&entity->queue_lock);
-
 	/* first job wakes up scheduler */
 	if (first) {
 		/* Add the entity to the run queue */
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 350a62c..683eb65 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -56,7 +56,6 @@ struct drm_sched_entity {
 	spinlock_t			rq_lock;
 	struct drm_gpu_scheduler	*sched;

-	spinlock_t			queue_lock;
 	struct spsc_queue		job_queue;

 	atomic_t			fence_seq;
--
2.7.4

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: [PATCH 2/2] drm/scheduler: Remove obsolete spinlock.

2018-05-15 Thread Alex Deucher
On Tue, May 15, 2018 at 3:31 PM, Andrey Grodzovsky
 wrote:
> Signed-off-by: Andrey Grodzovsky 

Please provide a better patch description.

Alex

> ---
>  drivers/gpu/drm/scheduler/gpu_scheduler.c | 4 ----
>  include/drm/gpu_scheduler.h               | 1 -
>  2 files changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 1f1dd70..2569a63 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -140,7 +140,6 @@ int drm_sched_entity_init(struct drm_gpu_scheduler *sched,
>         entity->last_scheduled = NULL;
>
>         spin_lock_init(&entity->rq_lock);
> -       spin_lock_init(&entity->queue_lock);
>         spsc_queue_init(&entity->job_queue);
>
>         atomic_set(&entity->fence_seq, 0);
> @@ -424,11 +423,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>
>         trace_drm_sched_job(sched_job, entity);
>
> -       spin_lock(&entity->queue_lock);
>         first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
>
> -       spin_unlock(&entity->queue_lock);
> -
>         /* first job wakes up scheduler */
>         if (first) {
>                 /* Add the entity to the run queue */
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 350a62c..683eb65 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -56,7 +56,6 @@ struct drm_sched_entity {
>         spinlock_t                      rq_lock;
>         struct drm_gpu_scheduler        *sched;
>
> -       spinlock_t                      queue_lock;
>         struct spsc_queue               job_queue;
>
>         atomic_t                        fence_seq;
> --
> 2.7.4
>