[PATCH 4/4] drm/ttm: Optimize reservation slightly

2012-11-06 Thread Thomas Hellstrom
Reservation locking currently always takes place under the LRU spinlock.
Hence, strictly there is no need for an atomic_cmpxchg call; we can use
atomic_read followed by atomic_set, since nobody else will ever reserve
without the LRU spinlock held.
At least on Intel this should remove a locked bus cycle on successful
reserve.

Note that this commit may be obsoleted by the cross-device reservation work.

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 drivers/gpu/drm/ttm/ttm_bo.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bf6e4b5..46008ea 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -220,7 +220,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
struct ttm_bo_global *glob = bo->glob;
int ret;

-   while (unlikely(atomic_cmpxchg(&bo->reserved, 0, 1) != 0)) {
+   while (unlikely(atomic_read(&bo->reserved) != 0)) {
/**
 * Deadlock avoidance for multi-bo reserving.
 */
@@ -249,6 +249,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
return ret;
}

+   atomic_set(&bo->reserved, 1);
if (use_sequence) {
/**
 * Wake up waiters that may need to recheck for deadlock,
-- 
1.7.4.4
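
For readers following the argument: the change is safe only because every
reserver takes the LRU spinlock before touching bo->reserved, so no two
threads can both observe 0 and claim the buffer. Below is a minimal,
self-contained userspace model of that idea (C11 atomics, with a pthread
mutex standing in for the lru spinlock; a sketch of the principle, not
the TTM code itself):

/*
 * Model of the optimization: all 0 -> 1 transitions of "reserved"
 * happen with lru_lock held, so a plain load + store can replace the
 * locked cmpxchg on the fast path. Relaxed atomics mirror the kernel's
 * atomic_read()/atomic_set(), which are plain loads/stores; the mutex
 * provides the required ordering.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int reserved;             /* 0 = free, 1 = reserved */

static int reserve(void)
{
	pthread_mutex_lock(&lru_lock);
	if (atomic_load_explicit(&reserved, memory_order_relaxed) != 0) {
		/* TTM sleeps and retries here; the model just fails.
		 * The plain load cannot race with a concurrent reserve,
		 * since every setter holds lru_lock. */
		pthread_mutex_unlock(&lru_lock);
		return -1;
	}
	/* Plain store, like kernel atomic_set(): no locked bus cycle. */
	atomic_store_explicit(&reserved, 1, memory_order_relaxed);
	pthread_mutex_unlock(&lru_lock);
	return 0;
}

static void unreserve(void)
{
	/* Clearing is race-free even without lru_lock: mutual exclusion
	 * only needs the 0 -> 1 side serialized. */
	atomic_store_explicit(&reserved, 0, memory_order_relaxed);
}

int main(void)
{
	printf("first: %d, second: %d\n", reserve(), reserve());
	unreserve();
	printf("after unreserve: %d\n", reserve());
	return 0;
}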



Re: [PATCH 4/4] drm/ttm: Optimize reservation slightly

2012-11-05 Thread Thomas Hellstrom
On 11/05/2012 03:01 PM, Maarten Lankhorst wrote:
> Hey,
>
> On 05-11-12 14:31, Thomas Hellstrom wrote:
>> Reservation locking currently always takes place under the LRU spinlock.
>> Hence, strictly there is no need for an atomic_cmpxchg call; we can use
>> atomic_read followed by atomic_set, since nobody else will ever reserve
>> without the LRU spinlock held.
>> At least on Intel this should remove a locked bus cycle on successful
>> reserve.
>>
>> Signed-off-by: Thomas Hellstrom thellst...@vmware.com
>>
> Is that really a good thing to submit when I am busy killing lru lock around 
> reserve? :-)
>

If your patch series makes it into the same kernel, let's kill this
patch. Otherwise it may live at least for a kernel release. It's not a
big thing to rebase against, and I won't complain if your patch adds
another atomic read-modify-write op here. :)

/Thomas



Re: [PATCH 4/4] drm/ttm: Optimize reservation slightly

2012-11-05 Thread Maarten Lankhorst
Hey,

On 05-11-12 14:31, Thomas Hellstrom wrote:
> Reservation locking currently always takes place under the LRU spinlock.
> Hence, strictly there is no need for an atomic_cmpxchg call; we can use
> atomic_read followed by atomic_set, since nobody else will ever reserve
> without the LRU spinlock held.
> At least on Intel this should remove a locked bus cycle on successful
> reserve.
>
> Signed-off-by: Thomas Hellstrom thellst...@vmware.com
>
Is that really a good thing to submit when I am busy killing lru lock around 
reserve? :-)

-   while (unlikely(atomic_cmpxchg(&bo->reserved, 0, 1) != 0)) {
+   while (unlikely(atomic_xchg(&bo->reserved, 1) != 0)) {

Works without lru lock too!

In fact mutexes are done in a similar way[1], except with some more
magic, and the unlocked state is 1, not 0. However, I do think that to
get that right (it saves an irq disable in the unlock path, and fewer
wakeups in the contended case), I should really just post the mutex
extension patches for reservations and ride the flames. It's getting
too close to real mutexes, so I really want it to be a mutex in that
case. So let's convert it.. Soon! :-)

~Maarten

[1] See linux/include/asm-generic/mutex-xchg.h and
linux/include/asm-generic/mutex-dec.h for how archs generally
implement mutex fastpaths.
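
For reference, the xchg-based fastpath mentioned in [1] boils down to
the following shape (a simplified userspace paraphrase using C11
atomics, not the kernel header; the kernel slowpaths sleep rather than
spin):

/*
 * Paraphrase of the mutex-xchg fastpath: unlocked is 1, locked is 0,
 * so a single atomic exchange both takes the lock and tells us whether
 * we got it. Note the inverted sense vs. bo->reserved above.
 */
#include <stdatomic.h>

static atomic_int count = 1;    /* 1 = unlocked, 0 = locked */

static void lock_slowpath(atomic_int *c)
{
	/* Stand-in: spin until we swap out a 1. The kernel version
	 * marks the mutex contended and sleeps instead. */
	while (atomic_exchange(c, 0) != 1)
		;
}

static void mutex_lock(void)
{
	/* One atomic op when uncontended: old value 1 means success. */
	if (atomic_exchange(&count, 0) != 1)
		lock_slowpath(&count);
}

static void mutex_unlock(void)
{
	/* In the kernel version, an old value != 0 here would mean a
	 * waiter flagged contention and needs a wakeup. */
	atomic_exchange(&count, 1);
}

int main(void)
{
	mutex_lock();
	mutex_unlock();
	return 0;
}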



[PATCH 4/4] drm/ttm: Optimize reservation slightly

2012-11-05 Thread Thomas Hellstrom
Reservation locking currently always takes place under the LRU spinlock.
Hence, strictly there is no need for an atomic_cmpxchg call; we can use
atomic_read followed by atomic_set, since nobody else will ever reserve
without the LRU spinlock held.
At least on Intel this should remove a locked bus cycle on successful
reserve.

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 drivers/gpu/drm/ttm/ttm_bo.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bf6e4b5..46008ea 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -220,7 +220,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
struct ttm_bo_global *glob = bo->glob;
int ret;

-   while (unlikely(atomic_cmpxchg(&bo->reserved, 0, 1) != 0)) {
+   while (unlikely(atomic_read(&bo->reserved) != 0)) {
/**
 * Deadlock avoidance for multi-bo reserving.
 */
@@ -249,6 +249,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
return ret;
}

+   atomic_set(&bo->reserved, 1);
if (use_sequence) {
/**
 * Wake up waiters that may need to recheck for deadlock,
-- 
1.7.4.4


