In Diablo III, when the ground is covered in water, the rendering is all messed up.
I found a rendering error in Diablo III: in a well where the ground is covered in water, everything is red and flickers.

http://c566453.r53.cf2.rackcdn.com/DiabloIII-well-wine-preloader.trace.xz (3.5G uncompressed; 234665296 bytes, ~234M, compressed)

EE r600_shader.c:1605 r600_shader_from_tgsi - GPR limit exceeded - shader requires 181 registers
EE r600_shader.c:140 r600_pipe_shader_create - translation from TGSI failed !

Rendered 621 frames in 80.724 secs, average of 7.69288 fps
Include request for SA improvements
On Wed, May 9, 2012 at 3:31 PM, Jerome Glisse wrote:
> On Wed, May 9, 2012 at 9:34 AM, Christian König wrote:
>> Hi Dave & Jerome and everybody on the list,
>>
>> I can't find any more bugs and I'm also out of things to test, so I
>> really hope that this is the last incarnation of this patchset; if
>> Jerome is ok with it, it should now be included in drm-next.
>>
>> Cheers,
>> Christian.
>
> Yeah, looks good to me.

All pushed into -next + the warning fix on top. Thanks guys,
Dave.
[Bug 43858] DVI of ATI RADEON 9200 AGP doesn't work
https://bugs.freedesktop.org/show_bug.cgi?id=43858

Alex Deucher changed:

           What      |Removed |Added
           Status    |NEW     |RESOLVED
           Resolution|        |NOTABUG

--- Comment #27 from Alex Deucher 2012-05-09 12:00:09 PDT ---
The DVI issue is fixed. Please open a new bug if you are still having gfx problems.

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
You are receiving this mail because: you are the assignee for the bug.
[PATCH] drm/radeon/kms: fix warning on 32-bit in atomic fence printing
From: Dave Airlie

/ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c: In function 'radeon_debugfs_fence_info':
/ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c:606:7: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'long long int' [-Wformat]

Signed-off-by: Dave Airlie
---
 drivers/gpu/drm/radeon/radeon_fence.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 48ec5e3..11f5f40 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -602,8 +602,8 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
 			continue;
 		seq_printf(m, "--- ring %d ---\n", i);
-		seq_printf(m, "Last signaled fence 0x%016lx\n",
-			   atomic64_read(&rdev->fence_drv[i].last_seq));
+		seq_printf(m, "Last signaled fence 0x%016llx\n",
+			   (unsigned long long)atomic64_read(&rdev->fence_drv[i].last_seq));
 		seq_printf(m, "Last emitted 0x%016llx\n", rdev->fence_drv[i].seq);
 	}
--
1.7.7.6
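The warning exists because the return type of atomic64_read() differs between builds: plain long on 64-bit, long long on 32-bit. The only portable way to print such a value is to cast it to unsigned long long and use "%llx", which is exactly what the patch does. A minimal userspace sketch of the same pattern (the helper names here are illustrative, not kernel code):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for atomic64_read(): the underlying integer
 * type differs between 32-bit and 64-bit builds, which is why a bare
 * "%lx" triggers -Wformat on 32-bit. */
static uint64_t read_last_seq(void)
{
    return 0x1234abcdULL;
}

/* Portable formatting: always cast to unsigned long long and print
 * with "%llx", mirroring what the patch does with seq_printf(). */
static int format_seq(char *buf, size_t len)
{
    return snprintf(buf, len, "Last signaled fence 0x%016llx",
                    (unsigned long long)read_last_seq());
}
```

The cast is a no-op on 64-bit and a widening conversion on 32-bit, so a single format string works everywhere.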
[PATCH 19/19] drm/radeon: make the ib an inline object
From: Jerome Glisse

No need to malloc it any more.

Signed-off-by: Jerome Glisse
Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 10 +++---
 drivers/gpu/drm/radeon/r100.c         | 38 ++--
 drivers/gpu/drm/radeon/r200.c         |  2 +-
 drivers/gpu/drm/radeon/r300.c         |  4 +--
 drivers/gpu/drm/radeon/r600.c         | 16 -
 drivers/gpu/drm/radeon/r600_cs.c      | 22 ++--
 drivers/gpu/drm/radeon/radeon.h       |  8 ++---
 drivers/gpu/drm/radeon/radeon_cs.c    | 63 -
 drivers/gpu/drm/radeon/radeon_ring.c  | 41 +++--
 9 files changed, 93 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 70089d3..4e7dd2b 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -1057,7 +1057,7 @@ static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg, wait_reg_mem_info; volatile uint32_t *ib; - ib = p->ib->ptr; + ib = p->ib.ptr; /* parse the WAIT_REG_MEM */ r = evergreen_cs_packet_parse(p, &wait_reg_mem, p->idx); @@ -1215,7 +1215,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) if (!(evergreen_reg_safe_bm[i] & m)) return 0; } - ib = p->ib->ptr; + ib = p->ib.ptr; switch (reg) { /* force following reg to 0 in an attempt to disable out buffer * which will need us to better understand how it works to perform @@ -1896,7 +1896,7 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p, u32 idx_value; track = (struct evergreen_cs_track *)p->track; - ib = p->ib->ptr; + ib = p->ib.ptr; idx = pkt->idx + 1; idx_value = radeon_get_ib_value(p, idx); @@ -2610,8 +2610,8 @@ int evergreen_cs_parse(struct radeon_cs_parser *p) } } while (p->idx < p->chunks[p->chunk_ib_idx].length_dw); #if 0 - for (r = 0; r < p->ib->length_dw; r++) { - printk(KERN_INFO "%05d 0x%08X\n", r, p->ib->ptr[r]); + for (r = 0; r < p->ib.length_dw; r++) { + printk(KERN_INFO "%05d 0x%08X\n", r, p->ib.ptr[r]); mdelay(1); } #endif diff --git
a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index ad6ceb7..0874a6d 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -139,9 +139,9 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p, } tmp |= tile_flags; - p->ib->ptr[idx] = (value & 0x3fc0) | tmp; + p->ib.ptr[idx] = (value & 0x3fc0) | tmp; } else - p->ib->ptr[idx] = (value & 0xffc0) | tmp; + p->ib.ptr[idx] = (value & 0xffc0) | tmp; return 0; } @@ -156,7 +156,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p, volatile uint32_t *ib; u32 idx_value; - ib = p->ib->ptr; + ib = p->ib.ptr; track = (struct r100_cs_track *)p->track; c = radeon_get_ib_value(p, idx++) & 0x1F; if (c > 16) { @@ -1275,7 +1275,7 @@ void r100_cs_dump_packet(struct radeon_cs_parser *p, unsigned i; unsigned idx; - ib = p->ib->ptr; + ib = p->ib.ptr; idx = pkt->idx; for (i = 0; i <= (pkt->count + 1); i++, idx++) { DRM_INFO("ib[%d]=0x%08X\n", idx, ib[idx]); @@ -1354,7 +1354,7 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg; volatile uint32_t *ib; - ib = p->ib->ptr; + ib = p->ib.ptr; /* parse the wait until */ r = r100_cs_packet_parse(p, , p->idx); @@ -1533,7 +1533,7 @@ static int r100_packet0_check(struct radeon_cs_parser *p, u32 tile_flags = 0; u32 idx_value; - ib = p->ib->ptr; + ib = p->ib.ptr; track = (struct r100_cs_track *)p->track; idx_value = radeon_get_ib_value(p, idx); @@ -1889,7 +1889,7 @@ static int r100_packet3_check(struct radeon_cs_parser *p, volatile uint32_t *ib; int r; - ib = p->ib->ptr; + ib = p->ib.ptr; idx = pkt->idx + 1; track = (struct r100_cs_track *)p->track; switch (pkt->opcode) { @@ -3684,7 +3684,7 @@ void r100_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib) int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring) { - struct radeon_ib *ib; + struct radeon_ib ib; uint32_t scratch; uint32_t tmp = 0; unsigned i; @@ -3700,22 +3700,22 @@ int r100_ib_test(struct radeon_device *rdev, struct 
radeon_ring *ring) if (r) { return r; } - ib->ptr[0] = PACKET0(scratch, 0); - ib->ptr[1] =
[PATCH 18/19] drm/radeon: remove r600 blit mutex v2
If we don't store local data in global variables, it isn't necessary to lock anything.

v2: rebased on the new SA interface

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/evergreen_blit_kms.c |  1 -
 drivers/gpu/drm/radeon/r600.c               | 13 ++--
 drivers/gpu/drm/radeon/r600_blit_kms.c      | 99 +++
 drivers/gpu/drm/radeon/radeon.h             |  3 -
 drivers/gpu/drm/radeon/radeon_asic.h        |  9 ++-
 5 files changed, 50 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index 222acd2..30f0480 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -637,7 +637,6 @@ int evergreen_blit_init(struct radeon_device *rdev) if (rdev->r600_blit.shader_obj) goto done; - mutex_init(&rdev->r600_blit.mutex); rdev->r600_blit.state_offset = 0; if (rdev->family < CHIP_CAYMAN) diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 478b51e..00b2238 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2363,20 +2363,15 @@ int r600_copy_blit(struct radeon_device *rdev, unsigned num_gpu_pages, struct radeon_fence *fence) { + struct radeon_sa_bo *vb = NULL; int r; - mutex_lock(&rdev->r600_blit.mutex); - rdev->r600_blit.vb_ib = NULL; - r = r600_blit_prepare_copy(rdev, num_gpu_pages); + r = r600_blit_prepare_copy(rdev, num_gpu_pages, &vb); if (r) { - if (rdev->r600_blit.vb_ib) - radeon_ib_free(rdev, &rdev->r600_blit.vb_ib); - mutex_unlock(&rdev->r600_blit.mutex); return r; } - r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages); - r600_blit_done_copy(rdev, fence); - mutex_unlock(&rdev->r600_blit.mutex); + r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages, vb); + r600_blit_done_copy(rdev, fence, vb); return 0; } diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index db38f58..ef20822 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -513,7 +513,6 @@
int r600_blit_init(struct radeon_device *rdev) rdev->r600_blit.primitives.set_default_state = set_default_state; rdev->r600_blit.ring_size_common = 40; /* shaders + def state */ - rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */ rdev->r600_blit.ring_size_common += 5; /* done copy */ rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */ @@ -528,7 +527,6 @@ int r600_blit_init(struct radeon_device *rdev) if (rdev->r600_blit.shader_obj) goto done; - mutex_init(>r600_blit.mutex); rdev->r600_blit.state_offset = 0; if (rdev->family >= CHIP_RV770) @@ -621,27 +619,6 @@ void r600_blit_fini(struct radeon_device *rdev) radeon_bo_unref(>r600_blit.shader_obj); } -static int r600_vb_ib_get(struct radeon_device *rdev, unsigned size) -{ - int r; - r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, - >r600_blit.vb_ib, size); - if (r) { - DRM_ERROR("failed to get IB for vertex buffer\n"); - return r; - } - - rdev->r600_blit.vb_total = size; - rdev->r600_blit.vb_used = 0; - return 0; -} - -static void r600_vb_ib_put(struct radeon_device *rdev) -{ - radeon_fence_emit(rdev, rdev->r600_blit.vb_ib->fence); - radeon_ib_free(rdev, >r600_blit.vb_ib); -} - static unsigned r600_blit_create_rect(unsigned num_gpu_pages, int *width, int *height, int max_dim) { @@ -688,7 +665,8 @@ static unsigned r600_blit_create_rect(unsigned num_gpu_pages, } -int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) +int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages, + struct radeon_sa_bo **vb) { struct radeon_ring *ring = >ring[RADEON_RING_TYPE_GFX_INDEX]; int r; @@ -705,46 +683,54 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) } /* 48 bytes for vertex per loop */ - r = r600_vb_ib_get(rdev, (num_loops*48)+256); - if (r) + r = radeon_sa_bo_new(rdev, >ring_tmp_bo, vb, +(num_loops*48)+256, 256, true); + if (r) { return r; + } /* calculate number of loops correctly */ ring_size = num_loops * 
dwords_per_loop; ring_size += rdev->r600_blit.ring_size_common; r = radeon_ring_lock(rdev, ring, ring_size); - if (r) + if (r) { + radeon_sa_bo_free(rdev, vb, NULL); return r; + }
[PATCH 17/19] drm/radeon: move the semaphore from the fence into the ib
From: Jerome Glisse

It never really belonged there in the first place.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/radeon.h       | 16 
 drivers/gpu/drm/radeon/radeon_cs.c    |  4 ++--
 drivers/gpu/drm/radeon/radeon_fence.c |  3 ---
 drivers/gpu/drm/radeon/radeon_ring.c  |  2 ++
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 6170307..9507be0 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -272,7 +272,6 @@ struct radeon_fence { uint64_t seq; /* RB, DMA, etc. */ unsigned ring; - struct radeon_semaphore *semaphore; }; int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring); @@ -624,13 +623,14 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo *sa_bo; - uint32_t length_dw; - uint64_t gpu_addr; - uint32_t *ptr; - struct radeon_fence *fence; - unsigned vm_id; - bool is_const_ib; + struct radeon_sa_bo *sa_bo; + uint32_t length_dw; + uint64_t gpu_addr; + uint32_t *ptr; + struct radeon_fence *fence; + unsigned vm_id; + bool is_const_ib; + struct radeon_semaphore *semaphore; }; struct radeon_ring { diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 5c065bf..dcfe2a0 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -138,12 +138,12 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) return 0; } - r = radeon_semaphore_create(p->rdev, &p->ib->fence->semaphore); + r = radeon_semaphore_create(p->rdev, &p->ib->semaphore); if (r) { return r; } - return radeon_semaphore_sync_rings(p->rdev, p->ib->fence->semaphore, + return radeon_semaphore_sync_rings(p->rdev, p->ib->semaphore, sync_to_ring, p->ring); } diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 3a49311..48ec5e3 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -139,8 +139,6 @@ static void radeon_fence_destroy(struct kref *kref) fence = container_of(kref, struct radeon_fence, kref); fence->seq = RADEON_FENCE_NOTEMITED_SEQ; - if (fence->semaphore) - radeon_semaphore_free(fence->rdev, fence->semaphore, NULL); kfree(fence); } @@ -156,7 +154,6 @@ int radeon_fence_create(struct radeon_device *rdev, (*fence)->rdev = rdev; (*fence)->seq = RADEON_FENCE_NOTEMITED_SEQ; (*fence)->ring = ring; - (*fence)->semaphore = NULL; return 0; } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index b3d6942..af8e1ee 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -93,6 +93,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring, (*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo); (*ib)->vm_id = 0; (*ib)->is_const_ib = false; + (*ib)->semaphore = NULL; return 0; } @@ -105,6 +106,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) if (tmp == NULL) { return; } + radeon_semaphore_free(rdev, tmp->semaphore, tmp->fence); radeon_sa_bo_free(rdev, >sa_bo, tmp->fence); radeon_fence_unref(>fence); kfree(tmp); -- 1.7.9.5
[PATCH 16/19] drm/radeon: immediately free ttm-move semaphore
We can now protect the semaphore RAM with a fence, so free it immediately.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/radeon_ttm.c |    7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5e3d54d..0f6aee8 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -223,6 +223,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	struct radeon_device *rdev;
 	uint64_t old_start, new_start;
 	struct radeon_fence *fence, *old_fence;
+	struct radeon_semaphore *sem = NULL;
 	int r;
 
 	rdev = radeon_get_rdev(bo->bdev);
@@ -272,15 +273,16 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 		bool sync_to_ring[RADEON_NUM_RINGS] = { };
 		sync_to_ring[old_fence->ring] = true;
 
-		r = radeon_semaphore_create(rdev, &fence->semaphore);
+		r = radeon_semaphore_create(rdev, &sem);
 		if (r) {
 			radeon_fence_unref(&fence);
 			return r;
 		}
 
-		r = radeon_semaphore_sync_rings(rdev, fence->semaphore,
+		r = radeon_semaphore_sync_rings(rdev, sem,
 						sync_to_ring, fence->ring);
 		if (r) {
+			radeon_semaphore_free(rdev, sem, NULL);
 			radeon_fence_unref(&fence);
 			return r;
 		}
@@ -292,6 +294,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	/* FIXME: handle copy error */
 	r = ttm_bo_move_accel_cleanup(bo, (void *)fence, NULL,
 				      evict, no_wait_reserve, no_wait_gpu, new_mem);
+	radeon_semaphore_free(rdev, sem, fence);
 	radeon_fence_unref(&fence);
 	return r;
 }
--
1.7.9.5
[PATCH 15/19] drm/radeon: rip out the ib pool
From: Jerome Glisse

It isn't necessary any more, and the suballocator seems to perform even better.

Signed-off-by: Christian König
Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h           |  17 +-
 drivers/gpu/drm/radeon/radeon_device.c    |   1 -
 drivers/gpu/drm/radeon/radeon_gart.c      |  12 +-
 drivers/gpu/drm/radeon/radeon_ring.c      | 241 -
 drivers/gpu/drm/radeon/radeon_semaphore.c |   2 +-
 5 files changed, 71 insertions(+), 202 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 45164e1..6170307 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -625,7 +625,6 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); struct radeon_ib { struct radeon_sa_bo *sa_bo; - unsigned idx; uint32_t length_dw; uint64_t gpu_addr; uint32_t *ptr; @@ -634,18 +633,6 @@ struct radeon_ib { bool is_const_ib; }; -/* - * locking - - * mutex protects scheduled_ibs, ready, alloc_bm - */ -struct radeon_ib_pool { - struct radeon_mutex mutex; - struct radeon_sa_manager sa_manager; - struct radeon_ib ibs[RADEON_IB_POOL_SIZE]; - bool ready; - unsigned head_id; -}; - struct radeon_ring { struct radeon_bo *ring_obj; volatile uint32_t *ring; @@ -787,7 +774,6 @@ struct si_rlc { int radeon_ib_get(struct radeon_device *rdev, int ring, struct radeon_ib **ib, unsigned size); void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib); -bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_pool_init(struct radeon_device *rdev); void radeon_ib_pool_fini(struct radeon_device *rdev); @@ -1522,7 +1508,8 @@ struct radeon_device { wait_queue_head_t fence_queue; struct mutex ring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; - struct radeon_ib_pool ib_pool; + bool ib_pool_ready; + struct radeon_sa_manager ring_tmp_bo; struct radeon_irq irq; struct radeon_asic *asic; struct radeon_gem gem; diff --git
a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 48876c1..e1bc7e9 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,7 +724,6 @@ int radeon_device_init(struct radeon_device *rdev, /* mutex initialization are all done here so we * can recall function without having locking issues */ radeon_mutex_init(>cs_mutex); - radeon_mutex_init(>ib_pool.mutex); mutex_init(>ring_lock); mutex_init(>dc_hw_i2c_mutex); if (rdev->family >= CHIP_R600) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 53dba8e..8e9ef34 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -432,8 +432,8 @@ retry_id: rdev->vm_manager.use_bitmap |= 1 << id; vm->id = id; list_add_tail(>list, >vm_manager.lru_vm); - return radeon_vm_bo_update_pte(rdev, vm, rdev->ib_pool.sa_manager.bo, - >ib_pool.sa_manager.bo->tbo.mem); + return radeon_vm_bo_update_pte(rdev, vm, rdev->ring_tmp_bo.bo, + >ring_tmp_bo.bo->tbo.mem); } /* object have to be reserved */ @@ -631,7 +631,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm) /* map the ib pool buffer at 0 in virtual address space, set * read only */ - r = radeon_vm_bo_add(rdev, vm, rdev->ib_pool.sa_manager.bo, 0, + r = radeon_vm_bo_add(rdev, vm, rdev->ring_tmp_bo.bo, 0, RADEON_VM_PAGE_READABLE | RADEON_VM_PAGE_SNOOPED); return r; } @@ -648,12 +648,12 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm) radeon_mutex_unlock(>cs_mutex); /* remove all bo */ - r = radeon_bo_reserve(rdev->ib_pool.sa_manager.bo, false); + r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false); if (!r) { - bo_va = radeon_bo_va(rdev->ib_pool.sa_manager.bo, vm); + bo_va = radeon_bo_va(rdev->ring_tmp_bo.bo, vm); list_del_init(_va->bo_list); list_del_init(_va->vm_list); - radeon_bo_unreserve(rdev->ib_pool.sa_manager.bo); + radeon_bo_unreserve(rdev->ring_tmp_bo.bo); kfree(bo_va); } if
[PATCH 14/19] drm/radeon: simplify semaphore handling v2
From: Jerome Glisse

Directly use the suballocator to get small chunks of memory. It's equally fast and doesn't crash when we encounter a GPU reset.

v2: rebased on the new SA interface.

Signed-off-by: Christian König
Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/evergreen.c        |   1 -
 drivers/gpu/drm/radeon/ni.c               |   1 -
 drivers/gpu/drm/radeon/r600.c             |   1 -
 drivers/gpu/drm/radeon/radeon.h           |  29 +-
 drivers/gpu/drm/radeon/radeon_device.c    |   2 -
 drivers/gpu/drm/radeon/radeon_fence.c     |   2 +-
 drivers/gpu/drm/radeon/radeon_semaphore.c | 137 +
 drivers/gpu/drm/radeon/radeon_test.c      |   4 +-
 drivers/gpu/drm/radeon/rv770.c            |   1 -
 drivers/gpu/drm/radeon/si.c               |   1 -
 10 files changed, 30 insertions(+), 149 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index ecc29bc..7e7ac3d 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3550,7 +3550,6 @@ void evergreen_fini(struct radeon_device *rdev) evergreen_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_agp_fini(rdev); radeon_bo_fini(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 9cd2657..107b217 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1744,7 +1744,6 @@ void cayman_fini(struct radeon_device *rdev) cayman_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index d02f13f..478b51e 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2658,7 +2658,6 @@ void r600_fini(struct radeon_device *rdev) r600_vram_scratch_fini(rdev); radeon_agp_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev)
radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cc7f16a..45164e1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -434,34 +434,13 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv, /* * Semaphores. */ -struct radeon_ring; - -#defineRADEON_SEMAPHORE_BO_SIZE256 - -struct radeon_semaphore_driver { - rwlock_tlock; - struct list_headbo; -}; - -struct radeon_semaphore_bo; - /* everything here is constant */ struct radeon_semaphore { - struct list_headlist; + struct radeon_sa_bo *sa_bo; + signed waiters; uint64_tgpu_addr; - uint32_t*cpu_ptr; - struct radeon_semaphore_bo *bo; }; -struct radeon_semaphore_bo { - struct list_headlist; - struct radeon_ib*ib; - struct list_headfree; - struct radeon_semaphore semaphores[RADEON_SEMAPHORE_BO_SIZE/8]; - unsignednused; -}; - -void radeon_semaphore_driver_fini(struct radeon_device *rdev); int radeon_semaphore_create(struct radeon_device *rdev, struct radeon_semaphore **semaphore); void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring, @@ -473,7 +452,8 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev, bool sync_to[RADEON_NUM_RINGS], int dst_ring); void radeon_semaphore_free(struct radeon_device *rdev, - struct radeon_semaphore *semaphore); + struct radeon_semaphore *semaphore, + struct radeon_fence *fence); /* * GART structures, functions & helpers @@ -1540,7 +1520,6 @@ struct radeon_device { struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; wait_queue_head_t fence_queue; - struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index b827b2e..48876c1 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -732,11 +732,9 @@ 
int
[PATCH 13/19] drm/radeon: multiple ring allocator v3
A fresh start with a new idea for a multiple-ring allocator. It should perform as well as a normal ring allocator as long as only one ring does something, but it falls back to a more complex algorithm if more complex things start to happen.

We store the last allocated bo in "last", and we always try to allocate after the last allocated bo. The principle is that in a linear GPU ring progression, what is after "last" is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If that's not the case, we skip over the bo after "last" to the closest finished bo, if one exists. If none exists and we are not asked to block, we report failure to allocate. If we are asked to block, we wait on the oldest fence of each ring; we just wait for any of those fences to complete.

v2: We need to be able to let "hole" point to the list_head, otherwise try-free will never free the first allocation of the list. Also stop calling radeon_fence_signaled more than necessary.

v3: Don't free allocations without considering them as a hole, otherwise we might lose holes. Also return ENOMEM instead of ENOENT when running out of fences to wait for. Limit the number of holes we try for each ring to 3.
Signed-off-by: Christian König
Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h      |   7 +-
 drivers/gpu/drm/radeon/radeon_ring.c |  19 +--
 drivers/gpu/drm/radeon/radeon_sa.c   | 312 --
 3 files changed, 231 insertions(+), 107 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 37a7459..cc7f16a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -385,7 +385,9 @@ struct radeon_bo_list { struct radeon_sa_manager { spinlock_t lock; struct radeon_bo *bo; - struct list_head sa_bo; + struct list_head *hole; + struct list_head flist[RADEON_NUM_RINGS]; + struct list_head olist; unsigned size; uint64_t gpu_addr; void *cpu_ptr; @@ -396,7 +398,8 @@ struct radeon_sa_bo; /* sub-allocation buffer */ struct radeon_sa_bo { - struct list_head list; + struct list_head olist; + struct list_head flist; struct radeon_sa_manager *manager; unsigned soffset; unsigned eoffset; diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 1748d93..e074ff5 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib) int radeon_ib_pool_init(struct radeon_device *rdev) { - struct radeon_sa_manager tmp; int i, r; - r = radeon_sa_bo_manager_init(rdev, &tmp, - RADEON_IB_POOL_SIZE*64*1024, - RADEON_GEM_DOMAIN_GTT); - if (r) { - return r; - } - radeon_mutex_lock(&rdev->ib_pool.mutex); if (rdev->ib_pool.ready) { radeon_mutex_unlock(&rdev->ib_pool.mutex); - radeon_sa_bo_manager_fini(rdev, &tmp); return 0; } - rdev->ib_pool.sa_manager = tmp; - INIT_LIST_HEAD(&rdev->ib_pool.sa_manager.sa_bo); + r = radeon_sa_bo_manager_init(rdev, &rdev->ib_pool.sa_manager, + RADEON_IB_POOL_SIZE*64*1024, + RADEON_GEM_DOMAIN_GTT); + if (r) { + radeon_mutex_unlock(&rdev->ib_pool.mutex); + return r; + } + for (i = 0; i < RADEON_IB_POOL_SIZE; i++) { rdev->ib_pool.ibs[i].fence = NULL; rdev->ib_pool.ibs[i].idx = i; diff --git
a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 90ee8ad..c3ac7f4 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -27,21 +27,42 @@ * Authors: *Jerome Glisse */ +/* Algorithm: + * + * We store the last allocated bo in "hole", we always try to allocate + * after the last allocated bo. Principle is that in a linear GPU ring + * progression was is after last is the oldest bo we allocated and thus + * the first one that should no longer be in use by the GPU. + * + * If it's not the case we skip over the bo after last to the closest + * done bo if such one exist. If none exist and we are not asked to + * block we report failure to allocate. + * + * If we are asked to block we wait on all the oldest fence of all + * rings. We just wait for any of those fence to complete. + */ #include "drmP.h" #include "drm.h" #include "radeon.h" +static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo); +static void radeon_sa_bo_try_free(struct
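The principle behind this allocator can be illustrated with a toy model (this is not the kernel code; all names here are invented for illustration): new blocks are placed right after the most recently allocated one, blocks retire in the order their fences signal, and when the pool is exhausted the caller would wait on the oldest fences instead of failing outright.

```c
#include <stdbool.h>

/* Toy model of a fence-driven sub-allocator. Allocations advance
 * "head" through one linear pool; a block stays busy until a fake
 * per-pool fence counter reaches its sequence number. */
#define TOY_POOL 256
#define TOY_MAX  16

struct toy_block {
    unsigned soffset, eoffset;  /* [start, end) within the pool */
    unsigned fence_seq;         /* busy until seq <= last_signaled */
    bool     used;
};

struct toy_sa {
    struct toy_block blocks[TOY_MAX];
    unsigned head;              /* offset just past the last allocation */
    unsigned last_signaled;     /* toy stand-in for GPU fence progress */
};

/* Retire every block whose fence has signaled; once the pool is fully
 * idle, restart allocation from offset 0 (the "wrap" case). */
static void toy_reclaim(struct toy_sa *sa)
{
    bool any_busy = false;
    for (int i = 0; i < TOY_MAX; i++) {
        if (sa->blocks[i].used &&
            sa->blocks[i].fence_seq <= sa->last_signaled)
            sa->blocks[i].used = false;
        any_busy |= sa->blocks[i].used;
    }
    if (!any_busy)
        sa->head = 0;
}

/* Allocate "size" bytes right after the last allocation; return the
 * start offset, or -1 when the pool is exhausted (the real code would
 * then block on the oldest fence of each ring). */
static int toy_alloc(struct toy_sa *sa, unsigned size, unsigned fence_seq)
{
    toy_reclaim(sa);
    if (sa->head + size > TOY_POOL)
        return -1;
    for (int i = 0; i < TOY_MAX; i++) {
        if (!sa->blocks[i].used) {
            sa->blocks[i] = (struct toy_block){
                .soffset = sa->head,
                .eoffset = sa->head + size,
                .fence_seq = fence_seq,
                .used = true,
            };
            sa->head += size;
            return (int)sa->blocks[i].soffset;
        }
    }
    return -1;
}
```

The real allocator is considerably more subtle (per-ring free lists, holes that can point back at the list head, partial wrap-around), but the fast path is the same: as long as fences signal in allocation order, every allocation is a constant-time bump of "head".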
[PATCH 12/19] drm/radeon: use one wait queue for all rings, add fence_wait_any v2
From: Jerome Glisse

Use one wait queue for all rings. When one ring progresses, the others likely do too, and we don't expect to have many waiters anyway. Also add a fence_wait_any that waits until the first fence in the fence array (one fence per ring) is signaled; this allows waiting on all rings at once.

v2: some minor cleanups and improvements.

Signed-off-by: Christian König
Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h       |   5 +-
 drivers/gpu/drm/radeon/radeon_fence.c | 165 +++--
 2 files changed, 163 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index ada70d1..37a7459 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -262,7 +262,6 @@ struct radeon_fence_driver { uint64_t seq; atomic64_t last_seq; unsigned long last_activity; - wait_queue_head_t queue; bool initialized; }; @@ -286,6 +285,9 @@ bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_any(struct radeon_device *rdev, + struct radeon_fence **fences, + bool intr); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); @@ -1534,6 +1536,7 @@ struct radeon_device { struct radeon_scratch scratch; struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; + wait_queue_head_t fence_queue; struct radeon_semaphore_driver semaphore_drv; struct mutex ring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 098d1fa..14dbc28 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@
-129,7 +129,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring) if (wake) { rdev->fence_drv[ring].last_activity = jiffies; - wake_up_all(>fence_drv[ring].queue); + wake_up_all(>fence_queue); } } @@ -224,11 +224,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, trace_radeon_fence_wait_begin(rdev->ddev, seq); radeon_irq_kms_sw_irq_get(rdev, ring); if (intr) { - r = wait_event_interruptible_timeout(rdev->fence_drv[ring].queue, + r = wait_event_interruptible_timeout(rdev->fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } else { - r = wait_event_timeout(rdev->fence_drv[ring].queue, + r = wait_event_timeout(rdev->fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } @@ -306,6 +306,159 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return 0; } +bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq) +{ + unsigned i; + + for (i = 0; i < RADEON_NUM_RINGS; ++i) { + if (seq[i] && radeon_fence_seq_signaled(rdev, seq[i], i)) { + return true; + } + } + return false; +} + +static int radeon_fence_wait_any_seq(struct radeon_device *rdev, +u64 *target_seq, bool intr) +{ + unsigned long timeout, last_activity, tmp; + unsigned i, ring = RADEON_NUM_RINGS; + bool signaled; + int r; + + for (i = 0, last_activity = 0; i < RADEON_NUM_RINGS; ++i) { + if (!target_seq[i]) { + continue; + } + + /* use the most recent one as indicator */ + if (time_after(rdev->fence_drv[i].last_activity, last_activity)) { + last_activity = rdev->fence_drv[i].last_activity; + } + + /* For lockup detection just pick the lowest ring we are +* actively waiting for +*/ + if (i < ring) { + ring = i; + } + } + + /* nothing to wait for ? */ + if (ring == RADEON_NUM_RINGS) { + return 0; + } + +
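The core of the "wait any" logic in the patch is the predicate that scans all rings and succeeds as soon as any requested sequence number has been reached. A small self-contained sketch of that check (the globals and helpers here are stand-ins for the driver's per-ring fence state, not kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 4

/* Toy stand-in for per-ring fence progress: the highest sequence
 * number each ring has signaled so far. */
static uint64_t last_seq[NUM_RINGS];

static bool seq_signaled(unsigned ring, uint64_t seq)
{
    return last_seq[ring] >= seq;
}

/* Mirrors the shape of radeon_fence_any_seq_signaled(): a target of 0
 * means "not waiting on this ring"; return true as soon as ANY of the
 * per-ring targets has been reached. */
static bool any_seq_signaled(const uint64_t *target)
{
    for (unsigned i = 0; i < NUM_RINGS; ++i) {
        if (target[i] && seq_signaled(i, target[i]))
            return true;
    }
    return false;
}
```

With a single shared wait queue, every ring's fence processing wakes the same sleepers, and each waiter re-evaluates this predicate over its own target array; that is what lets one wait primitive cover "wait on this fence" and "wait on the first of several fences" alike.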
[PATCH 11/19] drm/radeon: define new SA interface v3
Define the interface without modifying the allocation algorithm in any way. v2: rebase on top of fence new uint64 patch v3: add ring to debugfs output Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h |1 + drivers/gpu/drm/radeon/radeon_gart.c |6 +-- drivers/gpu/drm/radeon/radeon_object.h|5 ++- drivers/gpu/drm/radeon/radeon_ring.c |8 ++-- drivers/gpu/drm/radeon/radeon_sa.c| 60 - drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 6 files changed, 63 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 9374ab1..ada70d1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -398,6 +398,7 @@ struct radeon_sa_bo { struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; + struct radeon_fence *fence; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c5789ef..53dba8e 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -326,7 +326,7 @@ static void radeon_vm_unbind_locked(struct radeon_device *rdev, rdev->vm_manager.use_bitmap &= ~(1 << vm->id); list_del_init(>list); vm->id = -1; - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); vm->pt = NULL; list_for_each_entry(bo_va, >va, vm_list) { @@ -395,7 +395,7 @@ int radeon_vm_bind(struct radeon_device *rdev, struct radeon_vm *vm) retry: r = radeon_sa_bo_new(rdev, >vm_manager.sa_manager, >sa_bo, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8), -RADEON_GPU_PAGE_SIZE); +RADEON_GPU_PAGE_SIZE, false); if (r) { if (list_empty(>vm_manager.lru_vm)) { return r; @@ -426,7 +426,7 @@ retry_id: /* do hw bind */ r = rdev->vm_manager.funcs->bind(rdev, vm, id); if (r) { - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); return r; } rdev->vm_manager.use_bitmap |= 1 << id; diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h 
index 4fc7f07..befec7d 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -169,9 +169,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, struct radeon_sa_bo **sa_bo, - unsigned size, unsigned align); + unsigned size, unsigned align, bool block); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo **sa_bo); + struct radeon_sa_bo **sa_bo, + struct radeon_fence *fence); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 45adb37..1748d93 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -85,7 +85,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib) if (ib->fence && ib->fence->seq < RADEON_FENCE_NOTEMITED_SEQ) { if (radeon_fence_signaled(ib->fence)) { radeon_fence_unref(>fence); - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); done = true; } } @@ -124,7 +124,7 @@ retry: if (rdev->ib_pool.ibs[idx].fence == NULL) { r = radeon_sa_bo_new(rdev, >ib_pool.sa_manager, >ib_pool.ibs[idx].sa_bo, -size, 256); +size, 256, false); if (!r) { *ib = >ib_pool.ibs[idx]; (*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo); @@ -173,7 +173,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) } radeon_mutex_lock(>ib_pool.mutex); if (tmp->fence && tmp->fence->seq == RADEON_FENCE_NOTEMITED_SEQ) { - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); radeon_fence_unref(>fence); }
[PATCH 10/19] drm/radeon: make sa bo a stand alone object
Allocating and freeing it separately. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/radeon.h |4 ++-- drivers/gpu/drm/radeon/radeon_cs.c|4 ++-- drivers/gpu/drm/radeon/radeon_gart.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h|4 ++-- drivers/gpu/drm/radeon/radeon_ring.c |6 +++--- drivers/gpu/drm/radeon/radeon_sa.c| 28 +++- drivers/gpu/drm/radeon/radeon_semaphore.c |4 ++-- 7 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index d1c2154..9374ab1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -638,7 +638,7 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; unsignedidx; uint32_tlength_dw; uint64_tgpu_addr; @@ -693,7 +693,7 @@ struct radeon_vm { unsignedlast_pfn; u64 pt_gpu_addr; u64 *pt; - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; struct mutexmutex; /* last fence for cs using this vm */ struct radeon_fence *fence; diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index b778037..5c065bf 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->const_ib->gpu_addr = parser->const_ib->sa_bo.soffset; + parser->const_ib->gpu_addr = parser->const_ib->sa_bo->soffset; r = radeon_ib_schedule(rdev, parser->const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->ib->gpu_addr = parser->ib->sa_bo.soffset; + parser->ib->gpu_addr = parser->ib->sa_bo->soffset; parser->ib->is_const_ib = false; r = radeon_ib_schedule(rdev,
parser->ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 4a5d9d4..c5789ef 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -404,8 +404,8 @@ retry: radeon_vm_unbind(rdev, vm_evict); goto retry; } - vm->pt = radeon_sa_bo_cpu_addr(>sa_bo); - vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(>sa_bo); + vm->pt = radeon_sa_bo_cpu_addr(vm->sa_bo); + vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(vm->sa_bo); memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8)); retry_id: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 99ab46a..4fc7f07 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -168,10 +168,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager); extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, - struct radeon_sa_bo *sa_bo, + struct radeon_sa_bo **sa_bo, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo *sa_bo); + struct radeon_sa_bo **sa_bo); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index f49c9c0..45adb37 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -127,8 +127,8 @@ retry: size, 256); if (!r) { *ib = >ib_pool.ibs[idx]; - (*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo); - (*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo); + (*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo); + (*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo);
[PATCH 09/19] drm/radeon: keep start and end offset in the SA
Instead of offset + size keep start and end offset directly. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h|4 ++-- drivers/gpu/drm/radeon/radeon_cs.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h |4 ++-- drivers/gpu/drm/radeon/radeon_sa.c | 13 +++-- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 8a6b1b3..d1c2154 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -396,8 +396,8 @@ struct radeon_sa_bo; struct radeon_sa_bo { struct list_headlist; struct radeon_sa_manager*manager; - unsignedoffset; - unsignedsize; + unsignedsoffset; + unsignedeoffset; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 289b0d7..b778037 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->const_ib->gpu_addr = parser->const_ib->sa_bo.offset; + parser->const_ib->gpu_addr = parser->const_ib->sa_bo.soffset; r = radeon_ib_schedule(rdev, parser->const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->ib->gpu_addr = parser->ib->sa_bo.offset; + parser->ib->gpu_addr = parser->ib->sa_bo.soffset; parser->ib->is_const_ib = false; r = radeon_ib_schedule(rdev, parser->ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index d9fca1e..99ab46a 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -149,12 +149,12 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo, static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo 
*sa_bo) { - return sa_bo->manager->gpu_addr + sa_bo->offset; + return sa_bo->manager->gpu_addr + sa_bo->soffset; } static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo) { - return sa_bo->manager->cpu_ptr + sa_bo->offset; + return sa_bo->manager->cpu_ptr + sa_bo->soffset; } extern int radeon_sa_bo_manager_init(struct radeon_device *rdev, diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 1db0568..3bea7ba 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -152,11 +152,11 @@ int radeon_sa_bo_new(struct radeon_device *rdev, offset = 0; list_for_each_entry(tmp, _manager->sa_bo, list) { /* room before this object ? */ - if (offset < tmp->offset && (tmp->offset - offset) >= size) { + if (offset < tmp->soffset && (tmp->soffset - offset) >= size) { head = tmp->list.prev; goto out; } - offset = tmp->offset + tmp->size; + offset = tmp->eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -166,7 +166,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, /* room at the end ? */ head = sa_manager->sa_bo.prev; tmp = list_entry(head, struct radeon_sa_bo, list); - offset = tmp->offset + tmp->size; + offset = tmp->eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -180,8 +180,8 @@ int radeon_sa_bo_new(struct radeon_device *rdev, out: sa_bo->manager = sa_manager; - sa_bo->offset = offset; - sa_bo->size = size; + sa_bo->soffset = offset; + sa_bo->eoffset = offset + size; list_add(_bo->list, head); spin_unlock(_manager->lock); return 0; @@ -202,7 +202,8 @@ void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, spin_lock(_manager->lock); list_for_each_entry(i, _manager->sa_bo, list) { - seq_printf(m, "offset %08d: size %4d\n", i->offset, i->size); + seq_printf(m, "[%08x %08x] size %4d [%p]\n", + i->soffset, i->eoffset, i->eoffset - i->soffset, i); } spin_unlock(_manager->lock); } -- 1.7.9.5
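The allocation walk this patch touches is easier to see in isolation. A hedged userspace sketch of the first-fit scan over allocations kept as [soffset, eoffset) ranges (struct and function names are illustrative; the driver walks a kernel list under a spinlock rather than an array):

```c
#include <stdbool.h>

/* Each allocation is a half-open range [soffset, eoffset); the list is
 * kept sorted by start offset, so one pass finds the first hole. */
struct sa_range { unsigned soffset, eoffset; };

/* First-fit sketch: find an aligned start offset for `size` bytes among
 * `n` sorted allocations in a pool of `pool_size` bytes. Assumes all
 * allocations lie within the pool. */
static bool sa_find_hole(const struct sa_range *bos, unsigned n,
                         unsigned pool_size, unsigned size,
                         unsigned align, unsigned *out)
{
    unsigned offset = 0;

    for (unsigned i = 0; i < n; ++i) {
        /* room before this allocation? */
        if (offset < bos[i].soffset && bos[i].soffset - offset >= size) {
            *out = offset;
            return true;
        }
        /* skip past it: with start/end stored directly this is just
         * eoffset, instead of offset + size as before the patch */
        offset = bos[i].eoffset;
        if (offset % align)
            offset += align - offset % align;
    }
    /* room at the end? */
    if (pool_size - offset >= size) {
        *out = offset;
        return true;
    }
    return false;
}
```

Storing eoffset directly is what lets later patches in the series account for alignment waste inside an allocation without changing the scan.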
[PATCH 08/19] drm/radeon: add sub allocator debugfs file
Dumping the current allocations. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_object.h |5 + drivers/gpu/drm/radeon/radeon_ring.c | 22 ++ drivers/gpu/drm/radeon/radeon_sa.c | 14 ++ 3 files changed, 41 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index c120ab9..d9fca1e 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -172,5 +172,10 @@ extern int radeon_sa_bo_new(struct radeon_device *rdev, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo); +#if defined(CONFIG_DEBUG_FS) +extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, +struct seq_file *m); +#endif + #endif diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 116be5e..f49c9c0 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -601,6 +601,23 @@ static int radeon_debugfs_ib_info(struct seq_file *m, void *data) static struct drm_info_list radeon_debugfs_ib_list[RADEON_IB_POOL_SIZE]; static char radeon_debugfs_ib_names[RADEON_IB_POOL_SIZE][32]; static unsigned radeon_debugfs_ib_idx[RADEON_IB_POOL_SIZE]; + +static int radeon_debugfs_sa_info(struct seq_file *m, void *data) +{ + struct drm_info_node *node = (struct drm_info_node *) m->private; + struct drm_device *dev = node->minor->dev; + struct radeon_device *rdev = dev->dev_private; + + radeon_sa_bo_dump_debug_info(>ib_pool.sa_manager, m); + + return 0; + +} + +static struct drm_info_list radeon_debugfs_sa_list[] = { +{"radeon_sa_info", _debugfs_sa_info, 0, NULL}, +}; + #endif int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *ring) @@ -627,6 +644,11 @@ int radeon_debugfs_ib_init(struct radeon_device *rdev) { #if defined(CONFIG_DEBUG_FS) unsigned i; + int r; + + r = radeon_debugfs_add_files(rdev, radeon_debugfs_sa_list, 1); + if (r) + 
return r; for (i = 0; i < RADEON_IB_POOL_SIZE; i++) { sprintf(radeon_debugfs_ib_names[i], "radeon_ib_%04u", i); diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index aed0a8c..1db0568 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -193,3 +193,17 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo) list_del_init(_bo->list); spin_unlock(_bo->manager->lock); } + +#if defined(CONFIG_DEBUG_FS) +void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, + struct seq_file *m) +{ + struct radeon_sa_bo *i; + + spin_lock(_manager->lock); + list_for_each_entry(i, _manager->sa_bo, list) { + seq_printf(m, "offset %08d: size %4d\n", i->offset, i->size); + } + spin_unlock(_manager->lock); +} +#endif -- 1.7.9.5
[PATCH 07/19] drm/radeon: add proper locking to the SA v3
Make the suballocator self-contained with respect to locking. v2: split the bugfix into a separate patch. v3: remove some unrelated changes. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/radeon.h|1 + drivers/gpu/drm/radeon/radeon_sa.c |6 ++ 2 files changed, 7 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 701094b..8a6b1b3 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -381,6 +381,7 @@ struct radeon_bo_list { * alignment). */ struct radeon_sa_manager { + spinlock_t lock; struct radeon_bo*bo; struct list_headsa_bo; unsignedsize; diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 8fbfe69..aed0a8c 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -37,6 +37,7 @@ int radeon_sa_bo_manager_init(struct radeon_device *rdev, { int r; + spin_lock_init(&sa_manager->lock); sa_manager->bo = NULL; sa_manager->size = size; sa_manager->domain = domain; @@ -139,6 +140,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, BUG_ON(align > RADEON_GPU_PAGE_SIZE); BUG_ON(size > sa_manager->size); + spin_lock(&sa_manager->lock); /* no one ? */ head = sa_manager->sa_bo.prev; @@ -172,6 +174,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, offset += wasted; if ((sa_manager->size - offset) < size) { /* failed to find somethings big enough */ + spin_unlock(&sa_manager->lock); return -ENOMEM; } @@ -180,10 +183,13 @@ out: sa_bo->offset = offset; sa_bo->size = size; list_add(&sa_bo->list, head); + spin_unlock(&sa_manager->lock); return 0; } void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo) { + spin_lock(&sa_bo->manager->lock); list_del_init(&sa_bo->list); + spin_unlock(&sa_bo->manager->lock); } -- 1.7.9.5
[PATCH 06/19] drm/radeon: use inline functions to calc sa_bo addr
Instead of hacking the calculation multiple times. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_gart.c |6 ++ drivers/gpu/drm/radeon/radeon_object.h| 11 +++ drivers/gpu/drm/radeon/radeon_ring.c |6 ++ drivers/gpu/drm/radeon/radeon_semaphore.c |6 ++ 4 files changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c58a036..4a5d9d4 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -404,10 +404,8 @@ retry: radeon_vm_unbind(rdev, vm_evict); goto retry; } - vm->pt = rdev->vm_manager.sa_manager.cpu_ptr; - vm->pt += (vm->sa_bo.offset >> 3); - vm->pt_gpu_addr = rdev->vm_manager.sa_manager.gpu_addr; - vm->pt_gpu_addr += vm->sa_bo.offset; + vm->pt = radeon_sa_bo_cpu_addr(>sa_bo); + vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(>sa_bo); memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8)); retry_id: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index f9104be..c120ab9 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -146,6 +146,17 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo, /* * sub allocation */ + +static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo *sa_bo) +{ + return sa_bo->manager->gpu_addr + sa_bo->offset; +} + +static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo) +{ + return sa_bo->manager->cpu_ptr + sa_bo->offset; +} + extern int radeon_sa_bo_manager_init(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, unsigned size, u32 domain); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 2fdc8c3..116be5e 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -127,10 +127,8 @@ retry: size, 256); if (!r) { *ib = >ib_pool.ibs[idx]; - (*ib)->ptr = rdev->ib_pool.sa_manager.cpu_ptr; - 
(*ib)->ptr += ((*ib)->sa_bo.offset >> 2); - (*ib)->gpu_addr = rdev->ib_pool.sa_manager.gpu_addr; - (*ib)->gpu_addr += (*ib)->sa_bo.offset; + (*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo); + (*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo); (*ib)->fence = fence; (*ib)->vm_id = 0; (*ib)->is_const_ib = false; diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c index c5b3d8e..f312ba5 100644 --- a/drivers/gpu/drm/radeon/radeon_semaphore.c +++ b/drivers/gpu/drm/radeon/radeon_semaphore.c @@ -53,10 +53,8 @@ static int radeon_semaphore_add_bo(struct radeon_device *rdev) kfree(bo); return r; } - gpu_addr = rdev->ib_pool.sa_manager.gpu_addr; - gpu_addr += bo->ib->sa_bo.offset; - cpu_ptr = rdev->ib_pool.sa_manager.cpu_ptr; - cpu_ptr += (bo->ib->sa_bo.offset >> 2); + gpu_addr = radeon_sa_bo_gpu_addr(>ib->sa_bo); + cpu_ptr = radeon_sa_bo_cpu_addr(>ib->sa_bo); for (i = 0; i < (RADEON_SEMAPHORE_BO_SIZE/8); i++) { bo->semaphores[i].gpu_addr = gpu_addr; bo->semaphores[i].cpu_ptr = cpu_ptr; -- 1.7.9.5
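The two inline helpers this patch introduces both reduce to "manager base plus allocation offset". A small standalone model (the structs are stand-ins carrying only the fields the helpers read; cpu_ptr is a byte pointer here so the arithmetic is portable C rather than the kernel's void-pointer arithmetic):

```c
#include <stdint.h>

/* Illustrative stand-ins for the manager/sa_bo fields the helpers use;
 * the real structs carry more state. */
struct sa_manager { uint64_t gpu_addr; uint8_t *cpu_ptr; };
struct sa_bo { struct sa_manager *manager; unsigned offset; };

/* GPU-visible address of the suballocation. */
static uint64_t sa_bo_gpu_addr(const struct sa_bo *sa_bo)
{
    return sa_bo->manager->gpu_addr + sa_bo->offset;
}

/* CPU mapping of the same bytes. */
static void *sa_bo_cpu_addr(const struct sa_bo *sa_bo)
{
    return sa_bo->manager->cpu_ptr + sa_bo->offset;
}
```

Centralizing this removes the scattered `offset >> 2` / `offset >> 3` adjustments the callers previously did against typed pointers, which is exactly the duplication the commit message calls "hacking the calculation multiple times".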
[PATCH 05/19] drm/radeon: rework locking ring emission mutex in fence deadlock detection v2
Some callers illegally called fence_wait_next/empty while holding the ring emission mutex. So don't relock the mutex in those cases, and move the actual locking into the fence code. v2: Don't try to unlock the mutex if it isn't locked. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/radeon.h|4 +-- drivers/gpu/drm/radeon/radeon_device.c |5 +++- drivers/gpu/drm/radeon/radeon_fence.c | 43 +--- drivers/gpu/drm/radeon/radeon_pm.c |8 +- drivers/gpu/drm/radeon/radeon_ring.c |6 + 5 files changed, 37 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 7c87117..701094b 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -284,8 +284,8 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence); void radeon_fence_process(struct radeon_device *rdev, int ring); bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); -int radeon_fence_wait_next(struct radeon_device *rdev, int ring); -int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); +int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 0e7b72a..b827b2e 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -912,9 +912,12 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state) } /* evict vram memory */ radeon_bo_evict_vram(rdev); + + mutex_lock(&rdev->ring_lock); /* wait for gpu to finish processing current batch */ for (i = 0; i < RADEON_NUM_RINGS; i++) - radeon_fence_wait_empty(rdev,
i); + radeon_fence_wait_empty_locked(rdev, i); + mutex_unlock(>ring_lock); radeon_save_bios_scratch_regs(rdev); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index ed20225..098d1fa 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -194,7 +194,7 @@ bool radeon_fence_signaled(struct radeon_fence *fence) } static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, -unsigned ring, bool intr) +unsigned ring, bool intr, bool lock_ring) { unsigned long timeout, last_activity; uint64_t seq; @@ -249,8 +249,16 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, if (seq != atomic64_read(>fence_drv[ring].last_seq)) { continue; } + + if (lock_ring) { + mutex_lock(>ring_lock); + } + /* test if somebody else has already decided that this is a lockup */ if (last_activity != rdev->fence_drv[ring].last_activity) { + if (lock_ring) { + mutex_unlock(>ring_lock); + } continue; } @@ -264,15 +272,17 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, rdev->fence_drv[i].last_activity = jiffies; } - /* change last activity so nobody else think there is a lockup */ - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - rdev->fence_drv[i].last_activity = jiffies; - } - /* mark the ring as not ready any more */ rdev->ring[ring].ready = false; + if (lock_ring) { + mutex_unlock(>ring_lock); + } return -EDEADLK; } + + if (lock_ring) { + mutex_unlock(>ring_lock); + } } } return 0; @@ -287,7 +297,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return -EINVAL; } - r = radeon_fence_wait_seq(fence->rdev, fence->seq, fence->ring, intr); + r = radeon_fence_wait_seq(fence->rdev, fence->seq, + fence->ring, intr,
[PATCH 04/19] drm/radeon: rework fence handling, drop fence list v7
From: Jerome GlisseUsing 64bits fence sequence we can directly compare sequence number to know if a fence is signaled or not. Thus the fence list became useless, so does the fence lock that mainly protected the fence list. Things like ring.ready are no longer behind a lock, this should be ok as ring.ready is initialized once and will only change when facing lockup. Worst case is that we return an -EBUSY just after a successfull GPU reset, or we go into wait state instead of returning -EBUSY (thus delaying reporting -EBUSY to fence wait caller). v2: Remove left over comment, force using writeback on cayman and newer, thus not having to suffer from possibly scratch reg exhaustion v3: Rebase on top of change to uint64 fence patch v4: Change DCE5 test to force write back on cayman and newer but also any APU such as PALM or SUMO family v5: Rebase on top of new uint64 fence patch v6: Just break if seq doesn't change any more. Use radeon_fence prefix for all function names. Even if it's now highly optimized, try avoiding polling to often. v7: We should never poll the last_seq from the hardware without waking the sleeping threads, otherwise we might lose events. Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h|6 +- drivers/gpu/drm/radeon/radeon_device.c |8 +- drivers/gpu/drm/radeon/radeon_fence.c | 299 3 files changed, 119 insertions(+), 194 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cdf46bc..7c87117 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -263,15 +263,12 @@ struct radeon_fence_driver { atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; - struct list_heademitted; - struct list_headsignaled; boolinitialized; }; struct radeon_fence { struct radeon_device*rdev; struct kref kref; - struct list_headlist; /* protected by radeon_fence.lock */ uint64_tseq; /* RB, DMA, etc. 
*/ @@ -291,7 +288,7 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); -int radeon_fence_count_emitted(struct radeon_device *rdev, int ring); +unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); /* * Tiling registers @@ -1534,7 +1531,6 @@ struct radeon_device { struct radeon_mode_info mode_info; struct radeon_scratch scratch; struct radeon_mman mman; - rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 3f6ff2a..0e7b72a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -225,9 +225,9 @@ int radeon_wb_init(struct radeon_device *rdev) /* disable event_write fences */ rdev->wb.use_event = false; /* disabled via module param */ - if (radeon_no_wb == 1) + if (radeon_no_wb == 1) { rdev->wb.enabled = false; - else { + } else { if (rdev->flags & RADEON_IS_AGP) { /* often unreliable on AGP */ rdev->wb.enabled = false; @@ -237,8 +237,9 @@ int radeon_wb_init(struct radeon_device *rdev) } else { rdev->wb.enabled = true; /* event_write fences are only available on r600+ */ - if (rdev->family >= CHIP_R600) + if (rdev->family >= CHIP_R600) { rdev->wb.use_event = true; + } } } /* always use writeback/events on NI, APUs */ @@ -731,7 +732,6 @@ int radeon_device_init(struct radeon_device *rdev, mutex_init(>gem.mutex); mutex_init(>pm.mutex); mutex_init(>vram_mutex); - rwlock_init(>fence_lock); rwlock_init(>semaphore_drv.lock); INIT_LIST_HEAD(>gem.objects); init_waitqueue_head(>irq.vblank_queue); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index feb2bbc..ed20225 100644 --- 
a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -63,30 +63,18 @@ static u32
[PATCH 03/19] drm/radeon: convert fence to uint64_t v4
From: Jerome GlisseThis convert fence to use uint64_t sequence number intention is to use the fact that uin64_t is big enough that we don't need to care about wrap around. Tested with and without writeback using 0xF000 as initial fence sequence and thus allowing to test the wrap around from 32bits to 64bits. v2: Add comment about possible race btw CPU & GPU, add comment stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET Read fence sequenc in reverse order of GPU write them so we mitigate the race btw CPU and GPU. v3: Drop the need for ring to emit the 64bits fence, and just have each ring emit the lower 32bits of the fence sequence. We handle the wrap over 32bits in fence_process. v4: Just a small optimization: Don't reread the last_seq value if loop restarts, since we already know its value anyway. Also start at zero not one for seq value and use pre instead of post increment in emmit, otherwise wait_empty will deadlock. Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h | 39 ++- drivers/gpu/drm/radeon/radeon_fence.c | 116 +++-- drivers/gpu/drm/radeon/radeon_ring.c |9 +-- 3 files changed, 107 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index e99ea81..cdf46bc 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout; * Copy from radeon_drv.h so we don't have to include both and have conflicting * symbol; */ -#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ -#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) +#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ +#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) /* RADEON_IB_POOL_SIZE must be a power of 2 */ -#define RADEON_IB_POOL_SIZE16 -#define RADEON_DEBUGFS_MAX_COMPONENTS 32 -#define RADEONFB_CONN_LIMIT4 -#define RADEON_BIOS_NUM_SCRATCH8 +#define RADEON_IB_POOL_SIZE16 +#define RADEON_DEBUGFS_MAX_COMPONENTS 32 +#define 
RADEONFB_CONN_LIMIT4 +#define RADEON_BIOS_NUM_SCRATCH8 /* max number of rings */ -#define RADEON_NUM_RINGS 3 +#define RADEON_NUM_RINGS 3 + +/* fence seq are set to this number when signaled */ +#define RADEON_FENCE_SIGNALED_SEQ 0LL +#define RADEON_FENCE_NOTEMITED_SEQ (~0LL) /* internal ring indices */ /* r1xx+ has gfx CP ring */ -#define RADEON_RING_TYPE_GFX_INDEX 0 +#define RADEON_RING_TYPE_GFX_INDEX 0 /* cayman has 2 compute CP rings */ -#define CAYMAN_RING_TYPE_CP1_INDEX 1 -#define CAYMAN_RING_TYPE_CP2_INDEX 2 +#define CAYMAN_RING_TYPE_CP1_INDEX 1 +#define CAYMAN_RING_TYPE_CP2_INDEX 2 /* hardcode those limit for now */ -#define RADEON_VA_RESERVED_SIZE(8 << 20) -#define RADEON_IB_VM_MAX_SIZE (64 << 10) +#define RADEON_VA_RESERVED_SIZE(8 << 20) +#define RADEON_IB_VM_MAX_SIZE (64 << 10) /* * Errata workarounds. @@ -254,8 +258,9 @@ struct radeon_fence_driver { uint32_tscratch_reg; uint64_tgpu_addr; volatile uint32_t *cpu_addr; - atomic_tseq; - uint32_tlast_seq; + /* seq is protected by ring emission lock */ + uint64_tseq; + atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; struct list_heademitted; @@ -268,11 +273,9 @@ struct radeon_fence { struct kref kref; struct list_headlist; /* protected by radeon_fence.lock */ - uint32_tseq; - boolemitted; - boolsignaled; + uint64_tseq; /* RB, DMA, etc. */ - int ring; + unsignedring; struct radeon_semaphore *semaphore; }; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 5bb78bf..feb2bbc 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -66,14 +66,14 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence) unsigned long irq_flags; write_lock_irqsave(>fence_lock, irq_flags); - if (fence->emitted) { + if (fence->seq && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) { write_unlock_irqrestore(>fence_lock, irq_flags);
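The v3 note above is the heart of this patch: each ring still emits only the lower 32 bits of the sequence, and fence_process widens them back to 64 bits. That widening can be modeled as follows (simplified: the driver does this inside a re-read loop with atomic64 compare-and-swap to tolerate concurrent updaters):

```c
#include <stdint.h>

/* Widen a 32-bit sequence read from the hardware to 64 bits using the
 * last 64-bit value the CPU observed. Assumes the GPU advances by less
 * than 2^32 between two reads, so at most one wrap can occur. */
static uint64_t extend_seq(uint32_t hw_seq, uint64_t last_seq)
{
    uint64_t seq = (uint64_t)hw_seq | (last_seq & 0xFFFFFFFF00000000ULL);

    if (seq < last_seq)            /* the low 32 bits wrapped */
        seq += 0x100000000ULL;
    return seq;
}
```

With monotonically increasing 64-bit sequences, "is this fence signaled" becomes a plain integer comparison, which is what lets the next patch in the series drop the fence lists entirely.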
[PATCH 02/19] drm/radeon: replace the per ring mutex with a global one
A single global mutex for ring submissions seems sufficient. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h |3 ++- drivers/gpu/drm/radeon/radeon_device.c|3 +-- drivers/gpu/drm/radeon/radeon_pm.c| 10 ++- drivers/gpu/drm/radeon/radeon_ring.c | 28 +++ drivers/gpu/drm/radeon/radeon_semaphore.c | 42 + 5 files changed, 41 insertions(+), 45 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 82ffa6a..e99ea81 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -676,7 +676,6 @@ struct radeon_ring { uint64_tgpu_addr; uint32_talign_mask; uint32_tptr_mask; - struct mutexmutex; boolready; u32 ptr_reg_shift; u32 ptr_reg_mask; @@ -815,6 +814,7 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *cp, unsign int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *cp, unsigned ndw); void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_unlock_commit(struct radeon_device *rdev, struct radeon_ring *cp); +void radeon_ring_undo(struct radeon_ring *ring); void radeon_ring_unlock_undo(struct radeon_device *rdev, struct radeon_ring *cp); int radeon_ring_test(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *ring); @@ -1534,6 +1534,7 @@ struct radeon_device { rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; + struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; struct radeon_irq irq; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index ff28210..3f6ff2a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,8 +724,7 @@ int radeon_device_init(struct radeon_device *rdev, * can recall function without having locking issues */ 
radeon_mutex_init(>cs_mutex); radeon_mutex_init(>ib_pool.mutex); - for (i = 0; i < RADEON_NUM_RINGS; ++i) - mutex_init(>ring[i].mutex); + mutex_init(>ring_lock); mutex_init(>dc_hw_i2c_mutex); if (rdev->family >= CHIP_R600) spin_lock_init(>ih.lock); diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index caa55d6..7c38745 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -252,10 +252,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) mutex_lock(>ddev->struct_mutex); mutex_lock(>vram_mutex); - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - if (rdev->ring[i].ring_obj) - mutex_lock(>ring[i].mutex); - } + mutex_lock(>ring_lock); /* gui idle int has issues on older chips it seems */ if (rdev->family >= CHIP_R600) { @@ -311,10 +308,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) rdev->pm.dynpm_planned_action = DYNPM_ACTION_NONE; - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - if (rdev->ring[i].ring_obj) - mutex_unlock(>ring[i].mutex); - } + mutex_unlock(>ring_lock); mutex_unlock(>vram_mutex); mutex_unlock(>ddev->struct_mutex); } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 2eb4c6e..a4d60ae 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -346,9 +346,9 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi if (ndw < ring->ring_free_dw) { break; } - mutex_unlock(>mutex); + mutex_unlock(>ring_lock); r = radeon_fence_wait_next(rdev, radeon_ring_index(rdev, ring)); - mutex_lock(>mutex); + mutex_lock(>ring_lock); if (r) return r; } @@ -361,10 +361,10 @@ int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *ring, unsig { int r; - mutex_lock(>mutex); + mutex_lock(>ring_lock); r = radeon_ring_alloc(rdev, ring, ndw); if (r) { - mutex_unlock(>mutex); + mutex_unlock(>ring_lock); return r; } return 0; @@ -389,20 +389,25 @@ void 
radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring
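The core of the change is the pattern in radeon_ring_lock()/radeon_ring_alloc(): take the one global ring_lock, and if there is not enough ring space, drop the lock around the blocking fence wait and re-take it before retrying. Below is a minimal userspace model of that pattern, with a pthread mutex standing in for the kernel mutex and a fake fence wait; all names are illustrative, not the driver's API.

```c
#include <pthread.h>

/* Simplified userspace model of the patch's single ring_lock pattern
 * (hypothetical names; the real code lives in radeon_ring.c). */
static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int ring_free_dw = 16;

/* Stand-in for radeon_fence_wait_next(): pretend a fence signals and
 * frees more ring space. Called with ring_lock *dropped*, like the
 * real wait. */
static void wait_for_fence(void)
{
    ring_free_dw += 32;
}

/* Mirror of radeon_ring_alloc(): caller holds ring_lock; the lock is
 * released around the blocking wait and re-taken before rechecking. */
static int ring_alloc(unsigned int ndw)
{
    while (ndw > ring_free_dw) {
        pthread_mutex_unlock(&ring_lock);
        wait_for_fence();
        pthread_mutex_lock(&ring_lock);
    }
    ring_free_dw -= ndw;
    return 0;
}

/* Mirror of radeon_ring_lock(): take the global lock, then allocate;
 * on failure the lock is dropped before returning. */
static int ring_lock_and_alloc(unsigned int ndw)
{
    pthread_mutex_lock(&ring_lock);
    int r = ring_alloc(ndw);
    if (r)
        pthread_mutex_unlock(&ring_lock);
    return r;
}

static void ring_unlock(void)
{
    pthread_mutex_unlock(&ring_lock);
}
```

The point of the single lock is that dropping it mid-wait now serializes against every ring, not just one, which is what the semaphore code in the rest of the series relies on.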
[PATCH 01/19] drm/radeon: fix possible lack of synchronization btw ttm and other ring
From: Jerome GlisseWe need to sync with the GFX ring as ttm might have schedule bo move on it and new command scheduled for other ring need to wait for bo data to be in place. Signed-off-by: Jerome Glisse Reviewed by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_cs.c | 12 ++-- include/drm/radeon_drm.h |1 - 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index c66beb1..289b0d7 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -122,15 +122,15 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) int i, r; for (i = 0; i < p->nrelocs; i++) { + struct radeon_fence *fence; + if (!p->relocs[i].robj || !p->relocs[i].robj->tbo.sync_obj) continue; - if (!(p->relocs[i].flags & RADEON_RELOC_DONT_SYNC)) { - struct radeon_fence *fence = p->relocs[i].robj->tbo.sync_obj; - if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { - sync_to_ring[fence->ring] = true; - need_sync = true; - } + fence = p->relocs[i].robj->tbo.sync_obj; + if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { + sync_to_ring[fence->ring] = true; + need_sync = true; } } diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h index 7c491b4..5805686 100644 --- a/include/drm/radeon_drm.h +++ b/include/drm/radeon_drm.h @@ -926,7 +926,6 @@ struct drm_radeon_cs_chunk { }; /* drm_radeon_cs_reloc.flags */ -#define RADEON_RELOC_DONT_SYNC 0x01 struct drm_radeon_cs_reloc { uint32_thandle; -- 1.7.9.5
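The behavioral change can be summarized as: with RADEON_RELOC_DONT_SYNC gone, every reloc whose buffer carries an unsignaled fence from a different ring now forces a semaphore sync. A small standalone model of that decision follows; the struct and function names are hypothetical stand-ins for the parser state.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS 4

/* Hypothetical stand-in for a bo's tbo.sync_obj fence. */
struct fence {
    int ring;       /* ring the fence was emitted on */
    bool signaled;  /* has it already signaled? */
};

/* Model of the fixed radeon_cs_sync_rings() logic: after the patch,
 * any unsignaled fence on a foreign ring marks that ring for sync --
 * there is no RADEON_RELOC_DONT_SYNC escape hatch anymore. */
static bool rings_to_sync(const struct fence *fences, size_t n,
                          int submit_ring, bool sync_to[NUM_RINGS])
{
    bool need_sync = false;

    for (size_t i = 0; i < n; i++) {
        if (fences[i].ring != submit_ring && !fences[i].signaled) {
            sync_to[fences[i].ring] = true;
            need_sync = true;
        }
    }
    return need_sync;
}
```

This is exactly the hazard the commit message describes: a bo move scheduled by ttm on the GFX ring must complete before another ring reads the bo's data.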
Include request for SA improvements
Hi Dave & Jerome and everybody on the list,

I can't find any more bugs and I'm also out of things to test, so I really hope that this is the last incarnation of this patchset; if Jerome is ok with it, it should now be included into drm-next.

Cheers,
Christian.
[PATCH 2/2 v3] drm/exynos: added userptr feature.
this feature is used to import user space region allocated by malloc() or mmaped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out so we limit user-desired userptr size to pre-defined. Signed-off-by: Inki Dae Signed-off-by: Kyungmin Park --- drivers/gpu/drm/exynos/exynos_drm_drv.c |2 + drivers/gpu/drm/exynos/exynos_drm_gem.c | 334 +++ drivers/gpu/drm/exynos/exynos_drm_gem.h | 17 ++- include/drm/exynos_drm.h| 26 +++- 4 files changed, 376 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c index 1e68ec2..e8ae3f1 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c @@ -220,6 +220,8 @@ static struct drm_ioctl_desc exynos_ioctls[] = { DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MAP_OFFSET, exynos_drm_gem_map_offset_ioctl, DRM_UNLOCKED | DRM_AUTH), + DRM_IOCTL_DEF_DRV(EXYNOS_GEM_USERPTR, + exynos_drm_gem_userptr_ioctl, DRM_UNLOCKED), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MMAP, exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index e6abb66..ccc6e3d 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -68,6 +68,83 @@ static int check_gem_flags(unsigned int flags) return 0; } +static struct vm_area_struct *get_vma(struct vm_area_struct *vma) +{ + struct vm_area_struct *vma_copy; + + vma_copy = kmalloc(sizeof(*vma_copy), GFP_KERNEL); + if (!vma_copy) + return NULL; + + if (vma->vm_ops && vma->vm_ops->open) + vma->vm_ops->open(vma); + + if (vma->vm_file) + get_file(vma->vm_file); + + memcpy(vma_copy, vma, sizeof(*vma)); + + vma_copy->vm_mm = NULL; + vma_copy->vm_next = 
NULL; + vma_copy->vm_prev = NULL; + + return vma_copy; +} + + +static void put_vma(struct vm_area_struct *vma) +{ + if (!vma) + return; + + if (vma->vm_ops && vma->vm_ops->close) + vma->vm_ops->close(vma); + + if (vma->vm_file) + fput(vma->vm_file); + + kfree(vma); +} + +/* + * lock_userptr_vma - lock VMAs within user address space + * + * this function locks vma within user address space to avoid pages + * to the userspace from being swapped out. + * if this vma isn't locked, the pages to the userspace could be swapped out + * so unprivileged user might access different pages and dma of any device + * could access physical memory region not intended once swap-in. + */ +static int lock_userptr_vma(struct exynos_drm_gem_buf *buf, unsigned int lock) +{ + struct vm_area_struct *vma; + unsigned long start, end; + + start = buf->userptr; + end = buf->userptr + buf->size - 1; + + down_write(>mm->mmap_sem); + + do { + vma = find_vma(current->mm, start); + if (!vma) { + up_write(>mm->mmap_sem); + return -EFAULT; + } + + if (lock) + vma->vm_flags |= VM_LOCKED; + else + vma->vm_flags &= ~VM_LOCKED; + + start = vma->vm_end + 1; + } while (vma->vm_end < end); + + up_write(>mm->mmap_sem); + + return 0; +} + static void update_vm_cache_attr(struct exynos_drm_gem_obj *obj, struct vm_area_struct *vma) { @@ -256,6 +333,44 @@ static void exynos_drm_gem_put_pages(struct drm_gem_object *obj) /* add some codes for UNCACHED type here. 
TODO */ } +static void exynos_drm_put_userptr(struct drm_gem_object *obj) +{ + struct exynos_drm_gem_obj *exynos_gem_obj; + struct exynos_drm_gem_buf *buf; + struct vm_area_struct *vma; + int npages; + + exynos_gem_obj = to_exynos_gem_obj(obj); + buf = exynos_gem_obj->buffer; + vma = exynos_gem_obj->vma; + + if (vma && (vma->vm_flags & VM_PFNMAP) && (vma->vm_pgoff)) { + put_vma(exynos_gem_obj->vma); + goto out; + } + + npages = buf->size >> PAGE_SHIFT; + + if (exynos_gem_obj->flags & EXYNOS_BO_USERPTR && !buf->pfnmap) + lock_userptr_vma(buf, 0); + + npages--; + while (npages >= 0) { + if (buf->write) + set_page_dirty_lock(buf->pages[npages]); + + put_page(buf->pages[npages]); + npages--; + } + +out: +
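The lock_userptr_vma() helper in the patch walks the VMAs covering [userptr, userptr + size) and toggles VM_LOCKED on each, failing with -EFAULT when the walk runs off the mapping. A simplified userspace model of that walk is below, using half-open [vm_start, vm_end) ranges; an explicit hole check is added for clarity, where the kernel version leans on find_vma() semantics. All of this is a sketch, not the exynos code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy VMA: half-open [vm_start, vm_end) address range. */
struct vma {
    unsigned long vm_start, vm_end;
    bool locked;
};

/* Like the kernel's find_vma(): first VMA with vm_end > addr,
 * assuming the array is sorted by address. */
static struct vma *find_vma(struct vma *vmas, size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (vmas[i].vm_end > addr)
            return &vmas[i];
    return NULL;
}

/* Model of lock_userptr_vma(): lock/unlock every VMA that overlaps
 * the userptr buffer, or fail with -EFAULT (-14) on a hole. */
static int lock_userptr_range(struct vma *vmas, size_t n,
                              unsigned long userptr, unsigned long size,
                              bool lock)
{
    unsigned long start = userptr;
    unsigned long end = userptr + size - 1;
    struct vma *v;

    do {
        v = find_vma(vmas, n, start);
        if (!v || v->vm_start > start)
            return -14; /* -EFAULT: unmapped gap in the range */
        v->locked = lock;
        start = v->vm_end;
    } while (v->vm_end <= end);

    return 0;
}
```

Note this model also illustrates Jerome's objection downthread: the walk locks whole VMAs, so a 1MB userptr inside a 16MB VMA pins all 16MB.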
[PATCH 1/2 v3] drm/exynos: added userptr limit ioctl.
this ioctl is used to limit user-desired userptr size as pre-defined and also could be accessed by only root user. with userptr feature, unprivileged user can allocate all the pages on system, so the amount of free physical pages will be very limited. if the VMAs within user address space was locked, the pages couldn't be swapped out so it may result in significant degradation of system performance. so this feature would be used to avoid such situation. Signed-off-by: Inki Dae Signed-off-by: Kyungmin Park --- drivers/gpu/drm/exynos/exynos_drm_drv.c |6 ++ drivers/gpu/drm/exynos/exynos_drm_drv.h |6 ++ drivers/gpu/drm/exynos/exynos_drm_gem.c | 22 ++ drivers/gpu/drm/exynos/exynos_drm_gem.h |3 +++ include/drm/exynos_drm.h| 17 + 5 files changed, 54 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c index 9d3204c..1e68ec2 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c @@ -64,6 +64,9 @@ static int exynos_drm_load(struct drm_device *dev, unsigned long flags) return -ENOMEM; } + /* maximum size of userptr is limited to 16MB as default. 
*/ + private->userptr_limit = SZ_16M; + INIT_LIST_HEAD(>pageflip_event_list); dev->dev_private = (void *)private; @@ -221,6 +224,9 @@ static struct drm_ioctl_desc exynos_ioctls[] = { exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, exynos_drm_gem_get_ioctl, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(EXYNOS_USER_LIMIT, + exynos_drm_gem_user_limit_ioctl, DRM_MASTER | + DRM_ROOT_ONLY), DRM_IOCTL_DEF_DRV(EXYNOS_PLANE_SET_ZPOS, exynos_plane_set_zpos_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_VIDI_CONNECTION, diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.h b/drivers/gpu/drm/exynos/exynos_drm_drv.h index c82c90c..b38ed6f 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.h +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.h @@ -235,6 +235,12 @@ struct exynos_drm_private { * this array is used to be aware of which crtc did it request vblank. */ struct drm_crtc *crtc[MAX_CRTC]; + + /* +* maximum size of allocation by userptr feature. +* - as default, this has 16MB and only root user can change it. 
+*/ + unsigned long userptr_limit; }; /* diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index fc91293..e6abb66 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -33,6 +33,8 @@ #include "exynos_drm_gem.h" #include "exynos_drm_buf.h" +#define USERPTR_MAX_SIZE SZ_64M + static unsigned int convert_to_vm_err_msg(int msg) { unsigned int out_msg; @@ -630,6 +632,26 @@ int exynos_drm_gem_get_ioctl(struct drm_device *dev, void *data, return 0; } +int exynos_drm_gem_user_limit_ioctl(struct drm_device *dev, void *data, + struct drm_file *filp) +{ + struct exynos_drm_private *priv = dev->dev_private; + struct drm_exynos_user_limit *limit = data; + + if (limit->userptr_limit < PAGE_SIZE || + limit->userptr_limit > USERPTR_MAX_SIZE) { + DRM_DEBUG_KMS("invalid userptr_limit size.\n"); + return -EINVAL; + } + + if (priv->userptr_limit == limit->userptr_limit) + return 0; + + priv->userptr_limit = limit->userptr_limit; + + return 0; +} + int exynos_drm_gem_init_object(struct drm_gem_object *obj) { DRM_DEBUG_KMS("%s\n", __FILE__); diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.h b/drivers/gpu/drm/exynos/exynos_drm_gem.h index 14d038b..3334c9f 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.h +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.h @@ -78,6 +78,9 @@ struct exynos_drm_gem_obj { struct page **exynos_gem_get_pages(struct drm_gem_object *obj, gfp_t gfpmask); +int exynos_drm_gem_user_limit_ioctl(struct drm_device *dev, void *data, + struct drm_file *filp); + /* destroy a buffer with gem object */ void exynos_drm_gem_destroy(struct exynos_drm_gem_obj *exynos_gem_obj); diff --git a/include/drm/exynos_drm.h b/include/drm/exynos_drm.h index 54c97e8..52465dc 100644 --- a/include/drm/exynos_drm.h +++ b/include/drm/exynos_drm.h @@ -92,6 +92,19 @@ struct drm_exynos_gem_info { }; /** + * A structure to userptr limited information. + * + * @userptr_limit: maximum size to userptr buffer. 
+ * the buffer could be allocated by unprivileged user using malloc() + * and the size of the buffer would be limited as userptr_limit value. + * @pad: just padding to be 64-bit
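The validation in exynos_drm_gem_user_limit_ioctl() is plain range checking: the new limit must lie between PAGE_SIZE and the compile-time USERPTR_MAX_SIZE (64MB), with 16MB as the boot-time default. Modeled standalone below, with errno values written as plain integers; the globals stand in for the per-device private data.

```c
/* Model of the userptr-limit ioctl's validation path. */
#define PAGE_SIZE_MODEL   4096UL
#define SZ_16M_MODEL      (16UL << 20)
#define SZ_64M_MODEL      (64UL << 20)
#define USERPTR_MAX_SIZE_MODEL SZ_64M_MODEL

/* Default from the patch: private->userptr_limit = SZ_16M. */
static unsigned long userptr_limit = SZ_16M_MODEL;

static int set_userptr_limit(unsigned long new_limit)
{
    /* Reject anything below one page or above the hard ceiling. */
    if (new_limit < PAGE_SIZE_MODEL || new_limit > USERPTR_MAX_SIZE_MODEL)
        return -22; /* -EINVAL */

    /* Setting the same value again is a no-op success. */
    userptr_limit = new_limit;
    return 0;
}
```

In the real driver the ioctl is additionally gated with DRM_MASTER | DRM_ROOT_ONLY, so only a privileged caller can raise the pinned-memory ceiling.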
[PATCH 0/2 v3] drm/exynos: added userptr feature
this feature could be used to use memory region allocated by malloc() in user mode and mmaped memory region allocated by other memory allocators. userptr interface can identify memory type through vm_flags value and would get pages or page frame numbers to user space appropriately. changelog v2: the memory region mmaped with VM_PFNMAP type is physically continuous and start address of the memory region should be set into buf->dma_addr but previous patch had a problem that end address is set into buf->dma_addr so v2 fixes that problem. changelog v3: mitigated the issues pointed out by Dave and Jerome. for this, added some codes to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out so added one more feature that we can limit user-desired userptr size to pre-defined value using userptr limit ioctl that can be accessed by only root user. these issues had been pointed out by Dave and Jerome. Inki Dae (2): drm/exynos: added userptr limit ioctl. drm/exynos: added userptr feature. drivers/gpu/drm/exynos/exynos_drm_drv.c |8 + drivers/gpu/drm/exynos/exynos_drm_drv.h |6 + drivers/gpu/drm/exynos/exynos_drm_gem.c | 356 +++ drivers/gpu/drm/exynos/exynos_drm_gem.h | 20 ++- include/drm/exynos_drm.h| 43 - 5 files changed, 430 insertions(+), 3 deletions(-) -- 1.7.4.1
[PATCH 2/2 v3] drm/exynos: added userptr feature.
On Wed, May 9, 2012 at 10:45 AM, Jerome Glisse wrote: > On Wed, May 9, 2012 at 2:17 AM, Inki Dae wrote: >> this feature is used to import user space region allocated by malloc() or >> mmaped into a gem. and to guarantee the pages to user space not to be >> swapped out, the VMAs within the user space would be locked and then unlocked >> when the pages are released. >> >> but this lock might result in significant degradation of system performance >> because the pages couldn't be swapped out so we limit user-desired userptr >> size to pre-defined. >> >> Signed-off-by: Inki Dae >> Signed-off-by: Kyungmin Park > > > Again i would like feedback from mm people (adding cc). I am not sure > locking the vma is the right anwser as i said in my previous mail, > userspace can munlock it in your back, maybe VM_RESERVED is better. > Anyway even not considering that you don't check at all that process > don't go over the limit of locked page see mm/mlock.c RLIMIT_MEMLOCK > for how it's done. Also you mlock complete vma but the userptr you get > might be inside say 16M vma and you only care about 1M of userptr, if > you mark the whole vma as locked than anytime a new page is fault in > the vma else where than in the buffer you are interested then it got > allocated for ever until the gem buffer is destroy, i am not sure of > what happen to the vma on next malloc if it grows or not (i would > think it won't grow at it would have different flags than new > anonymous memory). > > The whole business of directly using malloced memory for gpu is fishy > and i would really like to get it right rather than relying on never > hitting strange things like page migration, vma merging, or worse > things like over locking pages and stealing memory. > > Cheers, > Jerome I had a lengthy discussion with mm people (thx a lot for that). I think we should split 2 different use case. The zero-copy upload case ie : app: ptr = malloc() ... 
glTex/VBO/UBO/...(ptr) free(ptr) or reuse it for other things For which i guess you want to avoid having to do a memcpy inside the gl library (could be anything else than gl that have same useage pattern). ie after the upload happen you don't care about those page they can removed from the vma or marked as cow so that anything messing with those page after the upload won't change what you uploaded. Of course this is assuming that the tlb cost of doing such thing is smaller than the cost of memcpy the data. Two way to do that, either you assume app can't not read back data after gl can and you do an unmap_mapping_range (make sure you only unmap fully covered page and that you copy non fully covered page) or you want to allow userspace to still read data or possibly overwrite them Second use case is something more like for the opencl case of CL_MEM_USE_HOST_PTR, in which you want to use the same page in the gpu and keep the userspace vma pointing to those page. I think the agreement on this case is that there is no way right now to do it sanely inside linux kernel. mlocking will need proper accounting against rtlimit but this limit might be low. Also the fork case might be problematic. For the fork case the memory is anonymous so it should be COWed in the fork child but relative to cl context that means the child could not use the cl context with that memory or at least if the child write to this memory the cl will not see those change. I guess the answer to that one is that you really need to use the cl api to read the object or get proper ptr to read it. Anyway in all case, implementing this userptr thing need a lot more code. You have to check to that the vma you are trying to use is anonymous and only handle this case and fallback to alloc new page and copy otherwise.. Cheers, Jerome
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632

--- Comment #9 from execute.method at gmail.com 2012-05-09 06:36:57 PDT ---
No, there is nothing else in dmesg. Is there any more info you'd like me to gather?

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632

--- Comment #8 from Alex Deucher 2012-05-09 06:14:17 PDT ---
(In reply to comment #6)
> Ok. I didn't realize that. I have also tried with 3.4rc8, with the same
> result.
> Did i miss something in building the kernel?

Are you getting the same error about forbidden register 0x00028354? That register is in the allowed list for 3.4 so you shouldn't be getting that error.

> Also, what about projectM without streamout enabled?

Is there anything else in dmesg other than *ERROR* Failed to parse relocation -12?
[PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Wed, May 9, 2012 at 10:23 AM, Jerome Glisse wrote:
> On Wed, May 9, 2012 at 9:40 AM, Alex Deucher wrote:
>> On Fri, May 4, 2012 at 11:06 AM, ? wrote:
>>> From: Jerome Glisse
>>>
>>> It seems the iMac panel doesn't like when we change the hot plug setup
>>> and then refuses to work. This helps but doesn't fully fix:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=726143
>>
>> How does it help? Does it fix the aux problems, but the monitor
>> still doesn't train? What's the working value of the relevant
>> DC_HPD*_CONTROL register?
>>
>> Alex
>
> Don't have the hw but somehow the way we program this reg completely
> disables the panel, after that the panel doesn't answer to anything
> (nor i2c nor any aux transaction). Without programming that, link
> training is successful but the panel stays black. I can ask to get the
> value before and after.

Patch seems reasonable in general (we don't really need hpd to be explicitly enabled for lvds or edp) so:

Reviewed-by: Alex Deucher

> Cheers,
> Jerome
[PATCH] drm/radeon/kms: fix warning on 32-bit in atomic fence printing
On Wed, May 9, 2012 at 12:28 PM, Dave Airlie wrote:
> From: Dave Airlie
>
> /ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c: In function
> 'radeon_debugfs_fence_info':
> /ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c:606:7: warning:
> format '%lx' expects argument of type 'long unsigned int', but argument 3 has
> type 'long long int' [-Wformat]
>
> Signed-off-by: Dave Airlie

Reviewed-by: Jerome Glisse

> ---
>  drivers/gpu/drm/radeon/radeon_fence.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c
> b/drivers/gpu/drm/radeon/radeon_fence.c
> index 48ec5e3..11f5f40 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -602,8 +602,8 @@ static int radeon_debugfs_fence_info(struct seq_file *m,
> void *data)
>                        continue;
>
>                seq_printf(m, "--- ring %d ---\n", i);
> -               seq_printf(m, "Last signaled fence 0x%016lx\n",
> -                          atomic64_read(&rdev->fence_drv[i].last_seq));
> +               seq_printf(m, "Last signaled fence 0x%016llx\n",
> +                          (unsigned long long)atomic64_read(&rdev->fence_drv[i].last_seq));
>                seq_printf(m, "Last emitted  0x%016llx\n",
>                           rdev->fence_drv[i].seq);
>        }
> --
> 1.7.7.6
>
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hello Tomasz, Laurent,

I have printed some logs during the dmabuf export and attach for the SGT issue below. Please find it in the attachment. I hope it will be useful.

Regards,
Subash

On 05/08/2012 04:45 PM, Subash Patel wrote:
> Hi Laurent,
>
> On 05/08/2012 02:44 PM, Laurent Pinchart wrote:
>> Hi Subash,
>>
>> On Monday 07 May 2012 20:08:25 Subash Patel wrote:
>>> Hello Tomasz, Laurent,
>>>
>>> I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that
>>> during the attach, the size of the SGT and the size requested mismatched
>>> (by at least 8k bytes). Hence I made a small correction to the code as
>>> below. I could then attach the importer properly.
>>
>> Thank you for the report.
>>
>> Could you print the content of the sglist (number of chunks and size of each
>> chunk) before and after your modifications, as well as the values of n_pages,
>> offset and size?
> I will put back all the printk's and generate this. As of now, my setup
> has changed and will do this when I get some time.
>> >>> On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: >> >> [snip] >> +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned int n_pages, unsigned long offset, unsigned long size) +{ + struct sg_table *sgt; + unsigned int chunks; + unsigned int i; + unsigned int cur_page; + int ret; + struct scatterlist *s; + + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + /* compute number of chunks */ + chunks = 1; + for (i = 1; i< n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); + return ERR_PTR(-ENOMEM); + } + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { + unsigned long chunk_size; + unsigned int j; >>> >>> size = PAGE_SIZE; >>> + + for (j = cur_page + 1; j< n_pages; ++j) >>> >>> for (j = cur_page + 1; j< n_pages; ++j) { >>> + if (pages[j] != pages[j - 1] + 1) + break; >>> >>> size += PAGE >>> } >>> + + chunk_size = ((j - cur_page)<< PAGE_SHIFT) - offset; + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); >>> >>> [DELETE] size -= chunk_size; >>> + offset = 0; + cur_page = j; + } + + return sgt; +} >> > Regards, > Subash -- next part -- [ 178.545000] vb2_dc_pages_to_sgt() sgt->orig_nents=2 [ 178.545000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.55] vb2_dc_pages_to_sgt():84 offset=0 [ 178.555000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.56] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.565000] vb2_dc_pages_to_sgt() sgt->orig_nents=2 [ 178.57] vb2_dc_pages_to_sgt():83 cur_page=32 [ 178.575000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.58] vb2_dc_pages_to_sgt():86 chunk_size=262144 [ 178.585000] vb2_dc_pages_to_sgt():89 size=4294574080 [ 178.59] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.595000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.60] vb2_dc_pages_to_sgt():84 offset=0 [ 178.605000] vb2_dc_pages_to_sgt():86 
chunk_size=8192 [ 178.61] vb2_dc_pages_to_sgt():89 size=4294959104 [ 178.625000] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.625000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.63] vb2_dc_pages_to_sgt():84 offset=0 [ 178.635000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.64] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.645000] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.65] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.655000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.66] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.665000] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.67] vb2_dc_mmap: mapped dma addr 0x2006 at 0xb6e01000, size 131072 [ 178.67] vb2_dc_mmap: mapped dma addr 0x2008 at 0xb6de1000, size 131072 [ 178.68] vb2_dc_pages_to_sgt() sgt->orig_nents=2 [ 178.685000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.69] vb2_dc_pages_to_sgt():84 offset=0 [ 178.695000] vb2_dc_pages_to_sgt():86 chunk_size=4096 [ 178.70] vb2_dc_pages_to_sgt():89 size=4294963200 [ 178.705000] vb2_dc_pages_to_sgt() sgt->orig_nents=2 [ 178.71] vb2_dc_pages_to_sgt():83 cur_page=1 [ 178.715000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.715000] vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.72] vb2_dc_pages_to_sgt():89 size=4294955008 [ 178.725000] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.73] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.735000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.74] vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.745000] vb2_dc_pages_to_sgt():89 size=4294959104 [ 178.75] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.755000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.76]
[PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hi Subash, Could you post the code of vb2_dc_pages_to_sgt with all printk in it. It will help us avoid guessing where and what is debugged in the log. Moreover, I found a line 'size=4294836224' in the log. It means that size is equal to -131072 (!?!) or there are some invalid conversions in printk. Are you suze that you do not pass size = 0 as the function argument? Notice that earlier versions of dmabuf-for-vb2 patches has offset2 argument instead of size. It was the offset at the end of the buffer. In (I guess) 95% of cases this offset was 0. Could you provide only function arguments that causes the failure? I mean pages array + size (I assume that offset is zero for your test). Having the arguments we could reproduce that bug. Regards, Tomasz Stanislawski On 05/09/2012 08:46 AM, Subash Patel wrote: > Hello Tomasz, Laurent, > > I have printed some logs during the dmabuf export and attach for the SGT > issue below. Please find it in the attachment. I hope > it will be useful. > > Regards, > Subash > > On 05/08/2012 04:45 PM, Subash Patel wrote: >> Hi Laurent, >> >> On 05/08/2012 02:44 PM, Laurent Pinchart wrote: >>> Hi Subash, >>> >>> On Monday 07 May 2012 20:08:25 Subash Patel wrote: Hello Thomasz, Laurent, I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that during the attach, the size of the SGT and size requested mis-matched (by atleast 8k bytes). Hence I made a small correction to the code as below. I could then attach the importer properly. >>> >>> Thank you for the report. >>> >>> Could you print the content of the sglist (number of chunks and size >>> of each >>> chunk) before and after your modifications, as well as the values of >>> n_pages, >>> offset and size ? >> I will put back all the printk's and generate this. As of now, my setup >> has changed and will do this when I get sometime. 
>>> On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: >>> >>> [snip] >>> > +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, > + unsigned int n_pages, unsigned long offset, unsigned long size) > +{ > + struct sg_table *sgt; > + unsigned int chunks; > + unsigned int i; > + unsigned int cur_page; > + int ret; > + struct scatterlist *s; > + > + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); > + if (!sgt) > + return ERR_PTR(-ENOMEM); > + > + /* compute number of chunks */ > + chunks = 1; > + for (i = 1; i< n_pages; ++i) > + if (pages[i] != pages[i - 1] + 1) > + ++chunks; > + > + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); > + if (ret) { > + kfree(sgt); > + return ERR_PTR(-ENOMEM); > + } > + > + /* merging chunks and putting them into the scatterlist */ > + cur_page = 0; > + for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { > + unsigned long chunk_size; > + unsigned int j; size = PAGE_SIZE; > + > + for (j = cur_page + 1; j< n_pages; ++j) for (j = cur_page + 1; j< n_pages; ++j) { > + if (pages[j] != pages[j - 1] + 1) > + break; size += PAGE } > + > + chunk_size = ((j - cur_page)<< PAGE_SHIFT) - offset; > + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); [DELETE] size -= chunk_size; > + offset = 0; > + cur_page = j; > + } > + > + return sgt; > +} >>> >> Regards, >> Subash
[PATCH 14/20] drm/radeon: multiple ring allocator v2
On 08.05.2012 16:55, Jerome Glisse wrote:
> Still I don't want to loop more than necessary, it's bad; I am ok with:
> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch
>
> If there is a fence signaled it will retry 2 times at most, otherwise it
> will go to wait, and that way is better. Because with your while loop the
> worst case is something proportional to the manager size; given it's 1MB
> it can loop for a long long time.

Yeah, this loop can indeed consume quite some time.

Ok, then let's at least give every ring a chance to supply some holes, otherwise I fear that we might not even find something worth waiting for after only 2 tries.

Going to send that out after figuring out why the patchset still causes texture corruptions on my system.

Christian.
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632

--- Comment #7 from execute.method at gmail.com 2012-05-09 04:15:06 PDT ---
(In reply to comment #6)
> Ok. I didn't realize that. I have also tried with 3.4rc8, with the same
> result.
> Did i miss something in building the kernel?
>
> Also, what about projectM without streamout enabled?

Sorry, meant rc5.
[PATCH] mm: Work around Intel SNB GTT bug with some physical pages.
On Tue, May 08, 2012 at 02:57:25PM -0700, Hugh Dickins wrote: > On Mon, 7 May 2012, Stephane Marchesin wrote: > > > While investing some Sandy Bridge rendering corruption, I found out > > that all physical memory pages below 1MiB were returning garbage when > > read through the GTT. This has been causing graphics corruption (when > > it's used for textures, render targets and pixmaps) and GPU hangups > > (when it's used for GPU batch buffers). > > > > I talked with some people at Intel and they confirmed my findings, > > and said that a couple of other random pages were also affected. > > > > We could fix this problem by adding an e820 region preventing the > > memory below 1 MiB to be used, but that prevents at least my machine > > from booting. One could think that we should be able to fix it in > > i915, but since the allocation is done by the backing shmem this is > > not possible. > > > > In the end, I came up with the ugly workaround of just leaking the > > offending pages in shmem.c. I do realize it's truly ugly, but I'm > > looking for a fix to the existing code, and am wondering if people on > > this list have a better idea, short of rewriting i915_gem.c to > > allocate its own pages directly. > > > > Signed-off-by: Stephane Marchesin > > Well done for discovering and pursuing this issue, but of course (as > you know: you're trying to provoke us to better) your patch is revolting. > > And not even enough: swapin readahead and swapoff can read back > from swap into pages which the i915 will later turn out to dislike. > > I do have a shmem.c patch coming up for gma500, which cannot use pages > over 4GB; but that fits more reasonably with memory allocation policies, > where we expect that anyone who can use a high page can use a lower as > well, and there's already __GFP_DMA32 to set the limit. > > Your limitation is at the opposite end, so that patch won't help you at > all. 
And I don't see how Andi's ZONE_DMA exclusion would work, without > equal hackery to enable private zonelists, avoiding that convention. > > i915 is not the only user of shmem, and x86 not the only architecture: > we're not going to make everyone suffer for this. Once the memory > allocator gets down to giving you the low 1MB, my guess is that it's > already short of memory, and liable to deadlock or OOM if you refuse > and soak up every page it then gives you. Even if i915 has to live > with that possibility, we're not going to extend it to everyone else. > > arch/x86/Kconfig has X86_RESERVE_LOW, default 64, range 4 640 (and > I think we reserve all the memory range from 640kB to 1MB anyway). > Would setting that to 640 allow you to boot, and avoid the i915 > problem on all but the odd five pages? I'm not pretending that's > an ideal solution at all (unless freeing initmem could release most > of it on non-SandyBridge and non-i915 machines), but it would be > useful to know if that does provide a stopgap solution. If that > does work, maybe we just mark the odd five PageReserved at startup. Hm, as a stopgap measure to make Sandybridge gpus not die that sounds pretty good. But we still need a more generic solution for the long-term, see below > Is there really no way this can be handled closer to the source of > the problem, in the i915 driver itself? I do not know the flow of > control in i915 (and X) at all, but on the surface it seems that the > problem only comes when you map these problematic pages into the GTT > (if I'm even using the right terminology), and something (not shmem.c) > actively has to do that. > > Can't you check the pfn at that point, and if it's an unsuitable page, > copy into a suitable page (perhaps allocated then, perhaps from a pool > you primed earlier) and map that suitable page into the GTT instead? > Maybe using page->private to link them if that helps. 
> > So long as the page (or its shadow) is mapped into the GTT, I imagine > it would be pinned, and not liable to be swapped out or otherwise > interfered with by shmem.c. And when you unmap it from GTT, copy > back to the unsuitable shmem object page before unpinning. > > I fully accept that I have very little understanding of GPU DRM GTT > and i915, and this may be impossible or incoherent: but please, let's > try to keep the strangeness where it belongs. If necessary, we'll > have add some kind of flag and callback from shmem.c to the driver; > but I'd so much prefer to avoid that. The copy stuff back approach is pretty much what ttm uses atm: It allocates suitable pages with whatever means it has (usually through the dma api) and if the shrinker callback tells it that it's sitting on too much memory, it copies stuff out to the shmem backing storage used by gem. There are quite a few issues with that approach: - We expose mmap to the shmem file directly to userspace in i915. We use these extensively on Sandybridge because there direct cpu access is coherent with what the gpu does. Original userspace would always
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632 --- Comment #6 from execute.method at gmail.com 2012-05-09 03:55:23 PDT --- Ok. I didn't realize that. I have also tried with 3.4rc8, with the same result. Did i miss something in building the kernel? Also, what about projectM without streamout enabled?
[PATCH 2/2 v3] drm/exynos: added userptr feature.
On Wed, May 9, 2012 at 2:17 AM, Inki Dae wrote: > this feature is used to import user space region allocated by malloc() or > mmaped into a gem. and to guarantee the pages to user space not to be > swapped out, the VMAs within the user space would be locked and then unlocked > when the pages are released. > > but this lock might result in significant degradation of system performance > because the pages couldn't be swapped out so we limit user-desired userptr > size to pre-defined. > > Signed-off-by: Inki Dae > Signed-off-by: Kyungmin Park Again I would like feedback from mm people (adding cc). I am not sure locking the VMA is the right answer; as I said in my previous mail, userspace can munlock it behind your back, so maybe VM_RESERVED is better. Anyway, even setting that aside, you don't check at all that the process doesn't go over the locked-page limit; see the RLIMIT_MEMLOCK handling in mm/mlock.c for how it's done. Also, you mlock the complete VMA, but the userptr you get might be inside, say, a 16M VMA of which you only care about 1M of userptr; if you mark the whole VMA as locked, then any new page faulted into the VMA elsewhere than in the buffer you are interested in stays allocated forever, until the gem buffer is destroyed. I am also not sure what happens to the VMA on the next malloc, whether it grows or not (I would think it won't grow, as it would have different flags than new anonymous memory). The whole business of directly using malloced memory for the GPU is fishy, and I would really like to get it right rather than rely on never hitting strange things like page migration, VMA merging, or worse things like over-locking pages and stealing memory. Cheers, Jerome > --- > drivers/gpu/drm/exynos/exynos_drm_drv.c | 2 + > drivers/gpu/drm/exynos/exynos_drm_gem.c | 334 > +++ > drivers/gpu/drm/exynos/exynos_drm_gem.h | 17 ++- > include/drm/exynos_drm.h | 
26 +++- > ?4 files changed, 376 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c > b/drivers/gpu/drm/exynos/exynos_drm_drv.c > index 1e68ec2..e8ae3f1 100644 > --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c > +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c > @@ -220,6 +220,8 @@ static struct drm_ioctl_desc exynos_ioctls[] = { > ? ? ? ?DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MAP_OFFSET, > ? ? ? ? ? ? ? ? ? ? ? ?exynos_drm_gem_map_offset_ioctl, DRM_UNLOCKED | > ? ? ? ? ? ? ? ? ? ? ? ?DRM_AUTH), > + ? ? ? DRM_IOCTL_DEF_DRV(EXYNOS_GEM_USERPTR, > + ? ? ? ? ? ? ? ? ? ? ? exynos_drm_gem_userptr_ioctl, DRM_UNLOCKED), > ? ? ? ?DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MMAP, > ? ? ? ? ? ? ? ? ? ? ? ?exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), > ? ? ? ?DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, > diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c > b/drivers/gpu/drm/exynos/exynos_drm_gem.c > index e6abb66..ccc6e3d 100644 > --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c > +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c > @@ -68,6 +68,83 @@ static int check_gem_flags(unsigned int flags) > ? ? ? ?return 0; > ?} > > +static struct vm_area_struct *get_vma(struct vm_area_struct *vma) > +{ > + ? ? ? struct vm_area_struct *vma_copy; > + > + ? ? ? vma_copy = kmalloc(sizeof(*vma_copy), GFP_KERNEL); > + ? ? ? if (!vma_copy) > + ? ? ? ? ? ? ? return NULL; > + > + ? ? ? if (vma->vm_ops && vma->vm_ops->open) > + ? ? ? ? ? ? ? vma->vm_ops->open(vma); > + > + ? ? ? if (vma->vm_file) > + ? ? ? ? ? ? ? get_file(vma->vm_file); > + > + ? ? ? memcpy(vma_copy, vma, sizeof(*vma)); > + > + ? ? ? vma_copy->vm_mm = NULL; > + ? ? ? vma_copy->vm_next = NULL; > + ? ? ? vma_copy->vm_prev = NULL; > + > + ? ? ? return vma_copy; > +} > + > + > +static void put_vma(struct vm_area_struct *vma) > +{ > + ? ? ? if (!vma) > + ? ? ? ? ? ? ? return; > + > + ? ? ? if (vma->vm_ops && vma->vm_ops->close) > + ? ? ? ? ? ? ? vma->vm_ops->close(vma); > + > + ? ? ? if (vma->vm_file) > + ? ? ? ? ? ? ? 
fput(vma->vm_file); > + > + ? ? ? kfree(vma); > +} > + > +/* > + * lock_userptr_vma - lock VMAs within user address space > + * > + * this function locks vma within user address space to avoid pages > + * to the userspace from being swapped out. > + * if this vma isn't locked, the pages to the userspace could be swapped out > + * so unprivileged user might access different pages and dma of any device > + * could access physical memory region not intended once swap-in. > + */ > +static int lock_userptr_vma(struct exynos_drm_gem_buf *buf, unsigned int > lock) > +{ > + ? ? ? struct vm_area_struct *vma; > + ? ? ? unsigned long start, end; > + > + ? ? ? start = buf->userptr; > + ? ? ? end = buf->userptr + buf->size - 1; > + > + ? ? ? down_write(>mm->mmap_sem); > + > + ? ? ? do { > + ? ? ? ? ? ? ? vma = find_vma(current->mm, start); > + ? ? ? ? ? ? ? if (!vma) { > + ? ? ? ? ? ? ? ? ? ? ? up_write(>mm->mmap_sem); > + ? ? ? ? ? ? ? ? ? ? ? return -EFAULT; > + ? ? ? ? ? ? ? } > + > + ? ? ? ? ? ? ? if (lock) > + ? ? ? ? ? ? ? ? ? ? ? vma->vm_flags |= VM_LOCKED; > + ? ? ? ? ? ? ? else > + ? ? ?
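Jerome's main objection above is the missing locked-page accounting: before pinning user pages, a driver is expected to compare the process's already-locked pages plus the new request against RLIMIT_MEMLOCK, the way mm/mlock.c does. A minimal userspace sketch of that check follows; the helper name and fixed page shift are illustrative, not part of the patch.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumes 4 KiB pages, for illustration */

/* Hedged sketch of the mm/mlock.c-style check Jerome refers to: the new
 * request is allowed only if locked + new pages fit under the limit,
 * where the limit is rlimit(RLIMIT_MEMLOCK) converted to pages. */
static int userptr_memlock_ok(uint64_t locked_pages, uint64_t new_pages,
                              uint64_t rlim_bytes)
{
    uint64_t limit_pages = rlim_bytes >> SKETCH_PAGE_SHIFT;
    return locked_pages + new_pages <= limit_pages;
}
```

In the kernel the same comparison would be done under mmap_sem against current->mm->locked_vm, but the arithmetic is the same.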
Include request for SA improvements
On Wed, May 9, 2012 at 9:34 AM, Christian König wrote: > Hi Dave & Jerome and everybody on the list, > > I can't find any more bugs and also I'm out of things to test, so I really > hope that this is the last incarnation of this patchset, and if Jerome is > ok with it it should now be included into drm-next. > > Cheers, > Christian. > Yeah looks good to me. Cheers, Jerome
[PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Wed, May 9, 2012 at 9:40 AM, Alex Deucher wrote: > On Fri, May 4, 2012 at 11:06 AM, ? wrote: >> From: Jerome Glisse >> >> It seems the iMac panel doesn't like when we change the hot plug setup >> and then refuses to work. This helps but doesn't fully fix: >> https://bugzilla.redhat.com/show_bug.cgi?id=726143 > > How does it help? Does it fix the aux problems, but the monitor > still doesn't train? What's the working value of the relevant > DC_HPD*_CONTROL register? > > Alex Don't have the hw, but somehow the way we program this reg completely disables the panel; after that the panel doesn't answer to anything (neither i2c nor any aux transaction). Without programming it, link training is successful but the panel stays black. I can ask to get the value before and after. Cheers, Jerome
[PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Fri, May 4, 2012 at 11:06 AM, wrote: > From: Jerome Glisse > > It seems the iMac panel doesn't like when we change the hot plug setup > and then refuses to work. This helps but doesn't fully fix: > https://bugzilla.redhat.com/show_bug.cgi?id=726143 How does it help? Does it fix the aux problems, but the monitor still doesn't train? What's the working value of the relevant DC_HPD*_CONTROL register? Alex > > v2: fix typo and improve commit message > > Signed-off-by: Matthew Garrett > Signed-off-by: Jerome Glisse > --- > drivers/gpu/drm/radeon/r600.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c > index 694b6b2..a304c9d 100644 > --- a/drivers/gpu/drm/radeon/r600.c > +++ b/drivers/gpu/drm/radeon/r600.c > @@ -713,6 +713,14 @@ void r600_hpd_init(struct radeon_device *rdev) > list_for_each_entry(connector, &dev->mode_config.connector_list, head) > { > struct radeon_connector *radeon_connector = > to_radeon_connector(connector); > > + if (connector->connector_type == DRM_MODE_CONNECTOR_eDP || > + connector->connector_type == DRM_MODE_CONNECTOR_LVDS) { > + /* don't try to enable hpd on eDP or LVDS to avoid > breaking the > + * aux dp channel on imac and help (but not > completely fix) > + * https://bugzilla.redhat.com/show_bug.cgi?id=726143 > + */ > + continue; > + } > if (ASIC_IS_DCE3(rdev)) { > u32 tmp = DC_HPDx_CONNECTION_TIMER(0x9c4) | > DC_HPDx_RX_INT_TIMER(0xfa); > if (ASIC_IS_DCE32(rdev)) > -- > 1.7.9.3 > > ___ > dri-devel mailing list > dri-devel at lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49603] [regression] Fullscreen video no longer smooth with GPU in low power mode
https://bugs.freedesktop.org/show_bug.cgi?id=49603 --- Comment #5 from Simon Farnsworth 2012-05-09 02:30:15 PDT --- Created attachment 61271 --> https://bugs.freedesktop.org/attachment.cgi?id=61271 A program to stop the CPU entering low power states Trying a different route. Can you compile the attached program with "gcc -o stopsleep stopsleep.c" and leave it running while playing a video? It tells the kernel to avoid using deep sleep states when idling. If it helps, we have a clue. If it doesn't, and video decode doesn't saturate all your CPU cores, can you try running "while true ; do true ; done" in a background shell and see if that helps? The goal of both of these is to see if the problem is that we're now letting the CPU idle more than we used to, and finding that the resulting power save modes hurt.
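The stopsleep.c attachment itself is not preserved in this archive. A program matching Simon's description ("tells the kernel to avoid using deep sleep states when idling") is typically built on the PM QoS interface: write a 32-bit latency bound of 0 microseconds to /dev/cpu_dma_latency and keep the file descriptor open. The sketch below assumes that mechanism; the helper name is illustrative.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Open a PM QoS device node and request a wakeup-latency bound of `usec`
 * microseconds. The kernel honours the request only while the returned
 * fd stays open, so the caller must hold it (e.g. pause()) for the
 * duration of the test. O_CREAT is irrelevant for the real device node;
 * it only lets the helper be exercised against a plain file. */
static int request_cpu_latency(const char *path, int32_t usec)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, &usec, sizeof(usec)) != (ssize_t)sizeof(usec)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A stopsleep-style main() would call request_cpu_latency("/dev/cpu_dma_latency", 0) and then pause() until killed, at which point the kernel drops the constraint automatically.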
[PATCH] drm/radeon: eliminate redundant connector_names table
On Fri, May 4, 2012 at 11:25 AM, Ilija Hadzic wrote: > connector_names table is just a repeat of information that > already exists in drm_connector_enum_list and the same string > can be retrieved using drm_get_connector_name function. > > Nuke the redundant table and use the proper function to retrieve > the connector name. > > Signed-off-by: Ilija Hadzic Reviewed-by: Alex Deucher > --- > ?drivers/gpu/drm/radeon/radeon_display.c | ? 20 +--- > ?1 files changed, 1 insertions(+), 19 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_display.c > b/drivers/gpu/drm/radeon/radeon_display.c > index 8086c96..60aecc5 100644 > --- a/drivers/gpu/drm/radeon/radeon_display.c > +++ b/drivers/gpu/drm/radeon/radeon_display.c > @@ -572,24 +572,6 @@ static const char *encoder_names[36] = { > ? ? ? ?"TRAVIS", > ?}; > > -static const char *connector_names[15] = { > - ? ? ? "Unknown", > - ? ? ? "VGA", > - ? ? ? "DVI-I", > - ? ? ? "DVI-D", > - ? ? ? "DVI-A", > - ? ? ? "Composite", > - ? ? ? "S-video", > - ? ? ? "LVDS", > - ? ? ? "Component", > - ? ? ? "DIN", > - ? ? ? "DisplayPort", > - ? ? ? "HDMI-A", > - ? ? ? "HDMI-B", > - ? ? ? "TV", > - ? ? ? "eDP", > -}; > - > ?static const char *hpd_names[6] = { > ? ? ? ?"HPD1", > ? ? ? ?"HPD2", > @@ -612,7 +594,7 @@ static void radeon_print_display_setup(struct drm_device > *dev) > ? ? ? ?list_for_each_entry(connector, >mode_config.connector_list, head) > { > ? ? ? ? ? ? ? ?radeon_connector = to_radeon_connector(connector); > ? ? ? ? ? ? ? ?DRM_INFO("Connector %d:\n", i); > - ? ? ? ? ? ? ? DRM_INFO(" ?%s\n", > connector_names[connector->connector_type]); > + ? ? ? ? ? ? ? DRM_INFO(" ?%s\n", drm_get_connector_name(connector)); > ? ? ? ? ? ? ? ?if (radeon_connector->hpd.hpd != RADEON_HPD_NONE) > ? ? ? ? ? ? ? ? ? ? ? ?DRM_INFO(" ?%s\n", > hpd_names[radeon_connector->hpd.hpd]); > ? ? ? ? ? ? ? 
if (radeon_connector->ddc_bus) { > -- > 1.7.8.5
[PATCH] MAINTAINERS: switch drm/i915 to Daniel Vetter
On Wed, May 09, 2012 at 07:51:35AM +0100, Dave Airlie wrote: > On Tue, May 8, 2012 at 12:19 PM, Daniel Vetter > wrote: > > ... given that I essentially run the show already, let's make this > > official. > > Acked-by: Dave Airlie > > probably just push it via your -next. Will do. Thanks, Daniel -- Daniel Vetter Mail: daniel at ffwll.ch Mobile: +41 (0)79 365 57 48
[PATCH] MAINTAINERS: switch drm/i915 to Daniel Vetter
On Tue, May 8, 2012 at 12:19 PM, Daniel Vetter wrote: > ... given that I essentially run the show already, let's make this > official. Acked-by: Dave Airlie probably just push it via your -next. Dave.
[Bug 49110] debug build: AMDILCFGStructurizer.cpp:1751:3: error: 'isCurrentDebugType' was not declared in this scope
https://bugs.freedesktop.org/show_bug.cgi?id=49110 --- Comment #6 from Mike Mestnik 2012-05-08 20:25:23 PDT --- I've added 3 patches to http://llvm.org/bugs/show_bug.cgi?id=12759 and did my best to describe what/why. I believe that mesa also needs this done for its use of NDEBUG, especially if #if(s) are used to protect object exports in include files, as was the case with llvm. Either way, it's still possible for mesa to not directly trample over this project-global define. This is essentially what I did for llvm: #ifdef LLVM_NDEBUG #define NDEBUG LLVM_NDEBUG #endif #include In the case where assert.h is included in an include file, like FLAC and alsa do, then NDEBUG should be saved/restored around the assert.h include... Not that it'll do any good, as in those situations it's first come first served, and projects that use assert will likely include it very early.
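The save/restore that Mike suggests for NDEBUG around an assert.h include can be expressed with GCC/Clang's push_macro/pop_macro pragmas. A small sketch, assuming a compiler that supports those pragmas (the LLVM_NDEBUG indirection quoted above achieves the same effect portably):

```c
#include <stdio.h>

/* Save the project-wide NDEBUG, clear it for the duration of the
 * assert.h include so assert() stays live, then restore it. Because
 * assert.h defines its macro while NDEBUG is undefined, the asserts
 * below remain active regardless of the project's build flags. */
#pragma push_macro("NDEBUG")
#undef NDEBUG
#include <assert.h>
#pragma pop_macro("NDEBUG")

static int checked_div(int a, int b)
{
    assert(b != 0);   /* fires even in an NDEBUG build of the project */
    return a / b;
}
```

The same pattern works for any header, like the FLAC and alsa cases mentioned, as long as the shielded include is the first place assert.h is pulled in.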
[Bug 49603] [regression] Fullscreen video no longer smooth with GPU in low power mode
https://bugs.freedesktop.org/show_bug.cgi?id=49603 --- Comment #4 from Alex Deucher 2012-05-08 19:24:48 PDT --- Does forcing the CPU into the highest power state fix the issue? I suspect that the patch reduces CPU usage which in turn means the CPU power state stays lower longer.
[PATCH 0/2 v3] drm/exynos: added userptr feature
this feature could be used to use a memory region allocated by malloc() in user mode, or a mmaped memory region allocated by other memory allocators. the userptr interface can identify the memory type through the vm_flags value and would get pages or page frame numbers to user space appropriately. changelog v2: the memory region mmaped with VM_PFNMAP type is physically contiguous and the start address of the memory region should be set into buf->dma_addr, but the previous patch had a problem that the end address was set into buf->dma_addr, so v2 fixes that problem. changelog v3: mitigated the issues pointed out by Dave and Jerome. for this, added some code to guarantee that the pages to user space are not swapped out: the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so added one more feature so that we can limit the user-desired userptr size to a pre-defined value using a userptr limit ioctl that can be accessed only by the root user. these issues had been pointed out by Dave and Jerome. Inki Dae (2): drm/exynos: added userptr limit ioctl. drm/exynos: added userptr feature. drivers/gpu/drm/exynos/exynos_drm_drv.c | 8 + drivers/gpu/drm/exynos/exynos_drm_drv.h | 6 + drivers/gpu/drm/exynos/exynos_drm_gem.c | 356 +++ drivers/gpu/drm/exynos/exynos_drm_gem.h | 20 ++- include/drm/exynos_drm.h | 43 - 5 files changed, 430 insertions(+), 3 deletions(-) -- 1.7.4.1
[PATCH 1/2 v3] drm/exynos: added userptr limit ioctl.
this ioctl is used to limit user-desired userptr size as pre-defined and also could be accessed by only root user. with userptr feature, unprivileged user can allocate all the pages on system, so the amount of free physical pages will be very limited. if the VMAs within user address space was locked, the pages couldn't be swapped out so it may result in significant degradation of system performance. so this feature would be used to avoid such situation. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- drivers/gpu/drm/exynos/exynos_drm_drv.c |6 ++ drivers/gpu/drm/exynos/exynos_drm_drv.h |6 ++ drivers/gpu/drm/exynos/exynos_drm_gem.c | 22 ++ drivers/gpu/drm/exynos/exynos_drm_gem.h |3 +++ include/drm/exynos_drm.h| 17 + 5 files changed, 54 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c index 9d3204c..1e68ec2 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c @@ -64,6 +64,9 @@ static int exynos_drm_load(struct drm_device *dev, unsigned long flags) return -ENOMEM; } + /* maximum size of userptr is limited to 16MB as default. 
*/ + private-userptr_limit = SZ_16M; + INIT_LIST_HEAD(private-pageflip_event_list); dev-dev_private = (void *)private; @@ -221,6 +224,9 @@ static struct drm_ioctl_desc exynos_ioctls[] = { exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, exynos_drm_gem_get_ioctl, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(EXYNOS_USER_LIMIT, + exynos_drm_gem_user_limit_ioctl, DRM_MASTER | + DRM_ROOT_ONLY), DRM_IOCTL_DEF_DRV(EXYNOS_PLANE_SET_ZPOS, exynos_plane_set_zpos_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_VIDI_CONNECTION, diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.h b/drivers/gpu/drm/exynos/exynos_drm_drv.h index c82c90c..b38ed6f 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.h +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.h @@ -235,6 +235,12 @@ struct exynos_drm_private { * this array is used to be aware of which crtc did it request vblank. */ struct drm_crtc *crtc[MAX_CRTC]; + + /* +* maximum size of allocation by userptr feature. +* - as default, this has 16MB and only root user can change it. 
+*/ + unsigned long userptr_limit; }; /* diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index fc91293..e6abb66 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -33,6 +33,8 @@ #include exynos_drm_gem.h #include exynos_drm_buf.h +#define USERPTR_MAX_SIZE SZ_64M + static unsigned int convert_to_vm_err_msg(int msg) { unsigned int out_msg; @@ -630,6 +632,26 @@ int exynos_drm_gem_get_ioctl(struct drm_device *dev, void *data, return 0; } +int exynos_drm_gem_user_limit_ioctl(struct drm_device *dev, void *data, + struct drm_file *filp) +{ + struct exynos_drm_private *priv = dev-dev_private; + struct drm_exynos_user_limit *limit = data; + + if (limit-userptr_limit PAGE_SIZE || + limit-userptr_limit USERPTR_MAX_SIZE) { + DRM_DEBUG_KMS(invalid userptr_limit size.\n); + return -EINVAL; + } + + if (priv-userptr_limit == limit-userptr_limit) + return 0; + + priv-userptr_limit = limit-userptr_limit; + + return 0; +} + int exynos_drm_gem_init_object(struct drm_gem_object *obj) { DRM_DEBUG_KMS(%s\n, __FILE__); diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.h b/drivers/gpu/drm/exynos/exynos_drm_gem.h index 14d038b..3334c9f 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.h +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.h @@ -78,6 +78,9 @@ struct exynos_drm_gem_obj { struct page **exynos_gem_get_pages(struct drm_gem_object *obj, gfp_t gfpmask); +int exynos_drm_gem_user_limit_ioctl(struct drm_device *dev, void *data, + struct drm_file *filp); + /* destroy a buffer with gem object */ void exynos_drm_gem_destroy(struct exynos_drm_gem_obj *exynos_gem_obj); diff --git a/include/drm/exynos_drm.h b/include/drm/exynos_drm.h index 54c97e8..52465dc 100644 --- a/include/drm/exynos_drm.h +++ b/include/drm/exynos_drm.h @@ -92,6 +92,19 @@ struct drm_exynos_gem_info { }; /** + * A structure to userptr limited information. + * + * @userptr_limit: maximum size to userptr buffer. 
+ * the buffer could be allocated by unprivileged user using malloc() + * and the size of the buffer would be limited as userptr_limit value.
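The range check in exynos_drm_gem_user_limit_ioctl() lost its comparison operators in this archive ("limit-userptr_limit PAGE_SIZE || limit-userptr_limit USERPTR_MAX_SIZE"); from the surrounding code they are presumably `<` and `>`. A userspace re-statement of the check, with the page size fixed at 4 KiB for illustration:

```c
/* Mirror of the ioctl's validity test: a userptr limit must be at least
 * one page and at most USERPTR_MAX_SIZE (SZ_64M in the patch). The
 * operators are reconstructed; treat this as a sketch of the intent. */
#define SKETCH_PAGE_SIZE  4096UL
#define USERPTR_MAX_SIZE  (64UL << 20)

static int userptr_limit_valid(unsigned long limit)
{
    return !(limit < SKETCH_PAGE_SIZE || limit > USERPTR_MAX_SIZE);
}
```

The driver's default of SZ_16M (16UL << 20) falls inside this window, and the ioctl additionally requires DRM_MASTER | DRM_ROOT_ONLY, so only root can widen it.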
[PATCH 2/2 v3] drm/exynos: added userptr feature.
this feature is used to import user space region allocated by malloc() or mmaped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out so we limit user-desired userptr size to pre-defined. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- drivers/gpu/drm/exynos/exynos_drm_drv.c |2 + drivers/gpu/drm/exynos/exynos_drm_gem.c | 334 +++ drivers/gpu/drm/exynos/exynos_drm_gem.h | 17 ++- include/drm/exynos_drm.h| 26 +++- 4 files changed, 376 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c index 1e68ec2..e8ae3f1 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c @@ -220,6 +220,8 @@ static struct drm_ioctl_desc exynos_ioctls[] = { DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MAP_OFFSET, exynos_drm_gem_map_offset_ioctl, DRM_UNLOCKED | DRM_AUTH), + DRM_IOCTL_DEF_DRV(EXYNOS_GEM_USERPTR, + exynos_drm_gem_userptr_ioctl, DRM_UNLOCKED), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MMAP, exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index e6abb66..ccc6e3d 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -68,6 +68,83 @@ static int check_gem_flags(unsigned int flags) return 0; } +static struct vm_area_struct *get_vma(struct vm_area_struct *vma) +{ + struct vm_area_struct *vma_copy; + + vma_copy = kmalloc(sizeof(*vma_copy), GFP_KERNEL); + if (!vma_copy) + return NULL; + + if (vma-vm_ops vma-vm_ops-open) + vma-vm_ops-open(vma); + + if (vma-vm_file) + get_file(vma-vm_file); + + memcpy(vma_copy, vma, sizeof(*vma)); + + vma_copy-vm_mm 
= NULL; + vma_copy-vm_next = NULL; + vma_copy-vm_prev = NULL; + + return vma_copy; +} + + +static void put_vma(struct vm_area_struct *vma) +{ + if (!vma) + return; + + if (vma-vm_ops vma-vm_ops-close) + vma-vm_ops-close(vma); + + if (vma-vm_file) + fput(vma-vm_file); + + kfree(vma); +} + +/* + * lock_userptr_vma - lock VMAs within user address space + * + * this function locks vma within user address space to avoid pages + * to the userspace from being swapped out. + * if this vma isn't locked, the pages to the userspace could be swapped out + * so unprivileged user might access different pages and dma of any device + * could access physical memory region not intended once swap-in. + */ +static int lock_userptr_vma(struct exynos_drm_gem_buf *buf, unsigned int lock) +{ + struct vm_area_struct *vma; + unsigned long start, end; + + start = buf-userptr; + end = buf-userptr + buf-size - 1; + + down_write(current-mm-mmap_sem); + + do { + vma = find_vma(current-mm, start); + if (!vma) { + up_write(current-mm-mmap_sem); + return -EFAULT; + } + + if (lock) + vma-vm_flags |= VM_LOCKED; + else + vma-vm_flags = ~VM_LOCKED; + + start = vma-vm_end + 1; + } while (vma-vm_end end); + + up_write(current-mm-mmap_sem); + + return 0; +} + static void update_vm_cache_attr(struct exynos_drm_gem_obj *obj, struct vm_area_struct *vma) { @@ -256,6 +333,44 @@ static void exynos_drm_gem_put_pages(struct drm_gem_object *obj) /* add some codes for UNCACHED type here. 
TODO */ } +static void exynos_drm_put_userptr(struct drm_gem_object *obj) +{ + struct exynos_drm_gem_obj *exynos_gem_obj; + struct exynos_drm_gem_buf *buf; + struct vm_area_struct *vma; + int npages; + + exynos_gem_obj = to_exynos_gem_obj(obj); + buf = exynos_gem_obj-buffer; + vma = exynos_gem_obj-vma; + + if (vma (vma-vm_flags VM_PFNMAP) (vma-vm_pgoff)) { + put_vma(exynos_gem_obj-vma); + goto out; + } + + npages = buf-size PAGE_SHIFT; + + if (exynos_gem_obj-flags EXYNOS_BO_USERPTR !buf-pfnmap) + lock_userptr_vma(buf, 0); + + npages--; + while (npages = 0) { + if (buf-write) + set_page_dirty_lock(buf-pages[npages]); + + put_page(buf-pages[npages]); + npages--; + } + +out: +
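lock_userptr_vma() above walks every VMA overlapping [userptr, userptr + size - 1] and toggles VM_LOCKED, failing with -EFAULT if the range is not fully mapped. That walk can be modeled in plain userspace C with intervals standing in for VMAs; this is a sketch of the control flow, not driver code, and it adds a hole check (vm_start beyond the cursor) that the quoted loop relies on find_vma()'s semantics for.

```c
#include <stddef.h>

struct fake_vma { unsigned long vm_start, vm_end; }; /* [vm_start, vm_end) */

/* First interval with vm_end > addr, mirroring the kernel's find_vma(). */
static const struct fake_vma *find_fake_vma(const struct fake_vma *v,
                                            size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (v[i].vm_end > addr)
            return &v[i];
    return NULL;
}

/* Visit each VMA covering [start, end] (end inclusive, as in the patch,
 * where end = userptr + size - 1); -14 models -EFAULT on a hole. */
static int walk_userptr_range(const struct fake_vma *v, size_t n,
                              unsigned long start, unsigned long end)
{
    const struct fake_vma *vma;

    do {
        vma = find_fake_vma(v, n, start);
        if (!vma || vma->vm_start > start)
            return -14;          /* unmapped gap in the range */
        start = vma->vm_end;     /* advance past this VMA */
    } while (vma->vm_end <= end);

    return 0;
}
```

Note Jerome's objection still applies to the real loop: locking is per-VMA, so a small userptr inside a large VMA pins the whole VMA, not just the requested pages.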
Re: [PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hi Laurent, On 05/08/2012 02:44 PM, Laurent Pinchart wrote: Hi Subash, On Monday 07 May 2012 20:08:25 Subash Patel wrote: Hello Tomasz, Laurent, I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that during the attach, the size of the SGT and the size requested mismatched (by at least 8k bytes). Hence I made a small correction to the code as below. I could then attach the importer properly. Thank you for the report. Could you print the content of the sglist (number of chunks and size of each chunk) before and after your modifications, as well as the values of n_pages, offset and size? I will put back all the printk's and generate this. As of now, my setup has changed and will do this when I get some time. On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: [snip] +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned int n_pages, unsigned long offset, unsigned long size) +{ + struct sg_table *sgt; + unsigned int chunks; + unsigned int i; + unsigned int cur_page; + int ret; + struct scatterlist *s; + + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + /* compute number of chunks */ + chunks = 1; + for (i = 1; i < n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); + return ERR_PTR(-ENOMEM); + } + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { + unsigned long chunk_size; + unsigned int j; size = PAGE_SIZE; + + for (j = cur_page + 1; j < n_pages; ++j) for (j = cur_page + 1; j < n_pages; ++j) { + if (pages[j] != pages[j - 1] + 1) + break; size += PAGE_SIZE; } + + chunk_size = ((j - cur_page) << PAGE_SHIFT) - offset; + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); [DELETE] size -= chunk_size; + offset = 0; + cur_page = j; + } + + return sgt; +} Regards, Subash
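The chunk counting that vb2_dc_pages_to_sgt() performs before sg_alloc_table() can be modeled in plain userspace C, with page frame numbers standing in for struct page pointers: adjacent pages are merged, so the sg_table needs one entry per run of physically contiguous pages. A sketch, not the driver code:

```c
#include <stddef.h>

/* One scatterlist entry per run of consecutive page frame numbers;
 * matches the "compute number of chunks" pass in the quoted patch. */
static unsigned int count_chunks(const unsigned long *pfn, size_t n_pages)
{
    unsigned int chunks = 1;
    for (size_t i = 1; i < n_pages; ++i)
        if (pfn[i] != pfn[i - 1] + 1)
            ++chunks;
    return chunks;
}
```

Subash's size mismatch would show up in the second pass, where each entry's length is derived from the run length and the initial offset; counting the runs themselves is unaffected.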
Re: [PATCH] mm: Work around Intel SNB GTT bug with some physical pages.
On Mon, 7 May 2012, Stephane Marchesin wrote: While investing some Sandy Bridge rendering corruption, I found out that all physical memory pages below 1MiB were returning garbage when read through the GTT. This has been causing graphics corruption (when it's used for textures, render targets and pixmaps) and GPU hangups (when it's used for GPU batch buffers). I talked with some people at Intel and they confirmed my findings, and said that a couple of other random pages were also affected. We could fix this problem by adding an e820 region preventing the memory below 1 MiB to be used, but that prevents at least my machine from booting. One could think that we should be able to fix it in i915, but since the allocation is done by the backing shmem this is not possible. In the end, I came up with the ugly workaround of just leaking the offending pages in shmem.c. I do realize it's truly ugly, but I'm looking for a fix to the existing code, and am wondering if people on this list have a better idea, short of rewriting i915_gem.c to allocate its own pages directly. Signed-off-by: Stephane Marchesin marc...@chromium.org Well done for discovering and pursuing this issue, but of course (as you know: you're trying to provoke us to better) your patch is revolting. And not even enough: swapin readahead and swapoff can read back from swap into pages which the i915 will later turn out to dislike. I do have a shmem.c patch coming up for gma500, which cannot use pages over 4GB; but that fits more reasonably with memory allocation policies, where we expect that anyone who can use a high page can use a lower as well, and there's already __GFP_DMA32 to set the limit. Your limitation is at the opposite end, so that patch won't help you at all. And I don't see how Andi's ZONE_DMA exclusion would work, without equal hackery to enable private zonelists, avoiding that convention. 
i915 is not the only user of shmem, and x86 not the only architecture: we're not going to make everyone suffer for this. Once the memory allocator gets down to giving you the low 1MB, my guess is that it's already short of memory, and liable to deadlock or OOM if you refuse and soak up every page it then gives you. Even if i915 has to live with that possibility, we're not going to extend it to everyone else. arch/x86/Kconfig has X86_RESERVE_LOW, default 64, range 4 640 (and I think we reserve all the memory range from 640kB to 1MB anyway). Would setting that to 640 allow you to boot, and avoid the i915 problem on all but the odd five pages? I'm not pretending that's an ideal solution at all (unless freeing initmem could release most of it on non-SandyBridge and non-i915 machines), but it would be useful to know if that does provide a stopgap solution. If that does work, maybe we just mark the odd five PageReserved at startup. Is there really no way this can be handled closer to the source of the problem, in the i915 driver itself? I do not know the flow of control in i915 (and X) at all, but on the surface it seems that the problem only comes when you map these problematic pages into the GTT (if I'm even using the right terminology), and something (not shmem.c) actively has to do that. Can't you check the pfn at that point, and if it's an unsuitable page, copy into a suitable page (perhaps allocated then, perhaps from a pool you primed earlier) and map that suitable page into the GTT instead? Maybe using page-private to link them if that helps. So long as the page (or its shadow) is mapped into the GTT, I imagine it would be pinned, and not liable to be swapped out or otherwise interfered with by shmem.c. And when you unmap it from GTT, copy back to the unsuitable shmem object page before unpinning. 
I fully accept that I have very little understanding of GPU DRM GTT and i915, and this may be impossible or incoherent: but please, let's try to keep the strangeness where it belongs. If necessary, we'll have to add some kind of flag and callback from shmem.c to the driver; but I'd so much prefer to avoid that. Hugh Change-Id: I957e125fb280e0b0d6b05a83cc4068df2f05aa0a --- mm/shmem.c | 39 +++++++++++++++++++++++++++++++++++++++-- 1 files changed, 37 insertions(+), 2 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 6c253f7..dcbb58b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -768,6 +768,31 @@ redirty: return 0; } +/* + * Some intel GPUs can't use those pages in the GTT, which results in + * graphics corruption. Sadly, it's impossible to prevent usage of those + * pages in the intel allocator. + * + * Instead, we test for those areas here and leak the corresponding pages. + * + * Some day, when the intel GPU memory is not backed by shmem any more, + * we'll be able to come up with a solution which is contained in i915. + */ +static bool i915_usable_page(struct page *page) +{ + dma_addr_t addr = page_to_phys(page); + + if (unlikely((addr < 1 * 1024 * 1024) ||
Re: [PATCH] MAINTAINERS: switch drm/i915 to Daniel Vetter
On Tue, May 8, 2012 at 12:19 PM, Daniel Vetter daniel.vet...@ffwll.ch wrote: ... given that I essentially run the show already, let's make this official. Acked-by: Dave Airlie airl...@redhat.com probably just push it via your -next. Dave. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] MAINTAINERS: switch drm/i915 to Daniel Vetter
On Wed, May 09, 2012 at 07:51:35AM +0100, Dave Airlie wrote: On Tue, May 8, 2012 at 12:19 PM, Daniel Vetter daniel.vet...@ffwll.ch wrote: ... given that I essentially run the show already, let's make this official. Acked-by: Dave Airlie airl...@redhat.com probably just push it via your -next. Will do. Thanks, Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm: Work around Intel SNB GTT bug with some physical pages.
On Tue, May 08, 2012 at 02:57:25PM -0700, Hugh Dickins wrote: On Mon, 7 May 2012, Stephane Marchesin wrote: While investing some Sandy Bridge rendering corruption, I found out that all physical memory pages below 1MiB were returning garbage when read through the GTT. This has been causing graphics corruption (when it's used for textures, render targets and pixmaps) and GPU hangups (when it's used for GPU batch buffers). I talked with some people at Intel and they confirmed my findings, and said that a couple of other random pages were also affected. We could fix this problem by adding an e820 region preventing the memory below 1 MiB to be used, but that prevents at least my machine from booting. One could think that we should be able to fix it in i915, but since the allocation is done by the backing shmem this is not possible. In the end, I came up with the ugly workaround of just leaking the offending pages in shmem.c. I do realize it's truly ugly, but I'm looking for a fix to the existing code, and am wondering if people on this list have a better idea, short of rewriting i915_gem.c to allocate its own pages directly. Signed-off-by: Stephane Marchesin marc...@chromium.org Well done for discovering and pursuing this issue, but of course (as you know: you're trying to provoke us to better) your patch is revolting. And not even enough: swapin readahead and swapoff can read back from swap into pages which the i915 will later turn out to dislike. I do have a shmem.c patch coming up for gma500, which cannot use pages over 4GB; but that fits more reasonably with memory allocation policies, where we expect that anyone who can use a high page can use a lower as well, and there's already __GFP_DMA32 to set the limit. Your limitation is at the opposite end, so that patch won't help you at all. And I don't see how Andi's ZONE_DMA exclusion would work, without equal hackery to enable private zonelists, avoiding that convention. 
i915 is not the only user of shmem, and x86 not the only architecture: we're not going to make everyone suffer for this. Once the memory allocator gets down to giving you the low 1MB, my guess is that it's already short of memory, and liable to deadlock or OOM if you refuse and soak up every page it then gives you. Even if i915 has to live with that possibility, we're not going to extend it to everyone else. arch/x86/Kconfig has X86_RESERVE_LOW, default 64, range 4 640 (and I think we reserve all the memory range from 640kB to 1MB anyway). Would setting that to 640 allow you to boot, and avoid the i915 problem on all but the odd five pages? I'm not pretending that's an ideal solution at all (unless freeing initmem could release most of it on non-SandyBridge and non-i915 machines), but it would be useful to know if that does provide a stopgap solution. If that does work, maybe we just mark the odd five PageReserved at startup. Hm, as a stopgap measure to make Sandybridge gpus not die that sounds pretty good. But we still need a more generic solution for the long-term, see below Is there really no way this can be handled closer to the source of the problem, in the i915 driver itself? I do not know the flow of control in i915 (and X) at all, but on the surface it seems that the problem only comes when you map these problematic pages into the GTT (if I'm even using the right terminology), and something (not shmem.c) actively has to do that. Can't you check the pfn at that point, and if it's an unsuitable page, copy into a suitable page (perhaps allocated then, perhaps from a pool you primed earlier) and map that suitable page into the GTT instead? Maybe using page-private to link them if that helps. So long as the page (or its shadow) is mapped into the GTT, I imagine it would be pinned, and not liable to be swapped out or otherwise interfered with by shmem.c. And when you unmap it from GTT, copy back to the unsuitable shmem object page before unpinning. 
I fully accept that I have very little understanding of GPU DRM GTT and i915, and this may be impossible or incoherent: but please, let's try to keep the strangeness where it belongs. If necessary, we'll have add some kind of flag and callback from shmem.c to the driver; but I'd so much prefer to avoid that. The copy stuff backforth approach is pretty much what ttm uses atm: It allocates suitable pages with whatever means it has (usually through the dma api) and if the shrinker callback tells it that it's sitting on too much memory, it copies stuff out to the shmem backing storage used by gem. There are quite a few issues with that approach: - We expose mmap to the shmem file directly to userspace in i915. We use these extensively on Sandybridge because there direct cpu access is coherent with what the gpu does. Original userspace would always tell the kernel when it's done writing through cpu mappings so that the
Re: [PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hello Tomasz, Laurent, I have printed some logs during the dmabuf export and attach for the SGT issue below. Please find it in the attachment. I hope it will be useful. Regards, Subash On 05/08/2012 04:45 PM, Subash Patel wrote: Hi Laurent, On 05/08/2012 02:44 PM, Laurent Pinchart wrote: Hi Subash, On Monday 07 May 2012 20:08:25 Subash Patel wrote: Hello Thomasz, Laurent, I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that during the attach, the size of the SGT and size requested mis-matched (by atleast 8k bytes). Hence I made a small correction to the code as below. I could then attach the importer properly. Thank you for the report. Could you print the content of the sglist (number of chunks and size of each chunk) before and after your modifications, as well as the values of n_pages, offset and size ? I will put back all the printk's and generate this. As of now, my setup has changed and will do this when I get sometime. On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: [snip] +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned int n_pages, unsigned long offset, unsigned long size) +{ + struct sg_table *sgt; + unsigned int chunks; + unsigned int i; + unsigned int cur_page; + int ret; + struct scatterlist *s; + + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + /* compute number of chunks */ + chunks = 1; + for (i = 1; i n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); + return ERR_PTR(-ENOMEM); + } + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt-sgl, s, sgt-orig_nents, i) { + unsigned long chunk_size; + unsigned int j; size = PAGE_SIZE; + + for (j = cur_page + 1; j n_pages; ++j) for (j = cur_page + 1; j n_pages; ++j) { + if (pages[j] != pages[j - 1] + 1) + break; size += PAGE } + + chunk_size = ((j - cur_page) PAGE_SHIFT) - offset; + 
sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); [DELETE] size -= chunk_size; + offset = 0; + cur_page = j; + } + + return sgt; +} Regards, Subash [ 178.545000] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.545000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.55] vb2_dc_pages_to_sgt():84 offset=0 [ 178.555000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.56] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.565000] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.57] vb2_dc_pages_to_sgt():83 cur_page=32 [ 178.575000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.58] vb2_dc_pages_to_sgt():86 chunk_size=262144 [ 178.585000] vb2_dc_pages_to_sgt():89 size=4294574080 [ 178.59] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.595000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.60] vb2_dc_pages_to_sgt():84 offset=0 [ 178.605000] vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.61] vb2_dc_pages_to_sgt():89 size=4294959104 [ 178.625000] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.625000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.63] vb2_dc_pages_to_sgt():84 offset=0 [ 178.635000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.64] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.645000] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.65] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.655000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.66] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.665000] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.67] vb2_dc_mmap: mapped dma addr 0x2006 at 0xb6e01000, size 131072 [ 178.67] vb2_dc_mmap: mapped dma addr 0x2008 at 0xb6de1000, size 131072 [ 178.68] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.685000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.69] vb2_dc_pages_to_sgt():84 offset=0 [ 178.695000] vb2_dc_pages_to_sgt():86 chunk_size=4096 [ 178.70] vb2_dc_pages_to_sgt():89 size=4294963200 [ 178.705000] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.71] vb2_dc_pages_to_sgt():83 cur_page=1 [ 178.715000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.715000] 
vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.72] vb2_dc_pages_to_sgt():89 size=4294955008 [ 178.725000] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.73] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.735000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.74] vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.745000] vb2_dc_pages_to_sgt():89 size=4294959104 [ 178.75] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.755000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.76] vb2_dc_pages_to_sgt():84 offset=0 [ 178.765000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.77] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.78] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.78] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.785000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.79] vb2_dc_pages_to_sgt():86 chunk_size=65536 [ 178.795000]
[Bug 49603] [regression] Fullscreen video no longer smooth with GPU in low power mode
https://bugs.freedesktop.org/show_bug.cgi?id=49603 --- Comment #5 from Simon Farnsworth simon.farnswo...@onelan.co.uk 2012-05-09 02:30:15 PDT --- Created attachment 61271 -- https://bugs.freedesktop.org/attachment.cgi?id=61271 A program to stop the CPU entering low power states Trying a different route. Can you compile the attached program with gcc -o stopsleep stopsleep.c and leave it running while playing a video? It tells the kernel to avoid using deep sleep states when idling. If it helps, we have a clue. If it doesn't, and video decode doesn't saturate all your CPU cores, can you try running while true ; do true ; done in a background shell and see if that helps? The goal of both of these is to see if the problem is that we're now letting the CPU idle more than we used to, and finding that the resulting power save modes hurt. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
On 08.05.2012 16:55, Jerome Glisse wrote: Still i don't want to loop more than necessary, it's bad, i am ok with : http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch If there is a fence signaled it will retry 2 times at most, otherwise it will go to wait, and that way is better. Because with your while loop the worst case is something proportional to the manager size; given it's 1MB it can loop for a long long time. Yeah, this loop can indeed consume quite some time. Ok then let's at least give every ring a chance to supply some holes, otherwise I fear that we might not even find something worth waiting for after only 2 tries. Going to send that out after figuring out why the patchset still causes texture corruptions on my system. Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hi Subash, Could you post the code of vb2_dc_pages_to_sgt with all printk in it. It will help us avoid guessing where and what is debugged in the log. Moreover, I found a line 'size=4294836224' in the log. It means that size is equal to -131072 (!?!) or there are some invalid conversions in printk. Are you sure that you do not pass size = 0 as the function argument? Notice that earlier versions of dmabuf-for-vb2 patches had an offset2 argument instead of size. It was the offset at the end of the buffer. In (I guess) 95% of cases this offset was 0. Could you provide only the function arguments that cause the failure? I mean the pages array + size (I assume that offset is zero for your test). Having the arguments we could reproduce that bug. Regards, Tomasz Stanislawski On 05/09/2012 08:46 AM, Subash Patel wrote: Hello Tomasz, Laurent, I have printed some logs during the dmabuf export and attach for the SGT issue below. Please find it in the attachment. I hope it will be useful. Regards, Subash On 05/08/2012 04:45 PM, Subash Patel wrote: Hi Laurent, On 05/08/2012 02:44 PM, Laurent Pinchart wrote: Hi Subash, On Monday 07 May 2012 20:08:25 Subash Patel wrote: Hello Thomasz, Laurent, I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that during the attach, the size of the SGT and size requested mis-matched (by at least 8k bytes). Hence I made a small correction to the code as below. I could then attach the importer properly. Thank you for the report. Could you print the content of the sglist (number of chunks and size of each chunk) before and after your modifications, as well as the values of n_pages, offset and size ? I will put back all the printk's and generate this. As of now, my setup has changed and will do this when I get some time.
On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: [snip] +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned int n_pages, unsigned long offset, unsigned long size) +{ + struct sg_table *sgt; + unsigned int chunks; + unsigned int i; + unsigned int cur_page; + int ret; + struct scatterlist *s; + + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + /* compute number of chunks */ + chunks = 1; + for (i = 1; i n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); + return ERR_PTR(-ENOMEM); + } + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt-sgl, s, sgt-orig_nents, i) { + unsigned long chunk_size; + unsigned int j; size = PAGE_SIZE; + + for (j = cur_page + 1; j n_pages; ++j) for (j = cur_page + 1; j n_pages; ++j) { + if (pages[j] != pages[j - 1] + 1) + break; size += PAGE } + + chunk_size = ((j - cur_page) PAGE_SHIFT) - offset; + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); [DELETE] size -= chunk_size; + offset = 0; + cur_page = j; + } + + return sgt; +} Regards, Subash ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632 --- Comment #6 from execute.met...@gmail.com 2012-05-09 03:55:23 PDT --- Ok. I didn't realize that. I have also tried with 3.4rc8, with the same result. Did i miss something in building the kernel? Also, what about projectM without streamout enabled? -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632 --- Comment #7 from execute.met...@gmail.com 2012-05-09 04:15:06 PDT --- (In reply to comment #6) Ok. I didn't realize that. I have also tried with 3.4rc8, with the same result. Did i miss something in building the kernel? Also, what about projectM without streamout enabled? Sorry, I meant rc5. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632 --- Comment #8 from Alex Deucher ag...@yahoo.com 2012-05-09 06:14:17 PDT --- (In reply to comment #6) Ok. I didn't realize that. I have also tried with 3.4rc8, with the same result. Did i miss something in building the kernel? Are you getting the same error about forbidden register 0x00028354? That register is in the allowed list for 3.4 so you shouldn't be getting that error. Also, what about projectM without streamout enabled? Is there anything else in dmesg other than *ERROR* Failed to parse relocation -12? -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: eliminate redundant connector_names table
On Fri, May 4, 2012 at 11:25 AM, Ilija Hadzic ihad...@research.bell-labs.com wrote: connector_names table is just a repeat of information that already exists in drm_connector_enum_list and the same string can be retrieved using drm_get_connector_name function. Nuke the redundant table and use the proper function to retrieve the connector name. Signed-off-by: Ilija Hadzic ihad...@research.bell-labs.com Reviewed-by: Alex Deucher alexander.deuc...@amd.com --- drivers/gpu/drm/radeon/radeon_display.c | 20 +--- 1 files changed, 1 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 8086c96..60aecc5 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -572,24 +572,6 @@ static const char *encoder_names[36] = { TRAVIS, }; -static const char *connector_names[15] = { - Unknown, - VGA, - DVI-I, - DVI-D, - DVI-A, - Composite, - S-video, - LVDS, - Component, - DIN, - DisplayPort, - HDMI-A, - HDMI-B, - TV, - eDP, -}; - static const char *hpd_names[6] = { HPD1, HPD2, @@ -612,7 +594,7 @@ static void radeon_print_display_setup(struct drm_device *dev) list_for_each_entry(connector, dev-mode_config.connector_list, head) { radeon_connector = to_radeon_connector(connector); DRM_INFO(Connector %d:\n, i); - DRM_INFO( %s\n, connector_names[connector-connector_type]); + DRM_INFO( %s\n, drm_get_connector_name(connector)); if (radeon_connector-hpd.hpd != RADEON_HPD_NONE) DRM_INFO( %s\n, hpd_names[radeon_connector-hpd.hpd]); if (radeon_connector-ddc_bus) { -- 1.7.8.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Include request for SA improvements
Hi Dave & Jerome and everybody on the list, I can't find any more bugs and also I'm out of things to test, so I really hope that this is the last incarnation of this patchset, and if Jerome is ok with it it should now be included into drm-next. Cheers, Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 01/19] drm/radeon: fix possible lack of synchronization btw ttm and other ring
From: Jerome Glisse jgli...@redhat.com We need to sync with the GFX ring as ttm might have scheduled a bo move on it, and new commands scheduled on other rings need to wait for the bo data to be in place. Signed-off-by: Jerome Glisse jgli...@redhat.com Reviewed by: Christian König christian.koe...@amd.com --- drivers/gpu/drm/radeon/radeon_cs.c | 12 ++++++------ include/drm/radeon_drm.h | 1 - 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index c66beb1..289b0d7 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -122,15 +122,15 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) int i, r; for (i = 0; i < p->nrelocs; i++) { + struct radeon_fence *fence; + if (!p->relocs[i].robj || !p->relocs[i].robj->tbo.sync_obj) continue; - if (!(p->relocs[i].flags & RADEON_RELOC_DONT_SYNC)) { - struct radeon_fence *fence = p->relocs[i].robj->tbo.sync_obj; - if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { - sync_to_ring[fence->ring] = true; - need_sync = true; - } + fence = p->relocs[i].robj->tbo.sync_obj; + if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { + sync_to_ring[fence->ring] = true; + need_sync = true; } } diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h index 7c491b4..5805686 100644 --- a/include/drm/radeon_drm.h +++ b/include/drm/radeon_drm.h @@ -926,7 +926,6 @@ struct drm_radeon_cs_chunk { }; /* drm_radeon_cs_reloc.flags */ -#define RADEON_RELOC_DONT_SYNC 0x01 struct drm_radeon_cs_reloc { uint32_thandle; -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 02/19] drm/radeon: replace the per ring mutex with a global one
A single global mutex for ring submissions seems sufficient. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h |3 ++- drivers/gpu/drm/radeon/radeon_device.c|3 +-- drivers/gpu/drm/radeon/radeon_pm.c| 10 ++- drivers/gpu/drm/radeon/radeon_ring.c | 28 +++ drivers/gpu/drm/radeon/radeon_semaphore.c | 42 + 5 files changed, 41 insertions(+), 45 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 82ffa6a..e99ea81 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -676,7 +676,6 @@ struct radeon_ring { uint64_tgpu_addr; uint32_talign_mask; uint32_tptr_mask; - struct mutexmutex; boolready; u32 ptr_reg_shift; u32 ptr_reg_mask; @@ -815,6 +814,7 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *cp, unsign int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *cp, unsigned ndw); void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_unlock_commit(struct radeon_device *rdev, struct radeon_ring *cp); +void radeon_ring_undo(struct radeon_ring *ring); void radeon_ring_unlock_undo(struct radeon_device *rdev, struct radeon_ring *cp); int radeon_ring_test(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *ring); @@ -1534,6 +1534,7 @@ struct radeon_device { rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; + struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; struct radeon_irq irq; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index ff28210..3f6ff2a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,8 +724,7 @@ int radeon_device_init(struct radeon_device *rdev, * can recall function without having 
locking issues */ radeon_mutex_init(rdev-cs_mutex); radeon_mutex_init(rdev-ib_pool.mutex); - for (i = 0; i RADEON_NUM_RINGS; ++i) - mutex_init(rdev-ring[i].mutex); + mutex_init(rdev-ring_lock); mutex_init(rdev-dc_hw_i2c_mutex); if (rdev-family = CHIP_R600) spin_lock_init(rdev-ih.lock); diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index caa55d6..7c38745 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -252,10 +252,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) mutex_lock(rdev-ddev-struct_mutex); mutex_lock(rdev-vram_mutex); - for (i = 0; i RADEON_NUM_RINGS; ++i) { - if (rdev-ring[i].ring_obj) - mutex_lock(rdev-ring[i].mutex); - } + mutex_lock(rdev-ring_lock); /* gui idle int has issues on older chips it seems */ if (rdev-family = CHIP_R600) { @@ -311,10 +308,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) rdev-pm.dynpm_planned_action = DYNPM_ACTION_NONE; - for (i = 0; i RADEON_NUM_RINGS; ++i) { - if (rdev-ring[i].ring_obj) - mutex_unlock(rdev-ring[i].mutex); - } + mutex_unlock(rdev-ring_lock); mutex_unlock(rdev-vram_mutex); mutex_unlock(rdev-ddev-struct_mutex); } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 2eb4c6e..a4d60ae 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -346,9 +346,9 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi if (ndw ring-ring_free_dw) { break; } - mutex_unlock(ring-mutex); + mutex_unlock(rdev-ring_lock); r = radeon_fence_wait_next(rdev, radeon_ring_index(rdev, ring)); - mutex_lock(ring-mutex); + mutex_lock(rdev-ring_lock); if (r) return r; } @@ -361,10 +361,10 @@ int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *ring, unsig { int r; - mutex_lock(ring-mutex); + mutex_lock(rdev-ring_lock); r = radeon_ring_alloc(rdev, ring, ndw); if (r) { - mutex_unlock(ring-mutex); + 
mutex_unlock(rdev-ring_lock); return r; }
[PATCH 03/19] drm/radeon: convert fence to uint64_t v4
From: Jerome Glisse jgli...@redhat.com This convert fence to use uint64_t sequence number intention is to use the fact that uin64_t is big enough that we don't need to care about wrap around. Tested with and without writeback using 0xF000 as initial fence sequence and thus allowing to test the wrap around from 32bits to 64bits. v2: Add comment about possible race btw CPU GPU, add comment stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET Read fence sequenc in reverse order of GPU write them so we mitigate the race btw CPU and GPU. v3: Drop the need for ring to emit the 64bits fence, and just have each ring emit the lower 32bits of the fence sequence. We handle the wrap over 32bits in fence_process. v4: Just a small optimization: Don't reread the last_seq value if loop restarts, since we already know its value anyway. Also start at zero not one for seq value and use pre instead of post increment in emmit, otherwise wait_empty will deadlock. Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h | 39 ++- drivers/gpu/drm/radeon/radeon_fence.c | 116 +++-- drivers/gpu/drm/radeon/radeon_ring.c |9 +-- 3 files changed, 107 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index e99ea81..cdf46bc 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout; * Copy from radeon_drv.h so we don't have to include both and have conflicting * symbol; */ -#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ -#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) +#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ +#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) /* RADEON_IB_POOL_SIZE must be a power of 2 */ -#define RADEON_IB_POOL_SIZE16 -#define RADEON_DEBUGFS_MAX_COMPONENTS 32 -#define RADEONFB_CONN_LIMIT4 -#define RADEON_BIOS_NUM_SCRATCH8 +#define RADEON_IB_POOL_SIZE16 
+#define RADEON_DEBUGFS_MAX_COMPONENTS 32 +#define RADEONFB_CONN_LIMIT4 +#define RADEON_BIOS_NUM_SCRATCH8 /* max number of rings */ -#define RADEON_NUM_RINGS 3 +#define RADEON_NUM_RINGS 3 + +/* fence seq are set to this number when signaled */ +#define RADEON_FENCE_SIGNALED_SEQ 0LL +#define RADEON_FENCE_NOTEMITED_SEQ (~0LL) /* internal ring indices */ /* r1xx+ has gfx CP ring */ -#define RADEON_RING_TYPE_GFX_INDEX 0 +#define RADEON_RING_TYPE_GFX_INDEX 0 /* cayman has 2 compute CP rings */ -#define CAYMAN_RING_TYPE_CP1_INDEX 1 -#define CAYMAN_RING_TYPE_CP2_INDEX 2 +#define CAYMAN_RING_TYPE_CP1_INDEX 1 +#define CAYMAN_RING_TYPE_CP2_INDEX 2 /* hardcode those limit for now */ -#define RADEON_VA_RESERVED_SIZE(8 20) -#define RADEON_IB_VM_MAX_SIZE (64 10) +#define RADEON_VA_RESERVED_SIZE(8 20) +#define RADEON_IB_VM_MAX_SIZE (64 10) /* * Errata workarounds. @@ -254,8 +258,9 @@ struct radeon_fence_driver { uint32_tscratch_reg; uint64_tgpu_addr; volatile uint32_t *cpu_addr; - atomic_tseq; - uint32_tlast_seq; + /* seq is protected by ring emission lock */ + uint64_tseq; + atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; struct list_heademitted; @@ -268,11 +273,9 @@ struct radeon_fence { struct kref kref; struct list_headlist; /* protected by radeon_fence.lock */ - uint32_tseq; - boolemitted; - boolsignaled; + uint64_tseq; /* RB, DMA, etc. */ - int ring; + unsignedring; struct radeon_semaphore *semaphore; }; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 5bb78bf..feb2bbc 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -66,14 +66,14 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence) unsigned long irq_flags; write_lock_irqsave(rdev-fence_lock, irq_flags); - if (fence-emitted) { + if (fence-seq fence-seq RADEON_FENCE_NOTEMITED_SEQ) {
[PATCH 06/19] drm/radeon: use inline functions to calc sa_bo addr
Instead of hacking the calculation multiple times.

Signed-off-by: Christian König deathsim...@vodafone.de
---
 drivers/gpu/drm/radeon/radeon_gart.c      |  6 ++----
 drivers/gpu/drm/radeon/radeon_object.h    | 11 +++++++++++
 drivers/gpu/drm/radeon/radeon_ring.c      |  6 ++----
 drivers/gpu/drm/radeon/radeon_semaphore.c |  6 ++----
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index c58a036..4a5d9d4 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -404,10 +404,8 @@ retry:
 		radeon_vm_unbind(rdev, vm_evict);
 		goto retry;
 	}
-	vm->pt = rdev->vm_manager.sa_manager.cpu_ptr;
-	vm->pt += (vm->sa_bo.offset >> 3);
-	vm->pt_gpu_addr = rdev->vm_manager.sa_manager.gpu_addr;
-	vm->pt_gpu_addr += vm->sa_bo.offset;
+	vm->pt = radeon_sa_bo_cpu_addr(&vm->sa_bo);
+	vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(&vm->sa_bo);
 	memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8));
 
 retry_id:
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index f9104be..c120ab9 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -146,6 +146,17 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo,
 /*
  * sub allocation
  */
+
+static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo *sa_bo)
+{
+	return sa_bo->manager->gpu_addr + sa_bo->offset;
+}
+
+static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo)
+{
+	return sa_bo->manager->cpu_ptr + sa_bo->offset;
+}
+
 extern int radeon_sa_bo_manager_init(struct radeon_device *rdev,
 				     struct radeon_sa_manager *sa_manager,
 				     unsigned size, u32 domain);
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 2fdc8c3..116be5e 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -127,10 +127,8 @@ retry:
 				     size, 256);
 		if (!r) {
 			*ib = rdev->ib_pool.ibs[idx];
-			(*ib)->ptr = rdev->ib_pool.sa_manager.cpu_ptr;
-			(*ib)->ptr += ((*ib)->sa_bo.offset >> 2);
-			(*ib)->gpu_addr = rdev->ib_pool.sa_manager.gpu_addr;
-			(*ib)->gpu_addr += (*ib)->sa_bo.offset;
+			(*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo);
+			(*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo);
 			(*ib)->fence = fence;
 			(*ib)->vm_id = 0;
 			(*ib)->is_const_ib = false;
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index c5b3d8e..f312ba5 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -53,10 +53,8 @@ static int radeon_semaphore_add_bo(struct radeon_device *rdev)
 		kfree(bo);
 		return r;
 	}
-	gpu_addr = rdev->ib_pool.sa_manager.gpu_addr;
-	gpu_addr += bo->ib->sa_bo.offset;
-	cpu_ptr = rdev->ib_pool.sa_manager.cpu_ptr;
-	cpu_ptr += (bo->ib->sa_bo.offset >> 2);
+	gpu_addr = radeon_sa_bo_gpu_addr(&bo->ib->sa_bo);
+	cpu_ptr = radeon_sa_bo_cpu_addr(&bo->ib->sa_bo);
 	for (i = 0; i < (RADEON_SEMAPHORE_BO_SIZE/8); i++) {
 		bo->semaphores[i].gpu_addr = gpu_addr;
 		bo->semaphores[i].cpu_ptr = cpu_ptr;
-- 
1.7.9.5
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
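The patch above replaces four copies of open-coded pointer arithmetic with two accessors that derive both addresses from one stored offset. A minimal userspace model of that idea (the struct layout and names here are simplified stand-ins, not the kernel's actual types):

```c
#include <stdint.h>

/* Simplified stand-ins for radeon_sa_manager / radeon_sa_bo. */
struct sa_manager {
    uint64_t gpu_addr;  /* GPU virtual address of the backing BO */
    uint8_t *cpu_ptr;   /* CPU mapping of the same BO */
};

struct sa_bo {
    struct sa_manager *manager;
    unsigned offset;    /* byte offset of this sub-allocation */
};

/* Both addresses are always "manager base + offset"; keeping that
 * computation in one place is the whole point of the helpers. */
static uint64_t sa_bo_gpu_addr(const struct sa_bo *bo)
{
    return bo->manager->gpu_addr + bo->offset;
}

static void *sa_bo_cpu_addr(const struct sa_bo *bo)
{
    return bo->manager->cpu_ptr + bo->offset;
}
```

Note how this also removes the fragile per-caller `>> 2` / `>> 3` shifts: callers that want a typed pointer can cast the `void *` result instead of redoing index arithmetic.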
[PATCH 05/19] drm/radeon: rework locking ring emission mutex in fence deadlock detection v2
Some callers illegal called fence_wait_next/empty while holding the ring emission mutex. So don't relock the mutex in that cases, and move the actual locking into the fence code. v2: Don't try to unlock the mutex if it isn't locked. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h|4 +-- drivers/gpu/drm/radeon/radeon_device.c |5 +++- drivers/gpu/drm/radeon/radeon_fence.c | 43 +--- drivers/gpu/drm/radeon/radeon_pm.c |8 +- drivers/gpu/drm/radeon/radeon_ring.c |6 + 5 files changed, 37 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 7c87117..701094b 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -284,8 +284,8 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence); void radeon_fence_process(struct radeon_device *rdev, int ring); bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); -int radeon_fence_wait_next(struct radeon_device *rdev, int ring); -int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); +int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 0e7b72a..b827b2e 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -912,9 +912,12 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state) } /* evict vram memory */ radeon_bo_evict_vram(rdev); + + mutex_lock(rdev-ring_lock); /* wait for gpu to finish processing current batch */ for (i = 0; i RADEON_NUM_RINGS; i++) - 
radeon_fence_wait_empty(rdev, i); + radeon_fence_wait_empty_locked(rdev, i); + mutex_unlock(rdev-ring_lock); radeon_save_bios_scratch_regs(rdev); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index ed20225..098d1fa 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -194,7 +194,7 @@ bool radeon_fence_signaled(struct radeon_fence *fence) } static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, -unsigned ring, bool intr) +unsigned ring, bool intr, bool lock_ring) { unsigned long timeout, last_activity; uint64_t seq; @@ -249,8 +249,16 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, if (seq != atomic64_read(rdev-fence_drv[ring].last_seq)) { continue; } + + if (lock_ring) { + mutex_lock(rdev-ring_lock); + } + /* test if somebody else has already decided that this is a lockup */ if (last_activity != rdev-fence_drv[ring].last_activity) { + if (lock_ring) { + mutex_unlock(rdev-ring_lock); + } continue; } @@ -264,15 +272,17 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, rdev-fence_drv[i].last_activity = jiffies; } - /* change last activity so nobody else think there is a lockup */ - for (i = 0; i RADEON_NUM_RINGS; ++i) { - rdev-fence_drv[i].last_activity = jiffies; - } - /* mark the ring as not ready any more */ rdev-ring[ring].ready = false; + if (lock_ring) { + mutex_unlock(rdev-ring_lock); + } return -EDEADLK; } + + if (lock_ring) { + mutex_unlock(rdev-ring_lock); + } } } return 0; @@ -287,7 +297,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return -EINVAL; } - r = radeon_fence_wait_seq(fence-rdev, fence-seq, fence-ring, intr); + r = radeon_fence_wait_seq(fence-rdev, fence-seq, +
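The `lock_ring` flag threaded through `radeon_fence_wait_seq` above reduces to a simple pattern: the caller declares whether it already holds the ring mutex, and the function only takes and drops the lock when told to (and, per the v2 note, never unlocks what it did not lock). A standalone sketch of that shape, with a plain flag standing in for the mutex (names and the `-35` value are illustrative, not the kernel API):

```c
#include <stdbool.h>

static bool ring_locked;     /* models rdev->ring_lock being held */
static int lockup_handled;   /* counts lockup-recovery passes */

/* Mirrors the kernel pattern: conditionally take the lock around the
 * lockup handling, symmetric on every exit path. */
static int handle_lockup(bool lock_ring)
{
    if (lock_ring)
        ring_locked = true;   /* mutex_lock(&rdev->ring_lock) in the driver */

    lockup_handled++;         /* stand-in for marking the ring not ready */

    if (lock_ring)
        ring_locked = false;  /* mutex_unlock(&rdev->ring_lock) */
    return -35;               /* stand-in for -EDEADLK */
}
```

A caller that already holds the lock passes `false`; one that does not passes `true`. Either way the lock state is balanced when the function returns.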
[PATCH 07/19] drm/radeon: add proper locking to the SA v3
Make the suballocator self-contained with respect to locking.

v2: split the bugfix into a separate patch.
v3: remove some unrelated changes.

Signed-off-by: Christian König deathsim...@vodafone.de
---
 drivers/gpu/drm/radeon/radeon.h    | 1 +
 drivers/gpu/drm/radeon/radeon_sa.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 701094b..8a6b1b3 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -381,6 +381,7 @@ struct radeon_bo_list {
  * alignment).
  */
 struct radeon_sa_manager {
+	spinlock_t		lock;
 	struct radeon_bo	*bo;
 	struct list_head	sa_bo;
 	unsigned		size;
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
index 8fbfe69..aed0a8c 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -37,6 +37,7 @@ int radeon_sa_bo_manager_init(struct radeon_device *rdev,
 {
 	int r;
 
+	spin_lock_init(&sa_manager->lock);
 	sa_manager->bo = NULL;
 	sa_manager->size = size;
 	sa_manager->domain = domain;
@@ -139,6 +140,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 	BUG_ON(align > RADEON_GPU_PAGE_SIZE);
 	BUG_ON(size > sa_manager->size);
 
+	spin_lock(&sa_manager->lock);
 	/* no one ? */
 	head = sa_manager->sa_bo.prev;
@@ -172,6 +174,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 		offset += wasted;
 		if ((sa_manager->size - offset) < size) {
 			/* failed to find somethings big enough */
+			spin_unlock(&sa_manager->lock);
 			return -ENOMEM;
 		}
@@ -180,10 +183,13 @@ out:
 	sa_bo->offset = offset;
 	sa_bo->size = size;
 	list_add(&sa_bo->list, head);
+	spin_unlock(&sa_manager->lock);
 	return 0;
 }
 
 void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo)
 {
+	spin_lock(&sa_bo->manager->lock);
 	list_del_init(&sa_bo->list);
+	spin_unlock(&sa_bo->manager->lock);
 }
-- 
1.7.9.5
[PATCH 09/19] drm/radeon: keep start and end offset in the SA
Instead of offset + size keep start and end offset directly. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h|4 ++-- drivers/gpu/drm/radeon/radeon_cs.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h |4 ++-- drivers/gpu/drm/radeon/radeon_sa.c | 13 +++-- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 8a6b1b3..d1c2154 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -396,8 +396,8 @@ struct radeon_sa_bo; struct radeon_sa_bo { struct list_headlist; struct radeon_sa_manager*manager; - unsignedoffset; - unsignedsize; + unsignedsoffset; + unsignedeoffset; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 289b0d7..b778037 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser-const_ib-gpu_addr = parser-const_ib-sa_bo.offset; + parser-const_ib-gpu_addr = parser-const_ib-sa_bo.soffset; r = radeon_ib_schedule(rdev, parser-const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser-ib-gpu_addr = parser-ib-sa_bo.offset; + parser-ib-gpu_addr = parser-ib-sa_bo.soffset; parser-ib-is_const_ib = false; r = radeon_ib_schedule(rdev, parser-ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index d9fca1e..99ab46a 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -149,12 +149,12 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo, static inline uint64_t radeon_sa_bo_gpu_addr(struct 
radeon_sa_bo *sa_bo) { - return sa_bo-manager-gpu_addr + sa_bo-offset; + return sa_bo-manager-gpu_addr + sa_bo-soffset; } static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo) { - return sa_bo-manager-cpu_ptr + sa_bo-offset; + return sa_bo-manager-cpu_ptr + sa_bo-soffset; } extern int radeon_sa_bo_manager_init(struct radeon_device *rdev, diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 1db0568..3bea7ba 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -152,11 +152,11 @@ int radeon_sa_bo_new(struct radeon_device *rdev, offset = 0; list_for_each_entry(tmp, sa_manager-sa_bo, list) { /* room before this object ? */ - if (offset tmp-offset (tmp-offset - offset) = size) { + if (offset tmp-soffset (tmp-soffset - offset) = size) { head = tmp-list.prev; goto out; } - offset = tmp-offset + tmp-size; + offset = tmp-eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -166,7 +166,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, /* room at the end ? 
*/ head = sa_manager-sa_bo.prev; tmp = list_entry(head, struct radeon_sa_bo, list); - offset = tmp-offset + tmp-size; + offset = tmp-eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -180,8 +180,8 @@ int radeon_sa_bo_new(struct radeon_device *rdev, out: sa_bo-manager = sa_manager; - sa_bo-offset = offset; - sa_bo-size = size; + sa_bo-soffset = offset; + sa_bo-eoffset = offset + size; list_add(sa_bo-list, head); spin_unlock(sa_manager-lock); return 0; @@ -202,7 +202,8 @@ void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, spin_lock(sa_manager-lock); list_for_each_entry(i, sa_manager-sa_bo, list) { - seq_printf(m, offset %08d: size %4d\n, i-offset, i-size); + seq_printf(m, [%08x %08x] size %4d [%p]\n, + i-soffset, i-eoffset, i-eoffset - i-soffset, i); } spin_unlock(sa_manager-lock); } -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
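With start and end offsets stored directly, the size falls out as `eoffset - soffset` and the free-space scan advances with a plain load instead of an addition per node. The two recurring computations in `radeon_sa_bo_new` can be modeled like this (toy types, not the kernel structures):

```c
/* Toy sub-allocation node keeping [soffset, eoffset) directly. */
struct toy_sa_bo {
    unsigned soffset;   /* inclusive start */
    unsigned eoffset;   /* exclusive end, i.e. start of free space after it */
};

static unsigned toy_size(const struct toy_sa_bo *bo)
{
    return bo->eoffset - bo->soffset;
}

/* Round 'offset' up to 'align', exactly as the wasted = offset % align
 * logic in the allocator does. */
static unsigned align_up(unsigned offset, unsigned align)
{
    unsigned wasted = offset % align;
    return wasted ? offset + (align - wasted) : offset;
}
```

Storing `eoffset` instead of `size` also makes the debugfs dump format natural: the printed `[soffset eoffset]` range describes the allocation without any arithmetic.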
[PATCH 04/19] drm/radeon: rework fence handling, drop fence list v7
From: Jerome Glisse jgli...@redhat.com

Using a 64-bit fence sequence we can directly compare sequence numbers to know whether a fence is signaled or not. Thus the fence list becomes useless, as does the fence lock that mainly protected it. Things like ring.ready are no longer behind a lock; this should be OK since ring.ready is initialized once and only changes when facing a lockup. Worst case is that we return -EBUSY just after a successful GPU reset, or that we go into a wait state instead of returning -EBUSY (thus delaying reporting -EBUSY to the fence wait caller).

v2: Remove a leftover comment; force using writeback on Cayman and newer, thus avoiding possible scratch register exhaustion.
v3: Rebase on top of the change to the uint64 fence patch.
v4: Change the DCE5 test to force writeback on Cayman and newer, but also on any APU such as the PALM or SUMO family.
v5: Rebase on top of the new uint64 fence patch.
v6: Just break if seq doesn't change any more. Use the radeon_fence prefix for all function names. Even if it's now highly optimized, try to avoid polling too often.
v7: We should never poll last_seq from the hardware without waking the sleeping threads, otherwise we might lose events.
Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h|6 +- drivers/gpu/drm/radeon/radeon_device.c |8 +- drivers/gpu/drm/radeon/radeon_fence.c | 299 3 files changed, 119 insertions(+), 194 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cdf46bc..7c87117 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -263,15 +263,12 @@ struct radeon_fence_driver { atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; - struct list_heademitted; - struct list_headsignaled; boolinitialized; }; struct radeon_fence { struct radeon_device*rdev; struct kref kref; - struct list_headlist; /* protected by radeon_fence.lock */ uint64_tseq; /* RB, DMA, etc. */ @@ -291,7 +288,7 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); -int radeon_fence_count_emitted(struct radeon_device *rdev, int ring); +unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); /* * Tiling registers @@ -1534,7 +1531,6 @@ struct radeon_device { struct radeon_mode_info mode_info; struct radeon_scratch scratch; struct radeon_mman mman; - rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 3f6ff2a..0e7b72a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -225,9 +225,9 @@ int radeon_wb_init(struct radeon_device *rdev) /* disable event_write fences */ rdev-wb.use_event = false; /* disabled via module param */ - if (radeon_no_wb == 1) + if (radeon_no_wb == 1) { rdev-wb.enabled 
= false; - else { + } else { if (rdev-flags RADEON_IS_AGP) { /* often unreliable on AGP */ rdev-wb.enabled = false; @@ -237,8 +237,9 @@ int radeon_wb_init(struct radeon_device *rdev) } else { rdev-wb.enabled = true; /* event_write fences are only available on r600+ */ - if (rdev-family = CHIP_R600) + if (rdev-family = CHIP_R600) { rdev-wb.use_event = true; + } } } /* always use writeback/events on NI, APUs */ @@ -731,7 +732,6 @@ int radeon_device_init(struct radeon_device *rdev, mutex_init(rdev-gem.mutex); mutex_init(rdev-pm.mutex); mutex_init(rdev-vram_mutex); - rwlock_init(rdev-fence_lock); rwlock_init(rdev-semaphore_drv.lock); INIT_LIST_HEAD(rdev-gem.objects); init_waitqueue_head(rdev-irq.vblank_queue); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index feb2bbc..ed20225 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++
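The core simplification of the 64-bit rework is that "signaled" becomes a single comparison against the last sequence number seen, since a monotonically increasing 64-bit counter will not wrap in practice; no list walking or list lock is needed. A userspace sketch of the check (the real driver keeps one such counter per ring and rereads the hardware value before concluding a lockup):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Last sequence number the hardware has signaled (per ring in the driver). */
static _Atomic uint64_t last_seq;

/* A fence whose sequence number is <= last_seq has completed. */
static bool fence_seq_signaled(uint64_t seq)
{
    return seq <= atomic_load(&last_seq);
}
```

This is why the patch can delete `fence_lock` and the `emitted`/`signaled` lists: the atomic counter alone answers the question the lists used to answer.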
[PATCH 08/19] drm/radeon: add sub allocator debugfs file
Dumping the current allocations. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon_object.h |5 + drivers/gpu/drm/radeon/radeon_ring.c | 22 ++ drivers/gpu/drm/radeon/radeon_sa.c | 14 ++ 3 files changed, 41 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index c120ab9..d9fca1e 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -172,5 +172,10 @@ extern int radeon_sa_bo_new(struct radeon_device *rdev, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo); +#if defined(CONFIG_DEBUG_FS) +extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, +struct seq_file *m); +#endif + #endif diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 116be5e..f49c9c0 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -601,6 +601,23 @@ static int radeon_debugfs_ib_info(struct seq_file *m, void *data) static struct drm_info_list radeon_debugfs_ib_list[RADEON_IB_POOL_SIZE]; static char radeon_debugfs_ib_names[RADEON_IB_POOL_SIZE][32]; static unsigned radeon_debugfs_ib_idx[RADEON_IB_POOL_SIZE]; + +static int radeon_debugfs_sa_info(struct seq_file *m, void *data) +{ + struct drm_info_node *node = (struct drm_info_node *) m-private; + struct drm_device *dev = node-minor-dev; + struct radeon_device *rdev = dev-dev_private; + + radeon_sa_bo_dump_debug_info(rdev-ib_pool.sa_manager, m); + + return 0; + +} + +static struct drm_info_list radeon_debugfs_sa_list[] = { +{radeon_sa_info, radeon_debugfs_sa_info, 0, NULL}, +}; + #endif int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *ring) @@ -627,6 +644,11 @@ int radeon_debugfs_ib_init(struct radeon_device *rdev) { #if defined(CONFIG_DEBUG_FS) unsigned i; + int r; + + r = radeon_debugfs_add_files(rdev, 
radeon_debugfs_sa_list, 1); + if (r) + return r; for (i = 0; i RADEON_IB_POOL_SIZE; i++) { sprintf(radeon_debugfs_ib_names[i], radeon_ib_%04u, i); diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index aed0a8c..1db0568 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -193,3 +193,17 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo) list_del_init(sa_bo-list); spin_unlock(sa_bo-manager-lock); } + +#if defined(CONFIG_DEBUG_FS) +void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, + struct seq_file *m) +{ + struct radeon_sa_bo *i; + + spin_lock(sa_manager-lock); + list_for_each_entry(i, sa_manager-sa_bo, list) { + seq_printf(m, offset %08d: size %4d\n, i-offset, i-size); + } + spin_unlock(sa_manager-lock); +} +#endif -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 10/19] drm/radeon: make sa bo a stand alone object
Allocating and freeing it seperately. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h |4 ++-- drivers/gpu/drm/radeon/radeon_cs.c|4 ++-- drivers/gpu/drm/radeon/radeon_gart.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h|4 ++-- drivers/gpu/drm/radeon/radeon_ring.c |6 +++--- drivers/gpu/drm/radeon/radeon_sa.c| 28 +++- drivers/gpu/drm/radeon/radeon_semaphore.c |4 ++-- 7 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index d1c2154..9374ab1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -638,7 +638,7 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; unsignedidx; uint32_tlength_dw; uint64_tgpu_addr; @@ -693,7 +693,7 @@ struct radeon_vm { unsignedlast_pfn; u64 pt_gpu_addr; u64 *pt; - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; struct mutexmutex; /* last fence for cs using this vm */ struct radeon_fence *fence; diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index b778037..5c065bf 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser-const_ib-gpu_addr = parser-const_ib-sa_bo.soffset; + parser-const_ib-gpu_addr = parser-const_ib-sa_bo-soffset; r = radeon_ib_schedule(rdev, parser-const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser-ib-gpu_addr = parser-ib-sa_bo.soffset; + parser-ib-gpu_addr = parser-ib-sa_bo-soffset; parser-ib-is_const_ib = false; r = radeon_ib_schedule(rdev, 
parser-ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 4a5d9d4..c5789ef 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -404,8 +404,8 @@ retry: radeon_vm_unbind(rdev, vm_evict); goto retry; } - vm-pt = radeon_sa_bo_cpu_addr(vm-sa_bo); - vm-pt_gpu_addr = radeon_sa_bo_gpu_addr(vm-sa_bo); + vm-pt = radeon_sa_bo_cpu_addr(vm-sa_bo); + vm-pt_gpu_addr = radeon_sa_bo_gpu_addr(vm-sa_bo); memset(vm-pt, 0, RADEON_GPU_PAGE_ALIGN(vm-last_pfn * 8)); retry_id: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 99ab46a..4fc7f07 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -168,10 +168,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager); extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, - struct radeon_sa_bo *sa_bo, + struct radeon_sa_bo **sa_bo, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo *sa_bo); + struct radeon_sa_bo **sa_bo); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index f49c9c0..45adb37 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -127,8 +127,8 @@ retry: size, 256); if (!r) { *ib = rdev-ib_pool.ibs[idx]; - (*ib)-ptr = radeon_sa_bo_cpu_addr((*ib)-sa_bo); - (*ib)-gpu_addr = radeon_sa_bo_gpu_addr((*ib)-sa_bo); + (*ib)-ptr = radeon_sa_bo_cpu_addr((*ib)-sa_bo); + (*ib)-gpu_addr = radeon_sa_bo_gpu_addr((*ib)-sa_bo);
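Turning the embedded `struct radeon_sa_bo` into a separately allocated object means the allocator now hands out and reclaims nodes through a pointer-to-pointer interface, clearing the caller's pointer on free. A sketch of that calling convention (toy names, abbreviated error handling; `-12` stands in for `-ENOMEM`):

```c
#include <stdlib.h>

struct toy_node {
    unsigned soffset, eoffset;
};

/* Caller passes the address of its pointer, as radeon_sa_bo_new now does. */
static int toy_node_new(struct toy_node **out, unsigned start, unsigned size)
{
    struct toy_node *n = malloc(sizeof(*n));
    if (!n)
        return -12;          /* stand-in for -ENOMEM */
    n->soffset = start;
    n->eoffset = start + size;
    *out = n;
    return 0;
}

/* Frees the node and clears the caller's pointer, so a stale handle
 * cannot be used after free. */
static void toy_node_free(struct toy_node **n)
{
    free(*n);
    *n = NULL;
}
```

Decoupling the node's lifetime from its owner is what later lets the SA keep a freed-but-fenced allocation alive until the GPU is done with it.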
[PATCH 11/19] drm/radeon: define new SA interface v3
Define the interface without modifying the allocation algorithm in any way. v2: rebase on top of fence new uint64 patch v3: add ring to debugfs output Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h |1 + drivers/gpu/drm/radeon/radeon_gart.c |6 +-- drivers/gpu/drm/radeon/radeon_object.h|5 ++- drivers/gpu/drm/radeon/radeon_ring.c |8 ++-- drivers/gpu/drm/radeon/radeon_sa.c| 60 - drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 6 files changed, 63 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 9374ab1..ada70d1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -398,6 +398,7 @@ struct radeon_sa_bo { struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; + struct radeon_fence *fence; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c5789ef..53dba8e 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -326,7 +326,7 @@ static void radeon_vm_unbind_locked(struct radeon_device *rdev, rdev-vm_manager.use_bitmap = ~(1 vm-id); list_del_init(vm-list); vm-id = -1; - radeon_sa_bo_free(rdev, vm-sa_bo); + radeon_sa_bo_free(rdev, vm-sa_bo, NULL); vm-pt = NULL; list_for_each_entry(bo_va, vm-va, vm_list) { @@ -395,7 +395,7 @@ int radeon_vm_bind(struct radeon_device *rdev, struct radeon_vm *vm) retry: r = radeon_sa_bo_new(rdev, rdev-vm_manager.sa_manager, vm-sa_bo, RADEON_GPU_PAGE_ALIGN(vm-last_pfn * 8), -RADEON_GPU_PAGE_SIZE); +RADEON_GPU_PAGE_SIZE, false); if (r) { if (list_empty(rdev-vm_manager.lru_vm)) { return r; @@ -426,7 +426,7 @@ retry_id: /* do hw bind */ r = rdev-vm_manager.funcs-bind(rdev, vm, id); if (r) { - radeon_sa_bo_free(rdev, vm-sa_bo); + radeon_sa_bo_free(rdev, vm-sa_bo, NULL); return r; } rdev-vm_manager.use_bitmap |= 1 id; diff --git 
a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 4fc7f07..befec7d 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -169,9 +169,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, struct radeon_sa_bo **sa_bo, - unsigned size, unsigned align); + unsigned size, unsigned align, bool block); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo **sa_bo); + struct radeon_sa_bo **sa_bo, + struct radeon_fence *fence); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 45adb37..1748d93 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -85,7 +85,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib) if (ib-fence ib-fence-seq RADEON_FENCE_NOTEMITED_SEQ) { if (radeon_fence_signaled(ib-fence)) { radeon_fence_unref(ib-fence); - radeon_sa_bo_free(rdev, ib-sa_bo); + radeon_sa_bo_free(rdev, ib-sa_bo, NULL); done = true; } } @@ -124,7 +124,7 @@ retry: if (rdev-ib_pool.ibs[idx].fence == NULL) { r = radeon_sa_bo_new(rdev, rdev-ib_pool.sa_manager, rdev-ib_pool.ibs[idx].sa_bo, -size, 256); +size, 256, false); if (!r) { *ib = rdev-ib_pool.ibs[idx]; (*ib)-ptr = radeon_sa_bo_cpu_addr((*ib)-sa_bo); @@ -173,7 +173,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) } radeon_mutex_lock(rdev-ib_pool.mutex); if (tmp-fence tmp-fence-seq == RADEON_FENCE_NOTEMITED_SEQ) { - radeon_sa_bo_free(rdev, tmp-sa_bo); + radeon_sa_bo_free(rdev, tmp-sa_bo, NULL);
[PATCH 12/19] drm/radeon: use one wait queue for all rings add fence_wait_any v2
From: Jerome Glisse jgli...@redhat.com Use one wait queue for all rings. When one ring progress, other likely does to and we are not expecting to have a lot of waiter anyway. Also add a fence_wait_any that will wait until the first fence in the fence array (one fence per ring) is signaled. This allow to wait on all rings. v2: some minor cleanups and improvements. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon.h |5 +- drivers/gpu/drm/radeon/radeon_fence.c | 165 +++-- 2 files changed, 163 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index ada70d1..37a7459 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -262,7 +262,6 @@ struct radeon_fence_driver { uint64_tseq; atomic64_t last_seq; unsigned long last_activity; - wait_queue_head_t queue; boolinitialized; }; @@ -286,6 +285,9 @@ bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_any(struct radeon_device *rdev, + struct radeon_fence **fences, + bool intr); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); @@ -1534,6 +1536,7 @@ struct radeon_device { struct radeon_scratch scratch; struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; + wait_queue_head_t fence_queue; struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 098d1fa..14dbc28 100644 --- 
a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -129,7 +129,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring) if (wake) { rdev-fence_drv[ring].last_activity = jiffies; - wake_up_all(rdev-fence_drv[ring].queue); + wake_up_all(rdev-fence_queue); } } @@ -224,11 +224,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, trace_radeon_fence_wait_begin(rdev-ddev, seq); radeon_irq_kms_sw_irq_get(rdev, ring); if (intr) { - r = wait_event_interruptible_timeout(rdev-fence_drv[ring].queue, + r = wait_event_interruptible_timeout(rdev-fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } else { - r = wait_event_timeout(rdev-fence_drv[ring].queue, + r = wait_event_timeout(rdev-fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } @@ -306,6 +306,159 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return 0; } +bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq) +{ + unsigned i; + + for (i = 0; i RADEON_NUM_RINGS; ++i) { + if (seq[i] radeon_fence_seq_signaled(rdev, seq[i], i)) { + return true; + } + } + return false; +} + +static int radeon_fence_wait_any_seq(struct radeon_device *rdev, +u64 *target_seq, bool intr) +{ + unsigned long timeout, last_activity, tmp; + unsigned i, ring = RADEON_NUM_RINGS; + bool signaled; + int r; + + for (i = 0, last_activity = 0; i RADEON_NUM_RINGS; ++i) { + if (!target_seq[i]) { + continue; + } + + /* use the most recent one as indicator */ + if (time_after(rdev-fence_drv[i].last_activity, last_activity)) { + last_activity = rdev-fence_drv[i].last_activity; + } + + /* For lockup detection just pick the lowest ring we are +* actively waiting for +*/ + if (i ring) { + ring = i; + } + } + + /* nothing to wait for ? */ + if (ring == RADEON_NUM_RINGS) { +
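`radeon_fence_wait_any_seq` first scans the per-ring targets for anything already signaled, then sleeps on the single shared `fence_queue`. The scan itself is a plain "any of" loop; modeled with flat arrays (the kernel version reads per-ring atomics, and `0` means "not waiting on this ring"):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 5

/* target[i] == 0 means we are not waiting on ring i. */
static bool any_seq_signaled(const uint64_t *target, const uint64_t *last)
{
    for (unsigned i = 0; i < NUM_RINGS; ++i) {
        if (target[i] && target[i] <= last[i])
            return true;
    }
    return false;
}
```

Because every ring's progress wakes the same queue, one `wait_event` on this predicate suffices to wait for "the first fence of any ring", which is exactly what the blocking suballocator needs.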
[PATCH 13/19] drm/radeon: multiple ring allocator v3
A fresh start with a new idea for a multiple-ring allocator. It should perform as well as a normal ring allocator as long as only one ring does something, but it falls back to a more complex algorithm when more complex things start to happen.

We store the last allocated bo in last, and we always try to allocate after it. The principle is that in a linear GPU ring progression, what is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If that's not the case, we skip over the bo after last to the closest done bo, if one exists. If none exists and we are not asked to block, we report failure to allocate. If we are asked to block, we wait on the oldest fence of each ring and return as soon as any of those fences completes.

v2: We need to be able to let hole point to the list_head, otherwise try-free will never free the first allocation of the list. Also stop calling radeon_fence_signaled more often than necessary.
v3: Don't free allocations without considering them as a hole, otherwise we might lose holes. Also return -ENOMEM instead of -ENOENT when running out of fences to wait for. Limit the number of holes we try for each ring to 3.
Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon.h |7 +- drivers/gpu/drm/radeon/radeon_ring.c | 19 +-- drivers/gpu/drm/radeon/radeon_sa.c | 312 -- 3 files changed, 231 insertions(+), 107 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 37a7459..cc7f16a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -385,7 +385,9 @@ struct radeon_bo_list { struct radeon_sa_manager { spinlock_t lock; struct radeon_bo*bo; - struct list_headsa_bo; + struct list_head*hole; + struct list_headflist[RADEON_NUM_RINGS]; + struct list_headolist; unsignedsize; uint64_tgpu_addr; void*cpu_ptr; @@ -396,7 +398,8 @@ struct radeon_sa_bo; /* sub-allocation buffer */ struct radeon_sa_bo { - struct list_headlist; + struct list_headolist; + struct list_headflist; struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 1748d93..e074ff5 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib) int radeon_ib_pool_init(struct radeon_device *rdev) { - struct radeon_sa_manager tmp; int i, r; - r = radeon_sa_bo_manager_init(rdev, tmp, - RADEON_IB_POOL_SIZE*64*1024, - RADEON_GEM_DOMAIN_GTT); - if (r) { - return r; - } - radeon_mutex_lock(rdev-ib_pool.mutex); if (rdev-ib_pool.ready) { radeon_mutex_unlock(rdev-ib_pool.mutex); - radeon_sa_bo_manager_fini(rdev, tmp); return 0; } - rdev-ib_pool.sa_manager = tmp; - INIT_LIST_HEAD(rdev-ib_pool.sa_manager.sa_bo); + r = radeon_sa_bo_manager_init(rdev, rdev-ib_pool.sa_manager, + RADEON_IB_POOL_SIZE*64*1024, + RADEON_GEM_DOMAIN_GTT); + if (r) { + radeon_mutex_unlock(rdev-ib_pool.mutex); + return r; + } + for (i = 0; i RADEON_IB_POOL_SIZE; i++) { 
rdev-ib_pool.ibs[i].fence = NULL; rdev-ib_pool.ibs[i].idx = i; diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 90ee8ad..c3ac7f4 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -27,21 +27,42 @@ * Authors: *Jerome Glisse gli...@freedesktop.org */ +/* Algorithm: + * + * We store the last allocated bo in hole, we always try to allocate + * after the last allocated bo. Principle is that in a linear GPU ring + * progression was is after last is the oldest bo we allocated and thus + * the first one that should no longer be in use by the GPU. + * + * If it's not the case we skip over the bo after last to the closest + * done bo if such one exist. If none exist and we are not asked to + * block we report failure to allocate. + * + * If we are asked to block we wait on all the oldest fence of all + * rings. We just wait for any of those fence to complete. + */ #include drmP.h #include drm.h #include radeon.h +static void
[PATCH 15/19] drm/radeon: rip out the ib pool
From: Jerome Glisse jgli...@redhat.com It isn't necessary any more and the suballocator seems to perform even better. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon.h | 17 +- drivers/gpu/drm/radeon/radeon_device.c|1 - drivers/gpu/drm/radeon/radeon_gart.c | 12 +- drivers/gpu/drm/radeon/radeon_ring.c | 241 - drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 5 files changed, 71 insertions(+), 202 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 45164e1..6170307 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -625,7 +625,6 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); struct radeon_ib { struct radeon_sa_bo *sa_bo; - unsignedidx; uint32_tlength_dw; uint64_tgpu_addr; uint32_t*ptr; @@ -634,18 +633,6 @@ struct radeon_ib { boolis_const_ib; }; -/* - * locking - - * mutex protects scheduled_ibs, ready, alloc_bm - */ -struct radeon_ib_pool { - struct radeon_mutex mutex; - struct radeon_sa_managersa_manager; - struct radeon_ibibs[RADEON_IB_POOL_SIZE]; - boolready; - unsignedhead_id; -}; - struct radeon_ring { struct radeon_bo*ring_obj; volatile uint32_t *ring; @@ -787,7 +774,6 @@ struct si_rlc { int radeon_ib_get(struct radeon_device *rdev, int ring, struct radeon_ib **ib, unsigned size); void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib); -bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_pool_init(struct radeon_device *rdev); void radeon_ib_pool_fini(struct radeon_device *rdev); @@ -1522,7 +1508,8 @@ struct radeon_device { wait_queue_head_t fence_queue; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; - struct radeon_ib_pool ib_pool; + boolib_pool_ready; + struct radeon_sa_managerring_tmp_bo; struct radeon_irq irq; struct 
radeon_asic *asic; struct radeon_gem gem; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 48876c1..e1bc7e9 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,7 +724,6 @@ int radeon_device_init(struct radeon_device *rdev, /* mutex initialization are all done here so we * can recall function without having locking issues */ radeon_mutex_init(rdev-cs_mutex); - radeon_mutex_init(rdev-ib_pool.mutex); mutex_init(rdev-ring_lock); mutex_init(rdev-dc_hw_i2c_mutex); if (rdev-family = CHIP_R600) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 53dba8e..8e9ef34 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -432,8 +432,8 @@ retry_id: rdev-vm_manager.use_bitmap |= 1 id; vm-id = id; list_add_tail(vm-list, rdev-vm_manager.lru_vm); - return radeon_vm_bo_update_pte(rdev, vm, rdev-ib_pool.sa_manager.bo, - rdev-ib_pool.sa_manager.bo-tbo.mem); + return radeon_vm_bo_update_pte(rdev, vm, rdev-ring_tmp_bo.bo, + rdev-ring_tmp_bo.bo-tbo.mem); } /* object have to be reserved */ @@ -631,7 +631,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm) /* map the ib pool buffer at 0 in virtual address space, set * read only */ - r = radeon_vm_bo_add(rdev, vm, rdev-ib_pool.sa_manager.bo, 0, + r = radeon_vm_bo_add(rdev, vm, rdev-ring_tmp_bo.bo, 0, RADEON_VM_PAGE_READABLE | RADEON_VM_PAGE_SNOOPED); return r; } @@ -648,12 +648,12 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm) radeon_mutex_unlock(rdev-cs_mutex); /* remove all bo */ - r = radeon_bo_reserve(rdev-ib_pool.sa_manager.bo, false); + r = radeon_bo_reserve(rdev-ring_tmp_bo.bo, false); if (!r) { - bo_va = radeon_bo_va(rdev-ib_pool.sa_manager.bo, vm); + bo_va = radeon_bo_va(rdev-ring_tmp_bo.bo, vm); list_del_init(bo_va-bo_list); list_del_init(bo_va-vm_list); - 
radeon_bo_unreserve(rdev-ib_pool.sa_manager.bo); +
[PATCH 14/19] drm/radeon: simplify semaphore handling v2
From: Jerome Glisse jgli...@redhat.com Directly use the suballocator to get small chunks of memory. It's equally fast and doesn't crash when we encounter a GPU reset. v2: rebased on new SA interface. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/evergreen.c|1 - drivers/gpu/drm/radeon/ni.c |1 - drivers/gpu/drm/radeon/r600.c |1 - drivers/gpu/drm/radeon/radeon.h | 29 +- drivers/gpu/drm/radeon/radeon_device.c|2 - drivers/gpu/drm/radeon/radeon_fence.c |2 +- drivers/gpu/drm/radeon/radeon_semaphore.c | 137 + drivers/gpu/drm/radeon/radeon_test.c |4 +- drivers/gpu/drm/radeon/rv770.c|1 - drivers/gpu/drm/radeon/si.c |1 - 10 files changed, 30 insertions(+), 149 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index ecc29bc..7e7ac3d 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3550,7 +3550,6 @@ void evergreen_fini(struct radeon_device *rdev) evergreen_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_agp_fini(rdev); radeon_bo_fini(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 9cd2657..107b217 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1744,7 +1744,6 @@ void cayman_fini(struct radeon_device *rdev) cayman_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index d02f13f..478b51e 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2658,7 +2658,6 @@ void r600_fini(struct radeon_device *rdev) r600_vram_scratch_fini(rdev); radeon_agp_fini(rdev); radeon_gem_fini(rdev); - 
radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cc7f16a..45164e1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -434,34 +434,13 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv, /* * Semaphores. */ -struct radeon_ring; - -#defineRADEON_SEMAPHORE_BO_SIZE256 - -struct radeon_semaphore_driver { - rwlock_tlock; - struct list_headbo; -}; - -struct radeon_semaphore_bo; - /* everything here is constant */ struct radeon_semaphore { - struct list_headlist; + struct radeon_sa_bo *sa_bo; + signed waiters; uint64_tgpu_addr; - uint32_t*cpu_ptr; - struct radeon_semaphore_bo *bo; }; -struct radeon_semaphore_bo { - struct list_headlist; - struct radeon_ib*ib; - struct list_headfree; - struct radeon_semaphore semaphores[RADEON_SEMAPHORE_BO_SIZE/8]; - unsignednused; -}; - -void radeon_semaphore_driver_fini(struct radeon_device *rdev); int radeon_semaphore_create(struct radeon_device *rdev, struct radeon_semaphore **semaphore); void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring, @@ -473,7 +452,8 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev, bool sync_to[RADEON_NUM_RINGS], int dst_ring); void radeon_semaphore_free(struct radeon_device *rdev, - struct radeon_semaphore *semaphore); + struct radeon_semaphore *semaphore, + struct radeon_fence *fence); /* * GART structures, functions helpers @@ -1540,7 +1520,6 @@ struct radeon_device { struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; wait_queue_head_t fence_queue; - struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index b827b2e..48876c1 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c 
+++ b/drivers/gpu/drm/radeon/radeon_device.c @@
[PATCH 16/19] drm/radeon: immediately free ttm-move semaphore
We can now protect the semaphore ram by a fence, so free it immediately.

Signed-off-by: Christian König deathsim...@vodafone.de
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5e3d54d..0f6aee8 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -223,6 +223,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	struct radeon_device *rdev;
 	uint64_t old_start, new_start;
 	struct radeon_fence *fence, *old_fence;
+	struct radeon_semaphore *sem = NULL;
 	int r;
 
 	rdev = radeon_get_rdev(bo->bdev);
@@ -272,15 +273,16 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 		bool sync_to_ring[RADEON_NUM_RINGS] = { };
 		sync_to_ring[old_fence->ring] = true;
 
-		r = radeon_semaphore_create(rdev, &fence->semaphore);
+		r = radeon_semaphore_create(rdev, &sem);
 		if (r) {
 			radeon_fence_unref(&fence);
 			return r;
 		}
 
-		r = radeon_semaphore_sync_rings(rdev, fence->semaphore,
+		r = radeon_semaphore_sync_rings(rdev, sem,
 						sync_to_ring, fence->ring);
 		if (r) {
+			radeon_semaphore_free(rdev, sem, NULL);
 			radeon_fence_unref(&fence);
 			return r;
 		}
@@ -292,6 +294,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	/* FIXME: handle copy error */
 	r = ttm_bo_move_accel_cleanup(bo, (void *)fence, NULL,
 				      evict, no_wait_reserve, no_wait_gpu, new_mem);
+	radeon_semaphore_free(rdev, sem, fence);
 	radeon_fence_unref(&fence);
 	return r;
 }
-- 
1.7.9.5
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 17/19] drm/radeon: move the semaphore from the fence into the ib
From: Jerome Glisse jgli...@redhat.com It never really belonged there in the first place. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h | 16 drivers/gpu/drm/radeon/radeon_cs.c|4 ++-- drivers/gpu/drm/radeon/radeon_fence.c |3 --- drivers/gpu/drm/radeon/radeon_ring.c |2 ++ 4 files changed, 12 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 6170307..9507be0 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -272,7 +272,6 @@ struct radeon_fence { uint64_tseq; /* RB, DMA, etc. */ unsignedring; - struct radeon_semaphore *semaphore; }; int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring); @@ -624,13 +623,14 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo *sa_bo; - uint32_tlength_dw; - uint64_tgpu_addr; - uint32_t*ptr; - struct radeon_fence *fence; - unsignedvm_id; - boolis_const_ib; + struct radeon_sa_bo *sa_bo; + uint32_tlength_dw; + uint64_tgpu_addr; + uint32_t*ptr; + struct radeon_fence *fence; + unsignedvm_id; + boolis_const_ib; + struct radeon_semaphore *semaphore; }; struct radeon_ring { diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 5c065bf..dcfe2a0 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -138,12 +138,12 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) return 0; } - r = radeon_semaphore_create(p-rdev, p-ib-fence-semaphore); + r = radeon_semaphore_create(p-rdev, p-ib-semaphore); if (r) { return r; } - return radeon_semaphore_sync_rings(p-rdev, p-ib-fence-semaphore, + return radeon_semaphore_sync_rings(p-rdev, p-ib-semaphore, sync_to_ring, p-ring); } diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 3a49311..48ec5e3 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ 
b/drivers/gpu/drm/radeon/radeon_fence.c @@ -139,8 +139,6 @@ static void radeon_fence_destroy(struct kref *kref) fence = container_of(kref, struct radeon_fence, kref); fence-seq = RADEON_FENCE_NOTEMITED_SEQ; - if (fence-semaphore) - radeon_semaphore_free(fence-rdev, fence-semaphore, NULL); kfree(fence); } @@ -156,7 +154,6 @@ int radeon_fence_create(struct radeon_device *rdev, (*fence)-rdev = rdev; (*fence)-seq = RADEON_FENCE_NOTEMITED_SEQ; (*fence)-ring = ring; - (*fence)-semaphore = NULL; return 0; } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index b3d6942..af8e1ee 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -93,6 +93,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring, (*ib)-gpu_addr = radeon_sa_bo_gpu_addr((*ib)-sa_bo); (*ib)-vm_id = 0; (*ib)-is_const_ib = false; + (*ib)-semaphore = NULL; return 0; } @@ -105,6 +106,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) if (tmp == NULL) { return; } + radeon_semaphore_free(rdev, tmp-semaphore, tmp-fence); radeon_sa_bo_free(rdev, tmp-sa_bo, tmp-fence); radeon_fence_unref(tmp-fence); kfree(tmp); -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 18/19] drm/radeon: remove r600 blit mutex v2
If we don't store local data into global variables it isn't necessary to lock anything. v2: rebased on new SA interface Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/evergreen_blit_kms.c |1 - drivers/gpu/drm/radeon/r600.c | 13 ++-- drivers/gpu/drm/radeon/r600_blit_kms.c | 99 +++ drivers/gpu/drm/radeon/radeon.h |3 - drivers/gpu/drm/radeon/radeon_asic.h|9 ++- 5 files changed, 50 insertions(+), 75 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index 222acd2..30f0480 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -637,7 +637,6 @@ int evergreen_blit_init(struct radeon_device *rdev) if (rdev-r600_blit.shader_obj) goto done; - mutex_init(rdev-r600_blit.mutex); rdev-r600_blit.state_offset = 0; if (rdev-family CHIP_CAYMAN) diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 478b51e..00b2238 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2363,20 +2363,15 @@ int r600_copy_blit(struct radeon_device *rdev, unsigned num_gpu_pages, struct radeon_fence *fence) { + struct radeon_sa_bo *vb = NULL; int r; - mutex_lock(rdev-r600_blit.mutex); - rdev-r600_blit.vb_ib = NULL; - r = r600_blit_prepare_copy(rdev, num_gpu_pages); + r = r600_blit_prepare_copy(rdev, num_gpu_pages, vb); if (r) { - if (rdev-r600_blit.vb_ib) - radeon_ib_free(rdev, rdev-r600_blit.vb_ib); - mutex_unlock(rdev-r600_blit.mutex); return r; } - r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages); - r600_blit_done_copy(rdev, fence); - mutex_unlock(rdev-r600_blit.mutex); + r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages, vb); + r600_blit_done_copy(rdev, fence, vb); return 0; } diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index db38f58..ef20822 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ 
b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -513,7 +513,6 @@ int r600_blit_init(struct radeon_device *rdev) rdev-r600_blit.primitives.set_default_state = set_default_state; rdev-r600_blit.ring_size_common = 40; /* shaders + def state */ - rdev-r600_blit.ring_size_common += 16; /* fence emit for VB IB */ rdev-r600_blit.ring_size_common += 5; /* done copy */ rdev-r600_blit.ring_size_common += 16; /* fence emit for done copy */ @@ -528,7 +527,6 @@ int r600_blit_init(struct radeon_device *rdev) if (rdev-r600_blit.shader_obj) goto done; - mutex_init(rdev-r600_blit.mutex); rdev-r600_blit.state_offset = 0; if (rdev-family = CHIP_RV770) @@ -621,27 +619,6 @@ void r600_blit_fini(struct radeon_device *rdev) radeon_bo_unref(rdev-r600_blit.shader_obj); } -static int r600_vb_ib_get(struct radeon_device *rdev, unsigned size) -{ - int r; - r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, - rdev-r600_blit.vb_ib, size); - if (r) { - DRM_ERROR(failed to get IB for vertex buffer\n); - return r; - } - - rdev-r600_blit.vb_total = size; - rdev-r600_blit.vb_used = 0; - return 0; -} - -static void r600_vb_ib_put(struct radeon_device *rdev) -{ - radeon_fence_emit(rdev, rdev-r600_blit.vb_ib-fence); - radeon_ib_free(rdev, rdev-r600_blit.vb_ib); -} - static unsigned r600_blit_create_rect(unsigned num_gpu_pages, int *width, int *height, int max_dim) { @@ -688,7 +665,8 @@ static unsigned r600_blit_create_rect(unsigned num_gpu_pages, } -int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) +int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages, + struct radeon_sa_bo **vb) { struct radeon_ring *ring = rdev-ring[RADEON_RING_TYPE_GFX_INDEX]; int r; @@ -705,46 +683,54 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) } /* 48 bytes for vertex per loop */ - r = r600_vb_ib_get(rdev, (num_loops*48)+256); - if (r) + r = radeon_sa_bo_new(rdev, rdev-ring_tmp_bo, vb, +(num_loops*48)+256, 256, true); + if (r) { return r; + } 
/* calculate number of loops correctly */ ring_size = num_loops * dwords_per_loop; ring_size += rdev-r600_blit.ring_size_common; r = radeon_ring_lock(rdev, ring, ring_size); - if (r) + if (r) { + radeon_sa_bo_free(rdev, vb, NULL);
[PATCH 19/19] drm/radeon: make the ib an inline object
From: Jerome Glisse jgli...@redhat.com No need to malloc it any more. Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/evergreen_cs.c | 10 +++--- drivers/gpu/drm/radeon/r100.c | 38 ++-- drivers/gpu/drm/radeon/r200.c |2 +- drivers/gpu/drm/radeon/r300.c |4 +-- drivers/gpu/drm/radeon/r600.c | 16 - drivers/gpu/drm/radeon/r600_cs.c | 22 ++-- drivers/gpu/drm/radeon/radeon.h |8 ++--- drivers/gpu/drm/radeon/radeon_cs.c| 63 - drivers/gpu/drm/radeon/radeon_ring.c | 41 +++-- 9 files changed, 93 insertions(+), 111 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 70089d3..4e7dd2b 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -1057,7 +1057,7 @@ static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg, wait_reg_mem_info; volatile uint32_t *ib; - ib = p-ib-ptr; + ib = p-ib.ptr; /* parse the WAIT_REG_MEM */ r = evergreen_cs_packet_parse(p, wait_reg_mem, p-idx); @@ -1215,7 +1215,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) if (!(evergreen_reg_safe_bm[i] m)) return 0; } - ib = p-ib-ptr; + ib = p-ib.ptr; switch (reg) { /* force following reg to 0 in an attempt to disable out buffer * which will need us to better understand how it works to perform @@ -1896,7 +1896,7 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p, u32 idx_value; track = (struct evergreen_cs_track *)p-track; - ib = p-ib-ptr; + ib = p-ib.ptr; idx = pkt-idx + 1; idx_value = radeon_get_ib_value(p, idx); @@ -2610,8 +2610,8 @@ int evergreen_cs_parse(struct radeon_cs_parser *p) } } while (p-idx p-chunks[p-chunk_ib_idx].length_dw); #if 0 - for (r = 0; r p-ib-length_dw; r++) { - printk(KERN_INFO %05d 0x%08X\n, r, p-ib-ptr[r]); + for (r = 0; r p-ib.length_dw; r++) { + printk(KERN_INFO %05d 0x%08X\n, r, p-ib.ptr[r]); mdelay(1); } #endif diff 
--git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index ad6ceb7..0874a6d 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -139,9 +139,9 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p, } tmp |= tile_flags; - p-ib-ptr[idx] = (value 0x3fc0) | tmp; + p-ib.ptr[idx] = (value 0x3fc0) | tmp; } else - p-ib-ptr[idx] = (value 0xffc0) | tmp; + p-ib.ptr[idx] = (value 0xffc0) | tmp; return 0; } @@ -156,7 +156,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p, volatile uint32_t *ib; u32 idx_value; - ib = p-ib-ptr; + ib = p-ib.ptr; track = (struct r100_cs_track *)p-track; c = radeon_get_ib_value(p, idx++) 0x1F; if (c 16) { @@ -1275,7 +1275,7 @@ void r100_cs_dump_packet(struct radeon_cs_parser *p, unsigned i; unsigned idx; - ib = p-ib-ptr; + ib = p-ib.ptr; idx = pkt-idx; for (i = 0; i = (pkt-count + 1); i++, idx++) { DRM_INFO(ib[%d]=0x%08X\n, idx, ib[idx]); @@ -1354,7 +1354,7 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg; volatile uint32_t *ib; - ib = p-ib-ptr; + ib = p-ib.ptr; /* parse the wait until */ r = r100_cs_packet_parse(p, waitreloc, p-idx); @@ -1533,7 +1533,7 @@ static int r100_packet0_check(struct radeon_cs_parser *p, u32 tile_flags = 0; u32 idx_value; - ib = p-ib-ptr; + ib = p-ib.ptr; track = (struct r100_cs_track *)p-track; idx_value = radeon_get_ib_value(p, idx); @@ -1889,7 +1889,7 @@ static int r100_packet3_check(struct radeon_cs_parser *p, volatile uint32_t *ib; int r; - ib = p-ib-ptr; + ib = p-ib.ptr; idx = pkt-idx + 1; track = (struct r100_cs_track *)p-track; switch (pkt-opcode) { @@ -3684,7 +3684,7 @@ void r100_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib) int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring) { - struct radeon_ib *ib; + struct radeon_ib ib; uint32_t scratch; uint32_t tmp = 0; unsigned i; @@ -3700,22 +3700,22 @@ int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring) if (r) { 
return r; } - ib-ptr[0] = PACKET0(scratch, 0); - ib-ptr[1] = 0xDEADBEEF; -
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632

--- Comment #9 from execute.met...@gmail.com 2012-05-09 06:36:57 PDT ---
No, there is nothing else in dmesg. Is there any more info you'd like me to gather?
Re: [PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Fri, May 4, 2012 at 11:06 AM, j.gli...@gmail.com wrote:
> From: Jerome Glisse jgli...@redhat.com
>
> It seems the iMac panel doesn't like when we change the hot plug setup
> and then refuses to work. This helps but doesn't fully fix:
> https://bugzilla.redhat.com/show_bug.cgi?id=726143

How does it help? Does it fix the aux problems, but the monitor still
doesn't train? What's the working value of the relevant
DC_HPD*_CONTROL register?

Alex

> v2: fix typo and improve commit message
>
> Signed-off-by: Matthew Garrett m...@redhat.com
> Signed-off-by: Jerome Glisse jgli...@redhat.com
> ---
>  drivers/gpu/drm/radeon/r600.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
> index 694b6b2..a304c9d 100644
> --- a/drivers/gpu/drm/radeon/r600.c
> +++ b/drivers/gpu/drm/radeon/r600.c
> @@ -713,6 +713,14 @@ void r600_hpd_init(struct radeon_device *rdev)
>  	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
>  		struct radeon_connector *radeon_connector = to_radeon_connector(connector);
> +		if (connector->connector_type == DRM_MODE_CONNECTOR_eDP ||
> +		    connector->connector_type == DRM_MODE_CONNECTOR_LVDS) {
> +			/* don't try to enable hpd on eDP or LVDS avoid breaking the
> +			 * aux dp channel on imac and help (but not completely fix)
> +			 * https://bugzilla.redhat.com/show_bug.cgi?id=726143
> +			 */
> +			continue;
> +		}
>  		if (ASIC_IS_DCE3(rdev)) {
>  			u32 tmp = DC_HPDx_CONNECTION_TIMER(0x9c4) |
>  				DC_HPDx_RX_INT_TIMER(0xfa);
>  			if (ASIC_IS_DCE32(rdev))
> --
> 1.7.9.3
Re: [PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Wed, May 9, 2012 at 9:40 AM, Alex Deucher alexdeuc...@gmail.com wrote:
> On Fri, May 4, 2012 at 11:06 AM, j.gli...@gmail.com wrote:
>> From: Jerome Glisse jgli...@redhat.com
>>
>> It seems the iMac panel doesn't like when we change the hot plug setup
>> and then refuses to work. This helps but doesn't fully fix:
>> https://bugzilla.redhat.com/show_bug.cgi?id=726143
>
> How does it help? Does it fix the aux problems, but the monitor still
> doesn't train? What's the working value of the relevant
> DC_HPD*_CONTROL register?
>
> Alex

I don't have the hw, but somehow the way we program this reg completely
disables the panel; after that the panel doesn't answer to anything
(neither i2c nor any aux transaction). Without programming it, link
training is successful but the panel stays black. I can ask to get the
value before and after.

Cheers,
Jerome
Re: Include request for SA improvements
On Wed, May 9, 2012 at 9:34 AM, Christian König deathsim...@vodafone.de wrote:
> Hi Dave & Jerome and everybody on the list,
>
> I can't find any more bugs and also I'm out of things to test, so I really
> hope that this is the last incarnation of this patchset, and if Jerome is
> ok with it it should now be included into drm-next.
>
> Cheers,
> Christian.

Yeah, looks good to me.

Cheers,
Jerome
[PATCH] drm/radeon/kms: fix warning on 32-bit in atomic fence printing
From: Dave Airlie airl...@redhat.com

/ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c: In function ‘radeon_debugfs_fence_info’:
/ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c:606:7: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘long long int’ [-Wformat]

Signed-off-by: Dave Airlie airl...@redhat.com
---
 drivers/gpu/drm/radeon/radeon_fence.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 48ec5e3..11f5f40 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -602,8 +602,8 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
 			continue;
 		seq_printf(m, "--- ring %d ---\n", i);
-		seq_printf(m, "Last signaled fence 0x%016lx\n",
-			   atomic64_read(&rdev->fence_drv[i].last_seq));
+		seq_printf(m, "Last signaled fence 0x%016llx\n",
+			   (unsigned long long)atomic64_read(&rdev->fence_drv[i].last_seq));
 		seq_printf(m, "Last emitted 0x%016llx\n", rdev->fence_drv[i].seq);
 	}
-- 
1.7.7.6
Re: [PATCH] drm/radeon/kms: fix warning on 32-bit in atomic fence printing
On Wed, May 9, 2012 at 12:28 PM, Dave Airlie airl...@gmail.com wrote: From: Dave Airlie airl...@redhat.com /ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c: In function ‘radeon_debugfs_fence_info’: /ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c:606:7: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘long long int’ [-Wformat] Signed-off-by: Dave Airlie airl...@redhat.com Reviewed-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon_fence.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 48ec5e3..11f5f40 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -602,8 +602,8 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data) continue; seq_printf(m, --- ring %d ---\n, i); - seq_printf(m, Last signaled fence 0x%016lx\n, - atomic64_read(rdev-fence_drv[i].last_seq)); + seq_printf(m, Last signaled fence 0x%016llx\n, + (unsigned long long)atomic64_read(rdev-fence_drv[i].last_seq)); seq_printf(m, Last emitted 0x%016llx\n, rdev-fence_drv[i].seq); } -- 1.7.7.6 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Wed, May 9, 2012 at 10:23 AM, Jerome Glisse j.gli...@gmail.com wrote:
> On Wed, May 9, 2012 at 9:40 AM, Alex Deucher alexdeuc...@gmail.com wrote:
>> On Fri, May 4, 2012 at 11:06 AM, j.gli...@gmail.com wrote:
>>> From: Jerome Glisse jgli...@redhat.com
>>>
>>> It seems the iMac panel doesn't like when we change the hot plug setup
>>> and then refuses to work. This helps but doesn't fully fix:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=726143
>>
>> How does it help? Does it fix the aux problems, but the monitor still
>> doesn't train? What's the working value of the relevant
>> DC_HPD*_CONTROL register?
>>
>> Alex
>
> I don't have the hw, but somehow the way we program this reg completely
> disables the panel; after that the panel doesn't answer to anything
> (neither i2c nor any aux transaction). Without programming it, link
> training is successful but the panel stays black. I can ask to get the
> value before and after.

The patch seems reasonable in general (we don't really need hpd to be
explicitly enabled for lvds or edp), so:

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

> Cheers,
> Jerome
[Bug 43215] New: Nouveau: Resume from s2disk fails.
https://bugzilla.kernel.org/show_bug.cgi?id=43215

           Summary: Nouveau: Resume from s2disk fails.
           Product: Drivers
           Version: 2.5
    Kernel Version: 3.3.5, 3.4-rc4+, 3.4-rc6+
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
        AssignedTo: drivers_video-...@kernel-bugs.osdl.org
        ReportedBy: harn-s...@gmx.de
        Regression: No

Created an attachment (id=73220)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73220)
dmesg output of a cold boot - suspend to disk - resume cycle

When sending the PC into suspend-to-disk mode via `echo disk > /sys/power/state` or using pm-suspend, the resume process fails with the following messages:

[   61.782648] [drm] nouveau :01:00.0: timeout: URSOR_CTRL2_STATUS_ACTIVE(0)
[   61.782709] [drm] nouveau :01:00.0: CURSOR_CTRL2(0) = 0x

This bug is reproducible on several 3.3.x and 3.4-rc kernels. I did not test it on prior kernels. Suspend to RAM works fine. The card used here is a nvce, Gainward 560 Ti, 2G video ram.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
Re: Include request for SA improvements
On Wed, May 9, 2012 at 3:31 PM, Jerome Glisse j.gli...@gmail.com wrote:
> On Wed, May 9, 2012 at 9:34 AM, Christian König deathsim...@vodafone.de wrote:
>> Hi Dave & Jerome and everybody on the list,
>>
>> I can't find any more bugs and also I'm out of things to test, so I really
>> hope that this is the last incarnation of this patchset, and if Jerome is
>> ok with it it should now be included into drm-next.
>>
>> Cheers,
>> Christian.
>
> Yeah, looks good to me.

All pushed into -next + the warning fix on top.

Thanks guys,
Dave.
Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.
On Wed, May 9, 2012 at 10:45 AM, Jerome Glisse j.gli...@gmail.com wrote: On Wed, May 9, 2012 at 2:17 AM, Inki Dae inki@samsung.com wrote:

this feature is used to import a user space region allocated by malloc() or mmapped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so we limit the user-desired userptr size to a pre-defined maximum. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com

Again I would like feedback from mm people (adding cc). I am not sure locking the vma is the right answer; as I said in my previous mail, userspace can munlock it behind your back, maybe VM_RESERVED is better. Anyway, even not considering that, you don't check at all that the process doesn't go over the limit of locked pages; see mm/mlock.c RLIMIT_MEMLOCK for how it's done.

Also you mlock the complete vma, but the userptr you get might be inside, say, a 16M vma and you only care about 1M of userptr. If you mark the whole vma as locked, then anytime a new page is faulted in the vma elsewhere than in the buffer you are interested in, it stays allocated forever until the gem buffer is destroyed. I am not sure what happens to the vma on the next malloc, if it grows or not (I would think it won't grow as it would have different flags than new anonymous memory).

The whole business of directly using malloced memory for the gpu is fishy and I would really like to get it right rather than relying on never hitting strange things like page migration, vma merging, or worse things like over-locking pages and stealing memory.

Cheers, Jerome

I had a lengthy discussion with mm people (thx a lot for that). I think we should split 2 different use cases.

The zero-copy upload case, ie:

app:
    ptr = malloc()
    ...
    glTex/VBO/UBO/...(ptr)
    free(ptr) or reuse it for other things

For which I guess you want to avoid having to do a memcpy inside the gl library (could be anything else than gl that has the same usage pattern), ie after the upload happens you don't care about those pages; they can be removed from the vma or marked as cow so that anything messing with those pages after the upload won't change what you uploaded. Of course this is assuming that the tlb cost of doing such a thing is smaller than the cost of memcpying the data. Two ways to do that: either you assume the app cannot read back the data after gl does, and you do an unmap_mapping_range (make sure you only unmap fully covered pages and that you copy non fully covered pages), or you want to allow userspace to still read the data or possibly overwrite it.

The second use case is something more like the opencl case of CL_MEM_USE_HOST_PTR, in which you want to use the same pages on the gpu and keep the userspace vma pointing to those pages. I think the agreement on this case is that there is no way right now to do it sanely inside the linux kernel. mlocking will need proper accounting against the rlimit, but this limit might be low. Also the fork case might be problematic. For the fork case the memory is anonymous so it should be COWed in the fork child, but relative to the cl context that means the child could not use the cl context with that memory, or at least if the child writes to this memory the cl will not see those changes. I guess the answer to that one is that you really need to use the cl api to read the object or get a proper ptr to read it.

Anyway, in all cases, implementing this userptr thing needs a lot more code. You have to check that the vma you are trying to use is anonymous and only handle this case, and fall back to allocating new pages and copying otherwise.

Cheers, Jerome
[Bug 43858] DVI of ATI RADEON 9200 AGP don't work
https://bugs.freedesktop.org/show_bug.cgi?id=43858 Alex Deucher ag...@yahoo.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution||NOTABUG --- Comment #27 from Alex Deucher ag...@yahoo.com 2012-05-09 12:00:09 PDT --- The DVI issue is fixed. Please open a new bug if you are still having gfx problems. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
In Diablo III when the ground is covered in water, it's all messed-up.
I found a rendering error in Diablo III: in a well, the ground is covered in water and it's all red and flickers. Trace: http://c566453.r53.cf2.rackcdn.com/DiabloIII-well-wine-preloader.trace.xz (3.5G uncompressed; 234665296 bytes, ~234M, compressed).

EE r600_shader.c:1605 r600_shader_from_tgsi - GPR limit exceeded - shader requires 181 registers
EE r600_shader.c:140 r600_pipe_shader_create - translation from TGSI failed !

Rendered 621 frames in 80.724 secs, average of 7.69288 fps
RE: [PATCH 2/2 v3] drm/exynos: added userptr feature.
Hi Jerome,

-Original Message- From: Jerome Glisse [mailto:j.gli...@gmail.com] Sent: Wednesday, May 09, 2012 11:46 PM To: Inki Dae Cc: airl...@linux.ie; dri-devel@lists.freedesktop.org; kyungmin.p...@samsung.com; sw0312@samsung.com; linux...@kvack.org Subject: Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.

On Wed, May 9, 2012 at 2:17 AM, Inki Dae inki@samsung.com wrote:

this feature is used to import a user space region allocated by malloc() or mmapped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so we limit the user-desired userptr size to a pre-defined maximum. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com

Again I would like feedback from mm people (adding cc). I am not sure

Thank you, I missed adding mm as cc.

locking the vma is the right answer; as I said in my previous mail, userspace can munlock it behind your back, maybe VM_RESERVED is better.

I know that with the VM_RESERVED flag we can also keep the pages from being swapped out, but these pages should be unlockable anytime we want, because we could allocate all the pages on the system and lock them, which, in turn, may result in significant deterioration of system performance (other processes requesting free memory could be blocked), so I used the VM_LOCKED flag instead. But I'm not sure this way is best either.

Anyway, even not considering that, you don't check at all that the process doesn't go over the limit of locked pages; see mm/mlock.c RLIMIT_MEMLOCK

Thank you for your advice.

for how it's done.
Also you mlock the complete vma, but the userptr you get might be inside, say, a 16M vma and you only care about 1M of userptr. If you mark the whole vma as locked, then anytime a new page is faulted in the vma elsewhere than in the buffer you are interested in, it stays allocated forever until the gem buffer is destroyed. I am not sure what happens to the vma on the next malloc, if it grows or not (I would think it won't grow as it would have different flags than new anonymous memory). The whole business of directly using malloced memory for the gpu is fishy and I would really like to get it right rather than relying on never hitting strange things like page migration, vma merging, or worse things like over-locking pages and stealing memory.

Your comments are very helpful to me and I will consider the cases I missed that you pointed out for the next patch.

Thanks, Inki Dae

Cheers, Jerome

---
 drivers/gpu/drm/exynos/exynos_drm_drv.c |   2 +
 drivers/gpu/drm/exynos/exynos_drm_gem.c | 334 +++
 drivers/gpu/drm/exynos/exynos_drm_gem.h |  17 ++-
 include/drm/exynos_drm.h                |  26 +++-
 4 files changed, 376 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index 1e68ec2..e8ae3f1 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -220,6 +220,8 @@ static struct drm_ioctl_desc exynos_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MAP_OFFSET,
 			exynos_drm_gem_map_offset_ioctl, DRM_UNLOCKED | DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(EXYNOS_GEM_USERPTR,
+			exynos_drm_gem_userptr_ioctl, DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MMAP,
 			exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET,
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index e6abb66..ccc6e3d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -68,6 +68,83 @@ static int check_gem_flags(unsigned int flags)
 	return 0;
 }
 
+static struct vm_area_struct *get_vma(struct vm_area_struct *vma)
+{
+	struct vm_area_struct *vma_copy;
+
+	vma_copy = kmalloc(sizeof(*vma_copy), GFP_KERNEL);
+	if (!vma_copy)
+		return NULL;
+
+	if (vma->vm_ops && vma->vm_ops->open)
+		vma->vm_ops->open(vma);
+
+	if (vma->vm_file)
+		get_file(vma->vm_file);
+
+	memcpy(vma_copy, vma, sizeof(*vma));
+
+	vma_copy->vm_mm = NULL;
+	vma_copy->vm_next = NULL;
+	vma_copy->vm_prev = NULL;
+
+	return vma_copy;
+}
+
+static void put_vma(struct vm_area_struct *vma)
+{
+	if (!vma)
+		return;
+
+	if (vma->vm_ops && vma->vm_ops->close)
+		vma->vm_ops->close(vma);
+
+	if (vma->vm_file)
+		fput(vma->vm_file);
+
+	kfree(vma);
+}
+
+/*
+ *
RE: [PATCH 2/2 v3] drm/exynos: added userptr feature.
Hi Jerome, Thank you again.

-Original Message- From: Jerome Glisse [mailto:j.gli...@gmail.com] Sent: Thursday, May 10, 2012 3:33 AM To: Inki Dae Cc: airl...@linux.ie; dri-devel@lists.freedesktop.org; kyungmin.p...@samsung.com; sw0312@samsung.com; linux...@kvack.org Subject: Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.

On Wed, May 9, 2012 at 10:45 AM, Jerome Glisse j.gli...@gmail.com wrote: On Wed, May 9, 2012 at 2:17 AM, Inki Dae inki@samsung.com wrote:

this feature is used to import a user space region allocated by malloc() or mmapped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so we limit the user-desired userptr size to a pre-defined maximum. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com

Again I would like feedback from mm people (adding cc). I am not sure locking the vma is the right answer; as I said in my previous mail, userspace can munlock it behind your back, maybe VM_RESERVED is better. Anyway, even not considering that, you don't check at all that the process doesn't go over the limit of locked pages; see mm/mlock.c RLIMIT_MEMLOCK for how it's done. Also you mlock the complete vma, but the userptr you get might be inside, say, a 16M vma and you only care about 1M of userptr. If you mark the whole vma as locked, then anytime a new page is faulted in the vma elsewhere than in the buffer you are interested in, it stays allocated forever until the gem buffer is destroyed. I am not sure what happens to the vma on the next malloc, if it grows or not (I would think it won't grow as it would have different flags than new anonymous memory).
The whole business of directly using malloced memory for the gpu is fishy and I would really like to get it right rather than relying on never hitting strange things like page migration, vma merging, or worse things like over-locking pages and stealing memory. Cheers, Jerome

I had a lengthy discussion with mm people (thx a lot for that). I think we should split 2 different use cases. The zero-copy upload case, ie: app: ptr = malloc() ... glTex/VBO/UBO/...(ptr) free(ptr) or reuse it for other things. For which I guess you want to avoid having to do a memcpy inside the gl library (could be anything else than gl that has the same usage pattern).

Right, in this case, we are using the userptr feature as a pixman and evas backend to use the 2d accelerator.

ie after the upload happens you don't care about those pages; they can be removed from the vma or marked as cow so that anything messing with those pages after the upload won't change what you uploaded. Of course

I'm not sure that I understood your point, but could the pages be removed from the vma even with VM_LOCKED or VM_RESERVED? Once glTex/VBO/UBO/... runs, the VMAs in user space would be locked. If the cpu accessed a significant part of the pages in user mode, then pages for that part would be allocated by the page fault handler; after that, through userptr, the VMAs in the user address space would be locked (at this time, the remaining pages would also be allocated via get_user_pages, which calls the page fault handler). I'd be glad to get any comments and advice if there is a point I'm missing.

this is assuming that the tlb cost of doing such a thing is smaller than the cost of memcpying the data.

Yes, in our test case, the tlb cost (incurred by tlb misses) was smaller than the cost of memcpy, and cpu usage was lower too. Of course, this would depend on gpu performance.
Two ways to do that: either you assume the app cannot read back the data after gl does, and you do an unmap_mapping_range (make sure you only unmap fully covered pages and that you copy non fully covered pages), or you want to allow userspace to still read the data or possibly overwrite it.

The second use case is something more like the opencl case of CL_MEM_USE_HOST_PTR, in which you want to use the same pages on the gpu and keep the userspace vma pointing to those pages. I think the agreement on this case is that there is no way right now to do it sanely inside the linux kernel. mlocking will need proper accounting against the rlimit, but this limit might be low. Also the fork case might be problematic. For the fork case the memory is anonymous so it should be COWed in the fork child, but relative to the cl context that means the child could not use the cl context with that memory, or at least if the child writes to this memory the cl will not see those changes. I guess the answer to that one is that you really need to use the cl api to read the object or get a proper ptr to read it.

Anyway, in all cases, implementing this userptr thing needs a lot more code. You have to check that the vma you are trying to use is anonymous
Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.
On 05/10/2012 10:39 AM, Inki Dae wrote:

Hi Jerome,

-Original Message- From: Jerome Glisse [mailto:j.gli...@gmail.com] Sent: Wednesday, May 09, 2012 11:46 PM To: Inki Dae Cc: airl...@linux.ie; dri-devel@lists.freedesktop.org; kyungmin.p...@samsung.com; sw0312@samsung.com; linux...@kvack.org Subject: Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.

On Wed, May 9, 2012 at 2:17 AM, Inki Dae inki@samsung.com wrote:

this feature is used to import a user space region allocated by malloc() or mmapped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so we limit the user-desired userptr size to a pre-defined maximum. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com

Again I would like feedback from mm people (adding cc). I am not sure

Thank you, I missed adding mm as cc.

locking the vma is the right answer; as I said in my previous mail, userspace can munlock it behind your back, maybe VM_RESERVED is better.

I know that with the VM_RESERVED flag we can also keep the pages from being swapped out, but these pages should be unlockable anytime we want, because we could allocate all the pages on the system and lock them, which, in turn, may result in significant deterioration of system performance (other processes requesting free memory could be blocked), so I used the VM_LOCKED flag instead. But I'm not sure this way is best either.

Anyway, even not considering that, you don't check at all that the process doesn't go over the limit of locked pages; see mm/mlock.c RLIMIT_MEMLOCK

Thank you for your advice.

for how it's done.
Also you mlock the complete vma, but the userptr you get might be inside, say, a 16M vma and you only care about 1M of userptr. If you mark the whole vma as locked, then anytime a new page is faulted in the vma elsewhere than in the buffer you are interested in, it stays allocated forever until the gem buffer is destroyed. I am not sure what happens to the vma on the next malloc, if it grows or not (I would think it won't grow as it would have different flags than new anonymous memory).

I don't know the history in detail because you didn't send the full patches to linux-mm, and I didn't read the code below either. I just read your description and Jerome's reply, so apparently there is something I missed. Your goal is to avoid swapping out some user pages which are used in the kernel at the same time, right? Let's use get_user_pages. Is there any issue that keeps you from using it? It increases the page count so the reclaimer can't swap out the page. Isn't that enough? Marking the whole VMA as MLOCKED is overkill.

-- Kind regards, Minchan Kim
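Minchan's get_user_pages suggestion could be sketched roughly as below. This is a hedged, era-appropriate (circa-3.4 kernel API) sketch, not code from the patch, and it is kernel-internal code that does not compile standalone; the function name exynos_pin_userptr is hypothetical. The key point is that get_user_pages() elevates each page's refcount, which is what keeps the reclaimer from evicting the pages, without touching any VMA flags:

```c
/* Hypothetical helper: pin npages of a userptr range by refcount
 * instead of marking the whole VMA as VM_LOCKED.  Uses the 3.4-era
 * get_user_pages() signature (tsk, mm, start, nr_pages, write, force,
 * pages, vmas). */
static int exynos_pin_userptr(unsigned long userptr, unsigned long npages,
			      struct page **pages)
{
	long pinned;

	down_read(&current->mm->mmap_sem);
	pinned = get_user_pages(current, current->mm, userptr, npages,
				1 /* write */, 0 /* force */, pages, NULL);
	up_read(&current->mm->mmap_sem);

	if (pinned < 0)
		return pinned;
	if (pinned != npages) {
		/* partial pin: drop the references we did take */
		while (pinned--)
			put_page(pages[pinned]);
		return -EFAULT;
	}
	return 0;	/* pages stay resident until put_page() on release */
}
```

Releasing the buffer would then just put_page() each entry, addressing Jerome's objection that VM_LOCKED over-locks the surrounding VMA.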