In Diablo III, when the ground is covered in water, the rendering is all messed up.
I found a rendering error in Diablo III: in a well where the ground is covered in water, everything is red and flickers.

http://c566453.r53.cf2.rackcdn.com/DiabloIII-well-wine-preloader.trace.xz (3.5G uncompressed; 234665296 bytes, ~234M, compressed)

EE r600_shader.c:1605 r600_shader_from_tgsi - GPR limit exceeded - shader requires 181 registers
EE r600_shader.c:140 r600_pipe_shader_create - translation from TGSI failed !

Rendered 621 frames in 80.724 secs, average of 7.69288 fps
Include request for SA improvements
On Wed, May 9, 2012 at 3:31 PM, Jerome Glisse wrote:
> On Wed, May 9, 2012 at 9:34 AM, Christian König wrote:
>> Hi Dave & Jerome and everybody on the list,
>>
>> I can't find any more bugs and I'm also out of things to test, so I
>> really hope that this is the last incarnation of this patchset; if
>> Jerome is ok with it, it should now be included in drm-next.
>>
>> Cheers,
>> Christian.
>
> Yeah, looks good to me.

All pushed into -next + the warning fix on top. Thanks guys,
Dave.
[Bug 43858] DVI of ATI RADEON 9200 AGP doesn't work
https://bugs.freedesktop.org/show_bug.cgi?id=43858

Alex Deucher changed:

           What      |Removed |Added
           Status    |NEW     |RESOLVED
           Resolution|        |NOTABUG

--- Comment #27 from Alex Deucher 2012-05-09 12:00:09 PDT ---
The DVI issue is fixed. Please open a new bug if you are still having gfx problems.

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
You are receiving this mail because: you are the assignee for the bug.
[PATCH] drm/radeon/kms: fix warning on 32-bit in atomic fence printing
From: Dave Airlie

/ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c: In function 'radeon_debugfs_fence_info':
/ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c:606:7: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'long long int' [-Wformat]

Signed-off-by: Dave Airlie
---
 drivers/gpu/drm/radeon/radeon_fence.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 48ec5e3..11f5f40 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -602,8 +602,8 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
 			continue;
 		seq_printf(m, "--- ring %d ---\n", i);
-		seq_printf(m, "Last signaled fence 0x%016lx\n",
-			   atomic64_read(&rdev->fence_drv[i].last_seq));
+		seq_printf(m, "Last signaled fence 0x%016llx\n",
+			   (unsigned long long)atomic64_read(&rdev->fence_drv[i].last_seq));
 		seq_printf(m, "Last emitted 0x%016llx\n", rdev->fence_drv[i].seq);
 	}
--
1.7.7.6
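The warning exists because the return type of atomic64_read() differs between builds: plain long on 64-bit, long long on 32-bit. The only portable way to print such a value is to cast it to unsigned long long and use "%llx", which is exactly what the patch does. A minimal userspace sketch of the same pattern (the helper names here are illustrative, not kernel code):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for atomic64_read(): the underlying integer
 * type differs between 32-bit and 64-bit builds, which is why a bare
 * "%lx" triggers -Wformat on 32-bit. */
static uint64_t read_last_seq(void)
{
    return 0x1234abcdULL;
}

/* Portable formatting: always cast to unsigned long long and print
 * with "%llx", mirroring what the patch does with seq_printf(). */
static int format_seq(char *buf, size_t len)
{
    return snprintf(buf, len, "Last signaled fence 0x%016llx",
                    (unsigned long long)read_last_seq());
}
```

The cast is a no-op on 64-bit and a widening conversion on 32-bit, so a single format string works everywhere.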
[PATCH 19/19] drm/radeon: make the ib an inline object
From: Jerome Glisse

No need to malloc it any more.

Signed-off-by: Jerome Glisse
Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/evergreen_cs.c | 10 +++---
 drivers/gpu/drm/radeon/r100.c         | 38 ++--
 drivers/gpu/drm/radeon/r200.c         |  2 +-
 drivers/gpu/drm/radeon/r300.c         |  4 +--
 drivers/gpu/drm/radeon/r600.c         | 16 -
 drivers/gpu/drm/radeon/r600_cs.c      | 22 ++--
 drivers/gpu/drm/radeon/radeon.h       |  8 ++---
 drivers/gpu/drm/radeon/radeon_cs.c    | 63 -
 drivers/gpu/drm/radeon/radeon_ring.c  | 41 +++--
 9 files changed, 93 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 70089d3..4e7dd2b 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -1057,7 +1057,7 @@ static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg, wait_reg_mem_info; volatile uint32_t *ib; - ib = p->ib->ptr; + ib = p->ib.ptr; /* parse the WAIT_REG_MEM */ r = evergreen_cs_packet_parse(p, &wait_reg_mem, p->idx); @@ -1215,7 +1215,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) if (!(evergreen_reg_safe_bm[i] & m)) return 0; } - ib = p->ib->ptr; + ib = p->ib.ptr; switch (reg) { /* force following reg to 0 in an attempt to disable out buffer * which will need us to better understand how it works to perform @@ -1896,7 +1896,7 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p, u32 idx_value; track = (struct evergreen_cs_track *)p->track; - ib = p->ib->ptr; + ib = p->ib.ptr; idx = pkt->idx + 1; idx_value = radeon_get_ib_value(p, idx); @@ -2610,8 +2610,8 @@ int evergreen_cs_parse(struct radeon_cs_parser *p) } } while (p->idx < p->chunks[p->chunk_ib_idx].length_dw); #if 0 - for (r = 0; r < p->ib->length_dw; r++) { - printk(KERN_INFO "%05d 0x%08X\n", r, p->ib->ptr[r]); + for (r = 0; r < p->ib.length_dw; r++) { + printk(KERN_INFO "%05d 0x%08X\n", r, p->ib.ptr[r]); mdelay(1); } #endif diff --git
a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index ad6ceb7..0874a6d 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -139,9 +139,9 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p, } tmp |= tile_flags; - p->ib->ptr[idx] = (value & 0x3fc0) | tmp; + p->ib.ptr[idx] = (value & 0x3fc0) | tmp; } else - p->ib->ptr[idx] = (value & 0xffc0) | tmp; + p->ib.ptr[idx] = (value & 0xffc0) | tmp; return 0; } @@ -156,7 +156,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p, volatile uint32_t *ib; u32 idx_value; - ib = p->ib->ptr; + ib = p->ib.ptr; track = (struct r100_cs_track *)p->track; c = radeon_get_ib_value(p, idx++) & 0x1F; if (c > 16) { @@ -1275,7 +1275,7 @@ void r100_cs_dump_packet(struct radeon_cs_parser *p, unsigned i; unsigned idx; - ib = p->ib->ptr; + ib = p->ib.ptr; idx = pkt->idx; for (i = 0; i <= (pkt->count + 1); i++, idx++) { DRM_INFO("ib[%d]=0x%08X\n", idx, ib[idx]); @@ -1354,7 +1354,7 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg; volatile uint32_t *ib; - ib = p->ib->ptr; + ib = p->ib.ptr; /* parse the wait until */ r = r100_cs_packet_parse(p, , p->idx); @@ -1533,7 +1533,7 @@ static int r100_packet0_check(struct radeon_cs_parser *p, u32 tile_flags = 0; u32 idx_value; - ib = p->ib->ptr; + ib = p->ib.ptr; track = (struct r100_cs_track *)p->track; idx_value = radeon_get_ib_value(p, idx); @@ -1889,7 +1889,7 @@ static int r100_packet3_check(struct radeon_cs_parser *p, volatile uint32_t *ib; int r; - ib = p->ib->ptr; + ib = p->ib.ptr; idx = pkt->idx + 1; track = (struct r100_cs_track *)p->track; switch (pkt->opcode) { @@ -3684,7 +3684,7 @@ void r100_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib) int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring) { - struct radeon_ib *ib; + struct radeon_ib ib; uint32_t scratch; uint32_t tmp = 0; unsigned i; @@ -3700,22 +3700,22 @@ int r100_ib_test(struct radeon_device *rdev, struct 
radeon_ring *ring) if (r) { return r; } - ib->ptr[0] = PACKET0(scratch, 0); - ib->ptr[1] =
[PATCH 18/19] drm/radeon: remove r600 blit mutex v2
If we don't store local data in global variables, it isn't necessary to lock anything.

v2: rebased on the new SA interface

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/evergreen_blit_kms.c |  1 -
 drivers/gpu/drm/radeon/r600.c               | 13 ++--
 drivers/gpu/drm/radeon/r600_blit_kms.c      | 99 +++
 drivers/gpu/drm/radeon/radeon.h             |  3 -
 drivers/gpu/drm/radeon/radeon_asic.h        |  9 ++-
 5 files changed, 50 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index 222acd2..30f0480 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -637,7 +637,6 @@ int evergreen_blit_init(struct radeon_device *rdev) if (rdev->r600_blit.shader_obj) goto done; - mutex_init(&rdev->r600_blit.mutex); rdev->r600_blit.state_offset = 0; if (rdev->family < CHIP_CAYMAN) diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 478b51e..00b2238 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2363,20 +2363,15 @@ int r600_copy_blit(struct radeon_device *rdev, unsigned num_gpu_pages, struct radeon_fence *fence) { + struct radeon_sa_bo *vb = NULL; int r; - mutex_lock(&rdev->r600_blit.mutex); - rdev->r600_blit.vb_ib = NULL; - r = r600_blit_prepare_copy(rdev, num_gpu_pages); + r = r600_blit_prepare_copy(rdev, num_gpu_pages, &vb); if (r) { - if (rdev->r600_blit.vb_ib) - radeon_ib_free(rdev, &rdev->r600_blit.vb_ib); - mutex_unlock(&rdev->r600_blit.mutex); return r; } - r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages); - r600_blit_done_copy(rdev, fence); - mutex_unlock(&rdev->r600_blit.mutex); + r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages, vb); + r600_blit_done_copy(rdev, fence, vb); return 0; } diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index db38f58..ef20822 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -513,7 +513,6 @@
int r600_blit_init(struct radeon_device *rdev) rdev->r600_blit.primitives.set_default_state = set_default_state; rdev->r600_blit.ring_size_common = 40; /* shaders + def state */ - rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */ rdev->r600_blit.ring_size_common += 5; /* done copy */ rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */ @@ -528,7 +527,6 @@ int r600_blit_init(struct radeon_device *rdev) if (rdev->r600_blit.shader_obj) goto done; - mutex_init(>r600_blit.mutex); rdev->r600_blit.state_offset = 0; if (rdev->family >= CHIP_RV770) @@ -621,27 +619,6 @@ void r600_blit_fini(struct radeon_device *rdev) radeon_bo_unref(>r600_blit.shader_obj); } -static int r600_vb_ib_get(struct radeon_device *rdev, unsigned size) -{ - int r; - r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, - >r600_blit.vb_ib, size); - if (r) { - DRM_ERROR("failed to get IB for vertex buffer\n"); - return r; - } - - rdev->r600_blit.vb_total = size; - rdev->r600_blit.vb_used = 0; - return 0; -} - -static void r600_vb_ib_put(struct radeon_device *rdev) -{ - radeon_fence_emit(rdev, rdev->r600_blit.vb_ib->fence); - radeon_ib_free(rdev, >r600_blit.vb_ib); -} - static unsigned r600_blit_create_rect(unsigned num_gpu_pages, int *width, int *height, int max_dim) { @@ -688,7 +665,8 @@ static unsigned r600_blit_create_rect(unsigned num_gpu_pages, } -int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) +int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages, + struct radeon_sa_bo **vb) { struct radeon_ring *ring = >ring[RADEON_RING_TYPE_GFX_INDEX]; int r; @@ -705,46 +683,54 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) } /* 48 bytes for vertex per loop */ - r = r600_vb_ib_get(rdev, (num_loops*48)+256); - if (r) + r = radeon_sa_bo_new(rdev, >ring_tmp_bo, vb, +(num_loops*48)+256, 256, true); + if (r) { return r; + } /* calculate number of loops correctly */ ring_size = num_loops * 
dwords_per_loop; ring_size += rdev->r600_blit.ring_size_common; r = radeon_ring_lock(rdev, ring, ring_size); - if (r) + if (r) { + radeon_sa_bo_free(rdev, vb, NULL); return r; + }
[PATCH 17/19] drm/radeon: move the semaphore from the fence into the ib
From: Jerome Glisse

It never really belonged there in the first place.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/radeon.h       | 16 
 drivers/gpu/drm/radeon/radeon_cs.c    |  4 ++--
 drivers/gpu/drm/radeon/radeon_fence.c |  3 ---
 drivers/gpu/drm/radeon/radeon_ring.c  |  2 ++
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 6170307..9507be0 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -272,7 +272,6 @@ struct radeon_fence { uint64_t seq; /* RB, DMA, etc. */ unsigned ring; - struct radeon_semaphore *semaphore; }; int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring); @@ -624,13 +623,14 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo *sa_bo; - uint32_t length_dw; - uint64_t gpu_addr; - uint32_t *ptr; - struct radeon_fence *fence; - unsigned vm_id; - bool is_const_ib; + struct radeon_sa_bo *sa_bo; + uint32_t length_dw; + uint64_t gpu_addr; + uint32_t *ptr; + struct radeon_fence *fence; + unsigned vm_id; + bool is_const_ib; + struct radeon_semaphore *semaphore; }; struct radeon_ring { diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 5c065bf..dcfe2a0 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -138,12 +138,12 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) return 0; } - r = radeon_semaphore_create(p->rdev, &p->ib->fence->semaphore); + r = radeon_semaphore_create(p->rdev, &p->ib->semaphore); if (r) { return r; } - return radeon_semaphore_sync_rings(p->rdev, p->ib->fence->semaphore, + return radeon_semaphore_sync_rings(p->rdev, p->ib->semaphore, sync_to_ring, p->ring); } diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 3a49311..48ec5e3 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -139,8 +139,6 @@ static void radeon_fence_destroy(struct kref *kref) fence = container_of(kref, struct radeon_fence, kref); fence->seq = RADEON_FENCE_NOTEMITED_SEQ; - if (fence->semaphore) - radeon_semaphore_free(fence->rdev, fence->semaphore, NULL); kfree(fence); } @@ -156,7 +154,6 @@ int radeon_fence_create(struct radeon_device *rdev, (*fence)->rdev = rdev; (*fence)->seq = RADEON_FENCE_NOTEMITED_SEQ; (*fence)->ring = ring; - (*fence)->semaphore = NULL; return 0; } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index b3d6942..af8e1ee 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -93,6 +93,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring, (*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo); (*ib)->vm_id = 0; (*ib)->is_const_ib = false; + (*ib)->semaphore = NULL; return 0; } @@ -105,6 +106,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) if (tmp == NULL) { return; } + radeon_semaphore_free(rdev, tmp->semaphore, tmp->fence); radeon_sa_bo_free(rdev, >sa_bo, tmp->fence); radeon_fence_unref(>fence); kfree(tmp); -- 1.7.9.5
[PATCH 16/19] drm/radeon: immediately free ttm-move semaphore
We can now protect the semaphore RAM with a fence, so free it immediately.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/radeon_ttm.c |    7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5e3d54d..0f6aee8 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -223,6 +223,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	struct radeon_device *rdev;
 	uint64_t old_start, new_start;
 	struct radeon_fence *fence, *old_fence;
+	struct radeon_semaphore *sem = NULL;
 	int r;
 
 	rdev = radeon_get_rdev(bo->bdev);
@@ -272,15 +273,16 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 		bool sync_to_ring[RADEON_NUM_RINGS] = { };
 		sync_to_ring[old_fence->ring] = true;
 
-		r = radeon_semaphore_create(rdev, &fence->semaphore);
+		r = radeon_semaphore_create(rdev, &sem);
 		if (r) {
 			radeon_fence_unref(&fence);
 			return r;
 		}
 
-		r = radeon_semaphore_sync_rings(rdev, fence->semaphore,
+		r = radeon_semaphore_sync_rings(rdev, sem,
 						sync_to_ring, fence->ring);
 		if (r) {
+			radeon_semaphore_free(rdev, sem, NULL);
 			radeon_fence_unref(&fence);
 			return r;
 		}
@@ -292,6 +294,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	/* FIXME: handle copy error */
 	r = ttm_bo_move_accel_cleanup(bo, (void *)fence, NULL,
 				      evict, no_wait_reserve, no_wait_gpu, new_mem);
+	radeon_semaphore_free(rdev, sem, fence);
 	radeon_fence_unref(&fence);
 	return r;
 }
--
1.7.9.5
[PATCH 15/19] drm/radeon: rip out the ib pool
From: Jerome Glisse

It isn't necessary any more, and the suballocator seems to perform even better.

Signed-off-by: Christian König
Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h           |  17 +-
 drivers/gpu/drm/radeon/radeon_device.c    |   1 -
 drivers/gpu/drm/radeon/radeon_gart.c      |  12 +-
 drivers/gpu/drm/radeon/radeon_ring.c      | 241 -
 drivers/gpu/drm/radeon/radeon_semaphore.c |   2 +-
 5 files changed, 71 insertions(+), 202 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 45164e1..6170307 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -625,7 +625,6 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); struct radeon_ib { struct radeon_sa_bo *sa_bo; - unsigned idx; uint32_t length_dw; uint64_t gpu_addr; uint32_t *ptr; @@ -634,18 +633,6 @@ struct radeon_ib { bool is_const_ib; }; -/* - * locking - - * mutex protects scheduled_ibs, ready, alloc_bm - */ -struct radeon_ib_pool { - struct radeon_mutex mutex; - struct radeon_sa_manager sa_manager; - struct radeon_ib ibs[RADEON_IB_POOL_SIZE]; - bool ready; - unsigned head_id; -}; - struct radeon_ring { struct radeon_bo *ring_obj; volatile uint32_t *ring; @@ -787,7 +774,6 @@ struct si_rlc { int radeon_ib_get(struct radeon_device *rdev, int ring, struct radeon_ib **ib, unsigned size); void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib); -bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_pool_init(struct radeon_device *rdev); void radeon_ib_pool_fini(struct radeon_device *rdev); @@ -1522,7 +1508,8 @@ struct radeon_device { wait_queue_head_t fence_queue; struct mutex ring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; - struct radeon_ib_pool ib_pool; + bool ib_pool_ready; + struct radeon_sa_manager ring_tmp_bo; struct radeon_irq irq; struct radeon_asic *asic; struct radeon_gem gem; diff --git
a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 48876c1..e1bc7e9 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,7 +724,6 @@ int radeon_device_init(struct radeon_device *rdev, /* mutex initialization are all done here so we * can recall function without having locking issues */ radeon_mutex_init(>cs_mutex); - radeon_mutex_init(>ib_pool.mutex); mutex_init(>ring_lock); mutex_init(>dc_hw_i2c_mutex); if (rdev->family >= CHIP_R600) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 53dba8e..8e9ef34 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -432,8 +432,8 @@ retry_id: rdev->vm_manager.use_bitmap |= 1 << id; vm->id = id; list_add_tail(>list, >vm_manager.lru_vm); - return radeon_vm_bo_update_pte(rdev, vm, rdev->ib_pool.sa_manager.bo, - >ib_pool.sa_manager.bo->tbo.mem); + return radeon_vm_bo_update_pte(rdev, vm, rdev->ring_tmp_bo.bo, + >ring_tmp_bo.bo->tbo.mem); } /* object have to be reserved */ @@ -631,7 +631,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm) /* map the ib pool buffer at 0 in virtual address space, set * read only */ - r = radeon_vm_bo_add(rdev, vm, rdev->ib_pool.sa_manager.bo, 0, + r = radeon_vm_bo_add(rdev, vm, rdev->ring_tmp_bo.bo, 0, RADEON_VM_PAGE_READABLE | RADEON_VM_PAGE_SNOOPED); return r; } @@ -648,12 +648,12 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm) radeon_mutex_unlock(>cs_mutex); /* remove all bo */ - r = radeon_bo_reserve(rdev->ib_pool.sa_manager.bo, false); + r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false); if (!r) { - bo_va = radeon_bo_va(rdev->ib_pool.sa_manager.bo, vm); + bo_va = radeon_bo_va(rdev->ring_tmp_bo.bo, vm); list_del_init(_va->bo_list); list_del_init(_va->vm_list); - radeon_bo_unreserve(rdev->ib_pool.sa_manager.bo); + radeon_bo_unreserve(rdev->ring_tmp_bo.bo); kfree(bo_va); } if
[PATCH 14/19] drm/radeon: simplify semaphore handling v2
From: Jerome Glisse

Directly use the suballocator to get small chunks of memory. It's equally fast and doesn't crash when we encounter a GPU reset.

v2: rebased on the new SA interface.

Signed-off-by: Christian König
Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/evergreen.c        |   1 -
 drivers/gpu/drm/radeon/ni.c               |   1 -
 drivers/gpu/drm/radeon/r600.c             |   1 -
 drivers/gpu/drm/radeon/radeon.h           |  29 +-
 drivers/gpu/drm/radeon/radeon_device.c    |   2 -
 drivers/gpu/drm/radeon/radeon_fence.c     |   2 +-
 drivers/gpu/drm/radeon/radeon_semaphore.c | 137 +
 drivers/gpu/drm/radeon/radeon_test.c      |   4 +-
 drivers/gpu/drm/radeon/rv770.c            |   1 -
 drivers/gpu/drm/radeon/si.c               |   1 -
 10 files changed, 30 insertions(+), 149 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index ecc29bc..7e7ac3d 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3550,7 +3550,6 @@ void evergreen_fini(struct radeon_device *rdev) evergreen_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_agp_fini(rdev); radeon_bo_fini(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 9cd2657..107b217 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1744,7 +1744,6 @@ void cayman_fini(struct radeon_device *rdev) cayman_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index d02f13f..478b51e 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2658,7 +2658,6 @@ void r600_fini(struct radeon_device *rdev) r600_vram_scratch_fini(rdev); radeon_agp_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev)
radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cc7f16a..45164e1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -434,34 +434,13 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv, /* * Semaphores. */ -struct radeon_ring; - -#defineRADEON_SEMAPHORE_BO_SIZE256 - -struct radeon_semaphore_driver { - rwlock_tlock; - struct list_headbo; -}; - -struct radeon_semaphore_bo; - /* everything here is constant */ struct radeon_semaphore { - struct list_headlist; + struct radeon_sa_bo *sa_bo; + signed waiters; uint64_tgpu_addr; - uint32_t*cpu_ptr; - struct radeon_semaphore_bo *bo; }; -struct radeon_semaphore_bo { - struct list_headlist; - struct radeon_ib*ib; - struct list_headfree; - struct radeon_semaphore semaphores[RADEON_SEMAPHORE_BO_SIZE/8]; - unsignednused; -}; - -void radeon_semaphore_driver_fini(struct radeon_device *rdev); int radeon_semaphore_create(struct radeon_device *rdev, struct radeon_semaphore **semaphore); void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring, @@ -473,7 +452,8 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev, bool sync_to[RADEON_NUM_RINGS], int dst_ring); void radeon_semaphore_free(struct radeon_device *rdev, - struct radeon_semaphore *semaphore); + struct radeon_semaphore *semaphore, + struct radeon_fence *fence); /* * GART structures, functions & helpers @@ -1540,7 +1520,6 @@ struct radeon_device { struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; wait_queue_head_t fence_queue; - struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index b827b2e..48876c1 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -732,11 +732,9 @@ 
int
[PATCH 13/19] drm/radeon: multiple ring allocator v3
A fresh start with a new idea for a multiple-ring allocator. It should perform as well as a normal ring allocator as long as only one ring does something, but it falls back to a more complex algorithm if more complex things start to happen.

We store the last allocated bo in "last", and we always try to allocate after the last allocated bo. The principle is that in a linear GPU ring progression, what is after "last" is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If that's not the case, we skip over the bo after "last" to the closest finished bo, if one exists. If none exists and we are not asked to block, we report failure to allocate. If we are asked to block, we wait on the oldest fence of each ring; we just wait for any of those fences to complete.

v2: We need to be able to let "hole" point to the list_head, otherwise try-free will never free the first allocation of the list. Also stop calling radeon_fence_signaled more than necessary.

v3: Don't free allocations without considering them as a hole, otherwise we might lose holes. Also return ENOMEM instead of ENOENT when running out of fences to wait for. Limit the number of holes we try for each ring to 3.
Signed-off-by: Christian König
Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h      |   7 +-
 drivers/gpu/drm/radeon/radeon_ring.c |  19 +--
 drivers/gpu/drm/radeon/radeon_sa.c   | 312 --
 3 files changed, 231 insertions(+), 107 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 37a7459..cc7f16a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -385,7 +385,9 @@ struct radeon_bo_list { struct radeon_sa_manager { spinlock_t lock; struct radeon_bo *bo; - struct list_head sa_bo; + struct list_head *hole; + struct list_head flist[RADEON_NUM_RINGS]; + struct list_head olist; unsigned size; uint64_t gpu_addr; void *cpu_ptr; @@ -396,7 +398,8 @@ struct radeon_sa_bo; /* sub-allocation buffer */ struct radeon_sa_bo { - struct list_head list; + struct list_head olist; + struct list_head flist; struct radeon_sa_manager *manager; unsigned soffset; unsigned eoffset; diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 1748d93..e074ff5 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib) int radeon_ib_pool_init(struct radeon_device *rdev) { - struct radeon_sa_manager tmp; int i, r; - r = radeon_sa_bo_manager_init(rdev, &tmp, - RADEON_IB_POOL_SIZE*64*1024, - RADEON_GEM_DOMAIN_GTT); - if (r) { - return r; - } - radeon_mutex_lock(&rdev->ib_pool.mutex); if (rdev->ib_pool.ready) { radeon_mutex_unlock(&rdev->ib_pool.mutex); - radeon_sa_bo_manager_fini(rdev, &tmp); return 0; } - rdev->ib_pool.sa_manager = tmp; - INIT_LIST_HEAD(&rdev->ib_pool.sa_manager.sa_bo); + r = radeon_sa_bo_manager_init(rdev, &rdev->ib_pool.sa_manager, + RADEON_IB_POOL_SIZE*64*1024, + RADEON_GEM_DOMAIN_GTT); + if (r) { + radeon_mutex_unlock(&rdev->ib_pool.mutex); + return r; + } + for (i = 0; i < RADEON_IB_POOL_SIZE; i++) { rdev->ib_pool.ibs[i].fence = NULL; rdev->ib_pool.ibs[i].idx = i; diff --git
a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 90ee8ad..c3ac7f4 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -27,21 +27,42 @@ * Authors: *Jerome Glisse */ +/* Algorithm: + * + * We store the last allocated bo in "hole", we always try to allocate + * after the last allocated bo. Principle is that in a linear GPU ring + * progression was is after last is the oldest bo we allocated and thus + * the first one that should no longer be in use by the GPU. + * + * If it's not the case we skip over the bo after last to the closest + * done bo if such one exist. If none exist and we are not asked to + * block we report failure to allocate. + * + * If we are asked to block we wait on all the oldest fence of all + * rings. We just wait for any of those fence to complete. + */ #include "drmP.h" #include "drm.h" #include "radeon.h" +static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo); +static void radeon_sa_bo_try_free(struct
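The principle behind this allocator can be illustrated with a toy model (this is not the kernel code; all names here are invented for illustration): new blocks are placed right after the most recently allocated one, blocks retire in the order their fences signal, and when the pool is exhausted the caller would wait on the oldest fences instead of failing outright.

```c
#include <stdbool.h>

/* Toy model of a fence-driven sub-allocator. Allocations advance
 * "head" through one linear pool; a block stays busy until a fake
 * per-pool fence counter reaches its sequence number. */
#define TOY_POOL 256
#define TOY_MAX  16

struct toy_block {
    unsigned soffset, eoffset;  /* [start, end) within the pool */
    unsigned fence_seq;         /* busy until seq <= last_signaled */
    bool     used;
};

struct toy_sa {
    struct toy_block blocks[TOY_MAX];
    unsigned head;              /* offset just past the last allocation */
    unsigned last_signaled;     /* toy stand-in for GPU fence progress */
};

/* Retire every block whose fence has signaled; once the pool is fully
 * idle, restart allocation from offset 0 (the "wrap" case). */
static void toy_reclaim(struct toy_sa *sa)
{
    bool any_busy = false;
    for (int i = 0; i < TOY_MAX; i++) {
        if (sa->blocks[i].used &&
            sa->blocks[i].fence_seq <= sa->last_signaled)
            sa->blocks[i].used = false;
        any_busy |= sa->blocks[i].used;
    }
    if (!any_busy)
        sa->head = 0;
}

/* Allocate "size" bytes right after the last allocation; return the
 * start offset, or -1 when the pool is exhausted (the real code would
 * then block on the oldest fence of each ring). */
static int toy_alloc(struct toy_sa *sa, unsigned size, unsigned fence_seq)
{
    toy_reclaim(sa);
    if (sa->head + size > TOY_POOL)
        return -1;
    for (int i = 0; i < TOY_MAX; i++) {
        if (!sa->blocks[i].used) {
            sa->blocks[i] = (struct toy_block){
                .soffset = sa->head,
                .eoffset = sa->head + size,
                .fence_seq = fence_seq,
                .used = true,
            };
            sa->head += size;
            return (int)sa->blocks[i].soffset;
        }
    }
    return -1;
}
```

The real allocator is considerably more subtle (per-ring free lists, holes that can point back at the list head, partial wrap-around), but the fast path is the same: as long as fences signal in allocation order, every allocation is a constant-time bump of "head".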
[PATCH 12/19] drm/radeon: use one wait queue for all rings, add fence_wait_any v2
From: Jerome Glisse

Use one wait queue for all rings. When one ring progresses, the others likely do too, and we don't expect to have many waiters anyway. Also add a fence_wait_any that waits until the first fence in the fence array (one fence per ring) is signaled; this allows waiting on all rings at once.

v2: some minor cleanups and improvements.

Signed-off-by: Christian König
Signed-off-by: Jerome Glisse
---
 drivers/gpu/drm/radeon/radeon.h       |   5 +-
 drivers/gpu/drm/radeon/radeon_fence.c | 165 +++--
 2 files changed, 163 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index ada70d1..37a7459 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -262,7 +262,6 @@ struct radeon_fence_driver { uint64_t seq; atomic64_t last_seq; unsigned long last_activity; - wait_queue_head_t queue; bool initialized; }; @@ -286,6 +285,9 @@ bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_any(struct radeon_device *rdev, + struct radeon_fence **fences, + bool intr); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); @@ -1534,6 +1536,7 @@ struct radeon_device { struct radeon_scratch scratch; struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; + wait_queue_head_t fence_queue; struct radeon_semaphore_driver semaphore_drv; struct mutex ring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 098d1fa..14dbc28 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@
-129,7 +129,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring) if (wake) { rdev->fence_drv[ring].last_activity = jiffies; - wake_up_all(>fence_drv[ring].queue); + wake_up_all(>fence_queue); } } @@ -224,11 +224,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, trace_radeon_fence_wait_begin(rdev->ddev, seq); radeon_irq_kms_sw_irq_get(rdev, ring); if (intr) { - r = wait_event_interruptible_timeout(rdev->fence_drv[ring].queue, + r = wait_event_interruptible_timeout(rdev->fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } else { - r = wait_event_timeout(rdev->fence_drv[ring].queue, + r = wait_event_timeout(rdev->fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } @@ -306,6 +306,159 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return 0; } +bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq) +{ + unsigned i; + + for (i = 0; i < RADEON_NUM_RINGS; ++i) { + if (seq[i] && radeon_fence_seq_signaled(rdev, seq[i], i)) { + return true; + } + } + return false; +} + +static int radeon_fence_wait_any_seq(struct radeon_device *rdev, +u64 *target_seq, bool intr) +{ + unsigned long timeout, last_activity, tmp; + unsigned i, ring = RADEON_NUM_RINGS; + bool signaled; + int r; + + for (i = 0, last_activity = 0; i < RADEON_NUM_RINGS; ++i) { + if (!target_seq[i]) { + continue; + } + + /* use the most recent one as indicator */ + if (time_after(rdev->fence_drv[i].last_activity, last_activity)) { + last_activity = rdev->fence_drv[i].last_activity; + } + + /* For lockup detection just pick the lowest ring we are +* actively waiting for +*/ + if (i < ring) { + ring = i; + } + } + + /* nothing to wait for ? */ + if (ring == RADEON_NUM_RINGS) { + return 0; + } + +
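The core of the "wait any" logic in the patch is the predicate that scans all rings and succeeds as soon as any requested sequence number has been reached. A small self-contained sketch of that check (the globals and helpers here are stand-ins for the driver's per-ring fence state, not kernel code):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 4

/* Toy stand-in for per-ring fence progress: the highest sequence
 * number each ring has signaled so far. */
static uint64_t last_seq[NUM_RINGS];

static bool seq_signaled(unsigned ring, uint64_t seq)
{
    return last_seq[ring] >= seq;
}

/* Mirrors the shape of radeon_fence_any_seq_signaled(): a target of 0
 * means "not waiting on this ring"; return true as soon as ANY of the
 * per-ring targets has been reached. */
static bool any_seq_signaled(const uint64_t *target)
{
    for (unsigned i = 0; i < NUM_RINGS; ++i) {
        if (target[i] && seq_signaled(i, target[i]))
            return true;
    }
    return false;
}
```

With a single shared wait queue, every ring's fence processing wakes the same sleepers, and each waiter re-evaluates this predicate over its own target array; that is what lets one wait primitive cover "wait on this fence" and "wait on the first of several fences" alike.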
[PATCH 11/19] drm/radeon: define new SA interface v3
Define the interface without modifying the allocation algorithm in any way. v2: rebase on top of fence new uint64 patch v3: add ring to debugfs output Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h |1 + drivers/gpu/drm/radeon/radeon_gart.c |6 +-- drivers/gpu/drm/radeon/radeon_object.h|5 ++- drivers/gpu/drm/radeon/radeon_ring.c |8 ++-- drivers/gpu/drm/radeon/radeon_sa.c| 60 - drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 6 files changed, 63 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 9374ab1..ada70d1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -398,6 +398,7 @@ struct radeon_sa_bo { struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; + struct radeon_fence *fence; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c5789ef..53dba8e 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -326,7 +326,7 @@ static void radeon_vm_unbind_locked(struct radeon_device *rdev, rdev->vm_manager.use_bitmap &= ~(1 << vm->id); list_del_init(>list); vm->id = -1; - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); vm->pt = NULL; list_for_each_entry(bo_va, >va, vm_list) { @@ -395,7 +395,7 @@ int radeon_vm_bind(struct radeon_device *rdev, struct radeon_vm *vm) retry: r = radeon_sa_bo_new(rdev, >vm_manager.sa_manager, >sa_bo, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8), -RADEON_GPU_PAGE_SIZE); +RADEON_GPU_PAGE_SIZE, false); if (r) { if (list_empty(>vm_manager.lru_vm)) { return r; @@ -426,7 +426,7 @@ retry_id: /* do hw bind */ r = rdev->vm_manager.funcs->bind(rdev, vm, id); if (r) { - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); return r; } rdev->vm_manager.use_bitmap |= 1 << id; diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h 
index 4fc7f07..befec7d 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -169,9 +169,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, struct radeon_sa_bo **sa_bo, - unsigned size, unsigned align); + unsigned size, unsigned align, bool block); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo **sa_bo); + struct radeon_sa_bo **sa_bo, + struct radeon_fence *fence); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 45adb37..1748d93 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -85,7 +85,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib) if (ib->fence && ib->fence->seq < RADEON_FENCE_NOTEMITED_SEQ) { if (radeon_fence_signaled(ib->fence)) { radeon_fence_unref(>fence); - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); done = true; } } @@ -124,7 +124,7 @@ retry: if (rdev->ib_pool.ibs[idx].fence == NULL) { r = radeon_sa_bo_new(rdev, >ib_pool.sa_manager, >ib_pool.ibs[idx].sa_bo, -size, 256); +size, 256, false); if (!r) { *ib = >ib_pool.ibs[idx]; (*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo); @@ -173,7 +173,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) } radeon_mutex_lock(>ib_pool.mutex); if (tmp->fence && tmp->fence->seq == RADEON_FENCE_NOTEMITED_SEQ) { - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); radeon_fence_unref(>fence); }
[PATCH 10/19] drm/radeon: make sa bo a stand alone object
Allocating and freeing it separately. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/radeon.h |4 ++-- drivers/gpu/drm/radeon/radeon_cs.c|4 ++-- drivers/gpu/drm/radeon/radeon_gart.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h|4 ++-- drivers/gpu/drm/radeon/radeon_ring.c |6 +++--- drivers/gpu/drm/radeon/radeon_sa.c| 28 +++- drivers/gpu/drm/radeon/radeon_semaphore.c |4 ++-- 7 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index d1c2154..9374ab1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -638,7 +638,7 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; unsignedidx; uint32_tlength_dw; uint64_tgpu_addr; @@ -693,7 +693,7 @@ struct radeon_vm { unsignedlast_pfn; u64 pt_gpu_addr; u64 *pt; - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; struct mutexmutex; /* last fence for cs using this vm */ struct radeon_fence *fence; diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index b778037..5c065bf 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->const_ib->gpu_addr = parser->const_ib->sa_bo.soffset; + parser->const_ib->gpu_addr = parser->const_ib->sa_bo->soffset; r = radeon_ib_schedule(rdev, parser->const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->ib->gpu_addr = parser->ib->sa_bo.soffset; + parser->ib->gpu_addr = parser->ib->sa_bo->soffset; parser->ib->is_const_ib = false; r = radeon_ib_schedule(rdev,
parser->ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 4a5d9d4..c5789ef 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -404,8 +404,8 @@ retry: radeon_vm_unbind(rdev, vm_evict); goto retry; } - vm->pt = radeon_sa_bo_cpu_addr(>sa_bo); - vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(>sa_bo); + vm->pt = radeon_sa_bo_cpu_addr(vm->sa_bo); + vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(vm->sa_bo); memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8)); retry_id: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 99ab46a..4fc7f07 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -168,10 +168,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager); extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, - struct radeon_sa_bo *sa_bo, + struct radeon_sa_bo **sa_bo, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo *sa_bo); + struct radeon_sa_bo **sa_bo); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index f49c9c0..45adb37 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -127,8 +127,8 @@ retry: size, 256); if (!r) { *ib = >ib_pool.ibs[idx]; - (*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo); - (*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo); + (*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo); + (*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo);
[PATCH 09/19] drm/radeon: keep start and end offset in the SA
Instead of offset + size keep start and end offset directly. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h|4 ++-- drivers/gpu/drm/radeon/radeon_cs.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h |4 ++-- drivers/gpu/drm/radeon/radeon_sa.c | 13 +++-- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 8a6b1b3..d1c2154 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -396,8 +396,8 @@ struct radeon_sa_bo; struct radeon_sa_bo { struct list_headlist; struct radeon_sa_manager*manager; - unsignedoffset; - unsignedsize; + unsignedsoffset; + unsignedeoffset; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 289b0d7..b778037 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->const_ib->gpu_addr = parser->const_ib->sa_bo.offset; + parser->const_ib->gpu_addr = parser->const_ib->sa_bo.soffset; r = radeon_ib_schedule(rdev, parser->const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->ib->gpu_addr = parser->ib->sa_bo.offset; + parser->ib->gpu_addr = parser->ib->sa_bo.soffset; parser->ib->is_const_ib = false; r = radeon_ib_schedule(rdev, parser->ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index d9fca1e..99ab46a 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -149,12 +149,12 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo, static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo 
*sa_bo) { - return sa_bo->manager->gpu_addr + sa_bo->offset; + return sa_bo->manager->gpu_addr + sa_bo->soffset; } static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo) { - return sa_bo->manager->cpu_ptr + sa_bo->offset; + return sa_bo->manager->cpu_ptr + sa_bo->soffset; } extern int radeon_sa_bo_manager_init(struct radeon_device *rdev, diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 1db0568..3bea7ba 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -152,11 +152,11 @@ int radeon_sa_bo_new(struct radeon_device *rdev, offset = 0; list_for_each_entry(tmp, _manager->sa_bo, list) { /* room before this object ? */ - if (offset < tmp->offset && (tmp->offset - offset) >= size) { + if (offset < tmp->soffset && (tmp->soffset - offset) >= size) { head = tmp->list.prev; goto out; } - offset = tmp->offset + tmp->size; + offset = tmp->eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -166,7 +166,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, /* room at the end ? */ head = sa_manager->sa_bo.prev; tmp = list_entry(head, struct radeon_sa_bo, list); - offset = tmp->offset + tmp->size; + offset = tmp->eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -180,8 +180,8 @@ int radeon_sa_bo_new(struct radeon_device *rdev, out: sa_bo->manager = sa_manager; - sa_bo->offset = offset; - sa_bo->size = size; + sa_bo->soffset = offset; + sa_bo->eoffset = offset + size; list_add(_bo->list, head); spin_unlock(_manager->lock); return 0; @@ -202,7 +202,8 @@ void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, spin_lock(_manager->lock); list_for_each_entry(i, _manager->sa_bo, list) { - seq_printf(m, "offset %08d: size %4d\n", i->offset, i->size); + seq_printf(m, "[%08x %08x] size %4d [%p]\n", + i->soffset, i->eoffset, i->eoffset - i->soffset, i); } spin_unlock(_manager->lock); } -- 1.7.9.5
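The allocation walk this patch touches is easier to see in isolation. A hedged userspace sketch of the first-fit scan over allocations kept as [soffset, eoffset) ranges (struct and function names are illustrative; the driver walks a kernel list under a spinlock rather than an array):

```c
#include <stdbool.h>

/* Each allocation is a half-open range [soffset, eoffset); the list is
 * kept sorted by start offset, so one pass finds the first hole. */
struct sa_range { unsigned soffset, eoffset; };

/* First-fit sketch: find an aligned start offset for `size` bytes among
 * `n` sorted allocations in a pool of `pool_size` bytes. Assumes all
 * allocations lie within the pool. */
static bool sa_find_hole(const struct sa_range *bos, unsigned n,
                         unsigned pool_size, unsigned size,
                         unsigned align, unsigned *out)
{
    unsigned offset = 0;

    for (unsigned i = 0; i < n; ++i) {
        /* room before this allocation? */
        if (offset < bos[i].soffset && bos[i].soffset - offset >= size) {
            *out = offset;
            return true;
        }
        /* skip past it: with start/end stored directly this is just
         * eoffset, instead of offset + size as before the patch */
        offset = bos[i].eoffset;
        if (offset % align)
            offset += align - offset % align;
    }
    /* room at the end? */
    if (pool_size - offset >= size) {
        *out = offset;
        return true;
    }
    return false;
}
```

Storing eoffset directly is what lets later patches in the series account for alignment waste inside an allocation without changing the scan.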
[PATCH 08/19] drm/radeon: add sub allocator debugfs file
Dumping the current allocations. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_object.h |5 + drivers/gpu/drm/radeon/radeon_ring.c | 22 ++ drivers/gpu/drm/radeon/radeon_sa.c | 14 ++ 3 files changed, 41 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index c120ab9..d9fca1e 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -172,5 +172,10 @@ extern int radeon_sa_bo_new(struct radeon_device *rdev, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo); +#if defined(CONFIG_DEBUG_FS) +extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, +struct seq_file *m); +#endif + #endif diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 116be5e..f49c9c0 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -601,6 +601,23 @@ static int radeon_debugfs_ib_info(struct seq_file *m, void *data) static struct drm_info_list radeon_debugfs_ib_list[RADEON_IB_POOL_SIZE]; static char radeon_debugfs_ib_names[RADEON_IB_POOL_SIZE][32]; static unsigned radeon_debugfs_ib_idx[RADEON_IB_POOL_SIZE]; + +static int radeon_debugfs_sa_info(struct seq_file *m, void *data) +{ + struct drm_info_node *node = (struct drm_info_node *) m->private; + struct drm_device *dev = node->minor->dev; + struct radeon_device *rdev = dev->dev_private; + + radeon_sa_bo_dump_debug_info(>ib_pool.sa_manager, m); + + return 0; + +} + +static struct drm_info_list radeon_debugfs_sa_list[] = { +{"radeon_sa_info", _debugfs_sa_info, 0, NULL}, +}; + #endif int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *ring) @@ -627,6 +644,11 @@ int radeon_debugfs_ib_init(struct radeon_device *rdev) { #if defined(CONFIG_DEBUG_FS) unsigned i; + int r; + + r = radeon_debugfs_add_files(rdev, radeon_debugfs_sa_list, 1); + if (r) + 
return r; for (i = 0; i < RADEON_IB_POOL_SIZE; i++) { sprintf(radeon_debugfs_ib_names[i], "radeon_ib_%04u", i); diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index aed0a8c..1db0568 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -193,3 +193,17 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo) list_del_init(_bo->list); spin_unlock(_bo->manager->lock); } + +#if defined(CONFIG_DEBUG_FS) +void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, + struct seq_file *m) +{ + struct radeon_sa_bo *i; + + spin_lock(_manager->lock); + list_for_each_entry(i, _manager->sa_bo, list) { + seq_printf(m, "offset %08d: size %4d\n", i->offset, i->size); + } + spin_unlock(_manager->lock); +} +#endif -- 1.7.9.5
[PATCH 07/19] drm/radeon: add proper locking to the SA v3
Make the suballocator self-contained with respect to locking. v2: split the bugfix into a separate patch. v3: remove some unrelated changes. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/radeon.h|1 + drivers/gpu/drm/radeon/radeon_sa.c |6 ++ 2 files changed, 7 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 701094b..8a6b1b3 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -381,6 +381,7 @@ struct radeon_bo_list { * alignment). */ struct radeon_sa_manager { + spinlock_t lock; struct radeon_bo*bo; struct list_headsa_bo; unsignedsize; diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 8fbfe69..aed0a8c 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -37,6 +37,7 @@ int radeon_sa_bo_manager_init(struct radeon_device *rdev, { int r; + spin_lock_init(&sa_manager->lock); sa_manager->bo = NULL; sa_manager->size = size; sa_manager->domain = domain; @@ -139,6 +140,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, BUG_ON(align > RADEON_GPU_PAGE_SIZE); BUG_ON(size > sa_manager->size); + spin_lock(&sa_manager->lock); /* no one ? */ head = sa_manager->sa_bo.prev; @@ -172,6 +174,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, offset += wasted; if ((sa_manager->size - offset) < size) { /* failed to find somethings big enough */ + spin_unlock(&sa_manager->lock); return -ENOMEM; } @@ -180,10 +183,13 @@ out: sa_bo->offset = offset; sa_bo->size = size; list_add(&sa_bo->list, head); + spin_unlock(&sa_manager->lock); return 0; } void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo) { + spin_lock(&sa_bo->manager->lock); list_del_init(&sa_bo->list); + spin_unlock(&sa_bo->manager->lock); } -- 1.7.9.5
[PATCH 06/19] drm/radeon: use inline functions to calc sa_bo addr
Instead of hacking the calculation multiple times. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_gart.c |6 ++ drivers/gpu/drm/radeon/radeon_object.h| 11 +++ drivers/gpu/drm/radeon/radeon_ring.c |6 ++ drivers/gpu/drm/radeon/radeon_semaphore.c |6 ++ 4 files changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c58a036..4a5d9d4 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -404,10 +404,8 @@ retry: radeon_vm_unbind(rdev, vm_evict); goto retry; } - vm->pt = rdev->vm_manager.sa_manager.cpu_ptr; - vm->pt += (vm->sa_bo.offset >> 3); - vm->pt_gpu_addr = rdev->vm_manager.sa_manager.gpu_addr; - vm->pt_gpu_addr += vm->sa_bo.offset; + vm->pt = radeon_sa_bo_cpu_addr(>sa_bo); + vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(>sa_bo); memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8)); retry_id: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index f9104be..c120ab9 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -146,6 +146,17 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo, /* * sub allocation */ + +static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo *sa_bo) +{ + return sa_bo->manager->gpu_addr + sa_bo->offset; +} + +static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo) +{ + return sa_bo->manager->cpu_ptr + sa_bo->offset; +} + extern int radeon_sa_bo_manager_init(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, unsigned size, u32 domain); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 2fdc8c3..116be5e 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -127,10 +127,8 @@ retry: size, 256); if (!r) { *ib = >ib_pool.ibs[idx]; - (*ib)->ptr = rdev->ib_pool.sa_manager.cpu_ptr; - 
(*ib)->ptr += ((*ib)->sa_bo.offset >> 2); - (*ib)->gpu_addr = rdev->ib_pool.sa_manager.gpu_addr; - (*ib)->gpu_addr += (*ib)->sa_bo.offset; + (*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo); + (*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo); (*ib)->fence = fence; (*ib)->vm_id = 0; (*ib)->is_const_ib = false; diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c index c5b3d8e..f312ba5 100644 --- a/drivers/gpu/drm/radeon/radeon_semaphore.c +++ b/drivers/gpu/drm/radeon/radeon_semaphore.c @@ -53,10 +53,8 @@ static int radeon_semaphore_add_bo(struct radeon_device *rdev) kfree(bo); return r; } - gpu_addr = rdev->ib_pool.sa_manager.gpu_addr; - gpu_addr += bo->ib->sa_bo.offset; - cpu_ptr = rdev->ib_pool.sa_manager.cpu_ptr; - cpu_ptr += (bo->ib->sa_bo.offset >> 2); + gpu_addr = radeon_sa_bo_gpu_addr(>ib->sa_bo); + cpu_ptr = radeon_sa_bo_cpu_addr(>ib->sa_bo); for (i = 0; i < (RADEON_SEMAPHORE_BO_SIZE/8); i++) { bo->semaphores[i].gpu_addr = gpu_addr; bo->semaphores[i].cpu_ptr = cpu_ptr; -- 1.7.9.5
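The two inline helpers this patch introduces both reduce to "manager base plus allocation offset". A small standalone model (the structs are stand-ins carrying only the fields the helpers read; cpu_ptr is a byte pointer here so the arithmetic is portable C rather than the kernel's void-pointer arithmetic):

```c
#include <stdint.h>

/* Illustrative stand-ins for the manager/sa_bo fields the helpers use;
 * the real structs carry more state. */
struct sa_manager { uint64_t gpu_addr; uint8_t *cpu_ptr; };
struct sa_bo { struct sa_manager *manager; unsigned offset; };

/* GPU-visible address of the suballocation. */
static uint64_t sa_bo_gpu_addr(const struct sa_bo *sa_bo)
{
    return sa_bo->manager->gpu_addr + sa_bo->offset;
}

/* CPU mapping of the same bytes. */
static void *sa_bo_cpu_addr(const struct sa_bo *sa_bo)
{
    return sa_bo->manager->cpu_ptr + sa_bo->offset;
}
```

Centralizing this removes the scattered `offset >> 2` / `offset >> 3` adjustments the callers previously did against typed pointers, which is exactly the duplication the commit message calls "hacking the calculation multiple times".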
[PATCH 05/19] drm/radeon: rework locking ring emission mutex in fence deadlock detection v2
Some callers illegally called fence_wait_next/empty while holding the ring emission mutex. So don't relock the mutex in those cases, and move the actual locking into the fence code. v2: Don't try to unlock the mutex if it isn't locked. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/radeon.h|4 +-- drivers/gpu/drm/radeon/radeon_device.c |5 +++- drivers/gpu/drm/radeon/radeon_fence.c | 43 +--- drivers/gpu/drm/radeon/radeon_pm.c |8 +- drivers/gpu/drm/radeon/radeon_ring.c |6 + 5 files changed, 37 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 7c87117..701094b 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -284,8 +284,8 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence); void radeon_fence_process(struct radeon_device *rdev, int ring); bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); -int radeon_fence_wait_next(struct radeon_device *rdev, int ring); -int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); +int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 0e7b72a..b827b2e 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -912,9 +912,12 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state) } /* evict vram memory */ radeon_bo_evict_vram(rdev); + + mutex_lock(&rdev->ring_lock); /* wait for gpu to finish processing current batch */ for (i = 0; i < RADEON_NUM_RINGS; i++) - radeon_fence_wait_empty(rdev,
i); + radeon_fence_wait_empty_locked(rdev, i); + mutex_unlock(>ring_lock); radeon_save_bios_scratch_regs(rdev); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index ed20225..098d1fa 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -194,7 +194,7 @@ bool radeon_fence_signaled(struct radeon_fence *fence) } static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, -unsigned ring, bool intr) +unsigned ring, bool intr, bool lock_ring) { unsigned long timeout, last_activity; uint64_t seq; @@ -249,8 +249,16 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, if (seq != atomic64_read(>fence_drv[ring].last_seq)) { continue; } + + if (lock_ring) { + mutex_lock(>ring_lock); + } + /* test if somebody else has already decided that this is a lockup */ if (last_activity != rdev->fence_drv[ring].last_activity) { + if (lock_ring) { + mutex_unlock(>ring_lock); + } continue; } @@ -264,15 +272,17 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, rdev->fence_drv[i].last_activity = jiffies; } - /* change last activity so nobody else think there is a lockup */ - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - rdev->fence_drv[i].last_activity = jiffies; - } - /* mark the ring as not ready any more */ rdev->ring[ring].ready = false; + if (lock_ring) { + mutex_unlock(>ring_lock); + } return -EDEADLK; } + + if (lock_ring) { + mutex_unlock(>ring_lock); + } } } return 0; @@ -287,7 +297,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return -EINVAL; } - r = radeon_fence_wait_seq(fence->rdev, fence->seq, fence->ring, intr); + r = radeon_fence_wait_seq(fence->rdev, fence->seq, + fence->ring, intr,
[PATCH 04/19] drm/radeon: rework fence handling, drop fence list v7
From: Jerome GlisseUsing 64bits fence sequence we can directly compare sequence number to know if a fence is signaled or not. Thus the fence list became useless, so does the fence lock that mainly protected the fence list. Things like ring.ready are no longer behind a lock, this should be ok as ring.ready is initialized once and will only change when facing lockup. Worst case is that we return an -EBUSY just after a successfull GPU reset, or we go into wait state instead of returning -EBUSY (thus delaying reporting -EBUSY to fence wait caller). v2: Remove left over comment, force using writeback on cayman and newer, thus not having to suffer from possibly scratch reg exhaustion v3: Rebase on top of change to uint64 fence patch v4: Change DCE5 test to force write back on cayman and newer but also any APU such as PALM or SUMO family v5: Rebase on top of new uint64 fence patch v6: Just break if seq doesn't change any more. Use radeon_fence prefix for all function names. Even if it's now highly optimized, try avoiding polling to often. v7: We should never poll the last_seq from the hardware without waking the sleeping threads, otherwise we might lose events. Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h|6 +- drivers/gpu/drm/radeon/radeon_device.c |8 +- drivers/gpu/drm/radeon/radeon_fence.c | 299 3 files changed, 119 insertions(+), 194 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cdf46bc..7c87117 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -263,15 +263,12 @@ struct radeon_fence_driver { atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; - struct list_heademitted; - struct list_headsignaled; boolinitialized; }; struct radeon_fence { struct radeon_device*rdev; struct kref kref; - struct list_headlist; /* protected by radeon_fence.lock */ uint64_tseq; /* RB, DMA, etc. 
*/ @@ -291,7 +288,7 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); -int radeon_fence_count_emitted(struct radeon_device *rdev, int ring); +unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); /* * Tiling registers @@ -1534,7 +1531,6 @@ struct radeon_device { struct radeon_mode_info mode_info; struct radeon_scratch scratch; struct radeon_mman mman; - rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 3f6ff2a..0e7b72a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -225,9 +225,9 @@ int radeon_wb_init(struct radeon_device *rdev) /* disable event_write fences */ rdev->wb.use_event = false; /* disabled via module param */ - if (radeon_no_wb == 1) + if (radeon_no_wb == 1) { rdev->wb.enabled = false; - else { + } else { if (rdev->flags & RADEON_IS_AGP) { /* often unreliable on AGP */ rdev->wb.enabled = false; @@ -237,8 +237,9 @@ int radeon_wb_init(struct radeon_device *rdev) } else { rdev->wb.enabled = true; /* event_write fences are only available on r600+ */ - if (rdev->family >= CHIP_R600) + if (rdev->family >= CHIP_R600) { rdev->wb.use_event = true; + } } } /* always use writeback/events on NI, APUs */ @@ -731,7 +732,6 @@ int radeon_device_init(struct radeon_device *rdev, mutex_init(>gem.mutex); mutex_init(>pm.mutex); mutex_init(>vram_mutex); - rwlock_init(>fence_lock); rwlock_init(>semaphore_drv.lock); INIT_LIST_HEAD(>gem.objects); init_waitqueue_head(>irq.vblank_queue); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index feb2bbc..ed20225 100644 --- 
a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -63,30 +63,18 @@ static u32
[PATCH 03/19] drm/radeon: convert fence to uint64_t v4
From: Jerome GlisseThis convert fence to use uint64_t sequence number intention is to use the fact that uin64_t is big enough that we don't need to care about wrap around. Tested with and without writeback using 0xF000 as initial fence sequence and thus allowing to test the wrap around from 32bits to 64bits. v2: Add comment about possible race btw CPU & GPU, add comment stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET Read fence sequenc in reverse order of GPU write them so we mitigate the race btw CPU and GPU. v3: Drop the need for ring to emit the 64bits fence, and just have each ring emit the lower 32bits of the fence sequence. We handle the wrap over 32bits in fence_process. v4: Just a small optimization: Don't reread the last_seq value if loop restarts, since we already know its value anyway. Also start at zero not one for seq value and use pre instead of post increment in emmit, otherwise wait_empty will deadlock. Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h | 39 ++- drivers/gpu/drm/radeon/radeon_fence.c | 116 +++-- drivers/gpu/drm/radeon/radeon_ring.c |9 +-- 3 files changed, 107 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index e99ea81..cdf46bc 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout; * Copy from radeon_drv.h so we don't have to include both and have conflicting * symbol; */ -#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ -#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) +#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ +#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) /* RADEON_IB_POOL_SIZE must be a power of 2 */ -#define RADEON_IB_POOL_SIZE16 -#define RADEON_DEBUGFS_MAX_COMPONENTS 32 -#define RADEONFB_CONN_LIMIT4 -#define RADEON_BIOS_NUM_SCRATCH8 +#define RADEON_IB_POOL_SIZE16 +#define RADEON_DEBUGFS_MAX_COMPONENTS 32 +#define 
RADEONFB_CONN_LIMIT4 +#define RADEON_BIOS_NUM_SCRATCH8 /* max number of rings */ -#define RADEON_NUM_RINGS 3 +#define RADEON_NUM_RINGS 3 + +/* fence seq are set to this number when signaled */ +#define RADEON_FENCE_SIGNALED_SEQ 0LL +#define RADEON_FENCE_NOTEMITED_SEQ (~0LL) /* internal ring indices */ /* r1xx+ has gfx CP ring */ -#define RADEON_RING_TYPE_GFX_INDEX 0 +#define RADEON_RING_TYPE_GFX_INDEX 0 /* cayman has 2 compute CP rings */ -#define CAYMAN_RING_TYPE_CP1_INDEX 1 -#define CAYMAN_RING_TYPE_CP2_INDEX 2 +#define CAYMAN_RING_TYPE_CP1_INDEX 1 +#define CAYMAN_RING_TYPE_CP2_INDEX 2 /* hardcode those limit for now */ -#define RADEON_VA_RESERVED_SIZE(8 << 20) -#define RADEON_IB_VM_MAX_SIZE (64 << 10) +#define RADEON_VA_RESERVED_SIZE(8 << 20) +#define RADEON_IB_VM_MAX_SIZE (64 << 10) /* * Errata workarounds. @@ -254,8 +258,9 @@ struct radeon_fence_driver { uint32_tscratch_reg; uint64_tgpu_addr; volatile uint32_t *cpu_addr; - atomic_tseq; - uint32_tlast_seq; + /* seq is protected by ring emission lock */ + uint64_tseq; + atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; struct list_heademitted; @@ -268,11 +273,9 @@ struct radeon_fence { struct kref kref; struct list_headlist; /* protected by radeon_fence.lock */ - uint32_tseq; - boolemitted; - boolsignaled; + uint64_tseq; /* RB, DMA, etc. */ - int ring; + unsignedring; struct radeon_semaphore *semaphore; }; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 5bb78bf..feb2bbc 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -66,14 +66,14 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence) unsigned long irq_flags; write_lock_irqsave(>fence_lock, irq_flags); - if (fence->emitted) { + if (fence->seq && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) { write_unlock_irqrestore(>fence_lock, irq_flags);
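The v3 note above is the heart of this patch: each ring still emits only the lower 32 bits of the sequence, and fence_process widens them back to 64 bits. That widening can be modeled as follows (simplified: the driver does this inside a re-read loop with atomic64 compare-and-swap to tolerate concurrent updaters):

```c
#include <stdint.h>

/* Widen a 32-bit sequence read from the hardware to 64 bits using the
 * last 64-bit value the CPU observed. Assumes the GPU advances by less
 * than 2^32 between two reads, so at most one wrap can occur. */
static uint64_t extend_seq(uint32_t hw_seq, uint64_t last_seq)
{
    uint64_t seq = (uint64_t)hw_seq | (last_seq & 0xFFFFFFFF00000000ULL);

    if (seq < last_seq)            /* the low 32 bits wrapped */
        seq += 0x100000000ULL;
    return seq;
}
```

With monotonically increasing 64-bit sequences, "is this fence signaled" becomes a plain integer comparison, which is what lets the next patch in the series drop the fence lists entirely.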
[PATCH 02/19] drm/radeon: replace the per ring mutex with a global one
A single global mutex for ring submissions seems sufficient. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h |3 ++- drivers/gpu/drm/radeon/radeon_device.c|3 +-- drivers/gpu/drm/radeon/radeon_pm.c| 10 ++- drivers/gpu/drm/radeon/radeon_ring.c | 28 +++ drivers/gpu/drm/radeon/radeon_semaphore.c | 42 + 5 files changed, 41 insertions(+), 45 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 82ffa6a..e99ea81 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -676,7 +676,6 @@ struct radeon_ring { uint64_tgpu_addr; uint32_talign_mask; uint32_tptr_mask; - struct mutexmutex; boolready; u32 ptr_reg_shift; u32 ptr_reg_mask; @@ -815,6 +814,7 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *cp, unsign int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *cp, unsigned ndw); void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_unlock_commit(struct radeon_device *rdev, struct radeon_ring *cp); +void radeon_ring_undo(struct radeon_ring *ring); void radeon_ring_unlock_undo(struct radeon_device *rdev, struct radeon_ring *cp); int radeon_ring_test(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *ring); @@ -1534,6 +1534,7 @@ struct radeon_device { rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; + struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; struct radeon_irq irq; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index ff28210..3f6ff2a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,8 +724,7 @@ int radeon_device_init(struct radeon_device *rdev, * can recall function without having locking issues */ 
radeon_mutex_init(>cs_mutex); radeon_mutex_init(>ib_pool.mutex); - for (i = 0; i < RADEON_NUM_RINGS; ++i) - mutex_init(>ring[i].mutex); + mutex_init(>ring_lock); mutex_init(>dc_hw_i2c_mutex); if (rdev->family >= CHIP_R600) spin_lock_init(>ih.lock); diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index caa55d6..7c38745 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -252,10 +252,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) mutex_lock(>ddev->struct_mutex); mutex_lock(>vram_mutex); - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - if (rdev->ring[i].ring_obj) - mutex_lock(>ring[i].mutex); - } + mutex_lock(>ring_lock); /* gui idle int has issues on older chips it seems */ if (rdev->family >= CHIP_R600) { @@ -311,10 +308,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) rdev->pm.dynpm_planned_action = DYNPM_ACTION_NONE; - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - if (rdev->ring[i].ring_obj) - mutex_unlock(>ring[i].mutex); - } + mutex_unlock(>ring_lock); mutex_unlock(>vram_mutex); mutex_unlock(>ddev->struct_mutex); } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 2eb4c6e..a4d60ae 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -346,9 +346,9 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi if (ndw < ring->ring_free_dw) { break; } - mutex_unlock(>mutex); + mutex_unlock(>ring_lock); r = radeon_fence_wait_next(rdev, radeon_ring_index(rdev, ring)); - mutex_lock(>mutex); + mutex_lock(>ring_lock); if (r) return r; } @@ -361,10 +361,10 @@ int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *ring, unsig { int r; - mutex_lock(>mutex); + mutex_lock(>ring_lock); r = radeon_ring_alloc(rdev, ring, ndw); if (r) { - mutex_unlock(>mutex); + mutex_unlock(>ring_lock); return r; } return 0; @@ -389,20 +389,25 @@ void 
radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring
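The core of the change is the pattern in radeon_ring_lock()/radeon_ring_alloc(): take the one global ring_lock, and if there is not enough ring space, drop the lock around the blocking fence wait and re-take it before retrying. Below is a minimal userspace model of that pattern, with a pthread mutex standing in for the kernel mutex and a fake fence wait; all names are illustrative, not the driver's API.

```c
#include <pthread.h>

/* Simplified userspace model of the patch's single ring_lock pattern
 * (hypothetical names; the real code lives in radeon_ring.c). */
static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int ring_free_dw = 16;

/* Stand-in for radeon_fence_wait_next(): pretend a fence signals and
 * frees more ring space. Called with ring_lock *dropped*, like the
 * real wait. */
static void wait_for_fence(void)
{
    ring_free_dw += 32;
}

/* Mirror of radeon_ring_alloc(): caller holds ring_lock; the lock is
 * released around the blocking wait and re-taken before rechecking. */
static int ring_alloc(unsigned int ndw)
{
    while (ndw > ring_free_dw) {
        pthread_mutex_unlock(&ring_lock);
        wait_for_fence();
        pthread_mutex_lock(&ring_lock);
    }
    ring_free_dw -= ndw;
    return 0;
}

/* Mirror of radeon_ring_lock(): take the global lock, then allocate;
 * on failure the lock is dropped before returning. */
static int ring_lock_and_alloc(unsigned int ndw)
{
    pthread_mutex_lock(&ring_lock);
    int r = ring_alloc(ndw);
    if (r)
        pthread_mutex_unlock(&ring_lock);
    return r;
}

static void ring_unlock(void)
{
    pthread_mutex_unlock(&ring_lock);
}
```

The point of the single lock is that dropping it mid-wait now serializes against every ring, not just one, which is what the semaphore code in the rest of the series relies on.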
[PATCH 01/19] drm/radeon: fix possible lack of synchronization btw ttm and other ring
From: Jerome GlisseWe need to sync with the GFX ring as ttm might have schedule bo move on it and new command scheduled for other ring need to wait for bo data to be in place. Signed-off-by: Jerome Glisse Reviewed by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_cs.c | 12 ++-- include/drm/radeon_drm.h |1 - 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index c66beb1..289b0d7 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -122,15 +122,15 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) int i, r; for (i = 0; i < p->nrelocs; i++) { + struct radeon_fence *fence; + if (!p->relocs[i].robj || !p->relocs[i].robj->tbo.sync_obj) continue; - if (!(p->relocs[i].flags & RADEON_RELOC_DONT_SYNC)) { - struct radeon_fence *fence = p->relocs[i].robj->tbo.sync_obj; - if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { - sync_to_ring[fence->ring] = true; - need_sync = true; - } + fence = p->relocs[i].robj->tbo.sync_obj; + if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { + sync_to_ring[fence->ring] = true; + need_sync = true; } } diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h index 7c491b4..5805686 100644 --- a/include/drm/radeon_drm.h +++ b/include/drm/radeon_drm.h @@ -926,7 +926,6 @@ struct drm_radeon_cs_chunk { }; /* drm_radeon_cs_reloc.flags */ -#define RADEON_RELOC_DONT_SYNC 0x01 struct drm_radeon_cs_reloc { uint32_thandle; -- 1.7.9.5
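The behavioral change can be summarized as: with RADEON_RELOC_DONT_SYNC gone, every reloc whose buffer carries an unsignaled fence from a different ring now forces a semaphore sync. A small standalone model of that decision follows; the struct and function names are hypothetical stand-ins for the parser state.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_RINGS 4

/* Hypothetical stand-in for a bo's tbo.sync_obj fence. */
struct fence {
    int ring;       /* ring the fence was emitted on */
    bool signaled;  /* has it already signaled? */
};

/* Model of the fixed radeon_cs_sync_rings() logic: after the patch,
 * any unsignaled fence on a foreign ring marks that ring for sync --
 * there is no RADEON_RELOC_DONT_SYNC escape hatch anymore. */
static bool rings_to_sync(const struct fence *fences, size_t n,
                          int submit_ring, bool sync_to[NUM_RINGS])
{
    bool need_sync = false;

    for (size_t i = 0; i < n; i++) {
        if (fences[i].ring != submit_ring && !fences[i].signaled) {
            sync_to[fences[i].ring] = true;
            need_sync = true;
        }
    }
    return need_sync;
}
```

This is exactly the hazard the commit message describes: a bo move scheduled by ttm on the GFX ring must complete before another ring reads the bo's data.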
Include request for SA improvements
Hi Dave & Jerome and everybody on the list,

I can't find any more bugs and I'm also out of things to test, so I really hope that this is the last incarnation of this patchset; if Jerome is ok with it, it should now be included into drm-next.

Cheers,
Christian.
[PATCH 2/2 v3] drm/exynos: added userptr feature.
this feature is used to import user space region allocated by malloc() or mmaped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out so we limit user-desired userptr size to pre-defined. Signed-off-by: Inki Dae Signed-off-by: Kyungmin Park --- drivers/gpu/drm/exynos/exynos_drm_drv.c |2 + drivers/gpu/drm/exynos/exynos_drm_gem.c | 334 +++ drivers/gpu/drm/exynos/exynos_drm_gem.h | 17 ++- include/drm/exynos_drm.h| 26 +++- 4 files changed, 376 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c index 1e68ec2..e8ae3f1 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c @@ -220,6 +220,8 @@ static struct drm_ioctl_desc exynos_ioctls[] = { DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MAP_OFFSET, exynos_drm_gem_map_offset_ioctl, DRM_UNLOCKED | DRM_AUTH), + DRM_IOCTL_DEF_DRV(EXYNOS_GEM_USERPTR, + exynos_drm_gem_userptr_ioctl, DRM_UNLOCKED), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MMAP, exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index e6abb66..ccc6e3d 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -68,6 +68,83 @@ static int check_gem_flags(unsigned int flags) return 0; } +static struct vm_area_struct *get_vma(struct vm_area_struct *vma) +{ + struct vm_area_struct *vma_copy; + + vma_copy = kmalloc(sizeof(*vma_copy), GFP_KERNEL); + if (!vma_copy) + return NULL; + + if (vma->vm_ops && vma->vm_ops->open) + vma->vm_ops->open(vma); + + if (vma->vm_file) + get_file(vma->vm_file); + + memcpy(vma_copy, vma, sizeof(*vma)); + + vma_copy->vm_mm = NULL; + vma_copy->vm_next = 
NULL; + vma_copy->vm_prev = NULL; + + return vma_copy; +} + + +static void put_vma(struct vm_area_struct *vma) +{ + if (!vma) + return; + + if (vma->vm_ops && vma->vm_ops->close) + vma->vm_ops->close(vma); + + if (vma->vm_file) + fput(vma->vm_file); + + kfree(vma); +} + +/* + * lock_userptr_vma - lock VMAs within user address space + * + * this function locks vma within user address space to avoid pages + * to the userspace from being swapped out. + * if this vma isn't locked, the pages to the userspace could be swapped out + * so unprivileged user might access different pages and dma of any device + * could access physical memory region not intended once swap-in. + */ +static int lock_userptr_vma(struct exynos_drm_gem_buf *buf, unsigned int lock) +{ + struct vm_area_struct *vma; + unsigned long start, end; + + start = buf->userptr; + end = buf->userptr + buf->size - 1; + + down_write(>mm->mmap_sem); + + do { + vma = find_vma(current->mm, start); + if (!vma) { + up_write(>mm->mmap_sem); + return -EFAULT; + } + + if (lock) + vma->vm_flags |= VM_LOCKED; + else + vma->vm_flags &= ~VM_LOCKED; + + start = vma->vm_end + 1; + } while (vma->vm_end < end); + + up_write(>mm->mmap_sem); + + return 0; +} + static void update_vm_cache_attr(struct exynos_drm_gem_obj *obj, struct vm_area_struct *vma) { @@ -256,6 +333,44 @@ static void exynos_drm_gem_put_pages(struct drm_gem_object *obj) /* add some codes for UNCACHED type here. 
TODO */ } +static void exynos_drm_put_userptr(struct drm_gem_object *obj) +{ + struct exynos_drm_gem_obj *exynos_gem_obj; + struct exynos_drm_gem_buf *buf; + struct vm_area_struct *vma; + int npages; + + exynos_gem_obj = to_exynos_gem_obj(obj); + buf = exynos_gem_obj->buffer; + vma = exynos_gem_obj->vma; + + if (vma && (vma->vm_flags & VM_PFNMAP) && (vma->vm_pgoff)) { + put_vma(exynos_gem_obj->vma); + goto out; + } + + npages = buf->size >> PAGE_SHIFT; + + if (exynos_gem_obj->flags & EXYNOS_BO_USERPTR && !buf->pfnmap) + lock_userptr_vma(buf, 0); + + npages--; + while (npages >= 0) { + if (buf->write) + set_page_dirty_lock(buf->pages[npages]); + + put_page(buf->pages[npages]); + npages--; + } + +out: +
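The lock_userptr_vma() helper in the patch walks the VMAs covering [userptr, userptr + size) and toggles VM_LOCKED on each, failing with -EFAULT when the walk runs off the mapping. A simplified userspace model of that walk is below, using half-open [vm_start, vm_end) ranges; an explicit hole check is added for clarity, where the kernel version leans on find_vma() semantics. All of this is a sketch, not the exynos code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy VMA: half-open [vm_start, vm_end) address range. */
struct vma {
    unsigned long vm_start, vm_end;
    bool locked;
};

/* Like the kernel's find_vma(): first VMA with vm_end > addr,
 * assuming the array is sorted by address. */
static struct vma *find_vma(struct vma *vmas, size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (vmas[i].vm_end > addr)
            return &vmas[i];
    return NULL;
}

/* Model of lock_userptr_vma(): lock/unlock every VMA that overlaps
 * the userptr buffer, or fail with -EFAULT (-14) on a hole. */
static int lock_userptr_range(struct vma *vmas, size_t n,
                              unsigned long userptr, unsigned long size,
                              bool lock)
{
    unsigned long start = userptr;
    unsigned long end = userptr + size - 1;
    struct vma *v;

    do {
        v = find_vma(vmas, n, start);
        if (!v || v->vm_start > start)
            return -14; /* -EFAULT: unmapped gap in the range */
        v->locked = lock;
        start = v->vm_end;
    } while (v->vm_end <= end);

    return 0;
}
```

Note this model also illustrates Jerome's objection downthread: the walk locks whole VMAs, so a 1MB userptr inside a 16MB VMA pins all 16MB.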
[PATCH 1/2 v3] drm/exynos: added userptr limit ioctl.
this ioctl is used to limit user-desired userptr size as pre-defined and also could be accessed by only root user. with userptr feature, unprivileged user can allocate all the pages on system, so the amount of free physical pages will be very limited. if the VMAs within user address space was locked, the pages couldn't be swapped out so it may result in significant degradation of system performance. so this feature would be used to avoid such situation. Signed-off-by: Inki Dae Signed-off-by: Kyungmin Park --- drivers/gpu/drm/exynos/exynos_drm_drv.c |6 ++ drivers/gpu/drm/exynos/exynos_drm_drv.h |6 ++ drivers/gpu/drm/exynos/exynos_drm_gem.c | 22 ++ drivers/gpu/drm/exynos/exynos_drm_gem.h |3 +++ include/drm/exynos_drm.h| 17 + 5 files changed, 54 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c index 9d3204c..1e68ec2 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c @@ -64,6 +64,9 @@ static int exynos_drm_load(struct drm_device *dev, unsigned long flags) return -ENOMEM; } + /* maximum size of userptr is limited to 16MB as default. 
*/ + private->userptr_limit = SZ_16M; + INIT_LIST_HEAD(>pageflip_event_list); dev->dev_private = (void *)private; @@ -221,6 +224,9 @@ static struct drm_ioctl_desc exynos_ioctls[] = { exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, exynos_drm_gem_get_ioctl, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(EXYNOS_USER_LIMIT, + exynos_drm_gem_user_limit_ioctl, DRM_MASTER | + DRM_ROOT_ONLY), DRM_IOCTL_DEF_DRV(EXYNOS_PLANE_SET_ZPOS, exynos_plane_set_zpos_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_VIDI_CONNECTION, diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.h b/drivers/gpu/drm/exynos/exynos_drm_drv.h index c82c90c..b38ed6f 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.h +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.h @@ -235,6 +235,12 @@ struct exynos_drm_private { * this array is used to be aware of which crtc did it request vblank. */ struct drm_crtc *crtc[MAX_CRTC]; + + /* +* maximum size of allocation by userptr feature. +* - as default, this has 16MB and only root user can change it. 
+*/ + unsigned long userptr_limit; }; /* diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index fc91293..e6abb66 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -33,6 +33,8 @@ #include "exynos_drm_gem.h" #include "exynos_drm_buf.h" +#define USERPTR_MAX_SIZE SZ_64M + static unsigned int convert_to_vm_err_msg(int msg) { unsigned int out_msg; @@ -630,6 +632,26 @@ int exynos_drm_gem_get_ioctl(struct drm_device *dev, void *data, return 0; } +int exynos_drm_gem_user_limit_ioctl(struct drm_device *dev, void *data, + struct drm_file *filp) +{ + struct exynos_drm_private *priv = dev->dev_private; + struct drm_exynos_user_limit *limit = data; + + if (limit->userptr_limit < PAGE_SIZE || + limit->userptr_limit > USERPTR_MAX_SIZE) { + DRM_DEBUG_KMS("invalid userptr_limit size.\n"); + return -EINVAL; + } + + if (priv->userptr_limit == limit->userptr_limit) + return 0; + + priv->userptr_limit = limit->userptr_limit; + + return 0; +} + int exynos_drm_gem_init_object(struct drm_gem_object *obj) { DRM_DEBUG_KMS("%s\n", __FILE__); diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.h b/drivers/gpu/drm/exynos/exynos_drm_gem.h index 14d038b..3334c9f 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.h +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.h @@ -78,6 +78,9 @@ struct exynos_drm_gem_obj { struct page **exynos_gem_get_pages(struct drm_gem_object *obj, gfp_t gfpmask); +int exynos_drm_gem_user_limit_ioctl(struct drm_device *dev, void *data, + struct drm_file *filp); + /* destroy a buffer with gem object */ void exynos_drm_gem_destroy(struct exynos_drm_gem_obj *exynos_gem_obj); diff --git a/include/drm/exynos_drm.h b/include/drm/exynos_drm.h index 54c97e8..52465dc 100644 --- a/include/drm/exynos_drm.h +++ b/include/drm/exynos_drm.h @@ -92,6 +92,19 @@ struct drm_exynos_gem_info { }; /** + * A structure to userptr limited information. + * + * @userptr_limit: maximum size to userptr buffer. 
+ * the buffer could be allocated by unprivileged user using malloc() + * and the size of the buffer would be limited as userptr_limit value. + * @pad: just padding to be 64-bit
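The validation in exynos_drm_gem_user_limit_ioctl() is plain range checking: the new limit must lie between PAGE_SIZE and the compile-time USERPTR_MAX_SIZE (64MB), with 16MB as the boot-time default. Modeled standalone below, with errno values written as plain integers; the globals stand in for the per-device private data.

```c
/* Model of the userptr-limit ioctl's validation path. */
#define PAGE_SIZE_MODEL   4096UL
#define SZ_16M_MODEL      (16UL << 20)
#define SZ_64M_MODEL      (64UL << 20)
#define USERPTR_MAX_SIZE_MODEL SZ_64M_MODEL

/* Default from the patch: private->userptr_limit = SZ_16M. */
static unsigned long userptr_limit = SZ_16M_MODEL;

static int set_userptr_limit(unsigned long new_limit)
{
    /* Reject anything below one page or above the hard ceiling. */
    if (new_limit < PAGE_SIZE_MODEL || new_limit > USERPTR_MAX_SIZE_MODEL)
        return -22; /* -EINVAL */

    /* Setting the same value again is a no-op success. */
    userptr_limit = new_limit;
    return 0;
}
```

In the real driver the ioctl is additionally gated with DRM_MASTER | DRM_ROOT_ONLY, so only a privileged caller can raise the pinned-memory ceiling.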
[PATCH 0/2 v3] drm/exynos: added userptr feature
this feature could be used to use memory region allocated by malloc() in user mode and mmaped memory region allocated by other memory allocators. userptr interface can identify memory type through vm_flags value and would get pages or page frame numbers to user space appropriately. changelog v2: the memory region mmaped with VM_PFNMAP type is physically continuous and start address of the memory region should be set into buf->dma_addr but previous patch had a problem that end address is set into buf->dma_addr so v2 fixes that problem. changelog v3: mitigated the issues pointed out by Dave and Jerome. for this, added some codes to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out so added one more feature that we can limit user-desired userptr size to pre-defined value using userptr limit ioctl that can be accessed by only root user. these issues had been pointed out by Dave and Jerome. Inki Dae (2): drm/exynos: added userptr limit ioctl. drm/exynos: added userptr feature. drivers/gpu/drm/exynos/exynos_drm_drv.c |8 + drivers/gpu/drm/exynos/exynos_drm_drv.h |6 + drivers/gpu/drm/exynos/exynos_drm_gem.c | 356 +++ drivers/gpu/drm/exynos/exynos_drm_gem.h | 20 ++- include/drm/exynos_drm.h| 43 - 5 files changed, 430 insertions(+), 3 deletions(-) -- 1.7.4.1
[PATCH 2/2 v3] drm/exynos: added userptr feature.
On Wed, May 9, 2012 at 10:45 AM, Jerome Glisse wrote: > On Wed, May 9, 2012 at 2:17 AM, Inki Dae wrote: >> this feature is used to import user space region allocated by malloc() or >> mmaped into a gem. and to guarantee the pages to user space not to be >> swapped out, the VMAs within the user space would be locked and then unlocked >> when the pages are released. >> >> but this lock might result in significant degradation of system performance >> because the pages couldn't be swapped out so we limit user-desired userptr >> size to pre-defined. >> >> Signed-off-by: Inki Dae >> Signed-off-by: Kyungmin Park > > > Again i would like feedback from mm people (adding cc). I am not sure > locking the vma is the right anwser as i said in my previous mail, > userspace can munlock it in your back, maybe VM_RESERVED is better. > Anyway even not considering that you don't check at all that process > don't go over the limit of locked page see mm/mlock.c RLIMIT_MEMLOCK > for how it's done. Also you mlock complete vma but the userptr you get > might be inside say 16M vma and you only care about 1M of userptr, if > you mark the whole vma as locked than anytime a new page is fault in > the vma else where than in the buffer you are interested then it got > allocated for ever until the gem buffer is destroy, i am not sure of > what happen to the vma on next malloc if it grows or not (i would > think it won't grow at it would have different flags than new > anonymous memory). > > The whole business of directly using malloced memory for gpu is fishy > and i would really like to get it right rather than relying on never > hitting strange things like page migration, vma merging, or worse > things like over locking pages and stealing memory. > > Cheers, > Jerome I had a lengthy discussion with mm people (thx a lot for that). I think we should split 2 different use case. The zero-copy upload case ie : app: ptr = malloc() ... 
glTex/VBO/UBO/...(ptr) free(ptr) or reuse it for other things For which i guess you want to avoid having to do a memcpy inside the gl library (could be anything else than gl that have same useage pattern). ie after the upload happen you don't care about those page they can removed from the vma or marked as cow so that anything messing with those page after the upload won't change what you uploaded. Of course this is assuming that the tlb cost of doing such thing is smaller than the cost of memcpy the data. Two way to do that, either you assume app can't not read back data after gl can and you do an unmap_mapping_range (make sure you only unmap fully covered page and that you copy non fully covered page) or you want to allow userspace to still read data or possibly overwrite them Second use case is something more like for the opencl case of CL_MEM_USE_HOST_PTR, in which you want to use the same page in the gpu and keep the userspace vma pointing to those page. I think the agreement on this case is that there is no way right now to do it sanely inside linux kernel. mlocking will need proper accounting against rtlimit but this limit might be low. Also the fork case might be problematic. For the fork case the memory is anonymous so it should be COWed in the fork child but relative to cl context that means the child could not use the cl context with that memory or at least if the child write to this memory the cl will not see those change. I guess the answer to that one is that you really need to use the cl api to read the object or get proper ptr to read it. Anyway in all case, implementing this userptr thing need a lot more code. You have to check to that the vma you are trying to use is anonymous and only handle this case and fallback to alloc new page and copy otherwise.. Cheers, Jerome
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632

--- Comment #9 from execute.method at gmail.com 2012-05-09 06:36:57 PDT ---
No, there is nothing else in dmesg. Is there any more info you'd like me to gather?

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632

--- Comment #8 from Alex Deucher 2012-05-09 06:14:17 PDT ---
(In reply to comment #6)
> Ok. I didn't realize that. I have also tried with 3.4rc8, with the same
> result.
> Did i miss something in building the kernel?

Are you getting the same error about forbidden register 0x00028354? That register is in the allowed list for 3.4 so you shouldn't be getting that error.

> Also, what about projectM without streamout enabled?

Is there anything else in dmesg other than *ERROR* Failed to parse relocation -12?
[PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Wed, May 9, 2012 at 10:23 AM, Jerome Glisse wrote:
> On Wed, May 9, 2012 at 9:40 AM, Alex Deucher wrote:
>> On Fri, May 4, 2012 at 11:06 AM, ? wrote:
>>> From: Jerome Glisse
>>>
>>> It seems the iMac panel doesn't like when we change the hot plug setup
>>> and then refuses to work. This helps but doesn't fully fix:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=726143
>>
>> How does it help? Does it fix the aux problems, but the monitor
>> still doesn't train? What's the working value of the relevant
>> DC_HPD*_CONTROL register?
>>
>> Alex
>
> Don't have the hw but somehow the way we program this reg completely
> disables the panel, after that the panel doesn't answer to anything
> (nor i2c nor any aux transaction). Without programming that, link
> training is successful but the panel stays black. I can ask to get the
> value before and after.

Patch seems reasonable in general (we don't really need hpd to be explicitly enabled for lvds or edp) so:

Reviewed-by: Alex Deucher

> Cheers,
> Jerome
[PATCH] drm/radeon/kms: fix warning on 32-bit in atomic fence printing
On Wed, May 9, 2012 at 12:28 PM, Dave Airlie wrote:
> From: Dave Airlie
>
> /ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c: In function
> 'radeon_debugfs_fence_info':
> /ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c:606:7: warning:
> format '%lx' expects argument of type 'long unsigned int', but argument 3 has
> type 'long long int' [-Wformat]
>
> Signed-off-by: Dave Airlie

Reviewed-by: Jerome Glisse

> ---
>  drivers/gpu/drm/radeon/radeon_fence.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c
> b/drivers/gpu/drm/radeon/radeon_fence.c
> index 48ec5e3..11f5f40 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -602,8 +602,8 @@ static int radeon_debugfs_fence_info(struct seq_file *m,
> void *data)
>                        continue;
>
>                seq_printf(m, "--- ring %d ---\n", i);
> -               seq_printf(m, "Last signaled fence 0x%016lx\n",
> -                          atomic64_read(&rdev->fence_drv[i].last_seq));
> +               seq_printf(m, "Last signaled fence 0x%016llx\n",
> +                          (unsigned long long)atomic64_read(&rdev->fence_drv[i].last_seq));
>                seq_printf(m, "Last emitted  0x%016llx\n",
>                           rdev->fence_drv[i].seq);
>        }
> --
> 1.7.7.6
>
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hello Tomasz, Laurent,

I have printed some logs during the dmabuf export and attach for the SGT issue below. Please find it in the attachment. I hope it will be useful.

Regards,
Subash

On 05/08/2012 04:45 PM, Subash Patel wrote:
> Hi Laurent,
>
> On 05/08/2012 02:44 PM, Laurent Pinchart wrote:
>> Hi Subash,
>>
>> On Monday 07 May 2012 20:08:25 Subash Patel wrote:
>>> Hello Tomasz, Laurent,
>>>
>>> I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that
>>> during the attach, the size of the SGT and the size requested mismatched
>>> (by at least 8k bytes). Hence I made a small correction to the code as
>>> below. I could then attach the importer properly.
>>
>> Thank you for the report.
>>
>> Could you print the content of the sglist (number of chunks and size of each
>> chunk) before and after your modifications, as well as the values of n_pages,
>> offset and size?
> I will put back all the printk's and generate this. As of now, my setup
> has changed and will do this when I get some time.
>> >>> On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: >> >> [snip] >> +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned int n_pages, unsigned long offset, unsigned long size) +{ + struct sg_table *sgt; + unsigned int chunks; + unsigned int i; + unsigned int cur_page; + int ret; + struct scatterlist *s; + + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + /* compute number of chunks */ + chunks = 1; + for (i = 1; i< n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); + return ERR_PTR(-ENOMEM); + } + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { + unsigned long chunk_size; + unsigned int j; >>> >>> size = PAGE_SIZE; >>> + + for (j = cur_page + 1; j< n_pages; ++j) >>> >>> for (j = cur_page + 1; j< n_pages; ++j) { >>> + if (pages[j] != pages[j - 1] + 1) + break; >>> >>> size += PAGE >>> } >>> + + chunk_size = ((j - cur_page)<< PAGE_SHIFT) - offset; + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); >>> >>> [DELETE] size -= chunk_size; >>> + offset = 0; + cur_page = j; + } + + return sgt; +} >> > Regards, > Subash -- next part -- [ 178.545000] vb2_dc_pages_to_sgt() sgt->orig_nents=2 [ 178.545000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.55] vb2_dc_pages_to_sgt():84 offset=0 [ 178.555000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.56] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.565000] vb2_dc_pages_to_sgt() sgt->orig_nents=2 [ 178.57] vb2_dc_pages_to_sgt():83 cur_page=32 [ 178.575000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.58] vb2_dc_pages_to_sgt():86 chunk_size=262144 [ 178.585000] vb2_dc_pages_to_sgt():89 size=4294574080 [ 178.59] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.595000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.60] vb2_dc_pages_to_sgt():84 offset=0 [ 178.605000] vb2_dc_pages_to_sgt():86 
chunk_size=8192 [ 178.61] vb2_dc_pages_to_sgt():89 size=4294959104 [ 178.625000] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.625000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.63] vb2_dc_pages_to_sgt():84 offset=0 [ 178.635000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.64] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.645000] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.65] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.655000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.66] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.665000] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.67] vb2_dc_mmap: mapped dma addr 0x2006 at 0xb6e01000, size 131072 [ 178.67] vb2_dc_mmap: mapped dma addr 0x2008 at 0xb6de1000, size 131072 [ 178.68] vb2_dc_pages_to_sgt() sgt->orig_nents=2 [ 178.685000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.69] vb2_dc_pages_to_sgt():84 offset=0 [ 178.695000] vb2_dc_pages_to_sgt():86 chunk_size=4096 [ 178.70] vb2_dc_pages_to_sgt():89 size=4294963200 [ 178.705000] vb2_dc_pages_to_sgt() sgt->orig_nents=2 [ 178.71] vb2_dc_pages_to_sgt():83 cur_page=1 [ 178.715000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.715000] vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.72] vb2_dc_pages_to_sgt():89 size=4294955008 [ 178.725000] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.73] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.735000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.74] vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.745000] vb2_dc_pages_to_sgt():89 size=4294959104 [ 178.75] vb2_dc_pages_to_sgt() sgt->orig_nents=1 [ 178.755000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.76]
[PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hi Subash, Could you post the code of vb2_dc_pages_to_sgt with all printk in it. It will help us avoid guessing where and what is debugged in the log. Moreover, I found a line 'size=4294836224' in the log. It means that size is equal to -131072 (!?!) or there are some invalid conversions in printk. Are you suze that you do not pass size = 0 as the function argument? Notice that earlier versions of dmabuf-for-vb2 patches has offset2 argument instead of size. It was the offset at the end of the buffer. In (I guess) 95% of cases this offset was 0. Could you provide only function arguments that causes the failure? I mean pages array + size (I assume that offset is zero for your test). Having the arguments we could reproduce that bug. Regards, Tomasz Stanislawski On 05/09/2012 08:46 AM, Subash Patel wrote: > Hello Tomasz, Laurent, > > I have printed some logs during the dmabuf export and attach for the SGT > issue below. Please find it in the attachment. I hope > it will be useful. > > Regards, > Subash > > On 05/08/2012 04:45 PM, Subash Patel wrote: >> Hi Laurent, >> >> On 05/08/2012 02:44 PM, Laurent Pinchart wrote: >>> Hi Subash, >>> >>> On Monday 07 May 2012 20:08:25 Subash Patel wrote: Hello Thomasz, Laurent, I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that during the attach, the size of the SGT and size requested mis-matched (by atleast 8k bytes). Hence I made a small correction to the code as below. I could then attach the importer properly. >>> >>> Thank you for the report. >>> >>> Could you print the content of the sglist (number of chunks and size >>> of each >>> chunk) before and after your modifications, as well as the values of >>> n_pages, >>> offset and size ? >> I will put back all the printk's and generate this. As of now, my setup >> has changed and will do this when I get sometime. 
>>> On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: >>> >>> [snip] >>> > +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, > + unsigned int n_pages, unsigned long offset, unsigned long size) > +{ > + struct sg_table *sgt; > + unsigned int chunks; > + unsigned int i; > + unsigned int cur_page; > + int ret; > + struct scatterlist *s; > + > + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); > + if (!sgt) > + return ERR_PTR(-ENOMEM); > + > + /* compute number of chunks */ > + chunks = 1; > + for (i = 1; i< n_pages; ++i) > + if (pages[i] != pages[i - 1] + 1) > + ++chunks; > + > + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); > + if (ret) { > + kfree(sgt); > + return ERR_PTR(-ENOMEM); > + } > + > + /* merging chunks and putting them into the scatterlist */ > + cur_page = 0; > + for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { > + unsigned long chunk_size; > + unsigned int j; size = PAGE_SIZE; > + > + for (j = cur_page + 1; j< n_pages; ++j) for (j = cur_page + 1; j< n_pages; ++j) { > + if (pages[j] != pages[j - 1] + 1) > + break; size += PAGE } > + > + chunk_size = ((j - cur_page)<< PAGE_SHIFT) - offset; > + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); [DELETE] size -= chunk_size; > + offset = 0; > + cur_page = j; > + } > + > + return sgt; > +} >>> >> Regards, >> Subash
[PATCH 14/20] drm/radeon: multiple ring allocator v2
On 08.05.2012 16:55, Jerome Glisse wrote:
> Still I don't want to loop more than necessary, it's bad; I am ok with:
> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch
>
> If there is a fence signaled it will retry 2 times at most, otherwise it
> will go to wait, and that way is better. Because with your while loop the
> worst case is something proportional to the manager size; given it's 1MB
> it can loop for a long long time.

Yeah, this loop can indeed consume quite some time.

Ok, then let's at least give every ring a chance to supply some holes, otherwise I fear that we might not even find something worth waiting for after only 2 tries.

Going to send that out after figuring out why the patchset still causes texture corruptions on my system.

Christian.
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632

--- Comment #7 from execute.method at gmail.com 2012-05-09 04:15:06 PDT ---
(In reply to comment #6)
> Ok. I didn't realize that. I have also tried with 3.4rc8, with the same
> result.
> Did i miss something in building the kernel?
>
> Also, what about projectM without streamout enabled?

Sorry, meant rc5.
[PATCH] mm: Work around Intel SNB GTT bug with some physical pages.
On Tue, May 08, 2012 at 02:57:25PM -0700, Hugh Dickins wrote: > On Mon, 7 May 2012, Stephane Marchesin wrote: > > > While investing some Sandy Bridge rendering corruption, I found out > > that all physical memory pages below 1MiB were returning garbage when > > read through the GTT. This has been causing graphics corruption (when > > it's used for textures, render targets and pixmaps) and GPU hangups > > (when it's used for GPU batch buffers). > > > > I talked with some people at Intel and they confirmed my findings, > > and said that a couple of other random pages were also affected. > > > > We could fix this problem by adding an e820 region preventing the > > memory below 1 MiB to be used, but that prevents at least my machine > > from booting. One could think that we should be able to fix it in > > i915, but since the allocation is done by the backing shmem this is > > not possible. > > > > In the end, I came up with the ugly workaround of just leaking the > > offending pages in shmem.c. I do realize it's truly ugly, but I'm > > looking for a fix to the existing code, and am wondering if people on > > this list have a better idea, short of rewriting i915_gem.c to > > allocate its own pages directly. > > > > Signed-off-by: Stephane Marchesin > > Well done for discovering and pursuing this issue, but of course (as > you know: you're trying to provoke us to better) your patch is revolting. > > And not even enough: swapin readahead and swapoff can read back > from swap into pages which the i915 will later turn out to dislike. > > I do have a shmem.c patch coming up for gma500, which cannot use pages > over 4GB; but that fits more reasonably with memory allocation policies, > where we expect that anyone who can use a high page can use a lower as > well, and there's already __GFP_DMA32 to set the limit. > > Your limitation is at the opposite end, so that patch won't help you at > all. 
And I don't see how Andi's ZONE_DMA exclusion would work, without > equal hackery to enable private zonelists, avoiding that convention. > > i915 is not the only user of shmem, and x86 not the only architecture: > we're not going to make everyone suffer for this. Once the memory > allocator gets down to giving you the low 1MB, my guess is that it's > already short of memory, and liable to deadlock or OOM if you refuse > and soak up every page it then gives you. Even if i915 has to live > with that possibility, we're not going to extend it to everyone else. > > arch/x86/Kconfig has X86_RESERVE_LOW, default 64, range 4 640 (and > I think we reserve all the memory range from 640kB to 1MB anyway). > Would setting that to 640 allow you to boot, and avoid the i915 > problem on all but the odd five pages? I'm not pretending that's > an ideal solution at all (unless freeing initmem could release most > of it on non-SandyBridge and non-i915 machines), but it would be > useful to know if that does provide a stopgap solution. If that > does work, maybe we just mark the odd five PageReserved at startup. Hm, as a stopgap measure to make Sandybridge gpus not die that sounds pretty good. But we still need a more generic solution for the long-term, see below > Is there really no way this can be handled closer to the source of > the problem, in the i915 driver itself? I do not know the flow of > control in i915 (and X) at all, but on the surface it seems that the > problem only comes when you map these problematic pages into the GTT > (if I'm even using the right terminology), and something (not shmem.c) > actively has to do that. > > Can't you check the pfn at that point, and if it's an unsuitable page, > copy into a suitable page (perhaps allocated then, perhaps from a pool > you primed earlier) and map that suitable page into the GTT instead? > Maybe using page->private to link them if that helps. 
> > So long as the page (or its shadow) is mapped into the GTT, I imagine > it would be pinned, and not liable to be swapped out or otherwise > interfered with by shmem.c. And when you unmap it from GTT, copy > back to the unsuitable shmem object page before unpinning. > > I fully accept that I have very little understanding of GPU DRM GTT > and i915, and this may be impossible or incoherent: but please, let's > try to keep the strangeness where it belongs. If necessary, we'll > have add some kind of flag and callback from shmem.c to the driver; > but I'd so much prefer to avoid that. The copy stuff back approach is pretty much what ttm uses atm: It allocates suitable pages with whatever means it has (usually through the dma api) and if the shrinker callback tells it that it's sitting on too much memory, it copies stuff out to the shmem backing storage used by gem. There are quite a few issues with that approach: - We expose mmap to the shmem file directly to userspace in i915. We use these extensively on Sandybridge because there direct cpu access is coherent with what the gpu does. Original userspace would always
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632 --- Comment #6 from execute.method at gmail.com 2012-05-09 03:55:23 PDT --- Ok. I didn't realize that. I have also tried with 3.4rc8, with the same result. Did i miss something in building the kernel? Also, what about projectM without streamout enabled?
[PATCH 2/2 v3] drm/exynos: added userptr feature.
On Wed, May 9, 2012 at 2:17 AM, Inki Dae wrote: > this feature is used to import user space region allocated by malloc() or > mmaped into a gem. and to guarantee the pages to user space not to be > swapped out, the VMAs within the user space would be locked and then unlocked > when the pages are released. > > but this lock might result in significant degradation of system performance > because the pages couldn't be swapped out so we limit user-desired userptr > size to pre-defined. > > Signed-off-by: Inki Dae > Signed-off-by: Kyungmin Park Again I would like feedback from mm people (adding cc). I am not sure locking the VMA is the right answer; as I said in my previous mail, userspace can munlock it behind your back, so maybe VM_RESERVED is better. Anyway, even setting that aside, you don't check at all that the process doesn't go over the locked-page limit; see the RLIMIT_MEMLOCK handling in mm/mlock.c for how it's done. Also, you mlock the complete VMA, but the userptr you get might be inside, say, a 16M VMA of which you only care about 1M of userptr; if you mark the whole VMA as locked, then any new page faulted into the VMA elsewhere than in the buffer you are interested in stays allocated forever, until the gem buffer is destroyed. I am also not sure what happens to the VMA on the next malloc, whether it grows or not (I would think it won't grow, as it would have different flags than new anonymous memory). The whole business of directly using malloced memory for the GPU is fishy, and I would really like to get it right rather than rely on never hitting strange things like page migration, VMA merging, or worse things like over-locking pages and stealing memory. Cheers, Jerome > --- > drivers/gpu/drm/exynos/exynos_drm_drv.c | 2 + > drivers/gpu/drm/exynos/exynos_drm_gem.c | 334 > +++ > drivers/gpu/drm/exynos/exynos_drm_gem.h | 17 ++- > include/drm/exynos_drm.h | 
26 +++- > ?4 files changed, 376 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c > b/drivers/gpu/drm/exynos/exynos_drm_drv.c > index 1e68ec2..e8ae3f1 100644 > --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c > +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c > @@ -220,6 +220,8 @@ static struct drm_ioctl_desc exynos_ioctls[] = { > ? ? ? ?DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MAP_OFFSET, > ? ? ? ? ? ? ? ? ? ? ? ?exynos_drm_gem_map_offset_ioctl, DRM_UNLOCKED | > ? ? ? ? ? ? ? ? ? ? ? ?DRM_AUTH), > + ? ? ? DRM_IOCTL_DEF_DRV(EXYNOS_GEM_USERPTR, > + ? ? ? ? ? ? ? ? ? ? ? exynos_drm_gem_userptr_ioctl, DRM_UNLOCKED), > ? ? ? ?DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MMAP, > ? ? ? ? ? ? ? ? ? ? ? ?exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), > ? ? ? ?DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, > diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c > b/drivers/gpu/drm/exynos/exynos_drm_gem.c > index e6abb66..ccc6e3d 100644 > --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c > +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c > @@ -68,6 +68,83 @@ static int check_gem_flags(unsigned int flags) > ? ? ? ?return 0; > ?} > > +static struct vm_area_struct *get_vma(struct vm_area_struct *vma) > +{ > + ? ? ? struct vm_area_struct *vma_copy; > + > + ? ? ? vma_copy = kmalloc(sizeof(*vma_copy), GFP_KERNEL); > + ? ? ? if (!vma_copy) > + ? ? ? ? ? ? ? return NULL; > + > + ? ? ? if (vma->vm_ops && vma->vm_ops->open) > + ? ? ? ? ? ? ? vma->vm_ops->open(vma); > + > + ? ? ? if (vma->vm_file) > + ? ? ? ? ? ? ? get_file(vma->vm_file); > + > + ? ? ? memcpy(vma_copy, vma, sizeof(*vma)); > + > + ? ? ? vma_copy->vm_mm = NULL; > + ? ? ? vma_copy->vm_next = NULL; > + ? ? ? vma_copy->vm_prev = NULL; > + > + ? ? ? return vma_copy; > +} > + > + > +static void put_vma(struct vm_area_struct *vma) > +{ > + ? ? ? if (!vma) > + ? ? ? ? ? ? ? return; > + > + ? ? ? if (vma->vm_ops && vma->vm_ops->close) > + ? ? ? ? ? ? ? vma->vm_ops->close(vma); > + > + ? ? ? if (vma->vm_file) > + ? ? ? ? ? ? ? 
fput(vma->vm_file); > + > + ? ? ? kfree(vma); > +} > + > +/* > + * lock_userptr_vma - lock VMAs within user address space > + * > + * this function locks vma within user address space to avoid pages > + * to the userspace from being swapped out. > + * if this vma isn't locked, the pages to the userspace could be swapped out > + * so unprivileged user might access different pages and dma of any device > + * could access physical memory region not intended once swap-in. > + */ > +static int lock_userptr_vma(struct exynos_drm_gem_buf *buf, unsigned int > lock) > +{ > + ? ? ? struct vm_area_struct *vma; > + ? ? ? unsigned long start, end; > + > + ? ? ? start = buf->userptr; > + ? ? ? end = buf->userptr + buf->size - 1; > + > + ? ? ? down_write(>mm->mmap_sem); > + > + ? ? ? do { > + ? ? ? ? ? ? ? vma = find_vma(current->mm, start); > + ? ? ? ? ? ? ? if (!vma) { > + ? ? ? ? ? ? ? ? ? ? ? up_write(>mm->mmap_sem); > + ? ? ? ? ? ? ? ? ? ? ? return -EFAULT; > + ? ? ? ? ? ? ? } > + > + ? ? ? ? ? ? ? if (lock) > + ? ? ? ? ? ? ? ? ? ? ? vma->vm_flags |= VM_LOCKED; > + ? ? ? ? ? ? ? else > + ? ? ?
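Jerome's main objection above is the missing locked-page accounting: before pinning user pages, a driver is expected to compare the process's already-locked pages plus the new request against RLIMIT_MEMLOCK, the way mm/mlock.c does. A minimal userspace sketch of that check follows; the helper name and fixed page shift are illustrative, not part of the patch.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumes 4 KiB pages, for illustration */

/* Hedged sketch of the mm/mlock.c-style check Jerome refers to: the new
 * request is allowed only if locked + new pages fit under the limit,
 * where the limit is rlimit(RLIMIT_MEMLOCK) converted to pages. */
static int userptr_memlock_ok(uint64_t locked_pages, uint64_t new_pages,
                              uint64_t rlim_bytes)
{
    uint64_t limit_pages = rlim_bytes >> SKETCH_PAGE_SHIFT;
    return locked_pages + new_pages <= limit_pages;
}
```

In the kernel the same comparison would be done under mmap_sem against current->mm->locked_vm, but the arithmetic is the same.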
Include request for SA improvements
On Wed, May 9, 2012 at 9:34 AM, Christian König wrote: > Hi Dave & Jerome and everybody on the list, > > I can't find any more bugs and also I'm out of things to test, so I really > hope that this is the last incarnation of this patchset, and if Jerome is > ok with it it should now be included into drm-next. > > Cheers, > Christian. > Yeah looks good to me. Cheers, Jerome
[PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Wed, May 9, 2012 at 9:40 AM, Alex Deucher wrote: > On Fri, May 4, 2012 at 11:06 AM, ? wrote: >> From: Jerome Glisse >> >> It seems the iMac panel doesn't like when we change the hot plug setup >> and then refuses to work. This helps but doesn't fully fix: >> https://bugzilla.redhat.com/show_bug.cgi?id=726143 > > How does it help? Does it fix the aux problems, but the monitor > still doesn't train? What's the working value of the relevant > DC_HPD*_CONTROL register? > > Alex Don't have the hw, but somehow the way we program this reg completely disables the panel; after that the panel doesn't answer to anything (neither i2c nor any aux transaction). Without programming it, link training is successful but the panel stays black. I can ask to get the value before and after. Cheers, Jerome
[PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Fri, May 4, 2012 at 11:06 AM, wrote: > From: Jerome Glisse > > It seems the iMac panel doesn't like when we change the hot plug setup > and then refuses to work. This helps but doesn't fully fix: > https://bugzilla.redhat.com/show_bug.cgi?id=726143 How does it help? Does it fix the aux problems, but the monitor still doesn't train? What's the working value of the relevant DC_HPD*_CONTROL register? Alex > > v2: fix typo and improve commit message > > Signed-off-by: Matthew Garrett > Signed-off-by: Jerome Glisse > --- > drivers/gpu/drm/radeon/r600.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c > index 694b6b2..a304c9d 100644 > --- a/drivers/gpu/drm/radeon/r600.c > +++ b/drivers/gpu/drm/radeon/r600.c > @@ -713,6 +713,14 @@ void r600_hpd_init(struct radeon_device *rdev) > list_for_each_entry(connector, &dev->mode_config.connector_list, head) > { > struct radeon_connector *radeon_connector = > to_radeon_connector(connector); > > + if (connector->connector_type == DRM_MODE_CONNECTOR_eDP || > + connector->connector_type == DRM_MODE_CONNECTOR_LVDS) { > + /* don't try to enable hpd on eDP or LVDS to avoid > breaking the > + * aux dp channel on imac and help (but not > completely fix) > + * https://bugzilla.redhat.com/show_bug.cgi?id=726143 > + */ > + continue; > + } > if (ASIC_IS_DCE3(rdev)) { > u32 tmp = DC_HPDx_CONNECTION_TIMER(0x9c4) | > DC_HPDx_RX_INT_TIMER(0xfa); > if (ASIC_IS_DCE32(rdev)) > -- > 1.7.9.3 > > ___ > dri-devel mailing list > dri-devel at lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49603] [regression] Fullscreen video no longer smooth with GPU in low power mode
https://bugs.freedesktop.org/show_bug.cgi?id=49603 --- Comment #5 from Simon Farnsworth 2012-05-09 02:30:15 PDT --- Created attachment 61271 --> https://bugs.freedesktop.org/attachment.cgi?id=61271 A program to stop the CPU entering low power states Trying a different route. Can you compile the attached program with "gcc -o stopsleep stopsleep.c" and leave it running while playing a video? It tells the kernel to avoid using deep sleep states when idling. If it helps, we have a clue. If it doesn't, and video decode doesn't saturate all your CPU cores, can you try running "while true ; do true ; done" in a background shell and see if that helps? The goal of both of these is to see if the problem is that we're now letting the CPU idle more than we used to, and finding that the resulting power save modes hurt.
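The stopsleep.c attachment itself is not preserved in this archive. A program matching Simon's description ("tells the kernel to avoid using deep sleep states when idling") is typically built on the PM QoS interface: write a 32-bit latency bound of 0 microseconds to /dev/cpu_dma_latency and keep the file descriptor open. The sketch below assumes that mechanism; the helper name is illustrative.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Open a PM QoS device node and request a wakeup-latency bound of `usec`
 * microseconds. The kernel honours the request only while the returned
 * fd stays open, so the caller must hold it (e.g. pause()) for the
 * duration of the test. O_CREAT is irrelevant for the real device node;
 * it only lets the helper be exercised against a plain file. */
static int request_cpu_latency(const char *path, int32_t usec)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, &usec, sizeof(usec)) != (ssize_t)sizeof(usec)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A stopsleep-style main() would call request_cpu_latency("/dev/cpu_dma_latency", 0) and then pause() until killed, at which point the kernel drops the constraint automatically.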
[PATCH] drm/radeon: eliminate redundant connector_names table
On Fri, May 4, 2012 at 11:25 AM, Ilija Hadzic wrote: > connector_names table is just a repeat of information that > already exists in drm_connector_enum_list and the same string > can be retrieved using drm_get_connector_name function. > > Nuke the redundant table and use the proper function to retrieve > the connector name. > > Signed-off-by: Ilija Hadzic Reviewed-by: Alex Deucher > --- > ?drivers/gpu/drm/radeon/radeon_display.c | ? 20 +--- > ?1 files changed, 1 insertions(+), 19 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_display.c > b/drivers/gpu/drm/radeon/radeon_display.c > index 8086c96..60aecc5 100644 > --- a/drivers/gpu/drm/radeon/radeon_display.c > +++ b/drivers/gpu/drm/radeon/radeon_display.c > @@ -572,24 +572,6 @@ static const char *encoder_names[36] = { > ? ? ? ?"TRAVIS", > ?}; > > -static const char *connector_names[15] = { > - ? ? ? "Unknown", > - ? ? ? "VGA", > - ? ? ? "DVI-I", > - ? ? ? "DVI-D", > - ? ? ? "DVI-A", > - ? ? ? "Composite", > - ? ? ? "S-video", > - ? ? ? "LVDS", > - ? ? ? "Component", > - ? ? ? "DIN", > - ? ? ? "DisplayPort", > - ? ? ? "HDMI-A", > - ? ? ? "HDMI-B", > - ? ? ? "TV", > - ? ? ? "eDP", > -}; > - > ?static const char *hpd_names[6] = { > ? ? ? ?"HPD1", > ? ? ? ?"HPD2", > @@ -612,7 +594,7 @@ static void radeon_print_display_setup(struct drm_device > *dev) > ? ? ? ?list_for_each_entry(connector, >mode_config.connector_list, head) > { > ? ? ? ? ? ? ? ?radeon_connector = to_radeon_connector(connector); > ? ? ? ? ? ? ? ?DRM_INFO("Connector %d:\n", i); > - ? ? ? ? ? ? ? DRM_INFO(" ?%s\n", > connector_names[connector->connector_type]); > + ? ? ? ? ? ? ? DRM_INFO(" ?%s\n", drm_get_connector_name(connector)); > ? ? ? ? ? ? ? ?if (radeon_connector->hpd.hpd != RADEON_HPD_NONE) > ? ? ? ? ? ? ? ? ? ? ? ?DRM_INFO(" ?%s\n", > hpd_names[radeon_connector->hpd.hpd]); > ? ? ? ? ? ? ? 
if (radeon_connector->ddc_bus) { > -- > 1.7.8.5
[PATCH] MAINTAINERS: switch drm/i915 to Daniel Vetter
On Wed, May 09, 2012 at 07:51:35AM +0100, Dave Airlie wrote: > On Tue, May 8, 2012 at 12:19 PM, Daniel Vetter > wrote: > > ... given that I essentially run the show already, let's make this > > official. > > Acked-by: Dave Airlie > > probably just push it via your -next. Will do. Thanks, Daniel -- Daniel Vetter Mail: daniel at ffwll.ch Mobile: +41 (0)79 365 57 48
[PATCH] MAINTAINERS: switch drm/i915 to Daniel Vetter
On Tue, May 8, 2012 at 12:19 PM, Daniel Vetter wrote: > ... given that I essentially run the show already, let's make this > official. Acked-by: Dave Airlie probably just push it via your -next. Dave.
[Bug 49110] debug build: AMDILCFGStructurizer.cpp:1751:3: error: 'isCurrentDebugType' was not declared in this scope
https://bugs.freedesktop.org/show_bug.cgi?id=49110 --- Comment #6 from Mike Mestnik 2012-05-08 20:25:23 PDT --- I've added 3 patches to http://llvm.org/bugs/show_bug.cgi?id=12759 and did my best to describe what/why. I believe that mesa also needs this done for its use of NDEBUG, especially if #if(s) are used to protect object exports in include files, as was the case with llvm. Either way, it's still possible for mesa to not directly trample over this project-global define. This is essentially what I did for llvm: #ifdef LLVM_NDEBUG #define NDEBUG LLVM_NDEBUG #endif #include In the case where assert.h is included in an include file, like FLAC and alsa do, then NDEBUG should be saved/restored around the assert.h include... Not that it'll do any good, as in those situations it's first come first served, and projects that use assert will likely include it very early.
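The save/restore that Mike suggests for NDEBUG around an assert.h include can be expressed with GCC/Clang's push_macro/pop_macro pragmas. A small sketch, assuming a compiler that supports those pragmas (the LLVM_NDEBUG indirection quoted above achieves the same effect portably):

```c
#include <stdio.h>

/* Save the project-wide NDEBUG, clear it for the duration of the
 * assert.h include so assert() stays live, then restore it. Because
 * assert.h defines its macro while NDEBUG is undefined, the asserts
 * below remain active regardless of the project's build flags. */
#pragma push_macro("NDEBUG")
#undef NDEBUG
#include <assert.h>
#pragma pop_macro("NDEBUG")

static int checked_div(int a, int b)
{
    assert(b != 0);   /* fires even in an NDEBUG build of the project */
    return a / b;
}
```

The same pattern works for any header, like the FLAC and alsa cases mentioned, as long as the shielded include is the first place assert.h is pulled in.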
[Bug 49603] [regression] Fullscreen video no longer smooth with GPU in low power mode
https://bugs.freedesktop.org/show_bug.cgi?id=49603 --- Comment #4 from Alex Deucher 2012-05-08 19:24:48 PDT --- Does forcing the CPU into the highest power state fix the issue? I suspect that the patch reduces CPU usage which in turn means the CPU power state stays lower longer.
[PATCH 0/2 v3] drm/exynos: added userptr feature
this feature could be used to use a memory region allocated by malloc() in user mode, or a mmaped memory region allocated by other memory allocators. the userptr interface can identify the memory type through the vm_flags value and would get pages or page frame numbers to user space appropriately. changelog v2: the memory region mmaped with VM_PFNMAP type is physically contiguous and the start address of the memory region should be set into buf->dma_addr, but the previous patch had a problem that the end address was set into buf->dma_addr, so v2 fixes that problem. changelog v3: mitigated the issues pointed out by Dave and Jerome. for this, added some code to guarantee that the pages to user space are not swapped out: the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so added one more feature so that we can limit the user-desired userptr size to a pre-defined value using a userptr limit ioctl that can be accessed only by the root user. these issues had been pointed out by Dave and Jerome. Inki Dae (2): drm/exynos: added userptr limit ioctl. drm/exynos: added userptr feature. drivers/gpu/drm/exynos/exynos_drm_drv.c | 8 + drivers/gpu/drm/exynos/exynos_drm_drv.h | 6 + drivers/gpu/drm/exynos/exynos_drm_gem.c | 356 +++ drivers/gpu/drm/exynos/exynos_drm_gem.h | 20 ++- include/drm/exynos_drm.h | 43 - 5 files changed, 430 insertions(+), 3 deletions(-) -- 1.7.4.1
[PATCH 1/2 v3] drm/exynos: added userptr limit ioctl.
this ioctl is used to limit user-desired userptr size as pre-defined and also could be accessed by only root user. with userptr feature, unprivileged user can allocate all the pages on system, so the amount of free physical pages will be very limited. if the VMAs within user address space was locked, the pages couldn't be swapped out so it may result in significant degradation of system performance. so this feature would be used to avoid such situation. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- drivers/gpu/drm/exynos/exynos_drm_drv.c |6 ++ drivers/gpu/drm/exynos/exynos_drm_drv.h |6 ++ drivers/gpu/drm/exynos/exynos_drm_gem.c | 22 ++ drivers/gpu/drm/exynos/exynos_drm_gem.h |3 +++ include/drm/exynos_drm.h| 17 + 5 files changed, 54 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c index 9d3204c..1e68ec2 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c @@ -64,6 +64,9 @@ static int exynos_drm_load(struct drm_device *dev, unsigned long flags) return -ENOMEM; } + /* maximum size of userptr is limited to 16MB as default. 
*/ + private-userptr_limit = SZ_16M; + INIT_LIST_HEAD(private-pageflip_event_list); dev-dev_private = (void *)private; @@ -221,6 +224,9 @@ static struct drm_ioctl_desc exynos_ioctls[] = { exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, exynos_drm_gem_get_ioctl, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(EXYNOS_USER_LIMIT, + exynos_drm_gem_user_limit_ioctl, DRM_MASTER | + DRM_ROOT_ONLY), DRM_IOCTL_DEF_DRV(EXYNOS_PLANE_SET_ZPOS, exynos_plane_set_zpos_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_VIDI_CONNECTION, diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.h b/drivers/gpu/drm/exynos/exynos_drm_drv.h index c82c90c..b38ed6f 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.h +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.h @@ -235,6 +235,12 @@ struct exynos_drm_private { * this array is used to be aware of which crtc did it request vblank. */ struct drm_crtc *crtc[MAX_CRTC]; + + /* +* maximum size of allocation by userptr feature. +* - as default, this has 16MB and only root user can change it. 
+*/ + unsigned long userptr_limit; }; /* diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index fc91293..e6abb66 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -33,6 +33,8 @@ #include exynos_drm_gem.h #include exynos_drm_buf.h +#define USERPTR_MAX_SIZE SZ_64M + static unsigned int convert_to_vm_err_msg(int msg) { unsigned int out_msg; @@ -630,6 +632,26 @@ int exynos_drm_gem_get_ioctl(struct drm_device *dev, void *data, return 0; } +int exynos_drm_gem_user_limit_ioctl(struct drm_device *dev, void *data, + struct drm_file *filp) +{ + struct exynos_drm_private *priv = dev-dev_private; + struct drm_exynos_user_limit *limit = data; + + if (limit-userptr_limit PAGE_SIZE || + limit-userptr_limit USERPTR_MAX_SIZE) { + DRM_DEBUG_KMS(invalid userptr_limit size.\n); + return -EINVAL; + } + + if (priv-userptr_limit == limit-userptr_limit) + return 0; + + priv-userptr_limit = limit-userptr_limit; + + return 0; +} + int exynos_drm_gem_init_object(struct drm_gem_object *obj) { DRM_DEBUG_KMS(%s\n, __FILE__); diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.h b/drivers/gpu/drm/exynos/exynos_drm_gem.h index 14d038b..3334c9f 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.h +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.h @@ -78,6 +78,9 @@ struct exynos_drm_gem_obj { struct page **exynos_gem_get_pages(struct drm_gem_object *obj, gfp_t gfpmask); +int exynos_drm_gem_user_limit_ioctl(struct drm_device *dev, void *data, + struct drm_file *filp); + /* destroy a buffer with gem object */ void exynos_drm_gem_destroy(struct exynos_drm_gem_obj *exynos_gem_obj); diff --git a/include/drm/exynos_drm.h b/include/drm/exynos_drm.h index 54c97e8..52465dc 100644 --- a/include/drm/exynos_drm.h +++ b/include/drm/exynos_drm.h @@ -92,6 +92,19 @@ struct drm_exynos_gem_info { }; /** + * A structure to userptr limited information. + * + * @userptr_limit: maximum size to userptr buffer. 
+ * the buffer could be allocated by unprivileged user using malloc() + * and the size of the buffer would be limited as userptr_limit value.
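The range check in exynos_drm_gem_user_limit_ioctl() lost its comparison operators in this archive ("limit-userptr_limit PAGE_SIZE || limit-userptr_limit USERPTR_MAX_SIZE"); from the surrounding code they are presumably `<` and `>`. A userspace re-statement of the check, with the page size fixed at 4 KiB for illustration:

```c
/* Mirror of the ioctl's validity test: a userptr limit must be at least
 * one page and at most USERPTR_MAX_SIZE (SZ_64M in the patch). The
 * operators are reconstructed; treat this as a sketch of the intent. */
#define SKETCH_PAGE_SIZE  4096UL
#define USERPTR_MAX_SIZE  (64UL << 20)

static int userptr_limit_valid(unsigned long limit)
{
    return !(limit < SKETCH_PAGE_SIZE || limit > USERPTR_MAX_SIZE);
}
```

The driver's default of SZ_16M (16UL << 20) falls inside this window, and the ioctl additionally requires DRM_MASTER | DRM_ROOT_ONLY, so only root can widen it.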
[PATCH 2/2 v3] drm/exynos: added userptr feature.
this feature is used to import user space region allocated by malloc() or mmaped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out so we limit user-desired userptr size to pre-defined. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- drivers/gpu/drm/exynos/exynos_drm_drv.c |2 + drivers/gpu/drm/exynos/exynos_drm_gem.c | 334 +++ drivers/gpu/drm/exynos/exynos_drm_gem.h | 17 ++- include/drm/exynos_drm.h| 26 +++- 4 files changed, 376 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c index 1e68ec2..e8ae3f1 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_drv.c +++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c @@ -220,6 +220,8 @@ static struct drm_ioctl_desc exynos_ioctls[] = { DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MAP_OFFSET, exynos_drm_gem_map_offset_ioctl, DRM_UNLOCKED | DRM_AUTH), + DRM_IOCTL_DEF_DRV(EXYNOS_GEM_USERPTR, + exynos_drm_gem_userptr_ioctl, DRM_UNLOCKED), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MMAP, exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH), DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET, diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index e6abb66..ccc6e3d 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -68,6 +68,83 @@ static int check_gem_flags(unsigned int flags) return 0; } +static struct vm_area_struct *get_vma(struct vm_area_struct *vma) +{ + struct vm_area_struct *vma_copy; + + vma_copy = kmalloc(sizeof(*vma_copy), GFP_KERNEL); + if (!vma_copy) + return NULL; + + if (vma-vm_ops vma-vm_ops-open) + vma-vm_ops-open(vma); + + if (vma-vm_file) + get_file(vma-vm_file); + + memcpy(vma_copy, vma, sizeof(*vma)); + + vma_copy-vm_mm 
= NULL; + vma_copy-vm_next = NULL; + vma_copy-vm_prev = NULL; + + return vma_copy; +} + + +static void put_vma(struct vm_area_struct *vma) +{ + if (!vma) + return; + + if (vma-vm_ops vma-vm_ops-close) + vma-vm_ops-close(vma); + + if (vma-vm_file) + fput(vma-vm_file); + + kfree(vma); +} + +/* + * lock_userptr_vma - lock VMAs within user address space + * + * this function locks vma within user address space to avoid pages + * to the userspace from being swapped out. + * if this vma isn't locked, the pages to the userspace could be swapped out + * so unprivileged user might access different pages and dma of any device + * could access physical memory region not intended once swap-in. + */ +static int lock_userptr_vma(struct exynos_drm_gem_buf *buf, unsigned int lock) +{ + struct vm_area_struct *vma; + unsigned long start, end; + + start = buf-userptr; + end = buf-userptr + buf-size - 1; + + down_write(current-mm-mmap_sem); + + do { + vma = find_vma(current-mm, start); + if (!vma) { + up_write(current-mm-mmap_sem); + return -EFAULT; + } + + if (lock) + vma-vm_flags |= VM_LOCKED; + else + vma-vm_flags = ~VM_LOCKED; + + start = vma-vm_end + 1; + } while (vma-vm_end end); + + up_write(current-mm-mmap_sem); + + return 0; +} + static void update_vm_cache_attr(struct exynos_drm_gem_obj *obj, struct vm_area_struct *vma) { @@ -256,6 +333,44 @@ static void exynos_drm_gem_put_pages(struct drm_gem_object *obj) /* add some codes for UNCACHED type here. 
TODO */ } +static void exynos_drm_put_userptr(struct drm_gem_object *obj) +{ + struct exynos_drm_gem_obj *exynos_gem_obj; + struct exynos_drm_gem_buf *buf; + struct vm_area_struct *vma; + int npages; + + exynos_gem_obj = to_exynos_gem_obj(obj); + buf = exynos_gem_obj-buffer; + vma = exynos_gem_obj-vma; + + if (vma (vma-vm_flags VM_PFNMAP) (vma-vm_pgoff)) { + put_vma(exynos_gem_obj-vma); + goto out; + } + + npages = buf-size PAGE_SHIFT; + + if (exynos_gem_obj-flags EXYNOS_BO_USERPTR !buf-pfnmap) + lock_userptr_vma(buf, 0); + + npages--; + while (npages = 0) { + if (buf-write) + set_page_dirty_lock(buf-pages[npages]); + + put_page(buf-pages[npages]); + npages--; + } + +out: +
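lock_userptr_vma() above walks every VMA overlapping [userptr, userptr + size - 1] and toggles VM_LOCKED, failing with -EFAULT if the range is not fully mapped. That walk can be modeled in plain userspace C with intervals standing in for VMAs; this is a sketch of the control flow, not driver code, and it adds a hole check (vm_start beyond the cursor) that the quoted loop relies on find_vma()'s semantics for.

```c
#include <stddef.h>

struct fake_vma { unsigned long vm_start, vm_end; }; /* [vm_start, vm_end) */

/* First interval with vm_end > addr, mirroring the kernel's find_vma(). */
static const struct fake_vma *find_fake_vma(const struct fake_vma *v,
                                            size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (v[i].vm_end > addr)
            return &v[i];
    return NULL;
}

/* Visit each VMA covering [start, end] (end inclusive, as in the patch,
 * where end = userptr + size - 1); -14 models -EFAULT on a hole. */
static int walk_userptr_range(const struct fake_vma *v, size_t n,
                              unsigned long start, unsigned long end)
{
    const struct fake_vma *vma;

    do {
        vma = find_fake_vma(v, n, start);
        if (!vma || vma->vm_start > start)
            return -14;          /* unmapped gap in the range */
        start = vma->vm_end;     /* advance past this VMA */
    } while (vma->vm_end <= end);

    return 0;
}
```

Note Jerome's objection still applies to the real loop: locking is per-VMA, so a small userptr inside a large VMA pins the whole VMA, not just the requested pages.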
Re: [PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hi Laurent, On 05/08/2012 02:44 PM, Laurent Pinchart wrote: Hi Subash, On Monday 07 May 2012 20:08:25 Subash Patel wrote: Hello Tomasz, Laurent, I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that during the attach, the size of the SGT and the size requested mismatched (by at least 8k bytes). Hence I made a small correction to the code as below. I could then attach the importer properly. Thank you for the report. Could you print the content of the sglist (number of chunks and size of each chunk) before and after your modifications, as well as the values of n_pages, offset and size? I will put back all the printk's and generate this. As of now, my setup has changed and will do this when I get some time. On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: [snip] +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned int n_pages, unsigned long offset, unsigned long size) +{ + struct sg_table *sgt; + unsigned int chunks; + unsigned int i; + unsigned int cur_page; + int ret; + struct scatterlist *s; + + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + /* compute number of chunks */ + chunks = 1; + for (i = 1; i < n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); + return ERR_PTR(-ENOMEM); + } + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { + unsigned long chunk_size; + unsigned int j; size = PAGE_SIZE; + + for (j = cur_page + 1; j < n_pages; ++j) for (j = cur_page + 1; j < n_pages; ++j) { + if (pages[j] != pages[j - 1] + 1) + break; size += PAGE_SIZE; } + + chunk_size = ((j - cur_page) << PAGE_SHIFT) - offset; + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); [DELETE] size -= chunk_size; + offset = 0; + cur_page = j; + } + + return sgt; +} Regards, Subash
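The chunk counting that vb2_dc_pages_to_sgt() performs before sg_alloc_table() can be modeled in plain userspace C, with page frame numbers standing in for struct page pointers: adjacent pages are merged, so the sg_table needs one entry per run of physically contiguous pages. A sketch, not the driver code:

```c
#include <stddef.h>

/* One scatterlist entry per run of consecutive page frame numbers;
 * matches the "compute number of chunks" pass in the quoted patch. */
static unsigned int count_chunks(const unsigned long *pfn, size_t n_pages)
{
    unsigned int chunks = 1;
    for (size_t i = 1; i < n_pages; ++i)
        if (pfn[i] != pfn[i - 1] + 1)
            ++chunks;
    return chunks;
}
```

Subash's size mismatch would show up in the second pass, where each entry's length is derived from the run length and the initial offset; counting the runs themselves is unaffected.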
Re: [PATCH] mm: Work around Intel SNB GTT bug with some physical pages.
On Mon, 7 May 2012, Stephane Marchesin wrote: While investing some Sandy Bridge rendering corruption, I found out that all physical memory pages below 1MiB were returning garbage when read through the GTT. This has been causing graphics corruption (when it's used for textures, render targets and pixmaps) and GPU hangups (when it's used for GPU batch buffers). I talked with some people at Intel and they confirmed my findings, and said that a couple of other random pages were also affected. We could fix this problem by adding an e820 region preventing the memory below 1 MiB to be used, but that prevents at least my machine from booting. One could think that we should be able to fix it in i915, but since the allocation is done by the backing shmem this is not possible. In the end, I came up with the ugly workaround of just leaking the offending pages in shmem.c. I do realize it's truly ugly, but I'm looking for a fix to the existing code, and am wondering if people on this list have a better idea, short of rewriting i915_gem.c to allocate its own pages directly. Signed-off-by: Stephane Marchesin marc...@chromium.org Well done for discovering and pursuing this issue, but of course (as you know: you're trying to provoke us to better) your patch is revolting. And not even enough: swapin readahead and swapoff can read back from swap into pages which the i915 will later turn out to dislike. I do have a shmem.c patch coming up for gma500, which cannot use pages over 4GB; but that fits more reasonably with memory allocation policies, where we expect that anyone who can use a high page can use a lower as well, and there's already __GFP_DMA32 to set the limit. Your limitation is at the opposite end, so that patch won't help you at all. And I don't see how Andi's ZONE_DMA exclusion would work, without equal hackery to enable private zonelists, avoiding that convention. 
i915 is not the only user of shmem, and x86 not the only architecture: we're not going to make everyone suffer for this. Once the memory allocator gets down to giving you the low 1MB, my guess is that it's already short of memory, and liable to deadlock or OOM if you refuse and soak up every page it then gives you. Even if i915 has to live with that possibility, we're not going to extend it to everyone else. arch/x86/Kconfig has X86_RESERVE_LOW, default 64, range 4 640 (and I think we reserve all the memory range from 640kB to 1MB anyway). Would setting that to 640 allow you to boot, and avoid the i915 problem on all but the odd five pages? I'm not pretending that's an ideal solution at all (unless freeing initmem could release most of it on non-SandyBridge and non-i915 machines), but it would be useful to know if that does provide a stopgap solution. If that does work, maybe we just mark the odd five PageReserved at startup. Is there really no way this can be handled closer to the source of the problem, in the i915 driver itself? I do not know the flow of control in i915 (and X) at all, but on the surface it seems that the problem only comes when you map these problematic pages into the GTT (if I'm even using the right terminology), and something (not shmem.c) actively has to do that. Can't you check the pfn at that point, and if it's an unsuitable page, copy into a suitable page (perhaps allocated then, perhaps from a pool you primed earlier) and map that suitable page into the GTT instead? Maybe using page-private to link them if that helps. So long as the page (or its shadow) is mapped into the GTT, I imagine it would be pinned, and not liable to be swapped out or otherwise interfered with by shmem.c. And when you unmap it from GTT, copy back to the unsuitable shmem object page before unpinning. 
I fully accept that I have very little understanding of GPU DRM GTT and i915, and this may be impossible or incoherent: but please, let's try to keep the strangeness where it belongs. If necessary, we'll have to add some kind of flag and callback from shmem.c to the driver; but I'd so much prefer to avoid that. Hugh Change-Id: I957e125fb280e0b0d6b05a83cc4068df2f05aa0a --- mm/shmem.c | 39 +++++++++++++++++++++++++++++++++++++++-- 1 files changed, 37 insertions(+), 2 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 6c253f7..dcbb58b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -768,6 +768,31 @@ redirty: return 0; } +/* + * Some intel GPUs can't use those pages in the GTT, which results in + * graphics corruption. Sadly, it's impossible to prevent usage of those + * pages in the intel allocator. + * + * Instead, we test for those areas here and leak the corresponding pages. + * + * Some day, when the intel GPU memory is not backed by shmem any more, + * we'll be able to come up with a solution which is contained in i915. + */ +static bool i915_usable_page(struct page *page) +{ + dma_addr_t addr = page_to_phys(page); + + if (unlikely((addr < 1 * 1024 * 1024) ||
Re: [PATCH] MAINTAINERS: switch drm/i915 to Daniel Vetter
On Tue, May 8, 2012 at 12:19 PM, Daniel Vetter daniel.vet...@ffwll.ch wrote: ... given that I essentially run the show already, let's make this official. Acked-by: Dave Airlie airl...@redhat.com probably just push it via your -next. Dave. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] MAINTAINERS: switch drm/i915 to Daniel Vetter
On Wed, May 09, 2012 at 07:51:35AM +0100, Dave Airlie wrote: On Tue, May 8, 2012 at 12:19 PM, Daniel Vetter daniel.vet...@ffwll.ch wrote: ... given that I essentially run the show already, let's make this official. Acked-by: Dave Airlie airl...@redhat.com probably just push it via your -next. Will do. Thanks, Daniel -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm: Work around Intel SNB GTT bug with some physical pages.
On Tue, May 08, 2012 at 02:57:25PM -0700, Hugh Dickins wrote: On Mon, 7 May 2012, Stephane Marchesin wrote: While investing some Sandy Bridge rendering corruption, I found out that all physical memory pages below 1MiB were returning garbage when read through the GTT. This has been causing graphics corruption (when it's used for textures, render targets and pixmaps) and GPU hangups (when it's used for GPU batch buffers). I talked with some people at Intel and they confirmed my findings, and said that a couple of other random pages were also affected. We could fix this problem by adding an e820 region preventing the memory below 1 MiB to be used, but that prevents at least my machine from booting. One could think that we should be able to fix it in i915, but since the allocation is done by the backing shmem this is not possible. In the end, I came up with the ugly workaround of just leaking the offending pages in shmem.c. I do realize it's truly ugly, but I'm looking for a fix to the existing code, and am wondering if people on this list have a better idea, short of rewriting i915_gem.c to allocate its own pages directly. Signed-off-by: Stephane Marchesin marc...@chromium.org Well done for discovering and pursuing this issue, but of course (as you know: you're trying to provoke us to better) your patch is revolting. And not even enough: swapin readahead and swapoff can read back from swap into pages which the i915 will later turn out to dislike. I do have a shmem.c patch coming up for gma500, which cannot use pages over 4GB; but that fits more reasonably with memory allocation policies, where we expect that anyone who can use a high page can use a lower as well, and there's already __GFP_DMA32 to set the limit. Your limitation is at the opposite end, so that patch won't help you at all. And I don't see how Andi's ZONE_DMA exclusion would work, without equal hackery to enable private zonelists, avoiding that convention. 
i915 is not the only user of shmem, and x86 not the only architecture: we're not going to make everyone suffer for this. Once the memory allocator gets down to giving you the low 1MB, my guess is that it's already short of memory, and liable to deadlock or OOM if you refuse and soak up every page it then gives you. Even if i915 has to live with that possibility, we're not going to extend it to everyone else. arch/x86/Kconfig has X86_RESERVE_LOW, default 64, range 4 640 (and I think we reserve all the memory range from 640kB to 1MB anyway). Would setting that to 640 allow you to boot, and avoid the i915 problem on all but the odd five pages? I'm not pretending that's an ideal solution at all (unless freeing initmem could release most of it on non-SandyBridge and non-i915 machines), but it would be useful to know if that does provide a stopgap solution. If that does work, maybe we just mark the odd five PageReserved at startup. Hm, as a stopgap measure to make Sandybridge gpus not die that sounds pretty good. But we still need a more generic solution for the long-term, see below Is there really no way this can be handled closer to the source of the problem, in the i915 driver itself? I do not know the flow of control in i915 (and X) at all, but on the surface it seems that the problem only comes when you map these problematic pages into the GTT (if I'm even using the right terminology), and something (not shmem.c) actively has to do that. Can't you check the pfn at that point, and if it's an unsuitable page, copy into a suitable page (perhaps allocated then, perhaps from a pool you primed earlier) and map that suitable page into the GTT instead? Maybe using page-private to link them if that helps. So long as the page (or its shadow) is mapped into the GTT, I imagine it would be pinned, and not liable to be swapped out or otherwise interfered with by shmem.c. And when you unmap it from GTT, copy back to the unsuitable shmem object page before unpinning. 
I fully accept that I have very little understanding of GPU DRM GTT and i915, and this may be impossible or incoherent: but please, let's try to keep the strangeness where it belongs. If necessary, we'll have add some kind of flag and callback from shmem.c to the driver; but I'd so much prefer to avoid that. The copy stuff backforth approach is pretty much what ttm uses atm: It allocates suitable pages with whatever means it has (usually through the dma api) and if the shrinker callback tells it that it's sitting on too much memory, it copies stuff out to the shmem backing storage used by gem. There are quite a few issues with that approach: - We expose mmap to the shmem file directly to userspace in i915. We use these extensively on Sandybridge because there direct cpu access is coherent with what the gpu does. Original userspace would always tell the kernel when it's done writing through cpu mappings so that the
Re: [PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hello Tomasz, Laurent, I have printed some logs during the dmabuf export and attach for the SGT issue below. Please find it in the attachment. I hope it will be useful. Regards, Subash On 05/08/2012 04:45 PM, Subash Patel wrote: Hi Laurent, On 05/08/2012 02:44 PM, Laurent Pinchart wrote: Hi Subash, On Monday 07 May 2012 20:08:25 Subash Patel wrote: Hello Thomasz, Laurent, I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that during the attach, the size of the SGT and size requested mis-matched (by atleast 8k bytes). Hence I made a small correction to the code as below. I could then attach the importer properly. Thank you for the report. Could you print the content of the sglist (number of chunks and size of each chunk) before and after your modifications, as well as the values of n_pages, offset and size ? I will put back all the printk's and generate this. As of now, my setup has changed and will do this when I get sometime. On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: [snip] +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned int n_pages, unsigned long offset, unsigned long size) +{ + struct sg_table *sgt; + unsigned int chunks; + unsigned int i; + unsigned int cur_page; + int ret; + struct scatterlist *s; + + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + /* compute number of chunks */ + chunks = 1; + for (i = 1; i n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); + return ERR_PTR(-ENOMEM); + } + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt-sgl, s, sgt-orig_nents, i) { + unsigned long chunk_size; + unsigned int j; size = PAGE_SIZE; + + for (j = cur_page + 1; j n_pages; ++j) for (j = cur_page + 1; j n_pages; ++j) { + if (pages[j] != pages[j - 1] + 1) + break; size += PAGE } + + chunk_size = ((j - cur_page) PAGE_SHIFT) - offset; + 
sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); [DELETE] size -= chunk_size; + offset = 0; + cur_page = j; + } + + return sgt; +} Regards, Subash [ 178.545000] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.545000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.55] vb2_dc_pages_to_sgt():84 offset=0 [ 178.555000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.56] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.565000] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.57] vb2_dc_pages_to_sgt():83 cur_page=32 [ 178.575000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.58] vb2_dc_pages_to_sgt():86 chunk_size=262144 [ 178.585000] vb2_dc_pages_to_sgt():89 size=4294574080 [ 178.59] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.595000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.60] vb2_dc_pages_to_sgt():84 offset=0 [ 178.605000] vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.61] vb2_dc_pages_to_sgt():89 size=4294959104 [ 178.625000] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.625000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.63] vb2_dc_pages_to_sgt():84 offset=0 [ 178.635000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.64] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.645000] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.65] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.655000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.66] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.665000] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.67] vb2_dc_mmap: mapped dma addr 0x2006 at 0xb6e01000, size 131072 [ 178.67] vb2_dc_mmap: mapped dma addr 0x2008 at 0xb6de1000, size 131072 [ 178.68] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.685000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.69] vb2_dc_pages_to_sgt():84 offset=0 [ 178.695000] vb2_dc_pages_to_sgt():86 chunk_size=4096 [ 178.70] vb2_dc_pages_to_sgt():89 size=4294963200 [ 178.705000] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.71] vb2_dc_pages_to_sgt():83 cur_page=1 [ 178.715000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.715000] 
vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.72] vb2_dc_pages_to_sgt():89 size=4294955008 [ 178.725000] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.73] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.735000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.74] vb2_dc_pages_to_sgt():86 chunk_size=8192 [ 178.745000] vb2_dc_pages_to_sgt():89 size=4294959104 [ 178.75] vb2_dc_pages_to_sgt() sgt-orig_nents=1 [ 178.755000] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.76] vb2_dc_pages_to_sgt():84 offset=0 [ 178.765000] vb2_dc_pages_to_sgt():86 chunk_size=131072 [ 178.77] vb2_dc_pages_to_sgt():89 size=4294836224 [ 178.78] vb2_dc_pages_to_sgt() sgt-orig_nents=2 [ 178.78] vb2_dc_pages_to_sgt():83 cur_page=0 [ 178.785000] vb2_dc_pages_to_sgt():84 offset=0 [ 178.79] vb2_dc_pages_to_sgt():86 chunk_size=65536 [ 178.795000]
[Bug 49603] [regression] Fullscreen video no longer smooth with GPU in low power mode
https://bugs.freedesktop.org/show_bug.cgi?id=49603 --- Comment #5 from Simon Farnsworth simon.farnswo...@onelan.co.uk 2012-05-09 02:30:15 PDT --- Created attachment 61271 -- https://bugs.freedesktop.org/attachment.cgi?id=61271 A program to stop the CPU entering low power states Trying a different route. Can you compile the attached program with gcc -o stopsleep stopsleep.c and leave it running while playing a video? It tells the kernel to avoid using deep sleep states when idling. If it helps, we have a clue. If it doesn't, and video decode doesn't saturate all your CPU cores, can you try running while true ; do true ; done in a background shell and see if that helps? The goal of both of these is to see if the problem is that we're now letting the CPU idle more than we used to, and finding that the resulting power save modes hurt. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
On 08.05.2012 16:55, Jerome Glisse wrote: Still i don't want to loop more than necessary, it's bad, i am ok with : http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch If there is a fence signaled it will retry 2 times at most, otherwise it will go to wait, and that way is better. Because with your while loop the worst case is something proportional to the manager size; given it's 1MB it can loop for a long long time. Yeah, this loop can indeed consume quite some time. Ok then let's at least give every ring a chance to supply some holes, otherwise I fear that we might not even find something worth waiting for after only 2 tries. Going to send that out after figuring out why the patchset still causes texture corruptions on my system. Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hi Subash, Could you post the code of vb2_dc_pages_to_sgt with all printk in it. It will help us avoid guessing where and what is debugged in the log. Moreover, I found a line 'size=4294836224' in the log. It means that size is equal to -131072 (!?!) or there are some invalid conversions in printk. Are you sure that you do not pass size = 0 as the function argument? Notice that earlier versions of dmabuf-for-vb2 patches had an offset2 argument instead of size. It was the offset at the end of the buffer. In (I guess) 95% of cases this offset was 0. Could you provide only the function arguments that cause the failure? I mean the pages array + size (I assume that offset is zero for your test). Having the arguments we could reproduce that bug. Regards, Tomasz Stanislawski On 05/09/2012 08:46 AM, Subash Patel wrote: Hello Tomasz, Laurent, I have printed some logs during the dmabuf export and attach for the SGT issue below. Please find it in the attachment. I hope it will be useful. Regards, Subash On 05/08/2012 04:45 PM, Subash Patel wrote: Hi Laurent, On 05/08/2012 02:44 PM, Laurent Pinchart wrote: Hi Subash, On Monday 07 May 2012 20:08:25 Subash Patel wrote: Hello Thomasz, Laurent, I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that during the attach, the size of the SGT and size requested mis-matched (by at least 8k bytes). Hence I made a small correction to the code as below. I could then attach the importer properly. Thank you for the report. Could you print the content of the sglist (number of chunks and size of each chunk) before and after your modifications, as well as the values of n_pages, offset and size ? I will put back all the printk's and generate this. As of now, my setup has changed and will do this when I get some time.
On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: [snip] +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, + unsigned int n_pages, unsigned long offset, unsigned long size) +{ + struct sg_table *sgt; + unsigned int chunks; + unsigned int i; + unsigned int cur_page; + int ret; + struct scatterlist *s; + + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); + if (!sgt) + return ERR_PTR(-ENOMEM); + + /* compute number of chunks */ + chunks = 1; + for (i = 1; i n_pages; ++i) + if (pages[i] != pages[i - 1] + 1) + ++chunks; + + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); + if (ret) { + kfree(sgt); + return ERR_PTR(-ENOMEM); + } + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt-sgl, s, sgt-orig_nents, i) { + unsigned long chunk_size; + unsigned int j; size = PAGE_SIZE; + + for (j = cur_page + 1; j n_pages; ++j) for (j = cur_page + 1; j n_pages; ++j) { + if (pages[j] != pages[j - 1] + 1) + break; size += PAGE } + + chunk_size = ((j - cur_page) PAGE_SHIFT) - offset; + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); [DELETE] size -= chunk_size; + offset = 0; + cur_page = j; + } + + return sgt; +} Regards, Subash ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632 --- Comment #6 from execute.met...@gmail.com 2012-05-09 03:55:23 PDT --- Ok. I didn't realize that. I have also tried with 3.4rc8, with the same result. Did i miss something in building the kernel? Also, what about projectM without streamout enabled? -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632 --- Comment #7 from execute.met...@gmail.com 2012-05-09 04:15:06 PDT --- (In reply to comment #6) Ok. I didn't realize that. I have also tried with 3.4rc8, with the same result. Did i miss something in building the kernel? Also, what about projectM without streamout enabled? Sorry, I meant rc5. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632 --- Comment #8 from Alex Deucher ag...@yahoo.com 2012-05-09 06:14:17 PDT --- (In reply to comment #6) Ok. I didn't realize that. I have also tried with 3.4rc8, with the same result. Did i miss something in building the kernel? Are you getting the same error about forbidden register 0x00028354? That register is in the allowed list for 3.4 so you shouldn't be getting that error. Also, what about projectM without streamout enabled? Is there anything else in dmesg other than *ERROR* Failed to parse relocation -12? -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: eliminate redundant connector_names table
On Fri, May 4, 2012 at 11:25 AM, Ilija Hadzic ihad...@research.bell-labs.com wrote: connector_names table is just a repeat of information that already exists in drm_connector_enum_list and the same string can be retrieved using drm_get_connector_name function. Nuke the redundant table and use the proper function to retrieve the connector name. Signed-off-by: Ilija Hadzic ihad...@research.bell-labs.com Reviewed-by: Alex Deucher alexander.deuc...@amd.com --- drivers/gpu/drm/radeon/radeon_display.c | 20 +--- 1 files changed, 1 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 8086c96..60aecc5 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -572,24 +572,6 @@ static const char *encoder_names[36] = { TRAVIS, }; -static const char *connector_names[15] = { - Unknown, - VGA, - DVI-I, - DVI-D, - DVI-A, - Composite, - S-video, - LVDS, - Component, - DIN, - DisplayPort, - HDMI-A, - HDMI-B, - TV, - eDP, -}; - static const char *hpd_names[6] = { HPD1, HPD2, @@ -612,7 +594,7 @@ static void radeon_print_display_setup(struct drm_device *dev) list_for_each_entry(connector, dev-mode_config.connector_list, head) { radeon_connector = to_radeon_connector(connector); DRM_INFO(Connector %d:\n, i); - DRM_INFO( %s\n, connector_names[connector-connector_type]); + DRM_INFO( %s\n, drm_get_connector_name(connector)); if (radeon_connector-hpd.hpd != RADEON_HPD_NONE) DRM_INFO( %s\n, hpd_names[radeon_connector-hpd.hpd]); if (radeon_connector-ddc_bus) { -- 1.7.8.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Include request for SA improvements
Hi Dave & Jerome and everybody on the list, I can't find any more bugs and also I'm out of things to test, so I really hope that this is the last incarnation of this patchset, and if Jerome is ok with it it should now be included into drm-next. Cheers, Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 01/19] drm/radeon: fix possible lack of synchronization btw ttm and other ring
From: Jerome Glisse jgli...@redhat.com We need to sync with the GFX ring as ttm might have scheduled a bo move on it, and new commands scheduled on other rings need to wait for the bo data to be in place. Signed-off-by: Jerome Glisse jgli...@redhat.com Reviewed by: Christian König christian.koe...@amd.com --- drivers/gpu/drm/radeon/radeon_cs.c | 12 ++++++------ include/drm/radeon_drm.h | 1 - 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index c66beb1..289b0d7 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -122,15 +122,15 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) int i, r; for (i = 0; i < p->nrelocs; i++) { + struct radeon_fence *fence; + if (!p->relocs[i].robj || !p->relocs[i].robj->tbo.sync_obj) continue; - if (!(p->relocs[i].flags & RADEON_RELOC_DONT_SYNC)) { - struct radeon_fence *fence = p->relocs[i].robj->tbo.sync_obj; - if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { - sync_to_ring[fence->ring] = true; - need_sync = true; - } + fence = p->relocs[i].robj->tbo.sync_obj; + if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { + sync_to_ring[fence->ring] = true; + need_sync = true; } } diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h index 7c491b4..5805686 100644 --- a/include/drm/radeon_drm.h +++ b/include/drm/radeon_drm.h @@ -926,7 +926,6 @@ struct drm_radeon_cs_chunk { }; /* drm_radeon_cs_reloc.flags */ -#define RADEON_RELOC_DONT_SYNC 0x01 struct drm_radeon_cs_reloc { uint32_thandle; -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 02/19] drm/radeon: replace the per ring mutex with a global one
A single global mutex for ring submissions seems sufficient. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h |3 ++- drivers/gpu/drm/radeon/radeon_device.c|3 +-- drivers/gpu/drm/radeon/radeon_pm.c| 10 ++- drivers/gpu/drm/radeon/radeon_ring.c | 28 +++ drivers/gpu/drm/radeon/radeon_semaphore.c | 42 + 5 files changed, 41 insertions(+), 45 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 82ffa6a..e99ea81 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -676,7 +676,6 @@ struct radeon_ring { uint64_tgpu_addr; uint32_talign_mask; uint32_tptr_mask; - struct mutexmutex; boolready; u32 ptr_reg_shift; u32 ptr_reg_mask; @@ -815,6 +814,7 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *cp, unsign int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *cp, unsigned ndw); void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_unlock_commit(struct radeon_device *rdev, struct radeon_ring *cp); +void radeon_ring_undo(struct radeon_ring *ring); void radeon_ring_unlock_undo(struct radeon_device *rdev, struct radeon_ring *cp); int radeon_ring_test(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *ring); @@ -1534,6 +1534,7 @@ struct radeon_device { rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; + struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; struct radeon_irq irq; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index ff28210..3f6ff2a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,8 +724,7 @@ int radeon_device_init(struct radeon_device *rdev, * can recall function without having 
locking issues */ radeon_mutex_init(rdev-cs_mutex); radeon_mutex_init(rdev-ib_pool.mutex); - for (i = 0; i RADEON_NUM_RINGS; ++i) - mutex_init(rdev-ring[i].mutex); + mutex_init(rdev-ring_lock); mutex_init(rdev-dc_hw_i2c_mutex); if (rdev-family = CHIP_R600) spin_lock_init(rdev-ih.lock); diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index caa55d6..7c38745 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -252,10 +252,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) mutex_lock(rdev-ddev-struct_mutex); mutex_lock(rdev-vram_mutex); - for (i = 0; i RADEON_NUM_RINGS; ++i) { - if (rdev-ring[i].ring_obj) - mutex_lock(rdev-ring[i].mutex); - } + mutex_lock(rdev-ring_lock); /* gui idle int has issues on older chips it seems */ if (rdev-family = CHIP_R600) { @@ -311,10 +308,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) rdev-pm.dynpm_planned_action = DYNPM_ACTION_NONE; - for (i = 0; i RADEON_NUM_RINGS; ++i) { - if (rdev-ring[i].ring_obj) - mutex_unlock(rdev-ring[i].mutex); - } + mutex_unlock(rdev-ring_lock); mutex_unlock(rdev-vram_mutex); mutex_unlock(rdev-ddev-struct_mutex); } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 2eb4c6e..a4d60ae 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -346,9 +346,9 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi if (ndw ring-ring_free_dw) { break; } - mutex_unlock(ring-mutex); + mutex_unlock(rdev-ring_lock); r = radeon_fence_wait_next(rdev, radeon_ring_index(rdev, ring)); - mutex_lock(ring-mutex); + mutex_lock(rdev-ring_lock); if (r) return r; } @@ -361,10 +361,10 @@ int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *ring, unsig { int r; - mutex_lock(ring-mutex); + mutex_lock(rdev-ring_lock); r = radeon_ring_alloc(rdev, ring, ndw); if (r) { - mutex_unlock(ring-mutex); + 
mutex_unlock(rdev-ring_lock); return r; }
[PATCH 03/19] drm/radeon: convert fence to uint64_t v4
From: Jerome Glisse jgli...@redhat.com This convert fence to use uint64_t sequence number intention is to use the fact that uin64_t is big enough that we don't need to care about wrap around. Tested with and without writeback using 0xF000 as initial fence sequence and thus allowing to test the wrap around from 32bits to 64bits. v2: Add comment about possible race btw CPU GPU, add comment stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET Read fence sequenc in reverse order of GPU write them so we mitigate the race btw CPU and GPU. v3: Drop the need for ring to emit the 64bits fence, and just have each ring emit the lower 32bits of the fence sequence. We handle the wrap over 32bits in fence_process. v4: Just a small optimization: Don't reread the last_seq value if loop restarts, since we already know its value anyway. Also start at zero not one for seq value and use pre instead of post increment in emmit, otherwise wait_empty will deadlock. Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h | 39 ++- drivers/gpu/drm/radeon/radeon_fence.c | 116 +++-- drivers/gpu/drm/radeon/radeon_ring.c |9 +-- 3 files changed, 107 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index e99ea81..cdf46bc 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout; * Copy from radeon_drv.h so we don't have to include both and have conflicting * symbol; */ -#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ -#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) +#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ +#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) /* RADEON_IB_POOL_SIZE must be a power of 2 */ -#define RADEON_IB_POOL_SIZE16 -#define RADEON_DEBUGFS_MAX_COMPONENTS 32 -#define RADEONFB_CONN_LIMIT4 -#define RADEON_BIOS_NUM_SCRATCH8 +#define RADEON_IB_POOL_SIZE16 
+#define RADEON_DEBUGFS_MAX_COMPONENTS 32 +#define RADEONFB_CONN_LIMIT4 +#define RADEON_BIOS_NUM_SCRATCH8 /* max number of rings */ -#define RADEON_NUM_RINGS 3 +#define RADEON_NUM_RINGS 3 + +/* fence seq are set to this number when signaled */ +#define RADEON_FENCE_SIGNALED_SEQ 0LL +#define RADEON_FENCE_NOTEMITED_SEQ (~0LL) /* internal ring indices */ /* r1xx+ has gfx CP ring */ -#define RADEON_RING_TYPE_GFX_INDEX 0 +#define RADEON_RING_TYPE_GFX_INDEX 0 /* cayman has 2 compute CP rings */ -#define CAYMAN_RING_TYPE_CP1_INDEX 1 -#define CAYMAN_RING_TYPE_CP2_INDEX 2 +#define CAYMAN_RING_TYPE_CP1_INDEX 1 +#define CAYMAN_RING_TYPE_CP2_INDEX 2 /* hardcode those limit for now */ -#define RADEON_VA_RESERVED_SIZE(8 20) -#define RADEON_IB_VM_MAX_SIZE (64 10) +#define RADEON_VA_RESERVED_SIZE(8 20) +#define RADEON_IB_VM_MAX_SIZE (64 10) /* * Errata workarounds. @@ -254,8 +258,9 @@ struct radeon_fence_driver { uint32_tscratch_reg; uint64_tgpu_addr; volatile uint32_t *cpu_addr; - atomic_tseq; - uint32_tlast_seq; + /* seq is protected by ring emission lock */ + uint64_tseq; + atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; struct list_heademitted; @@ -268,11 +273,9 @@ struct radeon_fence { struct kref kref; struct list_headlist; /* protected by radeon_fence.lock */ - uint32_tseq; - boolemitted; - boolsignaled; + uint64_tseq; /* RB, DMA, etc. */ - int ring; + unsignedring; struct radeon_semaphore *semaphore; }; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 5bb78bf..feb2bbc 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -66,14 +66,14 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence) unsigned long irq_flags; write_lock_irqsave(rdev-fence_lock, irq_flags); - if (fence-emitted) { + if (fence-seq fence-seq RADEON_FENCE_NOTEMITED_SEQ) {
[PATCH 06/19] drm/radeon: use inline functions to calc sa_bo addr
Instead of hacking the calculation multiple times.

Signed-off-by: Christian König deathsim...@vodafone.de
---
 drivers/gpu/drm/radeon/radeon_gart.c      |  6 ++----
 drivers/gpu/drm/radeon/radeon_object.h    | 11 +++++++++++
 drivers/gpu/drm/radeon/radeon_ring.c      |  6 ++----
 drivers/gpu/drm/radeon/radeon_semaphore.c |  6 ++----
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index c58a036..4a5d9d4 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -404,10 +404,8 @@ retry:
 		radeon_vm_unbind(rdev, vm_evict);
 		goto retry;
 	}
-	vm->pt = rdev->vm_manager.sa_manager.cpu_ptr;
-	vm->pt += (vm->sa_bo.offset >> 3);
-	vm->pt_gpu_addr = rdev->vm_manager.sa_manager.gpu_addr;
-	vm->pt_gpu_addr += vm->sa_bo.offset;
+	vm->pt = radeon_sa_bo_cpu_addr(&vm->sa_bo);
+	vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(&vm->sa_bo);
 	memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8));
 
 retry_id:
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index f9104be..c120ab9 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -146,6 +146,17 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo,
 /*
  * sub allocation
  */
+
+static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo *sa_bo)
+{
+	return sa_bo->manager->gpu_addr + sa_bo->offset;
+}
+
+static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo)
+{
+	return sa_bo->manager->cpu_ptr + sa_bo->offset;
+}
+
 extern int radeon_sa_bo_manager_init(struct radeon_device *rdev,
 				     struct radeon_sa_manager *sa_manager,
 				     unsigned size, u32 domain);
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 2fdc8c3..116be5e 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -127,10 +127,8 @@ retry:
 				     size, 256);
 		if (!r) {
 			*ib = rdev->ib_pool.ibs[idx];
-			(*ib)->ptr = rdev->ib_pool.sa_manager.cpu_ptr;
-			(*ib)->ptr += ((*ib)->sa_bo.offset >> 2);
-			(*ib)->gpu_addr = rdev->ib_pool.sa_manager.gpu_addr;
-			(*ib)->gpu_addr += (*ib)->sa_bo.offset;
+			(*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo);
+			(*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo);
 			(*ib)->fence = fence;
 			(*ib)->vm_id = 0;
 			(*ib)->is_const_ib = false;
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index c5b3d8e..f312ba5 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -53,10 +53,8 @@ static int radeon_semaphore_add_bo(struct radeon_device *rdev)
 		kfree(bo);
 		return r;
 	}
-	gpu_addr = rdev->ib_pool.sa_manager.gpu_addr;
-	gpu_addr += bo->ib->sa_bo.offset;
-	cpu_ptr = rdev->ib_pool.sa_manager.cpu_ptr;
-	cpu_ptr += (bo->ib->sa_bo.offset >> 2);
+	gpu_addr = radeon_sa_bo_gpu_addr(&bo->ib->sa_bo);
+	cpu_ptr = radeon_sa_bo_cpu_addr(&bo->ib->sa_bo);
 	for (i = 0; i < (RADEON_SEMAPHORE_BO_SIZE/8); i++) {
 		bo->semaphores[i].gpu_addr = gpu_addr;
 		bo->semaphores[i].cpu_ptr = cpu_ptr;
-- 
1.7.9.5
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
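The patch above replaces four copies of open-coded pointer arithmetic with two accessors that derive both addresses from one stored offset. A minimal userspace model of that idea (the struct layout and names here are simplified stand-ins, not the kernel's actual types):

```c
#include <stdint.h>

/* Simplified stand-ins for radeon_sa_manager / radeon_sa_bo. */
struct sa_manager {
    uint64_t gpu_addr;  /* GPU virtual address of the backing BO */
    uint8_t *cpu_ptr;   /* CPU mapping of the same BO */
};

struct sa_bo {
    struct sa_manager *manager;
    unsigned offset;    /* byte offset of this sub-allocation */
};

/* Both addresses are always "manager base + offset"; keeping that
 * computation in one place is the whole point of the helpers. */
static uint64_t sa_bo_gpu_addr(const struct sa_bo *bo)
{
    return bo->manager->gpu_addr + bo->offset;
}

static void *sa_bo_cpu_addr(const struct sa_bo *bo)
{
    return bo->manager->cpu_ptr + bo->offset;
}
```

Note how this also removes the fragile per-caller `>> 2` / `>> 3` shifts: callers that want a typed pointer can cast the `void *` result instead of redoing index arithmetic.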
[PATCH 05/19] drm/radeon: rework locking ring emission mutex in fence deadlock detection v2
Some callers illegal called fence_wait_next/empty while holding the ring emission mutex. So don't relock the mutex in that cases, and move the actual locking into the fence code. v2: Don't try to unlock the mutex if it isn't locked. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h|4 +-- drivers/gpu/drm/radeon/radeon_device.c |5 +++- drivers/gpu/drm/radeon/radeon_fence.c | 43 +--- drivers/gpu/drm/radeon/radeon_pm.c |8 +- drivers/gpu/drm/radeon/radeon_ring.c |6 + 5 files changed, 37 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 7c87117..701094b 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -284,8 +284,8 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence); void radeon_fence_process(struct radeon_device *rdev, int ring); bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); -int radeon_fence_wait_next(struct radeon_device *rdev, int ring); -int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); +int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 0e7b72a..b827b2e 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -912,9 +912,12 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state) } /* evict vram memory */ radeon_bo_evict_vram(rdev); + + mutex_lock(rdev-ring_lock); /* wait for gpu to finish processing current batch */ for (i = 0; i RADEON_NUM_RINGS; i++) - 
radeon_fence_wait_empty(rdev, i); + radeon_fence_wait_empty_locked(rdev, i); + mutex_unlock(rdev-ring_lock); radeon_save_bios_scratch_regs(rdev); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index ed20225..098d1fa 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -194,7 +194,7 @@ bool radeon_fence_signaled(struct radeon_fence *fence) } static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, -unsigned ring, bool intr) +unsigned ring, bool intr, bool lock_ring) { unsigned long timeout, last_activity; uint64_t seq; @@ -249,8 +249,16 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, if (seq != atomic64_read(rdev-fence_drv[ring].last_seq)) { continue; } + + if (lock_ring) { + mutex_lock(rdev-ring_lock); + } + /* test if somebody else has already decided that this is a lockup */ if (last_activity != rdev-fence_drv[ring].last_activity) { + if (lock_ring) { + mutex_unlock(rdev-ring_lock); + } continue; } @@ -264,15 +272,17 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, rdev-fence_drv[i].last_activity = jiffies; } - /* change last activity so nobody else think there is a lockup */ - for (i = 0; i RADEON_NUM_RINGS; ++i) { - rdev-fence_drv[i].last_activity = jiffies; - } - /* mark the ring as not ready any more */ rdev-ring[ring].ready = false; + if (lock_ring) { + mutex_unlock(rdev-ring_lock); + } return -EDEADLK; } + + if (lock_ring) { + mutex_unlock(rdev-ring_lock); + } } } return 0; @@ -287,7 +297,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return -EINVAL; } - r = radeon_fence_wait_seq(fence-rdev, fence-seq, fence-ring, intr); + r = radeon_fence_wait_seq(fence-rdev, fence-seq, +
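The `lock_ring` flag threaded through `radeon_fence_wait_seq` above reduces to a simple pattern: the caller declares whether it already holds the ring mutex, and the function only takes and drops the lock when told to (and, per the v2 note, never unlocks what it did not lock). A standalone sketch of that shape, with a plain flag standing in for the mutex (names and the `-35` value are illustrative, not the kernel API):

```c
#include <stdbool.h>

static bool ring_locked;     /* models rdev->ring_lock being held */
static int lockup_handled;   /* counts lockup-recovery passes */

/* Mirrors the kernel pattern: conditionally take the lock around the
 * lockup handling, symmetric on every exit path. */
static int handle_lockup(bool lock_ring)
{
    if (lock_ring)
        ring_locked = true;   /* mutex_lock(&rdev->ring_lock) in the driver */

    lockup_handled++;         /* stand-in for marking the ring not ready */

    if (lock_ring)
        ring_locked = false;  /* mutex_unlock(&rdev->ring_lock) */
    return -35;               /* stand-in for -EDEADLK */
}
```

A caller that already holds the lock passes `false`; one that does not passes `true`. Either way the lock state is balanced when the function returns.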
[PATCH 07/19] drm/radeon: add proper locking to the SA v3
Make the suballocator self-contained with respect to locking.

v2: split the bugfix into a separate patch.
v3: remove some unrelated changes.

Signed-off-by: Christian König deathsim...@vodafone.de
---
 drivers/gpu/drm/radeon/radeon.h    | 1 +
 drivers/gpu/drm/radeon/radeon_sa.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 701094b..8a6b1b3 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -381,6 +381,7 @@ struct radeon_bo_list {
  * alignment).
  */
 struct radeon_sa_manager {
+	spinlock_t		lock;
 	struct radeon_bo	*bo;
 	struct list_head	sa_bo;
 	unsigned		size;
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
index 8fbfe69..aed0a8c 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -37,6 +37,7 @@ int radeon_sa_bo_manager_init(struct radeon_device *rdev,
 {
 	int r;
 
+	spin_lock_init(&sa_manager->lock);
 	sa_manager->bo = NULL;
 	sa_manager->size = size;
 	sa_manager->domain = domain;
@@ -139,6 +140,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 	BUG_ON(align > RADEON_GPU_PAGE_SIZE);
 	BUG_ON(size > sa_manager->size);
 
+	spin_lock(&sa_manager->lock);
 	/* no one ? */
 	head = sa_manager->sa_bo.prev;
@@ -172,6 +174,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 		offset += wasted;
 		if ((sa_manager->size - offset) < size) {
 			/* failed to find somethings big enough */
+			spin_unlock(&sa_manager->lock);
 			return -ENOMEM;
 		}
@@ -180,10 +183,13 @@ out:
 	sa_bo->offset = offset;
 	sa_bo->size = size;
 	list_add(&sa_bo->list, head);
+	spin_unlock(&sa_manager->lock);
 	return 0;
 }
 
 void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo)
 {
+	spin_lock(&sa_bo->manager->lock);
 	list_del_init(&sa_bo->list);
+	spin_unlock(&sa_bo->manager->lock);
 }
-- 
1.7.9.5
[PATCH 09/19] drm/radeon: keep start and end offset in the SA
Instead of offset + size keep start and end offset directly. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h|4 ++-- drivers/gpu/drm/radeon/radeon_cs.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h |4 ++-- drivers/gpu/drm/radeon/radeon_sa.c | 13 +++-- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 8a6b1b3..d1c2154 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -396,8 +396,8 @@ struct radeon_sa_bo; struct radeon_sa_bo { struct list_headlist; struct radeon_sa_manager*manager; - unsignedoffset; - unsignedsize; + unsignedsoffset; + unsignedeoffset; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 289b0d7..b778037 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser-const_ib-gpu_addr = parser-const_ib-sa_bo.offset; + parser-const_ib-gpu_addr = parser-const_ib-sa_bo.soffset; r = radeon_ib_schedule(rdev, parser-const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser-ib-gpu_addr = parser-ib-sa_bo.offset; + parser-ib-gpu_addr = parser-ib-sa_bo.soffset; parser-ib-is_const_ib = false; r = radeon_ib_schedule(rdev, parser-ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index d9fca1e..99ab46a 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -149,12 +149,12 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo, static inline uint64_t radeon_sa_bo_gpu_addr(struct 
radeon_sa_bo *sa_bo) { - return sa_bo-manager-gpu_addr + sa_bo-offset; + return sa_bo-manager-gpu_addr + sa_bo-soffset; } static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo) { - return sa_bo-manager-cpu_ptr + sa_bo-offset; + return sa_bo-manager-cpu_ptr + sa_bo-soffset; } extern int radeon_sa_bo_manager_init(struct radeon_device *rdev, diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 1db0568..3bea7ba 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -152,11 +152,11 @@ int radeon_sa_bo_new(struct radeon_device *rdev, offset = 0; list_for_each_entry(tmp, sa_manager-sa_bo, list) { /* room before this object ? */ - if (offset tmp-offset (tmp-offset - offset) = size) { + if (offset tmp-soffset (tmp-soffset - offset) = size) { head = tmp-list.prev; goto out; } - offset = tmp-offset + tmp-size; + offset = tmp-eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -166,7 +166,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, /* room at the end ? 
*/ head = sa_manager-sa_bo.prev; tmp = list_entry(head, struct radeon_sa_bo, list); - offset = tmp-offset + tmp-size; + offset = tmp-eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -180,8 +180,8 @@ int radeon_sa_bo_new(struct radeon_device *rdev, out: sa_bo-manager = sa_manager; - sa_bo-offset = offset; - sa_bo-size = size; + sa_bo-soffset = offset; + sa_bo-eoffset = offset + size; list_add(sa_bo-list, head); spin_unlock(sa_manager-lock); return 0; @@ -202,7 +202,8 @@ void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, spin_lock(sa_manager-lock); list_for_each_entry(i, sa_manager-sa_bo, list) { - seq_printf(m, offset %08d: size %4d\n, i-offset, i-size); + seq_printf(m, [%08x %08x] size %4d [%p]\n, + i-soffset, i-eoffset, i-eoffset - i-soffset, i); } spin_unlock(sa_manager-lock); } -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
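With start and end offsets stored directly, the size falls out as `eoffset - soffset` and the free-space scan advances with a plain load instead of an addition per node. The two recurring computations in `radeon_sa_bo_new` can be modeled like this (toy types, not the kernel structures):

```c
/* Toy sub-allocation node keeping [soffset, eoffset) directly. */
struct toy_sa_bo {
    unsigned soffset;   /* inclusive start */
    unsigned eoffset;   /* exclusive end, i.e. start of free space after it */
};

static unsigned toy_size(const struct toy_sa_bo *bo)
{
    return bo->eoffset - bo->soffset;
}

/* Round 'offset' up to 'align', exactly as the wasted = offset % align
 * logic in the allocator does. */
static unsigned align_up(unsigned offset, unsigned align)
{
    unsigned wasted = offset % align;
    return wasted ? offset + (align - wasted) : offset;
}
```

Storing `eoffset` instead of `size` also makes the debugfs dump format natural: the printed `[soffset eoffset]` range describes the allocation without any arithmetic.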
[PATCH 04/19] drm/radeon: rework fence handling, drop fence list v7
From: Jerome Glisse jgli...@redhat.com

Using a 64-bit fence sequence we can directly compare sequence numbers to know whether a fence is signaled or not. Thus the fence list becomes useless, as does the fence lock that mainly protected it. Things like ring.ready are no longer behind a lock; this should be OK since ring.ready is initialized once and only changes when facing a lockup. Worst case is that we return -EBUSY just after a successful GPU reset, or that we go into a wait state instead of returning -EBUSY (thus delaying reporting -EBUSY to the fence wait caller).

v2: Remove a leftover comment; force using writeback on Cayman and newer, thus avoiding possible scratch register exhaustion.
v3: Rebase on top of the change to the uint64 fence patch.
v4: Change the DCE5 test to force writeback on Cayman and newer, but also on any APU such as the PALM or SUMO family.
v5: Rebase on top of the new uint64 fence patch.
v6: Just break if seq doesn't change any more. Use the radeon_fence prefix for all function names. Even if it's now highly optimized, try to avoid polling too often.
v7: We should never poll last_seq from the hardware without waking the sleeping threads, otherwise we might lose events.
Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h|6 +- drivers/gpu/drm/radeon/radeon_device.c |8 +- drivers/gpu/drm/radeon/radeon_fence.c | 299 3 files changed, 119 insertions(+), 194 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cdf46bc..7c87117 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -263,15 +263,12 @@ struct radeon_fence_driver { atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; - struct list_heademitted; - struct list_headsignaled; boolinitialized; }; struct radeon_fence { struct radeon_device*rdev; struct kref kref; - struct list_headlist; /* protected by radeon_fence.lock */ uint64_tseq; /* RB, DMA, etc. */ @@ -291,7 +288,7 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); -int radeon_fence_count_emitted(struct radeon_device *rdev, int ring); +unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); /* * Tiling registers @@ -1534,7 +1531,6 @@ struct radeon_device { struct radeon_mode_info mode_info; struct radeon_scratch scratch; struct radeon_mman mman; - rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 3f6ff2a..0e7b72a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -225,9 +225,9 @@ int radeon_wb_init(struct radeon_device *rdev) /* disable event_write fences */ rdev-wb.use_event = false; /* disabled via module param */ - if (radeon_no_wb == 1) + if (radeon_no_wb == 1) { rdev-wb.enabled 
= false; - else { + } else { if (rdev-flags RADEON_IS_AGP) { /* often unreliable on AGP */ rdev-wb.enabled = false; @@ -237,8 +237,9 @@ int radeon_wb_init(struct radeon_device *rdev) } else { rdev-wb.enabled = true; /* event_write fences are only available on r600+ */ - if (rdev-family = CHIP_R600) + if (rdev-family = CHIP_R600) { rdev-wb.use_event = true; + } } } /* always use writeback/events on NI, APUs */ @@ -731,7 +732,6 @@ int radeon_device_init(struct radeon_device *rdev, mutex_init(rdev-gem.mutex); mutex_init(rdev-pm.mutex); mutex_init(rdev-vram_mutex); - rwlock_init(rdev-fence_lock); rwlock_init(rdev-semaphore_drv.lock); INIT_LIST_HEAD(rdev-gem.objects); init_waitqueue_head(rdev-irq.vblank_queue); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index feb2bbc..ed20225 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++
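The core simplification of the 64-bit rework is that "signaled" becomes a single comparison against the last sequence number seen, since a monotonically increasing 64-bit counter will not wrap in practice; no list walking or list lock is needed. A userspace sketch of the check (the real driver keeps one such counter per ring and rereads the hardware value before concluding a lockup):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Last sequence number the hardware has signaled (per ring in the driver). */
static _Atomic uint64_t last_seq;

/* A fence whose sequence number is <= last_seq has completed. */
static bool fence_seq_signaled(uint64_t seq)
{
    return seq <= atomic_load(&last_seq);
}
```

This is why the patch can delete `fence_lock` and the `emitted`/`signaled` lists: the atomic counter alone answers the question the lists used to answer.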
[PATCH 08/19] drm/radeon: add sub allocator debugfs file
Dumping the current allocations. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon_object.h |5 + drivers/gpu/drm/radeon/radeon_ring.c | 22 ++ drivers/gpu/drm/radeon/radeon_sa.c | 14 ++ 3 files changed, 41 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index c120ab9..d9fca1e 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -172,5 +172,10 @@ extern int radeon_sa_bo_new(struct radeon_device *rdev, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo); +#if defined(CONFIG_DEBUG_FS) +extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, +struct seq_file *m); +#endif + #endif diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 116be5e..f49c9c0 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -601,6 +601,23 @@ static int radeon_debugfs_ib_info(struct seq_file *m, void *data) static struct drm_info_list radeon_debugfs_ib_list[RADEON_IB_POOL_SIZE]; static char radeon_debugfs_ib_names[RADEON_IB_POOL_SIZE][32]; static unsigned radeon_debugfs_ib_idx[RADEON_IB_POOL_SIZE]; + +static int radeon_debugfs_sa_info(struct seq_file *m, void *data) +{ + struct drm_info_node *node = (struct drm_info_node *) m-private; + struct drm_device *dev = node-minor-dev; + struct radeon_device *rdev = dev-dev_private; + + radeon_sa_bo_dump_debug_info(rdev-ib_pool.sa_manager, m); + + return 0; + +} + +static struct drm_info_list radeon_debugfs_sa_list[] = { +{radeon_sa_info, radeon_debugfs_sa_info, 0, NULL}, +}; + #endif int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *ring) @@ -627,6 +644,11 @@ int radeon_debugfs_ib_init(struct radeon_device *rdev) { #if defined(CONFIG_DEBUG_FS) unsigned i; + int r; + + r = radeon_debugfs_add_files(rdev, 
radeon_debugfs_sa_list, 1); + if (r) + return r; for (i = 0; i RADEON_IB_POOL_SIZE; i++) { sprintf(radeon_debugfs_ib_names[i], radeon_ib_%04u, i); diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index aed0a8c..1db0568 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -193,3 +193,17 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo) list_del_init(sa_bo-list); spin_unlock(sa_bo-manager-lock); } + +#if defined(CONFIG_DEBUG_FS) +void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, + struct seq_file *m) +{ + struct radeon_sa_bo *i; + + spin_lock(sa_manager-lock); + list_for_each_entry(i, sa_manager-sa_bo, list) { + seq_printf(m, offset %08d: size %4d\n, i-offset, i-size); + } + spin_unlock(sa_manager-lock); +} +#endif -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 10/19] drm/radeon: make sa bo a stand alone object
Allocating and freeing it seperately. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h |4 ++-- drivers/gpu/drm/radeon/radeon_cs.c|4 ++-- drivers/gpu/drm/radeon/radeon_gart.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h|4 ++-- drivers/gpu/drm/radeon/radeon_ring.c |6 +++--- drivers/gpu/drm/radeon/radeon_sa.c| 28 +++- drivers/gpu/drm/radeon/radeon_semaphore.c |4 ++-- 7 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index d1c2154..9374ab1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -638,7 +638,7 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; unsignedidx; uint32_tlength_dw; uint64_tgpu_addr; @@ -693,7 +693,7 @@ struct radeon_vm { unsignedlast_pfn; u64 pt_gpu_addr; u64 *pt; - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; struct mutexmutex; /* last fence for cs using this vm */ struct radeon_fence *fence; diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index b778037..5c065bf 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser-const_ib-gpu_addr = parser-const_ib-sa_bo.soffset; + parser-const_ib-gpu_addr = parser-const_ib-sa_bo-soffset; r = radeon_ib_schedule(rdev, parser-const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser-ib-gpu_addr = parser-ib-sa_bo.soffset; + parser-ib-gpu_addr = parser-ib-sa_bo-soffset; parser-ib-is_const_ib = false; r = radeon_ib_schedule(rdev, 
parser-ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 4a5d9d4..c5789ef 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -404,8 +404,8 @@ retry: radeon_vm_unbind(rdev, vm_evict); goto retry; } - vm-pt = radeon_sa_bo_cpu_addr(vm-sa_bo); - vm-pt_gpu_addr = radeon_sa_bo_gpu_addr(vm-sa_bo); + vm-pt = radeon_sa_bo_cpu_addr(vm-sa_bo); + vm-pt_gpu_addr = radeon_sa_bo_gpu_addr(vm-sa_bo); memset(vm-pt, 0, RADEON_GPU_PAGE_ALIGN(vm-last_pfn * 8)); retry_id: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 99ab46a..4fc7f07 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -168,10 +168,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager); extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, - struct radeon_sa_bo *sa_bo, + struct radeon_sa_bo **sa_bo, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo *sa_bo); + struct radeon_sa_bo **sa_bo); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index f49c9c0..45adb37 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -127,8 +127,8 @@ retry: size, 256); if (!r) { *ib = rdev-ib_pool.ibs[idx]; - (*ib)-ptr = radeon_sa_bo_cpu_addr((*ib)-sa_bo); - (*ib)-gpu_addr = radeon_sa_bo_gpu_addr((*ib)-sa_bo); + (*ib)-ptr = radeon_sa_bo_cpu_addr((*ib)-sa_bo); + (*ib)-gpu_addr = radeon_sa_bo_gpu_addr((*ib)-sa_bo);
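Turning the embedded `struct radeon_sa_bo` into a separately allocated object means the allocator now hands out and reclaims nodes through a pointer-to-pointer interface, clearing the caller's pointer on free. A sketch of that calling convention (toy names, abbreviated error handling; `-12` stands in for `-ENOMEM`):

```c
#include <stdlib.h>

struct toy_node {
    unsigned soffset, eoffset;
};

/* Caller passes the address of its pointer, as radeon_sa_bo_new now does. */
static int toy_node_new(struct toy_node **out, unsigned start, unsigned size)
{
    struct toy_node *n = malloc(sizeof(*n));
    if (!n)
        return -12;          /* stand-in for -ENOMEM */
    n->soffset = start;
    n->eoffset = start + size;
    *out = n;
    return 0;
}

/* Frees the node and clears the caller's pointer, so a stale handle
 * cannot be used after free. */
static void toy_node_free(struct toy_node **n)
{
    free(*n);
    *n = NULL;
}
```

Decoupling the node's lifetime from its owner is what later lets the SA keep a freed-but-fenced allocation alive until the GPU is done with it.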
[PATCH 11/19] drm/radeon: define new SA interface v3
Define the interface without modifying the allocation algorithm in any way. v2: rebase on top of fence new uint64 patch v3: add ring to debugfs output Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h |1 + drivers/gpu/drm/radeon/radeon_gart.c |6 +-- drivers/gpu/drm/radeon/radeon_object.h|5 ++- drivers/gpu/drm/radeon/radeon_ring.c |8 ++-- drivers/gpu/drm/radeon/radeon_sa.c| 60 - drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 6 files changed, 63 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 9374ab1..ada70d1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -398,6 +398,7 @@ struct radeon_sa_bo { struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; + struct radeon_fence *fence; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c5789ef..53dba8e 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -326,7 +326,7 @@ static void radeon_vm_unbind_locked(struct radeon_device *rdev, rdev-vm_manager.use_bitmap = ~(1 vm-id); list_del_init(vm-list); vm-id = -1; - radeon_sa_bo_free(rdev, vm-sa_bo); + radeon_sa_bo_free(rdev, vm-sa_bo, NULL); vm-pt = NULL; list_for_each_entry(bo_va, vm-va, vm_list) { @@ -395,7 +395,7 @@ int radeon_vm_bind(struct radeon_device *rdev, struct radeon_vm *vm) retry: r = radeon_sa_bo_new(rdev, rdev-vm_manager.sa_manager, vm-sa_bo, RADEON_GPU_PAGE_ALIGN(vm-last_pfn * 8), -RADEON_GPU_PAGE_SIZE); +RADEON_GPU_PAGE_SIZE, false); if (r) { if (list_empty(rdev-vm_manager.lru_vm)) { return r; @@ -426,7 +426,7 @@ retry_id: /* do hw bind */ r = rdev-vm_manager.funcs-bind(rdev, vm, id); if (r) { - radeon_sa_bo_free(rdev, vm-sa_bo); + radeon_sa_bo_free(rdev, vm-sa_bo, NULL); return r; } rdev-vm_manager.use_bitmap |= 1 id; diff --git 
a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 4fc7f07..befec7d 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -169,9 +169,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, struct radeon_sa_bo **sa_bo, - unsigned size, unsigned align); + unsigned size, unsigned align, bool block); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo **sa_bo); + struct radeon_sa_bo **sa_bo, + struct radeon_fence *fence); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 45adb37..1748d93 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -85,7 +85,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib) if (ib-fence ib-fence-seq RADEON_FENCE_NOTEMITED_SEQ) { if (radeon_fence_signaled(ib-fence)) { radeon_fence_unref(ib-fence); - radeon_sa_bo_free(rdev, ib-sa_bo); + radeon_sa_bo_free(rdev, ib-sa_bo, NULL); done = true; } } @@ -124,7 +124,7 @@ retry: if (rdev-ib_pool.ibs[idx].fence == NULL) { r = radeon_sa_bo_new(rdev, rdev-ib_pool.sa_manager, rdev-ib_pool.ibs[idx].sa_bo, -size, 256); +size, 256, false); if (!r) { *ib = rdev-ib_pool.ibs[idx]; (*ib)-ptr = radeon_sa_bo_cpu_addr((*ib)-sa_bo); @@ -173,7 +173,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) } radeon_mutex_lock(rdev-ib_pool.mutex); if (tmp-fence tmp-fence-seq == RADEON_FENCE_NOTEMITED_SEQ) { - radeon_sa_bo_free(rdev, tmp-sa_bo); + radeon_sa_bo_free(rdev, tmp-sa_bo, NULL);
[PATCH 12/19] drm/radeon: use one wait queue for all rings add fence_wait_any v2
From: Jerome Glisse jgli...@redhat.com Use one wait queue for all rings. When one ring progress, other likely does to and we are not expecting to have a lot of waiter anyway. Also add a fence_wait_any that will wait until the first fence in the fence array (one fence per ring) is signaled. This allow to wait on all rings. v2: some minor cleanups and improvements. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon.h |5 +- drivers/gpu/drm/radeon/radeon_fence.c | 165 +++-- 2 files changed, 163 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index ada70d1..37a7459 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -262,7 +262,6 @@ struct radeon_fence_driver { uint64_tseq; atomic64_t last_seq; unsigned long last_activity; - wait_queue_head_t queue; boolinitialized; }; @@ -286,6 +285,9 @@ bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_any(struct radeon_device *rdev, + struct radeon_fence **fences, + bool intr); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); @@ -1534,6 +1536,7 @@ struct radeon_device { struct radeon_scratch scratch; struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; + wait_queue_head_t fence_queue; struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 098d1fa..14dbc28 100644 --- 
a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -129,7 +129,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring) if (wake) { rdev-fence_drv[ring].last_activity = jiffies; - wake_up_all(rdev-fence_drv[ring].queue); + wake_up_all(rdev-fence_queue); } } @@ -224,11 +224,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, trace_radeon_fence_wait_begin(rdev-ddev, seq); radeon_irq_kms_sw_irq_get(rdev, ring); if (intr) { - r = wait_event_interruptible_timeout(rdev-fence_drv[ring].queue, + r = wait_event_interruptible_timeout(rdev-fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } else { - r = wait_event_timeout(rdev-fence_drv[ring].queue, + r = wait_event_timeout(rdev-fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } @@ -306,6 +306,159 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return 0; } +bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq) +{ + unsigned i; + + for (i = 0; i RADEON_NUM_RINGS; ++i) { + if (seq[i] radeon_fence_seq_signaled(rdev, seq[i], i)) { + return true; + } + } + return false; +} + +static int radeon_fence_wait_any_seq(struct radeon_device *rdev, +u64 *target_seq, bool intr) +{ + unsigned long timeout, last_activity, tmp; + unsigned i, ring = RADEON_NUM_RINGS; + bool signaled; + int r; + + for (i = 0, last_activity = 0; i RADEON_NUM_RINGS; ++i) { + if (!target_seq[i]) { + continue; + } + + /* use the most recent one as indicator */ + if (time_after(rdev-fence_drv[i].last_activity, last_activity)) { + last_activity = rdev-fence_drv[i].last_activity; + } + + /* For lockup detection just pick the lowest ring we are +* actively waiting for +*/ + if (i ring) { + ring = i; + } + } + + /* nothing to wait for ? */ + if (ring == RADEON_NUM_RINGS) { +
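`radeon_fence_wait_any_seq` first scans the per-ring targets for anything already signaled, then sleeps on the single shared `fence_queue`. The scan itself is a plain "any of" loop; modeled with flat arrays (the kernel version reads per-ring atomics, and `0` means "not waiting on this ring"):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 5

/* target[i] == 0 means we are not waiting on ring i. */
static bool any_seq_signaled(const uint64_t *target, const uint64_t *last)
{
    for (unsigned i = 0; i < NUM_RINGS; ++i) {
        if (target[i] && target[i] <= last[i])
            return true;
    }
    return false;
}
```

Because every ring's progress wakes the same queue, one `wait_event` on this predicate suffices to wait for "the first fence of any ring", which is exactly what the blocking suballocator needs.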
[PATCH 13/19] drm/radeon: multiple ring allocator v3
A fresh start with a new idea for a multiple-ring allocator. It should perform as well as a normal ring allocator as long as only one ring does something, but it falls back to a more complex algorithm when more complex things start to happen.

We store the last allocated bo in last, and we always try to allocate after it. The principle is that in a linear GPU ring progression, what is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If that's not the case, we skip over the bo after last to the closest done bo, if one exists. If none exists and we are not asked to block, we report failure to allocate. If we are asked to block, we wait on the oldest fence of each ring and return as soon as any of those fences completes.

v2: We need to be able to let hole point to the list_head, otherwise try-free will never free the first allocation of the list. Also stop calling radeon_fence_signaled more often than necessary.
v3: Don't free allocations without considering them as a hole, otherwise we might lose holes. Also return -ENOMEM instead of -ENOENT when running out of fences to wait for. Limit the number of holes we try for each ring to 3.
Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon.h |7 +- drivers/gpu/drm/radeon/radeon_ring.c | 19 +-- drivers/gpu/drm/radeon/radeon_sa.c | 312 -- 3 files changed, 231 insertions(+), 107 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 37a7459..cc7f16a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -385,7 +385,9 @@ struct radeon_bo_list { struct radeon_sa_manager { spinlock_t lock; struct radeon_bo*bo; - struct list_headsa_bo; + struct list_head*hole; + struct list_headflist[RADEON_NUM_RINGS]; + struct list_headolist; unsignedsize; uint64_tgpu_addr; void*cpu_ptr; @@ -396,7 +398,8 @@ struct radeon_sa_bo; /* sub-allocation buffer */ struct radeon_sa_bo { - struct list_headlist; + struct list_headolist; + struct list_headflist; struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 1748d93..e074ff5 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib) int radeon_ib_pool_init(struct radeon_device *rdev) { - struct radeon_sa_manager tmp; int i, r; - r = radeon_sa_bo_manager_init(rdev, tmp, - RADEON_IB_POOL_SIZE*64*1024, - RADEON_GEM_DOMAIN_GTT); - if (r) { - return r; - } - radeon_mutex_lock(rdev-ib_pool.mutex); if (rdev-ib_pool.ready) { radeon_mutex_unlock(rdev-ib_pool.mutex); - radeon_sa_bo_manager_fini(rdev, tmp); return 0; } - rdev-ib_pool.sa_manager = tmp; - INIT_LIST_HEAD(rdev-ib_pool.sa_manager.sa_bo); + r = radeon_sa_bo_manager_init(rdev, rdev-ib_pool.sa_manager, + RADEON_IB_POOL_SIZE*64*1024, + RADEON_GEM_DOMAIN_GTT); + if (r) { + radeon_mutex_unlock(rdev-ib_pool.mutex); + return r; + } + for (i = 0; i RADEON_IB_POOL_SIZE; i++) { 
rdev-ib_pool.ibs[i].fence = NULL; rdev-ib_pool.ibs[i].idx = i; diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 90ee8ad..c3ac7f4 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -27,21 +27,42 @@ * Authors: *Jerome Glisse gli...@freedesktop.org */ +/* Algorithm: + * + * We store the last allocated bo in hole, we always try to allocate + * after the last allocated bo. Principle is that in a linear GPU ring + * progression was is after last is the oldest bo we allocated and thus + * the first one that should no longer be in use by the GPU. + * + * If it's not the case we skip over the bo after last to the closest + * done bo if such one exist. If none exist and we are not asked to + * block we report failure to allocate. + * + * If we are asked to block we wait on all the oldest fence of all + * rings. We just wait for any of those fence to complete. + */ #include drmP.h #include drm.h #include radeon.h +static void
[PATCH 15/19] drm/radeon: rip out the ib pool
From: Jerome Glisse jgli...@redhat.com It isn't necessary any more and the suballocator seems to perform even better. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon.h | 17 +- drivers/gpu/drm/radeon/radeon_device.c|1 - drivers/gpu/drm/radeon/radeon_gart.c | 12 +- drivers/gpu/drm/radeon/radeon_ring.c | 241 - drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 5 files changed, 71 insertions(+), 202 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 45164e1..6170307 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -625,7 +625,6 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); struct radeon_ib { struct radeon_sa_bo *sa_bo; - unsignedidx; uint32_tlength_dw; uint64_tgpu_addr; uint32_t*ptr; @@ -634,18 +633,6 @@ struct radeon_ib { boolis_const_ib; }; -/* - * locking - - * mutex protects scheduled_ibs, ready, alloc_bm - */ -struct radeon_ib_pool { - struct radeon_mutex mutex; - struct radeon_sa_managersa_manager; - struct radeon_ibibs[RADEON_IB_POOL_SIZE]; - boolready; - unsignedhead_id; -}; - struct radeon_ring { struct radeon_bo*ring_obj; volatile uint32_t *ring; @@ -787,7 +774,6 @@ struct si_rlc { int radeon_ib_get(struct radeon_device *rdev, int ring, struct radeon_ib **ib, unsigned size); void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib); -bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_pool_init(struct radeon_device *rdev); void radeon_ib_pool_fini(struct radeon_device *rdev); @@ -1522,7 +1508,8 @@ struct radeon_device { wait_queue_head_t fence_queue; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; - struct radeon_ib_pool ib_pool; + boolib_pool_ready; + struct radeon_sa_managerring_tmp_bo; struct radeon_irq irq; struct 
radeon_asic *asic; struct radeon_gem gem; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 48876c1..e1bc7e9 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,7 +724,6 @@ int radeon_device_init(struct radeon_device *rdev, /* mutex initialization are all done here so we * can recall function without having locking issues */ radeon_mutex_init(rdev-cs_mutex); - radeon_mutex_init(rdev-ib_pool.mutex); mutex_init(rdev-ring_lock); mutex_init(rdev-dc_hw_i2c_mutex); if (rdev-family = CHIP_R600) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 53dba8e..8e9ef34 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -432,8 +432,8 @@ retry_id: rdev-vm_manager.use_bitmap |= 1 id; vm-id = id; list_add_tail(vm-list, rdev-vm_manager.lru_vm); - return radeon_vm_bo_update_pte(rdev, vm, rdev-ib_pool.sa_manager.bo, - rdev-ib_pool.sa_manager.bo-tbo.mem); + return radeon_vm_bo_update_pte(rdev, vm, rdev-ring_tmp_bo.bo, + rdev-ring_tmp_bo.bo-tbo.mem); } /* object have to be reserved */ @@ -631,7 +631,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm) /* map the ib pool buffer at 0 in virtual address space, set * read only */ - r = radeon_vm_bo_add(rdev, vm, rdev-ib_pool.sa_manager.bo, 0, + r = radeon_vm_bo_add(rdev, vm, rdev-ring_tmp_bo.bo, 0, RADEON_VM_PAGE_READABLE | RADEON_VM_PAGE_SNOOPED); return r; } @@ -648,12 +648,12 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm) radeon_mutex_unlock(rdev-cs_mutex); /* remove all bo */ - r = radeon_bo_reserve(rdev-ib_pool.sa_manager.bo, false); + r = radeon_bo_reserve(rdev-ring_tmp_bo.bo, false); if (!r) { - bo_va = radeon_bo_va(rdev-ib_pool.sa_manager.bo, vm); + bo_va = radeon_bo_va(rdev-ring_tmp_bo.bo, vm); list_del_init(bo_va-bo_list); list_del_init(bo_va-vm_list); - 
radeon_bo_unreserve(rdev-ib_pool.sa_manager.bo); +
[PATCH 14/19] drm/radeon: simplify semaphore handling v2
From: Jerome Glisse jgli...@redhat.com Directly use the suballocator to get small chunks of memory. It's equally fast and doesn't crash when we encounter a GPU reset. v2: rebased on new SA interface. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/evergreen.c|1 - drivers/gpu/drm/radeon/ni.c |1 - drivers/gpu/drm/radeon/r600.c |1 - drivers/gpu/drm/radeon/radeon.h | 29 +- drivers/gpu/drm/radeon/radeon_device.c|2 - drivers/gpu/drm/radeon/radeon_fence.c |2 +- drivers/gpu/drm/radeon/radeon_semaphore.c | 137 + drivers/gpu/drm/radeon/radeon_test.c |4 +- drivers/gpu/drm/radeon/rv770.c|1 - drivers/gpu/drm/radeon/si.c |1 - 10 files changed, 30 insertions(+), 149 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index ecc29bc..7e7ac3d 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3550,7 +3550,6 @@ void evergreen_fini(struct radeon_device *rdev) evergreen_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_agp_fini(rdev); radeon_bo_fini(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 9cd2657..107b217 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1744,7 +1744,6 @@ void cayman_fini(struct radeon_device *rdev) cayman_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index d02f13f..478b51e 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2658,7 +2658,6 @@ void r600_fini(struct radeon_device *rdev) r600_vram_scratch_fini(rdev); radeon_agp_fini(rdev); radeon_gem_fini(rdev); - 
radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cc7f16a..45164e1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -434,34 +434,13 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv, /* * Semaphores. */ -struct radeon_ring; - -#defineRADEON_SEMAPHORE_BO_SIZE256 - -struct radeon_semaphore_driver { - rwlock_tlock; - struct list_headbo; -}; - -struct radeon_semaphore_bo; - /* everything here is constant */ struct radeon_semaphore { - struct list_headlist; + struct radeon_sa_bo *sa_bo; + signed waiters; uint64_tgpu_addr; - uint32_t*cpu_ptr; - struct radeon_semaphore_bo *bo; }; -struct radeon_semaphore_bo { - struct list_headlist; - struct radeon_ib*ib; - struct list_headfree; - struct radeon_semaphore semaphores[RADEON_SEMAPHORE_BO_SIZE/8]; - unsignednused; -}; - -void radeon_semaphore_driver_fini(struct radeon_device *rdev); int radeon_semaphore_create(struct radeon_device *rdev, struct radeon_semaphore **semaphore); void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring, @@ -473,7 +452,8 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev, bool sync_to[RADEON_NUM_RINGS], int dst_ring); void radeon_semaphore_free(struct radeon_device *rdev, - struct radeon_semaphore *semaphore); + struct radeon_semaphore *semaphore, + struct radeon_fence *fence); /* * GART structures, functions helpers @@ -1540,7 +1520,6 @@ struct radeon_device { struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; wait_queue_head_t fence_queue; - struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index b827b2e..48876c1 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c 
+++ b/drivers/gpu/drm/radeon/radeon_device.c @@
[PATCH 16/19] drm/radeon: immediately free ttm-move semaphore
We can now protect the semaphore ram by a fence, so free it immediately.

Signed-off-by: Christian König deathsim...@vodafone.de
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5e3d54d..0f6aee8 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -223,6 +223,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	struct radeon_device *rdev;
 	uint64_t old_start, new_start;
 	struct radeon_fence *fence, *old_fence;
+	struct radeon_semaphore *sem = NULL;
 	int r;
 
 	rdev = radeon_get_rdev(bo->bdev);
@@ -272,15 +273,16 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 		bool sync_to_ring[RADEON_NUM_RINGS] = { };
 		sync_to_ring[old_fence->ring] = true;
 
-		r = radeon_semaphore_create(rdev, &fence->semaphore);
+		r = radeon_semaphore_create(rdev, &sem);
 		if (r) {
 			radeon_fence_unref(&fence);
 			return r;
 		}
 
-		r = radeon_semaphore_sync_rings(rdev, fence->semaphore,
+		r = radeon_semaphore_sync_rings(rdev, sem,
 						sync_to_ring, fence->ring);
 		if (r) {
+			radeon_semaphore_free(rdev, sem, NULL);
 			radeon_fence_unref(&fence);
 			return r;
 		}
@@ -292,6 +294,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	/* FIXME: handle copy error */
 	r = ttm_bo_move_accel_cleanup(bo, (void *)fence, NULL,
 				      evict, no_wait_reserve, no_wait_gpu, new_mem);
+	radeon_semaphore_free(rdev, sem, fence);
 	radeon_fence_unref(&fence);
 	return r;
 }
-- 
1.7.9.5
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 17/19] drm/radeon: move the semaphore from the fence into the ib
From: Jerome Glisse jgli...@redhat.com It never really belonged there in the first place. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h | 16 drivers/gpu/drm/radeon/radeon_cs.c|4 ++-- drivers/gpu/drm/radeon/radeon_fence.c |3 --- drivers/gpu/drm/radeon/radeon_ring.c |2 ++ 4 files changed, 12 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 6170307..9507be0 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -272,7 +272,6 @@ struct radeon_fence { uint64_tseq; /* RB, DMA, etc. */ unsignedring; - struct radeon_semaphore *semaphore; }; int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring); @@ -624,13 +623,14 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo *sa_bo; - uint32_tlength_dw; - uint64_tgpu_addr; - uint32_t*ptr; - struct radeon_fence *fence; - unsignedvm_id; - boolis_const_ib; + struct radeon_sa_bo *sa_bo; + uint32_tlength_dw; + uint64_tgpu_addr; + uint32_t*ptr; + struct radeon_fence *fence; + unsignedvm_id; + boolis_const_ib; + struct radeon_semaphore *semaphore; }; struct radeon_ring { diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 5c065bf..dcfe2a0 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -138,12 +138,12 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) return 0; } - r = radeon_semaphore_create(p-rdev, p-ib-fence-semaphore); + r = radeon_semaphore_create(p-rdev, p-ib-semaphore); if (r) { return r; } - return radeon_semaphore_sync_rings(p-rdev, p-ib-fence-semaphore, + return radeon_semaphore_sync_rings(p-rdev, p-ib-semaphore, sync_to_ring, p-ring); } diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 3a49311..48ec5e3 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ 
b/drivers/gpu/drm/radeon/radeon_fence.c @@ -139,8 +139,6 @@ static void radeon_fence_destroy(struct kref *kref) fence = container_of(kref, struct radeon_fence, kref); fence-seq = RADEON_FENCE_NOTEMITED_SEQ; - if (fence-semaphore) - radeon_semaphore_free(fence-rdev, fence-semaphore, NULL); kfree(fence); } @@ -156,7 +154,6 @@ int radeon_fence_create(struct radeon_device *rdev, (*fence)-rdev = rdev; (*fence)-seq = RADEON_FENCE_NOTEMITED_SEQ; (*fence)-ring = ring; - (*fence)-semaphore = NULL; return 0; } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index b3d6942..af8e1ee 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -93,6 +93,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring, (*ib)-gpu_addr = radeon_sa_bo_gpu_addr((*ib)-sa_bo); (*ib)-vm_id = 0; (*ib)-is_const_ib = false; + (*ib)-semaphore = NULL; return 0; } @@ -105,6 +106,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) if (tmp == NULL) { return; } + radeon_semaphore_free(rdev, tmp-semaphore, tmp-fence); radeon_sa_bo_free(rdev, tmp-sa_bo, tmp-fence); radeon_fence_unref(tmp-fence); kfree(tmp); -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 18/19] drm/radeon: remove r600 blit mutex v2
If we don't store local data into global variables it isn't necessary to lock anything. v2: rebased on new SA interface Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/evergreen_blit_kms.c |1 - drivers/gpu/drm/radeon/r600.c | 13 ++-- drivers/gpu/drm/radeon/r600_blit_kms.c | 99 +++ drivers/gpu/drm/radeon/radeon.h |3 - drivers/gpu/drm/radeon/radeon_asic.h|9 ++- 5 files changed, 50 insertions(+), 75 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index 222acd2..30f0480 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -637,7 +637,6 @@ int evergreen_blit_init(struct radeon_device *rdev) if (rdev-r600_blit.shader_obj) goto done; - mutex_init(rdev-r600_blit.mutex); rdev-r600_blit.state_offset = 0; if (rdev-family CHIP_CAYMAN) diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 478b51e..00b2238 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2363,20 +2363,15 @@ int r600_copy_blit(struct radeon_device *rdev, unsigned num_gpu_pages, struct radeon_fence *fence) { + struct radeon_sa_bo *vb = NULL; int r; - mutex_lock(rdev-r600_blit.mutex); - rdev-r600_blit.vb_ib = NULL; - r = r600_blit_prepare_copy(rdev, num_gpu_pages); + r = r600_blit_prepare_copy(rdev, num_gpu_pages, vb); if (r) { - if (rdev-r600_blit.vb_ib) - radeon_ib_free(rdev, rdev-r600_blit.vb_ib); - mutex_unlock(rdev-r600_blit.mutex); return r; } - r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages); - r600_blit_done_copy(rdev, fence); - mutex_unlock(rdev-r600_blit.mutex); + r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages, vb); + r600_blit_done_copy(rdev, fence, vb); return 0; } diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index db38f58..ef20822 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ 
b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -513,7 +513,6 @@ int r600_blit_init(struct radeon_device *rdev) rdev-r600_blit.primitives.set_default_state = set_default_state; rdev-r600_blit.ring_size_common = 40; /* shaders + def state */ - rdev-r600_blit.ring_size_common += 16; /* fence emit for VB IB */ rdev-r600_blit.ring_size_common += 5; /* done copy */ rdev-r600_blit.ring_size_common += 16; /* fence emit for done copy */ @@ -528,7 +527,6 @@ int r600_blit_init(struct radeon_device *rdev) if (rdev-r600_blit.shader_obj) goto done; - mutex_init(rdev-r600_blit.mutex); rdev-r600_blit.state_offset = 0; if (rdev-family = CHIP_RV770) @@ -621,27 +619,6 @@ void r600_blit_fini(struct radeon_device *rdev) radeon_bo_unref(rdev-r600_blit.shader_obj); } -static int r600_vb_ib_get(struct radeon_device *rdev, unsigned size) -{ - int r; - r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, - rdev-r600_blit.vb_ib, size); - if (r) { - DRM_ERROR(failed to get IB for vertex buffer\n); - return r; - } - - rdev-r600_blit.vb_total = size; - rdev-r600_blit.vb_used = 0; - return 0; -} - -static void r600_vb_ib_put(struct radeon_device *rdev) -{ - radeon_fence_emit(rdev, rdev-r600_blit.vb_ib-fence); - radeon_ib_free(rdev, rdev-r600_blit.vb_ib); -} - static unsigned r600_blit_create_rect(unsigned num_gpu_pages, int *width, int *height, int max_dim) { @@ -688,7 +665,8 @@ static unsigned r600_blit_create_rect(unsigned num_gpu_pages, } -int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) +int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages, + struct radeon_sa_bo **vb) { struct radeon_ring *ring = rdev-ring[RADEON_RING_TYPE_GFX_INDEX]; int r; @@ -705,46 +683,54 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) } /* 48 bytes for vertex per loop */ - r = r600_vb_ib_get(rdev, (num_loops*48)+256); - if (r) + r = radeon_sa_bo_new(rdev, rdev-ring_tmp_bo, vb, +(num_loops*48)+256, 256, true); + if (r) { return r; + } 
/* calculate number of loops correctly */ ring_size = num_loops * dwords_per_loop; ring_size += rdev-r600_blit.ring_size_common; r = radeon_ring_lock(rdev, ring, ring_size); - if (r) + if (r) { + radeon_sa_bo_free(rdev, vb, NULL);
[PATCH 19/19] drm/radeon: make the ib an inline object
From: Jerome Glisse jgli...@redhat.com No need to malloc it any more. Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/evergreen_cs.c | 10 +++--- drivers/gpu/drm/radeon/r100.c | 38 ++-- drivers/gpu/drm/radeon/r200.c |2 +- drivers/gpu/drm/radeon/r300.c |4 +-- drivers/gpu/drm/radeon/r600.c | 16 - drivers/gpu/drm/radeon/r600_cs.c | 22 ++-- drivers/gpu/drm/radeon/radeon.h |8 ++--- drivers/gpu/drm/radeon/radeon_cs.c| 63 - drivers/gpu/drm/radeon/radeon_ring.c | 41 +++-- 9 files changed, 93 insertions(+), 111 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 70089d3..4e7dd2b 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -1057,7 +1057,7 @@ static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg, wait_reg_mem_info; volatile uint32_t *ib; - ib = p-ib-ptr; + ib = p-ib.ptr; /* parse the WAIT_REG_MEM */ r = evergreen_cs_packet_parse(p, wait_reg_mem, p-idx); @@ -1215,7 +1215,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) if (!(evergreen_reg_safe_bm[i] m)) return 0; } - ib = p-ib-ptr; + ib = p-ib.ptr; switch (reg) { /* force following reg to 0 in an attempt to disable out buffer * which will need us to better understand how it works to perform @@ -1896,7 +1896,7 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p, u32 idx_value; track = (struct evergreen_cs_track *)p-track; - ib = p-ib-ptr; + ib = p-ib.ptr; idx = pkt-idx + 1; idx_value = radeon_get_ib_value(p, idx); @@ -2610,8 +2610,8 @@ int evergreen_cs_parse(struct radeon_cs_parser *p) } } while (p-idx p-chunks[p-chunk_ib_idx].length_dw); #if 0 - for (r = 0; r p-ib-length_dw; r++) { - printk(KERN_INFO %05d 0x%08X\n, r, p-ib-ptr[r]); + for (r = 0; r p-ib.length_dw; r++) { + printk(KERN_INFO %05d 0x%08X\n, r, p-ib.ptr[r]); mdelay(1); } #endif diff 
--git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index ad6ceb7..0874a6d 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -139,9 +139,9 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p, } tmp |= tile_flags; - p-ib-ptr[idx] = (value 0x3fc0) | tmp; + p-ib.ptr[idx] = (value 0x3fc0) | tmp; } else - p-ib-ptr[idx] = (value 0xffc0) | tmp; + p-ib.ptr[idx] = (value 0xffc0) | tmp; return 0; } @@ -156,7 +156,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p, volatile uint32_t *ib; u32 idx_value; - ib = p-ib-ptr; + ib = p-ib.ptr; track = (struct r100_cs_track *)p-track; c = radeon_get_ib_value(p, idx++) 0x1F; if (c 16) { @@ -1275,7 +1275,7 @@ void r100_cs_dump_packet(struct radeon_cs_parser *p, unsigned i; unsigned idx; - ib = p-ib-ptr; + ib = p-ib.ptr; idx = pkt-idx; for (i = 0; i = (pkt-count + 1); i++, idx++) { DRM_INFO(ib[%d]=0x%08X\n, idx, ib[idx]); @@ -1354,7 +1354,7 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg; volatile uint32_t *ib; - ib = p-ib-ptr; + ib = p-ib.ptr; /* parse the wait until */ r = r100_cs_packet_parse(p, waitreloc, p-idx); @@ -1533,7 +1533,7 @@ static int r100_packet0_check(struct radeon_cs_parser *p, u32 tile_flags = 0; u32 idx_value; - ib = p-ib-ptr; + ib = p-ib.ptr; track = (struct r100_cs_track *)p-track; idx_value = radeon_get_ib_value(p, idx); @@ -1889,7 +1889,7 @@ static int r100_packet3_check(struct radeon_cs_parser *p, volatile uint32_t *ib; int r; - ib = p-ib-ptr; + ib = p-ib.ptr; idx = pkt-idx + 1; track = (struct r100_cs_track *)p-track; switch (pkt-opcode) { @@ -3684,7 +3684,7 @@ void r100_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib) int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring) { - struct radeon_ib *ib; + struct radeon_ib ib; uint32_t scratch; uint32_t tmp = 0; unsigned i; @@ -3700,22 +3700,22 @@ int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring) if (r) { 
return r; } - ib-ptr[0] = PACKET0(scratch, 0); - ib-ptr[1] = 0xDEADBEEF; -
[Bug 49632] radeon: The kernel rejected CS,
https://bugs.freedesktop.org/show_bug.cgi?id=49632

--- Comment #9 from execute.met...@gmail.com 2012-05-09 06:36:57 PDT ---
No, there is nothing else in dmesg. Is there any more info you'd like me to gather?
Re: [PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Fri, May 4, 2012 at 11:06 AM, j.gli...@gmail.com wrote:
> From: Jerome Glisse jgli...@redhat.com
>
> It seems the iMac panel doesn't like when we change the hot plug setup
> and then refuses to work. This helps but doesn't fully fix:
> https://bugzilla.redhat.com/show_bug.cgi?id=726143

How does it help? Does it fix the aux problems, but the monitor still
doesn't train? What's the working value of the relevant
DC_HPD*_CONTROL register?

Alex

> v2: fix typo and improve commit message
>
> Signed-off-by: Matthew Garrett m...@redhat.com
> Signed-off-by: Jerome Glisse jgli...@redhat.com
> ---
>  drivers/gpu/drm/radeon/r600.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
> index 694b6b2..a304c9d 100644
> --- a/drivers/gpu/drm/radeon/r600.c
> +++ b/drivers/gpu/drm/radeon/r600.c
> @@ -713,6 +713,14 @@ void r600_hpd_init(struct radeon_device *rdev)
>  	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
>  		struct radeon_connector *radeon_connector = to_radeon_connector(connector);
> +		if (connector->connector_type == DRM_MODE_CONNECTOR_eDP ||
> +		    connector->connector_type == DRM_MODE_CONNECTOR_LVDS) {
> +			/* don't try to enable hpd on eDP or LVDS avoid breaking the
> +			 * aux dp channel on imac and help (but not completely fix)
> +			 * https://bugzilla.redhat.com/show_bug.cgi?id=726143
> +			 */
> +			continue;
> +		}
>  		if (ASIC_IS_DCE3(rdev)) {
>  			u32 tmp = DC_HPDx_CONNECTION_TIMER(0x9c4) |
>  				DC_HPDx_RX_INT_TIMER(0xfa);
>  			if (ASIC_IS_DCE32(rdev))
> --
> 1.7.9.3
Re: [PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Wed, May 9, 2012 at 9:40 AM, Alex Deucher alexdeuc...@gmail.com wrote:
> On Fri, May 4, 2012 at 11:06 AM, j.gli...@gmail.com wrote:
>> From: Jerome Glisse jgli...@redhat.com
>>
>> It seems the iMac panel doesn't like when we change the hot plug setup
>> and then refuses to work. This helps but doesn't fully fix:
>> https://bugzilla.redhat.com/show_bug.cgi?id=726143
>
> How does it help? Does it fix the aux problems, but the monitor still
> doesn't train? What's the working value of the relevant
> DC_HPD*_CONTROL register?
>
> Alex

I don't have the hw, but somehow the way we program this reg completely
disables the panel; after that the panel doesn't answer to anything
(neither i2c nor any aux transaction). Without programming it, link
training is successful but the panel stays black. I can ask to get the
value before and after.

Cheers,
Jerome
Re: Include request for SA improvements
On Wed, May 9, 2012 at 9:34 AM, Christian König deathsim...@vodafone.de wrote:
> Hi Dave & Jerome and everybody on the list,
>
> I can't find any more bugs and also I'm out of things to test, so I really
> hope that this is the last incarnation of this patchset, and if Jerome is
> ok with it it should now be included into drm-next.
>
> Cheers,
> Christian.

Yeah, looks good to me.

Cheers,
Jerome
[PATCH] drm/radeon/kms: fix warning on 32-bit in atomic fence printing
From: Dave Airlie airl...@redhat.com

/ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c: In function ‘radeon_debugfs_fence_info’:
/ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c:606:7: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘long long int’ [-Wformat]

Signed-off-by: Dave Airlie airl...@redhat.com
---
 drivers/gpu/drm/radeon/radeon_fence.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 48ec5e3..11f5f40 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -602,8 +602,8 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
 			continue;
 		seq_printf(m, "--- ring %d ---\n", i);
-		seq_printf(m, "Last signaled fence 0x%016lx\n",
-			   atomic64_read(&rdev->fence_drv[i].last_seq));
+		seq_printf(m, "Last signaled fence 0x%016llx\n",
+			   (unsigned long long)atomic64_read(&rdev->fence_drv[i].last_seq));
 		seq_printf(m, "Last emitted 0x%016llx\n", rdev->fence_drv[i].seq);
 	}
-- 
1.7.7.6
Re: [PATCH] drm/radeon/kms: fix warning on 32-bit in atomic fence printing
On Wed, May 9, 2012 at 12:28 PM, Dave Airlie airl...@gmail.com wrote: From: Dave Airlie airl...@redhat.com /ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c: In function ‘radeon_debugfs_fence_info’: /ssd/git/drm-core-next/drivers/gpu/drm/radeon/radeon_fence.c:606:7: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘long long int’ [-Wformat] Signed-off-by: Dave Airlie airl...@redhat.com Reviewed-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon_fence.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 48ec5e3..11f5f40 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -602,8 +602,8 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data) continue; seq_printf(m, --- ring %d ---\n, i); - seq_printf(m, Last signaled fence 0x%016lx\n, - atomic64_read(rdev-fence_drv[i].last_seq)); + seq_printf(m, Last signaled fence 0x%016llx\n, + (unsigned long long)atomic64_read(rdev-fence_drv[i].last_seq)); seq_printf(m, Last emitted 0x%016llx\n, rdev-fence_drv[i].seq); } -- 1.7.7.6 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/radeon: don't mess with hot plug detect for eDP or LVDS connector v2
On Wed, May 9, 2012 at 10:23 AM, Jerome Glisse j.gli...@gmail.com wrote:
> On Wed, May 9, 2012 at 9:40 AM, Alex Deucher alexdeuc...@gmail.com wrote:
>> On Fri, May 4, 2012 at 11:06 AM, j.gli...@gmail.com wrote:
>>> From: Jerome Glisse jgli...@redhat.com
>>>
>>> It seems the iMac panel doesn't like when we change the hot plug setup
>>> and then refuses to work. This helps but doesn't fully fix:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=726143
>>
>> How does it help? Does it fix the aux problems, but the monitor still
>> doesn't train? What's the working value of the relevant
>> DC_HPD*_CONTROL register?
>>
>> Alex
>
> I don't have the hw, but somehow the way we program this reg completely
> disables the panel; after that the panel doesn't answer to anything
> (neither i2c nor any aux transaction). Without programming it, link
> training is successful but the panel stays black. I can ask to get the
> value before and after.

The patch seems reasonable in general (we don't really need hpd to be
explicitly enabled for lvds or edp), so:

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

> Cheers,
> Jerome
[Bug 43215] New: Nouveau: Resume from s2disk fails.
https://bugzilla.kernel.org/show_bug.cgi?id=43215

           Summary: Nouveau: Resume from s2disk fails.
           Product: Drivers
           Version: 2.5
    Kernel Version: 3.3.5, 3.4-rc4+, 3.4-rc6+
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
        AssignedTo: drivers_video-...@kernel-bugs.osdl.org
        ReportedBy: harn-s...@gmx.de
        Regression: No

Created an attachment (id=73220)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=73220)
dmesg output of a cold boot - suspend to disk - resume cycle

When sending the PC into suspend-to-disk mode via `echo disk > /sys/power/state` or using pm-suspend, the resume process fails with the following messages:

[   61.782648] [drm] nouveau :01:00.0: timeout: URSOR_CTRL2_STATUS_ACTIVE(0)
[   61.782709] [drm] nouveau :01:00.0: CURSOR_CTRL2(0) = 0x

This bug is reproducible on several 3.3.x and 3.4-rc kernels. I did not test it on prior kernels. Suspend to RAM works fine. The card used here is a nvce, Gainward 560 Ti, 2G video ram.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
Re: Include request for SA improvements
On Wed, May 9, 2012 at 3:31 PM, Jerome Glisse j.gli...@gmail.com wrote:
> On Wed, May 9, 2012 at 9:34 AM, Christian König deathsim...@vodafone.de wrote:
>> Hi Dave & Jerome and everybody on the list,
>>
>> I can't find any more bugs and also I'm out of things to test, so I really
>> hope that this is the last incarnation of this patchset, and if Jerome is
>> ok with it it should now be included into drm-next.
>>
>> Cheers,
>> Christian.
>
> Yeah, looks good to me.

All pushed into -next + the warning fix on top.

Thanks guys,
Dave.
Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.
On Wed, May 9, 2012 at 10:45 AM, Jerome Glisse j.gli...@gmail.com wrote: On Wed, May 9, 2012 at 2:17 AM, Inki Dae inki@samsung.com wrote:

this feature is used to import a user space region allocated by malloc() or mmapped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so we limit the user-desired userptr size to a pre-defined maximum. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com

Again I would like feedback from mm people (adding cc). I am not sure locking the vma is the right answer; as I said in my previous mail, userspace can munlock it behind your back, maybe VM_RESERVED is better. Anyway, even not considering that, you don't check at all that the process doesn't go over the limit of locked pages; see mm/mlock.c RLIMIT_MEMLOCK for how it's done.

Also you mlock the complete vma, but the userptr you get might be inside, say, a 16M vma and you only care about 1M of userptr. If you mark the whole vma as locked, then anytime a new page is faulted in the vma elsewhere than in the buffer you are interested in, it stays allocated forever until the gem buffer is destroyed. I am not sure what happens to the vma on the next malloc, if it grows or not (I would think it won't grow as it would have different flags than new anonymous memory).

The whole business of directly using malloced memory for the gpu is fishy and I would really like to get it right rather than relying on never hitting strange things like page migration, vma merging, or worse things like over-locking pages and stealing memory.

Cheers, Jerome

I had a lengthy discussion with mm people (thx a lot for that). I think we should split 2 different use cases.

The zero-copy upload case, ie:

app:
    ptr = malloc()
    ...
    glTex/VBO/UBO/...(ptr)
    free(ptr) or reuse it for other things

For which I guess you want to avoid having to do a memcpy inside the gl library (could be anything else than gl that has the same usage pattern), ie after the upload happens you don't care about those pages; they can be removed from the vma or marked as cow so that anything messing with those pages after the upload won't change what you uploaded. Of course this is assuming that the tlb cost of doing such a thing is smaller than the cost of memcpying the data. Two ways to do that: either you assume the app cannot read back the data after gl does, and you do an unmap_mapping_range (make sure you only unmap fully covered pages and that you copy non fully covered pages), or you want to allow userspace to still read the data or possibly overwrite it.

The second use case is something more like the opencl case of CL_MEM_USE_HOST_PTR, in which you want to use the same pages on the gpu and keep the userspace vma pointing to those pages. I think the agreement on this case is that there is no way right now to do it sanely inside the linux kernel. mlocking will need proper accounting against the rlimit, but this limit might be low. Also the fork case might be problematic. For the fork case the memory is anonymous so it should be COWed in the fork child, but relative to the cl context that means the child could not use the cl context with that memory, or at least if the child writes to this memory the cl will not see those changes. I guess the answer to that one is that you really need to use the cl api to read the object or get a proper ptr to read it.

Anyway, in all cases, implementing this userptr thing needs a lot more code. You have to check that the vma you are trying to use is anonymous and only handle this case, and fall back to allocating new pages and copying otherwise.

Cheers, Jerome
[Bug 43858] DVI of ATI RADEON 9200 AGP don't work
https://bugs.freedesktop.org/show_bug.cgi?id=43858 Alex Deucher ag...@yahoo.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution||NOTABUG --- Comment #27 from Alex Deucher ag...@yahoo.com 2012-05-09 12:00:09 PDT --- The DVI issue is fixed. Please open a new bug if you are still having gfx problems. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
In Diablo III when the ground is covered in water, it's all messed-up.
I found a rendering error in Diablo III: in a well, the ground is covered in water and it's all red and flickers. Trace: http://c566453.r53.cf2.rackcdn.com/DiabloIII-well-wine-preloader.trace.xz (3.5G uncompressed; 234665296 bytes, ~234M, compressed).

EE r600_shader.c:1605 r600_shader_from_tgsi - GPR limit exceeded - shader requires 181 registers
EE r600_shader.c:140 r600_pipe_shader_create - translation from TGSI failed !

Rendered 621 frames in 80.724 secs, average of 7.69288 fps
RE: [PATCH 2/2 v3] drm/exynos: added userptr feature.
Hi Jerome,

-Original Message- From: Jerome Glisse [mailto:j.gli...@gmail.com] Sent: Wednesday, May 09, 2012 11:46 PM To: Inki Dae Cc: airl...@linux.ie; dri-devel@lists.freedesktop.org; kyungmin.p...@samsung.com; sw0312@samsung.com; linux...@kvack.org Subject: Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.

On Wed, May 9, 2012 at 2:17 AM, Inki Dae inki@samsung.com wrote:

this feature is used to import a user space region allocated by malloc() or mmapped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so we limit the user-desired userptr size to a pre-defined maximum. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com

Again I would like feedback from mm people (adding cc). I am not sure

Thank you, I missed adding mm as cc.

locking the vma is the right answer; as I said in my previous mail, userspace can munlock it behind your back, maybe VM_RESERVED is better.

I know that with the VM_RESERVED flag we can also keep the pages from being swapped out, but these pages should be unlockable anytime we want, because we could allocate all the pages on the system and lock them, which, in turn, may result in significant deterioration of system performance (other processes requesting free memory could be blocked), so I used the VM_LOCKED flag instead. But I'm not sure this way is best either.

Anyway, even not considering that, you don't check at all that the process doesn't go over the limit of locked pages; see mm/mlock.c RLIMIT_MEMLOCK

Thank you for your advice.

for how it's done.
Also you mlock the complete vma, but the userptr you get might be inside, say, a 16M vma and you only care about 1M of userptr. If you mark the whole vma as locked, then anytime a new page is faulted in the vma elsewhere than in the buffer you are interested in, it stays allocated forever until the gem buffer is destroyed. I am not sure what happens to the vma on the next malloc, if it grows or not (I would think it won't grow as it would have different flags than new anonymous memory). The whole business of directly using malloced memory for the gpu is fishy and I would really like to get it right rather than relying on never hitting strange things like page migration, vma merging, or worse things like over-locking pages and stealing memory.

Your comments are very helpful to me and I will consider the cases I missed that you pointed out for the next patch.

Thanks, Inki Dae

Cheers, Jerome

---
 drivers/gpu/drm/exynos/exynos_drm_drv.c |   2 +
 drivers/gpu/drm/exynos/exynos_drm_gem.c | 334 +++
 drivers/gpu/drm/exynos/exynos_drm_gem.h |  17 ++-
 include/drm/exynos_drm.h                |  26 +++-
 4 files changed, 376 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index 1e68ec2..e8ae3f1 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -220,6 +220,8 @@ static struct drm_ioctl_desc exynos_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MAP_OFFSET,
 			exynos_drm_gem_map_offset_ioctl, DRM_UNLOCKED | DRM_AUTH),
+	DRM_IOCTL_DEF_DRV(EXYNOS_GEM_USERPTR,
+			exynos_drm_gem_userptr_ioctl, DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(EXYNOS_GEM_MMAP,
 			exynos_drm_gem_mmap_ioctl, DRM_UNLOCKED | DRM_AUTH),
 	DRM_IOCTL_DEF_DRV(EXYNOS_GEM_GET,
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index e6abb66..ccc6e3d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -68,6 +68,83 @@ static int check_gem_flags(unsigned int flags)
 	return 0;
 }
 
+static struct vm_area_struct *get_vma(struct vm_area_struct *vma)
+{
+	struct vm_area_struct *vma_copy;
+
+	vma_copy = kmalloc(sizeof(*vma_copy), GFP_KERNEL);
+	if (!vma_copy)
+		return NULL;
+
+	if (vma->vm_ops && vma->vm_ops->open)
+		vma->vm_ops->open(vma);
+
+	if (vma->vm_file)
+		get_file(vma->vm_file);
+
+	memcpy(vma_copy, vma, sizeof(*vma));
+
+	vma_copy->vm_mm = NULL;
+	vma_copy->vm_next = NULL;
+	vma_copy->vm_prev = NULL;
+
+	return vma_copy;
+}
+
+static void put_vma(struct vm_area_struct *vma)
+{
+	if (!vma)
+		return;
+
+	if (vma->vm_ops && vma->vm_ops->close)
+		vma->vm_ops->close(vma);
+
+	if (vma->vm_file)
+		fput(vma->vm_file);
+
+	kfree(vma);
+}
+
+/*
+ *
RE: [PATCH 2/2 v3] drm/exynos: added userptr feature.
Hi Jerome, Thank you again.

-Original Message- From: Jerome Glisse [mailto:j.gli...@gmail.com] Sent: Thursday, May 10, 2012 3:33 AM To: Inki Dae Cc: airl...@linux.ie; dri-devel@lists.freedesktop.org; kyungmin.p...@samsung.com; sw0312@samsung.com; linux...@kvack.org Subject: Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.

On Wed, May 9, 2012 at 10:45 AM, Jerome Glisse j.gli...@gmail.com wrote: On Wed, May 9, 2012 at 2:17 AM, Inki Dae inki@samsung.com wrote:

this feature is used to import a user space region allocated by malloc() or mmapped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so we limit the user-desired userptr size to a pre-defined maximum. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com

Again I would like feedback from mm people (adding cc). I am not sure locking the vma is the right answer; as I said in my previous mail, userspace can munlock it behind your back, maybe VM_RESERVED is better. Anyway, even not considering that, you don't check at all that the process doesn't go over the limit of locked pages; see mm/mlock.c RLIMIT_MEMLOCK for how it's done. Also you mlock the complete vma, but the userptr you get might be inside, say, a 16M vma and you only care about 1M of userptr. If you mark the whole vma as locked, then anytime a new page is faulted in the vma elsewhere than in the buffer you are interested in, it stays allocated forever until the gem buffer is destroyed. I am not sure what happens to the vma on the next malloc, if it grows or not (I would think it won't grow as it would have different flags than new anonymous memory).
The whole business of directly using malloced memory for the gpu is fishy and I would really like to get it right rather than relying on never hitting strange things like page migration, vma merging, or worse things like over-locking pages and stealing memory. Cheers, Jerome

I had a lengthy discussion with mm people (thx a lot for that). I think we should split 2 different use cases. The zero-copy upload case, ie: app: ptr = malloc() ... glTex/VBO/UBO/...(ptr) free(ptr) or reuse it for other things. For which I guess you want to avoid having to do a memcpy inside the gl library (could be anything else than gl that has the same usage pattern).

Right, in this case, we are using the userptr feature as a pixman and evas backend to use the 2d accelerator.

ie after the upload happens you don't care about those pages; they can be removed from the vma or marked as cow so that anything messing with those pages after the upload won't change what you uploaded. Of course

I'm not sure that I understood your point, but could the pages be removed from the vma even with VM_LOCKED or VM_RESERVED? Once glTex/VBO/UBO/... runs, the VMAs in user space would be locked. If the cpu accessed a significant part of the pages in user mode, then pages for that part would be allocated by the page fault handler; after that, through userptr, the VMAs in the user address space would be locked (at this time, the remaining pages would also be allocated via get_user_pages, which calls the page fault handler). I'd be glad to get any comments and advice if there is a point I'm missing.

this is assuming that the tlb cost of doing such a thing is smaller than the cost of memcpying the data.

Yes, in our test case, the tlb cost (incurred by tlb misses) was smaller than the cost of memcpy, and cpu usage was lower too. Of course, this would depend on gpu performance.
Two ways to do that: either you assume the app cannot read back the data after gl does, and you do an unmap_mapping_range (make sure you only unmap fully covered pages and that you copy non fully covered pages), or you want to allow userspace to still read the data or possibly overwrite it.

The second use case is something more like the opencl case of CL_MEM_USE_HOST_PTR, in which you want to use the same pages on the gpu and keep the userspace vma pointing to those pages. I think the agreement on this case is that there is no way right now to do it sanely inside the linux kernel. mlocking will need proper accounting against the rlimit, but this limit might be low. Also the fork case might be problematic. For the fork case the memory is anonymous so it should be COWed in the fork child, but relative to the cl context that means the child could not use the cl context with that memory, or at least if the child writes to this memory the cl will not see those changes. I guess the answer to that one is that you really need to use the cl api to read the object or get a proper ptr to read it.

Anyway, in all cases, implementing this userptr thing needs a lot more code. You have to check that the vma you are trying to use is anonymous
Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.
On 05/10/2012 10:39 AM, Inki Dae wrote:

Hi Jerome,

-Original Message- From: Jerome Glisse [mailto:j.gli...@gmail.com] Sent: Wednesday, May 09, 2012 11:46 PM To: Inki Dae Cc: airl...@linux.ie; dri-devel@lists.freedesktop.org; kyungmin.p...@samsung.com; sw0312@samsung.com; linux...@kvack.org Subject: Re: [PATCH 2/2 v3] drm/exynos: added userptr feature.

On Wed, May 9, 2012 at 2:17 AM, Inki Dae inki@samsung.com wrote:

this feature is used to import a user space region allocated by malloc() or mmapped into a gem. and to guarantee the pages to user space not to be swapped out, the VMAs within the user space would be locked and then unlocked when the pages are released. but this lock might result in significant degradation of system performance because the pages couldn't be swapped out, so we limit the user-desired userptr size to a pre-defined maximum. Signed-off-by: Inki Dae inki@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com

Again I would like feedback from mm people (adding cc). I am not sure

Thank you, I missed adding mm as cc.

locking the vma is the right answer; as I said in my previous mail, userspace can munlock it behind your back, maybe VM_RESERVED is better.

I know that with the VM_RESERVED flag we can also keep the pages from being swapped out, but these pages should be unlockable anytime we want, because we could allocate all the pages on the system and lock them, which, in turn, may result in significant deterioration of system performance (other processes requesting free memory could be blocked), so I used the VM_LOCKED flag instead. But I'm not sure this way is best either.

Anyway, even not considering that, you don't check at all that the process doesn't go over the limit of locked pages; see mm/mlock.c RLIMIT_MEMLOCK

Thank you for your advice.

for how it's done.
Also you mlock the complete vma, but the userptr you get might be inside, say, a 16M vma and you only care about 1M of userptr. If you mark the whole vma as locked, then anytime a new page is faulted in the vma elsewhere than in the buffer you are interested in, it stays allocated forever until the gem buffer is destroyed. I am not sure what happens to the vma on the next malloc, if it grows or not (I would think it won't grow as it would have different flags than new anonymous memory).

I don't know the history in detail because you didn't send the full patches to linux-mm, and I didn't read the code below either. I just read your description and Jerome's reply, so apparently there is something I missed. Your goal is to avoid swapping out some user pages which are used in the kernel at the same time, right? Let's use get_user_pages. Is there any issue that keeps you from using it? It increases the page count so the reclaimer can't swap out the page. Isn't that enough? Marking the whole VMA as MLOCKED is overkill.

-- Kind regards, Minchan Kim
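Minchan's get_user_pages suggestion could be sketched roughly as below. This is a hedged, era-appropriate (circa-3.4 kernel API) sketch, not code from the patch, and it is kernel-internal code that does not compile standalone; the function name exynos_pin_userptr is hypothetical. The key point is that get_user_pages() elevates each page's refcount, which is what keeps the reclaimer from evicting the pages, without touching any VMA flags:

```c
/* Hypothetical helper: pin npages of a userptr range by refcount
 * instead of marking the whole VMA as VM_LOCKED.  Uses the 3.4-era
 * get_user_pages() signature (tsk, mm, start, nr_pages, write, force,
 * pages, vmas). */
static int exynos_pin_userptr(unsigned long userptr, unsigned long npages,
			      struct page **pages)
{
	long pinned;

	down_read(&current->mm->mmap_sem);
	pinned = get_user_pages(current, current->mm, userptr, npages,
				1 /* write */, 0 /* force */, pages, NULL);
	up_read(&current->mm->mmap_sem);

	if (pinned < 0)
		return pinned;
	if (pinned != npages) {
		/* partial pin: drop the references we did take */
		while (pinned--)
			put_page(pages[pinned]);
		return -EFAULT;
	}
	return 0;	/* pages stay resident until put_page() on release */
}
```

Releasing the buffer would then just put_page() each entry, addressing Jerome's objection that VM_LOCKED over-locks the surrounding VMA.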