Re: general protection fault on ttm_init()
On Sat, Jul 07, 2012 at 11:31:42PM +0800, Fengguang Wu wrote:
> On Sat, Jul 07, 2012 at 10:08:47AM +0800, Fengguang Wu wrote:
> > On Fri, Jul 06, 2012 at 06:09:20PM +0100, Dave Airlie wrote:
> > > On Fri, Jul 6, 2012 at 5:49 PM, Dave Airlie wrote:
> > > > On Fri, Jul 6, 2012 at 3:48 PM, Fengguang Wu wrote:
> > > >> ... The missed kconfig.
> > > >>
> > > >> On Fri, Jul 06, 2012 at 10:46:22PM +0800, Fengguang Wu wrote:
> > > >>> Hi Thomas,
> > > >
> > > > Weird, I'm sorta tempted to just depend drm on CONFIG_PROC_FS, but it
> > > > looks like the error path is failing to dtrt.
> > >
> > > I've attached a patch that should fix it, let me know if it works.
> >
> > It does not work. The dmesg (attached) remains the same.
>
> I got more interesting back traces in a clean kernel:

Another trace shows that ttm_init tries to register with an empty name:

[2.917489] device: 'ttm': device_add
[2.918179] [ cut here ]
[2.919061] WARNING: at /c/kernel-tests/tip/lib/kobject.c:166 kobject_add_internal+0x1a3/0x210()
==> [2.920704] kobject: (8826ecc0): attempted to be registered with empty name!
[2.922129] Pid: 1, comm: swapper Not tainted 3.5.0-rc2+ #28
[2.923172] Call Trace:
[2.923638] [] ? kobject_add_internal+0x1a3/0x210
[2.924827] [] warn_slowpath_common+0x66/0x90
[2.925993] [] ? drm_core_init+0xca/0xca
[2.927028] [] warn_slowpath_fmt+0x41/0x50
[2.928093] [] kobject_add_internal+0x1a3/0x210
[2.929261] [] ? drm_core_init+0xca/0xca
[2.930327] [] ? drm_core_init+0xca/0xca
[2.931473] [] kobject_add+0x67/0xc0
[2.932589] [] ? get_device_parent+0x118/0x1b7
[2.933790] [] get_device_parent+0x161/0x1b7
[2.934895] [] device_add+0x151/0x5f0
[2.935907] [] ? drm_core_init+0xca/0xca
[2.936940] [] ? __raw_spin_lock_init+0x38/0x70
[2.938099] [] ? drm_core_init+0xca/0xca
[2.939132] [] device_register+0x19/0x20
[2.940254] [] drm_class_device_register+0x17/0x20
[2.941437] [] ttm_init+0x37/0x62
[2.942360] [] do_one_initcall+0x78/0x136
[2.943413] [] kernel_init+0x122/0x1a6
[2.944415] [] ? loglevel+0x31/0x31
[2.945402] [] kernel_thread_helper+0x4/0x10
[2.946506] [] ? retint_restore_args+0x13/0x13
[2.947635] [] ? do_one_initcall+0x136/0x136
[2.948739] [] ? gs_change+0x13/0x13

Thanks,
Fengguang

> device class 'drm': registering
> kobject: 'drm' (88000f07f050): kobject_add_internal: parent: 'class', set: 'class'
> kobject: 'drm' (88000f07f050): kobject_uevent_env
> kobject: 'drm' (88000f07f050): fill_kobj_path: path = '/class/drm'
> [drm:drm_core_init] *ERROR* Cannot create /proc/dri
> device class 'drm': unregistering
> kobject: 'drm' (88000f07f050): kobject_cleanup
> kobject: 'drm' (88000f07f050): auto cleanup 'remove' event
> kobject: 'drm' (88000f07f050): kobject_uevent_env
> kobject: 'drm' (88000f07f050): fill_kobj_path: path = '/class/drm'
> kobject: 'drm' (88000f07f050): auto cleanup kobject_del
> kobject: 'drm' (88000f07f050): calling ktype release
> class 'drm': release.
> class_create_release called for drm
> kobject: 'drm': free name
> kobject: 'drm' (88000f080070): kobject_cleanup
> kobject: 'drm' (88000f080070): calling ktype release
> kobject: 'drm': free name
> device: 'ttm': device_add
> kobject: '(null)' (88000f080230): kobject_add_internal: parent: 'virtual', set: '(null)'
> kobject: 'ttm' (824709b0): kobject_add_internal: parent: '(null)', set: 'devices'
> general protection fault: [#1] SMP
> CPU 1
> Pid: 1, comm: swapper/0 Not tainted 3.5.0-rc5-bisect #207
> RIP: 0010:[] [] sysfs_do_create_link+0x59/0x1c0
> RSP: 0018:88107db0 EFLAGS: 00010206
> RAX: 8810 RBX: 00cc RCX: dad9
> RDX: d9d9 RSI: RDI: 8243b320
> RBP: 88107e00 R08: 88100580 R09: fe80
> R10: 8810 R11: 0200 R12: 821622db
> R13: 88000f080150 R14: 0001 R15: 88000f080308
> FS: () GS:88000df0() knlGS:
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: CR3: 02411000 CR4: 06a0
> DR0: DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0400
> Process swapper/0 (pid: 1, threadinfo 88106000, task 8810)
> Stack:
> 88000f080308 824709b0 02ec
> 824709b
3.5-rc5: radeon acceleration regression on Transmeta system
On Mon, 09 Jul 2012 14:30:40 +0300, Meelis Roos said:
> It's actually more complicated than that. Old kernel images started
> misbehaving from around 2.6.35-rc5 and any kernel older than that was
> OK. When I recompiled the older kernels with squeeze gcc (might have been
> lenny gcc before, or different answers to make oldconfig), anything from
> current git down to 2.6.33 is broken with radeon.modeset=1 and works (I

What releases of GCC were those? I'm chasing an issue where compiling with
4.7.[01] breaks but 4.6.2 is OK, wondering if we're chasing the same thing.
[PATCH 15/15] drm/radeon: implement ring saving on reset v2
On Die, 2012-07-10 at 14:51 +0200, Christian König wrote:
> Try to save whatever is on the rings when
> we encounter a lockup.
>
> v2: Fix spelling error. Free saved ring data if reset fails.
>     Add documentation for the new functions.
>
> Signed-off-by: Christian König

Just some more spelling nits; otherwise this patch and patch 13 are

Reviewed-by: Michel Dänzer

> +/**
> + * radeon_ring_backup - Backup the content of a ring
> + *
> + * @rdev: radeon_device pointer
> + * @ring: the ring we want to backup

'back up', in both cases.

> + * Saves all unprocessed commits to a ring, returns the number of dwords saved.
> + */

'unprocessed commands from'?

> +/**
> + * radeon_ring_restore - append saved commands to the ring again
> + *
> + * @rdev: radeon_device pointer
> + * @ring: ring to append commands to
> + * @size: number of dwords we want to write
> + * @data: saved commands
> + *
> + * Allocates space on the ring and restore the previusly saved commands.

'previously'

--
Earthling Michel Dänzer            | http://www.amd.com
Libre software enthusiast          | Debian, X and DRI developer
Mesa shader compiling/optimizing process is too slow
Presumably there needs to be an API-level mechanism to wait for the background
optimization to finish, so that piglit etc. can validate the behavior of the
optimized shader?

-- Chris

On Tue, Jul 10, 2012 at 5:17 AM, Eric Anholt wrote:
> Tiziano Bacocco writes:
>
>> I've done benchmarks and comparisons between proprietary drivers and
>> Mesa; Mesa seems to be up to 200x slower compiling the same shader.
>> Since I understand optimizing such a part of the code may take months or
>> even more, I have thought to solve it this way:
>>
>> Upon calling glLinkProgram, an unoptimized version of the shader (which
>> compiles much, much faster) is uploaded to the GPU.
>> Then a separate thread is launched that will optimize the shader, and as
>> soon as it is done, on the next call to glUseProgram it will upload the
>> optimized version in place of the unoptimized one.
>>
>> This will solve many performance issues and temporary freezes with games
>> that load/unload content while running, while not reducing performance
>> once the background optimization is done.
>
> Yeah, we've thought of this, and it would take some work. Sounds like a
> fun project for someone.

___
dri-devel mailing list
dri-devel at lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 3/3] drm/exynos: implement kmap/kunmap/kmap_atomic/kunmap_atomic functions of dma_buf_ops
Implement kmap/kmap_atomic, kunmap/kunmap_atomic functions of dma_buf_ops.

Signed-off-by: Cooper Yuan
---
 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c | 17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
index 913a23e..805b344 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
@@ -138,30 +138,35 @@ static void exynos_dmabuf_release(struct dma_buf *dmabuf)
 static void *exynos_gem_dmabuf_kmap_atomic(struct dma_buf *dma_buf,
                         unsigned long page_num)
 {
-    /* TODO */
+    struct exynos_drm_gem_obj *exynos_gem_obj = dma_buf->priv;
+    struct exynos_drm_gem_buf *buf = exynos_gem_obj->buffer;
 
-    return NULL;
+    return kmap_atomic(buf->pages[page_num]);
 }
 
 static void exynos_gem_dmabuf_kunmap_atomic(struct dma_buf *dma_buf,
                         unsigned long page_num, void *addr)
 {
-    /* TODO */
+    kunmap_atomic(addr);
 }
 
 static void *exynos_gem_dmabuf_kmap(struct dma_buf *dma_buf,
                         unsigned long page_num)
 {
-    /* TODO */
+    struct exynos_drm_gem_obj *exynos_gem_obj = dma_buf->priv;
+    struct exynos_drm_gem_buf *buf = exynos_gem_obj->buffer;
 
-    return NULL;
+    return kmap(buf->pages[page_num]);
 }
 
 static void exynos_gem_dmabuf_kunmap(struct dma_buf *dma_buf,
                         unsigned long page_num, void *addr)
 {
-    /* TODO */
+    struct exynos_drm_gem_obj *exynos_gem_obj = dma_buf->priv;
+    struct exynos_drm_gem_buf *buf = exynos_gem_obj->buffer;
+
+    kunmap(buf->pages[page_num]);
 }
 
 static int exynos_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
--
1.7.0.4
[PATCH 2/3] drm/exynos: add dmabuf mmap function
Implement mmap function of dma_buf_ops.

Signed-off-by: Cooper Yuan
---
 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c | 38 
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
index e4eeb0b..913a23e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
@@ -164,6 +164,43 @@ static void exynos_gem_dmabuf_kunmap(struct dma_buf *dma_buf,
     /* TODO */
 }
 
+static int exynos_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
+{
+    struct exynos_drm_gem_obj *exynos_gem_obj = dma_buf->priv;
+    struct drm_device *dev = exynos_gem_obj->base.dev;
+    struct exynos_drm_gem_buf *buf = exynos_gem_obj->buffer;
+    int ret = 0;
+
+    if (WARN_ON(!exynos_gem_obj->base.filp))
+        return -EINVAL;
+
+    /* Check for valid size. */
+    if (buf->size < vma->vm_end - vma->vm_start) {
+        ret = -EINVAL;
+        goto out_unlock;
+    }
+
+    if (!dev->driver->gem_vm_ops) {
+        ret = -EINVAL;
+        goto out_unlock;
+    }
+
+    vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND;
+    vma->vm_ops = dev->driver->gem_vm_ops;
+    vma->vm_private_data = exynos_gem_obj;
+    vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+
+    /* Take a ref for this mapping of the object, so that the fault
+     * handler can dereference the mmap offset's pointer to the object.
+     * This reference is cleaned up by the corresponding vm_close
+     * (which should happen whether the vma was created by this call, or
+     * by a vm_open due to mremap or partial unmap or whatever).
+     */
+    vma->vm_ops->open(vma);
+
+out_unlock:
+    return ret;
+}
+
 static struct dma_buf_ops exynos_dmabuf_ops = {
     .map_dma_buf    = exynos_gem_map_dma_buf,
     .unmap_dma_buf  = exynos_gem_unmap_dma_buf,
@@ -172,6 +209,7 @@ static struct dma_buf_ops exynos_dmabuf_ops = {
     .kunmap         = exynos_gem_dmabuf_kunmap,
     .kunmap_atomic  = exynos_gem_dmabuf_kunmap_atomic,
     .release        = exynos_dmabuf_release,
+    .mmap           = exynos_gem_dmabuf_mmap,
 };
 
 struct dma_buf *exynos_dmabuf_prime_export(struct drm_device *drm_dev,
--
1.7.0.4
[PATCH 1/3] drm/exynos: correct dma_buf exporter permission as ReadWrite
Set dma_buf exporter permission as ReadWrite, otherwise mmap will get
errno 13: permission denied.

Signed-off-by: Cooper Yuan
---
 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
index 613bf8a..e4eeb0b 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
@@ -29,6 +29,7 @@
 #include "exynos_drm_drv.h"
 #include "exynos_drm_gem.h"
 
+#include
 #include
 
 static struct sg_table *exynos_pages_to_sg(struct page **pages, int nr_pages,
@@ -179,7 +180,7 @@ struct dma_buf *exynos_dmabuf_prime_export(struct drm_device *drm_dev,
     struct exynos_drm_gem_obj *exynos_gem_obj = to_exynos_gem_obj(obj);
 
     return dma_buf_export(exynos_gem_obj, &exynos_dmabuf_ops,
-                exynos_gem_obj->base.size, 0600);
+                exynos_gem_obj->base.size, O_RDWR);
 }
 
 struct drm_gem_object *exynos_dmabuf_prime_import(struct drm_device *drm_dev,
--
1.7.0.4
[RFC] drm/radeon: restoring ring commands in case of a lockup
On 09.07.2012 18:10, Jerome Glisse wrote:
> On Mon, Jul 9, 2012 at 11:59 AM, Michel Dänzer wrote:
>> On Mon, 2012-07-09 at 12:41 +0200, Christian König wrote:
>>> Hi,
>>>
>>> The following patchset tries to save and restore the not yet processed
>>> commands from the rings in case of a lockup, and with that should make a
>>> userspace problem with a single application far less problematic.
>>>
>>> The first four patches are just stuff this patchset is based upon,
>>> followed by four patches which fix various bugs found while working on
>>> this feature.
>>>
>>> Then follow patches which change the way memory is saved/restored on
>>> suspend/resume: before, we unpinned most of the buffer objects so their
>>> content could be moved from VRAM into system memory. But that is mostly
>>> unnecessary, because the buffer objects either are already in system
>>> memory or their content can easily be reinitialized.
>>>
>>> The last three patches implement the actual tracking and restoring of
>>> commands in case of a lockup. Please take a look and review.
>>
>> Patches 3, 5 and 14 are
>>
>> Reviewed-by: Michel Dänzer
>
> Patches 1-9 are
> Reviewed-by: Jerome Glisse
>
> The others look good but I want to test them too and spend a bit more time
> to double-check a few things. Will try to do that tomorrow.

Just sent out v2 of the patchset. Mainly it integrates your idea of just
saving rptr right before we call into the IB, but it also contains all the
other comments and fixes from Michel.

Cheers,
Christian.
[PATCH 15/15] drm/radeon: implement ring saving on reset v2
Try to save whatever is on the rings when
we encounter a lockup.

v2: Fix spelling error. Free saved ring data if reset fails.
    Add documentation for the new functions.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/radeon.h        |  4 ++
 drivers/gpu/drm/radeon/radeon_device.c | 48 
 drivers/gpu/drm/radeon/radeon_ring.c   | 75 
 3 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 64d39ad..6715e4c 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -768,6 +768,10 @@
 int radeon_ring_test(struct radeon_device *rdev, struct radeon_ring *cp);
 void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *ring);
 void radeon_ring_lockup_update(struct radeon_ring *ring);
 bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *ring);
+unsigned radeon_ring_backup(struct radeon_device *rdev, struct radeon_ring *ring,
+                uint32_t **data);
+int radeon_ring_restore(struct radeon_device *rdev, struct radeon_ring *ring,
+            unsigned size, uint32_t *data);
 int radeon_ring_init(struct radeon_device *rdev, struct radeon_ring *cp, unsigned ring_size,
              unsigned rptr_offs, unsigned rptr_reg, unsigned wptr_reg,
              u32 ptr_reg_shift, u32 ptr_reg_mask, u32 nop);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index bbd0971..0302a9f 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -996,7 +996,12 @@ int radeon_resume_kms(struct drm_device *dev)
 int radeon_gpu_reset(struct radeon_device *rdev)
 {
-    int r;
+    unsigned ring_sizes[RADEON_NUM_RINGS];
+    uint32_t *ring_data[RADEON_NUM_RINGS];
+
+    bool saved = false;
+
+    int i, r;
     int resched;
 
     down_write(&rdev->exclusive_lock);
@@ -1005,20 +1010,47 @@ int radeon_gpu_reset(struct radeon_device *rdev)
     resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
     radeon_suspend(rdev);
 
+    for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+        ring_sizes[i] = radeon_ring_backup(rdev, &rdev->ring[i],
+                           &ring_data[i]);
+        if (ring_sizes[i]) {
+            saved = true;
+            dev_info(rdev->dev, "Saved %d dwords of commands "
+                 "on ring %d.\n", ring_sizes[i], i);
+        }
+    }
+
+retry:
     r = radeon_asic_reset(rdev);
     if (!r) {
-        dev_info(rdev->dev, "GPU reset succeed\n");
+        dev_info(rdev->dev, "GPU reset succeeded, trying to resume\n");
         radeon_resume(rdev);
+    }
 
-        r = radeon_ib_ring_tests(rdev);
-        if (r)
-            DRM_ERROR("ib ring test failed (%d).\n", r);
+    radeon_restore_bios_scratch_regs(rdev);
+    drm_helper_resume_force_mode(rdev->ddev);
 
-        radeon_restore_bios_scratch_regs(rdev);
-        drm_helper_resume_force_mode(rdev->ddev);
-        ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
+    if (!r) {
+        for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+            radeon_ring_restore(rdev, &rdev->ring[i],
+                        ring_sizes[i], ring_data[i]);
+        }
+
+        r = radeon_ib_ring_tests(rdev);
+        if (r) {
+            dev_err(rdev->dev, "ib ring test failed (%d).\n", r);
+            if (saved) {
+                radeon_suspend(rdev);
+                goto retry;
+            }
+        }
+    } else {
+        for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+            kfree(ring_data[i]);
+        }
     }
+
+    ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
     if (r) {
         /* bad news, how to tell it to userspace ? */
         dev_info(rdev->dev, "GPU reset failed\n");
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index ce8eb9d..a4fa2c7 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -362,6 +362,81 @@ bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *ring)
     return false;
 }
 
+/**
+ * radeon_ring_backup - Backup the content of a ring
+ *
+ * @rdev: radeon_device pointer
+ * @ring: the ring we want to backup
+ *
+ * Saves all unprocessed commits to a ring, returns the number of dwords saved.
+ */
+unsigned radeon_ring_backup(struct radeon_device *rdev, struct radeon_ring *ring,
+                uint32_t **data)
+{
+    unsigned size, ptr, i;
+
+    /* just in case l
[PATCH 14/15] drm/radeon: record what is next valid wptr for each ring v2
Before emitting any indirect buffer, emit the offset of the next valid ring
content, if any. This allows code that wants to resume a ring to do so right
after the IB that caused the GPU lockup.

v2: use scratch registers instead of storing it into memory

Signed-off-by: Jerome Glisse
Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/evergreen.c   |  8 +++-
 drivers/gpu/drm/radeon/ni.c          | 11 ++-
 drivers/gpu/drm/radeon/r600.c        | 18 --
 drivers/gpu/drm/radeon/radeon.h      |  1 +
 drivers/gpu/drm/radeon/radeon_ring.c |  4 
 drivers/gpu/drm/radeon/rv770.c       |  4 +++-
 drivers/gpu/drm/radeon/si.c          | 22 +++---
 7 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index f39b900..40de347 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -1368,7 +1368,13 @@ void evergreen_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
     /* set to DX10/11 mode */
     radeon_ring_write(ring, PACKET3(PACKET3_MODE_CONTROL, 0));
     radeon_ring_write(ring, 1);
-    /* FIXME: implement */
+
+    if (ring->rptr_save_reg) {
+        uint32_t next_rptr = ring->wptr + 2 + 4;
+        radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0));
+        radeon_ring_write(ring, next_rptr);
+    }
+
     radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
     radeon_ring_write(ring,
 #ifdef __BIG_ENDIAN
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index f2afefb..6e3d448 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -855,6 +855,13 @@ void cayman_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
     /* set to DX10/11 mode */
     radeon_ring_write(ring, PACKET3(PACKET3_MODE_CONTROL, 0));
     radeon_ring_write(ring, 1);
+
+    if (ring->rptr_save_reg) {
+        uint32_t next_rptr = ring->wptr + 2 + 4;
+        radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0));
+        radeon_ring_write(ring, next_rptr);
+    }
+
     radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
     radeon_ring_write(ring,
 #ifdef __BIG_ENDIAN
@@ -981,8 +988,10 @@ static int cayman_cp_start(struct radeon_device *rdev)
 
 static void cayman_cp_fini(struct radeon_device *rdev)
 {
+    struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
     cayman_cp_enable(rdev, false);
-    radeon_ring_fini(rdev, &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]);
+    radeon_ring_fini(rdev, ring);
+    radeon_scratch_free(rdev, ring->rptr_save_reg);
 }
 
 int cayman_cp_resume(struct radeon_device *rdev)
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index c808fa9..74fca15 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2155,18 +2155,27 @@ int r600_cp_resume(struct radeon_device *rdev)
 void r600_ring_init(struct radeon_device *rdev, struct radeon_ring *ring, unsigned ring_size)
 {
     u32 rb_bufsz;
+    int r;
 
     /* Align ring size */
     rb_bufsz = drm_order(ring_size / 8);
     ring_size = (1 << (rb_bufsz + 1)) * 4;
     ring->ring_size = ring_size;
     ring->align_mask = 16 - 1;
+
+    r = radeon_scratch_get(rdev, &ring->rptr_save_reg);
+    if (r) {
+        DRM_ERROR("failed to get scratch reg for rptr save (%d).\n", r);
+        ring->rptr_save_reg = 0;
+    }
 }
 
 void r600_cp_fini(struct radeon_device *rdev)
 {
+    struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
     r600_cp_stop(rdev);
-    radeon_ring_fini(rdev, &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]);
+    radeon_ring_fini(rdev, ring);
+    radeon_scratch_free(rdev, ring->rptr_save_reg);
 }
 
@@ -2568,7 +2577,12 @@ void r600_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
 {
     struct radeon_ring *ring = &rdev->ring[ib->ring];
 
-    /* FIXME: implement */
+    if (ring->rptr_save_reg) {
+        uint32_t next_rptr = ring->wptr + 2 + 4;
+        radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0));
+        radeon_ring_write(ring, next_rptr);
+    }
+
     radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
     radeon_ring_write(ring,
 #ifdef __BIG_ENDIAN
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 872270c..64d39ad 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -622,6 +622,7 @@ struct radeon_ring {
     unsigned        rptr;
     unsigned        rptr_offs;
     unsigned        rptr_reg;
+    unsigned        rptr_save_reg;
     unsigned        wptr;
     unsigned        wptr_old;
     unsigned        wptr_reg;
[PATCH 13/15] drm/radeon: move radeon_ib_ring_tests out of chipset code
Making it easier to control when it is executed.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/evergreen.c     |  4 
 drivers/gpu/drm/radeon/ni.c            |  4 
 drivers/gpu/drm/radeon/r100.c          |  4 
 drivers/gpu/drm/radeon/r300.c          |  4 
 drivers/gpu/drm/radeon/r420.c          |  4 
 drivers/gpu/drm/radeon/r520.c          |  4 
 drivers/gpu/drm/radeon/r600.c          |  4 
 drivers/gpu/drm/radeon/radeon_device.c | 15 +++
 drivers/gpu/drm/radeon/rs400.c         |  4 
 drivers/gpu/drm/radeon/rs600.c         |  4 
 drivers/gpu/drm/radeon/rs690.c         |  4 
 drivers/gpu/drm/radeon/rv515.c         |  4 
 drivers/gpu/drm/radeon/rv770.c         |  4 
 drivers/gpu/drm/radeon/si.c            | 21 -
 14 files changed, 15 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 82f7aea..f39b900 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3093,10 +3093,6 @@ static int evergreen_startup(struct radeon_device *rdev)
         return r;
     }
 
-    r = radeon_ib_ring_tests(rdev);
-    if (r)
-        return r;
-
     r = r600_audio_init(rdev);
     if (r) {
         DRM_ERROR("radeon: audio init failed\n");
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index ec5307c..f2afefb 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1276,10 +1276,6 @@ static int cayman_startup(struct radeon_device *rdev)
         return r;
     }
 
-    r = radeon_ib_ring_tests(rdev);
-    if (r)
-        return r;
-
     r = radeon_vm_manager_init(rdev);
     if (r) {
         dev_err(rdev->dev, "vm manager initialization failed (%d).\n", r);
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 9524bd4..e0f5ae8 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -3887,10 +3887,6 @@ static int r100_startup(struct radeon_device *rdev)
         return r;
     }
 
-    r = radeon_ib_ring_tests(rdev);
-    if (r)
-        return r;
-
     return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
index b396e34..646a192 100644
--- a/drivers/gpu/drm/radeon/r300.c
+++ b/drivers/gpu/drm/radeon/r300.c
@@ -1397,10 +1397,6 @@ static int r300_startup(struct radeon_device *rdev)
         return r;
     }
 
-    r = radeon_ib_ring_tests(rdev);
-    if (r)
-        return r;
-
     return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/r420.c b/drivers/gpu/drm/radeon/r420.c
index 0062938..f2f5bf6 100644
--- a/drivers/gpu/drm/radeon/r420.c
+++ b/drivers/gpu/drm/radeon/r420.c
@@ -281,10 +281,6 @@ static int r420_startup(struct radeon_device *rdev)
         return r;
     }
 
-    r = radeon_ib_ring_tests(rdev);
-    if (r)
-        return r;
-
     return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/r520.c b/drivers/gpu/drm/radeon/r520.c
index 6df3e51..079d3c5 100644
--- a/drivers/gpu/drm/radeon/r520.c
+++ b/drivers/gpu/drm/radeon/r520.c
@@ -209,10 +209,6 @@ static int r520_startup(struct radeon_device *rdev)
         return r;
     }
 
-    r = radeon_ib_ring_tests(rdev);
-    if (r)
-        return r;
-
     return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index af2f74a..c808fa9 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2395,10 +2395,6 @@ int r600_startup(struct radeon_device *rdev)
         return r;
     }
 
-    r = radeon_ib_ring_tests(rdev);
-    if (r)
-        return r;
-
     r = r600_audio_init(rdev);
     if (r) {
         DRM_ERROR("radeon: audio init failed\n");
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 254fdb4..bbd0971 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -822,6 +822,10 @@ int radeon_device_init(struct radeon_device *rdev,
     if (r)
         return r;
 
+    r = radeon_ib_ring_tests(rdev);
+    if (r)
+        DRM_ERROR("ib ring test failed (%d).\n", r);
+
     if (rdev->flags & RADEON_IS_AGP && !rdev->accel_working) {
         /* Acceleration not working on AGP card try again
          * with fallback to PCI or PCIE GART
@@ -946,6 +950,7 @@ int radeon_resume_kms(struct drm_device *dev)
 {
     struct drm_connector *connector;
     struct radeon_device *rdev = dev->dev_private;
+    int r;
 
     if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
         return 0;
@@ -960,6 +965,11 @@ int radeon_resume_kms(struct drm_device *dev)
 
     /* resume AGP if in use */
     radeon_agp_resume(rdev);
[PATCH 12/15] drm/radeon: remove vm_manager start/suspend
Just restore the page table instead. This addresses three problems:

1. Calling vm_manager_suspend in the suspend path is problematic because it
   wants to wait for the VM use to end, which in case of a lockup never happens.

2. In case of a locked-up memory controller, unbinding the VM seems to make it
   even more unstable, creating an unrecoverable lockup in the end.

3. If we want to backup/restore the leftover ring content we must not unbind
   VMs in between.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/ni.c          | 12 ++---
 drivers/gpu/drm/radeon/radeon.h      |  2 -
 drivers/gpu/drm/radeon/radeon_gart.c | 83 +-
 drivers/gpu/drm/radeon/si.c          | 12 ++---
 4 files changed, 59 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index 4004376..ec5307c 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1280,9 +1280,11 @@ static int cayman_startup(struct radeon_device *rdev)
     if (r)
         return r;
 
-    r = radeon_vm_manager_start(rdev);
-    if (r)
+    r = radeon_vm_manager_init(rdev);
+    if (r) {
+        dev_err(rdev->dev, "vm manager initialization failed (%d).\n", r);
         return r;
+    }
 
     r = r600_audio_init(rdev);
     if (r)
@@ -1315,7 +1317,6 @@ int cayman_resume(struct radeon_device *rdev)
 int cayman_suspend(struct radeon_device *rdev)
 {
     r600_audio_fini(rdev);
-    radeon_vm_manager_suspend(rdev);
     cayman_cp_enable(rdev, false);
     rdev->ring[RADEON_RING_TYPE_GFX_INDEX].ready = false;
     evergreen_irq_suspend(rdev);
@@ -1392,11 +1393,6 @@ int cayman_init(struct radeon_device *rdev)
         return r;
     rdev->accel_working = true;
 
-    r = radeon_vm_manager_init(rdev);
-    if (r) {
-        dev_err(rdev->dev, "vm manager initialization failed (%d).\n", r);
-    }
-
     r = cayman_startup(rdev);
     if (r) {
         dev_err(rdev->dev, "disabling GPU acceleration\n");
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8a8c3f8..872270c 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -1759,8 +1759,6 @@ extern void radeon_ttm_set_active_vram_size(struct radeon_device *rdev, u64 size
  */
 int radeon_vm_manager_init(struct radeon_device *rdev);
 void radeon_vm_manager_fini(struct radeon_device *rdev);
-int radeon_vm_manager_start(struct radeon_device *rdev);
-int radeon_vm_manager_suspend(struct radeon_device *rdev);
 int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm);
 void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm);
 int radeon_vm_bind(struct radeon_device *rdev, struct radeon_vm *vm);
diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index ee11c50..56752da 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -282,27 +282,58 @@ void radeon_gart_fini(struct radeon_device *rdev)
  *
  * TODO bind a default page at vm initialization for default address
  */
+
 int radeon_vm_manager_init(struct radeon_device *rdev)
 {
+    struct radeon_vm *vm;
+    struct radeon_bo_va *bo_va;
     int r;
 
-    rdev->vm_manager.enabled = false;
+    if (!rdev->vm_manager.enabled) {
+        /* mark first vm as always in use, it's the system one */
+        r = radeon_sa_bo_manager_init(rdev, &rdev->vm_manager.sa_manager,
+                          rdev->vm_manager.max_pfn * 8,
+                          RADEON_GEM_DOMAIN_VRAM);
+        if (r) {
+            dev_err(rdev->dev, "failed to allocate vm bo (%dKB)\n",
+                (rdev->vm_manager.max_pfn * 8) >> 10);
+            return r;
+        }
 
-    /* mark first vm as always in use, it's the system one */
-    r = radeon_sa_bo_manager_init(rdev, &rdev->vm_manager.sa_manager,
-                      rdev->vm_manager.max_pfn * 8,
-                      RADEON_GEM_DOMAIN_VRAM);
-    if (r) {
-        dev_err(rdev->dev, "failed to allocate vm bo (%dKB)\n",
-            (rdev->vm_manager.max_pfn * 8) >> 10);
-        return r;
+        r = rdev->vm_manager.funcs->init(rdev);
+        if (r)
+            return r;
+
+        rdev->vm_manager.enabled = true;
+
+        r = radeon_sa_bo_manager_start(rdev, &rdev->vm_manager.sa_manager);
+        if (r)
+            return r;
     }
 
-    r = rdev->vm_manager.funcs->init(rdev);
-    if (r == 0)
-        rdev->vm_manager.enabled = true;
+    /* restore page table */
+    list_for_each_entry(vm, &rdev->vm_manager.lru_vm, list) {
+        if (vm->id == -1)
[PATCH 11/15] drm/radeon: remove r600_blit_suspend
Just reinitialize the shader content on resume instead.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/evergreen.c          |  1 -
 drivers/gpu/drm/radeon/evergreen_blit_kms.c | 40 +-
 drivers/gpu/drm/radeon/ni.c                 |  1 -
 drivers/gpu/drm/radeon/r600.c               | 15 --
 drivers/gpu/drm/radeon/r600_blit_kms.c      | 40 +-
 drivers/gpu/drm/radeon/radeon.h             |  2 --
 drivers/gpu/drm/radeon/rv770.c              |  1 -
 drivers/gpu/drm/radeon/si.c                 |  3 --
 8 files changed, 40 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 64e06e6..82f7aea 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3139,7 +3139,6 @@ int evergreen_suspend(struct radeon_device *rdev)
     struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
 
     r600_audio_fini(rdev);
-    r600_blit_suspend(rdev);
     r700_cp_stop(rdev);
     ring->ready = false;
     evergreen_irq_suspend(rdev);
diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
index e512560..89cb9fe 100644
--- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c
+++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
@@ -634,10 +634,6 @@ int evergreen_blit_init(struct radeon_device *rdev)
 
     rdev->r600_blit.max_dim = 16384;
 
-    /* pin copy shader into vram if already initialized */
-    if (rdev->r600_blit.shader_obj)
-        goto done;
-
     rdev->r600_blit.state_offset = 0;
 
     if (rdev->family < CHIP_CAYMAN)
@@ -668,11 +664,26 @@ int evergreen_blit_init(struct radeon_device *rdev)
         obj_size += cayman_ps_size * 4;
     obj_size = ALIGN(obj_size, 256);
 
-    r = radeon_bo_create(rdev, obj_size, PAGE_SIZE, true, RADEON_GEM_DOMAIN_VRAM,
-                 NULL, &rdev->r600_blit.shader_obj);
-    if (r) {
-        DRM_ERROR("evergreen failed to allocate shader\n");
-        return r;
+    /* pin copy shader into vram if not already initialized */
+    if (!rdev->r600_blit.shader_obj) {
+        r = radeon_bo_create(rdev, obj_size, PAGE_SIZE, true,
+                     RADEON_GEM_DOMAIN_VRAM,
+                     NULL, &rdev->r600_blit.shader_obj);
+        if (r) {
+            DRM_ERROR("evergreen failed to allocate shader\n");
+            return r;
+        }
+
+        r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
+        if (unlikely(r != 0))
+            return r;
+        r = radeon_bo_pin(rdev->r600_blit.shader_obj, RADEON_GEM_DOMAIN_VRAM,
+                  &rdev->r600_blit.shader_gpu_addr);
+        radeon_bo_unreserve(rdev->r600_blit.shader_obj);
+        if (r) {
+            dev_err(rdev->dev, "(%d) pin blit object failed\n", r);
+            return r;
+        }
     }
 
     DRM_DEBUG("evergreen blit allocated bo %08x vs %08x ps %08x\n",
@@ -714,17 +725,6 @@ int evergreen_blit_init(struct radeon_device *rdev)
     radeon_bo_kunmap(rdev->r600_blit.shader_obj);
     radeon_bo_unreserve(rdev->r600_blit.shader_obj);
 
-done:
-    r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
-    if (unlikely(r != 0))
-        return r;
-    r = radeon_bo_pin(rdev->r600_blit.shader_obj, RADEON_GEM_DOMAIN_VRAM,
-              &rdev->r600_blit.shader_gpu_addr);
-    radeon_bo_unreserve(rdev->r600_blit.shader_obj);
-    if (r) {
-        dev_err(rdev->dev, "(%d) pin blit object failed\n", r);
-        return r;
-    }
     radeon_ttm_set_active_vram_size(rdev, rdev->mc.real_vram_size);
     return 0;
 }
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index fe55310..4004376 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1316,7 +1316,6 @@ int cayman_suspend(struct radeon_device *rdev)
 {
     r600_audio_fini(rdev);
     radeon_vm_manager_suspend(rdev);
-    r600_blit_suspend(rdev);
     cayman_cp_enable(rdev, false);
     rdev->ring[RADEON_RING_TYPE_GFX_INDEX].ready = false;
     evergreen_irq_suspend(rdev);
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 9750f53..af2f74a 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2307,20 +2307,6 @@ int r600_copy_blit(struct radeon_device *rdev,
     return 0;
 }
 
-void r600_blit_suspend(struct radeon_device *rdev)
-{
-    int r;
-
-    /* unpin shaders bo */
-    if (rdev->r600_blit.shader_obj) {
-        r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false);
-        if (!r) {
-            radeon_bo_unpin(rdev->r600_blit.sha
[PATCH 10/15] drm/radeon: remove ib_pool start/suspend
The IB pool is in gart memory, so it is completely superfluous to unpin / repin it on suspend / resume. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/evergreen.c | 17 ++--- drivers/gpu/drm/radeon/ni.c | 16 ++-- drivers/gpu/drm/radeon/r100.c| 23 ++- drivers/gpu/drm/radeon/r300.c| 17 ++--- drivers/gpu/drm/radeon/r420.c| 17 ++--- drivers/gpu/drm/radeon/r520.c| 14 +- drivers/gpu/drm/radeon/r600.c| 17 ++--- drivers/gpu/drm/radeon/radeon.h |2 -- drivers/gpu/drm/radeon/radeon_asic.h |1 - drivers/gpu/drm/radeon/radeon_ring.c | 17 +++-- drivers/gpu/drm/radeon/rs400.c | 17 ++--- drivers/gpu/drm/radeon/rs600.c | 17 ++--- drivers/gpu/drm/radeon/rs690.c | 17 ++--- drivers/gpu/drm/radeon/rv515.c | 16 ++-- drivers/gpu/drm/radeon/rv770.c | 17 ++--- drivers/gpu/drm/radeon/si.c | 16 ++-- 16 files changed, 84 insertions(+), 157 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index eb9a71a..64e06e6 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3087,9 +3087,11 @@ static int evergreen_startup(struct radeon_device *rdev) if (r) return r; - r = radeon_ib_pool_start(rdev); - if (r) + r = radeon_ib_pool_init(rdev); + if (r) { + dev_err(rdev->dev, "IB initialization failed (%d).\n", r); return r; + } r = radeon_ib_ring_tests(rdev); if (r) @@ -3137,7 +3139,6 @@ int evergreen_suspend(struct radeon_device *rdev) struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; r600_audio_fini(rdev); - radeon_ib_pool_suspend(rdev); r600_blit_suspend(rdev); r700_cp_stop(rdev); ring->ready = false; @@ -3224,20 +3225,14 @@ int evergreen_init(struct radeon_device *rdev) if (r) return r; - r = radeon_ib_pool_init(rdev); rdev->accel_working = true; - if (r) { - dev_err(rdev->dev, "IB initialization failed (%d).\n", r); - rdev->accel_working = false; - } - r = evergreen_startup(rdev); if (r) { dev_err(rdev->dev, "disabling GPU acceleration\n"); r700_cp_fini(rdev); r600_irq_fini(rdev); 
radeon_wb_fini(rdev); - r100_ib_fini(rdev); + radeon_ib_pool_fini(rdev); radeon_irq_kms_fini(rdev); evergreen_pcie_gart_fini(rdev); rdev->accel_working = false; @@ -3264,7 +3259,7 @@ void evergreen_fini(struct radeon_device *rdev) r700_cp_fini(rdev); r600_irq_fini(rdev); radeon_wb_fini(rdev); - r100_ib_fini(rdev); + radeon_ib_pool_fini(rdev); radeon_irq_kms_fini(rdev); evergreen_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 8b1df33..fe55310 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1270,9 +1270,11 @@ static int cayman_startup(struct radeon_device *rdev) if (r) return r; - r = radeon_ib_pool_start(rdev); - if (r) + r = radeon_ib_pool_init(rdev); + if (r) { + dev_err(rdev->dev, "IB initialization failed (%d).\n", r); return r; + } r = radeon_ib_ring_tests(rdev); if (r) @@ -1313,7 +1315,6 @@ int cayman_resume(struct radeon_device *rdev) int cayman_suspend(struct radeon_device *rdev) { r600_audio_fini(rdev); - radeon_ib_pool_suspend(rdev); radeon_vm_manager_suspend(rdev); r600_blit_suspend(rdev); cayman_cp_enable(rdev, false); @@ -1391,12 +1392,7 @@ int cayman_init(struct radeon_device *rdev) if (r) return r; - r = radeon_ib_pool_init(rdev); rdev->accel_working = true; - if (r) { - dev_err(rdev->dev, "IB initialization failed (%d).\n", r); - rdev->accel_working = false; - } r = radeon_vm_manager_init(rdev); if (r) { dev_err(rdev->dev, "vm manager initialization failed (%d).\n", r); @@ -1410,7 +1406,7 @@ int cayman_init(struct radeon_device *rdev) if (rdev->flags & RADEON_IS_IGP) si_rlc_fini(rdev); radeon_wb_fini(rdev); - r100_ib_fini(rdev); + radeon_ib_pool_fini(rdev); radeon_vm_manager_fini(rdev); radeon_irq_kms_fini(rdev); cayman_pcie_gart_fini(rdev); @@ -1441,7 +1437,7 @@ void cayman_fini(struct radeon
[PATCH 09/15] drm/radeon: make cp init on cayman more robust
It's not critical, but the current code isn't 100% correct. Signed-off-by: Christian König Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/ni.c | 133 ++- 1 file changed, 56 insertions(+), 77 deletions(-) diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 32a6082..8b1df33 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -987,10 +987,33 @@ static void cayman_cp_fini(struct radeon_device *rdev) int cayman_cp_resume(struct radeon_device *rdev) { + static const int ridx[] = { + RADEON_RING_TYPE_GFX_INDEX, + CAYMAN_RING_TYPE_CP1_INDEX, + CAYMAN_RING_TYPE_CP2_INDEX + }; + static const unsigned cp_rb_cntl[] = { + CP_RB0_CNTL, + CP_RB1_CNTL, + CP_RB2_CNTL, + }; + static const unsigned cp_rb_rptr_addr[] = { + CP_RB0_RPTR_ADDR, + CP_RB1_RPTR_ADDR, + CP_RB2_RPTR_ADDR + }; + static const unsigned cp_rb_rptr_addr_hi[] = { + CP_RB0_RPTR_ADDR_HI, + CP_RB1_RPTR_ADDR_HI, + CP_RB2_RPTR_ADDR_HI + }; + static const unsigned cp_rb_base[] = { + CP_RB0_BASE, + CP_RB1_BASE, + CP_RB2_BASE + }; struct radeon_ring *ring; - u32 tmp; - u32 rb_bufsz; - int r; + int i, r; /* Reset cp; if cp is reset, then PA, SH, VGT also need to be reset */ WREG32(GRBM_SOFT_RESET, (SOFT_RESET_CP | @@ -1012,91 +1035,47 @@ int cayman_cp_resume(struct radeon_device *rdev) WREG32(CP_DEBUG, (1 << 27)); - /* ring 0 - compute and gfx */ - /* Set ring buffer size */ - ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; - rb_bufsz = drm_order(ring->ring_size / 8); - tmp = (drm_order(RADEON_GPU_PAGE_SIZE/8) << 8) | rb_bufsz; -#ifdef __BIG_ENDIAN - tmp |= BUF_SWAP_32BIT; -#endif - WREG32(CP_RB0_CNTL, tmp); - - /* Initialize the ring buffer's read and write pointers */ - WREG32(CP_RB0_CNTL, tmp | RB_RPTR_WR_ENA); - ring->wptr = 0; - WREG32(CP_RB0_WPTR, ring->wptr); - /* set the wb address wether it's enabled or not */ - WREG32(CP_RB0_RPTR_ADDR, (rdev->wb.gpu_addr + RADEON_WB_CP_RPTR_OFFSET) & 0xFFFC); - WREG32(CP_RB0_RPTR_ADDR_HI, upper_32_bits(rdev->wb.gpu_addr + 
RADEON_WB_CP_RPTR_OFFSET) & 0xFF); WREG32(SCRATCH_ADDR, ((rdev->wb.gpu_addr + RADEON_WB_SCRATCH_OFFSET) >> 8) & 0x); + WREG32(SCRATCH_UMSK, 0xff); - if (rdev->wb.enabled) - WREG32(SCRATCH_UMSK, 0xff); - else { - tmp |= RB_NO_UPDATE; - WREG32(SCRATCH_UMSK, 0); - } - - mdelay(1); - WREG32(CP_RB0_CNTL, tmp); - - WREG32(CP_RB0_BASE, ring->gpu_addr >> 8); - - ring->rptr = RREG32(CP_RB0_RPTR); + for (i = 0; i < 3; ++i) { + uint32_t rb_cntl; + uint64_t addr; - /* ring1 - compute only */ - /* Set ring buffer size */ - ring = &rdev->ring[CAYMAN_RING_TYPE_CP1_INDEX]; - rb_bufsz = drm_order(ring->ring_size / 8); - tmp = (drm_order(RADEON_GPU_PAGE_SIZE/8) << 8) | rb_bufsz; + /* Set ring buffer size */ + ring = &rdev->ring[ridx[i]]; + rb_cntl = drm_order(ring->ring_size / 8); + rb_cntl |= drm_order(RADEON_GPU_PAGE_SIZE/8) << 8; #ifdef __BIG_ENDIAN - tmp |= BUF_SWAP_32BIT; + rb_cntl |= BUF_SWAP_32BIT; #endif - WREG32(CP_RB1_CNTL, tmp); + WREG32(cp_rb_cntl[i], rb_cntl); - /* Initialize the ring buffer's read and write pointers */ - WREG32(CP_RB1_CNTL, tmp | RB_RPTR_WR_ENA); - ring->wptr = 0; - WREG32(CP_RB1_WPTR, ring->wptr); - - /* set the wb address wether it's enabled or not */ - WREG32(CP_RB1_RPTR_ADDR, (rdev->wb.gpu_addr + RADEON_WB_CP1_RPTR_OFFSET) & 0xFFFC); - WREG32(CP_RB1_RPTR_ADDR_HI, upper_32_bits(rdev->wb.gpu_addr + RADEON_WB_CP1_RPTR_OFFSET) & 0xFF); - - mdelay(1); - WREG32(CP_RB1_CNTL, tmp); - - WREG32(CP_RB1_BASE, ring->gpu_addr >> 8); - - ring->rptr = RREG32(CP_RB1_RPTR); - - /* ring2 - compute only */ - /* Set ring buffer size */ - ring = &rdev->ring[CAYMAN_RING_TYPE_CP2_INDEX]; - rb_bufsz = drm_order(ring->ring_size / 8); - tmp = (drm_order(RADEON_GPU_PAGE_SIZE/8) << 8) | rb_bufsz; -#ifdef __BIG_ENDIAN - tmp |= BUF_SWAP_32BIT; -#endif - WREG32(CP_RB2_CNTL, tmp); - - /* Initialize the ring buffer's read and write pointers */ - WREG32(CP_RB2_CNTL, tmp | RB_RPTR_WR_ENA); - ring->wptr = 0; - WREG32(CP_RB2_WPTR, ring->wptr); + /* set the wb address wether it's 
enabled or not */ + addr = rdev->wb.gpu_addr + RADEON_WB_CP_RPTR_OFFSET; + WREG32(cp_r
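The refactoring pattern in this patch — three nearly identical per-ring register blocks collapsed into lookup tables indexed by ring plus one loop — can be sketched in a self-contained user-space toy. All names and offsets below are made up for illustration; they are not the real Cayman registers:

```c
#include <assert.h>

/* Toy model: "registers" are slots in an array, and the per-ring
 * register names become lookup tables, as in the patch. */
enum { NUM_RINGS = 3 };
static unsigned regs[64];

static const unsigned cntl_reg[NUM_RINGS] = { 0, 1, 2 };  /* hypothetical */
static const unsigned base_reg[NUM_RINGS] = { 3, 4, 5 };  /* hypothetical */

static void wreg(unsigned reg, unsigned val) { regs[reg] = val; }

/* One loop configures every ring, replacing three copies of the
 * same code that could silently drift apart. */
static void rings_init(const unsigned *sizes, const unsigned *bases)
{
	for (int i = 0; i < NUM_RINGS; ++i) {
		wreg(cntl_reg[i], sizes[i]);
		wreg(base_reg[i], bases[i] >> 8);
	}
}
```

The robustness gain is exactly that: a fix made in the loop body applies to all three rings at once, which is why the patch calls the old copy-paste version "not 100% correct".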
[PATCH 08/15] drm/radeon: remove FIXME comment from chipset suspend
For a normal suspend/resume we already wait for the rings to be empty, and for a suspend/resume in case of a lockup we REALLY don't want to wait for anything. Signed-off-by: Christian König Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/evergreen.c |1 - drivers/gpu/drm/radeon/ni.c|1 - drivers/gpu/drm/radeon/r600.c |1 - drivers/gpu/drm/radeon/rv770.c |1 - drivers/gpu/drm/radeon/si.c|1 - 5 files changed, 5 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index f716e08..eb9a71a 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3137,7 +3137,6 @@ int evergreen_suspend(struct radeon_device *rdev) struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; r600_audio_fini(rdev); - /* FIXME: we should wait for ring to be empty */ radeon_ib_pool_suspend(rdev); r600_blit_suspend(rdev); r700_cp_stop(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 2366be3..32a6082 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1334,7 +1334,6 @@ int cayman_resume(struct radeon_device *rdev) int cayman_suspend(struct radeon_device *rdev) { r600_audio_fini(rdev); - /* FIXME: we should wait for ring to be empty */ radeon_ib_pool_suspend(rdev); radeon_vm_manager_suspend(rdev); r600_blit_suspend(rdev); diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 43d0c41..de4de2d 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2461,7 +2461,6 @@ int r600_suspend(struct radeon_device *rdev) r600_audio_fini(rdev); radeon_ib_pool_suspend(rdev); r600_blit_suspend(rdev); - /* FIXME: we should wait for ring to be empty */ r600_cp_stop(rdev); rdev->ring[RADEON_RING_TYPE_GFX_INDEX].ready = false; r600_irq_suspend(rdev); diff --git a/drivers/gpu/drm/radeon/rv770.c b/drivers/gpu/drm/radeon/rv770.c index b4f51c5..7e230f6 100644 --- a/drivers/gpu/drm/radeon/rv770.c +++ 
b/drivers/gpu/drm/radeon/rv770.c @@ -996,7 +996,6 @@ int rv770_suspend(struct radeon_device *rdev) r600_audio_fini(rdev); radeon_ib_pool_suspend(rdev); r600_blit_suspend(rdev); - /* FIXME: we should wait for ring to be empty */ r700_cp_stop(rdev); rdev->ring[RADEON_RING_TYPE_GFX_INDEX].ready = false; r600_irq_suspend(rdev); diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c index 34603b3c8..78c790f 100644 --- a/drivers/gpu/drm/radeon/si.c +++ b/drivers/gpu/drm/radeon/si.c @@ -3807,7 +3807,6 @@ int si_resume(struct radeon_device *rdev) int si_suspend(struct radeon_device *rdev) { - /* FIXME: we should wait for ring to be empty */ radeon_ib_pool_suspend(rdev); radeon_vm_manager_suspend(rdev); #if 0 -- 1.7.9.5
[PATCH 07/15] drm/radeon: fix fence init after resume
Start with the last signaled fence number instead of the last emitted one. Signed-off-by: Christian König Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon_fence.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index a194a14..76c5b22 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -578,7 +578,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring) } rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4]; rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index; - radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring); + radeon_fence_write(rdev, atomic64_read(&rdev->fence_drv[ring].last_seq), ring); rdev->fence_drv[ring].initialized = true; dev_info(rdev->dev, "fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n", ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr); -- 1.7.9.5
[PATCH 06/15] drm/radeon: fix fence value access
It is possible that radeon_fence_process is called after writeback is disabled for suspend, leading to an invalid read of register 0x0. This fixes a problem for me where the fence value is temporarily incremented by 0x1 on suspend/resume. Signed-off-by: Christian König Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon_fence.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index be4e4f3..a194a14 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -42,21 +42,23 @@ static void radeon_fence_write(struct radeon_device *rdev, u32 seq, int ring) { - if (rdev->wb.enabled) { - *rdev->fence_drv[ring].cpu_addr = cpu_to_le32(seq); + struct radeon_fence_driver *drv = &rdev->fence_drv[ring]; + if (likely(rdev->wb.enabled || !drv->scratch_reg)) { + *drv->cpu_addr = cpu_to_le32(seq); } else { - WREG32(rdev->fence_drv[ring].scratch_reg, seq); + WREG32(drv->scratch_reg, seq); } } static u32 radeon_fence_read(struct radeon_device *rdev, int ring) { + struct radeon_fence_driver *drv = &rdev->fence_drv[ring]; u32 seq = 0; - if (rdev->wb.enabled) { - seq = le32_to_cpu(*rdev->fence_drv[ring].cpu_addr); + if (likely(rdev->wb.enabled || !drv->scratch_reg)) { + seq = le32_to_cpu(*drv->cpu_addr); } else { - seq = RREG32(rdev->fence_drv[ring].scratch_reg); + seq = RREG32(drv->scratch_reg); } return seq; } -- 1.7.9.5
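The fallback logic of this fix — use the writeback slot whenever writeback is enabled *or* no scratch register was ever allocated — can be modeled outside the kernel. The struct layout and helper names below are simplified stand-ins, not the driver's actual types:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct radeon_fence_driver. */
struct fence_drv {
	uint32_t *cpu_addr;	/* writeback slot in system memory */
	uint32_t scratch_reg;	/* MMIO scratch offset, 0 if never allocated */
};

/* Stub MMIO read so the legacy scratch-register path is observable. */
static uint32_t mmio_read(uint32_t reg) { return 0xdead0000u | reg; }

/* Model of the fixed radeon_fence_read(): prefer the writeback copy,
 * and also use it when scratch_reg is 0, instead of performing the
 * bogus read of register 0x0 the commit message describes. */
static uint32_t fence_read(const struct fence_drv *drv, int wb_enabled)
{
	if (wb_enabled || !drv->scratch_reg)
		return *drv->cpu_addr;
	return mmio_read(drv->scratch_reg);
}
```

The key case is "writeback just disabled for suspend, scratch register never set up": the old code fell through to the MMIO path, the fixed logic still reads the (stale but valid) writeback value.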
[PATCH 05/15] drm/radeon: fix ring commit padding
We don't need to pad anything if the number of dwords written to the ring already matches the requirements. Fixes some "writting more dword to ring than expected" warnings. Signed-off-by: Christian König Reviewed-by: Jerome Glisse Reviewed-by: Michel Dänzer --- drivers/gpu/drm/radeon/radeon_ring.c |7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 0826e77..674aaba 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -272,13 +272,8 @@ int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *ring, unsig void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *ring) { - unsigned count_dw_pad; - unsigned i; - /* We pad to match fetch size */ - count_dw_pad = (ring->align_mask + 1) - - (ring->wptr & ring->align_mask); - for (i = 0; i < count_dw_pad; i++) { + while (ring->wptr & ring->align_mask) { radeon_ring_write(ring, ring->nop); } DRM_MEMORYBARRIER(); -- 1.7.9.5
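The padding change is easy to demonstrate in isolation. In the sketch below (hypothetical helper names; `wptr++` stands in for writing one NOP dword with `radeon_ring_write()`), the old computation emits a full fetch-size worth of NOPs even when the write pointer is already aligned:

```c
#include <assert.h>

/* Toy model of radeon_ring_commit()'s new padding loop. align_mask
 * is (fetch size in dwords - 1). */
static unsigned pad_to_align(unsigned wptr, unsigned align_mask)
{
	unsigned nops = 0;
	while (wptr & align_mask) {	/* pad only while misaligned */
		wptr++;
		nops++;
	}
	return nops;
}

/* The old computation: for an already-aligned wptr this yields
 * align_mask + 1 NOPs, overflowing the space reserved by
 * radeon_ring_lock() and triggering the quoted warning. */
static unsigned old_pad_count(unsigned wptr, unsigned align_mask)
{
	return (align_mask + 1) - (wptr & align_mask);
}
```

For a misaligned pointer the two agree; the difference shows up only in the already-aligned case, which is exactly the case the commit message describes.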
[PATCH 04/15] drm/radeon: add an exclusive lock for GPU reset v2
From: Jerome Glisse GPU resets need to be exclusive, one happening at a time. For this, add a rw semaphore so that any path that triggers GPU activity has to take the semaphore as a reader, thus allowing concurrency. The GPU reset path takes the semaphore as a writer, ensuring that no concurrent resets take place. v2: init rw semaphore Signed-off-by: Jerome Glisse Reviewed-by: Christian König --- drivers/gpu/drm/radeon/radeon.h|1 + drivers/gpu/drm/radeon/radeon_cs.c |5 + drivers/gpu/drm/radeon/radeon_device.c |3 +++ drivers/gpu/drm/radeon/radeon_gem.c|8 4 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 5861ec8..4487873 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -1446,6 +1446,7 @@ struct radeon_device { struct device *dev; struct drm_device *ddev; struct pci_dev *pdev; + struct rw_semaphore exclusive_lock; /* ASIC */ union radeon_asic_configconfig; enum radeon_family family; diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index d5aec09..553da67 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -499,7 +499,9 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) struct radeon_cs_parser parser; int r; + down_read(&rdev->exclusive_lock); if (!rdev->accel_working) { + up_read(&rdev->exclusive_lock); return -EBUSY; } /* initialize parser */ @@ -512,6 +514,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) if (r) { DRM_ERROR("Failed to initialize parser !\n"); radeon_cs_parser_fini(&parser, r); + up_read(&rdev->exclusive_lock); r = radeon_cs_handle_lockup(rdev, r); return r; } @@ -520,6 +523,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) if (r != -ERESTARTSYS) DRM_ERROR("Failed to parse relocation %d!\n", r); radeon_cs_parser_fini(&parser, r); + up_read(&rdev->exclusive_lock); r = 
radeon_cs_handle_lockup(rdev, r); return r; } @@ -533,6 +537,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } out: radeon_cs_parser_fini(&parser, r); + up_read(&rdev->exclusive_lock); r = radeon_cs_handle_lockup(rdev, r); return r; } diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index f654ba8..254fdb4 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -734,6 +734,7 @@ int radeon_device_init(struct radeon_device *rdev, mutex_init(&rdev->gem.mutex); mutex_init(&rdev->pm.mutex); init_rwsem(&rdev->pm.mclk_lock); + init_rwsem(&rdev->exclusive_lock); init_waitqueue_head(&rdev->irq.vblank_queue); init_waitqueue_head(&rdev->irq.idle_queue); r = radeon_gem_init(rdev); @@ -988,6 +989,7 @@ int radeon_gpu_reset(struct radeon_device *rdev) int r; int resched; + down_write(&rdev->exclusive_lock); radeon_save_bios_scratch_regs(rdev); /* block TTM */ resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev); @@ -1007,6 +1009,7 @@ int radeon_gpu_reset(struct radeon_device *rdev) dev_info(rdev->dev, "GPU reset failed\n"); } + up_write(&rdev->exclusive_lock); return r; } diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index d9b0809..b0be9c4 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -215,12 +215,14 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data, uint32_t handle; int r; + down_read(&rdev->exclusive_lock); /* create a gem object to contain this object in */ args->size = roundup(args->size, PAGE_SIZE); r = radeon_gem_object_create(rdev, args->size, args->alignment, args->initial_domain, false, false, &gobj); if (r) { + up_read(&rdev->exclusive_lock); r = radeon_gem_handle_lockup(rdev, r); return r; } @@ -228,10 +230,12 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data, /* drop reference from allocate - handle holds it now */ 
drm_gem_object_unreference_unlocked(gobj); if (r) { + up_read(&rdev->exclusive_lock); r = radeon_gem_handle_locku
[PATCH 03/15] drm/radeon: fix fence related segfault in CS
Don't return success if scheduling the IB fails, otherwise we end up with an oops in ttm_eu_fence_buffer_objects. Signed-off-by: Christian König Reviewed-by: Jerome Glisse Reviewed-by: Michel Dänzer Cc: stable@vger.kernel.org --- drivers/gpu/drm/radeon/radeon_cs.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index f1b7527..d5aec09 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -358,7 +358,7 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev, if (r) { DRM_ERROR("Failed to schedule IB !\n"); } - return 0; + return r; } static int radeon_bo_vm_update_pte(struct radeon_cs_parser *parser, -- 1.7.9.5
[PATCH 02/15] drm/radeon: add error handling to radeon_vm_unbind_locked
Waiting for a fence can fail for different reasons, the most common being a deadlock. Signed-off-by: Christian König Reviewed-by: Michel Dänzer Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon_gart.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 2b34c1a..ee11c50 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -316,10 +316,21 @@ static void radeon_vm_unbind_locked(struct radeon_device *rdev, } /* wait for vm use to end */ - if (vm->fence) { - radeon_fence_wait(vm->fence, false); - radeon_fence_unref(&vm->fence); + while (vm->fence) { + int r; + r = radeon_fence_wait(vm->fence, false); + if (r) + DRM_ERROR("error while waiting for fence: %d\n", r); + if (r == -EDEADLK) { + mutex_unlock(&rdev->vm_manager.lock); + r = radeon_gpu_reset(rdev); + mutex_lock(&rdev->vm_manager.lock); + if (!r) + continue; + } + break; } + radeon_fence_unref(&vm->fence); /* hw unbind */ rdev->vm_manager.funcs->unbind(rdev, vm); -- 1.7.9.5
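The wait loop added here follows a retry-on-deadlock pattern: if the fence wait returns -EDEADLK, drop the lock, attempt a GPU reset, and retry the wait only if the reset succeeded. A minimal user-space model with stubbed-out driver calls (all names hypothetical; the stub fence fails once, then succeeds):

```c
#include <assert.h>
#include <errno.h>

/* Stubs standing in for the driver calls; the real functions take a
 * radeon_device and a fence. The stub deadlocks on the first wait
 * and signals on the second. */
static int wait_calls;
static int fence_wait(void) { return ++wait_calls == 1 ? -EDEADLK : 0; }
static int do_gpu_reset(void) { return 0; }	/* 0 == reset succeeded */

/* Model of the new wait loop in radeon_vm_unbind_locked(): on
 * -EDEADLK, drop the vm_manager lock, reset the GPU, retake the
 * lock and retry; any other error (or a failed reset) gives up. */
static int wait_for_vm_fence(void)
{
	for (;;) {
		int r = fence_wait();
		if (r == -EDEADLK) {
			/* mutex_unlock(&vm_manager.lock); */
			r = do_gpu_reset();
			/* mutex_lock(&vm_manager.lock); */
			if (!r)
				continue;	/* reset worked, wait again */
		}
		return r;
	}
}
```

Dropping the lock around the reset matters: the reset path itself re-takes driver locks, so holding vm_manager.lock across it would trade one deadlock for another.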
[PATCH 01/15] drm/radeon: add error handling to fence_wait_empty_locked
Instead of returning the error, handle it directly, and while at it fix the comments about the ring lock. Signed-off-by: Christian König Reviewed-by: Michel Dänzer Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon.h |2 +- drivers/gpu/drm/radeon/radeon_fence.c | 33 + 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 77b4519b..5861ec8 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -239,7 +239,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring); bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); -int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); +void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); int radeon_fence_wait_any(struct radeon_device *rdev, struct radeon_fence **fences, bool intr); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 7b55625..be4e4f3 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -440,14 +440,11 @@ int radeon_fence_wait_any(struct radeon_device *rdev, return 0; } +/* caller must hold ring lock */ int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring) { uint64_t seq; - /* We are not protected by ring lock when reading current seq but -* it's ok as worst case is we return to early while we could have -* wait. 
-*/ seq = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL; if (seq >= rdev->fence_drv[ring].sync_seq[ring]) { /* nothing to wait for, last_seq is @@ -457,15 +454,27 @@ int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring) return radeon_fence_wait_seq(rdev, seq, ring, false, false); } -int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring) +/* caller must hold ring lock */ +void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring) { - /* We are not protected by ring lock when reading current seq -* but it's ok as wait empty is call from place where no more -* activity can be scheduled so there won't be concurrent access -* to seq value. -*/ - return radeon_fence_wait_seq(rdev, rdev->fence_drv[ring].sync_seq[ring], -ring, false, false); + uint64_t seq = rdev->fence_drv[ring].sync_seq[ring]; + + while(1) { + int r; + r = radeon_fence_wait_seq(rdev, seq, ring, false, false); + if (r == -EDEADLK) { + mutex_unlock(&rdev->ring_lock); + r = radeon_gpu_reset(rdev); + mutex_lock(&rdev->ring_lock); + if (!r) + continue; + } + if (r) { + dev_err(rdev->dev, "error waiting for ring to become" + " idle (%d)\n", r); + } + return; + } } struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence) -- 1.7.9.5
[RFC PATCH 8/8] nouveau: Prime execbuffer submission synchronization
From: Maarten Lankhorst Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nouveau_gem.c | 121 +++-- 1 file changed, 116 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index 11c9c2a..e5d36bb 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -31,6 +31,7 @@ #include "nouveau_drm.h" #include "nouveau_dma.h" #include "nouveau_fence.h" +#include #define nouveau_gem_pushbuf_sync(chan) 0 @@ -277,6 +278,7 @@ struct validate_op { struct list_head vram_list; struct list_head gart_list; struct list_head both_list; + struct list_head prime_list; }; static void @@ -305,9 +307,36 @@ validate_fini_list(struct list_head *list, struct nouveau_fence *fence) static void validate_fini(struct validate_op *op, struct nouveau_fence* fence) { + struct list_head *entry, *tmp; + struct nouveau_bo *nvbo; + struct dma_buf *sync_buf; + u32 ofs, val; + validate_fini_list(&op->vram_list, fence); validate_fini_list(&op->gart_list, fence); validate_fini_list(&op->both_list, fence); + + if (list_empty(&op->prime_list)) + return; + + if (fence && + !nouveau_fence_prime_get(fence, &sync_buf, &ofs, &val)) { + dmabufmgr_eu_fence_buffer_objects(sync_buf, ofs, val, + &op->prime_list); + dma_buf_put(sync_buf); + } else + dmabufmgr_eu_backoff_reservation(&op->prime_list); + + list_for_each_safe(entry, tmp, &op->prime_list) { + struct dmabufmgr_validate *val; + val = list_entry(entry, struct dmabufmgr_validate, head); + nvbo = val->priv; + + list_del(&val->head); + nvbo->reserved_by = NULL; + drm_gem_object_unreference_unlocked(nvbo->gem); + kfree(val); + } } static int @@ -319,9 +348,9 @@ validate_init(struct nouveau_channel *chan, struct drm_file *file_priv, struct drm_nouveau_private *dev_priv = dev->dev_private; uint32_t sequence; int trycnt = 0; - int ret, i; + int i; - sequence = atomic_add_return(1, &dev_priv->ttm.validate_sequence); + sequence = 
atomic_inc_return(&dev_priv->ttm.validate_sequence); retry: if (++trycnt > 10) { NV_ERROR(dev, "%s failed and gave up.\n", __func__); @@ -332,6 +361,8 @@ retry: struct drm_nouveau_gem_pushbuf_bo *b = &pbbo[i]; struct drm_gem_object *gem; struct nouveau_bo *nvbo; + int ret = 0, is_prime; + struct dmabufmgr_validate *validate = NULL; gem = drm_gem_object_lookup(dev, file_priv, b->handle); if (!gem) { @@ -340,6 +371,7 @@ retry: return -ENOENT; } nvbo = gem->driver_private; + is_prime = gem->export_dma_buf || gem->import_attach; if (nvbo->reserved_by && nvbo->reserved_by == file_priv) { NV_ERROR(dev, "multiple instances of buffer %d on " @@ -349,7 +381,21 @@ retry: return -EINVAL; } - ret = ttm_bo_reserve(&nvbo->bo, true, false, true, sequence); + if (likely(!is_prime)) + ret = ttm_bo_reserve(&nvbo->bo, true, false, +true, sequence); + else { + validate = kzalloc(sizeof(*validate), GFP_KERNEL); + if (validate) { + if (gem->import_attach) + validate->bo = + gem->import_attach->dmabuf; + else + validate->bo = gem->export_dma_buf; + validate->priv = nvbo; + } else + ret = -ENOMEM; + } if (ret) { validate_fini(op, NULL); if (unlikely(ret == -EAGAIN)) @@ -366,6 +412,9 @@ retry: b->user_priv = (uint64_t)(unsigned long)nvbo; nvbo->reserved_by = file_priv; nvbo->pbbo_index = i; + if (is_prime) { + list_add_tail(&validate->head, &op->prime_list); + } else if ((b->valid_domains & NOUVEAU_GEM_DOMAIN_VRAM) && (b->valid_domains & NOUVEAU_GEM_DOMAIN_GART)) list_add_tail(&nvbo->entry, &op->both_list); @@ -473,6 +522,60 @@ validate_list(struct nouveau_channel *chan, struct list_head *list, } static int +validate_prime(struct nouveau_channel *chan,
[RFC PATCH 7/8] nouveau: nvc0 fence prime implementation
From: Maarten Lankhorst Create a read-only mapping for every imported bo, and create a prime bo in system memory. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nvc0_fence.c | 104 +- 1 file changed, 89 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nvc0_fence.c b/drivers/gpu/drm/nouveau/nvc0_fence.c index 198e31f..dc6ccab 100644 --- a/drivers/gpu/drm/nouveau/nvc0_fence.c +++ b/drivers/gpu/drm/nouveau/nvc0_fence.c @@ -37,6 +37,7 @@ struct nvc0_fence_priv { struct nvc0_fence_chan { struct nouveau_fence_chan base; struct nouveau_vma vma; + struct nouveau_vma prime_vma; }; static int @@ -45,19 +46,23 @@ nvc0_fence_emit(struct nouveau_fence *fence, bool prime) { struct nouveau_channel *chan = fence->channel; struct nvc0_fence_chan *fctx = chan->engctx[NVOBJ_ENGINE_FENCE]; u64 addr = fctx->vma.offset + chan->id * 16; - int ret; + int ret, i; - ret = RING_SPACE(chan, 5); - if (ret == 0) { + ret = RING_SPACE(chan, prime ? 10 : 5); + if (ret) + return ret; + + for (i = 0; i < (prime ? 
2 : 1); ++i) { + if (i) + addr = fctx->prime_vma.offset + chan->id * 16; BEGIN_NVC0(chan, 0, NV84_SUBCHAN_SEMAPHORE_ADDRESS_HIGH, 4); OUT_RING (chan, upper_32_bits(addr)); OUT_RING (chan, lower_32_bits(addr)); OUT_RING (chan, fence->sequence); OUT_RING (chan, NV84_SUBCHAN_SEMAPHORE_TRIGGER_WRITE_LONG); - FIRE_RING (chan); } - - return ret; + FIRE_RING(chan); + return 0; } static int @@ -95,6 +100,8 @@ nvc0_fence_context_del(struct nouveau_channel *chan, int engine) struct nvc0_fence_priv *priv = nv_engine(chan->dev, engine); struct nvc0_fence_chan *fctx = chan->engctx[engine]; + if (priv->base.prime_bo) + nouveau_bo_vma_del(priv->base.prime_bo, &fctx->prime_vma); nouveau_bo_vma_del(priv->bo, &fctx->vma); nouveau_fence_context_del(chan->dev, &fctx->base); chan->engctx[engine] = NULL; @@ -115,10 +122,16 @@ nvc0_fence_context_new(struct nouveau_channel *chan, int engine) nouveau_fence_context_new(&fctx->base); ret = nouveau_bo_vma_add(priv->bo, chan->vm, &fctx->vma); + if (!ret && priv->base.prime_bo) + ret = nouveau_bo_vma_add(priv->base.prime_bo, chan->vm, +&fctx->prime_vma); if (ret) nvc0_fence_context_del(chan, engine); - nouveau_bo_wr32(priv->bo, chan->id * 16/4, 0x); + fctx->base.sequence = nouveau_bo_rd32(priv->bo, chan->id * 16/4); + if (priv->base.prime_bo) + nouveau_bo_wr32(priv->base.prime_bo, chan->id * 16/4, + fctx->base.sequence); return ret; } @@ -140,12 +153,55 @@ nvc0_fence_destroy(struct drm_device *dev, int engine) struct drm_nouveau_private *dev_priv = dev->dev_private; struct nvc0_fence_priv *priv = nv_engine(dev, engine); + nouveau_fence_prime_del(&priv->base); nouveau_bo_unmap(priv->bo); + nouveau_bo_unpin(priv->bo); nouveau_bo_ref(NULL, &priv->bo); dev_priv->eng[engine] = NULL; kfree(priv); } +static int +nvc0_fence_prime_sync(struct nouveau_channel *chan, + struct nouveau_bo *bo, + u32 ofs, u32 val, u64 sema_start) +{ + struct nvc0_fence_chan *fctx = chan->engctx[NVOBJ_ENGINE_FENCE]; + struct nvc0_fence_priv *priv = nv_engine(chan->dev, 
NVOBJ_ENGINE_FENCE); + int ret = RING_SPACE(chan, 5); + if (ret) + return ret; + + if (bo == priv->base.prime_bo) + sema_start = fctx->prime_vma.offset; + else + NV_ERROR(chan->dev, "syncing with %08Lx + %08x >= %08x\n", + sema_start, ofs, val); + sema_start += ofs; + + BEGIN_NVC0(chan, 0, NV84_SUBCHAN_SEMAPHORE_ADDRESS_HIGH, 4); + OUT_RING (chan, upper_32_bits(sema_start)); + OUT_RING (chan, lower_32_bits(sema_start)); + OUT_RING (chan, val); + OUT_RING (chan, NV84_SUBCHAN_SEMAPHORE_TRIGGER_ACQUIRE_GEQUAL | +NVC0_SUBCHAN_SEMAPHORE_TRIGGER_YIELD); + FIRE_RING (chan); + return ret; +} + +static void +nvc0_fence_prime_del_import(struct nouveau_fence_prime_bo_entry *entry) { + nouveau_bo_vma_del(entry->bo, &entry->vma); +} + +static int +nvc0_fence_prime_add_import(struct nouveau_fence_prime_bo_entry *entry) { + int ret = nouveau_bo_vma_add_access(entry->bo, entry->chan->vm, + &entry->vma, NV_MEM_ACCESS_RO); + entry->sema_start = entry->vma.offset; + return ret; +} + int nvc0_fence_create(struct drm_device *dev) { @@ -168,17 +224,35 @@ nvc0_fence_create(struct d
[RFC PATCH 6/8] nouveau: nv84 fence prime implementation
From: Maarten Lankhorst Create a dma object for the prime semaphore and every imported sync bo. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nv84_fence.c | 121 -- 1 file changed, 115 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nv84_fence.c b/drivers/gpu/drm/nouveau/nv84_fence.c index b5cfbcb..f739dfc 100644 --- a/drivers/gpu/drm/nouveau/nv84_fence.c +++ b/drivers/gpu/drm/nouveau/nv84_fence.c @@ -31,6 +31,7 @@ struct nv84_fence_chan { struct nouveau_fence_chan base; + u32 sema_start; }; struct nv84_fence_priv { @@ -42,21 +43,25 @@ static int nv84_fence_emit(struct nouveau_fence *fence, bool prime) { struct nouveau_channel *chan = fence->channel; - int ret = RING_SPACE(chan, 7); - if (ret == 0) { + int i, ret; + + ret = RING_SPACE(chan, prime ? 14 : 7); + if (ret) + return ret; + + for (i = 0; i < (prime ? 2 : 1); ++i) { BEGIN_NV04(chan, 0, NV11_SUBCHAN_DMA_SEMAPHORE, 1); - OUT_RING (chan, NvSema); + OUT_RING (chan, i ? NvSemaPrime : NvSema); BEGIN_NV04(chan, 0, NV84_SUBCHAN_SEMAPHORE_ADDRESS_HIGH, 4); OUT_RING (chan, upper_32_bits(chan->id * 16)); OUT_RING (chan, lower_32_bits(chan->id * 16)); OUT_RING (chan, fence->sequence); OUT_RING (chan, NV84_SUBCHAN_SEMAPHORE_TRIGGER_WRITE_LONG); - FIRE_RING (chan); } + FIRE_RING (chan); return ret; } - static int nv84_fence_sync(struct nouveau_fence *fence, struct nouveau_channel *prev, struct nouveau_channel *chan) @@ -82,12 +87,94 @@ nv84_fence_read(struct nouveau_channel *chan) return nv_ro32(priv->mem, chan->id * 16); } +static int +nv84_fence_prime_sync(struct nouveau_channel *chan, + struct nouveau_bo *bo, + u32 ofs, u32 val, u64 sema_start) +{ + struct nv84_fence_priv *priv = nv_engine(chan->dev, NVOBJ_ENGINE_FENCE); + int ret = RING_SPACE(chan, 7); + u32 sema = 0; + if (ret < 0) + return ret; + + if (bo == priv->base.prime_bo) { + sema = NvSema; + } else { + struct sg_table *sgt = bo->bo.sg; + struct scatterlist *sg; + u32 i; + sema = sema_start; + for_each_sg(sgt->sgl, sg, 
sgt->nents, i) { + if (ofs < sg->offset + sg->length) { + ofs -= sg->offset; + break; + } + sema++; + } + } + + BEGIN_NV04(chan, 0, NV11_SUBCHAN_DMA_SEMAPHORE, 1); + OUT_RING (chan, sema); + BEGIN_NV04(chan, 0, NV84_SUBCHAN_SEMAPHORE_ADDRESS_HIGH, 4); + OUT_RING (chan, 0); + OUT_RING (chan, ofs); + OUT_RING (chan, val); + OUT_RING (chan, NV84_SUBCHAN_SEMAPHORE_TRIGGER_ACQUIRE_GEQUAL); + FIRE_RING (chan); + return ret; +} + +static void +nv84_fence_prime_del_import(struct nouveau_fence_prime_bo_entry *entry) { + u32 i; + for (i = entry->sema_start; i < entry->sema_start + entry->sema_len; ++i) + nouveau_ramht_remove(entry->chan, i); +} + +static int +nv84_fence_prime_add_import(struct nouveau_fence_prime_bo_entry *entry) { + struct sg_table *sgt = entry->bo->bo.sg; + struct nouveau_channel *chan = entry->chan; + struct nv84_fence_chan *fctx = chan->engctx[NVOBJ_ENGINE_FENCE]; + struct scatterlist *sg; + u32 i, sema; + int ret; + + sema = entry->sema_start = fctx->sema_start; + entry->sema_len = 0; + + for_each_sg(sgt->sgl, sg, sgt->nents, i) { + struct nouveau_gpuobj *obj; + ret = nouveau_gpuobj_dma_new(chan, NV_CLASS_DMA_FROM_MEMORY, +sg_dma_address(sg), PAGE_SIZE, +NV_MEM_ACCESS_RO, +NV_MEM_TARGET_PCI, &obj); + if (ret) + goto err; + + ret = nouveau_ramht_insert(chan, sema, obj); + nouveau_gpuobj_ref(NULL, &obj); + if (ret) + goto err; + entry->sema_len++; + sema++; + } + fctx->sema_start += (entry->sema_len + 0xff) & ~0xff; + return 0; + +err: + nv84_fence_prime_del_import(entry); + return ret; +} + static void nv84_fence_context_del(struct nouveau_channel *chan, int engine) { struct nv84_fence_chan *fctx = chan->engctx[engine]; nouveau_fence_context_del(chan->dev, &fctx->base); chan->engctx[engine] = NULL; + kfree(fctx); } @@ -104,6 +191,7 @@ nv84_fence_context_new(struct nouveau_channel *chan, int engine) return -ENOMEM;
[RFC PATCH 5/8] nouveau: Add methods preparing for prime fencing
From: Maarten Lankhorst This can be used by nv84 and nvc0 to implement hardware fencing, earlier systems will require more thought but can fall back to software for now. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nouveau_bo.c |6 +- drivers/gpu/drm/nouveau/nouveau_channel.c |2 +- drivers/gpu/drm/nouveau/nouveau_display.c |2 +- drivers/gpu/drm/nouveau/nouveau_dma.h |1 + drivers/gpu/drm/nouveau/nouveau_drv.h |5 + drivers/gpu/drm/nouveau/nouveau_fence.c | 242 - drivers/gpu/drm/nouveau/nouveau_fence.h | 44 +- drivers/gpu/drm/nouveau/nouveau_gem.c |6 +- drivers/gpu/drm/nouveau/nouveau_prime.c |2 + drivers/gpu/drm/nouveau/nv04_fence.c |4 +- drivers/gpu/drm/nouveau/nv10_fence.c |4 +- drivers/gpu/drm/nouveau/nv84_fence.c |4 +- drivers/gpu/drm/nouveau/nvc0_fence.c |4 +- 13 files changed, 304 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 4318320..a97025a 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -52,6 +52,9 @@ nouveau_bo_del_ttm(struct ttm_buffer_object *bo) DRM_ERROR("bo %p still attached to GEM object\n", bo); nv10_mem_put_tile_region(dev, nvbo->tile, NULL); + + if (nvbo->fence_import_attach) + nouveau_fence_prime_del_bo(nvbo); kfree(nvbo); } @@ -109,6 +112,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align, INIT_LIST_HEAD(&nvbo->head); INIT_LIST_HEAD(&nvbo->entry); INIT_LIST_HEAD(&nvbo->vma_list); + INIT_LIST_HEAD(&nvbo->prime_chan_entries); nvbo->tile_mode = tile_mode; nvbo->tile_flags = tile_flags; nvbo->bo.bdev = &dev_priv->ttm.bdev; @@ -480,7 +484,7 @@ nouveau_bo_move_accel_cleanup(struct nouveau_channel *chan, struct nouveau_fence *fence = NULL; int ret; - ret = nouveau_fence_new(chan, &fence); + ret = nouveau_fence_new(chan, &fence, false); if (ret) return ret; diff --git a/drivers/gpu/drm/nouveau/nouveau_channel.c b/drivers/gpu/drm/nouveau/nouveau_channel.c index 629d8a2..85a8556 100644 --- 
a/drivers/gpu/drm/nouveau/nouveau_channel.c +++ b/drivers/gpu/drm/nouveau/nouveau_channel.c @@ -362,7 +362,7 @@ nouveau_channel_idle(struct nouveau_channel *chan) struct nouveau_fence *fence = NULL; int ret; - ret = nouveau_fence_new(chan, &fence); + ret = nouveau_fence_new(chan, &fence, false); if (!ret) { ret = nouveau_fence_wait(fence, false, false); nouveau_fence_unref(&fence); diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 69688ef..7c76776 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -466,7 +466,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan, } FIRE_RING (chan); - ret = nouveau_fence_new(chan, pfence); + ret = nouveau_fence_new(chan, pfence, false); if (ret) goto fail; diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.h b/drivers/gpu/drm/nouveau/nouveau_dma.h index 8db68be..d02ffd3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dma.h +++ b/drivers/gpu/drm/nouveau/nouveau_dma.h @@ -74,6 +74,7 @@ enum { NvEvoSema0 = 0x8010, NvEvoSema1 = 0x8011, NvNotify1 = 0x8012, + NvSemaPrime = 0x801f, /* G80+ display objects */ NvEvoVRAM = 0x0100, diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 2c17989..ad49594 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -126,6 +126,11 @@ struct nouveau_bo { struct ttm_bo_kmap_obj dma_buf_vmap; int vmapping_count; + + /* fence related stuff */ + struct nouveau_bo *sync_bo; + struct list_head prime_chan_entries; + struct dma_buf_attachment *fence_import_attach; }; #define nouveau_bo_tile_layout(nvbo) \ diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index 3c18049..d4c9c40 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -29,17 +29,64 @@ #include #include +#include #include "nouveau_drv.h" #include "nouveau_ramht.h" 
#include "nouveau_fence.h" #include "nouveau_software.h" #include "nouveau_dma.h" +#include "nouveau_fifo.h" + +int nouveau_fence_prime_init(struct drm_device *dev, +struct nouveau_fence_priv *priv, u32 align) +{ + int ret = 0; +#ifdef CONFIG_DMA_SHARED_BUFFER + struct nouveau_fifo_priv *pfifo = nv_engine(dev, NVOBJ_ENGINE_FIFO); + u32 size = PAGE_ALIGN(pfifo->c
[RFC PATCH 4/8] nouveau: add nouveau_bo_vma_add_access
From: Maarten Lankhorst

This is needed to allow creation of read-only vm mappings in fence
objects.

Signed-off-by: Maarten Lankhorst
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  | 6 +++---
 drivers/gpu/drm/nouveau/nouveau_drv.h | 6 --
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7f80ed5..4318320 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1443,15 +1443,15 @@ nouveau_bo_vma_find(struct nouveau_bo *nvbo, struct nouveau_vm *vm)
 }
 
 int
-nouveau_bo_vma_add(struct nouveau_bo *nvbo, struct nouveau_vm *vm,
-		   struct nouveau_vma *vma)
+nouveau_bo_vma_add_access(struct nouveau_bo *nvbo, struct nouveau_vm *vm,
+			  struct nouveau_vma *vma, u32 access)
 {
 	const u32 size = nvbo->bo.mem.num_pages << PAGE_SHIFT;
 	struct nouveau_mem *node = nvbo->bo.mem.mm_node;
 	int ret;
 
 	ret = nouveau_vm_get(vm, size, nvbo->page_shift,
-			     NV_MEM_ACCESS_RW, vma);
+			     access, vma);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 7c52eba..2c17989 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -1350,8 +1350,10 @@ extern int nouveau_bo_validate(struct nouveau_bo *, bool interruptible,
 extern struct nouveau_vma *
 nouveau_bo_vma_find(struct nouveau_bo *, struct nouveau_vm *);
-extern int nouveau_bo_vma_add(struct nouveau_bo *, struct nouveau_vm *,
-			      struct nouveau_vma *);
+#define nouveau_bo_vma_add(nvbo, vm, vma) \
+	nouveau_bo_vma_add_access((nvbo), (vm), (vma), NV_MEM_ACCESS_RW)
+extern int nouveau_bo_vma_add_access(struct nouveau_bo *, struct nouveau_vm *,
+				     struct nouveau_vma *, u32 access);
 extern void nouveau_bo_vma_del(struct nouveau_bo *, struct nouveau_vma *);
 
 /* nouveau_gem.c */
-- 
1.7.9.5
[RFC PATCH 3/8] nouveau: Extend prime code
From: Maarten Lankhorst The prime code no longer requires the bo to be backed by a gem object, and cpu access calls have been implemented. This will be needed for exporting fence bo's. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nouveau_drv.h |6 +- drivers/gpu/drm/nouveau/nouveau_prime.c | 106 +-- 2 files changed, 79 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 8613cb2..7c52eba 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -1374,11 +1374,15 @@ extern int nouveau_gem_ioctl_cpu_fini(struct drm_device *, void *, extern int nouveau_gem_ioctl_info(struct drm_device *, void *, struct drm_file *); +extern int nouveau_gem_prime_export_bo(struct nouveau_bo *nvbo, int flags, + u32 size, struct dma_buf **ret); extern struct dma_buf *nouveau_gem_prime_export(struct drm_device *dev, struct drm_gem_object *obj, int flags); extern struct drm_gem_object *nouveau_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf); - +extern int nouveau_prime_import_bo(struct drm_device *dev, + struct dma_buf *dma_buf, + struct nouveau_bo **pnvbo, bool gem); /* nouveau_display.c */ int nouveau_display_create(struct drm_device *dev); void nouveau_display_destroy(struct drm_device *dev); diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c index a25cf2c..537154d3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_prime.c +++ b/drivers/gpu/drm/nouveau/nouveau_prime.c @@ -35,7 +35,8 @@ static struct sg_table *nouveau_gem_map_dma_buf(struct dma_buf_attachment *attac enum dma_data_direction dir) { struct nouveau_bo *nvbo = attachment->dmabuf->priv; - struct drm_device *dev = nvbo->gem->dev; + struct drm_nouveau_private *dev_priv = nouveau_bdev(nvbo->bo.bdev); + struct drm_device *dev = dev_priv->dev; int npages = nvbo->bo.num_pages; struct sg_table *sg; int nents; @@ -59,29 +60,37 @@ static void 
nouveau_gem_dmabuf_release(struct dma_buf *dma_buf) { struct nouveau_bo *nvbo = dma_buf->priv; - if (nvbo->gem->export_dma_buf == dma_buf) { - nvbo->gem->export_dma_buf = NULL; + nouveau_bo_unpin(nvbo); + if (!nvbo->gem) + nouveau_bo_ref(NULL, &nvbo); + else { + if (nvbo->gem->export_dma_buf == dma_buf) + nvbo->gem->export_dma_buf = NULL; drm_gem_object_unreference_unlocked(nvbo->gem); } } static void *nouveau_gem_kmap_atomic(struct dma_buf *dma_buf, unsigned long page_num) { - return NULL; + struct nouveau_bo *nvbo = dma_buf->priv; + return kmap_atomic(nvbo->bo.ttm->pages[page_num]); } static void nouveau_gem_kunmap_atomic(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - + kunmap_atomic(addr); } + static void *nouveau_gem_kmap(struct dma_buf *dma_buf, unsigned long page_num) { - return NULL; + struct nouveau_bo *nvbo = dma_buf->priv; + return kmap(nvbo->bo.ttm->pages[page_num]); } static void nouveau_gem_kunmap(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - + struct nouveau_bo *nvbo = dma_buf->priv; + return kunmap(nvbo->bo.ttm->pages[page_num]); } static int nouveau_gem_prime_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) @@ -92,7 +101,8 @@ static int nouveau_gem_prime_mmap(struct dma_buf *dma_buf, struct vm_area_struct static void *nouveau_gem_prime_vmap(struct dma_buf *dma_buf) { struct nouveau_bo *nvbo = dma_buf->priv; - struct drm_device *dev = nvbo->gem->dev; + struct drm_nouveau_private *dev_priv = nouveau_bdev(nvbo->bo.bdev); + struct drm_device *dev = dev_priv->dev; int ret; mutex_lock(&dev->struct_mutex); @@ -116,7 +126,8 @@ out_unlock: static void nouveau_gem_prime_vunmap(struct dma_buf *dma_buf, void *vaddr) { struct nouveau_bo *nvbo = dma_buf->priv; - struct drm_device *dev = nvbo->gem->dev; + struct drm_nouveau_private *dev_priv = nouveau_bdev(nvbo->bo.bdev); + struct drm_device *dev = dev_priv->dev; mutex_lock(&dev->struct_mutex); nvbo->vmapping_count--; @@ -140,10 +151,9 @@ static const struct 
dma_buf_ops nouveau_dmabuf_ops = { }; static int -nouveau_prime_new(struct drm_device *dev, - size_t size, +nouveau_prime_new(struct drm_device *dev, size_t size, struct sg_table *sg, - struct nouveau_bo **pnvbo) + struct nouveau_bo **pnvbo, bool gem) { struct nouveau_bo *nvbo; u32 flags = 0; @@
[RFC PATCH 2/8] prime wip: i915
From: Maarten Lankhorst Export the hardware status page so others can read seqno. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 29 -- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 87 drivers/gpu/drm/i915/intel_ringbuffer.c| 42 ++ drivers/gpu/drm/i915/intel_ringbuffer.h|3 + 4 files changed, 145 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index aa308e1..d6bcfdc 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -66,12 +66,25 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment, static void i915_gem_dmabuf_release(struct dma_buf *dma_buf) { struct drm_i915_gem_object *obj = dma_buf->priv; + struct drm_device *dev = obj->base.dev; + + mutex_lock(&dev->struct_mutex); if (obj->base.export_dma_buf == dma_buf) { - /* drop the reference on the export fd holds */ obj->base.export_dma_buf = NULL; - drm_gem_object_unreference_unlocked(&obj->base); + } else { + drm_i915_private_t *dev_priv = dev->dev_private; + struct intel_ring_buffer *ring; + int i; + + for_each_ring(ring, dev_priv, i) + WARN_ON(ring->sync_buf == dma_buf); } + + /* drop the reference on the export fd holds */ + drm_gem_object_unreference(&obj->base); + + mutex_unlock(&dev->struct_mutex); } static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf) @@ -129,21 +142,25 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr) static void *i915_gem_dmabuf_kmap_atomic(struct dma_buf *dma_buf, unsigned long page_num) { - return NULL; + struct drm_i915_gem_object *obj = dma_buf->priv; + return kmap_atomic(obj->pages[page_num]); } static void i915_gem_dmabuf_kunmap_atomic(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - + kunmap_atomic(addr); } + static void *i915_gem_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num) { - return NULL; + struct drm_i915_gem_object *obj = dma_buf->priv; + return 
kmap(obj->pages[page_num]); } static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - + struct drm_i915_gem_object *obj = dma_buf->priv; + kunmap(obj->pages[page_num]); } static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 88e2e11..245340e 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -33,6 +33,7 @@ #include "i915_trace.h" #include "intel_drv.h" #include +#include struct change_domains { uint32_t invalidate_domains; @@ -556,7 +557,8 @@ err_unpin: static int i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring, struct drm_file *file, - struct list_head *objects) + struct list_head *objects, + struct list_head *prime_val) { drm_i915_private_t *dev_priv = ring->dev->dev_private; struct drm_i915_gem_object *obj; @@ -564,6 +566,31 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring, bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4; struct list_head ordered_objects; + list_for_each_entry(obj, objects, exec_list) { + struct dmabufmgr_validate *val; + + if (!(obj->base.import_attach || + obj->base.export_dma_buf)) + continue; + + val = kzalloc(sizeof(*val), GFP_KERNEL); + if (!val) + return -ENOMEM; + + if (obj->base.export_dma_buf) + val->bo = obj->base.export_dma_buf; + else + val->bo = obj->base.import_attach->dmabuf; + val->priv = obj; + list_add_tail(&val->head, prime_val); + } + + if (!list_empty(prime_val)) { + ret = dmabufmgr_eu_reserve_buffers(prime_val); + if (ret) + return ret; + } + INIT_LIST_HEAD(&ordered_objects); while (!list_empty(objects)) { struct drm_i915_gem_exec_object2 *entry; @@ -712,6 +739,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev, struct drm_file *file, struct intel_ring_buffer *ring, struct list_head *objects, + struct list_head *prime_val, struct eb_objects *
[RFC PATCH 1/8] dma-buf-mgr: Try 2
From: Maarten Lankhorst Core code based on ttm_bo and ttm_execbuf_util Signed-off-by: Maarten Lankhorst --- drivers/base/Makefile |2 +- drivers/base/dma-buf-mgr-eu.c | 263 + drivers/base/dma-buf-mgr.c| 149 +++ drivers/base/dma-buf.c|4 + include/linux/dma-buf-mgr.h | 150 +++ include/linux/dma-buf.h | 24 6 files changed, 591 insertions(+), 1 deletion(-) create mode 100644 drivers/base/dma-buf-mgr-eu.c create mode 100644 drivers/base/dma-buf-mgr.c create mode 100644 include/linux/dma-buf-mgr.h diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 5aa2d70..86e7598 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o dma-buf-mgr.o dma-buf-mgr-eu.o obj-$(CONFIG_ISA) += isa.o obj-$(CONFIG_FW_LOADER)+= firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/dma-buf-mgr-eu.c b/drivers/base/dma-buf-mgr-eu.c new file mode 100644 index 000..ed5e01c --- /dev/null +++ b/drivers/base/dma-buf-mgr-eu.c @@ -0,0 +1,263 @@ +/* + * Copyright (C) 2012 Canonical Ltd + * + * Based on ttm_bo.c which bears the following copyright notice, + * but is dual licensed: + * + * Copyright (c) 2006-2009 VMware, Inc., Palo Alto, CA., USA + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE + * USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + **/ + +#include +#include +#include + +static void dmabufmgr_eu_backoff_reservation_locked(struct list_head *list) +{ + struct dmabufmgr_validate *entry; + + list_for_each_entry(entry, list, head) { + struct dma_buf *bo = entry->bo; + if (!entry->reserved) + continue; + entry->reserved = false; + + bo->sync_buf = entry->sync_buf; + entry->sync_buf = NULL; + + atomic_set(&bo->reserved, 0); + wake_up_all(&bo->event_queue); + } +} + +static int +dmabufmgr_eu_wait_unreserved_locked(struct list_head *list, + struct dma_buf *bo) +{ + int ret; + + spin_unlock(&dmabufmgr.lru_lock); + ret = dmabufmgr_bo_wait_unreserved(bo, true); + spin_lock(&dmabufmgr.lru_lock); + if (unlikely(ret != 0)) + dmabufmgr_eu_backoff_reservation_locked(list); + return ret; +} + +void +dmabufmgr_eu_backoff_reservation(struct list_head *list) +{ + if (list_empty(list)) + return; + + spin_lock(&dmabufmgr.lru_lock); + dmabufmgr_eu_backoff_reservation_locked(list); + spin_unlock(&dmabufmgr.lru_lock); +} +EXPORT_SYMBOL_GPL(dmabufmgr_eu_backoff_reservation); + +int +dmabufmgr_eu_reserve_buffers(struct list_head *list) +{ + struct dmabufmgr_validate *entry; + int ret; + u32 val_seq; + + if (list_empty(list)) + return 0; + + list_for_each_entry(entry, list, head) { + entry->reserved = false; + entry->sync_buf = NULL; + } + +retry: + spin_lock(&dmabufmgr.lru_lock); + val_seq = dmabufmgr.counter++; + + list_for_each_entry(entry, list, head) { + struct dma_buf *bo = entry->bo; + +retry_this_bo: + ret = dmabufmgr_bo_reserve_locked(bo, true, true, true, val_seq); + switch (ret) { + case 0: + break; + case -EBUSY: + ret = dmabufmgr_eu_wait_unreserved_locked(list, bo); +
[RFC PATCH 0/8] Dmabuf synchronization
This patch implements my attempt at dmabuf synchronization. The core idea is that a lot of devices will have their own methods of synchronization, but more complicated devices allow some way of fencing, so why not export those as a dma-buf?

This patchset implements dmabufmgr, which is based on ttm's code. The ttm code deals with a lot more than just reservation, however; I took out almost all of the code not dealing with reservations.

I used the drm-intel-next-queued tree as a base. It contains some i915 flushing changes. I would rather use linux-next, but the deferred fput code makes my system unbootable. That is unfortunate, since it would reduce the deadlocks that happen in dma_buf_put when 2 devices release each other's dmabuf.

The i915 changes implement a simple cpu wait only. The nouveau code imports the sync dmabuf read-only and maps it to the affected channels, then performs a wait on it in hardware. Since the hardware may still be processing other commands, it could be the case that no hardware wait has to be performed at all.

Only the nouveau nv84 code is tested, but the nvc0 code should work as well.
[PATCH 14/15] drm/radeon: record what is next valid wptr for each ring v2
On Tue, Jul 10, 2012 at 8:51 AM, Christian König wrote:
> Before emitting any indirect buffer, emit the offset of the next
> valid ring content if any. This allows code that wants to resume the
> ring to resume it right after the ib that caused a GPU lockup.
>
> v2: use scratch registers instead of storing it into memory
>
Why use a scratch register? To minimize bus activity?

> Signed-off-by: Jerome Glisse
> Signed-off-by: Christian König
> ---
>  drivers/gpu/drm/radeon/evergreen.c   |  8 +++-
>  drivers/gpu/drm/radeon/ni.c          | 11 ++-
>  drivers/gpu/drm/radeon/r600.c        | 18 --
>  drivers/gpu/drm/radeon/radeon.h      |  1 +
>  drivers/gpu/drm/radeon/radeon_ring.c |  4
>  drivers/gpu/drm/radeon/rv770.c       |  4 +++-
>  drivers/gpu/drm/radeon/si.c          | 22 +++---
>  7 files changed, 60 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
> index f39b900..40de347 100644
> --- a/drivers/gpu/drm/radeon/evergreen.c
> +++ b/drivers/gpu/drm/radeon/evergreen.c
> @@ -1368,7 +1368,13 @@ void evergreen_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
>         /* set to DX10/11 mode */
>         radeon_ring_write(ring, PACKET3(PACKET3_MODE_CONTROL, 0));
>         radeon_ring_write(ring, 1);
> -       /* FIXME: implement */
> +
> +       if (ring->rptr_save_reg) {
> +               uint32_t next_rptr = ring->wptr + 2 + 4;
> +               radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0));
> +               radeon_ring_write(ring, next_rptr);
> +       }
> +
>         radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
>         radeon_ring_write(ring,
>  #ifdef __BIG_ENDIAN
> diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
> index f2afefb..6e3d448 100644
> --- a/drivers/gpu/drm/radeon/ni.c
> +++ b/drivers/gpu/drm/radeon/ni.c
> @@ -855,6 +855,13 @@ void cayman_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
>         /* set to DX10/11 mode */
>         radeon_ring_write(ring, PACKET3(PACKET3_MODE_CONTROL, 0));
>         radeon_ring_write(ring, 1);
> +
> +       if (ring->rptr_save_reg) {
> +               uint32_t next_rptr = ring->wptr + 2 + 4;

I would rather also skip the surface sync, so add another + 8.

> +               radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0));
> +               radeon_ring_write(ring, next_rptr);
> +       }
> +
>         radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
>         radeon_ring_write(ring,
>  #ifdef __BIG_ENDIAN
> @@ -981,8 +988,10 @@ static int cayman_cp_start(struct radeon_device *rdev)
>
>  static void cayman_cp_fini(struct radeon_device *rdev)
>  {
> +       struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
>         cayman_cp_enable(rdev, false);
> -       radeon_ring_fini(rdev, &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]);
> +       radeon_ring_fini(rdev, ring);
> +       radeon_scratch_free(rdev, ring->rptr_save_reg);
>  }
>
>  int cayman_cp_resume(struct radeon_device *rdev)
> diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
> index c808fa9..74fca15 100644
> --- a/drivers/gpu/drm/radeon/r600.c
> +++ b/drivers/gpu/drm/radeon/r600.c
> @@ -2155,18 +2155,27 @@ int r600_cp_resume(struct radeon_device *rdev)
>  void r600_ring_init(struct radeon_device *rdev, struct radeon_ring *ring,
>                     unsigned ring_size)
>  {
>         u32 rb_bufsz;
> +       int r;
>
>         /* Align ring size */
>         rb_bufsz = drm_order(ring_size / 8);
>         ring_size = (1 << (rb_bufsz + 1)) * 4;
>         ring->ring_size = ring_size;
>         ring->align_mask = 16 - 1;
> +
> +       r = radeon_scratch_get(rdev, &ring->rptr_save_reg);
> +       if (r) {
> +               DRM_ERROR("failed to get scratch reg for rptr save (%d).\n", r);
> +               ring->rptr_save_reg = 0;
> +       }
>  }
>
>  void r600_cp_fini(struct radeon_device *rdev)
>  {
> +       struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
>         r600_cp_stop(rdev);
> -       radeon_ring_fini(rdev, &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]);
> +       radeon_ring_fini(rdev, ring);
> +       radeon_scratch_free(rdev, ring->rptr_save_reg);
>  }
>
>
> @@ -2568,7 +2577,12 @@ void r600_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
>  {
>         struct radeon_ring *ring = &rdev->ring[ib->ring];
>
> -       /* FIXME: implement */
> +       if (ring->rptr_save_reg) {
> +               uint32_t next_rptr = ring->wptr + 2 + 4;
> +               radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0));
> +               radeon_ring_write(ring, next_rptr);
> +       }
> +
>         radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2));
>         radeon_ring_write(ring,
>  #ifdef __BIG_ENDIAN
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 872270c..64d39ad
[PATCH 1/2] drm: Add colouring to the range allocator
In order to support snoopable memory on non-LLC architectures (so that we can bind vgem objects into the i915 GATT for example), we have to avoid the prefetcher on the GPU from crossing memory domains and so prevent allocation of a snoopable PTE immediately following an uncached PTE. To do that, we need to extend the range allocator with support for tracking and segregating different node colours. This will be used by i915 to segregate memory domains within the GTT. v2: Now with more drm_mm helpers and less driver interference. Signed-off-by: Chris Wilson Cc: Dave Airlie Cc: Ben Skeggs Cc: Jerome Glisse Cc: Alex Deucher Cc: Daniel Vetter Cc: dri-devel at lists.freedesktop.org --- drivers/gpu/drm/drm_gem.c |2 +- drivers/gpu/drm/drm_mm.c | 169 - drivers/gpu/drm/i915/i915_gem.c |6 +- drivers/gpu/drm/i915/i915_gem_evict.c |9 +- include/drm/drm_mm.h | 93 +++--- 5 files changed, 191 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index d58e69d..fbe0842 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -354,7 +354,7 @@ drm_gem_create_mmap_offset(struct drm_gem_object *obj) /* Get a DRM GEM mmap offset allocated... 
*/ list->file_offset_node = drm_mm_search_free(&mm->offset_manager, - obj->size / PAGE_SIZE, 0, 0); + obj->size / PAGE_SIZE, 0, false); if (!list->file_offset_node) { DRM_ERROR("failed to allocate offset for bo %d\n", obj->name); diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index 961fb54..9bb82f7 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -118,45 +118,53 @@ static inline unsigned long drm_mm_hole_node_end(struct drm_mm_node *hole_node) static void drm_mm_insert_helper(struct drm_mm_node *hole_node, struct drm_mm_node *node, -unsigned long size, unsigned alignment) +unsigned long size, unsigned alignment, +unsigned long color) { struct drm_mm *mm = hole_node->mm; - unsigned long tmp = 0, wasted = 0; unsigned long hole_start = drm_mm_hole_node_start(hole_node); unsigned long hole_end = drm_mm_hole_node_end(hole_node); + unsigned long adj_start = hole_start; + unsigned long adj_end = hole_end; BUG_ON(!hole_node->hole_follows || node->allocated); - if (alignment) - tmp = hole_start % alignment; + if (mm->color_adjust) + mm->color_adjust(hole_node, color, &adj_start, &adj_end); - if (!tmp) { + if (alignment) { + unsigned tmp = adj_start % alignment; + if (tmp) + adj_start += alignment - tmp; + } + + if (adj_start == hole_start) { hole_node->hole_follows = 0; - list_del_init(&hole_node->hole_stack); - } else - wasted = alignment - tmp; + list_del(&hole_node->hole_stack); + } - node->start = hole_start + wasted; + node->start = adj_start; node->size = size; node->mm = mm; + node->color = color; node->allocated = 1; INIT_LIST_HEAD(&node->hole_stack); list_add(&node->node_list, &hole_node->node_list); - BUG_ON(node->start + node->size > hole_end); + BUG_ON(node->start + node->size > adj_end); + node->hole_follows = 0; if (node->start + node->size < hole_end) { list_add(&node->hole_stack, &mm->hole_stack); node->hole_follows = 1; - } else { - node->hole_follows = 0; } } struct drm_mm_node *drm_mm_get_block_generic(struct 
drm_mm_node *hole_node, unsigned long size, unsigned alignment, +unsigned long color, int atomic) { struct drm_mm_node *node; @@ -165,7 +173,7 @@ struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node, if (unlikely(node == NULL)) return NULL; - drm_mm_insert_helper(hole_node, node, size, alignment); + drm_mm_insert_helper(hole_node, node, size, alignment, color); return node; } @@ -181,11 +189,11 @@ int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node, { struct drm_mm_node *hole_node; - hole_node = drm_mm_search_free(mm, size, alignment, 0); + hole_node = drm_mm_search_free(mm, size, alignment, false); if (!hole_node) return -ENOSPC; - drm_mm_insert_helper(hole_node, node, size, alignment); + drm_mm_insert_helper(hole_node, node, size,
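For reference, the adjusted-hole arithmetic that drm_mm_insert_helper() gains in the patch above can be modelled in a few lines of standalone C. The guard-page policy in the example callback is purely illustrative (in reality color_adjust is supplied by the driver and inspects the neighbouring nodes); only the adjust-then-align ordering mirrors the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the hole adjustment in drm_mm_insert_helper(): the
 * colour callback shrinks the usable hole first, then alignment is
 * applied to the adjusted start. Names here are illustrative, not
 * taken from the kernel. */
typedef void (*color_adjust_fn)(unsigned long color,
                                unsigned long *adj_start,
                                unsigned long *adj_end);

/* Example policy (an assumption, not the i915 callback): leave a
 * one-page guard at each end of the hole for colour 1. */
static void guard_page_adjust(unsigned long color,
                              unsigned long *adj_start,
                              unsigned long *adj_end)
{
    if (color == 1) {
        *adj_start += 4096;
        *adj_end -= 4096;
    }
}

/* Returns the start offset chosen for the node, or -1 if the node no
 * longer fits in the adjusted hole. */
static long place_node(unsigned long hole_start, unsigned long hole_end,
                       unsigned long size, unsigned alignment,
                       unsigned long color, color_adjust_fn adjust)
{
    unsigned long adj_start = hole_start, adj_end = hole_end;

    if (adjust)
        adjust(color, &adj_start, &adj_end);

    /* Same alignment fixup as the patch: round adj_start up. */
    if (alignment) {
        unsigned long tmp = adj_start % alignment;
        if (tmp)
            adj_start += alignment - tmp;
    }

    if (adj_start + size > adj_end)
        return -1;
    return (long)adj_start;
}
```

Note how a colour change can move an otherwise aligned start: with an 8 KiB alignment, colour 0 places at offset 0 while colour 1 is pushed past the guard page to the next alignment boundary.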
Re: [PATCH 14/15] drm/radeon: record what is next valid wptr for each ring v2
On Tue, Jul 10, 2012 at 8:51 AM, Christian König wrote: > Before emitting any indirect buffer, emit the offset of the next > valid ring content if any. This allow code that want to resume > ring to resume ring right after ib that caused GPU lockup. > > v2: use scratch registers instead of storing it into memory > Why using scratch register ? To minimize bus activities ? > Signed-off-by: Jerome Glisse > Signed-off-by: Christian König > --- > drivers/gpu/drm/radeon/evergreen.c |8 +++- > drivers/gpu/drm/radeon/ni.c | 11 ++- > drivers/gpu/drm/radeon/r600.c| 18 -- > drivers/gpu/drm/radeon/radeon.h |1 + > drivers/gpu/drm/radeon/radeon_ring.c |4 > drivers/gpu/drm/radeon/rv770.c |4 +++- > drivers/gpu/drm/radeon/si.c | 22 +++--- > 7 files changed, 60 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/evergreen.c > b/drivers/gpu/drm/radeon/evergreen.c > index f39b900..40de347 100644 > --- a/drivers/gpu/drm/radeon/evergreen.c > +++ b/drivers/gpu/drm/radeon/evergreen.c > @@ -1368,7 +1368,13 @@ void evergreen_ring_ib_execute(struct radeon_device > *rdev, struct radeon_ib *ib) > /* set to DX10/11 mode */ > radeon_ring_write(ring, PACKET3(PACKET3_MODE_CONTROL, 0)); > radeon_ring_write(ring, 1); > - /* FIXME: implement */ > + > + if (ring->rptr_save_reg) { > + uint32_t next_rptr = ring->wptr + 2 + 4; > + radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0)); > + radeon_ring_write(ring, next_rptr); > + } > + > radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2)); > radeon_ring_write(ring, > #ifdef __BIG_ENDIAN > diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c > index f2afefb..6e3d448 100644 > --- a/drivers/gpu/drm/radeon/ni.c > +++ b/drivers/gpu/drm/radeon/ni.c > @@ -855,6 +855,13 @@ void cayman_ring_ib_execute(struct radeon_device *rdev, > struct radeon_ib *ib) > /* set to DX10/11 mode */ > radeon_ring_write(ring, PACKET3(PACKET3_MODE_CONTROL, 0)); > radeon_ring_write(ring, 1); > + > + if (ring->rptr_save_reg) { > + uint32_t 
next_rptr = ring->wptr + 2 + 4; I would rather also skip the surface sync so add another + 8 > + radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0)); > + radeon_ring_write(ring, next_rptr); > + } > + > radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2)); > radeon_ring_write(ring, > #ifdef __BIG_ENDIAN > @@ -981,8 +988,10 @@ static int cayman_cp_start(struct radeon_device *rdev) > > static void cayman_cp_fini(struct radeon_device *rdev) > { > + struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; > cayman_cp_enable(rdev, false); > - radeon_ring_fini(rdev, &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]); > + radeon_ring_fini(rdev, ring); > + radeon_scratch_free(rdev, ring->rptr_save_reg); > } > > int cayman_cp_resume(struct radeon_device *rdev) > diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c > index c808fa9..74fca15 100644 > --- a/drivers/gpu/drm/radeon/r600.c > +++ b/drivers/gpu/drm/radeon/r600.c > @@ -2155,18 +2155,27 @@ int r600_cp_resume(struct radeon_device *rdev) > void r600_ring_init(struct radeon_device *rdev, struct radeon_ring *ring, > unsigned ring_size) > { > u32 rb_bufsz; > + int r; > > /* Align ring size */ > rb_bufsz = drm_order(ring_size / 8); > ring_size = (1 << (rb_bufsz + 1)) * 4; > ring->ring_size = ring_size; > ring->align_mask = 16 - 1; > + > + r = radeon_scratch_get(rdev, &ring->rptr_save_reg); > + if (r) { > + DRM_ERROR("failed to get scratch reg for rptr save (%d).\n", > r); > + ring->rptr_save_reg = 0; > + } > } > > void r600_cp_fini(struct radeon_device *rdev) > { > + struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; > r600_cp_stop(rdev); > - radeon_ring_fini(rdev, &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]); > + radeon_ring_fini(rdev, ring); > + radeon_scratch_free(rdev, ring->rptr_save_reg); > } > > > @@ -2568,7 +2577,12 @@ void r600_ring_ib_execute(struct radeon_device *rdev, > struct radeon_ib *ib) > { > struct radeon_ring *ring = &rdev->ring[ib->ring]; > > - /* FIXME: 
implement */ > + if (ring->rptr_save_reg) { > + uint32_t next_rptr = ring->wptr + 2 + 4; > + radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0)); > + radeon_ring_write(ring, next_rptr); > + } > + > radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2)); > radeon_ring_write(ring, > #ifdef __BIG_ENDIAN > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h > index 872270c..64d39ad
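To make the "+ 2 + 4" in the quoted hunks concrete: the PACKET0 that stores next_rptr into the scratch register occupies 2 ring dwords (header plus one data dword), and the PACKET3_INDIRECT_BUFFER packet that follows occupies 4, so next_rptr points just past the IB dispatch. A standalone sketch of that accounting; the extra "+ 8" Jerome suggests for skipping the surface sync is noted but not modelled here, as its exact size is an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Dword accounting for the next_rptr bookkeeping in the
 * *_ring_ib_execute() hunks above. All sizes are in ring dwords. */
enum {
    PACKET0_SAVE_DW    = 2, /* PACKET0(rptr_save_reg, 0) + value     */
    INDIRECT_BUFFER_DW = 4, /* PACKET3 header + addr lo/hi + size    */
};

/* Offset of the first dword after the IB dispatch, i.e. where a ring
 * resume can safely continue if this IB caused the lockup. */
static uint32_t next_rptr_after_ib(uint32_t wptr)
{
    return wptr + PACKET0_SAVE_DW + INDIRECT_BUFFER_DW;
}
```

The value is written before the INDIRECT_BUFFER packet, so wptr at that point still precedes both packets; hence both sizes are added.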
Re: [PATCH 15/15] drm/radeon: implement ring saving on reset v2
On Die, 2012-07-10 at 14:51 +0200, Christian König wrote: > Try to save whatever is on the rings when > we encounter an lockup. > > v2: Fix spelling error. Free saved ring data if reset fails. > Add documentation for the new functions. > > Signed-off-by: Christian König Just some more spelling nits, otherwise this patch and patch 13 are Reviewed-by: Michel Dänzer > +/** > + * radeon_ring_backup - Backup the content of a ring > + * > + * @rdev: radeon_device pointer > + * @ring: the ring we want to backup 'back up', in both cases. > + * Saves all unprocessed commits to a ring, returns the number of dwords > saved. > + */ 'unprocessed commands from'? > +/** > + * radeon_ring_restore - append saved commands to the ring again > + * > + * @rdev: radeon_device pointer > + * @ring: ring to append commands to > + * @size: number of dwords we want to write > + * @data: saved commands > + * > + * Allocates space on the ring and restore the previusly saved commands. 'previously' -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 45018] [bisected] rendering regression since added support for virtual address space on cayman v11
https://bugs.freedesktop.org/show_bug.cgi?id=45018 --- Comment #65 from Alexandre Demers 2012-07-10 00:23:46 PDT --- Created attachment 64053 --> https://bugs.freedesktop.org/attachment.cgi?id=64053 xsession with drm-next .xsession with drm-next branch -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 45018] [bisected] rendering regression since added support for virtual address space on cayman v11
https://bugs.freedesktop.org/show_bug.cgi?id=45018 --- Comment #64 from Alexandre Demers 2012-07-10 00:22:55 PDT --- Created attachment 64052 --> https://bugs.freedesktop.org/attachment.cgi?id=64052 dmesg drm-next dmesg with latest drm-next branch -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 45018] [bisected] rendering regression since added support for virtual address space on cayman v11
https://bugs.freedesktop.org/show_bug.cgi?id=45018 --- Comment #63 from Alexandre Demers 2012-07-10 00:21:56 PDT --- Now running latest drm-next just in case. Always the same error, but with a little something new: with regular kernel, once the GPU crashed, it stays this way. With the drm-next branch, it loops. Attaching some files in a moment. I just started Gnome Shell, then opened a terminal window and launched piglit r600 tests. I'm pretty sure (dmesg): [ 66.238981] radeon :01:00.0: bo 88020f46bc00 va 0x0183B000 conflict with (bo 88021b65d000 0x0183B000 0x0183C000) [ 66.271373] radeon :01:00.0: bo 880222cc9400 va 0x01814000 conflict with (bo 880221a50800 0x01814000 0x01815000) [ 66.334540] radeon :01:00.0: bo 880222b7 va 0x01809000 conflict with (bo 8802230a9000 0x01809000 0x0180A000) corresponds to (.xsession-error): radeon: Failed to allocate a buffer: radeon:size : 256 bytes radeon:alignment : 256 bytes radeon:domains : 2 EE r600_texture.c:869 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy Mesa: User error: GL_OUT_OF_MEMORY in glTexSubImage radeon: Failed to allocate a buffer: radeon:size : 256 bytes radeon:alignment : 256 bytes radeon:domains : 2 EE r600_texture.c:869 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy radeon: Failed to allocate a buffer: radeon:size : 256 bytes radeon:alignment : 256 bytes radeon:domains : 2 EE r600_texture.c:869 r600_texture_get_transfer - failed to create temporary texture to hold untiled copy Then (dmesg): [ 196.710933] radeon :01:00.0: GPU lockup CP stall for more than 1msec [ 196.710946] radeon :01:00.0: GPU lockup (waiting for 0x0675 last fence id 0x066c) [ 196.711129] radeon :01:00.0: couldn't schedule ib [ 196.711239] radeon :01:00.0: couldn't schedule ib [ 196.711805] radeon :01:00.0: couldn't schedule ib [ 196.715732] radeon :01:00.0: couldn't schedule ib [ 196.715975] radeon :01:00.0: couldn't schedule ib [ 196.716362] radeon :01:00.0: couldn't 
schedule ib [ 196.716627] radeon :01:00.0: couldn't schedule ib [ 196.718012] radeon :01:00.0: couldn't schedule ib [ 196.718262] radeon :01:00.0: couldn't schedule ib [ 196.718480] radeon :01:00.0: couldn't schedule ib [ 196.718985] radeon :01:00.0: couldn't schedule ib [ 196.920396] radeon :01:00.0: couldn't schedule ib [ 196.920703] radeon :01:00.0: couldn't schedule ib [ 196.921084] radeon :01:00.0: couldn't schedule ib [ 196.921318] radeon :01:00.0: couldn't schedule ib [ 196.921558] radeon :01:00.0: couldn't schedule ib [ 196.921898] radeon :01:00.0: couldn't schedule ib [ 196.952350] radeon :01:00.0: couldn't schedule ib [ 196.952386] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB ! [ 196.952439] BUG: unable to handle kernel NULL pointer dereference at 0008 [ 196.952494] IP: [] radeon_fence_ref+0xd/0x40 [radeon] [ 196.952531] PGD 221dc4067 PUD 2228ff067 PMD 0 [ 196.952556] Oops: [#1] PREEMPT SMP [ 196.952579] CPU 1 [ 196.952617] Modules linked in: fuse snd_usb_audio snd_usbmidi_lib snd_rawmidi powernow_k8 snd_seq_device radeon ttm joydev snd_hda_codec_hdmi ppdev evdev pwc snd_hda_codec_realtek r8712u(C) r8169 mperf parport_pc parport sp5100_tco usb_storage uas drm_kms_helper drm videobuf2_vmalloc videobuf2_memops hid_logitech_dj pcspkr processor snd_hda_intel snd_hda_codec i2c_algo_bit mii hid_generic videobuf2_core videodev media wmi kvm_amd snd_hwdep snd_pcm snd_page_alloc snd_timer psmouse i2c_piix4 usbhid firewire_ohci hid serio_raw i2c_core firewire_core k10temp kvm microcode crc_itu_t snd edac_core button soundcore edac_mce_amd ext4 crc16 jbd2 mbcache pata_acpi sr_mod sd_mod cdrom pata_atiixp ata_generic ohci_hcd ahci libahci libata ehci_hcd usbcore scsi_mod usb_common [ 196.952957] [ 196.952969] Pid: 715, comm: Xorg Tainted: G C 3.5.0-rc4-VANILLA-46957-g74da01d #1 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H [ 196.953044] RIP: 0010:[] [] radeon_fence_ref+0xd/0x40 [radeon] [ 196.953092] RSP: 0018:8802230e9b48 EFLAGS: 00010286 ... 
and it loops. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[PATCH 3/3] drm/exynos: implement kmap/kunmap/kmap_atomic/kunmap_atomic functions of dma_buf_ops
Implement kmap/kmap_atomic, kunmap/kunmap_atomic functions of dma_buf_ops. Signed-off-by: Cooper Yuan --- drivers/gpu/drm/exynos/exynos_drm_dmabuf.c | 17 +++-- 1 files changed, 11 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c index 913a23e..805b344 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c +++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c @@ -138,30 +138,35 @@ static void exynos_dmabuf_release(struct dma_buf *dmabuf) static void *exynos_gem_dmabuf_kmap_atomic(struct dma_buf *dma_buf, unsigned long page_num) { - /* TODO */ + struct exynos_drm_gem_obj *exynos_gem_obj = dma_buf->priv; + struct exynos_drm_gem_buf *buf = exynos_gem_obj->buffer; - return NULL; + return kmap_atomic(buf->pages[page_num]); } static void exynos_gem_dmabuf_kunmap_atomic(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - /* TODO */ + kunmap_atomic(addr); } static void *exynos_gem_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num) { - /* TODO */ + struct exynos_drm_gem_obj *exynos_gem_obj = dma_buf->priv; + struct exynos_drm_gem_buf *buf = exynos_gem_obj->buffer; - return NULL; + return kmap(buf->pages[page_num]); } static void exynos_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - /* TODO */ + struct exynos_drm_gem_obj *exynos_gem_obj = dma_buf->priv; + struct exynos_drm_gem_buf *buf = exynos_gem_obj->buffer; + + kunmap(buf->pages[page_num]); } static int exynos_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) -- 1.7.0.4 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 2/3] drm/exynos: add dmabuf mmap function
implement mmap function of dma_buf_ops. Signed-off-by: Cooper Yuan --- drivers/gpu/drm/exynos/exynos_drm_dmabuf.c | 38 1 files changed, 38 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c index e4eeb0b..913a23e 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c +++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c @@ -164,6 +164,43 @@ static void exynos_gem_dmabuf_kunmap(struct dma_buf *dma_buf, /* TODO */ } +static int exynos_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) +{ + struct exynos_drm_gem_obj *exynos_gem_obj = dma_buf->priv; + struct drm_device *dev = exynos_gem_obj->base.dev; + struct exynos_drm_gem_buf *buf = exynos_gem_obj->buffer; + int ret = 0; + if (WARN_ON(!exynos_gem_obj->base.filp)) + return -EINVAL; + + /* Check for valid size. */ + if (buf->size < vma->vm_end - vma->vm_start) { + ret = -EINVAL; + goto out_unlock; + } + + if (!dev->driver->gem_vm_ops) { + ret = -EINVAL; + goto out_unlock; + } + + vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND; + vma->vm_ops = dev->driver->gem_vm_ops; + vma->vm_private_data = exynos_gem_obj; + vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags)); + + /* Take a ref for this mapping of the object, so that the fault +* handler can dereference the mmap offset's pointer to the object. +* This reference is cleaned up by the corresponding vm_close +* (which should happen whether the vma was created by this call, or +* by a vm_open due to mremap or partial unmap or whatever). 
+*/ + vma->vm_ops->open(vma); + +out_unlock: + return ret; +} + static struct dma_buf_ops exynos_dmabuf_ops = { .map_dma_buf= exynos_gem_map_dma_buf, .unmap_dma_buf = exynos_gem_unmap_dma_buf, @@ -172,6 +209,7 @@ static struct dma_buf_ops exynos_dmabuf_ops = { .kunmap = exynos_gem_dmabuf_kunmap, .kunmap_atomic = exynos_gem_dmabuf_kunmap_atomic, .release= exynos_dmabuf_release, + .mmap = exynos_gem_dmabuf_mmap, }; struct dma_buf *exynos_dmabuf_prime_export(struct drm_device *drm_dev, -- 1.7.0.4 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 1/3] drm/exynos: correct dma_buf exporter permission as ReadWrite
Set the dma_buf exporter flags to O_RDWR; otherwise mmap will fail with errno 13 (permission denied). Signed-off-by: Cooper Yuan --- drivers/gpu/drm/exynos/exynos_drm_dmabuf.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c index 613bf8a..e4eeb0b 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c +++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c @@ -29,6 +29,7 @@ #include "exynos_drm_drv.h" #include "exynos_drm_gem.h" +#include #include static struct sg_table *exynos_pages_to_sg(struct page **pages, int nr_pages, @@ -179,7 +180,7 @@ struct dma_buf *exynos_dmabuf_prime_export(struct drm_device *drm_dev, struct exynos_drm_gem_obj *exynos_gem_obj = to_exynos_gem_obj(obj); return dma_buf_export(exynos_gem_obj, &exynos_dmabuf_ops, - exynos_gem_obj->base.size, 0600); + exynos_gem_obj->base.size, O_RDWR); } struct drm_gem_object *exynos_dmabuf_prime_import(struct drm_device *drm_dev, -- 1.7.0.4 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC] drm/radeon: restoring ring commands in case of a lockup
On 09.07.2012 18:10, Jerome Glisse wrote: On Mon, Jul 9, 2012 at 11:59 AM, Michel Dänzer wrote: On Mon, 2012-07-09 at 12:41 +0200, Christian König wrote: Hi, The following patchset tries to save and restore the not yet processed commands from the rings in case of a lockup and with that should make a userspace problem with a single application far less problematic. The first four patches are just stuff this patchset is based upon, followed by four patches which fix various bugs found while working on this feature. Followed by patches which change the way how memory is saved/restored on suspend/resume, basically before we have unpinned most of the buffer objects so it could be move from vram into system memory. But that is mostly unnecessary cause the buffer object either are already in system memory or their content can be easily reinitialized. The last three patches implement the actual tracking and restoring of commands in case of a lockup. Please take a look and review. Patches 3, 5 and 14 are Reviewed-by: Michel Dänzer Patch 1-9 are Reviewed-by: Jerome Glisse Other looks good but i want to test them too and spend a bit more time to double check few things. Will try to do that tomorrow. Just send out v2 of the patchset. Mainly it integrates your idea of just saving rptr right before we call into the IB, but also contains all the other comments and fixes from Michel. Cheers, Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 15/15] drm/radeon: implement ring saving on reset v2
Try to save whatever is on the rings when we encounter a lockup. v2: Fix spelling error. Free saved ring data if reset fails. Add documentation for the new functions. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/radeon.h|4 ++ drivers/gpu/drm/radeon/radeon_device.c | 48 drivers/gpu/drm/radeon/radeon_ring.c | 75 3 files changed, 119 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 64d39ad..6715e4c 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -768,6 +768,10 @@ int radeon_ring_test(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *ring); void radeon_ring_lockup_update(struct radeon_ring *ring); bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *ring); +unsigned radeon_ring_backup(struct radeon_device *rdev, struct radeon_ring *ring, + uint32_t **data); +int radeon_ring_restore(struct radeon_device *rdev, struct radeon_ring *ring, + unsigned size, uint32_t *data); int radeon_ring_init(struct radeon_device *rdev, struct radeon_ring *cp, unsigned ring_size, unsigned rptr_offs, unsigned rptr_reg, unsigned wptr_reg, u32 ptr_reg_shift, u32 ptr_reg_mask, u32 nop); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index bbd0971..0302a9f 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -996,7 +996,12 @@ int radeon_resume_kms(struct drm_device *dev) int radeon_gpu_reset(struct radeon_device *rdev) { - int r; + unsigned ring_sizes[RADEON_NUM_RINGS]; + uint32_t *ring_data[RADEON_NUM_RINGS]; + + bool saved = false; + + int i, r; int resched; down_write(&rdev->exclusive_lock); @@ -1005,20 +1010,47 @@ int radeon_gpu_reset(struct radeon_device *rdev) resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev); radeon_suspend(rdev); + for (i = 0; i < RADEON_NUM_RINGS; 
++i) { + ring_sizes[i] = radeon_ring_backup(rdev, &rdev->ring[i], + &ring_data[i]); + if (ring_sizes[i]) { + saved = true; + dev_info(rdev->dev, "Saved %d dwords of commands " +"on ring %d.\n", ring_sizes[i], i); + } + } + +retry: r = radeon_asic_reset(rdev); if (!r) { - dev_info(rdev->dev, "GPU reset succeed\n"); + dev_info(rdev->dev, "GPU reset succeeded, trying to resume\n"); radeon_resume(rdev); + } - r = radeon_ib_ring_tests(rdev); - if (r) - DRM_ERROR("ib ring test failed (%d).\n", r); + radeon_restore_bios_scratch_regs(rdev); + drm_helper_resume_force_mode(rdev->ddev); - radeon_restore_bios_scratch_regs(rdev); - drm_helper_resume_force_mode(rdev->ddev); - ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); + if (!r) { + for (i = 0; i < RADEON_NUM_RINGS; ++i) { + radeon_ring_restore(rdev, &rdev->ring[i], + ring_sizes[i], ring_data[i]); + } + + r = radeon_ib_ring_tests(rdev); + if (r) { + dev_err(rdev->dev, "ib ring test failed (%d).\n", r); + if (saved) { + radeon_suspend(rdev); + goto retry; + } + } + } else { + for (i = 0; i < RADEON_NUM_RINGS; ++i) { + kfree(ring_data[i]); + } } + ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); if (r) { /* bad news, how to tell it to userspace ? */ dev_info(rdev->dev, "GPU reset failed\n"); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index ce8eb9d..a4fa2c7 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -362,6 +362,81 @@ bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *rin return false; } +/** + * radeon_ring_backup - Backup the content of a ring + * + * @rdev: radeon_device pointer + * @ring: the ring we want to backup + * + * Saves all unprocessed commits to a ring, returns the number of dwords saved. + */ +unsigned radeon_ring_backup(struct radeon_device *rdev, struct radeon_ring *ring, + uint32_t **data) +{ + unsigned size, ptr, i; + + /* just in
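The core of radeon_ring_backup() (the function is truncated above) is the modulo arithmetic between the read and write pointers: the number of unprocessed dwords is the rptr-to-wptr distance masked by the ring size, and the copy has to wrap at the end of the ring buffer. A standalone model of that copy loop, with names loosely mirroring the patch; this is a sketch, not the kernel code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-in for struct radeon_ring: a power-of-two ring of
 * dwords with read/write pointers. */
struct toy_ring {
    uint32_t *ring;     /* ring buffer contents, ring_size dwords */
    unsigned ring_size; /* power of two, in dwords */
    unsigned ptr_mask;  /* ring_size - 1 */
    unsigned rptr, wptr;
};

/* Copy every unprocessed dword (rptr..wptr, wrapping) into a freshly
 * allocated buffer; returns the number of dwords saved, 0 if there is
 * nothing to save or the allocation failed. */
static unsigned ring_backup(const struct toy_ring *r, uint32_t **data)
{
    /* Unsigned subtraction plus the mask handles wptr < rptr. */
    unsigned size = (r->wptr - r->rptr) & r->ptr_mask;
    unsigned ptr = r->rptr;
    unsigned i;

    *data = NULL;
    if (size == 0)
        return 0;

    *data = malloc(size * sizeof(uint32_t));
    if (!*data)
        return 0;

    for (i = 0; i < size; i++) {
        (*data)[i] = r->ring[ptr];
        ptr = (ptr + 1) & r->ptr_mask; /* wrap at end of ring */
    }
    return size;
}
```

In the actual patch the starting point is the value saved in the rptr_save_reg scratch register by patch 14, so that the replay begins right after the IB that caused the lockup; the wrap-around copy itself is the same idea.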
[PATCH 14/15] drm/radeon: record what is next valid wptr for each ring v2
Before emitting any indirect buffer, emit the offset of the next valid ring content, if any. This allows code that wants to resume a ring to do so right after the IB that caused the GPU lockup. v2: use scratch registers instead of storing it into memory Signed-off-by: Jerome Glisse Signed-off-by: Christian König --- drivers/gpu/drm/radeon/evergreen.c |8 +++- drivers/gpu/drm/radeon/ni.c | 11 ++- drivers/gpu/drm/radeon/r600.c| 18 -- drivers/gpu/drm/radeon/radeon.h |1 + drivers/gpu/drm/radeon/radeon_ring.c |4 drivers/gpu/drm/radeon/rv770.c |4 +++- drivers/gpu/drm/radeon/si.c | 22 +++--- 7 files changed, 60 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index f39b900..40de347 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -1368,7 +1368,13 @@ void evergreen_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib) /* set to DX10/11 mode */ radeon_ring_write(ring, PACKET3(PACKET3_MODE_CONTROL, 0)); radeon_ring_write(ring, 1); - /* FIXME: implement */ + + if (ring->rptr_save_reg) { + uint32_t next_rptr = ring->wptr + 2 + 4; + radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0)); + radeon_ring_write(ring, next_rptr); + } + radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2)); radeon_ring_write(ring, #ifdef __BIG_ENDIAN diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index f2afefb..6e3d448 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -855,6 +855,13 @@ void cayman_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib) /* set to DX10/11 mode */ radeon_ring_write(ring, PACKET3(PACKET3_MODE_CONTROL, 0)); radeon_ring_write(ring, 1); + + if (ring->rptr_save_reg) { + uint32_t next_rptr = ring->wptr + 2 + 4; + radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0)); + radeon_ring_write(ring, next_rptr); + } + radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2)); 
radeon_ring_write(ring, #ifdef __BIG_ENDIAN @@ -981,8 +988,10 @@ static int cayman_cp_start(struct radeon_device *rdev) static void cayman_cp_fini(struct radeon_device *rdev) { + struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; cayman_cp_enable(rdev, false); - radeon_ring_fini(rdev, &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]); + radeon_ring_fini(rdev, ring); + radeon_scratch_free(rdev, ring->rptr_save_reg); } int cayman_cp_resume(struct radeon_device *rdev) diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index c808fa9..74fca15 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2155,18 +2155,27 @@ int r600_cp_resume(struct radeon_device *rdev) void r600_ring_init(struct radeon_device *rdev, struct radeon_ring *ring, unsigned ring_size) { u32 rb_bufsz; + int r; /* Align ring size */ rb_bufsz = drm_order(ring_size / 8); ring_size = (1 << (rb_bufsz + 1)) * 4; ring->ring_size = ring_size; ring->align_mask = 16 - 1; + + r = radeon_scratch_get(rdev, &ring->rptr_save_reg); + if (r) { + DRM_ERROR("failed to get scratch reg for rptr save (%d).\n", r); + ring->rptr_save_reg = 0; + } } void r600_cp_fini(struct radeon_device *rdev) { + struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; r600_cp_stop(rdev); - radeon_ring_fini(rdev, &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]); + radeon_ring_fini(rdev, ring); + radeon_scratch_free(rdev, ring->rptr_save_reg); } @@ -2568,7 +2577,12 @@ void r600_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib) { struct radeon_ring *ring = &rdev->ring[ib->ring]; - /* FIXME: implement */ + if (ring->rptr_save_reg) { + uint32_t next_rptr = ring->wptr + 2 + 4; + radeon_ring_write(ring, PACKET0(ring->rptr_save_reg, 0)); + radeon_ring_write(ring, next_rptr); + } + radeon_ring_write(ring, PACKET3(PACKET3_INDIRECT_BUFFER, 2)); radeon_ring_write(ring, #ifdef __BIG_ENDIAN diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 
872270c..64d39ad 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -622,6 +622,7 @@ struct radeon_ring { unsignedrptr; unsignedrptr_offs; unsignedrptr_reg; + unsignedrptr_save_reg; unsignedwptr; unsignedwptr_old; unsignedwp
[PATCH 13/15] drm/radeon: move radeon_ib_ring_tests out of chipset code
Making it easier to control when it is executed. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/evergreen.c |4 drivers/gpu/drm/radeon/ni.c|4 drivers/gpu/drm/radeon/r100.c |4 drivers/gpu/drm/radeon/r300.c |4 drivers/gpu/drm/radeon/r420.c |4 drivers/gpu/drm/radeon/r520.c |4 drivers/gpu/drm/radeon/r600.c |4 drivers/gpu/drm/radeon/radeon_device.c | 15 +++ drivers/gpu/drm/radeon/rs400.c |4 drivers/gpu/drm/radeon/rs600.c |4 drivers/gpu/drm/radeon/rs690.c |4 drivers/gpu/drm/radeon/rv515.c |4 drivers/gpu/drm/radeon/rv770.c |4 drivers/gpu/drm/radeon/si.c| 21 - 14 files changed, 15 insertions(+), 69 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 82f7aea..f39b900 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3093,10 +3093,6 @@ static int evergreen_startup(struct radeon_device *rdev) return r; } - r = radeon_ib_ring_tests(rdev); - if (r) - return r; - r = r600_audio_init(rdev); if (r) { DRM_ERROR("radeon: audio init failed\n"); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index ec5307c..f2afefb 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1276,10 +1276,6 @@ static int cayman_startup(struct radeon_device *rdev) return r; } - r = radeon_ib_ring_tests(rdev); - if (r) - return r; - r = radeon_vm_manager_init(rdev); if (r) { dev_err(rdev->dev, "vm manager initialization failed (%d).\n", r); diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index 9524bd4..e0f5ae8 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -3887,10 +3887,6 @@ static int r100_startup(struct radeon_device *rdev) return r; } - r = radeon_ib_ring_tests(rdev); - if (r) - return r; - return 0; } diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c index b396e34..646a192 100644 --- a/drivers/gpu/drm/radeon/r300.c +++ b/drivers/gpu/drm/radeon/r300.c @@ -1397,10 +1397,6 
@@ static int r300_startup(struct radeon_device *rdev) return r; } - r = radeon_ib_ring_tests(rdev); - if (r) - return r; - return 0; } diff --git a/drivers/gpu/drm/radeon/r420.c b/drivers/gpu/drm/radeon/r420.c index 0062938..f2f5bf6 100644 --- a/drivers/gpu/drm/radeon/r420.c +++ b/drivers/gpu/drm/radeon/r420.c @@ -281,10 +281,6 @@ static int r420_startup(struct radeon_device *rdev) return r; } - r = radeon_ib_ring_tests(rdev); - if (r) - return r; - return 0; } diff --git a/drivers/gpu/drm/radeon/r520.c b/drivers/gpu/drm/radeon/r520.c index 6df3e51..079d3c5 100644 --- a/drivers/gpu/drm/radeon/r520.c +++ b/drivers/gpu/drm/radeon/r520.c @@ -209,10 +209,6 @@ static int r520_startup(struct radeon_device *rdev) return r; } - r = radeon_ib_ring_tests(rdev); - if (r) - return r; - return 0; } diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index af2f74a..c808fa9 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2395,10 +2395,6 @@ int r600_startup(struct radeon_device *rdev) return r; } - r = radeon_ib_ring_tests(rdev); - if (r) - return r; - r = r600_audio_init(rdev); if (r) { DRM_ERROR("radeon: audio init failed\n"); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 254fdb4..bbd0971 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -822,6 +822,10 @@ int radeon_device_init(struct radeon_device *rdev, if (r) return r; + r = radeon_ib_ring_tests(rdev); + if (r) + DRM_ERROR("ib ring test failed (%d).\n", r); + if (rdev->flags & RADEON_IS_AGP && !rdev->accel_working) { /* Acceleration not working on AGP card try again * with fallback to PCI or PCIE GART @@ -946,6 +950,7 @@ int radeon_resume_kms(struct drm_device *dev) { struct drm_connector *connector; struct radeon_device *rdev = dev->dev_private; + int r; if (dev->switch_power_state == DRM_SWITCH_POWER_OFF) return 0; @@ -960,6 +965,11 @@ int radeon_resume_kms(struct 
drm_device *dev) /* resume AGP if in use */ radeon_agp_resume
[PATCH 11/15] drm/radeon: remove r600_blit_suspend
Just reinitialize the shader content on resume instead. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/evergreen.c |1 - drivers/gpu/drm/radeon/evergreen_blit_kms.c | 40 +-- drivers/gpu/drm/radeon/ni.c |1 - drivers/gpu/drm/radeon/r600.c | 15 -- drivers/gpu/drm/radeon/r600_blit_kms.c | 40 +-- drivers/gpu/drm/radeon/radeon.h |2 -- drivers/gpu/drm/radeon/rv770.c |1 - drivers/gpu/drm/radeon/si.c |3 -- 8 files changed, 40 insertions(+), 63 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 64e06e6..82f7aea 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3139,7 +3139,6 @@ int evergreen_suspend(struct radeon_device *rdev) struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; r600_audio_fini(rdev); - r600_blit_suspend(rdev); r700_cp_stop(rdev); ring->ready = false; evergreen_irq_suspend(rdev); diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index e512560..89cb9fe 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -634,10 +634,6 @@ int evergreen_blit_init(struct radeon_device *rdev) rdev->r600_blit.max_dim = 16384; - /* pin copy shader into vram if already initialized */ - if (rdev->r600_blit.shader_obj) - goto done; - rdev->r600_blit.state_offset = 0; if (rdev->family < CHIP_CAYMAN) @@ -668,11 +664,26 @@ int evergreen_blit_init(struct radeon_device *rdev) obj_size += cayman_ps_size * 4; obj_size = ALIGN(obj_size, 256); - r = radeon_bo_create(rdev, obj_size, PAGE_SIZE, true, RADEON_GEM_DOMAIN_VRAM, -NULL, &rdev->r600_blit.shader_obj); - if (r) { - DRM_ERROR("evergreen failed to allocate shader\n"); - return r; + /* pin copy shader into vram if not already initialized */ + if (!rdev->r600_blit.shader_obj) { + r = radeon_bo_create(rdev, obj_size, PAGE_SIZE, true, +RADEON_GEM_DOMAIN_VRAM, +NULL, &rdev->r600_blit.shader_obj); + if (r) { + 
DRM_ERROR("evergreen failed to allocate shader\n"); + return r; + } + + r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false); + if (unlikely(r != 0)) + return r; + r = radeon_bo_pin(rdev->r600_blit.shader_obj, RADEON_GEM_DOMAIN_VRAM, + &rdev->r600_blit.shader_gpu_addr); + radeon_bo_unreserve(rdev->r600_blit.shader_obj); + if (r) { + dev_err(rdev->dev, "(%d) pin blit object failed\n", r); + return r; + } } DRM_DEBUG("evergreen blit allocated bo %08x vs %08x ps %08x\n", @@ -714,17 +725,6 @@ int evergreen_blit_init(struct radeon_device *rdev) radeon_bo_kunmap(rdev->r600_blit.shader_obj); radeon_bo_unreserve(rdev->r600_blit.shader_obj); -done: - r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false); - if (unlikely(r != 0)) - return r; - r = radeon_bo_pin(rdev->r600_blit.shader_obj, RADEON_GEM_DOMAIN_VRAM, - &rdev->r600_blit.shader_gpu_addr); - radeon_bo_unreserve(rdev->r600_blit.shader_obj); - if (r) { - dev_err(rdev->dev, "(%d) pin blit object failed\n", r); - return r; - } radeon_ttm_set_active_vram_size(rdev, rdev->mc.real_vram_size); return 0; } diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index fe55310..4004376 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1316,7 +1316,6 @@ int cayman_suspend(struct radeon_device *rdev) { r600_audio_fini(rdev); radeon_vm_manager_suspend(rdev); - r600_blit_suspend(rdev); cayman_cp_enable(rdev, false); rdev->ring[RADEON_RING_TYPE_GFX_INDEX].ready = false; evergreen_irq_suspend(rdev); diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 9750f53..af2f74a 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2307,20 +2307,6 @@ int r600_copy_blit(struct radeon_device *rdev, return 0; } -void r600_blit_suspend(struct radeon_device *rdev) -{ - int r; - - /* unpin shaders bo */ - if (rdev->r600_blit.shader_obj) { - r = radeon_bo_reserve(rdev->r600_blit.shader_obj, false); - if (!r) { - radeon_bo_unpin(rdev->r600_
[PATCH 12/15] drm/radeon: remove vm_manager start/suspend
Just restore the page table instead. Addressing three problems with this change: 1. Calling vm_manager_suspend in the suspend path is problematic because it wants to wait for the VM use to end, which in case of a lockup never happens. 2. In case of a locked up memory controller unbinding the VM seems to make it even more unstable, creating an unrecoverable lockup in the end. 3. If we want to backup/restore the leftover ring content we must not unbind VMs in between. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/ni.c | 12 ++--- drivers/gpu/drm/radeon/radeon.h |2 - drivers/gpu/drm/radeon/radeon_gart.c | 83 +- drivers/gpu/drm/radeon/si.c | 12 ++--- 4 files changed, 59 insertions(+), 50 deletions(-) diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 4004376..ec5307c 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1280,9 +1280,11 @@ static int cayman_startup(struct radeon_device *rdev) if (r) return r; - r = radeon_vm_manager_start(rdev); - if (r) + r = radeon_vm_manager_init(rdev); + if (r) { + dev_err(rdev->dev, "vm manager initialization failed (%d).\n", r); return r; + } r = r600_audio_init(rdev); if (r) @@ -1315,7 +1317,6 @@ int cayman_resume(struct radeon_device *rdev) int cayman_suspend(struct radeon_device *rdev) { r600_audio_fini(rdev); - radeon_vm_manager_suspend(rdev); cayman_cp_enable(rdev, false); rdev->ring[RADEON_RING_TYPE_GFX_INDEX].ready = false; evergreen_irq_suspend(rdev); @@ -1392,11 +1393,6 @@ int cayman_init(struct radeon_device *rdev) return r; rdev->accel_working = true; - r = radeon_vm_manager_init(rdev); - if (r) { - dev_err(rdev->dev, "vm manager initialization failed (%d).\n", r); - } - r = cayman_startup(rdev); if (r) { dev_err(rdev->dev, "disabling GPU acceleration\n"); diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 8a8c3f8..872270c 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -1759,8 +1759,6 @@
extern void radeon_ttm_set_active_vram_size(struct radeon_device *rdev, u64 size */ int radeon_vm_manager_init(struct radeon_device *rdev); void radeon_vm_manager_fini(struct radeon_device *rdev); -int radeon_vm_manager_start(struct radeon_device *rdev); -int radeon_vm_manager_suspend(struct radeon_device *rdev); int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm); void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm); int radeon_vm_bind(struct radeon_device *rdev, struct radeon_vm *vm); diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index ee11c50..56752da 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -282,27 +282,58 @@ void radeon_gart_fini(struct radeon_device *rdev) * * TODO bind a default page at vm initialization for default address */ + int radeon_vm_manager_init(struct radeon_device *rdev) { + struct radeon_vm *vm; + struct radeon_bo_va *bo_va; int r; - rdev->vm_manager.enabled = false; + if (!rdev->vm_manager.enabled) { + /* mark first vm as always in use, it's the system one */ + r = radeon_sa_bo_manager_init(rdev, &rdev->vm_manager.sa_manager, + rdev->vm_manager.max_pfn * 8, + RADEON_GEM_DOMAIN_VRAM); + if (r) { + dev_err(rdev->dev, "failed to allocate vm bo (%dKB)\n", + (rdev->vm_manager.max_pfn * 8) >> 10); + return r; + } - /* mark first vm as always in use, it's the system one */ - r = radeon_sa_bo_manager_init(rdev, &rdev->vm_manager.sa_manager, - rdev->vm_manager.max_pfn * 8, - RADEON_GEM_DOMAIN_VRAM); - if (r) { - dev_err(rdev->dev, "failed to allocate vm bo (%dKB)\n", - (rdev->vm_manager.max_pfn * 8) >> 10); - return r; + r = rdev->vm_manager.funcs->init(rdev); + if (r) + return r; + + rdev->vm_manager.enabled = true; + + r = radeon_sa_bo_manager_start(rdev, &rdev->vm_manager.sa_manager); + if (r) + return r; } - r = rdev->vm_manager.funcs->init(rdev); - if (r == 0) - rdev->vm_manager.enabled = true; + /* restore page table 
*/ + list_for_each_entry(vm, &rdev->vm_manager.lru_vm, list) { + if (vm->id =
[PATCH 10/15] drm/radeon: remove ib_pool start/suspend
The IB pool is in gart memory, so it is completely superfluous to unpin / repin it on suspend / resume. Signed-off-by: Christian König --- drivers/gpu/drm/radeon/evergreen.c | 17 ++--- drivers/gpu/drm/radeon/ni.c | 16 ++-- drivers/gpu/drm/radeon/r100.c| 23 ++- drivers/gpu/drm/radeon/r300.c| 17 ++--- drivers/gpu/drm/radeon/r420.c| 17 ++--- drivers/gpu/drm/radeon/r520.c| 14 +- drivers/gpu/drm/radeon/r600.c| 17 ++--- drivers/gpu/drm/radeon/radeon.h |2 -- drivers/gpu/drm/radeon/radeon_asic.h |1 - drivers/gpu/drm/radeon/radeon_ring.c | 17 +++-- drivers/gpu/drm/radeon/rs400.c | 17 ++--- drivers/gpu/drm/radeon/rs600.c | 17 ++--- drivers/gpu/drm/radeon/rs690.c | 17 ++--- drivers/gpu/drm/radeon/rv515.c | 16 ++-- drivers/gpu/drm/radeon/rv770.c | 17 ++--- drivers/gpu/drm/radeon/si.c | 16 ++-- 16 files changed, 84 insertions(+), 157 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index eb9a71a..64e06e6 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3087,9 +3087,11 @@ static int evergreen_startup(struct radeon_device *rdev) if (r) return r; - r = radeon_ib_pool_start(rdev); - if (r) + r = radeon_ib_pool_init(rdev); + if (r) { + dev_err(rdev->dev, "IB initialization failed (%d).\n", r); return r; + } r = radeon_ib_ring_tests(rdev); if (r) @@ -3137,7 +3139,6 @@ int evergreen_suspend(struct radeon_device *rdev) struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; r600_audio_fini(rdev); - radeon_ib_pool_suspend(rdev); r600_blit_suspend(rdev); r700_cp_stop(rdev); ring->ready = false; @@ -3224,20 +3225,14 @@ int evergreen_init(struct radeon_device *rdev) if (r) return r; - r = radeon_ib_pool_init(rdev); rdev->accel_working = true; - if (r) { - dev_err(rdev->dev, "IB initialization failed (%d).\n", r); - rdev->accel_working = false; - } - r = evergreen_startup(rdev); if (r) { dev_err(rdev->dev, "disabling GPU acceleration\n"); r700_cp_fini(rdev); r600_irq_fini(rdev); 
radeon_wb_fini(rdev); - r100_ib_fini(rdev); + radeon_ib_pool_fini(rdev); radeon_irq_kms_fini(rdev); evergreen_pcie_gart_fini(rdev); rdev->accel_working = false; @@ -3264,7 +3259,7 @@ void evergreen_fini(struct radeon_device *rdev) r700_cp_fini(rdev); r600_irq_fini(rdev); radeon_wb_fini(rdev); - r100_ib_fini(rdev); + radeon_ib_pool_fini(rdev); radeon_irq_kms_fini(rdev); evergreen_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 8b1df33..fe55310 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1270,9 +1270,11 @@ static int cayman_startup(struct radeon_device *rdev) if (r) return r; - r = radeon_ib_pool_start(rdev); - if (r) + r = radeon_ib_pool_init(rdev); + if (r) { + dev_err(rdev->dev, "IB initialization failed (%d).\n", r); return r; + } r = radeon_ib_ring_tests(rdev); if (r) @@ -1313,7 +1315,6 @@ int cayman_resume(struct radeon_device *rdev) int cayman_suspend(struct radeon_device *rdev) { r600_audio_fini(rdev); - radeon_ib_pool_suspend(rdev); radeon_vm_manager_suspend(rdev); r600_blit_suspend(rdev); cayman_cp_enable(rdev, false); @@ -1391,12 +1392,7 @@ int cayman_init(struct radeon_device *rdev) if (r) return r; - r = radeon_ib_pool_init(rdev); rdev->accel_working = true; - if (r) { - dev_err(rdev->dev, "IB initialization failed (%d).\n", r); - rdev->accel_working = false; - } r = radeon_vm_manager_init(rdev); if (r) { dev_err(rdev->dev, "vm manager initialization failed (%d).\n", r); @@ -1410,7 +1406,7 @@ int cayman_init(struct radeon_device *rdev) if (rdev->flags & RADEON_IS_IGP) si_rlc_fini(rdev); radeon_wb_fini(rdev); - r100_ib_fini(rdev); + radeon_ib_pool_fini(rdev); radeon_vm_manager_fini(rdev); radeon_irq_kms_fini(rdev); cayman_pcie_gart_fini(rdev); @@ -1441,7 +1437,7 @@ void cayman_fini(struct
[PATCH 09/15] drm/radeon: make cp init on cayman more robust
It's not critical, but the current code isn't 100% correct. Signed-off-by: Christian König Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/ni.c | 133 ++- 1 file changed, 56 insertions(+), 77 deletions(-) diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 32a6082..8b1df33 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -987,10 +987,33 @@ static void cayman_cp_fini(struct radeon_device *rdev) int cayman_cp_resume(struct radeon_device *rdev) { + static const int ridx[] = { + RADEON_RING_TYPE_GFX_INDEX, + CAYMAN_RING_TYPE_CP1_INDEX, + CAYMAN_RING_TYPE_CP2_INDEX + }; + static const unsigned cp_rb_cntl[] = { + CP_RB0_CNTL, + CP_RB1_CNTL, + CP_RB2_CNTL, + }; + static const unsigned cp_rb_rptr_addr[] = { + CP_RB0_RPTR_ADDR, + CP_RB1_RPTR_ADDR, + CP_RB2_RPTR_ADDR + }; + static const unsigned cp_rb_rptr_addr_hi[] = { + CP_RB0_RPTR_ADDR_HI, + CP_RB1_RPTR_ADDR_HI, + CP_RB2_RPTR_ADDR_HI + }; + static const unsigned cp_rb_base[] = { + CP_RB0_BASE, + CP_RB1_BASE, + CP_RB2_BASE + }; struct radeon_ring *ring; - u32 tmp; - u32 rb_bufsz; - int r; + int i, r; /* Reset cp; if cp is reset, then PA, SH, VGT also need to be reset */ WREG32(GRBM_SOFT_RESET, (SOFT_RESET_CP | @@ -1012,91 +1035,47 @@ int cayman_cp_resume(struct radeon_device *rdev) WREG32(CP_DEBUG, (1 << 27)); - /* ring 0 - compute and gfx */ - /* Set ring buffer size */ - ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; - rb_bufsz = drm_order(ring->ring_size / 8); - tmp = (drm_order(RADEON_GPU_PAGE_SIZE/8) << 8) | rb_bufsz; -#ifdef __BIG_ENDIAN - tmp |= BUF_SWAP_32BIT; -#endif - WREG32(CP_RB0_CNTL, tmp); - - /* Initialize the ring buffer's read and write pointers */ - WREG32(CP_RB0_CNTL, tmp | RB_RPTR_WR_ENA); - ring->wptr = 0; - WREG32(CP_RB0_WPTR, ring->wptr); - /* set the wb address wether it's enabled or not */ - WREG32(CP_RB0_RPTR_ADDR, (rdev->wb.gpu_addr + RADEON_WB_CP_RPTR_OFFSET) & 0xFFFC); - WREG32(CP_RB0_RPTR_ADDR_HI, upper_32_bits(rdev->wb.gpu_addr + 
RADEON_WB_CP_RPTR_OFFSET) & 0xFF); WREG32(SCRATCH_ADDR, ((rdev->wb.gpu_addr + RADEON_WB_SCRATCH_OFFSET) >> 8) & 0x); + WREG32(SCRATCH_UMSK, 0xff); - if (rdev->wb.enabled) - WREG32(SCRATCH_UMSK, 0xff); - else { - tmp |= RB_NO_UPDATE; - WREG32(SCRATCH_UMSK, 0); - } - - mdelay(1); - WREG32(CP_RB0_CNTL, tmp); - - WREG32(CP_RB0_BASE, ring->gpu_addr >> 8); - - ring->rptr = RREG32(CP_RB0_RPTR); + for (i = 0; i < 3; ++i) { + uint32_t rb_cntl; + uint64_t addr; - /* ring1 - compute only */ - /* Set ring buffer size */ - ring = &rdev->ring[CAYMAN_RING_TYPE_CP1_INDEX]; - rb_bufsz = drm_order(ring->ring_size / 8); - tmp = (drm_order(RADEON_GPU_PAGE_SIZE/8) << 8) | rb_bufsz; + /* Set ring buffer size */ + ring = &rdev->ring[ridx[i]]; + rb_cntl = drm_order(ring->ring_size / 8); + rb_cntl |= drm_order(RADEON_GPU_PAGE_SIZE/8) << 8; #ifdef __BIG_ENDIAN - tmp |= BUF_SWAP_32BIT; + rb_cntl |= BUF_SWAP_32BIT; #endif - WREG32(CP_RB1_CNTL, tmp); + WREG32(cp_rb_cntl[i], rb_cntl); - /* Initialize the ring buffer's read and write pointers */ - WREG32(CP_RB1_CNTL, tmp | RB_RPTR_WR_ENA); - ring->wptr = 0; - WREG32(CP_RB1_WPTR, ring->wptr); - - /* set the wb address wether it's enabled or not */ - WREG32(CP_RB1_RPTR_ADDR, (rdev->wb.gpu_addr + RADEON_WB_CP1_RPTR_OFFSET) & 0xFFFC); - WREG32(CP_RB1_RPTR_ADDR_HI, upper_32_bits(rdev->wb.gpu_addr + RADEON_WB_CP1_RPTR_OFFSET) & 0xFF); - - mdelay(1); - WREG32(CP_RB1_CNTL, tmp); - - WREG32(CP_RB1_BASE, ring->gpu_addr >> 8); - - ring->rptr = RREG32(CP_RB1_RPTR); - - /* ring2 - compute only */ - /* Set ring buffer size */ - ring = &rdev->ring[CAYMAN_RING_TYPE_CP2_INDEX]; - rb_bufsz = drm_order(ring->ring_size / 8); - tmp = (drm_order(RADEON_GPU_PAGE_SIZE/8) << 8) | rb_bufsz; -#ifdef __BIG_ENDIAN - tmp |= BUF_SWAP_32BIT; -#endif - WREG32(CP_RB2_CNTL, tmp); - - /* Initialize the ring buffer's read and write pointers */ - WREG32(CP_RB2_CNTL, tmp | RB_RPTR_WR_ENA); - ring->wptr = 0; - WREG32(CP_RB2_WPTR, ring->wptr); + /* set the wb address wether it's 
enabled or not */ + addr = rdev->wb.gpu_addr + RADEON_WB_CP_RPTR_OFFSET; + WREG
[PATCH 08/15] drm/radeon: remove FIXME comment from chipset suspend
For a normal suspend/resume we already wait for the rings to be empty, and for a suspend/resume in case of a lockup we REALLY don't want to wait for anything. Signed-off-by: Christian König Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/evergreen.c |1 - drivers/gpu/drm/radeon/ni.c|1 - drivers/gpu/drm/radeon/r600.c |1 - drivers/gpu/drm/radeon/rv770.c |1 - drivers/gpu/drm/radeon/si.c|1 - 5 files changed, 5 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index f716e08..eb9a71a 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3137,7 +3137,6 @@ int evergreen_suspend(struct radeon_device *rdev) struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]; r600_audio_fini(rdev); - /* FIXME: we should wait for ring to be empty */ radeon_ib_pool_suspend(rdev); r600_blit_suspend(rdev); r700_cp_stop(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 2366be3..32a6082 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1334,7 +1334,6 @@ int cayman_resume(struct radeon_device *rdev) int cayman_suspend(struct radeon_device *rdev) { r600_audio_fini(rdev); - /* FIXME: we should wait for ring to be empty */ radeon_ib_pool_suspend(rdev); radeon_vm_manager_suspend(rdev); r600_blit_suspend(rdev); diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 43d0c41..de4de2d 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2461,7 +2461,6 @@ int r600_suspend(struct radeon_device *rdev) r600_audio_fini(rdev); radeon_ib_pool_suspend(rdev); r600_blit_suspend(rdev); - /* FIXME: we should wait for ring to be empty */ r600_cp_stop(rdev); rdev->ring[RADEON_RING_TYPE_GFX_INDEX].ready = false; r600_irq_suspend(rdev); diff --git a/drivers/gpu/drm/radeon/rv770.c b/drivers/gpu/drm/radeon/rv770.c index b4f51c5..7e230f6 100644 --- a/drivers/gpu/drm/radeon/rv770.c +++
b/drivers/gpu/drm/radeon/rv770.c @@ -996,7 +996,6 @@ int rv770_suspend(struct radeon_device *rdev) r600_audio_fini(rdev); radeon_ib_pool_suspend(rdev); r600_blit_suspend(rdev); - /* FIXME: we should wait for ring to be empty */ r700_cp_stop(rdev); rdev->ring[RADEON_RING_TYPE_GFX_INDEX].ready = false; r600_irq_suspend(rdev); diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c index 34603b3c8..78c790f 100644 --- a/drivers/gpu/drm/radeon/si.c +++ b/drivers/gpu/drm/radeon/si.c @@ -3807,7 +3807,6 @@ int si_resume(struct radeon_device *rdev) int si_suspend(struct radeon_device *rdev) { - /* FIXME: we should wait for ring to be empty */ radeon_ib_pool_suspend(rdev); radeon_vm_manager_suspend(rdev); #if 0 -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 07/15] drm/radeon: fix fence init after resume
Start with last signaled fence number instead of last emitted one. Signed-off-by: Christian König Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon_fence.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index a194a14..76c5b22 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -578,7 +578,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring) } rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4]; rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index; - radeon_fence_write(rdev, rdev->fence_drv[ring].sync_seq[ring], ring); + radeon_fence_write(rdev, atomic64_read(&rdev->fence_drv[ring].last_seq), ring); rdev->fence_drv[ring].initialized = true; dev_info(rdev->dev, "fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n", ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr); -- 1.7.9.5
[PATCH 06/15] drm/radeon: fix fence value access
It is possible that radeon_fence_process is called after writeback is disabled for suspend, leading to an invalid read of register 0x0. This fixes a problem for me where the fence value is temporarily incremented by 0x1 on suspend/resume. Signed-off-by: Christian König Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon_fence.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index be4e4f3..a194a14 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -42,21 +42,23 @@ static void radeon_fence_write(struct radeon_device *rdev, u32 seq, int ring) { - if (rdev->wb.enabled) { - *rdev->fence_drv[ring].cpu_addr = cpu_to_le32(seq); + struct radeon_fence_driver *drv = &rdev->fence_drv[ring]; + if (likely(rdev->wb.enabled || !drv->scratch_reg)) { + *drv->cpu_addr = cpu_to_le32(seq); } else { - WREG32(rdev->fence_drv[ring].scratch_reg, seq); + WREG32(drv->scratch_reg, seq); } } static u32 radeon_fence_read(struct radeon_device *rdev, int ring) { + struct radeon_fence_driver *drv = &rdev->fence_drv[ring]; u32 seq = 0; - if (rdev->wb.enabled) { - seq = le32_to_cpu(*rdev->fence_drv[ring].cpu_addr); + if (likely(rdev->wb.enabled || !drv->scratch_reg)) { + seq = le32_to_cpu(*drv->cpu_addr); } else { - seq = RREG32(rdev->fence_drv[ring].scratch_reg); + seq = RREG32(drv->scratch_reg); } return seq; } -- 1.7.9.5
[PATCH 05/15] drm/radeon: fix ring commit padding
We don't need to pad anything if the number of dwords written to the ring already matches the requirements. Fixes some "writting more dword to ring than expected" warnings. Signed-off-by: Christian König Reviewed-by: Jerome Glisse Reviewed-by: Michel Dänzer --- drivers/gpu/drm/radeon/radeon_ring.c |7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 0826e77..674aaba 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -272,13 +272,8 @@ int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *ring, unsig void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *ring) { - unsigned count_dw_pad; - unsigned i; - /* We pad to match fetch size */ - count_dw_pad = (ring->align_mask + 1) - - (ring->wptr & ring->align_mask); - for (i = 0; i < count_dw_pad; i++) { + while (ring->wptr & ring->align_mask) { radeon_ring_write(ring, ring->nop); } DRM_MEMORYBARRIER(); -- 1.7.9.5
[PATCH 04/15] drm/radeon: add an exclusive lock for GPU reset v2
From: Jerome Glisse GPU reset needs to be exclusive, one happening at a time. For this add a rw semaphore so that any path that triggers GPU activities has to take the semaphore as a reader, thus allowing concurrency. The GPU reset path takes the semaphore as a writer, ensuring that no concurrent reset takes place. v2: init rw semaphore Signed-off-by: Jerome Glisse Reviewed-by: Christian König --- drivers/gpu/drm/radeon/radeon.h|1 + drivers/gpu/drm/radeon/radeon_cs.c |5 + drivers/gpu/drm/radeon/radeon_device.c |3 +++ drivers/gpu/drm/radeon/radeon_gem.c|8 4 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 5861ec8..4487873 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -1446,6 +1446,7 @@ struct radeon_device { struct device *dev; struct drm_device *ddev; struct pci_dev *pdev; + struct rw_semaphore exclusive_lock; /* ASIC */ union radeon_asic_configconfig; enum radeon_family family; diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index d5aec09..553da67 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -499,7 +499,9 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) struct radeon_cs_parser parser; int r; + down_read(&rdev->exclusive_lock); if (!rdev->accel_working) { + up_read(&rdev->exclusive_lock); return -EBUSY; } /* initialize parser */ @@ -512,6 +514,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) if (r) { DRM_ERROR("Failed to initialize parser !\n"); radeon_cs_parser_fini(&parser, r); + up_read(&rdev->exclusive_lock); r = radeon_cs_handle_lockup(rdev, r); return r; } @@ -520,6 +523,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) if (r != -ERESTARTSYS) DRM_ERROR("Failed to parse relocation %d!\n", r); radeon_cs_parser_fini(&parser, r); + up_read(&rdev->exclusive_lock); r =
radeon_cs_handle_lockup(rdev, r); return r; } @@ -533,6 +537,7 @@ int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) } out: radeon_cs_parser_fini(&parser, r); + up_read(&rdev->exclusive_lock); r = radeon_cs_handle_lockup(rdev, r); return r; } diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index f654ba8..254fdb4 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -734,6 +734,7 @@ int radeon_device_init(struct radeon_device *rdev, mutex_init(&rdev->gem.mutex); mutex_init(&rdev->pm.mutex); init_rwsem(&rdev->pm.mclk_lock); + init_rwsem(&rdev->exclusive_lock); init_waitqueue_head(&rdev->irq.vblank_queue); init_waitqueue_head(&rdev->irq.idle_queue); r = radeon_gem_init(rdev); @@ -988,6 +989,7 @@ int radeon_gpu_reset(struct radeon_device *rdev) int r; int resched; + down_write(&rdev->exclusive_lock); radeon_save_bios_scratch_regs(rdev); /* block TTM */ resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev); @@ -1007,6 +1009,7 @@ int radeon_gpu_reset(struct radeon_device *rdev) dev_info(rdev->dev, "GPU reset failed\n"); } + up_write(&rdev->exclusive_lock); return r; } diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index d9b0809..b0be9c4 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -215,12 +215,14 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data, uint32_t handle; int r; + down_read(&rdev->exclusive_lock); /* create a gem object to contain this object in */ args->size = roundup(args->size, PAGE_SIZE); r = radeon_gem_object_create(rdev, args->size, args->alignment, args->initial_domain, false, false, &gobj); if (r) { + up_read(&rdev->exclusive_lock); r = radeon_gem_handle_lockup(rdev, r); return r; } @@ -228,10 +230,12 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data, /* drop reference from allocate - handle holds it now */ 
drm_gem_object_unreference_unlocked(gobj); if (r) { + up_read(&rdev->exclusive_lock); r = radeon_gem_handle_
[PATCH 01/15] drm/radeon: add error handling to fence_wait_empty_locked
Instead of returning the error handle it directly and while at it fix the comments about the ring lock. Signed-off-by: Christian König Reviewed-by: Michel Dänzer Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon.h |2 +- drivers/gpu/drm/radeon/radeon_fence.c | 33 + 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 77b4519b..5861ec8 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -239,7 +239,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring); bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); -int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); +void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); int radeon_fence_wait_any(struct radeon_device *rdev, struct radeon_fence **fences, bool intr); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 7b55625..be4e4f3 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -440,14 +440,11 @@ int radeon_fence_wait_any(struct radeon_device *rdev, return 0; } +/* caller must hold ring lock */ int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring) { uint64_t seq; - /* We are not protected by ring lock when reading current seq but -* it's ok as worst case is we return to early while we could have -* wait. 
-*/ seq = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL; if (seq >= rdev->fence_drv[ring].sync_seq[ring]) { /* nothing to wait for, last_seq is @@ -457,15 +454,27 @@ int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring) return radeon_fence_wait_seq(rdev, seq, ring, false, false); } -int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring) +/* caller must hold ring lock */ +void radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring) { - /* We are not protected by ring lock when reading current seq -* but it's ok as wait empty is call from place where no more -* activity can be scheduled so there won't be concurrent access -* to seq value. -*/ - return radeon_fence_wait_seq(rdev, rdev->fence_drv[ring].sync_seq[ring], -ring, false, false); + uint64_t seq = rdev->fence_drv[ring].sync_seq[ring]; + + while(1) { + int r; + r = radeon_fence_wait_seq(rdev, seq, ring, false, false); + if (r == -EDEADLK) { + mutex_unlock(&rdev->ring_lock); + r = radeon_gpu_reset(rdev); + mutex_lock(&rdev->ring_lock); + if (!r) + continue; + } + if (r) { + dev_err(rdev->dev, "error waiting for ring to become" + " idle (%d)\n", r); + } + return; + } } struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence) -- 1.7.9.5
[PATCH 03/15] drm/radeon: fix fence related segfault in CS
Don't return success if scheduling the IB fails, otherwise we end up with an oops in ttm_eu_fence_buffer_objects. Signed-off-by: Christian König Reviewed-by: Jerome Glisse Reviewed-by: Michel Dänzer Cc: sta...@vger.kernel.org --- drivers/gpu/drm/radeon/radeon_cs.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index f1b7527..d5aec09 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -358,7 +358,7 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev, if (r) { DRM_ERROR("Failed to schedule IB !\n"); } - return 0; + return r; } static int radeon_bo_vm_update_pte(struct radeon_cs_parser *parser, -- 1.7.9.5
[PATCH 02/15] drm/radeon: add error handling to radeon_vm_unbind_locked
Waiting for a fence can fail for different reasons; the most common is a deadlock. Signed-off-by: Christian König Reviewed-by: Michel Dänzer Reviewed-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon_gart.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 2b34c1a..ee11c50 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -316,10 +316,21 @@ static void radeon_vm_unbind_locked(struct radeon_device *rdev, } /* wait for vm use to end */ - if (vm->fence) { - radeon_fence_wait(vm->fence, false); - radeon_fence_unref(&vm->fence); + while (vm->fence) { + int r; + r = radeon_fence_wait(vm->fence, false); + if (r) + DRM_ERROR("error while waiting for fence: %d\n", r); + if (r == -EDEADLK) { + mutex_unlock(&rdev->vm_manager.lock); + r = radeon_gpu_reset(rdev); + mutex_lock(&rdev->vm_manager.lock); + if (!r) + continue; + } + break; } + radeon_fence_unref(&vm->fence); /* hw unbind */ rdev->vm_manager.funcs->unbind(rdev, vm); -- 1.7.9.5
[RFC PATCH 8/8] nouveau: Prime execbuffer submission synchronization
From: Maarten Lankhorst Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nouveau_gem.c | 121 +++-- 1 file changed, 116 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index 11c9c2a..e5d36bb 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -31,6 +31,7 @@ #include "nouveau_drm.h" #include "nouveau_dma.h" #include "nouveau_fence.h" +#include #define nouveau_gem_pushbuf_sync(chan) 0 @@ -277,6 +278,7 @@ struct validate_op { struct list_head vram_list; struct list_head gart_list; struct list_head both_list; + struct list_head prime_list; }; static void @@ -305,9 +307,36 @@ validate_fini_list(struct list_head *list, struct nouveau_fence *fence) static void validate_fini(struct validate_op *op, struct nouveau_fence* fence) { + struct list_head *entry, *tmp; + struct nouveau_bo *nvbo; + struct dma_buf *sync_buf; + u32 ofs, val; + validate_fini_list(&op->vram_list, fence); validate_fini_list(&op->gart_list, fence); validate_fini_list(&op->both_list, fence); + + if (list_empty(&op->prime_list)) + return; + + if (fence && + !nouveau_fence_prime_get(fence, &sync_buf, &ofs, &val)) { + dmabufmgr_eu_fence_buffer_objects(sync_buf, ofs, val, + &op->prime_list); + dma_buf_put(sync_buf); + } else + dmabufmgr_eu_backoff_reservation(&op->prime_list); + + list_for_each_safe(entry, tmp, &op->prime_list) { + struct dmabufmgr_validate *val; + val = list_entry(entry, struct dmabufmgr_validate, head); + nvbo = val->priv; + + list_del(&val->head); + nvbo->reserved_by = NULL; + drm_gem_object_unreference_unlocked(nvbo->gem); + kfree(val); + } } static int @@ -319,9 +348,9 @@ validate_init(struct nouveau_channel *chan, struct drm_file *file_priv, struct drm_nouveau_private *dev_priv = dev->dev_private; uint32_t sequence; int trycnt = 0; - int ret, i; + int i; - sequence = atomic_add_return(1, &dev_priv->ttm.validate_sequence); + sequence = 
atomic_inc_return(&dev_priv->ttm.validate_sequence); retry: if (++trycnt > 10) { NV_ERROR(dev, "%s failed and gave up.\n", __func__); @@ -332,6 +361,8 @@ retry: struct drm_nouveau_gem_pushbuf_bo *b = &pbbo[i]; struct drm_gem_object *gem; struct nouveau_bo *nvbo; + int ret = 0, is_prime; + struct dmabufmgr_validate *validate = NULL; gem = drm_gem_object_lookup(dev, file_priv, b->handle); if (!gem) { @@ -340,6 +371,7 @@ retry: return -ENOENT; } nvbo = gem->driver_private; + is_prime = gem->export_dma_buf || gem->import_attach; if (nvbo->reserved_by && nvbo->reserved_by == file_priv) { NV_ERROR(dev, "multiple instances of buffer %d on " @@ -349,7 +381,21 @@ retry: return -EINVAL; } - ret = ttm_bo_reserve(&nvbo->bo, true, false, true, sequence); + if (likely(!is_prime)) + ret = ttm_bo_reserve(&nvbo->bo, true, false, +true, sequence); + else { + validate = kzalloc(sizeof(*validate), GFP_KERNEL); + if (validate) { + if (gem->import_attach) + validate->bo = + gem->import_attach->dmabuf; + else + validate->bo = gem->export_dma_buf; + validate->priv = nvbo; + } else + ret = -ENOMEM; + } if (ret) { validate_fini(op, NULL); if (unlikely(ret == -EAGAIN)) @@ -366,6 +412,9 @@ retry: b->user_priv = (uint64_t)(unsigned long)nvbo; nvbo->reserved_by = file_priv; nvbo->pbbo_index = i; + if (is_prime) { + list_add_tail(&validate->head, &op->prime_list); + } else if ((b->valid_domains & NOUVEAU_GEM_DOMAIN_VRAM) && (b->valid_domains & NOUVEAU_GEM_DOMAIN_GART)) list_add_tail(&nvbo->entry, &op->both_list); @@ -473,6 +522,60 @@ validate_list(struct nouveau_channel *chan, struct list_head *list, } static int +validate_prime(struct nouveau_chann
[RFC PATCH 7/8] nouveau: nvc0 fence prime implementation
From: Maarten Lankhorst Create a read-only mapping for every imported bo, and create a prime bo in in system memory. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nvc0_fence.c | 104 +- 1 file changed, 89 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nvc0_fence.c b/drivers/gpu/drm/nouveau/nvc0_fence.c index 198e31f..dc6ccab 100644 --- a/drivers/gpu/drm/nouveau/nvc0_fence.c +++ b/drivers/gpu/drm/nouveau/nvc0_fence.c @@ -37,6 +37,7 @@ struct nvc0_fence_priv { struct nvc0_fence_chan { struct nouveau_fence_chan base; struct nouveau_vma vma; + struct nouveau_vma prime_vma; }; static int @@ -45,19 +46,23 @@ nvc0_fence_emit(struct nouveau_fence *fence, bool prime) struct nouveau_channel *chan = fence->channel; struct nvc0_fence_chan *fctx = chan->engctx[NVOBJ_ENGINE_FENCE]; u64 addr = fctx->vma.offset + chan->id * 16; - int ret; + int ret, i; - ret = RING_SPACE(chan, 5); - if (ret == 0) { + ret = RING_SPACE(chan, prime ? 10 : 5); + if (ret) + return ret; + + for (i = 0; i < (prime ? 
2 : 1); ++i) { + if (i) + addr = fctx->prime_vma.offset + chan->id * 16; BEGIN_NVC0(chan, 0, NV84_SUBCHAN_SEMAPHORE_ADDRESS_HIGH, 4); OUT_RING (chan, upper_32_bits(addr)); OUT_RING (chan, lower_32_bits(addr)); OUT_RING (chan, fence->sequence); OUT_RING (chan, NV84_SUBCHAN_SEMAPHORE_TRIGGER_WRITE_LONG); - FIRE_RING (chan); } - - return ret; + FIRE_RING(chan); + return 0; } static int @@ -95,6 +100,8 @@ nvc0_fence_context_del(struct nouveau_channel *chan, int engine) struct nvc0_fence_priv *priv = nv_engine(chan->dev, engine); struct nvc0_fence_chan *fctx = chan->engctx[engine]; + if (priv->base.prime_bo) + nouveau_bo_vma_del(priv->base.prime_bo, &fctx->prime_vma); nouveau_bo_vma_del(priv->bo, &fctx->vma); nouveau_fence_context_del(chan->dev, &fctx->base); chan->engctx[engine] = NULL; @@ -115,10 +122,16 @@ nvc0_fence_context_new(struct nouveau_channel *chan, int engine) nouveau_fence_context_new(&fctx->base); ret = nouveau_bo_vma_add(priv->bo, chan->vm, &fctx->vma); + if (!ret && priv->base.prime_bo) + ret = nouveau_bo_vma_add(priv->base.prime_bo, chan->vm, +&fctx->prime_vma); if (ret) nvc0_fence_context_del(chan, engine); - nouveau_bo_wr32(priv->bo, chan->id * 16/4, 0x); + fctx->base.sequence = nouveau_bo_rd32(priv->bo, chan->id * 16/4); + if (priv->base.prime_bo) + nouveau_bo_wr32(priv->base.prime_bo, chan->id * 16/4, + fctx->base.sequence); return ret; } @@ -140,12 +153,55 @@ nvc0_fence_destroy(struct drm_device *dev, int engine) struct drm_nouveau_private *dev_priv = dev->dev_private; struct nvc0_fence_priv *priv = nv_engine(dev, engine); + nouveau_fence_prime_del(&priv->base); nouveau_bo_unmap(priv->bo); + nouveau_bo_unpin(priv->bo); nouveau_bo_ref(NULL, &priv->bo); dev_priv->eng[engine] = NULL; kfree(priv); } +static int +nvc0_fence_prime_sync(struct nouveau_channel *chan, + struct nouveau_bo *bo, + u32 ofs, u32 val, u64 sema_start) +{ + struct nvc0_fence_chan *fctx = chan->engctx[NVOBJ_ENGINE_FENCE]; + struct nvc0_fence_priv *priv = nv_engine(chan->dev, 
NVOBJ_ENGINE_FENCE); + int ret = RING_SPACE(chan, 5); + if (ret) + return ret; + + if (bo == priv->base.prime_bo) + sema_start = fctx->prime_vma.offset; + else + NV_ERROR(chan->dev, "syncing with %08Lx + %08x >= %08x\n", + sema_start, ofs, val); + sema_start += ofs; + + BEGIN_NVC0(chan, 0, NV84_SUBCHAN_SEMAPHORE_ADDRESS_HIGH, 4); + OUT_RING (chan, upper_32_bits(sema_start)); + OUT_RING (chan, lower_32_bits(sema_start)); + OUT_RING (chan, val); + OUT_RING (chan, NV84_SUBCHAN_SEMAPHORE_TRIGGER_ACQUIRE_GEQUAL | +NVC0_SUBCHAN_SEMAPHORE_TRIGGER_YIELD); + FIRE_RING (chan); + return ret; +} + +static void +nvc0_fence_prime_del_import(struct nouveau_fence_prime_bo_entry *entry) { + nouveau_bo_vma_del(entry->bo, &entry->vma); +} + +static int +nvc0_fence_prime_add_import(struct nouveau_fence_prime_bo_entry *entry) { + int ret = nouveau_bo_vma_add_access(entry->bo, entry->chan->vm, + &entry->vma, NV_MEM_ACCESS_RO); + entry->sema_start = entry->vma.offset; + return ret; +} + int nvc0_fence_create(struct drm_device *dev) { @@ -168,17 +224,35 @@ nvc0_fence_create
[RFC PATCH 6/8] nouveau: nv84 fence prime implementation
From: Maarten Lankhorst Create a dma object for the prime semaphore and every imported sync bo. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nv84_fence.c | 121 -- 1 file changed, 115 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nv84_fence.c b/drivers/gpu/drm/nouveau/nv84_fence.c index b5cfbcb..f739dfc 100644 --- a/drivers/gpu/drm/nouveau/nv84_fence.c +++ b/drivers/gpu/drm/nouveau/nv84_fence.c @@ -31,6 +31,7 @@ struct nv84_fence_chan { struct nouveau_fence_chan base; + u32 sema_start; }; struct nv84_fence_priv { @@ -42,21 +43,25 @@ static int nv84_fence_emit(struct nouveau_fence *fence, bool prime) { struct nouveau_channel *chan = fence->channel; - int ret = RING_SPACE(chan, 7); - if (ret == 0) { + int i, ret; + + ret = RING_SPACE(chan, prime ? 14 : 7); + if (ret) + return ret; + + for (i = 0; i < (prime ? 2 : 1); ++i) { BEGIN_NV04(chan, 0, NV11_SUBCHAN_DMA_SEMAPHORE, 1); - OUT_RING (chan, NvSema); + OUT_RING (chan, i ? NvSemaPrime : NvSema); BEGIN_NV04(chan, 0, NV84_SUBCHAN_SEMAPHORE_ADDRESS_HIGH, 4); OUT_RING (chan, upper_32_bits(chan->id * 16)); OUT_RING (chan, lower_32_bits(chan->id * 16)); OUT_RING (chan, fence->sequence); OUT_RING (chan, NV84_SUBCHAN_SEMAPHORE_TRIGGER_WRITE_LONG); - FIRE_RING (chan); } + FIRE_RING (chan); return ret; } - static int nv84_fence_sync(struct nouveau_fence *fence, struct nouveau_channel *prev, struct nouveau_channel *chan) @@ -82,12 +87,94 @@ nv84_fence_read(struct nouveau_channel *chan) return nv_ro32(priv->mem, chan->id * 16); } +static int +nv84_fence_prime_sync(struct nouveau_channel *chan, + struct nouveau_bo *bo, + u32 ofs, u32 val, u64 sema_start) +{ + struct nv84_fence_priv *priv = nv_engine(chan->dev, NVOBJ_ENGINE_FENCE); + int ret = RING_SPACE(chan, 7); + u32 sema = 0; + if (ret < 0) + return ret; + + if (bo == priv->base.prime_bo) { + sema = NvSema; + } else { + struct sg_table *sgt = bo->bo.sg; + struct scatterlist *sg; + u32 i; + sema = sema_start; + for_each_sg(sgt->sgl, sg, 
sgt->nents, i) { + if (ofs < sg->offset + sg->length) { + ofs -= sg->offset; + break; + } + sema++; + } + } + + BEGIN_NV04(chan, 0, NV11_SUBCHAN_DMA_SEMAPHORE, 1); + OUT_RING (chan, sema); + BEGIN_NV04(chan, 0, NV84_SUBCHAN_SEMAPHORE_ADDRESS_HIGH, 4); + OUT_RING (chan, 0); + OUT_RING (chan, ofs); + OUT_RING (chan, val); + OUT_RING (chan, NV84_SUBCHAN_SEMAPHORE_TRIGGER_ACQUIRE_GEQUAL); + FIRE_RING (chan); + return ret; +} + +static void +nv84_fence_prime_del_import(struct nouveau_fence_prime_bo_entry *entry) { + u32 i; + for (i = entry->sema_start; i < entry->sema_start + entry->sema_len; ++i) + nouveau_ramht_remove(entry->chan, i); +} + +static int +nv84_fence_prime_add_import(struct nouveau_fence_prime_bo_entry *entry) { + struct sg_table *sgt = entry->bo->bo.sg; + struct nouveau_channel *chan = entry->chan; + struct nv84_fence_chan *fctx = chan->engctx[NVOBJ_ENGINE_FENCE]; + struct scatterlist *sg; + u32 i, sema; + int ret; + + sema = entry->sema_start = fctx->sema_start; + entry->sema_len = 0; + + for_each_sg(sgt->sgl, sg, sgt->nents, i) { + struct nouveau_gpuobj *obj; + ret = nouveau_gpuobj_dma_new(chan, NV_CLASS_DMA_FROM_MEMORY, +sg_dma_address(sg), PAGE_SIZE, +NV_MEM_ACCESS_RO, +NV_MEM_TARGET_PCI, &obj); + if (ret) + goto err; + + ret = nouveau_ramht_insert(chan, sema, obj); + nouveau_gpuobj_ref(NULL, &obj); + if (ret) + goto err; + entry->sema_len++; + sema++; + } + fctx->sema_start += (entry->sema_len + 0xff) & ~0xff; + return 0; + +err: + nv84_fence_prime_del_import(entry); + return ret; +} + static void nv84_fence_context_del(struct nouveau_channel *chan, int engine) { struct nv84_fence_chan *fctx = chan->engctx[engine]; nouveau_fence_context_del(chan->dev, &fctx->base); chan->engctx[engine] = NULL; + kfree(fctx); } @@ -104,6 +191,7 @@ nv84_fence_context_new(struct nouveau_channel *chan, int engine) return -EN
[RFC PATCH 5/8] nouveau: Add methods preparing for prime fencing
From: Maarten Lankhorst This can be used by nv84 and nvc0 to implement hardware fencing, earlier systems will require more thought but can fall back to software for now. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nouveau_bo.c |6 +- drivers/gpu/drm/nouveau/nouveau_channel.c |2 +- drivers/gpu/drm/nouveau/nouveau_display.c |2 +- drivers/gpu/drm/nouveau/nouveau_dma.h |1 + drivers/gpu/drm/nouveau/nouveau_drv.h |5 + drivers/gpu/drm/nouveau/nouveau_fence.c | 242 - drivers/gpu/drm/nouveau/nouveau_fence.h | 44 +- drivers/gpu/drm/nouveau/nouveau_gem.c |6 +- drivers/gpu/drm/nouveau/nouveau_prime.c |2 + drivers/gpu/drm/nouveau/nv04_fence.c |4 +- drivers/gpu/drm/nouveau/nv10_fence.c |4 +- drivers/gpu/drm/nouveau/nv84_fence.c |4 +- drivers/gpu/drm/nouveau/nvc0_fence.c |4 +- 13 files changed, 304 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 4318320..a97025a 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -52,6 +52,9 @@ nouveau_bo_del_ttm(struct ttm_buffer_object *bo) DRM_ERROR("bo %p still attached to GEM object\n", bo); nv10_mem_put_tile_region(dev, nvbo->tile, NULL); + + if (nvbo->fence_import_attach) + nouveau_fence_prime_del_bo(nvbo); kfree(nvbo); } @@ -109,6 +112,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align, INIT_LIST_HEAD(&nvbo->head); INIT_LIST_HEAD(&nvbo->entry); INIT_LIST_HEAD(&nvbo->vma_list); + INIT_LIST_HEAD(&nvbo->prime_chan_entries); nvbo->tile_mode = tile_mode; nvbo->tile_flags = tile_flags; nvbo->bo.bdev = &dev_priv->ttm.bdev; @@ -480,7 +484,7 @@ nouveau_bo_move_accel_cleanup(struct nouveau_channel *chan, struct nouveau_fence *fence = NULL; int ret; - ret = nouveau_fence_new(chan, &fence); + ret = nouveau_fence_new(chan, &fence, false); if (ret) return ret; diff --git a/drivers/gpu/drm/nouveau/nouveau_channel.c b/drivers/gpu/drm/nouveau/nouveau_channel.c index 629d8a2..85a8556 100644 --- 
a/drivers/gpu/drm/nouveau/nouveau_channel.c +++ b/drivers/gpu/drm/nouveau/nouveau_channel.c @@ -362,7 +362,7 @@ nouveau_channel_idle(struct nouveau_channel *chan) struct nouveau_fence *fence = NULL; int ret; - ret = nouveau_fence_new(chan, &fence); + ret = nouveau_fence_new(chan, &fence, false); if (!ret) { ret = nouveau_fence_wait(fence, false, false); nouveau_fence_unref(&fence); diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 69688ef..7c76776 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -466,7 +466,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan, } FIRE_RING (chan); - ret = nouveau_fence_new(chan, pfence); + ret = nouveau_fence_new(chan, pfence, false); if (ret) goto fail; diff --git a/drivers/gpu/drm/nouveau/nouveau_dma.h b/drivers/gpu/drm/nouveau/nouveau_dma.h index 8db68be..d02ffd3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dma.h +++ b/drivers/gpu/drm/nouveau/nouveau_dma.h @@ -74,6 +74,7 @@ enum { NvEvoSema0 = 0x8010, NvEvoSema1 = 0x8011, NvNotify1 = 0x8012, + NvSemaPrime = 0x801f, /* G80+ display objects */ NvEvoVRAM = 0x0100, diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 2c17989..ad49594 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -126,6 +126,11 @@ struct nouveau_bo { struct ttm_bo_kmap_obj dma_buf_vmap; int vmapping_count; + + /* fence related stuff */ + struct nouveau_bo *sync_bo; + struct list_head prime_chan_entries; + struct dma_buf_attachment *fence_import_attach; }; #define nouveau_bo_tile_layout(nvbo) \ diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index 3c18049..d4c9c40 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -29,17 +29,64 @@ #include #include +#include #include "nouveau_drv.h" #include "nouveau_ramht.h" 
#include "nouveau_fence.h" #include "nouveau_software.h" #include "nouveau_dma.h" +#include "nouveau_fifo.h" + +int nouveau_fence_prime_init(struct drm_device *dev, +struct nouveau_fence_priv *priv, u32 align) +{ + int ret = 0; +#ifdef CONFIG_DMA_SHARED_BUFFER + struct nouveau_fifo_priv *pfifo = nv_engine(dev, NVOBJ_ENGINE_FIFO); + u32 size = PAGE_AL
[RFC PATCH 4/8] nouveau: add nouveau_bo_vma_add_access
From: Maarten Lankhorst This is needed to allow creation of read-only vm mappings in fence objects. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nouveau_bo.c |6 +++--- drivers/gpu/drm/nouveau/nouveau_drv.h |6 -- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 7f80ed5..4318320 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -1443,15 +1443,15 @@ nouveau_bo_vma_find(struct nouveau_bo *nvbo, struct nouveau_vm *vm) } int -nouveau_bo_vma_add(struct nouveau_bo *nvbo, struct nouveau_vm *vm, - struct nouveau_vma *vma) +nouveau_bo_vma_add_access(struct nouveau_bo *nvbo, struct nouveau_vm *vm, + struct nouveau_vma *vma, u32 access) { const u32 size = nvbo->bo.mem.num_pages << PAGE_SHIFT; struct nouveau_mem *node = nvbo->bo.mem.mm_node; int ret; ret = nouveau_vm_get(vm, size, nvbo->page_shift, -NV_MEM_ACCESS_RW, vma); +access, vma); if (ret) return ret; diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 7c52eba..2c17989 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -1350,8 +1350,10 @@ extern int nouveau_bo_validate(struct nouveau_bo *, bool interruptible, extern struct nouveau_vma * nouveau_bo_vma_find(struct nouveau_bo *, struct nouveau_vm *); -extern int nouveau_bo_vma_add(struct nouveau_bo *, struct nouveau_vm *, - struct nouveau_vma *); +#define nouveau_bo_vma_add(nvbo, vm, vma) \ + nouveau_bo_vma_add_access((nvbo), (vm), (vma), NV_MEM_ACCESS_RW) +extern int nouveau_bo_vma_add_access(struct nouveau_bo *, struct nouveau_vm *, +struct nouveau_vma *, u32 access); extern void nouveau_bo_vma_del(struct nouveau_bo *, struct nouveau_vma *); /* nouveau_gem.c */ -- 1.7.9.5 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[RFC PATCH 3/8] nouveau: Extend prime code
From: Maarten Lankhorst The prime code no longer requires the bo to be backed by a gem object, and cpu access calls have been implemented. This will be needed for exporting fence bo's. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nouveau_drv.h |6 +- drivers/gpu/drm/nouveau/nouveau_prime.c | 106 +-- 2 files changed, 79 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index 8613cb2..7c52eba 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -1374,11 +1374,15 @@ extern int nouveau_gem_ioctl_cpu_fini(struct drm_device *, void *, extern int nouveau_gem_ioctl_info(struct drm_device *, void *, struct drm_file *); +extern int nouveau_gem_prime_export_bo(struct nouveau_bo *nvbo, int flags, + u32 size, struct dma_buf **ret); extern struct dma_buf *nouveau_gem_prime_export(struct drm_device *dev, struct drm_gem_object *obj, int flags); extern struct drm_gem_object *nouveau_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf); - +extern int nouveau_prime_import_bo(struct drm_device *dev, + struct dma_buf *dma_buf, + struct nouveau_bo **pnvbo, bool gem); /* nouveau_display.c */ int nouveau_display_create(struct drm_device *dev); void nouveau_display_destroy(struct drm_device *dev); diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c index a25cf2c..537154d3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_prime.c +++ b/drivers/gpu/drm/nouveau/nouveau_prime.c @@ -35,7 +35,8 @@ static struct sg_table *nouveau_gem_map_dma_buf(struct dma_buf_attachment *attac enum dma_data_direction dir) { struct nouveau_bo *nvbo = attachment->dmabuf->priv; - struct drm_device *dev = nvbo->gem->dev; + struct drm_nouveau_private *dev_priv = nouveau_bdev(nvbo->bo.bdev); + struct drm_device *dev = dev_priv->dev; int npages = nvbo->bo.num_pages; struct sg_table *sg; int nents; @@ -59,29 +60,37 @@ static void 
nouveau_gem_dmabuf_release(struct dma_buf *dma_buf) { struct nouveau_bo *nvbo = dma_buf->priv; - if (nvbo->gem->export_dma_buf == dma_buf) { - nvbo->gem->export_dma_buf = NULL; + nouveau_bo_unpin(nvbo); + if (!nvbo->gem) + nouveau_bo_ref(NULL, &nvbo); + else { + if (nvbo->gem->export_dma_buf == dma_buf) + nvbo->gem->export_dma_buf = NULL; drm_gem_object_unreference_unlocked(nvbo->gem); } } static void *nouveau_gem_kmap_atomic(struct dma_buf *dma_buf, unsigned long page_num) { - return NULL; + struct nouveau_bo *nvbo = dma_buf->priv; + return kmap_atomic(nvbo->bo.ttm->pages[page_num]); } static void nouveau_gem_kunmap_atomic(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - + kunmap_atomic(addr); } + static void *nouveau_gem_kmap(struct dma_buf *dma_buf, unsigned long page_num) { - return NULL; + struct nouveau_bo *nvbo = dma_buf->priv; + return kmap(nvbo->bo.ttm->pages[page_num]); } static void nouveau_gem_kunmap(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - + struct nouveau_bo *nvbo = dma_buf->priv; + return kunmap(nvbo->bo.ttm->pages[page_num]); } static int nouveau_gem_prime_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) @@ -92,7 +101,8 @@ static int nouveau_gem_prime_mmap(struct dma_buf *dma_buf, struct vm_area_struct static void *nouveau_gem_prime_vmap(struct dma_buf *dma_buf) { struct nouveau_bo *nvbo = dma_buf->priv; - struct drm_device *dev = nvbo->gem->dev; + struct drm_nouveau_private *dev_priv = nouveau_bdev(nvbo->bo.bdev); + struct drm_device *dev = dev_priv->dev; int ret; mutex_lock(&dev->struct_mutex); @@ -116,7 +126,8 @@ out_unlock: static void nouveau_gem_prime_vunmap(struct dma_buf *dma_buf, void *vaddr) { struct nouveau_bo *nvbo = dma_buf->priv; - struct drm_device *dev = nvbo->gem->dev; + struct drm_nouveau_private *dev_priv = nouveau_bdev(nvbo->bo.bdev); + struct drm_device *dev = dev_priv->dev; mutex_lock(&dev->struct_mutex); nvbo->vmapping_count--; @@ -140,10 +151,9 @@ static const struct 
dma_buf_ops nouveau_dmabuf_ops = { }; static int -nouveau_prime_new(struct drm_device *dev, - size_t size, +nouveau_prime_new(struct drm_device *dev, size_t size, struct sg_table *sg, - struct nouveau_bo **pnvbo) + struct nouveau_bo **pnvbo, bool gem) { struct nouveau_bo *nvbo; u32 flag
[RFC PATCH 2/8] prime wip: i915
From: Maarten Lankhorst Export the hardware status page so others can read seqno. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 29 -- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 87 drivers/gpu/drm/i915/intel_ringbuffer.c| 42 ++ drivers/gpu/drm/i915/intel_ringbuffer.h|3 + 4 files changed, 145 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index aa308e1..d6bcfdc 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -66,12 +66,25 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment, static void i915_gem_dmabuf_release(struct dma_buf *dma_buf) { struct drm_i915_gem_object *obj = dma_buf->priv; + struct drm_device *dev = obj->base.dev; + + mutex_lock(&dev->struct_mutex); if (obj->base.export_dma_buf == dma_buf) { - /* drop the reference on the export fd holds */ obj->base.export_dma_buf = NULL; - drm_gem_object_unreference_unlocked(&obj->base); + } else { + drm_i915_private_t *dev_priv = dev->dev_private; + struct intel_ring_buffer *ring; + int i; + + for_each_ring(ring, dev_priv, i) + WARN_ON(ring->sync_buf == dma_buf); } + + /* drop the reference on the export fd holds */ + drm_gem_object_unreference(&obj->base); + + mutex_unlock(&dev->struct_mutex); } static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf) @@ -129,21 +142,25 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr) static void *i915_gem_dmabuf_kmap_atomic(struct dma_buf *dma_buf, unsigned long page_num) { - return NULL; + struct drm_i915_gem_object *obj = dma_buf->priv; + return kmap_atomic(obj->pages[page_num]); } static void i915_gem_dmabuf_kunmap_atomic(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - + kunmap_atomic(addr); } + static void *i915_gem_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num) { - return NULL; + struct drm_i915_gem_object *obj = dma_buf->priv; + return 
kmap(obj->pages[page_num]); } static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num, void *addr) { - + struct drm_i915_gem_object *obj = dma_buf->priv; + kunmap(obj->pages[page_num]); } static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 88e2e11..245340e 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -33,6 +33,7 @@ #include "i915_trace.h" #include "intel_drv.h" #include +#include struct change_domains { uint32_t invalidate_domains; @@ -556,7 +557,8 @@ err_unpin: static int i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring, struct drm_file *file, - struct list_head *objects) + struct list_head *objects, + struct list_head *prime_val) { drm_i915_private_t *dev_priv = ring->dev->dev_private; struct drm_i915_gem_object *obj; @@ -564,6 +566,31 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring, bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4; struct list_head ordered_objects; + list_for_each_entry(obj, objects, exec_list) { + struct dmabufmgr_validate *val; + + if (!(obj->base.import_attach || + obj->base.export_dma_buf)) + continue; + + val = kzalloc(sizeof(*val), GFP_KERNEL); + if (!val) + return -ENOMEM; + + if (obj->base.export_dma_buf) + val->bo = obj->base.export_dma_buf; + else + val->bo = obj->base.import_attach->dmabuf; + val->priv = obj; + list_add_tail(&val->head, prime_val); + } + + if (!list_empty(prime_val)) { + ret = dmabufmgr_eu_reserve_buffers(prime_val); + if (ret) + return ret; + } + INIT_LIST_HEAD(&ordered_objects); while (!list_empty(objects)) { struct drm_i915_gem_exec_object2 *entry; @@ -712,6 +739,7 @@ i915_gem_execbuffer_relocate_slow(struct drm_device *dev, struct drm_file *file, struct intel_ring_buffer *ring, struct list_head *objects, + struct list_head *prime_val, struct eb_o
[RFC PATCH 1/8] dma-buf-mgr: Try 2
From: Maarten Lankhorst Core code based on ttm_bo and ttm_execbuf_util Signed-off-by: Maarten Lankhorst --- drivers/base/Makefile |2 +- drivers/base/dma-buf-mgr-eu.c | 263 + drivers/base/dma-buf-mgr.c| 149 +++ drivers/base/dma-buf.c|4 + include/linux/dma-buf-mgr.h | 150 +++ include/linux/dma-buf.h | 24 6 files changed, 591 insertions(+), 1 deletion(-) create mode 100644 drivers/base/dma-buf-mgr-eu.c create mode 100644 drivers/base/dma-buf-mgr.c create mode 100644 include/linux/dma-buf-mgr.h diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 5aa2d70..86e7598 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o dma-buf-mgr.o dma-buf-mgr-eu.o obj-$(CONFIG_ISA) += isa.o obj-$(CONFIG_FW_LOADER)+= firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/dma-buf-mgr-eu.c b/drivers/base/dma-buf-mgr-eu.c new file mode 100644 index 000..ed5e01c --- /dev/null +++ b/drivers/base/dma-buf-mgr-eu.c @@ -0,0 +1,263 @@ +/* + * Copyright (C) 2012 Canonical Ltd + * + * Based on ttm_bo.c which bears the following copyright notice, + * but is dual licensed: + * + * Copyright (c) 2006-2009 VMware, Inc., Palo Alto, CA., USA + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE + * USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ * + **/ + +#include +#include +#include + +static void dmabufmgr_eu_backoff_reservation_locked(struct list_head *list) +{ + struct dmabufmgr_validate *entry; + + list_for_each_entry(entry, list, head) { + struct dma_buf *bo = entry->bo; + if (!entry->reserved) + continue; + entry->reserved = false; + + bo->sync_buf = entry->sync_buf; + entry->sync_buf = NULL; + + atomic_set(&bo->reserved, 0); + wake_up_all(&bo->event_queue); + } +} + +static int +dmabufmgr_eu_wait_unreserved_locked(struct list_head *list, + struct dma_buf *bo) +{ + int ret; + + spin_unlock(&dmabufmgr.lru_lock); + ret = dmabufmgr_bo_wait_unreserved(bo, true); + spin_lock(&dmabufmgr.lru_lock); + if (unlikely(ret != 0)) + dmabufmgr_eu_backoff_reservation_locked(list); + return ret; +} + +void +dmabufmgr_eu_backoff_reservation(struct list_head *list) +{ + if (list_empty(list)) + return; + + spin_lock(&dmabufmgr.lru_lock); + dmabufmgr_eu_backoff_reservation_locked(list); + spin_unlock(&dmabufmgr.lru_lock); +} +EXPORT_SYMBOL_GPL(dmabufmgr_eu_backoff_reservation); + +int +dmabufmgr_eu_reserve_buffers(struct list_head *list) +{ + struct dmabufmgr_validate *entry; + int ret; + u32 val_seq; + + if (list_empty(list)) + return 0; + + list_for_each_entry(entry, list, head) { + entry->reserved = false; + entry->sync_buf = NULL; + } + +retry: + spin_lock(&dmabufmgr.lru_lock); + val_seq = dmabufmgr.counter++; + + list_for_each_entry(entry, list, head) { + struct dma_buf *bo = entry->bo; + +retry_this_bo: + ret = dmabufmgr_bo_reserve_locked(bo, true, true, true, val_seq); + switch (ret) { + case 0: + break; + case -EBUSY: + ret = dmabufmgr_eu_wait_unreserved_locked(list, bo); +
[RFC PATCH 0/8] Dmabuf synchronization
This patch series implements my attempt at dmabuf synchronization. The core idea is that a lot of devices will have their own methods of synchronization, but more complicated devices allow some way of fencing, so why not export those as dma-buf?

This patchset implements dmabufmgr, which is based on ttm's code. The ttm code deals with a lot more than just reservation, however, so I took out almost all of the code not dealing with reservations.

I used the drm-intel-next-queued tree as base; it contains some i915 flushing changes. I would rather use linux-next, but the deferred fput code makes my system unbootable. That is unfortunate, since deferred fput would reduce the deadlocks that happen in dma_buf_put when two devices release each other's dmabuf.

The i915 changes implement a simple CPU wait only. The nouveau code imports the sync dmabuf read-only and maps it to the affected channels, then performs a wait on it in hardware. Since the hardware may still be processing other commands, it could be the case that no hardware wait has to be performed at all.

Only the nouveau nv84 code is tested, but the nvc0 code should work as well.
[PATCH 1/2] drm: Add colouring to the range allocator
In order to support snoopable memory on non-LLC architectures (so that we can bind vgem objects into the i915 GATT for example), we have to avoid the prefetcher on the GPU from crossing memory domains and so prevent allocation of a snoopable PTE immediately following an uncached PTE. To do that, we need to extend the range allocator with support for tracking and segregating different node colours. This will be used by i915 to segregate memory domains within the GTT. v2: Now with more drm_mm helpers and less driver interference. Signed-off-by: Chris Wilson Cc: Dave Airlie Cc: Ben Skeggs Cc: Jerome Glisse Cc: Alex Deucher Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org --- drivers/gpu/drm/drm_gem.c |2 +- drivers/gpu/drm/drm_mm.c | 169 - drivers/gpu/drm/i915/i915_gem.c |6 +- drivers/gpu/drm/i915/i915_gem_evict.c |9 +- include/drm/drm_mm.h | 93 +++--- 5 files changed, 191 insertions(+), 88 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index d58e69d..fbe0842 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -354,7 +354,7 @@ drm_gem_create_mmap_offset(struct drm_gem_object *obj) /* Get a DRM GEM mmap offset allocated... 
*/ list->file_offset_node = drm_mm_search_free(&mm->offset_manager, - obj->size / PAGE_SIZE, 0, 0); + obj->size / PAGE_SIZE, 0, false); if (!list->file_offset_node) { DRM_ERROR("failed to allocate offset for bo %d\n", obj->name); diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index 961fb54..9bb82f7 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -118,45 +118,53 @@ static inline unsigned long drm_mm_hole_node_end(struct drm_mm_node *hole_node) static void drm_mm_insert_helper(struct drm_mm_node *hole_node, struct drm_mm_node *node, -unsigned long size, unsigned alignment) +unsigned long size, unsigned alignment, +unsigned long color) { struct drm_mm *mm = hole_node->mm; - unsigned long tmp = 0, wasted = 0; unsigned long hole_start = drm_mm_hole_node_start(hole_node); unsigned long hole_end = drm_mm_hole_node_end(hole_node); + unsigned long adj_start = hole_start; + unsigned long adj_end = hole_end; BUG_ON(!hole_node->hole_follows || node->allocated); - if (alignment) - tmp = hole_start % alignment; + if (mm->color_adjust) + mm->color_adjust(hole_node, color, &adj_start, &adj_end); - if (!tmp) { + if (alignment) { + unsigned tmp = adj_start % alignment; + if (tmp) + adj_start += alignment - tmp; + } + + if (adj_start == hole_start) { hole_node->hole_follows = 0; - list_del_init(&hole_node->hole_stack); - } else - wasted = alignment - tmp; + list_del(&hole_node->hole_stack); + } - node->start = hole_start + wasted; + node->start = adj_start; node->size = size; node->mm = mm; + node->color = color; node->allocated = 1; INIT_LIST_HEAD(&node->hole_stack); list_add(&node->node_list, &hole_node->node_list); - BUG_ON(node->start + node->size > hole_end); + BUG_ON(node->start + node->size > adj_end); + node->hole_follows = 0; if (node->start + node->size < hole_end) { list_add(&node->hole_stack, &mm->hole_stack); node->hole_follows = 1; - } else { - node->hole_follows = 0; } } struct drm_mm_node *drm_mm_get_block_generic(struct 
 drm_mm_node *hole_node,
 					     unsigned long size,
 					     unsigned alignment,
+					     unsigned long color,
 					     int atomic)
 {
 	struct drm_mm_node *node;
 
@@ -165,7 +173,7 @@ struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node,
 	if (unlikely(node == NULL))
 		return NULL;
 
-	drm_mm_insert_helper(hole_node, node, size, alignment);
+	drm_mm_insert_helper(hole_node, node, size, alignment, color);
 
 	return node;
 }
@@ -181,11 +189,11 @@ int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node,
 {
 	struct drm_mm_node *hole_node;
 
-	hole_node = drm_mm_search_free(mm, size, alignment, 0);
+	hole_node = drm_mm_search_free(mm, size, alignment, false);
 	if (!hole_node)
 		return -ENOSPC;
 
-	drm_mm_insert_helper(hole_node, node, size, alignment);
+	drm_mm_insert_helper(hole_node,
Re: [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
On Mon, Jul 09, 2012 at 03:13:25PM +0200, Henrik Rydberg wrote:
> On Thu, Jul 05, 2012 at 10:34:10AM +0200, Henrik Rydberg wrote:
> > On Thu, Jul 05, 2012 at 08:54:46AM +0200, Henrik Rydberg wrote:
> > > > Thanks for tracking down the source of this corruption. I don't have
> > > > any such hardware, so until someone can figure it out, I think we
> > > > should apply this patch.
> > >
> > > In that case, I would have to massage the patch a bit first; it
> > > creates a problem with suspend/resume. Might be something with
> > > nva3_pm.c, who knows. I am really stabbing in the dark here. :-)
> >
> > It seems the suspend/resume problem is unrelated (bad systemd update),
> > so I am fine with applying this as is. Obviously not the best
> > solution, and if I have time I will continue to look for problems in
> > the nva3 copy code, but for now,
> >
> > Signed-off-by: Henrik Rydberg
>
> I have not encountered the problem in a long while, and I do not have
> the patch applied. It is entirely possible that this was fixed by
> something else. Unless you have already applied the patch, I would
> suggest holding on to it to see if the problem reappears.
>
> Sorry for the churn.

... and there it was again, hours after giving up on it. Oh well.

What makes this bug particularly difficult is that as soon as the patch
is applied, the problem disappears and does not show itself again, with
or without the patch applied afterwards. It sounds very much like the
problem is a failure state that does not get reset by current mainline,
but somehow does get reset once the patch has been applied.

I also learnt that the problem is not in the nva3_copy code itself; I
reverted nva3_copy.c and nva3_pm.c back to v3.4, but the problem
persisted. A DMA problem elsewhere, in the drm code or in the pci layer,
seems more likely than this particular hardware having problems with
this particular copy engine.

As it stands, though, applying the patch is the only thing known to work.
Thanks,
Henrik
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH] drm/exynos: Add exynos drm specific fb_mmap function
This patch adds an exynos drm specific implementation of fb_mmap which
supports mapping a non-contiguous buffer to user space. This new function
does not assume that the frame buffer is contiguous, and calls
dma_mmap_writecombine for mapping the buffer to user space.
dma_mmap_writecombine will be able to map a contiguous buffer as well as
a non-contiguous buffer, depending on whether an IOMMU mapping has been
created for drm or not.

Signed-off-by: Prathyush K
---
 drivers/gpu/drm/exynos/exynos_drm_fbdev.c |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
index d5586cc..b53e638 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
@@ -46,8 +46,24 @@ struct exynos_drm_fbdev {
 	struct exynos_drm_gem_obj	*exynos_gem_obj;
 };
 
+static int exynos_drm_fb_mmap(struct fb_info *info,
+			struct vm_area_struct *vma)
+{
+	if ((vma->vm_end - vma->vm_start) > info->fix.smem_len)
+		return -EINVAL;
+
+	vma->vm_pgoff = 0;
+	vma->vm_flags |= VM_IO | VM_RESERVED;
+	if (dma_mmap_writecombine(info->device, vma, info->screen_base,
+			info->fix.smem_start, vma->vm_end - vma->vm_start))
+		return -EAGAIN;
+
+	return 0;
+}
+
 static struct fb_ops exynos_drm_fb_ops = {
 	.owner		= THIS_MODULE,
+	.fb_mmap	= exynos_drm_fb_mmap,
 	.fb_fillrect	= cfb_fillrect,
 	.fb_copyarea	= cfb_copyarea,
 	.fb_imageblit	= cfb_imageblit,
-- 
1.7.0.4