[PATCH 14/20] drm/radeon: multiple ring allocator v2
On 07.05.2012 20:52, Jerome Glisse wrote: > On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse wrote: >>> On 07.05.2012 17:23, Jerome Glisse wrote: On Mon, May 7, 2012 at 7:42 AM, Christian K?nig wrote: > A startover with a new idea for a multiple ring allocator. > Should perform as well as a normal ring allocator as long > as only one ring does somthing, but falls back to a more > complex algorithm if more complex things start to happen. > > We store the last allocated bo in last, we always try to allocate > after the last allocated bo. Principle is that in a linear GPU ring > progression was is after last is the oldest bo we allocated and thus > the first one that should no longer be in use by the GPU. > > If it's not the case we skip over the bo after last to the closest > done bo if such one exist. If none exist and we are not asked to > block we report failure to allocate. > > If we are asked to block we wait on all the oldest fence of all > rings. We just wait for any of those fence to complete. > > v2: We need to be able to let hole point to the list_head, otherwise > try free will never free the first allocation of the list. Also > stop calling radeon_fence_signalled more than necessary. > > Signed-off-by: Christian K?nig > Signed-off-by: Jerome Glisse This one is NAK please use my patch. Yes in my patch we never try to free anything if there is only on sa_bo in the list if you really care about this it's a one line change: http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch >>> Nope that won't work correctly, "last" is pointing to the last allocation >>> and that's the most unlikely to be freed at this time. Also in this version >>> (like in the one before) radeon_sa_bo_next_hole lets hole point to the >>> "prev" of the found sa_bo without checking if this isn't the lists head. >>> That might cause a crash if an to be freed allocation is the first one in >>> the buffer. >>> >>> What radeon_sa_bo_try_free would need to do to get your approach working is >>> to loop over the end of the buffer and also try to free at the beginning, >>> but saying that keeping the last allocation results in a whole bunch of >>> extra cases and "if"s, while just keeping a pointer to the "hole" (e.g. >>> where the next allocation is most likely to succeed) simplifies the code >>> quite a bit (but I agree that on the down side it makes it harder to >>> understand). >>> Your patch here can enter in infinite loop and never return holding the lock. See below. [SNIP] > + } while (radeon_sa_bo_next_hole(sa_manager, fences)); Here you can infinite loop, in the case there is a bunch of hole in the allocator but none of them allow to full fill the allocation. radeon_sa_bo_next_hole will keep returning true looping over and over on all the all. That's why i only restrict my patch to 2 hole skeeping and then fails the allocation or try to wait. I believe sadly we need an heuristic and 2 hole skeeping at most sounded like a good one. >>> Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in >>> conjunction with radeon_sa_bo_try_free are eating up the opportunities for >>> holes. >>> >>> Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with >>> the exception for allocating in a complete scattered buffer, and even then >>> it will never loop more often than halve the number of current allocations >>> (and that is really really unlikely). >>> >>> Cheers, >>> Christian. >> I looked again and yes it can loop infinitly, think of hole you can >> never free ie radeon_sa_bo_try_free can't free anything. This >> situation can happen if you have several thread allocating sa bo at >> the same time while none of them are yet done with there sa_bo (ie >> none have call sa_bo_free yet). I updated a v3 that track oldest and >> fix all things you were pointing out above. No that isn't a problem, radeon_sa_bo_next_hole takes the firsts entries of the flist, so it only considers holes that have a signaled fence and so can be freed. Having multiple threads allocate objects that can't be freed yet will just result in empty flists, and so radeon_sa_bo_next_hole will return false, resulting in calling radeon_fence_wait_any with an empty fence list, which in turn will result in an ENOENT and abortion of allocation (ok maybe we should catch that and return -ENOMEM instead). So even the corner cases should now be handled fine. >> >> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch >> >> Cheers, >> Jerome > Of course by tracking oldest it defeat the algo so updated patch : > > http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch > > Just fix the corner case of list of single
[Bug 49603] New: [regression] Fullscreen video no longer smooth with GPU in low power mode
https://bugs.freedesktop.org/show_bug.cgi?id=49603 Bug #: 49603 Summary: [regression] Fullscreen video no longer smooth with GPU in low power mode Classification: Unclassified Product: Mesa Version: 8.0 Platform: Other OS/Version: All Status: NEW Severity: normal Priority: medium Component: Drivers/Gallium/r600 AssignedTo: dri-devel at lists.freedesktop.org ReportedBy: sa at whiz.se With Mesa 8.0 I can watch fullscreen videos using Totem (and other players that uses an OpenGL sink) with my GPU set to "low" power mode without problems. With 8.0.1 (and later) this is no longer possible. Every so often I get stalls and what looks like dropped frames. It's a blink and you'll miss it kind of thing, but over a longer period of time (such as watching a movie) it's quite noticeable and annoying. Setting the card to "mid" or higher works around this, but as it's a passive card I would prefer to keep it running in low as much as possible. Bisecting for this bug turns up the below commit, I have confirmed it by reverting this change. (I'm not sure if adding the patch author to the cc list is considered good practice or not?) System environment: -- system architecture: 32-bit -- Linux distribution: Debian unstable -- GPU: REDWOOD -- Model: XFX Radeon HD 5670 1GB -- Display connector: DVI -- xf86-video-ati: 6.14.4 -- xserver: 1.12.1 -- mesa: 8.0.2 -- drm: 2.4.33 -- kernel: 3.3.4 106ea10d1b246aba1a0f4e171fd7d21268f3960f is the first bad commit commit 106ea10d1b246aba1a0f4e171fd7d21268f3960f Author: Simon Farnsworth Date: Tue Feb 14 12:06:20 2012 + r600g: Use a fake reloc to sleep for fences r300g is able to sleep until a fence completes rather than busywait because it creates a special buffer object and relocation that stays busy until the CS containing the fence is finished. Copy the idea into r600g, and use it to sleep if the user asked for an infinite wait, falling back to busywaiting if the user provided a timeout. Signed-off-by: Simon Farnsworth Signed-off-by: Alex Deucher (cherry picked from commit 8cd03b933cf868ff867e2db4a0937005a02fd0e4) Conflicts: src/gallium/drivers/r600/r600_pipe.c :04 04 390170e370f86ee323dce284906ed21693ed9d09 cccea412e6be4f3619422196231e02b375ab4772 Msrc -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hello Thomasz, Laurent, I found an issue in the function vb2_dc_pages_to_sgt() below. I saw that during the attach, the size of the SGT and size requested mis-matched (by atleast 8k bytes). Hence I made a small correction to the code as below. I could then attach the importer properly. Regards, Subash On 04/20/2012 08:15 PM, Tomasz Stanislawski wrote: > From: Andrzej Pietrasiewicz > > This patch introduces usage of dma_map_sg to map memory behind > a userspace pointer to a device as dma-contiguous mapping. > > Signed-off-by: Andrzej Pietrasiewicz > Signed-off-by: Marek Szyprowski > [bugfixing] > Signed-off-by: Kamil Debski > [bugfixing] > Signed-off-by: Tomasz Stanislawski > [add sglist subroutines/code refactoring] > Signed-off-by: Kyungmin Park > --- > drivers/media/video/videobuf2-dma-contig.c | 279 > ++-- > 1 files changed, 262 insertions(+), 17 deletions(-) > > diff --git a/drivers/media/video/videobuf2-dma-contig.c > b/drivers/media/video/videobuf2-dma-contig.c > index 476e536..9cbc8d4 100644 > --- a/drivers/media/video/videobuf2-dma-contig.c > +++ b/drivers/media/video/videobuf2-dma-contig.c > @@ -11,6 +11,8 @@ >*/ > > #include > +#include > +#include > #include > #include > > @@ -22,6 +24,8 @@ struct vb2_dc_buf { > void*vaddr; > unsigned long size; > dma_addr_t dma_addr; > + enum dma_data_direction dma_dir; > + struct sg_table *dma_sgt; > > /* MMAP related */ > struct vb2_vmarea_handler handler; > @@ -32,6 +36,95 @@ struct vb2_dc_buf { > }; > > /*/ > +/*scatterlist table functions*/ > +/*/ > + > +static struct sg_table *vb2_dc_pages_to_sgt(struct page **pages, > + unsigned int n_pages, unsigned long offset, unsigned long size) > +{ > + struct sg_table *sgt; > + unsigned int chunks; > + unsigned int i; > + unsigned int cur_page; > + int ret; > + struct scatterlist *s; > + > + sgt = kzalloc(sizeof *sgt, GFP_KERNEL); > + if (!sgt) > + return ERR_PTR(-ENOMEM); > + > + /* compute number of chunks */ > + chunks = 1; > + for (i = 1; i< n_pages; ++i) > + if (pages[i] != pages[i - 1] + 1) > + ++chunks; > + > + ret = sg_alloc_table(sgt, chunks, GFP_KERNEL); > + if (ret) { > + kfree(sgt); > + return ERR_PTR(-ENOMEM); > + } > + > + /* merging chunks and putting them into the scatterlist */ > + cur_page = 0; > + for_each_sg(sgt->sgl, s, sgt->orig_nents, i) { > + unsigned long chunk_size; > + unsigned int j; size = PAGE_SIZE; > + > + for (j = cur_page + 1; j< n_pages; ++j) for (j = cur_page + 1; j < n_pages; ++j) { > + if (pages[j] != pages[j - 1] + 1) > + break; size += PAGE } > + > + chunk_size = ((j - cur_page)<< PAGE_SHIFT) - offset; > + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); [DELETE] size -= chunk_size; > + offset = 0; > + cur_page = j; > + } > + > + return sgt; > +} > + > +static void vb2_dc_release_sgtable(struct sg_table *sgt) > +{ > + sg_free_table(sgt); > + kfree(sgt); > +} > + > +static void vb2_dc_sgt_foreach_page(struct sg_table *sgt, > + void (*cb)(struct page *pg)) > +{ > + struct scatterlist *s; > + unsigned int i; > + > + for_each_sg(sgt->sgl, s, sgt->nents, i) { > + struct page *page = sg_page(s); > + unsigned int n_pages = PAGE_ALIGN(s->offset + s->length) > + >> PAGE_SHIFT; > + unsigned int j; > + > + for (j = 0; j< n_pages; ++j, ++page) > + cb(page); > + } > +} > + > +static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt) > +{ > + struct scatterlist *s; > + dma_addr_t expected = sg_dma_address(sgt->sgl); > + unsigned int i; > + unsigned long size = 0; > + > + for_each_sg(sgt->sgl, s, sgt->nents, i) { > + if (sg_dma_address(s) != expected) > + break; > + expected = sg_dma_address(s) + sg_dma_len(s); > + size += sg_dma_len(s); > + } > + return size; > +} > + > +/*/ > /* callbacks for all buffers */ > /*/ > > @@ -116,42 +209,194 @@ static int vb2_dc_mmap(void *buf_priv, struct > vm_area_struct *vma) > /* callbacks for USERPTR buffers */ > /*/ > > +static inline int vma_is_io(struct vm_area_struct *vma) > +{ > + return
[Bug 42490] NUTMEG DP to VGA bridge not working
https://bugs.freedesktop.org/show_bug.cgi?id=42490 --- Comment #28 from Jerome Glisse 2012-05-07 12:57:34 PDT --- Does people here have better luck with the patch mentioned previously: drm/radeon/kms: need to set up ss on DP bridges as well -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 --- Comment #5 from Mike Mestnik2012-05-07 11:59:39 PDT --- This patch worked for me and got me to the next undefined reference. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[PATCH 14/20] drm/radeon: multiple ring allocator v2
On 07.05.2012 17:23, Jerome Glisse wrote: > On Mon, May 7, 2012 at 7:42 AM, Christian K?nig > wrote: >> A startover with a new idea for a multiple ring allocator. >> Should perform as well as a normal ring allocator as long >> as only one ring does somthing, but falls back to a more >> complex algorithm if more complex things start to happen. >> >> We store the last allocated bo in last, we always try to allocate >> after the last allocated bo. Principle is that in a linear GPU ring >> progression was is after last is the oldest bo we allocated and thus >> the first one that should no longer be in use by the GPU. >> >> If it's not the case we skip over the bo after last to the closest >> done bo if such one exist. If none exist and we are not asked to >> block we report failure to allocate. >> >> If we are asked to block we wait on all the oldest fence of all >> rings. We just wait for any of those fence to complete. >> >> v2: We need to be able to let hole point to the list_head, otherwise >> try free will never free the first allocation of the list. Also >> stop calling radeon_fence_signalled more than necessary. >> >> Signed-off-by: Christian K?nig >> Signed-off-by: Jerome Glisse > This one is NAK please use my patch. Yes in my patch we never try to > free anything if there is only on sa_bo in the list if you really care > about this it's a one line change: > http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch Nope that won't work correctly, "last" is pointing to the last allocation and that's the most unlikely to be freed at this time. Also in this version (like in the one before) radeon_sa_bo_next_hole lets hole point to the "prev" of the found sa_bo without checking if this isn't the lists head. That might cause a crash if an to be freed allocation is the first one in the buffer. What radeon_sa_bo_try_free would need to do to get your approach working is to loop over the end of the buffer and also try to free at the beginning, but saying that keeping the last allocation results in a whole bunch of extra cases and "if"s, while just keeping a pointer to the "hole" (e.g. where the next allocation is most likely to succeed) simplifies the code quite a bit (but I agree that on the down side it makes it harder to understand). > Your patch here can enter in infinite loop and never return holding > the lock. See below. > > [SNIP] >> + } while (radeon_sa_bo_next_hole(sa_manager, fences)); > Here you can infinite loop, in the case there is a bunch of hole in > the allocator but none of them allow to full fill the allocation. > radeon_sa_bo_next_hole will keep returning true looping over and over > on all the all. That's why i only restrict my patch to 2 hole skeeping > and then fails the allocation or try to wait. I believe sadly we need > an heuristic and 2 hole skeeping at most sounded like a good one. Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes. Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely). Cheers, Christian.
[Bug 49484] invalid enum 0x500, invalid value 0x501
https://bugs.freedesktop.org/show_bug.cgi?id=49484 Michal Suchanek changed: What|Removed |Added Status|NEW |RESOLVED Resolution||INVALID --- Comment #6 from Michal Suchanek 2012-05-07 11:03:54 PDT --- Indeed, it works with the texture compression library installed. I guess this is something that Wine should report. Unfortunately, the available messages are very unhelpful. Sorry about the noise. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 --- Comment #4 from Tom Stellard 2012-05-07 10:43:27 PDT --- Created attachment 61159 --> https://bugs.freedesktop.org/attachment.cgi?id=61159 Possible fix Does it build with this patch? -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
Enhancing EDID quirk functionality
n I'd fixed nouveau for this before. I'll send the fix along, thanks for catching it. - ajax -- next part -- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20120507/67406281/attachment.pgp>
[Bug 49484] invalid enum 0x500, invalid value 0x501
https://bugs.freedesktop.org/show_bug.cgi?id=49484 --- Comment #5 from Henri Verbeet 2012-05-07 10:37:29 PDT --- That generally happens when an application tries to use a (D3D) format (e.g. DXT/s3tc) even though it's not available. A WINEDEBUG=+d3d,+d3d_surface log should show which format, although typically it's either s3tc or one of the floating point formats. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[PATCH 14/20] drm/radeon: multiple ring allocator v2
On Mon, May 7, 2012 at 4:38 PM, Christian K?nig wrote: > On 07.05.2012 20:52, Jerome Glisse wrote: >> >> On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse ?wrote: On 07.05.2012 17:23, Jerome Glisse wrote: > > On Mon, May 7, 2012 at 7:42 AM, Christian > K?nig > ?wrote: >> >> A startover with a new idea for a multiple ring allocator. >> Should perform as well as a normal ring allocator as long >> as only one ring does somthing, but falls back to a more >> complex algorithm if more complex things start to happen. >> >> We store the last allocated bo in last, we always try to allocate >> after the last allocated bo. Principle is that in a linear GPU ring >> progression was is after last is the oldest bo we allocated and thus >> the first one that should no longer be in use by the GPU. >> >> If it's not the case we skip over the bo after last to the closest >> done bo if such one exist. If none exist and we are not asked to >> block we report failure to allocate. >> >> If we are asked to block we wait on all the oldest fence of all >> rings. We just wait for any of those fence to complete. >> >> v2: We need to be able to let hole point to the list_head, otherwise >> ? ?try free will never free the first allocation of the list. Also >> ? ?stop calling radeon_fence_signalled more than necessary. >> >> Signed-off-by: Christian K?nig >> Signed-off-by: Jerome Glisse > > This one is NAK please use my patch. Yes in my patch we never try to > free anything if there is only on sa_bo in the list if you really care > about this it's a one line change: > > > http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch Nope that won't work correctly, "last" is pointing to the last allocation and that's the most unlikely to be freed at this time. Also in this version (like in the one before) radeon_sa_bo_next_hole lets hole point to the "prev" of the found sa_bo without checking if this isn't the lists head. That might cause a crash if an to be freed allocation is the first one in the buffer. What radeon_sa_bo_try_free would need to do to get your approach working is to loop over the end of the buffer and also try to free at the beginning, but saying that keeping the last allocation results in a whole bunch of extra cases and "if"s, while just keeping a pointer to the "hole" (e.g. where the next allocation is most likely to succeed) simplifies the code quite a bit (but I agree that on the down side it makes it harder to understand). > Your patch here can enter in infinite loop and never return holding > the lock. See below. > > [SNIP] > >> + ? ? ? ? ? ? ? } while (radeon_sa_bo_next_hole(sa_manager, fences)); > > Here you can infinite loop, in the case there is a bunch of hole in > the allocator but none of them allow to full fill the allocation. > radeon_sa_bo_next_hole will keep returning true looping over and over > on all the all. That's why i only restrict my patch to 2 hole skeeping > and then fails the allocation or try to wait. I believe sadly we need > an heuristic and 2 hole skeeping at most sounded like a good one. Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes. Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely). Cheers, Christian. >>> >>> I looked again and yes it can loop infinitly, think of hole you can >>> never free ie radeon_sa_bo_try_free can't free anything. This >>> situation can happen if you have several thread allocating sa bo at >>> the same time while none of them are yet done with there sa_bo (ie >>> none have call sa_bo_free yet). I updated a v3 that track oldest and >>> fix all things you were pointing out above. > > No that isn't a problem, radeon_sa_bo_next_hole takes the firsts entries of > the flist, so it only considers holes that have a signaled fence and so can > be freed. > > Having multiple threads allocate objects that can't be freed yet will just > result in empty flists, and so radeon_sa_bo_next_hole will return false, > resulting in calling radeon_fence_wait_any with an empty fence list, which > in turn will result in an ENOENT and abortion of allocation (ok maybe we > should catch that and return -ENOMEM instead). > > So even the corner cases should now be handled fine. No, there is still infinite loop possible with gpu lockup, i am against the while (next_hole)
[PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hi Subash, Could you provide a detailed description of a test case that causes a failure of vb2_dc_pages_to_sgt? Regards, Tomasz Stanislawski
[PATCH 04/20] drm/radeon: convert fence to uint64_t v4
On 07.05.2012 16:39, Jerome Glisse wrote: > On Mon, May 7, 2012 at 7:42 AM, Christian K?nig > wrote: >> From: Jerome Glisse >> >> This convert fence to use uint64_t sequence number intention is >> to use the fact that uin64_t is big enough that we don't need to >> care about wrap around. >> >> Tested with and without writeback using 0xF000 as initial >> fence sequence and thus allowing to test the wrap around from >> 32bits to 64bits. >> >> v2: Add comment about possible race btw CPU& GPU, add comment >> stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET >> Read fence sequenc in reverse order of GPU write them so we >> mitigate the race btw CPU and GPU. >> >> v3: Drop the need for ring to emit the 64bits fence, and just have >> each ring emit the lower 32bits of the fence sequence. We >> handle the wrap over 32bits in fence_process. >> >> v4: Just a small optimization: Don't reread the last_seq value >> if loop restarts, since we already know its value anyway. >> Also start at zero not one for seq value and use pre instead >> of post increment in emmit, otherwise wait_empty will deadlock. > Why changing that v3 was already good no deadlock. I started at 1 > especialy for that, a signaled fence is set to 0 so it always compare > as signaled. Just using preincrement is exactly like starting at one. > I don't see the need for this change but if it makes you happy. Not exactly, the last emitted sequence is also used in radeon_fence_wait_empty. So when you use post increment radeon_fence_wait_empty will actually not wait for the last emitted fence to be signaled, but for last emitted + 1, so it practically waits forever. Without this change suspend (for example) will just lockup. Cheers, Christian. > > Cheers, > Jerome >> Signed-off-by: Jerome Glisse >> Signed-off-by: Christian K?nig >> --- >> drivers/gpu/drm/radeon/radeon.h | 39 ++- >> drivers/gpu/drm/radeon/radeon_fence.c | 116 >> +++-- >> drivers/gpu/drm/radeon/radeon_ring.c |9 ++- >> 3 files changed, 107 insertions(+), 57 deletions(-) >> >> diff --git a/drivers/gpu/drm/radeon/radeon.h >> b/drivers/gpu/drm/radeon/radeon.h >> index e99ea81..cdf46bc 100644 >> --- a/drivers/gpu/drm/radeon/radeon.h >> +++ b/drivers/gpu/drm/radeon/radeon.h >> @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout; >> * Copy from radeon_drv.h so we don't have to include both and have >> conflicting >> * symbol; >> */ >> -#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ >> -#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) >> +#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ >> +#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) >> /* RADEON_IB_POOL_SIZE must be a power of 2 */ >> -#define RADEON_IB_POOL_SIZE16 >> -#define RADEON_DEBUGFS_MAX_COMPONENTS 32 >> -#define RADEONFB_CONN_LIMIT4 >> -#define RADEON_BIOS_NUM_SCRATCH8 >> +#define RADEON_IB_POOL_SIZE16 >> +#define RADEON_DEBUGFS_MAX_COMPONENTS 32 >> +#define RADEONFB_CONN_LIMIT4 >> +#define RADEON_BIOS_NUM_SCRATCH8 >> >> /* max number of rings */ >> -#define RADEON_NUM_RINGS 3 >> +#define RADEON_NUM_RINGS 3 >> + >> +/* fence seq are set to this number when signaled */ >> +#define RADEON_FENCE_SIGNALED_SEQ 0LL >> +#define RADEON_FENCE_NOTEMITED_SEQ (~0LL) >> >> /* internal ring indices */ >> /* r1xx+ has gfx CP ring */ >> -#define RADEON_RING_TYPE_GFX_INDEX 0 >> +#define RADEON_RING_TYPE_GFX_INDEX 0 >> >> /* cayman has 2 compute CP rings */ >> -#define CAYMAN_RING_TYPE_CP1_INDEX 1 >> -#define CAYMAN_RING_TYPE_CP2_INDEX 2 >> +#define CAYMAN_RING_TYPE_CP1_INDEX 1 >> +#define CAYMAN_RING_TYPE_CP2_INDEX 2 >> >> /* hardcode those limit for now */ >> -#define RADEON_VA_RESERVED_SIZE(8<< 20) >> -#define RADEON_IB_VM_MAX_SIZE (64<< 10) >> +#define RADEON_VA_RESERVED_SIZE(8<< 20) >> +#define RADEON_IB_VM_MAX_SIZE (64<< 10) >> >> /* >> * Errata workarounds. >> @@ -254,8 +258,9 @@ struct radeon_fence_driver { >> uint32_tscratch_reg; >> uint64_tgpu_addr; >> volatile uint32_t *cpu_addr; >> - atomic_tseq; >> - uint32_tlast_seq; >> + /* seq is protected by ring emission lock */ >> + uint64_tseq; >> + atomic64_t last_seq; >> unsigned long last_activity; >> wait_queue_head_t queue; >> struct list_heademitted; >> @@ -268,11 +273,9 @@ struct radeon_fence { >> struct kref kref; >>
[Bug 49484] invalid enum 0x500, invalid value 0x501
https://bugs.freedesktop.org/show_bug.cgi?id=49484 --- Comment #4 from Michal Suchanek 2012-05-07 10:02:05 PDT --- invalid value: Breakpoint 1, _mesa_error (ctx=0xccba90, error=1281, fmtString=0x74706278 "glTexImage%dD(internalFormat=%s)") at main/errors.c:996 996main/errors.c: No such file or directory. (gdb) bt full #0 _mesa_error (ctx=0xccba90, error=1281, fmtString=0x74706278 "glTexImage%dD(internalFormat=%s)") at main/errors.c:996 do_output = 225 '\341' do_log = #1 0x745e2a84 in texture_error_check (border=0, depth=1, height=64, width=64, type=0, format=0, internalFormat=0, level=0, target=3553, dimensions=2, ctx=0xccba90) at main/teximage.c:1621 proxyTarget = err = indexFormat = 0 '\000' isProxy = sizeOK = 1 '\001' colorFormat = #2 teximage (ctx=0xccba90, dims=2, target=3553, level=0, internalFormat=0, width=64, height=64, depth=1, border=0, format=0, type=0, pixels=0x0) at main/teximage.c:2501 error = 1 '\001' unpack_no_border = {Alignment = -7152, RowLength = 32767, SkipPixels = 9180912, SkipRows = 0, ImageHeight = -8144, SkipImages = 32767, SwapBytes = 45 '-', LsbFirst = 17 '\021', Invert = 90 'Z', BufferObj = 0x7fffe410} unpack = 0xcd22e0 #3 0x745e2fc4 in _mesa_TexImage2D (target=, level=, internalFormat=, width=, height=, border=, format=0, type=0, pixels=0x0) at main/teximage.c:2639 No locals. #4 0x00480145 in ?? () No symbol table info available. #5 0x004bbfd6 in ?? () No symbol table info available. #6 0x00440687 in ?? () No symbol table info available. #7 0x0043b985 in ?? () No symbol table info available. #8 0x0043c092 in ?? () No symbol table info available. #9 0x7696eead in __libc_start_main (main=, argc=, ubp_av=, init=, fini=, rtld_fini=, stack_end=0x7fffe408) at libc-start.c:228 result = unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 738182590451561014, 4428032, 140737488348176, 0, 0, -738182590032367050, -738203265834012106}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x5ddb20, 0x7fffe418}, data = {prev = 0x0, cleanup = 0x0, canceltype = 6150944}}} not_first_call = #10 0x00439129 in _start () No symbol table info available. (gdb) c Continuing. 37923 glTexImage2D(target = GL_TEXTURE_2D, level = 0, internalformat = GL_ZERO, width = 64, height = 64, border = 0, format = GL_ZERO, type = GL_ZERO, pixels = NULL) 37923: warning: glGetError(glTexImage2D) = GL_INVALID_VALUE -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 --- Comment #3 from Mike Mestnik2012-05-07 09:58:45 PDT --- Tom, The short of it: I'm already doing that. The long: I took a look at that script and it eventually just calls "autoreconf -v --install" my log clearly shows "autoreconf -vfi" being called. Also note that that script will call configure. Thanks! -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[PATCH] mm: Work around Intel SNB GTT bug with some physical pages.
St?phane Marchesin writes: > While investing some Sandy Bridge rendering corruption, I found out > that all physical memory pages below 1MiB were returning garbage when > read through the GTT. This has been causing graphics corruption (when > it's used for textures, render targets and pixmaps) and GPU hangups > (when it's used for GPU batch buffers). It would be possible to exlude GFP_DMA from the page allocator. That covers the first 16MB. You just need a custom zone list with ZONE_DMA. -Andi -- ak at linux.intel.com -- Speaking for myself only
[Bug 49484] invalid enum 0x500, invalid value 0x501
https://bugs.freedesktop.org/show_bug.cgi?id=49484 --- Comment #3 from Michal Suchanek 2012-05-07 09:55:15 PDT --- I get no mesa warnings, only warnings from wine about Mesa returning GL_INVALID* -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[PATCH] mm: Work around Intel SNB GTT bug with some physical pages.
While investing some Sandy Bridge rendering corruption, I found out that all physical memory pages below 1MiB were returning garbage when read through the GTT. This has been causing graphics corruption (when it's used for textures, render targets and pixmaps) and GPU hangups (when it's used for GPU batch buffers). I talked with some people at Intel and they confirmed my findings, and said that a couple of other random pages were also affected. We could fix this problem by adding an e820 region preventing the memory below 1 MiB to be used, but that prevents at least my machine from booting. One could think that we should be able to fix it in i915, but since the allocation is done by the backing shmem this is not possible. In the end, I came up with the ugly workaround of just leaking the offending pages in shmem.c. I do realize it's truly ugly, but I'm looking for a fix to the existing code, and am wondering if people on this list have a better idea, short of rewriting i915_gem.c to allocate its own pages directly. Signed-off-by: St?phane Marchesin Change-Id: I957e125fb280e0b0d6b05a83cc4068df2f05aa0a --- mm/shmem.c | 39 +-- 1 files changed, 37 insertions(+), 2 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 6c253f7..dcbb58b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -768,6 +768,31 @@ redirty: return 0; } +/* + * Some intel GPUs can't use those pages in the GTT, which results in + * graphics corruption. Sadly, it's impossible to prevent usage of those + * pages in the intel allocator. + * + * Instead, we test for those areas here and leak the corresponding pages. + * + * Some day, when the intel GPU memory is not backed by shmem any more, + * we'll be able to come up with a solution which is contained in i915. + */ +static bool i915_usable_page(struct page *page) +{ + dma_addr_t addr = page_to_phys(page); + + if (unlikely((addr < 1 * 1024 * 1024) || + (addr == 0x2005) || + (addr == 0x2011) || + (addr == 0x2013) || + (addr == 0x20138000) || + (addr == 0x40004000))) + return false; + + return true; +} + #ifdef CONFIG_NUMA #ifdef CONFIG_TMPFS static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol) @@ -816,6 +841,7 @@ static struct page *shmem_alloc_page(gfp_t gfp, struct shmem_inode_info *info, pgoff_t index) { struct vm_area_struct pvma; + struct page *page; /* Create a pseudo vma that just contains the policy */ pvma.vm_start = 0; @@ -826,7 +852,11 @@ static struct page *shmem_alloc_page(gfp_t gfp, /* * alloc_page_vma() will drop the shared policy reference */ - return alloc_page_vma(gfp, , 0); + do { + page = alloc_page_vma(gfp, , 0); + } while (!i915_usable_page(page)); + + return page; } #else /* !CONFIG_NUMA */ #ifdef CONFIG_TMPFS @@ -844,7 +874,12 @@ static inline struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp, static inline struct page *shmem_alloc_page(gfp_t gfp, struct shmem_inode_info *info, pgoff_t index) { - return alloc_page(gfp); + struct page *page; + do { + page = alloc_page(gfp); + } while (!i915_usable_page(page)); + + return page; } #endif /* CONFIG_NUMA */ -- 1.7.5.3.367.ga9930
[RFC 05/13] v4l: vb2-dma-contig: add support for DMABUF exporting
Hi Tomasz, Sorry for the late reply, this one slipped through the cracks. On Thursday 19 April 2012 12:42:12 Tomasz Stanislawski wrote: > On 04/17/2012 04:08 PM, Laurent Pinchart wrote: > > On Tuesday 10 April 2012 15:10:39 Tomasz Stanislawski wrote: > >> This patch adds support for exporting a dma-contig buffer using > >> DMABUF interface. > >> > >> Signed-off-by: Tomasz Stanislawski > >> Signed-off-by: Kyungmin Park > >> --- > > [snip] > > >> +static struct sg_table *vb2_dc_dmabuf_ops_map( > >> + struct dma_buf_attachment *db_attach, enum dma_data_direction dir) > >> +{ > >> + struct dma_buf *dbuf = db_attach->dmabuf; > >> + struct vb2_dc_buf *buf = dbuf->priv; > >> + struct vb2_dc_attachment *attach = db_attach->priv; > >> + struct sg_table *sgt; > >> + struct scatterlist *rd, *wr; > >> + int i, ret; > > > > You can make i an unsigned int :-) > > Right.. splitting declaration may be also a good idea :) > > >> + > >> + /* return previously mapped sg table */ > >> + if (attach) > >> + return >sgt; > > > > This effectively keeps the mapping around as long as the attachment > > exists. We don't try to swap out buffers in V4L2 as is done in DRM at the > > moment, so it might not be too much of an issue, but the behaviour of the > > implementation will change if we later decide to map/unmap the buffers in > > the map/unmap handlers. Do you think that could be a problem ? > > I don't that it is a problem. If an importer calls dma_map_sg then caching > sgt on an exporter side reduces a cost of an allocating and an > initialization of sgt. > > >> + > >> + attach = kzalloc(sizeof *attach, GFP_KERNEL); > >> + if (!attach) > >> + return ERR_PTR(-ENOMEM); > > > > Why don't you allocate the vb2_dc_attachment here instead of > > vb2_dc_dmabuf_ops_attach() ? > > Good point. > The attachment could be allocated at vb2_dc_attachment but all its > fields would be uninitialized. I mean an empty sgt and an undefined > dma direction. I decided to allocate the attachment in vb2_dc_dmabuf_ops_map > because only than all information needed to create a valid attachment > object are available. > > The other solution might be the allocation at vb2_dc_attachment. The field > dir would be set to DMA_NONE. If this filed is equal to DMA_NONE at > vb2_dc_dmabuf_ops_map then sgt is allocated and mapped and direction field > is updated. If value is not DMA_NONE then the sgt is reused. > > Do you think that it is a good idea? I think I would prefer that. It sounds more logical to allocate the attachment in the attach operation handler. > >> + sgt = >sgt; > >> + attach->dir = dir; > >> + > >> + /* copying the buf->base_sgt to attachment */ > > > > I would add an explanation regarding why you need to copy the SG list. > > Something like. > > > > "Copy the buf->base_sgt scatter list to the attachment, as we can't map > > the same scatter list to multiple devices at the same time." > > ok > > >> + ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL); > >> + if (ret) { > >> + kfree(attach); > >> + return ERR_PTR(-ENOMEM); > >> + } > >> + > >> + rd = buf->sgt_base->sgl; > >> + wr = sgt->sgl; > >> + for (i = 0; i < sgt->orig_nents; ++i) { > >> + sg_set_page(wr, sg_page(rd), rd->length, rd->offset); > >> + rd = sg_next(rd); > >> + wr = sg_next(wr); > >> + } > >> > >> + /* mapping new sglist to the client */ > >> + ret = dma_map_sg(db_attach->dev, sgt->sgl, sgt->orig_nents, dir); > >> + if (ret <= 0) { > >> + printk(KERN_ERR "failed to map scatterlist\n"); > >> + sg_free_table(sgt); > >> + kfree(attach); > >> + return ERR_PTR(-EIO); > >> + } > >> + > >> + db_attach->priv = attach; > >> + > >> + return sgt; > >> +} > >> + > >> +static void vb2_dc_dmabuf_ops_unmap(struct dma_buf_attachment > >> *db_attach, > >> + struct sg_table *sgt, enum dma_data_direction dir) > >> +{ > >> + /* nothing to be done here */ > >> +} > >> + > >> +static void vb2_dc_dmabuf_ops_release(struct dma_buf *dbuf) > >> +{ > >> + /* drop reference obtained in vb2_dc_get_dmabuf */ > >> + vb2_dc_put(dbuf->priv); > > > > Shouldn't you set vb2_dc_buf::dma_buf to NULL here ? Otherwise the next > > vb2_dc_get_dmabuf() call will return a DMABUF object that has been freed. > > No. > > The buffer object is destroyed at vb2_dc_put when reference count drops to > 0. It happens could happen after only REQBUF(count=0) or on last close(). > The DMABUF object is created only for MMAP buffers. The DMABUF object is > based only on results of dma_alloc_coherent and dma_get_pages (or its future > equivalent). Therefore the DMABUF object is valid as long as the buffer is > valid. OK. > Notice that dmabuf object could be created in vb2_dc_alloc. I moved it to > vb2_dc_get_dmabuf to avoid a creation of an object that may not be used. > > >> +} > >> + > >> +static struct dma_buf_ops vb2_dc_dmabuf_ops = { > >> + .attach =
[PATCH 14/20] drm/radeon: multiple ring allocator v2
On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse wrote: >> On 07.05.2012 17:23, Jerome Glisse wrote: >>> >>> On Mon, May 7, 2012 at 7:42 AM, Christian K?nig >>> ?wrote: A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen. We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate. If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete. v2: We need to be able to let hole point to the list_head, otherwise ? ?try free will never free the first allocation of the list. Also ? ?stop calling radeon_fence_signalled more than necessary. Signed-off-by: Christian K?nig Signed-off-by: Jerome Glisse >>> >>> This one is NAK please use my patch. Yes in my patch we never try to >>> free anything if there is only on sa_bo in the list if you really care >>> about this it's a one line change: >>> >>> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch >> >> Nope that won't work correctly, "last" is pointing to the last allocation >> and that's the most unlikely to be freed at this time. Also in this version >> (like in the one before) radeon_sa_bo_next_hole lets hole point to the >> "prev" of the found sa_bo without checking if this isn't the lists head. >> That might cause a crash if an to be freed allocation is the first one in >> the buffer. >> >> What radeon_sa_bo_try_free would need to do to get your approach working is >> to loop over the end of the buffer and also try to free at the beginning, >> but saying that keeping the last allocation results in a whole bunch of >> extra cases and "if"s, while just keeping a pointer to the "hole" (e.g. >> where the next allocation is most likely to succeed) simplifies the code >> quite a bit (but I agree that on the down side it makes it harder to >> understand). >> >>> Your patch here can enter in infinite loop and never return holding >>> the lock. See below. >>> >>> [SNIP] >>> + ? ? ? ? ? ? ? } while (radeon_sa_bo_next_hole(sa_manager, fences)); >>> >>> Here you can infinite loop, in the case there is a bunch of hole in >>> the allocator but none of them allow to full fill the allocation. >>> radeon_sa_bo_next_hole will keep returning true looping over and over >>> on all the all. That's why i only restrict my patch to 2 hole skeeping >>> and then fails the allocation or try to wait. I believe sadly we need >>> an heuristic and 2 hole skeeping at most sounded like a good one. >> >> Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in >> conjunction with radeon_sa_bo_try_free are eating up the opportunities for >> holes. >> >> Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with >> the exception for allocating in a complete scattered buffer, and even then >> it will never loop more often than halve the number of current allocations >> (and that is really really unlikely). >> >> Cheers, >> Christian. > > I looked again and yes it can loop infinitly, think of hole you can > never free ie radeon_sa_bo_try_free can't free anything. This > situation can happen if you have several thread allocating sa bo at > the same time while none of them are yet done with there sa_bo (ie > none have call sa_bo_free yet). I updated a v3 that track oldest and > fix all things you were pointing out above. > > http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch > > Cheers, > Jerome Of course by tracking oldest it defeat the algo so updated patch : http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch Just fix the corner case of list of single entry. Cheers, Jerome
Enhancing EDID quirk functionality
On 05/03/2012 02:42 PM, Adam Jackson wrote: > This looks good, thank you for taking it on. It was either that or give up on my big display, so ... you're welcome. > I'd like to see documentation for the bit values of the quirks as well. > And, ideally, this would also have some runtime API for manipulating the > quirk list, so that way you can test new quirks without needing a reboot > cycle. I agree that the bit values should be documented. I'm not sure where that documentation should go, however, since I can't find any documentation of the existing drm module parameters. Tell me where it should go, and I'll happily write the doc. I also agree that it would be nice to be able to manipulate the quirk list at runtime, and I did think about trying to enable that. I held off for a couple of reasons: 1) I'm a total noob at kernel code, so things like in-kernel locking, sysfs, memory management, etc., that would be required for a more dynamic API are all new to me. That said, I'm more that willing to give it a go, if I can get some guidance on those (and similar) topics. 2) I'm not sure how a runtime API should work. The simplest possibility is to just take a string, parse it, and overwrite the old extra quirk list with the new list. The downside to this is that all of the existing extra quirks need to be repeated to change a single quirk. > To close the loop all the way on that I'd also want to be able to scrape > the quirk list back out from that API, but that's not completely clean > right now. Sound like a couple of sysfs files to me, one for the built-in quirks and one for the extra quirks -- maybe one quirk per line? See my comments about the sysfs API above. > We're being a little cavalier with the quirk list as it > stands because we don't differentiate among phy layers, and I can easily > imagine a monitor that needs a quirk on DVI but where the same quirk on > the same monitors' VGA would break it. I don't think this has caused > problems yet, but. Now you're above my pay grade. What little I've read discovered about the way DisplayPort, HDMI, VGA, and DVI play together makes me think this is a nightmare best deferred, hopefully forever. > InfoFrames are not valid for non-HDMI sinks, so yes, I'd call that a bug. That's pretty much what I figured. > Where the EDID for DP-1 appears to be truncated: the "extension" field > (second byte from the end) is 1 as you'd expect for an HDMI monitor, but > there's no extension block. How big of a file do you get from > /sys/class/drm/*/edid for that port? The EDID data in sysfs is 256 bytes, which I believe means that it does include the extension block. I just tried connecting an HDMI TV to my laptop, and I saw the same behavior -- 256-byte edid file in sysfs, but "xrandr --verbose" only shows 128 bytes. When I attach the same TV to my workstation with Intel "HD 2000" graphics, "xrandr --verbose" shows all 256 bytes of EDID data. So it appears that the full data is being read by both systems, but the behavior of xrandr (or presumably whatever API xrandr uses to get the EDID data that it displays) differs between the two drivers. Fun. Thanks! -- Ian Pilcher arequipeno at gmail.com "If you're going to shift my paradigm ... at least buy me dinner first."
[PULL] drm-intel-next manual merge
Hi Dave, As discussed on irc, here's the pull request for the manual merge to unconfuse git about the changes in intel_display.c. Note that I've manually frobbed the shortlog to exclude all the changes merge through Linus' tree. Yours, Daniel The following changes since commit 5bc69bf9aeb73547cad8e1ce683a103fe9728282: Merge tag 'drm-intel-next-2012-04-23' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next (2012-05-02 09:22:29 +0100) are available in the git repository at: git://people.freedesktop.org/~danvet/drm-intel for-airlied for you to fetch changes up to dc257cf154be708ecc47b8b89c12ad8cd2cc35e4: Merge tag 'v3.4-rc6' into drm-intel-next (2012-05-07 14:02:14 +0200) Daniel Vetter (1): Merge tag 'v3.4-rc6' into drm-intel-next -- Daniel Vetter Mail: daniel at ffwll.ch Mobile: +41 (0)79 365 57 48
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 --- Comment #2 from Tom Stellard 2012-05-07 07:18:45 PDT --- If you re-run autogen.sh and configure does that fix the problem? -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[PATCH v2 3/4] drm/exynos: added userptr feature.
On Sat, May 5, 2012 at 6:22 AM, Dave Airlie wrote: > On Sat, May 5, 2012 at 11:19 AM, wrote: >> Hi Dave, >> >> 2012. 4. 25. ?? 7:15 Dave Airlie ??: >> >>> On Tue, Apr 24, 2012 at 6:17 AM, Inki Dae wrote: this feature could be used to use memory region allocated by malloc() in user mode and mmaped memory region allocated by other memory allocators. userptr interface can identify memory type through vm_flags value and would get pages or page frame numbers to user space appropriately. >>> >>> Is there anything to stop the unpriviledged userspace driver locking >>> all the RAM in the machine inside userptr? >>> >> >> you mean that there is something that it can stop user space driver locking >> some memory region of RAM? and if any user space driver locked some region >> then anyone on user space can't access the region? could you please tell me >> about your concerns in more detail so that we can solve the issue? I guess >> you mean that any user level driver such as specific EGL library can >> allocate some memory region and also lock the region so that other user >> space applications can't access the region until rendering is completed by >> hw accelerator such as 2d/3d core or opposite case. >> >> actually, this feature has already been used by v4l2 so I didn't try to >> consider we could face with any problem with this and I've got a feeling >> maybe there is something I missed so I'd be happy for you or anyone give me >> any advices. > > Well v4l get to make their own bad design decisions. > > The problem is if an unprivledged users accessing the drm can lock all > the pages it allocates into memory, by passing them to the kernel as > userptrs., thus bypassing the swap and blocking all other users on the > system. > > Dave. Beside that you are not locking the vma and afaik this means that the page backing the vma might change, yes you will still own the page you get but userspace might be reading/writing to different pages. The vma would need to be locked but than the userspace might unlock it in your back and you start right from the begining. Cheers, Jerome
[RFC][PATCH] drm/radeon/hdmi: define struct for AVI infoframe
On Mon, May 7, 2012 at 3:38 AM, Michel D?nzer wrote: > On Son, 2012-05-06 at 18:29 +0200, Rafa? Mi?ecki wrote: >> 2012/5/6 Dave Airlie : >> > On Sun, May 6, 2012 at 5:19 PM, Rafa? Mi?ecki wrote: >> >> 2012/5/6 Rafa? Mi?ecki : >> >>> diff --git a/drivers/gpu/drm/radeon/r600_hdmi.c >> >>> b/drivers/gpu/drm/radeon/r600_hdmi.c >> >>> index c308432..b14c90a 100644 >> >>> --- a/drivers/gpu/drm/radeon/r600_hdmi.c >> >>> +++ b/drivers/gpu/drm/radeon/r600_hdmi.c >> >>> @@ -134,78 +134,22 @@ static void r600_hdmi_infoframe_checksum(uint8_t >> >>> packetType, >> >>> ?} >> >>> >> >>> ?/* >> >>> - * build a HDMI Video Info Frame >> >>> + * Upload a HDMI AVI Infoframe >> >>> ?*/ >> >>> -static void r600_hdmi_videoinfoframe( >> >>> - ? ? ? struct drm_encoder *encoder, >> >>> - ? ? ? enum r600_hdmi_color_format color_format, >> >>> - ? ? ? int active_information_present, >> >>> - ? ? ? uint8_t active_format_aspect_ratio, >> >>> - ? ? ? uint8_t scan_information, >> >>> - ? ? ? uint8_t colorimetry, >> >>> - ? ? ? uint8_t ex_colorimetry, >> >>> - ? ? ? uint8_t quantization, >> >>> - ? ? ? int ITC, >> >>> - ? ? ? uint8_t picture_aspect_ratio, >> >>> - ? ? ? uint8_t video_format_identification, >> >>> - ? ? ? uint8_t pixel_repetition, >> >>> - ? ? ? uint8_t non_uniform_picture_scaling, >> >>> - ? ? ? uint8_t bar_info_data_valid, >> >>> - ? ? ? uint16_t top_bar, >> >>> - ? ? ? uint16_t bottom_bar, >> >>> - ? ? ? uint16_t left_bar, >> >>> - ? ? ? uint16_t right_bar >> >>> -) >> >> >> >> In case someone wonders about the reason: I think it's really ugly to >> >> have a function taking 18 arguments, 17 of them related to the >> >> infoframe. It makes much more sense for me to use struct for that. >> >> While working on that I though it's reasonable to prepare nice >> >> bitfield __packed struct ready-to-be-written to the GPU registers. >> > >> > won't this screw up on other endian machines? >> >> Hm, maybe it can. Is there some easy to handle it correctly? Some trick like >> __le8 foo: 3 >> __le8 bar: 1 >> maybe? > > Not really. The memory layout of bitfields is basically completely up to > the C implementation, so IMHO they're just inadequate for describing > fixed memory layouts. > > Yes i agree please stay away from bitfields, i know it looks cool but bitshift is cool too. Cheers, Jerome
Fwd: [PATCH 14/20] drm/radeon: multiple ring allocator v2
> On 07.05.2012 17:23, Jerome Glisse wrote: >> >> On Mon, May 7, 2012 at 7:42 AM, Christian K?nig >> ?wrote: >>> >>> A startover with a new idea for a multiple ring allocator. >>> Should perform as well as a normal ring allocator as long >>> as only one ring does somthing, but falls back to a more >>> complex algorithm if more complex things start to happen. >>> >>> We store the last allocated bo in last, we always try to allocate >>> after the last allocated bo. Principle is that in a linear GPU ring >>> progression was is after last is the oldest bo we allocated and thus >>> the first one that should no longer be in use by the GPU. >>> >>> If it's not the case we skip over the bo after last to the closest >>> done bo if such one exist. If none exist and we are not asked to >>> block we report failure to allocate. >>> >>> If we are asked to block we wait on all the oldest fence of all >>> rings. We just wait for any of those fence to complete. >>> >>> v2: We need to be able to let hole point to the list_head, otherwise >>> ? ?try free will never free the first allocation of the list. Also >>> ? ?stop calling radeon_fence_signalled more than necessary. >>> >>> Signed-off-by: Christian K?nig >>> Signed-off-by: Jerome Glisse >> >> This one is NAK please use my patch. Yes in my patch we never try to >> free anything if there is only on sa_bo in the list if you really care >> about this it's a one line change: >> >> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch > > Nope that won't work correctly, "last" is pointing to the last allocation > and that's the most unlikely to be freed at this time. Also in this version > (like in the one before) radeon_sa_bo_next_hole lets hole point to the > "prev" of the found sa_bo without checking if this isn't the lists head. > That might cause a crash if an to be freed allocation is the first one in > the buffer. > > What radeon_sa_bo_try_free would need to do to get your approach working is > to loop over the end of the buffer and also try to free at the beginning, > but saying that keeping the last allocation results in a whole bunch of > extra cases and "if"s, while just keeping a pointer to the "hole" (e.g. > where the next allocation is most likely to succeed) simplifies the code > quite a bit (but I agree that on the down side it makes it harder to > understand). > >> Your patch here can enter in infinite loop and never return holding >> the lock. See below. >> >> [SNIP] >> >>> + ? ? ? ? ? ? ? } while (radeon_sa_bo_next_hole(sa_manager, fences)); >> >> Here you can infinite loop, in the case there is a bunch of hole in >> the allocator but none of them allow to full fill the allocation. >> radeon_sa_bo_next_hole will keep returning true looping over and over >> on all the all. That's why i only restrict my patch to 2 hole skeeping >> and then fails the allocation or try to wait. I believe sadly we need >> an heuristic and 2 hole skeeping at most sounded like a good one. > > Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in > conjunction with radeon_sa_bo_try_free are eating up the opportunities for > holes. > > Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with > the exception for allocating in a complete scattered buffer, and even then > it will never loop more often than halve the number of current allocations > (and that is really really unlikely). > > Cheers, > Christian. I looked again and yes it can loop infinitly, think of hole you can never free ie radeon_sa_bo_try_free can't free anything. This situation can happen if you have several thread allocating sa bo at the same time while none of them are yet done with there sa_bo (ie none have call sa_bo_free yet). I updated a v3 that track oldest and fix all things you were pointing out above. http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch Cheers, Jerome
[PATCH 20/20] drm/radeon: make the ib an inline object
From: Jerome GlisseNo need to malloc it any more. Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/evergreen_cs.c | 10 +++--- drivers/gpu/drm/radeon/r100.c | 38 ++-- drivers/gpu/drm/radeon/r200.c |2 +- drivers/gpu/drm/radeon/r300.c |4 +- drivers/gpu/drm/radeon/r600.c | 16 drivers/gpu/drm/radeon/r600_cs.c | 22 +-- drivers/gpu/drm/radeon/radeon.h |8 ++-- drivers/gpu/drm/radeon/radeon_cs.c| 63 - drivers/gpu/drm/radeon/radeon_ring.c | 41 +++-- 9 files changed, 93 insertions(+), 111 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c index 70089d3..4e7dd2b 100644 --- a/drivers/gpu/drm/radeon/evergreen_cs.c +++ b/drivers/gpu/drm/radeon/evergreen_cs.c @@ -1057,7 +1057,7 @@ static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg, wait_reg_mem_info; volatile uint32_t *ib; - ib = p->ib->ptr; + ib = p->ib.ptr; /* parse the WAIT_REG_MEM */ r = evergreen_cs_packet_parse(p, _reg_mem, p->idx); @@ -1215,7 +1215,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx) if (!(evergreen_reg_safe_bm[i] & m)) return 0; } - ib = p->ib->ptr; + ib = p->ib.ptr; switch (reg) { /* force following reg to 0 in an attempt to disable out buffer * which will need us to better understand how it works to perform @@ -1896,7 +1896,7 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p, u32 idx_value; track = (struct evergreen_cs_track *)p->track; - ib = p->ib->ptr; + ib = p->ib.ptr; idx = pkt->idx + 1; idx_value = radeon_get_ib_value(p, idx); @@ -2610,8 +2610,8 @@ int evergreen_cs_parse(struct radeon_cs_parser *p) } } while (p->idx < p->chunks[p->chunk_ib_idx].length_dw); #if 0 - for (r = 0; r < p->ib->length_dw; r++) { - printk(KERN_INFO "%05d 0x%08X\n", r, p->ib->ptr[r]); + for (r = 0; r < p->ib.length_dw; r++) { + printk(KERN_INFO "%05d 0x%08X\n", r, p->ib.ptr[r]); mdelay(1); } #endif diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c index ad6ceb7..0874a6d 100644 --- a/drivers/gpu/drm/radeon/r100.c +++ b/drivers/gpu/drm/radeon/r100.c @@ -139,9 +139,9 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p, } tmp |= tile_flags; - p->ib->ptr[idx] = (value & 0x3fc0) | tmp; + p->ib.ptr[idx] = (value & 0x3fc0) | tmp; } else - p->ib->ptr[idx] = (value & 0xffc0) | tmp; + p->ib.ptr[idx] = (value & 0xffc0) | tmp; return 0; } @@ -156,7 +156,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p, volatile uint32_t *ib; u32 idx_value; - ib = p->ib->ptr; + ib = p->ib.ptr; track = (struct r100_cs_track *)p->track; c = radeon_get_ib_value(p, idx++) & 0x1F; if (c > 16) { @@ -1275,7 +1275,7 @@ void r100_cs_dump_packet(struct radeon_cs_parser *p, unsigned i; unsigned idx; - ib = p->ib->ptr; + ib = p->ib.ptr; idx = pkt->idx; for (i = 0; i <= (pkt->count + 1); i++, idx++) { DRM_INFO("ib[%d]=0x%08X\n", idx, ib[idx]); @@ -1354,7 +1354,7 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p) uint32_t header, h_idx, reg; volatile uint32_t *ib; - ib = p->ib->ptr; + ib = p->ib.ptr; /* parse the wait until */ r = r100_cs_packet_parse(p, , p->idx); @@ -1533,7 +1533,7 @@ static int r100_packet0_check(struct radeon_cs_parser *p, u32 tile_flags = 0; u32 idx_value; - ib = p->ib->ptr; + ib = p->ib.ptr; track = (struct r100_cs_track *)p->track; idx_value = radeon_get_ib_value(p, idx); @@ -1889,7 +1889,7 @@ static int r100_packet3_check(struct radeon_cs_parser *p, volatile uint32_t *ib; int r; - ib = p->ib->ptr; + ib = p->ib.ptr; idx = pkt->idx + 1; track = (struct r100_cs_track *)p->track; switch (pkt->opcode) { @@ -3684,7 +3684,7 @@ void r100_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib) int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring) { - struct radeon_ib *ib; + struct radeon_ib ib; uint32_t scratch; uint32_t tmp = 0; unsigned i; @@ -3700,22 +3700,22 @@ int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring) if (r) { return r; } - ib->ptr[0] = PACKET0(scratch, 0); - ib->ptr[1] = 0xDEADBEEF; -
[PATCH 19/20] drm/radeon: remove r600 blit mutex v2
If we don't store local data into global variables it isn't necessary to lock anything. v2: rebased on new SA interface Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/evergreen_blit_kms.c |1 - drivers/gpu/drm/radeon/r600.c | 13 +--- drivers/gpu/drm/radeon/r600_blit_kms.c | 99 +++ drivers/gpu/drm/radeon/radeon.h |3 - drivers/gpu/drm/radeon/radeon_asic.h|9 ++- 5 files changed, 50 insertions(+), 75 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c index 222acd2..30f0480 100644 --- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c +++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c @@ -637,7 +637,6 @@ int evergreen_blit_init(struct radeon_device *rdev) if (rdev->r600_blit.shader_obj) goto done; - mutex_init(>r600_blit.mutex); rdev->r600_blit.state_offset = 0; if (rdev->family < CHIP_CAYMAN) diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 0ae2d2d..9d6009a 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2363,20 +2363,15 @@ int r600_copy_blit(struct radeon_device *rdev, unsigned num_gpu_pages, struct radeon_fence *fence) { + struct radeon_sa_bo *vb = NULL; int r; - mutex_lock(>r600_blit.mutex); - rdev->r600_blit.vb_ib = NULL; - r = r600_blit_prepare_copy(rdev, num_gpu_pages); + r = r600_blit_prepare_copy(rdev, num_gpu_pages, ); if (r) { - if (rdev->r600_blit.vb_ib) - radeon_ib_free(rdev, >r600_blit.vb_ib); - mutex_unlock(>r600_blit.mutex); return r; } - r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages); - r600_blit_done_copy(rdev, fence); - mutex_unlock(>r600_blit.mutex); + r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages, vb); + r600_blit_done_copy(rdev, fence, vb); return 0; } diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c index db38f58..ef20822 100644 --- a/drivers/gpu/drm/radeon/r600_blit_kms.c +++ b/drivers/gpu/drm/radeon/r600_blit_kms.c @@ -513,7 +513,6 @@ int r600_blit_init(struct radeon_device *rdev) rdev->r600_blit.primitives.set_default_state = set_default_state; rdev->r600_blit.ring_size_common = 40; /* shaders + def state */ - rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */ rdev->r600_blit.ring_size_common += 5; /* done copy */ rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */ @@ -528,7 +527,6 @@ int r600_blit_init(struct radeon_device *rdev) if (rdev->r600_blit.shader_obj) goto done; - mutex_init(>r600_blit.mutex); rdev->r600_blit.state_offset = 0; if (rdev->family >= CHIP_RV770) @@ -621,27 +619,6 @@ void r600_blit_fini(struct radeon_device *rdev) radeon_bo_unref(>r600_blit.shader_obj); } -static int r600_vb_ib_get(struct radeon_device *rdev, unsigned size) -{ - int r; - r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX, - >r600_blit.vb_ib, size); - if (r) { - DRM_ERROR("failed to get IB for vertex buffer\n"); - return r; - } - - rdev->r600_blit.vb_total = size; - rdev->r600_blit.vb_used = 0; - return 0; -} - -static void r600_vb_ib_put(struct radeon_device *rdev) -{ - radeon_fence_emit(rdev, rdev->r600_blit.vb_ib->fence); - radeon_ib_free(rdev, >r600_blit.vb_ib); -} - static unsigned r600_blit_create_rect(unsigned num_gpu_pages, int *width, int *height, int max_dim) { @@ -688,7 +665,8 @@ static unsigned r600_blit_create_rect(unsigned num_gpu_pages, } -int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) +int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages, + struct radeon_sa_bo **vb) { struct radeon_ring *ring = >ring[RADEON_RING_TYPE_GFX_INDEX]; int r; @@ -705,46 +683,54 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages) } /* 48 bytes for vertex per loop */ - r = r600_vb_ib_get(rdev, (num_loops*48)+256); - if (r) + r = radeon_sa_bo_new(rdev, >ring_tmp_bo, vb, +(num_loops*48)+256, 256, true); + if (r) { return r; + } /* calculate number of loops correctly */ ring_size = num_loops * dwords_per_loop; ring_size += rdev->r600_blit.ring_size_common; r = radeon_ring_lock(rdev, ring, ring_size); - if (r) + if (r) { + radeon_sa_bo_free(rdev, vb, NULL); return r; + }
[PATCH 18/20] drm/radeon: move the semaphore from the fence into the ib
From: Jerome GlisseIt never really belonged there in the first place. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h | 16 drivers/gpu/drm/radeon/radeon_cs.c|4 ++-- drivers/gpu/drm/radeon/radeon_fence.c |3 --- drivers/gpu/drm/radeon/radeon_ring.c |2 ++ 4 files changed, 12 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 6170307..9507be0 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -272,7 +272,6 @@ struct radeon_fence { uint64_tseq; /* RB, DMA, etc. */ unsignedring; - struct radeon_semaphore *semaphore; }; int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring); @@ -624,13 +623,14 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo *sa_bo; - uint32_tlength_dw; - uint64_tgpu_addr; - uint32_t*ptr; - struct radeon_fence *fence; - unsignedvm_id; - boolis_const_ib; + struct radeon_sa_bo *sa_bo; + uint32_tlength_dw; + uint64_tgpu_addr; + uint32_t*ptr; + struct radeon_fence *fence; + unsignedvm_id; + boolis_const_ib; + struct radeon_semaphore *semaphore; }; struct radeon_ring { diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 5c065bf..dcfe2a0 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -138,12 +138,12 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) return 0; } - r = radeon_semaphore_create(p->rdev, >ib->fence->semaphore); + r = radeon_semaphore_create(p->rdev, >ib->semaphore); if (r) { return r; } - return radeon_semaphore_sync_rings(p->rdev, p->ib->fence->semaphore, + return radeon_semaphore_sync_rings(p->rdev, p->ib->semaphore, sync_to_ring, p->ring); } diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 6767381..c1f5233 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -137,8 +137,6 @@ static void radeon_fence_destroy(struct kref *kref) fence = container_of(kref, struct radeon_fence, kref); fence->seq = RADEON_FENCE_NOTEMITED_SEQ; - if (fence->semaphore) - radeon_semaphore_free(fence->rdev, fence->semaphore, NULL); kfree(fence); } @@ -154,7 +152,6 @@ int radeon_fence_create(struct radeon_device *rdev, (*fence)->rdev = rdev; (*fence)->seq = RADEON_FENCE_NOTEMITED_SEQ; (*fence)->ring = ring; - (*fence)->semaphore = NULL; return 0; } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index b3d6942..af8e1ee 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -93,6 +93,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring, (*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo); (*ib)->vm_id = 0; (*ib)->is_const_ib = false; + (*ib)->semaphore = NULL; return 0; } @@ -105,6 +106,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) if (tmp == NULL) { return; } + radeon_semaphore_free(rdev, tmp->semaphore, tmp->fence); radeon_sa_bo_free(rdev, >sa_bo, tmp->fence); radeon_fence_unref(>fence); kfree(tmp); -- 1.7.5.4
[PATCH 17/20] drm/radeon: immediately free ttm-move semaphore
We can now protected the semaphore ram by a fence, so free it immediately. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_ttm.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 5e3d54d..0f6aee8 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -223,6 +223,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo, struct radeon_device *rdev; uint64_t old_start, new_start; struct radeon_fence *fence, *old_fence; + struct radeon_semaphore *sem = NULL; int r; rdev = radeon_get_rdev(bo->bdev); @@ -272,15 +273,16 @@ static int radeon_move_blit(struct ttm_buffer_object *bo, bool sync_to_ring[RADEON_NUM_RINGS] = { }; sync_to_ring[old_fence->ring] = true; - r = radeon_semaphore_create(rdev, >semaphore); + r = radeon_semaphore_create(rdev, ); if (r) { radeon_fence_unref(); return r; } - r = radeon_semaphore_sync_rings(rdev, fence->semaphore, + r = radeon_semaphore_sync_rings(rdev, sem, sync_to_ring, fence->ring); if (r) { + radeon_semaphore_free(rdev, sem, NULL); radeon_fence_unref(); return r; } @@ -292,6 +294,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo, /* FIXME: handle copy error */ r = ttm_bo_move_accel_cleanup(bo, (void *)fence, NULL, evict, no_wait_reserve, no_wait_gpu, new_mem); + radeon_semaphore_free(rdev, sem, fence); radeon_fence_unref(); return r; } -- 1.7.5.4
[PATCH 16/20] drm/radeon: rip out the ib pool
From: Jerome GlisseIt isn't necessary any more and the suballocator seems to perform even better. Signed-off-by: Christian K?nig Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon.h | 17 +-- drivers/gpu/drm/radeon/radeon_device.c|1 - drivers/gpu/drm/radeon/radeon_gart.c | 12 +- drivers/gpu/drm/radeon/radeon_ring.c | 241 - drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 5 files changed, 71 insertions(+), 202 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 45164e1..6170307 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -625,7 +625,6 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); struct radeon_ib { struct radeon_sa_bo *sa_bo; - unsignedidx; uint32_tlength_dw; uint64_tgpu_addr; uint32_t*ptr; @@ -634,18 +633,6 @@ struct radeon_ib { boolis_const_ib; }; -/* - * locking - - * mutex protects scheduled_ibs, ready, alloc_bm - */ -struct radeon_ib_pool { - struct radeon_mutex mutex; - struct radeon_sa_managersa_manager; - struct radeon_ibibs[RADEON_IB_POOL_SIZE]; - boolready; - unsignedhead_id; -}; - struct radeon_ring { struct radeon_bo*ring_obj; volatile uint32_t *ring; @@ -787,7 +774,6 @@ struct si_rlc { int radeon_ib_get(struct radeon_device *rdev, int ring, struct radeon_ib **ib, unsigned size); void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib); -bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_pool_init(struct radeon_device *rdev); void radeon_ib_pool_fini(struct radeon_device *rdev); @@ -1522,7 +1508,8 @@ struct radeon_device { wait_queue_head_t fence_queue; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; - struct radeon_ib_pool ib_pool; + boolib_pool_ready; + struct radeon_sa_managerring_tmp_bo; struct radeon_irq irq; struct radeon_asic *asic; struct radeon_gem gem; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 48876c1..e1bc7e9 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,7 +724,6 @@ int radeon_device_init(struct radeon_device *rdev, /* mutex initialization are all done here so we * can recall function without having locking issues */ radeon_mutex_init(>cs_mutex); - radeon_mutex_init(>ib_pool.mutex); mutex_init(>ring_lock); mutex_init(>dc_hw_i2c_mutex); if (rdev->family >= CHIP_R600) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 53dba8e..8e9ef34 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -432,8 +432,8 @@ retry_id: rdev->vm_manager.use_bitmap |= 1 << id; vm->id = id; list_add_tail(>list, >vm_manager.lru_vm); - return radeon_vm_bo_update_pte(rdev, vm, rdev->ib_pool.sa_manager.bo, - >ib_pool.sa_manager.bo->tbo.mem); + return radeon_vm_bo_update_pte(rdev, vm, rdev->ring_tmp_bo.bo, + >ring_tmp_bo.bo->tbo.mem); } /* object have to be reserved */ @@ -631,7 +631,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm) /* map the ib pool buffer at 0 in virtual address space, set * read only */ - r = radeon_vm_bo_add(rdev, vm, rdev->ib_pool.sa_manager.bo, 0, + r = radeon_vm_bo_add(rdev, vm, rdev->ring_tmp_bo.bo, 0, RADEON_VM_PAGE_READABLE | RADEON_VM_PAGE_SNOOPED); return r; } @@ -648,12 +648,12 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm) radeon_mutex_unlock(>cs_mutex); /* remove all bo */ - r = radeon_bo_reserve(rdev->ib_pool.sa_manager.bo, false); + r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false); if (!r) { - bo_va = radeon_bo_va(rdev->ib_pool.sa_manager.bo, vm); + bo_va = radeon_bo_va(rdev->ring_tmp_bo.bo, vm); list_del_init(_va->bo_list); list_del_init(_va->vm_list); - radeon_bo_unreserve(rdev->ib_pool.sa_manager.bo); + radeon_bo_unreserve(rdev->ring_tmp_bo.bo); kfree(bo_va); } if
[PATCH 15/20] drm/radeon: simplify semaphore handling v2
From: Jerome GlisseDirectly use the suballocator to get small chunks of memory. It's equally fast and doesn't crash when we encounter a GPU reset. v2: rebased on new SA interface. Signed-off-by: Christian K?nig Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/evergreen.c|1 - drivers/gpu/drm/radeon/ni.c |1 - drivers/gpu/drm/radeon/r600.c |1 - drivers/gpu/drm/radeon/radeon.h | 29 +- drivers/gpu/drm/radeon/radeon_device.c|2 - drivers/gpu/drm/radeon/radeon_fence.c |2 +- drivers/gpu/drm/radeon/radeon_semaphore.c | 137 + drivers/gpu/drm/radeon/radeon_test.c |4 +- drivers/gpu/drm/radeon/rv770.c|1 - drivers/gpu/drm/radeon/si.c |1 - 10 files changed, 30 insertions(+), 149 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index ecc29bc..7e7ac3d 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3550,7 +3550,6 @@ void evergreen_fini(struct radeon_device *rdev) evergreen_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_agp_fini(rdev); radeon_bo_fini(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 9cd2657..107b217 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1744,7 +1744,6 @@ void cayman_fini(struct radeon_device *rdev) cayman_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 87a2333..0ae2d2d 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2658,7 +2658,6 @@ void r600_fini(struct radeon_device *rdev) r600_vram_scratch_fini(rdev); radeon_agp_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cc7f16a..45164e1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -434,34 +434,13 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv, /* * Semaphores. */ -struct radeon_ring; - -#defineRADEON_SEMAPHORE_BO_SIZE256 - -struct radeon_semaphore_driver { - rwlock_tlock; - struct list_headbo; -}; - -struct radeon_semaphore_bo; - /* everything here is constant */ struct radeon_semaphore { - struct list_headlist; + struct radeon_sa_bo *sa_bo; + signed waiters; uint64_tgpu_addr; - uint32_t*cpu_ptr; - struct radeon_semaphore_bo *bo; }; -struct radeon_semaphore_bo { - struct list_headlist; - struct radeon_ib*ib; - struct list_headfree; - struct radeon_semaphore semaphores[RADEON_SEMAPHORE_BO_SIZE/8]; - unsignednused; -}; - -void radeon_semaphore_driver_fini(struct radeon_device *rdev); int radeon_semaphore_create(struct radeon_device *rdev, struct radeon_semaphore **semaphore); void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring, @@ -473,7 +452,8 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev, bool sync_to[RADEON_NUM_RINGS], int dst_ring); void radeon_semaphore_free(struct radeon_device *rdev, - struct radeon_semaphore *semaphore); + struct radeon_semaphore *semaphore, + struct radeon_fence *fence); /* * GART structures, functions & helpers @@ -1540,7 +1520,6 @@ struct radeon_device { struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; wait_queue_head_t fence_queue; - struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index b827b2e..48876c1 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -732,11 +732,9 @@ int
[PATCH 14/20] drm/radeon: multiple ring allocator v2
A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen. We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate. If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete. v2: We need to be able to let hole point to the list_head, otherwise try free will never free the first allocation of the list. Also stop calling radeon_fence_signalled more than necessary. Signed-off-by: Christian K?nig Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon.h |7 +- drivers/gpu/drm/radeon/radeon_ring.c | 19 +-- drivers/gpu/drm/radeon/radeon_sa.c | 292 +++--- 3 files changed, 210 insertions(+), 108 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 37a7459..cc7f16a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -385,7 +385,9 @@ struct radeon_bo_list { struct radeon_sa_manager { spinlock_t lock; struct radeon_bo*bo; - struct list_headsa_bo; + struct list_head*hole; + struct list_headflist[RADEON_NUM_RINGS]; + struct list_headolist; unsignedsize; uint64_tgpu_addr; void*cpu_ptr; @@ -396,7 +398,8 @@ struct radeon_sa_bo; /* sub-allocation buffer */ struct radeon_sa_bo { - struct list_headlist; + struct list_headolist; + struct list_headflist; struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 1748d93..e074ff5 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib) int radeon_ib_pool_init(struct radeon_device *rdev) { - struct radeon_sa_manager tmp; int i, r; - r = radeon_sa_bo_manager_init(rdev, , - RADEON_IB_POOL_SIZE*64*1024, - RADEON_GEM_DOMAIN_GTT); - if (r) { - return r; - } - radeon_mutex_lock(>ib_pool.mutex); if (rdev->ib_pool.ready) { radeon_mutex_unlock(>ib_pool.mutex); - radeon_sa_bo_manager_fini(rdev, ); return 0; } - rdev->ib_pool.sa_manager = tmp; - INIT_LIST_HEAD(>ib_pool.sa_manager.sa_bo); + r = radeon_sa_bo_manager_init(rdev, >ib_pool.sa_manager, + RADEON_IB_POOL_SIZE*64*1024, + RADEON_GEM_DOMAIN_GTT); + if (r) { + radeon_mutex_unlock(>ib_pool.mutex); + return r; + } + for (i = 0; i < RADEON_IB_POOL_SIZE; i++) { rdev->ib_pool.ibs[i].fence = NULL; rdev->ib_pool.ibs[i].idx = i; diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 90ee8ad..757a9d4 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -27,21 +27,42 @@ * Authors: *Jerome Glisse */ +/* Algorithm: + * + * We store the last allocated bo in "hole", we always try to allocate + * after the last allocated bo. Principle is that in a linear GPU ring + * progression was is after last is the oldest bo we allocated and thus + * the first one that should no longer be in use by the GPU. + * + * If it's not the case we skip over the bo after last to the closest + * done bo if such one exist. If none exist and we are not asked to + * block we report failure to allocate. + * + * If we are asked to block we wait on all the oldest fence of all + * rings. We just wait for any of those fence to complete. + */ #include "drmP.h" #include "drm.h" #include "radeon.h" +static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo); +static void radeon_sa_bo_try_free(struct radeon_sa_manager *sa_manager); + int radeon_sa_bo_manager_init(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, unsigned size, u32 domain) { - int r; + int i,
[PATCH 13/20] drm/radeon: use one wait queue for all rings add fence_wait_any v2
From: Jerome GlisseUse one wait queue for all rings. When one ring progress, other likely does to and we are not expecting to have a lot of waiter anyway. Also add a fence_wait_any that will wait until the first fence in the fence array (one fence per ring) is signaled. This allow to wait on all rings. v2: some minor cleanups and improvements. Signed-off-by: Christian K?nig Signed-off-by: Jerome Glisse --- drivers/gpu/drm/radeon/radeon.h |5 +- drivers/gpu/drm/radeon/radeon_fence.c | 163 - 2 files changed, 162 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index ada70d1..37a7459 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -262,7 +262,6 @@ struct radeon_fence_driver { uint64_tseq; atomic64_t last_seq; unsigned long last_activity; - wait_queue_head_t queue; boolinitialized; }; @@ -286,6 +285,9 @@ bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_any(struct radeon_device *rdev, + struct radeon_fence **fences, + bool intr); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); @@ -1534,6 +1536,7 @@ struct radeon_device { struct radeon_scratch scratch; struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; + wait_queue_head_t fence_queue; struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 8034b42..45d4e6e 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -222,11 +222,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, trace_radeon_fence_wait_begin(rdev->ddev, seq); radeon_irq_kms_sw_irq_get(rdev, ring); if (intr) { - r = wait_event_interruptible_timeout(rdev->fence_drv[ring].queue, + r = wait_event_interruptible_timeout(rdev->fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } else { - r = wait_event_timeout(rdev->fence_drv[ring].queue, + r = wait_event_timeout(rdev->fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } @@ -300,6 +300,159 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return 0; } +bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq) +{ + unsigned i; + + for (i = 0; i < RADEON_NUM_RINGS; ++i) { + if (seq[i] && radeon_fence_seq_signaled(rdev, seq[i], i)) { + return true; + } + } + return false; +} + +static int radeon_fence_wait_any_seq(struct radeon_device *rdev, +u64 *target_seq, bool intr) +{ + unsigned long timeout, last_activity, tmp; + unsigned i, ring = RADEON_NUM_RINGS; + bool signaled; + int r; + + for (i = 0, last_activity = 0; i < RADEON_NUM_RINGS; ++i) { + if (!target_seq[i]) { + continue; + } + + /* use the most recent one as indicator */ + if (time_after(rdev->fence_drv[i].last_activity, last_activity)) { + last_activity = rdev->fence_drv[i].last_activity; + } + + /* For lockup detection just pick the lowest ring we are +* actively waiting for +*/ + if (i < ring) { + ring = i; + } + } + + /* nothing to wait for ? */ + if (ring == RADEON_NUM_RINGS) { + return 0; + } + + while (!radeon_fence_any_seq_signaled(rdev, target_seq)) { + timeout = jiffies - RADEON_FENCE_JIFFIES_TIMEOUT; + if (time_after(last_activity, timeout)) { + /* the normal case, timeout is somewhere before last_activity */ +
[PATCH 12/20] drm/radeon: define new SA interface v3
Define the interface without modifying the allocation algorithm in any way. v2: rebase on top of fence new uint64 patch v3: add ring to debugfs output Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h |1 + drivers/gpu/drm/radeon/radeon_gart.c |6 +- drivers/gpu/drm/radeon/radeon_object.h|5 +- drivers/gpu/drm/radeon/radeon_ring.c |8 ++-- drivers/gpu/drm/radeon/radeon_sa.c| 60 drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 6 files changed, 63 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 9374ab1..ada70d1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -398,6 +398,7 @@ struct radeon_sa_bo { struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; + struct radeon_fence *fence; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c5789ef..53dba8e 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -326,7 +326,7 @@ static void radeon_vm_unbind_locked(struct radeon_device *rdev, rdev->vm_manager.use_bitmap &= ~(1 << vm->id); list_del_init(>list); vm->id = -1; - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); vm->pt = NULL; list_for_each_entry(bo_va, >va, vm_list) { @@ -395,7 +395,7 @@ int radeon_vm_bind(struct radeon_device *rdev, struct radeon_vm *vm) retry: r = radeon_sa_bo_new(rdev, >vm_manager.sa_manager, >sa_bo, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8), -RADEON_GPU_PAGE_SIZE); +RADEON_GPU_PAGE_SIZE, false); if (r) { if (list_empty(>vm_manager.lru_vm)) { return r; @@ -426,7 +426,7 @@ retry_id: /* do hw bind */ r = rdev->vm_manager.funcs->bind(rdev, vm, id); if (r) { - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); return r; } rdev->vm_manager.use_bitmap |= 1 << id; diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 4fc7f07..befec7d 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -169,9 +169,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, struct radeon_sa_bo **sa_bo, - unsigned size, unsigned align); + unsigned size, unsigned align, bool block); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo **sa_bo); + struct radeon_sa_bo **sa_bo, + struct radeon_fence *fence); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 45adb37..1748d93 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -85,7 +85,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib) if (ib->fence && ib->fence->seq < RADEON_FENCE_NOTEMITED_SEQ) { if (radeon_fence_signaled(ib->fence)) { radeon_fence_unref(>fence); - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); done = true; } } @@ -124,7 +124,7 @@ retry: if (rdev->ib_pool.ibs[idx].fence == NULL) { r = radeon_sa_bo_new(rdev, >ib_pool.sa_manager, >ib_pool.ibs[idx].sa_bo, -size, 256); +size, 256, false); if (!r) { *ib = >ib_pool.ibs[idx]; (*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo); @@ -173,7 +173,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) } radeon_mutex_lock(>ib_pool.mutex); if (tmp->fence && tmp->fence->seq == RADEON_FENCE_NOTEMITED_SEQ) { - radeon_sa_bo_free(rdev, >sa_bo); + radeon_sa_bo_free(rdev, >sa_bo, NULL); radeon_fence_unref(>fence); }
[PATCH 11/20] drm/radeon: make sa bo a stand alone object
Allocating and freeing it seperately. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h |4 ++-- drivers/gpu/drm/radeon/radeon_cs.c|4 ++-- drivers/gpu/drm/radeon/radeon_gart.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h|4 ++-- drivers/gpu/drm/radeon/radeon_ring.c |6 +++--- drivers/gpu/drm/radeon/radeon_sa.c| 28 +++- drivers/gpu/drm/radeon/radeon_semaphore.c |4 ++-- 7 files changed, 32 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index d1c2154..9374ab1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -638,7 +638,7 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); */ struct radeon_ib { - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; unsignedidx; uint32_tlength_dw; uint64_tgpu_addr; @@ -693,7 +693,7 @@ struct radeon_vm { unsignedlast_pfn; u64 pt_gpu_addr; u64 *pt; - struct radeon_sa_bo sa_bo; + struct radeon_sa_bo *sa_bo; struct mutexmutex; /* last fence for cs using this vm */ struct radeon_fence *fence; diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index b778037..5c065bf 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->const_ib->gpu_addr = parser->const_ib->sa_bo.soffset; + parser->const_ib->gpu_addr = parser->const_ib->sa_bo->soffset; r = radeon_ib_schedule(rdev, parser->const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->ib->gpu_addr = parser->ib->sa_bo.soffset; + parser->ib->gpu_addr = parser->ib->sa_bo->soffset; parser->ib->is_const_ib = false; r = radeon_ib_schedule(rdev, parser->ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 4a5d9d4..c5789ef 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -404,8 +404,8 @@ retry: radeon_vm_unbind(rdev, vm_evict); goto retry; } - vm->pt = radeon_sa_bo_cpu_addr(>sa_bo); - vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(>sa_bo); + vm->pt = radeon_sa_bo_cpu_addr(vm->sa_bo); + vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(vm->sa_bo); memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8)); retry_id: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 99ab46a..4fc7f07 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -168,10 +168,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager); extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, - struct radeon_sa_bo *sa_bo, + struct radeon_sa_bo **sa_bo, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo *sa_bo); + struct radeon_sa_bo **sa_bo); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index f49c9c0..45adb37 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -127,8 +127,8 @@ retry: size, 256); if (!r) { *ib = >ib_pool.ibs[idx]; - (*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo); - (*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo); + (*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo); + (*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo);
[PATCH 10/20] drm/radeon: keep start and end offset in the SA
Instead of offset + size keep start and end offset directly. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h|4 ++-- drivers/gpu/drm/radeon/radeon_cs.c |4 ++-- drivers/gpu/drm/radeon/radeon_object.h |4 ++-- drivers/gpu/drm/radeon/radeon_sa.c | 13 +++-- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 8a6b1b3..d1c2154 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -396,8 +396,8 @@ struct radeon_sa_bo; struct radeon_sa_bo { struct list_headlist; struct radeon_sa_manager*manager; - unsignedoffset; - unsignedsize; + unsignedsoffset; + unsignedeoffset; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 289b0d7..b778037 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->const_ib->gpu_addr = parser->const_ib->sa_bo.offset; + parser->const_ib->gpu_addr = parser->const_ib->sa_bo.soffset; r = radeon_ib_schedule(rdev, parser->const_ib); if (r) goto out; @@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev, /* ib pool is bind at 0 in virtual address space to gpu_addr is the * offset inside the pool bo */ - parser->ib->gpu_addr = parser->ib->sa_bo.offset; + parser->ib->gpu_addr = parser->ib->sa_bo.soffset; parser->ib->is_const_ib = false; r = radeon_ib_schedule(rdev, parser->ib); out: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index d9fca1e..99ab46a 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -149,12 +149,12 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo, static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo *sa_bo) { - return sa_bo->manager->gpu_addr + sa_bo->offset; + return sa_bo->manager->gpu_addr + sa_bo->soffset; } static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo) { - return sa_bo->manager->cpu_ptr + sa_bo->offset; + return sa_bo->manager->cpu_ptr + sa_bo->soffset; } extern int radeon_sa_bo_manager_init(struct radeon_device *rdev, diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 1db0568..3bea7ba 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -152,11 +152,11 @@ int radeon_sa_bo_new(struct radeon_device *rdev, offset = 0; list_for_each_entry(tmp, _manager->sa_bo, list) { /* room before this object ? */ - if (offset < tmp->offset && (tmp->offset - offset) >= size) { + if (offset < tmp->soffset && (tmp->soffset - offset) >= size) { head = tmp->list.prev; goto out; } - offset = tmp->offset + tmp->size; + offset = tmp->eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -166,7 +166,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, /* room at the end ? */ head = sa_manager->sa_bo.prev; tmp = list_entry(head, struct radeon_sa_bo, list); - offset = tmp->offset + tmp->size; + offset = tmp->eoffset; wasted = offset % align; if (wasted) { wasted = align - wasted; @@ -180,8 +180,8 @@ int radeon_sa_bo_new(struct radeon_device *rdev, out: sa_bo->manager = sa_manager; - sa_bo->offset = offset; - sa_bo->size = size; + sa_bo->soffset = offset; + sa_bo->eoffset = offset + size; list_add(_bo->list, head); spin_unlock(_manager->lock); return 0; @@ -202,7 +202,8 @@ void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, spin_lock(_manager->lock); list_for_each_entry(i, _manager->sa_bo, list) { - seq_printf(m, "offset %08d: size %4d\n", i->offset, i->size); + seq_printf(m, "[%08x %08x] size %4d [%p]\n", + i->soffset, i->eoffset, i->eoffset - i->soffset, i); } spin_unlock(_manager->lock); } -- 1.7.5.4
[PATCH 09/20] drm/radeon: add sub allocator debugfs file
Dumping the current allocations. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_object.h |5 + drivers/gpu/drm/radeon/radeon_ring.c | 22 ++ drivers/gpu/drm/radeon/radeon_sa.c | 14 ++ 3 files changed, 41 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index c120ab9..d9fca1e 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -172,5 +172,10 @@ extern int radeon_sa_bo_new(struct radeon_device *rdev, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo); +#if defined(CONFIG_DEBUG_FS) +extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, +struct seq_file *m); +#endif + #endif diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 116be5e..f49c9c0 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -601,6 +601,23 @@ static int radeon_debugfs_ib_info(struct seq_file *m, void *data) static struct drm_info_list radeon_debugfs_ib_list[RADEON_IB_POOL_SIZE]; static char radeon_debugfs_ib_names[RADEON_IB_POOL_SIZE][32]; static unsigned radeon_debugfs_ib_idx[RADEON_IB_POOL_SIZE]; + +static int radeon_debugfs_sa_info(struct seq_file *m, void *data) +{ + struct drm_info_node *node = (struct drm_info_node *) m->private; + struct drm_device *dev = node->minor->dev; + struct radeon_device *rdev = dev->dev_private; + + radeon_sa_bo_dump_debug_info(>ib_pool.sa_manager, m); + + return 0; + +} + +static struct drm_info_list radeon_debugfs_sa_list[] = { +{"radeon_sa_info", _debugfs_sa_info, 0, NULL}, +}; + #endif int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *ring) @@ -627,6 +644,11 @@ int radeon_debugfs_ib_init(struct radeon_device *rdev) { #if defined(CONFIG_DEBUG_FS) unsigned i; + int r; + + r = radeon_debugfs_add_files(rdev, radeon_debugfs_sa_list, 1); + if (r) + return r; for (i = 0; i < RADEON_IB_POOL_SIZE; i++) { sprintf(radeon_debugfs_ib_names[i], "radeon_ib_%04u", i); diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index aed0a8c..1db0568 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -193,3 +193,17 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo) list_del_init(_bo->list); spin_unlock(_bo->manager->lock); } + +#if defined(CONFIG_DEBUG_FS) +void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, + struct seq_file *m) +{ + struct radeon_sa_bo *i; + + spin_lock(_manager->lock); + list_for_each_entry(i, _manager->sa_bo, list) { + seq_printf(m, "offset %08d: size %4d\n", i->offset, i->size); + } + spin_unlock(_manager->lock); +} +#endif -- 1.7.5.4
[PATCH 08/20] drm/radeon: add proper locking to the SA v3
Make the suballocator self containing to locking. v2: split the bugfix into a seperate patch. v3: remove some unreleated changes. Sig-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h|1 + drivers/gpu/drm/radeon/radeon_sa.c |6 ++ 2 files changed, 7 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 701094b..8a6b1b3 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -381,6 +381,7 @@ struct radeon_bo_list { * alignment). */ struct radeon_sa_manager { + spinlock_t lock; struct radeon_bo*bo; struct list_headsa_bo; unsignedsize; diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 8fbfe69..aed0a8c 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -37,6 +37,7 @@ int radeon_sa_bo_manager_init(struct radeon_device *rdev, { int r; + spin_lock_init(_manager->lock); sa_manager->bo = NULL; sa_manager->size = size; sa_manager->domain = domain; @@ -139,6 +140,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, BUG_ON(align > RADEON_GPU_PAGE_SIZE); BUG_ON(size > sa_manager->size); + spin_lock(_manager->lock); /* no one ? */ head = sa_manager->sa_bo.prev; @@ -172,6 +174,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev, offset += wasted; if ((sa_manager->size - offset) < size) { /* failed to find somethings big enough */ + spin_unlock(_manager->lock); return -ENOMEM; } @@ -180,10 +183,13 @@ out: sa_bo->offset = offset; sa_bo->size = size; list_add(_bo->list, head); + spin_unlock(_manager->lock); return 0; } void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo) { + spin_lock(_bo->manager->lock); list_del_init(_bo->list); + spin_unlock(_bo->manager->lock); } -- 1.7.5.4
[PATCH 07/20] drm/radeon: use inline functions to calc sa_bo addr
Instead of hacking the calculation multiple times. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_gart.c |6 ++ drivers/gpu/drm/radeon/radeon_object.h| 11 +++ drivers/gpu/drm/radeon/radeon_ring.c |6 ++ drivers/gpu/drm/radeon/radeon_semaphore.c |6 ++ 4 files changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c58a036..4a5d9d4 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -404,10 +404,8 @@ retry: radeon_vm_unbind(rdev, vm_evict); goto retry; } - vm->pt = rdev->vm_manager.sa_manager.cpu_ptr; - vm->pt += (vm->sa_bo.offset >> 3); - vm->pt_gpu_addr = rdev->vm_manager.sa_manager.gpu_addr; - vm->pt_gpu_addr += vm->sa_bo.offset; + vm->pt = radeon_sa_bo_cpu_addr(>sa_bo); + vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(>sa_bo); memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8)); retry_id: diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index f9104be..c120ab9 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -146,6 +146,17 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo, /* * sub allocation */ + +static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo *sa_bo) +{ + return sa_bo->manager->gpu_addr + sa_bo->offset; +} + +static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo) +{ + return sa_bo->manager->cpu_ptr + sa_bo->offset; +} + extern int radeon_sa_bo_manager_init(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, unsigned size, u32 domain); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 2fdc8c3..116be5e 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -127,10 +127,8 @@ retry: size, 256); if (!r) { *ib = >ib_pool.ibs[idx]; - (*ib)->ptr = rdev->ib_pool.sa_manager.cpu_ptr; - (*ib)->ptr += ((*ib)->sa_bo.offset >> 2); - (*ib)->gpu_addr = rdev->ib_pool.sa_manager.gpu_addr; - (*ib)->gpu_addr += (*ib)->sa_bo.offset; + (*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo); + (*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo); (*ib)->fence = fence; (*ib)->vm_id = 0; (*ib)->is_const_ib = false; diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c index c5b3d8e..f312ba5 100644 --- a/drivers/gpu/drm/radeon/radeon_semaphore.c +++ b/drivers/gpu/drm/radeon/radeon_semaphore.c @@ -53,10 +53,8 @@ static int radeon_semaphore_add_bo(struct radeon_device *rdev) kfree(bo); return r; } - gpu_addr = rdev->ib_pool.sa_manager.gpu_addr; - gpu_addr += bo->ib->sa_bo.offset; - cpu_ptr = rdev->ib_pool.sa_manager.cpu_ptr; - cpu_ptr += (bo->ib->sa_bo.offset >> 2); + gpu_addr = radeon_sa_bo_gpu_addr(>ib->sa_bo); + cpu_ptr = radeon_sa_bo_cpu_addr(>ib->sa_bo); for (i = 0; i < (RADEON_SEMAPHORE_BO_SIZE/8); i++) { bo->semaphores[i].gpu_addr = gpu_addr; bo->semaphores[i].cpu_ptr = cpu_ptr; -- 1.7.5.4
[PATCH 06/20] drm/radeon: rework locking ring emission mutex in fence deadlock detection
Some callers illegal called fence_wait_next/empty while holding the ring emission mutex. So don't relock the mutex in that cases, and move the actual locking into the fence code. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h|4 +- drivers/gpu/drm/radeon/radeon_device.c |5 +++- drivers/gpu/drm/radeon/radeon_fence.c | 39 --- drivers/gpu/drm/radeon/radeon_pm.c |8 +- drivers/gpu/drm/radeon/radeon_ring.c |6 + 5 files changed, 33 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 7c87117..701094b 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -284,8 +284,8 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence); void radeon_fence_process(struct radeon_device *rdev, int ring); bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); -int radeon_fence_wait_next(struct radeon_device *rdev, int ring); -int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); +int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 0e7b72a..b827b2e 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -912,9 +912,12 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state) } /* evict vram memory */ radeon_bo_evict_vram(rdev); + + mutex_lock(>ring_lock); /* wait for gpu to finish processing current batch */ for (i = 0; i < RADEON_NUM_RINGS; i++) - radeon_fence_wait_empty(rdev, i); + radeon_fence_wait_empty_locked(rdev, i); + mutex_unlock(>ring_lock); radeon_save_bios_scratch_regs(rdev); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index f386807..8034b42 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -192,7 +192,7 @@ bool radeon_fence_signaled(struct radeon_fence *fence) } static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, -unsigned ring, bool intr) +unsigned ring, bool intr, bool lock_ring) { unsigned long timeout, last_activity; uint64_t seq; @@ -247,8 +247,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, if (seq != atomic64_read(>fence_drv[ring].last_seq)) { continue; } + + if (lock_ring) { + mutex_lock(>ring_lock); + } + /* test if somebody else has already decided that this is a lockup */ if (last_activity != rdev->fence_drv[ring].last_activity) { + mutex_unlock(>ring_lock); continue; } @@ -262,15 +268,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, rdev->fence_drv[i].last_activity = jiffies; } - /* change last activity so nobody else think there is a lockup */ - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - rdev->fence_drv[i].last_activity = jiffies; - } - /* mark the ring as not ready any more */ rdev->ring[ring].ready = false; + mutex_unlock(>ring_lock); return -EDEADLK; } + + if (lock_ring) { + mutex_unlock(>ring_lock); + } } } return 0; @@ -285,7 +291,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return -EINVAL; } - r = radeon_fence_wait_seq(fence->rdev, fence->seq, fence->ring, intr); + r = radeon_fence_wait_seq(fence->rdev, fence->seq, + fence->ring, intr, true); if (r) { return r; } @@ -293,7 +300,7 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return 0; } -int radeon_fence_wait_next(struct radeon_device *rdev, int ring) +int
[PATCH 05/20] drm/radeon: rework fence handling, drop fence list v5
From: Jerome GlisseUsing 64bits fence sequence we can directly compare sequence number to know if a fence is signaled or not. Thus the fence list became useless, so does the fence lock that mainly protected the fence list. Things like ring.ready are no longer behind a lock, this should be ok as ring.ready is initialized once and will only change when facing lockup. Worst case is that we return an -EBUSY just after a successfull GPU reset, or we go into wait state instead of returning -EBUSY (thus delaying reporting -EBUSY to fence wait caller). v2: Remove left over comment, force using writeback on cayman and newer, thus not having to suffer from possibly scratch reg exhaustion v3: Rebase on top of change to uint64 fence patch v4: Change DCE5 test to force write back on cayman and newer but also any APU such as PALM or SUMO family v5: Rebase on top of new uint64 fence patch v6: Just break if seq doesn't change any more. Use radeon_fence prefix for all function names. Even if it's now highly optimized, try avoiding polling to often. Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h|6 +- drivers/gpu/drm/radeon/radeon_device.c |8 +- drivers/gpu/drm/radeon/radeon_fence.c | 289 +--- 3 files changed, 118 insertions(+), 185 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cdf46bc..7c87117 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -263,15 +263,12 @@ struct radeon_fence_driver { atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; - struct list_heademitted; - struct list_headsignaled; boolinitialized; }; struct radeon_fence { struct radeon_device*rdev; struct kref kref; - struct list_headlist; /* protected by radeon_fence.lock */ uint64_tseq; /* RB, DMA, etc. */ @@ -291,7 +288,7 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty(struct radeon_device *rdev, int ring); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); -int radeon_fence_count_emitted(struct radeon_device *rdev, int ring); +unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); /* * Tiling registers @@ -1534,7 +1531,6 @@ struct radeon_device { struct radeon_mode_info mode_info; struct radeon_scratch scratch; struct radeon_mman mman; - rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 3f6ff2a..0e7b72a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -225,9 +225,9 @@ int radeon_wb_init(struct radeon_device *rdev) /* disable event_write fences */ rdev->wb.use_event = false; /* disabled via module param */ - if (radeon_no_wb == 1) + if (radeon_no_wb == 1) { rdev->wb.enabled = false; - else { + } else { if (rdev->flags & RADEON_IS_AGP) { /* often unreliable on AGP */ rdev->wb.enabled = false; @@ -237,8 +237,9 @@ int radeon_wb_init(struct radeon_device *rdev) } else { rdev->wb.enabled = true; /* event_write fences are only available on r600+ */ - if (rdev->family >= CHIP_R600) + if (rdev->family >= CHIP_R600) { rdev->wb.use_event = true; + } } } /* always use writeback/events on NI, APUs */ @@ -731,7 +732,6 @@ int radeon_device_init(struct radeon_device *rdev, mutex_init(>gem.mutex); mutex_init(>pm.mutex); mutex_init(>vram_mutex); - rwlock_init(>fence_lock); rwlock_init(>semaphore_drv.lock); INIT_LIST_HEAD(>gem.objects); init_waitqueue_head(>irq.vblank_queue); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index feb2bbc..f386807 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -63,30 +63,18 @@ static u32 radeon_fence_read(struct radeon_device *rdev, int ring) int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence) { -
[PATCH 04/20] drm/radeon: convert fence to uint64_t v4
From: Jerome GlisseThis convert fence to use uint64_t sequence number intention is to use the fact that uin64_t is big enough that we don't need to care about wrap around. Tested with and without writeback using 0xF000 as initial fence sequence and thus allowing to test the wrap around from 32bits to 64bits. v2: Add comment about possible race btw CPU & GPU, add comment stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET Read fence sequenc in reverse order of GPU write them so we mitigate the race btw CPU and GPU. v3: Drop the need for ring to emit the 64bits fence, and just have each ring emit the lower 32bits of the fence sequence. We handle the wrap over 32bits in fence_process. v4: Just a small optimization: Don't reread the last_seq value if loop restarts, since we already know its value anyway. Also start at zero not one for seq value and use pre instead of post increment in emmit, otherwise wait_empty will deadlock. Signed-off-by: Jerome Glisse Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h | 39 ++- drivers/gpu/drm/radeon/radeon_fence.c | 116 +++-- drivers/gpu/drm/radeon/radeon_ring.c |9 ++- 3 files changed, 107 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index e99ea81..cdf46bc 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout; * Copy from radeon_drv.h so we don't have to include both and have conflicting * symbol; */ -#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ -#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) +#define RADEON_MAX_USEC_TIMEOUT10 /* 100 ms */ +#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) /* RADEON_IB_POOL_SIZE must be a power of 2 */ -#define RADEON_IB_POOL_SIZE16 -#define RADEON_DEBUGFS_MAX_COMPONENTS 32 -#define RADEONFB_CONN_LIMIT4 -#define RADEON_BIOS_NUM_SCRATCH8 +#define RADEON_IB_POOL_SIZE16 +#define RADEON_DEBUGFS_MAX_COMPONENTS 32 +#define RADEONFB_CONN_LIMIT4 +#define RADEON_BIOS_NUM_SCRATCH8 /* max number of rings */ -#define RADEON_NUM_RINGS 3 +#define RADEON_NUM_RINGS 3 + +/* fence seq are set to this number when signaled */ +#define RADEON_FENCE_SIGNALED_SEQ 0LL +#define RADEON_FENCE_NOTEMITED_SEQ (~0LL) /* internal ring indices */ /* r1xx+ has gfx CP ring */ -#define RADEON_RING_TYPE_GFX_INDEX 0 +#define RADEON_RING_TYPE_GFX_INDEX 0 /* cayman has 2 compute CP rings */ -#define CAYMAN_RING_TYPE_CP1_INDEX 1 -#define CAYMAN_RING_TYPE_CP2_INDEX 2 +#define CAYMAN_RING_TYPE_CP1_INDEX 1 +#define CAYMAN_RING_TYPE_CP2_INDEX 2 /* hardcode those limit for now */ -#define RADEON_VA_RESERVED_SIZE(8 << 20) -#define RADEON_IB_VM_MAX_SIZE (64 << 10) +#define RADEON_VA_RESERVED_SIZE(8 << 20) +#define RADEON_IB_VM_MAX_SIZE (64 << 10) /* * Errata workarounds. @@ -254,8 +258,9 @@ struct radeon_fence_driver { uint32_tscratch_reg; uint64_tgpu_addr; volatile uint32_t *cpu_addr; - atomic_tseq; - uint32_tlast_seq; + /* seq is protected by ring emission lock */ + uint64_tseq; + atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; struct list_heademitted; @@ -268,11 +273,9 @@ struct radeon_fence { struct kref kref; struct list_headlist; /* protected by radeon_fence.lock */ - uint32_tseq; - boolemitted; - boolsignaled; + uint64_tseq; /* RB, DMA, etc. */ - int ring; + unsignedring; struct radeon_semaphore *semaphore; }; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 5bb78bf..feb2bbc 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -66,14 +66,14 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence) unsigned long irq_flags; write_lock_irqsave(>fence_lock, irq_flags); - if (fence->emitted) { + if (fence->seq && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) { write_unlock_irqrestore(>fence_lock, irq_flags);
[PATCH 03/20] drm/radeon: replace the per ring mutex with a global one
A single global mutex for ring submissions seems sufficient. Signed-off-by: Christian K?nig --- drivers/gpu/drm/radeon/radeon.h |3 +- drivers/gpu/drm/radeon/radeon_device.c|3 +- drivers/gpu/drm/radeon/radeon_pm.c| 10 +- drivers/gpu/drm/radeon/radeon_ring.c | 28 +++--- drivers/gpu/drm/radeon/radeon_semaphore.c | 42 + 5 files changed, 41 insertions(+), 45 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 82ffa6a..e99ea81 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -676,7 +676,6 @@ struct radeon_ring { uint64_tgpu_addr; uint32_talign_mask; uint32_tptr_mask; - struct mutexmutex; boolready; u32 ptr_reg_shift; u32 ptr_reg_mask; @@ -815,6 +814,7 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *cp, unsign int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *cp, unsigned ndw); void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_unlock_commit(struct radeon_device *rdev, struct radeon_ring *cp); +void radeon_ring_undo(struct radeon_ring *ring); void radeon_ring_unlock_undo(struct radeon_device *rdev, struct radeon_ring *cp); int radeon_ring_test(struct radeon_device *rdev, struct radeon_ring *cp); void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *ring); @@ -1534,6 +1534,7 @@ struct radeon_device { rwlock_tfence_lock; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; struct radeon_semaphore_driver semaphore_drv; + struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; struct radeon_irq irq; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index ff28210..3f6ff2a 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,8 +724,7 @@ int radeon_device_init(struct radeon_device *rdev, * can recall function without having locking issues */ radeon_mutex_init(>cs_mutex); radeon_mutex_init(>ib_pool.mutex); - for (i = 0; i < RADEON_NUM_RINGS; ++i) - mutex_init(>ring[i].mutex); + mutex_init(>ring_lock); mutex_init(>dc_hw_i2c_mutex); if (rdev->family >= CHIP_R600) spin_lock_init(>ih.lock); diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index caa55d6..7c38745 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -252,10 +252,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) mutex_lock(>ddev->struct_mutex); mutex_lock(>vram_mutex); - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - if (rdev->ring[i].ring_obj) - mutex_lock(>ring[i].mutex); - } + mutex_lock(>ring_lock); /* gui idle int has issues on older chips it seems */ if (rdev->family >= CHIP_R600) { @@ -311,10 +308,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev) rdev->pm.dynpm_planned_action = DYNPM_ACTION_NONE; - for (i = 0; i < RADEON_NUM_RINGS; ++i) { - if (rdev->ring[i].ring_obj) - mutex_unlock(>ring[i].mutex); - } + mutex_unlock(>ring_lock); mutex_unlock(>vram_mutex); mutex_unlock(>ddev->struct_mutex); } diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 2eb4c6e..a4d60ae 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -346,9 +346,9 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi if (ndw < ring->ring_free_dw) { break; } - mutex_unlock(>mutex); + mutex_unlock(>ring_lock); r = radeon_fence_wait_next(rdev, radeon_ring_index(rdev, ring)); - mutex_lock(>mutex); + mutex_lock(>ring_lock); if (r) return r; } @@ -361,10 +361,10 @@ int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *ring, unsig { int r; - mutex_lock(>mutex); + mutex_lock(>ring_lock); r = radeon_ring_alloc(rdev, ring, ndw); if (r) { - mutex_unlock(>mutex); + mutex_unlock(>ring_lock); return r; } return 0; @@ -389,20 +389,25 @@ void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *ring)
[PATCH 02/20] drm/radeon: clarify and extend wb setup on APUs and NI+ asics
From: Alex DeucherUse family rather than DCE check for clarity, also always use wb on APUs, there will never be AGP variants. Signed-off-by: Alex Deucher Reviewed by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_device.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index d18f0c4..ff28210 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -241,8 +241,8 @@ int radeon_wb_init(struct radeon_device *rdev) rdev->wb.use_event = true; } } - /* always use writeback/events on NI */ - if (ASIC_IS_DCE5(rdev)) { + /* always use writeback/events on NI, APUs */ + if (rdev->family >= CHIP_PALM) { rdev->wb.enabled = true; rdev->wb.use_event = true; } -- 1.7.5.4
[PATCH 01/20] drm/radeon: fix possible lack of synchronization btw ttm and other ring
From: Jerome GlisseWe need to sync with the GFX ring as ttm might have schedule bo move on it and new command scheduled for other ring need to wait for bo data to be in place. Signed-off-by: Jerome Glisse Reviewed by: Christian K?nig --- drivers/gpu/drm/radeon/radeon_cs.c | 12 ++-- include/drm/radeon_drm.h |1 - 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index c66beb1..289b0d7 100644 --- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -122,15 +122,15 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p) int i, r; for (i = 0; i < p->nrelocs; i++) { + struct radeon_fence *fence; + if (!p->relocs[i].robj || !p->relocs[i].robj->tbo.sync_obj) continue; - if (!(p->relocs[i].flags & RADEON_RELOC_DONT_SYNC)) { - struct radeon_fence *fence = p->relocs[i].robj->tbo.sync_obj; - if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { - sync_to_ring[fence->ring] = true; - need_sync = true; - } + fence = p->relocs[i].robj->tbo.sync_obj; + if (fence->ring != p->ring && !radeon_fence_signaled(fence)) { + sync_to_ring[fence->ring] = true; + need_sync = true; } } diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h index 7c491b4..5805686 100644 --- a/include/drm/radeon_drm.h +++ b/include/drm/radeon_drm.h @@ -926,7 +926,6 @@ struct drm_radeon_cs_chunk { }; /* drm_radeon_cs_reloc.flags */ -#define RADEON_RELOC_DONT_SYNC 0x01 struct drm_radeon_cs_reloc { uint32_thandle; -- 1.7.5.4
SA and other Patches.
Hi Jerome & everybody on the list, this gathers together every patch we developed over the last week or so and which is not already in drm-next. I've run quite some tests with them yesterday and today and as far as I can see hammered out every known bug. For the SA allocator I reverted to tracking the hole pointer instead of just the last allocation, cause otherwise we will never release the first allocation on the list. Glxgears now even keeps happily running if I deadlock on the not GFX rings on purpose. Please take a second look at them and if nobody objects any more we should commit them to drm-next. Cheers, Christian.
[RFC v2 5/5] drm: Add NVIDIA Tegra support
On 25.04.2012 12:45, Thierry Reding wrote: > +/ { > + ... > + > + /* host1x */ > + host1x: host1x at 5000 { > + compatible = "nvidia,tegra20-host1x"; > + reg = <0x5000 0x00024000>; > + interrupts = <0 64 0x04 /* cop syncpt */ > + 0 65 0x04 /* mpcore syncpt */ > + 0 66 0x04 /* cop general */ > + 0 67 0x04>; /* mpcore general */ > + }; > + > + /* video-encoding/decoding */ > + mpe at 5404 { > + reg = <0x5404 0x0004>; > + interrupts = <0 68 0x04>; > + }; > + (...) Hi Thierry, I have still lots of questions regarding how device trees work. I'm now just trying to match the device tree structure with hardware - let me know if that goes wrong. There's a hierarchy in the hardware, which should be represented in the device trees. All of the hardware are client modules for host1x - with the exception of host1x obviously. CPU has two methods for accessing the hardware: clients' register aperture and host1x channels. Both of these operate via host1x hardware. We should define host1x bus in the device tree, and move all nodes except host1x under that bus. This will help us in the long run, as we will have multiple drivers (drm, v4l2) each accessing hardware under host1x. We will need to model the bus and the bus_type will need to take over responsibilities of managing the common resources. When we are clocking hardware, whenever we want to access display's register aperture, host1x needs to be clocked. > + /* graphics host */ > + graphics at 5400 { > + compatible = "nvidia,tegra20-graphics"; > + > + #address-cells = <1>; > + #size-cells = <1>; > + ranges; > + > + display-controllers = < >; > + carveout = <0x0e00 0x0200>; > + host1x = <>; > + gart = <>; > + > + connectors { > + #address-cells = <1>; > + #size-cells = <0>; > + > + connector at 0 { > + reg = <0>; > + edid = /incbin/("machine.edid"); > + output = <>; > + }; > + > + connector at 1 { > + reg = <1>; > + output = <>; > + ddc = <>; > + > + hpd-gpio = < 111 0>; /* PN7 */ > + }; > + }; > + }; > +}; I'm not sure what this node means. The register range from 5400 onwards is actually the one that you just described in the nodes of the individual client modules. Why is it represented here again? Terje
[git pull] drm fixes
Two fixes from Intel, one a regression, one because I merged an early version of a fix. Also the nouveau revert of the i2c code that was tested on the list. Dave. The following changes since commit febb72a6e4cc6c8cffcc1ea649a3fb364f1ea432: IA32 emulation: Fix build problem for modular ia32 a.out support (2012-05-06 18:26:20 -0700) are available in the git repository at: git://people.freedesktop.org/~airlied/linux drm-fixes Ben Skeggs (1): drm/nouveau/i2c: resume use of i2c-algo-bit, rather than custom stack Daniel Vetter (2): drm/i915: disable sdvo hotplug on i945g/gm drm/i915: Do no set Stencil Cache eviction LRA w/a on gen7+ Dave Airlie (1): Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes drivers/gpu/drm/i915/intel_ringbuffer.c |9 +- drivers/gpu/drm/i915/intel_sdvo.c |6 + drivers/gpu/drm/nouveau/nouveau_i2c.c | 199 --- drivers/gpu/drm/nouveau/nouveau_i2c.h |1 + 4 files changed, 34 insertions(+), 181 deletions(-)
[PATCH 04/20] drm/radeon: convert fence to uint64_t v4
On Mon, May 7, 2012 at 11:04 AM, Christian K?nig wrote: > On 07.05.2012 16:39, Jerome Glisse wrote: >> >> On Mon, May 7, 2012 at 7:42 AM, Christian K?nig >> ?wrote: >>> >>> From: Jerome Glisse >>> >>> This convert fence to use uint64_t sequence number intention is >>> to use the fact that uin64_t is big enough that we don't need to >>> care about wrap around. >>> >>> Tested with and without writeback using 0xF000 as initial >>> fence sequence and thus allowing to test the wrap around from >>> 32bits to 64bits. >>> >>> v2: Add comment about possible race btw CPU& ?GPU, add comment >>> >>> ? ?stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET >>> ? ?Read fence sequenc in reverse order of GPU write them so we >>> ? ?mitigate the race btw CPU and GPU. >>> >>> v3: Drop the need for ring to emit the 64bits fence, and just have >>> ? ?each ring emit the lower 32bits of the fence sequence. We >>> ? ?handle the wrap over 32bits in fence_process. >>> >>> v4: Just a small optimization: Don't reread the last_seq value >>> ? ?if loop restarts, since we already know its value anyway. >>> ? ?Also start at zero not one for seq value and use pre instead >>> ? ?of post increment in emmit, otherwise wait_empty will deadlock. >> >> Why changing that v3 was already good no deadlock. I started at 1 >> especialy for that, a signaled fence is set to 0 so it always compare >> as signaled. Just using preincrement is exactly like starting at one. >> I don't see the need for this change but if it makes you happy. > > > Not exactly, the last emitted sequence is also used in > radeon_fence_wait_empty. So when you use post increment > radeon_fence_wait_empty will actually not wait for the last emitted fence to > be signaled, but for last emitted + 1, so it practically waits forever. > > Without this change suspend (for example) will just lockup. > > Cheers, > Christian. Yeah you right, my tree had a fix for that. I probably messed up the rebase patch at one point. Well as your version fix it i am fine with it. Cheers, Jerome >> >> Cheers, >> Jerome >>> >>> Signed-off-by: Jerome Glisse >>> Signed-off-by: Christian K?nig >>> --- >>> ?drivers/gpu/drm/radeon/radeon.h ? ? ? | ? 39 ++- >>> ?drivers/gpu/drm/radeon/radeon_fence.c | ?116 >>> +++-- >>> ?drivers/gpu/drm/radeon/radeon_ring.c ?| ? ?9 ++- >>> ?3 files changed, 107 insertions(+), 57 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/radeon/radeon.h >>> b/drivers/gpu/drm/radeon/radeon.h >>> index e99ea81..cdf46bc 100644 >>> --- a/drivers/gpu/drm/radeon/radeon.h >>> +++ b/drivers/gpu/drm/radeon/radeon.h >>> @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout; >>> ?* Copy from radeon_drv.h so we don't have to include both and have >>> conflicting >>> ?* symbol; >>> ?*/ >>> -#define RADEON_MAX_USEC_TIMEOUT ? ? ? ? ? ? ? ?10 ?/* 100 ms */ >>> -#define RADEON_FENCE_JIFFIES_TIMEOUT ? (HZ / 2) >>> +#define RADEON_MAX_USEC_TIMEOUT ? ? ? ? ? ? ? ? ? ? ? ?10 ?/* 100 ms >>> */ >>> +#define RADEON_FENCE_JIFFIES_TIMEOUT ? ? ? ? ? (HZ / 2) >>> ?/* RADEON_IB_POOL_SIZE must be a power of 2 */ >>> -#define RADEON_IB_POOL_SIZE ? ? ? ? ? ?16 >>> -#define RADEON_DEBUGFS_MAX_COMPONENTS ?32 >>> -#define RADEONFB_CONN_LIMIT ? ? ? ? ? ?4 >>> -#define RADEON_BIOS_NUM_SCRATCH ? ? ? ? ? ? ? ?8 >>> +#define RADEON_IB_POOL_SIZE ? ? ? ? ? ? ? ? ? ?16 >>> +#define RADEON_DEBUGFS_MAX_COMPONENTS ? ? ? ? ?32 >>> +#define RADEONFB_CONN_LIMIT ? ? ? ? ? ? ? ? ? ?4 >>> +#define RADEON_BIOS_NUM_SCRATCH ? ? ? ? ? ? ? ? ? ? ? ?8 >>> >>> ?/* max number of rings */ >>> -#define RADEON_NUM_RINGS 3 >>> +#define RADEON_NUM_RINGS ? ? ? ? ? ? ? ? ? ? ? 3 >>> + >>> +/* fence seq are set to this number when signaled */ >>> +#define RADEON_FENCE_SIGNALED_SEQ ? ? ? ? ? ? ?0LL >>> +#define RADEON_FENCE_NOTEMITED_SEQ ? ? ? ? ? ? (~0LL) >>> >>> ?/* internal ring indices */ >>> ?/* r1xx+ has gfx CP ring */ >>> -#define RADEON_RING_TYPE_GFX_INDEX ?0 >>> +#define RADEON_RING_TYPE_GFX_INDEX ? ? ? ? ? ? 0 >>> >>> ?/* cayman has 2 compute CP rings */ >>> -#define CAYMAN_RING_TYPE_CP1_INDEX 1 >>> -#define CAYMAN_RING_TYPE_CP2_INDEX 2 >>> +#define CAYMAN_RING_TYPE_CP1_INDEX ? ? ? ? ? ? 1 >>> +#define CAYMAN_RING_TYPE_CP2_INDEX ? ? ? ? ? ? 2 >>> >>> ?/* hardcode those limit for now */ >>> -#define RADEON_VA_RESERVED_SIZE ? ? ? ? ? ? ? ?(8<< ?20) >>> -#define RADEON_IB_VM_MAX_SIZE ? ? ? ? ?(64<< ?10) >>> +#define RADEON_VA_RESERVED_SIZE ? ? ? ? ? ? ? ? ? ? ? ?(8<< ?20) >>> +#define RADEON_IB_VM_MAX_SIZE ? ? ? ? ? ? ? ? ?(64<< ?10) >>> >>> ?/* >>> ?* Errata workarounds. >>> @@ -254,8 +258,9 @@ struct radeon_fence_driver { >>> ? ? ? ?uint32_t ? ? ? ? ? ? ? ? ? ? ? ?scratch_reg; >>> ? ? ? ?uint64_t ? ? ? ? ? ? ? ? ? ? ? ?gpu_addr; >>> ? ? ? ?volatile uint32_t ? ? ? ? ? ? ? *cpu_addr; >>> - ? ? ? atomic_t ? ? ? ? ? ? ? ? ? ? ? ?seq; >>> - ? ? ? uint32_t ? ? ? ? ? ? ? ? ? ? ? ?last_seq; >>> + ? ? ? /* seq is protected by ring emission lock */ >>> + ? ? ? uint64_t ? ? ? ? ? ? ? ? ? ? ?
[PATCH 14/20] drm/radeon: multiple ring allocator v2
On Mon, May 7, 2012 at 7:42 AM, Christian K?nig wrote: > A startover with a new idea for a multiple ring allocator. > Should perform as well as a normal ring allocator as long > as only one ring does somthing, but falls back to a more > complex algorithm if more complex things start to happen. > > We store the last allocated bo in last, we always try to allocate > after the last allocated bo. Principle is that in a linear GPU ring > progression was is after last is the oldest bo we allocated and thus > the first one that should no longer be in use by the GPU. > > If it's not the case we skip over the bo after last to the closest > done bo if such one exist. If none exist and we are not asked to > block we report failure to allocate. > > If we are asked to block we wait on all the oldest fence of all > rings. We just wait for any of those fence to complete. > > v2: We need to be able to let hole point to the list_head, otherwise > ? ?try free will never free the first allocation of the list. Also > ? ?stop calling radeon_fence_signalled more than necessary. > > Signed-off-by: Christian K?nig > Signed-off-by: Jerome Glisse This one is NAK please use my patch. Yes in my patch we never try to free anything if there is only on sa_bo in the list if you really care about this it's a one line change: http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch Your patch here can enter in infinite loop and never return holding the lock. See below. Cheers, Jerome > --- > ?drivers/gpu/drm/radeon/radeon.h ? ? ?| ? ?7 +- > ?drivers/gpu/drm/radeon/radeon_ring.c | ? 19 +-- > ?drivers/gpu/drm/radeon/radeon_sa.c ? | ?292 > +++--- > ?3 files changed, 210 insertions(+), 108 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h > index 37a7459..cc7f16a 100644 > --- a/drivers/gpu/drm/radeon/radeon.h > +++ b/drivers/gpu/drm/radeon/radeon.h > @@ -385,7 +385,9 @@ struct radeon_bo_list { > ?struct radeon_sa_manager { > ? ? ? ?spinlock_t ? ? ? ? ? ? ?lock; > ? ? ? ?struct radeon_bo ? ? ? ?*bo; > - ? ? ? struct list_head ? ? ? ?sa_bo; > + ? ? ? struct list_head ? ? ? ?*hole; > + ? ? ? struct list_head ? ? ? ?flist[RADEON_NUM_RINGS]; > + ? ? ? struct list_head ? ? ? ?olist; > ? ? ? ?unsigned ? ? ? ? ? ? ? ?size; > ? ? ? ?uint64_t ? ? ? ? ? ? ? ?gpu_addr; > ? ? ? ?void ? ? ? ? ? ? ? ? ? ?*cpu_ptr; > @@ -396,7 +398,8 @@ struct radeon_sa_bo; > > ?/* sub-allocation buffer */ > ?struct radeon_sa_bo { > - ? ? ? struct list_head ? ? ? ? ? ? ? ?list; > + ? ? ? struct list_head ? ? ? ? ? ? ? ?olist; > + ? ? ? struct list_head ? ? ? ? ? ? ? ?flist; > ? ? ? ?struct radeon_sa_manager ? ? ? ?*manager; > ? ? ? ?unsigned ? ? ? ? ? ? ? ? ? ? ? ?soffset; > ? ? ? ?unsigned ? ? ? ? ? ? ? ? ? ? ? ?eoffset; > diff --git a/drivers/gpu/drm/radeon/radeon_ring.c > b/drivers/gpu/drm/radeon/radeon_ring.c > index 1748d93..e074ff5 100644 > --- a/drivers/gpu/drm/radeon/radeon_ring.c > +++ b/drivers/gpu/drm/radeon/radeon_ring.c > @@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, > struct radeon_ib *ib) > > ?int radeon_ib_pool_init(struct radeon_device *rdev) > ?{ > - ? ? ? struct radeon_sa_manager tmp; > ? ? ? ?int i, r; > > - ? ? ? r = radeon_sa_bo_manager_init(rdev, , > - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? RADEON_IB_POOL_SIZE*64*1024, > - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? RADEON_GEM_DOMAIN_GTT); > - ? ? ? if (r) { > - ? ? ? ? ? ? ? return r; > - ? ? ? } > - > ? ? ? ?radeon_mutex_lock(>ib_pool.mutex); > ? ? ? ?if (rdev->ib_pool.ready) { > ? ? ? ? ? ? ? ?radeon_mutex_unlock(>ib_pool.mutex); > - ? ? ? ? ? ? ? radeon_sa_bo_manager_fini(rdev, ); > ? ? ? ? ? ? ? ?return 0; > ? ? ? ?} > > - ? ? ? rdev->ib_pool.sa_manager = tmp; > - ? ? ? INIT_LIST_HEAD(>ib_pool.sa_manager.sa_bo); > + ? ? ? r = radeon_sa_bo_manager_init(rdev, >ib_pool.sa_manager, > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? RADEON_IB_POOL_SIZE*64*1024, > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? RADEON_GEM_DOMAIN_GTT); > + ? ? ? if (r) { > + ? ? ? ? ? ? ? radeon_mutex_unlock(>ib_pool.mutex); > + ? ? ? ? ? ? ? return r; > + ? ? ? } > + > ? ? ? ?for (i = 0; i < RADEON_IB_POOL_SIZE; i++) { > ? ? ? ? ? ? ? ?rdev->ib_pool.ibs[i].fence = NULL; > ? ? ? ? ? ? ? ?rdev->ib_pool.ibs[i].idx = i; > diff --git a/drivers/gpu/drm/radeon/radeon_sa.c > b/drivers/gpu/drm/radeon/radeon_sa.c > index 90ee8ad..757a9d4 100644 > --- a/drivers/gpu/drm/radeon/radeon_sa.c > +++ b/drivers/gpu/drm/radeon/radeon_sa.c > @@ -27,21 +27,42 @@ > ?* Authors: > ?* ? ?Jerome Glisse > ?*/ > +/* Algorithm: > + * > + * We store the last allocated bo in "hole", we always try to allocate > + * after the last allocated bo. Principle is that in a linear GPU ring > + * progression was is after last is the oldest bo we allocated and thus > + * the first one that should no longer be in use by the GPU. > + * > + * If it's not the case we skip over the bo after last to the closest > + * done bo if such one exist. If
[PULL] drm-intel-fixes
Hi Dave, 2 little patches: - One regression fix to disable sdvo hotplug on broken hw. - One patch to upconvert the snb hang workaround from patch v1 to patch v2. Yours, Daniel The following changes since commit d48b97b403d23f6df0b990cee652bdf9a52337a3: Linux 3.4-rc6 (2012-05-06 15:07:32 -0700) are available in the git repository at: git://people.freedesktop.org/~danvet/drm-intel drm-intel-fixes for you to fetch changes up to 2e7a44814d802c8ba479164b8924070cd908d6b5: drm/i915: Do no set Stencil Cache eviction LRA w/a on gen7+ (2012-05-07 10:37:56 +0200) Daniel Vetter (2): drm/i915: disable sdvo hotplug on i945g/gm drm/i915: Do no set Stencil Cache eviction LRA w/a on gen7+ drivers/gpu/drm/i915/intel_ringbuffer.c |9 ++--- drivers/gpu/drm/i915/intel_sdvo.c |6 ++ 2 files changed, 12 insertions(+), 3 deletions(-) -- Daniel Vetter Mail: daniel at ffwll.ch Mobile: +41 (0)79 365 57 48
[PATCH 04/20] drm/radeon: convert fence to uint64_t v4
On Mon, May 7, 2012 at 7:42 AM, Christian K?nig wrote: > From: Jerome Glisse > > This convert fence to use uint64_t sequence number intention is > to use the fact that uin64_t is big enough that we don't need to > care about wrap around. > > Tested with and without writeback using 0xF000 as initial > fence sequence and thus allowing to test the wrap around from > 32bits to 64bits. > > v2: Add comment about possible race btw CPU & GPU, add comment > ? ?stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET > ? ?Read fence sequenc in reverse order of GPU write them so we > ? ?mitigate the race btw CPU and GPU. > > v3: Drop the need for ring to emit the 64bits fence, and just have > ? ?each ring emit the lower 32bits of the fence sequence. We > ? ?handle the wrap over 32bits in fence_process. > > v4: Just a small optimization: Don't reread the last_seq value > ? ?if loop restarts, since we already know its value anyway. > ? ?Also start at zero not one for seq value and use pre instead > ? ?of post increment in emmit, otherwise wait_empty will deadlock. Why changing that v3 was already good no deadlock. I started at 1 especialy for that, a signaled fence is set to 0 so it always compare as signaled. Just using preincrement is exactly like starting at one. I don't see the need for this change but if it makes you happy. Cheers, Jerome > > Signed-off-by: Jerome Glisse > Signed-off-by: Christian K?nig > --- > ?drivers/gpu/drm/radeon/radeon.h ? ? ? | ? 39 ++- > ?drivers/gpu/drm/radeon/radeon_fence.c | ?116 > +++-- > ?drivers/gpu/drm/radeon/radeon_ring.c ?| ? ?9 ++- > ?3 files changed, 107 insertions(+), 57 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h > index e99ea81..cdf46bc 100644 > --- a/drivers/gpu/drm/radeon/radeon.h > +++ b/drivers/gpu/drm/radeon/radeon.h > @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout; > ?* Copy from radeon_drv.h so we don't have to include both and have > conflicting > ?* symbol; > ?*/ > -#define RADEON_MAX_USEC_TIMEOUT ? ? ? ? ? ? ? ?10 ?/* 100 ms */ > -#define RADEON_FENCE_JIFFIES_TIMEOUT ? (HZ / 2) > +#define RADEON_MAX_USEC_TIMEOUT ? ? ? ? ? ? ? ? ? ? ? ?10 ?/* 100 ms */ > +#define RADEON_FENCE_JIFFIES_TIMEOUT ? ? ? ? ? (HZ / 2) > ?/* RADEON_IB_POOL_SIZE must be a power of 2 */ > -#define RADEON_IB_POOL_SIZE ? ? ? ? ? ?16 > -#define RADEON_DEBUGFS_MAX_COMPONENTS ?32 > -#define RADEONFB_CONN_LIMIT ? ? ? ? ? ?4 > -#define RADEON_BIOS_NUM_SCRATCH ? ? ? ? ? ? ? ?8 > +#define RADEON_IB_POOL_SIZE ? ? ? ? ? ? ? ? ? ?16 > +#define RADEON_DEBUGFS_MAX_COMPONENTS ? ? ? ? ?32 > +#define RADEONFB_CONN_LIMIT ? ? ? ? ? ? ? ? ? ?4 > +#define RADEON_BIOS_NUM_SCRATCH ? ? ? ? ? ? ? ? ? ? ? ?8 > > ?/* max number of rings */ > -#define RADEON_NUM_RINGS 3 > +#define RADEON_NUM_RINGS ? ? ? ? ? ? ? ? ? ? ? 3 > + > +/* fence seq are set to this number when signaled */ > +#define RADEON_FENCE_SIGNALED_SEQ ? ? ? ? ? ? ?0LL > +#define RADEON_FENCE_NOTEMITED_SEQ ? ? ? ? ? ? (~0LL) > > ?/* internal ring indices */ > ?/* r1xx+ has gfx CP ring */ > -#define RADEON_RING_TYPE_GFX_INDEX ?0 > +#define RADEON_RING_TYPE_GFX_INDEX ? ? ? ? ? ? 0 > > ?/* cayman has 2 compute CP rings */ > -#define CAYMAN_RING_TYPE_CP1_INDEX 1 > -#define CAYMAN_RING_TYPE_CP2_INDEX 2 > +#define CAYMAN_RING_TYPE_CP1_INDEX ? ? ? ? ? ? 1 > +#define CAYMAN_RING_TYPE_CP2_INDEX ? ? ? ? ? ? 2 > > ?/* hardcode those limit for now */ > -#define RADEON_VA_RESERVED_SIZE ? ? ? ? ? ? ? ?(8 << 20) > -#define RADEON_IB_VM_MAX_SIZE ? ? ? ? ?(64 << 10) > +#define RADEON_VA_RESERVED_SIZE ? ? ? ? ? ? ? ? ? ? ? ?(8 << 20) > +#define RADEON_IB_VM_MAX_SIZE ? ? ? ? ? ? ? ? ?(64 << 10) > > ?/* > ?* Errata workarounds. > @@ -254,8 +258,9 @@ struct radeon_fence_driver { > ? ? ? ?uint32_t ? ? ? ? ? ? ? ? ? ? ? ?scratch_reg; > ? ? ? ?uint64_t ? ? ? ? ? ? ? ? ? ? ? ?gpu_addr; > ? ? ? ?volatile uint32_t ? ? ? ? ? ? ? *cpu_addr; > - ? ? ? atomic_t ? ? ? ? ? ? ? ? ? ? ? ?seq; > - ? ? ? uint32_t ? ? ? ? ? ? ? ? ? ? ? ?last_seq; > + ? ? ? /* seq is protected by ring emission lock */ > + ? ? ? uint64_t ? ? ? ? ? ? ? ? ? ? ? ?seq; > + ? ? ? atomic64_t ? ? ? ? ? ? ? ? ? ? ?last_seq; > ? ? ? ?unsigned long ? ? ? ? ? ? ? ? ? last_activity; > ? ? ? ?wait_queue_head_t ? ? ? ? ? ? ? queue; > ? ? ? ?struct list_head ? ? ? ? ? ? ? ?emitted; > @@ -268,11 +273,9 @@ struct radeon_fence { > ? ? ? ?struct kref ? ? ? ? ? ? ? ? ? ? kref; > ? ? ? ?struct list_head ? ? ? ? ? ? ? ?list; > ? ? ? ?/* protected by radeon_fence.lock */ > - ? ? ? uint32_t ? ? ? ? ? ? ? ? ? ? ? ?seq; > - ? ? ? bool ? ? ? ? ? ? ? ? ? ? ? ? ? ?emitted; > - ? ? ? bool ? ? ? ? ? ? ? ? ? ? ? ? ? ?signaled; > + ? ? ? uint64_t ? ? ? ? ? ? ? ? ? ? ? ?seq; > ? ? ? ?/* RB, DMA, etc. */ > - ? ? ? int ? ? ? ? ? ? ? ? ? ? ? ? ? ? ring; > + ? ? ? unsigned ? ? ? ? ? ? ? ? ? ? ? ?ring; > ? ? ? ?struct radeon_semaphore ? ? ? ? *semaphore; > ?}; > > diff --git a/drivers/gpu/drm/radeon/radeon_fence.c >
SA and other Patches.
On Mon, May 7, 2012 at 7:42 AM, Christian K?nig wrote: > Hi Jerome & everybody on the list, > > this gathers together every patch we developed over the last week or so and > which is not already in drm-next. > > I've run quite some tests with them yesterday and today and as far as I can > see hammered out every known bug. For the SA allocator I reverted to tracking > the hole pointer instead of just the last allocation, cause otherwise we will > never release the first allocation on the list. Glxgears now even keeps > happily > running if I deadlock on the not GFX rings on purpose. Now we will release the first entry if we use the last allocate ptr i believe it's cleaner to use the last ptr. > Please take a second look at them and if nobody objects any more we should > commit them to drm-next. > > Cheers, > Christian. > Cheers, Jerome
[RFC v2 5/5] drm: Add NVIDIA Tegra support
On 05/07/2012 02:50 AM, Terje Bergstr?m wrote: > On 25.04.2012 12:45, Thierry Reding wrote: > >> +/ { >> + ... >> + >> + /* host1x */ >> + host1x: host1x at 5000 { >> + compatible = "nvidia,tegra20-host1x"; >> + reg = <0x5000 0x00024000>; >> + interrupts = <0 64 0x04 /* cop syncpt */ >> + 0 65 0x04 /* mpcore syncpt */ >> + 0 66 0x04 /* cop general */ >> + 0 67 0x04>; /* mpcore general */ >> + }; >> + >> + /* video-encoding/decoding */ >> + mpe at 5404 { >> + reg = <0x5404 0x0004>; >> + interrupts = <0 68 0x04>; >> + }; >> + > > (...) > > Hi Thierry, > > I have still lots of questions regarding how device trees work. I'm now > just trying to match the device tree structure with hardware - let me > know if that goes wrong. > > There's a hierarchy in the hardware, which should be represented in the > device trees. All of the hardware are client modules for host1x - with > the exception of host1x obviously. CPU has two methods for accessing the > hardware: clients' register aperture and host1x channels. Both of these > operate via host1x hardware. > > We should define host1x bus in the device tree, and move all nodes > except host1x under that bus. I think the host1x node /is/ that bus.
[Bug 45018] [bisected] rendering regression since added support for virtual address space on cayman v11
https://bugs.freedesktop.org/show_bug.cgi?id=45018 --- Comment #55 from Michel D?nzer 2012-05-07 03:07:07 PDT --- (In reply to comment #54) > On latest git (3cd7bee48f7caf7850ea64d40f43875d4c975507), in > src/gallium/drivers/r600/r66_hw_context.c, on line 194, shouldn't it be: > - int offset > + unsigned offset That might be slightly better, but it doesn't really matter. It's the offset from the start of the MMIO aperture, so it would only matter if the register aperture grew beyond 2GB, which we're almost 5 orders of magnitude short of. Very unlikely. > Also, at line 1259, I'm not quite sure why it is shifted by 2. Most of the > time, offset is usually shifted by 8. It's just converting offset from units of 32 bits to bytes. > Just looking through the code to see if something could have been missed... Right now it would be most useful to track down why radeon_bomgr_find_va / radeon_bomgr_force_va ends up returning the offset the kernel complains about. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[RFC][PATCH] drm/radeon/hdmi: define struct for AVI infoframe
On Son, 2012-05-06 at 18:29 +0200, Rafa? Mi?ecki wrote: > 2012/5/6 Dave Airlie : > > On Sun, May 6, 2012 at 5:19 PM, Rafa? Mi?ecki wrote: > >> 2012/5/6 Rafa? Mi?ecki : > >>> diff --git a/drivers/gpu/drm/radeon/r600_hdmi.c > >>> b/drivers/gpu/drm/radeon/r600_hdmi.c > >>> index c308432..b14c90a 100644 > >>> --- a/drivers/gpu/drm/radeon/r600_hdmi.c > >>> +++ b/drivers/gpu/drm/radeon/r600_hdmi.c > >>> @@ -134,78 +134,22 @@ static void r600_hdmi_infoframe_checksum(uint8_t > >>> packetType, > >>> } > >>> > >>> /* > >>> - * build a HDMI Video Info Frame > >>> + * Upload a HDMI AVI Infoframe > >>> */ > >>> -static void r600_hdmi_videoinfoframe( > >>> - struct drm_encoder *encoder, > >>> - enum r600_hdmi_color_format color_format, > >>> - int active_information_present, > >>> - uint8_t active_format_aspect_ratio, > >>> - uint8_t scan_information, > >>> - uint8_t colorimetry, > >>> - uint8_t ex_colorimetry, > >>> - uint8_t quantization, > >>> - int ITC, > >>> - uint8_t picture_aspect_ratio, > >>> - uint8_t video_format_identification, > >>> - uint8_t pixel_repetition, > >>> - uint8_t non_uniform_picture_scaling, > >>> - uint8_t bar_info_data_valid, > >>> - uint16_t top_bar, > >>> - uint16_t bottom_bar, > >>> - uint16_t left_bar, > >>> - uint16_t right_bar > >>> -) > >> > >> In case someone wonders about the reason: I think it's really ugly to > >> have a function taking 18 arguments, 17 of them related to the > >> infoframe. It makes much more sense for me to use struct for that. > >> While working on that I though it's reasonable to prepare nice > >> bitfield __packed struct ready-to-be-written to the GPU registers. > > > > won't this screw up on other endian machines? > > Hm, maybe it can. Is there some easy to handle it correctly? Some trick like > __le8 foo: 3 > __le8 bar: 1 > maybe? Not really. The memory layout of bitfields is basically completely up to the C implementation, so IMHO they're just inadequate for describing fixed memory layouts. -- Earthling Michel D?nzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 --- Comment #1 from Mike Mestnik2012-05-06 20:47:52 PDT --- I got this same error with llvm1.3-rc1 and rc2. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49110] debug build: AMDILCFGStructurizer.cpp:1751:3: error: 'isCurrentDebugType' was not declared in this scope
https://bugs.freedesktop.org/show_bug.cgi?id=49110 --- Comment #4 from Mike Mestnik2012-05-06 20:44:15 PDT --- This bug cloned to: https://bugs.freedesktop.org/show_bug.cgi?id=49567 No rule to make target libradeon.a, needed by libr600.a. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49110] debug build: AMDILCFGStructurizer.cpp:1751:3: error: 'isCurrentDebugType' was not declared in this scope
https://bugs.freedesktop.org/show_bug.cgi?id=49110 Mike Mestnikchanged: What|Removed |Added See Also||https://bugs.freedesktop.or ||g/show_bug.cgi?id=49567 -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 Mike Mestnikchanged: What|Removed |Added Depends on|49110 | -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49110] debug build: AMDILCFGStructurizer.cpp:1751:3: error: 'isCurrentDebugType' was not declared in this scope
https://bugs.freedesktop.org/show_bug.cgi?id=49110 Mike Mestnikchanged: What|Removed |Added Blocks|49567 | -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49110] debug build: AMDILCFGStructurizer.cpp:1751:3: error: 'isCurrentDebugType' was not declared in this scope
https://bugs.freedesktop.org/show_bug.cgi?id=49110 Mike Mestnikchanged: What|Removed |Added Priority|medium |low -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49110] debug build: AMDILCFGStructurizer.cpp:1751:3: error: 'isCurrentDebugType' was not declared in this scope
https://bugs.freedesktop.org/show_bug.cgi?id=49110 Mike Mestnikchanged: What|Removed |Added Summary|AMDILCFGStructurizer.cpp:17 |debug build: |51:3: error:|AMDILCFGStructurizer.cpp:17 |'isCurrentDebugType' was|51:3: error: |not declared in this scope |'isCurrentDebugType' was ||not declared in this scope -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49567] New: No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 Bug #: 49567 Summary: No rule to make target libradeon.a, needed by libr600.a. Classification: Unclassified Product: Mesa Version: git Platform: x86 (IA32) OS/Version: Linux (All) Status: NEW Severity: major Priority: medium Component: Drivers/Gallium/r600 AssignedTo: dri-devel at lists.freedesktop.org ReportedBy: cheako+bugs_freedesktop_org at mikemestnik.net CC: cheako+bugs_freedesktop_org at mikemestnik.net, fabio.ped at libero.it Depends on: 49110 +++ This bug was initially created as a clone of Bug #49110 +++ make[5]: *** No rule to make target `../../../../../../src/gallium/drivers/radeon/libradeon.a', needed by `libr600.a'. Stop. Full log at: https://launchpadlibrarian.net/103127393/buildlog_ubuntu-precise-i386.mesa_8.1~git1204261417.a2f7ec~gd~p_FAILEDTOBUILD.txt.gz https://launchpadlibrarian.net/104275700/buildlog_ubuntu-precise-i386.mesa_8.1~git20120504.5cc4b4aa-1ubuntu0cheako2~precise_FAILEDTOBUILD.txt.gz -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
[Bug 49110] AMDILCFGStructurizer.cpp:1751:3: error: 'isCurrentDebugType' was not declared in this scope
https://bugs.freedesktop.org/show_bug.cgi?id=49110 Mike Mestnikchanged: What|Removed |Added Blocks||49567 -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
Re: [RFC][PATCH] drm/radeon/hdmi: define struct for AVI infoframe
On Son, 2012-05-06 at 18:29 +0200, Rafał Miłecki wrote: 2012/5/6 Dave Airlie airl...@gmail.com: On Sun, May 6, 2012 at 5:19 PM, Rafał Miłecki zaj...@gmail.com wrote: 2012/5/6 Rafał Miłecki zaj...@gmail.com: diff --git a/drivers/gpu/drm/radeon/r600_hdmi.c b/drivers/gpu/drm/radeon/r600_hdmi.c index c308432..b14c90a 100644 --- a/drivers/gpu/drm/radeon/r600_hdmi.c +++ b/drivers/gpu/drm/radeon/r600_hdmi.c @@ -134,78 +134,22 @@ static void r600_hdmi_infoframe_checksum(uint8_t packetType, } /* - * build a HDMI Video Info Frame + * Upload a HDMI AVI Infoframe */ -static void r600_hdmi_videoinfoframe( - struct drm_encoder *encoder, - enum r600_hdmi_color_format color_format, - int active_information_present, - uint8_t active_format_aspect_ratio, - uint8_t scan_information, - uint8_t colorimetry, - uint8_t ex_colorimetry, - uint8_t quantization, - int ITC, - uint8_t picture_aspect_ratio, - uint8_t video_format_identification, - uint8_t pixel_repetition, - uint8_t non_uniform_picture_scaling, - uint8_t bar_info_data_valid, - uint16_t top_bar, - uint16_t bottom_bar, - uint16_t left_bar, - uint16_t right_bar -) In case someone wonders about the reason: I think it's really ugly to have a function taking 18 arguments, 17 of them related to the infoframe. It makes much more sense for me to use struct for that. While working on that I though it's reasonable to prepare nice bitfield __packed struct ready-to-be-written to the GPU registers. won't this screw up on other endian machines? Hm, maybe it can. Is there some easy to handle it correctly? Some trick like __le8 foo: 3 __le8 bar: 1 maybe? Not really. The memory layout of bitfields is basically completely up to the C implementation, so IMHO they're just inadequate for describing fixed memory layouts. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Debian, X and DRI developer ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PULL] drm-intel-fixes
Hi Dave, 2 little patches: - One regression fix to disable sdvo hotplug on broken hw. - One patch to upconvert the snb hang workaround from patch v1 to patch v2. Yours, Daniel The following changes since commit d48b97b403d23f6df0b990cee652bdf9a52337a3: Linux 3.4-rc6 (2012-05-06 15:07:32 -0700) are available in the git repository at: git://people.freedesktop.org/~danvet/drm-intel drm-intel-fixes for you to fetch changes up to 2e7a44814d802c8ba479164b8924070cd908d6b5: drm/i915: Do no set Stencil Cache eviction LRA w/a on gen7+ (2012-05-07 10:37:56 +0200) Daniel Vetter (2): drm/i915: disable sdvo hotplug on i945g/gm drm/i915: Do no set Stencil Cache eviction LRA w/a on gen7+ drivers/gpu/drm/i915/intel_ringbuffer.c |9 ++--- drivers/gpu/drm/i915/intel_sdvo.c |6 ++ 2 files changed, 12 insertions(+), 3 deletions(-) -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC v2 5/5] drm: Add NVIDIA Tegra support
On 25.04.2012 12:45, Thierry Reding wrote: +/ { + ... + + /* host1x */ + host1x: host1x@5000 { + compatible = nvidia,tegra20-host1x; + reg = 0x5000 0x00024000; + interrupts = 0 64 0x04 /* cop syncpt */ + 0 65 0x04 /* mpcore syncpt */ + 0 66 0x04 /* cop general */ + 0 67 0x04; /* mpcore general */ + }; + + /* video-encoding/decoding */ + mpe@5404 { + reg = 0x5404 0x0004; + interrupts = 0 68 0x04; + }; + (...) Hi Thierry, I have still lots of questions regarding how device trees work. I'm now just trying to match the device tree structure with hardware - let me know if that goes wrong. There's a hierarchy in the hardware, which should be represented in the device trees. All of the hardware are client modules for host1x - with the exception of host1x obviously. CPU has two methods for accessing the hardware: clients' register aperture and host1x channels. Both of these operate via host1x hardware. We should define host1x bus in the device tree, and move all nodes except host1x under that bus. This will help us in the long run, as we will have multiple drivers (drm, v4l2) each accessing hardware under host1x. We will need to model the bus and the bus_type will need to take over responsibilities of managing the common resources. When we are clocking hardware, whenever we want to access display's register aperture, host1x needs to be clocked. + /* graphics host */ + graphics@5400 { + compatible = nvidia,tegra20-graphics; + + #address-cells = 1; + #size-cells = 1; + ranges; + + display-controllers = disp1 disp2; + carveout = 0x0e00 0x0200; + host1x = host1x; + gart = gart; + + connectors { + #address-cells = 1; + #size-cells = 0; + + connector@0 { + reg = 0; + edid = /incbin/(machine.edid); + output = lvds; + }; + + connector@1 { + reg = 1; + output = hdmi; + ddc = i2c2; + + hpd-gpio = gpio 111 0; /* PN7 */ + }; + }; + }; +}; I'm not sure what this node means. The register range from 5400 onwards is actually the one that you just described in the nodes of the individual client modules. Why is it represented here again? Terje ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 45018] [bisected] rendering regression since added support for virtual address space on cayman v11
https://bugs.freedesktop.org/show_bug.cgi?id=45018 --- Comment #55 from Michel Dänzer mic...@daenzer.net 2012-05-07 03:07:07 PDT --- (In reply to comment #54) On latest git (3cd7bee48f7caf7850ea64d40f43875d4c975507), in src/gallium/drivers/r600/r66_hw_context.c, on line 194, shouldn't it be: - int offset + unsigned offset That might be slightly better, but it doesn't really matter. It's the offset from the start of the MMIO aperture, so it would only matter if the register aperture grew beyond 2GB, which we're almost 5 orders of magnitude short of. Very unlikely. Also, at line 1259, I'm not quite sure why it is shifted by 2. Most of the time, offset is usually shifted by 8. It's just converting offset from units of 32 bits to bytes. Just looking through the code to see if something could have been missed... Right now it would be most useful to track down why radeon_bomgr_find_va / radeon_bomgr_force_va ends up returning the offset the kernel complains about. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[git pull] drm fixes
Two fixes from Intel, one a regression, one because I merged an early version of a fix. Also the nouveau revert of the i2c code that was tested on the list. Dave. The following changes since commit febb72a6e4cc6c8cffcc1ea649a3fb364f1ea432: IA32 emulation: Fix build problem for modular ia32 a.out support (2012-05-06 18:26:20 -0700) are available in the git repository at: git://people.freedesktop.org/~airlied/linux drm-fixes Ben Skeggs (1): drm/nouveau/i2c: resume use of i2c-algo-bit, rather than custom stack Daniel Vetter (2): drm/i915: disable sdvo hotplug on i945g/gm drm/i915: Do no set Stencil Cache eviction LRA w/a on gen7+ Dave Airlie (1): Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes drivers/gpu/drm/i915/intel_ringbuffer.c |9 +- drivers/gpu/drm/i915/intel_sdvo.c |6 + drivers/gpu/drm/nouveau/nouveau_i2c.c | 199 --- drivers/gpu/drm/nouveau/nouveau_i2c.h |1 + 4 files changed, 34 insertions(+), 181 deletions(-) ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
SA and other Patches.
Hi Jerome everybody on the list, this gathers together every patch we developed over the last week or so and which is not already in drm-next. I've run quite some tests with them yesterday and today and as far as I can see hammered out every known bug. For the SA allocator I reverted to tracking the hole pointer instead of just the last allocation, cause otherwise we will never release the first allocation on the list. Glxgears now even keeps happily running if I deadlock on the not GFX rings on purpose. Please take a second look at them and if nobody objects any more we should commit them to drm-next. Cheers, Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 02/20] drm/radeon: clarify and extend wb setup on APUs and NI+ asics
From: Alex Deucher alexander.deuc...@amd.com Use family rather than DCE check for clarity, also always use wb on APUs, there will never be AGP variants. Signed-off-by: Alex Deucher alexander.deuc...@amd.com Reviewed by: Christian König christian.koe...@amd.com --- drivers/gpu/drm/radeon/radeon_device.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index d18f0c4..ff28210 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -241,8 +241,8 @@ int radeon_wb_init(struct radeon_device *rdev) rdev-wb.use_event = true; } } - /* always use writeback/events on NI */ - if (ASIC_IS_DCE5(rdev)) { + /* always use writeback/events on NI, APUs */ + if (rdev-family = CHIP_PALM) { rdev-wb.enabled = true; rdev-wb.use_event = true; } -- 1.7.5.4 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 09/20] drm/radeon: add sub allocator debugfs file
Dumping the current allocations. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon_object.h |5 + drivers/gpu/drm/radeon/radeon_ring.c | 22 ++ drivers/gpu/drm/radeon/radeon_sa.c | 14 ++ 3 files changed, 41 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index c120ab9..d9fca1e 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -172,5 +172,10 @@ extern int radeon_sa_bo_new(struct radeon_device *rdev, unsigned size, unsigned align); extern void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo); +#if defined(CONFIG_DEBUG_FS) +extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, +struct seq_file *m); +#endif + #endif diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 116be5e..f49c9c0 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -601,6 +601,23 @@ static int radeon_debugfs_ib_info(struct seq_file *m, void *data) static struct drm_info_list radeon_debugfs_ib_list[RADEON_IB_POOL_SIZE]; static char radeon_debugfs_ib_names[RADEON_IB_POOL_SIZE][32]; static unsigned radeon_debugfs_ib_idx[RADEON_IB_POOL_SIZE]; + +static int radeon_debugfs_sa_info(struct seq_file *m, void *data) +{ + struct drm_info_node *node = (struct drm_info_node *) m-private; + struct drm_device *dev = node-minor-dev; + struct radeon_device *rdev = dev-dev_private; + + radeon_sa_bo_dump_debug_info(rdev-ib_pool.sa_manager, m); + + return 0; + +} + +static struct drm_info_list radeon_debugfs_sa_list[] = { +{radeon_sa_info, radeon_debugfs_sa_info, 0, NULL}, +}; + #endif int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *ring) @@ -627,6 +644,11 @@ int radeon_debugfs_ib_init(struct radeon_device *rdev) { #if defined(CONFIG_DEBUG_FS) unsigned i; + int r; + + r = radeon_debugfs_add_files(rdev, radeon_debugfs_sa_list, 1); + if (r) + return r; for (i = 0; i RADEON_IB_POOL_SIZE; i++) { sprintf(radeon_debugfs_ib_names[i], radeon_ib_%04u, i); diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index aed0a8c..1db0568 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -193,3 +193,17 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo) list_del_init(sa_bo-list); spin_unlock(sa_bo-manager-lock); } + +#if defined(CONFIG_DEBUG_FS) +void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, + struct seq_file *m) +{ + struct radeon_sa_bo *i; + + spin_lock(sa_manager-lock); + list_for_each_entry(i, sa_manager-sa_bo, list) { + seq_printf(m, offset %08d: size %4d\n, i-offset, i-size); + } + spin_unlock(sa_manager-lock); +} +#endif -- 1.7.5.4 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 12/20] drm/radeon: define new SA interface v3
Define the interface without modifying the allocation algorithm in any way. v2: rebase on top of fence new uint64 patch v3: add ring to debugfs output Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h |1 + drivers/gpu/drm/radeon/radeon_gart.c |6 +- drivers/gpu/drm/radeon/radeon_object.h|5 +- drivers/gpu/drm/radeon/radeon_ring.c |8 ++-- drivers/gpu/drm/radeon/radeon_sa.c| 60 drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 6 files changed, 63 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 9374ab1..ada70d1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -398,6 +398,7 @@ struct radeon_sa_bo { struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; + struct radeon_fence *fence; }; /* diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index c5789ef..53dba8e 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -326,7 +326,7 @@ static void radeon_vm_unbind_locked(struct radeon_device *rdev, rdev-vm_manager.use_bitmap = ~(1 vm-id); list_del_init(vm-list); vm-id = -1; - radeon_sa_bo_free(rdev, vm-sa_bo); + radeon_sa_bo_free(rdev, vm-sa_bo, NULL); vm-pt = NULL; list_for_each_entry(bo_va, vm-va, vm_list) { @@ -395,7 +395,7 @@ int radeon_vm_bind(struct radeon_device *rdev, struct radeon_vm *vm) retry: r = radeon_sa_bo_new(rdev, rdev-vm_manager.sa_manager, vm-sa_bo, RADEON_GPU_PAGE_ALIGN(vm-last_pfn * 8), -RADEON_GPU_PAGE_SIZE); +RADEON_GPU_PAGE_SIZE, false); if (r) { if (list_empty(rdev-vm_manager.lru_vm)) { return r; @@ -426,7 +426,7 @@ retry_id: /* do hw bind */ r = rdev-vm_manager.funcs-bind(rdev, vm, id); if (r) { - radeon_sa_bo_free(rdev, vm-sa_bo); + radeon_sa_bo_free(rdev, vm-sa_bo, NULL); return r; } rdev-vm_manager.use_bitmap |= 1 id; diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h index 4fc7f07..befec7d 100644 --- a/drivers/gpu/drm/radeon/radeon_object.h +++ b/drivers/gpu/drm/radeon/radeon_object.h @@ -169,9 +169,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev, extern int radeon_sa_bo_new(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager, struct radeon_sa_bo **sa_bo, - unsigned size, unsigned align); + unsigned size, unsigned align, bool block); extern void radeon_sa_bo_free(struct radeon_device *rdev, - struct radeon_sa_bo **sa_bo); + struct radeon_sa_bo **sa_bo, + struct radeon_fence *fence); #if defined(CONFIG_DEBUG_FS) extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager, struct seq_file *m); diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 45adb37..1748d93 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -85,7 +85,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib) if (ib-fence ib-fence-seq RADEON_FENCE_NOTEMITED_SEQ) { if (radeon_fence_signaled(ib-fence)) { radeon_fence_unref(ib-fence); - radeon_sa_bo_free(rdev, ib-sa_bo); + radeon_sa_bo_free(rdev, ib-sa_bo, NULL); done = true; } } @@ -124,7 +124,7 @@ retry: if (rdev-ib_pool.ibs[idx].fence == NULL) { r = radeon_sa_bo_new(rdev, rdev-ib_pool.sa_manager, rdev-ib_pool.ibs[idx].sa_bo, -size, 256); +size, 256, false); if (!r) { *ib = rdev-ib_pool.ibs[idx]; (*ib)-ptr = radeon_sa_bo_cpu_addr((*ib)-sa_bo); @@ -173,7 +173,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib) } radeon_mutex_lock(rdev-ib_pool.mutex); if (tmp-fence tmp-fence-seq == RADEON_FENCE_NOTEMITED_SEQ) { - radeon_sa_bo_free(rdev, tmp-sa_bo); + radeon_sa_bo_free(rdev, tmp-sa_bo, NULL);
[PATCH 13/20] drm/radeon: use one wait queue for all rings add fence_wait_any v2
From: Jerome Glisse jgli...@redhat.com Use one wait queue for all rings. When one ring progress, other likely does to and we are not expecting to have a lot of waiter anyway. Also add a fence_wait_any that will wait until the first fence in the fence array (one fence per ring) is signaled. This allow to wait on all rings. v2: some minor cleanups and improvements. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon.h |5 +- drivers/gpu/drm/radeon/radeon_fence.c | 163 - 2 files changed, 162 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index ada70d1..37a7459 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -262,7 +262,6 @@ struct radeon_fence_driver { uint64_tseq; atomic64_t last_seq; unsigned long last_activity; - wait_queue_head_t queue; boolinitialized; }; @@ -286,6 +285,9 @@ bool radeon_fence_signaled(struct radeon_fence *fence); int radeon_fence_wait(struct radeon_fence *fence, bool interruptible); int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring); int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring); +int radeon_fence_wait_any(struct radeon_device *rdev, + struct radeon_fence **fences, + bool intr); struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence); void radeon_fence_unref(struct radeon_fence **fence); unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring); @@ -1534,6 +1536,7 @@ struct radeon_device { struct radeon_scratch scratch; struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; + wait_queue_head_t fence_queue; struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 8034b42..45d4e6e 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -222,11 +222,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq, trace_radeon_fence_wait_begin(rdev-ddev, seq); radeon_irq_kms_sw_irq_get(rdev, ring); if (intr) { - r = wait_event_interruptible_timeout(rdev-fence_drv[ring].queue, + r = wait_event_interruptible_timeout(rdev-fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } else { - r = wait_event_timeout(rdev-fence_drv[ring].queue, + r = wait_event_timeout(rdev-fence_queue, (signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)), timeout); } @@ -300,6 +300,159 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr) return 0; } +bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq) +{ + unsigned i; + + for (i = 0; i RADEON_NUM_RINGS; ++i) { + if (seq[i] radeon_fence_seq_signaled(rdev, seq[i], i)) { + return true; + } + } + return false; +} + +static int radeon_fence_wait_any_seq(struct radeon_device *rdev, +u64 *target_seq, bool intr) +{ + unsigned long timeout, last_activity, tmp; + unsigned i, ring = RADEON_NUM_RINGS; + bool signaled; + int r; + + for (i = 0, last_activity = 0; i RADEON_NUM_RINGS; ++i) { + if (!target_seq[i]) { + continue; + } + + /* use the most recent one as indicator */ + if (time_after(rdev-fence_drv[i].last_activity, last_activity)) { + last_activity = rdev-fence_drv[i].last_activity; + } + + /* For lockup detection just pick the lowest ring we are +* actively waiting for +*/ + if (i ring) { + ring = i; + } + } + + /* nothing to wait for ? */ + if (ring == RADEON_NUM_RINGS) { + return 0; + } + + while (!radeon_fence_any_seq_signaled(rdev, target_seq)) { + timeout = jiffies - RADEON_FENCE_JIFFIES_TIMEOUT; + if (time_after(last_activity, timeout)) { + /* the normal case, timeout is somewhere
[PATCH 15/20] drm/radeon: simplify semaphore handling v2
From: Jerome Glisse jgli...@redhat.com Directly use the suballocator to get small chunks of memory. It's equally fast and doesn't crash when we encounter a GPU reset. v2: rebased on new SA interface. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/evergreen.c|1 - drivers/gpu/drm/radeon/ni.c |1 - drivers/gpu/drm/radeon/r600.c |1 - drivers/gpu/drm/radeon/radeon.h | 29 +- drivers/gpu/drm/radeon/radeon_device.c|2 - drivers/gpu/drm/radeon/radeon_fence.c |2 +- drivers/gpu/drm/radeon/radeon_semaphore.c | 137 + drivers/gpu/drm/radeon/radeon_test.c |4 +- drivers/gpu/drm/radeon/rv770.c|1 - drivers/gpu/drm/radeon/si.c |1 - 10 files changed, 30 insertions(+), 149 deletions(-) diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index ecc29bc..7e7ac3d 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -3550,7 +3550,6 @@ void evergreen_fini(struct radeon_device *rdev) evergreen_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_agp_fini(rdev); radeon_bo_fini(rdev); diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 9cd2657..107b217 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -1744,7 +1744,6 @@ void cayman_fini(struct radeon_device *rdev) cayman_pcie_gart_fini(rdev); r600_vram_scratch_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c index 87a2333..0ae2d2d 100644 --- a/drivers/gpu/drm/radeon/r600.c +++ b/drivers/gpu/drm/radeon/r600.c @@ -2658,7 +2658,6 @@ void r600_fini(struct radeon_device *rdev) r600_vram_scratch_fini(rdev); radeon_agp_fini(rdev); radeon_gem_fini(rdev); - radeon_semaphore_driver_fini(rdev); radeon_fence_driver_fini(rdev); radeon_bo_fini(rdev); radeon_atombios_fini(rdev); diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index cc7f16a..45164e1 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -434,34 +434,13 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv, /* * Semaphores. */ -struct radeon_ring; - -#defineRADEON_SEMAPHORE_BO_SIZE256 - -struct radeon_semaphore_driver { - rwlock_tlock; - struct list_headbo; -}; - -struct radeon_semaphore_bo; - /* everything here is constant */ struct radeon_semaphore { - struct list_headlist; + struct radeon_sa_bo *sa_bo; + signed waiters; uint64_tgpu_addr; - uint32_t*cpu_ptr; - struct radeon_semaphore_bo *bo; }; -struct radeon_semaphore_bo { - struct list_headlist; - struct radeon_ib*ib; - struct list_headfree; - struct radeon_semaphore semaphores[RADEON_SEMAPHORE_BO_SIZE/8]; - unsignednused; -}; - -void radeon_semaphore_driver_fini(struct radeon_device *rdev); int radeon_semaphore_create(struct radeon_device *rdev, struct radeon_semaphore **semaphore); void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring, @@ -473,7 +452,8 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev, bool sync_to[RADEON_NUM_RINGS], int dst_ring); void radeon_semaphore_free(struct radeon_device *rdev, - struct radeon_semaphore *semaphore); + struct radeon_semaphore *semaphore, + struct radeon_fence *fence); /* * GART structures, functions helpers @@ -1540,7 +1520,6 @@ struct radeon_device { struct radeon_mman mman; struct radeon_fence_driver fence_drv[RADEON_NUM_RINGS]; wait_queue_head_t fence_queue; - struct radeon_semaphore_driver semaphore_drv; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; struct radeon_ib_pool ib_pool; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index b827b2e..48876c1 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@
[PATCH 17/20] drm/radeon: immediately free ttm-move semaphore
We can now protected the semaphore ram by a fence, so free it immediately. Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon_ttm.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 5e3d54d..0f6aee8 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -223,6 +223,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo, struct radeon_device *rdev; uint64_t old_start, new_start; struct radeon_fence *fence, *old_fence; + struct radeon_semaphore *sem = NULL; int r; rdev = radeon_get_rdev(bo-bdev); @@ -272,15 +273,16 @@ static int radeon_move_blit(struct ttm_buffer_object *bo, bool sync_to_ring[RADEON_NUM_RINGS] = { }; sync_to_ring[old_fence-ring] = true; - r = radeon_semaphore_create(rdev, fence-semaphore); + r = radeon_semaphore_create(rdev, sem); if (r) { radeon_fence_unref(fence); return r; } - r = radeon_semaphore_sync_rings(rdev, fence-semaphore, + r = radeon_semaphore_sync_rings(rdev, sem, sync_to_ring, fence-ring); if (r) { + radeon_semaphore_free(rdev, sem, NULL); radeon_fence_unref(fence); return r; } @@ -292,6 +294,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo, /* FIXME: handle copy error */ r = ttm_bo_move_accel_cleanup(bo, (void *)fence, NULL, evict, no_wait_reserve, no_wait_gpu, new_mem); + radeon_semaphore_free(rdev, sem, fence); radeon_fence_unref(fence); return r; } -- 1.7.5.4 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH 14/20] drm/radeon: multiple ring allocator v2
A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen. We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate. If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete. v2: We need to be able to let hole point to the list_head, otherwise try free will never free the first allocation of the list. Also stop calling radeon_fence_signalled more than necessary. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon.h |7 +- drivers/gpu/drm/radeon/radeon_ring.c | 19 +-- drivers/gpu/drm/radeon/radeon_sa.c | 292 +++--- 3 files changed, 210 insertions(+), 108 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 37a7459..cc7f16a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -385,7 +385,9 @@ struct radeon_bo_list { struct radeon_sa_manager { spinlock_t lock; struct radeon_bo*bo; - struct list_headsa_bo; + struct list_head*hole; + struct list_headflist[RADEON_NUM_RINGS]; + struct list_headolist; unsignedsize; uint64_tgpu_addr; void*cpu_ptr; @@ -396,7 +398,8 @@ struct radeon_sa_bo; /* sub-allocation buffer */ struct radeon_sa_bo { - struct list_headlist; + struct list_headolist; + struct list_headflist; struct radeon_sa_manager*manager; unsignedsoffset; unsignedeoffset; diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 1748d93..e074ff5 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib) int radeon_ib_pool_init(struct radeon_device *rdev) { - struct radeon_sa_manager tmp; int i, r; - r = radeon_sa_bo_manager_init(rdev, tmp, - RADEON_IB_POOL_SIZE*64*1024, - RADEON_GEM_DOMAIN_GTT); - if (r) { - return r; - } - radeon_mutex_lock(rdev-ib_pool.mutex); if (rdev-ib_pool.ready) { radeon_mutex_unlock(rdev-ib_pool.mutex); - radeon_sa_bo_manager_fini(rdev, tmp); return 0; } - rdev-ib_pool.sa_manager = tmp; - INIT_LIST_HEAD(rdev-ib_pool.sa_manager.sa_bo); + r = radeon_sa_bo_manager_init(rdev, rdev-ib_pool.sa_manager, + RADEON_IB_POOL_SIZE*64*1024, + RADEON_GEM_DOMAIN_GTT); + if (r) { + radeon_mutex_unlock(rdev-ib_pool.mutex); + return r; + } + for (i = 0; i RADEON_IB_POOL_SIZE; i++) { rdev-ib_pool.ibs[i].fence = NULL; rdev-ib_pool.ibs[i].idx = i; diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 90ee8ad..757a9d4 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -27,21 +27,42 @@ * Authors: *Jerome Glisse gli...@freedesktop.org */ +/* Algorithm: + * + * We store the last allocated bo in hole, we always try to allocate + * after the last allocated bo. Principle is that in a linear GPU ring + * progression was is after last is the oldest bo we allocated and thus + * the first one that should no longer be in use by the GPU. + * + * If it's not the case we skip over the bo after last to the closest + * done bo if such one exist. If none exist and we are not asked to + * block we report failure to allocate. + * + * If we are asked to block we wait on all the oldest fence of all + * rings. We just wait for any of those fence to complete. + */ #include drmP.h #include drm.h #include radeon.h +static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo); +static void radeon_sa_bo_try_free(struct radeon_sa_manager *sa_manager); + int radeon_sa_bo_manager_init(struct radeon_device *rdev, struct radeon_sa_manager *sa_manager,
[PATCH 16/20] drm/radeon: rip out the ib pool
From: Jerome Glisse jgli...@redhat.com It isn't necessary any more and the suballocator seems to perform even better. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com --- drivers/gpu/drm/radeon/radeon.h | 17 +-- drivers/gpu/drm/radeon/radeon_device.c|1 - drivers/gpu/drm/radeon/radeon_gart.c | 12 +- drivers/gpu/drm/radeon/radeon_ring.c | 241 - drivers/gpu/drm/radeon/radeon_semaphore.c |2 +- 5 files changed, 71 insertions(+), 202 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 45164e1..6170307 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -625,7 +625,6 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc); struct radeon_ib { struct radeon_sa_bo *sa_bo; - unsignedidx; uint32_tlength_dw; uint64_tgpu_addr; uint32_t*ptr; @@ -634,18 +633,6 @@ struct radeon_ib { boolis_const_ib; }; -/* - * locking - - * mutex protects scheduled_ibs, ready, alloc_bm - */ -struct radeon_ib_pool { - struct radeon_mutex mutex; - struct radeon_sa_managersa_manager; - struct radeon_ibibs[RADEON_IB_POOL_SIZE]; - boolready; - unsignedhead_id; -}; - struct radeon_ring { struct radeon_bo*ring_obj; volatile uint32_t *ring; @@ -787,7 +774,6 @@ struct si_rlc { int radeon_ib_get(struct radeon_device *rdev, int ring, struct radeon_ib **ib, unsigned size); void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib); -bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib); int radeon_ib_pool_init(struct radeon_device *rdev); void radeon_ib_pool_fini(struct radeon_device *rdev); @@ -1522,7 +1508,8 @@ struct radeon_device { wait_queue_head_t fence_queue; struct mutexring_lock; struct radeon_ring ring[RADEON_NUM_RINGS]; - struct radeon_ib_pool ib_pool; + boolib_pool_ready; + struct radeon_sa_managerring_tmp_bo; struct radeon_irq irq; struct radeon_asic *asic; struct radeon_gem gem; diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index 48876c1..e1bc7e9 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -724,7 +724,6 @@ int radeon_device_init(struct radeon_device *rdev, /* mutex initialization are all done here so we * can recall function without having locking issues */ radeon_mutex_init(rdev-cs_mutex); - radeon_mutex_init(rdev-ib_pool.mutex); mutex_init(rdev-ring_lock); mutex_init(rdev-dc_hw_i2c_mutex); if (rdev-family = CHIP_R600) diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 53dba8e..8e9ef34 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -432,8 +432,8 @@ retry_id: rdev-vm_manager.use_bitmap |= 1 id; vm-id = id; list_add_tail(vm-list, rdev-vm_manager.lru_vm); - return radeon_vm_bo_update_pte(rdev, vm, rdev-ib_pool.sa_manager.bo, - rdev-ib_pool.sa_manager.bo-tbo.mem); + return radeon_vm_bo_update_pte(rdev, vm, rdev-ring_tmp_bo.bo, + rdev-ring_tmp_bo.bo-tbo.mem); } /* object have to be reserved */ @@ -631,7 +631,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm) /* map the ib pool buffer at 0 in virtual address space, set * read only */ - r = radeon_vm_bo_add(rdev, vm, rdev-ib_pool.sa_manager.bo, 0, + r = radeon_vm_bo_add(rdev, vm, rdev-ring_tmp_bo.bo, 0, RADEON_VM_PAGE_READABLE | RADEON_VM_PAGE_SNOOPED); return r; } @@ -648,12 +648,12 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm) radeon_mutex_unlock(rdev-cs_mutex); /* remove all bo */ - r = radeon_bo_reserve(rdev-ib_pool.sa_manager.bo, false); + r = radeon_bo_reserve(rdev-ring_tmp_bo.bo, false); if (!r) { - bo_va = radeon_bo_va(rdev-ib_pool.sa_manager.bo, vm); + bo_va = radeon_bo_va(rdev-ring_tmp_bo.bo, vm); list_del_init(bo_va-bo_list); list_del_init(bo_va-vm_list); - radeon_bo_unreserve(rdev-ib_pool.sa_manager.bo); +
[PULL] drm-intel-next manual merge
Hi Dave, As discussed on ircmail, here's the pull request for the manual merge to unconfuse git about the changes in intel_display.c. Note that I've manually frobbed the shortlog to exclude all the changes merge through Linus' tree. Yours, Daniel The following changes since commit 5bc69bf9aeb73547cad8e1ce683a103fe9728282: Merge tag 'drm-intel-next-2012-04-23' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next (2012-05-02 09:22:29 +0100) are available in the git repository at: git://people.freedesktop.org/~danvet/drm-intel for-airlied for you to fetch changes up to dc257cf154be708ecc47b8b89c12ad8cd2cc35e4: Merge tag 'v3.4-rc6' into drm-intel-next (2012-05-07 14:02:14 +0200) Daniel Vetter (1): Merge tag 'v3.4-rc6' into drm-intel-next -- Daniel Vetter Mail: dan...@ffwll.ch Mobile: +41 (0)79 365 57 48 ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC 05/13] v4l: vb2-dma-contig: add support for DMABUF exporting
Hi Tomasz, Sorry for the late reply, this one slipped through the cracks. On Thursday 19 April 2012 12:42:12 Tomasz Stanislawski wrote: On 04/17/2012 04:08 PM, Laurent Pinchart wrote: On Tuesday 10 April 2012 15:10:39 Tomasz Stanislawski wrote: This patch adds support for exporting a dma-contig buffer using DMABUF interface. Signed-off-by: Tomasz Stanislawski t.stanisl...@samsung.com Signed-off-by: Kyungmin Park kyungmin.p...@samsung.com --- [snip] +static struct sg_table *vb2_dc_dmabuf_ops_map( + struct dma_buf_attachment *db_attach, enum dma_data_direction dir) +{ + struct dma_buf *dbuf = db_attach-dmabuf; + struct vb2_dc_buf *buf = dbuf-priv; + struct vb2_dc_attachment *attach = db_attach-priv; + struct sg_table *sgt; + struct scatterlist *rd, *wr; + int i, ret; You can make i an unsigned int :-) Right.. splitting declaration may be also a good idea :) + + /* return previously mapped sg table */ + if (attach) + return attach-sgt; This effectively keeps the mapping around as long as the attachment exists. We don't try to swap out buffers in V4L2 as is done in DRM at the moment, so it might not be too much of an issue, but the behaviour of the implementation will change if we later decide to map/unmap the buffers in the map/unmap handlers. Do you think that could be a problem ? I don't that it is a problem. If an importer calls dma_map_sg then caching sgt on an exporter side reduces a cost of an allocating and an initialization of sgt. + + attach = kzalloc(sizeof *attach, GFP_KERNEL); + if (!attach) + return ERR_PTR(-ENOMEM); Why don't you allocate the vb2_dc_attachment here instead of vb2_dc_dmabuf_ops_attach() ? Good point. The attachment could be allocated at vb2_dc_attachment but all its fields would be uninitialized. I mean an empty sgt and an undefined dma direction. I decided to allocate the attachment in vb2_dc_dmabuf_ops_map because only than all information needed to create a valid attachment object are available. The other solution might be the allocation at vb2_dc_attachment. The field dir would be set to DMA_NONE. If this filed is equal to DMA_NONE at vb2_dc_dmabuf_ops_map then sgt is allocated and mapped and direction field is updated. If value is not DMA_NONE then the sgt is reused. Do you think that it is a good idea? I think I would prefer that. It sounds more logical to allocate the attachment in the attach operation handler. + sgt = attach-sgt; + attach-dir = dir; + + /* copying the buf-base_sgt to attachment */ I would add an explanation regarding why you need to copy the SG list. Something like. Copy the buf-base_sgt scatter list to the attachment, as we can't map the same scatter list to multiple devices at the same time. ok + ret = sg_alloc_table(sgt, buf-sgt_base-orig_nents, GFP_KERNEL); + if (ret) { + kfree(attach); + return ERR_PTR(-ENOMEM); + } + + rd = buf-sgt_base-sgl; + wr = sgt-sgl; + for (i = 0; i sgt-orig_nents; ++i) { + sg_set_page(wr, sg_page(rd), rd-length, rd-offset); + rd = sg_next(rd); + wr = sg_next(wr); + } + /* mapping new sglist to the client */ + ret = dma_map_sg(db_attach-dev, sgt-sgl, sgt-orig_nents, dir); + if (ret = 0) { + printk(KERN_ERR failed to map scatterlist\n); + sg_free_table(sgt); + kfree(attach); + return ERR_PTR(-EIO); + } + + db_attach-priv = attach; + + return sgt; +} + +static void vb2_dc_dmabuf_ops_unmap(struct dma_buf_attachment *db_attach, + struct sg_table *sgt, enum dma_data_direction dir) +{ + /* nothing to be done here */ +} + +static void vb2_dc_dmabuf_ops_release(struct dma_buf *dbuf) +{ + /* drop reference obtained in vb2_dc_get_dmabuf */ + vb2_dc_put(dbuf-priv); Shouldn't you set vb2_dc_buf::dma_buf to NULL here ? Otherwise the next vb2_dc_get_dmabuf() call will return a DMABUF object that has been freed. No. The buffer object is destroyed at vb2_dc_put when reference count drops to 0. It happens could happen after only REQBUF(count=0) or on last close(). The DMABUF object is created only for MMAP buffers. The DMABUF object is based only on results of dma_alloc_coherent and dma_get_pages (or its future equivalent). Therefore the DMABUF object is valid as long as the buffer is valid. OK. Notice that dmabuf object could be created in vb2_dc_alloc. I moved it to vb2_dc_get_dmabuf to avoid a creation of an object that may not be used. +} + +static struct dma_buf_ops vb2_dc_dmabuf_ops = { + .attach = vb2_dc_dmabuf_ops_attach, + .detach = vb2_dc_dmabuf_ops_detach, + .map_dma_buf = vb2_dc_dmabuf_ops_map, + .unmap_dma_buf = vb2_dc_dmabuf_ops_unmap, + .release = vb2_dc_dmabuf_ops_release, +}; + +static struct dma_buf *vb2_dc_get_dmabuf(void
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 --- Comment #2 from Tom Stellard tstel...@gmail.com 2012-05-07 07:18:45 PDT --- If you re-run autogen.sh and configure does that fix the problem? -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: SA and other Patches.
On Mon, May 7, 2012 at 7:42 AM, Christian König deathsim...@vodafone.de wrote: Hi Jerome everybody on the list, this gathers together every patch we developed over the last week or so and which is not already in drm-next. I've run quite some tests with them yesterday and today and as far as I can see hammered out every known bug. For the SA allocator I reverted to tracking the hole pointer instead of just the last allocation, cause otherwise we will never release the first allocation on the list. Glxgears now even keeps happily running if I deadlock on the not GFX rings on purpose. Now we will release the first entry if we use the last allocate ptr i believe it's cleaner to use the last ptr. Please take a second look at them and if nobody objects any more we should commit them to drm-next. Cheers, Christian. Cheers, Jerome ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 04/20] drm/radeon: convert fence to uint64_t v4
On Mon, May 7, 2012 at 7:42 AM, Christian König deathsim...@vodafone.de wrote: From: Jerome Glisse jgli...@redhat.com This convert fence to use uint64_t sequence number intention is to use the fact that uin64_t is big enough that we don't need to care about wrap around. Tested with and without writeback using 0xF000 as initial fence sequence and thus allowing to test the wrap around from 32bits to 64bits. v2: Add comment about possible race btw CPU GPU, add comment stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET Read fence sequenc in reverse order of GPU write them so we mitigate the race btw CPU and GPU. v3: Drop the need for ring to emit the 64bits fence, and just have each ring emit the lower 32bits of the fence sequence. We handle the wrap over 32bits in fence_process. v4: Just a small optimization: Don't reread the last_seq value if loop restarts, since we already know its value anyway. Also start at zero not one for seq value and use pre instead of post increment in emmit, otherwise wait_empty will deadlock. Why changing that v3 was already good no deadlock. I started at 1 especialy for that, a signaled fence is set to 0 so it always compare as signaled. Just using preincrement is exactly like starting at one. I don't see the need for this change but if it makes you happy. Cheers, Jerome Signed-off-by: Jerome Glisse jgli...@redhat.com Signed-off-by: Christian König deathsim...@vodafone.de --- drivers/gpu/drm/radeon/radeon.h | 39 ++- drivers/gpu/drm/radeon/radeon_fence.c | 116 +++-- drivers/gpu/drm/radeon/radeon_ring.c | 9 ++- 3 files changed, 107 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index e99ea81..cdf46bc 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout; * Copy from radeon_drv.h so we don't have to include both and have conflicting * symbol; */ -#define RADEON_MAX_USEC_TIMEOUT 10 /* 100 ms */ -#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) +#define RADEON_MAX_USEC_TIMEOUT 10 /* 100 ms */ +#define RADEON_FENCE_JIFFIES_TIMEOUT (HZ / 2) /* RADEON_IB_POOL_SIZE must be a power of 2 */ -#define RADEON_IB_POOL_SIZE 16 -#define RADEON_DEBUGFS_MAX_COMPONENTS 32 -#define RADEONFB_CONN_LIMIT 4 -#define RADEON_BIOS_NUM_SCRATCH 8 +#define RADEON_IB_POOL_SIZE 16 +#define RADEON_DEBUGFS_MAX_COMPONENTS 32 +#define RADEONFB_CONN_LIMIT 4 +#define RADEON_BIOS_NUM_SCRATCH 8 /* max number of rings */ -#define RADEON_NUM_RINGS 3 +#define RADEON_NUM_RINGS 3 + +/* fence seq are set to this number when signaled */ +#define RADEON_FENCE_SIGNALED_SEQ 0LL +#define RADEON_FENCE_NOTEMITED_SEQ (~0LL) /* internal ring indices */ /* r1xx+ has gfx CP ring */ -#define RADEON_RING_TYPE_GFX_INDEX 0 +#define RADEON_RING_TYPE_GFX_INDEX 0 /* cayman has 2 compute CP rings */ -#define CAYMAN_RING_TYPE_CP1_INDEX 1 -#define CAYMAN_RING_TYPE_CP2_INDEX 2 +#define CAYMAN_RING_TYPE_CP1_INDEX 1 +#define CAYMAN_RING_TYPE_CP2_INDEX 2 /* hardcode those limit for now */ -#define RADEON_VA_RESERVED_SIZE (8 20) -#define RADEON_IB_VM_MAX_SIZE (64 10) +#define RADEON_VA_RESERVED_SIZE (8 20) +#define RADEON_IB_VM_MAX_SIZE (64 10) /* * Errata workarounds. @@ -254,8 +258,9 @@ struct radeon_fence_driver { uint32_t scratch_reg; uint64_t gpu_addr; volatile uint32_t *cpu_addr; - atomic_t seq; - uint32_t last_seq; + /* seq is protected by ring emission lock */ + uint64_t seq; + atomic64_t last_seq; unsigned long last_activity; wait_queue_head_t queue; struct list_head emitted; @@ -268,11 +273,9 @@ struct radeon_fence { struct kref kref; struct list_head list; /* protected by radeon_fence.lock */ - uint32_t seq; - bool emitted; - bool signaled; + uint64_t seq; /* RB, DMA, etc. */ - int ring; + unsigned ring; struct radeon_semaphore *semaphore; }; diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index
Re: [PATCHv5 08/13] v4l: vb2-dma-contig: add support for scatterlist in userptr mode
Hi Subash, Could you provide a detailed description of a test case that causes a failure of vb2_dc_pages_to_sgt? Regards, Tomasz Stanislawski ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
On Mon, May 7, 2012 at 7:42 AM, Christian König deathsim...@vodafone.de wrote: A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen. We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate. If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete. v2: We need to be able to let hole point to the list_head, otherwise try free will never free the first allocation of the list. Also stop calling radeon_fence_signalled more than necessary. Signed-off-by: Christian König deathsim...@vodafone.de Signed-off-by: Jerome Glisse jgli...@redhat.com This one is NAK please use my patch. Yes in my patch we never try to free anything if there is only on sa_bo in the list if you really care about this it's a one line change: http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch Your patch here can enter in infinite loop and never return holding the lock. See below. Cheers, Jerome --- drivers/gpu/drm/radeon/radeon.h | 7 +- drivers/gpu/drm/radeon/radeon_ring.c | 19 +-- drivers/gpu/drm/radeon/radeon_sa.c | 292 +++--- 3 files changed, 210 insertions(+), 108 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index 37a7459..cc7f16a 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -385,7 +385,9 @@ struct radeon_bo_list { struct radeon_sa_manager { spinlock_t lock; struct radeon_bo *bo; - struct list_head sa_bo; + struct list_head *hole; + struct list_head flist[RADEON_NUM_RINGS]; + struct list_head olist; unsigned size; uint64_t gpu_addr; void *cpu_ptr; @@ -396,7 +398,8 @@ struct radeon_sa_bo; /* sub-allocation buffer */ struct radeon_sa_bo { - struct list_head list; + struct list_head olist; + struct list_head flist; struct radeon_sa_manager *manager; unsigned soffset; unsigned eoffset; diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 1748d93..e074ff5 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib) int radeon_ib_pool_init(struct radeon_device *rdev) { - struct radeon_sa_manager tmp; int i, r; - r = radeon_sa_bo_manager_init(rdev, tmp, - RADEON_IB_POOL_SIZE*64*1024, - RADEON_GEM_DOMAIN_GTT); - if (r) { - return r; - } - radeon_mutex_lock(rdev-ib_pool.mutex); if (rdev-ib_pool.ready) { radeon_mutex_unlock(rdev-ib_pool.mutex); - radeon_sa_bo_manager_fini(rdev, tmp); return 0; } - rdev-ib_pool.sa_manager = tmp; - INIT_LIST_HEAD(rdev-ib_pool.sa_manager.sa_bo); + r = radeon_sa_bo_manager_init(rdev, rdev-ib_pool.sa_manager, + RADEON_IB_POOL_SIZE*64*1024, + RADEON_GEM_DOMAIN_GTT); + if (r) { + radeon_mutex_unlock(rdev-ib_pool.mutex); + return r; + } + for (i = 0; i RADEON_IB_POOL_SIZE; i++) { rdev-ib_pool.ibs[i].fence = NULL; rdev-ib_pool.ibs[i].idx = i; diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c index 90ee8ad..757a9d4 100644 --- a/drivers/gpu/drm/radeon/radeon_sa.c +++ b/drivers/gpu/drm/radeon/radeon_sa.c @@ -27,21 +27,42 @@ * Authors: * Jerome Glisse gli...@freedesktop.org */ +/* Algorithm: + * + * We store the last allocated bo in hole, we always try to allocate + * after the last allocated bo. Principle is that in a linear GPU ring + * progression was is after last is the oldest bo we allocated and thus + * the first one that should no longer be in use by the GPU. + * + * If it's not the case we skip over the bo after last to the closest + * done bo if such one exist. If none exist
Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
On 07.05.2012 17:23, Jerome Glisse wrote: On Mon, May 7, 2012 at 7:42 AM, Christian Königdeathsim...@vodafone.de wrote: A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen. We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate. If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete. v2: We need to be able to let hole point to the list_head, otherwise try free will never free the first allocation of the list. Also stop calling radeon_fence_signalled more than necessary. Signed-off-by: Christian Königdeathsim...@vodafone.de Signed-off-by: Jerome Glissejgli...@redhat.com This one is NAK please use my patch. Yes in my patch we never try to free anything if there is only on sa_bo in the list if you really care about this it's a one line change: http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch Nope that won't work correctly, last is pointing to the last allocation and that's the most unlikely to be freed at this time. Also in this version (like in the one before) radeon_sa_bo_next_hole lets hole point to the prev of the found sa_bo without checking if this isn't the lists head. That might cause a crash if an to be freed allocation is the first one in the buffer. What radeon_sa_bo_try_free would need to do to get your approach working is to loop over the end of the buffer and also try to free at the beginning, but saying that keeping the last allocation results in a whole bunch of extra cases and ifs, while just keeping a pointer to the hole (e.g. where the next allocation is most likely to succeed) simplifies the code quite a bit (but I agree that on the down side it makes it harder to understand). Your patch here can enter in infinite loop and never return holding the lock. See below. [SNIP] + } while (radeon_sa_bo_next_hole(sa_manager, fences)); Here you can infinite loop, in the case there is a bunch of hole in the allocator but none of them allow to full fill the allocation. radeon_sa_bo_next_hole will keep returning true looping over and over on all the all. That's why i only restrict my patch to 2 hole skeeping and then fails the allocation or try to wait. I believe sadly we need an heuristic and 2 hole skeeping at most sounded like a good one. Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes. Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely). Cheers, Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49484] invalid enum 0x500, invalid value 0x501
https://bugs.freedesktop.org/show_bug.cgi?id=49484 --- Comment #3 from Michal Suchanek hramr...@gmail.com 2012-05-07 09:55:15 PDT --- I get no mesa warnings, only warnings from wine about Mesa returning GL_INVALID* -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 --- Comment #3 from Mike Mestnik cheako+bugs_freedesktop_...@mikemestnik.net 2012-05-07 09:58:45 PDT --- Tom, The short of it: I'm already doing that. The long: I took a look at that script and it eventually just calls autoreconf -v --install my log clearly shows autoreconf -vfi being called. Also note that that script will call configure. Thanks! -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49484] invalid enum 0x500, invalid value 0x501
https://bugs.freedesktop.org/show_bug.cgi?id=49484 --- Comment #4 from Michal Suchanek hramr...@gmail.com 2012-05-07 10:02:05 PDT --- invalid value: Breakpoint 1, _mesa_error (ctx=0xccba90, error=1281, fmtString=0x74706278 glTexImage%dD(internalFormat=%s)) at main/errors.c:996 996main/errors.c: No such file or directory. (gdb) bt full #0 _mesa_error (ctx=0xccba90, error=1281, fmtString=0x74706278 glTexImage%dD(internalFormat=%s)) at main/errors.c:996 do_output = 225 '\341' do_log = optimized out #1 0x745e2a84 in texture_error_check (border=0, depth=1, height=64, width=64, type=0, format=0, internalFormat=0, level=0, target=3553, dimensions=2, ctx=0xccba90) at main/teximage.c:1621 proxyTarget = optimized out err = optimized out indexFormat = 0 '\000' isProxy = optimized out sizeOK = 1 '\001' colorFormat = optimized out #2 teximage (ctx=0xccba90, dims=2, target=3553, level=0, internalFormat=0, width=64, height=64, depth=1, border=0, format=0, type=0, pixels=0x0) at main/teximage.c:2501 error = 1 '\001' unpack_no_border = {Alignment = -7152, RowLength = 32767, SkipPixels = 9180912, SkipRows = 0, ImageHeight = -8144, SkipImages = 32767, SwapBytes = 45 '-', LsbFirst = 17 '\021', Invert = 90 'Z', BufferObj = 0x7fffe410} unpack = 0xcd22e0 #3 0x745e2fc4 in _mesa_TexImage2D (target=optimized out, level=optimized out, internalFormat=optimized out, width=optimized out, height=optimized out, border=optimized out, format=0, type=0, pixels=0x0) at main/teximage.c:2639 No locals. #4 0x00480145 in ?? () No symbol table info available. #5 0x004bbfd6 in ?? () No symbol table info available. #6 0x00440687 in ?? () No symbol table info available. #7 0x0043b985 in ?? () No symbol table info available. #8 0x0043c092 in ?? () No symbol table info available. #9 0x7696eead in __libc_start_main (main=optimized out, argc=optimized out, ubp_av=optimized out, init=optimized out, fini=optimized out, rtld_fini=optimized out, stack_end=0x7fffe408) at libc-start.c:228 result = optimized out unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, 738182590451561014, 4428032, 140737488348176, 0, 0, -738182590032367050, -738203265834012106}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x5ddb20, 0x7fffe418}, data = {prev = 0x0, cleanup = 0x0, canceltype = 6150944}}} not_first_call = optimized out #10 0x00439129 in _start () No symbol table info available. (gdb) c Continuing. 37923 glTexImage2D(target = GL_TEXTURE_2D, level = 0, internalformat = GL_ZERO, width = 64, height = 64, border = 0, format = GL_ZERO, type = GL_ZERO, pixels = NULL) 37923: warning: glGetError(glTexImage2D) = GL_INVALID_VALUE -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49484] invalid enum 0x500, invalid value 0x501
https://bugs.freedesktop.org/show_bug.cgi?id=49484 --- Comment #5 from Henri Verbeet hverb...@gmail.com 2012-05-07 10:37:29 PDT --- That generally happens when an application tries to use a (D3D) format (e.g. DXT/s3tc) even though it's not available. A WINEDEBUG=+d3d,+d3d_surface log should show which format, although typically it's either s3tc or one of the floating point formats. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 --- Comment #4 from Tom Stellard tstel...@gmail.com 2012-05-07 10:43:27 PDT --- Created attachment 61159 -- https://bugs.freedesktop.org/attachment.cgi?id=61159 Possible fix Does it build with this patch? -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49484] invalid enum 0x500, invalid value 0x501
https://bugs.freedesktop.org/show_bug.cgi?id=49484 Michal Suchanek hramr...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution||INVALID --- Comment #6 from Michal Suchanek hramr...@gmail.com 2012-05-07 11:03:54 PDT --- Indeed, it works with the texture compression library installed. I guess this is something that Wine should report. Unfortunately, the available messages are very unhelpful. Sorry about the noise. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [RFC][PATCH] drm/radeon/hdmi: define struct for AVI infoframe
On Mon, May 7, 2012 at 3:38 AM, Michel Dänzer mic...@daenzer.net wrote: On Son, 2012-05-06 at 18:29 +0200, Rafał Miłecki wrote: 2012/5/6 Dave Airlie airl...@gmail.com: On Sun, May 6, 2012 at 5:19 PM, Rafał Miłecki zaj...@gmail.com wrote: 2012/5/6 Rafał Miłecki zaj...@gmail.com: diff --git a/drivers/gpu/drm/radeon/r600_hdmi.c b/drivers/gpu/drm/radeon/r600_hdmi.c index c308432..b14c90a 100644 --- a/drivers/gpu/drm/radeon/r600_hdmi.c +++ b/drivers/gpu/drm/radeon/r600_hdmi.c @@ -134,78 +134,22 @@ static void r600_hdmi_infoframe_checksum(uint8_t packetType, } /* - * build a HDMI Video Info Frame + * Upload a HDMI AVI Infoframe */ -static void r600_hdmi_videoinfoframe( - struct drm_encoder *encoder, - enum r600_hdmi_color_format color_format, - int active_information_present, - uint8_t active_format_aspect_ratio, - uint8_t scan_information, - uint8_t colorimetry, - uint8_t ex_colorimetry, - uint8_t quantization, - int ITC, - uint8_t picture_aspect_ratio, - uint8_t video_format_identification, - uint8_t pixel_repetition, - uint8_t non_uniform_picture_scaling, - uint8_t bar_info_data_valid, - uint16_t top_bar, - uint16_t bottom_bar, - uint16_t left_bar, - uint16_t right_bar -) In case someone wonders about the reason: I think it's really ugly to have a function taking 18 arguments, 17 of them related to the infoframe. It makes much more sense for me to use struct for that. While working on that I though it's reasonable to prepare nice bitfield __packed struct ready-to-be-written to the GPU registers. won't this screw up on other endian machines? Hm, maybe it can. Is there some easy to handle it correctly? Some trick like __le8 foo: 3 __le8 bar: 1 maybe? Not really. The memory layout of bitfields is basically completely up to the C implementation, so IMHO they're just inadequate for describing fixed memory layouts. Yes i agree please stay away from bitfields, i know it looks cool but bitshift is cool too. Cheers, Jerome ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2 3/4] drm/exynos: added userptr feature.
On Sat, May 5, 2012 at 6:22 AM, Dave Airlie airl...@gmail.com wrote: On Sat, May 5, 2012 at 11:19 AM, daei...@gmail.com wrote: Hi Dave, 2012. 4. 25. 오후 7:15 Dave Airlie airl...@gmail.com 작성: On Tue, Apr 24, 2012 at 6:17 AM, Inki Dae inki@samsung.com wrote: this feature could be used to use memory region allocated by malloc() in user mode and mmaped memory region allocated by other memory allocators. userptr interface can identify memory type through vm_flags value and would get pages or page frame numbers to user space appropriately. Is there anything to stop the unpriviledged userspace driver locking all the RAM in the machine inside userptr? you mean that there is something that it can stop user space driver locking some memory region of RAM? and if any user space driver locked some region then anyone on user space can't access the region? could you please tell me about your concerns in more detail so that we can solve the issue? I guess you mean that any user level driver such as specific EGL library can allocate some memory region and also lock the region so that other user space applications can't access the region until rendering is completed by hw accelerator such as 2d/3d core or opposite case. actually, this feature has already been used by v4l2 so I didn't try to consider we could face with any problem with this and I've got a feeling maybe there is something I missed so I'd be happy for you or anyone give me any advices. Well v4l get to make their own bad design decisions. The problem is if an unprivledged users accessing the drm can lock all the pages it allocates into memory, by passing them to the kernel as userptrs., thus bypassing the swap and blocking all other users on the system. Dave. Beside that you are not locking the vma and afaik this means that the page backing the vma might change, yes you will still own the page you get but userspace might be reading/writing to different pages. The vma would need to be locked but than the userspace might unlock it in your back and you start right from the begining. Cheers, Jerome ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse j.gli...@gmail.com wrote: On 07.05.2012 17:23, Jerome Glisse wrote: On Mon, May 7, 2012 at 7:42 AM, Christian Königdeathsim...@vodafone.de wrote: A startover with a new idea for a multiple ring allocator. Should perform as well as a normal ring allocator as long as only one ring does somthing, but falls back to a more complex algorithm if more complex things start to happen. We store the last allocated bo in last, we always try to allocate after the last allocated bo. Principle is that in a linear GPU ring progression was is after last is the oldest bo we allocated and thus the first one that should no longer be in use by the GPU. If it's not the case we skip over the bo after last to the closest done bo if such one exist. If none exist and we are not asked to block we report failure to allocate. If we are asked to block we wait on all the oldest fence of all rings. We just wait for any of those fence to complete. v2: We need to be able to let hole point to the list_head, otherwise try free will never free the first allocation of the list. Also stop calling radeon_fence_signalled more than necessary. Signed-off-by: Christian Königdeathsim...@vodafone.de Signed-off-by: Jerome Glissejgli...@redhat.com This one is NAK please use my patch. Yes in my patch we never try to free anything if there is only on sa_bo in the list if you really care about this it's a one line change: http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch Nope that won't work correctly, last is pointing to the last allocation and that's the most unlikely to be freed at this time. Also in this version (like in the one before) radeon_sa_bo_next_hole lets hole point to the prev of the found sa_bo without checking if this isn't the lists head. That might cause a crash if an to be freed allocation is the first one in the buffer. What radeon_sa_bo_try_free would need to do to get your approach working is to loop over the end of the buffer and also try to free at the beginning, but saying that keeping the last allocation results in a whole bunch of extra cases and ifs, while just keeping a pointer to the hole (e.g. where the next allocation is most likely to succeed) simplifies the code quite a bit (but I agree that on the down side it makes it harder to understand). Your patch here can enter in infinite loop and never return holding the lock. See below. [SNIP] + } while (radeon_sa_bo_next_hole(sa_manager, fences)); Here you can infinite loop, in the case there is a bunch of hole in the allocator but none of them allow to full fill the allocation. radeon_sa_bo_next_hole will keep returning true looping over and over on all the all. That's why i only restrict my patch to 2 hole skeeping and then fails the allocation or try to wait. I believe sadly we need an heuristic and 2 hole skeeping at most sounded like a good one. Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in conjunction with radeon_sa_bo_try_free are eating up the opportunities for holes. Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with the exception for allocating in a complete scattered buffer, and even then it will never loop more often than halve the number of current allocations (and that is really really unlikely). Cheers, Christian. I looked again and yes it can loop infinitly, think of hole you can never free ie radeon_sa_bo_try_free can't free anything. This situation can happen if you have several thread allocating sa bo at the same time while none of them are yet done with there sa_bo (ie none have call sa_bo_free yet). I updated a v3 that track oldest and fix all things you were pointing out above. http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch Cheers, Jerome Of course by tracking oldest it defeat the algo so updated patch : http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch Just fix the corner case of list of single entry. Cheers, Jerome ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 49567] No rule to make target libradeon.a, needed by libr600.a.
https://bugs.freedesktop.org/show_bug.cgi?id=49567 --- Comment #5 from Mike Mestnik cheako+bugs_freedesktop_...@mikemestnik.net 2012-05-07 11:59:39 PDT --- This patch worked for me and got me to the next undefined reference. -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Enhancing EDID quirk functionality
On 05/03/2012 02:42 PM, Adam Jackson wrote: This looks good, thank you for taking it on. It was either that or give up on my big display, so ... you're welcome. I'd like to see documentation for the bit values of the quirks as well. And, ideally, this would also have some runtime API for manipulating the quirk list, so that way you can test new quirks without needing a reboot cycle. I agree that the bit values should be documented. I'm not sure where that documentation should go, however, since I can't find any documentation of the existing drm module parameters. Tell me where it should go, and I'll happily write the doc. I also agree that it would be nice to be able to manipulate the quirk list at runtime, and I did think about trying to enable that. I held off for a couple of reasons: 1) I'm a total noob at kernel code, so things like in-kernel locking, sysfs, memory management, etc., that would be required for a more dynamic API are all new to me. That said, I'm more that willing to give it a go, if I can get some guidance on those (and similar) topics. 2) I'm not sure how a runtime API should work. The simplest possibility is to just take a string, parse it, and overwrite the old extra quirk list with the new list. The downside to this is that all of the existing extra quirks need to be repeated to change a single quirk. To close the loop all the way on that I'd also want to be able to scrape the quirk list back out from that API, but that's not completely clean right now. Sound like a couple of sysfs files to me, one for the built-in quirks and one for the extra quirks -- maybe one quirk per line? See my comments about the sysfs API above. We're being a little cavalier with the quirk list as it stands because we don't differentiate among phy layers, and I can easily imagine a monitor that needs a quirk on DVI but where the same quirk on the same monitors' VGA would break it. I don't think this has caused problems yet, but. Now you're above my pay grade. What little I've read discovered about the way DisplayPort, HDMI, VGA, and DVI play together makes me think this is a nightmare best deferred, hopefully forever. InfoFrames are not valid for non-HDMI sinks, so yes, I'd call that a bug. That's pretty much what I figured. Where the EDID for DP-1 appears to be truncated: the extension field (second byte from the end) is 1 as you'd expect for an HDMI monitor, but there's no extension block. How big of a file do you get from /sys/class/drm/*/edid for that port? The EDID data in sysfs is 256 bytes, which I believe means that it does include the extension block. I just tried connecting an HDMI TV to my laptop, and I saw the same behavior -- 256-byte edid file in sysfs, but xrandr --verbose only shows 128 bytes. When I attach the same TV to my workstation with Intel HD 2000 graphics, xrandr --verbose shows all 256 bytes of EDID data. So it appears that the full data is being read by both systems, but the behavior of xrandr (or presumably whatever API xrandr uses to get the EDID data that it displays) differs between the two drivers. Fun. Thanks! -- Ian Pilcher arequip...@gmail.com If you're going to shift my paradigm ... at least buy me dinner first. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[Bug 42490] NUTMEG DP to VGA bridge not working
https://bugs.freedesktop.org/show_bug.cgi?id=42490 --- Comment #28 from Jerome Glisse gli...@freedesktop.org 2012-05-07 12:57:34 PDT --- Does people here have better luck with the patch mentioned previously: drm/radeon/kms: need to set up ss on DP bridges as well -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel