[PATCH] dma-buf: Update docs for SYNC ioctl
On 03/29/2016 06:47 AM, David Herrmann wrote:
> Hi
>
> On Mon, Mar 28, 2016 at 9:42 PM, Tiago Vignatti
> wrote:
>> Do we have an agreement here after all? David? I need to know whether this
>> fixup is okay to go cause I'll need to submit to Chrome OS then.
>
> Sure it is fine. The code is already there, we cannot change it.

ah true. Only now that I've noticed it's already in Linus tree. Thanks anyway!

Tiago
[PATCH] dma-buf: Update docs for SYNC ioctl
On 03/23/2016 12:42 PM, Chris Wilson wrote:
> On Wed, Mar 23, 2016 at 04:32:59PM +0100, David Herrmann wrote:
>> Hi
>>
>> On Wed, Mar 23, 2016 at 12:56 PM, Chris Wilson
>> wrote:
>>> On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote:
>>>> My question was rather about why we do this? Semantics for EINTR are
>>>> well defined, and with SA_RESTART (default on linux) user-space can
>>>> ignore it. However, looping on EAGAIN is very uncommon, and it is not
>>>> at all clear why it is needed?
>>>>
>>>> Returning an error to user-space makes sense if user-space has a reason
>>>> to react to it. I fail to see how EAGAIN on a cache-flush/sync operation
>>>> helps user-space at all? As someone without insight into the driver
>>>> implementation, it is hard to tell why.. Any hints?
>>>
>>> The reason we return EAGAIN is to work around a deadlock we face when
>>> blocking on the GPU holding the struct_mutex (inside the client's
>>> process), but the GPU is dead. As our locking is very, very coarse we
>>> cannot restart the GPU without acquiring the struct_mutex being held by
>>> the client so we wake the client up and tell them the resource they are
>>> waiting on (the flush of the object from the GPU into the CPU domain) is
>>> temporarily unavailable. If they try to immediately wait upon the ioctl
>>> again, they are blocked waiting for the reset to occur before they may
>>> complete their flush. There are a few other possible deadlocks that are
>>> also avoided with EAGAIN (again, the issue is more or less the lack of
>>> fine grained locking).
>>
>> ...so you hijacked EAGAIN for all DRM ioctls just for a driver
>> workaround?
>
> No, we utilized the fact that EAGAIN was already enshrined by libdrm as
> the de facto mechanism for repeating the ioctl in order to repeat the
> ioctl for a driver workaround.

Do we have an agreement here after all? David? I need to know whether this
fixup is okay to go cause I'll need to submit to Chrome OS then.

Best Regards,

Tiago
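The restart convention discussed above can be sketched in a few lines of C. This is only an illustration of the pattern that libdrm's drmIoctl() applies to ioctl(): repeat the call while it fails with EINTR or EAGAIN. The names restartable_op, retry_transient, flaky_state and flaky_op are hypothetical and exist only for this sketch; the fake operation stands in for a driver that asks the caller to come back later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: any operation that follows the ioctl(2)
 * convention of returning -1 and setting errno on failure. */
typedef int (*restartable_op)(void *ctx);

/* Repeat op while it reports a transient failure (EINTR or EAGAIN).
 * Any other error, or success, is handed back to the caller unchanged. */
static int retry_transient(restartable_op op, void *ctx)
{
	int ret;

	assert(op != NULL);

	do {
		ret = op(ctx);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Stand-in for a driver that refuses a few times before succeeding,
 * for example because a signal arrived or a reset is still pending. */
struct flaky_state {
	int failures_left;
	int calls;
};

static int flaky_op(void *ctx)
{
	struct flaky_state *s = ctx;

	s->calls++;
	if (s->failures_left > 0) {
		/* Alternate between the two transient error codes. */
		errno = (s->failures_left-- % 2) ? EAGAIN : EINTR;
		return -1;
	}
	return 0;
}
```

With three pending failures, retry_transient(flaky_op, &state) invokes the operation four times and returns 0. A caller that did not loop would see the first -1 and would have to treat the buffer as unsynchronized.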
[PATCH] dma-buf: Update docs for SYNC ioctl
On 03/21/2016 04:51 AM, Daniel Vetter wrote:
> Just a bit of wording polish plus mentioning that it can fail and must
> be restarted.
>
> Requested by Sumit.
>
> v2: Fix them typos (Hans).
>
> Cc: Chris Wilson
> Cc: Tiago Vignatti
> Cc: Stéphane Marchesin
> Cc: David Herrmann
> Cc: Sumit Semwal
> Cc: Daniel Vetter
> CC: linux-media at vger.kernel.org
> Cc: dri-devel at lists.freedesktop.org
> Cc: linaro-mm-sig at lists.linaro.org
> Cc: intel-gfx at lists.freedesktop.org
> Cc: devel at driverdev.osuosl.org
> Cc: Hans Verkuil
> Acked-by: Sumit Semwal
> Signed-off-by: Daniel Vetter

Reviewed-by: Tiago Vignatti

Best regards,

Tiago

> ---
>  Documentation/dma-buf-sharing.txt | 11 ++-
>  drivers/dma-buf/dma-buf.c         |  2 +-
>  2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
> index 32ac32e773e1..ca44c5820585 100644
> --- a/Documentation/dma-buf-sharing.txt
> +++ b/Documentation/dma-buf-sharing.txt
> @@ -352,7 +352,8 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
>
>     No special interfaces, userspace simply calls mmap on the dma-buf fd, making
>     sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
> -   used when the access happens. This is discussed next paragraphs.
> +   used when the access happens. Note that DMA_BUF_IOCTL_SYNC can fail with
> +   -EAGAIN or -EINTR, in which case it must be restarted.
>
>     Some systems might need some sort of cache coherency management e.g. when
>     CPU and GPU domains are being accessed through dma-buf at the same time. To
> @@ -366,10 +367,10 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
>         want (with the new data being consumed by the GPU or say scanout device)
>       - munmap once you don't need the buffer any more
>
> -Therefore, for correctness and optimal performance, systems with the memory
> -cache shared by the GPU and CPU i.e. the "coherent" and also the
> -"incoherent" are always required to use SYNC_START and SYNC_END before and
> -after, respectively, when accessing the mapped address.
> +For correctness and optimal performance, it is always required to use
> +SYNC_START and SYNC_END before and after, respectively, when accessing the
> +mapped address. Userspace cannot rely on coherent access, even when there
> +are systems where it just works without calling these ioctls.
>
>  2. Supporting existing mmap interfaces in importers
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 774a60f4309a..4a2c07ee6677 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -612,7 +612,7 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
>   * @dmabuf:	[in]	buffer to complete cpu access for.
>   * @direction:	[in]	length of range for cpu access.
>   *
> - * This call must always succeed.
> + * Can return negative error values, returns 0 on success.
>   */
>  int dma_buf_end_cpu_access(struct dma_buf *dmabuf,
>  			    enum dma_data_direction direction)
>
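From the userspace side, the documented rule (bracket every CPU access with SYNC_START and SYNC_END, and restart the ioctl on -EAGAIN or -EINTR) might look roughly like the sketch below. It assumes a Linux system whose kernel headers ship <linux/dma-buf.h>. The helper names dmabuf_sync and dmabuf_cpu_fill are hypothetical, and the caller is assumed to have already mmap'ed the dma-buf fd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue DMA_BUF_IOCTL_SYNC and restart it on EINTR/EAGAIN.
 * Returns 0 on success, -1 with errno set for any other failure. */
static int dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Hypothetical helper: fill an already mmap'ed dma-buf from the CPU,
 * bracketing the write with SYNC_START and SYNC_END. */
static int dmabuf_cpu_fill(int dmabuf_fd, void *map, size_t len, int value)
{
	assert(map != NULL);

	/* Do not touch the mapping if the start of the access failed. */
	if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE))
		return -1;

	memset(map, value, len);

	/* SYNC_END can fail as well, so its result is reported too. */
	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```

A read-only access would pass DMA_BUF_SYNC_READ instead of DMA_BUF_SYNC_WRITE on both calls. Errors other than EINTR and EAGAIN (for instance EBADF on an invalid fd) are not retried and reach the caller.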
[PATCH] dma-buf,drm,ion: Propagate error code from dma_buf_start_cpu_access()
On 03/18/2016 05:02 PM, Chris Wilson wrote:
> Drivers, especially i915.ko, can fail during the initial migration of a
> dma-buf for CPU access. However, the error code from the driver was not
> being propagated back to ioctl and so userspace was blissfully ignorant
> of the failure. Rendering corruption ensues.
>
> Whilst fixing the ioctl to return the error code from
> dma_buf_start_cpu_access(), also do the same for
> dma_buf_end_cpu_access(). For most drivers, dma_buf_end_cpu_access()
> cannot fail. i915.ko however, as most drivers would, wants to avoid being
> uninterruptible (as would be required to guarantee no failure when
> flushing the buffer to the device). As userspace already has to handle
> errors from the SYNC_IOCTL, take advantage of this to be able to restart
> the syscall across signals.
>
> This fixes a coherency issue for i915.ko as well as reducing the
> uninterruptible hold upon its BKL, the struct_mutex.
>
> Fixes commit c11e391da2a8fe973c3c2398452000bed505851e
> Author: Daniel Vetter
> Date: Thu Feb 11 20:04:51 2016 -0200
>
>     dma-buf: Add ioctls to allow userspace to flush
>
> Testcase: igt/gem_concurrent_blit/*dmabuf*interruptible
> Testcase: igt/prime_mmap_coherency/ioctl-errors
> Signed-off-by: Chris Wilson
> Cc: Tiago Vignatti
> Cc: Stéphane Marchesin
> Cc: David Herrmann
> Cc: Sumit Semwal
> Cc: Daniel Vetter
> CC: linux-media at vger.kernel.org
> Cc: dri-devel at lists.freedesktop.org
> Cc: linaro-mm-sig at lists.linaro.org
> Cc: intel-gfx at lists.freedesktop.org
> Cc: devel at driverdev.osuosl.org

Reviewed-by: Tiago Vignatti

Best regards,

Tiago
[PATCH v3] prime_mmap_coherency: Add return error tests for prime sync ioctl
On 03/18/2016 03:11 PM, Daniel Vetter wrote:
> On Fri, Mar 18, 2016 at 03:08:56PM -0300, Tiago Vignatti wrote:
>> This patch adds ioctl-errors subtest to be used for exercising prime sync
>> ioctl errors.
>>
>> The subtest constantly interrupts via signals a function doing concurrent
>> blit to stress out the right usage of prime_sync_*, making sure these ioctl
>> errors are handled accordingly. Important to note that in case of failure
>> (e.g. in a case where the ioctl wouldn't try again in a return error) this
>> test does not reliably catch the problem with 100% of accuracy.
>>
>> v2: fix prime sync direction when reading mmap'ed file.
>> v3: change the upper bound using time rather than loops
>>
>> Cc: Chris Wilson
>> Signed-off-by: Tiago Vignatti
>
> I'm probably blind, but where is the reviewed kernel patch for this? If
> it's somewhere hidden, please resubmit with all the whizzbang stuff needed
> for merging added ;-)
>
> Thanks, Daniel

You're not blind Daniel :) Chris will be sending the kernel side but
regardless this igt test should be good to go even without the kernel patch.

Tiago
[PATCH v3] prime_mmap_coherency: Add return error tests for prime sync ioctl
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl
errors.

The subtest constantly interrupts via signals a function doing concurrent blit
to stress out the right usage of prime_sync_*, making sure these ioctl errors
are handled accordingly. Important to note that in case of failure (e.g. in a
case where the ioctl wouldn't try again in a return error) this test does not
reliably catch the problem with 100% of accuracy.

v2: fix prime sync direction when reading mmap'ed file.
v3: change the upper bound using time rather than loops

Cc: Chris Wilson
Signed-off-by: Tiago Vignatti
---
 tests/prime_mmap_coherency.c | 89
 1 file changed, 89 insertions(+)

diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
index 180d8a4..d2b2a4f 100644
--- a/tests/prime_mmap_coherency.c
+++ b/tests/prime_mmap_coherency.c
@@ -180,6 +180,90 @@ static void test_write_flush(bool expect_stale_cache)
 	munmap(ptr_cpu, width * height);
 }
 
+static void blit_and_cmp(void)
+{
+	drm_intel_bo *bo_1;
+	drm_intel_bo *bo_2;
+	uint32_t *ptr_cpu;
+	uint32_t *ptr2_cpu;
+	int dma_buf_fd, dma_buf2_fd, i;
+	int local_fd;
+	drm_intel_bufmgr *local_bufmgr;
+	struct intel_batchbuffer *local_batch;
+
+	/* recreate process local variables */
+	local_fd = drm_open_driver(DRIVER_INTEL);
+	local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
+	igt_assert(local_bufmgr);
+
+	local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
+	igt_assert(local_batch);
+
+	bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
+	dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+		       MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr_cpu != MAP_FAILED);
+
+	bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
+	dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
+
+	ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+			MAP_SHARED, dma_buf2_fd, 0);
+	igt_assert(ptr2_cpu != MAP_FAILED);
+
+	/* Fill up BO 1 with '1's and BO 2 with '0's */
+	prime_sync_start(dma_buf_fd, true);
+	memset(ptr_cpu, 0x11, width * height);
+	prime_sync_end(dma_buf_fd, true);
+
+	prime_sync_start(dma_buf2_fd, true);
+	memset(ptr2_cpu, 0x00, width * height);
+	prime_sync_end(dma_buf2_fd, true);
+
+	/* Copy BO 1 into BO 2, using blitter. */
+	intel_copy_bo(local_batch, bo_2, bo_1, width * height);
+	usleep(0); /* let someone else claim the mutex */
+
+	/* Compare BOs. If prime_sync_* were executed properly, the caches
+	 * should be synced. */
+	prime_sync_start(dma_buf2_fd, false);
+	for (i = 0; i < (width * height) / 4; i++)
+		igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
+	prime_sync_end(dma_buf2_fd, false);
+
+	drm_intel_bo_unreference(bo_1);
+	drm_intel_bo_unreference(bo_2);
+	munmap(ptr_cpu, width * height);
+	munmap(ptr2_cpu, width * height);
+}
+
+/*
+ * Constantly interrupt concurrent blits to stress out prime_sync_* and make
+ * sure these ioctl errors are handled accordingly.
+ *
+ * Important to note that in case of failure (e.g. in a case where the ioctl
+ * wouldn't try again in a return error) this test does not reliably catch the
+ * problem with 100% of accuracy.
+ */
+static void test_ioctl_errors(void)
+{
+	int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+
+	igt_fork_signal_helper();
+	for (int num_children = 1; num_children <= 8 *ncpus; num_children <<= 1) {
+		igt_fork(child, num_children) {
+			struct timespec start = {};
+			while (igt_nsec_elapsed(&start) <= num_children)
+				blit_and_cmp();
+		}
+		igt_waitchildren();
+	}
+	igt_stop_signal_helper();
+}
+
 int main(int argc, char **argv)
 {
 	int i;
@@ -235,6 +319,11 @@ int main(int argc, char **argv)
 		igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
 	}
 
+	igt_subtest("ioctl-errors") {
+		igt_info("exercising concurrent blit to get ioctl errors\n");
+		test_ioctl_errors();
+	}
+
 	igt_fixture {
 		intel_batchbuffer_free(batch);
 		drm_intel_bufmgr_destroy(bufmgr);
-- 
2.1.4
[PATCH v2] prime_mmap_coherency: Add return error tests for prime sync ioctl
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl
errors.

The subtest constantly interrupts via signals a function doing concurrent blit
to stress out the right usage of prime_sync_*, making sure these ioctl errors
are handled accordingly. Important to note that in case of failure (e.g. in a
case where the ioctl wouldn't try again in a return error) this test does not
reliably catch the problem with 100% of accuracy.

v2: fix prime sync direction when reading mmap'ed file.

Cc: Chris Wilson
Signed-off-by: Tiago Vignatti
---
 tests/prime_mmap_coherency.c | 87
 1 file changed, 87 insertions(+)

diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
index 180d8a4..80d1c1f 100644
--- a/tests/prime_mmap_coherency.c
+++ b/tests/prime_mmap_coherency.c
@@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache)
 	munmap(ptr_cpu, width * height);
 }
 
+static void blit_and_cmp(void)
+{
+	drm_intel_bo *bo_1;
+	drm_intel_bo *bo_2;
+	uint32_t *ptr_cpu;
+	uint32_t *ptr2_cpu;
+	int dma_buf_fd, dma_buf2_fd, i;
+	int local_fd;
+	drm_intel_bufmgr *local_bufmgr;
+	struct intel_batchbuffer *local_batch;
+
+	/* recreate process local variables */
+	local_fd = drm_open_driver(DRIVER_INTEL);
+	local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
+	igt_assert(local_bufmgr);
+
+	local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
+	igt_assert(local_batch);
+
+	bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
+	dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+		       MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr_cpu != MAP_FAILED);
+
+	bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
+	dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
+
+	ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+			MAP_SHARED, dma_buf2_fd, 0);
+	igt_assert(ptr2_cpu != MAP_FAILED);
+
+	/* Fill up BO 1 with '1's and BO 2 with '0's */
+	prime_sync_start(dma_buf_fd, true);
+	memset(ptr_cpu, 0x11, width * height);
+	prime_sync_end(dma_buf_fd, true);
+
+	prime_sync_start(dma_buf2_fd, true);
+	memset(ptr2_cpu, 0x00, width * height);
+	prime_sync_end(dma_buf2_fd, true);
+
+	/* Copy BO 1 into BO 2, using blitter. */
+	intel_copy_bo(local_batch, bo_2, bo_1, width * height);
+	usleep(0); /* let someone else claim the mutex */
+
+	/* Compare BOs. If prime_sync_* were executed properly, the caches
+	 * should be synced. */
+	prime_sync_start(dma_buf2_fd, false);
+	for (i = 0; i < (width * height) / 4; i++)
+		igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
+	prime_sync_end(dma_buf2_fd, false);
+
+	drm_intel_bo_unreference(bo_1);
+	drm_intel_bo_unreference(bo_2);
+	munmap(ptr_cpu, width * height);
+	munmap(ptr2_cpu, width * height);
+}
+
+/*
+ * Constantly interrupt concurrent blits to stress out prime_sync_* and make
+ * sure these ioctl errors are handled accordingly.
+ *
+ * Important to note that in case of failure (e.g. in a case where the ioctl
+ * wouldn't try again in a return error) this test does not reliably catch the
+ * problem with 100% of accuracy.
+ */
+static void test_ioctl_errors(void)
+{
+	int i;
+	int num_children = 8*sysconf(_SC_NPROCESSORS_ONLN);
+
+	igt_fork_signal_helper();
+	igt_fork(child, num_children) {
+		for (i = 0; i < ROUNDS; i++)
+			blit_and_cmp();
+	}
+	igt_waitchildren();
+	igt_stop_signal_helper();
+}
+
 int main(int argc, char **argv)
 {
 	int i;
@@ -235,6 +317,11 @@ int main(int argc, char **argv)
 		igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
 	}
 
+	igt_subtest("ioctl-errors") {
+		igt_info("exercising concurrent blit to get ioctl errors\n");
+		test_ioctl_errors();
+	}
+
 	igt_fixture {
 		intel_batchbuffer_free(batch);
 		drm_intel_bufmgr_destroy(bufmgr);
-- 
2.1.4
[PATCH] prime_mmap_coherency: Add return error tests for prime sync ioctl
On 03/17/2016 06:01 PM, Chris Wilson wrote:
> On Thu, Mar 17, 2016 at 03:18:03PM -0300, Tiago Vignatti wrote:
>> This patch adds ioctl-errors subtest to be used for exercising prime sync
>> ioctl errors.
>>
>> The subtest constantly interrupts via signals a function doing concurrent
>> blit to stress out the right usage of prime_sync_*, making sure these ioctl
>> errors are handled accordingly. Important to note that in case of failure
>> (e.g. in a case where the ioctl wouldn't try again in a return error) this
>> test does not reliably catch the problem with 100% of accuracy.
>>
>> Cc: Chris Wilson
>> Signed-off-by: Tiago Vignatti
>> ---
>>
>> Chris, your unpolished dma-buf patch for adding return error into the ioctl
>> calls lgtm. Let me know if you think this kind of test is useful now in igt.
>>
>> Thanks
>>
>> Tiago
>>
>>  tests/prime_mmap_coherency.c | 87
>>  1 file changed, 87 insertions(+)
>>
>> diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
>> index 180d8a4..bae2144 100644
>> --- a/tests/prime_mmap_coherency.c
>> +++ b/tests/prime_mmap_coherency.c
>> @@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache)
>>  	munmap(ptr_cpu, width * height);
>>  }
>>
>> +static void blit_and_cmp(void)
>> +{
>> +	drm_intel_bo *bo_1;
>> +	drm_intel_bo *bo_2;
>> +	uint32_t *ptr_cpu;
>> +	uint32_t *ptr2_cpu;
>> +	int dma_buf_fd, dma_buf2_fd, i;
>> +	int local_fd;
>> +	drm_intel_bufmgr *local_bufmgr;
>> +	struct intel_batchbuffer *local_batch;
>> +
>> +	/* recreate process local variables */
>> +	local_fd = drm_open_driver(DRIVER_INTEL);
>> +	local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
>> +	igt_assert(local_bufmgr);
>> +
>> +	local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
>> +	igt_assert(local_batch);
>> +
>> +	bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
>> +	dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
>> +	igt_skip_on(errno == EINVAL);
>> +
>> +	ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
>> +		       MAP_SHARED, dma_buf_fd, 0);
>> +	igt_assert(ptr_cpu != MAP_FAILED);
>> +
>> +	bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
>> +	dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
>> +
>> +	ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
>> +			MAP_SHARED, dma_buf2_fd, 0);
>> +	igt_assert(ptr2_cpu != MAP_FAILED);
>> +
>> +	/* Fill up BO 1 with '1's and BO 2 with '0's */
>> +	prime_sync_start(dma_buf_fd, true);
>> +	memset(ptr_cpu, 0x11, width * height);
>> +	prime_sync_end(dma_buf_fd, true);
>> +
>> +	prime_sync_start(dma_buf2_fd, true);
>> +	memset(ptr2_cpu, 0x00, width * height);
>> +	prime_sync_end(dma_buf2_fd, true);
>> +
>> +	/* Copy BO 1 into BO 2, using blitter. */
>> +	intel_copy_bo(local_batch, bo_2, bo_1, width * height);
>> +	usleep(0); /* let someone else claim the mutex */
>> +
>> +	/* Compare BOs. If prime_sync_* were executed properly, the caches
>> +	 * should be synced. */
>> +	prime_sync_start(dma_buf2_fd, true);
>
> Maybe false here? Not that it makes any difference for the driver atm.

oh, my bad.

>> +	for (i = 0; i < (width * height) / 4; i++)
>> +		igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
>> +	prime_sync_end(dma_buf2_fd, true);
>> +
>> +	drm_intel_bo_unreference(bo_1);
>> +	drm_intel_bo_unreference(bo_2);
>> +	munmap(ptr_cpu, width * height);
>> +	munmap(ptr2_cpu, width * height);
>
> Do we have anything that verifies that dmabuf maps persist beyond
> gem_close() on the original bo?

that's test_refcounting in prime_mmap.c

> Yes, that test should hit all interruptible paths we have in dmabuf and
> would be a great addition to igt.

cool, thanks. I'm resending now v2.

Tiago
[PATCH] prime_mmap_coherency: Add return error tests for prime sync ioctl
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl
errors.

The subtest constantly interrupts via signals a function doing concurrent blit
to stress out the right usage of prime_sync_*, making sure these ioctl errors
are handled accordingly. Important to note that in case of failure (e.g. in a
case where the ioctl wouldn't try again in a return error) this test does not
reliably catch the problem with 100% of accuracy.

Cc: Chris Wilson
Signed-off-by: Tiago Vignatti
---

Chris, your unpolished dma-buf patch for adding return error into the ioctl
calls lgtm. Let me know if you think this kind of test is useful now in igt.

Thanks

Tiago

 tests/prime_mmap_coherency.c | 87
 1 file changed, 87 insertions(+)

diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
index 180d8a4..bae2144 100644
--- a/tests/prime_mmap_coherency.c
+++ b/tests/prime_mmap_coherency.c
@@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache)
 	munmap(ptr_cpu, width * height);
 }
 
+static void blit_and_cmp(void)
+{
+	drm_intel_bo *bo_1;
+	drm_intel_bo *bo_2;
+	uint32_t *ptr_cpu;
+	uint32_t *ptr2_cpu;
+	int dma_buf_fd, dma_buf2_fd, i;
+	int local_fd;
+	drm_intel_bufmgr *local_bufmgr;
+	struct intel_batchbuffer *local_batch;
+
+	/* recreate process local variables */
+	local_fd = drm_open_driver(DRIVER_INTEL);
+	local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
+	igt_assert(local_bufmgr);
+
+	local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
+	igt_assert(local_batch);
+
+	bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
+	dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+		       MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr_cpu != MAP_FAILED);
+
+	bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
+	dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
+
+	ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+			MAP_SHARED, dma_buf2_fd, 0);
+	igt_assert(ptr2_cpu != MAP_FAILED);
+
+	/* Fill up BO 1 with '1's and BO 2 with '0's */
+	prime_sync_start(dma_buf_fd, true);
+	memset(ptr_cpu, 0x11, width * height);
+	prime_sync_end(dma_buf_fd, true);
+
+	prime_sync_start(dma_buf2_fd, true);
+	memset(ptr2_cpu, 0x00, width * height);
+	prime_sync_end(dma_buf2_fd, true);
+
+	/* Copy BO 1 into BO 2, using blitter. */
+	intel_copy_bo(local_batch, bo_2, bo_1, width * height);
+	usleep(0); /* let someone else claim the mutex */
+
+	/* Compare BOs. If prime_sync_* were executed properly, the caches
+	 * should be synced. */
+	prime_sync_start(dma_buf2_fd, true);
+	for (i = 0; i < (width * height) / 4; i++)
+		igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
+	prime_sync_end(dma_buf2_fd, true);
+
+	drm_intel_bo_unreference(bo_1);
+	drm_intel_bo_unreference(bo_2);
+	munmap(ptr_cpu, width * height);
+	munmap(ptr2_cpu, width * height);
+}
+
+/*
+ * Constantly interrupt concurrent blits to stress out prime_sync_* and make
+ * sure these ioctl errors are handled accordingly.
+ *
+ * Important to note that in case of failure (e.g. in a case where the ioctl
+ * wouldn't try again in a return error) this test does not reliably catch the
+ * problem with 100% of accuracy.
+ */
+static void test_ioctl_errors(void)
+{
+	int i;
+	int num_children = 8*sysconf(_SC_NPROCESSORS_ONLN);
+
+	igt_fork_signal_helper();
+	igt_fork(child, num_children) {
+		for (i = 0; i < ROUNDS; i++)
+			blit_and_cmp();
+	}
+	igt_waitchildren();
+	igt_stop_signal_helper();
+}
+
 int main(int argc, char **argv)
 {
 	int i;
@@ -235,6 +317,11 @@ int main(int argc, char **argv)
 		igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
 	}
 
+	igt_subtest("ioctl-errors") {
+		igt_info("exercising concurrent blit to get ioctl errors\n");
+		test_ioctl_errors();
+	}
+
 	igt_fixture {
 		intel_batchbuffer_free(batch);
 		drm_intel_bufmgr_destroy(bufmgr);
-- 
2.1.4
[PATCH v9] dma-buf: Add ioctls to allow userspace to flush
On 03/05/2016 06:34 AM, Daniel Vetter wrote:
> On Mon, Feb 29, 2016 at 03:02:09PM +0000, Chris Wilson wrote:
>> On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
>>> On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
>>>> On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
>>>>> +static long dma_buf_ioctl(struct file *file,
>>>>> +			  unsigned int cmd, unsigned long arg)
>>>>> +{
>>>>> +	struct dma_buf *dmabuf;
>>>>> +	struct dma_buf_sync sync;
>>>>> +	enum dma_data_direction direction;
>>>>> +
>>>>> +	dmabuf = file->private_data;
>>>>> +
>>>>> +	switch (cmd) {
>>>>> +	case DMA_BUF_IOCTL_SYNC:
>>>>> +		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
>>>>> +			return -EFAULT;
>>>>> +
>>>>> +		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
>>>>> +			return -EINVAL;
>>>>> +
>>>>> +		switch (sync.flags & DMA_BUF_SYNC_RW) {
>>>>> +		case DMA_BUF_SYNC_READ:
>>>>> +			direction = DMA_FROM_DEVICE;
>>>>> +			break;
>>>>> +		case DMA_BUF_SYNC_WRITE:
>>>>> +			direction = DMA_TO_DEVICE;
>>>>> +			break;
>>>>> +		case DMA_BUF_SYNC_RW:
>>>>> +			direction = DMA_BIDIRECTIONAL;
>>>>> +			break;
>>>>> +		default:
>>>>> +			return -EINVAL;
>>>>> +		}
>>>>> +
>>>>> +		if (sync.flags & DMA_BUF_SYNC_END)
>>>>> +			dma_buf_end_cpu_access(dmabuf, direction);
>>>>> +		else
>>>>> +			dma_buf_begin_cpu_access(dmabuf, direction);
>>>>
>>>> We forgot to report the error back to userspace. Might as well fixup the
>>>> callchain to propagate error from end-cpu-access as well. Found after
>>>> updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
>>>
>>> EINTR? Do we need to make this ABI - I guess so? Tiago, do you have
>>> patches? See drmIoctl() in libdrm for what's needed on the userspace side
>>> if my guess is right.
>>
>> EINTR is the easiest, but conceivably we could also get EIO and
>> currently EAGAIN.
>>
>> I am also seeing some strange timing dependent (i.e. valgrind doesn't
>> show up anything client side and the tests then pass) failures (SIGSEGV,
>> SIGBUS) with !llc.
>
> Tiago, ping. Also probably a gap in igt coverage besides the kernel side.
> -Daniel

Hey guys! I'm back from vacation now. I'll take a look at it in the next few
days and then come back to you.

Tiago
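The bug Chris points out (the result of the begin/end hooks being dropped) can be modelled in plain userspace C. The sketch below is not the kernel patch. It mirrors the flag checks of the quoted dma_buf_ioctl() with renamed constants so that it builds without kernel headers, and it returns whatever the exporter hook reports. Every SKETCH_* and sketch_* name, and both sample hooks, are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values mirrored from include/uapi/linux/dma-buf.h, renamed so the
 * sketch compiles without kernel headers. */
#define SKETCH_SYNC_READ   (1u << 0)
#define SKETCH_SYNC_WRITE  (2u << 0)
#define SKETCH_SYNC_RW     (SKETCH_SYNC_READ | SKETCH_SYNC_WRITE)
#define SKETCH_SYNC_END    (1u << 2)
#define SKETCH_SYNC_VALID  (SKETCH_SYNC_RW | SKETCH_SYNC_END)

/* Stand-in for the kernel's enum dma_data_direction. */
enum sketch_direction {
	SKETCH_BIDIRECTIONAL,
	SKETCH_TO_DEVICE,
	SKETCH_FROM_DEVICE,
};

/* Hypothetical exporter hooks; each returns 0 or a negative errno. */
struct sketch_ops {
	int (*begin_cpu_access)(enum sketch_direction dir);
	int (*end_cpu_access)(enum sketch_direction dir);
};

static int sketch_sync(const struct sketch_ops *ops, uint64_t flags)
{
	enum sketch_direction dir;

	assert(ops && ops->begin_cpu_access && ops->end_cpu_access);

	/* Reject unknown bits before looking at the direction. */
	if (flags & ~(uint64_t)SKETCH_SYNC_VALID)
		return -EINVAL;

	switch (flags & SKETCH_SYNC_RW) {
	case SKETCH_SYNC_READ:
		dir = SKETCH_FROM_DEVICE;
		break;
	case SKETCH_SYNC_WRITE:
		dir = SKETCH_TO_DEVICE;
		break;
	case SKETCH_SYNC_RW:
		dir = SKETCH_BIDIRECTIONAL;
		break;
	default:
		return -EINVAL;
	}

	/* The point under discussion: hand the hook's result back to the
	 * caller instead of discarding it. */
	if (flags & SKETCH_SYNC_END)
		return ops->end_cpu_access(dir);
	return ops->begin_cpu_access(dir);
}

/* Sample hooks: begin is interrupted by a signal, end succeeds. */
static int begin_interrupted(enum sketch_direction dir)
{
	(void)dir;
	return -EINTR;
}

static int end_ok(enum sketch_direction dir)
{
	(void)dir;
	return 0;
}
```

With these hooks, a start-of-read request yields -EINTR, which a caller can see and restart. If the return value were dropped, as in the quoted hunk, the same request would look like a success.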
[PATCH v9] dma-buf: Add ioctls to allow userspace to flush
From: Daniel Vetter

The userspace might need some sort of cache coherency management e.g. when CPU
and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers, that forward
directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
used like following:
     - mmap dma-buf fd
     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
       want (with the new data being consumed by the GPU or say scanout device)
     - munmap once you don't need the buffer any more

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.
v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
remove range information from struct dma_buf_sync.
v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
documentation about the recommendation on using sync ioctls.
v7 (Tiago): Alex' nit on flags definition and being even more wording in the
doc about sync usage.
v9 (Tiago): remove useless is_dma_buf_file check. Fix sync.flags conditionals
and its mask order check. Add include in dma-buf.h.

Cc: Ville Syrjälä
Cc: David Herrmann
Cc: Sumit Semwal
Reviewed-by: Stéphane Marchesin
Signed-off-by: Daniel Vetter
Signed-off-by: Tiago Vignatti
---

I left SYNC_START and SYNC_END exclusive, just how the logic was before. If we
see an useful use case, maybe like the way David said, to store two frames next
to each other in the same BO, we can patch up later fairly easily.

About the ioctl direction, just like Ville pointed, we're doing only
copy_from_user at the moment and seems that _IOW is all we need. So I also
didn't touch anything on that.

David, Ville PTAL.

Thank you,

Tiago

 Documentation/dma-buf-sharing.txt | 21 +-
 drivers/dma-buf/dma-buf.c         | 45 +++
 include/uapi/linux/dma-buf.h      | 40 ++
 3 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 4f4a84b..32ac32e 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
    handles, too). So it's beneficial to support this in a similar fashion on
    dma-buf to have a good transition path for existing Android userspace.
 
-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+   used when the access happens. This is discussed next paragraphs.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+     - mmap dma-buf fd
+     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+       want (with the new data being consumed by the GPU or say scanout device)
+     - munmap once you don't need the buffer any more
+
+Therefore, for correctness and optimal performance, systems with the memory
+cache shared by the GPU and CPU i.e. the "coherent" and also the
+"incoherent" are always required to use SYNC_START and SYNC_END before and
+after, respectively, when accessing the mapped address.
 
 2. Supporting existing mmap interfaces in importers
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b2ac13b..9810d1d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include
 #include
 
+#include
+
 static inline int is_dma_buf_file(struct file *);
 
 struct dma_buf_list {
@@ -251,11 +253,54 @@ out:
 	return events;
 }
 
+static long dma_buf_ioctl(struct file *file,
+			  unsigned int cmd, unsigned long arg)
+{
+	struct dma_buf *dmabuf;
+	struct dma_buf_sync sync;
+	enum dma_data_direction direction;
+
+	dmabuf = file->private_data;
+
+	switch (cmd) {
+	case DMA_BUF_IOCTL_SYNC:
+		if (copy_from_user(&sync, (void __user *
[PATCH v7 3/5] dma-buf: Add ioctls to allow userspace to flush
Thanks for reviewing, David. Please take a look at my comments in-line.

On 02/09/2016 07:26 AM, David Herrmann wrote:
>
> On Tue, Dec 22, 2015 at 10:36 PM, Tiago Vignatti
> wrote:
>> From: Daniel Vetter
>>
>> The userspace might need some sort of cache coherency management e.g. when
>> CPU and GPU domains are being accessed through dma-buf at the same time. To
>> circumvent this problem there are begin/end coherency markers, that forward
>> directly to existing dma-buf device drivers vfunc hooks. Userspace can make
>> use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
>> would be used like following:
>>      - mmap dma-buf fd
>>      - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
>>        to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>>        want (with the new data being consumed by the GPU or say scanout device)
>>      - munmap once you don't need the buffer any more
>>
>> v2 (Tiago): Fix header file type names (u64 -> __u64)
>> v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
>> dma-buf functions. Check for overflows in start/length.
>> v4 (Tiago): use 2d regions for sync.
>> v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
>> remove range information from struct dma_buf_sync.
>> v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
>> documentation about the recommendation on using sync ioctls.
>> v7 (Tiago): Alex' nit on flags definition and being even more wording in the
>> doc about sync usage.
>>
>> Cc: Sumit Semwal
>> Signed-off-by: Daniel Vetter
>> Signed-off-by: Tiago Vignatti
>> ---
>>  Documentation/dma-buf-sharing.txt | 21 ++-
>>  drivers/dma-buf/dma-buf.c         | 43 +++
>>  include/uapi/linux/dma-buf.h      | 38 ++
>>  3 files changed, 101 insertions(+), 1 deletion(-)
>>  create mode 100644 include/uapi/linux/dma-buf.h
>>
>> diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
>> index 4f4a84b..32ac32e 100644
>> --- a/Documentation/dma-buf-sharing.txt
>> +++ b/Documentation/dma-buf-sharing.txt
>> @@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
>>     handles, too). So it's beneficial to support this in a similar fashion on
>>     dma-buf to have a good transition path for existing Android userspace.
>>
>> -   No special interfaces, userspace simply calls mmap on the dma-buf fd.
>> +   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
>> +   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
>> +   used when the access happens. This is discussed next paragraphs.
>> +
>> +   Some systems might need some sort of cache coherency management e.g. when
>> +   CPU and GPU domains are being accessed through dma-buf at the same time. To
>> +   circumvent this problem there are begin/end coherency markers, that forward
>> +   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
>> +   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
>> +   would be used like following:
>> +     - mmap dma-buf fd
>> +     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
>> +       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>> +       want (with the new data being consumed by the GPU or say scanout device)
>> +     - munmap once you don't need the buffer any more
>> +
>> +Therefore, for correctness and optimal performance, systems with the memory
>> +cache shared by the GPU and CPU i.e. the "coherent" and also the
>> +"incoherent" are always required to use SYNC_START and SYNC_END before and
>> +after, respectively, when accessing the mapped address.
>>
>>  2. Supporting existing mmap interfaces in importers
>>
>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>> index b2ac13b..9a298bd 100644
>> --- a/drivers/dma-buf/dma-buf.c
>> +++ b/drivers/dma-buf/dma-buf.c
>> @@ -34,6 +34,8 @@
>>  #include
>>  #include
>>
>> +#include
>> +
>>  static inline int is_dma_buf_file(struct file *);
>>
>>  struct dma_buf_list {
>> @@ -2
Direct userspace dma-buf mmap (v7)
On 02/04/2016 06:55 PM, Stéphane Marchesin wrote: > On Tue, Dec 22, 2015 at 1:36 PM, Tiago Vignatti > wrote: >> Hey back, >> >> Thank you Daniel, Chris, Alex and Thomas for reviewing the last series. I >> think I addressed most of the comments now in version 7, including: >>- being even more wording in the doc about sync usage. >>- pass .write = false always in i915 end_cpu_access. >>- add sync invalid flags test (igt). >>- in kms_mmap_write_crc, use CPU hog and testing rounds to catch the sync >> problems (igt). >> >> Here are the trees: >> >> https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf-mmap-v7 >> https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v7 >> >> Also, Chrome OS side is in progress. This past week I've been mostly >> struggling with fail attempts to build it (boots and goes to a black screen. >> Sigh.) and also finding a way to make my funky BayTrail-T laptop with 32-bit >> UEFI firmware boot up (success with Ubuntu but no success yet in CrOS). A WIP >> of Chromium changes can be seen here anyways: >> >> https://codereview.chromium.org/1262043002/ >> >> Happy Holidays! > > For the series: > > Reviewed-by: Stéphane Marchesin Thank you! Daniel, here are the trees ready for pulling (I hope) now: https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf-mmap-v8 https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v8 Tiago
[PATCH igt v7 6/6] tests: Add prime_mmap_coherency for cache coherency tests
Unlike kms_mmap_write_crc, which captures the coherency issues within the scanout mapped buffer, this one is meant to test dma-buf mmap, mostly on !llc platforms, and to provoke coherency bugs so we know where we need the sync ioctls.

I tested this with !llc and llc platforms, BTY and IVY respectively.

Signed-off-by: Tiago Vignatti
--- tests/Makefile.sources | 1 + tests/prime_mmap_coherency.c | 246 +++ 2 files changed, 247 insertions(+) create mode 100644 tests/prime_mmap_coherency.c diff --git a/tests/Makefile.sources b/tests/Makefile.sources index ad2dd6a..78605c6 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -97,6 +97,7 @@ TESTS_progs_M = \ pm_rc6_residency \ pm_sseu \ prime_mmap \ + prime_mmap_coherency \ prime_self_import \ template \ $(NULL) diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c new file mode 100644 index 000..a9a2664 --- /dev/null +++ b/tests/prime_mmap_coherency.c @@ -0,0 +1,246 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Tiago Vignatti + */ + +/** @file prime_mmap_coherency.c + * + * TODO: need to show the need for prime_sync_end(). + */ + +#include "igt.h" + +IGT_TEST_DESCRIPTION("Test dma-buf mmap on !llc platforms mostly and provoke" + " coherency bugs so we know for sure where we need the sync ioctls."); + +#define ROUNDS 20 + +int fd; +int stale = 0; +static drm_intel_bufmgr *bufmgr; +struct intel_batchbuffer *batch; +static int width = 1024, height = 1024; + +/* + * Exercises the need for read flush: + * 1. create a BO and write '0's, in GTT domain. + * 2. read BO using the dma-buf CPU mmap. + * 3. write '1's, in GTT domain. + * 4. read again through the mapped dma-buf. + */ +static void test_read_flush(bool expect_stale_cache) +{ + drm_intel_bo *bo_1; + drm_intel_bo *bo_2; + uint32_t *ptr_cpu; + uint32_t *ptr_gtt; + int dma_buf_fd, i; + + if (expect_stale_cache) + igt_require(!gem_has_llc(fd)); + + bo_1 = drm_intel_bo_alloc(bufmgr, "BO 1", width * height * 4, 4096); + + /* STEP #1: put the BO 1 in GTT domain. We use the blitter to copy and fill +* zeros to BO 1, so commands will be submitted and likely to place BO 1 in +* the GTT domain. */ + bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096); + intel_copy_bo(batch, bo_1, bo_2, width * height); + gem_sync(fd, bo_1->handle); + drm_intel_bo_unreference(bo_2); + + /* STEP #2: read BO 1 using the dma-buf CPU mmap. This dirties the CPU caches. 
*/ + dma_buf_fd = prime_handle_to_fd_for_mmap(fd, bo_1->handle); + igt_skip_on(errno == EINVAL); + + ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE, + MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr_cpu != MAP_FAILED); + + for (i = 0; i < (width * height) / 4; i++) + igt_assert_eq(ptr_cpu[i], 0); + + /* STEP #3: write 0x11 into BO 1. */ + bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096); + ptr_gtt = gem_mmap__gtt(fd, bo_2->handle, width * height, PROT_READ | PROT_WRITE); + memset(ptr_gtt, 0x11, width * height); + munmap(ptr_gtt, width * height); + + intel_copy_bo(batch, bo_1, bo_2, width * height); + gem_sync(fd, bo_1->handle); + drm_intel_bo_unreference(bo_2); + + /* STEP #4: read again using the CPU mmap. Doing #1 before #3 makes sure we +* don't do a full CPU cache flush in step #3 again. That makes sure all the +* stale cachelines from step #2 survive (mostly, a few w
[PATCH igt v7 5/6] tests: Add kms_mmap_write_crc for cache coherency tests
This program can be used to detect when CPU writes to the dma-buf mapped object don't land in scanout due to cache incoherency. Although this seems to be a problem inherent to non-LLC machines ("Atom"), this particular test catches dirty cachelines on scanout on LLC machines as well. It's inspired by Ville's kms_pwrite_crc.c and can also be used to test the correctness of the driver's begin_cpu_access and end_cpu_access (which requires the i915 implementation).

To see the need for the flush, run the test with the '-n' option so that it does not call the sync ioctls: a rather simple CPU hog then trashes the caches, and the test catches the coherency issue. Without '-n', things should just work as expected.

I tested this with !llc and llc platforms, BTY and IVY respectively.

v2: use prime_handle_to_fd_for_mmap instead.
v3: merge end_cpu_access() patch with this and provide options to disable sync.
v4: use library's prime_sync_{start,end} instead.
v7: use CPU hog instead and use testing rounds to catch the sync problems.
Signed-off-by: Tiago Vignatti --- tests/Makefile.sources | 1 + tests/kms_mmap_write_crc.c | 313 + 2 files changed, 314 insertions(+) create mode 100644 tests/kms_mmap_write_crc.c diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 75f3cb0..ad2dd6a 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -168,6 +168,7 @@ TESTS_progs = \ kms_3d \ kms_fence_pin_leak \ kms_force_connector_basic \ + kms_mmap_write_crc \ kms_pwrite_crc \ kms_sink_crc_basic \ prime_udl \ diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c new file mode 100644 index 000..6984bbd --- /dev/null +++ b/tests/kms_mmap_write_crc.c @@ -0,0 +1,313 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. 
+ * + * Authors: + *Tiago Vignatti + */ + +#include +#include +#include +#include +#include + +#include "drmtest.h" +#include "igt_debugfs.h" +#include "igt_kms.h" +#include "intel_chipset.h" +#include "ioctl_wrappers.h" +#include "igt_aux.h" + +IGT_TEST_DESCRIPTION( + "Use the display CRC support to validate mmap write to an already uncached future scanout buffer."); + +#define ROUNDS 10 + +typedef struct { + int drm_fd; + igt_display_t display; + struct igt_fb fb[2]; + igt_output_t *output; + igt_plane_t *primary; + enum pipe pipe; + igt_crc_t ref_crc; + igt_pipe_crc_t *pipe_crc; + uint32_t devid; +} data_t; + +static int ioctl_sync = true; +int dma_buf_fd; + +static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb) +{ + char *ptr = NULL; + + dma_buf_fd = prime_handle_to_fd_for_mmap(drm_fd, fb->gem_handle); + igt_skip_on(dma_buf_fd == -1 && errno == EINVAL); + + ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + return ptr; +} + +static void test(data_t *data) +{ + igt_display_t *display = &data->display; + igt_output_t *output = data->output; + struct igt_fb *fb = &data->fb[1]; + drmModeModeInfo *mode; + cairo_t *cr; + char *ptr; + uint32_t caching; + void *buf; + igt_crc_t crc; + + mode = igt_output_get_mode(output); + + /* create a non-white fb where we can write later */ + igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay, + DRM_FORMAT_XRGB, LOCAL_DRM_FORMAT_MOD_NONE, fb); + + ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb); + + cr = igt_get_cairo_ctx(data->drm_fd, fb);
[PATCH igt v7 4/6] lib: Add prime_sync_start and prime_sync_end helpers
This patch adds dma-buf mmap synchronization ioctls that can be used by tests for cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time.

v7: add sync invalid flags test.

Signed-off-by: Tiago Vignatti
---
 lib/ioctl_wrappers.c | 26 ++
 lib/ioctl_wrappers.h | 17 +
 tests/prime_mmap.c | 25 +
 3 files changed, 68 insertions(+)

diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index 86a61ba..0d84d00 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -1400,6 +1400,32 @@ off_t prime_get_size(int dma_buf_fd)
 }
 
 /**
+ * prime_sync_start
+ * @dma_buf_fd: dma-buf fd handle
+ */
+void prime_sync_start(int dma_buf_fd)
+{
+	struct local_dma_buf_sync sync_start;
+
+	memset(&sync_start, 0, sizeof(sync_start));
+	sync_start.flags = LOCAL_DMA_BUF_SYNC_START | LOCAL_DMA_BUF_SYNC_RW;
+	do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_start);
+}
+
+/**
+ * prime_sync_end
+ * @dma_buf_fd: dma-buf fd handle
+ */
+void prime_sync_end(int dma_buf_fd)
+{
+	struct local_dma_buf_sync sync_end;
+
+	memset(&sync_end, 0, sizeof(sync_end));
+	sync_end.flags = LOCAL_DMA_BUF_SYNC_END | LOCAL_DMA_BUF_SYNC_RW;
+	do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_end);
+}
+
+/**
  * igt_require_fb_modifiers:
  * @fd: Open DRM file descriptor.
  *
diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h
index d3ffba2..d004165 100644
--- a/lib/ioctl_wrappers.h
+++ b/lib/ioctl_wrappers.h
@@ -148,6 +148,21 @@ void gem_require_caching(int fd);
 void gem_require_ring(int fd, int ring_id);
 
 /* prime */
+struct local_dma_buf_sync {
+	uint64_t flags;
+};
+
+#define LOCAL_DMA_BUF_SYNC_READ (1 << 0)
+#define LOCAL_DMA_BUF_SYNC_WRITE (2 << 0)
+#define LOCAL_DMA_BUF_SYNC_RW (LOCAL_DMA_BUF_SYNC_READ | LOCAL_DMA_BUF_SYNC_WRITE)
+#define LOCAL_DMA_BUF_SYNC_START (0 << 2)
+#define LOCAL_DMA_BUF_SYNC_END (1 << 2)
+#define LOCAL_DMA_BUF_SYNC_VALID_FLAGS_MASK \
+		(LOCAL_DMA_BUF_SYNC_RW | LOCAL_DMA_BUF_SYNC_END)
+
+#define LOCAL_DMA_BUF_BASE 'b'
+#define LOCAL_DMA_BUF_IOCTL_SYNC _IOW(LOCAL_DMA_BUF_BASE, 0, struct local_dma_buf_sync)
+
 int prime_handle_to_fd(int fd, uint32_t handle);
 #ifndef DRM_RDWR
 #define DRM_RDWR O_RDWR
@@ -155,6 +170,8 @@ int prime_handle_to_fd(int fd, uint32_t handle);
 int prime_handle_to_fd_for_mmap(int fd, uint32_t handle);
 uint32_t prime_fd_to_handle(int fd, int dma_buf_fd);
 off_t prime_get_size(int dma_buf_fd);
+void prime_sync_start(int dma_buf_fd);
+void prime_sync_end(int dma_buf_fd);
 
 /* addfb2 fb modifiers */
 struct local_drm_mode_fb_cmd2 {
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index 269ada6..29a0cfd 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -401,6 +401,30 @@ test_errors(void)
 	gem_close(fd, handle);
 }
 
+/* Test for invalid flags on sync ioctl */
+static void
+test_invalid_sync_flags(void)
+{
+	int i, dma_buf_fd;
+	uint32_t handle;
+	struct local_dma_buf_sync sync;
+	int invalid_flags[] = {-1,
+			       0x00,
+			       LOCAL_DMA_BUF_SYNC_RW + 1,
+			       LOCAL_DMA_BUF_SYNC_VALID_FLAGS_MASK + 1};
+
+	handle = gem_create(fd, BO_SIZE);
+	dma_buf_fd = prime_handle_to_fd(fd, handle);
+	for (i = 0; i < sizeof(invalid_flags) / sizeof(invalid_flags[0]); i++) {
+		memset(&sync, 0, sizeof(sync));
+		sync.flags = invalid_flags[i];
+
+		drmIoctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync);
+		igt_assert_eq(errno, EINVAL);
+		errno = 0;
+	}
+}
+
 static void
 test_aperture_limit(void)
 {
@@ -473,6 +497,7 @@ igt_main
 		{ "test_dup", test_dup },
 		{ "test_userptr", test_userptr },
 		{ "test_errors", test_errors },
+		{ "test_invalid_sync_flags", test_invalid_sync_flags },
 		{ "test_aperture_limit", test_aperture_limit },
 	};
 	int i;
-- 
2.1.4
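For readers wondering why each of the four values in invalid_flags[] is expected to yield EINVAL, here is a minimal sketch of the kind of check they are meant to trip. It assumes the kernel side rejects any bit outside the valid mask and requires at least one of READ/WRITE to be set; that is inferred from the expected results of the test, not copied from the kernel patch. The helper name sync_flags_look_valid is hypothetical and exists only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag values defined in lib/ioctl_wrappers.h above. */
#define LOCAL_DMA_BUF_SYNC_READ   (1 << 0)
#define LOCAL_DMA_BUF_SYNC_WRITE  (2 << 0)
#define LOCAL_DMA_BUF_SYNC_RW     (LOCAL_DMA_BUF_SYNC_READ | LOCAL_DMA_BUF_SYNC_WRITE)
#define LOCAL_DMA_BUF_SYNC_START  (0 << 2)
#define LOCAL_DMA_BUF_SYNC_END    (1 << 2)
#define LOCAL_DMA_BUF_SYNC_VALID_FLAGS_MASK \
	(LOCAL_DMA_BUF_SYNC_RW | LOCAL_DMA_BUF_SYNC_END)

/*
 * Hypothetical helper (not part of IGT or the kernel): returns true when
 * flags pass the two checks assumed in the text above.
 */
static bool sync_flags_look_valid(uint64_t flags)
{
	/* Assumption 1: any bit outside the valid mask is rejected. */
	if (flags & ~(uint64_t)LOCAL_DMA_BUF_SYNC_VALID_FLAGS_MASK)
		return false;

	/* Assumption 2: at least one of READ/WRITE must be requested. */
	return (flags & LOCAL_DMA_BUF_SYNC_RW) != 0;
}
```

Under these assumptions, -1 and VALID_FLAGS_MASK + 1 fail the mask check, while 0x00 and SYNC_RW + 1 (which is just SYNC_END with no direction) fail the direction check.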
[PATCH igt v7 3/6] prime_mmap: Add basic tests to write in a bo using CPU
This patch adds test_correct_cpu_write, which maps the texture buffer through a prime fd and then writes directly to it using the CPU. It stresses the driver to guarantee cache synchronization among the different domains.

This patch also adds test_forked_cpu_write, which creates the GEM bo in one process and passes its prime handle to another process, which in turn uses the handle only to map and write. Roughly speaking this test simulates the Chrome OS architecture, where the Web content ("unprivileged process") maps and CPU-draws a buffer that was previously allocated in the GPU process ("privileged process").

This requires kernel modifications (Daniel Thompson's "drm: prime: Honour O_RDWR during prime-handle-to-fd") and therefore prime_handle_to_fd_for_mmap is added to fail when these are missing. Also, upcoming tests (e.g. next patch) are going to use it as well, so make it public and available in the lib.

v2: adds prime_handle_to_fd_with_mmap for skipping test in older kernels and test for invalid flags.

Signed-off-by: Tiago Vignatti
--- lib/ioctl_wrappers.c | 25 +++ lib/ioctl_wrappers.h | 4 +++ tests/prime_mmap.c | 89 3 files changed, 112 insertions(+), 6 deletions(-) diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 6cad8a2..86a61ba 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -1329,6 +1329,31 @@ int prime_handle_to_fd(int fd, uint32_t handle) } /** + * prime_handle_to_fd_for_mmap: + * @fd: open i915 drm file descriptor + * @handle: file-private gem buffer object handle + * + * Same as prime_handle_to_fd above but with DRM_RDWR capabilities, which can + * be useful for writing into the mmap'ed dma-buf file-descriptor. + * + * Returns: The created dma-buf fd handle or -1 if the ioctl fails.
+ */ +int prime_handle_to_fd_for_mmap(int fd, uint32_t handle) +{ + struct drm_prime_handle args; + + memset(&args, 0, sizeof(args)); + args.handle = handle; + args.flags = DRM_CLOEXEC | DRM_RDWR; + args.fd = -1; + + if (drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args) != 0) + return -1; + + return args.fd; +} + +/** * prime_fd_to_handle: * @fd: open i915 drm file descriptor * @dma_buf_fd: dma-buf fd handle diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index bb8a858..d3ffba2 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -149,6 +149,10 @@ void gem_require_ring(int fd, int ring_id); /* prime */ int prime_handle_to_fd(int fd, uint32_t handle); +#ifndef DRM_RDWR +#define DRM_RDWR O_RDWR +#endif +int prime_handle_to_fd_for_mmap(int fd, uint32_t handle); uint32_t prime_fd_to_handle(int fd, int dma_buf_fd); off_t prime_get_size(int dma_buf_fd); diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c index 95304a9..269ada6 100644 --- a/tests/prime_mmap.c +++ b/tests/prime_mmap.c @@ -22,6 +22,7 @@ * * Authors: *Rob Bradford + *Tiago Vignatti * */ @@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size) } static void +fill_bo_cpu(char *ptr) +{ + memcpy(ptr, pattern, sizeof(pattern)); +} + +static void test_correct(void) { int dma_buf_fd; @@ -180,6 +187,65 @@ test_forked(void) gem_close(fd, handle); } +/* test simple CPU write */ +static void +test_correct_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle); + + /* Skip if DRM_RDWR is not supported */ + igt_skip_on(errno == EINVAL); + + /* Check correctness of map using write protection (PROT_WRITE) */ + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + /* Fill bo using CPU */ + fill_bo_cpu(ptr); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + 
close(dma_buf_fd); + gem_close(fd, handle); +} + +/* map from another process and then write using CPU */ +static void +test_forked_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle); + + /* Skip if DRM_RDWR is not supported */ + igt_skip_on(errno == EINVAL); + + igt_fork(childno, 1) { + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE , MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + fill_bo_cpu(ptr); + + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + } + close(dma_buf_fd); + igt_waitchildren(); + gem_close(fd, handle); +} + static void test_ref
[PATCH igt v7 2/6] prime_mmap: Add new test for calling mmap() on dma-buf fds
From: Rob Bradford

This test has the following subtests:
- test_correct for correctness of the data
- test_map_unmap checks for mapping idempotency
- test_reprime checks for dma-buf creation idempotency
- test_forked checks for multiprocess access
- test_refcounting checks for buffer reference counting
- test_dup checks that dup()ing the fd works
- test_userptr makes sure mmap fails, due to the lack of obj->base.filp in a userptr
- test_errors checks the error return values for failures
- test_aperture_limit tests multiple buffer creation at the gtt aperture limit

v2 (Tiago): Removed pattern_check(), which was walking through a useless iterator. Removed superfluous PROT_WRITE from gem_mmap, in test_correct(). Added binary file to .gitignore
v3 (Tiago): squash patch "prime_mmap: Test for userptr mmap" into this one.
v4 (Tiago): use synchronized userptr for testing. Add test for buffer overlapping.

Signed-off-by: Rob Bradford
Signed-off-by: Tiago Vignatti
--- tests/Makefile.sources | 1 + tests/prime_mmap.c | 417 + 2 files changed, 418 insertions(+) create mode 100644 tests/prime_mmap.c diff --git a/tests/Makefile.sources b/tests/Makefile.sources index d594038..75f3cb0 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -96,6 +96,7 @@ TESTS_progs_M = \ pm_rps \ pm_rc6_residency \ pm_sseu \ + prime_mmap \ prime_self_import \ template \ $(NULL) diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c new file mode 100644 index 000..95304a9 --- /dev/null +++ b/tests/prime_mmap.c @@ -0,0 +1,417 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do
so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Rob Bradford + * + */ + +/* + * Testcase: Check whether mmap()ing dma-buf works + */ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "drm.h" +#include "i915_drm.h" +#include "drmtest.h" +#include "igt_debugfs.h" +#include "ioctl_wrappers.h" + +#define BO_SIZE (16*1024) + +static int fd; + +char pattern[] = {0xff, 0x00, 0x00, 0x00, + 0x00, 0xff, 0x00, 0x00, + 0x00, 0x00, 0xff, 0x00, + 0x00, 0x00, 0x00, 0xff}; + +static void +fill_bo(uint32_t handle, size_t size) +{ + off_t i; + for (i = 0; i < size; i+=sizeof(pattern)) + { + gem_write(fd, handle, i, pattern, sizeof(pattern)); + } +} + +static void +test_correct(void) +{ + int dma_buf_fd; + char *ptr1, *ptr2; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + /* Check correctness vs GEM_MMAP_GTT */ + ptr1 = gem_mmap__gtt(fd, handle, BO_SIZE, PROT_READ); + ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr1 != MAP_FAILED); + igt_assert(ptr2 != MAP_FAILED); + igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr2, pattern, 
sizeof(pattern)) == 0); + + munmap(ptr1, BO_SIZE); + munmap(ptr2, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +static void +test_map_unmap(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + /* Unmap
[PATCH igt v7 1/6] lib: Add gem_userptr and __gem_userptr helpers
This patch moves userptr definitions and helpers implementation that were locally in gem_userptr_benchmark and gem_userptr_blits to the library, so other tests can make use of them as well. There's no functional changes. v2: added __ function to differentiate when errors want to be handled back in the caller; bring gem_userptr_sync back to gem_userptr_blits; added gtkdoc. Signed-off-by: Tiago Vignatti --- benchmarks/gem_userptr_benchmark.c | 55 +++- lib/ioctl_wrappers.c | 41 +++ lib/ioctl_wrappers.h | 13 + tests/gem_userptr_blits.c | 104 ++--- 4 files changed, 86 insertions(+), 127 deletions(-) diff --git a/benchmarks/gem_userptr_benchmark.c b/benchmarks/gem_userptr_benchmark.c index 1eae7ff..f7716df 100644 --- a/benchmarks/gem_userptr_benchmark.c +++ b/benchmarks/gem_userptr_benchmark.c @@ -58,17 +58,6 @@ #define PAGE_SIZE 4096 #endif -#define LOCAL_I915_GEM_USERPTR 0x33 -#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr) -struct local_i915_gem_userptr { - uint64_t user_ptr; - uint64_t user_size; - uint32_t flags; -#define LOCAL_I915_USERPTR_READ_ONLY (1<<0) -#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31) - uint32_t handle; -}; - static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED; #define BO_SIZE (65536) @@ -83,30 +72,6 @@ static void gem_userptr_test_synchronized(void) userptr_flags = 0; } -static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t *handle) -{ - struct local_i915_gem_userptr userptr; - int ret; - - userptr.user_ptr = (uintptr_t)ptr; - userptr.user_size = size; - userptr.flags = userptr_flags; - if (read_only) - userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY; - - ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr); - if (ret) - ret = errno; - igt_skip_on_f(ret == ENODEV && - (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 && - !read_only, - "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?"); - if (ret == 0) - 
*handle = userptr.handle; - - return ret; -} - static void **handle_ptr_map; static unsigned int num_handle_ptr_map; @@ -144,8 +109,7 @@ static uint32_t create_userptr_bo(int fd, int size) ret = posix_memalign(&ptr, PAGE_SIZE, size); igt_assert(ret == 0); - ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle); - igt_assert(ret == 0); + gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle); add_handle_ptr(handle, ptr); return handle; @@ -167,7 +131,7 @@ static int has_userptr(int fd) assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0); oldflags = userptr_flags; gem_userptr_test_unsynchronized(); - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle); userptr_flags = oldflags; if (ret != 0) { free(ptr); @@ -379,9 +343,7 @@ static void test_impact_overlap(int fd, const char *prefix) for (i = 0, p = block; i < nr_bos[subtest]; i++, p += PAGE_SIZE) - ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, - &handles[i]); - igt_assert(ret == 0); + gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, userptr_flags, &handles[i]); } if (nr_bos[subtest] > 0) @@ -427,7 +389,6 @@ static void test_single(int fd) char *ptr, *bo_ptr; uint32_t handle = 0; unsigned long iter = 0; - int ret; unsigned long map_size = BO_SIZE + PAGE_SIZE - 1; ptr = mmap(NULL, map_size, PROT_READ | PROT_WRITE, @@ -439,8 +400,7 @@ static void test_single(int fd) start_test(test_duration_sec); while (run_test) { - ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, &handle); - assert(ret == 0); + gem_userptr(fd, bo_ptr, BO_SIZE, 0, userptr_flags, &handle); gem_close(fd, handle); iter++; } @@ -456,7 +416,6 @@ static void test_multiple(int fd, unsigned int batch, int random) uint32_t handles[1]; int map[1]; unsigned long iter = 0; - int ret; int i; unsigned long map_size = batch * BO_SIZE + PAGE_SIZE - 1; @@ -478,10 +437,8 @@ static void test_multiple(int fd, unsigned int batch, int random) if (random)
[PATCH v7 5/5] drm/i915: Use CPU mapping for userspace dma-buf mmap()
Userspace is the one in charge of flushing the CPU caches, by wrapping its accesses to the mmap with {begin,end}_cpu_access.

v2: Remove LLC check because we have dma-buf sync providers now. Also, fix return before transferring ownership when mmap fails.
v3: Fix return values.
v4: !obj->base.filp is user triggerable, so removed the WARN_ON.

Reviewed-by: Chris Wilson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 8c9ed2a..1f3eef6 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_n
 
 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
 {
-	return -EINVAL;
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	int ret;
+
+	if (obj->base.size < vma->vm_end - vma->vm_start)
+		return -EINVAL;
+
+	if (!obj->base.filp)
+		return -ENODEV;
+
+	ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+	if (ret)
+		return ret;
+
+	fput(vma->vm_file);
+	vma->vm_file = get_file(obj->base.filp);
+
+	return 0;
 }
 
 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
-- 
2.1.4
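To make the userspace side of that contract concrete, here is a minimal sketch of a CPU write bracketed by the sync ioctls. It uses local copies of the definitions proposed in this series (the same approach IGT takes with its LOCAL_ prefix), so it does not depend on the new uapi header being installed. The helper names dmabuf_sync and dmabuf_cpu_fill are hypothetical, and retrying on EINTR/EAGAIN is an assumption that mirrors what libdrm's drmIoctl does, not something this patch mandates.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Local copies of the uapi definitions proposed in this series. */
struct local_dma_buf_sync {
	uint64_t flags;
};

#define LOCAL_DMA_BUF_SYNC_READ   (1 << 0)
#define LOCAL_DMA_BUF_SYNC_WRITE  (2 << 0)
#define LOCAL_DMA_BUF_SYNC_RW     (LOCAL_DMA_BUF_SYNC_READ | LOCAL_DMA_BUF_SYNC_WRITE)
#define LOCAL_DMA_BUF_SYNC_START  (0 << 2)
#define LOCAL_DMA_BUF_SYNC_END    (1 << 2)
#define LOCAL_DMA_BUF_IOCTL_SYNC  _IOW('b', 0, struct local_dma_buf_sync)

/*
 * Hypothetical helper: issue one sync ioctl, retrying on EINTR/EAGAIN
 * (assumed behaviour, modelled on drmIoctl). Returns 0 or -1 with errno set.
 */
static int dmabuf_sync(int dma_buf_fd, uint64_t flags)
{
	struct local_dma_buf_sync sync;
	int ret;

	memset(&sync, 0, sizeof(sync));
	sync.flags = flags;
	do {
		ret = ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/*
 * Hypothetical helper: fill `size` bytes of a dma-buf with `value` from the
 * CPU, bracketing the access with SYNC_START and SYNC_END.
 * Returns 0 on success, -1 on failure.
 */
static int dmabuf_cpu_fill(int dma_buf_fd, size_t size, unsigned char value)
{
	int ret = -1;
	void *ptr;

	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
	if (ptr == MAP_FAILED)
		return -1;

	if (dmabuf_sync(dma_buf_fd, LOCAL_DMA_BUF_SYNC_START | LOCAL_DMA_BUF_SYNC_RW) == 0) {
		memset(ptr, value, size);
		ret = dmabuf_sync(dma_buf_fd, LOCAL_DMA_BUF_SYNC_END | LOCAL_DMA_BUF_SYNC_RW);
	}

	munmap(ptr, size);
	return ret;
}
```

A real client would normally keep the mapping around and repeat only the SYNC_START / access / SYNC_END part for each drawing cycle; mapping and unmapping on every call here just keeps the sketch short.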
[PATCH v7 4/5] drm/i915: Implement end_cpu_access
This function is meant to be used with dma-buf mmap, when finishing the CPU access of the mapped pointer.

The error case should be rare, though: it requires the buffer to become active during the sync period and end_cpu_access to be interrupted. So we use an uninterruptible mutex_lock and report an error if it ever happens.

v2: disable interruption to make sure errors are reported.
v3: update to the new end_cpu_access API.
v7: use .write = false because it doesn't need to know whether it's a write.

Reviewed-by: Chris Wilson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 65ab2bd..8c9ed2a 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -212,6 +212,27 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire
 	return ret;
 }
 
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
+{
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	bool was_interruptible;
+	int ret;
+
+	mutex_lock(&dev->struct_mutex);
+	was_interruptible = dev_priv->mm.interruptible;
+	dev_priv->mm.interruptible = false;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, false);
+
+	dev_priv->mm.interruptible = was_interruptible;
+	mutex_unlock(&dev->struct_mutex);
+
+	if (unlikely(ret))
+		DRM_ERROR("unable to flush buffer following CPU access; rendering may be corrupt\n");
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops = {
 	.map_dma_buf = i915_gem_map_dma_buf,
 	.unmap_dma_buf = i915_gem_unmap_dma_buf,
@@ -224,6 +245,7 @@ static const struct dma_buf_ops i915_dmabuf_ops = {
 	.vmap = i915_gem_dmabuf_vmap,
 	.vunmap = i915_gem_dmabuf_vunmap,
 	.begin_cpu_access = i915_gem_begin_cpu_access,
+	.end_cpu_access = i915_gem_end_cpu_access,
 };
 
 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
-- 
2.1.4
[PATCH v7 3/5] dma-buf: Add ioctls to allow userspace to flush
From: Daniel Vetter

The userspace might need some sort of cache coherency management e.g. when CPU
and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers, that forward
directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
used as follows:
  - mmap dma-buf fd
  - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
    to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
    want (with the new data being consumed by the GPU or say scanout device)
  - munmap once you don't need the buffer any more

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.
v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
remove range information from struct dma_buf_sync.
v6 (Tiago): use __u64 structured padded flags instead of enum. Adjust
documentation about the recommendation on using sync ioctls.
v7 (Tiago): address Alex's nit on the flags definition and be even more
verbose in the doc about sync usage.

Cc: Sumit Semwal
Signed-off-by: Daniel Vetter
Signed-off-by: Tiago Vignatti
---
 Documentation/dma-buf-sharing.txt | 21 ++-
 drivers/dma-buf/dma-buf.c         | 43 +++
 include/uapi/linux/dma-buf.h      | 38 ++
 3 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 4f4a84b..32ac32e 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
    handles, too). So it's beneficial to support this in a similar fashion on
    dma-buf to have a good transition path for existing Android userspace.
 
-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+   used when the access happens. This is discussed next paragraphs.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+     - mmap dma-buf fd
+     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+       want (with the new data being consumed by the GPU or say scanout device)
+     - munmap once you don't need the buffer any more
+
+Therefore, for correctness and optimal performance, systems with the memory
+cache shared by the GPU and CPU i.e. the "coherent" and also the
+"incoherent" are always required to use SYNC_START and SYNC_END before and
+after, respectively, when accessing the mapped address.
 
 2. Supporting existing mmap interfaces in importers
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b2ac13b..9a298bd 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include
 #include
 
+#include
+
 static inline int is_dma_buf_file(struct file *);
 
 struct dma_buf_list {
@@ -251,11 +253,52 @@ out:
 	return events;
 }
 
+static long dma_buf_ioctl(struct file *file,
+			  unsigned int cmd, unsigned long arg)
+{
+	struct dma_buf *dmabuf;
+	struct dma_buf_sync sync;
+	enum dma_data_direction direction;
+
+	dmabuf = file->private_data;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	switch (cmd) {
+	case DMA_BUF_IOCTL_SYNC:
+		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+			return -EFAULT;
+
+		if (sync.flags & DMA_BUF_SYNC_RW)
+			direction = DMA_BIDIRECTIONAL;
+		else if (sync.flags & DMA_BUF_SYNC_READ)
+			direction = DMA_FROM_DEVICE;
+		else if (sync.flags & DMA_BUF_SYNC_WRITE)
+			direction = DMA_TO_DEVICE;
+		else
+			return -EINVAL;
+
+		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+			r
[PATCH v7 2/5] dma-buf: Remove range-based flush
This patch removes range-based information used for optimizations in
begin_cpu_access and end_cpu_access.

We have neither users nor implementations of range-based flush. The consensus
seems to be that if we ever want something like that again (or something even
more robust using 2D or 3D sub-range regions) we can use the upcoming dma-buf
sync ioctl for it.

Cc: Sumit Semwal
Cc: Daniel Vetter
Signed-off-by: Tiago Vignatti
---
 Documentation/dma-buf-sharing.txt         | 19 ---
 drivers/dma-buf/dma-buf.c                 | 13 -
 drivers/gpu/drm/i915/i915_gem_dmabuf.c    |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 ++--
 drivers/gpu/drm/udl/udl_fb.c              |  2 --
 drivers/staging/android/ion/ion.c         |  6 ++
 drivers/staging/android/ion/ion_test.c    |  4 ++--
 include/linux/dma-buf.h                   | 12 +---
 8 files changed, 24 insertions(+), 38 deletions(-)

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 480c8de..4f4a84b 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -257,17 +257,15 @@ Access to a dma_buf from the kernel context involves three steps:
 
    Interface:
       int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
-				   size_t start, size_t len,
 				   enum dma_data_direction direction)
 
    This allows the exporter to ensure that the memory is actually available for
    cpu access - the exporter might need to allocate or swap-in and pin the
    backing storage. The exporter also needs to ensure that cpu access is
-   coherent for the given range and access direction. The range and access
-   direction can be used by the exporter to optimize the cache flushing, i.e.
-   access outside of the range or with a different direction (read instead of
-   write) might return stale or even bogus data (e.g. when the exporter needs to
-   copy the data to temporary storage).
+   coherent for the access direction. The direction can be used by the exporter
+   to optimize the cache flushing, i.e. access with a different direction (read
+   instead of write) might return stale or even bogus data (e.g. when the
+   exporter needs to copy the data to temporary storage).
 
    This step might fail, e.g. in oom conditions.
 
@@ -322,14 +320,13 @@ Access to a dma_buf from the kernel context involves three steps:
 
 3. Finish access
 
-   When the importer is done accessing the range specified in begin_cpu_access,
-   it needs to announce this to the exporter (to facilitate cache flushing and
-   unpinning of any pinned resources). The result of any dma_buf kmap calls
-   after end_cpu_access is undefined.
+   When the importer is done accessing the CPU, it needs to announce this to
+   the exporter (to facilitate cache flushing and unpinning of any pinned
+   resources). The result of any dma_buf kmap calls after end_cpu_access is
+   undefined.
 
    Interface:
       void dma_buf_end_cpu_access(struct dma_buf *dma_buf,
-				  size_t start, size_t len,
 				  enum dma_data_direction dir);
 
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 155c146..b2ac13b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -539,13 +539,11 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
  * preparations. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:	[in]	buffer to prepare cpu access for.
- * @start:	[in]	start of range for cpu access.
- * @len:	[in]	length of range for cpu access.
  * @direction:	[in]	length of range for cpu access.
  *
  * Can return negative error values, returns 0 on success.
  */
-int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
+int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 			     enum dma_data_direction direction)
 {
 	int ret = 0;
@@ -554,8 +552,7 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
 		return -EINVAL;
 
 	if (dmabuf->ops->begin_cpu_access)
-		ret = dmabuf->ops->begin_cpu_access(dmabuf, start,
-							len, direction);
+		ret = dmabuf->ops->begin_cpu_access(dmabuf, direction);
 
 	return ret;
 }
@@ -567,19 +564,17 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
  * actions. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:	[in]	buffer to complete cpu access for.
- * @start:	[in]	start of range for cpu access.
- * @len:	[in]	length of range for cpu access.
  * @direction:	[in]	length of range for cpu access.
  *
  * This call must always succeed.
  */
-void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t s
[PATCH v7 1/5] drm: prime: Honour O_RDWR during prime-handle-to-fd
From: Daniel Thompson

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Reviewed-by: Chris Wilson
Signed-off-by: Daniel Thompson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h      |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = {
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 				 struct drm_file *file_priv)
 {
 	struct drm_prime_handle *args = data;
-	uint32_t flags;
 
 	if (!drm_core_check_feature(dev, DRIVER_PRIME))
 		return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 		return -ENOSYS;
 
 	/* check flags are valid */
-	if (args->flags & ~DRM_CLOEXEC)
+	if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
 		return -EINVAL;
 
-	/* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-	flags = args->flags & DRM_CLOEXEC;
-
 	return dev->driver->prime_handle_to_fd(dev, file_priv,
-			args->handle, flags, &args->fd);
+			args->handle, args->flags, &args->fd);
 }
 
 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index b4e92eb..a0ebfe7 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -669,6 +669,7 @@ struct drm_set_client_cap {
 	__u64 value;
 };
 
+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
 	__u32 handle;
--
2.1.4
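As a rough illustration of the relaxed check in the patch above (a sketch, not part of the series): the helper below mirrors the new flag validation in userspace, so it is easy to see which combinations DRM_IOCTL_PRIME_HANDLE_TO_FD would accept after this change. EXAMPLE_DRM_CLOEXEC, EXAMPLE_DRM_RDWR and example_prime_flags_valid() are hypothetical names introduced only for this example; the real definitions are the ones added to include/uapi/drm/drm.h.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for O_CLOEXEC under strict -std= modes */
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local copies of the uapi definitions touched by the patch. */
#define EXAMPLE_DRM_CLOEXEC	O_CLOEXEC
#define EXAMPLE_DRM_RDWR	O_RDWR

/*
 * Mirrors the check in drm_prime_handle_to_fd_ioctl() after the patch:
 * any bit other than CLOEXEC or RDWR makes the ioctl return -EINVAL.
 */
static bool example_prime_flags_valid(uint32_t flags)
{
	uint32_t allowed = (uint32_t)(EXAMPLE_DRM_CLOEXEC | EXAMPLE_DRM_RDWR);

	return (flags & ~allowed) == 0;
}
```

A caller that wants a writable CPU mapping would pass DRM_CLOEXEC | DRM_RDWR in drm_prime_handle.flags and then mmap() the returned fd with PROT_READ | PROT_WRITE.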
Direct userspace dma-buf mmap (v7)
Hey back,

Thank you Daniel, Chris, Alex and Thomas for reviewing the last series. I think
I addressed most of the comments now in version 7, including:
 - be even more verbose in the doc about sync usage.
 - pass .write = false always in i915 end_cpu_access.
 - add sync invalid flags test (igt).
 - in kms_mmap_write_crc, use CPU hog and testing rounds to catch the sync
   problems (igt).

Here are the trees:

https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf-mmap-v7
https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v7

Also, the Chrome OS side is in progress. This past week I've been mostly
struggling with failed attempts to build it (it boots and goes to a black
screen. Sigh.) and also finding a way to make my funky BayTrail-T laptop with
32-bit UEFI firmware boot up (success with Ubuntu but no success yet in CrOS).
A WIP of the Chromium changes can be seen here anyway:

https://codereview.chromium.org/1262043002/

Happy Holidays!

Tiago

--
2.1.4
[PATCH v6 3/5] dma-buf: Add ioctls to allow userspace to flush
On 12/17/2015 07:58 PM, Thomas Hellstrom wrote:
> On 12/16/2015 11:25 PM, Tiago Vignatti wrote:
>> From: Daniel Vetter
>>
>> The userspace might need some sort of cache coherency management e.g. when CPU
>> and GPU domains are being accessed through dma-buf at the same time. To
>> circumvent this problem there are begin/end coherency markers, that forward
>> directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
>> of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
>> used like following:
>>   - mmap dma-buf fd
>>   - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
>>     to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>>     want (with the new data being consumed by the GPU or say scanout device)
>>   - munmap once you don't need the buffer any more
>>
>> v2 (Tiago): Fix header file type names (u64 -> __u64)
>> v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
>> dma-buf functions. Check for overflows in start/length.
>> v4 (Tiago): use 2d regions for sync.
>> v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
>> remove range information from struct dma_buf_sync.
>> v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
>> documentation about the recommendation on using sync ioctls.
>>
>> Cc: Sumit Semwal
>> Signed-off-by: Daniel Vetter
>> Signed-off-by: Tiago Vignatti
>> ---
>>  Documentation/dma-buf-sharing.txt | 22 +++-
>>  drivers/dma-buf/dma-buf.c         | 43 +++
>>  include/uapi/linux/dma-buf.h      | 38 ++
>>  3 files changed, 102 insertions(+), 1 deletion(-)
>>  create mode 100644 include/uapi/linux/dma-buf.h
>>
>> diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
>> index 4f4a84b..2ddd4b2 100644
>> --- a/Documentation/dma-buf-sharing.txt
>> +++ b/Documentation/dma-buf-sharing.txt
>> @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
>>     handles, too). So it's beneficial to support this in a similar fashion on
>>     dma-buf to have a good transition path for existing Android userspace.
>>
>> -   No special interfaces, userspace simply calls mmap on the dma-buf fd.
>> +   No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
>> +   important to note though is that, even if it is not mandatory, the userspace
>> +   is strongly recommended to always use the cache synchronization ioctl
>> +   (DMA_BUF_IOCTL_SYNC) discussed next.
>> +
>> +   Some systems might need some sort of cache coherency management e.g. when
>> +   CPU and GPU domains are being accessed through dma-buf at the same time. To
>> +   circumvent this problem there are begin/end coherency markers, that forward
>> +   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
>> +   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
>> +   would be used like following:
>> +     - mmap dma-buf fd
>> +     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
>> +       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>> +       want (with the new data being consumed by the GPU or say scanout device)
>> +     - munmap once you don't need the buffer any more
>> +
>> +In principle systems with the memory cache shared by the GPU and CPU may
>> +not need SYNC_START and SYNC_END but still, userspace is always encouraged
>> +to use these ioctls before and after, respectively, when accessing the
>> +mapped address.
>>
>
> I think the wording here is far too weak. If this is a generic
> user-space interface and syncing is required for
> a) Correctness: then syncing must be mandatory.
> b) Optimal performance then an implementation must generate expected
> results also in the absence of SYNC ioctls, but is allowed to rely on
> correct pairing of SYNC_START and SYNC_END to render correctly.

Thomas, do you think the following write-up captures this?

-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+   used when the
[PATCH v6 4/5] drm/i915: Implement end_cpu_access
On 12/18/2015 05:02 PM, Tiago Vignatti wrote:
> On 12/17/2015 06:01 AM, Chris Wilson wrote:
>> On Wed, Dec 16, 2015 at 08:25:36PM -0200, Tiago Vignatti wrote:
>>> This function is meant to be used with dma-buf mmap, when finishing the CPU
>>> access of the mapped pointer.
>>>
>>> +static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
>>> +{
>>> +struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
>>> +struct drm_device *dev = obj->base.dev;
>>> +struct drm_i915_private *dev_priv = to_i915(dev);
>>> +bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
>>> +int ret;
>>> +
>>> +mutex_lock(&dev->struct_mutex);
>>> +was_interruptible = dev_priv->mm.interruptible;
>>> +dev_priv->mm.interruptible = false;
>>> +
>>> +ret = i915_gem_object_set_to_gtt_domain(obj, write);
>>
>> This only needs to pass .write=false. The dma-buf direction is
>> only for the period of the user access, and we are now flushing the
>> caches. This is equivalent to the sw-finish ioctl and ideally we just
>> want the i915_gem_object_flush_cpu_write_domain().
>
> In fact the only usage I have found so far for end_cpu_access is when the
> pinned buffer is scanned out. Should I pretty much copy sw-finish in
> end_cpu_access then?

And do you think it's okay to declare i915_gem_object_flush_cpu_write_domain
outside of its file-local scope?

Tiago
[PATCH v6 4/5] drm/i915: Implement end_cpu_access
On 12/17/2015 06:01 AM, Chris Wilson wrote:
> On Wed, Dec 16, 2015 at 08:25:36PM -0200, Tiago Vignatti wrote:
>> This function is meant to be used with dma-buf mmap, when finishing the CPU
>> access of the mapped pointer.
>>
>> +static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
>> +{
>> +struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
>> +struct drm_device *dev = obj->base.dev;
>> +struct drm_i915_private *dev_priv = to_i915(dev);
>> +bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
>> +int ret;
>> +
>> +mutex_lock(&dev->struct_mutex);
>> +was_interruptible = dev_priv->mm.interruptible;
>> +dev_priv->mm.interruptible = false;
>> +
>> +ret = i915_gem_object_set_to_gtt_domain(obj, write);
>
> This only needs to pass .write=false. The dma-buf direction is
> only for the period of the user access, and we are now flushing the
> caches. This is equivalent to the sw-finish ioctl and ideally we just
> want the i915_gem_object_flush_cpu_write_domain().

In fact the only usage I have found so far for end_cpu_access is when the
pinned buffer is scanned out. Should I pretty much copy sw-finish in
end_cpu_access then?

Thanks,

Tiago
[PATCH v6 3/5] dma-buf: Add ioctls to allow userspace to flush
On 12/17/2015 04:19 PM, Alex Deucher wrote:
> On Wed, Dec 16, 2015 at 5:25 PM, Tiago Vignatti wrote:
>> From: Daniel Vetter
>>
>> The userspace might need some sort of cache coherency management e.g. when CPU
>> and GPU domains are being accessed through dma-buf at the same time. To
>> circumvent this problem there are begin/end coherency markers, that forward
>> directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
>> of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
>> used like following:
>>   - mmap dma-buf fd
>>   - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
>>     to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>>     want (with the new data being consumed by the GPU or say scanout device)
>>   - munmap once you don't need the buffer any more
>>
>> v2 (Tiago): Fix header file type names (u64 -> __u64)
>> v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
>> dma-buf functions. Check for overflows in start/length.
>> v4 (Tiago): use 2d regions for sync.
>> v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
>> remove range information from struct dma_buf_sync.
>> v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
>> documentation about the recommendation on using sync ioctls.
>>
>> Cc: Sumit Semwal
>> Signed-off-by: Daniel Vetter
>> Signed-off-by: Tiago Vignatti
>> ---
>>  Documentation/dma-buf-sharing.txt | 22 +++-
>>  drivers/dma-buf/dma-buf.c         | 43 +++
>>  include/uapi/linux/dma-buf.h      | 38 ++
>>  3 files changed, 102 insertions(+), 1 deletion(-)
>>  create mode 100644 include/uapi/linux/dma-buf.h
>>
>> diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
>> index 4f4a84b..2ddd4b2 100644
>> --- a/Documentation/dma-buf-sharing.txt
>> +++ b/Documentation/dma-buf-sharing.txt
>> @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
>>     handles, too). So it's beneficial to support this in a similar fashion on
>>     dma-buf to have a good transition path for existing Android userspace.
>>
>> -   No special interfaces, userspace simply calls mmap on the dma-buf fd.
>> +   No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
>> +   important to note though is that, even if it is not mandatory, the userspace
>> +   is strongly recommended to always use the cache synchronization ioctl
>> +   (DMA_BUF_IOCTL_SYNC) discussed next.
>> +
>> +   Some systems might need some sort of cache coherency management e.g. when
>> +   CPU and GPU domains are being accessed through dma-buf at the same time. To
>> +   circumvent this problem there are begin/end coherency markers, that forward
>> +   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
>> +   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
>> +   would be used like following:
>> +     - mmap dma-buf fd
>> +     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
>> +       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>> +       want (with the new data being consumed by the GPU or say scanout device)
>> +     - munmap once you don't need the buffer any more
>> +
>> +In principle systems with the memory cache shared by the GPU and CPU may
>> +not need SYNC_START and SYNC_END but still, userspace is always encouraged
>> +to use these ioctls before and after, respectively, when accessing the
>> +mapped address.
>>
>>  2. Supporting existing mmap interfaces in importers
>>
>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>> index b2ac13b..9a298bd 100644
>> --- a/drivers/dma-buf/dma-buf.c
>> +++ b/drivers/dma-buf/dma-buf.c
>> @@ -34,6 +34,8 @@
>>  #include
>>  #include
>>
>> +#include
>> +
>>  static inline int is_dma_buf_file(struct file *);
>>
>>  struct dma_buf_list {
>> @@ -251,11 +253,52 @@ out:
>>         return events;
>>  }
>>
>> +static long dma_buf_ioctl(struct file *file,
>> +                         unsigned int cmd, un
[PATCH igt v6 6/6] tests: Add prime_mmap_coherency for cache coherency tests
Unlike kms_mmap_write_crc, which captures coherency issues within the scanout
mapped buffer, this one is meant to test dma-buf mmap mostly on !llc platforms
and to provoke coherency bugs so we know where we need the sync ioctls.

I tested this with !llc and llc platforms, BTY and IVY respectively.

Signed-off-by: Tiago Vignatti
---
 tests/Makefile.sources       |   1 +
 tests/prime_mmap_coherency.c | 246 +++
 2 files changed, 247 insertions(+)
 create mode 100644 tests/prime_mmap_coherency.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index ad2dd6a..78605c6 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -97,6 +97,7 @@ TESTS_progs_M = \
 	pm_rc6_residency \
 	pm_sseu \
 	prime_mmap \
+	prime_mmap_coherency \
 	prime_self_import \
 	template \
 	$(NULL)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
new file mode 100644
index 000..a9a2664
--- /dev/null
+++ b/tests/prime_mmap_coherency.c
@@ -0,0 +1,246 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Tiago Vignatti
+ */
+
+/** @file prime_mmap_coherency.c
+ *
+ * TODO: need to show the need for prime_sync_end().
+ */
+
+#include "igt.h"
+
+IGT_TEST_DESCRIPTION("Test dma-buf mmap on !llc platforms mostly and provoke"
+		" coherency bugs so we know for sure where we need the sync ioctls.");
+
+#define ROUNDS 20
+
+int fd;
+int stale = 0;
+static drm_intel_bufmgr *bufmgr;
+struct intel_batchbuffer *batch;
+static int width = 1024, height = 1024;
+
+/*
+ * Exercises the need for read flush:
+ *   1. create a BO and write '0's, in GTT domain.
+ *   2. read BO using the dma-buf CPU mmap.
+ *   3. write '1's, in GTT domain.
+ *   4. read again through the mapped dma-buf.
+ */
+static void test_read_flush(bool expect_stale_cache)
+{
+	drm_intel_bo *bo_1;
+	drm_intel_bo *bo_2;
+	uint32_t *ptr_cpu;
+	uint32_t *ptr_gtt;
+	int dma_buf_fd, i;
+
+	if (expect_stale_cache)
+		igt_require(!gem_has_llc(fd));
+
+	bo_1 = drm_intel_bo_alloc(bufmgr, "BO 1", width * height * 4, 4096);
+
+	/* STEP #1: put the BO 1 in GTT domain. We use the blitter to copy and fill
+	 * zeros to BO 1, so commands will be submitted and likely to place BO 1 in
+	 * the GTT domain. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	intel_copy_bo(batch, bo_1, bo_2, width * height);
+	gem_sync(fd, bo_1->handle);
+	drm_intel_bo_unreference(bo_2);
+
+	/* STEP #2: read BO 1 using the dma-buf CPU mmap. This dirties the CPU caches. */
+	dma_buf_fd = prime_handle_to_fd_for_mmap(fd, bo_1->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+		       MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr_cpu != MAP_FAILED);
+
+	for (i = 0; i < (width * height) / 4; i++)
+		igt_assert_eq(ptr_cpu[i], 0);
+
+	/* STEP #3: write 0x11 into BO 1. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	ptr_gtt = gem_mmap__gtt(fd, bo_2->handle, width * height, PROT_READ | PROT_WRITE);
+	memset(ptr_gtt, 0x11, width * height);
+	munmap(ptr_gtt, width * height);
+
+	intel_copy_bo(batch, bo_1, bo_2, width * height);
+	gem_sync(fd, bo_1->handle);
+	drm_intel_bo_unreference(bo_2);
+
+	/* STEP #4: read again using the CPU mmap. Doing #1 before #3 makes sure we
+	 * don't do a full CPU cache flush in step #3 again. That makes sure all the
+	 * stale cachelines from step #2 survive (mostly, a few w
[PATCH igt v6 5/6] tests: Add kms_mmap_write_crc for cache coherency tests
This program can be used to detect when CPU writes in the dma-buf mapped object
don't land in scanout due to cache incoherency.

Although this seems to be a problem inherent to non-LLC machines ("Atom"), this
particular test catches dirty cache lines on scanout on LLC machines as well.
It's inspired by Ville's kms_pwrite_crc.c and can also be used to test the
correctness of the driver's begin_cpu_access and end_cpu_access (which requires
an i915 implementation).

To see the need for the flush, one has to run this same binary a few times
because it's not 100% reproducible. What I usually do is the following, using
the '-n' option to not call the sync ioctls:

  $ while ((1)) ; do ./kms_mmap_write_crc -n; done  # in terminal A
  $ find /                                          # in terminal B

That will most likely trash the memory while the test catches the coherency
issue. If you now drop '-n', then things should just work as expected.

I tested this with !llc and llc platforms, BTY and IVY respectively.

v2: use prime_handle_to_fd_for_mmap instead.
v3: merge end_cpu_access() patch with this and provide options to disable sync.
v4: use library's prime_sync_{start,end} instead.

Signed-off-by: Tiago Vignatti
---
 tests/Makefile.sources     |   1 +
 tests/kms_mmap_write_crc.c | 281 +
 2 files changed, 282 insertions(+)
 create mode 100644 tests/kms_mmap_write_crc.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 75f3cb0..ad2dd6a 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -168,6 +168,7 @@ TESTS_progs = \
 	kms_3d \
 	kms_fence_pin_leak \
 	kms_force_connector_basic \
+	kms_mmap_write_crc \
 	kms_pwrite_crc \
 	kms_sink_crc_basic \
 	prime_udl \
diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c
new file mode 100644
index 000..6a12539
--- /dev/null
+++ b/tests/kms_mmap_write_crc.c
@@ -0,0 +1,281 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Tiago Vignatti
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "igt_kms.h"
+#include "intel_chipset.h"
+#include "ioctl_wrappers.h"
+#include "igt_aux.h"
+
+IGT_TEST_DESCRIPTION(
+   "Use the display CRC support to validate mmap write to an already uncached future scanout buffer.");
+
+typedef struct {
+	int drm_fd;
+	igt_display_t display;
+	struct igt_fb fb[2];
+	igt_output_t *output;
+	igt_plane_t *primary;
+	enum pipe pipe;
+	igt_crc_t ref_crc;
+	igt_pipe_crc_t *pipe_crc;
+	uint32_t devid;
+} data_t;
+
+static int ioctl_sync = true;
+int dma_buf_fd;
+
+static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
+{
+	char *ptr = NULL;
+
+	dma_buf_fd = prime_handle_to_fd_for_mmap(drm_fd, fb->gem_handle);
+	igt_skip_on(dma_buf_fd == -1 && errno == EINVAL);
+
+	ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr != MAP_FAILED);
+
+	return ptr;
+}
+
+static void test(data_t *data)
+{
+	igt_display_t *display = &data->display;
+	igt_output_t *output = data->output;
+	struct igt_fb *fb = &data->fb[1];
+	drmModeModeInfo *mode;
+	cairo_t *cr;
+	char *ptr;
+	uint32_t caching;
+	void *buf;
+	igt_crc_t crc;
+
+	mode = igt_output_get_mode(output);
+
+	/* create a non-white fb where we can write later */
+	igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay,
+		      DRM_FORMAT_XRGB, LOCAL_DRM
[PATCH igt v6 4/6] lib: Add prime_sync_start and prime_sync_end helpers
This patch adds dma-buf mmap synchronization ioctls that can be used by tests
for cache coherency management, e.g. when CPU and GPU domains are being
accessed through dma-buf at the same time.

Signed-off-by: Tiago Vignatti
---
 lib/ioctl_wrappers.c | 26 ++
 lib/ioctl_wrappers.h | 15 +++
 2 files changed, 41 insertions(+)

diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index 86a61ba..0d84d00 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -1400,6 +1400,32 @@ off_t prime_get_size(int dma_buf_fd)
 }
 
 /**
+ * prime_sync_start
+ * @dma_buf_fd: dma-buf fd handle
+ */
+void prime_sync_start(int dma_buf_fd)
+{
+	struct local_dma_buf_sync sync_start;
+
+	memset(&sync_start, 0, sizeof(sync_start));
+	sync_start.flags = LOCAL_DMA_BUF_SYNC_START | LOCAL_DMA_BUF_SYNC_RW;
+	do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_start);
+}
+
+/**
+ * prime_sync_end
+ * @dma_buf_fd: dma-buf fd handle
+ */
+void prime_sync_end(int dma_buf_fd)
+{
+	struct local_dma_buf_sync sync_end;
+
+	memset(&sync_end, 0, sizeof(sync_end));
+	sync_end.flags = LOCAL_DMA_BUF_SYNC_END | LOCAL_DMA_BUF_SYNC_RW;
+	do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_end);
+}
+
+/**
  * igt_require_fb_modifiers:
  * @fd: Open DRM file descriptor.
  *
diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h
index d3ffba2..cbd7a73 100644
--- a/lib/ioctl_wrappers.h
+++ b/lib/ioctl_wrappers.h
@@ -148,6 +148,19 @@ void gem_require_caching(int fd);
 void gem_require_ring(int fd, int ring_id);
 
 /* prime */
+struct local_dma_buf_sync {
+	uint64_t flags;
+};
+
+#define LOCAL_DMA_BUF_SYNC_RW (3 << 0)
+#define LOCAL_DMA_BUF_SYNC_START (0 << 2)
+#define LOCAL_DMA_BUF_SYNC_END (1 << 2)
+#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
+	(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+
+#define LOCAL_DMA_BUF_BASE 'b'
+#define LOCAL_DMA_BUF_IOCTL_SYNC _IOW(LOCAL_DMA_BUF_BASE, 0, struct local_dma_buf_sync)
+
 int prime_handle_to_fd(int fd, uint32_t handle);
 #ifndef DRM_RDWR
 #define DRM_RDWR O_RDWR
@@ -155,6 +168,8 @@ int prime_handle_to_fd(int fd, uint32_t handle);
 int prime_handle_to_fd_for_mmap(int fd, uint32_t handle);
 uint32_t prime_fd_to_handle(int fd, int dma_buf_fd);
 off_t prime_get_size(int dma_buf_fd);
+void prime_sync_start(int dma_buf_fd);
+void prime_sync_end(int dma_buf_fd);
 
 /* addfb2 fb modifiers */
 struct local_drm_mode_fb_cmd2 {
-- 
2.1.4
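For readers who want to see how the two helpers are meant to bracket a CPU access, here is a minimal sketch. It is not part of the patch: the sketch_* names are made up for illustration, the flag values are copied from the LOCAL_ defines above, and a plain ioctl() is used instead of igt's do_ioctl() so that the caller sees the error rather than hitting an assertion.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copies of the definitions in the patch; the sketch_* names are hypothetical. */
struct sketch_dma_buf_sync {
	uint64_t flags;
};

#define SKETCH_DMA_BUF_SYNC_RW    (3 << 0)
#define SKETCH_DMA_BUF_SYNC_START (0 << 2)
#define SKETCH_DMA_BUF_SYNC_END   (1 << 2)
#define SKETCH_DMA_BUF_IOCTL_SYNC _IOW('b', 0, struct sketch_dma_buf_sync)

/* Issue one sync ioctl; returns 0 on success, -1 with errno set on failure. */
int sketch_dma_buf_sync(int dma_buf_fd, uint64_t flags)
{
	struct sketch_dma_buf_sync sync;

	memset(&sync, 0, sizeof(sync));
	sync.flags = flags;
	return ioctl(dma_buf_fd, SKETCH_DMA_BUF_IOCTL_SYNC, &sync);
}

/* Copy len bytes into an already mmap'ed dma-buf, bracketed by START and END. */
int sketch_cpu_write(int dma_buf_fd, void *map, const void *src, size_t len)
{
	if (sketch_dma_buf_sync(dma_buf_fd,
				SKETCH_DMA_BUF_SYNC_START | SKETCH_DMA_BUF_SYNC_RW))
		return -1;

	memcpy(map, src, len);

	return sketch_dma_buf_sync(dma_buf_fd,
				   SKETCH_DMA_BUF_SYNC_END | SKETCH_DMA_BUF_SYNC_RW);
}
```

A test would call sketch_cpu_write() once per drawing or upload cycle on the pointer returned by mmap() and unmap only when it is done with the buffer.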
[PATCH igt v6 3/6] prime_mmap: Add basic tests to write in a bo using CPU
This patch adds test_correct_cpu_write, which maps the texture buffer through a prime fd and then writes directly to it using the CPU. It stresses the driver to guarantee cache synchronization among the different domains. This test also adds test_forked_cpu_write, which creates the GEM bo in one process and pass the prime handle of the it to another process, which in turn uses the handle only to map and write. Roughly speaking this test simulates Chrome OS architecture, where the Web content ("unpriviledged process") maps and CPU-draws a buffer, which was previously allocated in the GPU process ("priviledged process"). This requires kernel modifications (Daniel Thompson's "drm: prime: Honour O_RDWR during prime-handle-to-fd") and therefore prime_handle_to_fd_for_mmap is added to fail in case these lack. Also, upcoming tests (e.g. next patch) are going to use it as well, so make it public and available in the lib. v2: adds prime_handle_to_fd_with_mmap for skipping test in older kernels and test for invalid flags. Signed-off-by: Tiago Vignatti --- lib/ioctl_wrappers.c | 25 +++ lib/ioctl_wrappers.h | 4 +++ tests/prime_mmap.c | 89 3 files changed, 112 insertions(+), 6 deletions(-) diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 6cad8a2..86a61ba 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -1329,6 +1329,31 @@ int prime_handle_to_fd(int fd, uint32_t handle) } /** + * prime_handle_to_fd_for_mmap: + * @fd: open i915 drm file descriptor + * @handle: file-private gem buffer object handle + * + * Same as prime_handle_to_fd above but with DRM_RDWR capabilities, which can + * be useful for writing into the mmap'ed dma-buf file-descriptor. + * + * Returns: The created dma-buf fd handle or -1 if the ioctl fails. 
+ */ +int prime_handle_to_fd_for_mmap(int fd, uint32_t handle) +{ + struct drm_prime_handle args; + + memset(&args, 0, sizeof(args)); + args.handle = handle; + args.flags = DRM_CLOEXEC | DRM_RDWR; + args.fd = -1; + + if (drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args) != 0) + return -1; + + return args.fd; +} + +/** * prime_fd_to_handle: * @fd: open i915 drm file descriptor * @dma_buf_fd: dma-buf fd handle diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index bb8a858..d3ffba2 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -149,6 +149,10 @@ void gem_require_ring(int fd, int ring_id); /* prime */ int prime_handle_to_fd(int fd, uint32_t handle); +#ifndef DRM_RDWR +#define DRM_RDWR O_RDWR +#endif +int prime_handle_to_fd_for_mmap(int fd, uint32_t handle); uint32_t prime_fd_to_handle(int fd, int dma_buf_fd); off_t prime_get_size(int dma_buf_fd); diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c index 95304a9..269ada6 100644 --- a/tests/prime_mmap.c +++ b/tests/prime_mmap.c @@ -22,6 +22,7 @@ * * Authors: *Rob Bradford + *Tiago Vignatti * */ @@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size) } static void +fill_bo_cpu(char *ptr) +{ + memcpy(ptr, pattern, sizeof(pattern)); +} + +static void test_correct(void) { int dma_buf_fd; @@ -180,6 +187,65 @@ test_forked(void) gem_close(fd, handle); } +/* test simple CPU write */ +static void +test_correct_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle); + + /* Skip if DRM_RDWR is not supported */ + igt_skip_on(errno == EINVAL); + + /* Check correctness of map using write protection (PROT_WRITE) */ + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + /* Fill bo using CPU */ + fill_bo_cpu(ptr); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + 
close(dma_buf_fd); + gem_close(fd, handle); +} + +/* map from another process and then write using CPU */ +static void +test_forked_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle); + + /* Skip if DRM_RDWR is not supported */ + igt_skip_on(errno == EINVAL); + + igt_fork(childno, 1) { + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE , MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + fill_bo_cpu(ptr); + + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + } + close(dma_buf_fd); + igt_waitchildren(); + gem_close(fd, handle); +} + static void test_ref
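The shape of test_forked_cpu_write (one process owns the buffer, a forked child maps the fd and writes with the CPU) can be tried without any GPU. The sketch below is only an illustration under a loud assumption: an unlinked temporary file stands in for the dma-buf fd, so it exercises the fork/mmap/write pattern and nothing about cache coherency. All sketch_* names are hypothetical, and unlike the igt test the parent keeps its fd open so it can read the data back.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_BO_SIZE (16 * 1024)

static const unsigned char sketch_pattern[] = {
	0xff, 0x00, 0x00, 0x00,
	0x00, 0xff, 0x00, 0x00,
	0x00, 0x00, 0xff, 0x00,
	0x00, 0x00, 0x00, 0xff,
};

/* Child side: map the shared fd, write the pattern with the CPU, check it. */
static int sketch_child_write(int fd)
{
	unsigned char *ptr;
	int ok;

	ptr = mmap(NULL, SKETCH_BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED)
		return 1;
	memcpy(ptr, sketch_pattern, sizeof(sketch_pattern));
	ok = memcmp(ptr, sketch_pattern, sizeof(sketch_pattern)) == 0;
	munmap(ptr, SKETCH_BO_SIZE);
	return ok ? 0 : 2;
}

/* Parent side: fork a writer, wait for it, then read the data back. */
int sketch_forked_cpu_write(int fd)
{
	unsigned char *ptr;
	int status, ret;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(sketch_child_write(fd));

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	ptr = mmap(NULL, SKETCH_BO_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED)
		return -1;
	ret = memcmp(ptr, sketch_pattern, sizeof(sketch_pattern)) == 0 ? 0 : -1;
	munmap(ptr, SKETCH_BO_SIZE);
	return ret;
}

/* Assumption: a temp file of SKETCH_BO_SIZE bytes stands in for the exported buffer. */
int sketch_run_with_standin(void)
{
	FILE *f = tmpfile();
	int ret;

	if (!f)
		return -1;
	if (ftruncate(fileno(f), SKETCH_BO_SIZE) != 0) {
		fclose(f);
		return -1;
	}
	ret = sketch_forked_cpu_write(fileno(f));
	fclose(f);
	return ret;
}
```

With a real dma-buf the fd would come from prime_handle_to_fd_for_mmap() instead of tmpfile().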
[PATCH igt v6 2/6] prime_mmap: Add new test for calling mmap() on dma-buf fds
From: Rob Bradford This test has the following subtests: - test_correct for correctness of the data - test_map_unmap checks for mapping idempotency - test_reprime checks for dma-buf creation idempotency - test_forked checks for multiprocess access - test_refcounting checks for buffer reference counting - test_dup checks that dup()ing the fd works - test_userptr make sure it fails when mmaping due the lack of obj->base.filp in a userptr. - test_errors checks the error return values for failures - test_aperture_limit tests multiple buffer creation at the gtt aperture limit v2 (Tiago): Removed pattern_check(), which was walking through a useless iterator. Removed superfluous PROT_WRITE from gem_mmap, in test_correct(). Added binary file to .gitignore v3 (Tiago): squash patch "prime_mmap: Test for userptr mmap" into this one. v4 (Tiago): use synchronized userptr for testing. Add test for buffer overlapping. Signed-off-by: Rob Bradford Signed-off-by: Tiago Vignatti --- tests/Makefile.sources | 1 + tests/prime_mmap.c | 417 + 2 files changed, 418 insertions(+) create mode 100644 tests/prime_mmap.c diff --git a/tests/Makefile.sources b/tests/Makefile.sources index d594038..75f3cb0 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -96,6 +96,7 @@ TESTS_progs_M = \ pm_rps \ pm_rc6_residency \ pm_sseu \ + prime_mmap \ prime_self_import \ template \ $(NULL) diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c new file mode 100644 index 000..95304a9 --- /dev/null +++ b/tests/prime_mmap.c @@ -0,0 +1,417 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do 
so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Rob Bradford + * + */ + +/* + * Testcase: Check whether mmap()ing dma-buf works + */ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "drm.h" +#include "i915_drm.h" +#include "drmtest.h" +#include "igt_debugfs.h" +#include "ioctl_wrappers.h" + +#define BO_SIZE (16*1024) + +static int fd; + +char pattern[] = {0xff, 0x00, 0x00, 0x00, + 0x00, 0xff, 0x00, 0x00, + 0x00, 0x00, 0xff, 0x00, + 0x00, 0x00, 0x00, 0xff}; + +static void +fill_bo(uint32_t handle, size_t size) +{ + off_t i; + for (i = 0; i < size; i+=sizeof(pattern)) + { + gem_write(fd, handle, i, pattern, sizeof(pattern)); + } +} + +static void +test_correct(void) +{ + int dma_buf_fd; + char *ptr1, *ptr2; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + /* Check correctness vs GEM_MMAP_GTT */ + ptr1 = gem_mmap__gtt(fd, handle, BO_SIZE, PROT_READ); + ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr1 != MAP_FAILED); + igt_assert(ptr2 != MAP_FAILED); + igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr2, pattern, 
sizeof(pattern)) == 0); + + munmap(ptr1, BO_SIZE); + munmap(ptr2, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +static void +test_map_unmap(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + /* Unmap
[PATCH igt v6 1/6] lib: Add gem_userptr and __gem_userptr helpers
This patch moves userptr definitions and helpers implementation that were locally in gem_userptr_benchmark and gem_userptr_blits to the library, so other tests can make use of them as well. There's no functional changes. v2: added __ function to differentiate when errors want to be handled back in the caller; bring gem_userptr_sync back to gem_userptr_blits; added gtkdoc. Signed-off-by: Tiago Vignatti --- benchmarks/gem_userptr_benchmark.c | 55 +++- lib/ioctl_wrappers.c | 41 +++ lib/ioctl_wrappers.h | 13 + tests/gem_userptr_blits.c | 104 ++--- 4 files changed, 86 insertions(+), 127 deletions(-) diff --git a/benchmarks/gem_userptr_benchmark.c b/benchmarks/gem_userptr_benchmark.c index 1eae7ff..f7716df 100644 --- a/benchmarks/gem_userptr_benchmark.c +++ b/benchmarks/gem_userptr_benchmark.c @@ -58,17 +58,6 @@ #define PAGE_SIZE 4096 #endif -#define LOCAL_I915_GEM_USERPTR 0x33 -#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr) -struct local_i915_gem_userptr { - uint64_t user_ptr; - uint64_t user_size; - uint32_t flags; -#define LOCAL_I915_USERPTR_READ_ONLY (1<<0) -#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31) - uint32_t handle; -}; - static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED; #define BO_SIZE (65536) @@ -83,30 +72,6 @@ static void gem_userptr_test_synchronized(void) userptr_flags = 0; } -static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t *handle) -{ - struct local_i915_gem_userptr userptr; - int ret; - - userptr.user_ptr = (uintptr_t)ptr; - userptr.user_size = size; - userptr.flags = userptr_flags; - if (read_only) - userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY; - - ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr); - if (ret) - ret = errno; - igt_skip_on_f(ret == ENODEV && - (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 && - !read_only, - "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?"); - if (ret == 0) - 
*handle = userptr.handle; - - return ret; -} - static void **handle_ptr_map; static unsigned int num_handle_ptr_map; @@ -144,8 +109,7 @@ static uint32_t create_userptr_bo(int fd, int size) ret = posix_memalign(&ptr, PAGE_SIZE, size); igt_assert(ret == 0); - ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle); - igt_assert(ret == 0); + gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle); add_handle_ptr(handle, ptr); return handle; @@ -167,7 +131,7 @@ static int has_userptr(int fd) assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0); oldflags = userptr_flags; gem_userptr_test_unsynchronized(); - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle); userptr_flags = oldflags; if (ret != 0) { free(ptr); @@ -379,9 +343,7 @@ static void test_impact_overlap(int fd, const char *prefix) for (i = 0, p = block; i < nr_bos[subtest]; i++, p += PAGE_SIZE) - ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, - &handles[i]); - igt_assert(ret == 0); + gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, userptr_flags, &handles[i]); } if (nr_bos[subtest] > 0) @@ -427,7 +389,6 @@ static void test_single(int fd) char *ptr, *bo_ptr; uint32_t handle = 0; unsigned long iter = 0; - int ret; unsigned long map_size = BO_SIZE + PAGE_SIZE - 1; ptr = mmap(NULL, map_size, PROT_READ | PROT_WRITE, @@ -439,8 +400,7 @@ static void test_single(int fd) start_test(test_duration_sec); while (run_test) { - ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, &handle); - assert(ret == 0); + gem_userptr(fd, bo_ptr, BO_SIZE, 0, userptr_flags, &handle); gem_close(fd, handle); iter++; } @@ -456,7 +416,6 @@ static void test_multiple(int fd, unsigned int batch, int random) uint32_t handles[1]; int map[1]; unsigned long iter = 0; - int ret; int i; unsigned long map_size = batch * BO_SIZE + PAGE_SIZE - 1; @@ -478,10 +437,8 @@ static void test_multiple(int fd, unsigned int batch, int random) if (random)
[PATCH v6 5/5] drm/i915: Use CPU mapping for userspace dma-buf mmap()
Userspace is in charge of flushing the CPU caches by wrapping its accesses to
the mmap'ed buffer with begin{,end}_cpu_access.

v2: Remove LLC check because we have dma-buf sync providers now. Also, fix
return before transferring ownership when mmap fails.
v3: Fix return values.
v4: !obj->base.filp is user triggerable, so removed the WARN_ON.

Reviewed-by: Chris Wilson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 9dba876..b7e7a90 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_n
 
 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
 {
-	return -EINVAL;
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	int ret;
+
+	if (obj->base.size < vma->vm_end - vma->vm_start)
+		return -EINVAL;
+
+	if (!obj->base.filp)
+		return -ENODEV;
+
+	ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+	if (ret)
+		return ret;
+
+	fput(vma->vm_file);
+	vma->vm_file = get_file(obj->base.filp);
+
+	return 0;
 }
 
 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
-- 
2.1.4
[PATCH v6 4/5] drm/i915: Implement end_cpu_access
This function is meant to be used with dma-buf mmap, when finishing the CPU
access of the mapped pointer.

The error case should be rare, though: it requires the buffer to become active
during the sync period and end_cpu_access to be interrupted. So we use an
uninterruptible mutex_lock and spit out an error if it ever happens.

v2: disable interruption to make sure errors are reported.
v3: update to the new end_cpu_access API.

Reviewed-by: Chris Wilson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 65ab2bd..9dba876 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -212,6 +212,27 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire
 	return ret;
 }
 
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
+{
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
+	int ret;
+
+	mutex_lock(&dev->struct_mutex);
+	was_interruptible = dev_priv->mm.interruptible;
+	dev_priv->mm.interruptible = false;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, write);
+
+	dev_priv->mm.interruptible = was_interruptible;
+	mutex_unlock(&dev->struct_mutex);
+
+	if (unlikely(ret))
+		DRM_ERROR("unable to flush buffer following CPU access; rendering may be corrupt\n");
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops = {
 	.map_dma_buf = i915_gem_map_dma_buf,
 	.unmap_dma_buf = i915_gem_unmap_dma_buf,
@@ -224,6 +245,7 @@ static const struct dma_buf_ops i915_dmabuf_ops = {
 	.vmap = i915_gem_dmabuf_vmap,
 	.vunmap = i915_gem_dmabuf_vunmap,
 	.begin_cpu_access = i915_gem_begin_cpu_access,
+	.end_cpu_access = i915_gem_end_cpu_access,
 };
 
 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
-- 
2.1.4
[PATCH v6 3/5] dma-buf: Add ioctls to allow userspace to flush
From: Daniel Vetter

Userspace might need some sort of cache coherency management, e.g. when CPU
and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers that forward
directly to the existing dma-buf device drivers' vfunc hooks. Userspace can
make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
would be used as follows:

  - mmap dma-buf fd
  - for each drawing/upload cycle in CPU:
      1. SYNC_START ioctl,
      2. read/write to mmap area,
      3. SYNC_END ioctl.
    This can be repeated as often as you want (with the new data being
    consumed by the GPU or, say, a scanout device)
  - munmap once you don't need the buffer any more

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.
v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
remove range information from struct dma_buf_sync.
v6 (Tiago): use __u64 structured padded flags instead of enum. Adjust
documentation about the recommendation on using sync ioctls.

Cc: Sumit Semwal
Signed-off-by: Daniel Vetter
Signed-off-by: Tiago Vignatti
---
 Documentation/dma-buf-sharing.txt | 22 +++-
 drivers/dma-buf/dma-buf.c | 43 +++
 include/uapi/linux/dma-buf.h | 38 ++
 3 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 4f4a84b..2ddd4b2 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
    handles, too). So it's beneficial to support this in a similar fashion on
    dma-buf to have a good transition path for existing Android userspace.
 
-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
+   important to note though is that, even if it is not mandatory, the userspace
+   is strongly recommended to always use the cache synchronization ioctl
+   (DMA_BUF_IOCTL_SYNC) discussed next.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+     - mmap dma-buf fd
+     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+       want (with the new data being consumed by the GPU or say scanout device)
+     - munmap once you don't need the buffer any more
+
+In principle systems with the memory cache shared by the GPU and CPU may
+not need SYNC_START and SYNC_END but still, userspace is always encouraged
+to use these ioctls before and after, respectively, when accessing the
+mapped address.
 
 2. Supporting existing mmap interfaces in importers
 
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b2ac13b..9a298bd 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include
 #include
 
+#include
+
 static inline int is_dma_buf_file(struct file *);
 
 struct dma_buf_list {
@@ -251,11 +253,52 @@ out:
 	return events;
 }
 
+static long dma_buf_ioctl(struct file *file,
+			  unsigned int cmd, unsigned long arg)
+{
+	struct dma_buf *dmabuf;
+	struct dma_buf_sync sync;
+	enum dma_data_direction direction;
+
+	dmabuf = file->private_data;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	switch (cmd) {
+	case DMA_BUF_IOCTL_SYNC:
+		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+			return -EFAULT;
+
+		if (sync.flags & DMA_BUF_SYNC_RW)
+			direction = DMA_BIDIRECTIONAL;
+		else if (sync.flags & DMA_BUF_SYNC_READ)
+			direction = DMA_FROM_DEVICE;
+		else if (sync.flags & DMA_BUF_SYNC_WRITE)
+			direction = DMA_TO_DEVICE;
+		else
+			return -EINVAL;
+
+		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+			return -EINVAL;
+
+		if (sync.flags & DMA_BUF_SYNC_END)
+			dma_buf_end_cpu_ac
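One detail in the flag decoding of dma_buf_ioctl() may deserve a second look. If DMA_BUF_SYNC_RW is the OR of the READ and WRITE bits (the igt helpers in this series define it as 3 << 0), then the test "sync.flags & DMA_BUF_SYNC_RW" is also non-zero for read-only and write-only requests, so the first branch always wins. A minimal sketch of a decoder that masks first and compares afterwards is below. The READ and WRITE bit values are an assumption, since the uapi header is not visible in this mail, and the sketch_* names and the enum are hypothetical mirrors of the kernel definitions, not the kernel code itself.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed bit layout: READ is bit 0, WRITE is bit 1, END is bit 2. */
#define SKETCH_SYNC_READ  (1 << 0)
#define SKETCH_SYNC_WRITE (2 << 0)
#define SKETCH_SYNC_RW    (SKETCH_SYNC_READ | SKETCH_SYNC_WRITE)
#define SKETCH_SYNC_START (0 << 2)
#define SKETCH_SYNC_END   (1 << 2)
#define SKETCH_SYNC_VALID_FLAGS_MASK (SKETCH_SYNC_RW | SKETCH_SYNC_END)

/* Hypothetical stand-in for enum dma_data_direction. */
enum sketch_direction {
	SKETCH_BIDIRECTIONAL = 0,
	SKETCH_TO_DEVICE = 1,
	SKETCH_FROM_DEVICE = 2,
};

/* Returns a sketch_direction value, or -EINVAL for unknown or empty flags. */
int sketch_sync_direction(uint64_t flags)
{
	if (flags & ~(uint64_t)SKETCH_SYNC_VALID_FLAGS_MASK)
		return -EINVAL;

	/* Mask the access bits first, then compare against exact values. */
	switch (flags & SKETCH_SYNC_RW) {
	case SKETCH_SYNC_READ:
		return SKETCH_FROM_DEVICE;
	case SKETCH_SYNC_WRITE:
		return SKETCH_TO_DEVICE;
	case SKETCH_SYNC_RW:
		return SKETCH_BIDIRECTIONAL;
	default:
		return -EINVAL;
	}
}
```

Checking the valid-flags mask before decoding also means an unknown bit is rejected regardless of which access bits accompany it.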
[PATCH v6 2/5] dma-buf: Remove range-based flush
This patch removes range-based information used for optimizations in begin_cpu_access and end_cpu_access. We don't have any user nor implementation using range-based flush. It seems a consensus that if we ever want something like that again (or even more robust using 2D, 3D sub-range regions) we can use the upcoming dma-buf sync ioctl for such. Cc: Sumit Semwal Cc: Daniel Vetter Signed-off-by: Tiago Vignatti --- Documentation/dma-buf-sharing.txt | 19 --- drivers/dma-buf/dma-buf.c | 13 - drivers/gpu/drm/i915/i915_gem_dmabuf.c| 2 +- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 4 ++-- drivers/gpu/drm/udl/udl_fb.c | 2 -- drivers/staging/android/ion/ion.c | 6 ++ drivers/staging/android/ion/ion_test.c| 4 ++-- include/linux/dma-buf.h | 12 +--- 8 files changed, 24 insertions(+), 38 deletions(-) diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 480c8de..4f4a84b 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -257,17 +257,15 @@ Access to a dma_buf from the kernel context involves three steps: Interface: int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, - size_t start, size_t len, enum dma_data_direction direction) This allows the exporter to ensure that the memory is actually available for cpu access - the exporter might need to allocate or swap-in and pin the backing storage. The exporter also needs to ensure that cpu access is - coherent for the given range and access direction. The range and access - direction can be used by the exporter to optimize the cache flushing, i.e. - access outside of the range or with a different direction (read instead of - write) might return stale or even bogus data (e.g. when the exporter needs to - copy the data to temporary storage). + coherent for the access direction. The direction can be used by the exporter + to optimize the cache flushing, i.e. access with a different direction (read + instead of write) might return stale or even bogus data (e.g. 
when the + exporter needs to copy the data to temporary storage). This step might fail, e.g. in oom conditions. @@ -322,14 +320,13 @@ Access to a dma_buf from the kernel context involves three steps: 3. Finish access - When the importer is done accessing the range specified in begin_cpu_access, - it needs to announce this to the exporter (to facilitate cache flushing and - unpinning of any pinned resources). The result of any dma_buf kmap calls - after end_cpu_access is undefined. + When the importer is done accessing the CPU, it needs to announce this to + the exporter (to facilitate cache flushing and unpinning of any pinned + resources). The result of any dma_buf kmap calls after end_cpu_access is + undefined. Interface: void dma_buf_end_cpu_access(struct dma_buf *dma_buf, - size_t start, size_t len, enum dma_data_direction dir); diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 155c146..b2ac13b 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -539,13 +539,11 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment); * preparations. Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf:[in]buffer to prepare cpu access for. - * @start: [in]start of range for cpu access. - * @len: [in]length of range for cpu access. * @direction: [in]length of range for cpu access. * * Can return negative error values, returns 0 on success. */ -int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, +int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { int ret = 0; @@ -554,8 +552,7 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, return -EINVAL; if (dmabuf->ops->begin_cpu_access) - ret = dmabuf->ops->begin_cpu_access(dmabuf, start, - len, direction); + ret = dmabuf->ops->begin_cpu_access(dmabuf, direction); return ret; } @@ -567,19 +564,17 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access); * actions. 
Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf:[in]buffer to complete cpu access for. - * @start: [in]start of range for cpu access. - * @len: [in]length of range for cpu access. * @direction: [in]length of range for cpu access. * * This call must always succeed. */ -void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t s
[PATCH v6 1/5] drm: prime: Honour O_RDWR during prime-handle-to-fd
From: Daniel Thompson

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Reviewed-by: Chris Wilson
Signed-off-by: Daniel Thompson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h | 1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = {
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 				 struct drm_file *file_priv)
 {
 	struct drm_prime_handle *args = data;
-	uint32_t flags;
 
 	if (!drm_core_check_feature(dev, DRIVER_PRIME))
 		return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 		return -ENOSYS;
 
 	/* check flags are valid */
-	if (args->flags & ~DRM_CLOEXEC)
+	if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
 		return -EINVAL;
 
-	/* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-	flags = args->flags & DRM_CLOEXEC;
-
 	return dev->driver->prime_handle_to_fd(dev, file_priv,
-			args->handle, flags, &args->fd);
+			args->handle, args->flags, &args->fd);
 }
 
 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index b4e92eb..a0ebfe7 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -669,6 +669,7 @@ struct drm_set_client_cap {
 	__u64 value;
 };
 
+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
 	__u32 handle;
-- 
2.1.4
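To make the relaxed check concrete, the snippet below reproduces the new flag test as a standalone function that can be compiled in userspace. It is a sketch only: the SKETCH_* macros simply mirror the two defines added to drm.h by this patch, and sketch_check_prime_flags() is a hypothetical name, not a kernel or libdrm function.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/* Mirrors of the uapi defines touched by the patch. */
#define SKETCH_DRM_RDWR    O_RDWR
#define SKETCH_DRM_CLOEXEC O_CLOEXEC

/* Returns 0 if only CLOEXEC and/or RDWR are set, -EINVAL otherwise. */
int sketch_check_prime_flags(uint32_t flags)
{
	if (flags & ~(uint32_t)(SKETCH_DRM_CLOEXEC | SKETCH_DRM_RDWR))
		return -EINVAL;

	return 0;
}
```

On a kernel without this patch the same request with DRM_RDWR set fails with EINVAL, which is what the igt helper prime_handle_to_fd_for_mmap() relies on to skip tests.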
Direct userspace dma-buf mmap (v6)
Hi all,

The last version of this work was sent a while ago here:

http://lists.freedesktop.org/archives/dri-devel/2015-August/089263.html

So let's recap this series:

  1. it adds a vendor-independent client interface for mapping gem objects
     through prime, IOW it implements userspace mmap() on a dma-buf fd. This
     could be used for texturing from a CPU-rendered buffer, or for passing
     buffers among processes without performing copies in userspace.
  2. the series lets the client write to the mmap'ed memory, and
  3. it deals with GPU and CPU cache synchronization.

Based on previous discussions it seems that people are fine with 1. and 2. but
not really with 3., given that cache coherency is a bit more tedious to deal
with.

It's easier to use this new infra on "coherent hardware" (systems where the
memory cache is shared by the GPU and CPU) because they rarely need that kind
of synchronization. But it would be much more convenient to have the very same
interface exposed to clients no matter whether the underlying hardware is
cache coherent or not.

One idea that came up was to force clients to call the sync ioctls after the
dma-buf was mmap'ed. But apparently there's no easy, and performant, way to do
so, because it seems too costly to go over the page table entries and check the
dirty bits. Also, depending on the order of the instructions sent to the
devices, a sync call might be needed after the mapped region gets accessed as
well, to flush all cachelines and make sure, for example, that the GPU domain
won't read stale data. So that would make things even more complicated, if we
ever decide to go in this direction of forcing sync ioctls.

The alternative therefore is to simply document it very well, with strong
wording telling clients to use the sync ioctls regardless, otherwise they will
misbehave. Do we have objections, or maybe other, wiser ways to circumvent
this? I made similar comments in August and no one has come up with better
ideas.
Lastly, what changed in the v6 series is that I've addressed the concerns raised about the igt tests, organized those changes a bit better (in smaller patches), documented the usage of the sync ioctls, and tested this extensively on different types of hardware.

https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf-mmap-v6
https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v6

Tiago

Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (3):
  dma-buf: Remove range-based flush
  drm/i915: Implement end_cpu_access
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 Documentation/dma-buf-sharing.txt         | 41 +++---
 drivers/dma-buf/dma-buf.c                 | 56 ++-
 drivers/gpu/drm/drm_prime.c               | 10 ++
 drivers/gpu/drm/i915/i915_gem_dmabuf.c    | 42 +--
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 +--
 drivers/gpu/drm/udl/udl_fb.c              |  2 --
 drivers/staging/android/ion/ion.c         |  6 ++--
 drivers/staging/android/ion/ion_test.c    |  4 +--
 include/linux/dma-buf.h                   | 12 +++
 include/uapi/drm/drm.h                    |  1 +
 include/uapi/linux/dma-buf.h              | 38 +
 11 files changed, 169 insertions(+), 47 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h

And the igt changes:

Rob Bradford (1):
  prime_mmap: Add new test for calling mmap() on dma-buf fds

Tiago Vignatti (5):
  lib: Add gem_userptr and __gem_userptr helpers
  prime_mmap: Add basic tests to write in a bo using CPU
  lib: Add prime_sync_start and prime_sync_end helpers
  tests: Add kms_mmap_write_crc for cache coherency tests
  tests: Add prime_mmap_coherency for cache coherency tests

 benchmarks/gem_userptr_benchmark.c |  55 +
 lib/ioctl_wrappers.c               |  92 +++
 lib/ioctl_wrappers.h               |  32 +++
 tests/Makefile.sources             |   3 +
 tests/gem_userptr_blits.c          | 104 ++--
 tests/kms_mmap_write_crc.c         | 281 +
 tests/prime_mmap.c                 | 494 +
 tests/prime_mmap_coherency.c       | 246 ++
 8 files changed, 1180 insertions(+), 127 deletions(-)
 create mode 100644 tests/kms_mmap_write_crc.c
 create mode 100644 tests/prime_mmap.c
 create mode 100644 tests/prime_mmap_coherency.c

--
2.1.4
[PATCH 3/6] dma-buf: Add ioctls to allow userspace to flush
On 08/27/2015 05:03 AM, Chris Wilson wrote:
> On Wed, Aug 26, 2015 at 08:29:15PM -0300, Tiago Vignatti wrote:
>> +#ifndef _DMA_BUF_UAPI_H_
>> +#define _DMA_BUF_UAPI_H_
>> +
>> +enum dma_buf_sync_flags {
>> +	DMA_BUF_SYNC_READ = (1 << 0),
>> +	DMA_BUF_SYNC_WRITE = (2 << 0),
>> +	DMA_BUF_SYNC_RW = (3 << 0),
>> +	DMA_BUF_SYNC_START = (0 << 2),
>> +	DMA_BUF_SYNC_END = (1 << 2),
>> +
>> +	DMA_BUF_SYNC_VALID_FLAGS_MASK = DMA_BUF_SYNC_RW |
>> +		DMA_BUF_SYNC_END
>> +};
>> +
>> +/* begin/end dma-buf functions used for userspace mmap. */
>> +struct dma_buf_sync {
>> +	enum dma_buf_sync_flags flags;
>
> It is better to use explicitly sized types. And since this is not 64b
> padded, probably best to add that padding now.

ahh, I had changed it to use an enum instead. If I roll back to using defines, do you think that works better? Like this:

struct dma_buf_sync {
	__u64 flags;
};

#define DMA_BUF_SYNC_READ	(1 << 0)
#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_SYNC_RW		(3 << 0)
#define DMA_BUF_SYNC_START	(0 << 2)
#define DMA_BUF_SYNC_END	(1 << 2)
#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
	(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)

#define DMA_BUF_BASE		'b'
#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
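
With the flags carried in a plain 64-bit field, validating them and picking an access direction comes down to mask operations. The following is a minimal userspace sketch, not code from the series: `sketch_direction` and `sketch_decode_sync_flags` are hypothetical names, and `uint64_t` stands in for `__u64`. It mirrors the READ/WRITE to direction mapping used in the ioctl handler of this series, but compares the masked value, since a bare `flags & DMA_BUF_SYNC_RW` is already non-zero for READ alone.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as proposed in the message above. */
#define DMA_BUF_SYNC_READ	(1 << 0)
#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_SYNC_RW		(3 << 0)
#define DMA_BUF_SYNC_START	(0 << 2)
#define DMA_BUF_SYNC_END	(1 << 2)
#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
	(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)

/* Hypothetical stand-in for the kernel's enum dma_data_direction. */
enum sketch_direction {
	SKETCH_BIDIRECTIONAL,
	SKETCH_TO_DEVICE,
	SKETCH_FROM_DEVICE,
};

/*
 * Hypothetical helper: returns 0 and stores the direction on success,
 * -1 if unknown bits are set or neither READ nor WRITE was requested.
 */
static int sketch_decode_sync_flags(uint64_t flags, enum sketch_direction *dir)
{
	if (flags & ~(uint64_t)DMA_BUF_SYNC_VALID_FLAGS_MASK)
		return -1;

	/* Switch on the masked value so READ, WRITE and RW stay distinct. */
	switch (flags & DMA_BUF_SYNC_RW) {
	case DMA_BUF_SYNC_READ:
		*dir = SKETCH_FROM_DEVICE;
		return 0;
	case DMA_BUF_SYNC_WRITE:
		*dir = SKETCH_TO_DEVICE;
		return 0;
	case DMA_BUF_SYNC_RW:
		*dir = SKETCH_BIDIRECTIONAL;
		return 0;
	default:
		return -1;
	}
}

/* START is encoded as the absence of the END bit. */
static int sketch_is_end(uint64_t flags)
{
	return (flags & DMA_BUF_SYNC_END) != 0;
}
```

Because `DMA_BUF_SYNC_START` is `0`, a caller cannot test for it with `&`; checking that the END bit is clear is the only way to recognise a start marker in this layout.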
[PATCH 3/6] dma-buf: Add ioctls to allow userspace to flush
On 08/27/2015 09:06 AM, Hwang, Dongseong wrote:
> On Thu, Aug 27, 2015 at 2:29 AM, Tiago Vignatti
>> +#define DMA_BUF_BASE 'b'
>
> 'b' is occupied by vme and qat driver;
> https://github.com/torvalds/linux/blob/master/Documentation/ioctl/ioctl-number.txt#L201

I believe this is alright, as noted in that txt file: "Because of the large number of drivers, many drivers share a partial letter with other drivers".

> is it bad idea for drm.h to include these definition?
> http://lxr.free-electrons.com/source/include/uapi/drm/drm.h#L684

this is not drm code, and other types of device drivers might want to use it as well.

Tiago
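
Sharing the letter 'b' is workable because the letter is only one field of the encoded command. A rough sketch of how `_IOW` packs direction, type, number and argument size follows; `sketch_ioctl_fields`, `sketch_decode_ioctl` and `SKETCH_OTHER_IOCTL` are hypothetical names made up for illustration, and `uint64_t` stands in for `__u64`.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Argument struct and command as proposed in this thread. */
struct dma_buf_sync {
	uint64_t flags;
};

#define DMA_BUF_BASE		'b'
#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)

/* Hypothetical command of some other driver that also uses 'b'. */
#define SKETCH_OTHER_IOCTL	_IOW('b', 1, uint32_t)

struct sketch_ioctl_fields {
	unsigned int dir;	/* _IOC_READ / _IOC_WRITE bits */
	unsigned int type;	/* the "magic" letter */
	unsigned int nr;	/* command number within that letter */
	unsigned int size;	/* sizeof the argument type */
};

/* Split an encoded ioctl command back into its four fields. */
static struct sketch_ioctl_fields sketch_decode_ioctl(unsigned long cmd)
{
	struct sketch_ioctl_fields f = {
		.dir = _IOC_DIR(cmd),
		.type = _IOC_TYPE(cmd),
		.nr = _IOC_NR(cmd),
		.size = _IOC_SIZE(cmd),
	};
	return f;
}
```

Two drivers that share a letter still produce different command values as long as the number or the argument size differs, and each command only reaches the driver behind the file descriptor it is issued on.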
[PATCH 1/6] drm: prime: Honour O_RDWR during prime-handle-to-fd
On 08/27/2015 01:36 PM, Emil Velikov wrote:
> Hi all,
>
> On 27 August 2015 at 00:29, Tiago Vignatti wrote:
>> From: Daniel Thompson
>>
>> Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
>> (DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
>> to mmap() the resulting dma-buf even when this is supported by the
>> DRM driver.
>>
>> It is trivial to relax the restriction and permit read/write access.
>> This is safe because the flags are seldom touched by drm; mostly they
>> are passed verbatim to dma_buf calls.
>>
> Strictly speaking shouldn't this patch be the last one in the series ?
> I.e. we should lift this restriction, after the sync method
> (ioctl/syscall/etc.) is in place. Or perhaps I missed something ?

I think you're right about it, Emil.

Thank you,

Tiago
[PATCH 6/6] drm/i915: Use CPU mapping for userspace dma-buf mmap()
Userspace is the one in charge of flushing the CPU caches, by wrapping its
accesses to the mmap with begin{,end}_cpu_access.

v2: Remove LLC check because we have dma-buf sync providers now. Also, fix
return before transferring ownership when mmap fails.
v3: Fix return values.
v4: !obj->base.filp is user triggerable, so removed the WARN_ON.

Reviewed-by: Chris Wilson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 9dba876..b7e7a90 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_n

 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
 {
-	return -EINVAL;
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	int ret;
+
+	if (obj->base.size < vma->vm_end - vma->vm_start)
+		return -EINVAL;
+
+	if (!obj->base.filp)
+		return -ENODEV;
+
+	ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+	if (ret)
+		return ret;
+
+	fput(vma->vm_file);
+	vma->vm_file = get_file(obj->base.filp);
+
+	return 0;
 }

 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
--
2.1.0
[PATCH 5/6] drm/i915: Implement end_cpu_access
This function is meant to be used with dma-buf mmap, when finishing the CPU
access of the mapped pointer.

The error case should be rare though: it requires the buffer to become active
during the sync period and the end_cpu_access to be interrupted. So we use an
uninterruptible mutex_lock and report the error if it ever happens.

v2: disable interruption to make sure errors are reported.
v3: update to the new end_cpu_access API.

Reviewed-by: Chris Wilson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 65ab2bd..9dba876 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -212,6 +212,27 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire

 	return ret;
 }
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
+{
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
+	int ret;
+
+	mutex_lock(&dev->struct_mutex);
+	was_interruptible = dev_priv->mm.interruptible;
+	dev_priv->mm.interruptible = false;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, write);
+
+	dev_priv->mm.interruptible = was_interruptible;
+	mutex_unlock(&dev->struct_mutex);
+
+	if (unlikely(ret))
+		DRM_ERROR("unable to flush buffer following CPU access; rendering may be corrupt\n");
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops = {
 	.map_dma_buf = i915_gem_map_dma_buf,
 	.unmap_dma_buf = i915_gem_unmap_dma_buf,
@@ -224,6 +245,7 @@ static const struct dma_buf_ops i915_dmabuf_ops = {
 	.vmap = i915_gem_dmabuf_vmap,
 	.vunmap = i915_gem_dmabuf_vunmap,
 	.begin_cpu_access = i915_gem_begin_cpu_access,
+	.end_cpu_access = i915_gem_end_cpu_access,
 };

 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
--
2.1.0
[PATCH 4/6] dma-buf: DRAFT: Make SYNC mandatory when userspace mmap
This is my (failed) attempt to make the SYNC_* mandatory. I've tried to revoke write access to the mapped region until begin_cpu_access is called. The tasklet schedule order seems alright but the whole logic is not working and I guess it's something related to the fs trick I'm trying to do with the put{,get}_write_access pair... Not sure if I should follow this direction though. I've spent much time already on it!. What do you think? Cc: Thomas Hellstrom Cc: Jérôme Glisse --- drivers/dma-buf/dma-buf.c | 31 ++- include/linux/dma-buf.h | 3 +++ 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 9a298bd..06cb37b 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -75,14 +75,34 @@ static int dma_buf_release(struct inode *inode, struct file *file) if (dmabuf->resv == (struct reservation_object *)&dmabuf[1]) reservation_object_fini(dmabuf->resv); + tasklet_kill(&dmabuf->tasklet); + module_put(dmabuf->owner); kfree(dmabuf); return 0; } +static void dmabuf_mmap_tasklet(unsigned long data) +{ + struct dma_buf *dmabuf = (struct dma_buf *) data; + struct inode *inode = file_inode(dmabuf->file); + + if (!inode) + return; + + /* the CPU accessing another device may put the cache in an incoherent state. +* Therefore if the mmap succeeds, we forbid any further write access to the +* dma-buf until SYNC_START ioctl takes place, which gets back the write +* access. 
*/ + put_write_access(inode); + + inode_dio_wait(inode); +} + static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) { struct dma_buf *dmabuf; + int ret; if (!is_dma_buf_file(file)) return -EINVAL; @@ -94,7 +114,11 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) dmabuf->size >> PAGE_SHIFT) return -EINVAL; - return dmabuf->ops->mmap(dmabuf, vma); + ret = dmabuf->ops->mmap(dmabuf, vma); + if (!ret) + tasklet_schedule(&dmabuf->tasklet); + + return ret; } static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence) @@ -389,6 +413,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) list_add(&dmabuf->list_node, &db_list.head); mutex_unlock(&db_list.lock); + tasklet_init(&dmabuf->tasklet, dmabuf_mmap_tasklet, (unsigned long) dmabuf); + return dmabuf; } EXPORT_SYMBOL_GPL(dma_buf_export); @@ -589,6 +615,7 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment); int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { + struct inode *inode = file_inode(dmabuf->file); int ret = 0; if (WARN_ON(!dmabuf)) @@ -597,6 +624,8 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, if (dmabuf->ops->begin_cpu_access) ret = dmabuf->ops->begin_cpu_access(dmabuf, direction); + get_write_access(inode); + return ret; } EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access); diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 532108e..0359792 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -24,6 +24,7 @@ #ifndef __DMA_BUF_H__ #define __DMA_BUF_H__ +#include #include #include #include @@ -118,6 +119,7 @@ struct dma_buf_ops { * @list_node: node for dma_buf accounting and debugging. * @priv: exporter specific private data for this buffer object. * @resv: reservation object linked to this dma-buf + * @tasklet: tasklet for deferred mmap tasks. 
*/ struct dma_buf { size_t size; @@ -133,6 +135,7 @@ struct dma_buf { struct list_head list_node; void *priv; struct reservation_object *resv; + struct tasklet_struct tasklet; /* poll support */ wait_queue_head_t poll; -- 2.1.0
[PATCH 3/6] dma-buf: Add ioctls to allow userspace to flush
From: Daniel Vetter The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munamp once you don't need the buffer any more v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. Cc: Sumit Semwal Signed-off-by: Daniel Vetter Signed-off-by: Tiago Vignatti --- Documentation/dma-buf-sharing.txt | 12 +++ drivers/dma-buf/dma-buf.c | 43 +++ include/uapi/linux/dma-buf.h | 41 + 3 files changed, 96 insertions(+) create mode 100644 include/uapi/linux/dma-buf.h diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 4f4a84b..ec0ab1d 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -352,6 +352,18 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: No special interfaces, userspace simply calls mmap on the dma-buf fd. + Also, the userspace might need some sort of cache coherency management e.g. + when CPU and GPU domains are being accessed through dma-buf at the same + time. To circumvent this problem there are begin/end coherency markers, that + forward directly to existing dma-buf device drivers vfunc hooks. 
Userspace + can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The + sequence would be used like following: + - mmap dma-buf fd + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write + to mmap area 3. SYNC_END ioctl. This can be repeated as often as you + want (with the new data being consumed by the GPU or say scanout device) + - munamp once you don't need the buffer any more + 2. Supporting existing mmap interfaces in importers Similar to the motivation for kernel cpu access it is again important that diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index b2ac13b..9a298bd 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -34,6 +34,8 @@ #include #include +#include + static inline int is_dma_buf_file(struct file *); struct dma_buf_list { @@ -251,11 +253,52 @@ out: return events; } +static long dma_buf_ioctl(struct file *file, + unsigned int cmd, unsigned long arg) +{ + struct dma_buf *dmabuf; + struct dma_buf_sync sync; + enum dma_data_direction direction; + + dmabuf = file->private_data; + + if (!is_dma_buf_file(file)) + return -EINVAL; + + switch (cmd) { + case DMA_BUF_IOCTL_SYNC: + if (copy_from_user(&sync, (void __user *) arg, sizeof(sync))) + return -EFAULT; + + if (sync.flags & DMA_BUF_SYNC_RW) + direction = DMA_BIDIRECTIONAL; + else if (sync.flags & DMA_BUF_SYNC_READ) + direction = DMA_FROM_DEVICE; + else if (sync.flags & DMA_BUF_SYNC_WRITE) + direction = DMA_TO_DEVICE; + else + return -EINVAL; + + if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK) + return -EINVAL; + + if (sync.flags & DMA_BUF_SYNC_END) + dma_buf_end_cpu_access(dmabuf, direction); + else + dma_buf_begin_cpu_access(dmabuf, direction); + + return 0; + default: + return -ENOTTY; + } +} + static const struct file_operations dma_buf_fops = { .release= dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll, + .unlocked_ioctl = dma_buf_ioctl, }; /* diff --git 
a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h new file mode 100644 index 000..262f7a7 --- /dev/null +++ b/include/uapi/linux/dma-buf.h @@ -0,0 +1,41 @@ +/* + * Framework for buffer objects that can be shared across devices/subsystems. + * + * Copyright(C) 2015 Intel Ltd + * +
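
The sequence described in the commit message (mmap the dma-buf fd, then SYNC_START, CPU access, SYNC_END for each cycle, then munmap) could look roughly like this from userspace. This is a minimal sketch under stated assumptions, not code from the series: `sketch_sync` and `sketch_cpu_upload` are hypothetical helpers, the struct and flag values are copied from this thread with `uint64_t` standing in for `__u64`, and a kernel that ships `<linux/dma-buf.h>` would provide the definitions itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/ioctl.h>

/* Definitions as proposed in this thread. */
struct dma_buf_sync {
	uint64_t flags;
};

#define DMA_BUF_SYNC_READ	(1 << 0)
#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_SYNC_RW		(3 << 0)
#define DMA_BUF_SYNC_START	(0 << 2)
#define DMA_BUF_SYNC_END	(1 << 2)
#define DMA_BUF_BASE		'b'
#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)

/* Hypothetical helper: issue one sync marker, restarting if interrupted. */
static int sketch_sync(int dmabuf_fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && errno == EINTR);

	return ret;
}

/*
 * Hypothetical helper: one upload cycle. Maps the buffer, brackets the
 * CPU write with SYNC_START/SYNC_END, then unmaps. Returns 0 on success
 * and -1 on failure.
 */
static int sketch_cpu_upload(int dmabuf_fd, size_t size,
			     const void *src, size_t len)
{
	void *map;
	int ret = -1;

	if (len > size) {
		errno = EINVAL;
		return -1;
	}

	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   dmabuf_fd, 0);
	if (map == MAP_FAILED)
		return -1;

	if (sketch_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE) == 0) {
		memcpy(map, src, len);	/* the actual CPU access */
		ret = sketch_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
	}

	munmap(map, size);
	return ret;
}
```

A real client would normally keep the mapping alive across many cycles and only repeat the START/access/END part, as the commit message suggests; the map and unmap are folded into one function here only to keep the sketch short.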
[PATCH 2/6] dma-buf: Remove range-based flush
This patch removes range-based information used for optimizations in begin_cpu_access and end_cpu_access. We don't have any user nor implementation using range-based flush. It seems a consensus that if we ever want something like that again (or even more robust using 2D, 3D sub-range regions) we can use the upcoming dma-buf sync ioctl for such. Cc: Sumit Semwal Cc: Daniel Vetter Signed-off-by: Tiago Vignatti --- Documentation/dma-buf-sharing.txt | 19 --- drivers/dma-buf/dma-buf.c | 13 - drivers/gpu/drm/i915/i915_gem_dmabuf.c| 2 +- drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 4 ++-- drivers/gpu/drm/udl/udl_fb.c | 2 -- drivers/staging/android/ion/ion.c | 6 ++ drivers/staging/android/ion/ion_test.c| 4 ++-- include/linux/dma-buf.h | 12 +--- 8 files changed, 24 insertions(+), 38 deletions(-) diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 480c8de..4f4a84b 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -257,17 +257,15 @@ Access to a dma_buf from the kernel context involves three steps: Interface: int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, - size_t start, size_t len, enum dma_data_direction direction) This allows the exporter to ensure that the memory is actually available for cpu access - the exporter might need to allocate or swap-in and pin the backing storage. The exporter also needs to ensure that cpu access is - coherent for the given range and access direction. The range and access - direction can be used by the exporter to optimize the cache flushing, i.e. - access outside of the range or with a different direction (read instead of - write) might return stale or even bogus data (e.g. when the exporter needs to - copy the data to temporary storage). + coherent for the access direction. The direction can be used by the exporter + to optimize the cache flushing, i.e. access with a different direction (read + instead of write) might return stale or even bogus data (e.g. 
when the + exporter needs to copy the data to temporary storage). This step might fail, e.g. in oom conditions. @@ -322,14 +320,13 @@ Access to a dma_buf from the kernel context involves three steps: 3. Finish access - When the importer is done accessing the range specified in begin_cpu_access, - it needs to announce this to the exporter (to facilitate cache flushing and - unpinning of any pinned resources). The result of any dma_buf kmap calls - after end_cpu_access is undefined. + When the importer is done accessing the CPU, it needs to announce this to + the exporter (to facilitate cache flushing and unpinning of any pinned + resources). The result of any dma_buf kmap calls after end_cpu_access is + undefined. Interface: void dma_buf_end_cpu_access(struct dma_buf *dma_buf, - size_t start, size_t len, enum dma_data_direction dir); diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 155c146..b2ac13b 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -539,13 +539,11 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment); * preparations. Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf:[in]buffer to prepare cpu access for. - * @start: [in]start of range for cpu access. - * @len: [in]length of range for cpu access. * @direction: [in]length of range for cpu access. * * Can return negative error values, returns 0 on success. */ -int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, +int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { int ret = 0; @@ -554,8 +552,7 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, return -EINVAL; if (dmabuf->ops->begin_cpu_access) - ret = dmabuf->ops->begin_cpu_access(dmabuf, start, - len, direction); + ret = dmabuf->ops->begin_cpu_access(dmabuf, direction); return ret; } @@ -567,19 +564,17 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access); * actions. 
Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf:[in]buffer to complete cpu access for. - * @start: [in]start of range for cpu access. - * @len: [in]length of range for cpu access. * @direction: [in]length of range for cpu access. * * This call must always succeed. */ -void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t s
[PATCH 1/6] drm: prime: Honour O_RDWR during prime-handle-to-fd
From: Daniel Thompson

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Reviewed-by: Chris Wilson
Signed-off-by: Daniel Thompson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h      |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = {
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 				 struct drm_file *file_priv)
 {
 	struct drm_prime_handle *args = data;
-	uint32_t flags;

 	if (!drm_core_check_feature(dev, DRIVER_PRIME))
 		return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 		return -ENOSYS;

 	/* check flags are valid */
-	if (args->flags & ~DRM_CLOEXEC)
+	if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
 		return -EINVAL;

-	/* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-	flags = args->flags & DRM_CLOEXEC;
-
 	return dev->driver->prime_handle_to_fd(dev, file_priv,
-			args->handle, flags, &args->fd);
+			args->handle, args->flags, &args->fd);
 }

 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 3801584..ad8223e 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -668,6 +668,7 @@ struct drm_set_client_cap {
 	__u64 value;
 };

+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
 	__u32 handle;
--
2.1.0
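
The relaxed flag check in this patch can be tried out in isolation. Below is a minimal userspace sketch of the same mask test; `sketch_check_prime_flags` is a hypothetical name and is not part of the patch, and the two defines repeat the ones the patch adds to drm.h.

```c
#define _GNU_SOURCE	/* make sure O_CLOEXEC is visible */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/* As defined by the patch above. */
#define DRM_RDWR	O_RDWR
#define DRM_CLOEXEC	O_CLOEXEC

/*
 * Hypothetical helper mirroring the check in
 * drm_prime_handle_to_fd_ioctl(): any bit outside the two accepted
 * flags is rejected, everything else is passed through unchanged.
 */
static int sketch_check_prime_flags(uint32_t flags)
{
	if (flags & ~(uint32_t)(DRM_CLOEXEC | DRM_RDWR))
		return -EINVAL;

	return 0;
}
```

Before the patch only `DRM_CLOEXEC` passed this test, so the exported fd could not be opened read/write, which is what a writable mmap() of the dma-buf needs.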
[PATCH v5] mmap on the dma-buf directly
Lots of discussion since v4, thanks for your feedback:

http://lists.freedesktop.org/archives/dri-devel/2015-August/088259.html

The two main concerns were about range-based flush (which we decided to postpone) and about making the SYNC ioctl mandatory. I need you guys to guide and educate me on the latter now. PTAL.

Tiago

Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (4):
  dma-buf: Remove range-based flush
  dma-buf: DRAFT: Make SYNC mandatory when userspace mmap
  drm/i915: Implement end_cpu_access
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 Documentation/dma-buf-sharing.txt         | 31 +++
 drivers/dma-buf/dma-buf.c                 | 87 +++
 drivers/gpu/drm/drm_prime.c               | 10 ++--
 drivers/gpu/drm/i915/i915_gem_dmabuf.c    | 42 ++-
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 +-
 drivers/gpu/drm/udl/udl_fb.c              |  2 -
 drivers/staging/android/ion/ion.c         |  6 +--
 drivers/staging/android/ion/ion_test.c    |  4 +-
 include/linux/dma-buf.h                   | 15 +++---
 include/uapi/drm/drm.h                    |  1 +
 include/uapi/linux/dma-buf.h              | 41 +++
 11 files changed, 196 insertions(+), 47 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h

--
2.1.0
[PATCH v4] dma-buf: Add ioctls to allow userspace to flush
On 08/26/2015 11:51 AM, Daniel Vetter wrote:
> On Wed, Aug 26, 2015 at 11:32:30AM -0300, Tiago Vignatti wrote:
>> On 08/26/2015 09:58 AM, Daniel Vetter wrote:
>>> The other is that right now there's no user nor implementation in sight
>>> which actually does range-based flush optimizations, so I'm pretty much
>>> expecting we'll get it wrong. Maybe instead we should go one step further
>>> and remove the range from the internal dma-buf interface and also drop it
>>> from the ioctl? With the flags we can always add something later on once
>>> we have a real user with a clear need for it. But afaik cros only wants to
>>> shuffle around entire tiles and has a buffer-per-tile approach.
>>
>> Thomas, I think Daniel has a point here and also, I wouldn't mind removing
>> all range control from the dma-buf ioctl either.
>
> if we go with nuking it from the ioctl I'd suggest to also nuke it from
> the dma-buf internal interface first too.

yep, I can do it. Thomas, so we leave 2d sync out now?

Tiago
[PATCH v4] dma-buf: Add ioctls to allow userspace to flush
On 08/26/2015 09:58 AM, Daniel Vetter wrote:
> The other is that right now there's no user nor implementation in sight
> which actually does range-based flush optimizations, so I'm pretty much
> expecting we'll get it wrong. Maybe instead we should go one step further
> and remove the range from the internal dma-buf interface and also drop it
> from the ioctl? With the flags we can always add something later on once
> we have a real user with a clear need for it. But afaik cros only wants to
> shuffle around entire tiles and has a buffer-per-tile approach.

Thomas, I think Daniel has a point here and also, I wouldn't mind removing all range control from the dma-buf ioctl either.

In this last patch we can see that it's not complicated to add the 2d-sync if we eventually decide we want it. But right now there's no way we can test it. So in that case I'm all in favour of doing this work gradually, adding the basics first.

Tiago
[PATCH v4] dma-buf: Add ioctls to allow userspace to flush
From: Daniel Vetter The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl 2. read/write to mmap area or a 2d sub-region of it 3. SYNC_END ioctl. - munamp once you don't need the buffer any more v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. Cc: Sumit Semwal Signed-off-by: Daniel Vetter Signed-off-by: Tiago Vignatti --- I'm unable to test the 2d sync properly, because begin/end access in i915 don't track mapped range for nothing. Documentation/dma-buf-sharing.txt | 13 ++ drivers/dma-buf/dma-buf.c | 77 -- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 6 ++- include/linux/dma-buf.h| 20 + include/uapi/linux/dma-buf.h | 57 + 5 files changed, 150 insertions(+), 23 deletions(-) create mode 100644 include/uapi/linux/dma-buf.h diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 480c8de..8061ac0 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -355,6 +355,19 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: No special interfaces, userspace simply calls mmap on the dma-buf fd. + Also, the userspace might need some sort of cache coherency management e.g. + when CPU and GPU domains are being accessed through dma-buf at the same + time. To circumvent this problem there are begin/end coherency markers, that + forward directly to existing dma-buf device drivers vfunc hooks. 
Userspace + can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The + sequence would be used like following: + - mmap dma-buf fd + - for each drawing/upload cycle in CPU + 1. SYNC_START ioctl + 2. read/write to mmap area or a 2d sub-region of it + 3. SYNC_END ioctl. + - munamp once you don't need the buffer any more + 2. Supporting existing mmap interfaces in importers Similar to the motivation for kernel cpu access it is again important that diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 155c146..b6a4a06 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -251,11 +251,55 @@ out: return events; } +static long dma_buf_ioctl(struct file *file, + unsigned int cmd, unsigned long arg) +{ + struct dma_buf *dmabuf; + struct dma_buf_sync sync; + enum dma_data_direction direction; + + dmabuf = file->private_data; + + if (!is_dma_buf_file(file)) + return -EINVAL; + + if (cmd != DMA_BUF_IOCTL_SYNC) + return -ENOTTY; + + if (copy_from_user(&sync, (void __user *) arg, sizeof(sync))) + return -EFAULT; + + if (sync.flags & DMA_BUF_SYNC_RW) + direction = DMA_BIDIRECTIONAL; + else if (sync.flags & DMA_BUF_SYNC_READ) + direction = DMA_FROM_DEVICE; + else if (sync.flags & DMA_BUF_SYNC_WRITE) + direction = DMA_TO_DEVICE; + else + return -EINVAL; + + if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK) + return -EINVAL; + + /* TODO: check for overflowing the buffer's size - how so, checking region by +* region here? Maybe need to check for the other parameters as well. 
*/ + + if (sync.flags & DMA_BUF_SYNC_END) + dma_buf_end_cpu_access(dmabuf, sync.stride_bytes, sync.bytes_per_pixel, + sync.num_regions, sync.regions, direction); + else + dma_buf_begin_cpu_access(dmabuf, sync.stride_bytes, sync.bytes_per_pixel, + sync.num_regions, sync.regions, direction); + + return 0; +} + static const struct file_operations dma_buf_fops = { .release= dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll, + .unlocked_ioctl = dma_buf_ioctl, }; /* @@ -539,14 +583,17 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment); * preparations. Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf:[in]buffer to prepare cpu access for. - * @start: [in]start of range for cpu access.
about mmap dma-buf and sync
On 08/25/2015 06:30 AM, Thomas Hellstrom wrote:
> On 08/25/2015 11:02 AM, Daniel Vetter wrote:
>> I really feel like any kind of multi-range flush interface is feature
>> bloat, and if we do it then we should only do it later on when there's a
>> clear need for it.
>
> IMO all the use-cases so far that wanted to do this have been 2D
> updates. and having only a 1D sync will most probably scare people away
> from this interface.
>
>> Afaiui dma-buf mmap will mostly be used for up/downloads, which means
>> there will be some explicit copy involved somewhere anyway. So similar to
>> userptr usage. And for those most often you just slurp in a linear block
>> and then let the blitter rearrange it, at least on i915.
>>
>> Also thus far the consensus was that dma-buf doesn't care/know about
>> content layout, adding strides/bytes/whatever does bend this quite a bit.
>
> I think with a 2D interface, the stride only applies to the sync
> operation itself and is only a parameter for that sync operation.
> Whether we should have multiple regions or not is not a big deal for me,
> but I think at least a 2D sync is crucial.

Right now only omap, ion and udl-fb make use of the begin{,end}_cpu_access() dma-buf interface, but it's curious that none of them uses the 1-d parameters (start and len). So in that sense it seems that we would tend to feature bloat the API if we do the 2-d additions. OTOH we're talking about a different usage of dma-buf right now, so drivers might in fact start to use that API.

That said, I thought it was somewhat simple to turn the common code into 2-d because, as I pointed out in the other email, we'd be pushing the whole responsibility of dealing with the regions and so on to the driver implementors.

Thomas, any comments on the new dma_buf_begin_cpu_access() API I showed? Maybe I should just clean up the draft here and send it to the ML :)

Tiago
about mmap dma-buf and sync
On 08/24/2015 03:01 PM, Tiago Vignatti wrote:
> yup, I think so. So IIUC the main changes needed for the drivers
> implement 2D sync lies in the dma_buf_sync_2d structure only. I.e.
> there's nothing really to be changed in the common code, right?

Do we have any special requirements for how we pass the sync information to the drivers? I was thinking of pushing the whole responsibility onto them, something like:

+int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, size_t stride_bytes,
+			     size_t bytes_per_pixel, size_t num_regions,
+			     struct dma_buf_sync_region regions[],
+			     enum dma_data_direction dir);

Daniel Vetter mentioned that the dma-buf design should not track metadata, but I haven't read anything about it, so do you think this looks alright?

Tiago
about mmap dma-buf and sync
On 08/24/2015 02:42 PM, Thomas Hellstrom wrote:
> On 08/24/2015 07:12 PM, Daniel Stone wrote:
>> Hi,
>>
>> On 24 August 2015 at 18:10, Thomas Hellstrom wrote:
>>> On 08/24/2015 07:04 PM, Daniel Stone wrote:
>>>> On 24 August 2015 at 17:56, Thomas Hellstrom wrote:
>>>>> On 08/24/2015 05:52 PM, Daniel Stone wrote:
>>>>>> I still don't think this ameliorates the need for batching: consider
>>>>>> the case where you update two disjoint screen regions and want them
>>>>>> both flushed. Either you issue two separate sync calls (which can be
>>>>>> disadvantageous performance-wise on some hardware / setups), or you
>>>>>> accumulate the regions and only flush later. So either two ioctls (one
>>>>>> in the style of dirtyfb and one to perform the sync/flush; you can
>>>>>> shortcut to assume the full buffer was damaged if the former is
>>>>>> missing), or one like this:
>>>>>>
>>>>>> struct dma_buf_sync_2d {
>>>>>>         enum dma_buf_sync_flags flags;
>>>>>>
>>>>>>         __u64 stride_bytes;
>>>>>>         __u32 bytes_per_pixel;
>>>>>>         __u32 num_regions;
>>>>>>
>>>>>>         struct {
>>>>>>                 __u64 x;
>>>>>>                 __u64 y;
>>>>>>                 __u64 width;
>>>>>>                 __u64 height;
>>>>>>         } regions[];
>>>>>> };
>>>>> Fine with me, although perhaps bytes_per_pixel is a bit redundant?
>>>> Redundant how? It's not implicit in stride.
>>> For flushing purposes, isn't it possible to cover all cases by assuming
>>> bytes_per_pixel=1? Not that it matters much.
>> Sure, though in that case best to replace x with line_byte_offset or
>> something, because otherwise I guarantee you everyone will get it
>> wrong, and it'll be a pain to track down. Like how I managed to
>> misread it now. :)
>
> OK, yeah you have a point. IMO let's go for your proposal.
>
> Tiago, is this OK with you?

yup, I think so. So IIUC the main change needed for drivers to implement 2D sync lies in the dma_buf_sync_2d structure only, i.e. there's nothing really to be changed in the common code, right?

Then I'll just need to put somewhere the logic for making sync mandatory, which I couldn't work out from your discussions with Jerome et al. I'll need to investigate more here.

Also, I still want to iterate with the Google policy team about the actual need for a syscall. But I believe that could eventually be a secondary phase of this work (in case we ever agree on having that).

Tiago
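To make the units in the proposed structure concrete, here is a minimal sketch of how one region could be turned into a byte range of the buffer. It is not code from this thread: `struct sync_region` and `region_byte_span()` are hypothetical names, x/width are assumed to be in pixels and y/height in rows, and arithmetic overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one entry of the proposed regions[] array. */
struct sync_region {
	uint64_t x;      /* first pixel in the row (assumed: pixels, not bytes) */
	uint64_t y;      /* first row */
	uint64_t width;  /* pixels per row */
	uint64_t height; /* rows */
};

/*
 * Compute the smallest contiguous byte range [*start, *start + *len) that
 * covers the whole region. Returns 0 on success, -1 for an empty region.
 */
static int region_byte_span(uint64_t stride_bytes, uint32_t bytes_per_pixel,
			    const struct sync_region *r,
			    uint64_t *start, uint64_t *len)
{
	uint64_t last_row_end;

	assert(r != NULL && start != NULL && len != NULL);

	if (r->width == 0 || r->height == 0)
		return -1;

	/* First byte of the first row of the rectangle. */
	*start = r->y * stride_bytes + r->x * bytes_per_pixel;
	/* One past the last byte of the last row of the rectangle. */
	last_row_end = (r->y + r->height - 1) * stride_bytes +
		       (r->x + r->width) * bytes_per_pixel;
	*len = last_row_end - *start;
	return 0;
}
```

With bytes_per_pixel set to 1 the same function treats x and width as byte offsets, which is the simplification Thomas suggests and the reason Daniel asks for a name like line_byte_offset.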
about mmap dma-buf and sync
Hi back!

On 08/20/2015 03:48 AM, Thomas Hellstrom wrote:
> Hi, Tiago!

Something the Chrome OS folks asked me today is whether we could change the sync API to use a syscall instead. The reason is to eventually fit this nicely into their sandbox architecture requirements, so that yet another ioctl wouldn't need to be whitelisted for the unprivileged process. How does that change the game, for example regarding the details we've been discussing here about making the sync mandatory? Can we reuse, say, msync(2) for that?

Best Regards,
Tiago
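For reference, this is roughly what msync(2) looks like today on an ordinary shared file mapping. It is only a sketch of the existing syscall, using a temporary file and a hypothetical helper name `write_and_msync()`; whether a dma-buf mapping could hook into msync() at all is exactly the open question here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write msg into a shared mapping of a temporary file, msync() it, and
 * read it back through the file descriptor. Returns 0 if the data matches.
 */
static int write_and_msync(const char *msg)
{
	char readback[64];
	size_t len;
	FILE *f;
	char *map;
	int fd, ret = -1;

	assert(msg != NULL);
	len = strlen(msg) + 1;
	if (len > sizeof(readback))
		return -1;

	f = tmpfile();
	if (!f)
		return -1;
	fd = fileno(f);
	if (ftruncate(fd, (off_t)len) != 0)
		goto out_close;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	memcpy(map, msg, len);
	/* Flush the dirty range of the mapping back to the file. */
	if (msync(map, len, MS_SYNC) == 0 &&
	    pread(fd, readback, len, 0) == (ssize_t)len &&
	    memcmp(readback, msg, len) == 0)
		ret = 0;

	munmap(map, len);
out_close:
	fclose(f);
	return ret;
}
```

Note that msync() takes a range and the MS_SYNC / MS_ASYNC / MS_INVALIDATE flags, with no start/end bracket and no read/write direction, so it is not an obvious one-to-one match for a begin/end_cpu_access pair.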
[PATCH] staging/android: Update ION TODO per LPC discussion
sgtm. Thanks for keeping me in the loop. Tiago On 08/21/2015 06:02 PM, Daniel Vetter wrote: > We discussed a bit with the folks on the Cc: list below what to do > with ION. Two big take-aways: > > - High-performance drivers (like gpus) always want to play tricks with >coherency and will lie to the dma api (radeon, nouveau, i915 gpu >drivers all do so in upstream). What needs to be done here is fill >gaps in dma-buf so that we can do this without breaking the dma-api >expections of other clients like v4l. The consesus is that hw won't >stop needing these tricks anytime soon. > > - Placement constraints for shared buffers won't be solved any other >way than through something platform-specific like ion with >platform-specific knowledge in userspace in something like gralloc. >For general-purpose devices where this assumption would be painful >for userspace (like servers) the consensus is that such devices will >have proper MMUs where placement constraint handling is fairly >irrelevant. > > Hence it is reasonable to destage ion as-is without changing the > overall design to enable these use-cases and just fixing up a these > few fairly minor things. Since there won't relly be an open-source > userspace for ion (and hence drm maintainers won't take it) the > proposal is to eventually move it to drivers/android/ion.[hc]. Laura > would be ok with being maintainer once this is all done and ion is > destaged. > > Note that Thiago is working on exposing the cpu cache flushing for > cpu access from userspace through mmaps so this is alread in progress. > Also adding him to the Cc: list. > > v2: Add ION_IOC_IMPORT to the list of ioctl that probably should go. 
> > Cc: Laura Abbott > Cc: sumit.semwal at linaro.org > Cc: laurent.pinchart at ideasonboard.com > Cc: ghackmann at google.com > Cc: robdclark at gmail.com > Cc: david.brown at arm.com > Cc: romlem at google.com > Cc: Tiago Vignatti > Signed-off-by: Daniel Vetter > --- > drivers/staging/android/TODO | 20 > 1 file changed, 20 insertions(+) > > diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO > index 06954cdf3dba..bc84a72af32d 100644 > --- a/drivers/staging/android/TODO > +++ b/drivers/staging/android/TODO > @@ -13,5 +13,25 @@ TODO: > - This bug is introduced by Xiong Zhou in the patch bd471258f2e09 > - ("staging: android: logger: use kuid_t instead of uid_t") > > + > +ion/ > + - Remove ION_IOC_SYNC: Flushing for devices should be purely a kernel > internal > + interface on top of dma-buf. flush_for_device needs to be added to dma-buf > + first. > + - Remove ION_IOC_CUSTOM: Atm used for cache flushing for cpu access in some > + vendor trees. Should be replaced with an ioctl on the dma-buf to expose > the > + begin/end_cpu_access hooks to userspace. > + - Clarify the tricks ion plays with explicitly managing coherency behind the > + dma api's back (this is absolutely needed for high-perf gpu drivers): Add > an > + explicit coherency management mode to flush_for_device to be used by > drivers > + which want to manage caches themselves and which indicates whether cpu > caches > + need flushing. > + - With those removed there's probably no use for ION_IOC_IMPORT anymore > either > + since ion would just be the central allocator for shared buffers. > + - Add dt-binding to expose cma regions as ion heaps, with the rule that any > + such cma regions must already be used by some device for dma. I.e. ion > only > + exposes existing cma regions and doesn't reserve unecessarily memory when > + booting a system which doesn't use ion. > + > Please send patches to Greg Kroah-Hartman and Cc: > Brian Swetland >
[PATCH 1/7] prime_mmap: Add new test for calling mmap() on dma-buf fds
Hi Daniel,

On 08/13/2015 04:04 AM, Daniel Vetter wrote:
> On Wed, Aug 12, 2015 at 08:29:14PM -0300, Tiago Vignatti wrote:
>> +	/* Map too big */
>> +	handle = gem_create(fd, BO_SIZE);
>> +	fill_bo(handle, BO_SIZE);
>> +	dma_buf_fd = prime_handle_to_fd(fd, handle);
>> +	igt_assert(errno == 0);
>> +	ptr = mmap(NULL, BO_SIZE * 2, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
>> +	igt_assert(ptr == MAP_FAILED && errno == EINVAL);
>> +	errno = 0;
>> +	close(dma_buf_fd);
>> +	gem_close(fd, handle);
>
> That only checks for one of the conditions, trying to map something
> offset/overlapping the end of the buffer, but small enough needs to be
> tested separately.

Do you mean a test for a smaller length with a non-zero offset? I'm not sure I get what you wanted to say here.

> Also I think a testcase for invalid buffer2fd flags would be good, just
> for completeness - we seem to be missing that one.

Do you mean a test for flags other than the ones supported by DRM_IOCTL_PRIME_HANDLE_TO_FD?

Tiago
[PATCH 3/7] prime_mmap: Add basic tests to write in a bo using CPU
On 08/13/2015 04:01 AM, Daniel Vetter wrote: > On Wed, Aug 12, 2015 at 08:29:16PM -0300, Tiago Vignatti wrote: >> This patch adds test_correct_cpu_write, which maps the texture buffer >> through a >> prime fd and then writes directly to it using the CPU. It stresses the driver >> to guarantee cache synchronization among the different domains. >> >> This test also adds test_forked_cpu_write, which creates the GEM bo in one >> process and pass the prime handle of the it to another process, which in turn >> uses the handle only to map and write. Grossly speaking this test simulates >> Chrome OS architecture, where the Web content ("unpriviledged process") maps >> and CPU-draws a buffer, which was previously allocated in the GPU process >> ("priviledged process"). >> >> This requires kernel modifications (Daniel Thompson's "drm: prime: Honour >> O_RDWR during prime-handle-to-fd"). >> >> Signed-off-by: Tiago Vignatti > > Squash with previous patch? why? if the whole point is to decrease the amount of patches, then I prefer to squash 2/7 with the 1/7 (although they're from different authors and would be nice to keep separately the changes from each). This patch here introduces this writing to mmap'ed dma-buf fd, a concept that is still in debate, requiring a kernel counter-part so that's why I preferred to keep it away. 
>> --- >> lib/ioctl_wrappers.c | 5 +++- >> tests/prime_mmap.c | 65 >> >> 2 files changed, 69 insertions(+), 1 deletion(-) >> >> diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c >> index 53bd635..941fa66 100644 >> --- a/lib/ioctl_wrappers.c >> +++ b/lib/ioctl_wrappers.c >> @@ -1125,6 +1125,9 @@ void gem_require_ring(int fd, int ring_id) >> >> /* prime */ >> >> +#ifndef DRM_RDWR >> +#define DRM_RDWR O_RDWR >> +#endif >> /** >>* prime_handle_to_fd: >>* @fd: open i915 drm file descriptor >> @@ -1142,7 +1145,7 @@ int prime_handle_to_fd(int fd, uint32_t handle) >> >> memset(&args, 0, sizeof(args)); >> args.handle = handle; >> -args.flags = DRM_CLOEXEC; >> +args.flags = DRM_CLOEXEC | DRM_RDWR; > > This needs to be optional otherwise all the existing prime tests start > falling over on older kernels. Probably need a > prime_handle_to_fd_with_mmap, which doesn an igt_skip if it fails. true. Thank you. > -Daniel > >> args.fd = -1; >> >> do_ioctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args); >> diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c >> index dc59e8f..ad91371 100644 >> --- a/tests/prime_mmap.c >> +++ b/tests/prime_mmap.c >> @@ -22,6 +22,7 @@ >>* >>* Authors: >>*Rob Bradford >> + *Tiago Vignatti >>* >>*/ >> >> @@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size) >> } >> >> static void >> +fill_bo_cpu(char *ptr) >> +{ >> +memcpy(ptr, pattern, sizeof(pattern)); >> +} >> + >> +static void >> test_correct(void) >> { >> int dma_buf_fd; >> @@ -180,6 +187,62 @@ test_forked(void) >> gem_close(fd, handle); >> } >> >> +/* test CPU write. This has a rather big implication for the driver which >> must >> + * guarantee cache synchronization when writing the bo using CPU. 
*/ >> +static void >> +test_correct_cpu_write(void) >> +{ >> +int dma_buf_fd; >> +char *ptr; >> +uint32_t handle; >> + >> +handle = gem_create(fd, BO_SIZE); >> + >> +dma_buf_fd = prime_handle_to_fd(fd, handle); >> +igt_assert(errno == 0); >> + >> +/* Check correctness of map using write protection (PROT_WRITE) */ >> +ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, >> dma_buf_fd, 0); >> +igt_assert(ptr != MAP_FAILED); >> + >> +/* Fill bo using CPU */ >> +fill_bo_cpu(ptr); >> + >> +/* Check pattern correctness */ >> +igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); >> + >> +munmap(ptr, BO_SIZE); >> +close(dma_buf_fd); >> +gem_close(fd, handle); >> +} >> + >> +/* map from another process and then write using CPU */ >> +static void >> +test_forked_cpu_write(void) >> +{ >> +int dma_buf_fd; >> +char *ptr; >> +uint32_t handle; >> + >> +handle = gem_create(fd, BO_SIZE); >> + >> +d
[PATCH v4] mmap on the dma-buf directly
On 08/13/2015 08:09 AM, Thomas Hellstrom wrote:
> Tiago,
>
> I take it, this is intended to be a generic interface used mostly for 2D
> rendering.

yup. "Generic" is an important point that I actually forgot to mention in the description, and it is probably the whole motivation for bringing this up. We want to avoid linking any vendor-specific library into the unprivileged process for security reasons, so it's a requirement that it does not have access to driver-specific ioctls when mapping the buffers. The use-case is texturing from a CPU-rendered buffer, like you said, with the intention of passing these buffers among processes without performing any copy in user-space ("zero-copy").

> In that case, using SYNC is crucial for performance of incoherent
> architectures and I'd rather see it mandatory than an option. It could
> perhaps be made mandatory preferably using an error or a one-time
> kernel warning. If nobody uses the SYNC interface, it is of little use.

hmm, I'm not sure it is of little use. Our hardware (the "LLC"-capable one) has this very specific case where the cache gets dirty wrt the GPU, which is when the same buffer is shared with the scanout device. This is not something that will happen in Chrome OS for example, so we wouldn't need the SYNC markers there. In any case I think that making it mandatory works for us, but I'll have to check with Daniel/Chris whether there are performance penalties when accessing it and so on.

> Also I think the syncing needs to be extended to two dimensions. A long
> time ago when this was brought up people argued why we should limit it
> to two dimensions, but I believe two dimensions addresses most
> performance-problematic use-cases. A default implementation of
> two-dimensional sync can easily be made using the one-dimensional API.

Two-dimensional sync? Are you saying that there could be a GPU access API in dma-buf as well? I don't get it, what's the use-case? I'm sure I missed the discussions because I wasn't particularly interested in this whole thing before :)

Thanks for reviewing, Thomas.

Tiago
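Thomas's remark that a default two-dimensional sync can be built from the one-dimensional API can be sketched as a row-by-row loop. This is an illustration under assumptions, not kernel code: `sync_1d_fn`, `sync_2d_default()` and the counting callback are hypothetical names, and the callback stands in for a range-based begin/end_cpu_access(start, len).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 1D range sync such as begin_cpu_access(start, len). */
typedef int (*sync_1d_fn)(uint64_t start, uint64_t len, void *ctx);

/*
 * Sync a rectangle by issuing one 1D sync per row.
 * x_bytes and width_bytes are byte offsets within a row.
 * Stops at the first failing row and returns its error.
 */
static int sync_2d_default(uint64_t stride_bytes, uint64_t x_bytes, uint64_t y,
			   uint64_t width_bytes, uint64_t height,
			   sync_1d_fn sync_1d, void *ctx)
{
	uint64_t row;
	int ret;

	assert(sync_1d != NULL);

	if (x_bytes + width_bytes > stride_bytes)
		return -1; /* rectangle would spill into the next row */

	for (row = 0; row < height; row++) {
		ret = sync_1d((y + row) * stride_bytes + x_bytes,
			      width_bytes, ctx);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count calls and bytes instead of flushing anything. */
struct sync_stats {
	unsigned int calls;
	uint64_t bytes;
};

static int count_sync(uint64_t start, uint64_t len, void *ctx)
{
	struct sync_stats *stats = ctx;

	(void)start;
	stats->calls++;
	stats->bytes += len;
	return 0;
}
```

The trade-off is one call per row against fewer bytes flushed; a driver could instead issue a single 1D sync over the range from the first to the last byte of the rectangle, at the cost of also flushing everything in between.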
[PATCH 7/7] tests/kms_mmap_write_crc: Demonstrate the need for end_cpu_access
It requires i915 changes to add end_cpu_access().

Signed-off-by: Tiago Vignatti
---
 tests/kms_mmap_write_crc.c | 63 --
 1 file changed, 55 insertions(+), 8 deletions(-)

diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c
index e24d535..59ac9e7 100644
--- a/tests/kms_mmap_write_crc.c
+++ b/tests/kms_mmap_write_crc.c
@@ -67,6 +67,24 @@ static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
 	return ptr;
 }

+static void dmabuf_sync_start(void)
+{
+	struct dma_buf_sync sync_start;
+
+	memset(&sync_start, 0, sizeof(sync_start));
+	sync_start.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
+	do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_start);
+}
+
+static void dmabuf_sync_end(void)
+{
+	struct dma_buf_sync sync_end;
+
+	memset(&sync_end, 0, sizeof(sync_end));
+	sync_end.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
+	do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_end);
+}
+
 static void test_begin_access(data_t *data)
 {
 	igt_display_t *display = &data->display;
@@ -103,14 +121,11 @@ static void test_begin_access(data_t *data)
 	caching = gem_get_caching(data->drm_fd, fb->gem_handle);
 	igt_assert(caching == I915_CACHING_NONE || caching == I915_CACHING_DISPLAY);

-	// Uncomment the following for flush and the crc check next passes. It
-	// requires the kernel counter-part of it implemented obviously.
-	// {
-	// 	struct dma_buf_sync sync_start;
-	// 	memset(&sync_start, 0, sizeof(sync_start));
-	// 	sync_start.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
-	// 	do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_start);
-	// }
+	/*
+	 * firstly demonstrate the need for DMA_BUF_SYNC_START ("begin_cpu_access")
+	 */
+
+	dmabuf_sync_start();

 	/* use dmabuf pointer to make the other fb all white too */
 	buf = malloc(fb->size);
@@ -126,6 +141,38 @@ static void test_begin_access(data_t *data)
 	/* check that the crc is as expected, which requires that caches got flushed */
 	igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
 	igt_assert_crc_equal(&crc, &data->ref_crc);
+
+	/*
+	 * now demonstrates the need for DMA_BUF_SYNC_END ("end_cpu_access")
+	 */
+
+	/* start over, writing non-white to the fb again and flip to it to make it
+	 * fully flushed */
+	cr = igt_get_cairo_ctx(data->drm_fd, fb);
+	igt_paint_test_pattern(cr, fb->width, fb->height);
+	cairo_destroy(cr);
+
+	igt_plane_set_fb(data->primary, fb);
+	igt_display_commit(display);
+
+	/* sync start, to move to CPU domain */
+	dmabuf_sync_start();
+
+	/* use dmabuf pointer in the same fb to make it all white */
+	buf = malloc(fb->size);
+	igt_assert(buf != NULL);
+	memset(buf, 0xff, fb->size);
+	memcpy(ptr, buf, fb->size);
+	free(buf);
+
+	/* there's an implicit flush in set_fb() as well (to set to the GTT domain),
+	 * so if we don't do it and instead write directly into the fb as it is the
+	 * scanout, that should demonstrate the need for end_cpu_access */
+	dmabuf_sync_end();
+
+	/* check that the crc is as expected, which requires that caches got flushed */
+	igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
+	igt_assert_crc_equal(&crc, &data->ref_crc);
 }

 static bool prepare_crtc(data_t *data)
--
2.1.0
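The helpers in this patch rely on igt's do_ioctl(). Outside igt, the same start/end bracket could look roughly like the sketch below. It assumes kernel headers that ship `linux/dma-buf.h` with `struct dma_buf_sync` and `DMA_BUF_IOCTL_SYNC`; the function names `dmabuf_sync()` and `dmabuf_cpu_fill()` are hypothetical. The retry loop on EINTR and EAGAIN follows the same convention as libdrm's drmIoctl().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Issue DMA_BUF_IOCTL_SYNC on dmabuf_fd, retrying while the kernel reports
 * EINTR or EAGAIN. Returns 0 on success, -1 with errno set otherwise.
 */
static int dmabuf_sync(int dmabuf_fd, uint64_t flags)
{
	struct dma_buf_sync sync;
	int ret;

	memset(&sync, 0, sizeof(sync));
	sync.flags = flags;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Bracket a CPU write to an mmap()ed dma-buf with SYNC_START / SYNC_END. */
static int dmabuf_cpu_fill(int dmabuf_fd, void *map, size_t size, int value)
{
	assert(map != NULL);

	if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE))
		return -1;

	memset(map, value, size);

	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```

A caller would pass the fd returned by the prime handle-to-fd export and the pointer returned by mmap() on that fd; if the start sync fails, the buffer is left untouched.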
[PATCH 6/7] tests: Add kms_mmap_write_crc for cache coherency tests
This program can be used to detect when writes don't land in the scanout due to cache incoherency. Although this seems to be a problem inherent to non-LLC machines ("Atom"), this particular test catches a dirty cache on scanout on LLC machines as well. It's inspired by Ville's kms_pwrite_crc.c and can also be used to test the correctness of the driver's begin_cpu_access (TODO end_cpu_access).

To see the need for the flush, one has to run this same binary a few times because it's not 100% reproducible (on my Core machine it's always possible to catch the problem by running it at most 5 times).

Signed-off-by: Tiago Vignatti
---
 tests/.gitignore           |   1 +
 tests/Makefile.sources     |   1 +
 tests/kms_mmap_write_crc.c | 236 +
 3 files changed, 238 insertions(+)
 create mode 100644 tests/kms_mmap_write_crc.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 5bc4a58..9ba1e48 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -140,6 +140,7 @@ kms_force_connector
 kms_frontbuffer_tracking
 kms_legacy_colorkey
 kms_mmio_vs_cs_flip
+kms_mmap_write_crc
 kms_pipe_b_c_ivb
 kms_pipe_crc_basic
 kms_plane
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 5b2072e..31c5508 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -163,6 +163,7 @@ TESTS_progs = \
 	kms_3d \
 	kms_fence_pin_leak \
 	kms_force_connector \
+	kms_mmap_write_crc \
 	kms_pwrite_crc \
 	kms_sink_crc_basic \
 	prime_udl \
diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c
new file mode 100644
index 000..e24d535
--- /dev/null
+++ b/tests/kms_mmap_write_crc.c
@@ -0,0 +1,236 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Tiago Vignatti + */ + +#include +#include +#include +#include +#include + +#include "drmtest.h" +#include "igt_debugfs.h" +#include "igt_kms.h" +#include "intel_chipset.h" +#include "ioctl_wrappers.h" +#include "igt_aux.h" + +IGT_TEST_DESCRIPTION( + "Use the display CRC support to validate mmap write to an already uncached future scanout buffer."); + +typedef struct { + int drm_fd; + igt_display_t display; + struct igt_fb fb[2]; + igt_output_t *output; + igt_plane_t *primary; + enum pipe pipe; + igt_crc_t ref_crc; + igt_pipe_crc_t *pipe_crc; + uint32_t devid; +} data_t; + +int dma_buf_fd; + +static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb) +{ + char *ptr = NULL; + + dma_buf_fd = prime_handle_to_fd(drm_fd, fb->gem_handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + return ptr; +} + +static void test_begin_access(data_t *data) +{ + igt_display_t *display = &data->display; + igt_output_t *output = data->output; + struct igt_fb *fb = &data->fb[1]; + drmModeModeInfo *mode; + cairo_t *cr; + char *ptr; + uint32_t caching; + void *buf; + igt_crc_t crc; + + mode = igt_output_get_mode(output); + + /* create a 
non-white fb where we can write later */ + igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay, + DRM_FORMAT_XRGB, LOCAL_DRM_FORMAT_MOD_NONE, fb); + + ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb); + + cr = igt_get_cairo_ctx(data->drm_fd, fb); + igt_paint_test_pattern(cr, fb->width, fb->height); + cairo_destroy(cr); + + /* flip to it to make it UC/WC and fully flushed */ + igt_plane_set_fb(data->primary, fb); + igt_display_commit
[PATCH 5/7] prime_mmap: Test for userptr mmap
A userptr doesn't have the obj->base.filp, but can be exported via dma-buf, so make sure it fails when mmaping. Signed-off-by: Tiago Vignatti --- In machine, export the handle to fd is actually returning error and falling before the actual test happens. Same issue happens in gem_userptr_blits's test_dmabuf(). This patch needs to be tested properly therefore. tests/prime_mmap.c | 38 +- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c index ad91371..fd6d13b 100644 --- a/tests/prime_mmap.c +++ b/tests/prime_mmap.c @@ -299,12 +299,47 @@ static int prime_handle_to_fd_no_assert(uint32_t handle, int *fd_out) args.fd = -1; ret = drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args); - + if (ret) + ret = errno; *fd_out = args.fd; return ret; } +/* test for mmap(dma_buf_export(userptr)) */ +static void +test_userptr(void) +{ + int ret, dma_buf_fd; + void *ptr; + uint32_t handle; + + /* create userptr bo */ + ret = posix_memalign(&ptr, 4096, BO_SIZE); + igt_assert_eq(ret, 0); + + ret = gem_userptr(fd, (uint32_t *)ptr, BO_SIZE, 0, LOCAL_I915_USERPTR_UNSYNCHRONIZED, &handle); + igt_assert_eq(ret, 0); + + /* export userptr */ + ret = prime_handle_to_fd_no_assert(handle, &dma_buf_fd); + if (ret) { + igt_assert(ret == EINVAL || ret == ENODEV); + goto free_userptr; + } else { + igt_assert_eq(ret, 0); + igt_assert_lte(0, dma_buf_fd); + } + + /* a userptr doesn't have the obj->base.filp, but can be exported via +* dma-buf, so make sure it fails here */ + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr == MAP_FAILED && errno == ENODEV); +free_userptr: + gem_close(fd, handle); + close(dma_buf_fd); +} + static void test_errors(void) { @@ -413,6 +448,7 @@ igt_main { "test_forked_cpu_write", test_forked_cpu_write }, { "test_refcounting", test_refcounting }, { "test_dup", test_dup }, + { "test_userptr", test_userptr }, { "test_errors", test_errors }, { "test_aperture_limit", test_aperture_limit }, }; -- 2.1.0
[PATCH 4/7] lib: Add gem_userptr helpers
This patch moves userptr definitions and helpers implementation that were locally in gem_userptr_benchmark and gem_userptr_blits to the library, so other tests can make use of them as well. There's no functional changes. Signed-off-by: Tiago Vignatti --- benchmarks/gem_userptr_benchmark.c | 45 +++ lib/ioctl_wrappers.c | 30 + lib/ioctl_wrappers.h | 13 ++ tests/gem_userptr_blits.c | 92 +++--- 4 files changed, 73 insertions(+), 107 deletions(-) diff --git a/benchmarks/gem_userptr_benchmark.c b/benchmarks/gem_userptr_benchmark.c index b804fdd..e0797dc 100644 --- a/benchmarks/gem_userptr_benchmark.c +++ b/benchmarks/gem_userptr_benchmark.c @@ -58,17 +58,6 @@ #define PAGE_SIZE 4096 #endif -#define LOCAL_I915_GEM_USERPTR 0x33 -#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr) -struct local_i915_gem_userptr { - uint64_t user_ptr; - uint64_t user_size; - uint32_t flags; -#define LOCAL_I915_USERPTR_READ_ONLY (1<<0) -#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31) - uint32_t handle; -}; - static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED; #define BO_SIZE (65536) @@ -83,30 +72,6 @@ static void gem_userptr_test_synchronized(void) userptr_flags = 0; } -static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t *handle) -{ - struct local_i915_gem_userptr userptr; - int ret; - - userptr.user_ptr = (uintptr_t)ptr; - userptr.user_size = size; - userptr.flags = userptr_flags; - if (read_only) - userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY; - - ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr); - if (ret) - ret = errno; - igt_skip_on_f(ret == ENODEV && - (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 && - !read_only, - "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?"); - if (ret == 0) - *handle = userptr.handle; - - return ret; -} - static void **handle_ptr_map; static unsigned int num_handle_ptr_map; @@ -144,7 +109,7 @@ static uint32_t 
create_userptr_bo(int fd, int size) ret = posix_memalign(&ptr, PAGE_SIZE, size); igt_assert(ret == 0); - ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle); + ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle); igt_assert(ret == 0); add_handle_ptr(handle, ptr); @@ -167,7 +132,7 @@ static int has_userptr(int fd) assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0); oldflags = userptr_flags; gem_userptr_test_unsynchronized(); - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); + ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle); userptr_flags = oldflags; if (ret != 0) { free(ptr); @@ -379,7 +344,7 @@ static void test_impact_overlap(int fd, const char *prefix) for (i = 0, p = block; i < nr_bos[subtest]; i++, p += PAGE_SIZE) - ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, + ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, userptr_flags, &handles[i]); igt_assert(ret == 0); } @@ -439,7 +404,7 @@ static void test_single(int fd) start_test(test_duration_sec); while (run_test) { - ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, &handle); + ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, userptr_flags, &handle); assert(ret == 0); gem_close(fd, handle); iter++; @@ -480,7 +445,7 @@ static void test_multiple(int fd, unsigned int batch, int random) for (i = 0; i < batch; i++) { ret = gem_userptr(fd, bo_ptr + map[i] * BO_SIZE, BO_SIZE, - 0, &handles[i]); + 0, userptr_flags, &handles[i]); assert(ret == 0); } if (random) diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 941fa66..5152647 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -742,6 +742,36 @@ void gem_context_require_ban_period(int fd) igt_require(has_ban_period); } +int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle) +{ + struct local_i915_gem_userptr userptr; + int ret; + + memset(&
[PATCH 3/7] prime_mmap: Add basic tests to write in a bo using CPU
This patch adds test_correct_cpu_write, which maps the texture buffer through a prime fd and then writes directly to it using the CPU. It stresses the driver to guarantee cache synchronization among the different domains. This test also adds test_forked_cpu_write, which creates the GEM bo in one process and pass the prime handle of the it to another process, which in turn uses the handle only to map and write. Grossly speaking this test simulates Chrome OS architecture, where the Web content ("unpriviledged process") maps and CPU-draws a buffer, which was previously allocated in the GPU process ("priviledged process"). This requires kernel modifications (Daniel Thompson's "drm: prime: Honour O_RDWR during prime-handle-to-fd"). Signed-off-by: Tiago Vignatti --- lib/ioctl_wrappers.c | 5 +++- tests/prime_mmap.c | 65 2 files changed, 69 insertions(+), 1 deletion(-) diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 53bd635..941fa66 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -1125,6 +1125,9 @@ void gem_require_ring(int fd, int ring_id) /* prime */ +#ifndef DRM_RDWR +#define DRM_RDWR O_RDWR +#endif /** * prime_handle_to_fd: * @fd: open i915 drm file descriptor @@ -1142,7 +1145,7 @@ int prime_handle_to_fd(int fd, uint32_t handle) memset(&args, 0, sizeof(args)); args.handle = handle; - args.flags = DRM_CLOEXEC; + args.flags = DRM_CLOEXEC | DRM_RDWR; args.fd = -1; do_ioctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args); diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c index dc59e8f..ad91371 100644 --- a/tests/prime_mmap.c +++ b/tests/prime_mmap.c @@ -22,6 +22,7 @@ * * Authors: *Rob Bradford + *Tiago Vignatti * */ @@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size) } static void +fill_bo_cpu(char *ptr) +{ + memcpy(ptr, pattern, sizeof(pattern)); +} + +static void test_correct(void) { int dma_buf_fd; @@ -180,6 +187,62 @@ test_forked(void) gem_close(fd, handle); } +/* test CPU write. 
This has a rather big implication for the driver which must + * guarantee cache synchronization when writing the bo using CPU. */ +static void +test_correct_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + /* Check correctness of map using write protection (PROT_WRITE) */ + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + /* Fill bo using CPU */ + fill_bo_cpu(ptr); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +/* map from another process and then write using CPU */ +static void +test_forked_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + igt_fork(childno, 1) { + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE , MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + fill_bo_cpu(ptr); + + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + } + close(dma_buf_fd); + igt_waitchildren(); + gem_close(fd, handle); +} + static void test_refcounting(void) { @@ -346,6 +409,8 @@ igt_main { "test_map_unmap", test_map_unmap }, { "test_reprime", test_reprime }, { "test_forked", test_forked }, + { "test_correct_cpu_write", test_correct_cpu_write }, + { "test_forked_cpu_write", test_forked_cpu_write }, { "test_refcounting", test_refcounting }, { "test_dup", test_dup }, { "test_errors", test_errors }, -- 2.1.0
[PATCH 2/7] prime_mmap: Fix a few misc stuff
- Remove pattern_check(), which was walking through a useless iterator - Remove superfluous PROT_WRITE from gem_mmap, in test_correct() - Add binary file to .gitignore Signed-off-by: Tiago Vignatti --- tests/.gitignore | 1 + tests/prime_mmap.c | 37 - 2 files changed, 13 insertions(+), 25 deletions(-) diff --git a/tests/.gitignore b/tests/.gitignore index 0af0899..5bc4a58 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -163,6 +163,7 @@ pm_sseu prime_nv_api prime_nv_pcopy prime_nv_test +prime_mmap prime_self_import prime_udl template diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c index 4dc2055..dc59e8f 100644 --- a/tests/prime_mmap.c +++ b/tests/prime_mmap.c @@ -65,19 +65,6 @@ fill_bo(uint32_t handle, size_t size) } } -static int -pattern_check(char *ptr, size_t size) -{ - off_t i; - for (i = 0; i < size; i+=sizeof(pattern)) - { - if (memcmp(ptr, pattern, sizeof(pattern)) != 0) - return 1; - } - - return 0; -} - static void test_correct(void) { @@ -92,14 +79,14 @@ test_correct(void) igt_assert(errno == 0); /* Check correctness vs GEM_MMAP_GTT */ - ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ | PROT_WRITE); + ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ); ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); igt_assert(ptr1 != MAP_FAILED); igt_assert(ptr2 != MAP_FAILED); igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0); /* Check pattern correctness */ - igt_assert(pattern_check(ptr2, BO_SIZE) == 0); + igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0); munmap(ptr1, BO_SIZE); munmap(ptr2, BO_SIZE); @@ -122,13 +109,13 @@ test_map_unmap(void) ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); igt_assert(ptr != MAP_FAILED); - igt_assert(pattern_check(ptr, BO_SIZE) == 0); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); /* Unmap and remap */ munmap(ptr, BO_SIZE); ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); igt_assert(ptr != MAP_FAILED); - igt_assert(pattern_check(ptr, BO_SIZE) == 0); + 
igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); munmap(ptr, BO_SIZE); close(dma_buf_fd); @@ -151,16 +138,16 @@ test_reprime(void) ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); igt_assert(ptr != MAP_FAILED); - igt_assert(pattern_check(ptr, BO_SIZE) == 0); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); close (dma_buf_fd); - igt_assert(pattern_check(ptr, BO_SIZE) == 0); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); munmap(ptr, BO_SIZE); dma_buf_fd = prime_handle_to_fd(fd, handle); ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); igt_assert(ptr != MAP_FAILED); - igt_assert(pattern_check(ptr, BO_SIZE) == 0); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); munmap(ptr, BO_SIZE); close(dma_buf_fd); @@ -184,7 +171,7 @@ test_forked(void) igt_fork(childno, 1) { ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); igt_assert(ptr != MAP_FAILED); - igt_assert(pattern_check(ptr, BO_SIZE) == 0); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); munmap(ptr, BO_SIZE); close(dma_buf_fd); } @@ -210,7 +197,7 @@ test_refcounting(void) ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); igt_assert(ptr != MAP_FAILED); - igt_assert(pattern_check(ptr, BO_SIZE) == 0); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); munmap(ptr, BO_SIZE); close (dma_buf_fd); } @@ -231,7 +218,7 @@ test_dup(void) ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); igt_assert(ptr != MAP_FAILED); - igt_assert(pattern_check(ptr, BO_SIZE) == 0); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); munmap(ptr, BO_SIZE); gem_close(fd, handle); close (dma_buf_fd); @@ -310,7 +297,7 @@ test_aperture_limit(void) igt_assert(errno == 0); ptr1 = mmap(NULL, size1, PROT_READ, MAP_SHARED, dma_buf_fd1, 0); igt_assert(ptr1 != MAP_FAILED); - igt_assert(pattern_check(ptr1, BO_SIZE) == 0); + igt_assert(memcmp(ptr1, pattern, sizeof(pattern)) == 0); handle2 = gem_create(fd, size1); fill_bo(handle2, 
BO_SIZE); @@ -318,7 +305,7 @@ test_aperture_limit(void) igt_assert(errno == 0); ptr2 = mmap(NULL, size2, PROT_READ, MAP_SHARED, dma_buf_fd2, 0); igt_assert(ptr2 != MAP_FAILED); - igt_assert(pattern_check(ptr2, BO_SIZE) == 0); + igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) =
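As the changelog says, pattern_check() iterated without ever advancing the pointer, so it only compared the first 16 bytes over and over. The memcmp() that replaces it also looks only at the first sizeof(pattern) bytes. If a check of the whole buffer is wanted, a minimal sketch could look like the following. It is not part of this patch: pattern_check_full() is a hypothetical name, and the pattern is declared unsigned here for convenience.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same 16 bytes as the pattern in tests/prime_mmap.c. */
static const unsigned char pattern[] = {
	0xff, 0x00, 0x00, 0x00,
	0x00, 0xff, 0x00, 0x00,
	0x00, 0x00, 0xff, 0x00,
	0x00, 0x00, 0x00, 0xff
};

/*
 * Hypothetical helper, not in the patch: returns 0 if every
 * sizeof(pattern)-sized block of the buffer matches, 1 otherwise.
 * Assumes size is a multiple of sizeof(pattern), as BO_SIZE is.
 */
static int pattern_check_full(const void *ptr, size_t size)
{
	const unsigned char *p = ptr;
	size_t i;

	assert(size % sizeof(pattern) == 0);

	for (i = 0; i < size; i += sizeof(pattern)) {
		/* "p + i" is the point: the removed loop always compared ptr. */
		if (memcmp(p + i, pattern, sizeof(pattern)) != 0)
			return 1;
	}

	return 0;
}
```

Whether the extra coverage is worth the time per subtest is a judgment call; the patch chose the cheaper first-block check.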
[PATCH 1/7] prime_mmap: Add new test for calling mmap() on dma-buf fds
From: Rob Bradford This test has the following subtests: - test_correct for correctness of the data - test_map_unmap checks for mapping idempotency - test_reprime checks for dma-buf creation idempotency - test_forked checks for multiprocess access - test_refcounting checks for buffer reference counting - test_dup checks that dup()ing the fd works - test_errors checks the error return values for failures - test_aperture_limit tests multiple buffer creation at the gtt aperture limit Signed-off-by: Rob Bradford Signed-off-by: Tiago Vignatti --- tests/Makefile.sources | 1 + tests/prime_mmap.c | 383 + 2 files changed, 384 insertions(+) create mode 100644 tests/prime_mmap.c diff --git a/tests/Makefile.sources b/tests/Makefile.sources index c94714b..5b2072e 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -90,6 +90,7 @@ TESTS_progs_M = \ pm_rps \ pm_rc6_residency \ pm_sseu \ + prime_mmap \ prime_self_import \ template \ $(NULL) diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c new file mode 100644 index 000..4dc2055 --- /dev/null +++ b/tests/prime_mmap.c @@ -0,0 +1,383 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Rob Bradford + * + */ + +/* + * Testcase: Check whether mmap()ing dma-buf works + */ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "drm.h" +#include "i915_drm.h" +#include "drmtest.h" +#include "igt_debugfs.h" +#include "ioctl_wrappers.h" + +#define BO_SIZE (16*1024) + +static int fd; + +char pattern[] = {0xff, 0x00, 0x00, 0x00, + 0x00, 0xff, 0x00, 0x00, + 0x00, 0x00, 0xff, 0x00, + 0x00, 0x00, 0x00, 0xff}; + +static void +fill_bo(uint32_t handle, size_t size) +{ + off_t i; + for (i = 0; i < size; i+=sizeof(pattern)) + { + gem_write(fd, handle, i, pattern, sizeof(pattern)); + } +} + +static int +pattern_check(char *ptr, size_t size) +{ + off_t i; + for (i = 0; i < size; i+=sizeof(pattern)) + { + if (memcmp(ptr, pattern, sizeof(pattern)) != 0) + return 1; + } + + return 0; +} + +static void +test_correct(void) +{ + int dma_buf_fd; + char *ptr1, *ptr2; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + /* Check correctness vs GEM_MMAP_GTT */ + ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ | PROT_WRITE); + ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr1 != MAP_FAILED); + igt_assert(ptr2 != MAP_FAILED); + igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0); + + /* Check pattern correctness */ + igt_assert(pattern_check(ptr2, BO_SIZE) == 0); + + munmap(ptr1, BO_SIZE); + munmap(ptr2, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +static void +test_map_unmap(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, 
BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(pattern_check(ptr, BO_SIZE) == 0); + + /* Unmap and remap */ + munmap(ptr, BO_SIZE); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(pattern_check(p
[PATCH 4/4] drm/i915: Use CPU mapping for userspace dma-buf mmap()
Userspace is in charge of flushing the CPU caches by wrapping its accesses to the mmap with begin{,end}_cpu_access. v2: Remove the LLC check because we have dma-buf sync providers now. Also, fix the return before transferring ownership when mmap fails. v3: Fix return values. v4: !obj->base.filp is user triggerable, so removed the WARN_ON. Signed-off-by: Tiago Vignatti --- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index 8447ba4..ecd00d6 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_n static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) { - return -EINVAL; + struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); + int ret; + + if (obj->base.size < vma->vm_end - vma->vm_start) + return -EINVAL; + + if (!obj->base.filp) + return -ENODEV; + + ret = obj->base.filp->f_op->mmap(obj->base.filp, vma); + if (ret) + return ret; + + fput(vma->vm_file); + vma->vm_file = get_file(obj->base.filp); + + return 0; } static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction) -- 2.1.0
[PATCH 3/4] drm/i915: Implement end_cpu_access
Signed-off-by: Tiago Vignatti --- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index e9c2bfd..8447ba4 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -212,6 +212,15 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size return ret; } +static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction) +{ + struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); + bool write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE); + + if (i915_gem_object_set_to_gtt_domain(obj, write)) + DRM_ERROR("failed to set bo into the GTT\n"); +} + static const struct dma_buf_ops i915_dmabuf_ops = { .map_dma_buf = i915_gem_map_dma_buf, .unmap_dma_buf = i915_gem_unmap_dma_buf, @@ -224,6 +233,7 @@ static const struct dma_buf_ops i915_dmabuf_ops = { .vmap = i915_gem_dmabuf_vmap, .vunmap = i915_gem_dmabuf_vunmap, .begin_cpu_access = i915_gem_begin_cpu_access, + .end_cpu_access = i915_gem_end_cpu_access, }; struct dma_buf *i915_gem_prime_export(struct drm_device *dev, -- 2.1.0
[PATCH 2/4] dma-buf: Add ioctls to allow userspace to flush
From: Daniel Vetter Userspace might need some sort of cache coherency management, e.g. when the CPU and GPU domains are being accessed through dma-buf at the same time. To address this there are begin/end coherency markers that forward directly to the existing dma-buf device drivers' vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used as follows: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. Cc: Sumit Semwal Signed-off-by: Daniel Vetter Signed-off-by: Tiago Vignatti --- Documentation/dma-buf-sharing.txt | 12 ++ drivers/dma-buf/dma-buf.c | 50 +++ include/uapi/linux/dma-buf.h | 43 + 3 files changed, 105 insertions(+) create mode 100644 include/uapi/linux/dma-buf.h diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 480c8de..2d8ee3b 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -355,6 +355,18 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: No special interfaces, userspace simply calls mmap on the dma-buf fd. + Also, the userspace might need some sort of cache coherency management e.g. + when CPU and GPU domains are being accessed through dma-buf at the same + time. To circumvent this problem there are begin/end coherency markers, that + forward directly to existing dma-buf device drivers vfunc hooks. Userspace + can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. 
The + sequence would be used like following: + - mmap dma-buf fd + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write + to mmap area 3. SYNC_END ioctl. This can be repeated as often as you + want (with the new data being consumed by the GPU or say scanout device) + - munamp once you don't need the buffer any more + 2. Supporting existing mmap interfaces in importers Similar to the motivation for kernel cpu access it is again important that diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 155c146..e628415 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -34,6 +34,8 @@ #include #include +#include + static inline int is_dma_buf_file(struct file *); struct dma_buf_list { @@ -251,11 +253,59 @@ out: return events; } +static long dma_buf_ioctl(struct file *file, + unsigned int cmd, unsigned long arg) +{ + struct dma_buf *dmabuf; + struct dma_buf_sync sync; + enum dma_data_direction direction; + + dmabuf = file->private_data; + + if (!is_dma_buf_file(file)) + return -EINVAL; + + switch (cmd) { + case DMA_BUF_IOCTL_SYNC: + if (copy_from_user(&sync, (void __user *) arg, sizeof(sync))) + return -EFAULT; + + if (sync.flags & DMA_BUF_SYNC_RW) + direction = DMA_BIDIRECTIONAL; + else if (sync.flags & DMA_BUF_SYNC_READ) + direction = DMA_FROM_DEVICE; + else if (sync.flags & DMA_BUF_SYNC_WRITE) + direction = DMA_TO_DEVICE; + else + return -EINVAL; + + if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK) + return -EINVAL; + + /* check for overflowing the buffer's size */ + if (sync.start > dmabuf->size || + sync.length > dmabuf->size - sync.start) + return -EINVAL; + + if (sync.flags & DMA_BUF_SYNC_END) + dma_buf_end_cpu_access(dmabuf, sync.start, + sync.length, direction); + else + dma_buf_begin_cpu_access(dmabuf, sync.start, +sync.length, direction); + + return 0; + default: + return -ENOTTY; + } +} + static const struct file_operations dma_buf_fops = { .release= dma_buf_release, .mmap = dma_buf_mmap_internal, 
.llseek = dma_buf_llseek, .poll = dma_buf_poll, + .unlocked_ioctl = dma_buf_ioctl, }; /* diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h new file mode 100644 index 00
[PATCH 1/4] drm: prime: Honour O_RDWR during prime-handle-to-fd
From: Daniel Thompson Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except (DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace to mmap() the resulting dma-buf even when this is supported by the DRM driver. It is trivial to relax the restriction and permit read/write access. This is safe because the flags are seldom touched by drm; mostly they are passed verbatim to dma_buf calls. v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl. Signed-off-by: Daniel Thompson Signed-off-by: Tiago Vignatti --- drivers/gpu/drm/drm_prime.c | 10 +++--- include/uapi/drm/drm.h | 1 + 2 files changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 27aa718..df6cdc7 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = { * drm_gem_prime_export - helper library implementation of the export callback * @dev: drm_device to export from * @obj: GEM object to export - * @flags: flags like DRM_CLOEXEC + * @flags: flags like DRM_CLOEXEC and DRM_RDWR * * This is the implementation of the gem_prime_export functions for GEM drivers * using the PRIME helpers. 
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_prime_handle *args = data; - uint32_t flags; if (!drm_core_check_feature(dev, DRIVER_PRIME)) return -EINVAL; @@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data, return -ENOSYS; /* check flags are valid */ - if (args->flags & ~DRM_CLOEXEC) + if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR)) return -EINVAL; - /* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */ - flags = args->flags & DRM_CLOEXEC; - return dev->driver->prime_handle_to_fd(dev, file_priv, - args->handle, flags, &args->fd); + args->handle, args->flags, &args->fd); } int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data, diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 3801584..ad8223e 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -668,6 +668,7 @@ struct drm_set_client_cap { __u64 value; }; +#define DRM_RDWR O_RDWR #define DRM_CLOEXEC O_CLOEXEC struct drm_prime_handle { __u32 handle; -- 2.1.0
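To make the effect of the relaxed mask concrete, here is a small userspace mirror of the flag check. It is only a sketch: prime_flags_check() and prime_flags_examples() are hypothetical names, and the DRM_* fallbacks simply repeat the definitions from the patch for builds without the updated drm.h.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/* As in the patch: the DRM names alias the open(2) flags. */
#ifndef DRM_RDWR
#define DRM_RDWR O_RDWR
#endif
#ifndef DRM_CLOEXEC
#define DRM_CLOEXEC O_CLOEXEC
#endif

/* Hypothetical mirror of the check in drm_prime_handle_to_fd_ioctl(). */
static int prime_flags_check(uint32_t flags)
{
	if (flags & ~(uint32_t)(DRM_CLOEXEC | DRM_RDWR))
		return -EINVAL;

	return 0;
}

/* A few requests and what the relaxed check does with them. */
static void prime_flags_examples(void)
{
	/* Accepted before and after the patch. */
	assert(prime_flags_check(DRM_CLOEXEC) == 0);
	/* Newly accepted: needed for a PROT_WRITE mmap of the dma-buf. */
	assert(prime_flags_check(DRM_CLOEXEC | DRM_RDWR) == 0);
	/* Any other open(2) flag is still rejected. */
	assert(prime_flags_check(O_NONBLOCK) == -EINVAL);
}
```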
[PATCH v4] mmap on the dma-buf directly
Hi, The idea is to create a GEM bo in one process and pass the prime handle of it to another process, which in turn uses the handle only to map and write. This could be useful for the Chrome OS architecture, where the Web content ("unprivileged process") maps and CPU-draws a buffer, which was previously allocated in the GPU process ("privileged process"). In v2, I've added a patch that Daniel kindly drafted to allow the unprivileged process to flush through a prime fd. In v3, I've fixed a few concerns and then added end_cpu_access to i915. In v4, I fixed Sumit Semwal's concerns about dma-buf documentation and the FIXME missing in that same patch, and also removed WARN in i915 dma-buf mmap (pointed out by Chris). PTAL. Best Regards, Tiago Daniel Thompson (1): drm: prime: Honour O_RDWR during prime-handle-to-fd Daniel Vetter (1): dma-buf: Add ioctls to allow userspace to flush Tiago Vignatti (2): drm/i915: Implement end_cpu_access drm/i915: Use CPU mapping for userspace dma-buf mmap() Documentation/dma-buf-sharing.txt | 12 drivers/dma-buf/dma-buf.c | 50 ++ drivers/gpu/drm/drm_prime.c| 10 ++- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 28 ++- include/uapi/drm/drm.h | 1 + include/uapi/linux/dma-buf.h | 43 + 6 files changed, 136 insertions(+), 8 deletions(-) create mode 100644 include/uapi/linux/dma-buf.h -- 2.1.0
[PATCH i-g-t 5/5] tests/kms_mmap_write_crc: Demonstrate the need for end_cpu_access
It requires i915 changes to add end_cpu_access(). Signed-off-by: Tiago Vignatti --- tests/kms_mmap_write_crc.c | 63 -- 1 file changed, 55 insertions(+), 8 deletions(-) diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c index e24d535..59ac9e7 100644 --- a/tests/kms_mmap_write_crc.c +++ b/tests/kms_mmap_write_crc.c @@ -67,6 +67,24 @@ static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb) return ptr; } +static void dmabuf_sync_start(void) +{ + struct dma_buf_sync sync_start; + + memset(&sync_start, 0, sizeof(sync_start)); + sync_start.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW; + do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_start); +} + +static void dmabuf_sync_end(void) +{ + struct dma_buf_sync sync_end; + + memset(&sync_end, 0, sizeof(sync_end)); + sync_end.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW; + do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_end); +} + static void test_begin_access(data_t *data) { igt_display_t *display = &data->display; @@ -103,14 +121,11 @@ static void test_begin_access(data_t *data) caching = gem_get_caching(data->drm_fd, fb->gem_handle); igt_assert(caching == I915_CACHING_NONE || caching == I915_CACHING_DISPLAY); - // Uncomment the following for flush and the crc check next passes. It - // requires the kernel counter-part of it implemented obviously. 
- // { - // struct dma_buf_sync sync_start; - // memset(&sync_start, 0, sizeof(sync_start)); - // sync_start.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW; - // do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_start); - // } + /* +* firstly demonstrate the need for DMA_BUF_SYNC_START ("begin_cpu_access") +*/ + + dmabuf_sync_start(); /* use dmabuf pointer to make the other fb all white too */ buf = malloc(fb->size); @@ -126,6 +141,38 @@ static void test_begin_access(data_t *data) /* check that the crc is as expected, which requires that caches got flushed */ igt_pipe_crc_collect_crc(data->pipe_crc, &crc); igt_assert_crc_equal(&crc, &data->ref_crc); + + /* +* now demonstrates the need for DMA_BUF_SYNC_END ("end_cpu_access") +*/ + + /* start over, writing non-white to the fb again and flip to it to make it +* fully flushed */ + cr = igt_get_cairo_ctx(data->drm_fd, fb); + igt_paint_test_pattern(cr, fb->width, fb->height); + cairo_destroy(cr); + + igt_plane_set_fb(data->primary, fb); + igt_display_commit(display); + + /* sync start, to move to CPU domain */ + dmabuf_sync_start(); + + /* use dmabuf pointer in the same fb to make it all white */ + buf = malloc(fb->size); + igt_assert(buf != NULL); + memset(buf, 0xff, fb->size); + memcpy(ptr, buf, fb->size); + free(buf); + + /* there's an implicit flush in set_fb() as well (to set to the GTT domain), +* so if we don't do it and instead write directly into the fb as it is the +* scanout, that should demonstrate the need for end_cpu_access */ + dmabuf_sync_end(); + + /* check that the crc is as expected, which requires that caches got flushed */ + igt_pipe_crc_collect_crc(data->pipe_crc, &crc); + igt_assert_crc_equal(&crc, &data->ref_crc); } static bool prepare_crtc(data_t *data) -- 2.1.0
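Outside of IGT, the same SYNC_START / write / SYNC_END bracket might be wrapped up roughly as below. This is a sketch under stated assumptions, not what the test does verbatim: it assumes the <linux/dma-buf.h> uapi header as shipped in mainline kernels, where struct dma_buf_sync carries only a flags field (the v4 patch in this series also has start/length), and dmabuf_sync() and dmabuf_cpu_write() are hypothetical helper names. The retry on EINTR/EAGAIN follows what libdrm's drmIoctl() does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue DMA_BUF_IOCTL_SYNC, retrying on EINTR/EAGAIN like drmIoctl(). */
static int dmabuf_sync(int dmabuf_fd, __u64 flags)
{
	struct dma_buf_sync sync;
	int ret;

	memset(&sync, 0, sizeof(sync));
	sync.flags = flags;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/*
 * Hypothetical helper: copy len bytes into a dma-buf mapping inside a
 * begin/end CPU access bracket. Returns 0 on success, -1 with errno set
 * if either sync call fails; nothing is written if SYNC_START fails.
 */
static int dmabuf_cpu_write(int dmabuf_fd, void *map, const void *src,
			    size_t len)
{
	assert(map != NULL && src != NULL);

	if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE) == -1)
		return -1;

	memcpy(map, src, len);

	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```

The test above passes DMA_BUF_SYNC_RW in both calls; the sketch uses DMA_BUF_SYNC_WRITE because it only writes.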
[PATCH i-g-t 4/5] tests: Add kms_mmap_write_crc for cache coherency tests
This program can be used to detect when the writes don't land in scanout due to cache incoherency. Although this seems to be a problem inherent to non-LLC machines ("Atom"), this particular test catches a dirty cache on scanout on LLC machines as well. It's inspired by Ville's kms_pwrite_crc.c and can also be used to test the correctness of the driver's begin_cpu_access (TODO end_cpu_access). To see the need for flush, one has to run this same binary a few times because it's not 100% reproducible (on my Core machine it's always possible to catch the problem by running it at most 5 times). Signed-off-by: Tiago Vignatti --- tests/.gitignore | 1 + tests/Makefile.sources | 1 + tests/kms_mmap_write_crc.c | 236 + 3 files changed, 238 insertions(+) create mode 100644 tests/kms_mmap_write_crc.c diff --git a/tests/.gitignore b/tests/.gitignore index 5bc4a58..9ba1e48 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -140,6 +140,7 @@ kms_force_connector kms_frontbuffer_tracking kms_legacy_colorkey kms_mmio_vs_cs_flip +kms_mmap_write_crc kms_pipe_b_c_ivb kms_pipe_crc_basic kms_plane diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 5b2072e..31c5508 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -163,6 +163,7 @@ TESTS_progs = \ kms_3d \ kms_fence_pin_leak \ kms_force_connector \ + kms_mmap_write_crc \ kms_pwrite_crc \ kms_sink_crc_basic \ prime_udl \ diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c new file mode 100644 index 000..e24d535 --- /dev/null +++ b/tests/kms_mmap_write_crc.c @@ -0,0 +1,236 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the 
+ * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Tiago Vignatti + */ + +#include +#include +#include +#include +#include + +#include "drmtest.h" +#include "igt_debugfs.h" +#include "igt_kms.h" +#include "intel_chipset.h" +#include "ioctl_wrappers.h" +#include "igt_aux.h" + +IGT_TEST_DESCRIPTION( + "Use the display CRC support to validate mmap write to an already uncached future scanout buffer."); + +typedef struct { + int drm_fd; + igt_display_t display; + struct igt_fb fb[2]; + igt_output_t *output; + igt_plane_t *primary; + enum pipe pipe; + igt_crc_t ref_crc; + igt_pipe_crc_t *pipe_crc; + uint32_t devid; +} data_t; + +int dma_buf_fd; + +static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb) +{ + char *ptr = NULL; + + dma_buf_fd = prime_handle_to_fd(drm_fd, fb->gem_handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + return ptr; +} + +static void test_begin_access(data_t *data) +{ + igt_display_t *display = &data->display; + igt_output_t *output = data->output; + struct igt_fb *fb = &data->fb[1]; + drmModeModeInfo *mode; + cairo_t *cr; + char *ptr; + uint32_t caching; + void *buf; + igt_crc_t crc; + + mode = igt_output_get_mode(output); + + /* create a 
non-white fb where we can write later */ + igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay, + DRM_FORMAT_XRGB, LOCAL_DRM_FORMAT_MOD_NONE, fb); + + ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb); + + cr = igt_get_cairo_ctx(data->drm_fd, fb); + igt_paint_test_pattern(cr, fb->width, fb->height); + cairo_destroy(cr); + + /* flip to it to make it UC/WC and fully flushed */ + igt_plane_set_fb(data->primary, fb); + igt_display_commit
[PATCH i-g-t 3/5] prime_mmap: Add basic tests to write in a bo using CPU
This patch adds test_correct_cpu_write, which maps the texture buffer through a prime fd and then writes directly to it using the CPU. It stresses the driver to guarantee cache synchronization among the different domains. This test also adds test_forked_cpu_write, which creates the GEM bo in one process and passes the prime handle of it to another process, which in turn uses the handle only to map and write. Roughly speaking this test simulates the Chrome OS architecture, where the Web content ("unprivileged process") maps and CPU-draws a buffer, which was previously allocated in the GPU process ("privileged process"). This requires kernel modifications (Daniel Thompson's "drm: prime: Honour O_RDWR during prime-handle-to-fd"). Signed-off-by: Tiago Vignatti --- lib/ioctl_wrappers.c | 5 +++- tests/prime_mmap.c | 65 2 files changed, 69 insertions(+), 1 deletion(-) diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 53bd635..941fa66 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -1125,6 +1125,9 @@ void gem_require_ring(int fd, int ring_id) /* prime */ +#ifndef DRM_RDWR +#define DRM_RDWR O_RDWR +#endif /** * prime_handle_to_fd: * @fd: open i915 drm file descriptor @@ -1142,7 +1145,7 @@ int prime_handle_to_fd(int fd, uint32_t handle) memset(&args, 0, sizeof(args)); args.handle = handle; - args.flags = DRM_CLOEXEC; + args.flags = DRM_CLOEXEC | DRM_RDWR; args.fd = -1; do_ioctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args); diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c index dc59e8f..ad91371 100644 --- a/tests/prime_mmap.c +++ b/tests/prime_mmap.c @@ -22,6 +22,7 @@ * * Authors: *Rob Bradford + *Tiago Vignatti * */ @@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size) } static void +fill_bo_cpu(char *ptr) +{ + memcpy(ptr, pattern, sizeof(pattern)); +} + +static void test_correct(void) { int dma_buf_fd; @@ -180,6 +187,62 @@ test_forked(void) gem_close(fd, handle); } +/* test CPU write. 
This has a rather big implication for the driver which must + * guarantee cache synchronization when writing the bo using CPU. */ +static void +test_correct_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + /* Check correctness of map using write protection (PROT_WRITE) */ + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + /* Fill bo using CPU */ + fill_bo_cpu(ptr); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +/* map from another process and then write using CPU */ +static void +test_forked_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + igt_fork(childno, 1) { + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE , MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + fill_bo_cpu(ptr); + + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + } + close(dma_buf_fd); + igt_waitchildren(); + gem_close(fd, handle); +} + static void test_refcounting(void) { @@ -346,6 +409,8 @@ igt_main { "test_map_unmap", test_map_unmap }, { "test_reprime", test_reprime }, { "test_forked", test_forked }, + { "test_correct_cpu_write", test_correct_cpu_write }, + { "test_forked_cpu_write", test_forked_cpu_write }, { "test_refcounting", test_refcounting }, { "test_dup", test_dup }, { "test_errors", test_errors }, -- 2.1.0
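The fork-and-write shape of test_forked_cpu_write can be tried on any Linux box with the sketch below. Big caveat: an anonymous shared mapping stands in for the dma-buf mapping, so this exercises only the process structure (one process sets up the buffer, another writes it with the CPU, the first one sees the result) and none of the driver's cache handling. Unlike the real test, the mapping is created before fork() and inherited, since there is no fd for the child to mmap. forked_cpu_write_sketch() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUF_SIZE (16 * 1024)

static const unsigned char pattern[] = {
	0xff, 0x00, 0x00, 0x00,
	0x00, 0xff, 0x00, 0x00,
	0x00, 0x00, 0xff, 0x00,
	0x00, 0x00, 0x00, 0xff
};

/*
 * Returns 0 if a child process wrote the pattern through the shared
 * mapping and the parent then observed it, -1 otherwise.
 */
static int forked_cpu_write_sketch(void)
{
	unsigned char *ptr;
	pid_t pid, waited;
	int status = 0;
	int visible;

	/* Stand-in for mmap(..., MAP_SHARED, dma_buf_fd, 0). */
	ptr = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	assert(ptr != MAP_FAILED);

	pid = fork();
	if (pid == -1) {
		munmap(ptr, BUF_SIZE);
		return -1;
	}

	if (pid == 0) {
		/* Child: the "unprivileged" side, which only writes. */
		memcpy(ptr, pattern, sizeof(pattern));
		_exit(memcmp(ptr, pattern, sizeof(pattern)) == 0 ? 0 : 1);
	}

	/* Parent: wait for the child, then check its write is visible. */
	waited = waitpid(pid, &status, 0);
	visible = waited == pid &&
		  WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
		  memcmp(ptr, pattern, sizeof(pattern)) == 0;

	munmap(ptr, BUF_SIZE);
	return visible ? 0 : -1;
}
```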
[PATCH i-g-t 2/5] prime_mmap: Fix a few misc stuff
- Remove pattern_check(), which was walking through a useless iterator
- Remove superfluous PROT_WRITE from gem_mmap, in test_correct()
- Add binary file to .gitignore

Signed-off-by: Tiago Vignatti
---
 tests/.gitignore   |  1 +
 tests/prime_mmap.c | 37 -
 2 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/tests/.gitignore b/tests/.gitignore
index 0af0899..5bc4a58 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -163,6 +163,7 @@ pm_sseu
 prime_nv_api
 prime_nv_pcopy
 prime_nv_test
+prime_mmap
 prime_self_import
 prime_udl
 template
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index 4dc2055..dc59e8f 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -65,19 +65,6 @@ fill_bo(uint32_t handle, size_t size)
 	}
 }
 
-static int
-pattern_check(char *ptr, size_t size)
-{
-	off_t i;
-	for (i = 0; i < size; i+=sizeof(pattern))
-	{
-		if (memcmp(ptr, pattern, sizeof(pattern)) != 0)
-			return 1;
-	}
-
-	return 0;
-}
-
 static void
 test_correct(void)
 {
@@ -92,14 +79,14 @@ test_correct(void)
 	igt_assert(errno == 0);
 
 	/* Check correctness vs GEM_MMAP_GTT */
-	ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ | PROT_WRITE);
+	ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ);
 	ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
 	igt_assert(ptr1 != MAP_FAILED);
 	igt_assert(ptr2 != MAP_FAILED);
 	igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0);
 
 	/* Check pattern correctness */
-	igt_assert(pattern_check(ptr2, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0);
 
 	munmap(ptr1, BO_SIZE);
 	munmap(ptr2, BO_SIZE);
@@ -122,13 +109,13 @@ test_map_unmap(void)
 
 	ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
 	igt_assert(ptr != MAP_FAILED);
-	igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
 
 	/* Unmap and remap */
 	munmap(ptr, BO_SIZE);
 	ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
 	igt_assert(ptr != MAP_FAILED);
-	igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
 
 	munmap(ptr, BO_SIZE);
 	close(dma_buf_fd);
@@ -151,16 +138,16 @@ test_reprime(void)
 
 	ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
 	igt_assert(ptr != MAP_FAILED);
-	igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
 
 	close (dma_buf_fd);
-	igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
 	munmap(ptr, BO_SIZE);
 
 	dma_buf_fd = prime_handle_to_fd(fd, handle);
 	ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
 	igt_assert(ptr != MAP_FAILED);
-	igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
 
 	munmap(ptr, BO_SIZE);
 	close(dma_buf_fd);
@@ -184,7 +171,7 @@ test_forked(void)
 	igt_fork(childno, 1) {
 		ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
 		igt_assert(ptr != MAP_FAILED);
-		igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+		igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
 		munmap(ptr, BO_SIZE);
 		close(dma_buf_fd);
 	}
@@ -210,7 +197,7 @@ test_refcounting(void)
 
 	ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
 	igt_assert(ptr != MAP_FAILED);
-	igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
 	munmap(ptr, BO_SIZE);
 	close (dma_buf_fd);
 }
@@ -231,7 +218,7 @@ test_dup(void)
 
 	ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
 	igt_assert(ptr != MAP_FAILED);
-	igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
 	munmap(ptr, BO_SIZE);
 	gem_close(fd, handle);
 	close (dma_buf_fd);
@@ -310,7 +297,7 @@ test_aperture_limit(void)
 	igt_assert(errno == 0);
 	ptr1 = mmap(NULL, size1, PROT_READ, MAP_SHARED, dma_buf_fd1, 0);
 	igt_assert(ptr1 != MAP_FAILED);
-	igt_assert(pattern_check(ptr1, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr1, pattern, sizeof(pattern)) == 0);
 
 	handle2 = gem_create(fd, size1);
 	fill_bo(handle2, BO_SIZE);
@@ -318,7 +305,7 @@ test_aperture_limit(void)
 	igt_assert(errno == 0);
 	ptr2 = mmap(NULL, size2, PROT_READ, MAP_SHARED, dma_buf_fd2, 0);
 	igt_assert(ptr2 != MAP_FAILED);
-	igt_assert(pattern_check(ptr2, BO_SIZE) == 0);
+	igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) =
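The removed pattern_check() looped over the buffer size but always passed the same ptr to memcmp(), so it only ever compared the first 16 bytes; the single memcmp() that replaces it checks exactly those bytes. A minimal sketch of a check that does walk the whole mapping could look like the following. pattern_check_all() is a hypothetical helper and is not part of this patch; it assumes the buffer size is a multiple of sizeof(pattern), which holds for BO_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same 16-byte pattern the test writes with fill_bo(). */
static const unsigned char pattern[] = {0xff, 0x00, 0x00, 0x00,
					0x00, 0xff, 0x00, 0x00,
					0x00, 0x00, 0xff, 0x00,
					0x00, 0x00, 0x00, 0xff};

/*
 * Hypothetical helper (not in the patch): returns 0 when every
 * sizeof(pattern)-sized chunk of the buffer matches, 1 otherwise.
 * Assumes size is a multiple of sizeof(pattern).
 */
static int pattern_check_all(const unsigned char *ptr, size_t size)
{
	size_t i;

	for (i = 0; i + sizeof(pattern) <= size; i += sizeof(pattern)) {
		/* Offset into the buffer on every iteration. */
		if (memcmp(ptr + i, pattern, sizeof(pattern)) != 0)
			return 1;
	}

	return 0;
}
```

Whether the extra coverage is worth the cost of reading the full mapping in every subtest is a judgment call for the test author.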
[PATCH i-g-t 1/5] prime_mmap: Add new test for calling mmap() on dma-buf fds
From: Rob Bradford

This test has the following subtests:
- test_correct checks the correctness of the data
- test_map_unmap checks for mapping idempotency
- test_reprime checks for dma-buf creation idempotency
- test_forked checks for multiprocess access
- test_refcounting checks for buffer reference counting
- test_dup checks that dup()ing the fd works
- test_errors checks the error return values for failures
- test_aperture_limit tests multiple buffer creation at the gtt aperture limit

Signed-off-by: Rob Bradford
Signed-off-by: Tiago Vignatti
---
 tests/Makefile.sources |   1 +
 tests/prime_mmap.c     | 383 +
 2 files changed, 384 insertions(+)
 create mode 100644 tests/prime_mmap.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index c94714b..5b2072e 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -90,6 +90,7 @@ TESTS_progs_M = \
 	pm_rps \
 	pm_rc6_residency \
 	pm_sseu \
+	prime_mmap \
 	prime_self_import \
 	template \
 	$(NULL)
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
new file mode 100644
index 000..4dc2055
--- /dev/null
+++ b/tests/prime_mmap.c
@@ -0,0 +1,383 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Rob Bradford + * + */ + +/* + * Testcase: Check whether mmap()ing dma-buf works + */ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "drm.h" +#include "i915_drm.h" +#include "drmtest.h" +#include "igt_debugfs.h" +#include "ioctl_wrappers.h" + +#define BO_SIZE (16*1024) + +static int fd; + +char pattern[] = {0xff, 0x00, 0x00, 0x00, + 0x00, 0xff, 0x00, 0x00, + 0x00, 0x00, 0xff, 0x00, + 0x00, 0x00, 0x00, 0xff}; + +static void +fill_bo(uint32_t handle, size_t size) +{ + off_t i; + for (i = 0; i < size; i+=sizeof(pattern)) + { + gem_write(fd, handle, i, pattern, sizeof(pattern)); + } +} + +static int +pattern_check(char *ptr, size_t size) +{ + off_t i; + for (i = 0; i < size; i+=sizeof(pattern)) + { + if (memcmp(ptr, pattern, sizeof(pattern)) != 0) + return 1; + } + + return 0; +} + +static void +test_correct(void) +{ + int dma_buf_fd; + char *ptr1, *ptr2; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + /* Check correctness vs GEM_MMAP_GTT */ + ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ | PROT_WRITE); + ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr1 != MAP_FAILED); + igt_assert(ptr2 != MAP_FAILED); + igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0); + + /* Check pattern correctness */ + igt_assert(pattern_check(ptr2, BO_SIZE) == 0); + + munmap(ptr1, BO_SIZE); + munmap(ptr2, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +static void +test_map_unmap(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, 
BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(pattern_check(ptr, BO_SIZE) == 0); + + /* Unmap and remap */ + munmap(ptr, BO_SIZE); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(pattern_check(p
[PATCH 4/4] drm/i915: Use CPU mapping for userspace dma-buf mmap()
Userspace is in charge of flushing the CPU caches by wrapping its accesses to the mmap with begin{,end}_cpu_access.

v2: Remove the LLC check because we have dma-buf sync providers now. Also, return before transferring ownership when mmap fails.
v3: Fix return values.

Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 8447ba4..8b87c86 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_n
 
 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
 {
-	return -EINVAL;
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	int ret;
+
+	if (obj->base.size < vma->vm_end - vma->vm_start)
+		return -EINVAL;
+
+	if (WARN_ON(!obj->base.filp))
+		return -ENODEV;
+
+	ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+	if (ret)
+		return ret;
+
+	fput(vma->vm_file);
+	vma->vm_file = get_file(obj->base.filp);
+
+	return 0;
 }
 
 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction)
-- 
2.1.0
[PATCH 3/4] drm/i915: Implement end_cpu_access
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index e9c2bfd..8447ba4 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -212,6 +212,15 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size
 	return ret;
 }
 
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction)
+{
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	bool write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
+
+	if (i915_gem_object_set_to_gtt_domain(obj, write))
+		DRM_ERROR("failed to set bo into the GTT\n");
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops = {
 	.map_dma_buf = i915_gem_map_dma_buf,
 	.unmap_dma_buf = i915_gem_unmap_dma_buf,
@@ -224,6 +233,7 @@ static const struct dma_buf_ops i915_dmabuf_ops = {
 	.vmap = i915_gem_dmabuf_vmap,
 	.vunmap = i915_gem_dmabuf_vunmap,
 	.begin_cpu_access = i915_gem_begin_cpu_access,
+	.end_cpu_access = i915_gem_end_cpu_access,
 };
 
 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
-- 
2.1.0
[PATCH 2/4] dma-buf: Add ioctls to allow userspace to flush
From: Daniel Vetter

FIXME: Update kerneldoc for begin/end to make it clear that those are for mmap too.

Open: Do we need a special indication that the begin/end is from userspace mmap and not from kernel mmap? There's also the question already about kernel internal users - vmap and kmap interfaces are much different ... We might need to add a mapping enum to the begin/end dma-buf functions.

v2 (Tiago): Fix header file type names (u64 -> __u64)

Signed-off-by: Daniel Vetter
Signed-off-by: Tiago Vignatti
---
 drivers/dma-buf/dma-buf.c    | 47 
 include/uapi/linux/dma-buf.h | 39 
 2 files changed, 86 insertions(+)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 155c146..4820d61 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include
 #include
 
+#include
+
 static inline int is_dma_buf_file(struct file *);
 
 struct dma_buf_list {
@@ -251,11 +253,56 @@ out:
 	return events;
 }
 
+static long dma_buf_ioctl(struct file *file,
+			  unsigned int cmd, unsigned long arg)
+{
+	struct dma_buf *dmabuf;
+	struct dma_buf_sync sync;
+	enum dma_data_direction direction;
+
+	dmabuf = file->private_data;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	switch (cmd) {
+	case DMA_BUF_IOCTL_SYNC:
+		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+			return -EFAULT;
+
+		if (sync.flags & DMA_BUF_SYNC_RW)
+			direction = DMA_BIDIRECTIONAL;
+		else if (sync.flags & DMA_BUF_SYNC_READ)
+			direction = DMA_FROM_DEVICE;
+		else if (sync.flags & DMA_BUF_SYNC_WRITE)
+			direction = DMA_TO_DEVICE;
+		else
+			return -EINVAL;
+
+		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+			return -EINVAL;
+
+		/* FIXME: Check for overflows in start/length. */
+
+		if (sync.flags & DMA_BUF_SYNC_END)
+			dma_buf_end_cpu_access(dmabuf, sync.start,
+					       sync.length, direction);
+		else
+			dma_buf_begin_cpu_access(dmabuf, sync.start,
+						 sync.length, direction);
+
+		return 0;
+	default:
+		return -ENOTTY;
+	}
+}
+
 static const struct file_operations dma_buf_fops = {
 	.release	= dma_buf_release,
 	.mmap		= dma_buf_mmap_internal,
 	.llseek		= dma_buf_llseek,
 	.poll		= dma_buf_poll,
+	.unlocked_ioctl	= dma_buf_ioctl,
 };
 
 /*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
new file mode 100644
index 000..e5327df
--- /dev/null
+++ b/include/uapi/linux/dma-buf.h
@@ -0,0 +1,39 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2015 Intel Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _DMA_BUF_UAPI_H_
+#define _DMA_BUF_UAPI_H_
+
+struct dma_buf_sync {
+	__u64 flags;
+	__u64 start;
+	__u64 length;
+};
+
+#define DMA_BUF_SYNC_READ	(1 << 0)
+#define DMA_BUF_SYNC_WRITE	(2 << 0)
+#define DMA_BUF_SYNC_RW		(3 << 0)
+#define DMA_BUF_SYNC_START	(0 << 2)
+#define DMA_BUF_SYNC_END	(1 << 2)
+#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
+	(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+
+#define DMA_BUF_BASE		'b'
+#define DMA_BUF_IOCTL_SYNC	_IOWR(DMA_BUF_BASE, 0, struct dma_buf_sync)
+
+#endif
-- 
2.1.0
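One detail of the flag decoding in dma_buf_ioctl() may be worth a second look: DMA_BUF_SYNC_RW is (3 << 0), so the test "sync.flags & DMA_BUF_SYNC_RW" is already non-zero when only the READ or only the WRITE bit is set, and the two else-if branches appear unreachable. The sketch below is a plain userspace model of that chain next to one possible alternative that compares the two-bit access field by value. All SKETCH_* names and both functions are local to this example; neither is kernel code nor a proposed fix from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the flag values proposed in this patch's uapi header. */
#define SKETCH_SYNC_READ	(1u << 0)
#define SKETCH_SYNC_WRITE	(2u << 0)
#define SKETCH_SYNC_RW		(3u << 0)
#define SKETCH_SYNC_END		(1u << 2)

/* Stand-in for enum dma_data_direction, for illustration only. */
enum sketch_direction {
	SKETCH_BIDIRECTIONAL,
	SKETCH_TO_DEVICE,
	SKETCH_FROM_DEVICE,
	SKETCH_INVALID,
};

/* Models the if/else chain as posted in dma_buf_ioctl(). */
static enum sketch_direction decode_as_posted(uint64_t flags)
{
	if (flags & SKETCH_SYNC_RW)
		return SKETCH_BIDIRECTIONAL;
	else if (flags & SKETCH_SYNC_READ)
		return SKETCH_FROM_DEVICE;
	else if (flags & SKETCH_SYNC_WRITE)
		return SKETCH_TO_DEVICE;
	return SKETCH_INVALID;
}

/* Alternative: treat bits 0-1 as a field and compare by value. */
static enum sketch_direction decode_by_value(uint64_t flags)
{
	switch (flags & SKETCH_SYNC_RW) {
	case SKETCH_SYNC_RW:
		return SKETCH_BIDIRECTIONAL;
	case SKETCH_SYNC_READ:
		return SKETCH_FROM_DEVICE;
	case SKETCH_SYNC_WRITE:
		return SKETCH_TO_DEVICE;
	default:
		return SKETCH_INVALID;
	}
}
```

With the chain as posted, a read-only sync is treated as bidirectional; that is conservative rather than incorrect, but it means the direction hint never reaches the exporter in its narrower forms.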
[PATCH 1/4] drm: prime: Honour O_RDWR during prime-handle-to-fd
From: Daniel Thompson

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except (DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace to mmap() the resulting dma-buf even when this is supported by the DRM driver.

It is trivial to relax the restriction and permit read/write access. This is safe because the flags are seldom touched by drm; mostly they are passed verbatim to dma_buf calls.

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Signed-off-by: Daniel Thompson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h      |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = {
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 				 struct drm_file *file_priv)
 {
 	struct drm_prime_handle *args = data;
-	uint32_t flags;
 
 	if (!drm_core_check_feature(dev, DRIVER_PRIME))
 		return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 		return -ENOSYS;
 
 	/* check flags are valid */
-	if (args->flags & ~DRM_CLOEXEC)
+	if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
 		return -EINVAL;
 
-	/* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-	flags = args->flags & DRM_CLOEXEC;
-
 	return dev->driver->prime_handle_to_fd(dev, file_priv,
-			args->handle, flags, &args->fd);
+			args->handle, args->flags, &args->fd);
 }
 
 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 3801584..ad8223e 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -668,6 +668,7 @@ struct drm_set_client_cap {
 	__u64 value;
 };
 
+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
 	__u32 handle;
-- 
2.1.0
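The relaxed check is a plain allow-list mask: any bit outside DRM_CLOEXEC | DRM_RDWR is rejected. A small userspace model of it, assuming (as the patch defines) that the two DRM flags are simply O_CLOEXEC and O_RDWR, might look like this. The SKETCH_* macros and prime_export_flags_valid() are names invented for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for O_CLOEXEC under strict ISO C modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

/* The patch defines DRM_CLOEXEC as O_CLOEXEC and DRM_RDWR as O_RDWR. */
#define SKETCH_DRM_CLOEXEC	O_CLOEXEC
#define SKETCH_DRM_RDWR		O_RDWR

/* Hypothetical helper: non-zero when flags would pass the patched check. */
static int prime_export_flags_valid(uint32_t flags)
{
	return (flags & ~(uint32_t)(SKETCH_DRM_CLOEXEC | SKETCH_DRM_RDWR)) == 0;
}
```

Note that the mask admits O_RDWR but not O_WRONLY, since O_WRONLY is a distinct value rather than a subset of the O_RDWR bits on Linux; a caller wanting a writable mapping would be expected to ask for DRM_RDWR.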
[PATCH v3] mmap on the dma-buf directly
Hi,

The idea is to create a GEM bo in one process and pass its prime handle to another process, which in turn uses the handle only to map and write. This could be useful for the Chrome OS architecture, where the Web content ("unprivileged process") maps and CPU-draws a buffer that was previously allocated in the GPU process ("privileged process").

In v2, I added a patch that Daniel kindly drafted to allow the unprivileged process to flush through a prime fd. In v3, I fixed a few concerns and added end_cpu_access to i915.

To validate all this I'm using igt, and I'm sending the tests for review now. Please take a look.

Tiago

Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (2):
  drm/i915: Implement end_cpu_access
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 drivers/dma-buf/dma-buf.c              | 47 ++
 drivers/gpu/drm/drm_prime.c            | 10 +++-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 28 +++-
 include/uapi/drm/drm.h                 |  1 +
 include/uapi/linux/dma-buf.h           | 39 
 5 files changed, 117 insertions(+), 8 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h

-- 
2.1.0
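For readers following along, the intended userspace flow in the unprivileged process (map the dma-buf fd, then bracket CPU writes with the sync ioctl) could be sketched roughly as below. This is an illustration only: the struct layout and ioctl number are local copies of what this series proposes (flags, start, length) and may not match any released kernel header, and dmabuf_sync() and cpu_fill() are hypothetical helpers, not part of igt or libdrm.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of the uapi proposed in this series; an assumption, not a released ABI. */
struct sketch_dma_buf_sync {
	uint64_t flags;
	uint64_t start;
	uint64_t length;
};

#define SKETCH_SYNC_WRITE	(2u << 0)
#define SKETCH_SYNC_START	(0u << 2)
#define SKETCH_SYNC_END		(1u << 2)
#define SKETCH_IOCTL_SYNC	_IOWR('b', 0, struct sketch_dma_buf_sync)

/* Hypothetical wrapper: issue one begin or end sync on a dma-buf fd. */
static int dmabuf_sync(int dmabuf_fd, uint64_t flags, uint64_t start, uint64_t length)
{
	struct sketch_dma_buf_sync sync = {
		.flags = flags,
		.start = start,
		.length = length,
	};

	return ioctl(dmabuf_fd, SKETCH_IOCTL_SYNC, &sync);
}

/*
 * Hypothetical helper: fill an already mmap()ed dma-buf from the CPU,
 * bracketing the write with begin/end so the exporter can flush.
 * Returns 0 on success, -1 (with errno set by ioctl) on failure.
 */
static int cpu_fill(int dmabuf_fd, void *map, size_t len, int value)
{
	if (dmabuf_sync(dmabuf_fd, SKETCH_SYNC_WRITE | SKETCH_SYNC_START, 0, len))
		return -1;

	memset(map, value, len);

	return dmabuf_sync(dmabuf_fd, SKETCH_SYNC_WRITE | SKETCH_SYNC_END, 0, len);
}
```

In real use, map would come from mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, dmabuf_fd, 0) on an fd exported with DRM_RDWR; the sketch refuses to touch the buffer if the begin call fails.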
[PATCH 3/3] drm/i915: Use CPU mapping for userspace dma-buf mmap()
Userspace is in charge of flushing the CPU caches by wrapping its accesses to the mmap with begin{,end}_cpu_access.

v2: Remove the LLC check because we have dma-buf sync providers now. Also, return before transferring ownership when mmap fails.

Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index e9c2bfd..b608f67 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_n
 
 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
 {
-	return -EINVAL;
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	int ret;
+
+	if (obj->base.size < vma->vm_end - vma->vm_start)
+		return -EINVAL;
+
+	if (WARN_ON(!obj->base.filp))
+		return -EINVAL;
+
+	ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+	if (IS_ERR_VALUE(ret))
+		return -EINVAL;
+
+	fput(vma->vm_file);
+	vma->vm_file = get_file(obj->base.filp);
+
+	return ret;
 }
 
 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction)
-- 
2.1.0
[PATCH v2 0/3] mmap on the dma-buf directly
Hi,

I've tested these patches (in drm-intel-nightly, but also in the CrOS kernel v3.14) and they seem to be just enough for what we want to do: the idea is to create a GEM bo in one process and pass its prime handle to another process, which in turn uses the handle only to map and write. This could be useful for the Chrome OS architecture, where the Web content ("unprivileged process") maps and CPU-draws a buffer that was previously allocated in the GPU process ("privileged process").

In v2, I added a patch that Daniel kindly drafted to allow the unprivileged process to flush through a prime fd. To validate it I've built a test on top of igt's kms_pwrite_crc (which I'll be sending next for review). Let me know your concerns.

Tiago

Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (1):
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 drivers/dma-buf/dma-buf.c              | 47 ++
 drivers/gpu/drm/drm_prime.c            | 10 +++-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 -
 include/uapi/drm/drm.h                 |  1 +
 include/uapi/linux/dma-buf.h           | 39 
 5 files changed, 107 insertions(+), 8 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h

-- 
2.1.0
[Intel-gfx] [PATCH 1/2] drm/i915: Use CPU mapping for userspace dma-buf mmap()
On 08/05/2015 04:08 AM, Daniel Vetter wrote:
> On Tue, Aug 04, 2015 at 06:30:25PM -0300, Tiago Vignatti wrote:
> Nah they don't have to be equal since the problem isn't that nothing goes
> out to memory where the display can see it, but usually only parts of it.
> I.e. you need to change your test to
> - draw black screen (it starts that way so nothing to do really), grab crc
> - draw white screen and make sure you flush correctly, don't bother with
>   crc (we can't test for inequality because collisions are too easy)
> - draw black screen again without flushing, grab crc
>
> Then assert that your two crc will be inequal (which they shouldn't be
> because some cachelines will still be stuck). Maybe also add a delay
> somewhere so you can see the cacheline dirt pattern, it's very
> characteristic.

Cool, I've got it now. The test below dirties the cachelines, so they need to be flushed correctly; I'll work on that now. Should we add this kind of test somewhere in igt, BTW?

PS: I had an issue with the original kms_pwrite_crc, which fails frequently. Paulo helped, though, and showed me that pwrite is currently broken:

https://bugs.freedesktop.org/show_bug.cgi?id=86422

Tiago

diff --git a/tests/kms_pwrite_crc.c b/tests/kms_pwrite_crc.c
index 05b9e38..419b46d 100644
--- a/tests/kms_pwrite_crc.c
+++ b/tests/kms_pwrite_crc.c
@@ -50,6 +50,20 @@ typedef struct {
 	uint32_t devid;
 } data_t;
 
+static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
+{
+	int dma_buf_fd;
+	char *ptr = NULL;
+
+	dma_buf_fd = prime_handle_to_fd(drm_fd, fb->gem_handle);
+	igt_assert(errno == 0);
+
+	ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr != MAP_FAILED);
+
+	return ptr;
+}
+
 static void test(data_t *data)
 {
 	igt_display_t *display = &data->display;
@@ -57,6 +71,7 @@ static void test(data_t *data)
 	struct igt_fb *fb = &data->fb[1];
 	drmModeModeInfo *mode;
 	cairo_t *cr;
+	char *ptr;
 	uint32_t caching;
 	void *buf;
 	igt_crc_t crc;
@@ -67,6 +82,8 @@ static void test(data_t *data)
 	igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay,
 		      DRM_FORMAT_XRGB, LOCAL_DRM_FORMAT_MOD_NONE, fb);
 
+	ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb);
+
 	cr = igt_get_cairo_ctx(data->drm_fd, fb);
 	igt_paint_test_pattern(cr, fb->width, fb->height);
 	cairo_destroy(cr);
@@ -83,11 +100,11 @@ static void test(data_t *data)
 	caching = gem_get_caching(data->drm_fd, fb->gem_handle);
 	igt_assert(caching == I915_CACHING_NONE || caching == I915_CACHING_DISPLAY);
 
-	/* use pwrite to make the other fb all white too */
+	/* use dmabuf pointer to make the other fb all white too */
 	buf = malloc(fb->size);
 	igt_assert(buf != NULL);
 	memset(buf, 0xff, fb->size);
-	gem_write(data->drm_fd, fb->gem_handle, 0, buf, fb->size);
+	memcpy(ptr, buf, fb->size);
 	free(buf);
 
 	/* and flip to it */
[Intel-gfx] [PATCH 1/2] drm/i915: Use CPU mapping for userspace dma-buf mmap()
On 07/31/2015 06:02 PM, Chris Wilson wrote:
>
> The first problem is that llc does not guarantee that the buffer is
> cache coherent with all aspects of the GPU. For scanout and similar
> writes need to be WC.
>
> if (obj->has_framebuffer_references) would at least catch where the fb
> is made before the mmap.
>
> Equally this buffer could then be shared with other devices and exposing
> a CPU mmap to userspace (and no flush/set-domain protocol) will result in
> corruption.

I've built an igt test to catch this corruption, but it isn't really failing on my IvyBridge. If what you described is right (and what I coded matches it), then this test should write to the mapped buffer without the screen being updated. Any idea what's going on?

https://github.com/tiagovignatti/intel-gpu-tools/commit/3e130ac2b274f1a3f6889c78cb72d0673ca2.patch

From 3e130ac2b274f1a3f68855559c78cb72d0673ca2 Mon Sep 17 00:00:00 2001
From: Tiago Vignatti
Date: Tue, 4 Aug 2015 13:38:09 -0300
Subject: [PATCH] tests: Add prime_crc for cache coherency

This program can be used to detect when writes don't land in scanout due to cache incoherency.
Run it like ./prime_crc --interactive-debug=crc Signed-off-by: Tiago Vignatti --- tests/.gitignore | 1 + tests/Makefile.sources | 1 + tests/prime_crc.c | 201 + 3 files changed, 203 insertions(+) create mode 100644 tests/prime_crc.c diff --git a/tests/.gitignore b/tests/.gitignore index 5bc4a58..96dbf57 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -160,6 +160,7 @@ pm_rc6_residency pm_rpm pm_rps pm_sseu +prime_crc prime_nv_api prime_nv_pcopy prime_nv_test diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 5b2072e..c05b5a7 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -90,6 +90,7 @@ TESTS_progs_M = \ pm_rps \ pm_rc6_residency \ pm_sseu \ + prime_crc \ prime_mmap \ prime_self_import \ template \ diff --git a/tests/prime_crc.c b/tests/prime_crc.c new file mode 100644 index 000..3474cc9 --- /dev/null +++ b/tests/prime_crc.c @@ -0,0 +1,201 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Tiago Vignatti + * + */ + +/* This program can detect when the writes don't land in scanout, due cache + * incoherency. */ + +#include "drmtest.h" +#include "igt_debugfs.h" +#include "igt_kms.h" + +#define MAX_CONNECTORS 32 + +struct modeset_params { + uint32_t crtc_id; + uint32_t connector_id; + drmModeModeInfoPtr mode; +}; + +int drm_fd; +drmModeResPtr drm_res; +drmModeConnectorPtr drm_connectors[MAX_CONNECTORS]; +drm_intel_bufmgr *bufmgr; +igt_pipe_crc_t *pipe_crc; + +struct modeset_params ms; + +static void find_modeset_params(void) +{ + int i; + uint32_t connector_id = 0, crtc_id; + drmModeModeInfoPtr mode = NULL; + + for (i = 0; i < drm_res->count_connectors; i++) { + drmModeConnectorPtr c = drm_connectors[i]; + + if (c->count_modes) { + connector_id = c->connector_id; + mode = &c->modes[0]; + break; + } + } + igt_require(connector_id); + + crtc_id = drm_res->crtcs[0]; + igt_assert(crtc_id); + igt_assert(mode); + + ms.connector_id = connector_id; + ms.crtc_id = crtc_id; + ms.mode = mode; + +} + +#define BO_SIZE (16*1024) + +char pattern[] = {0xff, 0x00, 0x00, 0x00, + 0x00, 0xff, 0x00, 0x00, + 0x00, 0x00, 0xff, 0x00, + 0x00, 0x00, 0x00, 0xff}; + +static void mess_with_coherency(char *ptr) +{ + off_t i; + + for (i = 0
[PATCH 2/2] drm: prime: Honour O_RDWR during prime-handle-to-fd
From: Daniel Thompson

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Signed-off-by: Daniel Thompson
Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h      |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = {
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 				 struct drm_file *file_priv)
 {
 	struct drm_prime_handle *args = data;
-	uint32_t flags;
 
 	if (!drm_core_check_feature(dev, DRIVER_PRIME))
 		return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 		return -ENOSYS;
 
 	/* check flags are valid */
-	if (args->flags & ~DRM_CLOEXEC)
+	if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
 		return -EINVAL;
 
-	/* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-	flags = args->flags & DRM_CLOEXEC;
-
 	return dev->driver->prime_handle_to_fd(dev, file_priv,
-			args->handle, flags, &args->fd);
+			args->handle, args->flags, &args->fd);
 }
 
 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 3801584..ad8223e 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -668,6 +668,7 @@ struct drm_set_client_cap {
 	__u64 value;
 };
 
+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
 	__u32 handle;
-- 
2.1.0
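As a reading aid for the flag check in this patch, here is a minimal userspace sketch of the mask test. It is not kernel code: the helper name prime_export_flags_ok() is hypothetical, and the two macros are redefined locally only to mirror the uapi definitions shown in the diff.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the uapi definitions touched by the patch. */
#ifndef DRM_CLOEXEC
#define DRM_CLOEXEC O_CLOEXEC
#endif
#ifndef DRM_RDWR
#define DRM_RDWR O_RDWR
#endif

/*
 * Hypothetical helper: returns true when `flags` contains nothing outside
 * DRM_CLOEXEC | DRM_RDWR, i.e. when the patched ioctl would not bail out
 * with -EINVAL at the "check flags are valid" step.
 */
static bool prime_export_flags_ok(uint32_t flags)
{
	return (flags & ~(uint32_t)(DRM_CLOEXEC | DRM_RDWR)) == 0;
}
```

With the old mask (~DRM_CLOEXEC) a request carrying DRM_RDWR would have been rejected; with the new mask it is accepted and passed through unchanged to the driver's prime_handle_to_fd callback.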
[PATCH 1/2] drm/i915: Use CPU mapping for userspace dma-buf mmap()
For now we opt out devices that lack an LLC CPU cache (mostly "Atom"
devices). Alternatively, we could build a path to mmap them through GTT WC
(and accept that reads will be dead slow). Another option, more complex I
believe, would be to set up dma-buf ioctls that let userspace flush,
manually controlling synchronization via {begin,end}_cpu_access.

Signed-off-by: Tiago Vignatti
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index e9c2bfd..e6cb402 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,26 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_n
 
 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
 {
-	return -EINVAL;
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	struct drm_device *dev = obj->base.dev;
+	int ret;
+
+	if (obj->base.size < vma->vm_end - vma->vm_start)
+		return -EINVAL;
+
+	/* On non-LLC machines we'd need to be careful cause CPU and GPU don't
+	 * share the CPU's L3 cache and coherency may hurt when CPU mapping. */
+	if (!HAS_LLC(dev))
+		return -EINVAL;
+
+	if (!obj->base.filp)
+		return -EINVAL;
+
+	ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+	fput(vma->vm_file);
+	vma->vm_file = get_file(obj->base.filp);
+
+	return ret;
 }
 
 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction)
-- 
2.1.0
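The three guards at the top of the new i915_gem_dmabuf_mmap() can be read as a single precondition. The sketch below restates them as a plain userspace function so the order and outcome of the checks are easy to follow. The function name and its boolean parameters are hypothetical stand-ins for HAS_LLC(dev) and obj->base.filp; it does not perform any mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical restatement of the checks in the patch:
 *   1. the requested range [vm_start, vm_end) must fit in the object,
 *   2. the machine must have an LLC (stand-in for HAS_LLC(dev)),
 *   3. the object must have a backing file (stand-in for obj->base.filp).
 * Returns 0 when the mmap would be forwarded, -EINVAL otherwise.
 */
static int dmabuf_mmap_precheck(size_t obj_size, unsigned long vm_start,
				unsigned long vm_end, bool has_llc,
				bool has_backing_file)
{
	if (obj_size < vm_end - vm_start)
		return -EINVAL;

	if (!has_llc)
		return -EINVAL;

	if (!has_backing_file)
		return -EINVAL;

	return 0;
}
```

For a 16 KiB object, a request covering 0x1000..0x5000 passes the size check, while 0x1000..0x6000 is rejected because it is larger than the object.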
[PATCH 0/2] mmap on the dma-buf directly
Hi,

I've tested these two patches (in drm-intel-nightly, and also in the CrOS
kernel v3.14) and they seem to be just enough for what we want to do: the
idea is to create a GEM bo in one process and pass its prime handle to
another process, which in turn uses the handle only to map and write. This
could be useful for the Chrome OS architecture, where the Web content
("unprivileged process") maps and CPU-draws a buffer that was previously
allocated in the GPU process ("privileged process").

I'm mostly using a modified igt to test these things. PTAL here:
https://github.com/tiagovignatti/intel-gpu-tools/commits/prime_mmap

Thank you,

Tiago

Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Tiago Vignatti (1):
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 drivers/gpu/drm/drm_prime.c            | 10 +++---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 21 -
 include/uapi/drm/drm.h                 |  1 +
 3 files changed, 24 insertions(+), 8 deletions(-)

-- 
2.1.0
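The cover letter does not say how the exported descriptor reaches the second process. One common mechanism on Linux is to send the file descriptor over an AF_UNIX socket with an SCM_RIGHTS control message, so the sketch below assumes that. The helper names send_fd() and recv_fd() are hypothetical, and any descriptor can stand in for the dma-buf fd.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send `fd` over the connected AF_UNIX socket `sock`; 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of regular data must accompany the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns the new fd, or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

In the scenario described above, the GPU process would call send_fd() with the descriptor returned by the prime-handle-to-fd export, and the content process would call recv_fd() and then mmap() the result. Whether that mmap() succeeds depends on the export flags and driver support discussed in the two patches.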
[PATCH resent twice 2/3] vgaarb: use MIT license
Signed-off-by: Tiago Vignatti
Cc: Henry Zhao
---
Jesse and Dave, this has been sent twice already and no one has said
anything. Please pull it. Oracle's Henry Zhao is already using it in Solaris
and, although all authors have agreed, we haven't changed the license yet.

 drivers/gpu/vga/vgaarb.c | 26 +++---
 include/linux/vgaarb.h   | 21 +
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
index 290b0cc..b87569e 100644
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -1,12 +1,32 @@
 /*
- * vgaarb.c
+ * vgaarb.c: Implements the VGA arbitration. For details refer to
+ * Documentation/vgaarbiter.txt
+ *
  *
  * (C) Copyright 2005 Benjamin Herrenschmidt
  * (C) Copyright 2007 Paulo R. Zanoni
  * (C) Copyright 2007, 2009 Tiago Vignatti
  *
- * Implements the VGA arbitration. For details refer to
- * Documentation/vgaarbiter.txt
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS
+ * IN THE SOFTWARE.
+ *
  */
 
 #include
diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h
index 2dfaa29..c9a9759 100644
--- a/include/linux/vgaarb.h
+++ b/include/linux/vgaarb.h
@@ -5,6 +5,27 @@
  * (C) Copyright 2005 Benjamin Herrenschmidt
  * (C) Copyright 2007 Paulo R. Zanoni
  * (C) Copyright 2007, 2009 Tiago Vignatti
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS
+ * IN THE SOFTWARE.
+ *
  */
 
 #ifndef LINUX_VGA_H
-- 
1.6.0.4
[PATCH 1/3] vgaarb: convert pr_devel() to pr_debug()
We want to be able to use CONFIG_DYNAMIC_DEBUG in the arbiter code, so switch
the few existing pr_devel() calls to pr_debug(). Also add one more debug
message reporting the decoding count.

Signed-off-by: Tiago Vignatti
---
 drivers/gpu/vga/vgaarb.c | 35 ++-
 1 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
index 441e38c..290b0cc 100644
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -155,8 +155,8 @@ static struct vga_device *__vga_tryget(struct vga_device *vgadev,
 	    (vgadev->decodes & VGA_RSRC_LEGACY_MEM))
 		rsrc |= VGA_RSRC_LEGACY_MEM;
 
-	pr_devel("%s: %d\n", __func__, rsrc);
-	pr_devel("%s: owns: %d\n", __func__, vgadev->owns);
+	pr_debug("%s: %d\n", __func__, rsrc);
+	pr_debug("%s: owns: %d\n", __func__, vgadev->owns);
 
 	/* Check what resources we need to acquire */
 	wants = rsrc & ~vgadev->owns;
@@ -268,7 +268,7 @@ static void __vga_put(struct vga_device *vgadev, unsigned int rsrc)
 {
 	unsigned int old_locks = vgadev->locks;
 
-	pr_devel("%s\n", __func__);
+	pr_debug("%s\n", __func__);
 
 	/* Update our counters, and account for equivalent legacy resources
 	 * if we decode them
@@ -575,6 +575,7 @@ static inline void vga_update_device_decodes(struct vga_device *vgadev,
 		else
 			vga_decode_count--;
 	}
+	pr_debug("vgaarb: decoding count now is: %d\n", vga_decode_count);
 }
 
 void __vga_set_legacy_decoding(struct pci_dev *pdev, unsigned int decodes, bool userspace)
@@ -831,7 +832,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
 		curr_pos += 5;
 		remaining -= 5;
 
-		pr_devel("client 0x%p called 'lock'\n", priv);
+		pr_debug("client 0x%p called 'lock'\n", priv);
 
 		if (!vga_str_to_iostate(curr_pos, remaining, &io_state)) {
 			ret_val = -EPROTO;
@@ -867,7 +868,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
 		curr_pos += 7;
 		remaining -= 7;
 
-		pr_devel("client 0x%p called 'unlock'\n", priv);
+		pr_debug("client 0x%p called 'unlock'\n", priv);
 
 		if (strncmp(curr_pos, "all", 3) == 0)
 			io_state = VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
@@ -917,7 +918,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
 		curr_pos += 8;
 		remaining -= 8;
 
-		pr_devel("client 0x%p called 'trylock'\n", priv);
+		pr_debug("client 0x%p called 'trylock'\n", priv);
 
 		if (!vga_str_to_iostate(curr_pos, remaining, &io_state)) {
 			ret_val = -EPROTO;
@@ -961,7 +962,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
 
 		curr_pos += 7;
 		remaining -= 7;
-		pr_devel("client 0x%p called 'target'\n", priv);
+		pr_debug("client 0x%p called 'target'\n", priv);
 		/* if target is default */
 		if (!strncmp(curr_pos, "default", 7))
 			pdev = pci_dev_get(vga_default_device());
@@ -971,11 +972,11 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
 				ret_val = -EPROTO;
 				goto done;
 			}
-			pr_devel("vgaarb: %s ==> %x:%x:%x.%x\n", curr_pos,
+			pr_debug("vgaarb: %s ==> %x:%x:%x.%x\n", curr_pos,
 				domain, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
 
 			pbus = pci_find_bus(domain, bus);
-			pr_devel("vgaarb: pbus %p\n", pbus);
+			pr_debug("vgaarb: pbus %p\n", pbus);
 			if (pbus == NULL) {
 				pr_err("vgaarb: invalid PCI domain and/or bus address %x:%x\n",
 					domain, bus);
@@ -983,7 +984,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
 				goto done;
 			}
 			pdev = pci_get_slot(pbus, devfn);
-			pr_devel("vgaarb: pdev %p\n", pdev);
+			pr_debug("vgaarb: pdev %p\n", pdev);
 			if (!pdev) {
 				pr_err("vgaarb: invalid PCI address %x:%x\n