[PATCH] dma-buf: Update docs for SYNC ioctl

2016-03-29 Thread Tiago Vignatti
On 03/29/2016 06:47 AM, David Herrmann wrote:
> Hi
>
> On Mon, Mar 28, 2016 at 9:42 PM, Tiago Vignatti
>  wrote:
>> Do we have an agreement here after all? David? I need to know whether this
>> fixup is okay to go because I'll need to submit it to Chrome OS then.
>
> Sure it is fine. The code is already there, we cannot change it.

Ah, true. Only now I've noticed it's already in Linus' tree. Thanks
anyway!

Tiago



[PATCH] dma-buf: Update docs for SYNC ioctl

2016-03-28 Thread Tiago Vignatti
On 03/23/2016 12:42 PM, Chris Wilson wrote:
> On Wed, Mar 23, 2016 at 04:32:59PM +0100, David Herrmann wrote:
>> Hi
>>
>> On Wed, Mar 23, 2016 at 12:56 PM, Chris Wilson  
>> wrote:
>>> On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote:
>>>> My question was rather about why we do this? Semantics for EINTR are
>>>> well defined, and with SA_RESTART (default on linux) user-space can
>>>> ignore it. However, looping on EAGAIN is very uncommon, and it is not
>>>> at all clear why it is needed?
>>>>
>>>> Returning an error to user-space makes sense if user-space has a
>>>> reason to react to it. I fail to see how EAGAIN on a cache-flush/sync
>>>> operation helps user-space at all? As someone without insight into the
>>>> driver implementation, it is hard to tell why.. Any hints?
>>>
>>> The reason we return EAGAIN is to workaround a deadlock we face when
>>> blocking on the GPU holding the struct_mutex (inside the client's
>>> process), but the GPU is dead. As our locking is very, very coarse we
>>> cannot restart the GPU without acquiring the struct_mutex currently held by
>>> the client, so we wake the client up and tell them the resource they are
>>> waiting on (the flush of the object from the GPU into the CPU domain) is
>>> temporarily unavailable. If they try to immediately wait upon the ioctl
>>> again, they are blocked waiting for the reset to occur before they may
>>> complete their flush. There are a few other possible deadlocks that are
>>> also avoided with EAGAIN (again, the issue is more or less the lack of
>>> fine grained locking).
>>
>> ...so you hijacked EAGAIN for all DRM ioctls just for a driver
>> workaround?
>
> No, we utilized the fact that EAGAIN was already enshrined by libdrm as
> the de facto mechanism for repeating the ioctl in order to repeat the
> ioctl for a driver workaround.
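
For reference, the restart loop userspace is expected to implement around the
sync ioctl looks roughly like this (a minimal sketch, assuming the uapi names
from the patch further down; drmIoctl() in libdrm does the same EINTR/EAGAIN
retry for DRM ioctls):

#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Retry DMA_BUF_IOCTL_SYNC on EINTR/EAGAIN, mirroring what drmIoctl()
 * does for DRM ioctls. Returns 0 on success, -1 with errno set. */
static int dma_buf_sync(int dma_buf_fd, __u64 flags)
{
	struct dma_buf_sync sync;
	int ret;

	memset(&sync, 0, sizeof(sync));
	sync.flags = flags;

	do {
		ret = ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}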

Do we have an agreement here after all? David? I need to know whether
this fixup is okay to go because I'll need to submit it to Chrome OS then.

Best Regards,

Tiago



[PATCH] dma-buf: Update docs for SYNC ioctl

2016-03-21 Thread Tiago Vignatti
On 03/21/2016 04:51 AM, Daniel Vetter wrote:
> Just a bit of wording polish plus mentioning that it can fail and must
> be restarted.
>
> Requested by Sumit.
>
> v2: Fix them typos (Hans).
>
> Cc: Chris Wilson 
> Cc: Tiago Vignatti 
> Cc: Stéphane Marchesin 
> Cc: David Herrmann 
> Cc: Sumit Semwal 
> Cc: Daniel Vetter 
> CC: linux-media at vger.kernel.org
> Cc: dri-devel at lists.freedesktop.org
> Cc: linaro-mm-sig at lists.linaro.org
> Cc: intel-gfx at lists.freedesktop.org
> Cc: devel at driverdev.osuosl.org
> Cc: Hans Verkuil 
> Acked-by: Sumit Semwal 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Tiago Vignatti 

Best regards,

Tiago


> ---
>   Documentation/dma-buf-sharing.txt | 11 ++-
>   drivers/dma-buf/dma-buf.c |  2 +-
>   2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
> index 32ac32e773e1..ca44c5820585 100644
> --- a/Documentation/dma-buf-sharing.txt
> +++ b/Documentation/dma-buf-sharing.txt
> @@ -352,7 +352,8 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
>
>  No special interfaces, userspace simply calls mmap on the dma-buf fd, making
>  sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
> -   used when the access happens. This is discussed next paragraphs.
> +   used when the access happens. Note that DMA_BUF_IOCTL_SYNC can fail with
> +   -EAGAIN or -EINTR, in which case it must be restarted.
>
>  Some systems might need some sort of cache coherency management e.g. when
>  CPU and GPU domains are being accessed through dma-buf at the same time. To
> @@ -366,10 +367,10 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
>  want (with the new data being consumed by the GPU or say scanout device)
>- munmap once you don't need the buffer any more
>
> -Therefore, for correctness and optimal performance, systems with the memory
> -cache shared by the GPU and CPU i.e. the "coherent" and also the
> -"incoherent" are always required to use SYNC_START and SYNC_END before and
> -after, respectively, when accessing the mapped address.
> +For correctness and optimal performance, it is always required to use
> +SYNC_START and SYNC_END before and after, respectively, when accessing the
> +mapped address. Userspace cannot rely on coherent access, even when there
> +are systems where it just works without calling these ioctls.
>
>   2. Supporting existing mmap interfaces in importers
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 774a60f4309a..4a2c07ee6677 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -612,7 +612,7 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
>* @dmabuf: [in]buffer to complete cpu access for.
>* @direction:  [in]length of range for cpu access.
>*
> - * This call must always succeed.
> + * Can return negative error values, returns 0 on success.
>*/
>   int dma_buf_end_cpu_access(struct dma_buf *dmabuf,
>  enum dma_data_direction direction)
>
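
Put together, the access pattern the updated documentation requires looks
roughly like this (a minimal sketch reusing the dma_buf_sync() retry helper
sketched earlier; dma_buf_fd and size come from the exporter):

#include <string.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

/* One CPU write cycle into a mapped dma-buf, bracketed by the sync
 * ioctls as required even on systems where the access happens to be
 * coherent without them. */
static int cpu_fill_dmabuf(int dma_buf_fd, size_t size)
{
	int ret;
	void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
			 dma_buf_fd, 0);

	if (ptr == MAP_FAILED)
		return -1;

	ret = dma_buf_sync(dma_buf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE);
	if (ret == 0) {
		memset(ptr, 0x11, size);	/* the actual CPU access */
		ret = dma_buf_sync(dma_buf_fd,
				   DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
	}

	munmap(ptr, size);
	return ret;
}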



[PATCH] dma-buf,drm,ion: Propagate error code from dma_buf_start_cpu_access()

2016-03-21 Thread Tiago Vignatti
On 03/18/2016 05:02 PM, Chris Wilson wrote:
> Drivers, especially i915.ko, can fail during the initial migration of a
> dma-buf for CPU access. However, the error code from the driver was not
> being propagated back to ioctl and so userspace was blissfully ignorant
> of the failure. Rendering corruption ensues.
>
> Whilst fixing the ioctl to return the error code from
> dma_buf_start_cpu_access(), also do the same for
> dma_buf_end_cpu_access().  For most drivers, dma_buf_end_cpu_access()
> cannot fail. i915.ko however, as most drivers would, wants to avoid being
> uninterruptible (as would be required to guarantee no failure when
> flushing the buffer to the device). As userspace already has to handle
> errors from the SYNC_IOCTL, take advantage of this to be able to restart
> the syscall across signals.
>
> This fixes a coherency issue for i915.ko as well as reducing the
> uninterruptible hold upon its BKL, the struct_mutex.
>
> Fixes commit c11e391da2a8fe973c3c2398452000bed505851e
> Author: Daniel Vetter 
> Date:   Thu Feb 11 20:04:51 2016 -0200
>
>  dma-buf: Add ioctls to allow userspace to flush
>
> Testcase: igt/gem_concurrent_blit/*dmabuf*interruptible
> Testcase: igt/prime_mmap_coherency/ioctl-errors
> Signed-off-by: Chris Wilson 
> Cc: Tiago Vignatti 
> Cc: Stéphane Marchesin 
> Cc: David Herrmann 
> Cc: Sumit Semwal 
> Cc: Daniel Vetter 
> CC: linux-media at vger.kernel.org
> Cc: dri-devel at lists.freedesktop.org
> Cc: linaro-mm-sig at lists.linaro.org
> Cc: intel-gfx at lists.freedesktop.org
> Cc: devel at driverdev.osuosl.org
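
(The core of the fix in dma_buf_ioctl() then looks roughly like this, instead
of the version quoted further down that drops the return values:)

	case DMA_BUF_IOCTL_SYNC:
		...
		/* propagate the driver's error (e.g. -EINTR, or -EAGAIN from
		 * i915) back to userspace instead of discarding it */
		if (sync.flags & DMA_BUF_SYNC_END)
			ret = dma_buf_end_cpu_access(dmabuf, direction);
		else
			ret = dma_buf_begin_cpu_access(dmabuf, direction);

		return ret;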

Reviewed-by: Tiago Vignatti 

Best regards,

Tiago



[PATCH v3] prime_mmap_coherency: Add return error tests for prime sync ioctl

2016-03-18 Thread Tiago Vignatti
On 03/18/2016 03:11 PM, Daniel Vetter wrote:
> On Fri, Mar 18, 2016 at 03:08:56PM -0300, Tiago Vignatti wrote:
>> This patch adds the ioctl-errors subtest, used for exercising prime sync
>> ioctl errors.
>>
>> The subtest constantly interrupts, via signals, a function doing concurrent
>> blits to stress the correct usage of prime_sync_*, making sure these ioctl
>> errors are handled accordingly. Note that in case of a failure (e.g. an
>> ioctl that is not retried after returning an error) this test does not
>> catch the problem with 100% reliability.
>>
>> v2: fix prime sync direction when reading mmap'ed file.
>> v3: change the upper bound using time rather than loops
>>
>> Cc: Chris Wilson 
>> Signed-off-by: Tiago Vignatti 
>
> I'm probably blind, but where is the reviewed kernel patch for this? If
> it's somewhere hidden, please resubmit with all the whizzbang stuff needed
> for merging added ;-)
>
> Thanks, Daniel

You're not blind, Daniel :) Chris will be sending the kernel side, but
regardless this igt test should be good to go even without the kernel patch.

Tiago


[PATCH v3] prime_mmap_coherency: Add return error tests for prime sync ioctl

2016-03-18 Thread Tiago Vignatti
This patch adds the ioctl-errors subtest, used for exercising prime sync ioctl
errors.

The subtest constantly interrupts, via signals, a function doing concurrent
blits to stress the correct usage of prime_sync_*, making sure these ioctl
errors are handled accordingly. Note that in case of a failure (e.g. an ioctl
that is not retried after returning an error) this test does not catch the
problem with 100% reliability.

v2: fix prime sync direction when reading mmap'ed file.
v3: change the upper bound using time rather than loops

Cc: Chris Wilson 
Signed-off-by: Tiago Vignatti 
---
 tests/prime_mmap_coherency.c | 89 
 1 file changed, 89 insertions(+)

diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
index 180d8a4..d2b2a4f 100644
--- a/tests/prime_mmap_coherency.c
+++ b/tests/prime_mmap_coherency.c
@@ -180,6 +180,90 @@ static void test_write_flush(bool expect_stale_cache)
munmap(ptr_cpu, width * height);
 }

+static void blit_and_cmp(void)
+{
+   drm_intel_bo *bo_1;
+   drm_intel_bo *bo_2;
+   uint32_t *ptr_cpu;
+   uint32_t *ptr2_cpu;
+   int dma_buf_fd, dma_buf2_fd, i;
+   int local_fd;
+   drm_intel_bufmgr *local_bufmgr;
+   struct intel_batchbuffer *local_batch;
+
+   /* recreate process local variables */
+   local_fd = drm_open_driver(DRIVER_INTEL);
+   local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
+   igt_assert(local_bufmgr);
+
+   local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
+   igt_assert(local_batch);
+
+   bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
+   dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
+   igt_skip_on(errno == EINVAL);
+
+   ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+   MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr_cpu != MAP_FAILED);
+
+   bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
+   dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
+
+   ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+   MAP_SHARED, dma_buf2_fd, 0);
+   igt_assert(ptr2_cpu != MAP_FAILED);
+
+   /* Fill up BO 1 with '1's and BO 2 with '0's */
+   prime_sync_start(dma_buf_fd, true);
+   memset(ptr_cpu, 0x11, width * height);
+   prime_sync_end(dma_buf_fd, true);
+
+   prime_sync_start(dma_buf2_fd, true);
+   memset(ptr2_cpu, 0x00, width * height);
+   prime_sync_end(dma_buf2_fd, true);
+
+   /* Copy BO 1 into BO 2, using blitter. */
+   intel_copy_bo(local_batch, bo_2, bo_1, width * height);
+   usleep(0); /* let someone else claim the mutex */
+
+   /* Compare BOs. If prime_sync_* were executed properly, the caches
+* should be synced. */
+   prime_sync_start(dma_buf2_fd, false);
+   for (i = 0; i < (width * height) / 4; i++)
+   igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
+   prime_sync_end(dma_buf2_fd, false);
+
+   drm_intel_bo_unreference(bo_1);
+   drm_intel_bo_unreference(bo_2);
+   munmap(ptr_cpu, width * height);
+   munmap(ptr2_cpu, width * height);
+}
+
+/*
+ * Constantly interrupt concurrent blits to stress out prime_sync_* and make
+ * sure these ioctl errors are handled accordingly.
+ *
+ * Note that in case of a failure (e.g. an ioctl that is not retried after
+ * returning an error) this test does not catch the problem with 100%
+ * reliability.
+ */
+static void test_ioctl_errors(void)
+{
+   int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+
+   igt_fork_signal_helper();
+   for (int num_children = 1; num_children <= 8 * ncpus; num_children <<= 1) {
+   igt_fork(child, num_children) {
+   struct timespec start = {};
+   while (igt_seconds_elapsed(&start) <= num_children)
+   blit_and_cmp();
+   }
+   igt_waitchildren();
+   }
+   igt_stop_signal_helper();
+}
+
 int main(int argc, char **argv)
 {
int i;
@@ -235,6 +319,11 @@ int main(int argc, char **argv)
igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
}

+   igt_subtest("ioctl-errors") {
+   igt_info("exercising concurrent blit to get ioctl errors\n");
+   test_ioctl_errors();
+   }
+
igt_fixture {
intel_batchbuffer_free(batch);
drm_intel_bufmgr_destroy(bufmgr);
-- 
2.1.4



[PATCH v2] prime_mmap_coherency: Add return error tests for prime sync ioctl

2016-03-17 Thread Tiago Vignatti
This patch adds the ioctl-errors subtest, used for exercising prime sync ioctl
errors.

The subtest constantly interrupts, via signals, a function doing concurrent
blits to stress the correct usage of prime_sync_*, making sure these ioctl
errors are handled accordingly. Note that in case of a failure (e.g. an ioctl
that is not retried after returning an error) this test does not catch the
problem with 100% reliability.

v2: fix prime sync direction when reading mmap'ed file.

Cc: Chris Wilson 
Signed-off-by: Tiago Vignatti 
---
 tests/prime_mmap_coherency.c | 87 
 1 file changed, 87 insertions(+)

diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
index 180d8a4..80d1c1f 100644
--- a/tests/prime_mmap_coherency.c
+++ b/tests/prime_mmap_coherency.c
@@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache)
munmap(ptr_cpu, width * height);
 }

+static void blit_and_cmp(void)
+{
+   drm_intel_bo *bo_1;
+   drm_intel_bo *bo_2;
+   uint32_t *ptr_cpu;
+   uint32_t *ptr2_cpu;
+   int dma_buf_fd, dma_buf2_fd, i;
+   int local_fd;
+   drm_intel_bufmgr *local_bufmgr;
+   struct intel_batchbuffer *local_batch;
+
+   /* recreate process local variables */
+   local_fd = drm_open_driver(DRIVER_INTEL);
+   local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
+   igt_assert(local_bufmgr);
+
+   local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
+   igt_assert(local_batch);
+
+   bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
+   dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
+   igt_skip_on(errno == EINVAL);
+
+   ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+   MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr_cpu != MAP_FAILED);
+
+   bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
+   dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
+
+   ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+   MAP_SHARED, dma_buf2_fd, 0);
+   igt_assert(ptr2_cpu != MAP_FAILED);
+
+   /* Fill up BO 1 with '1's and BO 2 with '0's */
+   prime_sync_start(dma_buf_fd, true);
+   memset(ptr_cpu, 0x11, width * height);
+   prime_sync_end(dma_buf_fd, true);
+
+   prime_sync_start(dma_buf2_fd, true);
+   memset(ptr2_cpu, 0x00, width * height);
+   prime_sync_end(dma_buf2_fd, true);
+
+   /* Copy BO 1 into BO 2, using blitter. */
+   intel_copy_bo(local_batch, bo_2, bo_1, width * height);
+   usleep(0); /* let someone else claim the mutex */
+
+   /* Compare BOs. If prime_sync_* were executed properly, the caches
+* should be synced. */
+   prime_sync_start(dma_buf2_fd, false);
+   for (i = 0; i < (width * height) / 4; i++)
+   igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
+   prime_sync_end(dma_buf2_fd, false);
+
+   drm_intel_bo_unreference(bo_1);
+   drm_intel_bo_unreference(bo_2);
+   munmap(ptr_cpu, width * height);
+   munmap(ptr2_cpu, width * height);
+}
+
+/*
+ * Constantly interrupt concurrent blits to stress out prime_sync_* and make
+ * sure these ioctl errors are handled accordingly.
+ *
+ * Note that in case of a failure (e.g. an ioctl that is not retried after
+ * returning an error) this test does not catch the problem with 100%
+ * reliability.
+ */
+static void test_ioctl_errors(void)
+{
+   int i;
+   int num_children = 8 * sysconf(_SC_NPROCESSORS_ONLN);
+
+   igt_fork_signal_helper();
+   igt_fork(child, num_children) {
+   for (i = 0; i < ROUNDS; i++)
+   blit_and_cmp();
+   }
+   igt_waitchildren();
+   igt_stop_signal_helper();
+}
+
 int main(int argc, char **argv)
 {
int i;
@@ -235,6 +317,11 @@ int main(int argc, char **argv)
igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
}

+   igt_subtest("ioctl-errors") {
+   igt_info("exercising concurrent blit to get ioctl errors\n");
+   test_ioctl_errors();
+   }
+
igt_fixture {
intel_batchbuffer_free(batch);
drm_intel_bufmgr_destroy(bufmgr);
-- 
2.1.4



[PATCH] prime_mmap_coherency: Add return error tests for prime sync ioctl

2016-03-17 Thread Tiago Vignatti
On 03/17/2016 06:01 PM, Chris Wilson wrote:
> On Thu, Mar 17, 2016 at 03:18:03PM -0300, Tiago Vignatti wrote:
>> This patch adds the ioctl-errors subtest, used for exercising prime sync
>> ioctl errors.
>>
>> The subtest constantly interrupts, via signals, a function doing concurrent
>> blits to stress the correct usage of prime_sync_*, making sure these ioctl
>> errors are handled accordingly. Note that in case of a failure (e.g. an
>> ioctl that is not retried after returning an error) this test does not
>> catch the problem with 100% reliability.
>>
>> Cc: Chris Wilson 
>> Signed-off-by: Tiago Vignatti 
>> ---
>>
>> Chris, your unpolished dma-buf patch adding error returns to the ioctl
>> calls lgtm. Let me know if you think this kind of test is useful now in igt.
>>
>> Thanks
>>
>> Tiago
>>
>>   tests/prime_mmap_coherency.c | 87 
>> 
>>   1 file changed, 87 insertions(+)
>>
>> diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
>> index 180d8a4..bae2144 100644
>> --- a/tests/prime_mmap_coherency.c
>> +++ b/tests/prime_mmap_coherency.c
>> @@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache)
>>  munmap(ptr_cpu, width * height);
>>   }
>>
>> +static void blit_and_cmp(void)
>> +{
>> +drm_intel_bo *bo_1;
>> +drm_intel_bo *bo_2;
>> +uint32_t *ptr_cpu;
>> +uint32_t *ptr2_cpu;
>> +int dma_buf_fd, dma_buf2_fd, i;
>> +int local_fd;
>> +drm_intel_bufmgr *local_bufmgr;
>> +struct intel_batchbuffer *local_batch;
>> +
>> +/* recreate process local variables */
>> +local_fd = drm_open_driver(DRIVER_INTEL);
>> +local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
>> +igt_assert(local_bufmgr);
>> +
>> +local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
>> +igt_assert(local_batch);
>> +
>> +bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
>> +dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
>> +igt_skip_on(errno == EINVAL);
>> +
>> +ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
>> +MAP_SHARED, dma_buf_fd, 0);
>> +igt_assert(ptr_cpu != MAP_FAILED);
>> +
>> +bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
>> +dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
>> +
>> +ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
>> +MAP_SHARED, dma_buf2_fd, 0);
>> +igt_assert(ptr2_cpu != MAP_FAILED);
>> +
>> +/* Fill up BO 1 with '1's and BO 2 with '0's */
>> +prime_sync_start(dma_buf_fd, true);
>> +memset(ptr_cpu, 0x11, width * height);
>> +prime_sync_end(dma_buf_fd, true);
>> +
>> +prime_sync_start(dma_buf2_fd, true);
>> +memset(ptr2_cpu, 0x00, width * height);
>> +prime_sync_end(dma_buf2_fd, true);
>> +
>> +/* Copy BO 1 into BO 2, using blitter. */
>> +intel_copy_bo(local_batch, bo_2, bo_1, width * height);
>> +usleep(0); /* let someone else claim the mutex */
>> +
>> +/* Compare BOs. If prime_sync_* were executed properly, the caches
>> + * should be synced. */
>> +prime_sync_start(dma_buf2_fd, true);
>
> Maybe false here? Not that it makes any difference for the driver atm.

oh, my bad.

>> +for (i = 0; i < (width * height) / 4; i++)
>> +igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
>> +prime_sync_end(dma_buf2_fd, true);
>> +
>> +drm_intel_bo_unreference(bo_1);
>> +drm_intel_bo_unreference(bo_2);
>> +munmap(ptr_cpu, width * height);
>> +munmap(ptr2_cpu, width * height);
>
> Do we have anything that verifies that dmabuf maps persist beyond
> gem_close() on the original bo?

that's test_refcounting in prime_mmap.c
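
Roughly, that test closes the GEM handle first and only then maps and checks
the dma-buf. A sketch of the idea (not the literal test code; fd, BO_SIZE,
fill_bo and pattern are the prime_mmap.c globals shown further down):

static void test_refcounting(void)
{
	int dma_buf_fd;
	char *ptr;
	uint32_t handle;

	handle = gem_create(fd, BO_SIZE);
	fill_bo(handle, BO_SIZE);

	dma_buf_fd = prime_handle_to_fd(fd, handle);
	/* close the GEM handle before mapping: the dma-buf must keep the
	 * underlying object alive */
	gem_close(fd, handle);

	ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
	igt_assert(ptr != MAP_FAILED);
	igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);

	munmap(ptr, BO_SIZE);
	close(dma_buf_fd);
}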


> Yes, that test should hit all interruptible paths we have in dmabuf and
> would be a great addition to igt.

Cool, thanks. I'm sending v2 now.

Tiago



[PATCH] prime_mmap_coherency: Add return error tests for prime sync ioctl

2016-03-17 Thread Tiago Vignatti
This patch adds the ioctl-errors subtest, used for exercising prime sync ioctl
errors.

The subtest constantly interrupts, via signals, a function doing concurrent
blits to stress the correct usage of prime_sync_*, making sure these ioctl
errors are handled accordingly. Note that in case of a failure (e.g. an ioctl
that is not retried after returning an error) this test does not catch the
problem with 100% reliability.

Cc: Chris Wilson 
Signed-off-by: Tiago Vignatti 
---

Chris, your unpolished dma-buf patch adding error returns to the ioctl calls
lgtm. Let me know if you think this kind of test is useful now in igt.

Thanks

Tiago

 tests/prime_mmap_coherency.c | 87 
 1 file changed, 87 insertions(+)

diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
index 180d8a4..bae2144 100644
--- a/tests/prime_mmap_coherency.c
+++ b/tests/prime_mmap_coherency.c
@@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache)
munmap(ptr_cpu, width * height);
 }

+static void blit_and_cmp(void)
+{
+   drm_intel_bo *bo_1;
+   drm_intel_bo *bo_2;
+   uint32_t *ptr_cpu;
+   uint32_t *ptr2_cpu;
+   int dma_buf_fd, dma_buf2_fd, i;
+   int local_fd;
+   drm_intel_bufmgr *local_bufmgr;
+   struct intel_batchbuffer *local_batch;
+
+   /* recreate process local variables */
+   local_fd = drm_open_driver(DRIVER_INTEL);
+   local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
+   igt_assert(local_bufmgr);
+
+   local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
+   igt_assert(local_batch);
+
+   bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
+   dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
+   igt_skip_on(errno == EINVAL);
+
+   ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+   MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr_cpu != MAP_FAILED);
+
+   bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
+   dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
+
+   ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+   MAP_SHARED, dma_buf2_fd, 0);
+   igt_assert(ptr2_cpu != MAP_FAILED);
+
+   /* Fill up BO 1 with '1's and BO 2 with '0's */
+   prime_sync_start(dma_buf_fd, true);
+   memset(ptr_cpu, 0x11, width * height);
+   prime_sync_end(dma_buf_fd, true);
+
+   prime_sync_start(dma_buf2_fd, true);
+   memset(ptr2_cpu, 0x00, width * height);
+   prime_sync_end(dma_buf2_fd, true);
+
+   /* Copy BO 1 into BO 2, using blitter. */
+   intel_copy_bo(local_batch, bo_2, bo_1, width * height);
+   usleep(0); /* let someone else claim the mutex */
+
+   /* Compare BOs. If prime_sync_* were executed properly, the caches
+* should be synced. */
+   prime_sync_start(dma_buf2_fd, true);
+   for (i = 0; i < (width * height) / 4; i++)
+   igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
+   prime_sync_end(dma_buf2_fd, true);
+
+   drm_intel_bo_unreference(bo_1);
+   drm_intel_bo_unreference(bo_2);
+   munmap(ptr_cpu, width * height);
+   munmap(ptr2_cpu, width * height);
+}
+
+/*
+ * Constantly interrupt concurrent blits to stress out prime_sync_* and make
+ * sure these ioctl errors are handled accordingly.
+ *
+ * Note that in case of a failure (e.g. an ioctl that is not retried after
+ * returning an error) this test does not catch the problem with 100%
+ * reliability.
+ */
+static void test_ioctl_errors(void)
+{
+   int i;
+   int num_children = 8 * sysconf(_SC_NPROCESSORS_ONLN);
+
+   igt_fork_signal_helper();
+   igt_fork(child, num_children) {
+   for (i = 0; i < ROUNDS; i++)
+   blit_and_cmp();
+   }
+   igt_waitchildren();
+   igt_stop_signal_helper();
+}
+
 int main(int argc, char **argv)
 {
int i;
@@ -235,6 +317,11 @@ int main(int argc, char **argv)
igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
}

+   igt_subtest("ioctl-errors") {
+   igt_info("exercising concurrent blit to get ioctl errors\n");
+   test_ioctl_errors();
+   }
+
igt_fixture {
intel_batchbuffer_free(batch);
drm_intel_bufmgr_destroy(bufmgr);
-- 
2.1.4



[PATCH v9] dma-buf: Add ioctls to allow userspace to flush

2016-03-14 Thread Tiago Vignatti
On 03/05/2016 06:34 AM, Daniel Vetter wrote:
> On Mon, Feb 29, 2016 at 03:02:09PM +, Chris Wilson wrote:
>> On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
>>> On Thu, Feb 25, 2016 at 06:01:22PM +, Chris Wilson wrote:
>>>> On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
>>>>> +static long dma_buf_ioctl(struct file *file,
>>>>> +   unsigned int cmd, unsigned long arg)
>>>>> +{
>>>>> + struct dma_buf *dmabuf;
>>>>> + struct dma_buf_sync sync;
>>>>> + enum dma_data_direction direction;
>>>>> +
>>>>> + dmabuf = file->private_data;
>>>>> +
>>>>> + switch (cmd) {
>>>>> + case DMA_BUF_IOCTL_SYNC:
>>>>> + if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
>>>>> + return -EFAULT;
>>>>> +
>>>>> + if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + switch (sync.flags & DMA_BUF_SYNC_RW) {
>>>>> + case DMA_BUF_SYNC_READ:
>>>>> + direction = DMA_FROM_DEVICE;
>>>>> + break;
>>>>> + case DMA_BUF_SYNC_WRITE:
>>>>> + direction = DMA_TO_DEVICE;
>>>>> + break;
>>>>> + case DMA_BUF_SYNC_RW:
>>>>> + direction = DMA_BIDIRECTIONAL;
>>>>> + break;
>>>>> + default:
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + if (sync.flags & DMA_BUF_SYNC_END)
>>>>> + dma_buf_end_cpu_access(dmabuf, direction);
>>>>> + else
>>>>> + dma_buf_begin_cpu_access(dmabuf, direction);
>>>>
>>>> We forgot to report the error back to userspace. Might as well fixup the
>>>> callchain to propagate error from end-cpu-access as well. Found after
>>>> updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
>>>
>>> EINTR? Do we need to make this ABI - I guess so? Tiago, do you have
>>> patches? See drmIoctl() in libdrm for what's needed on the userspace side
>>> if my guess is right.
>>
>> EINTR is the easiest, but conceivably we could also get EIO and
>> currently EAGAIN.
>>
>> I am also seeing some strange timing dependent (i.e. valgrind doesn't
>> show up anything client side and the tests then pass) failures (SIGSEGV,
>> SIGBUS) with !llc.
>
> Tiago, ping. Also probably a gap in igt coverage besides the kernel side.
> -Daniel

Hey guys! I'm back from vacation now. I'll take a look at it in the next few
days and then get back to you.

Tiago



[PATCH v9] dma-buf: Add ioctls to allow userspace to flush

2016-02-11 Thread Tiago Vignatti
From: Daniel Vetter 

The userspace might need some sort of cache coherency management e.g. when CPU
and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers, that forward
directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
used like following:
 - mmap dma-buf fd
 - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
   want (with the new data being consumed by the GPU or say scanout device)
 - munmap once you don't need the buffer any more

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.
v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
remove range information from struct dma_buf_sync.
v6 (Tiago): use __u64 structured padded flags instead of enum. Adjust
documentation about the recommendation on using sync ioctls.
v7 (Tiago): Alex's nit on the flags definition and even more wording in the
doc about sync usage.
v9 (Tiago): remove useless is_dma_buf_file check. Fix sync.flags conditionals
and its mask order check. Add  include in dma-buf.h.

Cc: Ville Syrjälä 
Cc: David Herrmann 
Cc: Sumit Semwal 
Reviewed-by: Stéphane Marchesin 
Signed-off-by: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---

I left SYNC_START and SYNC_END exclusive, just how the logic was before. If we
see a useful use case, maybe like the one David mentioned of storing two frames
next to each other in the same BO, we can patch it up later fairly easily.

About the ioctl direction, just like Ville pointed out, we're doing only
copy_from_user at the moment and it seems that _IOW is all we need. So I also
didn't touch anything on that.
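
For reference, the uapi additions boil down to the following (the full
include/uapi/linux/dma-buf.h hunk is cut off below; the LOCAL_* copies in the
igt patches further down mirror these definitions):

/* include/uapi/linux/dma-buf.h */
#include <linux/types.h>

struct dma_buf_sync {
	__u64 flags;
};

#define DMA_BUF_SYNC_READ      (1 << 0)
#define DMA_BUF_SYNC_WRITE     (2 << 0)
#define DMA_BUF_SYNC_RW        (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
#define DMA_BUF_SYNC_START     (0 << 2)
#define DMA_BUF_SYNC_END       (1 << 2)
#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
	(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)

#define DMA_BUF_BASE           'b'
#define DMA_BUF_IOCTL_SYNC     _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)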

David, Ville PTAL. Thank you,

Tiago

 Documentation/dma-buf-sharing.txt | 21 +-
 drivers/dma-buf/dma-buf.c | 45 +++
 include/uapi/linux/dma-buf.h  | 40 ++
 3 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 4f4a84b..32ac32e 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
handles, too). So it's beneficial to support this in a similar fashion on
dma-buf to have a good transition path for existing Android userspace.

-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+   used when the access happens. This is discussed next paragraphs.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+ - mmap dma-buf fd
+ - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+   want (with the new data being consumed by the GPU or say scanout device)
+ - munmap once you don't need the buffer any more
+
+Therefore, for correctness and optimal performance, systems with the memory
+cache shared by the GPU and CPU i.e. the "coherent" and also the
+"incoherent" are always required to use SYNC_START and SYNC_END before and
+after, respectively, when accessing the mapped address.

 2. Supporting existing mmap interfaces in importers

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b2ac13b..9810d1d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include 
 #include 

+#include 
+
 static inline int is_dma_buf_file(struct file *);

 struct dma_buf_list {
@@ -251,11 +253,54 @@ out:
return events;
 }

+static long dma_buf_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+   struct dma_buf *dmabuf;
+   struct dma_buf_sync sync;
+   enum dma_data_direction direction;
+
+   dmabuf = file->private_data;
+
+   switch (cmd) {
+   case DMA_BUF_IOCTL_SYNC:
+   if (copy_from_user(&sync, (void __user *

[PATCH v7 3/5] dma-buf: Add ioctls to allow userspace to flush

2016-02-11 Thread Tiago Vignatti

Thanks for reviewing, David. Please take a look at my comments in-line.


On 02/09/2016 07:26 AM, David Herrmann wrote:
>
> On Tue, Dec 22, 2015 at 10:36 PM, Tiago Vignatti
>  wrote:
>> From: Daniel Vetter 
>>
>> The userspace might need some sort of cache coherency management e.g. when 
>> CPU
>> and GPU domains are being accessed through dma-buf at the same time. To
>> circumvent this problem there are begin/end coherency markers, that forward
>> directly to existing dma-buf device drivers vfunc hooks. Userspace can make 
>> use
>> of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
>> used like following:
>>   - mmap dma-buf fd
>>   - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. 
>> read/write
>> to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>> want (with the new data being consumed by the GPU or say scanout 
>> device)
>>   - munmap once you don't need the buffer any more
>>
>> v2 (Tiago): Fix header file type names (u64 -> __u64)
>> v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
>> dma-buf functions. Check for overflows in start/length.
>> v4 (Tiago): use 2d regions for sync.
>> v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
>> remove range information from struct dma_buf_sync.
>> v6 (Tiago): use __u64 structured padded flags instead of enum. Adjust
>> documentation about the recommendation on using sync ioctls.
>> v7 (Tiago): Alex's nit on the flags definition and even more wording in the
>> doc about sync usage.
>>
>> Cc: Sumit Semwal 
>> Signed-off-by: Daniel Vetter 
>> Signed-off-by: Tiago Vignatti 
>> ---
>>   Documentation/dma-buf-sharing.txt | 21 ++-
>>   drivers/dma-buf/dma-buf.c | 43 
>> +++
>>   include/uapi/linux/dma-buf.h  | 38 ++
>>   3 files changed, 101 insertions(+), 1 deletion(-)
>>   create mode 100644 include/uapi/linux/dma-buf.h
>>
>> diff --git a/Documentation/dma-buf-sharing.txt 
>> b/Documentation/dma-buf-sharing.txt
>> index 4f4a84b..32ac32e 100644
>> --- a/Documentation/dma-buf-sharing.txt
>> +++ b/Documentation/dma-buf-sharing.txt
>> @@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 
>> 2 main use-cases:
>>  handles, too). So it's beneficial to support this in a similar fashion 
>> on
>>  dma-buf to have a good transition path for existing Android userspace.
>>
>> -   No special interfaces, userspace simply calls mmap on the dma-buf fd.
>> +   No special interfaces, userspace simply calls mmap on the dma-buf fd, 
>> making
>> +   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is 
>> *always*
>> +   used when the access happens. This is discussed next paragraphs.
>> +
>> +   Some systems might need some sort of cache coherency management e.g. when
>> +   CPU and GPU domains are being accessed through dma-buf at the same time. 
>> To
>> +   circumvent this problem there are begin/end coherency markers, that 
>> forward
>> +   directly to existing dma-buf device drivers vfunc hooks. Userspace can 
>> make
>> +   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
>> +   would be used like following:
>> + - mmap dma-buf fd
>> + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. 
>> read/write
>> +   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>> +   want (with the new data being consumed by the GPU or say scanout 
>> device)
>> + - munmap once you don't need the buffer any more
>> +
>> +Therefore, for correctness and optimal performance, systems with the 
>> memory
>> +cache shared by the GPU and CPU i.e. the "coherent" and also the
>> +"incoherent" are always required to use SYNC_START and SYNC_END before 
>> and
>> +after, respectively, when accessing the mapped address.
>>
>>   2. Supporting existing mmap interfaces in importers
>>
>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>> index b2ac13b..9a298bd 100644
>> --- a/drivers/dma-buf/dma-buf.c
>> +++ b/drivers/dma-buf/dma-buf.c
>> @@ -34,6 +34,8 @@
>>   #include 
>>   #include 
>>
>> +#include 
>> +
>>   static inline int is_dma_buf_file(struct file *);
>>
>>   struct dma_buf_list {
>> @@ -2

Direct userspace dma-buf mmap (v7)

2016-02-05 Thread Tiago Vignatti
On 02/04/2016 06:55 PM, Stéphane Marchesin wrote:
> On Tue, Dec 22, 2015 at 1:36 PM, Tiago Vignatti
>  wrote:
>> Hey back,
>>
>> Thank you Daniel, Chris, Alex and Thomas for reviewing the last series. I
>> think I addressed most of the comments now in version 7, including:
>>- even more wording in the doc about sync usage.
>>- pass .write = false always in i915 end_cpu_access.
>>- add sync invalid flags test (igt).
>>- in kms_mmap_write_crc, use CPU hog and testing rounds to catch the sync
>>  problems (igt).
>>
>> Here are the trees:
>>
>> https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf-mmap-v7
>> https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v7
>>
>> Also, the Chrome OS side is in progress. This past week I've been mostly
>> struggling with failed attempts to build it (boots and goes to a black screen.
>> Sigh.) and also finding a way to make my funky BayTrail-T laptop with 32-bit
>> UEFI firmware boot up (success with Ubuntu but no success yet in CrOS). A WIP
>> of the Chromium changes can be seen here anyway:
>>
>> https://codereview.chromium.org/1262043002/
>>
>> Happy Holidays!
>
> For the series:
>
> Reviewed-by: Stéphane Marchesin 

Thank you! Daniel, here are the trees ready for pulling (I hope) now:

https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf-mmap-v8
https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v8

Tiago


[PATCH igt v7 6/6] tests: Add prime_mmap_coherency for cache coherency tests

2015-12-22 Thread Tiago Vignatti
Different from kms_mmap_write_crc, which captures the coherency issues within
the scanout mapped buffer, this one is meant mostly to test dma-buf mmap on
!llc platforms and provoke coherency bugs, so we know where we need the sync
ioctls.

I tested this on !llc and llc platforms, BYT and IVB respectively.

Signed-off-by: Tiago Vignatti 
---
 tests/Makefile.sources   |   1 +
 tests/prime_mmap_coherency.c | 246 +++
 2 files changed, 247 insertions(+)
 create mode 100644 tests/prime_mmap_coherency.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index ad2dd6a..78605c6 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -97,6 +97,7 @@ TESTS_progs_M = \
pm_rc6_residency \
pm_sseu \
prime_mmap \
+   prime_mmap_coherency \
prime_self_import \
template \
$(NULL)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
new file mode 100644
index 000..a9a2664
--- /dev/null
+++ b/tests/prime_mmap_coherency.c
@@ -0,0 +1,246 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Tiago Vignatti
+ */
+
+/** @file prime_mmap_coherency.c
+ *
+ * TODO: need to show the need for prime_sync_end().
+ */
+
+#include "igt.h"
+
+IGT_TEST_DESCRIPTION("Test dma-buf mmap on !llc platforms mostly and provoke"
+   " coherency bugs so we know for sure where we need the sync 
ioctls.");
+
+#define ROUNDS 20
+
+int fd;
+int stale = 0;
+static drm_intel_bufmgr *bufmgr;
+struct intel_batchbuffer *batch;
+static int width = 1024, height = 1024;
+
+/*
+ * Exercises the need for read flush:
+ *   1. create a BO and write '0's, in GTT domain.
+ *   2. read BO using the dma-buf CPU mmap.
+ *   3. write '1's, in GTT domain.
+ *   4. read again through the mapped dma-buf.
+ */
+static void test_read_flush(bool expect_stale_cache)
+{
+   drm_intel_bo *bo_1;
+   drm_intel_bo *bo_2;
+   uint32_t *ptr_cpu;
+   uint32_t *ptr_gtt;
+   int dma_buf_fd, i;
+
+   if (expect_stale_cache)
+   igt_require(!gem_has_llc(fd));
+
+   bo_1 = drm_intel_bo_alloc(bufmgr, "BO 1", width * height * 4, 4096);
+
+   /* STEP #1: put the BO 1 in GTT domain. We use the blitter to copy and fill
+* zeros to BO 1, so commands will be submitted and likely to place BO 1 in
+* the GTT domain. */
+   bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+   intel_copy_bo(batch, bo_1, bo_2, width * height);
+   gem_sync(fd, bo_1->handle);
+   drm_intel_bo_unreference(bo_2);
+
+   /* STEP #2: read BO 1 using the dma-buf CPU mmap. This dirties the CPU caches. */
+   dma_buf_fd = prime_handle_to_fd_for_mmap(fd, bo_1->handle);
+   igt_skip_on(errno == EINVAL);
+
+   ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+  MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr_cpu != MAP_FAILED);
+
+   for (i = 0; i < (width * height) / 4; i++)
+   igt_assert_eq(ptr_cpu[i], 0);
+
+   /* STEP #3: write 0x11 into BO 1. */
+   bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+   ptr_gtt = gem_mmap__gtt(fd, bo_2->handle, width * height, PROT_READ | PROT_WRITE);
+   memset(ptr_gtt, 0x11, width * height);
+   munmap(ptr_gtt, width * height);
+
+   intel_copy_bo(batch, bo_1, bo_2, width * height);
+   gem_sync(fd, bo_1->handle);
+   drm_intel_bo_unreference(bo_2);
+
+   /* STEP #4: read again using the CPU mmap. Doing #1 before #3 makes sure we
+* don't do a full CPU cache flush in step #3 again. That makes sure all the
+* stale cachelines from step #2 survive (mostly, a few w

[PATCH igt v7 5/6] tests: Add kms_mmap_write_crc for cache coherency tests

2015-12-22 Thread Tiago Vignatti
This program can be used to detect when CPU writes in the dma-buf mapped object
don't land in scanout due to cache incoherency.

Although this seems to be a problem inherent to non-LLC machines ("Atom"), this
particular test catches cache dirt on scanout on LLC machines as well. It's
inspired by Ville's kms_pwrite_crc.c and can also be used to test the
correctness of the driver's begin_cpu_access and end_cpu_access (which require
an i915 implementation).

To see the need for the flush, run with the '-n' option so the sync ioctls are
not called; a rather simple CPU hog then trashes the caches while the test
catches the coherency issue. If you then drop '-n', things should just work as
expected.
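
(The hog itself can be as simple as a few forked children sweeping a large
buffer; an illustration of the idea, not the test's literal code:)

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Illustration only: children endlessly sweeping a buffer larger than
 * the LLC keep the caches busy, so a '-n' run (no sync ioctls) reliably
 * shows up as a CRC mismatch. The caller kills the children when done. */
static void fork_cpu_hogs(int nchildren, size_t working_set)
{
	for (int i = 0; i < nchildren; i++) {
		if (fork() == 0) {
			char *buf = malloc(working_set);
			for (;;)
				memset(buf, i, working_set);
		}
	}
}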

I tested this on !llc and llc platforms, BYT and IVB respectively.

v2: use prime_handle_to_fd_for_mmap instead.
v3: merge end_cpu_access() patch with this and provide options to disable sync.
v4: use library's prime_sync_{start,end} instead.
v7: use CPU hog instead and use testing rounds to catch the sync problems.

Signed-off-by: Tiago Vignatti 
---
 tests/Makefile.sources |   1 +
 tests/kms_mmap_write_crc.c | 313 +
 2 files changed, 314 insertions(+)
 create mode 100644 tests/kms_mmap_write_crc.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 75f3cb0..ad2dd6a 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -168,6 +168,7 @@ TESTS_progs = \
kms_3d \
kms_fence_pin_leak \
kms_force_connector_basic \
+   kms_mmap_write_crc \
kms_pwrite_crc \
kms_sink_crc_basic \
prime_udl \
diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c
new file mode 100644
index 000..6984bbd
--- /dev/null
+++ b/tests/kms_mmap_write_crc.c
@@ -0,0 +1,313 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Tiago Vignatti 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "igt_kms.h"
+#include "intel_chipset.h"
+#include "ioctl_wrappers.h"
+#include "igt_aux.h"
+
+IGT_TEST_DESCRIPTION(
+   "Use the display CRC support to validate mmap write to an already uncached future scanout buffer.");
+
+#define ROUNDS 10
+
+typedef struct {
+   int drm_fd;
+   igt_display_t display;
+   struct igt_fb fb[2];
+   igt_output_t *output;
+   igt_plane_t *primary;
+   enum pipe pipe;
+   igt_crc_t ref_crc;
+   igt_pipe_crc_t *pipe_crc;
+   uint32_t devid;
+} data_t;
+
+static int ioctl_sync = true;
+int dma_buf_fd;
+
+static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
+{
+   char *ptr = NULL;
+
+   dma_buf_fd = prime_handle_to_fd_for_mmap(drm_fd, fb->gem_handle);
+   igt_skip_on(dma_buf_fd == -1 && errno == EINVAL);
+
+   ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+
+   return ptr;
+}
+
+static void test(data_t *data)
+{
+   igt_display_t *display = &data->display;
+   igt_output_t *output = data->output;
+   struct igt_fb *fb = &data->fb[1];
+   drmModeModeInfo *mode;
+   cairo_t *cr;
+   char *ptr;
+   uint32_t caching;
+   void *buf;
+   igt_crc_t crc;
+
+   mode = igt_output_get_mode(output);
+
+   /* create a non-white fb where we can write later */
+   igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay,
+ DRM_FORMAT_XRGB8888, LOCAL_DRM_FORMAT_MOD_NONE, fb);
+
+   ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb);
+
+   cr = igt_get_cairo_ctx(data->drm_fd, fb);

[PATCH igt v7 4/6] lib: Add prime_sync_start and prime_sync_end helpers

2015-12-22 Thread Tiago Vignatti
This patch adds dma-buf mmap synchronization ioctls that can be used by tests
for cache coherency management e.g. when CPU and GPU domains are being accessed
through dma-buf at the same time.

v7: add sync invalid flags test.

Signed-off-by: Tiago Vignatti 
---
 lib/ioctl_wrappers.c | 26 ++
 lib/ioctl_wrappers.h | 17 +
 tests/prime_mmap.c   | 25 +
 3 files changed, 68 insertions(+)

diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index 86a61ba..0d84d00 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -1400,6 +1400,32 @@ off_t prime_get_size(int dma_buf_fd)
 }

 /**
+ * prime_sync_start
+ * @dma_buf_fd: dma-buf fd handle
+ */
+void prime_sync_start(int dma_buf_fd)
+{
+   struct local_dma_buf_sync sync_start;
+
+   memset(&sync_start, 0, sizeof(sync_start));
+   sync_start.flags = LOCAL_DMA_BUF_SYNC_START | LOCAL_DMA_BUF_SYNC_RW;
+   do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_start);
+}
+
+/**
+ * prime_sync_end
+ * @dma_buf_fd: dma-buf fd handle
+ */
+void prime_sync_end(int dma_buf_fd)
+{
+   struct local_dma_buf_sync sync_end;
+
+   memset(&sync_end, 0, sizeof(sync_end));
+   sync_end.flags = LOCAL_DMA_BUF_SYNC_END | LOCAL_DMA_BUF_SYNC_RW;
+   do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_end);
+}
+
+/**
  * igt_require_fb_modifiers:
  * @fd: Open DRM file descriptor.
  *
diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h
index d3ffba2..d004165 100644
--- a/lib/ioctl_wrappers.h
+++ b/lib/ioctl_wrappers.h
@@ -148,6 +148,21 @@ void gem_require_caching(int fd);
 void gem_require_ring(int fd, int ring_id);

 /* prime */
+struct local_dma_buf_sync {
+   uint64_t flags;
+};
+
+#define LOCAL_DMA_BUF_SYNC_READ  (1 << 0)
+#define LOCAL_DMA_BUF_SYNC_WRITE (2 << 0)
+#define LOCAL_DMA_BUF_SYNC_RW    (LOCAL_DMA_BUF_SYNC_READ | LOCAL_DMA_BUF_SYNC_WRITE)
+#define LOCAL_DMA_BUF_SYNC_START (0 << 2)
+#define LOCAL_DMA_BUF_SYNC_END   (1 << 2)
+#define LOCAL_DMA_BUF_SYNC_VALID_FLAGS_MASK \
+   (LOCAL_DMA_BUF_SYNC_RW | LOCAL_DMA_BUF_SYNC_END)
+
+#define LOCAL_DMA_BUF_BASE 'b'
+#define LOCAL_DMA_BUF_IOCTL_SYNC _IOW(LOCAL_DMA_BUF_BASE, 0, struct local_dma_buf_sync)
+
 int prime_handle_to_fd(int fd, uint32_t handle);
 #ifndef DRM_RDWR
 #define DRM_RDWR O_RDWR
@@ -155,6 +170,8 @@ int prime_handle_to_fd(int fd, uint32_t handle);
 int prime_handle_to_fd_for_mmap(int fd, uint32_t handle);
 uint32_t prime_fd_to_handle(int fd, int dma_buf_fd);
 off_t prime_get_size(int dma_buf_fd);
+void prime_sync_start(int dma_buf_fd);
+void prime_sync_end(int dma_buf_fd);

 /* addfb2 fb modifiers */
 struct local_drm_mode_fb_cmd2 {
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index 269ada6..29a0cfd 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -401,6 +401,30 @@ test_errors(void)
gem_close(fd, handle);
 }

+/* Test for invalid flags on sync ioctl */
+static void
+test_invalid_sync_flags(void)
+{
+   int i, dma_buf_fd;
+   uint32_t handle;
+   struct local_dma_buf_sync sync;
+   int invalid_flags[] = {-1,
+  0x00,
+  LOCAL_DMA_BUF_SYNC_RW + 1,
+  LOCAL_DMA_BUF_SYNC_VALID_FLAGS_MASK + 1};
+
+   handle = gem_create(fd, BO_SIZE);
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   for (i = 0; i < sizeof(invalid_flags) / sizeof(invalid_flags[0]); i++) {
+   memset(&sync, 0, sizeof(sync));
+   sync.flags = invalid_flags[i];
+
+   drmIoctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync);
+   igt_assert_eq(errno, EINVAL);
+   errno = 0;
+   }
+}
+
 static void
 test_aperture_limit(void)
 {
@@ -473,6 +497,7 @@ igt_main
{ "test_dup", test_dup },
{ "test_userptr", test_userptr },
{ "test_errors", test_errors },
+   { "test_invalid_sync_flags", test_invalid_sync_flags },
{ "test_aperture_limit", test_aperture_limit },
};
int i;
-- 
2.1.4



[PATCH igt v7 3/6] prime_mmap: Add basic tests to write in a bo using CPU

2015-12-22 Thread Tiago Vignatti
This patch adds test_correct_cpu_write, which maps the texture buffer through a
prime fd and then writes directly to it using the CPU. It stresses the driver
to guarantee cache synchronization among the different domains.

This test also adds test_forked_cpu_write, which creates the GEM bo in one
process and passes its prime handle to another process, which in turn uses the
handle only to map and write. Roughly speaking, this test simulates the
Chrome OS architecture, where the Web content ("unprivileged process") maps
and CPU-draws a buffer that was previously allocated in the GPU process
("privileged process").

This requires kernel modifications (Daniel Thompson's "drm: prime: Honour
O_RDWR during prime-handle-to-fd"), and therefore prime_handle_to_fd_for_mmap
is added and fails when these are lacking. Also, upcoming tests (e.g. the next
patch) are going to use it as well, so make it public and available in the lib.

v2: adds prime_handle_to_fd_with_mmap for skipping test in older kernels and
test for invalid flags.

Signed-off-by: Tiago Vignatti 
---
 lib/ioctl_wrappers.c | 25 +++
 lib/ioctl_wrappers.h |  4 +++
 tests/prime_mmap.c   | 89 
 3 files changed, 112 insertions(+), 6 deletions(-)

diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index 6cad8a2..86a61ba 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -1329,6 +1329,31 @@ int prime_handle_to_fd(int fd, uint32_t handle)
 }

 /**
+ * prime_handle_to_fd_for_mmap:
+ * @fd: open i915 drm file descriptor
+ * @handle: file-private gem buffer object handle
+ *
+ * Same as prime_handle_to_fd above but with DRM_RDWR capabilities, which can
+ * be useful for writing into the mmap'ed dma-buf file-descriptor.
+ *
+ * Returns: The created dma-buf fd handle or -1 if the ioctl fails.
+ */
+int prime_handle_to_fd_for_mmap(int fd, uint32_t handle)
+{
+   struct drm_prime_handle args;
+
+   memset(&args, 0, sizeof(args));
+   args.handle = handle;
+   args.flags = DRM_CLOEXEC | DRM_RDWR;
+   args.fd = -1;
+
+   if (drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args) != 0)
+   return -1;
+
+   return args.fd;
+}
+
+/**
  * prime_fd_to_handle:
  * @fd: open i915 drm file descriptor
  * @dma_buf_fd: dma-buf fd handle
diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h
index bb8a858..d3ffba2 100644
--- a/lib/ioctl_wrappers.h
+++ b/lib/ioctl_wrappers.h
@@ -149,6 +149,10 @@ void gem_require_ring(int fd, int ring_id);

 /* prime */
 int prime_handle_to_fd(int fd, uint32_t handle);
+#ifndef DRM_RDWR
+#define DRM_RDWR O_RDWR
+#endif
+int prime_handle_to_fd_for_mmap(int fd, uint32_t handle);
 uint32_t prime_fd_to_handle(int fd, int dma_buf_fd);
 off_t prime_get_size(int dma_buf_fd);

diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index 95304a9..269ada6 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -22,6 +22,7 @@
  *
  * Authors:
  *Rob Bradford 
+ *Tiago Vignatti 
  *
  */

@@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size)
 }

 static void
+fill_bo_cpu(char *ptr)
+{
+   memcpy(ptr, pattern, sizeof(pattern));
+}
+
+static void
 test_correct(void)
 {
int dma_buf_fd;
@@ -180,6 +187,65 @@ test_forked(void)
gem_close(fd, handle);
 }

+/* test simple CPU write */
+static void
+test_correct_cpu_write(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle);
+
+   /* Skip if DRM_RDWR is not supported */
+   igt_skip_on(errno == EINVAL);
+
+   /* Check correctness of map using write protection (PROT_WRITE) */
+   ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+
+   /* Fill bo using CPU */
+   fill_bo_cpu(ptr);
+
+   /* Check pattern correctness */
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+
+   munmap(ptr, BO_SIZE);
+   close(dma_buf_fd);
+   gem_close(fd, handle);
+}
+
+/* map from another process and then write using CPU */
+static void
+test_forked_cpu_write(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle);
+
+   /* Skip if DRM_RDWR is not supported */
+   igt_skip_on(errno == EINVAL);
+
+   igt_fork(childno, 1) {
+   ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   fill_bo_cpu(ptr);
+
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+   munmap(ptr, BO_SIZE);
+   close(dma_buf_fd);
+   }
+   close(dma_buf_fd);
+   igt_waitchildren();
+   gem_close(fd, handle);
+}
+
 static void
 test_ref

[PATCH igt v7 2/6] prime_mmap: Add new test for calling mmap() on dma-buf fds

2015-12-22 Thread Tiago Vignatti
From: Rob Bradford 

This test has the following subtests:
 - test_correct for correctness of the data
 - test_map_unmap checks for mapping idempotency
 - test_reprime checks for dma-buf creation idempotency
 - test_forked checks for multiprocess access
 - test_refcounting checks for buffer reference counting
 - test_dup checks that dup()ing the fd works
 - test_userptr makes sure mmap fails due to the lack of obj->base.filp
   in a userptr.
 - test_errors checks the error return values for failures
 - test_aperture_limit tests multiple buffer creation at the gtt aperture
   limit

v2 (Tiago): Removed pattern_check(), which was walking through a useless
iterator. Removed superfluous PROT_WRITE from gem_mmap, in test_correct().
Added binary file to .gitignore
v3 (Tiago): squash patch "prime_mmap: Test for userptr mmap" into this one.
v4 (Tiago): use synchronized userptr for testing. Add test for buffer
overlapping.

Signed-off-by: Rob Bradford 
Signed-off-by: Tiago Vignatti 
---
 tests/Makefile.sources |   1 +
 tests/prime_mmap.c | 417 +
 2 files changed, 418 insertions(+)
 create mode 100644 tests/prime_mmap.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index d594038..75f3cb0 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -96,6 +96,7 @@ TESTS_progs_M = \
pm_rps \
pm_rc6_residency \
pm_sseu \
+   prime_mmap \
prime_self_import \
template \
$(NULL)
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
new file mode 100644
index 000..95304a9
--- /dev/null
+++ b/tests/prime_mmap.c
@@ -0,0 +1,417 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Rob Bradford 
+ *
+ */
+
+/*
+ * Testcase: Check whether mmap()ing dma-buf works
+ */
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "ioctl_wrappers.h"
+
+#define BO_SIZE (16*1024)
+
+static int fd;
+
+char pattern[] = {0xff, 0x00, 0x00, 0x00,
+   0x00, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0xff, 0x00,
+   0x00, 0x00, 0x00, 0xff};
+
+static void
+fill_bo(uint32_t handle, size_t size)
+{
+   off_t i;
+   for (i = 0; i < size; i+=sizeof(pattern))
+   {
+   gem_write(fd, handle, i, pattern, sizeof(pattern));
+   }
+}
+
+static void
+test_correct(void)
+{
+   int dma_buf_fd;
+   char *ptr1, *ptr2;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+   fill_bo(handle, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   /* Check correctness vs GEM_MMAP_GTT */
+   ptr1 = gem_mmap__gtt(fd, handle, BO_SIZE, PROT_READ);
+   ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr1 != MAP_FAILED);
+   igt_assert(ptr2 != MAP_FAILED);
+   igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0);
+
+   /* Check pattern correctness */
+   igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0);
+
+   munmap(ptr1, BO_SIZE);
+   munmap(ptr2, BO_SIZE);
+   close(dma_buf_fd);
+   gem_close(fd, handle);
+}
+
+static void
+test_map_unmap(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+   fill_bo(handle, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+
+   /* Unmap 

[PATCH igt v7 1/6] lib: Add gem_userptr and __gem_userptr helpers

2015-12-22 Thread Tiago Vignatti
This patch moves userptr definitions and helpers implementation that were
locally in gem_userptr_benchmark and gem_userptr_blits to the library, so other
tests can make use of them as well. There's no functional changes.

v2: added __ function to differentiate when errors want to be handled back in
the caller; bring gem_userptr_sync back to gem_userptr_blits; added gtkdoc.

Signed-off-by: Tiago Vignatti 
---
 benchmarks/gem_userptr_benchmark.c |  55 +++-
 lib/ioctl_wrappers.c   |  41 +++
 lib/ioctl_wrappers.h   |  13 +
 tests/gem_userptr_blits.c  | 104 ++---
 4 files changed, 86 insertions(+), 127 deletions(-)

diff --git a/benchmarks/gem_userptr_benchmark.c b/benchmarks/gem_userptr_benchmark.c
index 1eae7ff..f7716df 100644
--- a/benchmarks/gem_userptr_benchmark.c
+++ b/benchmarks/gem_userptr_benchmark.c
@@ -58,17 +58,6 @@
   #define PAGE_SIZE 4096
 #endif

-#define LOCAL_I915_GEM_USERPTR   0x33
-#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr)
-struct local_i915_gem_userptr {
-   uint64_t user_ptr;
-   uint64_t user_size;
-   uint32_t flags;
-#define LOCAL_I915_USERPTR_READ_ONLY (1<<0)
-#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31)
-   uint32_t handle;
-};
-
 static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED;

 #define BO_SIZE (65536)
@@ -83,30 +72,6 @@ static void gem_userptr_test_synchronized(void)
userptr_flags = 0;
 }

-static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t 
*handle)
-{
-   struct local_i915_gem_userptr userptr;
-   int ret;
-
-   userptr.user_ptr = (uintptr_t)ptr;
-   userptr.user_size = size;
-   userptr.flags = userptr_flags;
-   if (read_only)
-   userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY;
-
-   ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr);
-   if (ret)
-   ret = errno;
-   igt_skip_on_f(ret == ENODEV &&
- (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 &&
- !read_only,
- "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?");
-   if (ret == 0)
-   *handle = userptr.handle;
-
-   return ret;
-}
-
 static void **handle_ptr_map;
 static unsigned int num_handle_ptr_map;

@@ -144,8 +109,7 @@ static uint32_t create_userptr_bo(int fd, int size)
ret = posix_memalign(&ptr, PAGE_SIZE, size);
igt_assert(ret == 0);

-   ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle);
-   igt_assert(ret == 0);
+   gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle);
add_handle_ptr(handle, ptr);

return handle;
@@ -167,7 +131,7 @@ static int has_userptr(int fd)
assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0);
oldflags = userptr_flags;
gem_userptr_test_unsynchronized();
-   ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle);
+   ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle);
userptr_flags = oldflags;
if (ret != 0) {
free(ptr);
@@ -379,9 +343,7 @@ static void test_impact_overlap(int fd, const char *prefix)

for (i = 0, p = block; i < nr_bos[subtest];
 i++, p += PAGE_SIZE)
-   ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0,
- &handles[i]);
-   igt_assert(ret == 0);
+   gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, userptr_flags, &handles[i]);
}

if (nr_bos[subtest] > 0)
@@ -427,7 +389,6 @@ static void test_single(int fd)
char *ptr, *bo_ptr;
uint32_t handle = 0;
unsigned long iter = 0;
-   int ret;
unsigned long map_size = BO_SIZE + PAGE_SIZE - 1;

ptr = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
@@ -439,8 +400,7 @@ static void test_single(int fd)
start_test(test_duration_sec);

while (run_test) {
-   ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, &handle);
-   assert(ret == 0);
+   gem_userptr(fd, bo_ptr, BO_SIZE, 0, userptr_flags, &handle);
gem_close(fd, handle);
iter++;
}
@@ -456,7 +416,6 @@ static void test_multiple(int fd, unsigned int batch, int random)
uint32_t handles[1];
int map[1];
unsigned long iter = 0;
-   int ret;
int i;
unsigned long map_size = batch * BO_SIZE + PAGE_SIZE - 1;

@@ -478,10 +437,8 @@ static void test_multiple(int fd, unsigned int batch, int random)
if (random)

[PATCH v7 5/5] drm/i915: Use CPU mapping for userspace dma-buf mmap()

2015-12-22 Thread Tiago Vignatti
Userspace is in charge of flushing CPU caches by wrapping its mmap accesses
with begin{,end}_cpu_access.

v2: Remove LLC check because we have dma-buf sync providers now. Also, fix the
return before transferring ownership when mmap fails.
v3: Fix return values.
v4: !obj->base.filp is user triggerable, so removed the WARN_ON.

Reviewed-by: Chris Wilson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 8c9ed2a..1f3eef6 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num)

 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
 {
-   return -EINVAL;
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   int ret;
+
+   if (obj->base.size < vma->vm_end - vma->vm_start)
+   return -EINVAL;
+
+   if (!obj->base.filp)
+   return -ENODEV;
+
+   ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+   if (ret)
+   return ret;
+
+   fput(vma->vm_file);
+   vma->vm_file = get_file(obj->base.filp);
+
+   return 0;
 }

 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
-- 
2.1.4



[PATCH v7 4/5] drm/i915: Implement end_cpu_access

2015-12-22 Thread Tiago Vignatti
This function is meant to be used with dma-buf mmap, when finishing the CPU
access of the mapped pointer.

The error case should be rare though, requiring the buffer to become active
during the sync period and end_cpu_access to be interrupted. So we use an
uninterruptible mutex_lock so the error still gets reported if it ever happens.

v2: disable interruption to make sure errors are reported.
v3: update to the new end_cpu_access API.
v7: use .write = false because the flush doesn't need to know whether the
access was a write.

Reviewed-by: Chris Wilson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 65ab2bd..8c9ed2a 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -212,6 +212,27 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
return ret;
 }

+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   struct drm_device *dev = obj->base.dev;
+   struct drm_i915_private *dev_priv = to_i915(dev);
+   bool was_interruptible;
+   int ret;
+
+   mutex_lock(&dev->struct_mutex);
+   was_interruptible = dev_priv->mm.interruptible;
+   dev_priv->mm.interruptible = false;
+
+   ret = i915_gem_object_set_to_gtt_domain(obj, false);
+
+   dev_priv->mm.interruptible = was_interruptible;
+   mutex_unlock(&dev->struct_mutex);
+
+   if (unlikely(ret))
+   DRM_ERROR("unable to flush buffer following CPU access; 
rendering may be corrupt\n");
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops =  {
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
@@ -224,6 +245,7 @@ static const struct dma_buf_ops i915_dmabuf_ops =  {
.vmap = i915_gem_dmabuf_vmap,
.vunmap = i915_gem_dmabuf_vunmap,
.begin_cpu_access = i915_gem_begin_cpu_access,
+   .end_cpu_access = i915_gem_end_cpu_access,
 };

 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
-- 
2.1.4



[PATCH v7 3/5] dma-buf: Add ioctls to allow userspace to flush

2015-12-22 Thread Tiago Vignatti
From: Daniel Vetter 

The userspace might need some sort of cache coherency management e.g. when CPU
and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers, that forward
directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
used like following:
 - mmap dma-buf fd
 - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
   want (with the new data being consumed by the GPU or say scanout device)
 - munmap once you don't need the buffer any more
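
For illustration only, here is a minimal userspace sketch of that sequence
(assuming 'dmabuf_fd' and 'size' come from a prior PRIME export elsewhere;
note the restart loops, since the sync ioctl may fail with EAGAIN or EINTR
and must simply be retried):

#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

static int cpu_fill(int dmabuf_fd, size_t size)
{
	struct dma_buf_sync sync = { 0 };
	void *ptr;
	int ret;

	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, dmabuf_fd, 0);
	if (ptr == MAP_FAILED)
		return -errno;

	/* SYNC_START: make the mapping coherent before the CPU writes */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EAGAIN || errno == EINTR));

	memset(ptr, 0xca, size);	/* the actual CPU access */

	/* SYNC_END: flush the CPU writes back for the device */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EAGAIN || errno == EINTR));

	munmap(ptr, size);
	return 0;
}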

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.
v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
remove range information from struct dma_buf_sync.
v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
documentation about the recommendation on using sync ioctls.
v7 (Tiago): Alex's nit on flags definition and even more explicit wording in
the doc about sync usage.

Cc: Sumit Semwal 
Signed-off-by: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---
 Documentation/dma-buf-sharing.txt | 21 ++-
 drivers/dma-buf/dma-buf.c | 43 +++
 include/uapi/linux/dma-buf.h  | 38 ++
 3 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 4f4a84b..32ac32e 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
handles, too). So it's beneficial to support this in a similar fashion on
dma-buf to have a good transition path for existing Android userspace.

-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+   used when the access happens. This is discussed in the next paragraphs.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+ - mmap dma-buf fd
+ - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+   want (with the new data being consumed by the GPU or say scanout device)
+ - munmap once you don't need the buffer any more
+
+Therefore, for correctness and optimal performance, systems both with and
+without a memory cache shared by the GPU and CPU, i.e. the "coherent" and
+the "incoherent" cases, are always required to use SYNC_START and SYNC_END
+before and after, respectively, accessing the mapped address.

 2. Supporting existing mmap interfaces in importers

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b2ac13b..9a298bd 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include 
 #include 

+#include 
+
 static inline int is_dma_buf_file(struct file *);

 struct dma_buf_list {
@@ -251,11 +253,52 @@ out:
return events;
 }

+static long dma_buf_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+   struct dma_buf *dmabuf;
+   struct dma_buf_sync sync;
+   enum dma_data_direction direction;
+
+   dmabuf = file->private_data;
+
+   if (!is_dma_buf_file(file))
+   return -EINVAL;
+
+   switch (cmd) {
+   case DMA_BUF_IOCTL_SYNC:
+   if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+   return -EFAULT;
+
+   if ((sync.flags & DMA_BUF_SYNC_RW) == DMA_BUF_SYNC_RW)
+   direction = DMA_BIDIRECTIONAL;
+   else if (sync.flags & DMA_BUF_SYNC_READ)
+   direction = DMA_FROM_DEVICE;
+   else if (sync.flags & DMA_BUF_SYNC_WRITE)
+   direction = DMA_TO_DEVICE;
+   else
+   return -EINVAL;
+
+   if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+   return -EINVAL;

[PATCH v7 2/5] dma-buf: Remove range-based flush

2015-12-22 Thread Tiago Vignatti
This patch removes range-based information used for optimizations in
begin_cpu_access and end_cpu_access.

We don't have any user or implementation of range-based flush. There seems to
be consensus that if we ever want something like that again (or even something
more robust using 2D or 3D sub-range regions) we can use the upcoming dma-buf
sync ioctl for that.

Cc: Sumit Semwal 
Cc: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---
 Documentation/dma-buf-sharing.txt | 19 ---
 drivers/dma-buf/dma-buf.c | 13 -
 drivers/gpu/drm/i915/i915_gem_dmabuf.c|  2 +-
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 ++--
 drivers/gpu/drm/udl/udl_fb.c  |  2 --
 drivers/staging/android/ion/ion.c |  6 ++
 drivers/staging/android/ion/ion_test.c|  4 ++--
 include/linux/dma-buf.h   | 12 +---
 8 files changed, 24 insertions(+), 38 deletions(-)

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 480c8de..4f4a84b 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -257,17 +257,15 @@ Access to a dma_buf from the kernel context involves three steps:

Interface:
   int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
-  size_t start, size_t len,
   enum dma_data_direction direction)

This allows the exporter to ensure that the memory is actually available for
cpu access - the exporter might need to allocate or swap-in and pin the
backing storage. The exporter also needs to ensure that cpu access is
-   coherent for the given range and access direction. The range and access
-   direction can be used by the exporter to optimize the cache flushing, i.e.
-   access outside of the range or with a different direction (read instead of
-   write) might return stale or even bogus data (e.g. when the exporter needs to
-   copy the data to temporary storage).
+   coherent for the access direction. The direction can be used by the exporter
+   to optimize the cache flushing, i.e. access with a different direction (read
+   instead of write) might return stale or even bogus data (e.g. when the
+   exporter needs to copy the data to temporary storage).

This step might fail, e.g. in oom conditions.

@@ -322,14 +320,13 @@ Access to a dma_buf from the kernel context involves three steps:

 3. Finish access

-   When the importer is done accessing the range specified in begin_cpu_access,
-   it needs to announce this to the exporter (to facilitate cache flushing and
-   unpinning of any pinned resources). The result of any dma_buf kmap calls
-   after end_cpu_access is undefined.
+   When the importer is done accessing the CPU, it needs to announce this to
+   the exporter (to facilitate cache flushing and unpinning of any pinned
+   resources). The result of any dma_buf kmap calls after end_cpu_access is
+   undefined.

Interface:
   void dma_buf_end_cpu_access(struct dma_buf *dma_buf,
- size_t start, size_t len,
  enum dma_data_direction dir);


diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 155c146..b2ac13b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -539,13 +539,11 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
  * preparations. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:[in]buffer to prepare cpu access for.
- * @start: [in]start of range for cpu access.
- * @len:   [in]length of range for cpu access.
  * @direction: [in]direction of cpu access.
  *
  * Can return negative error values, returns 0 on success.
  */
-int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
+int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 enum dma_data_direction direction)
 {
int ret = 0;
@@ -554,8 +552,7 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
return -EINVAL;

if (dmabuf->ops->begin_cpu_access)
-   ret = dmabuf->ops->begin_cpu_access(dmabuf, start,
-   len, direction);
+   ret = dmabuf->ops->begin_cpu_access(dmabuf, direction);

return ret;
 }
@@ -567,19 +564,17 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
  * actions. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:[in]buffer to complete cpu access for.
- * @start: [in]start of range for cpu access.
- * @len:   [in]length of range for cpu access.
  * @direction: [in]direction of cpu access.
  *
  * This call must always succeed.
  */
-void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t s

[PATCH v7 1/5] drm: prime: Honour O_RDWR during prime-handle-to-fd

2015-12-22 Thread Tiago Vignatti
From: Daniel Thompson 

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.
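
For illustration, a hedged userspace sketch of the resulting uapi (the
'drm_fd'/'handle' names are assumptions; on kernels without this patch the
ioctl fails with EINVAL, and DRM_RDWR may need a fallback define to O_RDWR,
as igt does):

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>

#ifndef DRM_RDWR
#define DRM_RDWR O_RDWR
#endif

static int export_rdwr(int drm_fd, uint32_t handle)
{
	struct drm_prime_handle args;

	memset(&args, 0, sizeof(args));
	args.handle = handle;
	args.flags = DRM_CLOEXEC | DRM_RDWR;	/* writable mmap wanted */
	args.fd = -1;

	if (drmIoctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args))
		return -1;

	return args.fd;	/* dma-buf fd usable with mmap(PROT_WRITE) */
}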

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Reviewed-by: Chris Wilson 
Signed-off-by: Daniel Thompson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h  |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  {
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv)
 {
struct drm_prime_handle *args = data;
-   uint32_t flags;

if (!drm_core_check_feature(dev, DRIVER_PRIME))
return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
return -ENOSYS;

/* check flags are valid */
-   if (args->flags & ~DRM_CLOEXEC)
+   if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
return -EINVAL;

-   /* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-   flags = args->flags & DRM_CLOEXEC;
-
return dev->driver->prime_handle_to_fd(dev, file_priv,
-   args->handle, flags, &args->fd);
+   args->handle, args->flags, &args->fd);
 }

 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index b4e92eb..a0ebfe7 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -669,6 +669,7 @@ struct drm_set_client_cap {
__u64 value;
 };

+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
__u32 handle;
-- 
2.1.4



Direct userspace dma-buf mmap (v7)

2015-12-22 Thread Tiago Vignatti
Hey back,

Thank you Daniel, Chris, Alex and Thomas for reviewing the last series. I
think I addressed most of the comments now in version 7, including:
  - using even more explicit wording in the doc about sync usage.
  - pass .write = false always in i915 end_cpu_access.
  - add sync invalid flags test (igt).
  - in kms_mmap_write_crc, use CPU hog and testing rounds to catch the sync
problems (igt).

Here are the trees:

https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf-mmap-v7
https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v7

Also, the Chrome OS side is in progress. This past week I've been mostly
struggling with failed attempts to build it (it boots and goes to a black
screen. Sigh.) and also finding a way to make my funky BayTrail-T laptop with
32-bit UEFI firmware boot up (success with Ubuntu but no success yet in CrOS).
A WIP
of Chromium changes can be seen here anyways:

https://codereview.chromium.org/1262043002/

Happy Holidays!

Tiago

-- 
2.1.4



[PATCH v6 3/5] dma-buf: Add ioctls to allow userspace to flush

2015-12-18 Thread Tiago Vignatti
On 12/17/2015 07:58 PM, Thomas Hellstrom wrote:
> On 12/16/2015 11:25 PM, Tiago Vignatti wrote:
>> From: Daniel Vetter 
>>
>> The userspace might need some sort of cache coherency management e.g. when 
>> CPU
>> and GPU domains are being accessed through dma-buf at the same time. To
>> circumvent this problem there are begin/end coherency markers, that forward
>> directly to existing dma-buf device drivers vfunc hooks. Userspace can make 
>> use
>> of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
>> used like following:
>>   - mmap dma-buf fd
>>   - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. 
>> read/write
>> to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>> want (with the new data being consumed by the GPU or say scanout 
>> device)
>>   - munmap once you don't need the buffer any more
>>
>> v2 (Tiago): Fix header file type names (u64 -> __u64)
>> v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
>> dma-buf functions. Check for overflows in start/length.
>> v4 (Tiago): use 2d regions for sync.
>> v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
>> remove range information from struct dma_buf_sync.
>> v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
>> documentation about the recommendation on using sync ioctls.
>>
>> Cc: Sumit Semwal 
>> Signed-off-by: Daniel Vetter 
>> Signed-off-by: Tiago Vignatti 
>> ---
>>   Documentation/dma-buf-sharing.txt | 22 +++-
>>   drivers/dma-buf/dma-buf.c | 43 
>> +++
>>   include/uapi/linux/dma-buf.h  | 38 ++
>>   3 files changed, 102 insertions(+), 1 deletion(-)
>>   create mode 100644 include/uapi/linux/dma-buf.h
>>
>> diff --git a/Documentation/dma-buf-sharing.txt 
>> b/Documentation/dma-buf-sharing.txt
>> index 4f4a84b..2ddd4b2 100644
>> --- a/Documentation/dma-buf-sharing.txt
>> +++ b/Documentation/dma-buf-sharing.txt
>> @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 
>> 2 main use-cases:
>>  handles, too). So it's beneficial to support this in a similar fashion 
>> on
>>  dma-buf to have a good transition path for existing Android userspace.
>>
>> -   No special interfaces, userspace simply calls mmap on the dma-buf fd.
>> +   No special interfaces, userspace simply calls mmap on the dma-buf fd. 
>> Very
>> +   important to note though is that, even if it is not mandatory, the 
>> userspace
>> +   is strongly recommended to always use the cache synchronization ioctl
>> +   (DMA_BUF_IOCTL_SYNC) discussed next.
>> +
>> +   Some systems might need some sort of cache coherency management e.g. when
>> +   CPU and GPU domains are being accessed through dma-buf at the same time. 
>> To
>> +   circumvent this problem there are begin/end coherency markers, that 
>> forward
>> +   directly to existing dma-buf device drivers vfunc hooks. Userspace can 
>> make
>> +   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
>> +   would be used like following:
>> + - mmap dma-buf fd
>> + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. 
>> read/write
>> +   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>> +   want (with the new data being consumed by the GPU or say scanout 
>> device)
>> + - munmap once you don't need the buffer any more
>> +
>> +In principle systems with the memory cache shared by the GPU and CPU may
>> +not need SYNC_START and SYNC_END but still, userspace is always 
>> encouraged
>> +to use these ioctls before and after, respectively, when accessing the
>> +mapped address.
>>
>
> I think the wording here is far too weak. If this is a generic
> user-space interface and syncing
> is required for
> a) Correctness: then syncing must be mandatory.
> b) Optimal performance then an implementation must generate expected
> results also in the absence of SYNC ioctls, but is allowed to rely on
> correct pairing of SYNC_START and SYNC_END to render correctly.

Thomas, do you think the following write-up captures this?


-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf 
fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is 
*always*
+   used when the 

[PATCH v6 4/5] drm/i915: Implement end_cpu_access

2015-12-18 Thread Tiago Vignatti
On 12/18/2015 05:02 PM, Tiago Vignatti wrote:
> On 12/17/2015 06:01 AM, Chris Wilson wrote:
>> On Wed, Dec 16, 2015 at 08:25:36PM -0200, Tiago Vignatti wrote:
>>> This function is meant to be used with dma-buf mmap, when finishing
>>> the CPU
>>> access of the mapped pointer.
>>>
>>> +static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum
>>> dma_data_direction direction)
>>> +{
>>> +struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
>>> +struct drm_device *dev = obj->base.dev;
>>> +struct drm_i915_private *dev_priv = to_i915(dev);
>>> +bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL
>>> || direction == DMA_TO_DEVICE);
>>> +int ret;
>>> +
>>> +mutex_lock(&dev->struct_mutex);
>>> +was_interruptible = dev_priv->mm.interruptible;
>>> +dev_priv->mm.interruptible = false;
>>> +
>>> +ret = i915_gem_object_set_to_gtt_domain(obj, write);
>>
>> This only needs to pass .write=false. The dma-buf direction is
>> only for the period of the user access, and we are now flushing the
>> caches. This is equivalent to the sw-finish ioctl and ideally we just
>> want the i915_gem_object_flush_cpu_write_domain().
>
> in fact the only usage so far I found for end_cpu_access is when the
> pinned buffer is scanout out. Should I pretty much copy sw-finish in
> end_cpu_access then?

And do you think it's okay to declare
i915_gem_object_flush_cpu_write_domain outside its current file-only scope?

Tiago


[PATCH v6 4/5] drm/i915: Implement end_cpu_access

2015-12-18 Thread Tiago Vignatti
On 12/17/2015 06:01 AM, Chris Wilson wrote:
> On Wed, Dec 16, 2015 at 08:25:36PM -0200, Tiago Vignatti wrote:
>> This function is meant to be used with dma-buf mmap, when finishing the CPU
>> access of the mapped pointer.
>>
>> +static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum 
>> dma_data_direction direction)
>> +{
>> +struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
>> +struct drm_device *dev = obj->base.dev;
>> +struct drm_i915_private *dev_priv = to_i915(dev);
>> +bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || 
>> direction == DMA_TO_DEVICE);
>> +int ret;
>> +
>> +mutex_lock(&dev->struct_mutex);
>> +was_interruptible = dev_priv->mm.interruptible;
>> +dev_priv->mm.interruptible = false;
>> +
>> +ret = i915_gem_object_set_to_gtt_domain(obj, write);
>
> This only needs to pass .write=false. The dma-buf direction is
> only for the period of the user access, and we are now flushing the
> caches. This is equivalent to the sw-finish ioctl and ideally we just
> want the i915_gem_object_flush_cpu_write_domain().

in fact the only usage so far I found for end_cpu_access is when the
pinned buffer is scanned out. Should I pretty much copy sw-finish in
end_cpu_access then?

Thanks,

Tiago



[PATCH v6 3/5] dma-buf: Add ioctls to allow userspace to flush

2015-12-18 Thread Tiago Vignatti
On 12/17/2015 04:19 PM, Alex Deucher wrote:
> On Wed, Dec 16, 2015 at 5:25 PM, Tiago Vignatti
>  wrote:
>> From: Daniel Vetter 
>>
>> The userspace might need some sort of cache coherency management e.g. when 
>> CPU
>> and GPU domains are being accessed through dma-buf at the same time. To
>> circumvent this problem there are begin/end coherency markers, that forward
>> directly to existing dma-buf device drivers vfunc hooks. Userspace can make 
>> use
>> of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
>> used like following:
>>   - mmap dma-buf fd
>>   - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. 
>> read/write
>> to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>> want (with the new data being consumed by the GPU or say scanout 
>> device)
>>   - munmap once you don't need the buffer any more
>>
>> v2 (Tiago): Fix header file type names (u64 -> __u64)
>> v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
>> dma-buf functions. Check for overflows in start/length.
>> v4 (Tiago): use 2d regions for sync.
>> v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
>> remove range information from struct dma_buf_sync.
>> v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
>> documentation about the recommendation on using sync ioctls.
>>
>> Cc: Sumit Semwal 
>> Signed-off-by: Daniel Vetter 
>> Signed-off-by: Tiago Vignatti 
>> ---
>>   Documentation/dma-buf-sharing.txt | 22 +++-
>>   drivers/dma-buf/dma-buf.c | 43 
>> +++
>>   include/uapi/linux/dma-buf.h  | 38 ++
>>   3 files changed, 102 insertions(+), 1 deletion(-)
>>   create mode 100644 include/uapi/linux/dma-buf.h
>>
>> diff --git a/Documentation/dma-buf-sharing.txt 
>> b/Documentation/dma-buf-sharing.txt
>> index 4f4a84b..2ddd4b2 100644
>> --- a/Documentation/dma-buf-sharing.txt
>> +++ b/Documentation/dma-buf-sharing.txt
>> @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 
>> 2 main use-cases:
>>  handles, too). So it's beneficial to support this in a similar fashion 
>> on
>>  dma-buf to have a good transition path for existing Android userspace.
>>
>> -   No special interfaces, userspace simply calls mmap on the dma-buf fd.
>> +   No special interfaces, userspace simply calls mmap on the dma-buf fd. 
>> Very
>> +   important to note though is that, even if it is not mandatory, the 
>> userspace
>> +   is strongly recommended to always use the cache synchronization ioctl
>> +   (DMA_BUF_IOCTL_SYNC) discussed next.
>> +
>> +   Some systems might need some sort of cache coherency management e.g. when
>> +   CPU and GPU domains are being accessed through dma-buf at the same time. 
>> To
>> +   circumvent this problem there are begin/end coherency markers, that 
>> forward
>> +   directly to existing dma-buf device drivers vfunc hooks. Userspace can 
>> make
>> +   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
>> +   would be used like following:
>> + - mmap dma-buf fd
>> + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. 
>> read/write
>> +   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
>> +   want (with the new data being consumed by the GPU or say scanout 
>> device)
>> + - munmap once you don't need the buffer any more
>> +
>> +In principle systems with the memory cache shared by the GPU and CPU may
>> +not need SYNC_START and SYNC_END but still, userspace is always 
>> encouraged
>> +to use these ioctls before and after, respectively, when accessing the
>> +mapped address.
>>
>>   2. Supporting existing mmap interfaces in importers
>>
>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>> index b2ac13b..9a298bd 100644
>> --- a/drivers/dma-buf/dma-buf.c
>> +++ b/drivers/dma-buf/dma-buf.c
>> @@ -34,6 +34,8 @@
>>   #include 
>>   #include 
>>
>> +#include 
>> +
>>   static inline int is_dma_buf_file(struct file *);
>>
>>   struct dma_buf_list {
>> @@ -251,11 +253,52 @@ out:
>>  return events;
>>   }
>>
>> +static long dma_buf_ioctl(struct file *file,
>> + unsigned int cmd, un

[PATCH igt v6 6/6] tests: Add prime_mmap_coherency for cache coherency tests

2015-12-16 Thread Tiago Vignatti
Different from kms_mmap_write_crc, which captures coherency issues within the
scanout-mapped buffer, this one is meant to test dma-buf mmap mostly on !llc
platforms and to provoke coherency bugs, so we know where we need the sync
ioctls.

I tested this with !llc and llc platforms, BYT and IVY respectively.

Signed-off-by: Tiago Vignatti 
---
 tests/Makefile.sources   |   1 +
 tests/prime_mmap_coherency.c | 246 +++
 2 files changed, 247 insertions(+)
 create mode 100644 tests/prime_mmap_coherency.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index ad2dd6a..78605c6 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -97,6 +97,7 @@ TESTS_progs_M = \
pm_rc6_residency \
pm_sseu \
prime_mmap \
+   prime_mmap_coherency \
prime_self_import \
template \
$(NULL)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
new file mode 100644
index 000..a9a2664
--- /dev/null
+++ b/tests/prime_mmap_coherency.c
@@ -0,0 +1,246 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Tiago Vignatti
+ */
+
+/** @file prime_mmap_coherency.c
+ *
+ * TODO: need to show the need for prime_sync_end().
+ */
+
+#include "igt.h"
+
+IGT_TEST_DESCRIPTION("Test dma-buf mmap on !llc platforms mostly and provoke"
+   " coherency bugs so we know for sure where we need the sync ioctls.");
+
+#define ROUNDS 20
+
+int fd;
+int stale = 0;
+static drm_intel_bufmgr *bufmgr;
+struct intel_batchbuffer *batch;
+static int width = 1024, height = 1024;
+
+/*
+ * Exercises the need for read flush:
+ *   1. create a BO and write '0's, in GTT domain.
+ *   2. read BO using the dma-buf CPU mmap.
+ *   3. write '1's, in GTT domain.
+ *   4. read again through the mapped dma-buf.
+ */
+static void test_read_flush(bool expect_stale_cache)
+{
+   drm_intel_bo *bo_1;
+   drm_intel_bo *bo_2;
+   uint32_t *ptr_cpu;
+   uint32_t *ptr_gtt;
+   int dma_buf_fd, i;
+
+   if (expect_stale_cache)
+   igt_require(!gem_has_llc(fd));
+
+   bo_1 = drm_intel_bo_alloc(bufmgr, "BO 1", width * height * 4, 4096);
+
+   /* STEP #1: put the BO 1 in GTT domain. We use the blitter to copy and fill
+* zeros to BO 1, so commands will be submitted and likely to place BO 1 in
+* the GTT domain. */
+   bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+   intel_copy_bo(batch, bo_1, bo_2, width * height);
+   gem_sync(fd, bo_1->handle);
+   drm_intel_bo_unreference(bo_2);
+
+   /* STEP #2: read BO 1 using the dma-buf CPU mmap. This dirties the CPU caches. */
+   dma_buf_fd = prime_handle_to_fd_for_mmap(fd, bo_1->handle);
+   igt_skip_on(errno == EINVAL);
+
+   ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+  MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr_cpu != MAP_FAILED);
+
+   for (i = 0; i < (width * height) / 4; i++)
+   igt_assert_eq(ptr_cpu[i], 0);
+
+   /* STEP #3: write 0x11 into BO 1. */
+   bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+   ptr_gtt = gem_mmap__gtt(fd, bo_2->handle, width * height, PROT_READ | PROT_WRITE);
+   memset(ptr_gtt, 0x11, width * height);
+   munmap(ptr_gtt, width * height);
+
+   intel_copy_bo(batch, bo_1, bo_2, width * height);
+   gem_sync(fd, bo_1->handle);
+   drm_intel_bo_unreference(bo_2);
+
+   /* STEP #4: read again using the CPU mmap. Doing #1 before #3 makes sure we
+* don't do a full CPU cache flush in step #3 again. That makes sure all the
+* stale cachelines from step #2 survive (mostly, a few w

[PATCH igt v6 5/6] tests: Add kms_mmap_write_crc for cache coherency tests

2015-12-16 Thread Tiago Vignatti
This program can be used to detect when CPU writes in the dma-buf mapped object
don't land in scanout due to cache incoherency.

Although this seems to be a problem inherent to non-LLC machines ("Atom"), this
particular test catches cache dirt on scanout on LLC machines as well. It's
inspired by Ville's kms_pwrite_crc.c and can also be used to test the
correctness of the driver's begin_cpu_access and end_cpu_access (which requires
an i915 implementation).

To see the need for flush, one has to run this same binary a few times because
it's not 100% reproducible -- what I usually do is the following, using the
'-n' option to not call the sync ioctls:

$ while ((1)) ; do ./kms_mmap_write_crc -n; done  # in terminal A
$ find /  # in terminal B

That will most likely thrash the memory while the test catches the coherency
issue. If you now drop '-n', then things should just work as expected.

I tested this with !llc and llc platforms, BYT and IVY respectively.

v2: use prime_handle_to_fd_for_mmap instead.
v3: merge end_cpu_access() patch with this and provide options to disable sync.
v4: use library's prime_sync_{start,end} instead.

Signed-off-by: Tiago Vignatti 
---
 tests/Makefile.sources |   1 +
 tests/kms_mmap_write_crc.c | 281 +
 2 files changed, 282 insertions(+)
 create mode 100644 tests/kms_mmap_write_crc.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 75f3cb0..ad2dd6a 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -168,6 +168,7 @@ TESTS_progs = \
kms_3d \
kms_fence_pin_leak \
kms_force_connector_basic \
+   kms_mmap_write_crc \
kms_pwrite_crc \
kms_sink_crc_basic \
prime_udl \
diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c
new file mode 100644
index 000..6a12539
--- /dev/null
+++ b/tests/kms_mmap_write_crc.c
@@ -0,0 +1,281 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Tiago Vignatti 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "igt_kms.h"
+#include "intel_chipset.h"
+#include "ioctl_wrappers.h"
+#include "igt_aux.h"
+
+IGT_TEST_DESCRIPTION(
+   "Use the display CRC support to validate mmap write to an already uncached 
future scanout buffer.");
+
+typedef struct {
+   int drm_fd;
+   igt_display_t display;
+   struct igt_fb fb[2];
+   igt_output_t *output;
+   igt_plane_t *primary;
+   enum pipe pipe;
+   igt_crc_t ref_crc;
+   igt_pipe_crc_t *pipe_crc;
+   uint32_t devid;
+} data_t;
+
+static int ioctl_sync = true;
+int dma_buf_fd;
+
+static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
+{
+   char *ptr = NULL;
+
+   dma_buf_fd = prime_handle_to_fd_for_mmap(drm_fd, fb->gem_handle);
+   igt_skip_on(dma_buf_fd == -1 && errno == EINVAL);
+
+   ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+
+   return ptr;
+}
+
+static void test(data_t *data)
+{
+   igt_display_t *display = &data->display;
+   igt_output_t *output = data->output;
+   struct igt_fb *fb = &data->fb[1];
+   drmModeModeInfo *mode;
+   cairo_t *cr;
+   char *ptr;
+   uint32_t caching;
+   void *buf;
+   igt_crc_t crc;
+
+   mode = igt_output_get_mode(output);
+
+   /* create a non-white fb where we can write later */
+   igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay,
+ DRM_FORMAT_XRGB, LOCAL_DRM

[PATCH igt v6 4/6] lib: Add prime_sync_start and prime_sync_end helpers

2015-12-16 Thread Tiago Vignatti
This patch adds dma-buf mmap synchronization ioctls that can be used by tests
for cache coherency management e.g. when CPU and GPU domains are being accessed
through dma-buf at the same time.
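
As a usage sketch only ('size' and 'dma_buf_fd' are assumed to come from
gem_create plus prime_handle_to_fd_for_mmap, and the usual sys/mman.h,
string.h and ioctl_wrappers.h includes are assumed), a test would bracket
its CPU access with these helpers like this:

static void cpu_write_synced(int dma_buf_fd, size_t size)
{
	char *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
			 dma_buf_fd, 0);
	igt_assert(ptr != MAP_FAILED);

	prime_sync_start(dma_buf_fd);	/* begin the CPU access window */
	memset(ptr, 0xca, size);
	prime_sync_end(dma_buf_fd);	/* flush CPU writes for the device */

	munmap(ptr, size);
}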

Signed-off-by: Tiago Vignatti 
---
 lib/ioctl_wrappers.c | 26 ++
 lib/ioctl_wrappers.h | 15 +++
 2 files changed, 41 insertions(+)

diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index 86a61ba..0d84d00 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -1400,6 +1400,32 @@ off_t prime_get_size(int dma_buf_fd)
 }

 /**
+ * prime_sync_start
+ * @dma_buf_fd: dma-buf fd handle
+ */
+void prime_sync_start(int dma_buf_fd)
+{
+   struct local_dma_buf_sync sync_start;
+
+   memset(&sync_start, 0, sizeof(sync_start));
+   sync_start.flags = LOCAL_DMA_BUF_SYNC_START | LOCAL_DMA_BUF_SYNC_RW;
+   do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_start);
+}
+
+/**
+ * prime_sync_end
+ * @dma_buf_fd: dma-buf fd handle
+ */
+void prime_sync_end(int dma_buf_fd)
+{
+   struct local_dma_buf_sync sync_end;
+
+   memset(&sync_end, 0, sizeof(sync_end));
+   sync_end.flags = LOCAL_DMA_BUF_SYNC_END | LOCAL_DMA_BUF_SYNC_RW;
+   do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_end);
+}
+
+/**
  * igt_require_fb_modifiers:
  * @fd: Open DRM file descriptor.
  *
diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h
index d3ffba2..cbd7a73 100644
--- a/lib/ioctl_wrappers.h
+++ b/lib/ioctl_wrappers.h
@@ -148,6 +148,19 @@ void gem_require_caching(int fd);
 void gem_require_ring(int fd, int ring_id);

 /* prime */
+struct local_dma_buf_sync {
+   uint64_t flags;
+};
+
+#define LOCAL_DMA_BUF_SYNC_RW (3 << 0)
+#define LOCAL_DMA_BUF_SYNC_START (0 << 2)
+#define LOCAL_DMA_BUF_SYNC_END   (1 << 2)
+#define LOCAL_DMA_BUF_SYNC_VALID_FLAGS_MASK \
+   (LOCAL_DMA_BUF_SYNC_RW | LOCAL_DMA_BUF_SYNC_END)
+
+#define LOCAL_DMA_BUF_BASE 'b'
+#define LOCAL_DMA_BUF_IOCTL_SYNC _IOW(LOCAL_DMA_BUF_BASE, 0, struct local_dma_buf_sync)
+
 int prime_handle_to_fd(int fd, uint32_t handle);
 #ifndef DRM_RDWR
 #define DRM_RDWR O_RDWR
@@ -155,6 +168,8 @@ int prime_handle_to_fd(int fd, uint32_t handle);
 int prime_handle_to_fd_for_mmap(int fd, uint32_t handle);
 uint32_t prime_fd_to_handle(int fd, int dma_buf_fd);
 off_t prime_get_size(int dma_buf_fd);
+void prime_sync_start(int dma_buf_fd);
+void prime_sync_end(int dma_buf_fd);

 /* addfb2 fb modifiers */
 struct local_drm_mode_fb_cmd2 {
-- 
2.1.4



[PATCH igt v6 3/6] prime_mmap: Add basic tests to write in a bo using CPU

2015-12-16 Thread Tiago Vignatti
This patch adds test_correct_cpu_write, which maps the texture buffer through a
prime fd and then writes directly to it using the CPU. It stresses the driver
to guarantee cache synchronization among the different domains.

This test also adds test_forked_cpu_write, which creates the GEM bo in one
process and passes its prime handle to another process, which in turn
uses the handle only to map and write. Roughly speaking this test simulates
the Chrome OS architecture, where the Web content ("unprivileged process") maps
and CPU-draws a buffer, which was previously allocated in the GPU process
("privileged process").

This requires kernel modifications (Daniel Thompson's "drm: prime: Honour
O_RDWR during prime-handle-to-fd") and therefore prime_handle_to_fd_for_mmap is
added to fail in case this support is lacking. Also, upcoming tests (e.g. the
next patch) are going to use it as well, so make it public and available in
the lib.

v2: adds prime_handle_to_fd_for_mmap for skipping the test in older kernels and
a test for invalid flags.

Signed-off-by: Tiago Vignatti 
---
 lib/ioctl_wrappers.c | 25 +++
 lib/ioctl_wrappers.h |  4 +++
 tests/prime_mmap.c   | 89 
 3 files changed, 112 insertions(+), 6 deletions(-)

diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index 6cad8a2..86a61ba 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -1329,6 +1329,31 @@ int prime_handle_to_fd(int fd, uint32_t handle)
 }

 /**
+ * prime_handle_to_fd_for_mmap:
+ * @fd: open i915 drm file descriptor
+ * @handle: file-private gem buffer object handle
+ *
+ * Same as prime_handle_to_fd above but with DRM_RDWR capabilities, which can
+ * be useful for writing into the mmap'ed dma-buf file-descriptor.
+ *
+ * Returns: The created dma-buf fd handle or -1 if the ioctl fails.
+ */
+int prime_handle_to_fd_for_mmap(int fd, uint32_t handle)
+{
+   struct drm_prime_handle args;
+
+   memset(&args, 0, sizeof(args));
+   args.handle = handle;
+   args.flags = DRM_CLOEXEC | DRM_RDWR;
+   args.fd = -1;
+
+   if (drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args) != 0)
+   return -1;
+
+   return args.fd;
+}
+
+/**
  * prime_fd_to_handle:
  * @fd: open i915 drm file descriptor
  * @dma_buf_fd: dma-buf fd handle
diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h
index bb8a858..d3ffba2 100644
--- a/lib/ioctl_wrappers.h
+++ b/lib/ioctl_wrappers.h
@@ -149,6 +149,10 @@ void gem_require_ring(int fd, int ring_id);

 /* prime */
 int prime_handle_to_fd(int fd, uint32_t handle);
+#ifndef DRM_RDWR
+#define DRM_RDWR O_RDWR
+#endif
+int prime_handle_to_fd_for_mmap(int fd, uint32_t handle);
 uint32_t prime_fd_to_handle(int fd, int dma_buf_fd);
 off_t prime_get_size(int dma_buf_fd);

diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index 95304a9..269ada6 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -22,6 +22,7 @@
  *
  * Authors:
  *Rob Bradford 
+ *Tiago Vignatti 
  *
  */

@@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size)
 }

 static void
+fill_bo_cpu(char *ptr)
+{
+   memcpy(ptr, pattern, sizeof(pattern));
+}
+
+static void
 test_correct(void)
 {
int dma_buf_fd;
@@ -180,6 +187,65 @@ test_forked(void)
gem_close(fd, handle);
 }

+/* test simple CPU write */
+static void
+test_correct_cpu_write(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle);
+
+   /* Skip if DRM_RDWR is not supported */
+   igt_skip_on(errno == EINVAL);
+
+   /* Check correctness of map using write protection (PROT_WRITE) */
+   ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+
+   /* Fill bo using CPU */
+   fill_bo_cpu(ptr);
+
+   /* Check pattern correctness */
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+
+   munmap(ptr, BO_SIZE);
+   close(dma_buf_fd);
+   gem_close(fd, handle);
+}
+
+/* map from another process and then write using CPU */
+static void
+test_forked_cpu_write(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle);
+
+   /* Skip if DRM_RDWR is not supported */
+   igt_skip_on(errno == EINVAL);
+
+   igt_fork(childno, 1) {
+   ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   fill_bo_cpu(ptr);
+
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+   munmap(ptr, BO_SIZE);
+   close(dma_buf_fd);
+   }
+   close(dma_buf_fd);
+   igt_waitchildren();
+   gem_close(fd, handle);
+}
+
 static void
 test_ref

[PATCH igt v6 2/6] prime_mmap: Add new test for calling mmap() on dma-buf fds

2015-12-16 Thread Tiago Vignatti
From: Rob Bradford 

This test has the following subtests:
 - test_correct for correctness of the data
 - test_map_unmap checks for mapping idempotency
 - test_reprime checks for dma-buf creation idempotency
 - test_forked checks for multiprocess access
 - test_refcounting checks for buffer reference counting
 - test_dup checks that dup()ing the fd works
 - test_userptr makes sure mmap fails due to the lack of obj->base.filp
   in a userptr.
 - test_errors checks the error return values for failures
 - test_aperture_limit tests multiple buffer creation at the gtt aperture
   limit

v2 (Tiago): Removed pattern_check(), which was walking through a useless
iterator. Removed superfluous PROT_WRITE from gem_mmap, in test_correct().
Added binary file to .gitignore
v3 (Tiago): squash patch "prime_mmap: Test for userptr mmap" into this one.
v4 (Tiago): use synchronized userptr for testing. Add test for buffer
overlapping.

Signed-off-by: Rob Bradford 
Signed-off-by: Tiago Vignatti 
---
 tests/Makefile.sources |   1 +
 tests/prime_mmap.c | 417 +
 2 files changed, 418 insertions(+)
 create mode 100644 tests/prime_mmap.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index d594038..75f3cb0 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -96,6 +96,7 @@ TESTS_progs_M = \
pm_rps \
pm_rc6_residency \
pm_sseu \
+   prime_mmap \
prime_self_import \
template \
$(NULL)
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
new file mode 100644
index 000..95304a9
--- /dev/null
+++ b/tests/prime_mmap.c
@@ -0,0 +1,417 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Rob Bradford 
+ *
+ */
+
+/*
+ * Testcase: Check whether mmap()ing dma-buf works
+ */
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "ioctl_wrappers.h"
+
+#define BO_SIZE (16*1024)
+
+static int fd;
+
+char pattern[] = {0xff, 0x00, 0x00, 0x00,
+   0x00, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0xff, 0x00,
+   0x00, 0x00, 0x00, 0xff};
+
+static void
+fill_bo(uint32_t handle, size_t size)
+{
+   off_t i;
+   for (i = 0; i < size; i+=sizeof(pattern))
+   {
+   gem_write(fd, handle, i, pattern, sizeof(pattern));
+   }
+}
+
+static void
+test_correct(void)
+{
+   int dma_buf_fd;
+   char *ptr1, *ptr2;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+   fill_bo(handle, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   /* Check correctness vs GEM_MMAP_GTT */
+   ptr1 = gem_mmap__gtt(fd, handle, BO_SIZE, PROT_READ);
+   ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr1 != MAP_FAILED);
+   igt_assert(ptr2 != MAP_FAILED);
+   igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0);
+
+   /* Check pattern correctness */
+   igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0);
+
+   munmap(ptr1, BO_SIZE);
+   munmap(ptr2, BO_SIZE);
+   close(dma_buf_fd);
+   gem_close(fd, handle);
+}
+
+static void
+test_map_unmap(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+   fill_bo(handle, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+
+   /* Unmap 

[PATCH igt v6 1/6] lib: Add gem_userptr and __gem_userptr helpers

2015-12-16 Thread Tiago Vignatti
This patch moves the userptr definitions and helper implementations that were
local to gem_userptr_benchmark and gem_userptr_blits into the library, so other
tests can make use of them as well. There are no functional changes.
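
A minimal sketch of how a test could now use the helpers (signatures as in the
diff below; fd, BO_SIZE and PAGE_SIZE are assumed to come from the test
context, and error paths are omitted):

    void *ptr;
    uint32_t handle;

    /* userptr objects must be backed by page-aligned memory */
    igt_assert(posix_memalign(&ptr, PAGE_SIZE, BO_SIZE) == 0);

    /* flags == 0 asks for a synchronized mapping */
    gem_userptr(fd, ptr, BO_SIZE, 0, 0, &handle);

    /* ... use the bo like any other GEM object ... */

    gem_close(fd, handle);
    free(ptr);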

v2: added __ function to differentiate when errors want to be handled back in
the caller; bring gem_userptr_sync back to gem_userptr_blits; added gtkdoc.

Signed-off-by: Tiago Vignatti 
---
 benchmarks/gem_userptr_benchmark.c |  55 +++-
 lib/ioctl_wrappers.c   |  41 +++
 lib/ioctl_wrappers.h   |  13 +
 tests/gem_userptr_blits.c  | 104 ++---
 4 files changed, 86 insertions(+), 127 deletions(-)

diff --git a/benchmarks/gem_userptr_benchmark.c 
b/benchmarks/gem_userptr_benchmark.c
index 1eae7ff..f7716df 100644
--- a/benchmarks/gem_userptr_benchmark.c
+++ b/benchmarks/gem_userptr_benchmark.c
@@ -58,17 +58,6 @@
   #define PAGE_SIZE 4096
 #endif

-#define LOCAL_I915_GEM_USERPTR   0x33
-#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + 
LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr)
-struct local_i915_gem_userptr {
-   uint64_t user_ptr;
-   uint64_t user_size;
-   uint32_t flags;
-#define LOCAL_I915_USERPTR_READ_ONLY (1<<0)
-#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31)
-   uint32_t handle;
-};
-
 static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED;

 #define BO_SIZE (65536)
@@ -83,30 +72,6 @@ static void gem_userptr_test_synchronized(void)
userptr_flags = 0;
 }

-static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t 
*handle)
-{
-   struct local_i915_gem_userptr userptr;
-   int ret;
-
-   userptr.user_ptr = (uintptr_t)ptr;
-   userptr.user_size = size;
-   userptr.flags = userptr_flags;
-   if (read_only)
-   userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY;
-
-   ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr);
-   if (ret)
-   ret = errno;
-   igt_skip_on_f(ret == ENODEV &&
- (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 
&&
- !read_only,
- "Skipping, synchronized mappings with no kernel 
CONFIG_MMU_NOTIFIER?");
-   if (ret == 0)
-   *handle = userptr.handle;
-
-   return ret;
-}
-
 static void **handle_ptr_map;
 static unsigned int num_handle_ptr_map;

@@ -144,8 +109,7 @@ static uint32_t create_userptr_bo(int fd, int size)
ret = posix_memalign(&ptr, PAGE_SIZE, size);
igt_assert(ret == 0);

-   ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle);
-   igt_assert(ret == 0);
+   gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle);
add_handle_ptr(handle, ptr);

return handle;
@@ -167,7 +131,7 @@ static int has_userptr(int fd)
assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0);
oldflags = userptr_flags;
gem_userptr_test_unsynchronized();
-   ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle);
+   ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle);
userptr_flags = oldflags;
if (ret != 0) {
free(ptr);
@@ -379,9 +343,7 @@ static void test_impact_overlap(int fd, const char *prefix)

for (i = 0, p = block; i < nr_bos[subtest];
 i++, p += PAGE_SIZE)
-   ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0,
- &handles[i]);
-   igt_assert(ret == 0);
+   gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, 
userptr_flags, &handles[i]);
}

if (nr_bos[subtest] > 0)
@@ -427,7 +389,6 @@ static void test_single(int fd)
char *ptr, *bo_ptr;
uint32_t handle = 0;
unsigned long iter = 0;
-   int ret;
unsigned long map_size = BO_SIZE + PAGE_SIZE - 1;

ptr = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
@@ -439,8 +400,7 @@ static void test_single(int fd)
start_test(test_duration_sec);

while (run_test) {
-   ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, &handle);
-   assert(ret == 0);
+   gem_userptr(fd, bo_ptr, BO_SIZE, 0, userptr_flags, &handle);
gem_close(fd, handle);
iter++;
}
@@ -456,7 +416,6 @@ static void test_multiple(int fd, unsigned int batch, int 
random)
uint32_t handles[1];
int map[1];
unsigned long iter = 0;
-   int ret;
int i;
unsigned long map_size = batch * BO_SIZE + PAGE_SIZE - 1;

@@ -478,10 +437,8 @@ static void test_multiple(int fd, unsigned int batch, int 
random)
if (random)

[PATCH v6 5/5] drm/i915: Use CPU mapping for userspace dma-buf mmap()

2015-12-16 Thread Tiago Vignatti
Userspace is the one in charge of flushing the CPU caches, by wrapping the mmap
accesses with begin{,end}_cpu_access.

v2: Remove LLC check cause we have dma-buf sync providers now. Also, fix return
before transferring ownership when mmap fails.
v3: Fix return values.
v4: !obj->base.filp is user triggerable, so removed the WARN_ON.

Reviewed-by: Chris Wilson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 9dba876..b7e7a90 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf 
*dma_buf, unsigned long page_n

 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct 
*vma)
 {
-   return -EINVAL;
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   int ret;
+
+   if (obj->base.size < vma->vm_end - vma->vm_start)
+   return -EINVAL;
+
+   if (!obj->base.filp)
+   return -ENODEV;
+
+   ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+   if (ret)
+   return ret;
+
+   fput(vma->vm_file);
+   vma->vm_file = get_file(obj->base.filp);
+
+   return 0;
 }

 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum 
dma_data_direction direction)
-- 
2.1.4



[PATCH v6 4/5] drm/i915: Implement end_cpu_access

2015-12-16 Thread Tiago Vignatti
This function is meant to be used with dma-buf mmap, when finishing the CPU
access of the mapped pointer.

The error case should rarely happen though, as it requires the buffer to become
active during the sync period and end_cpu_access to be interrupted. So we use
an uninterruptible mutex_lock and log an error if it ever happens.

v2: disable interruption to make sure errors are reported.
v3: update to the new end_cpu_access API.

Reviewed-by: Chris Wilson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 65ab2bd..9dba876 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -212,6 +212,27 @@ static int i915_gem_begin_cpu_access(struct dma_buf 
*dma_buf, enum dma_data_dire
return ret;
 }

+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum 
dma_data_direction direction)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   struct drm_device *dev = obj->base.dev;
+   struct drm_i915_private *dev_priv = to_i915(dev);
+   bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || 
direction == DMA_TO_DEVICE);
+   int ret;
+
+   mutex_lock(&dev->struct_mutex);
+   was_interruptible = dev_priv->mm.interruptible;
+   dev_priv->mm.interruptible = false;
+
+   ret = i915_gem_object_set_to_gtt_domain(obj, write);
+
+   dev_priv->mm.interruptible = was_interruptible;
+   mutex_unlock(&dev->struct_mutex);
+
+   if (unlikely(ret))
+   DRM_ERROR("unable to flush buffer following CPU access; 
rendering may be corrupt\n");
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops =  {
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
@@ -224,6 +245,7 @@ static const struct dma_buf_ops i915_dmabuf_ops =  {
.vmap = i915_gem_dmabuf_vmap,
.vunmap = i915_gem_dmabuf_vunmap,
.begin_cpu_access = i915_gem_begin_cpu_access,
+   .end_cpu_access = i915_gem_end_cpu_access,
 };

 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
-- 
2.1.4



[PATCH v6 3/5] dma-buf: Add ioctls to allow userspace to flush

2015-12-16 Thread Tiago Vignatti
From: Daniel Vetter 

The userspace might need some sort of cache coherency management e.g. when CPU
and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers, that forward
directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
used like following:
 - mmap dma-buf fd
 - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
   want (with the new data being consumed by the GPU or say scanout device)
 - munmap once you don't need the buffer any more
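
For illustration, a minimal userspace sketch of that cycle, assuming dmabuf_fd
was exported with O_RDWR and that the exporting driver implements dma-buf mmap
(error handling omitted):

    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/dma-buf.h>

    static void cpu_upload(int dmabuf_fd, const void *data, size_t size)
    {
            struct dma_buf_sync sync = { 0 };
            void *ptr;

            ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                       dmabuf_fd, 0);

            /* bracket every CPU access with SYNC_START ... SYNC_END */
            sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
            ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

            memcpy(ptr, data, size);

            sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
            ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

            munmap(ptr, size);
    }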

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.
v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
remove range information from struct dma_buf_sync.
v6 (Tiago): use __u64 structured padded flags instead enum. Adjust
documentation about the recommendation on using sync ioctls.

Cc: Sumit Semwal 
Signed-off-by: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---
 Documentation/dma-buf-sharing.txt | 22 +++-
 drivers/dma-buf/dma-buf.c | 43 +++
 include/uapi/linux/dma-buf.h  | 38 ++
 3 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/Documentation/dma-buf-sharing.txt 
b/Documentation/dma-buf-sharing.txt
index 4f4a84b..2ddd4b2 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 
main use-cases:
handles, too). So it's beneficial to support this in a similar fashion on
dma-buf to have a good transition path for existing Android userspace.

-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
+   important to note though is that, even if it is not mandatory, the userspace
+   is strongly recommended to always use the cache synchronization ioctl
+   (DMA_BUF_IOCTL_SYNC) discussed next.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+ - mmap dma-buf fd
+ - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+   want (with the new data being consumed by the GPU or say scanout device)
+ - munmap once you don't need the buffer any more
+
+In principle systems with the memory cache shared by the GPU and CPU may
+not need SYNC_START and SYNC_END but still, userspace is always encouraged
+to use these ioctls before and after, respectively, when accessing the
+mapped address.

 2. Supporting existing mmap interfaces in importers

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b2ac13b..9a298bd 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include 
 #include 

+#include 
+
 static inline int is_dma_buf_file(struct file *);

 struct dma_buf_list {
@@ -251,11 +253,52 @@ out:
return events;
 }

+static long dma_buf_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+   struct dma_buf *dmabuf;
+   struct dma_buf_sync sync;
+   enum dma_data_direction direction;
+
+   dmabuf = file->private_data;
+
+   if (!is_dma_buf_file(file))
+   return -EINVAL;
+
+   switch (cmd) {
+   case DMA_BUF_IOCTL_SYNC:
+   if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+   return -EFAULT;
+
+   if (sync.flags & DMA_BUF_SYNC_RW)
+   direction = DMA_BIDIRECTIONAL;
+   else if (sync.flags & DMA_BUF_SYNC_READ)
+   direction = DMA_FROM_DEVICE;
+   else if (sync.flags & DMA_BUF_SYNC_WRITE)
+   direction = DMA_TO_DEVICE;
+   else
+   return -EINVAL;
+
+   if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+   return -EINVAL;
+
+   if (sync.flags & DMA_BUF_SYNC_END)
+   dma_buf_end_cpu_ac

[PATCH v6 2/5] dma-buf: Remove range-based flush

2015-12-16 Thread Tiago Vignatti
This patch removes range-based information used for optimizations in
begin_cpu_access and end_cpu_access.

We don't have any user or implementation using range-based flush. There seems
to be a consensus that if we ever want something like that again (or even
something more robust using 2D, 3D sub-range regions) we can use the upcoming
dma-buf sync ioctl for that.

Cc: Sumit Semwal 
Cc: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---
 Documentation/dma-buf-sharing.txt | 19 ---
 drivers/dma-buf/dma-buf.c | 13 -
 drivers/gpu/drm/i915/i915_gem_dmabuf.c|  2 +-
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 ++--
 drivers/gpu/drm/udl/udl_fb.c  |  2 --
 drivers/staging/android/ion/ion.c |  6 ++
 drivers/staging/android/ion/ion_test.c|  4 ++--
 include/linux/dma-buf.h   | 12 +---
 8 files changed, 24 insertions(+), 38 deletions(-)

diff --git a/Documentation/dma-buf-sharing.txt 
b/Documentation/dma-buf-sharing.txt
index 480c8de..4f4a84b 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -257,17 +257,15 @@ Access to a dma_buf from the kernel context involves 
three steps:

Interface:
   int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
-  size_t start, size_t len,
   enum dma_data_direction direction)

This allows the exporter to ensure that the memory is actually available for
cpu access - the exporter might need to allocate or swap-in and pin the
backing storage. The exporter also needs to ensure that cpu access is
-   coherent for the given range and access direction. The range and access
-   direction can be used by the exporter to optimize the cache flushing, i.e.
-   access outside of the range or with a different direction (read instead of
-   write) might return stale or even bogus data (e.g. when the exporter needs 
to
-   copy the data to temporary storage).
+   coherent for the access direction. The direction can be used by the exporter
+   to optimize the cache flushing, i.e. access with a different direction (read
+   instead of write) might return stale or even bogus data (e.g. when the
+   exporter needs to copy the data to temporary storage).

This step might fail, e.g. in oom conditions.

@@ -322,14 +320,13 @@ Access to a dma_buf from the kernel context involves 
three steps:

 3. Finish access

-   When the importer is done accessing the range specified in begin_cpu_access,
-   it needs to announce this to the exporter (to facilitate cache flushing and
-   unpinning of any pinned resources). The result of any dma_buf kmap calls
-   after end_cpu_access is undefined.
+   When the importer is done accessing the CPU, it needs to announce this to
+   the exporter (to facilitate cache flushing and unpinning of any pinned
+   resources). The result of any dma_buf kmap calls after end_cpu_access is
+   undefined.

Interface:
   void dma_buf_end_cpu_access(struct dma_buf *dma_buf,
- size_t start, size_t len,
  enum dma_data_direction dir);


diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 155c146..b2ac13b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -539,13 +539,11 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
  * preparations. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:[in]buffer to prepare cpu access for.
- * @start: [in]start of range for cpu access.
- * @len:   [in]length of range for cpu access.
  * @direction: [in]length of range for cpu access.
  *
  * Can return negative error values, returns 0 on success.
  */
-int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
+int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 enum dma_data_direction direction)
 {
int ret = 0;
@@ -554,8 +552,7 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t 
start, size_t len,
return -EINVAL;

if (dmabuf->ops->begin_cpu_access)
-   ret = dmabuf->ops->begin_cpu_access(dmabuf, start,
-   len, direction);
+   ret = dmabuf->ops->begin_cpu_access(dmabuf, direction);

return ret;
 }
@@ -567,19 +564,17 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
  * actions. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:[in]buffer to complete cpu access for.
- * @start: [in]start of range for cpu access.
- * @len:   [in]length of range for cpu access.
  * @direction: [in]length of range for cpu access.
  *
  * This call must always succeed.
  */
-void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t s

[PATCH v6 1/5] drm: prime: Honour O_RDWR during prime-handle-to-fd

2015-12-16 Thread Tiago Vignatti
From: Daniel Thompson 

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.
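
With the restriction relaxed, userspace can request a mappable fd roughly like
this (a sketch; assumes a valid GEM handle and an exporter whose dma-buf
implements mmap):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include "drm.h"

    static int gem_handle_to_mappable_fd(int drm_fd, uint32_t handle)
    {
            struct drm_prime_handle args = { 0 };

            args.handle = handle;
            args.flags = DRM_CLOEXEC | DRM_RDWR;  /* DRM_RDWR is the new bit */

            if (ioctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args))
                    return -1;

            /* args.fd can now be mmap()'ed with PROT_WRITE */
            return args.fd;
    }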

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Reviewed-by: Chris Wilson 
Signed-off-by: Daniel Thompson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h  |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  
{
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
 struct drm_file *file_priv)
 {
struct drm_prime_handle *args = data;
-   uint32_t flags;

if (!drm_core_check_feature(dev, DRIVER_PRIME))
return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
return -ENOSYS;

/* check flags are valid */
-   if (args->flags & ~DRM_CLOEXEC)
+   if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
return -EINVAL;

-   /* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-   flags = args->flags & DRM_CLOEXEC;
-
return dev->driver->prime_handle_to_fd(dev, file_priv,
-   args->handle, flags, &args->fd);
+   args->handle, args->flags, &args->fd);
 }

 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index b4e92eb..a0ebfe7 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -669,6 +669,7 @@ struct drm_set_client_cap {
__u64 value;
 };

+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
__u32 handle;
-- 
2.1.4



Direct userspace dma-buf mmap (v6)

2015-12-16 Thread Tiago Vignatti
Hi all,

The last version of this work was sent a while ago here:

http://lists.freedesktop.org/archives/dri-devel/2015-August/089263.html

So let's recap this series:

1. it adds a vendor-independent client interface for mapping gem objects
   through prime, IOW it implements userspace mmap() on the dma-buf fd.
   This could be used for texturing from a CPU-rendered buffer, or for
   passing buffers among processes without performing copies in userspace.
2. the series lets the client write to the mmap'ed memory, and
3. it deals with GPU and CPU cache synchronization.

Based on previous discussions, it seems that people are fine with 1. and 2. but
not really with 3., given that cache coherency is a bit more tedious to deal
with.

It's easier to use this new infrastructure on "coherent hardware" (systems
where the memory cache is shared by the GPU and CPU) because such systems
rarely need that kind of synchronization. But it would be much more convenient
to expose the very same interface to clients no matter whether the underlying
hardware is cache coherent or not.

One idea that came up was to force clients to call the sync ioctls after the
dma-buf was mmap'ed. But apparently there's no easy and performant way to do
so, because it seems too costly to walk the page table entries and check the
dirty bits. Also, depending on the order of the instructions sent to the
devices, a sync call might be needed after the mapped region gets accessed as
well, to flush all cachelines and make sure, for example, that the GPU domain
won't read stale data. So that would make things even more complicated, if we
ever decide to go in the direction of forcing sync ioctls. The alternative
therefore is to simply document it very well, strongly wording that clients
must use the sync ioctls regardless, otherwise they will misbehave. Do we have
objections, or maybe other, wiser ways to circumvent this? I've made similar
comments in August and no one has come up with better ideas.

Lastly, the diff of the v6 series is that I've basically addressed the concerns
pointed out in the igt tests, organized those changes a bit better (in smaller
patches), documented the usage of the sync ioctls, and extensively tested
this on different types of hardware.

https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf-mmap-v6
https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v6

Tiago


Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (3):
  dma-buf: Remove range-based flush
  drm/i915: Implement end_cpu_access
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 Documentation/dma-buf-sharing.txt | 41 +++---
 drivers/dma-buf/dma-buf.c | 56 ++-
 drivers/gpu/drm/drm_prime.c   | 10 ++
 drivers/gpu/drm/i915/i915_gem_dmabuf.c| 42 +--
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 +--
 drivers/gpu/drm/udl/udl_fb.c  |  2 --
 drivers/staging/android/ion/ion.c |  6 ++--
 drivers/staging/android/ion/ion_test.c|  4 +--
 include/linux/dma-buf.h   | 12 +++
 include/uapi/drm/drm.h|  1 +
 include/uapi/linux/dma-buf.h  | 38 +
 11 files changed, 169 insertions(+), 47 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h


And the igt changes:
Rob Bradford (1):
  prime_mmap: Add new test for calling mmap() on dma-buf fds

Tiago Vignatti (5):
  lib: Add gem_userptr and __gem_userptr helpers
  prime_mmap: Add basic tests to write in a bo using CPU
  lib: Add prime_sync_start and prime_sync_end helpers
  tests: Add kms_mmap_write_crc for cache coherency tests
  tests: Add prime_mmap_coherency for cache coherency tests

 benchmarks/gem_userptr_benchmark.c |  55 +
 lib/ioctl_wrappers.c   |  92 +++
 lib/ioctl_wrappers.h   |  32 +++
 tests/Makefile.sources |   3 +
 tests/gem_userptr_blits.c  | 104 ++--
 tests/kms_mmap_write_crc.c | 281 +
 tests/prime_mmap.c | 494 +
 tests/prime_mmap_coherency.c   | 246 ++
 8 files changed, 1180 insertions(+), 127 deletions(-)
 create mode 100644 tests/kms_mmap_write_crc.c
 create mode 100644 tests/prime_mmap.c
 create mode 100644 tests/prime_mmap_coherency.c

-- 
2.1.4



[PATCH 3/6] dma-buf: Add ioctls to allow userspace to flush

2015-08-27 Thread Tiago Vignatti
On 08/27/2015 05:03 AM, Chris Wilson wrote:
> On Wed, Aug 26, 2015 at 08:29:15PM -0300, Tiago Vignatti wrote:
>> +#ifndef _DMA_BUF_UAPI_H_
>> +#define _DMA_BUF_UAPI_H_
>> +
>> +enum dma_buf_sync_flags {
>> +DMA_BUF_SYNC_READ = (1 << 0),
>> +DMA_BUF_SYNC_WRITE = (2 << 0),
>> +DMA_BUF_SYNC_RW = (3 << 0),
>> +DMA_BUF_SYNC_START = (0 << 2),
>> +DMA_BUF_SYNC_END = (1 << 2),
>> +
>> +DMA_BUF_SYNC_VALID_FLAGS_MASK = DMA_BUF_SYNC_RW |
>> +DMA_BUF_SYNC_END
>> +};
>> +
>> +/* begin/end dma-buf functions used for userspace mmap. */
>> +struct dma_buf_sync {
>> +enum dma_buf_sync_flags flags;
>
> It is better to use explicitly sized types. And since this is not 64b
> padded, probably best to add that padding now.

ahh, I've changed it to use an enum instead. If I roll back to using 
defines, do you think that works better? Like this:

struct dma_buf_sync {
   __u64 flags;
};

#define DMA_BUF_SYNC_READ  (1 << 0)
#define DMA_BUF_SYNC_WRITE (2 << 0)
#define DMA_BUF_SYNC_RW(3 << 0)
#define DMA_BUF_SYNC_START (0 << 2)
#define DMA_BUF_SYNC_END   (1 << 2)
#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
   (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)

#define DMA_BUF_BASE'b'
#define DMA_BUF_IOCTL_SYNC  _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)



[PATCH 3/6] dma-buf: Add ioctls to allow userspace to flush

2015-08-27 Thread Tiago Vignatti
On 08/27/2015 09:06 AM, Hwang, Dongseong wrote:
> On Thu, Aug 27, 2015 at 2:29 AM, Tiago Vignatti
>> +#define DMA_BUF_BASE   'b'
>
> 'b' is occupied by vme and qat driver;
> https://github.com/torvalds/linux/blob/master/Documentation/ioctl/ioctl-number.txt#L201

I believe this is alright, as noted in that txt file: "Because of the 
large number of drivers, many drivers share a partial letter with other 
drivers".

> is it bad idea for drm.h to include these definition?
> http://lxr.free-electrons.com/source/include/uapi/drm/drm.h#L684

this is not drm code, and other types of device drivers might want to 
use it as well.

Tiago


[PATCH 1/6] drm: prime: Honour O_RDWR during prime-handle-to-fd

2015-08-27 Thread Tiago Vignatti
On 08/27/2015 01:36 PM, Emil Velikov wrote:
> Hi all,
>
> On 27 August 2015 at 00:29, Tiago Vignatti  
> wrote:
>> From: Daniel Thompson 
>>
>> Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
>> (DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
>> to mmap() the resulting dma-buf even when this is supported by the
>> DRM driver.
>>
>> It is trivial to relax the restriction and permit read/write access.
>> This is safe because the flags are seldom touched by drm; mostly they
>> are passed verbatim to dma_buf calls.
>>
> Strictly speaking shouldn't this patch be the last one in the series ?
> I.e. we should lift this restriction, after the sync method
> (ioctl/syscall/etc.) is in place. Or perhaps I missed something ?

I think you're right about it, Emil.

Thank you,

Tiago



[PATCH 6/6] drm/i915: Use CPU mapping for userspace dma-buf mmap()

2015-08-26 Thread Tiago Vignatti
Userspace is the one in charge of flushing the CPU caches, by wrapping the mmap
accesses with begin{,end}_cpu_access.

v2: Remove LLC check cause we have dma-buf sync providers now. Also, fix return
before transferring ownership when mmap fails.
v3: Fix return values.
v4: !obj->base.filp is user triggerable, so removed the WARN_ON.

Reviewed-by: Chris Wilson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 9dba876..b7e7a90 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf 
*dma_buf, unsigned long page_n

 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct 
*vma)
 {
-   return -EINVAL;
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   int ret;
+
+   if (obj->base.size < vma->vm_end - vma->vm_start)
+   return -EINVAL;
+
+   if (!obj->base.filp)
+   return -ENODEV;
+
+   ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+   if (ret)
+   return ret;
+
+   fput(vma->vm_file);
+   vma->vm_file = get_file(obj->base.filp);
+
+   return 0;
 }

 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum 
dma_data_direction direction)
-- 
2.1.0



[PATCH 5/6] drm/i915: Implement end_cpu_access

2015-08-26 Thread Tiago Vignatti
This function is meant to be used with dma-buf mmap, when finishing the CPU
access of the mapped pointer.

The error case should rarely happen though, as it requires the buffer to become
active during the sync period and end_cpu_access to be interrupted. So we use
an uninterruptible mutex_lock and log an error if it ever happens.

v2: disable interruption to make sure errors are reported.
v3: update to the new end_cpu_access API.

Reviewed-by: Chris Wilson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 65ab2bd..9dba876 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -212,6 +212,27 @@ static int i915_gem_begin_cpu_access(struct dma_buf 
*dma_buf, enum dma_data_dire
return ret;
 }

+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum 
dma_data_direction direction)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   struct drm_device *dev = obj->base.dev;
+   struct drm_i915_private *dev_priv = to_i915(dev);
+   bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || 
direction == DMA_TO_DEVICE);
+   int ret;
+
+   mutex_lock(&dev->struct_mutex);
+   was_interruptible = dev_priv->mm.interruptible;
+   dev_priv->mm.interruptible = false;
+
+   ret = i915_gem_object_set_to_gtt_domain(obj, write);
+
+   dev_priv->mm.interruptible = was_interruptible;
+   mutex_unlock(&dev->struct_mutex);
+
+   if (unlikely(ret))
+   DRM_ERROR("unable to flush buffer following CPU access; 
rendering may be corrupt\n");
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops =  {
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
@@ -224,6 +245,7 @@ static const struct dma_buf_ops i915_dmabuf_ops =  {
.vmap = i915_gem_dmabuf_vmap,
.vunmap = i915_gem_dmabuf_vunmap,
.begin_cpu_access = i915_gem_begin_cpu_access,
+   .end_cpu_access = i915_gem_end_cpu_access,
 };

 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
-- 
2.1.0



[PATCH 4/6] dma-buf: DRAFT: Make SYNC mandatory when userspace mmap

2015-08-26 Thread Tiago Vignatti
This is my (failed) attempt to make the SYNC_* ioctls mandatory. I've tried to
revoke write access to the mapped region until begin_cpu_access is called.

The tasklet scheduling order seems alright, but the whole logic is not working,
and I guess it's something related to the fs trick I'm trying to do with the
put{,get}_write_access pair...

I'm not sure if I should follow this direction though. I've already spent much
time on it! What do you think?

Cc: Thomas Hellstrom 
Cc: Jérôme Glisse 

---
 drivers/dma-buf/dma-buf.c | 31 ++-
 include/linux/dma-buf.h   |  3 +++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 9a298bd..06cb37b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -75,14 +75,34 @@ static int dma_buf_release(struct inode *inode, struct file 
*file)
if (dmabuf->resv == (struct reservation_object *)&dmabuf[1])
reservation_object_fini(dmabuf->resv);

+   tasklet_kill(&dmabuf->tasklet);
+
module_put(dmabuf->owner);
kfree(dmabuf);
return 0;
 }

+static void dmabuf_mmap_tasklet(unsigned long data)
+{
+   struct dma_buf *dmabuf = (struct dma_buf *) data;
+   struct inode *inode = file_inode(dmabuf->file);
+
+   if (!inode)
+   return;
+
+   /* the CPU accessing another device may put the cache in an incoherent 
state.
+* Therefore if the mmap succeeds, we forbid any further write access 
to the
+* dma-buf until SYNC_START ioctl takes place, which gets back the write
+* access. */
+   put_write_access(inode);
+
+   inode_dio_wait(inode);
+}
+
 static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
 {
struct dma_buf *dmabuf;
+   int ret;

if (!is_dma_buf_file(file))
return -EINVAL;
@@ -94,7 +114,11 @@ static int dma_buf_mmap_internal(struct file *file, struct 
vm_area_struct *vma)
dmabuf->size >> PAGE_SHIFT)
return -EINVAL;

-   return dmabuf->ops->mmap(dmabuf, vma);
+   ret = dmabuf->ops->mmap(dmabuf, vma);
+   if (!ret)
+   tasklet_schedule(&dmabuf->tasklet);
+
+   return ret;
 }

 static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
@@ -389,6 +413,8 @@ struct dma_buf *dma_buf_export(const struct 
dma_buf_export_info *exp_info)
list_add(&dmabuf->list_node, &db_list.head);
mutex_unlock(&db_list.lock);

+   tasklet_init(&dmabuf->tasklet, dmabuf_mmap_tasklet, (unsigned long) 
dmabuf);
+
return dmabuf;
 }
 EXPORT_SYMBOL_GPL(dma_buf_export);
@@ -589,6 +615,7 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
 int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 enum dma_data_direction direction)
 {
+   struct inode *inode = file_inode(dmabuf->file);
int ret = 0;

if (WARN_ON(!dmabuf))
@@ -597,6 +624,8 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
if (dmabuf->ops->begin_cpu_access)
ret = dmabuf->ops->begin_cpu_access(dmabuf, direction);

+   get_write_access(inode);
+
return ret;
 }
 EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 532108e..0359792 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -24,6 +24,7 @@
 #ifndef __DMA_BUF_H__
 #define __DMA_BUF_H__

+#include 
 #include 
 #include 
 #include 
@@ -118,6 +119,7 @@ struct dma_buf_ops {
  * @list_node: node for dma_buf accounting and debugging.
  * @priv: exporter specific private data for this buffer object.
  * @resv: reservation object linked to this dma-buf
+ * @tasklet: tasklet for deferred mmap tasks.
  */
 struct dma_buf {
size_t size;
@@ -133,6 +135,7 @@ struct dma_buf {
struct list_head list_node;
void *priv;
struct reservation_object *resv;
+   struct tasklet_struct tasklet;

/* poll support */
wait_queue_head_t poll;
-- 
2.1.0



[PATCH 3/6] dma-buf: Add ioctls to allow userspace to flush

2015-08-26 Thread Tiago Vignatti
From: Daniel Vetter 

The userspace might need some sort of cache coherency management e.g. when CPU
and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers, that forward
directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
used like following:
 - mmap dma-buf fd
 - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
   want (with the new data being consumed by the GPU or say scanout device)
 - munmap once you don't need the buffer any more

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.
v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and
remove range information from struct dma_buf_sync.

Cc: Sumit Semwal 
Signed-off-by: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---
 Documentation/dma-buf-sharing.txt | 12 +++
 drivers/dma-buf/dma-buf.c | 43 +++
 include/uapi/linux/dma-buf.h  | 41 +
 3 files changed, 96 insertions(+)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/Documentation/dma-buf-sharing.txt 
b/Documentation/dma-buf-sharing.txt
index 4f4a84b..ec0ab1d 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -352,6 +352,18 @@ Being able to mmap an export dma-buf buffer object has 2 
main use-cases:

No special interfaces, userspace simply calls mmap on the dma-buf fd.

+   Also, the userspace might need some sort of cache coherency management e.g.
+   when CPU and GPU domains are being accessed through dma-buf at the same
+   time. To circumvent this problem there are begin/end coherency markers, that
+   forward directly to existing dma-buf device drivers vfunc hooks. Userspace
+   can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The
+   sequence would be used like following:
+ - mmap dma-buf fd
+ - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+   want (with the new data being consumed by the GPU or say scanout device)
+ - munmap once you don't need the buffer any more
+
 2. Supporting existing mmap interfaces in importers

Similar to the motivation for kernel cpu access it is again important that
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b2ac13b..9a298bd 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include 
 #include 

+#include 
+
 static inline int is_dma_buf_file(struct file *);

 struct dma_buf_list {
@@ -251,11 +253,52 @@ out:
return events;
 }

+static long dma_buf_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+   struct dma_buf *dmabuf;
+   struct dma_buf_sync sync;
+   enum dma_data_direction direction;
+
+   dmabuf = file->private_data;
+
+   if (!is_dma_buf_file(file))
+   return -EINVAL;
+
+   switch (cmd) {
+   case DMA_BUF_IOCTL_SYNC:
+   if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+   return -EFAULT;
+
+   if (sync.flags & DMA_BUF_SYNC_RW)
+   direction = DMA_BIDIRECTIONAL;
+   else if (sync.flags & DMA_BUF_SYNC_READ)
+   direction = DMA_FROM_DEVICE;
+   else if (sync.flags & DMA_BUF_SYNC_WRITE)
+   direction = DMA_TO_DEVICE;
+   else
+   return -EINVAL;
+
+   if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+   return -EINVAL;
+
+   if (sync.flags & DMA_BUF_SYNC_END)
+   dma_buf_end_cpu_access(dmabuf, direction);
+   else
+   dma_buf_begin_cpu_access(dmabuf, direction);
+
+   return 0;
+   default:
+   return -ENOTTY;
+   }
+}
+
 static const struct file_operations dma_buf_fops = {
.release= dma_buf_release,
.mmap   = dma_buf_mmap_internal,
.llseek = dma_buf_llseek,
.poll   = dma_buf_poll,
+   .unlocked_ioctl = dma_buf_ioctl,
 };

 /*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
new file mode 100644
index 000..262f7a7
--- /dev/null
+++ b/include/uapi/linux/dma-buf.h
@@ -0,0 +1,41 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2015 Intel Ltd
+ *
+ 

[PATCH 2/6] dma-buf: Remove range-based flush

2015-08-26 Thread Tiago Vignatti
This patch removes range-based information used for optimizations in
begin_cpu_access and end_cpu_access.

We don't have any user or implementation using range-based flush. There seems
to be a consensus that if we ever want something like that again (or even
something more robust using 2D, 3D sub-range regions) we can use the upcoming
dma-buf sync ioctl for that.

Cc: Sumit Semwal 
Cc: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---
 Documentation/dma-buf-sharing.txt | 19 ---
 drivers/dma-buf/dma-buf.c | 13 -
 drivers/gpu/drm/i915/i915_gem_dmabuf.c|  2 +-
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 ++--
 drivers/gpu/drm/udl/udl_fb.c  |  2 --
 drivers/staging/android/ion/ion.c |  6 ++
 drivers/staging/android/ion/ion_test.c|  4 ++--
 include/linux/dma-buf.h   | 12 +---
 8 files changed, 24 insertions(+), 38 deletions(-)

diff --git a/Documentation/dma-buf-sharing.txt 
b/Documentation/dma-buf-sharing.txt
index 480c8de..4f4a84b 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -257,17 +257,15 @@ Access to a dma_buf from the kernel context involves 
three steps:

Interface:
   int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
-  size_t start, size_t len,
   enum dma_data_direction direction)

This allows the exporter to ensure that the memory is actually available for
cpu access - the exporter might need to allocate or swap-in and pin the
backing storage. The exporter also needs to ensure that cpu access is
-   coherent for the given range and access direction. The range and access
-   direction can be used by the exporter to optimize the cache flushing, i.e.
-   access outside of the range or with a different direction (read instead of
-   write) might return stale or even bogus data (e.g. when the exporter needs 
to
-   copy the data to temporary storage).
+   coherent for the access direction. The direction can be used by the exporter
+   to optimize the cache flushing, i.e. access with a different direction (read
+   instead of write) might return stale or even bogus data (e.g. when the
+   exporter needs to copy the data to temporary storage).

This step might fail, e.g. in oom conditions.

@@ -322,14 +320,13 @@ Access to a dma_buf from the kernel context involves 
three steps:

 3. Finish access

-   When the importer is done accessing the range specified in begin_cpu_access,
-   it needs to announce this to the exporter (to facilitate cache flushing and
-   unpinning of any pinned resources). The result of any dma_buf kmap calls
-   after end_cpu_access is undefined.
+   When the importer is done accessing the CPU, it needs to announce this to
+   the exporter (to facilitate cache flushing and unpinning of any pinned
+   resources). The result of any dma_buf kmap calls after end_cpu_access is
+   undefined.

Interface:
   void dma_buf_end_cpu_access(struct dma_buf *dma_buf,
- size_t start, size_t len,
  enum dma_data_direction dir);


diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 155c146..b2ac13b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -539,13 +539,11 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
  * preparations. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:[in]buffer to prepare cpu access for.
- * @start: [in]start of range for cpu access.
- * @len:   [in]length of range for cpu access.
  * @direction: [in]length of range for cpu access.
  *
  * Can return negative error values, returns 0 on success.
  */
-int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
+int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 enum dma_data_direction direction)
 {
int ret = 0;
@@ -554,8 +552,7 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t 
start, size_t len,
return -EINVAL;

if (dmabuf->ops->begin_cpu_access)
-   ret = dmabuf->ops->begin_cpu_access(dmabuf, start,
-   len, direction);
+   ret = dmabuf->ops->begin_cpu_access(dmabuf, direction);

return ret;
 }
@@ -567,19 +564,17 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
  * actions. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:[in]buffer to complete cpu access for.
- * @start: [in]start of range for cpu access.
- * @len:   [in]length of range for cpu access.
  * @direction: [in]length of range for cpu access.
  *
  * This call must always succeed.
  */
-void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t s

[PATCH 1/6] drm: prime: Honour O_RDWR during prime-handle-to-fd

2015-08-26 Thread Tiago Vignatti
From: Daniel Thompson 

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Reviewed-by: Chris Wilson 
Signed-off-by: Daniel Thompson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h  |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  
{
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
 struct drm_file *file_priv)
 {
struct drm_prime_handle *args = data;
-   uint32_t flags;

if (!drm_core_check_feature(dev, DRIVER_PRIME))
return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
return -ENOSYS;

/* check flags are valid */
-   if (args->flags & ~DRM_CLOEXEC)
+   if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
return -EINVAL;

-   /* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-   flags = args->flags & DRM_CLOEXEC;
-
return dev->driver->prime_handle_to_fd(dev, file_priv,
-   args->handle, flags, &args->fd);
+   args->handle, args->flags, &args->fd);
 }

 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 3801584..ad8223e 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -668,6 +668,7 @@ struct drm_set_client_cap {
__u64 value;
 };

+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
__u32 handle;
-- 
2.1.0



[PATCH v5] mmap on the dma-buf directly

2015-08-26 Thread Tiago Vignatti
Lot's of discussion since v4, thanks for your feedback:

http://lists.freedesktop.org/archives/dri-devel/2015-August/088259.html

The two main concerns were about range-based flush (which we decided to
postpone) and about making the SYNC ioctl mandatory. I need you guys to guide
and educate me on the latter now. PTAL.

Tiago


Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (4):
  dma-buf: Remove range-based flush
  dma-buf: DRAFT: Make SYNC mandatory when userspace mmap
  drm/i915: Implement end_cpu_access
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 Documentation/dma-buf-sharing.txt | 31 +++
 drivers/dma-buf/dma-buf.c | 87 +++
 drivers/gpu/drm/drm_prime.c   | 10 ++--
 drivers/gpu/drm/i915/i915_gem_dmabuf.c| 42 ++-
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 +-
 drivers/gpu/drm/udl/udl_fb.c  |  2 -
 drivers/staging/android/ion/ion.c |  6 +--
 drivers/staging/android/ion/ion_test.c|  4 +-
 include/linux/dma-buf.h   | 15 +++---
 include/uapi/drm/drm.h|  1 +
 include/uapi/linux/dma-buf.h  | 41 +++
 11 files changed, 196 insertions(+), 47 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h

-- 
2.1.0



[PATCH v4] dma-buf: Add ioctls to allow userspace to flush

2015-08-26 Thread Tiago Vignatti
On 08/26/2015 11:51 AM, Daniel Vetter wrote:
> On Wed, Aug 26, 2015 at 11:32:30AM -0300, Tiago Vignatti wrote:
>> On 08/26/2015 09:58 AM, Daniel Vetter wrote:
>>> The other is that right now there's no user nor implementation in sight
>>> which actually does range-based flush optimizations, so I'm pretty much
>>> expecting we'll get it wrong. Maybe instead we should go one step further
>>> and remove the range from the internal dma-buf interface and also drop it
>>> from the ioctl? With the flags we can always add something later on once
>>> we have a real user with a clear need for it. But afaik cros only wants to
>>> shuffle around entire tiles and has a buffer-per-tile approach.
>>
>> Thomas, I think Daniel has a point here and also, I wouldn't mind removing
>> all range control from the dma-buf ioctl either.
>
> if we go with nuking it from the ioctl I'd suggest to also nuke it from
> the dma-buf internal inferface first too.

yep, I can do it.

Thomas, so we leave 2d sync out now?

Tiago



[PATCH v4] dma-buf: Add ioctls to allow userspace to flush

2015-08-26 Thread Tiago Vignatti
On 08/26/2015 09:58 AM, Daniel Vetter wrote:
> The other is that right now there's no user nor implementation in sight
> which actually does range-based flush optimizations, so I'm pretty much
> expecting we'll get it wrong. Maybe instead we should go one step further
> and remove the range from the internal dma-buf interface and also drop it
> from the ioctl? With the flags we can always add something later on once
> we have a real user with a clear need for it. But afaik cros only wants to
> shuffle around entire tiles and has a buffer-per-tile approach.

Thomas, I think Daniel has a point here and also, I wouldn't mind 
removing all range control from the dma-buf ioctl either.

In this last patch we can see that it's not complicated to add the 
2d-sync if we eventually decide we want it. But right now there's no 
way we can test it. Therefore, in that case, I'm all in favour of doing 
this work gradually, adding the basics first.

Tiago


[PATCH v4] dma-buf: Add ioctls to allow userspace to flush

2015-08-25 Thread Tiago Vignatti
From: Daniel Vetter 

The userspace might need some sort of cache coherency management e.g. when CPU
and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers, that forward
directly to existing dma-buf device drivers vfunc hooks. Userspace can make use
of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be
used like following:

  - mmap dma-buf fd
  - for each drawing/upload cycle in CPU
1. SYNC_START ioctl
2. read/write to mmap area or a 2d sub-region of it
3. SYNC_END ioctl.
  - munmap once you don't need the buffer any more

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.

Cc: Sumit Semwal 
Signed-off-by: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---

I'm unable to test the 2d sync properly, because begin/end access in i915
doesn't track the mapped range at all.

 Documentation/dma-buf-sharing.txt  | 13 ++
 drivers/dma-buf/dma-buf.c  | 77 --
 drivers/gpu/drm/i915/i915_gem_dmabuf.c |  6 ++-
 include/linux/dma-buf.h| 20 +
 include/uapi/linux/dma-buf.h   | 57 +
 5 files changed, 150 insertions(+), 23 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/Documentation/dma-buf-sharing.txt 
b/Documentation/dma-buf-sharing.txt
index 480c8de..8061ac0 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -355,6 +355,19 @@ Being able to mmap an export dma-buf buffer object has 2 
main use-cases:

No special interfaces, userspace simply calls mmap on the dma-buf fd.

+   Also, the userspace might need some sort of cache coherency management e.g.
+   when CPU and GPU domains are being accessed through dma-buf at the same
+   time. To circumvent this problem there are begin/end coherency markers, that
+   forward directly to existing dma-buf device drivers vfunc hooks. Userspace
+   can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The
+   sequence would be used like following:
+ - mmap dma-buf fd
+ - for each drawing/upload cycle in CPU
+   1. SYNC_START ioctl
+   2. read/write to mmap area or a 2d sub-region of it
+   3. SYNC_END ioctl.
+ - munmap once you don't need the buffer any more
+
 2. Supporting existing mmap interfaces in importers

Similar to the motivation for kernel cpu access it is again important that
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 155c146..b6a4a06 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -251,11 +251,55 @@ out:
return events;
 }

+static long dma_buf_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+   struct dma_buf *dmabuf;
+   struct dma_buf_sync sync;
+   enum dma_data_direction direction;
+
+   dmabuf = file->private_data;
+
+   if (!is_dma_buf_file(file))
+   return -EINVAL;
+
+   if (cmd != DMA_BUF_IOCTL_SYNC)
+   return -ENOTTY;
+
+   if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+   return -EFAULT;
+
+   if (sync.flags & DMA_BUF_SYNC_RW)
+   direction = DMA_BIDIRECTIONAL;
+   else if (sync.flags & DMA_BUF_SYNC_READ)
+   direction = DMA_FROM_DEVICE;
+   else if (sync.flags & DMA_BUF_SYNC_WRITE)
+   direction = DMA_TO_DEVICE;
+   else
+   return -EINVAL;
+
+   if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+   return -EINVAL;
+
+   /* TODO: check for overflowing the buffer's size - how so, checking 
region by
+* region here? Maybe need to check for the other parameters as well. */
+
+   if (sync.flags & DMA_BUF_SYNC_END)
+   dma_buf_end_cpu_access(dmabuf, sync.stride_bytes, 
sync.bytes_per_pixel,
+   sync.num_regions, sync.regions, direction);
+   else
+   dma_buf_begin_cpu_access(dmabuf, sync.stride_bytes, 
sync.bytes_per_pixel,
+   sync.num_regions, sync.regions, direction);
+
+   return 0;
+}
+
 static const struct file_operations dma_buf_fops = {
.release= dma_buf_release,
.mmap   = dma_buf_mmap_internal,
.llseek = dma_buf_llseek,
.poll   = dma_buf_poll,
+   .unlocked_ioctl = dma_buf_ioctl,
 };

 /*
@@ -539,14 +583,17 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
  * preparations. Coherency is only guaranteed in the specified range for the
  * specified access direction.
  * @dmabuf:[in]buffer to prepare cpu access for.
- * @start: [in]start of range for cpu access.

about mmap dma-buf and sync

2015-08-25 Thread Tiago Vignatti
On 08/25/2015 06:30 AM, Thomas Hellstrom wrote:
> On 08/25/2015 11:02 AM, Daniel Vetter wrote:
>> I really feel like any kind of multi-range flush interface is feature
>> bloat, and if we do it then we should only do it later on when there's a
>> clear need for it.
>
> IMO all the use-cases so far that wanted to do this have been 2D
> updates. and having only a 1D sync will most probably scare people away
> from this interface.
>
>> Afaiui dma-buf mmap will mostly be used for up/downloads, which means
>> there will be some explicit copy involved somewhere anyway. So similar to
>> userptr usage. And for those most often you just slurp in a linear block
>> and then let the blitter rearrange it, at least on i915.
>>
>> Also thus far the consensus was that dma-buf doesn't care/know about
>> content layout, adding strides/bytes/whatever does bend this quite a bit.
>
> I think with a 2D interface, the stride only applies to the sync
> operation itself and is only a parameter for that sync operation.
> Whether we should have multiple regions or not is not a big deal for me,
> but I think at least a 2D sync is crucial.

Right now only omap, ion and udl-fb make use of the begin{,end}_cpu_access() 
dma-buf interface, but it's curious that none of them uses those 1-d 
parameters (start and len). So in that sense it seems we would tend to 
feature-bloat the API if we do the 2-d additions.

OTOH we're talking about a different usage of dma-buf right now, so the 
drivers might actually start to use that API in fact. That said, I 
thought it was somewhat simple to turn the common code into 2-d because, 
as I pointed out in the other email, we'd be pushing the whole 
responsibility of dealing with the regions and so on to the driver 
implementors.

Thomas, any comments on the new dma_buf_begin_cpu_access() API I showed? 
Maybe I should just clean up the draft here and send it to the ML :)

Tiago



about mmap dma-buf and sync

2015-08-24 Thread Tiago Vignatti
On 08/24/2015 03:01 PM, Tiago Vignatti wrote:
> yup, I think so. So IIUC the main changes needed for the drivers to
> implement 2D sync lie in the dma_buf_sync_2d structure only, i.e.
> there's nothing really to be changed in the common code, right?

Do we have any special requirements in how we want to pass the sync 
information to the drivers? I was thinking of pushing the whole 
responsibility to them, something like:

+int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, size_t stride_bytes,
+size_t bytes_per_pixel, size_t num_regions,
+struct dma_buf_sync_region regions[], enum 
dma_data_direction dir);
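
FWIW dma_buf_sync_region isn't defined anywhere in this mail; going by the
dma_buf_sync_2d proposal elsewhere in the thread, I'd presume something like:

    struct dma_buf_sync_region {
            __u64 x;
            __u64 y;
            __u64 width;
            __u64 height;
    };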

Daniel Vetter mentioned that the dma-buf design should not track metadata, 
but I haven't read anything about it, so do you think this looks alright?

Tiago


about mmap dma-buf and sync

2015-08-24 Thread Tiago Vignatti
On 08/24/2015 02:42 PM, Thomas Hellstrom wrote:
> On 08/24/2015 07:12 PM, Daniel Stone wrote:
>> Hi,
>>
>> On 24 August 2015 at 18:10, Thomas Hellstrom  
>> wrote:
>>> On 08/24/2015 07:04 PM, Daniel Stone wrote:
 On 24 August 2015 at 17:56, Thomas Hellstrom  
 wrote:
> On 08/24/2015 05:52 PM, Daniel Stone wrote:
>> I still don't think this ameliorates the need for batching: consider
>> the case where you update two disjoint screen regions and want them
>> both flushed. Either you issue two separate sync calls (which can be
>> disadvantageous performance-wise on some hardware / setups), or you
>> accumulate the regions and only flush later. So either two ioctls (one
>> in the style of dirtyfb and one to perform the sync/flush; you can
>> shortcut to assume the full buffer was damaged if the former is
>> missing), or one like this:
>>
>> struct dma_buf_sync_2d {
>>  enum dma_buf_sync_flags flags;
>>
>>  __u64 stride_bytes;
>>  __u32 bytes_per_pixel;
>>  __u32 num_regions;
>>
>>  struct {
>>  __u64 x;
>>  __u64 y;
>>  __u64 width;
>>  __u64 height;
>>  } regions[];
>> };
> Fine with me, although perhaps bytes_per_pixel is a bit redundant?
 Redundant how? It's not implicit in stride.
>>> For flushing purposes, isn't it possible to cover all cases by assuming
>>> bytes_per_pixel=1? Not that it matters much.
>> Sure, though in that case best to replace x with line_byte_offset or
>> something, because otherwise I guarantee you everyone will get it
>> wrong, and it'll be a pain to track down. Like how I managed to
>> misread it now. :)
>
> OK, yeah you have a point. IMO let's go for your proposal.
>
> Tiago, is this OK with you?

yup, I think so. So IIUC the main changes needed for the drivers to 
implement 2D sync lie in the dma_buf_sync_2d structure only. I.e. there's 
nothing really to be changed in the common code, right? Then I'll just need 
to stick the logic about making sync mandatory somewhere; I couldn't 
conclude much from your discussions with Jerome et al., so I'll need to 
investigate more here.

Also, I still want to iterate with the Google policy team about the actual 
need for a syscall. But I believe that could eventually be a secondary 
phase of this work (in case we ever agree upon having that).

Tiago



about mmap dma-buf and sync

2015-08-21 Thread Tiago Vignatti
Hi back!

On 08/20/2015 03:48 AM, Thomas Hellstrom wrote:
> Hi, Tiago!

Something that the Chrome OS folks asked me today is whether we could 
change the sync API to use a syscall instead. The reason is to eventually 
fit this nicely into their sandbox architecture requirements, so that yet 
another ioctl wouldn't need to be white-listed for the unprivileged process.

How does that change the game, for example regarding the details we've been 
discussing here about making the sync mandatory? Can we reuse, say, 
msync(2) for that?
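
To make the question concrete, the unprivileged process would then do
something like the sketch below, with msync() standing in for the SYNC_END
marker (whether msync can actually be routed to the driver's end_cpu_access
hook is exactly the open question):

ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);

/* CPU-draw into ptr ... */

/* flush the written range, instead of a DMA_BUF_IOCTL_SYNC end marker */
if (msync(ptr, size, MS_SYNC) == -1)
        perror("msync");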

Best Regards,

Tiago


[PATCH] staging/android: Update ION TODO per LPC discussion

2015-08-21 Thread Tiago Vignatti
sgtm. Thanks for keeping me in the loop.

Tiago

On 08/21/2015 06:02 PM, Daniel Vetter wrote:
> We discussed a bit with the folks on the Cc: list below what to do
> with ION. Two big take-aways:
>
> - High-performance drivers (like gpus) always want to play tricks with
>coherency and will lie to the dma api (radeon, nouveau, i915 gpu
>drivers all do so in upstream). What needs to be done here is fill
>gaps in dma-buf so that we can do this without breaking the dma-api
>expections of other clients like v4l. The consesus is that hw won't
>stop needing these tricks anytime soon.
>
> - Placement constraints for shared buffers won't be solved any other
>way than through something platform-specific like ion with
>platform-specific knowledge in userspace in something like gralloc.
>For general-purpose devices where this assumption would be painful
>for userspace (like servers) the consensus is that such devices will
>have proper MMUs where placement constraint handling is fairly
>irrelevant.
>
> Hence it is reasonable to destage ion as-is without changing the
> overall design to enable these use-cases and just fixing up these
> few fairly minor things. Since there won't really be an open-source
> userspace for ion (and hence drm maintainers won't take it) the
> proposal is to eventually move it to drivers/android/ion.[hc]. Laura
> would be ok with being maintainer once this is all done and ion is
> destaged.
>
> Note that Thiago is working on exposing the cpu cache flushing for
> cpu access from userspace through mmaps so this is already in progress.
> Also adding him to the Cc: list.
>
> v2: Add ION_IOC_IMPORT to the list of ioctl that probably should go.
>
> Cc: Laura Abbott 
> Cc: sumit.semwal at linaro.org
> Cc: laurent.pinchart at ideasonboard.com
> Cc: ghackmann at google.com
> Cc: robdclark at gmail.com
> Cc: david.brown at arm.com
> Cc: romlem at google.com
> Cc: Tiago Vignatti 
> Signed-off-by: Daniel Vetter 
> ---
>   drivers/staging/android/TODO | 20 
>   1 file changed, 20 insertions(+)
>
> diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO
> index 06954cdf3dba..bc84a72af32d 100644
> --- a/drivers/staging/android/TODO
> +++ b/drivers/staging/android/TODO
> @@ -13,5 +13,25 @@ TODO:
>   - This bug is introduced by Xiong Zhou in the patch bd471258f2e09
>   - ("staging: android: logger: use kuid_t instead of uid_t")
>
> +
> +ion/
> + - Remove ION_IOC_SYNC: Flushing for devices should be purely a kernel 
> internal
> +   interface on top of dma-buf. flush_for_device needs to be added to dma-buf
> +   first.
> + - Remove ION_IOC_CUSTOM: Atm used for cache flushing for cpu access in some
> +   vendor trees. Should be replaced with an ioctl on the dma-buf to expose 
> the
> +   begin/end_cpu_access hooks to userspace.
> + - Clarify the tricks ion plays with explicitly managing coherency behind the
> +   dma api's back (this is absolutely needed for high-perf gpu drivers): Add 
> an
> +   explicit coherency management mode to flush_for_device to be used by 
> drivers
> +   which want to manage caches themselves and which indicates whether cpu 
> caches
> +   need flushing.
> + - With those removed there's probably no use for ION_IOC_IMPORT anymore 
> either
> +   since ion would just be the central allocator for shared buffers.
> + - Add dt-binding to expose cma regions as ion heaps, with the rule that any
> +   such cma regions must already be used by some device for dma. I.e. ion 
> only
> +   exposes existing cma regions and doesn't unnecessarily reserve memory when
> +   booting a system which doesn't use ion.
> +
>   Please send patches to Greg Kroah-Hartman  and Cc:
>   Brian Swetland 
>



[PATCH 1/7] prime_mmap: Add new test for calling mmap() on dma-buf fds

2015-08-14 Thread Tiago Vignatti
Hi Daniel,

On 08/13/2015 04:04 AM, Daniel Vetter wrote:
> On Wed, Aug 12, 2015 at 08:29:14PM -0300, Tiago Vignatti wrote:
>> +/* Map too big */
>> +handle = gem_create(fd, BO_SIZE);
>> +fill_bo(handle, BO_SIZE);
>> +dma_buf_fd = prime_handle_to_fd(fd, handle);
>> +igt_assert(errno == 0);
>> +ptr = mmap(NULL, BO_SIZE * 2, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
>> +igt_assert(ptr == MAP_FAILED && errno == EINVAL);
>> +errno = 0;
>> +close(dma_buf_fd);
>> +gem_close(fd, handle);
>
> That only checks for one of the conditions, trying to map something
> offset/overlapping the end of the buffer, but small enough needs to be
> tested separately.

you mean test with a smaller length but a non-zero offset? I don't think I 
quite get what you wanted to say here. Something like the snippet below, 
maybe?
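
/* untested sketch: length alone fits in the bo, but offset + length
 * overlaps its end; map at a page-aligned offset near the end so that
 * offset + length > BO_SIZE */
ptr = mmap(NULL, BO_SIZE / 2, PROT_READ, MAP_SHARED, dma_buf_fd,
           BO_SIZE - 4096);
igt_assert(ptr == MAP_FAILED && errno == EINVAL);
errno = 0;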

> Also I think a testcase for invalid buffer2fd flags would be good, just
> for completeness - we seem to be missing that one.

you mean test for different flags than the ones supported by 
DRM_IOCTL_PRIME_HANDLE_TO_FD?

Tiago


[PATCH 3/7] prime_mmap: Add basic tests to write in a bo using CPU

2015-08-13 Thread Tiago Vignatti
On 08/13/2015 04:01 AM, Daniel Vetter wrote:
> On Wed, Aug 12, 2015 at 08:29:16PM -0300, Tiago Vignatti wrote:
>> This patch adds test_correct_cpu_write, which maps the texture buffer 
>> through a
>> prime fd and then writes directly to it using the CPU. It stresses the driver
>> to guarantee cache synchronization among the different domains.
>>
>> This test also adds test_forked_cpu_write, which creates the GEM bo in one
>> process and pass the prime handle of the it to another process, which in turn
>> uses the handle only to map and write. Grossly speaking this test simulates
>> Chrome OS  architecture, where the Web content ("unpriviledged process") maps
>> and CPU-draws a buffer, which was previously allocated in the GPU process
>> ("priviledged process").
>>
>> This requires kernel modifications (Daniel Thompson's "drm: prime: Honour
>> O_RDWR during prime-handle-to-fd").
>>
>> Signed-off-by: Tiago Vignatti 
>
> Squash with previous patch?

why? If the whole point is to decrease the number of patches, then I'd 
prefer to squash 2/7 into 1/7 (although they're from different authors, and 
it would be nice to keep each one's changes separate). This patch here 
introduces writing to the mmap'ed dma-buf fd, a concept that is still under 
debate and requires a kernel counterpart, which is why I preferred to keep 
it apart.


>> ---
>>   lib/ioctl_wrappers.c |  5 +++-
>>   tests/prime_mmap.c   | 65 
>> 
>>   2 files changed, 69 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
>> index 53bd635..941fa66 100644
>> --- a/lib/ioctl_wrappers.c
>> +++ b/lib/ioctl_wrappers.c
>> @@ -1125,6 +1125,9 @@ void gem_require_ring(int fd, int ring_id)
>>
>>   /* prime */
>>
>> +#ifndef DRM_RDWR
>> +#define DRM_RDWR O_RDWR
>> +#endif
>>   /**
>>* prime_handle_to_fd:
>>* @fd: open i915 drm file descriptor
>> @@ -1142,7 +1145,7 @@ int prime_handle_to_fd(int fd, uint32_t handle)
>>
>>  memset(&args, 0, sizeof(args));
>>  args.handle = handle;
>> -args.flags = DRM_CLOEXEC;
>> +args.flags = DRM_CLOEXEC | DRM_RDWR;
>
> This needs to be optional otherwise all the existing prime tests start
> falling over on older kernels. Probably need a
> prime_handle_to_fd_with_mmap, which doesn an igt_skip if it fails.

true. Thank you.


> -Daniel
>
>>  args.fd = -1;
>>
>>  do_ioctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args);
>> diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
>> index dc59e8f..ad91371 100644
>> --- a/tests/prime_mmap.c
>> +++ b/tests/prime_mmap.c
>> @@ -22,6 +22,7 @@
>>*
>>* Authors:
>>*Rob Bradford 
>> + *Tiago Vignatti 
>>*
>>*/
>>
>> @@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size)
>>   }
>>
>>   static void
>> +fill_bo_cpu(char *ptr)
>> +{
>> +memcpy(ptr, pattern, sizeof(pattern));
>> +}
>> +
>> +static void
>>   test_correct(void)
>>   {
>>  int dma_buf_fd;
>> @@ -180,6 +187,62 @@ test_forked(void)
>>  gem_close(fd, handle);
>>   }
>>
>> +/* test CPU write. This has a rather big implication for the driver which 
>> must
>> + * guarantee cache synchronization when writing the bo using CPU. */
>> +static void
>> +test_correct_cpu_write(void)
>> +{
>> +int dma_buf_fd;
>> +char *ptr;
>> +uint32_t handle;
>> +
>> +handle = gem_create(fd, BO_SIZE);
>> +
>> +dma_buf_fd = prime_handle_to_fd(fd, handle);
>> +igt_assert(errno == 0);
>> +
>> +/* Check correctness of map using write protection (PROT_WRITE) */
>> +ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, 
>> dma_buf_fd, 0);
>> +igt_assert(ptr != MAP_FAILED);
>> +
>> +/* Fill bo using CPU */
>> +fill_bo_cpu(ptr);
>> +
>> +/* Check pattern correctness */
>> +igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
>> +
>> +munmap(ptr, BO_SIZE);
>> +close(dma_buf_fd);
>> +gem_close(fd, handle);
>> +}
>> +
>> +/* map from another process and then write using CPU */
>> +static void
>> +test_forked_cpu_write(void)
>> +{
>> +int dma_buf_fd;
>> +char *ptr;
>> +uint32_t handle;
>> +
>> +handle = gem_create(fd, BO_SIZE);
>> +
>> +d

[PATCH v4] mmap on the dma-buf directly

2015-08-13 Thread Tiago Vignatti
On 08/13/2015 08:09 AM, Thomas Hellstrom wrote:
> Tiago,
>
> I take it, this is intended to be a generic interface used mostly for 2D
> rendering.

yup. "generic" is an important point that I've actually forgot to 
mention in the description, which is probably the whole motivation for 
bringing this up.

We want avoid link any vendor-specific library to the unpriviledged 
process for security reasons, so it's a requirement to it not have 
access to driver-specific ioctls when mapping the buffers. The use-case 
for it is texturing from CPU rendered buffer, like you said, with the 
intention of passing these buffers among processes without performing 
any copy in the user-space ("zero-copy").

> In that case, using SYNC is crucial for performance of incoherent
> architectures and I'd rather see it mandatory than an option. It could
> perhaps be made mandatory preferrably using an error or a one-time
> kernel warning. If nobody uses the SYNC interface, it is of little use.

hmm, I'm not sure it is of little use. Our hardware (the "LLC"-capable 
kind) has this very specific case where the cache gets dirty wrt the GPU, 
which is when the same buffer is shared with the scanout device. This is 
not something that will happen in Chrome OS for example, so we wouldn't 
need the SYNC markers there.

In any case I think making it mandatory works for us, but I'll have to 
check with Daniel/Chris whether there are performance penalties when 
accessing it and so on.

> Also I think the syncing needs to be extended to two dimensions. A long
> time ago when this was brought up people argued why we should limit it
> to two dimensions, but I believe two dimensions addresses most
> performance-problematic use-cases. A default implementation of
> twodimensional sync can easily be made using the one-dimensional API.

Two-dimensional sync? You're saying that there could be a GPU access API 
in dma-buf as well? I don't get it; what's the use-case? I'm sure I missed 
the discussions because I wasn't particularly interested in this whole 
thing before :)

Thanks for reviewing, Thomas.

Tiago


[PATCH 7/7] tests/kms_mmap_write_crc: Demonstrate the need for end_cpu_access

2015-08-12 Thread Tiago Vignatti
It requires i915 changes to add end_cpu_access().

Signed-off-by: Tiago Vignatti 
---
 tests/kms_mmap_write_crc.c | 63 --
 1 file changed, 55 insertions(+), 8 deletions(-)

diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c
index e24d535..59ac9e7 100644
--- a/tests/kms_mmap_write_crc.c
+++ b/tests/kms_mmap_write_crc.c
@@ -67,6 +67,24 @@ static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
return ptr;
 }

+static void dmabuf_sync_start(void)
+{
+   struct dma_buf_sync sync_start;
+
+   memset(&sync_start, 0, sizeof(sync_start));
+   sync_start.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
+   do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_start);
+}
+
+static void dmabuf_sync_end(void)
+{
+   struct dma_buf_sync sync_end;
+
+   memset(&sync_end, 0, sizeof(sync_end));
+   sync_end.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
+   do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_end);
+}
+
 static void test_begin_access(data_t *data)
 {
igt_display_t *display = &data->display;
@@ -103,14 +121,11 @@ static void test_begin_access(data_t *data)
caching = gem_get_caching(data->drm_fd, fb->gem_handle);
igt_assert(caching == I915_CACHING_NONE || caching == I915_CACHING_DISPLAY);

-   // Uncomment the following for flush and the crc check next passes. It
-   // requires the kernel counter-part of it implemented obviously.
-   // {
-   // struct dma_buf_sync sync_start;
-   // memset(&sync_start, 0, sizeof(sync_start));
-   // sync_start.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
-   // do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_start);
-   // }
+   /*
+* firstly demonstrate the need for DMA_BUF_SYNC_START ("begin_cpu_access")
+*/
+
+   dmabuf_sync_start();

/* use dmabuf pointer to make the other fb all white too */
buf = malloc(fb->size);
@@ -126,6 +141,38 @@ static void test_begin_access(data_t *data)
/* check that the crc is as expected, which requires that caches got flushed */
igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
igt_assert_crc_equal(&crc, &data->ref_crc);
+
+   /*
+* now demonstrates the need for DMA_BUF_SYNC_END ("end_cpu_access")
+*/
+
+   /* start over, writing non-white to the fb again and flip to it to
+* make it fully flushed */
+   cr = igt_get_cairo_ctx(data->drm_fd, fb);
+   igt_paint_test_pattern(cr, fb->width, fb->height);
+   cairo_destroy(cr);
+
+   igt_plane_set_fb(data->primary, fb);
+   igt_display_commit(display);
+
+   /* sync start, to move to CPU domain */
+   dmabuf_sync_start();
+
+   /* use dmabuf pointer in the same fb to make it all white */
+   buf = malloc(fb->size);
+   igt_assert(buf != NULL);
+   memset(buf, 0xff, fb->size);
+   memcpy(ptr, buf, fb->size);
+   free(buf);
+
+   /* there's an implicit flush in set_fb() as well (to set to the GTT
+* domain), so if we don't do it and instead write directly into the fb
+* as it is the scanout, that should demonstrate the need for
+* end_cpu_access */
+   dmabuf_sync_end();
+
/* check that the crc is as expected, which requires that caches got flushed */
+   igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
+   igt_assert_crc_equal(&crc, &data->ref_crc);
 }

 static bool prepare_crtc(data_t *data)
-- 
2.1.0



[PATCH 6/7] tests: Add kms_mmap_write_crc for cache coherency tests

2015-08-12 Thread Tiago Vignatti
This program can be used to detect when writes don't land in scanout due to
cache incoherency. Although this seems to be a problem inherent to non-LLC
machines ("Atom"), this particular test catches cache dirt on scanout on LLC
machines as well. It's inspired by Ville's kms_pwrite_crc.c and can also be
used to test the correctness of the driver's begin_cpu_access (TODO
end_cpu_access).

To see the need for the flush, one has to run this same binary a few times
because it's not 100% reproducible (on my Core machine it's always possible
to catch the problem by running it at most 5 times).

Signed-off-by: Tiago Vignatti 
---
 tests/.gitignore   |   1 +
 tests/Makefile.sources |   1 +
 tests/kms_mmap_write_crc.c | 236 +
 3 files changed, 238 insertions(+)
 create mode 100644 tests/kms_mmap_write_crc.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 5bc4a58..9ba1e48 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -140,6 +140,7 @@ kms_force_connector
 kms_frontbuffer_tracking
 kms_legacy_colorkey
 kms_mmio_vs_cs_flip
+kms_mmap_write_crc
 kms_pipe_b_c_ivb
 kms_pipe_crc_basic
 kms_plane
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 5b2072e..31c5508 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -163,6 +163,7 @@ TESTS_progs = \
kms_3d \
kms_fence_pin_leak \
kms_force_connector \
+   kms_mmap_write_crc \
kms_pwrite_crc \
kms_sink_crc_basic \
prime_udl \
diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c
new file mode 100644
index 000..e24d535
--- /dev/null
+++ b/tests/kms_mmap_write_crc.c
@@ -0,0 +1,236 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Tiago Vignatti 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "igt_kms.h"
+#include "intel_chipset.h"
+#include "ioctl_wrappers.h"
+#include "igt_aux.h"
+
+IGT_TEST_DESCRIPTION(
+   "Use the display CRC support to validate mmap write to an already uncached 
future scanout buffer.");
+
+typedef struct {
+   int drm_fd;
+   igt_display_t display;
+   struct igt_fb fb[2];
+   igt_output_t *output;
+   igt_plane_t *primary;
+   enum pipe pipe;
+   igt_crc_t ref_crc;
+   igt_pipe_crc_t *pipe_crc;
+   uint32_t devid;
+} data_t;
+
+int dma_buf_fd;
+
+static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
+{
+   char *ptr = NULL;
+
+   dma_buf_fd = prime_handle_to_fd(drm_fd, fb->gem_handle);
+   igt_assert(errno == 0);
+
ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+
+   return ptr;
+}
+
+static void test_begin_access(data_t *data)
+{
+   igt_display_t *display = &data->display;
+   igt_output_t *output = data->output;
+   struct igt_fb *fb = &data->fb[1];
+   drmModeModeInfo *mode;
+   cairo_t *cr;
+   char *ptr;
+   uint32_t caching;
+   void *buf;
+   igt_crc_t crc;
+
+   mode = igt_output_get_mode(output);
+
+   /* create a non-white fb where we can write later */
+   igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay,
+ DRM_FORMAT_XRGB8888, LOCAL_DRM_FORMAT_MOD_NONE, fb);
+
+   ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb);
+
+   cr = igt_get_cairo_ctx(data->drm_fd, fb);
+   igt_paint_test_pattern(cr, fb->width, fb->height);
+   cairo_destroy(cr);
+
+   /* flip to it to make it UC/WC and fully flushed */
+   igt_plane_set_fb(data->primary, fb);
+   igt_display_commit

[PATCH 5/7] prime_mmap: Test for userptr mmap

2015-08-12 Thread Tiago Vignatti
A userptr doesn't have the obj->base.filp, but can be exported via dma-buf, so
make sure it fails when mmaping.

Signed-off-by: Tiago Vignatti 
---
On my machine, exporting the handle to an fd is actually returning an error
and failing before the actual test happens. The same issue happens in
gem_userptr_blits's test_dmabuf(). This patch therefore needs to be tested
properly.

 tests/prime_mmap.c | 38 +-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index ad91371..fd6d13b 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -299,12 +299,47 @@ static int prime_handle_to_fd_no_assert(uint32_t handle, int *fd_out)
args.fd = -1;

ret = drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args);
-
+   if (ret)
+   ret = errno;
*fd_out = args.fd;

return ret;
 }

+/* test for mmap(dma_buf_export(userptr)) */
+static void
+test_userptr(void)
+{
+   int ret, dma_buf_fd;
+   void *ptr;
+   uint32_t handle;
+
+   /* create userptr bo */
+   ret = posix_memalign(&ptr, 4096, BO_SIZE);
+   igt_assert_eq(ret, 0);
+
+   ret = gem_userptr(fd, (uint32_t *)ptr, BO_SIZE, 0, LOCAL_I915_USERPTR_UNSYNCHRONIZED, &handle);
+   igt_assert_eq(ret, 0);
+
+   /* export userptr */
+   ret = prime_handle_to_fd_no_assert(handle, &dma_buf_fd);
+   if (ret) {
+   igt_assert(ret == EINVAL || ret == ENODEV);
+   goto free_userptr;
+   } else {
+   igt_assert_eq(ret, 0);
+   igt_assert_lte(0, dma_buf_fd);
+   }
+
+   /* a userptr doesn't have the obj->base.filp, but can be exported via
+* dma-buf, so make sure it fails here */
+   ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr == MAP_FAILED && errno == ENODEV);
+free_userptr:
+   gem_close(fd, handle);
+   close(dma_buf_fd);
+}
+
 static void
 test_errors(void)
 {
@@ -413,6 +448,7 @@ igt_main
{ "test_forked_cpu_write", test_forked_cpu_write },
{ "test_refcounting", test_refcounting },
{ "test_dup", test_dup },
+   { "test_userptr", test_userptr },
{ "test_errors", test_errors },
{ "test_aperture_limit", test_aperture_limit },
};
-- 
2.1.0



[PATCH 4/7] lib: Add gem_userptr helpers

2015-08-12 Thread Tiago Vignatti
This patch moves the userptr definitions and helper implementations that
lived locally in gem_userptr_benchmark and gem_userptr_blits into the
library, so other tests can make use of them as well. There are no
functional changes.
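
After the move, callers pass the flags explicitly; typical usage looks like
this (more or less what the tests already do):

void *ptr;
uint32_t handle;
int ret;

ret = posix_memalign(&ptr, 4096, BO_SIZE);
igt_assert_eq(ret, 0);

ret = gem_userptr(fd, ptr, BO_SIZE, 0, LOCAL_I915_USERPTR_UNSYNCHRONIZED,
                  &handle);
igt_assert_eq(ret, 0);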

Signed-off-by: Tiago Vignatti 
---
 benchmarks/gem_userptr_benchmark.c | 45 +++
 lib/ioctl_wrappers.c   | 30 +
 lib/ioctl_wrappers.h   | 13 ++
 tests/gem_userptr_blits.c  | 92 +++---
 4 files changed, 73 insertions(+), 107 deletions(-)

diff --git a/benchmarks/gem_userptr_benchmark.c b/benchmarks/gem_userptr_benchmark.c
index b804fdd..e0797dc 100644
--- a/benchmarks/gem_userptr_benchmark.c
+++ b/benchmarks/gem_userptr_benchmark.c
@@ -58,17 +58,6 @@
   #define PAGE_SIZE 4096
 #endif

-#define LOCAL_I915_GEM_USERPTR   0x33
-#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr)
-struct local_i915_gem_userptr {
-   uint64_t user_ptr;
-   uint64_t user_size;
-   uint32_t flags;
-#define LOCAL_I915_USERPTR_READ_ONLY (1<<0)
-#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31)
-   uint32_t handle;
-};
-
 static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED;

 #define BO_SIZE (65536)
@@ -83,30 +72,6 @@ static void gem_userptr_test_synchronized(void)
userptr_flags = 0;
 }

-static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t *handle)
-{
-   struct local_i915_gem_userptr userptr;
-   int ret;
-
-   userptr.user_ptr = (uintptr_t)ptr;
-   userptr.user_size = size;
-   userptr.flags = userptr_flags;
-   if (read_only)
-   userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY;
-
-   ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr);
-   if (ret)
-   ret = errno;
-   igt_skip_on_f(ret == ENODEV &&
- (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 &&
- !read_only,
- "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?");
-   if (ret == 0)
-   *handle = userptr.handle;
-
-   return ret;
-}
-
 static void **handle_ptr_map;
 static unsigned int num_handle_ptr_map;

@@ -144,7 +109,7 @@ static uint32_t create_userptr_bo(int fd, int size)
ret = posix_memalign(&ptr, PAGE_SIZE, size);
igt_assert(ret == 0);

-   ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle);
+   ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle);
igt_assert(ret == 0);
add_handle_ptr(handle, ptr);

@@ -167,7 +132,7 @@ static int has_userptr(int fd)
assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0);
oldflags = userptr_flags;
gem_userptr_test_unsynchronized();
-   ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle);
+   ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle);
userptr_flags = oldflags;
if (ret != 0) {
free(ptr);
@@ -379,7 +344,7 @@ static void test_impact_overlap(int fd, const char *prefix)

for (i = 0, p = block; i < nr_bos[subtest];
 i++, p += PAGE_SIZE)
-   ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0,
+   ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, userptr_flags,
  &handles[i]);
igt_assert(ret == 0);
}
@@ -439,7 +404,7 @@ static void test_single(int fd)
start_test(test_duration_sec);

while (run_test) {
-   ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, &handle);
+   ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, userptr_flags, &handle);
assert(ret == 0);
gem_close(fd, handle);
iter++;
@@ -480,7 +445,7 @@ static void test_multiple(int fd, unsigned int batch, int random)
for (i = 0; i < batch; i++) {
ret = gem_userptr(fd, bo_ptr + map[i] * BO_SIZE,
BO_SIZE,
-   0, &handles[i]);
+   0, userptr_flags, &handles[i]);
assert(ret == 0);
}
if (random)
diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index 941fa66..5152647 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -742,6 +742,36 @@ void gem_context_require_ban_period(int fd)
igt_require(has_ban_period);
 }

+int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle)
+{
+   struct local_i915_gem_userptr userptr;
+   int ret;
+
+   memset(&

[PATCH 3/7] prime_mmap: Add basic tests to write in a bo using CPU

2015-08-12 Thread Tiago Vignatti
This patch adds test_correct_cpu_write, which maps the texture buffer through a
prime fd and then writes directly to it using the CPU. It stresses the driver
to guarantee cache synchronization among the different domains.

This test also adds test_forked_cpu_write, which creates the GEM bo in one
process and passes its prime handle to another process, which in turn uses
the handle only to map and write. Roughly speaking, this test simulates the
Chrome OS architecture, where the Web content ("unprivileged process") maps
and CPU-draws a buffer that was previously allocated in the GPU process
("privileged process").

This requires kernel modifications (Daniel Thompson's "drm: prime: Honour
O_RDWR during prime-handle-to-fd").

Signed-off-by: Tiago Vignatti 
---
 lib/ioctl_wrappers.c |  5 +++-
 tests/prime_mmap.c   | 65 
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index 53bd635..941fa66 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -1125,6 +1125,9 @@ void gem_require_ring(int fd, int ring_id)

 /* prime */

+#ifndef DRM_RDWR
+#define DRM_RDWR O_RDWR
+#endif
 /**
  * prime_handle_to_fd:
  * @fd: open i915 drm file descriptor
@@ -1142,7 +1145,7 @@ int prime_handle_to_fd(int fd, uint32_t handle)

memset(&args, 0, sizeof(args));
args.handle = handle;
-   args.flags = DRM_CLOEXEC;
+   args.flags = DRM_CLOEXEC | DRM_RDWR;
args.fd = -1;

do_ioctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args);
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index dc59e8f..ad91371 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -22,6 +22,7 @@
  *
  * Authors:
  *Rob Bradford 
+ *Tiago Vignatti 
  *
  */

@@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size)
 }

 static void
+fill_bo_cpu(char *ptr)
+{
+   memcpy(ptr, pattern, sizeof(pattern));
+}
+
+static void
 test_correct(void)
 {
int dma_buf_fd;
@@ -180,6 +187,62 @@ test_forked(void)
gem_close(fd, handle);
 }

+/* test CPU write. This has a rather big implication for the driver which must
+ * guarantee cache synchronization when writing the bo using CPU. */
+static void
+test_correct_cpu_write(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   /* Check correctness of map using write protection (PROT_WRITE) */
+   ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+
+   /* Fill bo using CPU */
+   fill_bo_cpu(ptr);
+
+   /* Check pattern correctness */
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+
+   munmap(ptr, BO_SIZE);
+   close(dma_buf_fd);
+   gem_close(fd, handle);
+}
+
+/* map from another process and then write using CPU */
+static void
+test_forked_cpu_write(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   igt_fork(childno, 1) {
+   ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   fill_bo_cpu(ptr);
+
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+   munmap(ptr, BO_SIZE);
+   close(dma_buf_fd);
+   }
+   close(dma_buf_fd);
+   igt_waitchildren();
+   gem_close(fd, handle);
+}
+
 static void
 test_refcounting(void)
 {
@@ -346,6 +409,8 @@ igt_main
{ "test_map_unmap", test_map_unmap },
{ "test_reprime", test_reprime },
{ "test_forked", test_forked },
+   { "test_correct_cpu_write", test_correct_cpu_write },
+   { "test_forked_cpu_write", test_forked_cpu_write },
{ "test_refcounting", test_refcounting },
{ "test_dup", test_dup },
{ "test_errors", test_errors },
-- 
2.1.0



[PATCH 2/7] prime_mmap: Fix a few misc stuff

2015-08-12 Thread Tiago Vignatti
- Remove pattern_check(), whose loop iterated over the buffer without ever
  advancing the compared pointer, making the iteration useless
- Add binary file to .gitignore

Signed-off-by: Tiago Vignatti 
---
 tests/.gitignore   |  1 +
 tests/prime_mmap.c | 37 -
 2 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/tests/.gitignore b/tests/.gitignore
index 0af0899..5bc4a58 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -163,6 +163,7 @@ pm_sseu
 prime_nv_api
 prime_nv_pcopy
 prime_nv_test
+prime_mmap
 prime_self_import
 prime_udl
 template
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index 4dc2055..dc59e8f 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -65,19 +65,6 @@ fill_bo(uint32_t handle, size_t size)
}
 }

-static int
-pattern_check(char *ptr, size_t size)
-{
-   off_t i;
-   for (i = 0; i < size; i+=sizeof(pattern))
-   {
-   if (memcmp(ptr, pattern, sizeof(pattern)) != 0)
-   return 1;
-   }
-
-   return 0;
-}
-
 static void
 test_correct(void)
 {
@@ -92,14 +79,14 @@ test_correct(void)
igt_assert(errno == 0);

/* Check correctness vs GEM_MMAP_GTT */
-   ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ | PROT_WRITE);
+   ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ);
ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr1 != MAP_FAILED);
igt_assert(ptr2 != MAP_FAILED);
igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0);

/* Check pattern correctness */
-   igt_assert(pattern_check(ptr2, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0);

munmap(ptr1, BO_SIZE);
munmap(ptr2, BO_SIZE);
@@ -122,13 +109,13 @@ test_map_unmap(void)

ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);

/* Unmap and remap */
munmap(ptr, BO_SIZE);
ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);

munmap(ptr, BO_SIZE);
close(dma_buf_fd);
@@ -151,16 +138,16 @@ test_reprime(void)

ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);

close (dma_buf_fd);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
munmap(ptr, BO_SIZE);

dma_buf_fd = prime_handle_to_fd(fd, handle);
ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);

munmap(ptr, BO_SIZE);
close(dma_buf_fd);
@@ -184,7 +171,7 @@ test_forked(void)
igt_fork(childno, 1) {
ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
munmap(ptr, BO_SIZE);
close(dma_buf_fd);
}
@@ -210,7 +197,7 @@ test_refcounting(void)

ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
munmap(ptr, BO_SIZE);
close (dma_buf_fd);
 }
@@ -231,7 +218,7 @@ test_dup(void)

ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
munmap(ptr, BO_SIZE);
gem_close(fd, handle);
close (dma_buf_fd);
@@ -310,7 +297,7 @@ test_aperture_limit(void)
igt_assert(errno == 0);
ptr1 = mmap(NULL, size1, PROT_READ, MAP_SHARED, dma_buf_fd1, 0);
igt_assert(ptr1 != MAP_FAILED);
-   igt_assert(pattern_check(ptr1, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr1, pattern, sizeof(pattern)) == 0);

handle2 = gem_create(fd, size1);
fill_bo(handle2, BO_SIZE);
@@ -318,7 +305,7 @@ test_aperture_limit(void)
igt_assert(errno == 0);
ptr2 = mmap(NULL, size2, PROT_READ, MAP_SHARED, dma_buf_fd2, 0);
igt_assert(ptr2 != MAP_FAILED);
-   igt_assert(pattern_check(ptr2, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) =

[PATCH 1/7] prime_mmap: Add new test for calling mmap() on dma-buf fds

2015-08-12 Thread Tiago Vignatti
From: Rob Bradford 

This test has the following subtests:
 - test_correct for correctness of the data
 - test_map_unmap checks for mapping idempotency
 - test_reprime checks for dma-buf creation idempotency
 - test_forked checks for multiprocess access
 - test_refcounting checks for buffer reference counting
 - test_dup checks that dup()ing the fd works
 - test_errors checks the error return values for failures
 - test_aperture_limit tests multiple buffer creation at the gtt aperture
   limit

Signed-off-by: Rob Bradford 
Signed-off-by: Tiago Vignatti 
---
 tests/Makefile.sources |   1 +
 tests/prime_mmap.c | 383 +
 2 files changed, 384 insertions(+)
 create mode 100644 tests/prime_mmap.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index c94714b..5b2072e 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -90,6 +90,7 @@ TESTS_progs_M = \
pm_rps \
pm_rc6_residency \
pm_sseu \
+   prime_mmap \
prime_self_import \
template \
$(NULL)
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
new file mode 100644
index 000..4dc2055
--- /dev/null
+++ b/tests/prime_mmap.c
@@ -0,0 +1,383 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Rob Bradford 
+ *
+ */
+
+/*
+ * Testcase: Check whether mmap()ing dma-buf works
+ */
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "ioctl_wrappers.h"
+
+#define BO_SIZE (16*1024)
+
+static int fd;
+
+char pattern[] = {0xff, 0x00, 0x00, 0x00,
+   0x00, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0xff, 0x00,
+   0x00, 0x00, 0x00, 0xff};
+
+static void
+fill_bo(uint32_t handle, size_t size)
+{
+   off_t i;
+   for (i = 0; i < size; i+=sizeof(pattern))
+   {
+   gem_write(fd, handle, i, pattern, sizeof(pattern));
+   }
+}
+
+static int
+pattern_check(char *ptr, size_t size)
+{
+   off_t i;
+   for (i = 0; i < size; i+=sizeof(pattern))
+   {
+   if (memcmp(ptr, pattern, sizeof(pattern)) != 0)
+   return 1;
+   }
+
+   return 0;
+}
+
+static void
+test_correct(void)
+{
+   int dma_buf_fd;
+   char *ptr1, *ptr2;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+   fill_bo(handle, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   /* Check correctness vs GEM_MMAP_GTT */
+   ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ | PROT_WRITE);
+   ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr1 != MAP_FAILED);
+   igt_assert(ptr2 != MAP_FAILED);
+   igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0);
+
+   /* Check pattern correctness */
+   igt_assert(pattern_check(ptr2, BO_SIZE) == 0);
+
+   munmap(ptr1, BO_SIZE);
+   munmap(ptr2, BO_SIZE);
+   close(dma_buf_fd);
+   gem_close(fd, handle);
+}
+
+static void
+test_map_unmap(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+   fill_bo(handle, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+
+   /* Unmap and remap */
+   munmap(ptr, BO_SIZE);
+   ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   igt_assert(pattern_check(p

[PATCH 4/4] drm/i915: Use CPU mapping for userspace dma-buf mmap()

2015-08-12 Thread Tiago Vignatti
Userspace is the one in charge of flushing CPU caches, by wrapping its mmap
accesses with begin{,end}_cpu_access.

v2: Remove LLC check cause we have dma-buf sync providers now. Also, fix return
before transferring ownership when mmap fails.
v3: Fix return values.
v4: !obj->base.filp is user triggerable, so removed the WARN_ON.

Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 8447ba4..ecd00d6 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_num, void *addr)

 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma)
 {
-   return -EINVAL;
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   int ret;
+
+   if (obj->base.size < vma->vm_end - vma->vm_start)
+   return -EINVAL;
+
+   if (!obj->base.filp)
+   return -ENODEV;
+
+   ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+   if (ret)
+   return ret;
+
+   fput(vma->vm_file);
+   vma->vm_file = get_file(obj->base.filp);
+
+   return 0;
 }

 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction)
-- 
2.1.0



[PATCH 3/4] drm/i915: Implement end_cpu_access

2015-08-12 Thread Tiago Vignatti
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index e9c2bfd..8447ba4 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -212,6 +212,15 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction)
return ret;
 }

+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   bool write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
+
+   if (i915_gem_object_set_to_gtt_domain(obj, write))
+   DRM_ERROR("failed to set bo into the GTT\n");
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops =  {
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
@@ -224,6 +233,7 @@ static const struct dma_buf_ops i915_dmabuf_ops =  {
.vmap = i915_gem_dmabuf_vmap,
.vunmap = i915_gem_dmabuf_vunmap,
.begin_cpu_access = i915_gem_begin_cpu_access,
+   .end_cpu_access = i915_gem_end_cpu_access,
 };

 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
-- 
2.1.0



[PATCH 2/4] dma-buf: Add ioctls to allow userspace to flush

2015-08-12 Thread Tiago Vignatti
From: Daniel Vetter 

Userspace might need some sort of cache coherency management, e.g. when CPU
and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers, which forward
directly to the existing dma-buf device drivers' vfunc hooks. Userspace can
make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
would be used like the following:
 - mmap dma-buf fd
 - for each drawing/upload cycle in CPU: 1. SYNC_START ioctl, 2. read/write
   to mmap area, 3. SYNC_END ioctl. This can be repeated as often as you
   want (with the new data being consumed by the GPU or, say, the scanout
   device)
 - munmap once you don't need the buffer any more
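
Written out in C, one cycle of that sequence looks roughly like this (error
handling omitted; a sketch against the uapi below, not part of the patch):

struct dma_buf_sync sync = { 0 };
char *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, dmabuf_fd, 0);

sync.start = 0;
sync.length = len;                              /* sync the whole mapping */
sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);    /* -> begin_cpu_access */

memset(ptr, 0xff, len);                         /* CPU drawing/upload */

sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);    /* -> end_cpu_access */

munmap(ptr, len);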

v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end
dma-buf functions. Check for overflows in start/length.

Cc: Sumit Semwal 
Signed-off-by: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---
 Documentation/dma-buf-sharing.txt | 12 ++
 drivers/dma-buf/dma-buf.c | 50 +++
 include/uapi/linux/dma-buf.h  | 43 +
 3 files changed, 105 insertions(+)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 480c8de..2d8ee3b 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -355,6 +355,18 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:

No special interfaces, userspace simply calls mmap on the dma-buf fd.

+   Also, the userspace might need some sort of cache coherency management e.g.
+   when CPU and GPU domains are being accessed through dma-buf at the same
+   time. To circumvent this problem there are begin/end coherency markers, that
+   forward directly to existing dma-buf device drivers vfunc hooks. Userspace
+   can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The
+   sequence would be used like following:
+ - mmap dma-buf fd
+ - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+   to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+   want (with the new data being consumed by the GPU or say scanout device)
+ - munmap once you don't need the buffer any more
+
 2. Supporting existing mmap interfaces in importers

Similar to the motivation for kernel cpu access it is again important that
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 155c146..e628415 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include <linux/poll.h>
 #include <linux/reservation.h>

+#include <uapi/linux/dma-buf.h>
+
 static inline int is_dma_buf_file(struct file *);

 struct dma_buf_list {
@@ -251,11 +253,59 @@ out:
return events;
 }

+static long dma_buf_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+   struct dma_buf *dmabuf;
+   struct dma_buf_sync sync;
+   enum dma_data_direction direction;
+
+   dmabuf = file->private_data;
+
+   if (!is_dma_buf_file(file))
+   return -EINVAL;
+
+   switch (cmd) {
+   case DMA_BUF_IOCTL_SYNC:
+   if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+   return -EFAULT;
+
+   if (sync.flags & DMA_BUF_SYNC_RW)
+   direction = DMA_BIDIRECTIONAL;
+   else if (sync.flags & DMA_BUF_SYNC_READ)
+   direction = DMA_FROM_DEVICE;
+   else if (sync.flags & DMA_BUF_SYNC_WRITE)
+   direction = DMA_TO_DEVICE;
+   else
+   return -EINVAL;
+
+   if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+   return -EINVAL;
+
+   /* check for overflowing the buffer's size */
+   if (sync.start > dmabuf->size ||
+   sync.length > dmabuf->size - sync.start)
+   return -EINVAL;
+
+   if (sync.flags & DMA_BUF_SYNC_END)
+   dma_buf_end_cpu_access(dmabuf, sync.start,
+  sync.length, direction);
+   else
+   dma_buf_begin_cpu_access(dmabuf, sync.start,
+sync.length, direction);
+
+   return 0;
+   default:
+   return -ENOTTY;
+   }
+}
+
 static const struct file_operations dma_buf_fops = {
.release= dma_buf_release,
.mmap   = dma_buf_mmap_internal,
.llseek = dma_buf_llseek,
.poll   = dma_buf_poll,
+   .unlocked_ioctl = dma_buf_ioctl,
 };

 /*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
new file mode 100644
index 00

[PATCH 1/4] drm: prime: Honour O_RDWR during prime-handle-to-fd

2015-08-12 Thread Tiago Vignatti
From: Daniel Thompson 

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.
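
For reference, a userspace caller would then simply add the new flag when
exporting, as the igt counterpart does:

struct drm_prime_handle args;

memset(&args, 0, sizeof(args));
args.handle = handle;
args.flags = DRM_CLOEXEC | DRM_RDWR;    /* request a writable dma-buf fd */
args.fd = -1;

ret = drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args);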

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Signed-off-by: Daniel Thompson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h  |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = {
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_priv)
 {
struct drm_prime_handle *args = data;
-   uint32_t flags;

if (!drm_core_check_feature(dev, DRIVER_PRIME))
return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
return -ENOSYS;

/* check flags are valid */
-   if (args->flags & ~DRM_CLOEXEC)
+   if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
return -EINVAL;

-   /* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-   flags = args->flags & DRM_CLOEXEC;
-
return dev->driver->prime_handle_to_fd(dev, file_priv,
-   args->handle, flags, &args->fd);
+   args->handle, args->flags, &args->fd);
 }

 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 3801584..ad8223e 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -668,6 +668,7 @@ struct drm_set_client_cap {
__u64 value;
 };

+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
__u32 handle;
-- 
2.1.0



[PATCH v4] mmap on the dma-buf directly

2015-08-12 Thread Tiago Vignatti
Hi,

The idea is to create a GEM bo in one process and pass its prime handle to
another process, which in turn uses the handle only to map and write. This
could be useful for the Chrome OS architecture, where the Web content
("unprivileged process") maps and CPU-draws a buffer that was previously
allocated in the GPU process ("privileged process").

In v2, I've added a patch that Daniel kindly drafted to allow the
unprivileged process to flush through a prime fd. In v3, I've fixed a few
concerns and then added end_cpu_access to i915. In v4, I fixed Sumit
Semwal's concerns about the dma-buf documentation and the FIXME missing in
that same patch, and also removed the WARN in i915 dma-buf mmap (pointed
out by Chris). PTAL.

Best Regards,

Tiago


Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (2):
  drm/i915: Implement end_cpu_access
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 Documentation/dma-buf-sharing.txt  | 12 
 drivers/dma-buf/dma-buf.c  | 50 ++
 drivers/gpu/drm/drm_prime.c| 10 ++-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 28 ++-
 include/uapi/drm/drm.h |  1 +
 include/uapi/linux/dma-buf.h   | 43 +
 6 files changed, 136 insertions(+), 8 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h

-- 
2.1.0



[PATCH i-g-t 5/5] tests/kms_mmap_write_crc: Demonstrate the need for end_cpu_access

2015-08-11 Thread Tiago Vignatti
It requires i915 changes to add end_cpu_access().

Signed-off-by: Tiago Vignatti 
---
 tests/kms_mmap_write_crc.c | 63 --
 1 file changed, 55 insertions(+), 8 deletions(-)

diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c
index e24d535..59ac9e7 100644
--- a/tests/kms_mmap_write_crc.c
+++ b/tests/kms_mmap_write_crc.c
@@ -67,6 +67,24 @@ static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
return ptr;
 }

+static void dmabuf_sync_start(void)
+{
+   struct dma_buf_sync sync_start;
+
+   memset(&sync_start, 0, sizeof(sync_start));
+   sync_start.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
+   do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_start);
+}
+
+static void dmabuf_sync_end(void)
+{
+   struct dma_buf_sync sync_end;
+
+   memset(&sync_end, 0, sizeof(sync_end));
+   sync_end.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
+   do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_end);
+}
+
 static void test_begin_access(data_t *data)
 {
igt_display_t *display = &data->display;
@@ -103,14 +121,11 @@ static void test_begin_access(data_t *data)
caching = gem_get_caching(data->drm_fd, fb->gem_handle);
igt_assert(caching == I915_CACHING_NONE || caching == I915_CACHING_DISPLAY);

-   // Uncomment the following for flush and the crc check next passes. It
-   // requires the kernel counter-part of it implemented obviously.
-   // {
-   // struct dma_buf_sync sync_start;
-   // memset(&sync_start, 0, sizeof(sync_start));
-   // sync_start.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
-   // do_ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync_start);
-   // }
+   /*
+* firstly demonstrate the need for DMA_BUF_SYNC_START ("begin_cpu_access")
+*/
+
+   dmabuf_sync_start();

/* use dmabuf pointer to make the other fb all white too */
buf = malloc(fb->size);
@@ -126,6 +141,38 @@ static void test_begin_access(data_t *data)
/* check that the crc is as expected, which requires that caches got flushed */
igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
igt_assert_crc_equal(&crc, &data->ref_crc);
+
+   /*
+* now demonstrates the need for DMA_BUF_SYNC_END ("end_cpu_access")
+*/
+
+   /* start over, writing non-white to the fb again and flip to it to
+* make it fully flushed */
+   cr = igt_get_cairo_ctx(data->drm_fd, fb);
+   igt_paint_test_pattern(cr, fb->width, fb->height);
+   cairo_destroy(cr);
+
+   igt_plane_set_fb(data->primary, fb);
+   igt_display_commit(display);
+
+   /* sync start, to move to CPU domain */
+   dmabuf_sync_start();
+
+   /* use dmabuf pointer in the same fb to make it all white */
+   buf = malloc(fb->size);
+   igt_assert(buf != NULL);
+   memset(buf, 0xff, fb->size);
+   memcpy(ptr, buf, fb->size);
+   free(buf);
+
+   /* there's an implicit flush in set_fb() as well (to set to the GTT
+* domain), so if we don't do it and instead write directly into the fb
+* as it is the scanout, that should demonstrate the need for
+* end_cpu_access */
+   dmabuf_sync_end();
+
/* check that the crc is as expected, which requires that caches got flushed */
+   igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
+   igt_assert_crc_equal(&crc, &data->ref_crc);
 }

 static bool prepare_crtc(data_t *data)
-- 
2.1.0



[PATCH i-g-t 4/5] tests: Add kms_mmap_write_crc for cache coherency tests

2015-08-11 Thread Tiago Vignatti
This program can be used to detect when writes don't land in scanout due to
cache incoherency. Although this seems to be a problem inherent to non-LLC
machines ("Atom"), this particular test catches cache dirt on scanout on LLC
machines as well. It's inspired by Ville's kms_pwrite_crc.c and can also be
used to test the correctness of the driver's begin_cpu_access (TODO
end_cpu_access).

To see the need for the flush, one has to run this same binary a few times
because it's not 100% reproducible (on my Core machine it's always possible
to catch the problem by running it at most 5 times).

Signed-off-by: Tiago Vignatti 
---
 tests/.gitignore   |   1 +
 tests/Makefile.sources |   1 +
 tests/kms_mmap_write_crc.c | 236 +
 3 files changed, 238 insertions(+)
 create mode 100644 tests/kms_mmap_write_crc.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 5bc4a58..9ba1e48 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -140,6 +140,7 @@ kms_force_connector
 kms_frontbuffer_tracking
 kms_legacy_colorkey
 kms_mmio_vs_cs_flip
+kms_mmap_write_crc
 kms_pipe_b_c_ivb
 kms_pipe_crc_basic
 kms_plane
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 5b2072e..31c5508 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -163,6 +163,7 @@ TESTS_progs = \
kms_3d \
kms_fence_pin_leak \
kms_force_connector \
+   kms_mmap_write_crc \
kms_pwrite_crc \
kms_sink_crc_basic \
prime_udl \
diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c
new file mode 100644
index 000..e24d535
--- /dev/null
+++ b/tests/kms_mmap_write_crc.c
@@ -0,0 +1,236 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Tiago Vignatti 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "igt_kms.h"
+#include "intel_chipset.h"
+#include "ioctl_wrappers.h"
+#include "igt_aux.h"
+
+IGT_TEST_DESCRIPTION(
+   "Use the display CRC support to validate mmap write to an already uncached 
future scanout buffer.");
+
+typedef struct {
+   int drm_fd;
+   igt_display_t display;
+   struct igt_fb fb[2];
+   igt_output_t *output;
+   igt_plane_t *primary;
+   enum pipe pipe;
+   igt_crc_t ref_crc;
+   igt_pipe_crc_t *pipe_crc;
+   uint32_t devid;
+} data_t;
+
+int dma_buf_fd;
+
+static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
+{
+   char *ptr = NULL;
+
+   dma_buf_fd = prime_handle_to_fd(drm_fd, fb->gem_handle);
+   igt_assert(errno == 0);
+
+   ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+
+   return ptr;
+}
+
+static void test_begin_access(data_t *data)
+{
+   igt_display_t *display = &data->display;
+   igt_output_t *output = data->output;
+   struct igt_fb *fb = &data->fb[1];
+   drmModeModeInfo *mode;
+   cairo_t *cr;
+   char *ptr;
+   uint32_t caching;
+   void *buf;
+   igt_crc_t crc;
+
+   mode = igt_output_get_mode(output);
+
+   /* create a non-white fb where we can write later */
+   igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay,
+ DRM_FORMAT_XRGB8888, LOCAL_DRM_FORMAT_MOD_NONE, fb);
+
+   ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb);
+
+   cr = igt_get_cairo_ctx(data->drm_fd, fb);
+   igt_paint_test_pattern(cr, fb->width, fb->height);
+   cairo_destroy(cr);
+
+   /* flip to it to make it UC/WC and fully flushed */
+   igt_plane_set_fb(data->primary, fb);
+   igt_display_commit(display);

[PATCH i-g-t 3/5] prime_mmap: Add basic tests to write in a bo using CPU

2015-08-11 Thread Tiago Vignatti
This patch adds test_correct_cpu_write, which maps the texture buffer through a
prime fd and then writes directly to it using the CPU. It stresses the driver
to guarantee cache synchronization among the different domains.

This test also adds test_forked_cpu_write, which creates the GEM bo in one
process and passes its prime handle to another process, which in turn uses
the handle only to map and write. Roughly speaking, this test simulates the
Chrome OS architecture, where the Web content ("unprivileged process") maps
and CPU-draws a buffer that was previously allocated in the GPU process
("privileged process").

This requires kernel modifications (Daniel Thompson's "drm: prime: Honour
O_RDWR during prime-handle-to-fd").
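
The fork()-based test keeps the prime fd across the fork for free. In a real
privileged/unprivileged split the fd would instead be handed over a
Unix-domain socket. A minimal sketch of that hand-off (standard SCM_RIGHTS
passing; not part of this patch, and send_dmabuf_fd is a made-up name):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* send one fd (here, a dma-buf fd) over a connected AF_UNIX socket;
 * the receiving process ends up with a fresh fd for the same buffer */
static int send_dmabuf_fd(int sock, int dma_buf_fd)
{
	char dummy = 0;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	char ctrl[CMSG_SPACE(sizeof(int))];
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl;
	msg.msg_controllen = sizeof(ctrl);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &dma_buf_fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}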

Signed-off-by: Tiago Vignatti 
---
 lib/ioctl_wrappers.c |  5 +++-
 tests/prime_mmap.c   | 65 
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index 53bd635..941fa66 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -1125,6 +1125,9 @@ void gem_require_ring(int fd, int ring_id)

 /* prime */

+#ifndef DRM_RDWR
+#define DRM_RDWR O_RDWR
+#endif
 /**
  * prime_handle_to_fd:
  * @fd: open i915 drm file descriptor
@@ -1142,7 +1145,7 @@ int prime_handle_to_fd(int fd, uint32_t handle)

memset(&args, 0, sizeof(args));
args.handle = handle;
-   args.flags = DRM_CLOEXEC;
+   args.flags = DRM_CLOEXEC | DRM_RDWR;
args.fd = -1;

do_ioctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args);
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index dc59e8f..ad91371 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -22,6 +22,7 @@
  *
  * Authors:
  *Rob Bradford 
+ *Tiago Vignatti 
  *
  */

@@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size)
 }

 static void
+fill_bo_cpu(char *ptr)
+{
+   memcpy(ptr, pattern, sizeof(pattern));
+}
+
+static void
 test_correct(void)
 {
int dma_buf_fd;
@@ -180,6 +187,62 @@ test_forked(void)
gem_close(fd, handle);
 }

+/* test CPU write. This has a rather big implication for the driver which must
+ * guarantee cache synchronization when writing the bo using CPU. */
+static void
+test_correct_cpu_write(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   /* Check correctness of map using write protection (PROT_WRITE) */
+   ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+
+   /* Fill bo using CPU */
+   fill_bo_cpu(ptr);
+
+   /* Check pattern correctness */
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+
+   munmap(ptr, BO_SIZE);
+   close(dma_buf_fd);
+   gem_close(fd, handle);
+}
+
+/* map from another process and then write using CPU */
+static void
+test_forked_cpu_write(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   igt_fork(childno, 1) {
+   ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   fill_bo_cpu(ptr);
+
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
+   munmap(ptr, BO_SIZE);
+   close(dma_buf_fd);
+   }
+   close(dma_buf_fd);
+   igt_waitchildren();
+   gem_close(fd, handle);
+}
+
 static void
 test_refcounting(void)
 {
@@ -346,6 +409,8 @@ igt_main
{ "test_map_unmap", test_map_unmap },
{ "test_reprime", test_reprime },
{ "test_forked", test_forked },
+   { "test_correct_cpu_write", test_correct_cpu_write },
+   { "test_forked_cpu_write", test_forked_cpu_write },
{ "test_refcounting", test_refcounting },
{ "test_dup", test_dup },
{ "test_errors", test_errors },
-- 
2.1.0



[PATCH i-g-t 2/5] prime_mmap: Fix a few misc stuff

2015-08-11 Thread Tiago Vignatti
- Remove pattern_check(), which was walking through a useless iterator
- Remove superfluous PROT_WRITE from gem_mmap, in test_correct()
- Add binary file to .gitignore

Signed-off-by: Tiago Vignatti 
---
 tests/.gitignore   |  1 +
 tests/prime_mmap.c | 37 -
 2 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/tests/.gitignore b/tests/.gitignore
index 0af0899..5bc4a58 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -163,6 +163,7 @@ pm_sseu
 prime_nv_api
 prime_nv_pcopy
 prime_nv_test
+prime_mmap
 prime_self_import
 prime_udl
 template
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
index 4dc2055..dc59e8f 100644
--- a/tests/prime_mmap.c
+++ b/tests/prime_mmap.c
@@ -65,19 +65,6 @@ fill_bo(uint32_t handle, size_t size)
}
 }

-static int
-pattern_check(char *ptr, size_t size)
-{
-   off_t i;
-   for (i = 0; i < size; i+=sizeof(pattern))
-   {
-   if (memcmp(ptr, pattern, sizeof(pattern)) != 0)
-   return 1;
-   }
-
-   return 0;
-}
-
 static void
 test_correct(void)
 {
@@ -92,14 +79,14 @@ test_correct(void)
igt_assert(errno == 0);

/* Check correctness vs GEM_MMAP_GTT */
-   ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ | PROT_WRITE);
+   ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ);
ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr1 != MAP_FAILED);
igt_assert(ptr2 != MAP_FAILED);
igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0);

/* Check pattern correctness */
-   igt_assert(pattern_check(ptr2, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0);

munmap(ptr1, BO_SIZE);
munmap(ptr2, BO_SIZE);
@@ -122,13 +109,13 @@ test_map_unmap(void)

ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);

/* Unmap and remap */
munmap(ptr, BO_SIZE);
ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);

munmap(ptr, BO_SIZE);
close(dma_buf_fd);
@@ -151,16 +138,16 @@ test_reprime(void)

ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);

close (dma_buf_fd);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
munmap(ptr, BO_SIZE);

dma_buf_fd = prime_handle_to_fd(fd, handle);
ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);

munmap(ptr, BO_SIZE);
close(dma_buf_fd);
@@ -184,7 +171,7 @@ test_forked(void)
igt_fork(childno, 1) {
ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
munmap(ptr, BO_SIZE);
close(dma_buf_fd);
}
@@ -210,7 +197,7 @@ test_refcounting(void)

ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
munmap(ptr, BO_SIZE);
close (dma_buf_fd);
 }
@@ -231,7 +218,7 @@ test_dup(void)

ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
igt_assert(ptr != MAP_FAILED);
-   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0);
munmap(ptr, BO_SIZE);
gem_close(fd, handle);
close (dma_buf_fd);
@@ -310,7 +297,7 @@ test_aperture_limit(void)
igt_assert(errno == 0);
ptr1 = mmap(NULL, size1, PROT_READ, MAP_SHARED, dma_buf_fd1, 0);
igt_assert(ptr1 != MAP_FAILED);
-   igt_assert(pattern_check(ptr1, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr1, pattern, sizeof(pattern)) == 0);

handle2 = gem_create(fd, size1);
fill_bo(handle2, BO_SIZE);
@@ -318,7 +305,7 @@ test_aperture_limit(void)
igt_assert(errno == 0);
ptr2 = mmap(NULL, size2, PROT_READ, MAP_SHARED, dma_buf_fd2, 0);
igt_assert(ptr2 != MAP_FAILED);
-   igt_assert(pattern_check(ptr2, BO_SIZE) == 0);
+   igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0);

[PATCH i-g-t 1/5] prime_mmap: Add new test for calling mmap() on dma-buf fds

2015-08-11 Thread Tiago Vignatti
From: Rob Bradford 

This test has the following subtests:
 - test_correct for correctness of the data
 - test_map_unmap checks for mapping idempotency
 - test_reprime checks for dma-buf creation idempotency
 - test_forked checks for multiprocess access
 - test_refcounting checks for buffer reference counting
 - test_dup checks that dup()ing the fd works
 - test_errors checks the error return values for failures
 - test_aperture_limit tests multiple buffer creation at the gtt aperture
   limit

Signed-off-by: Rob Bradford 
Signed-off-by: Tiago Vignatti 
---
 tests/Makefile.sources |   1 +
 tests/prime_mmap.c | 383 +
 2 files changed, 384 insertions(+)
 create mode 100644 tests/prime_mmap.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index c94714b..5b2072e 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -90,6 +90,7 @@ TESTS_progs_M = \
pm_rps \
pm_rc6_residency \
pm_sseu \
+   prime_mmap \
prime_self_import \
template \
$(NULL)
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c
new file mode 100644
index 000..4dc2055
--- /dev/null
+++ b/tests/prime_mmap.c
@@ -0,0 +1,383 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Rob Bradford 
+ *
+ */
+
+/*
+ * Testcase: Check whether mmap()ing dma-buf works
+ */
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "ioctl_wrappers.h"
+
+#define BO_SIZE (16*1024)
+
+static int fd;
+
+char pattern[] = {0xff, 0x00, 0x00, 0x00,
+   0x00, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0xff, 0x00,
+   0x00, 0x00, 0x00, 0xff};
+
+static void
+fill_bo(uint32_t handle, size_t size)
+{
+   off_t i;
+   for (i = 0; i < size; i+=sizeof(pattern))
+   {
+   gem_write(fd, handle, i, pattern, sizeof(pattern));
+   }
+}
+
+static int
+pattern_check(char *ptr, size_t size)
+{
+   off_t i;
+   for (i = 0; i < size; i+=sizeof(pattern))
+   {
+   if (memcmp(ptr, pattern, sizeof(pattern)) != 0)
+   return 1;
+   }
+
+   return 0;
+}
+
+static void
+test_correct(void)
+{
+   int dma_buf_fd;
+   char *ptr1, *ptr2;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+   fill_bo(handle, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   /* Check correctness vs GEM_MMAP_GTT */
+   ptr1 = gem_mmap(fd, handle, BO_SIZE, PROT_READ | PROT_WRITE);
+   ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr1 != MAP_FAILED);
+   igt_assert(ptr2 != MAP_FAILED);
+   igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0);
+
+   /* Check pattern correctness */
+   igt_assert(pattern_check(ptr2, BO_SIZE) == 0);
+
+   munmap(ptr1, BO_SIZE);
+   munmap(ptr2, BO_SIZE);
+   close(dma_buf_fd);
+   gem_close(fd, handle);
+}
+
+static void
+test_map_unmap(void)
+{
+   int dma_buf_fd;
+   char *ptr;
+   uint32_t handle;
+
+   handle = gem_create(fd, BO_SIZE);
+   fill_bo(handle, BO_SIZE);
+
+   dma_buf_fd = prime_handle_to_fd(fd, handle);
+   igt_assert(errno == 0);
+
+   ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   igt_assert(pattern_check(ptr, BO_SIZE) == 0);
+
+   /* Unmap and remap */
+   munmap(ptr, BO_SIZE);
+   ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+   igt_assert(pattern_check(ptr, BO_SIZE) == 0);

[PATCH 4/4] drm/i915: Use CPU mapping for userspace dma-buf mmap()

2015-08-11 Thread Tiago Vignatti
Userspace is the one in charge of flushing CPU caches, by wrapping its mmap
access with begin{,end}_cpu_access.

v2: Remove the LLC check because we have dma-buf sync providers now. Also, fix
the return before transferring ownership when mmap fails.
v3: Fix return values.

Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 8447ba4..8b87c86 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf 
*dma_buf, unsigned long page_n

 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct 
*vma)
 {
-   return -EINVAL;
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   int ret;
+
+   if (obj->base.size < vma->vm_end - vma->vm_start)
+   return -EINVAL;
+
+   if (WARN_ON(!obj->base.filp))
+   return -ENODEV;
+
+   ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+   if (ret)
+   return ret;
+
+   fput(vma->vm_file);
+   vma->vm_file = get_file(obj->base.filp);
+
+   return 0;
 }

 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, 
size_t length, enum dma_data_direction direction)
-- 
2.1.0



[PATCH 3/4] drm/i915: Implement end_cpu_access

2015-08-11 Thread Tiago Vignatti
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index e9c2bfd..8447ba4 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -212,6 +212,15 @@ static int i915_gem_begin_cpu_access(struct dma_buf 
*dma_buf, size_t start, size
return ret;
 }

+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   bool write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
+
+   if (i915_gem_object_set_to_gtt_domain(obj, write))
+   DRM_ERROR("failed to set bo into the GTT\n");
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops =  {
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
@@ -224,6 +233,7 @@ static const struct dma_buf_ops i915_dmabuf_ops =  {
.vmap = i915_gem_dmabuf_vmap,
.vunmap = i915_gem_dmabuf_vunmap,
.begin_cpu_access = i915_gem_begin_cpu_access,
+   .end_cpu_access = i915_gem_end_cpu_access,
 };

 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
-- 
2.1.0



[PATCH 2/4] dma-buf: Add ioctls to allow userspace to flush

2015-08-11 Thread Tiago Vignatti
From: Daniel Vetter 

FIXME: Update kerneldoc for begin/end to make it clear that those are
for mmap too.

Open: Do we need a special indication that the begin/end is from
userspace mmap and not from kernel mmap?

There's also the question already about kernel internal users - vmap
and kmap interfaces are much different ... We might need to add a
mapping enum to the begin/end dma-buf functions.
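
For reference, the expected userspace pattern is to bracket direct CPU access
to the mmap'ed dma-buf with this ioctl. A minimal sketch against the uapi
header added below (error handling omitted; start/length are left zeroed, as
in the igt helpers elsewhere in this thread, so the whole buffer is
synchronized):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* bracket a CPU write to a mmap'ed dma-buf with SYNC START/END */
static void write_synced(int dma_buf_fd, void *map, const void *src, size_t len)
{
	struct dma_buf_sync sync;

	memset(&sync, 0, sizeof(sync));
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
	ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	memcpy(map, src, len);		/* plain CPU access through the mapping */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
	ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}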

v2 (Tiago): Fix header file type names (u64 -> __u64)

Signed-off-by: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---
 drivers/dma-buf/dma-buf.c| 47 
 include/uapi/linux/dma-buf.h | 39 
 2 files changed, 86 insertions(+)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 155c146..4820d61 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include 
 #include 

+#include 
+
 static inline int is_dma_buf_file(struct file *);

 struct dma_buf_list {
@@ -251,11 +253,56 @@ out:
return events;
 }

+static long dma_buf_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+   struct dma_buf *dmabuf;
+   struct dma_buf_sync sync;
+   enum dma_data_direction direction;
+
+   dmabuf = file->private_data;
+
+   if (!is_dma_buf_file(file))
+   return -EINVAL;
+
+   switch (cmd) {
+   case DMA_BUF_IOCTL_SYNC:
+   if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+   return -EFAULT;
+
+   if (sync.flags & DMA_BUF_SYNC_RW)
+   direction = DMA_BIDIRECTIONAL;
+   else if (sync.flags & DMA_BUF_SYNC_READ)
+   direction = DMA_FROM_DEVICE;
+   else if (sync.flags & DMA_BUF_SYNC_WRITE)
+   direction = DMA_TO_DEVICE;
+   else
+   return -EINVAL;
+
+   if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+   return -EINVAL;
+
+   /* FIXME: Check for overflows in start/length. */
+
+   if (sync.flags & DMA_BUF_SYNC_END)
+   dma_buf_end_cpu_access(dmabuf, sync.start,
+  sync.length, direction);
+   else
+   dma_buf_begin_cpu_access(dmabuf, sync.start,
+sync.length, direction);
+
+   return 0;
+   default:
+   return -ENOTTY;
+   }
+}
+
 static const struct file_operations dma_buf_fops = {
.release= dma_buf_release,
.mmap   = dma_buf_mmap_internal,
.llseek = dma_buf_llseek,
.poll   = dma_buf_poll,
+   .unlocked_ioctl = dma_buf_ioctl,
 };

 /*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
new file mode 100644
index 000..e5327df
--- /dev/null
+++ b/include/uapi/linux/dma-buf.h
@@ -0,0 +1,39 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2015 Intel Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _DMA_BUF_UAPI_H_
+#define _DMA_BUF_UAPI_H_
+
+struct dma_buf_sync {
+   __u64 flags;
+   __u64 start;
+   __u64 length;
+};
+
+#define DMA_BUF_SYNC_READ  (1 << 0)
+#define DMA_BUF_SYNC_WRITE (2 << 0)
+#define DMA_BUF_SYNC_RW(3 << 0)
+#define DMA_BUF_SYNC_START (0 << 2)
+#define DMA_BUF_SYNC_END   (1 << 2)
+#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
+   (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+
+#define DMA_BUF_BASE   'b'
+#define DMA_BUF_IOCTL_SYNC _IOWR(DMA_BUF_BASE, 0, struct dma_buf_sync)
+
+#endif
-- 
2.1.0



[PATCH 1/4] drm: prime: Honour O_RDWR during prime-handle-to-fd

2015-08-11 Thread Tiago Vignatti
From: Daniel Thompson 

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC, making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.
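
With the restriction relaxed, exporting a writable, mmap'able dma-buf from a
GEM handle looks roughly like the sketch below (mirroring the libdrm-style
wrapper the igt tests use; mmap_bo_via_prime is a made-up name and the drm.h
include path may vary with the installed headers):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/drm.h>	/* struct drm_prime_handle, DRM_IOCTL_PRIME_HANDLE_TO_FD */

static void *mmap_bo_via_prime(int drm_fd, uint32_t handle, size_t size)
{
	struct drm_prime_handle args;

	memset(&args, 0, sizeof(args));
	args.handle = handle;
	args.flags = DRM_CLOEXEC | DRM_RDWR;	/* DRM_RDWR is what this patch permits */
	args.fd = -1;

	if (ioctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args) != 0)
		return NULL;

	/* caller must check for MAP_FAILED and eventually close args.fd */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, args.fd, 0);
}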

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Signed-off-by: Daniel Thompson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h  |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  
{
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
 struct drm_file *file_priv)
 {
struct drm_prime_handle *args = data;
-   uint32_t flags;

if (!drm_core_check_feature(dev, DRIVER_PRIME))
return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
return -ENOSYS;

/* check flags are valid */
-   if (args->flags & ~DRM_CLOEXEC)
+   if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
return -EINVAL;

-   /* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-   flags = args->flags & DRM_CLOEXEC;
-
return dev->driver->prime_handle_to_fd(dev, file_priv,
-   args->handle, flags, &args->fd);
+   args->handle, args->flags, &args->fd);
 }

 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 3801584..ad8223e 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -668,6 +668,7 @@ struct drm_set_client_cap {
__u64 value;
 };

+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
__u32 handle;
-- 
2.1.0



[PATCH v3] mmap on the dma-buf directly

2015-08-11 Thread Tiago Vignatti
Hi,

The idea is to create a GEM bo in one process and pass its prime handle to
another process, which in turn uses the handle only to map and write.
This could be useful for the Chrome OS architecture, where the Web content
("unprivileged process") maps and CPU-draws a buffer that was previously
allocated in the GPU process ("privileged process").

In v2, I've added a patch that Daniel kindly drafted to allow the
unprivileged process to flush through a prime fd. In v3, I've addressed a few
concerns and added end_cpu_access to i915.

To validate all this I'm using igt, and sending the tests for review now.
Please take a look.

Tiago


Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (2):
  drm/i915: Implement end_cpu_access
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 drivers/dma-buf/dma-buf.c  | 47 ++
 drivers/gpu/drm/drm_prime.c| 10 +++-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 28 +++-
 include/uapi/drm/drm.h |  1 +
 include/uapi/linux/dma-buf.h   | 39 
 5 files changed, 117 insertions(+), 8 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h

-- 
2.1.0



[PATCH 3/3] drm/i915: Use CPU mapping for userspace dma-buf mmap()

2015-08-05 Thread Tiago Vignatti
Userspace is the one in charge of flushing CPU caches, by wrapping its mmap
access with begin{,end}_cpu_access.

v2: Remove the LLC check because we have dma-buf sync providers now. Also, fix
the return before transferring ownership when mmap fails.

Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index e9c2bfd..b608f67 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf 
*dma_buf, unsigned long page_n

 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct 
*vma)
 {
-   return -EINVAL;
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   int ret;
+
+   if (obj->base.size < vma->vm_end - vma->vm_start)
+   return -EINVAL;
+
+   if (WARN_ON(!obj->base.filp))
+   return -EINVAL;
+
+   ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+   if (IS_ERR_VALUE(ret))
+   return -EINVAL;
+
+   fput(vma->vm_file);
+   vma->vm_file = get_file(obj->base.filp);
+
+   return ret;
 }

 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, 
size_t length, enum dma_data_direction direction)
-- 
2.1.0



[PATCH 2/3] dma-buf: Add ioctls to allow userspace to flush

2015-08-05 Thread Tiago Vignatti
From: Daniel Vetter 

FIXME: Update kerneldoc for begin/end to make it clear that those are
for mmap too.

Open: Do we need a special indication that the begin/end is from
userspace mmap and not from kernel mmap?

There's also the question already about kernel internal users - vmap
and kmap interfaces are much different ... We might need to add a
mapping enum to the begin/end dma-buf functions.

v2 (Tiago): Fix header file type names (u64 -> __u64)

Signed-off-by: Daniel Vetter 
Signed-off-by: Tiago Vignatti 
---
 drivers/dma-buf/dma-buf.c| 47 
 include/uapi/linux/dma-buf.h | 39 
 2 files changed, 86 insertions(+)
 create mode 100644 include/uapi/linux/dma-buf.h

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 155c146..4820d61 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include 
 #include 

+#include 
+
 static inline int is_dma_buf_file(struct file *);

 struct dma_buf_list {
@@ -251,11 +253,56 @@ out:
return events;
 }

+static long dma_buf_ioctl(struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+   struct dma_buf *dmabuf;
+   struct dma_buf_sync sync;
+   enum dma_data_direction direction;
+
+   dmabuf = file->private_data;
+
+   if (!is_dma_buf_file(file))
+   return -EINVAL;
+
+   switch (cmd) {
+   case DMA_BUF_IOCTL_SYNC:
+   if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+   return -EFAULT;
+
+   if (sync.flags & DMA_BUF_SYNC_RW)
+   direction = DMA_BIDIRECTIONAL;
+   else if (sync.flags & DMA_BUF_SYNC_READ)
+   direction = DMA_FROM_DEVICE;
+   else if (sync.flags & DMA_BUF_SYNC_WRITE)
+   direction = DMA_TO_DEVICE;
+   else
+   return -EINVAL;
+
+   if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+   return -EINVAL;
+
+   /* FIXME: Check for overflows in start/length. */
+
+   if (sync.flags & DMA_BUF_SYNC_END)
+   dma_buf_end_cpu_access(dmabuf, sync.start,
+  sync.length, direction);
+   else
+   dma_buf_begin_cpu_access(dmabuf, sync.start,
+sync.length, direction);
+
+   return 0;
+   default:
+   return -ENOTTY;
+   }
+}
+
 static const struct file_operations dma_buf_fops = {
.release= dma_buf_release,
.mmap   = dma_buf_mmap_internal,
.llseek = dma_buf_llseek,
.poll   = dma_buf_poll,
+   .unlocked_ioctl = dma_buf_ioctl,
 };

 /*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
new file mode 100644
index 000..e5327df
--- /dev/null
+++ b/include/uapi/linux/dma-buf.h
@@ -0,0 +1,39 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2015 Intel Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _DMA_BUF_UAPI_H_
+#define _DMA_BUF_UAPI_H_
+
+struct dma_buf_sync {
+   __u64 flags;
+   __u64 start;
+   __u64 length;
+};
+
+#define DMA_BUF_SYNC_READ  (1 << 0)
+#define DMA_BUF_SYNC_WRITE (2 << 0)
+#define DMA_BUF_SYNC_RW(3 << 0)
+#define DMA_BUF_SYNC_START (0 << 2)
+#define DMA_BUF_SYNC_END   (1 << 2)
+#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
+   (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+
+#define DMA_BUF_BASE   'b'
+#define DMA_BUF_IOCTL_SYNC _IOWR(DMA_BUF_BASE, 0, struct dma_buf_sync)
+
+#endif
-- 
2.1.0



[PATCH 1/3] drm: prime: Honour O_RDWR during prime-handle-to-fd

2015-08-05 Thread Tiago Vignatti
From: Daniel Thompson 

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC, making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Signed-off-by: Daniel Thompson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h  |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  
{
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
 struct drm_file *file_priv)
 {
struct drm_prime_handle *args = data;
-   uint32_t flags;

if (!drm_core_check_feature(dev, DRIVER_PRIME))
return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
return -ENOSYS;

/* check flags are valid */
-   if (args->flags & ~DRM_CLOEXEC)
+   if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
return -EINVAL;

-   /* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-   flags = args->flags & DRM_CLOEXEC;
-
return dev->driver->prime_handle_to_fd(dev, file_priv,
-   args->handle, flags, &args->fd);
+   args->handle, args->flags, &args->fd);
 }

 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 3801584..ad8223e 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -668,6 +668,7 @@ struct drm_set_client_cap {
__u64 value;
 };

+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
__u32 handle;
-- 
2.1.0



[PATCH v2 0/3] mmap on the dma-buf directly

2015-08-05 Thread Tiago Vignatti
Hi,

I've tested these patches (in drm-intel-nightly, but also in CrOS kernel
v3.14) and they seem just enough for what we want to do: the idea is to create
a GEM bo in one process and pass its prime handle to another
process, which in turn uses the handle only to map and write. This could be
useful for the Chrome OS architecture, where the Web content ("unprivileged
process") maps and CPU-draws a buffer that was previously allocated in the
GPU process ("privileged process").

In v2, I've added a patch that Daniel kindly drafted to allow the
unprivileged process to flush through a prime fd. To validate it I've built a
test on top of igt's kms_pwrite_crc (which I'll be sending next for review).

Let me know your concerns.

Tiago


Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (1):
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 drivers/dma-buf/dma-buf.c  | 47 ++
 drivers/gpu/drm/drm_prime.c| 10 +++-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 -
 include/uapi/drm/drm.h |  1 +
 include/uapi/linux/dma-buf.h   | 39 
 5 files changed, 107 insertions(+), 8 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h

-- 
2.1.0



[Intel-gfx] [PATCH 1/2] drm/i915: Use CPU mapping for userspace dma-buf mmap()

2015-08-05 Thread Tiago Vignatti
On 08/05/2015 04:08 AM, Daniel Vetter wrote:
> On Tue, Aug 04, 2015 at 06:30:25PM -0300, Tiago Vignatti wrote:
> Nah they don't have to be equal since the problem isn't that nothing goes
> out to memory where the display can see it, but usually only parts of it.
> I.e. you need to change your test to
> - draw black screen (it starts that way so nothing to do really), grab crc
> - draw white screen and make sure you flush correctly, don't bother with
>crc (we can't test for inequality
>because collisions are too easy)
> - draw black screen again without flushing, grab crc
>
> Then assert that your two crc will be inequal (which they shouldn't be
> because some cachelines will still be stuck). Maybe also add a delay
> somewhere so you can see the cacheline dirt pattern, it's very
> characteristic.

Cool, I've got it now. The test below makes the cachelines dirty,
requiring them to get flushed correctly -- I'll work on it now. Should
we add that kind of test somewhere in igt, BTW?
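
In sketch form, the sequence Daniel describes boils down to the following
(igt-flavoured pseudo-code; the two paint helpers are made-up names and the
CRC inequality check is open-coded with memcmp):

	igt_crc_t black_crc, dirty_crc;

	/* 1: the fb starts out black; grab the reference crc */
	igt_pipe_crc_collect_crc(pipe_crc, &black_crc);

	paint_all_white_and_flush();	/* 2: flushed correctly, no crc taken */
	paint_all_black_no_flush();	/* 3: written via the mmap, no flush */

	/* 4: stale cachelines should leave dirt on scanout: crcs must differ */
	igt_pipe_crc_collect_crc(pipe_crc, &dirty_crc);
	igt_assert(memcmp(&black_crc, &dirty_crc, sizeof(black_crc)) != 0);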

PS: I had an issue with the original kms_pwrite_crc which returns 
frequent fails. Paulo helped though and showed me that pwrite is 
currently broken: https://bugs.freedesktop.org/show_bug.cgi?id=86422

Tiago

diff --git a/tests/kms_pwrite_crc.c b/tests/kms_pwrite_crc.c
index 05b9e38..419b46d 100644
--- a/tests/kms_pwrite_crc.c
+++ b/tests/kms_pwrite_crc.c
@@ -50,6 +50,20 @@ typedef struct {
uint32_t devid;
  } data_t;

+static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb)
+{
+   int dma_buf_fd;
+   char *ptr = NULL;
+
+   dma_buf_fd = prime_handle_to_fd(drm_fd, fb->gem_handle);
+   igt_assert(errno == 0);
+
+   ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
+   igt_assert(ptr != MAP_FAILED);
+
+   return ptr;
+}
+
  static void test(data_t *data)
  {
igt_display_t *display = &data->display;
@@ -57,6 +71,7 @@ static void test(data_t *data)
struct igt_fb *fb = &data->fb[1];
drmModeModeInfo *mode;
cairo_t *cr;
+   char *ptr;
uint32_t caching;
void *buf;
igt_crc_t crc;
@@ -67,6 +82,8 @@ static void test(data_t *data)
igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay,
  DRM_FORMAT_XRGB8888, LOCAL_DRM_FORMAT_MOD_NONE, fb);

+   ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb);
+
cr = igt_get_cairo_ctx(data->drm_fd, fb);
igt_paint_test_pattern(cr, fb->width, fb->height);
cairo_destroy(cr);
@@ -83,11 +100,11 @@ static void test(data_t *data)
caching = gem_get_caching(data->drm_fd, fb->gem_handle);
	igt_assert(caching == I915_CACHING_NONE || caching == I915_CACHING_DISPLAY);

-   /* use pwrite to make the other fb all white too */
+   /* use dmabuf pointer to make the other fb all white too */
buf = malloc(fb->size);
igt_assert(buf != NULL);
memset(buf, 0xff, fb->size);
-   gem_write(data->drm_fd, fb->gem_handle, 0, buf, fb->size);
+   memcpy(ptr, buf, fb->size);
free(buf);

/* and flip to it */



[Intel-gfx] [PATCH 1/2] drm/i915: Use CPU mapping for userspace dma-buf mmap()

2015-08-04 Thread Tiago Vignatti
On 07/31/2015 06:02 PM, Chris Wilson wrote:
>
> The first problem is that llc does not guarrantee that the buffer is
> cache coherent with all aspects of the GPU. For scanout and similar
> writes need to be WC.
>
> if (obj->has_framebuffer_references) would at least catch where the fb
> is made before the mmap.
>
> Equally this buffer could then be shared with other devices and exposing
> a CPU mmap to userspace (and no flush/set-domain protocol) will result in
> corruption.

I've built an igt test to catch this corruption but it's not really
failing there on my IvyBridge. If what you described is right (and so is
what I coded), then this test should write to the mapped buffer but not
update the screen.

Any idea what's going on?

https://github.com/tiagovignatti/intel-gpu-tools/commit/3e130ac2b274f1a3f6889c78cb72d0673ca2.patch


 From 3e130ac2b274f1a3f68855559c78cb72d0673ca2 Mon Sep 17 00:00:00 2001
From: Tiago Vignatti 
Date: Tue, 4 Aug 2015 13:38:09 -0300
Subject: [PATCH] tests: Add prime_crc for cache coherency

This program can be used to detect when the writes don't land in
scanout, due to cache incoherency.

Run it like ./prime_crc --interactive-debug=crc

Signed-off-by: Tiago Vignatti 
---
  tests/.gitignore   |   1 +
  tests/Makefile.sources |   1 +
  tests/prime_crc.c  | 201 
+
  3 files changed, 203 insertions(+)
  create mode 100644 tests/prime_crc.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 5bc4a58..96dbf57 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -160,6 +160,7 @@ pm_rc6_residency
  pm_rpm
  pm_rps
  pm_sseu
+prime_crc
  prime_nv_api
  prime_nv_pcopy
  prime_nv_test
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 5b2072e..c05b5a7 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -90,6 +90,7 @@ TESTS_progs_M = \
pm_rps \
pm_rc6_residency \
pm_sseu \
+   prime_crc \
prime_mmap \
prime_self_import \
template \
diff --git a/tests/prime_crc.c b/tests/prime_crc.c
new file mode 100644
index 000..3474cc9
--- /dev/null
+++ b/tests/prime_crc.c
@@ -0,0 +1,201 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Tiago Vignatti 
+ *
+ */
+
+/* This program can detect when the writes don't land in scanout, due to
+ * cache incoherency. */
+
+#include "drmtest.h"
+#include "igt_debugfs.h"
+#include "igt_kms.h"
+
+#define MAX_CONNECTORS 32
+
+struct modeset_params {
+   uint32_t crtc_id;
+   uint32_t connector_id;
+   drmModeModeInfoPtr mode;
+};
+
+int drm_fd;
+drmModeResPtr drm_res;
+drmModeConnectorPtr drm_connectors[MAX_CONNECTORS];
+drm_intel_bufmgr *bufmgr;
+igt_pipe_crc_t *pipe_crc;
+
+struct modeset_params ms;
+
+static void find_modeset_params(void)
+{
+   int i;
+   uint32_t connector_id = 0, crtc_id;
+   drmModeModeInfoPtr mode = NULL;
+
+   for (i = 0; i < drm_res->count_connectors; i++) {
+   drmModeConnectorPtr c = drm_connectors[i];
+
+   if (c->count_modes) {
+   connector_id = c->connector_id;
+   mode = &c->modes[0];
+   break;
+   }
+   }
+   igt_require(connector_id);
+
+   crtc_id = drm_res->crtcs[0];
+   igt_assert(crtc_id);
+   igt_assert(mode);
+
+   ms.connector_id = connector_id;
+   ms.crtc_id = crtc_id;
+   ms.mode = mode;
+
+}
+
+#define BO_SIZE (16*1024)
+
+char pattern[] = {0xff, 0x00, 0x00, 0x00,
+   0x00, 0xff, 0x00, 0x00,
+   0x00, 0x00, 0xff, 0x00,
+   0x00, 0x00, 0x00, 0xff};
+
+static void mess_with_coherency(char *ptr)
+{
+   off_t i;
+
+   for (i = 0

[PATCH 2/2] drm: prime: Honour O_RDWR during prime-handle-to-fd

2015-07-31 Thread Tiago Vignatti
From: Daniel Thompson 

Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except
(DRM|O)_CLOEXEC, making it difficult (maybe impossible) for userspace
to mmap() the resulting dma-buf even when this is supported by the
DRM driver.

It is trivial to relax the restriction and permit read/write access.
This is safe because the flags are seldom touched by drm; mostly they
are passed verbatim to dma_buf calls.

v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.

Signed-off-by: Daniel Thompson 
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/drm_prime.c | 10 +++---
 include/uapi/drm/drm.h  |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  
{
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
 struct drm_file *file_priv)
 {
struct drm_prime_handle *args = data;
-   uint32_t flags;

if (!drm_core_check_feature(dev, DRIVER_PRIME))
return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
return -ENOSYS;

/* check flags are valid */
-   if (args->flags & ~DRM_CLOEXEC)
+   if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
return -EINVAL;

-   /* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-   flags = args->flags & DRM_CLOEXEC;
-
return dev->driver->prime_handle_to_fd(dev, file_priv,
-   args->handle, flags, &args->fd);
+   args->handle, args->flags, &args->fd);
 }

 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 3801584..ad8223e 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -668,6 +668,7 @@ struct drm_set_client_cap {
__u64 value;
 };

+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
__u32 handle;
-- 
2.1.0



[PATCH 1/2] drm/i915: Use CPU mapping for userspace dma-buf mmap()

2015-07-31 Thread Tiago Vignatti
For now we're opting out devices that don't have the LLC CPU cache (mostly
"Atom" devices). Alternatively, we could build a path to mmap them through
GTT WC (and ignore the fact that it will be dead-slow for reading). Or, an
even more complex piece of work, I believe, would involve setting up dma-buf
ioctls to allow userspace flushes, manually controlling the synchronization
via begin{,end}_cpu_access.

Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index e9c2bfd..e6cb402 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -193,7 +193,26 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf 
*dma_buf, unsigned long page_n

 static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct 
*vma)
 {
-   return -EINVAL;
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+   struct drm_device *dev = obj->base.dev;
+   int ret;
+
+   if (obj->base.size < vma->vm_end - vma->vm_start)
+   return -EINVAL;
+
+   /* On non-LLC machines we'd need to be careful because CPU and GPU don't
+* share the CPU's L3 cache and coherency may hurt when CPU mapping. */
+   if (!HAS_LLC(dev))
+   return -EINVAL;
+
+   if (!obj->base.filp)
+   return -EINVAL;
+
+   ret = obj->base.filp->f_op->mmap(obj->base.filp, vma);
+   fput(vma->vm_file);
+   vma->vm_file = get_file(obj->base.filp);
+
+   return ret;
 }

 static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, 
size_t length, enum dma_data_direction direction)
-- 
2.1.0



[PATCH 0/2] mmap on the dma-buf directly

2015-07-31 Thread Tiago Vignatti
Hi,

I've tested these two patches (in drm-intel-nightly, but also in CrOS kernel
v3.14) and they seem just enough for what we want to do: the idea is to create
a GEM bo in one process and pass the prime handle of the it to another
process, which in turn uses the handle only to map and write. This could be
useful for Chrome OS  architecture, where the Web content ("unpriviledged
process") maps and CPU-draws a buffer, which was previously allocated in the
GPU process ("priviledged process").

I'm using a modified igt mostly to test these things. PTAL here:
https://github.com/tiagovignatti/intel-gpu-tools/commits/prime_mmap

Thank you,

Tiago


Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Tiago Vignatti (1):
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 drivers/gpu/drm/drm_prime.c| 10 +++---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 21 -
 include/uapi/drm/drm.h |  1 +
 3 files changed, 24 insertions(+), 8 deletions(-)

-- 
2.1.0



[PATCH resent twice 2/3] vgaarb: use MIT license

2010-05-24 Thread Tiago Vignatti
Signed-off-by: Tiago Vignatti 
Cc: Henry Zhao 
---
Jesse and Dave, this was sent twice already and no one said anything.
Please pull it. Oracle's Henry Zhao is already employing it in Solaris and,
even though all authors agreed, we haven't yet changed the license.


 drivers/gpu/vga/vgaarb.c |   26 +++---
 include/linux/vgaarb.h   |   21 +
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
index 290b0cc..b87569e 100644
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -1,12 +1,32 @@
 /*
- * vgaarb.c
+ * vgaarb.c: Implements the VGA arbitration. For details refer to
+ * Documentation/vgaarbiter.txt
+ *
  *
  * (C) Copyright 2005 Benjamin Herrenschmidt 
  * (C) Copyright 2007 Paulo R. Zanoni 
  * (C) Copyright 2007, 2009 Tiago Vignatti 
  *
- * Implements the VGA arbitration. For details refer to
- * Documentation/vgaarbiter.txt
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS
+ * IN THE SOFTWARE.
+ *
  */
 
 #include 
diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h
index 2dfaa29..c9a9759 100644
--- a/include/linux/vgaarb.h
+++ b/include/linux/vgaarb.h
@@ -5,6 +5,27 @@
  * (C) Copyright 2005 Benjamin Herrenschmidt 
  * (C) Copyright 2007 Paulo R. Zanoni 
  * (C) Copyright 2007, 2009 Tiago Vignatti 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS
+ * IN THE SOFTWARE.
+ *
  */
 
 #ifndef LINUX_VGA_H
-- 
1.6.0.4



[PATCH 1/3] vgaarb: convert pr_devel() to pr_debug()

2010-05-24 Thread Tiago Vignatti
We want to be able to use CONFIG_DYNAMIC_DEBUG in the arbiter code, so switch
the few existing pr_devel() calls to pr_debug(). (With CONFIG_DYNAMIC_DEBUG
enabled, the pr_debug() sites can then be toggled at runtime through
/sys/kernel/debug/dynamic_debug/control.)

Also, add one more piece of debug information regarding the decoding count.

Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/vga/vgaarb.c |   35 ++-
 1 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
index 441e38c..290b0cc 100644
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -155,8 +155,8 @@ static struct vga_device *__vga_tryget(struct vga_device 
*vgadev,
(vgadev->decodes & VGA_RSRC_LEGACY_MEM))
rsrc |= VGA_RSRC_LEGACY_MEM;
 
-   pr_devel("%s: %d\n", __func__, rsrc);
-   pr_devel("%s: owns: %d\n", __func__, vgadev->owns);
+   pr_debug("%s: %d\n", __func__, rsrc);
+   pr_debug("%s: owns: %d\n", __func__, vgadev->owns);
 
/* Check what resources we need to acquire */
wants = rsrc & ~vgadev->owns;
@@ -268,7 +268,7 @@ static void __vga_put(struct vga_device *vgadev, unsigned 
int rsrc)
 {
unsigned int old_locks = vgadev->locks;
 
-   pr_devel("%s\n", __func__);
+   pr_debug("%s\n", __func__);
 
/* Update our counters, and account for equivalent legacy resources
 * if we decode them
@@ -575,6 +575,7 @@ static inline void vga_update_device_decodes(struct 
vga_device *vgadev,
else
vga_decode_count--;
}
+   pr_debug("vgaarb: decoding count now is: %d\n", vga_decode_count);
 }
 
 void __vga_set_legacy_decoding(struct pci_dev *pdev, unsigned int decodes, 
bool userspace)
@@ -831,7 +832,7 @@ static ssize_t vga_arb_write(struct file *file, const char 
__user * buf,
curr_pos += 5;
remaining -= 5;
 
-   pr_devel("client 0x%p called 'lock'\n", priv);
+   pr_debug("client 0x%p called 'lock'\n", priv);
 
if (!vga_str_to_iostate(curr_pos, remaining, &io_state)) {
ret_val = -EPROTO;
@@ -867,7 +868,7 @@ static ssize_t vga_arb_write(struct file *file, const char 
__user * buf,
curr_pos += 7;
remaining -= 7;
 
-   pr_devel("client 0x%p called 'unlock'\n", priv);
+   pr_debug("client 0x%p called 'unlock'\n", priv);
 
if (strncmp(curr_pos, "all", 3) == 0)
io_state = VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
@@ -917,7 +918,7 @@ static ssize_t vga_arb_write(struct file *file, const char 
__user * buf,
curr_pos += 8;
remaining -= 8;
 
-   pr_devel("client 0x%p called 'trylock'\n", priv);
+   pr_debug("client 0x%p called 'trylock'\n", priv);
 
if (!vga_str_to_iostate(curr_pos, remaining, &io_state)) {
ret_val = -EPROTO;
@@ -961,7 +962,7 @@ static ssize_t vga_arb_write(struct file *file, const char 
__user * buf,
 
curr_pos += 7;
remaining -= 7;
-   pr_devel("client 0x%p called 'target'\n", priv);
+   pr_debug("client 0x%p called 'target'\n", priv);
/* if target is default */
if (!strncmp(curr_pos, "default", 7))
pdev = pci_dev_get(vga_default_device());
@@ -971,11 +972,11 @@ static ssize_t vga_arb_write(struct file *file, const 
char __user * buf,
ret_val = -EPROTO;
goto done;
}
-   pr_devel("vgaarb: %s ==> %x:%x:%x.%x\n", curr_pos,
+   pr_debug("vgaarb: %s ==> %x:%x:%x.%x\n", curr_pos,
domain, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
 
pbus = pci_find_bus(domain, bus);
-   pr_devel("vgaarb: pbus %p\n", pbus);
+   pr_debug("vgaarb: pbus %p\n", pbus);
if (pbus == NULL) {
pr_err("vgaarb: invalid PCI domain and/or bus 
address %x:%x\n",
domain, bus);
@@ -983,7 +984,7 @@ static ssize_t vga_arb_write(struct file *file, const char 
__user * buf,
goto done;
}
pdev = pci_get_slot(pbus, devfn);
-   pr_devel("vgaarb: pdev %p\n", pdev);
+   pr_debug("vgaarb: pdev %p\n", pdev);
if (!pdev) {
pr_err("vgaarb: invalid PCI address %x:%x\n

[PATCH resent twice 2/3] vgaarb: use MIT license

2010-05-24 Thread Tiago Vignatti
Signed-off-by: Tiago Vignatti 
Cc: Henry Zhao 
---
Jesse and Dave, that was send two times already and no one said anything.
Please, pull it. Oracle's Henry Zhao is already employing it in Solaris and,
after all authors agreed, we haven't changed yet the license.


 drivers/gpu/vga/vgaarb.c |   26 +++---
 include/linux/vgaarb.h   |   21 +
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
index 290b0cc..b87569e 100644
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -1,12 +1,32 @@
 /*
- * vgaarb.c
+ * vgaarb.c: Implements the VGA arbitration. For details refer to
+ * Documentation/vgaarbiter.txt
+ *
  *
  * (C) Copyright 2005 Benjamin Herrenschmidt 
  * (C) Copyright 2007 Paulo R. Zanoni 
  * (C) Copyright 2007, 2009 Tiago Vignatti 
  *
- * Implements the VGA arbitration. For details refer to
- * Documentation/vgaarbiter.txt
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS
+ * IN THE SOFTWARE.
+ *
  */

 #include 
diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h
index 2dfaa29..c9a9759 100644
--- a/include/linux/vgaarb.h
+++ b/include/linux/vgaarb.h
@@ -5,6 +5,27 @@
  * (C) Copyright 2005 Benjamin Herrenschmidt 
  * (C) Copyright 2007 Paulo R. Zanoni 
  * (C) Copyright 2007, 2009 Tiago Vignatti 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS
+ * IN THE SOFTWARE.
+ *
  */

 #ifndef LINUX_VGA_H
-- 
1.6.0.4



[PATCH 1/3] vgaarb: convert pr_devel() to pr_debug()

2010-05-24 Thread Tiago Vignatti
We want to be able to use CONFIG_DYNAMIC_DEBUG in the arbiter code, so switch
the few existing pr_devel() calls to pr_debug().

Also, add one more debug message reporting the decoding count.

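For reference, pr_devel() compiles away entirely unless DEBUG is defined,
while pr_debug() additionally hooks into dynamic debug. A simplified sketch
of the two helpers -- approximating, not quoting, the kernel headers:

/* pr_devel(): only ever active in DEBUG builds */
#ifdef DEBUG
#define pr_devel(fmt, ...) \
        printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#else
#define pr_devel(fmt, ...) \
        ({ if (0) printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); 0; })
#endif

/* pr_debug(): gains a per-callsite runtime switch with CONFIG_DYNAMIC_DEBUG */
#if defined(DEBUG)
#define pr_debug(fmt, ...) \
        printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#elif defined(CONFIG_DYNAMIC_DEBUG)
#define pr_debug(fmt, ...) \
        dynamic_pr_debug(fmt, ##__VA_ARGS__)
#else
#define pr_debug(fmt, ...) \
        ({ if (0) printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); 0; })
#endif

With the conversion, the vgaarb messages can then be enabled at runtime
without rebuilding, e.g.:

        echo 'file vgaarb.c +p' > /sys/kernel/debug/dynamic_debug/control
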
Signed-off-by: Tiago Vignatti 
---
 drivers/gpu/vga/vgaarb.c |   35 ++-
 1 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
index 441e38c..290b0cc 100644
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -155,8 +155,8 @@ static struct vga_device *__vga_tryget(struct vga_device *vgadev,
(vgadev->decodes & VGA_RSRC_LEGACY_MEM))
rsrc |= VGA_RSRC_LEGACY_MEM;

-   pr_devel("%s: %d\n", __func__, rsrc);
-   pr_devel("%s: owns: %d\n", __func__, vgadev->owns);
+   pr_debug("%s: %d\n", __func__, rsrc);
+   pr_debug("%s: owns: %d\n", __func__, vgadev->owns);

/* Check what resources we need to acquire */
wants = rsrc & ~vgadev->owns;
@@ -268,7 +268,7 @@ static void __vga_put(struct vga_device *vgadev, unsigned int rsrc)
 {
unsigned int old_locks = vgadev->locks;

-   pr_devel("%s\n", __func__);
+   pr_debug("%s\n", __func__);

/* Update our counters, and account for equivalent legacy resources
 * if we decode them
@@ -575,6 +575,7 @@ static inline void vga_update_device_decodes(struct vga_device *vgadev,
else
vga_decode_count--;
}
+   pr_debug("vgaarb: decoding count now is: %d\n", vga_decode_count);
 }

 void __vga_set_legacy_decoding(struct pci_dev *pdev, unsigned int decodes, 
bool userspace)
@@ -831,7 +832,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
curr_pos += 5;
remaining -= 5;

-   pr_devel("client 0x%p called 'lock'\n", priv);
+   pr_debug("client 0x%p called 'lock'\n", priv);

if (!vga_str_to_iostate(curr_pos, remaining, &io_state)) {
ret_val = -EPROTO;
@@ -867,7 +868,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
curr_pos += 7;
remaining -= 7;

-   pr_devel("client 0x%p called 'unlock'\n", priv);
+   pr_debug("client 0x%p called 'unlock'\n", priv);

if (strncmp(curr_pos, "all", 3) == 0)
io_state = VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
@@ -917,7 +918,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
curr_pos += 8;
remaining -= 8;

-   pr_devel("client 0x%p called 'trylock'\n", priv);
+   pr_debug("client 0x%p called 'trylock'\n", priv);

if (!vga_str_to_iostate(curr_pos, remaining, &io_state)) {
ret_val = -EPROTO;
@@ -961,7 +962,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,

curr_pos += 7;
remaining -= 7;
-   pr_devel("client 0x%p called 'target'\n", priv);
+   pr_debug("client 0x%p called 'target'\n", priv);
/* if target is default */
if (!strncmp(curr_pos, "default", 7))
pdev = pci_dev_get(vga_default_device());
@@ -971,11 +972,11 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
ret_val = -EPROTO;
goto done;
}
-   pr_devel("vgaarb: %s ==> %x:%x:%x.%x\n", curr_pos,
+   pr_debug("vgaarb: %s ==> %x:%x:%x.%x\n", curr_pos,
domain, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));

pbus = pci_find_bus(domain, bus);
-   pr_devel("vgaarb: pbus %p\n", pbus);
+   pr_debug("vgaarb: pbus %p\n", pbus);
if (pbus == NULL) {
pr_err("vgaarb: invalid PCI domain and/or bus 
address %x:%x\n",
domain, bus);
@@ -983,7 +984,7 @@ static ssize_t vga_arb_write(struct file *file, const char __user * buf,
goto done;
}
pdev = pci_get_slot(pbus, devfn);
-   pr_devel("vgaarb: pdev %p\n", pdev);
+   pr_debug("vgaarb: pdev %p\n", pdev);
if (!pdev) {
pr_err("vgaarb: invalid PCI address %x:%x\