Re: [PATCH 2/2] [RFC] reservation: add support for read-only access using rcu

2014-04-10 Thread Thomas Hellstrom
Hi!

Ugh. This became more complicated than I thought, but I'm OK with moving
TTM over to fence while we sort out
how / if we're going to use this.

While reviewing, it struck me that this is kind of error-prone, and hard
to follow since we're operating on a structure that may be
continually updated under us, needing a lot of RCU-specific macros and
barriers.

Also the rcu wait appears to not complete until there are no busy fences
left (new ones can be added while we wait) rather than
waiting on a snapshot of busy fences.

I wonder if these issues can be addressed by having a function that
provides a snapshot of all busy fences. This can be accomplished
either by including the exclusive fence in the fence_list structure and
allocating a new such structure each time it is updated. The RCU reader
could then just make a copy of the current fence_list structure pointed
to by obj->fence, but I'm not sure we want to reallocate *each* time we
update the fence pointer.

The other approach uses a seqlock to obtain a consistent snapshot; I've
attached an incomplete outline, but I'm not 100% sure whether it's OK to
combine RCU and seqlocks in this way...

Both these approaches have the benefit of hiding the RCU snapshotting in
a single function that can then be used by any waiting
or polling function.
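
For illustration, the first approach could look roughly like the sketch
below. Everything here (fence_snapshot, snapshot_obj, publish_fences,
read_fences) is made up for this sketch and is not part of the patch; the
point is only that one rcu_dereference() then yields both the exclusive
and the shared fences, at the cost of reallocating on every update.

#include <linux/fence.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/ww_mutex.h>

/*
 * Hypothetical layout: the exclusive fence lives in the same RCU-managed
 * structure as the shared fences, so a single rcu_dereference() gives a
 * self-consistent view of both.
 */
struct fence_snapshot {
	struct rcu_head rcu;
	struct fence *excl;		/* exclusive fence, may be NULL */
	u32 shared_count;
	struct fence *shared[];		/* shared fences */
};

/* Stand-in for reservation_object in this sketch. */
struct snapshot_obj {
	struct ww_mutex lock;
	struct fence_snapshot __rcu *fence;
};

/* Writer side, serialized by obj->lock: publish a freshly allocated copy. */
static void publish_fences(struct snapshot_obj *obj, struct fence_snapshot *new)
{
	struct fence_snapshot *old;

	old = rcu_dereference_protected(obj->fence, 1);
	rcu_assign_pointer(obj->fence, new);
	if (old)
		kfree_rcu(old, rcu);	/* readers may still be walking it */
}

/* Reader side: caller holds rcu_read_lock() and copies what it needs. */
static struct fence_snapshot *read_fences(struct snapshot_obj *obj)
{
	return rcu_dereference(obj->fence);
}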

/Thomas



On 04/09/2014 04:49 PM, Maarten Lankhorst wrote:
 This adds 3 more functions to deal with rcu.

 reservation_object_wait_timeout_rcu() will wait on all fences of the
 reservation_object, without obtaining the ww_mutex.

 reservation_object_test_signaled_rcu() will test if all fences of the
 reservation_object are signaled without using the ww_mutex.

 reservation_object_get_excl() is added because touching the fence_excl
 member directly will trigger a sparse warning.

 Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
 ---
  drivers/base/dma-buf.c  |   46 +++--
  drivers/base/reservation.c  |  147 +--
  include/linux/fence.h   |   22 ++
  include/linux/reservation.h |   40 
  4 files changed, 224 insertions(+), 31 deletions(-)
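
As a rough usage sketch (not taken from this patch: the wait-all/test-all
flags, the interruptible flag and the jiffies timeout below are assumptions
about the RFC's API rather than something quoted in the hunks), a lockless
caller could look like:

#include <linux/errno.h>
#include <linux/reservation.h>
#include <linux/sched.h>

/* Hedged sketch only: the parameters are assumed, not copied from the diff. */
static int example_wait_and_check(struct reservation_object *resv)
{
	long ret;

	/* Wait for all fences (shared + exclusive) without taking the ww_mutex. */
	ret = reservation_object_wait_timeout_rcu(resv, true, true,
						  MAX_SCHEDULE_TIMEOUT);
	if (ret < 0)
		return ret;		/* interrupted or error */

	/* Lockless "is it idle right now?" check. */
	if (!reservation_object_test_signaled_rcu(resv, true))
		return -EBUSY;

	return 0;
}

/* With the ww_mutex held, fetch the exclusive fence without a sparse warning. */
static struct fence *example_get_excl(struct reservation_object *resv)
{
	return reservation_object_get_excl(resv);
}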


diff --git a/drivers/base/reservation.c b/drivers/base/reservation.c
index b82a5b6..c4bcf10 100644
--- a/drivers/base/reservation.c
+++ b/drivers/base/reservation.c
@@ -82,6 +82,8 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 {
 	u32 i;
 
+	preempt_disable();
+	write_seqcount_begin(&obj->seq);
 	for (i = 0; i < fobj->shared_count; ++i) {
 		if (fobj->shared[i]->context == fence->context) {
 			struct fence *old_fence = fobj->shared[i];
@@ -90,6 +92,8 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 
 			fobj->shared[i] = fence;
 
+			write_seqcount_end(&obj->seq);
+			preempt_enable();
 			fence_put(old_fence);
 			return;
 		}
@@ -101,8 +105,9 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
 	 * make the new fence visible before incrementing
 	 * fobj->shared_count
 	 */
-	smp_wmb();
 	fobj->shared_count++;
+	write_seqcount_end(&obj->seq);
+	preempt_enable();
 }
 
 static void
@@ -141,7 +146,11 @@ reservation_object_add_shared_replace(struct reservation_object *obj,
 		fobj->shared[fobj->shared_count++] = fence;
 
 done:
+	preempt_disable();
+	write_seqcount_begin(&obj->seq);
 	obj->fence = fobj;
+	write_seqcount_end(&obj->seq);
+	preempt_enable();
 	kfree(old);
 }
 
@@ -173,6 +182,8 @@ void reservation_object_add_excl_fence(struct reservation_object *obj,
 	u32 i = 0;
 
 	old = reservation_object_get_list(obj);
+	preempt_disable();
+	write_seqcount_begin(&obj->seq);
 	if (old) {
 		i = old->shared_count;
 		old->shared_count = 0;
@@ -182,7 +193,8 @@ void reservation_object_add_excl_fence(struct reservation_object *obj,
 		fence_get(fence);
 
 	obj->fence_excl = fence;
-
+	write_seqcount_end(&obj->seq);
+	preempt_enable();
 	/* inplace update, no shared fences */
 	while (i--)
 		fence_put(old->shared[i]);
@@ -191,3 +203,76 @@ void reservation_object_add_excl_fence(struct reservation_object *obj,
 		fence_put(old_fence);
 }
 EXPORT_SYMBOL(reservation_object_add_excl_fence);
+
+struct unsignaled {
+	unsigned shared_max;
+	unsigned shared_count;
+	struct fence **shared;
+	struct fence *exclusive;
+};
+
+static int reservation_object_unsignaled_rcu(struct reservation_object *obj,
+	 struct unsignaled *us)
+{
+	unsigned seq;
+	struct reservation_object_list *fobj, list;
+	struct fence *fence;
+
+retry:
+	seq = read_seqcount_begin(&obj->seq);
+	rcu_read_lock();
+
+	fobj = obj->fence;
+	fence = obj->fence_excl;
+
+	/* Check pointers for validity */
+	if (read_seqcount_retry(&obj->seq, seq)) {
+		rcu_read_unlock();
+		goto retry;
+	}
+
+	list = *fobj;
+
+	/* Check list for validity */
+	if (read_seqcount_retry(&obj->seq, seq)) {
+		rcu_read_unlock();
+		goto retry;
+	}
+
+	if (list.shared_count == 0) {
+		if (fence &&
+		    !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags) &&
+		    fence_get_rcu(fence))
+			us->exclusive = fence;
+		rcu_read_unlock();
+		

Re: [PATCH 2/2] [RFC] reservation: add support for read-only access using rcu

2014-04-10 Thread Maarten Lankhorst

Hey,

On 10-04-14 10:46, Thomas Hellstrom wrote:

Hi!

Ugh. This became more complicated than I thought, but I'm OK with moving
TTM over to fence while we sort out
how / if we're going to use this.

While reviewing, it struck me that this is kind of error-prone, and hard
to follow since we're operating on a structure that may be
continually updated under us, needing a lot of RCU-specific macros and
barriers.

Yeah, but with the exception of dma_buf_poll I don't think there is anything
else outside drivers/base/reservation.c that has to deal with rcu.


Also the rcu wait appears to not complete until there are no busy fences
left (new ones can be added while we wait) rather than
waiting on a snapshot of busy fences.

This has been by design, because 'wait for bo idle' type of functions only care
if the bo is completely idle or not.

It would be easy to make a snapshot even without seqlocks, just copy
reservation_object_test_signaled_rcu to return a shared list if test_all is 
set, or return pointer to exclusive otherwise.


I wonder if these issues can be addressed by having a function that
provides a snapshot of all busy fences. This can be accomplished
either by including the exclusive fence in the fence_list structure and
allocating a new such structure each time it is updated. The RCU reader
could then just make a copy of the current fence_list structure pointed
to by obj->fence, but I'm not sure we want to reallocate *each* time we
update the fence pointer.

No, the most common operation is updating fence pointers, which is why
the current design makes that cheap. It's also why doing rcu reads is more 
expensive.

The other approach uses a seqlock to obtain a consistent snapshot; I've
attached an incomplete outline, but I'm not 100% sure whether it's OK to
combine RCU and seqlocks in this way...

Both these approaches have the benefit of hiding the RCU snapshotting in
a single function that can then be used by any waiting
or polling function.



I think the middle way, using seqlocks to protect the fence_excl pointer
and shared list combination, and using RCU to protect the refcounts for
fences and the availability of the list, could work for our use case and
might remove a bunch of memory barriers. But yeah, that depends on
layering rcu and seqlocks.
No idea if that is allowed. But I suppose it is.

Also, you're being overly paranoid with seqlock reading; we would only need
something like this:

rcu_read_lock()
preempt_disable()
seq = read_seqcount_begin();
read fence_excl, shared_count = ACCESS_ONCE(fence->shared_count)
copy shared to a struct.
if (read_seqcount_retry()) { unlock and retry }
  preempt_enable();
  use fence_get_rcu() to bump refcount on everything, if that fails unlock, 
put, and retry
rcu_read_unlock()

But the shared list would still need to be RCU'd, to make sure we're not 
reading freed garbage.
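
Spelled out as rough C (a sketch only: snapshot_shared() and the local names
are invented here, and it assumes the caller passes an array big enough for
all shared fences), that ordering would look something like:

#include <linux/fence.h>
#include <linux/reservation.h>

/* Rough sketch of the pseudocode above; not part of the patch. */
static void snapshot_shared(struct reservation_object *obj,
			    struct fence **excl,
			    struct fence **shared, unsigned *count)
{
	struct reservation_object_list *fobj;
	unsigned seq, i, n;

retry:
	rcu_read_lock();
	preempt_disable();
	seq = read_seqcount_begin(&obj->seq);

	*excl = rcu_dereference(obj->fence_excl);
	fobj = rcu_dereference(obj->fence);
	n = fobj ? ACCESS_ONCE(fobj->shared_count) : 0;
	for (i = 0; i < n; ++i)
		shared[i] = rcu_dereference(fobj->shared[i]);

	if (read_seqcount_retry(&obj->seq, seq)) {
		preempt_enable();
		rcu_read_unlock();
		goto retry;
	}
	preempt_enable();

	/* Bump refcounts; if a fence is already being freed, start over. */
	for (i = 0; i < n; ++i) {
		if (!fence_get_rcu(shared[i])) {
			while (i--)
				fence_put(shared[i]);
			rcu_read_unlock();
			goto retry;
		}
	}
	if (*excl && !fence_get_rcu(*excl)) {
		while (n--)
			fence_put(shared[n]);
		rcu_read_unlock();
		goto retry;
	}
	rcu_read_unlock();

	*count = n;
}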

~Maarten



Re: [PATCH 2/2] [RFC] reservation: add support for read-only access using rcu

2014-04-10 Thread Thomas Hellstrom
On 04/10/2014 12:07 PM, Maarten Lankhorst wrote:
 Hey,

 On 10-04-14 10:46, Thomas Hellstrom wrote:
 Hi!

 Ugh. This became more complicated than I thought, but I'm OK with moving
 TTM over to fence while we sort out
 how / if we're going to use this.

 While reviewing, it struck me that this is kind of error-prone, and hard
 to follow since we're operating on a structure that may be
 continually updated under us, needing a lot of RCU-specific macros and
 barriers.
 Yeah, but with the exception of dma_buf_poll I don't think there is
 anything else outside drivers/base/reservation.c that has to deal with rcu.

 Also the rcu wait appears to not complete until there are no busy fences
 left (new ones can be added while we wait) rather than
 waiting on a snapshot of busy fences.
 This has been by design, because 'wait for bo idle' type of functions
 only care
 if the bo is completely idle or not.

No, not when using RCU, because the bo may be busy again before the
function returns :)
Complete idleness can only be guaranteed if holding the reservation, or
otherwise making sure
that no new rendering is submitted to the buffer, so it's overkill to
wait for complete idleness here.


 It would be easy to make a snapshot even without seqlocks, just copy
 reservation_object_test_signaled_rcu to return a shared list if
 test_all is set, or return pointer to exclusive otherwise.

 I wonder if these issues can be addressed by having a function that
 provides a snapshot of all busy fences. This can be accomplished
 either by including the exclusive fence in the fence_list structure and
 allocating a new such structure each time it is updated. The RCU reader
 could then just make a copy of the current fence_list structure pointed
 to by obj->fence, but I'm not sure we want to reallocate *each* time we
 update the fence pointer.
 No, the most common operation is updating fence pointers, which is why
 the current design makes that cheap. It's also why doing rcu reads is
 more expensive.
 The other approach uses a seqlock to obtain a consistent snapshot; I've
 attached an incomplete outline, but I'm not 100% sure whether it's OK to
 combine RCU and seqlocks in this way...

 Both these approaches have the benefit of hiding the RCU snapshotting in
 a single function that can then be used by any waiting
 or polling function.


 I think the middle way, using seqlocks to protect the fence_excl
 pointer and shared list combination, and using RCU to protect the
 refcounts for fences and the availability of the list, could work for
 our use case and might remove a bunch of memory barriers. But yeah,
 that depends on layering rcu and seqlocks.
 No idea if that is allowed. But I suppose it is.

 Also, you're being overly paranoid with seqlock reading; we would only
 need something like this:

 rcu_read_lock()
 preempt_disable()
 seq = read_seqcount_begin()
 read fence_excl, shared_count = ACCESS_ONCE(fence->shared_count)
 copy shared to a struct.
 if (read_seqcount_retry()) { unlock and retry }
   preempt_enable();
   use fence_get_rcu() to bump refcount on everything, if that fails
 unlock, put, and retry
 rcu_read_unlock()

 But the shared list would still need to be RCU'd, to make sure we're
 not reading freed garbage.

Ah, OK.
But I think we should use rcu inside seqcount, because
read_seqcount_begin() may spin for a long time if there are
many writers. Also, I don't think the preempt_disable() is needed for
read_seq critical sections, other than that it might
decrease the risk of retries.

Thanks,
Thomas



 ~Maarten


Re: [PATCH 2/2] [RFC] reservation: add support for read-only access using rcu

2014-04-10 Thread Thomas Hellstrom
On 04/10/2014 01:08 PM, Thomas Hellstrom wrote:
 On 04/10/2014 12:07 PM, Maarten Lankhorst wrote:
 Hey,

 On 10-04-14 10:46, Thomas Hellstrom wrote:
 Hi!

 Ugh. This became more complicated than I thought, but I'm OK with moving
 TTM over to fence while we sort out
 how / if we're going to use this.

 While reviewing, it struck me that this is kind of error-prone, and hard
 to follow since we're operating on a structure that may be
 continually updated under us, needing a lot of RCU-specific macros and
 barriers.
 Yeah, but with the exception of dma_buf_poll I don't think there is
 anything else outside drivers/base/reservation.c that has to deal with rcu.

 Also the rcu wait appears to not complete until there are no busy fences
 left (new ones can be added while we wait) rather than
 waiting on a snapshot of busy fences.
 This has been by design, because 'wait for bo idle' type of functions
 only care
 if the bo is completely idle or not.
 No, not when using RCU, because the bo may be busy again before the
 function returns :)
 Complete idleness can only be guaranteed if holding the reservation, or
 otherwise making sure
 that no new rendering is submitted to the buffer, so it's overkill to
 wait for complete idleness here.

Although, if we fail to get a refcount for a fence and it's still busy,
we need to do a seq retry, because the fence might have been replaced by
another fence from the same context without being idle. That check is
not present in the snapshot code I sent.
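
For illustration, that extra check could look like the helper below
(hypothetical, not part of the posted outline): a failed refcount grab on
an unsignaled fence forces the caller to restart the seqcount loop.

#include <linux/bitops.h>
#include <linux/fence.h>

/*
 * Hypothetical helper: try to take a reference on a fence seen under
 * rcu_read_lock().  Returns true if the caller must retry the whole
 * seqcount snapshot.
 */
static bool fence_ref_or_retry(struct fence *fence, struct fence **out)
{
	*out = NULL;
	if (!fence)
		return false;
	if (fence_get_rcu(fence)) {
		*out = fence;
		return false;
	}
	/*
	 * The refcount grab failed, so the fence is being torn down.  If it
	 * never signaled, the slot may already hold a newer, still-busy fence
	 * from the same context, so the snapshot must be retried.
	 */
	return !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags);
}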

/Thomas