Re: [Linaro-mm-sig] [PATCH 4/7] fence: dma-buf cross-device synchronization (v11)

2013-01-31 Thread Inki Dae
Hi,

below is my opinion.

 +struct fence;
 +struct fence_ops;
 +struct fence_cb;
 +
 +/**
 + * struct fence - software synchronization primitive
 + * @refcount: refcount for this fence
 + * @ops: fence_ops associated with this fence
 + * @cb_list: list of all callbacks to call
 + * @lock: spin_lock_irqsave used for locking
 + * @priv: fence specific private data
 + * @flags: A mask of FENCE_FLAG_* defined below
 + *
 + * the flags member must be manipulated and read using the appropriate
 + * atomic ops (bit_*), so taking the spinlock will not be needed most
 + * of the time.
 + *
 + * FENCE_FLAG_SIGNALED_BIT - fence is already signaled
 + * FENCE_FLAG_ENABLE_SIGNAL_BIT - enable_signaling might have been called*
 + * FENCE_FLAG_USER_BITS - start of the unused bits, can be used by the
 + * implementer of the fence for its own purposes. Can be used in different
 + * ways by different fence implementers, so do not rely on this.
 + *
 + * *) Since atomic bitops are used, this is not guaranteed to be the case.
 + * Particularly, if the bit was set, but fence_signal was called right
 + * before this bit was set, it would have been able to set the
 + * FENCE_FLAG_SIGNALED_BIT, before enable_signaling was called.
 + * Adding a check for FENCE_FLAG_SIGNALED_BIT after setting
 + * FENCE_FLAG_ENABLE_SIGNAL_BIT closes this race, and makes sure that
 + * after fence_signal was called, any enable_signaling call will have either
 + * been completed, or never called at all.
 + */
 +struct fence {
 +   struct kref refcount;
 +   const struct fence_ops *ops;
 +   struct list_head cb_list;
 +   spinlock_t *lock;
 +   unsigned context, seqno;
 +   unsigned long flags;
 +};
 +
 +enum fence_flag_bits {
 +   FENCE_FLAG_SIGNALED_BIT,
 +   FENCE_FLAG_ENABLE_SIGNAL_BIT,
 +   FENCE_FLAG_USER_BITS, /* must always be last member */
 +};
 +
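
For illustration, the race-closing pattern described in the comment
above might look roughly like this (a simplified sketch of what the
framework would do, not the exact patch code):

	if (!test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags) &&
	    !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
		/* We won the race to enable signaling and the fence is
		 * not yet signaled; if the hw turns out to be done
		 * already, signal the fence ourselves. */
		if (!fence->ops->enable_signaling(fence))
			fence_signal(fence);
	}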

It seems that this fence framework needs read/write flags. In the
case of two read operations, one might wait for the other, but since
the other is just a read operation, there is no need to wait for it.
Shouldn't the fence-wait request be ignored in that case? I think
it's enough to consider only write operations.

For this, you could add the following:

enum fence_flag_bits {
...
FENCE_FLAG_ACCESS_READ_BIT,
FENCE_FLAG_ACCESS_WRITE_BIT,
...
};

And the producer could call fence_init() like below:
__fence_init(..., FENCE_FLAG_ACCESS_WRITE_BIT, ...);

With this, the fence's flags would carry FENCE_FLAG_ACCESS_WRITE_BIT
for a write operation, and the other sides (read or write operations)
would wait for the write operation to complete. Likewise, a consumer
would call that function with FENCE_FLAG_ACCESS_READ_BIT so that
other consumers could skip the fence wait for read operations.

Thanks,
Inki Dae


Re: [Linaro-mm-sig] [PATCH 4/7] fence: dma-buf cross-device synchronization (v11)

2013-01-31 Thread Maarten Lankhorst
On 31-01-13 10:32, Inki Dae wrote:
[snip]

 It seems that this fence framework needs read/write flags. In the
 case of two read operations, one might wait for the other, but since
 the other is just a read operation, there is no need to wait for it.
 Shouldn't the fence-wait request be ignored in that case? I think
 it's enough to consider only write operations.

 For this, you could add the following:

 enum fence_flag_bits {
 ...
 FENCE_FLAG_ACCESS_READ_BIT,
 FENCE_FLAG_ACCESS_WRITE_BIT,
 ...
 };

 And the producer could call fence_init() like below:
 __fence_init(..., FENCE_FLAG_ACCESS_WRITE_BIT, ...);

 With this, the fence's flags would carry FENCE_FLAG_ACCESS_WRITE_BIT
 for a write operation, and the other sides (read or write operations)
 would wait for the write operation to complete. Likewise, a consumer
 would call that function with FENCE_FLAG_ACCESS_READ_BIT so that
 other consumers could skip the fence wait for read operations.

You can't put that information in the fence. If you use a fence to
fence off a hardware memcpy operation, there would be one buffer to
which you would attach the fence in read mode and another buffer where
you need write access.
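
As a rough illustration of this point (the helper names here are
hypothetical, not the actual reservation API):

	/* One fence for a single hw memcpy, attached differently per
	 * buffer: exclusive (write) on the destination, non-exclusive
	 * (read) on the source. A read/write flag inside the fence
	 * itself could not express both at once. */
	struct fence *f = hw_memcpy_submit(dst, src);

	attach_fence_exclusive(dst->resv, f);	/* memcpy writes dst */
	attach_fence_shared(src->resv, f);	/* memcpy only reads src */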

~Maarten



Re: [Linaro-mm-sig] [PATCH 4/7] fence: dma-buf cross-device synchronization (v11)

2013-01-31 Thread Daniel Vetter
On Thu, Jan 31, 2013 at 06:32:15PM +0900, Inki Dae wrote:
[snip]

Fences here match more closely the sync-points concept from the Android
stuff. The idea is that they only signal when a hw operation completes.

Synchronization integration happens at the dma_buf level, where you can
specify whether the new operation you're doing is exclusive (i.e. a
write, which means you need to wait for all previous operations to
complete), or non-exclusive (i.e. just reading), in which case you only
need to wait for any still-outstanding exclusive fences attached to the
dma_buf. But you _can_ attach more than one non-exclusive fence to a
dma_buf at the same time, and so e.g. read a buffer object from
different engines concurrently.

There's been some talk about whether we also need a non-exclusive write
attachment (i.e. allowing multiple concurrent writers), but I don't yet
fully understand the use-case.

In short, the proposed patches can do what you want to do; it's just
that read/write access isn't part of the fences but of how you attach
fences to dma_bufs.
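
As a sketch of that rule (the helper names below are made up; the real
mechanism is the reservation_object code in this series):

	/* Before starting a new operation on a dma_buf:
	 * - a writer must wait for all attached fences, exclusive and
	 *   non-exclusive alike;
	 * - a reader only needs to wait for the exclusive fence, so
	 *   multiple readers can run concurrently. */
	if (op_is_write)
		wait_all_fences(buf->resv);
	else
		wait_exclusive_fence(buf->resv);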

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Linaro-mm-sig] [PATCH 4/7] fence: dma-buf cross-device synchronization (v11)

2013-01-31 Thread Inki Dae
2013/1/31 Daniel Vetter dan...@ffwll.ch:
[snip]

Thanks for comments, Maarten and Daniel.

I think I understand your comments, but I don't fully understand the
dma-fence mechanism yet, so I'd appreciate some advice. In our case I'm
applying dma-fence to the Mali (3D GPU) driver as producer and to the
Exynos DRM (display controller) driver as consumer.

The sequence is as follows. In the case of the producer:
1. Call fence_wait to wait for the completion of any other DMA access.
2. Then the producer creates a fence and a new reservation entry.
3. Then it sets the given dmabuf's resv (reservation_object) to the new
reservation entry.
4. Then it adds the reservation entry to the entries list.
5. Then it sets the fence to all dmabufs on the entries list; in
practice, this sets the fence on the reservation_object of each dmabuf.
6. Then the producer's DMA starts.
7. Finally, when the DMA work is completed, we get the entries list
from a 3D job command (in the case of a Mali core, a pp job) and call
fence_signal() with each fence of each reservation entry.

From here, is there anything I'm missing?
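
A rough sketch of that producer sequence (every name below is
illustrative only, not the actual API of the patch series):

	fence_wait(old_fence);				/* step 1 */
	fence = mali_fence_create(ctx);			/* step 2 */
	entry = reservation_entry_get(dmabuf->resv);	/* step 3 */
	list_add(&entry->head, &entries);		/* step 4 */
	list_for_each_entry(entry, &entries, head)	/* step 5 */
		reservation_add_fence(entry->resv, fence);
	mali_pp_job_start(job);				/* step 6 */
	/* ... later, from the job-done interrupt: */
	list_for_each_entry(entry, &entries, head)	/* step 7 */
		fence_signal(entry->fence);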

Re: [Linaro-mm-sig] [PATCH 4/7] fence: dma-buf cross-device synchronization (v11)

2013-01-31 Thread Daniel Vetter
On Thu, Jan 31, 2013 at 3:38 PM, Inki Dae inki@samsung.com wrote:
 [snip]

 From here, is there anything I'm missing?

Yeah, more or less. Although you need to wrap everything in ticket
reservation locking so that you can atomically update fences if you
have support for some form of device-to-device signalling (i.e. without
blocking on the CPU until all the old users have completed). At least
that's the main point of Maarten's patches (and this does work with
prime between a few drivers by now), but of course you can use CPU
blocking as a fallback.
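
A sketch of that "wrap it in ticket reservation locking" idea (the
names are illustrative; see Maarten's reservation patches for the real
API):

	/* Lock every buffer's reservation under one ticket so the whole
	 * set can be updated atomically and without deadlocks, swap in
	 * the new fence, then unlock. */
	reservation_ticket_init(&ticket);
	list_for_each_entry(entry, &entries, head)
		ticket_lock(&ticket, entry->resv);

	list_for_each_entry(entry, &entries, head)
		reservation_replace_fence(entry->resv, new_fence);

	list_for_each_entry(entry, &entries, head)
		ticket_unlock(&ticket, entry->resv);
	reservation_ticket_fini(&ticket);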

 And I thought the fence from the reservation entry at step 7 means
 that the producer won't access the dmabuf this fence is attached to
 anymore, so this step wakes up all blocked processes. So I understood
 the fence as identifying an owner accessing the given dmabuf, and that
 through the fence's flags we could tell whether the owner committed
 its own fence to the given dmabuf for reading or for writing.

The fence doesn't give ownership of the dma_buf object; it only
indicates when the dma access will have completed. The relationship
between the dma_buf/reservation and the attached fences specifies
whether other hw engines can access the dma_buf, too (if the fence is
non-exclusive).

 If you could give me some advice, I'd be happy.

Rob and Maarten are working on some howtos and documentation with
example code; I guess it'd be best to wait a bit until we have that.
Or just review the existing stuff Rob just posted and reply with
questions there.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch