RE: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Inki Dae


> -Original Message-
> From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
> Daniel Vetter
> Sent: Wednesday, May 29, 2013 1:50 AM
> To: Inki Dae
> Cc: Rob Clark; Maarten Lankhorst; linux-fbdev; YoungJun Cho; Kyungmin
Park;
> myungjoo.ham; DRI mailing list; linux-arm-ker...@lists.infradead.org;
> linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Tue, May 28, 2013 at 4:50 PM, Inki Dae  wrote:
> > I think I already used reservation stuff any time in that way except
> > ww-mutex. And I'm not sure that embedded system really needs ww-mutex.
> If
> > there is any case,
> > could you tell me the case? I really need more advice and
> understanding :)
> 
> If you have only one driver, you can get away without ww_mutex.
> drm/i915 does it, all buffer state is protected by dev->struct_mutex.
> But as soon as you have multiple drivers sharing buffers with dma_buf
> things will blow up.
> 
> Yep, current prime is broken and can lead to deadlocks.
> 
> In practice it doesn't (yet) matter since only the X server does the
> sharing dance, and that one's single-threaded. Now you can claim that
> since you have all buffers pinned in embedded gfx anyway, you don't
> care. But both in desktop gfx and embedded gfx the real fun starts
> once you put fences into the mix and link them up with buffers, then
> every command submission risks that deadlock. Furthermore you can get
> unlucky and construct a circle of fences waiting on each other (though
> only if the fence signalling fires off the next batchbuffer
> asynchronously).

In our case, we haven't experienced a deadlock yet, but one is still
possible when a process shares two buffers with another process, like below:
Process A commits buffer A and waits for buffer B,
Process B commits buffer B and waits for buffer A.

That is a deadlock, and it seems you are saying we can resolve the deadlock
issue with ww-mutexes. It also seems we could replace our block-wakeup
mechanism with mutex locking for better performance.
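
If I understand ww-mutexes correctly, the two-buffer case above would look
roughly like the sketch below. This is only my own untested illustration;
example_buf, lock_both and the surrounding structure are made up and not part
of the proposed framework:

#include <linux/kernel.h>
#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(example_ww_class);

struct example_buf {
        struct ww_mutex lock;   /* e.g. embedded in the buffer's reservation */
};

static void lock_both(struct example_buf *a, struct example_buf *b,
                      struct ww_acquire_ctx *ctx)
{
        struct example_buf *contended = NULL;

        ww_acquire_init(ctx, &example_ww_class);
retry:
        if (contended) {
                /* we lost last time: sleep until the contended lock is ours */
                ww_mutex_lock_slow(&contended->lock, ctx);
                if (contended == b)
                        swap(a, b);     /* keep the already-held lock in 'a' */
                contended = NULL;
        } else if (ww_mutex_lock(&a->lock, ctx) == -EDEADLK) {
                contended = a;
                goto retry;
        }

        if (ww_mutex_lock(&b->lock, ctx) == -EDEADLK) {
                /* back off: drop everything, then retry with 'b' held first */
                ww_mutex_unlock(&a->lock);
                contended = b;
                goto retry;
        }

        ww_acquire_done(ctx);
        /* ... submit the work touching both buffers, then unlock both
         * and call ww_acquire_fini(ctx) ... */
}

Whichever process loses gets -EDEADLK, drops what it holds and retries, so the
circular wait above can never form.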

> 
> To prevent such deadlocks you _absolutely_ need to lock _all_ buffers
> that take part in a command submission at once. To do that you either
> need a global lock (ugh) or ww_mutexes.
> 
> So ww_mutexes are the fundamental ingredient of all this, if you don't
> see why you need them then everything piled on top is broken. I think
> until you've understood why exactly we need ww_mutexes there's not
> much point in discussing the finer issues of fences, reservation
> objects and how to integrate it with dma_bufs exactly.
> 
> I'll try to clarify the motivating example in the ww_mutex
> documentation a bit, but I dunno how else I could explain this ...
> 

I really don't want you to waste your time on me. I will try to apply
ww-mutexes (v4) to the proposed framework to understand them better.

Thanks for your advice. :)
Inki Dae

> Yours, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



Re: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Daniel Vetter
On Tue, May 28, 2013 at 4:50 PM, Inki Dae  wrote:
> I think I already used reservation stuff any time in that way except
> ww-mutex. And I'm not sure that embedded system really needs ww-mutex. If
> there is any case,
> could you tell me the case? I really need more advice and understanding :)

If you have only one driver, you can get away without ww_mutex.
drm/i915 does it, all buffer state is protected by dev->struct_mutex.
But as soon as you have multiple drivers sharing buffers with dma_buf
things will blow up.

Yep, current prime is broken and can lead to deadlocks.

In practice it doesn't (yet) matter since only the X server does the
sharing dance, and that one's single-threaded. Now you can claim that
since you have all buffers pinned in embedded gfx anyway, you don't
care. But both in desktop gfx and embedded gfx the real fun starts
once you put fences into the mix and link them up with buffers, then
every command submission risks that deadlock. Furthermore you can get
unlucky and construct a circle of fences waiting on each other (though
only if the fence signalling fires off the next batchbuffer
asynchronously).

To prevent such deadlocks you _absolutely_ need to lock _all_ buffers
that take part in a command submission at once. To do that you either
need a global lock (ugh) or ww_mutexes.
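
For an execbuf-style submission that boils down to something like the sketch
below (hand-written illustration only; example_bo, the list layout and the ww
class name are made up, this is not code from any driver):

#include <linux/list.h>
#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(example_ww_class);

struct example_bo {
        struct ww_mutex lock;
        struct list_head entry; /* link in the submission's buffer list */
};

/* take the ww_mutex of every buffer in one submission, backing off and
 * restarting whenever -EDEADLK says another submission won the race
 * (assumes no duplicate buffers in the list) */
static void lock_submission(struct list_head *bos, struct ww_acquire_ctx *ctx)
{
        struct example_bo *bo, *prev, *contended = NULL;

        ww_acquire_init(ctx, &example_ww_class);
retry:
        list_for_each_entry(bo, bos, entry) {
                if (bo == contended) {
                        contended = NULL;       /* already locked via the slow path */
                        continue;
                }
                if (ww_mutex_lock(&bo->lock, ctx) == -EDEADLK) {
                        /* drop everything we hold, keep only the contended lock */
                        list_for_each_entry(prev, bos, entry) {
                                if (prev == bo)
                                        break;
                                ww_mutex_unlock(&prev->lock);
                        }
                        if (contended)
                                ww_mutex_unlock(&contended->lock);
                        contended = bo;
                        ww_mutex_lock_slow(&bo->lock, ctx);
                        goto retry;
                }
        }
        ww_acquire_done(ctx);
        /* ... attach fences / kick off the dma, then unlock everything and
         * call ww_acquire_fini(ctx) ... */
}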

So ww_mutexes are the fundamental ingredient of all this, if you don't
see why you need them then everything piled on top is broken. I think
until you've understood why exactly we need ww_mutexes there's not
much point in discussing the finer issues of fences, reservation
objects and how to integrate it with dma_bufs exactly.

I'll try to clarify the motivating example in the ww_mutex
documentation a bit, but I dunno how else I could explain this ...

Yours, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


RE: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Inki Dae


> -Original Message-
> From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
> ow...@vger.kernel.org] On Behalf Of Rob Clark
> Sent: Tuesday, May 28, 2013 10:49 PM
> To: Inki Dae
> Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin
> Park; myungjoo.ham; DRI mailing list;
linux-arm-ker...@lists.infradead.org;
> linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Mon, May 27, 2013 at 11:56 PM, Inki Dae  wrote:
> >
> >
> >> -Original Message-
> >> From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
> >> ow...@vger.kernel.org] On Behalf Of Rob Clark
> >> Sent: Tuesday, May 28, 2013 12:48 AM
> >> To: Inki Dae
> >> Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho;
> Kyungmin
> >> Park; myungjoo.ham; DRI mailing list;
> > linux-arm-ker...@lists.infradead.org;
> >> linux-media@vger.kernel.org
> >> Subject: Re: Introduce a new helper framework for buffer
> synchronization
> >>
> >> On Mon, May 27, 2013 at 6:38 AM, Inki Dae  wrote:
> >> > Hi all,
> >> >
> >> > I have been removed previous branch and added new one with more
> cleanup.
> >> > This time, the fence helper doesn't include user side interfaces and
> >> cache
> >> > operation relevant codes anymore because not only we are not sure
> that
> >> > coupling those two things, synchronizing caches and buffer access
> >> between
> >> > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side
> is
> >> a
> >> > good idea yet but also existing codes for user side have problems
> with
> >> badly
> >> > behaved or crashing userspace. So this could be more discussed later.
> >> >
> >> > The below is a new branch,
> >> >
> >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> >> exynos.git/?h=dma-f
> >> > ence-helper
> >> >
> >> > And fence helper codes,
> >> >
> >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> >> exynos.git/commit/?
> >> > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
> >> >
> >> > And example codes for device driver,
> >> >
> >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> >> exynos.git/commit/?
> >> > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
> >> >
> >> > I think the time is not yet ripe for RFC posting: maybe existing dma
> >> fence
> >> > and reservation need more review and addition work. So I'd glad for
> >> somebody
> >> > giving other opinions and advices in advance before RFC posting.
> >>
> >> thoughts from a *really* quick, pre-coffee, first look:
> >> * any sort of helper to simplify single-buffer sort of use-cases (v4l)
> >> probably wouldn't want to bake in assumption that seqno_fence is used.
> >> * I guess g2d is probably not actually a simple use case, since I
> >> expect you can submit blits involving multiple buffers :-P
> >
> > I don't think so. One and more buffers can be used: seqno_fence also has
> > only one buffer. Actually, we have already applied this approach to most
> > devices; multimedia, gpu and display controller. And this approach shows
> > more performance; reduced power consumption against traditional way. And
> g2d
> > example is just to show you how to apply my approach to device driver.
> 
> no, you need the ww-mutex / reservation stuff any time you have
> multiple independent devices (or rings/contexts for hw that can
> support multiple contexts) which can do operations with multiple
> buffers.

I think I already use the reservation stuff in that way, except for the
ww-mutex. And I'm not sure that an embedded system really needs the
ww-mutex. If there is such a case, could you tell me about it? I really
need more advice and understanding :)

Thanks,
Inki Dae

>   So you could conceivably hit this w/ gpu + g2d if multiple
> buffers were shared between the two.  vram migration and such
> 'desktop stuff' might make the problem worse, but just because you
> don't have vram doesn't mean you don't have a problem with multiple
> buffers.
> 
> >> * otherwise, you probably don't want to depend on dmabuf, which is why
> >> reservation/fence is split out the way it is..  you want to be able to
> >> use a single reserva

RE: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Inki Dae

Hi Daniel,

Thank you so much; that was very useful. :) Sorry, but could you give me
more comments on my comments below? There are still things that confuse
me. :(


> -Original Message-
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, May 28, 2013 7:33 PM
> To: Inki Dae
> Cc: 'Rob Clark'; 'Maarten Lankhorst'; 'Daniel Vetter'; 'linux-fbdev';
> 'YoungJun Cho'; 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list';
> linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Tue, May 28, 2013 at 12:56:57PM +0900, Inki Dae wrote:
> >
> >
> > > -Original Message-
> > > From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
> > > ow...@vger.kernel.org] On Behalf Of Rob Clark
> > > Sent: Tuesday, May 28, 2013 12:48 AM
> > > To: Inki Dae
> > > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho;
> Kyungmin
> > > Park; myungjoo.ham; DRI mailing list;
> > linux-arm-ker...@lists.infradead.org;
> > > linux-media@vger.kernel.org
> > > Subject: Re: Introduce a new helper framework for buffer
> synchronization
> > >
> > > On Mon, May 27, 2013 at 6:38 AM, Inki Dae 
wrote:
> > > > Hi all,
> > > >
> > > > I have been removed previous branch and added new one with more
> cleanup.
> > > > This time, the fence helper doesn't include user side interfaces and
> > > cache
> > > > operation relevant codes anymore because not only we are not sure
> that
> > > > coupling those two things, synchronizing caches and buffer access
> > > between
> > > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel
> side is
> > > a
> > > > good idea yet but also existing codes for user side have problems
> with
> > > badly
> > > > behaved or crashing userspace. So this could be more discussed
later.
> > > >
> > > > The below is a new branch,
> > > >
> > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> > > exynos.git/?h=dma-f
> > > > ence-helper
> > > >
> > > > And fence helper codes,
> > > >
> > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> > > exynos.git/commit/?
> > > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
> > > >
> > > > And example codes for device driver,
> > > >
> > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> > > exynos.git/commit/?
> > > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
> > > >
> > > > I think the time is not yet ripe for RFC posting: maybe existing dma
> > > fence
> > > > and reservation need more review and addition work. So I'd glad for
> > > somebody
> > > > giving other opinions and advices in advance before RFC posting.
> > >
> > > thoughts from a *really* quick, pre-coffee, first look:
> > > * any sort of helper to simplify single-buffer sort of use-cases (v4l)
> > > probably wouldn't want to bake in assumption that seqno_fence is used.
> > > * I guess g2d is probably not actually a simple use case, since I
> > > expect you can submit blits involving multiple buffers :-P
> >
> > I don't think so. One and more buffers can be used: seqno_fence also has
> > only one buffer. Actually, we have already applied this approach to most
> > devices; multimedia, gpu and display controller. And this approach shows
> > more performance; reduced power consumption against traditional way. And
> g2d
> > example is just to show you how to apply my approach to device driver.
> 
> Note that seqno_fence is an implementation pattern for a certain type of
> direct hw->hw synchronization which uses a shared dma_buf to exchange the
> sync cookie.

I'm afraid I don't understand hw->hw synchronization. Does hw->hw
synchronization mean that the device has a hardware feature which supports
buffer synchronization internally? And what is the sync cookie?

> The dma_buf attached to the seqno_fence has _nothing_ to do
> with the dma_buf the fence actually coordinates access to.
> 
> I think that confusion is a large reason why Maarten&I don't
> understand what you want to achieve with your fence helpers. Currently
> they're us

Re: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Rob Clark
On Mon, May 27, 2013 at 11:56 PM, Inki Dae  wrote:
>
>
>> -Original Message-
>> From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
>> ow...@vger.kernel.org] On Behalf Of Rob Clark
>> Sent: Tuesday, May 28, 2013 12:48 AM
>> To: Inki Dae
>> Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin
>> Park; myungjoo.ham; DRI mailing list;
> linux-arm-ker...@lists.infradead.org;
>> linux-media@vger.kernel.org
>> Subject: Re: Introduce a new helper framework for buffer synchronization
>>
>> On Mon, May 27, 2013 at 6:38 AM, Inki Dae  wrote:
>> > Hi all,
>> >
>> > I have been removed previous branch and added new one with more cleanup.
>> > This time, the fence helper doesn't include user side interfaces and
>> cache
>> > operation relevant codes anymore because not only we are not sure that
>> > coupling those two things, synchronizing caches and buffer access
>> between
>> > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is
>> a
>> > good idea yet but also existing codes for user side have problems with
>> badly
>> > behaved or crashing userspace. So this could be more discussed later.
>> >
>> > The below is a new branch,
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
>> exynos.git/?h=dma-f
>> > ence-helper
>> >
>> > And fence helper codes,
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
>> exynos.git/commit/?
>> > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
>> >
>> > And example codes for device driver,
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
>> exynos.git/commit/?
>> > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
>> >
>> > I think the time is not yet ripe for RFC posting: maybe existing dma
>> fence
>> > and reservation need more review and addition work. So I'd glad for
>> somebody
>> > giving other opinions and advices in advance before RFC posting.
>>
>> thoughts from a *really* quick, pre-coffee, first look:
>> * any sort of helper to simplify single-buffer sort of use-cases (v4l)
>> probably wouldn't want to bake in assumption that seqno_fence is used.
>> * I guess g2d is probably not actually a simple use case, since I
>> expect you can submit blits involving multiple buffers :-P
>
> I don't think so. One and more buffers can be used: seqno_fence also has
> only one buffer. Actually, we have already applied this approach to most
> devices; multimedia, gpu and display controller. And this approach shows
> more performance; reduced power consumption against traditional way. And g2d
> example is just to show you how to apply my approach to device driver.

no, you need the ww-mutex / reservation stuff any time you have
multiple independent devices (or rings/contexts for hw that can
support multiple contexts) which can do operations with multiple
buffers.  So you could conceivably hit this w/ gpu + g2d if multiple
buffers were shared between the two.  vram migration and such
'desktop stuff' might make the problem worse, but just because you
don't have vram doesn't mean you don't have a problem with multiple
buffers.

>> * otherwise, you probably don't want to depend on dmabuf, which is why
>> reservation/fence is split out the way it is..  you want to be able to
>> use a single reservation/fence mechanism within your driver without
>> having to care about which buffers are exported to dmabuf's and which
>> are not.  Creating a dmabuf for every GEM bo is too heavyweight.
>
> Right. But I think we should dealt with this separately. Actually, we are
> trying to use reservation for gpu pipe line synchronization such as sgx sync
> object and this approach is used without dmabuf. In order words, some device
> can use only reservation for such pipe line synchronization and at the same
> time, fence helper or similar thing with dmabuf for buffer synchronization.

it is probably easier to approach from the reverse direction.. ie, get
reservation/synchronization right first, and then dmabuf.  (Well, that
isn't really a problem because Maarten's reservation/fence patches
support dmabuf from the beginning.)

BR,
-R

>>
>> I'm not entirely sure if reservation/fence could/should be made any
>> simpler for multi-buffer users.  Probably the best thing to do is just
>> get reservation/fence rolled out in a few drivers and see if some
>> common patterns emerge.
>>
>> BR,
>> -R
>>
>> >
>> > Thanks,
>> > Inki Dae
>> >
>


Re: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Daniel Vetter
On Tue, May 28, 2013 at 12:56:57PM +0900, Inki Dae wrote:
> 
> 
> > -Original Message-
> > From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
> > ow...@vger.kernel.org] On Behalf Of Rob Clark
> > Sent: Tuesday, May 28, 2013 12:48 AM
> > To: Inki Dae
> > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin
> > Park; myungjoo.ham; DRI mailing list;
> linux-arm-ker...@lists.infradead.org;
> > linux-media@vger.kernel.org
> > Subject: Re: Introduce a new helper framework for buffer synchronization
> > 
> > On Mon, May 27, 2013 at 6:38 AM, Inki Dae  wrote:
> > > Hi all,
> > >
> > > I have been removed previous branch and added new one with more cleanup.
> > > This time, the fence helper doesn't include user side interfaces and
> > cache
> > > operation relevant codes anymore because not only we are not sure that
> > > coupling those two things, synchronizing caches and buffer access
> > between
> > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is
> > a
> > > good idea yet but also existing codes for user side have problems with
> > badly
> > > behaved or crashing userspace. So this could be more discussed later.
> > >
> > > The below is a new branch,
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> > exynos.git/?h=dma-f
> > > ence-helper
> > >
> > > And fence helper codes,
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> > exynos.git/commit/?
> > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
> > >
> > > And example codes for device driver,
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> > exynos.git/commit/?
> > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
> > >
> > > I think the time is not yet ripe for RFC posting: maybe existing dma
> > fence
> > > and reservation need more review and addition work. So I'd glad for
> > somebody
> > > giving other opinions and advices in advance before RFC posting.
> > 
> > thoughts from a *really* quick, pre-coffee, first look:
> > * any sort of helper to simplify single-buffer sort of use-cases (v4l)
> > probably wouldn't want to bake in assumption that seqno_fence is used.
> > * I guess g2d is probably not actually a simple use case, since I
> > expect you can submit blits involving multiple buffers :-P
> 
> I don't think so. One and more buffers can be used: seqno_fence also has
> only one buffer. Actually, we have already applied this approach to most
> devices; multimedia, gpu and display controller. And this approach shows
> more performance; reduced power consumption against traditional way. And g2d
> example is just to show you how to apply my approach to device driver.

Note that seqno_fence is an implementation pattern for a certain type of
direct hw->hw synchronization which uses a shared dma_buf to exchange the
sync cookie. The dma_buf attached to the seqno_fence has _nothing_ to do
with the dma_buf the fence actually coordinates access to.
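
Purely to illustrate the concept (every name below is made up, this is not the
actual seqno_fence interface): the fence holds a reference to a separate sync
dma_buf plus an offset and a target value, and it counts as signalled once the
hw has written a seqno past that target at the given offset.

#include <linux/dma-buf.h>
#include <linux/types.h>

/* illustration only, not the real seqno_fence API */
struct example_seqno_fence {
        struct dma_buf *sync_buf;  /* the shared "cookie" buffer; this is NOT
                                    * the buffer whose access is being synced */
        u32 seqno_ofs;             /* where in sync_buf the hw writes its seqno */
        u32 seqno;                 /* value that marks this fence as done */
};

/* another device (or the cpu) polls the cookie to see if the fence passed */
static bool example_seqno_passed(struct example_seqno_fence *f,
                                 void *sync_cpu_map)
{
        u32 current_seqno = *(u32 *)(sync_cpu_map + f->seqno_ofs);

        return (s32)(current_seqno - f->seqno) >= 0;    /* wrap-safe compare */
}

The buffer whose access is being fenced never shows up in that structure at
all.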

I think that confusion is a large reason why Maarten&I don't
understand what you want to achieve with your fence helpers. Currently
they're using the seqno_fence, but totally not in a way the seqno_fence
was meant to be used.

Note that with the current code there is only a pointer from dma_bufs to
the fence. The fence itself has _no_ pointer to the dma_buf it syncs. This
shouldn't be a problem since the fence fastpath for already signalled
fences is completely barrier&lock free (it's just a load+bit-test), and
fences are meant to be embedded into whatever dma tracking structure you
already have, so no overhead there. The only ugly part is the fence
refcounting, but I don't think we can drop that.
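
(The fastpath really is just a load plus a bit test, roughly like the
following, again with made-up names:)

#include <linux/bitops.h>

struct example_fence {
        unsigned long flags;    /* bit 0 set once the fence has signalled */
        /* ... refcount, callbacks, ... */
};
#define EXAMPLE_FENCE_SIGNALED_BIT      0

static inline bool example_fence_is_signaled(struct example_fence *fence)
{
        return test_bit(EXAMPLE_FENCE_SIGNALED_BIT, &fence->flags);
}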

Note that you completely reinvent this part of Maarten's fence patches by
adding new r/w_complete completions to the reservation object, which
completely replaces the fence stuff.

Also note that a list of reservation entries is again meant to be used
only when submitting the dma to the gpu. With your patches you seem to
hang onto that list until dma completes. This has the ugly side-effect
that you need to allocate these reservation entries with kmalloc, whereas
if you just use them in the execbuf ioctl to construct the dma you can
usually embed it into something else you need already anyway. At least
i915 and ttm based drivers can work that way.

Furthermore fences are specifically constructed as frankenstein-monsters
between completion/waitqueues and c

Re: Introduce a new helper framework for buffer synchronization

2013-05-28 Thread Maarten Lankhorst
Hey,

Op 28-05-13 04:49, Inki Dae schreef:
>
>> -Original Message-
>> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
>> Sent: Tuesday, May 28, 2013 12:23 AM
>> To: Inki Dae
>> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'YoungJun Cho'; 'Kyungmin
>> Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm-
>> ker...@lists.infradead.org; linux-media@vger.kernel.org
>> Subject: Re: Introduce a new helper framework for buffer synchronization
>>
>> Hey,
>>
>> Op 27-05-13 12:38, Inki Dae schreef:
>>> Hi all,
>>>
>>> I have been removed previous branch and added new one with more cleanup.
>>> This time, the fence helper doesn't include user side interfaces and
>> cache
>>> operation relevant codes anymore because not only we are not sure that
>>> coupling those two things, synchronizing caches and buffer access
>> between
>>> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is
>> a
>>> good idea yet but also existing codes for user side have problems with
>> badly
>>> behaved or crashing userspace. So this could be more discussed later.
>>>
>>> The below is a new branch,
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
>> exynos.git/?h=dma-f
>>> ence-helper
>>>
>>> And fence helper codes,
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
>> exynos.git/commit/?
>>> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
>>>
>>> And example codes for device driver,
>>>
>>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
>> exynos.git/commit/?
>>> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
>>>
>>> I think the time is not yet ripe for RFC posting: maybe existing dma
>> fence
>>> and reservation need more review and addition work. So I'd glad for
>> somebody
>>> giving other opinions and advices in advance before RFC posting.
>>>
>> NAK.
>>
>> For examples for how to handle locking properly, see Documentation/ww-
>> mutex-design.txt in my recent tree.
>> I could list what I believe is wrong with your implementation, but real
>> problem is that the approach you're taking is wrong.
> I just removed ticket stubs to show my approach you guys as simple as
> possible, and I just wanted to show that we could use buffer synchronization
> mechanism without ticket stubs.
The tickets have been removed in favor of a ww_context. Moving it in as a
base primitive allows more locking abuse to be detected, and makes some
other things easier too.

> Question, WW-Mutexes could be used for all devices? I guess this has
> dependence on x86 gpu: gpu has VRAM and it means different memory domain.
> And could you tell my why shared fence should have only eight objects? I
> think we could need more than eight objects for read access. Anyway I think
> I don't surely understand yet so there might be my missing point.
Yes, ww mutexes are not limited in any way to x86. They're a locking
mechanism. When you have acquired the ww mutexes for all buffer objects, all
it says is that at that point in time you have exclusively acquired the locks
of all bo's.

After locking everything you can read the fence pointers safely, queue waits,
and set a new fence pointer on all reservation_objects. You only need a
single fence on all those objects, so 8 is plenty. Nonetheless this was a
limitation of my earlier design, and I'll dynamically allocate fence_shared
in the future.
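
In rough pseudo-kernel-C the per-submission flow looks like the sketch below;
the struct layout and the commented-out wait call are placeholders, only the
ordering matters:

#include <linux/ww_mutex.h>

struct example_fence;   /* placeholder for whatever fence type is used */

struct example_bo {
        struct ww_mutex lock;                   /* the bo's reservation lock */
        struct example_fence *excl_fence;       /* fence of the last job using it */
};

/* runs with the ww_mutex of every bo already held under ctx (taken with the
 * usual -EDEADLK backoff loop) */
static void example_attach_and_unlock(struct example_bo **bos, int count,
                                      struct example_fence *new_fence,
                                      struct ww_acquire_ctx *ctx)
{
        int i;

        for (i = 0; i < count; i++) {
                /* locked, so the old pointer is stable: wait for it (cpu
                 * wait or queued hw wait) before our dma may start ... */
                /* example_wait_fence(bos[i]->excl_fence); -- placeholder */

                /* ... then publish the fence of the new job */
                bos[i]->excl_fence = new_fence;
        }

        for (i = 0; i < count; i++)
                ww_mutex_unlock(&bos[i]->lock);
        ww_acquire_fini(ctx);

        /* new_fence gets signalled from the job's completion interrupt */
}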

~Maarten



RE: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Inki Dae


> -Original Message-
> From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
> Daniel Vetter
> Sent: Tuesday, May 28, 2013 1:02 AM
> To: Rob Clark
> Cc: Inki Dae; Maarten Lankhorst; linux-fbdev; YoungJun Cho; Kyungmin Park;
> myungjoo.ham; DRI mailing list; linux-arm-ker...@lists.infradead.org;
> linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Mon, May 27, 2013 at 5:47 PM, Rob Clark  wrote:
> > On Mon, May 27, 2013 at 6:38 AM, Inki Dae  wrote:
> >> Hi all,
> >>
> >> I have been removed previous branch and added new one with more
cleanup.
> >> This time, the fence helper doesn't include user side interfaces and
> cache
> >> operation relevant codes anymore because not only we are not sure that
> >> coupling those two things, synchronizing caches and buffer access
> between
> >> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side
> is a
> >> good idea yet but also existing codes for user side have problems with
> badly
> >> behaved or crashing userspace. So this could be more discussed later.
> >>
> >> The below is a new branch,
> >>
> >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> exynos.git/?h=dma-f
> >> ence-helper
> >>
> >> And fence helper codes,
> >>
> >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> exynos.git/commit/?
> >> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
> >>
> >> And example codes for device driver,
> >>
> >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> exynos.git/commit/?
> >> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
> >>
> >> I think the time is not yet ripe for RFC posting: maybe existing dma
> fence
> >> and reservation need more review and addition work. So I'd glad for
> somebody
> >> giving other opinions and advices in advance before RFC posting.
> >
> > thoughts from a *really* quick, pre-coffee, first look:
> > * any sort of helper to simplify single-buffer sort of use-cases (v4l)
> > probably wouldn't want to bake in assumption that seqno_fence is used.
> 
> Yeah, which is why Maarten&I discussed ideas already for what needs to
> be improved in the current dma-buf interface code to make this Just
> Work. At least as long as a driver doesn't want to add new fences,
> which would be especially useful for all kinds of gpu access.
> 
> > * I guess g2d is probably not actually a simple use case, since I
> > expect you can submit blits involving multiple buffers :-P
> 
> Yeah, on a quick read the current fence helper code seems to be a bit
> limited in scope.
> 
> > * otherwise, you probably don't want to depend on dmabuf, which is why
> > reservation/fence is split out the way it is..  you want to be able to
> > use a single reservation/fence mechanism within your driver without
> > having to care about which buffers are exported to dmabuf's and which
> > are not.  Creating a dmabuf for every GEM bo is too heavyweight.
> 
> That's pretty much the reason that reservations are free-standing from
> dma_bufs. The idea is to embed them into the gem/ttm/v4l buffer
> object. Maarten also has some helpers to keep track of multi-buffer
> ww_mutex locking and fence attaching in his reservation helpers, but I
> think we should wait with those until we have drivers using them.
> 
> For now I think the priority should be to get the basic stuff in and
> ttm as the first user established. Then we can go nuts later on.
> 
> > I'm not entirely sure if reservation/fence could/should be made any
> > simpler for multi-buffer users.  Probably the best thing to do is just
> > get reservation/fence rolled out in a few drivers and see if some
> > common patterns emerge.
> 
> I think we can make the 1 buffer per dma op (i.e. 1:1
> dma_buf->reservation : fence mapping) work fairly simple in dma_buf
> with maybe a dma_buf_attachment_start_dma/end_dma helpers. But there's
> also still the open that currently there's no way to flush cpu caches
> for dma access without unmapping the attachement (or resorting to


That is what I tried to address by adding user interfaces to dmabuf:
coupling cache synchronization and buffer access between CPU and CPU, CPU
and DMA, and DMA and DMA with fences on the kernel side. We need something
to do this between mapping and unmapping the attachment.

> trick which might not work), so we have a few gaping holes in the
> interface already anyway.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



RE: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Inki Dae


> -Original Message-
> From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
> ow...@vger.kernel.org] On Behalf Of Rob Clark
> Sent: Tuesday, May 28, 2013 12:48 AM
> To: Inki Dae
> Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin
> Park; myungjoo.ham; DRI mailing list;
linux-arm-ker...@lists.infradead.org;
> linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Mon, May 27, 2013 at 6:38 AM, Inki Dae  wrote:
> > Hi all,
> >
> > I have been removed previous branch and added new one with more cleanup.
> > This time, the fence helper doesn't include user side interfaces and
> cache
> > operation relevant codes anymore because not only we are not sure that
> > coupling those two things, synchronizing caches and buffer access
> between
> > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is
> a
> > good idea yet but also existing codes for user side have problems with
> badly
> > behaved or crashing userspace. So this could be more discussed later.
> >
> > The below is a new branch,
> >
> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> exynos.git/?h=dma-f
> > ence-helper
> >
> > And fence helper codes,
> >
> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> exynos.git/commit/?
> > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
> >
> > And example codes for device driver,
> >
> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> exynos.git/commit/?
> > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
> >
> > I think the time is not yet ripe for RFC posting: maybe existing dma
> fence
> > and reservation need more review and addition work. So I'd glad for
> somebody
> > giving other opinions and advices in advance before RFC posting.
> 
> thoughts from a *really* quick, pre-coffee, first look:
> * any sort of helper to simplify single-buffer sort of use-cases (v4l)
> probably wouldn't want to bake in assumption that seqno_fence is used.
> * I guess g2d is probably not actually a simple use case, since I
> expect you can submit blits involving multiple buffers :-P

I don't think so. One or more buffers can be used: seqno_fence also has
only one buffer. Actually, we have already applied this approach to most
devices: multimedia, gpu and display controller. And this approach shows
better performance and reduced power consumption compared to the traditional
way. The g2d example is just to show you how to apply my approach to a
device driver.

> * otherwise, you probably don't want to depend on dmabuf, which is why
> reservation/fence is split out the way it is..  you want to be able to
> use a single reservation/fence mechanism within your driver without
> having to care about which buffers are exported to dmabuf's and which
> are not.  Creating a dmabuf for every GEM bo is too heavyweight.

Right. But I think we should deal with this separately. Actually, we are
trying to use reservation for gpu pipeline synchronization, such as the sgx
sync object, and this approach is used without dmabuf. In other words, some
devices can use only reservation for such pipeline synchronization and, at
the same time, the fence helper or something similar with dmabuf for buffer
synchronization.
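
For example, something like the sketch below is what I have in mind (only an
illustration; the gem object layout is simplified and assumes the
reservation_object type and init helper from Maarten's patches):

/* sketch: the reservation object lives in the driver's own buffer object, so
 * it can be used for pipeline synchronization whether or not the buffer is
 * ever exported as a dma-buf */
struct example_gem_object {
        struct drm_gem_object base;
        struct reservation_object resv;  /* type from Maarten's patches */
        /* ... backing pages, iommu mapping, ... */
};

static void example_gem_init(struct example_gem_object *obj)
{
        reservation_object_init(&obj->resv);
        /* the dma_buf is only created later, if and when userspace asks for
         * an export, and then simply refers to the same reservation object */
}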

> 
> I'm not entirely sure if reservation/fence could/should be made any
> simpler for multi-buffer users.  Probably the best thing to do is just
> get reservation/fence rolled out in a few drivers and see if some
> common patterns emerge.
> 
> BR,
> -R
> 
> >
> > Thanks,
> > Inki Dae
> >



RE: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Inki Dae


> -Original Message-
> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
> Sent: Tuesday, May 28, 2013 12:23 AM
> To: Inki Dae
> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'YoungJun Cho'; 'Kyungmin
> Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm-
> ker...@lists.infradead.org; linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> Hey,
> 
> Op 27-05-13 12:38, Inki Dae schreef:
> > Hi all,
> >
> > I have been removed previous branch and added new one with more cleanup.
> > This time, the fence helper doesn't include user side interfaces and
> cache
> > operation relevant codes anymore because not only we are not sure that
> > coupling those two things, synchronizing caches and buffer access
> between
> > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is
> a
> > good idea yet but also existing codes for user side have problems with
> badly
> > behaved or crashing userspace. So this could be more discussed later.
> >
> > The below is a new branch,
> >
> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> exynos.git/?h=dma-f
> > ence-helper
> >
> > And fence helper codes,
> >
> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> exynos.git/commit/?
> > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
> >
> > And example codes for device driver,
> >
> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-
> exynos.git/commit/?
> > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
> >
> > I think the time is not yet ripe for RFC posting: maybe existing dma
> fence
> > and reservation need more review and addition work. So I'd glad for
> somebody
> > giving other opinions and advices in advance before RFC posting.
> >
> NAK.
> 
> For examples for how to handle locking properly, see Documentation/ww-
> mutex-design.txt in my recent tree.
> I could list what I believe is wrong with your implementation, but real
> problem is that the approach you're taking is wrong.

I just removed the ticket stubs to show you my approach as simply as
possible, and I just wanted to show that we could use a buffer
synchronization mechanism without ticket stubs.

Question: can WW-Mutexes be used for all devices? I guess this depends on
the x86 gpu: the gpu has VRAM, which means a different memory domain. And
could you tell me why the shared fence should have only eight objects? I
think we could need more than eight objects for read access. Anyway, I
don't fully understand this yet, so I might be missing something.

Thanks,
Inki Dae

> 

> ~Maarten



Re: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Daniel Vetter
On Mon, May 27, 2013 at 5:47 PM, Rob Clark  wrote:
> On Mon, May 27, 2013 at 6:38 AM, Inki Dae  wrote:
>> Hi all,
>>
>> I have been removed previous branch and added new one with more cleanup.
>> This time, the fence helper doesn't include user side interfaces and cache
>> operation relevant codes anymore because not only we are not sure that
>> coupling those two things, synchronizing caches and buffer access between
>> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a
>> good idea yet but also existing codes for user side have problems with badly
>> behaved or crashing userspace. So this could be more discussed later.
>>
>> The below is a new branch,
>>
>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f
>> ence-helper
>>
>> And fence helper codes,
>>
>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
>> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
>>
>> And example codes for device driver,
>>
>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
>> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
>>
>> I think the time is not yet ripe for RFC posting: maybe existing dma fence
>> and reservation need more review and addition work. So I'd glad for somebody
>> giving other opinions and advices in advance before RFC posting.
>
> thoughts from a *really* quick, pre-coffee, first look:
> * any sort of helper to simplify single-buffer sort of use-cases (v4l)
> probably wouldn't want to bake in assumption that seqno_fence is used.

Yeah, which is why Maarten&I discussed ideas already for what needs to
be improved in the current dma-buf interface code to make this Just
Work. At least as long as a driver doesn't want to add new fences,
which would be especially useful for all kinds of gpu access.

> * I guess g2d is probably not actually a simple use case, since I
> expect you can submit blits involving multiple buffers :-P

Yeah, on a quick read the current fence helper code seems to be a bit
limited in scope.

> * otherwise, you probably don't want to depend on dmabuf, which is why
> reservation/fence is split out the way it is..  you want to be able to
> use a single reservation/fence mechanism within your driver without
> having to care about which buffers are exported to dmabuf's and which
> are not.  Creating a dmabuf for every GEM bo is too heavyweight.

That's pretty much the reason that reservations are free-standing from
dma_bufs. The idea is to embed them into the gem/ttm/v4l buffer
object. Maarten also has some helpers to keep track of multi-buffer
ww_mutex locking and fence attaching in his reservation helpers, but I
think we should wait with those until we have drivers using them.

For now I think the priority should be to get the basic stuff in and
ttm as the first user established. Then we can go nuts later on.

> I'm not entirely sure if reservation/fence could/should be made any
> simpler for multi-buffer users.  Probably the best thing to do is just
> get reservation/fence rolled out in a few drivers and see if some
> common patterns emerge.

I think we can make the 1 buffer per dma op (i.e. 1:1
dma_buf->reservation : fence mapping) work fairly simple in dma_buf
with maybe a dma_buf_attachment_start_dma/end_dma helpers. But there's
also still the open issue that currently there's no way to flush cpu caches
for dma access without unmapping the attachment (or resorting to
tricks which might not work), so we have a few gaping holes in the
interface already anyway.
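
Very roughly, the usage I have in mind for such helpers would be something
like the sketch below. Nothing in it exists today; every function name is
made up, it is only meant to show the shape:

/* purely hypothetical sketch of how a simple one-buffer-per-op user (v4l
 * style) could look if such start_dma/end_dma helpers existed */
static int example_queue_capture(struct dma_buf_attachment *attach,
                                 struct example_fence *fence)
{
        int ret;

        /* waits for (or fails on) whoever used the buffer before us and
         * publishes 'fence' as the new owner of the buffer */
        ret = dma_buf_attachment_start_dma(attach, fence, true /* write */);
        if (ret)
                return ret;

        example_hw_start_capture(attach);       /* kick off the actual dma */
        return 0;
}

static void example_capture_done_irq(struct dma_buf_attachment *attach,
                                     struct example_fence *fence)
{
        example_fence_signal(fence);               /* dma finished */
        dma_buf_attachment_end_dma(attach, fence); /* buffer is free again */
}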
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Rob Clark
On Mon, May 27, 2013 at 6:38 AM, Inki Dae  wrote:
> Hi all,
>
> I have been removed previous branch and added new one with more cleanup.
> This time, the fence helper doesn't include user side interfaces and cache
> operation relevant codes anymore because not only we are not sure that
> coupling those two things, synchronizing caches and buffer access between
> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a
> good idea yet but also existing codes for user side have problems with badly
> behaved or crashing userspace. So this could be more discussed later.
>
> The below is a new branch,
>
> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f
> ence-helper
>
> And fence helper codes,
>
> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
>
> And example codes for device driver,
>
> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
>
> I think the time is not yet ripe for RFC posting: maybe existing dma fence
> and reservation need more review and addition work. So I'd glad for somebody
> giving other opinions and advices in advance before RFC posting.

thoughts from a *really* quick, pre-coffee, first look:
* any sort of helper to simplify single-buffer sort of use-cases (v4l)
probably wouldn't want to bake in assumption that seqno_fence is used.
* I guess g2d is probably not actually a simple use case, since I
expect you can submit blits involving multiple buffers :-P
* otherwise, you probably don't want to depend on dmabuf, which is why
reservation/fence is split out the way it is..  you want to be able to
use a single reservation/fence mechanism within your driver without
having to care about which buffers are exported to dmabuf's and which
are not.  Creating a dmabuf for every GEM bo is too heavyweight.

I'm not entirely sure if reservation/fence could/should be made any
simpler for multi-buffer users.  Probably the best thing to do is just
get reservation/fence rolled out in a few drivers and see if some
common patterns emerge.

BR,
-R

>
> Thanks,
> Inki Dae
>


Re: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Maarten Lankhorst
Hey,

Op 27-05-13 12:38, Inki Dae schreef:
> Hi all,
>
> I have been removed previous branch and added new one with more cleanup.
> This time, the fence helper doesn't include user side interfaces and cache
> operation relevant codes anymore because not only we are not sure that
> coupling those two things, synchronizing caches and buffer access between
> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a
> good idea yet but also existing codes for user side have problems with badly
> behaved or crashing userspace. So this could be more discussed later.
>
> The below is a new branch,
>   
> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f
> ence-helper
>
> And fence helper codes,
>   
> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005
>
> And example codes for device driver,
>   
> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae
>
> I think the time is not yet ripe for RFC posting: maybe existing dma fence
> and reservation need more review and addition work. So I'd glad for somebody
> giving other opinions and advices in advance before RFC posting.
>
NAK.

For examples of how to handle locking properly, see
Documentation/ww-mutex-design.txt in my recent tree.
I could list what I believe is wrong with your implementation, but the real
problem is that the approach you're taking is wrong.

~Maarten


RE: Introduce a new helper framework for buffer synchronization

2013-05-27 Thread Inki Dae
Hi all,

I have removed the previous branch and added a new one with more cleanup.
This time, the fence helper doesn't include user-side interfaces and
cache-operation-related code anymore, because we are not yet sure that
coupling those two things (synchronizing caches and buffer access between
CPU and CPU, CPU and DMA, and DMA and DMA with fences) in the kernel is a
good idea, and also the existing user-side code has problems with badly
behaved or crashing userspace. So this could be discussed more later.

The below is a new branch,

https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f
ence-helper

And fence helper codes,

https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005

And example codes for device driver,

https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae

I think the time is not yet ripe for an RFC posting: maybe the existing dma
fence and reservation code needs more review and additional work. So I'd be
glad for somebody to give other opinions and advice in advance, before the
RFC posting.

Thanks,
Inki Dae



RE: Introduce a new helper framework for buffer synchronization

2013-05-23 Thread Inki Dae
> -Original Message-
> From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of
> Daniel Vetter
> Sent: Thursday, May 23, 2013 8:56 PM
> To: Inki Dae
> Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
> YoungJun Cho; linux-arm-ker...@lists.infradead.org; linux-
> me...@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Tue, May 21, 2013 at 11:22 AM, Inki Dae  wrote:
> >> -Original Message-
> >> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
> >> Vetter
> >> Sent: Tuesday, May 21, 2013 4:45 PM
> >> To: Inki Dae
> >> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list';
> >> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
> >> ker...@lists.infradead.org; linux-media@vger.kernel.org
> >> Subject: Re: Introduce a new helper framework for buffer
> synchronization
> >>
> >> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
> >> > > - Integration of fence syncing into dma_buf. Imo we should have a
> >> > >   per-attachment mode which decides whether map/unmap (and the new
> >> sync)
> >> > >   should wait for fences or whether the driver takes care of
> syncing
> >> > >   through the new fence framework itself (for direct hw sync).
> >> >
> >> > I think it's a good idea to have per-attachment mode for buffer sync.
> >> But
> >> > I'd like to say we make sure what is the purpose of map
> >> > (dma_buf_map_attachment)first. This interface is used to get a sgt;
> >> > containing pages to physical memory region, and map that region with
> >> > device's iommu table. The important thing is that this interface is
> >> called
> >> > just one time when user wants to share an allocated buffer with dma.
> But
> >> cpu
> >> > will try to access the buffer every time as long as it wants.
> Therefore,
> >> we
> >> > need cache control every time cpu and dma access the shared buffer:
> >> cache
> >> > clean when cpu goes to dma and cache invalidate when dma goes to cpu.
> >> That
> >> > is why I added new interfaces, DMA_BUF_GET_FENCE and
> DMA_BUF_PUT_FENCE,
> >> to
> >> > dma buf framework. Of course, Those are ugly so we need a better way:
> I
> >> just
> >> > wanted to show you that in such way, we can synchronize the shared
> >> buffer
> >> > between cpu and dma. By any chance, is there any way that kernel can
> be
> >> > aware of when cpu accesses the shared buffer or is there any point I
> >> didn't
> >> > catch up?
> >>
> >> Well dma_buf_map/unmap_attachment should also flush/invalidate any
> caches,
> >> and with the current dma_buf spec those two functions are the _only_
> means
> >
> > I know that dma buf exporter should make sure that cache
> clean/invalidate
> > are done when dma_buf_map/unmap_attachement is called. For this, already
> we
> > do so. However, this function is called when dma buf import is requested
> by
> > user to map a dmabuf fd with dma. This means that
> dma_buf_map_attachement()
> > is called ONCE when user wants to share the dmabuf fd with dma.
Actually,
> in
> > case of exynos drm, dma_map_sg_attrs(), performing cache operation
> > internally, is called when dmabuf import is requested by user.
> >
> >> you have to do so. Which strictly means that if you interleave device
> dma
> >> and cpu acccess you need to unmap/map every time.
> >>
> >> Which is obviously horribly inefficient, but a known gap in the dma_buf
> >
> > Right, and also this has big overhead.
> >
> >> interface. Hence why I proposed to add dma_buf_sync_attachment similar
> to
> >> dma_sync_single_for_device:
> >>
> >> /**
> >>  * dma_buf_sync_sg_attachment - sync caches for dma access
> >>  * @attach: dma-buf attachment to sync
> >>  * @sgt: the sg table to sync (returned by dma_buf_map_attachement)
> >>  * @direction: dma direction to sync for
> >>  *
> >>  * This function synchronizes caches for device dma through the given
> >>  * dma-buf attachment when interleaving dma from different devices and
> the
> >>  * cpu. Other device dma needs to be synced also by calls to this
> >>  * function (or 

Re: Introduce a new helper framework for buffer synchronization

2013-05-23 Thread Daniel Vetter
On Tue, May 21, 2013 at 11:22 AM, Inki Dae  wrote:
>> -Original Message-
>> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
>> Vetter
>> Sent: Tuesday, May 21, 2013 4:45 PM
>> To: Inki Dae
>> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list';
>> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
>> ker...@lists.infradead.org; linux-media@vger.kernel.org
>> Subject: Re: Introduce a new helper framework for buffer synchronization
>>
>> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
>> > > - Integration of fence syncing into dma_buf. Imo we should have a
>> > >   per-attachment mode which decides whether map/unmap (and the new
>> sync)
>> > >   should wait for fences or whether the driver takes care of syncing
>> > >   through the new fence framework itself (for direct hw sync).
>> >
>> > I think it's a good idea to have per-attachment mode for buffer sync.
>> But
>> > I'd like to say we make sure what is the purpose of map
>> > (dma_buf_map_attachment)first. This interface is used to get a sgt;
>> > containing pages to physical memory region, and map that region with
>> > device's iommu table. The important thing is that this interface is
>> called
>> > just one time when user wants to share an allocated buffer with dma. But
>> cpu
>> > will try to access the buffer every time as long as it wants. Therefore,
>> we
>> > need cache control every time cpu and dma access the shared buffer:
>> cache
>> > clean when cpu goes to dma and cache invalidate when dma goes to cpu.
>> That
>> > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE,
>> to
>> > dma buf framework. Of course, Those are ugly so we need a better way: I
>> just
>> > wanted to show you that in such way, we can synchronize the shared
>> buffer
>> > between cpu and dma. By any chance, is there any way that kernel can be
>> > aware of when cpu accesses the shared buffer or is there any point I
>> didn't
>> > catch up?
>>
>> Well dma_buf_map/unmap_attachment should also flush/invalidate any caches,
>> and with the current dma_buf spec those two functions are the _only_ means
>
> I know that dma buf exporter should make sure that cache clean/invalidate
> are done when dma_buf_map/unmap_attachement is called. For this, already we
> do so. However, this function is called when dma buf import is requested by
> user to map a dmabuf fd with dma. This means that dma_buf_map_attachement()
> is called ONCE when user wants to share the dmabuf fd with dma. Actually, in
> case of exynos drm, dma_map_sg_attrs(), performing cache operation
> internally, is called when dmabuf import is requested by user.
>
>> you have to do so. Which strictly means that if you interleave device dma
>> and cpu acccess you need to unmap/map every time.
>>
>> Which is obviously horribly inefficient, but a known gap in the dma_buf
>
> Right, and also this has big overhead.
>
>> interface. Hence why I proposed to add dma_buf_sync_attachment similar to
>> dma_sync_single_for_device:
>>
>> /**
>>  * dma_buf_sync_sg_attachment - sync caches for dma access
>>  * @attach: dma-buf attachment to sync
>>  * @sgt: the sg table to sync (returned by dma_buf_map_attachement)
>>  * @direction: dma direction to sync for
>>  *
>>  * This function synchronizes caches for device dma through the given
>>  * dma-buf attachment when interleaving dma from different devices and the
>>  * cpu. Other device dma needs to be synced also by calls to this
>>  * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access
>>  * needs to be synced with dma_buf_begin/end_cpu_access.
>>  */
>> void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach,
>>   struct sg_table *sgt,
>>   enum dma_data_direction direction)
>>
>> Note that "sync" here only means to synchronize caches, not wait for any
>> outstanding fences. This is simply to be consistent with the established
>> lingo of the dma api. How the dma-buf fences fit into this is imo a
>> different topic, but my idea is that all the cache coherency barriers
>> (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and
>> dma_buf_begin/end_cpu_access) would automatically block for any attached
>> fences (i.e. block for write fences when doing read-only access, block for
>> all fences otherwise).

RE: Introduce a new helper framework for buffer synchronization

2013-05-21 Thread Inki Dae


> -Original Message-
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, May 21, 2013 4:45 PM
> To: Inki Dae
> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list';
> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
> ker...@lists.infradead.org; linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
> > > - Integration of fence syncing into dma_buf. Imo we should have a
> > >   per-attachment mode which decides whether map/unmap (and the new
> sync)
> > >   should wait for fences or whether the driver takes care of syncing
> > >   through the new fence framework itself (for direct hw sync).
> >
> > I think it's a good idea to have per-attachment mode for buffer sync. But
> > I'd like to say we should first make sure what the purpose of map
> > (dma_buf_map_attachment) is. This interface is used to get an sgt
> > containing pages of a physical memory region, and map that region with
> > device's iommu table. The important thing is that this interface is called
> > just one time when user wants to share an allocated buffer with dma. But cpu
> > will try to access the buffer every time as long as it wants. Therefore, we
> > need cache control every time cpu and dma access the shared buffer: cache
> > clean when cpu goes to dma and cache invalidate when dma goes to cpu. That
> > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, to
> > dma buf framework. Of course, those are ugly so we need a better way: I just
> > wanted to show you that in such way, we can synchronize the shared buffer
> > between cpu and dma. By any chance, is there any way that kernel can be
> > aware of when cpu accesses the shared buffer or is there any point I didn't
> > catch up?
> 
> Well dma_buf_map/unmap_attachment should also flush/invalidate any caches,
> and with the current dma_buf spec those two functions are the _only_ means

I know that dma buf exporter should make sure that cache clean/invalidate
are done when dma_buf_map/unmap_attachment is called. For this, we already
do so. However, this function is called when dma buf import is requested by
user to map a dmabuf fd with dma. This means that dma_buf_map_attachment()
is called ONCE when user wants to share the dmabuf fd with dma. Actually, in
case of exynos drm, dma_map_sg_attrs(), performing cache operation
internally, is called when dmabuf import is requested by user.

> you have to do so. Which strictly means that if you interleave device dma
> and cpu access you need to unmap/map every time.
> 
> Which is obviously horribly inefficient, but a known gap in the dma_buf

Right, and this also has a big overhead.

> interface. Hence why I proposed to add dma_buf_sync_attachment similar to
> dma_sync_single_for_device:
> 
> /**
>  * dma_buf_sync_sg_attachment - sync caches for dma access
>  * @attach: dma-buf attachment to sync
>  * @sgt: the sg table to sync (returned by dma_buf_map_attachment)
>  * @direction: dma direction to sync for
>  *
>  * This function synchronizes caches for device dma through the given
>  * dma-buf attachment when interleaving dma from different devices and the
>  * cpu. Other device dma needs to be synced also by calls to this
>  * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access
>  * needs to be synced with dma_buf_begin/end_cpu_access.
>  */
> void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach,
>   struct sg_table *sgt,
>   enum dma_data_direction direction)
> 
> Note that "sync" here only means to synchronize caches, not wait for any
> outstanding fences. This is simply to be consistent with the established
> lingo of the dma api. How the dma-buf fences fit into this is imo a
> different topic, but my idea is that all the cache coherency barriers
> (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and
> dma_buf_begin/end_cpu_access) would automatically block for any attached
> fences (i.e. block for write fences when doing read-only access, block for
> all fences otherwise).

As I mentioned already, the kernel can't be aware of when the cpu accesses a
shared buffer: the user can access a shared buffer anytime after mmap, and the
shared buffer should be synchronized between cpu and dma. Therefore, the above
cache coherency barriers should be called every time cpu and dma try to
access a shared buffer, checking before and after of c

Re: Introduce a new helper framework for buffer synchronization

2013-05-21 Thread Daniel Vetter
On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
> > - Integration of fence syncing into dma_buf. Imo we should have a
> >   per-attachment mode which decides whether map/unmap (and the new sync)
> >   should wait for fences or whether the driver takes care of syncing
> >   through the new fence framework itself (for direct hw sync).
> 
> I think it's a good idea to have per-attachment mode for buffer sync. But
> I'd like to say we should first make sure what the purpose of map
> (dma_buf_map_attachment) is. This interface is used to get an sgt
> containing pages of a physical memory region, and map that region with
> device's iommu table. The important thing is that this interface is called
> just one time when user wants to share an allocated buffer with dma. But cpu
> will try to access the buffer every time as long as it wants. Therefore, we
> need cache control every time cpu and dma access the shared buffer: cache
> clean when cpu goes to dma and cache invalidate when dma goes to cpu. That
> is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, to
> dma buf framework. Of course, those are ugly so we need a better way: I just
> wanted to show you that in such way, we can synchronize the shared buffer
> between cpu and dma. By any chance, is there any way that kernel can be
> aware of when cpu accesses the shared buffer or is there any point I didn't
> catch up?

Well dma_buf_map/unmap_attachment should also flush/invalidate any caches,
and with the current dma_buf spec those two functions are the _only_ means
you have to do so. Which strictly means that if you interleave device dma
and cpu access you need to unmap/map every time.

Which is obviously horribly inefficient, but a known gap in the dma_buf
interface. Hence why I proposed to add dma_buf_sync_attachment similar to
dma_sync_single_for_device:

/**
 * dma_buf_sync_sg_attachment - sync caches for dma access
 * @attach: dma-buf attachment to sync
 * @sgt: the sg table to sync (returned by dma_buf_map_attachment)
 * @direction: dma direction to sync for
 *
 * This function synchronizes caches for device dma through the given
 * dma-buf attachment when interleaving dma from different devices and the
 * cpu. Other device dma needs to be synced also by calls to this
 * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access
 * needs to be synced with dma_buf_begin/end_cpu_access.
 */
void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach,
				struct sg_table *sgt,
				enum dma_data_direction direction)

Note that "sync" here only means to synchronize caches, not wait for any
outstanding fences. This is simply to be consistent with the established
lingo of the dma api. How the dma-buf fences fit into this is imo a
different topic, but my idea is that all the cache coherency barriers
(i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and
dma_buf_begin/end_cpu_access) would automatically block for any attached
fences (i.e. block for write fences when doing read-only access, block for
all fences otherwise).

Then we could do a new dma_buf_attach_flags interface for special cases
(might also be useful for other things, similar to the recently added
flags in the dma api for wc/no-kernel-mapping/...). A new flag like
DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care of all
fencing for this attachment and the dma-buf functions should not do the
automagic fence blocking.

> In the Linux kernel, especially on embedded systems, we have tried to implement
> generic interfaces for buffer management; how to allocate a buffer and how
> to share a buffer. As a result, now we have CMA (Contiguous Memory
> Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing
> between cpu and dma. And then how to synchronize a buffer between cpu and
> dma? I think now is the best time to discuss a buffer synchronization mechanism,
> and that is very natural.

I think it's important to differentiate between the two meanings of sync:
- synchronize caches (i.e. cpu cache flushing in most cases)
- and synchronize in-flight dma with fences.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


RE: Introduce a new helper framework for buffer synchronization

2013-05-21 Thread Inki Dae


> -Original Message-
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, May 21, 2013 6:31 AM
> To: Inki Dae
> Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
> YoungJun Cho; linux-arm-ker...@lists.infradead.org; linux-
> me...@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Mon, May 20, 2013 at 11:13:04PM +0200, Daniel Vetter wrote:
> > On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote:
> > > 2013/5/15 Rob Clark 
> > >
> > > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae 
> wrote:
> > > > >
> > > > >
> > > > >> -Original Message-
> > > > >> From: Rob Clark [mailto:robdcl...@gmail.com]
> > > > >> Sent: Tuesday, May 14, 2013 10:39 PM
> > > > >> To: Inki Dae
> > > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
> YoungJun
> > > > >> Cho; linux-arm-ker...@lists.infradead.org; linux-
> me...@vger.kernel.org
> > > > >> Subject: Re: Introduce a new helper framework for buffer
> synchronization
> > > > >>
> > > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae 
> > > > wrote:
> > > > >> >> well, for cache management, I think it is a better idea.. I
> didn't
> > > > >> >> really catch that this was the motivation from the initial
> patch, but
> > > > >> >> maybe I read it too quickly.  But cache can be decoupled from
> > > > >> >> synchronization, because CPU access is not asynchronous.  For
> > > > >> >> userspace/CPU access to buffer, you should:
> > > > >> >>
> > > > >> >>   1) wait for buffer
> > > > >> >>   2) prepare-access
> > > > >> >>   3)  ... do whatever cpu access to buffer ...
> > > > >> >>   4) finish-access
> > > > >> >>   5) submit buffer for new dma-operation
> > > > >> >>
> > > > >> >
> > > > >> >
> > > > >> > For data flow from CPU to DMA device,
> > > > >> > 1) wait for buffer
> > > > >> > 2) prepare-access (dma_buf_begin_cpu_access)
> > > > >> > 3) cpu access to buffer
> > > > >> >
> > > > >> >
> > > > >> > For data flow from DMA device to CPU
> > > > >> > 1) wait for buffer
> > > > >>
> > > > >> Right, but CPU access isn't asynchronous (from the point of view
> of
> > > > >> the CPU), so there isn't really any wait step at this point.  And
> if
> > > > >> you do want the CPU to be able to signal a fence from userspace
> for
>> >> >> some reason, you probably want something file/fd based so the
> > > > >> refcnting/cleanup when process dies doesn't leave some pending
> DMA
> > > > >> action wedged.  But I don't really see the point of that
> complexity
> > > > >> when the CPU access isn't asynchronous in the first place.
> > > > >>
> > > > >
> > > > > There was my missing comments, please see the below sequence.
> > > > >
> > > > > For data flow from CPU to DMA device and then from DMA device to
> CPU,
> > > > > 1) wait for buffer <- at user side - ioctl(fd,
> DMA_BUF_GET_FENCE, ...)
> > > > > - including prepare-access (dma_buf_begin_cpu_access)
> > > > > 2) cpu access to buffer
> > > > > 3) wait for buffer <- at device driver
> > > > > - but CPU is already accessing the buffer so blocked.
> > > > > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
>> > > > 5) the thread, blocked at 3), is woken up by 4).
> > > > > - and then finish-access (dma_buf_end_cpu_access)
> > > >
> > > > right, I understand you can have background threads, etc, in
> > > > userspace.  But there are already plenty of synchronization
> primitives
> > > > that can be used for cpu->cpu synchronization, either within the
> same
> > > > process or between multiple processes.  For cpu access, even if it
> is
> > > > handled by background threads/processes, I think it i

Re: Introduce a new helper framework for buffer synchronization

2013-05-20 Thread Daniel Vetter
On Mon, May 20, 2013 at 11:13:04PM +0200, Daniel Vetter wrote:
> On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote:
> > 2013/5/15 Rob Clark 
> > 
> > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae  wrote:
> > > >
> > > >
> > > >> -Original Message-
> > > >> From: Rob Clark [mailto:robdcl...@gmail.com]
> > > >> Sent: Tuesday, May 14, 2013 10:39 PM
> > > >> To: Inki Dae
> > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; 
> > > >> YoungJun
> > > >> Cho; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
> > > >> Subject: Re: Introduce a new helper framework for buffer 
> > > >> synchronization
> > > >>
> > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae 
> > > wrote:
> > > >> >> well, for cache management, I think it is a better idea.. I didn't
> > > >> >> really catch that this was the motivation from the initial patch, 
> > > >> >> but
> > > >> >> maybe I read it too quickly.  But cache can be decoupled from
> > > >> >> synchronization, because CPU access is not asynchronous.  For
> > > >> >> userspace/CPU access to buffer, you should:
> > > >> >>
> > > >> >>   1) wait for buffer
> > > >> >>   2) prepare-access
> > > >> >>   3)  ... do whatever cpu access to buffer ...
> > > >> >>   4) finish-access
> > > >> >>   5) submit buffer for new dma-operation
> > > >> >>
> > > >> >
> > > >> >
> > > >> > For data flow from CPU to DMA device,
> > > >> > 1) wait for buffer
> > > >> > 2) prepare-access (dma_buf_begin_cpu_access)
> > > >> > 3) cpu access to buffer
> > > >> >
> > > >> >
> > > >> > For data flow from DMA device to CPU
> > > >> > 1) wait for buffer
> > > >>
> > > >> Right, but CPU access isn't asynchronous (from the point of view of
> > > >> the CPU), so there isn't really any wait step at this point.  And if
> > > >> you do want the CPU to be able to signal a fence from userspace for
> > > >> some reason, you probably want something file/fd based so the
> > > >> refcnting/cleanup when process dies doesn't leave some pending DMA
> > > >> action wedged.  But I don't really see the point of that complexity
> > > >> when the CPU access isn't asynchronous in the first place.
> > > >>
> > > >
> > > > There was my missing comments, please see the below sequence.
> > > >
> > > > For data flow from CPU to DMA device and then from DMA device to CPU,
> > > > 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
> > > > - including prepare-access (dma_buf_begin_cpu_access)
> > > > 2) cpu access to buffer
> > > > 3) wait for buffer <- at device driver
> > > > - but CPU is already accessing the buffer so blocked.
> > > > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
> > > > 5) the thread, blocked at 3), is woken up by 4).
> > > > - and then finish-access (dma_buf_end_cpu_access)
> > >
> > > right, I understand you can have background threads, etc, in
> > > userspace.  But there are already plenty of synchronization primitives
> > > that can be used for cpu->cpu synchronization, either within the same
> > > process or between multiple processes.  For cpu access, even if it is
> > > handled by background threads/processes, I think it is better to use
> > > the traditional pthreads or unix synchronization primitives.  They
> > > have existed forever, they are well tested, and they work.
> > >
> > > So while it seems nice and orthogonal/clean to couple cache and
> > > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the
> > > same generic way, I think in practice we have to make things more
> > > complex than they otherwise need to be to do this.  Otherwise I think
> > > we'll be having problems with badly behaved or crashing userspace.
> > >
> > >
> > Right, this is very important. I think it's not easy to solve this
> > issue. And at least for ARM based embedded system, such feature is useful
> > to us; coupling cache operation and buffer synchronization. I'd like to
> > collect other opinions and advice to solve this issue.

Re: Introduce a new helper framework for buffer synchronization

2013-05-20 Thread Daniel Vetter
On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote:
> 2013/5/15 Rob Clark 
> 
> > On Wed, May 15, 2013 at 1:19 AM, Inki Dae  wrote:
> > >
> > >
> > >> -Original Message-
> > >> From: Rob Clark [mailto:robdcl...@gmail.com]
> > >> Sent: Tuesday, May 14, 2013 10:39 PM
> > >> To: Inki Dae
> > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
> > >> Cho; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
> > >> Subject: Re: Introduce a new helper framework for buffer synchronization
> > >>
> > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae 
> > wrote:
> > >> >> well, for cache management, I think it is a better idea.. I didn't
> > >> >> really catch that this was the motivation from the initial patch, but
> > >> >> maybe I read it too quickly.  But cache can be decoupled from
> > >> >> synchronization, because CPU access is not asynchronous.  For
> > >> >> userspace/CPU access to buffer, you should:
> > >> >>
> > >> >>   1) wait for buffer
> > >> >>   2) prepare-access
> > >> >>   3)  ... do whatever cpu access to buffer ...
> > >> >>   4) finish-access
> > >> >>   5) submit buffer for new dma-operation
> > >> >>
> > >> >
> > >> >
> > >> > For data flow from CPU to DMA device,
> > >> > 1) wait for buffer
> > >> > 2) prepare-access (dma_buf_begin_cpu_access)
> > >> > 3) cpu access to buffer
> > >> >
> > >> >
> > >> > For data flow from DMA device to CPU
> > >> > 1) wait for buffer
> > >>
> > >> Right, but CPU access isn't asynchronous (from the point of view of
> > >> the CPU), so there isn't really any wait step at this point.  And if
> > >> you do want the CPU to be able to signal a fence from userspace for
> > >> some reason, you probably want something file/fd based so the
> > >> refcnting/cleanup when process dies doesn't leave some pending DMA
> > >> action wedged.  But I don't really see the point of that complexity
> > >> when the CPU access isn't asynchronous in the first place.
> > >>
> > >
> > > There was my missing comments, please see the below sequence.
> > >
> > > For data flow from CPU to DMA device and then from DMA device to CPU,
> > > 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
> > > - including prepare-access (dma_buf_begin_cpu_access)
> > > 2) cpu access to buffer
> > > 3) wait for buffer <- at device driver
> > > - but CPU is already accessing the buffer so blocked.
> > > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
> > > 5) the thread, blocked at 3), is woken up by 4).
> > > - and then finish-access (dma_buf_end_cpu_access)
> >
> > right, I understand you can have background threads, etc, in
> > userspace.  But there are already plenty of synchronization primitives
> > that can be used for cpu->cpu synchronization, either within the same
> > process or between multiple processes.  For cpu access, even if it is
> > handled by background threads/processes, I think it is better to use
> > the traditional pthreads or unix synchronization primitives.  They
> > have existed forever, they are well tested, and they work.
> >
> > So while it seems nice and orthogonal/clean to couple cache and
> > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the
> > same generic way, I think in practice we have to make things more
> > complex than they otherwise need to be to do this.  Otherwise I think
> > we'll be having problems with badly behaved or crashing userspace.
> >
> >
> Right, this is very important. I think it's not easy to solve this
> issue. And at least for ARM based embedded system, such feature is useful
> to us; coupling cache operation and buffer synchronization. I'd like to
> collect other opinions and advice to solve this issue.

Maybe we have a bit a misunderstanding here. The kernel really should take
care of sync and cache coherency, and I agree that with the current
dma_buf code (and the proposed fences) that part is still a bit hazy. But
the kernel should not allow userspace to block access to a buffer until
userspace is done. It should only sync with any outstanding fences and
flush buffers before that userspace access happens.

Then the kernel would again flush caches on the next dma access (which
hopefully is only started once userspace completed access). Atm this isn't
possible in an efficient way since the dma_buf api only exposes map/unmap
attachment and not a function to just sync an existing mapping like
dma_sync_single_for_device. I guess we should add a
dma_buf_sync_attachment interface for that - we already have range-based
cpu sync support with the begin/end_cpu_access interfaces.
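
As a sketch of the cpu-access side that exists already (the function name and
the DMA_TO_DEVICE choice are just for illustration, assuming the range-based
begin/end_cpu_access signatures mentioned above):

#include <linux/dma-buf.h>
#include <linux/errno.h>
#include <linux/string.h>

static int example_cpu_fill(struct dma_buf *dmabuf, size_t len)
{
	void *vaddr;
	int ret;

	/* Lets the exporter sync with dma and clean/invalidate caches. */
	ret = dma_buf_begin_cpu_access(dmabuf, 0, len, DMA_TO_DEVICE);
	if (ret)
		return ret;

	vaddr = dma_buf_vmap(dmabuf);
	if (!vaddr) {
		ret = -ENOMEM;
		goto out;
	}

	memset(vaddr, 0, len);		/* cpu writes data the device will read */
	dma_buf_vunmap(dmabuf, vaddr);
out:
	dma_buf_end_cpu_access(dmabuf, 0, len, DMA_TO_DEVICE);
	return ret;
}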

Yours, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: Introduce a new helper framework for buffer synchronization

2013-05-16 Thread Daniel Vetter
On Wed, May 15, 2013 at 4:06 PM, Rob Clark  wrote:
> So while it seems nice and orthogonal/clean to couple cache and
> synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the
> same generic way, I think in practice we have to make things more
> complex than they otherwise need to be to do this.  Otherwise I think
> we'll be having problems with badly behaved or crashing userspace.

I haven't read through the entire thread carefully but imo this is very
important. If we add a fence interface which allows userspace to block
dma this is a no-go. The only thing we need is to sync up with all
outstanding dma operations and flush caches for cpu access. If broken
userspace starts to issue new dma (or multiple threads stomp onto one
another) that's not a problem dma fences/syncpoints should try to
solve. This way we can concentrate on solving the (already
challenging) device-to-device sync issues without additional
complexities which cpu->cpu sync would impose.
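
To illustrate that policy on the exporter side, a sketch; struct example_bo
and its dma_done/sgt/dev fields are made up, only the completion and
dma_sync_sg_for_cpu() calls are existing kernel API:

#include <linux/completion.h>
#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

struct example_bo {
	struct completion dma_done;	/* completed from the dma-done irq */
	struct sg_table *sgt;
	struct device *dev;
};

static int example_begin_cpu_access(struct dma_buf *dmabuf, size_t start,
				    size_t len, enum dma_data_direction dir)
{
	struct example_bo *bo = dmabuf->priv;

	/* Sync up with any dma still outstanding on this buffer ... */
	wait_for_completion(&bo->dma_done);

	/* ... and then make the buffer coherent for the cpu. */
	dma_sync_sg_for_cpu(bo->dev, bo->sgt->sgl, bo->sgt->nents, dir);
	return 0;
}

Userspace never signals anything in this model; the sync happens entirely in
the kernel when cpu access is requested.
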
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: Introduce a new helper framework for buffer synchronization

2013-05-15 Thread Rob Clark
On Wed, May 15, 2013 at 1:19 AM, Inki Dae  wrote:
>
>
>> -Original Message-
>> From: Rob Clark [mailto:robdcl...@gmail.com]
>> Sent: Tuesday, May 14, 2013 10:39 PM
>> To: Inki Dae
>> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
>> Cho; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
>> Subject: Re: Introduce a new helper framework for buffer synchronization
>>
>> On Mon, May 13, 2013 at 10:52 PM, Inki Dae  wrote:
>> >> well, for cache management, I think it is a better idea.. I didn't
>> >> really catch that this was the motivation from the initial patch, but
>> >> maybe I read it too quickly.  But cache can be decoupled from
>> >> synchronization, because CPU access is not asynchronous.  For
>> >> userspace/CPU access to buffer, you should:
>> >>
>> >>   1) wait for buffer
>> >>   2) prepare-access
>> >>   3)  ... do whatever cpu access to buffer ...
>> >>   4) finish-access
>> >>   5) submit buffer for new dma-operation
>> >>
>> >
>> >
>> > For data flow from CPU to DMA device,
>> > 1) wait for buffer
>> > 2) prepare-access (dma_buf_begin_cpu_access)
>> > 3) cpu access to buffer
>> >
>> >
>> > For data flow from DMA device to CPU
>> > 1) wait for buffer
>>
>> Right, but CPU access isn't asynchronous (from the point of view of
>> the CPU), so there isn't really any wait step at this point.  And if
>> you do want the CPU to be able to signal a fence from userspace for
>> some reason, you probably want something file/fd based so the
>> refcnting/cleanup when process dies doesn't leave some pending DMA
>> action wedged.  But I don't really see the point of that complexity
>> when the CPU access isn't asynchronous in the first place.
>>
>
> There was my missing comments, please see the below sequence.
>
> For data flow from CPU to DMA device and then from DMA device to CPU,
> 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
> - including prepare-access (dma_buf_begin_cpu_access)
> 2) cpu access to buffer
> 3) wait for buffer <- at device driver
> - but CPU is already accessing the buffer so blocked.
> 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
> 5) the thread, blocked at 3), is woken up by 4).
> - and then finish-access (dma_buf_end_cpu_access)

right, I understand you can have background threads, etc, in
userspace.  But there are already plenty of synchronization primitives
that can be used for cpu->cpu synchronization, either within the same
process or between multiple processes.  For cpu access, even if it is
handled by background threads/processes, I think it is better to use
the traditional pthreads or unix synchronization primitives.  They
have existed forever, they are well tested, and they work.

So while it seems nice and orthogonal/clean to couple cache and
synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the
same generic way, I think in practice we have to make things more
complex than they otherwise need to be to do this.  Otherwise I think
we'll be having problems with badly behaved or crashing userspace.

BR,
-R

> 6) dma access to buffer
> 7) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
> - but DMA is already accessing the buffer so blocked.
> 8) signal <- at device driver
> 9) the thread, blocked at 7), is woken up by 8)
> - and then prepare-access (dma_buf_begin_cpu_access)
> 10 cpu access to buffer
>
> Basically, 'wait for buffer' includes buffer synchronization, committing
> processing, and cache operation. The buffer synchronization means that the
> current thread should wait for other threads accessing a shared buffer until
> the completion of their access. And the committing processing means that the
> current thread possesses the shared buffer, so any attempt by another thread
> to access the shared buffer blocks that thread. However, as I already
> mentioned before, it seems that these user interfaces are still quite ugly.
> So we need a better way.
>
> Give me more comments if there is my missing point :)
>
> Thanks,
> Inki Dae
>
>> BR,
>> -R
>>
>>
>> > 2) finish-access (dma_buf_end_cpu_access)
>> > 3) dma access to buffer
>> >
>> > 1) and 2) are coupled with one function: we have implemented
>> > fence_helper_commit_reserve() for it.
>> >
>> > Cache control (cache clean or cache invalidate) is performed properly
>> > checking previous access type and current access type.
>> > And the below is actual codes for it,
>


RE: Introduce a new helper framework for buffer synchronization

2013-05-14 Thread Inki Dae


> -Original Message-
> From: Rob Clark [mailto:robdcl...@gmail.com]
> Sent: Tuesday, May 14, 2013 10:39 PM
> To: Inki Dae
> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
> Cho; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Mon, May 13, 2013 at 10:52 PM, Inki Dae  wrote:
> >> well, for cache management, I think it is a better idea.. I didn't
> >> really catch that this was the motivation from the initial patch, but
> >> maybe I read it too quickly.  But cache can be decoupled from
> >> synchronization, because CPU access is not asynchronous.  For
> >> userspace/CPU access to buffer, you should:
> >>
> >>   1) wait for buffer
> >>   2) prepare-access
> >>   3)  ... do whatever cpu access to buffer ...
> >>   4) finish-access
> >>   5) submit buffer for new dma-operation
> >>
> >
> >
> > For data flow from CPU to DMA device,
> > 1) wait for buffer
> > 2) prepare-access (dma_buf_begin_cpu_access)
> > 3) cpu access to buffer
> >
> >
> > For data flow from DMA device to CPU
> > 1) wait for buffer
> 
> Right, but CPU access isn't asynchronous (from the point of view of
> the CPU), so there isn't really any wait step at this point.  And if
> you do want the CPU to be able to signal a fence from userspace for
> some reason, you probably want something file/fd based so the
> refcnting/cleanup when process dies doesn't leave some pending DMA
> action wedged.  But I don't really see the point of that complexity
> when the CPU access isn't asynchronous in the first place.
>

There was my missing comments, please see the below sequence.

For data flow from CPU to DMA device and then from DMA device to CPU,
1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
- including prepare-access (dma_buf_begin_cpu_access)
2) cpu access to buffer
3) wait for buffer <- at device driver
- but CPU is already accessing the buffer so blocked.
4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
5) the thread, blocked at 3), is woken up by 4).
- and then finish-access (dma_buf_end_cpu_access)
6) dma access to buffer
7) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
- but DMA is already accessing the buffer so blocked.
8) signal <- at device driver
9) the thread, blocked at 7), is woken up by 8)
- and then prepare-access (dma_buf_begin_cpu_access)
10) cpu access to buffer

Basically, 'wait for buffer' includes buffer synchronization, committing
processing, and cache operation. The buffer synchronization means that the
current thread should wait for other threads accessing a shared buffer until
the completion of their access. And the committing processing means that the
current thread possesses the shared buffer, so any attempt by another thread
to access the shared buffer blocks that thread. However, as I already
mentioned before, it seems that these user interfaces are still quite ugly.
So we need a better way.
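
For reference, a userspace-side sketch of steps 1), 2) and 4) above.
DMA_BUF_GET_FENCE/DMA_BUF_PUT_FENCE and struct dma_buf_fence are only what
this proposal adds (they are not mainline dma-buf), and the definitions below
are placeholders standing in for the proposed uapi header:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

struct dma_buf_fence { unsigned int type; };	/* placeholder layout */
#define DMA_BUF_ACCESS_READ	1		/* placeholder value  */
#define DMA_BUF_ACCESS_WRITE	2		/* placeholder value  */
#define DMA_BUF_GET_FENCE	_IOW('b', 0x20, struct dma_buf_fence)	/* placeholder */
#define DMA_BUF_PUT_FENCE	_IOW('b', 0x21, struct dma_buf_fence)	/* placeholder */

static void cpu_fill(int dmabuf_fd, void *map, size_t len)
{
	struct dma_buf_fence df = { .type = DMA_BUF_ACCESS_WRITE };

	ioctl(dmabuf_fd, DMA_BUF_GET_FENCE, &df);	/* 1) wait for buffer */
	memset(map, 0, len);				/* 2) cpu access      */
	ioctl(dmabuf_fd, DMA_BUF_PUT_FENCE, &df);	/* 4) signal          */
}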

Give me more comments if there is any point I'm missing :)

Thanks,
Inki Dae

> BR,
> -R
> 
> 
> > 2) finish-access (dma_buf_end_cpu_access)
> > 3) dma access to buffer
> >
> > 1) and 2) are coupled with one function: we have implemented
> > fence_helper_commit_reserve() for it.
> >
> > Cache control (cache clean or cache invalidate) is performed properly
> > checking previous access type and current access type.
> > And the below is actual codes for it,



Re: Introduce a new helper framework for buffer synchronization

2013-05-14 Thread Rob Clark
On Mon, May 13, 2013 at 10:52 PM, Inki Dae  wrote:
>> well, for cache management, I think it is a better idea.. I didn't
>> really catch that this was the motivation from the initial patch, but
>> maybe I read it too quickly.  But cache can be decoupled from
>> synchronization, because CPU access is not asynchronous.  For
>> userspace/CPU access to buffer, you should:
>>
>>   1) wait for buffer
>>   2) prepare-access
>>   3)  ... do whatever cpu access to buffer ...
>>   4) finish-access
>>   5) submit buffer for new dma-operation
>>
>
>
> For data flow from CPU to DMA device,
> 1) wait for buffer
> 2) prepare-access (dma_buf_begin_cpu_access)
> 3) cpu access to buffer
>
>
> For data flow from DMA device to CPU
> 1) wait for buffer

Right, but CPU access isn't asynchronous (from the point of view of
the CPU), so there isn't really any wait step at this point.  And if
you do want the CPU to be able to signal a fence from userspace for
some reason, you probably want something file/fd based so the
refcnting/cleanup when process dies doesn't leave some pending DMA
action wedged.  But I don't really see the point of that complexity
when the CPU access isn't asynchronous in the first place.

BR,
-R


> 2) finish-access (dma_buf_end_cpu_access)
> 3) dma access to buffer
>
> 1) and 2) are coupled with one function: we have implemented
> fence_helper_commit_reserve() for it.
>
> Cache control (cache clean or cache invalidate) is performed properly
> checking previous access type and current access type.
> And the below is actual codes for it,


RE: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Inki Dae


> -Original Message-
> From: Rob Clark [mailto:robdcl...@gmail.com]
> Sent: Tuesday, May 14, 2013 2:58 AM
> To: Inki Dae
> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
> Cho; linux-arm-ker...@lists.infradead.org; linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Mon, May 13, 2013 at 1:18 PM, Inki Dae  wrote:
> >
> >
> > 2013/5/13 Rob Clark 
> >>
> >> On Mon, May 13, 2013 at 8:21 AM, Inki Dae  wrote:
> >> >
> >> >> In that case you still wouldn't give userspace control over the
> fences.
> >> >> I
> >> >> don't see any way that can end well.
> >> >> What if userspace never signals? What if userspace gets killed by
> oom
> >> >> killer. Who keeps track of that?
> >> >>
> >> >
> >> > In all cases, all kernel resources to user fence will be released by
> >> > kernel
> >> > once the fence is timed out: never signaling and process killing by
> oom
> >> > killer makes the fence timed out. And if we use mmap mechanism you
> >> > mentioned
> >> > before, I think user resource could also be freed properly.
> >>
> >>
> >> I tend to agree w/ Maarten here.. there is no good reason for
> >> userspace to be *signaling* fences.  The exception might be some blob
> >> gpu drivers which don't have enough knowledge in the kernel to figure
> >> out what to do.  (In which case you can add driver private ioctls for
> >> that.. still not the right thing to do but at least you don't make a
> >> public API out of it.)
> >>
> >
> > Please do not care whether those are generic or not. Let's see the following
> > three things. First, it's cache operation. As you know, ARM SoC has ACP
> > (Accelerator Coherency Port) and can be connected to DMA engine or similar
> > devices. And this port is used for cache coherency between CPU cache and DMA
> > device. However, most devices on ARM based embedded systems don't use the
> > ACP port. So they need proper cache operation before and after DMA or CPU
> > access in case of using cacheable mapping. Actually, I see many Linux based
> > platforms call cache control interfaces directly for that. I think the
> > reason they do so is that the kernel isn't aware of when and how the CPU
> > accessed memory.
> 
> I think we had kicked around the idea of giving dmabufs a
> prepare/finish ioctl quite some time back.  This is probably something
> that should be at least a bit decoupled from fences.  (Possibly
> 'prepare' waits for dma access to complete, but not the other way
> around.)
> 
> And I did implement in omapdrm support for simulating coherency via
> page fault-in / shoot-down..  It is one option that makes it
> completely transparent to userspace, although there is some
> performance cost, so I suppose it depends a bit on your use-case.
> 
> > And second, a user process has to do so many things in case of using shared
> > memory with a DMA device. The user process should understand how the DMA
> > device is operated and when the interfaces for controlling the DMA device
> > are called. Such things would make a user application very complicated.
> >
> > And third, it's a performance optimization for multimedia and graphics
> > devices. As I mentioned already, we should consider sequential processing
> > for buffer sharing between CPU and DMA device. This means that the CPU
> > should stay idle until the DMA device is done and vice versa.
> >
> > That is why I proposed such user interfaces. Of course, these interfaces
> > might still be quite ugly: Maarten already pointed this out and I agree with
> > him. But there must be another, better way. Don't you think we need a
> > similar thing? With such interfaces, cache control and buffer synchronization
> > can be performed at the kernel level. Moreover, a user application doesn't
> > need to consider DMA device controlling anymore. Therefore, one thread can
> > access a shared buffer and the other can control the DMA device with the
> > shared buffer in parallel. We can really make the best use of CPU and DMA
> > idle time. In other words, we can really make the best use of a multitasking
> > OS, Linux.
> >
> > So could you please tell me if there is any reason we don't use a public
> > API for it? I think we can add and use a public API if NECESSARY.
> 
> we

Re: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Tomasz Figa
Hi,

On Monday 13 of May 2013 20:24:01 Inki Dae wrote:
> > -Original Message-
> > From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
> > Sent: Monday, May 13, 2013 6:52 PM
> > To: Inki Dae
> > Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm-
> > ker...@lists.infradead.org; linux-media@vger.kernel.org;
> > 'linux-fbdev';
> > 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'
> > Subject: Re: Introduce a new helper framework for buffer
> > synchronization> 
> > Op 13-05-13 11:21, Inki Dae schreef:
> > >> -Original Message-
> > >> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
> > >> Sent: Monday, May 13, 2013 5:01 PM
> > >> To: Inki Dae
> > >> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm-
> > >> ker...@lists.infradead.org; linux-media@vger.kernel.org;
> > >> linux-fbdev;
> > >> Kyungmin Park; myungjoo.ham; YoungJun Cho
> > >> Subject: Re: Introduce a new helper framework for buffer
> > 
> > synchronization
> > 
> > >> Op 09-05-13 09:33, Inki Dae schreef:
> > >>> Hi all,
> > >>> 
> > >>> This post introduces a new helper framework based on dma fence. And the
> > >>> purpose of this post is to collect other opinions and advices before RFC
> > >>> posting.
> > >>> 
> > >>> First of all, this helper framework, called fence helper, is in progress
> > >>> yet so might not have enough comments in codes and also might need to be
> > >>> more cleaned up. Moreover, we might be missing some parts of the dma
> > >> fence.
> > >>> However, I'd like to say that all things mentioned below has been tested
> > >>> with Linux platform and worked well.
> > >>> 
> > >>> 
> > >>> And tutorial for user process.
> > >>> 
> > >>> just before cpu access
> > >>> 
> > >>> struct dma_buf_fence *df;
> > >>> 
> > >>> df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
> > >>> 
> > >>> ioctl(fd, DMA_BUF_GET_FENCE, &df);
> > >>> 
> > >>> after memset or memcpy
> > >>> 
> > >>> ioctl(fd, DMA_BUF_PUT_FENCE, &df);
> > >> 
> > >> NAK.
> > >> 
> > >> Userspace doesn't need to trigger fences. It can do a buffer idle wait,
> > >> and postpone submitting new commands until after it's done using the
> > >> buffer.
> > > 
> > > Hi Maarten,
> > > 
> > > It seems that you say user should wait for a buffer like KDS does: KDS
> > > uses select() to postpone submitting new commands. But I think this way
> > > assumes that all data flows from a DMA device to a CPU. For example, a
> > > CPU should keep polling for the completion of a buffer access by a DMA
> > > device. This means that this way isn't considered for data flow in the
> > > opposite case; CPU to DMA device.
> > 
> > Not really. You do both things the same way. You first wait for the bo
> > to be idle, this could be implemented by adding poll support to the
> > dma-buf fd.
> > Then you either do your read or write. Since userspace is supposed to
> > be the one controlling the bo it should stay idle at that point. If
> > you have another thread queueing the buffer again before your thread
> > is done that's a bug in the application, and can be solved with
> > userspace locking primitives. No need for the kernel to get involved.
> 
> Yes, that is how we have synchronized buffer between CPU and DMA device
> until now without buffer synchronization mechanism. I thought that it's
> best to make user not considering anything: user can acce

Re: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Rob Clark
On Mon, May 13, 2013 at 1:18 PM, Inki Dae  wrote:
>
>
> 2013/5/13 Rob Clark 
>>
>> On Mon, May 13, 2013 at 8:21 AM, Inki Dae  wrote:
>> >
>> >> In that case you still wouldn't give userspace control over the fences.
>> >> I
>> >> don't see any way that can end well.
>> >> What if userspace never signals? What if userspace gets killed by oom
>> >> killer. Who keeps track of that?
>> >>
>> >
>> > In all cases, all kernel resources to user fence will be released by
>> > kernel
>> > once the fence is timed out: never signaling and process killing by oom
>> > killer makes the fence timed out. And if we use mmap mechanism you
>> > mentioned
>> > before, I think user resource could also be freed properly.
>>
>>
>> I tend to agree w/ Maarten here.. there is no good reason for
>> userspace to be *signaling* fences.  The exception might be some blob
>> gpu drivers which don't have enough knowledge in the kernel to figure
>> out what to do.  (In which case you can add driver private ioctls for
>> that.. still not the right thing to do but at least you don't make a
>> public API out of it.)
>>
>
> Please do not care whether those are generic or not. Let's see the following
> three things. First, it's cache operation. As you know, ARM SoC has ACP
> (Accelerator Coherency Port) and can be connected to DMA engine or similar
> devices. And this port is used for cache coherency between CPU cache and DMA
> device. However, most devices on ARM based embedded systems don't use the
> ACP port. So they need proper cache operation before and after DMA or CPU
> access in case of using cacheable mapping. Actually, I see many Linux based
> platforms call cache control interfaces directly for that. I think the
> reason they do so is that the kernel isn't aware of when and how the CPU
> accessed memory.

I think we had kicked around the idea of giving dmabufs a
prepare/finish ioctl quite some time back.  This is probably something
that should be at least a bit decoupled from fences.  (Possibly
'prepare' waits for dma access to complete, but not the other way
around.)

And I did implement in omapdrm support for simulating coherency via
page fault-in / shoot-down..  It is one option that makes it
completely transparent to userspace, although there is some
performance cost, so I suppose it depends a bit on your use-case.

> And second, a user process has to do so many things in case of using shared
> memory with a DMA device. The user process should understand how the DMA
> device is operated and when the interfaces for controlling the DMA device are
> called. Such things would make a user application very complicated.
>
> And third, it's a performance optimization for multimedia and graphics
> devices. As I mentioned already, we should consider sequential processing for
> buffer sharing between CPU and DMA device. This means that the CPU should
> stay idle until the DMA device is done and vice versa.
>
> That is why I proposed such user interfaces. Of course, these interfaces
> might still be quite ugly: Maarten already pointed this out and I agree with
> him. But there must be another, better way. Don't you think we need a similar
> thing? With such interfaces, cache control and buffer synchronization can be
> performed at the kernel level. Moreover, a user application doesn't need to
> consider DMA device controlling anymore. Therefore, one thread can access a
> shared buffer and the other can control the DMA device with the shared buffer
> in parallel. We can really make the best use of CPU and DMA idle time. In
> other words, we can really make the best use of a multitasking OS, Linux.
>
> So could you please tell me if there is any reason we don't use a public API
> for it? I think we can add and use a public API if NECESSARY.

well, for cache management, I think it is a better idea.. I didn't
really catch that this was the motivation from the initial patch, but
maybe I read it too quickly.  But cache can be decoupled from
synchronization, because CPU access is not asynchronous.  For
userspace/CPU access to buffer, you should:

  1) wait for buffer
  2) prepare-access
  3)  ... do whatever cpu access to buffer ...
  4) finish-access
  5) submit buffer for new dma-operation

I suppose you could combine the syscall for #1 and #2.. not sure if
that is a good idea or not.  But you don't need to.  And there is
never really any need for userspace to signal a fence.
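
A userspace sketch of those five steps, assuming poll() support on the dma-buf
fd (as suggested elsewhere in this thread) and a prepare/finish ioctl pair;
DMABUF_PREPARE/DMABUF_FINISH are hypothetical placeholders, not an existing
uapi:

#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

#define DMABUF_PREPARE	_IO('b', 0x30)	/* hypothetical */
#define DMABUF_FINISH	_IO('b', 0x31)	/* hypothetical */

static void cpu_then_dma(int dmabuf_fd, void *map, size_t len)
{
	struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLOUT };

	poll(&pfd, 1, -1);			/* 1) wait for buffer */
	ioctl(dmabuf_fd, DMABUF_PREPARE, 0);	/* 2) prepare-access  */
	memset(map, 0xff, len);			/* 3) cpu access      */
	ioctl(dmabuf_fd, DMABUF_FINISH, 0);	/* 4) finish-access   */
	/*
	 * 5) submit the buffer for new dma via the driver's own submit
	 *    ioctl; the kernel, not userspace, attaches and signals the
	 *    fence there.
	 */
}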

BR,
-R

> Thanks,
> Inki Dae
>
>>
>> BR,
>> -R
>
>


Re: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Rob Clark
On Mon, May 13, 2013 at 8:21 AM, Inki Dae  wrote:
>
>> In that case you still wouldn't give userspace control over the fences. I
>> don't see any way that can end well.
>> What if userspace never signals? What if userspace gets killed by oom
>> killer. Who keeps track of that?
>>
>
> In all cases, all kernel resources to user fence will be released by kernel
> once the fence is timed out: never signaling and process killing by oom
> killer makes the fence timed out. And if we use mmap mechanism you mentioned
> before, I think user resource could also be freed properly.


I tend to agree w/ Maarten here.. there is no good reason for
userspace to be *signaling* fences.  The exception might be some blob
gpu drivers which don't have enough knowledge in the kernel to figure
out what to do.  (In which case you can add driver private ioctls for
that.. still not the right thing to do but at least you don't make a
public API out of it.)

BR,
-R


RE: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Inki Dae


> -Original Message-
> From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev-
> ow...@vger.kernel.org] On Behalf Of Maarten Lankhorst
> Sent: Monday, May 13, 2013 8:41 PM
> To: Inki Dae
> Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm-
> ker...@lists.infradead.org; linux-media@vger.kernel.org; 'linux-fbdev';
> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> Op 13-05-13 13:24, Inki Dae schreef:
> >> and can be solved with userspace locking primitives. No need for the
> >> kernel to get involved.
> >>
> > Yes, that is how we have synchronized buffers between CPU and DMA device
> > until now, without a buffer synchronization mechanism. I thought that it's
> > best that the user doesn't have to consider anything: the user can access a
> > buffer regardless of any DMA device controlling it, and the buffer
> > synchronization is performed at the kernel level. Moreover, I think we could
> > optimize graphics and multimedia hardware performance because the hardware
> > can do more work: one thread accesses a shared buffer and the other controls
> > the DMA device with the shared buffer in parallel. Thus, we could avoid
> > sequential processing, and that is my intention. Don't you think we could
> > improve hardware utilization this way or in some other way? Of course, there
> > could be a better way.
> >
> If you don't want to block the hardware the only option is to allocate a
> temp bo and blit to/from it using the hardware.
> OpenGL already does that when you want to read back, because otherwise the
> entire pipeline can get stalled.
> The overhead of command submission for that shouldn't be high, if it is
> you should really try to optimize that first
> before coming up with this crazy scheme.
> 

I have considered all devices sharing a buffer with the CPU: multimedia,
display controller and graphics devices (including the GPU). And we could
provide easy-to-use user interfaces for buffer synchronization. Of course, the
proposed user interfaces may still be quite ugly, but at least I think we need
something else for it.

> In that case you still wouldn't give userspace control over the fences. I
> don't see any way that can end well.
> What if userspace never signals? What if userspace gets killed by oom
> killer. Who keeps track of that?
> 

In all cases, all kernel resources attached to a user fence will be released
by the kernel once the fence times out: never signaling, or the process being
killed by the oom killer, makes the fence time out. And if we use the mmap
mechanism you mentioned before, I think the user resources could also be freed
properly.
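
A kernel-side sketch of that timeout fallback, with made-up names
(example_user_fence, example_fence_force_signal and friends); only the
delayed-work calls are existing kernel API:

#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

#define EXAMPLE_FENCE_TIMEOUT_MS	3000	/* arbitrary for this sketch */

struct example_user_fence {
	struct delayed_work timeout_work;
	/* ... the rest of the fence state would live here ... */
};

/* Hypothetical: wake up waiters and drop the kernel's references. */
static void example_fence_force_signal(struct example_user_fence *uf)
{
	/* ... signal the fence, release the buffer, put the refcounts ... */
}

static void example_fence_timeout(struct work_struct *work)
{
	struct example_user_fence *uf =
		container_of(to_delayed_work(work),
			     struct example_user_fence, timeout_work);

	/* Userspace never signalled (or the process died): clean up anyway. */
	example_fence_force_signal(uf);
}

static void example_fence_handed_to_user(struct example_user_fence *uf)
{
	INIT_DELAYED_WORK(&uf->timeout_work, example_fence_timeout);
	schedule_delayed_work(&uf->timeout_work,
			      msecs_to_jiffies(EXAMPLE_FENCE_TIMEOUT_MS));
}

static void example_fence_put_by_user(struct example_user_fence *uf)
{
	/* Normal path: userspace did call DMA_BUF_PUT_FENCE in time. */
	cancel_delayed_work_sync(&uf->timeout_work);
	example_fence_force_signal(uf);
}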

Thanks,
Inki Dae

> ~Maarten



Re: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Maarten Lankhorst
Op 13-05-13 13:24, Inki Dae schreef:
>> and can be solved with userspace locking primitives. No need for the
>> kernel to get involved.
>>
> Yes, that is how we have synchronized buffers between CPU and DMA device
> until now, without a buffer synchronization mechanism. I thought that it's
> best that the user doesn't have to consider anything: the user can access a
> buffer regardless of any DMA device controlling it, and the buffer
> synchronization is performed at the kernel level. Moreover, I think we could
> optimize graphics and multimedia hardware performance because the hardware
> can do more work: one thread accesses a shared buffer and the other controls
> the DMA device with the shared buffer in parallel. Thus, we could avoid
> sequential processing, and that is my intention. Don't you think we could
> improve hardware utilization this way or in some other way? Of course, there
> could be a better way.
>
If you don't want to block the hardware the only option is to allocate a temp 
bo and blit to/from it using the hardware.
OpenGL already does that when you want to read back, because otherwise the 
entire pipeline can get stalled.
The overhead of command submission for that shouldn't be high; if it is, you
should really try to optimize that first
before coming up with this crazy scheme.

In that case you still wouldn't give userspace control over the fences. I don't 
see any way that can end well.
What if userspace never signals? What if userspace gets killed by oom killer. 
Who keeps track of that?

~Maarten


RE: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Inki Dae


> -Original Message-
> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
> Sent: Monday, May 13, 2013 6:52 PM
> To: Inki Dae
> Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm-
> ker...@lists.infradead.org; linux-media@vger.kernel.org; 'linux-fbdev';
> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> Op 13-05-13 11:21, Inki Dae schreef:
> >
> >> -Original Message-
> >> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
> >> Sent: Monday, May 13, 2013 5:01 PM
> >> To: Inki Dae
> >> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm-
> >> ker...@lists.infradead.org; linux-media@vger.kernel.org; linux-fbdev;
> >> Kyungmin Park; myungjoo.ham; YoungJun Cho
> >> Subject: Re: Introduce a new helper framework for buffer
> synchronization
> >>
> >> Op 09-05-13 09:33, Inki Dae schreef:
> >>> Hi all,
> >>>
> >>> This post introduces a new helper framework based on dma fence. And the
> >>> purpose of this post is to collect other opinions and advices before RFC
> >>> posting.
> >>>
> >>> First of all, this helper framework, called fence helper, is in progress
> >>> yet so might not have enough comments in codes and also might need to be
> >>> more cleaned up. Moreover, we might be missing some parts of the dma
> >> fence.
> >>> However, I'd like to say that all things mentioned below has been tested
> >>> with Linux platform and worked well.
> >>> 
> >>>
> >>> And tutorial for user process.
> >>> just before cpu access
> >>> struct dma_buf_fence *df;
> >>>
> >>> df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
> >>> ioctl(fd, DMA_BUF_GET_FENCE, &df);
> >>>
> >>> after memset or memcpy
> >>> ioctl(fd, DMA_BUF_PUT_FENCE, &df);
> >> NAK.
> >>
> >> Userspace doesn't need to trigger fences. It can do a buffer idle wait,
> >> and postpone submitting new commands until after it's done using the
> >> buffer.
> > Hi Maarten,
> >
> > It seems that you say user should wait for a buffer like KDS does: KDS
> > uses select() to postpone submitting new commands. But I think this way
> > assumes that all data flows from a DMA device to a CPU. For example, a CPU
> > should keep polling for the completion of a buffer access by a DMA device.
> > This means that this way isn't considered for data flow in the opposite
> > case; CPU to DMA device.
> Not really. You do both things the same way. You first wait for the bo to
> be idle, this could be implemented by adding poll support to the dma-buf
> fd.
> Then you either do your read or write. Since userspace is supposed to be
> the one controlling the bo it should stay idle at that point. If you have
> another thread queueing the buffer again before your thread is done that's
> a bug in the application, and can be solved with userspace locking
> primitives. No need for the kernel to get involved.
> 

Yes, that is how we have synchronized buffers between the CPU and DMA devices
so far, without a buffer synchronization mechanism. My thought was that it is
best if the user does not have to consider any of this: the user can access a
buffer regardless of which DMA device is using it, and the buffer
synchronization is performed at the kernel level. Moreover, I think we could
get more out of graphics and multimedia hardware because the hardware can do
more work: one thread accesses a shared buffer while another controls a DMA
device using the same buffer, in parallel. That way we avoid purely
sequential processing, and that is my intention. Don't you think we could
improve hardware utilization this way, or in some other way? Of course, there
could be a better approach.
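
For illustration only, here is a rough userspace sketch of how the proposed
interface from the tutorial above would be used on the CPU side. The struct
layout, the ioctl numbers and the exact calling convention are not part of
this excerpt, so every definition below is an assumption, not the posted
code:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Placeholder layout; the real patch set would ship this in a uapi header. */
struct dma_buf_fence {
        unsigned int type;      /* DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE */
        void *ctx;              /* the field Maarten's exploit hint targets */
};

#define DMA_BUF_ACCESS_READ     0x1     /* assumed value */
#define DMA_BUF_ACCESS_WRITE    0x2     /* assumed value */
#define DMA_BUF_GET_FENCE       _IOW('F', 0, struct dma_buf_fence)     /* assumed */
#define DMA_BUF_PUT_FENCE       _IOW('F', 1, struct dma_buf_fence)     /* assumed */

static int cpu_write_with_proposed_fence(int dmabuf_fd, void *map, size_t size)
{
        struct dma_buf_fence df = { .type = DMA_BUF_ACCESS_WRITE };

        /* The kernel would block here until any DMA access to the buffer
         * has finished, so the calling thread never has to poll anything. */
        if (ioctl(dmabuf_fd, DMA_BUF_GET_FENCE, &df) < 0)
                return -1;

        memset(map, 0, size);   /* plain CPU access; another thread can keep
                                 * submitting DMA jobs on other buffers */

        /* Release the fence so queued DMA work on this buffer can proceed. */
        return ioctl(dmabuf_fd, DMA_BUF_PUT_FENCE, &df);
}

That is the usage model argued for above: the blocking happens in the kernel,
so the application code stays a plain memset or memcpy.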

> >> Kernel space doesn't need the root hole you created by dereferencing a
> >> pointer passed from userspace.
> >> Your next exercise should be to write a security exploit from the api
> >> you created here. It's the only way to learn how to write safe code.
> >> Hint: df.ctx = mmap(..);
> >>
> > Also, I'm not yet sure about using our approach, and that is why I
> > posted. As you mentioned, it seems that using mmap() is safer. But there
> > is one issue

Re: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Maarten Lankhorst
Op 13-05-13 11:21, Inki Dae schreef:
>
>> -Original Message-
>> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
>> Sent: Monday, May 13, 2013 5:01 PM
>> To: Inki Dae
>> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm-
>> ker...@lists.infradead.org; linux-media@vger.kernel.org; linux-fbdev;
>> Kyungmin Park; myungjoo.ham; YoungJun Cho
>> Subject: Re: Introduce a new helper framework for buffer synchronization
>>
>> Op 09-05-13 09:33, Inki Dae schreef:
>>> Hi all,
>>>
>>> This post introduces a new helper framework based on dma fence. And the
>>> purpose of this post is to collect other opinions and advices before RFC
>>> posting.
>>>
>>> First of all, this helper framework, called fence helper, is in progress
>>> yet so might not have enough comments in codes and also might need to be
>>> more cleaned up. Moreover, we might be missing some parts of the dma
>>> fence.
>>> However, I'd like to say that all things mentioned below has been tested
>>> with Linux platform and worked well.
>>> 
>>>
>>> And tutorial for user process.
>>> just before cpu access
>>> struct dma_buf_fence *df;
>>>
>>> df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
>>> ioctl(fd, DMA_BUF_GET_FENCE, &df);
>>>
>>> after memset or memcpy
>>> ioctl(fd, DMA_BUF_PUT_FENCE, &df);
>> NAK.
>>
>> Userspace doesn't need to trigger fences. It can do a buffer idle wait,
>> and postpone submitting new commands until after it's done using the
>> buffer.
> Hi Maarten,
>
> It seems that you are saying the user should wait for a buffer the way KDS
> does: KDS uses select() to postpone submitting new commands. But I think
> this approach assumes that all data flows from a DMA device to the CPU. For
> example, the CPU keeps polling for the completion of a buffer access by a
> DMA device. This means the approach does not cover data flow in the
> opposite direction: CPU to DMA device.
Not really. You do both things the same way. You first wait for the bo to be
idle; this could be implemented by adding poll support to the dma-buf fd.
Then you either do your read or write. Since userspace is supposed to be the
one controlling the bo, it should stay idle at that point. If you have
another thread queueing the buffer again before your thread is done, that is
a bug in the application, and can be solved with userspace locking
primitives. No need for the kernel to get involved.
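
To make that concrete, here is a minimal userspace sketch of the flow just
described. It assumes poll() support were added to the dma-buf fd, which is
only a suggestion at this point, and it assumes POLLOUT would mean "all
device access has finished, safe to write":

#include <poll.h>
#include <string.h>
#include <sys/mman.h>

static int cpu_fill_buffer(int dmabuf_fd, size_t size, unsigned char value)
{
        struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLOUT };
        void *map;

        /* Buffer idle wait: block until the device is done with the buffer. */
        if (poll(&pfd, 1, -1) <= 0)
                return -1;

        map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, dmabuf_fd, 0);
        if (map == MAP_FAILED)
                return -1;

        memset(map, value, size);       /* plain CPU access, no fence ioctls */
        munmap(map, size);

        /* Only after this point is new device work on the buffer submitted.
         * Two application threads racing on the same buffer coordinate with
         * ordinary userspace locking, not through the kernel. */
        return 0;
}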

>> Kernel space doesn't need the root hole you created by dereferencing a
>> pointer passed from userspace.
>> Your next exercise should be to write a security exploit from the api you
>> created here. It's the only way to learn how to write safe code. Hint:
>> df.ctx = mmap(..);
>>
> Also, I'm not yet sure about using our approach, and that is why I posted.
> As you mentioned, it seems that using mmap() is safer. But there is one
> issue that confuses me. Regarding your hint, df.ctx = mmap(..), the issue
> is that dmabuf mmap can be used to map a dmabuf into user space. And the
> dmabuf corresponds to a physical memory region allocated by some allocator
> such as drm gem or ion.
>
> I might be missing something, so could you please give me more comments?
>
My point was that userspace could change df.ctx to some mmap'd memory, forcing 
the kernel to execute some code prepared by userspace.
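
To spell out the exploit: the struct and handler below are hypothetical,
invented only to illustrate the pattern being criticized, not taken from the
posted patch. The problem is an ioctl argument that embeds a raw pointer
which the kernel then treats as one of its own objects:

#include <linux/fs.h>
#include <linux/uaccess.h>

struct fence_ctx {
        void (*signal)(struct fence_ctx *ctx);  /* kernel callback */
        int refcount;
};

struct dma_buf_fence_arg {
        unsigned int type;
        void *ctx;              /* raw pointer supplied by userspace */
};

static long dmabuf_fence_ioctl(struct file *filp, unsigned int cmd,
                               unsigned long arg)
{
        struct dma_buf_fence_arg df;
        struct fence_ctx *ctx;

        if (copy_from_user(&df, (void __user *)arg, sizeof(df)))
                return -EFAULT;

        /* BUG: df.ctx is attacker-controlled. Userspace can point it at
         * memory it prepared with mmap(), so everything below reads, writes
         * and finally jumps through data chosen by userspace. */
        ctx = df.ctx;
        ctx->refcount++;        /* kernel write through a user-chosen pointer */
        ctx->signal(ctx);       /* kernel calls a user-chosen function pointer */

        return 0;
}

With the page mapped by userspace and the function pointer filled in, this is
a straight privilege escalation, which is exactly what the df.ctx = mmap(..)
hint points at.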


RE: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Inki Dae


> -Original Message-
> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
> Sent: Monday, May 13, 2013 5:01 PM
> To: Inki Dae
> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm-
> ker...@lists.infradead.org; linux-media@vger.kernel.org; linux-fbdev;
> Kyungmin Park; myungjoo.ham; YoungJun Cho
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> Op 09-05-13 09:33, Inki Dae schreef:
> > Hi all,
> >
> > This post introduces a new helper framework based on dma fence. And the
> > purpose of this post is to collect other opinions and advices before RFC
> > posting.
> >
> > First of all, this helper framework, called fence helper, is in progress
> > yet so might not have enough comments in codes and also might need to be
> > more cleaned up. Moreover, we might be missing some parts of the dma
> > fence.
> > However, I'd like to say that all things mentioned below has been tested
> > with Linux platform and worked well.
> 
> > 
> >
> > And tutorial for user process.
> > just before cpu access
> > struct dma_buf_fence *df;
> >
> > df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
> > ioctl(fd, DMA_BUF_GET_FENCE, &df);
> >
> > after memset or memcpy
> > ioctl(fd, DMA_BUF_PUT_FENCE, &df);
> NAK.
> 
> Userspace doesn't need to trigger fences. It can do a buffer idle wait,
> and postpone submitting new commands until after it's done using the
> buffer.

Hi Maarten,

It seems that you are saying the user should wait for a buffer the way KDS
does: KDS uses select() to postpone submitting new commands. But I think this
approach assumes that all data flows from a DMA device to the CPU. For
example, the CPU keeps polling for the completion of a buffer access by a DMA
device. This means the approach does not cover data flow in the opposite
direction: CPU to DMA device.
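
For reference, a rough sketch of the DMA-to-CPU direction described above.
It assumes a KDS-style sync fd that select() reports readable once the DMA
device has finished writing the shared buffer; both the fd and its semantics
are assumptions made only for illustration:

#include <string.h>
#include <sys/select.h>

static int wait_then_read(int sync_fd, const void *map, void *out, size_t size)
{
        fd_set rfds;

        FD_ZERO(&rfds);
        FD_SET(sync_fd, &rfds);

        /* Postpone the CPU read until the device signals completion. */
        if (select(sync_fd + 1, &rfds, NULL, NULL, NULL) <= 0)
                return -1;

        memcpy(out, map, size); /* DMA -> CPU direction only */
        return 0;
}

The opposite direction, where the CPU writes first and a DMA device then
consumes the buffer, is the case this waiting style does not cover, as argued
above.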

> Kernel space doesn't need the root hole you created by dereferencing a
> pointer passed from userspace.
> Your next exercise should be to write a security exploit from the api you
> created here. It's the only way to learn how to write safe code. Hint:
> df.ctx = mmap(..);
> 

Also, I'm not yet sure about using our approach, and that is why I posted. As
you mentioned, it seems that using mmap() is safer. But there is one issue
that confuses me. Regarding your hint, df.ctx = mmap(..), the issue is that
dmabuf mmap can be used to map a dmabuf into user space. And the dmabuf
corresponds to a physical memory region allocated by some allocator such as
drm gem or ion.

I might be missing something, so could you please give me more comments?

Thanks,
Inki Dae



> ~Maarten



Re: Introduce a new helper framework for buffer synchronization

2013-05-13 Thread Maarten Lankhorst
Op 09-05-13 09:33, Inki Dae schreef:
> Hi all,
>
> This post introduces a new helper framework based on dma fence. And the
> purpose of this post is to collect other opinions and advices before RFC
> posting.
>
> First of all, this helper framework, called fence helper, is in progress
> yet so might not have enough comments in codes and also might need to be
> more cleaned up. Moreover, we might be missing some parts of the dma fence.
> However, I'd like to say that all things mentioned below has been tested
> with Linux platform and worked well.

> 
>
> And tutorial for user process.
> just before cpu access
> struct dma_buf_fence *df;
>
> df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
> ioctl(fd, DMA_BUF_GET_FENCE, &df);
>
> after memset or memcpy
> ioctl(fd, DMA_BUF_PUT_FENCE, &df);
NAK.

Userspace doesn't need to trigger fences. It can do a buffer idle wait, and 
postpone submitting new commands until after it's done using the buffer.
Kernel space doesn't need the root hole you created by dereferencing a
pointer passed from userspace.
Your next exercise should be to write a security exploit from the api you 
created here. It's the only way to learn how to write safe code. Hint: df.ctx = 
mmap(..);

~Maarten