Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-12 Thread Rong Qianfeng



在 2024/4/12 0:52, T.J. Mercier 写道:

On Thu, Apr 11, 2024 at 1:21 AM Rong Qianfeng <11065...@vivo.com> wrote:


在 2024/4/10 0:37, T.J. Mercier 写道:

[You don't often get email from tjmerc...@google.com. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:

在 2024/4/8 15:58, Christian König 写道:

Am 07.04.24 um 09:50 schrieb Rong Qianfeng:

[SNIP]

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.

You have not given an use case for this so it is a bit hard to
review. And from the existing use cases I don't see why this should
be necessary.

Even worse from the existing backend implementation I don't even see
how drivers should be able to fulfill this semantics.

Please explain further,
Christian.

Here is a practical case:
The user space can allocate a large chunk of dma-buf for
self-management, used as a shared memory pool.
Small dma-buf can be allocated from this shared memory pool and
released back to it after use, thus improving the speed of dma-buf
allocation and release.
Additionally, custom functionalities such as memory statistics and
boundary checking can be implemented in the user space.
Of course, the above-mentioned functionalities require the
implementation of a partial cache sync interface.

Well that is obvious, but where is the code doing that?

You can't send out code without an actual user of it. That will
obviously be rejected.

Regards,
Christian.

In fact, we have already used the user-level dma-buf memory pool in the
camera shooting scenario on the phone.

   From the test results, The execution time of the photo shooting
algorithm has been reduced from 3.8s to 3s.


For phones, the (out of tree) Android version of the system heap has a
page pool connected to a shrinker. That allows you to skip page
allocation without fully pinning the memory like you get when
allocating a dma-buf that's way larger than necessary. If it's for a
camera maybe you need physically contiguous memory, but it's also
possible to set that up.

https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377


Thank you for the reminder.

The page pool of the system heap can meet the needs of most scenarios,
but the camera shooting scenario is quite special.

Why the camera shooting scenario needs to have a dma-buf memory pool in
the user-level?

(1) The memory demand is extremely high and time requirements are
stringent.

Preallocating for this makes sense to me, whether it is individual
buffers or one large one.


(2) The memory in the page pool(system heap) is easily reclaimed or used
by other apps.

Yeah, by design I guess. I have seen an implementation where the page
pool is proactively refilled after it has been empty for some time
which would help in scenarios where the pool is often reclaimed and
low/empty. You could add that in a vendor heap.

Yeah, a similar feature has already been implemented by vendor.



(3) High concurrent allocation and release (with deferred-free) lead to
high memory usage peaks.

Hopefully this is not every frame? I saw enough complaints about the
deferred free of pool pages that it hasn't been carried forward since
Android 13, so this should be less of a problem on recent kernels.


Additionally, after the camera exits, the shared memory pool can be
released, with minimal impact.

Why do you care about the difference here? In one case it's when the
buffer refcount goes to 0 and memory is freed immediately, and in the
other it's the next time reclaim runs the shrinker.
I'm sorry, my description wasn't clear enough. What I meant to explain 
is that

the user-level dma-buf memory pool does not use reserved memory
(physically contiguous memory), and the memoryoccupation time is not too
long, resulting in a minimal impact on the system.



I don't want to add UAPI for DMA_BUF_IOCTL_SYNC_PARTIAL to Android
without it being in the upstream kernel. I don't think we can get that
without an in-kernel user of dma_buf_begin_cpu_access_partial first,
even though your use case wouldn't rely on that in-kernel usage. :\ So
if you want to add this to phones for your camera app, then I think
your best option is to add a vendor driver which implements this IOCTL
and calls the dma_buf_begin_cpu_access_partial functions which are
already exported.
Ok, thank you very much for your suggestion. I will definitely take it 
into consideration.


Best,
T.J.


Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-11 Thread T.J. Mercier
On Thu, Apr 11, 2024 at 1:21 AM Rong Qianfeng <11065...@vivo.com> wrote:
>
>
> 在 2024/4/10 0:37, T.J. Mercier 写道:
> > [You don't often get email from tjmerc...@google.com. Learn why this is 
> > important at https://aka.ms/LearnAboutSenderIdentification ]
> >
> > On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:
> >>
> >> 在 2024/4/8 15:58, Christian König 写道:
> >>> Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
>  [SNIP]
> > Am 13.11.21 um 07:22 schrieb Jianqun Xu:
> >> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
> >> offset and len.
> > You have not given an use case for this so it is a bit hard to
> > review. And from the existing use cases I don't see why this should
> > be necessary.
> >
> > Even worse from the existing backend implementation I don't even see
> > how drivers should be able to fulfill this semantics.
> >
> > Please explain further,
> > Christian.
>  Here is a practical case:
>  The user space can allocate a large chunk of dma-buf for
>  self-management, used as a shared memory pool.
>  Small dma-buf can be allocated from this shared memory pool and
>  released back to it after use, thus improving the speed of dma-buf
>  allocation and release.
>  Additionally, custom functionalities such as memory statistics and
>  boundary checking can be implemented in the user space.
>  Of course, the above-mentioned functionalities require the
>  implementation of a partial cache sync interface.
> >>> Well that is obvious, but where is the code doing that?
> >>>
> >>> You can't send out code without an actual user of it. That will
> >>> obviously be rejected.
> >>>
> >>> Regards,
> >>> Christian.
> >> In fact, we have already used the user-level dma-buf memory pool in the
> >> camera shooting scenario on the phone.
> >>
> >>   From the test results, The execution time of the photo shooting
> >> algorithm has been reduced from 3.8s to 3s.
> >>
> > For phones, the (out of tree) Android version of the system heap has a
> > page pool connected to a shrinker. That allows you to skip page
> > allocation without fully pinning the memory like you get when
> > allocating a dma-buf that's way larger than necessary. If it's for a
> > camera maybe you need physically contiguous memory, but it's also
> > possible to set that up.
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377
> >
> Thank you for the reminder.
>
> The page pool of the system heap can meet the needs of most scenarios,
> but the camera shooting scenario is quite special.
>
> Why the camera shooting scenario needs to have a dma-buf memory pool in
> the user-level?
>
> (1) The memory demand is extremely high and time requirements are
> stringent.

Preallocating for this makes sense to me, whether it is individual
buffers or one large one.

> (2) The memory in the page pool(system heap) is easily reclaimed or used
> by other apps.

Yeah, by design I guess. I have seen an implementation where the page
pool is proactively refilled after it has been empty for some time
which would help in scenarios where the pool is often reclaimed and
low/empty. You could add that in a vendor heap.

> (3) High concurrent allocation and release (with deferred-free) lead to
> high memory usage peaks.

Hopefully this is not every frame? I saw enough complaints about the
deferred free of pool pages that it hasn't been carried forward since
Android 13, so this should be less of a problem on recent kernels.

> Additionally, after the camera exits, the shared memory pool can be
> released, with minimal impact.

Why do you care about the difference here? In one case it's when the
buffer refcount goes to 0 and memory is freed immediately, and in the
other it's the next time reclaim runs the shrinker.


I don't want to add UAPI for DMA_BUF_IOCTL_SYNC_PARTIAL to Android
without it being in the upstream kernel. I don't think we can get that
without an in-kernel user of dma_buf_begin_cpu_access_partial first,
even though your use case wouldn't rely on that in-kernel usage. :\ So
if you want to add this to phones for your camera app, then I think
your best option is to add a vendor driver which implements this IOCTL
and calls the dma_buf_begin_cpu_access_partial functions which are
already exported.

Best,
T.J.


Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-11 Thread Christian König

Am 10.04.24 um 17:07 schrieb T.J. Mercier:

On Wed, Apr 10, 2024 at 7:22 AM Christian König
 wrote:

Am 09.04.24 um 18:37 schrieb T.J. Mercier:

On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:

在 2024/4/8 15:58, Christian König 写道:

Am 07.04.24 um 09:50 schrieb Rong Qianfeng:

[SNIP]

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.

You have not given an use case for this so it is a bit hard to
review. And from the existing use cases I don't see why this should
be necessary.

Even worse from the existing backend implementation I don't even see
how drivers should be able to fulfill this semantics.

Please explain further,
Christian.

Here is a practical case:
The user space can allocate a large chunk of dma-buf for
self-management, used as a shared memory pool.
Small dma-buf can be allocated from this shared memory pool and
released back to it after use, thus improving the speed of dma-buf
allocation and release.
Additionally, custom functionalities such as memory statistics and
boundary checking can be implemented in the user space.
Of course, the above-mentioned functionalities require the
implementation of a partial cache sync interface.

Well that is obvious, but where is the code doing that?

You can't send out code without an actual user of it. That will
obviously be rejected.

Regards,
Christian.

In fact, we have already used the user-level dma-buf memory pool in the
camera shooting scenario on the phone.

   From the test results, The execution time of the photo shooting
algorithm has been reduced from 3.8s to 3s.


For phones, the (out of tree) Android version of the system heap has a
page pool connected to a shrinker.

Well, it should be obvious but I'm going to repeat it here: Submitting
kernel patches for our of tree code is a rather *extreme* no-go.


Sorry I think my comment led to some confusion. I wasn't suggesting
you should take the patch; it's clearly incomplete. I was pointing out
another option to Rong. It appears Rong is creating a single oversized
dma-buf, and subdividing it in userspace to avoid multiple allocations
for multiple buffers. That would lead to a need for a partial sync of
the buffer from userspace. Instead of that, using a heap with a page
pool gets you kind of the same thing with a lot less headache in
userspace, and no need for partial sync. So I wanted to share that
option, and that can go in just Android if necessary without this
patch.


Ah! Thanks for the clarification and sorry for any noise I created.

I mean from the technical side the patch doesn't looks invalid or 
anything, but there is simply no upstream use case.


Regards,
Christian.




That in kernel code *must* have an in kernel user is a number one rule.

When somebody violates this rule he pretty much disqualifying himself as
reliable source of patches since maintainers then have to assume that
this person tries to submit code which doesn't have a justification to
be upstream.

So while this actually looks useful from the technical side as long as
nobody does an implementation based on an upstream driver I absolutely
have to reject it from the organizational side.

Regards,
Christian.


   That allows you to skip page
allocation without fully pinning the memory like you get when
allocating a dma-buf that's way larger than necessary. If it's for a
camera maybe you need physically contiguous memory, but it's also
possible to set that up.

https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377



To be honest, I didn't understand your concern "...how drivers should be
able to fulfill this semantics." Can you please help explain it in more
detail?

Thanks,

Rong Qianfeng.


Thanks
Rong Qianfeng.




Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-11 Thread Rong Qianfeng



在 2024/4/10 23:07, T.J. Mercier 写道:

[You don't often get email from tjmerc...@google.com. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

On Wed, Apr 10, 2024 at 7:22 AM Christian König
 wrote:

Am 09.04.24 um 18:37 schrieb T.J. Mercier:

On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:

在 2024/4/8 15:58, Christian König 写道:

Am 07.04.24 um 09:50 schrieb Rong Qianfeng:

[SNIP]

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.

You have not given an use case for this so it is a bit hard to
review. And from the existing use cases I don't see why this should
be necessary.

Even worse from the existing backend implementation I don't even see
how drivers should be able to fulfill this semantics.

Please explain further,
Christian.

Here is a practical case:
The user space can allocate a large chunk of dma-buf for
self-management, used as a shared memory pool.
Small dma-buf can be allocated from this shared memory pool and
released back to it after use, thus improving the speed of dma-buf
allocation and release.
Additionally, custom functionalities such as memory statistics and
boundary checking can be implemented in the user space.
Of course, the above-mentioned functionalities require the
implementation of a partial cache sync interface.

Well that is obvious, but where is the code doing that?

You can't send out code without an actual user of it. That will
obviously be rejected.

Regards,
Christian.

In fact, we have already used the user-level dma-buf memory pool in the
camera shooting scenario on the phone.

   From the test results, The execution time of the photo shooting
algorithm has been reduced from 3.8s to 3s.


For phones, the (out of tree) Android version of the system heap has a
page pool connected to a shrinker.

Well, it should be obvious but I'm going to repeat it here: Submitting
kernel patches for our of tree code is a rather *extreme* no-go.


Sorry I think my comment led to some confusion. I wasn't suggesting
you should take the patch; it's clearly incomplete. I was pointing out
another option to Rong. It appears Rong is creating a single oversized
dma-buf, and subdividing it in userspace to avoid multiple allocations
for multiple buffers. That would lead to a need for a partial sync of
the buffer from userspace. Instead of that, using a heap with a page
pool gets you kind of the same thing with a lot less headache in
userspace, and no need for partial sync. So I wanted to share that
option, and that can go in just Android if necessary without this
patch.


Hi T.J.

If there is a chance to use this patch on Android, I can explain to you 
in detail


why the user layer needs the dma-buf memory pool.

Thanks

Rong Qianfeng




That in kernel code *must* have an in kernel user is a number one rule.

When somebody violates this rule he pretty much disqualifying himself as
reliable source of patches since maintainers then have to assume that
this person tries to submit code which doesn't have a justification to
be upstream.

So while this actually looks useful from the technical side as long as
nobody does an implementation based on an upstream driver I absolutely
have to reject it from the organizational side.

Regards,
Christian.


   That allows you to skip page
allocation without fully pinning the memory like you get when
allocating a dma-buf that's way larger than necessary. If it's for a
camera maybe you need physically contiguous memory, but it's also
possible to set that up.

https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377



To be honest, I didn't understand your concern "...how drivers should be
able to fulfill this semantics." Can you please help explain it in more
detail?

Thanks,

Rong Qianfeng.


Thanks
Rong Qianfeng.


Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-11 Thread Rong Qianfeng



在 2024/4/10 0:37, T.J. Mercier 写道:

[You don't often get email from tjmerc...@google.com. Learn why this is 
important at https://aka.ms/LearnAboutSenderIdentification ]

On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:


在 2024/4/8 15:58, Christian König 写道:

Am 07.04.24 um 09:50 schrieb Rong Qianfeng:

[SNIP]

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.

You have not given an use case for this so it is a bit hard to
review. And from the existing use cases I don't see why this should
be necessary.

Even worse from the existing backend implementation I don't even see
how drivers should be able to fulfill this semantics.

Please explain further,
Christian.

Here is a practical case:
The user space can allocate a large chunk of dma-buf for
self-management, used as a shared memory pool.
Small dma-buf can be allocated from this shared memory pool and
released back to it after use, thus improving the speed of dma-buf
allocation and release.
Additionally, custom functionalities such as memory statistics and
boundary checking can be implemented in the user space.
Of course, the above-mentioned functionalities require the
implementation of a partial cache sync interface.

Well that is obvious, but where is the code doing that?

You can't send out code without an actual user of it. That will
obviously be rejected.

Regards,
Christian.

In fact, we have already used the user-level dma-buf memory pool in the
camera shooting scenario on the phone.

  From the test results, The execution time of the photo shooting
algorithm has been reduced from 3.8s to 3s.


For phones, the (out of tree) Android version of the system heap has a
page pool connected to a shrinker. That allows you to skip page
allocation without fully pinning the memory like you get when
allocating a dma-buf that's way larger than necessary. If it's for a
camera maybe you need physically contiguous memory, but it's also
possible to set that up.

https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377


Thank you for the reminder.

The page pool of the system heap can meet the needs of most scenarios, 
but the camera shooting scenario is quite special.


Why the camera shooting scenario needs to have a dma-buf memory pool in 
the user-level?


(1) The memory demand is extremely high and time requirements are 
stringent.


(2) The memory in the page pool(system heap) is easily reclaimed or used 
by other apps.


(3) High concurrent allocation and release (with deferred-free) lead to 
high memory usage peaks.



Additionally, after the camera exits, the shared memory pool can be 
released, with minimal impact.


Thanks

Rong Qianfeng

To be honest, I didn't understand your concern "...how drivers should be
able to fulfill this semantics." Can you please help explain it in more
detail?

Thanks,

Rong Qianfeng.


Thanks
Rong Qianfeng.


Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-10 Thread T.J. Mercier
On Wed, Apr 10, 2024 at 7:22 AM Christian König
 wrote:
>
> Am 09.04.24 um 18:37 schrieb T.J. Mercier:
> > On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:
> >>
> >> 在 2024/4/8 15:58, Christian König 写道:
> >>> Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
>  [SNIP]
> > Am 13.11.21 um 07:22 schrieb Jianqun Xu:
> >> Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
> >> offset and len.
> > You have not given an use case for this so it is a bit hard to
> > review. And from the existing use cases I don't see why this should
> > be necessary.
> >
> > Even worse from the existing backend implementation I don't even see
> > how drivers should be able to fulfill this semantics.
> >
> > Please explain further,
> > Christian.
>  Here is a practical case:
>  The user space can allocate a large chunk of dma-buf for
>  self-management, used as a shared memory pool.
>  Small dma-buf can be allocated from this shared memory pool and
>  released back to it after use, thus improving the speed of dma-buf
>  allocation and release.
>  Additionally, custom functionalities such as memory statistics and
>  boundary checking can be implemented in the user space.
>  Of course, the above-mentioned functionalities require the
>  implementation of a partial cache sync interface.
> >>> Well that is obvious, but where is the code doing that?
> >>>
> >>> You can't send out code without an actual user of it. That will
> >>> obviously be rejected.
> >>>
> >>> Regards,
> >>> Christian.
> >> In fact, we have already used the user-level dma-buf memory pool in the
> >> camera shooting scenario on the phone.
> >>
> >>   From the test results, The execution time of the photo shooting
> >> algorithm has been reduced from 3.8s to 3s.
> >>
> > For phones, the (out of tree) Android version of the system heap has a
> > page pool connected to a shrinker.
>
> Well, it should be obvious but I'm going to repeat it here: Submitting
> kernel patches for our of tree code is a rather *extreme* no-go.
>
Sorry I think my comment led to some confusion. I wasn't suggesting
you should take the patch; it's clearly incomplete. I was pointing out
another option to Rong. It appears Rong is creating a single oversized
dma-buf, and subdividing it in userspace to avoid multiple allocations
for multiple buffers. That would lead to a need for a partial sync of
the buffer from userspace. Instead of that, using a heap with a page
pool gets you kind of the same thing with a lot less headache in
userspace, and no need for partial sync. So I wanted to share that
option, and that can go in just Android if necessary without this
patch.

> That in kernel code *must* have an in kernel user is a number one rule.
>
> When somebody violates this rule he pretty much disqualifying himself as
> reliable source of patches since maintainers then have to assume that
> this person tries to submit code which doesn't have a justification to
> be upstream.
>
> So while this actually looks useful from the technical side as long as
> nobody does an implementation based on an upstream driver I absolutely
> have to reject it from the organizational side.
>
> Regards,
> Christian.
>
> >   That allows you to skip page
> > allocation without fully pinning the memory like you get when
> > allocating a dma-buf that's way larger than necessary. If it's for a
> > camera maybe you need physically contiguous memory, but it's also
> > possible to set that up.
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377
> >
> >
> >> To be honest, I didn't understand your concern "...how drivers should be
> >> able to fulfill this semantics." Can you please help explain it in more
> >> detail?
> >>
> >> Thanks,
> >>
> >> Rong Qianfeng.
> >>
>  Thanks
>  Rong Qianfeng.
>


Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-10 Thread Christian König

Am 09.04.24 um 18:37 schrieb T.J. Mercier:

On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:


在 2024/4/8 15:58, Christian König 写道:

Am 07.04.24 um 09:50 schrieb Rong Qianfeng:

[SNIP]

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.

You have not given an use case for this so it is a bit hard to
review. And from the existing use cases I don't see why this should
be necessary.

Even worse from the existing backend implementation I don't even see
how drivers should be able to fulfill this semantics.

Please explain further,
Christian.

Here is a practical case:
The user space can allocate a large chunk of dma-buf for
self-management, used as a shared memory pool.
Small dma-buf can be allocated from this shared memory pool and
released back to it after use, thus improving the speed of dma-buf
allocation and release.
Additionally, custom functionalities such as memory statistics and
boundary checking can be implemented in the user space.
Of course, the above-mentioned functionalities require the
implementation of a partial cache sync interface.

Well that is obvious, but where is the code doing that?

You can't send out code without an actual user of it. That will
obviously be rejected.

Regards,
Christian.

In fact, we have already used the user-level dma-buf memory pool in the
camera shooting scenario on the phone.

  From the test results, The execution time of the photo shooting
algorithm has been reduced from 3.8s to 3s.


For phones, the (out of tree) Android version of the system heap has a
page pool connected to a shrinker.


Well, it should be obvious but I'm going to repeat it here: Submitting 
kernel patches for our of tree code is a rather *extreme* no-go.


That in kernel code *must* have an in kernel user is a number one rule.

When somebody violates this rule he pretty much disqualifying himself as 
reliable source of patches since maintainers then have to assume that 
this person tries to submit code which doesn't have a justification to 
be upstream.


So while this actually looks useful from the technical side as long as 
nobody does an implementation based on an upstream driver I absolutely 
have to reject it from the organizational side.


Regards,
Christian.


  That allows you to skip page
allocation without fully pinning the memory like you get when
allocating a dma-buf that's way larger than necessary. If it's for a
camera maybe you need physically contiguous memory, but it's also
possible to set that up.

https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377



To be honest, I didn't understand your concern "...how drivers should be
able to fulfill this semantics." Can you please help explain it in more
detail?

Thanks,

Rong Qianfeng.


Thanks
Rong Qianfeng.




Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-10 Thread Rong Qianfeng



在 2024/4/9 23:34, Christian König 写道:

Am 09.04.24 um 09:32 schrieb Rong Qianfeng:


在 2024/4/8 15:58, Christian König 写道:

Am 07.04.24 um 09:50 schrieb Rong Qianfeng:

[SNIP]

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.


You have not given an use case for this so it is a bit hard to 
review. And from the existing use cases I don't see why this 
should be necessary.


Even worse from the existing backend implementation I don't even 
see how drivers should be able to fulfill this semantics.


Please explain further,
Christian.

Here is a practical case:
The user space can allocate a large chunk of dma-buf for 
self-management, used as a shared memory pool.
Small dma-buf can be allocated from this shared memory pool and 
released back to it after use, thus improving the speed of dma-buf 
allocation and release.
Additionally, custom functionalities such as memory statistics and 
boundary checking can be implemented in the user space.
Of course, the above-mentioned functionalities require the 
implementation of a partial cache sync interface.


Well that is obvious, but where is the code doing that?

You can't send out code without an actual user of it. That will 
obviously be rejected.


Regards,
Christian.


In fact, we have already used the user-level dma-buf memory pool in 
the camera shooting scenario on the phone.


From the test results, The execution time of the photo shooting 
algorithm has been reduced from 3.8s to 3s.


To be honest, I didn't understand your concern "...how drivers should 
be able to fulfill this semantics." Can you please help explain it in 
more detail?


Well you don't give any upstream driver code which actually uses this 
interface.


If you want to suggest some changes to the core Linux kernel your 
driver actually needs to be upstream.


As long as that isn't the case this approach here is a completely no-go.


Ok, I get it now, thanks!

Regards,

Rong Qianfeng.



Regards,
Christian.



Thanks,

Rong Qianfeng.





Thanks
Rong Qianfeng.






Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-09 Thread T.J. Mercier
On Tue, Apr 9, 2024 at 12:34 AM Rong Qianfeng <11065...@vivo.com> wrote:
>
>
> 在 2024/4/8 15:58, Christian König 写道:
> > Am 07.04.24 um 09:50 schrieb Rong Qianfeng:
> >> [SNIP]
> >>> Am 13.11.21 um 07:22 schrieb Jianqun Xu:
>  Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
>  offset and len.
> >>>
> >>> You have not given an use case for this so it is a bit hard to
> >>> review. And from the existing use cases I don't see why this should
> >>> be necessary.
> >>>
> >>> Even worse from the existing backend implementation I don't even see
> >>> how drivers should be able to fulfill this semantics.
> >>>
> >>> Please explain further,
> >>> Christian.
> >> Here is a practical case:
> >> The user space can allocate a large chunk of dma-buf for
> >> self-management, used as a shared memory pool.
> >> Small dma-buf can be allocated from this shared memory pool and
> >> released back to it after use, thus improving the speed of dma-buf
> >> allocation and release.
> >> Additionally, custom functionalities such as memory statistics and
> >> boundary checking can be implemented in the user space.
> >> Of course, the above-mentioned functionalities require the
> >> implementation of a partial cache sync interface.
> >
> > Well that is obvious, but where is the code doing that?
> >
> > You can't send out code without an actual user of it. That will
> > obviously be rejected.
> >
> > Regards,
> > Christian.
>
> In fact, we have already used the user-level dma-buf memory pool in the
> camera shooting scenario on the phone.
>
>  From the test results, The execution time of the photo shooting
> algorithm has been reduced from 3.8s to 3s.
>
For phones, the (out of tree) Android version of the system heap has a
page pool connected to a shrinker. That allows you to skip page
allocation without fully pinning the memory like you get when
allocating a dma-buf that's way larger than necessary. If it's for a
camera maybe you need physically contiguous memory, but it's also
possible to set that up.

https://android.googlesource.com/kernel/common/+/refs/heads/android14-6.1/drivers/dma-buf/heaps/system_heap.c#377


> To be honest, I didn't understand your concern "...how drivers should be
> able to fulfill this semantics." Can you please help explain it in more
> detail?
>
> Thanks,
>
> Rong Qianfeng.
>
> >
> >>
> >> Thanks
> >> Rong Qianfeng.
> >
>


Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-09 Thread Christian König

Am 09.04.24 um 09:32 schrieb Rong Qianfeng:


在 2024/4/8 15:58, Christian König 写道:

Am 07.04.24 um 09:50 schrieb Rong Qianfeng:

[SNIP]

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.


You have not given an use case for this so it is a bit hard to 
review. And from the existing use cases I don't see why this should 
be necessary.


Even worse from the existing backend implementation I don't even 
see how drivers should be able to fulfill this semantics.


Please explain further,
Christian.

Here is a practical case:
The user space can allocate a large chunk of dma-buf for 
self-management, used as a shared memory pool.
Small dma-buf can be allocated from this shared memory pool and 
released back to it after use, thus improving the speed of dma-buf 
allocation and release.
Additionally, custom functionalities such as memory statistics and 
boundary checking can be implemented in the user space.
Of course, the above-mentioned functionalities require the 
implementation of a partial cache sync interface.


Well that is obvious, but where is the code doing that?

You can't send out code without an actual user of it. That will 
obviously be rejected.


Regards,
Christian.


In fact, we have already used the user-level dma-buf memory pool in 
the camera shooting scenario on the phone.


From the test results, The execution time of the photo shooting 
algorithm has been reduced from 3.8s to 3s.


To be honest, I didn't understand your concern "...how drivers should 
be able to fulfill this semantics." Can you please help explain it in 
more detail?


Well you don't give any upstream driver code which actually uses this 
interface.


If you want to suggest some changes to the core Linux kernel your driver 
actually needs to be upstream.


As long as that isn't the case this approach here is a completely no-go.

Regards,
Christian.



Thanks,

Rong Qianfeng.





Thanks
Rong Qianfeng.






Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-09 Thread Rong Qianfeng



在 2024/4/8 15:58, Christian König 写道:

Am 07.04.24 um 09:50 schrieb Rong Qianfeng:

[SNIP]

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.


You have not given an use case for this so it is a bit hard to 
review. And from the existing use cases I don't see why this should 
be necessary.


Even worse from the existing backend implementation I don't even see 
how drivers should be able to fulfill this semantics.


Please explain further,
Christian.

Here is a practical case:
The user space can allocate a large chunk of dma-buf for 
self-management, used as a shared memory pool.
Small dma-buf can be allocated from this shared memory pool and 
released back to it after use, thus improving the speed of dma-buf 
allocation and release.
Additionally, custom functionalities such as memory statistics and 
boundary checking can be implemented in the user space.
Of course, the above-mentioned functionalities require the 
implementation of a partial cache sync interface.


Well that is obvious, but where is the code doing that?

You can't send out code without an actual user of it. That will 
obviously be rejected.


Regards,
Christian.


In fact, we have already used the user-level dma-buf memory pool in the 
camera shooting scenario on the phone.


From the test results, The execution time of the photo shooting 
algorithm has been reduced from 3.8s to 3s.


To be honest, I didn't understand your concern "...how drivers should be 
able to fulfill this semantics." Can you please help explain it in more 
detail?


Thanks,

Rong Qianfeng.





Thanks
Rong Qianfeng.




Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-08 Thread Christian König

Am 07.04.24 um 09:50 schrieb Rong Qianfeng:

[SNIP]

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.


You have not given an use case for this so it is a bit hard to 
review. And from the existing use cases I don't see why this should 
be necessary.


Even worse from the existing backend implementation I don't even see 
how drivers should be able to fulfill this semantics.


Please explain further,
Christian.

Here is a practical case:
The user space can allocate a large chunk of dma-buf for 
self-management, used as a shared memory pool.
Small dma-buf can be allocated from this shared memory pool and 
released back to it after use, thus improving the speed of dma-buf 
allocation and release.
Additionally, custom functionalities such as memory statistics and 
boundary checking can be implemented in the user space.
Of course, the above-mentioned functionalities require the 
implementation of a partial cache sync interface.


Well that is obvious, but where is the code doing that?

You can't send out code without an actual user of it. That will 
obviously be rejected.


Regards,
Christian.



Thanks
Rong Qianfeng.




Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2024-04-08 Thread Rong Qianfeng
ha256; c=relaxed/relaxed; 
d=microsoft.com; s=arcselector9901;
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; 


bh=JkRdepoBhhvZgBl9zd4oG1PeOv+kPWSxKw1ORztFSv4=;
b=hYGlnailNdl4CzgRcFMK/ZTAFR1Ipev9djYIyqc4j32m6sMsqyx1YxjcAPDo6Rnk3qtB+9sMag1rldkddxzJCBGDOLkgX7hQFl7AFRIQhpXLQsUek5IrCfbd0rGW4HpdbUxyGyRz70XnjFPSTGQFhlz7glYKDPeyHN/X3RtVBfrUG1ywFsVzjD6+I8Uc0O9jbG6Rw5S1/dydWeyovBOPcbUVYt1ZF0z7JDt4Tj90hAElD4cTmc/rfAt3eY3pB6pydnGu+nXJ5bKZIY/U7Wec+TwdXPNGU+ia5E4MN6xuL+kVLPNPa1G6h8qetsVTw5UYOkGtPZCxcR6pUKs0ocJTZg== 


ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; 
dkim=pass

header.d=amd.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; 
s=selector1; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;

bh=JkRdepoBhhvZgBl9zd4oG1PeOv+kPWSxKw1ORztFSv4=;
b=4zNaX2CfJHDxjaHlT7C84/jyXuQUhJVYoDhuLfWp/MO0v73QSPzoDc9EiD6G3taNwfNkwRBOMm5VIiF2wOMmVeJmV2JabQp0VPjTYkXDZL9xbtuoXkdCtQFx1+Fz1GJhOnpgaIMZrSQ+DugqAkbmKW5Jp6o8P3GT/ZDzIFBk4xk= 


Authentication-Results: dkim=none (message not signed)
header.d=none;dmarc=none action=none header.from=amd.com;
Received: from MWHPR1201MB0192.namprd12.prod.outlook.com
(2603:10b6:301:5a::14) by MWHPR12MB1837.namprd12.prod.outlook.com
(2603:10b6:300:113::20) with Microsoft SMTP Server (version=TLS1_2,
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4690.27; Mon, 
15 Nov

2021 11:42:20 +
Received: from MWHPR1201MB0192.namprd12.prod.outlook.com
([fe80::2d02:26e7:a2d0:3769]) by 
MWHPR1201MB0192.namprd12.prod.outlook.com
([fe80::2d02:26e7:a2d0:3769%5]) with mapi id 15.20.4690.027; Mon, 15 
Nov 2021

11:42:20 +0000
Subject: Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support
To: Jianqun Xu , sumit.sem...@linaro.org
References: <202306.3743909-1-jay...@rock-chips.com>
From: =?UTF-8?Q?Christian_K=c3=b6nig?= 
Message-ID: <1da5cdf0-ccb8-3740-cf96-794c4d5b2...@amd.com>
Date: Mon, 15 Nov 2021 12:42:13 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
Thunderbird/78.13.0
In-Reply-To: <202306.3743909-1-jay...@rock-chips.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-Language: en-US
X-ClientProxiedBy: AS9PR0301CA0041.eurprd03.prod.outlook.com
(2603:10a6:20b:469::32) To MWHPR1201MB0192.namprd12.prod.outlook.com
(2603:10b6:301:5a::14)
MIME-Version: 1.0
Received: from [IPv6:2a02:908:1252:fb60:bf0c:d52c:6ba0:cfc6]
(2a02:908:1252:fb60:bf0c:d52c:6ba0:cfc6) by
AS9PR0301CA0041.eurprd03.prod.outlook.com (2603:10a6:20b:469::32) with
Microsoft SMTP Server (version=TLS1_2,
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4690.26 via 
Frontend

Transport; Mon, 15 Nov 2021 11:42:17 +
X-MS-PublicTrafficType: Email
X-MS-Office365-Filtering-Correlation-Id: 
00ebe1d3-562c-44d3-ab7c-08d9a82cfb38

X-MS-TrafficTypeDiagnostic: MWHPR12MB1837:
X-Microsoft-Antispam-PRVS: 


X-MS-Oob-TLC-OOBClassifiers: OLM:8273;
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info: 
qnNpT+UDEdrvmTrphgUzQsrIExW/nJQjCEAt6/leQnM/+F75uQ4P/gIEmE2mfi+FLGZoBp+qpesYv6TE414JsgHBjmPsq9wqAxODHs5+tKntVesYVzi2T3a+bor5SPTdHrjOyz4Lv5il0Z00hyIMOsC898lxdXNK3DY8ClRa/X+z05ZLWWI9kbXDjVdrVqmD31Ciy9En6YG1TKIV+epuDLGRKEvYe8NhgoFs6tUkQ/bWmTBdRJgllNrqms9k2nXdSN5hRpvEjPb3R0jF3kat4c9/g+R9ZfNDU0z3Qo2VAfydWQzqA1BIV1A7EDnRTXmW5vnAV79Migw7l8P0CqzM1nBlO5bCjKtHXPj4OXseQUwQWFO5216Sj4yR6FeIQFVrAO7lW3pd3S4bncIRU17nSaQPkQnnNSdXm0OBFoDdrVzhxYO5g7CoHdrAh0S0Y4Q8vQFqy36ujVGByHPPFfX+aaKXQ/BWnlM6tghXuVYUcoYtqlV4AJpRnByfYBQrumA1ouTLedvwpUsQlCItufU36509QJHuQ9oa3NDqXos2SPnUS/F/6HsBJHw63Rq1Jcd0WGmqDrp9wFQaK959C6zotCP8LC+p2pB0gQYgRieAeifspuQuKmSFk46/7OZfmlwmv8i1rhIK65/inlDat2eDtJBEpqk15UW3bMvw1g5dv5Kr9RmOvFfjfRv5uYDkmxM4OcYG/KHnhd59tAKjoVDXcLgv4Z/rf5i8pgCG0D7SWcZI1XNcdbglDuWiigm3ihyx
X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:en; 
SCL:1; SRV:;
IPV:NLI; SFV:NSPM; H:MWHPR1201MB0192.namprd12.prod.outlook.com; PTR:; 
CAT:NONE;
SFS:(4636009)(366004)(316002)(7416002)(2616005)(6486002)(8936002)(8676002)(186003)(31686004)(66556008)(66476007)(86362001)(66946007)(8338041)(31696002)(004)(50861)(3810072)(566032)(2906002)(4326008)(36756003)(4374052)(4598051); 


DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1
X-MS-Exchange-AntiSpam-MessageData-0: 
=?utf-8?B?VmtQa3NIa1NhVm40N2lmZnBueGVQS3ZuTWpaWm4waU1IQ3kvYlFHSGI1ZllH?=

=?utf-8?B?ek13MElkY3lWUlVuVWxJUkZPR0tlQ01FV0lCUnJrUWRSejg0WjZJWW5CTFdS?=
=?utf-8?B?TDVMZTVHNitXaW9taUpweTFaL2pJS1NZbGJacnlDWVNyMFl3MjlkdGd3YTEr?=
=?utf-8?B?MmRodnF3dnlzd0NKZEtZN1JaVFA2OTBJRGQ1WjZBUHROK0dhMlhMWWZkaUd3?=
=?utf-8?B?RFlQd3NmNGRWZFZydFhXSVJyRmxQMEZDWlhxU3hMcDYzNUtJRGtHVzZtMmIx?=
=?utf-8?B?NU5zT0VvNmI1MUE1WTRmQWRUZkM4aX

Re: [PATCH] dma-buf: add DMA_BUF_IOCTL_SYNC_PARTIAL support

2021-11-15 Thread Christian König

Am 13.11.21 um 07:22 schrieb Jianqun Xu:

Add DMA_BUF_IOCTL_SYNC_PARTIAL support for user to sync dma-buf with
offset and len.


You have not given an use case for this so it is a bit hard to review. 
And from the existing use cases I don't see why this should be necessary.


Even worse from the existing backend implementation I don't even see how 
drivers should be able to fulfill this semantics.


Please explain further,
Christian.



Change-Id: I03d2d2e10e48d32aa83c31abade57e0931e1be49
Signed-off-by: Jianqun Xu 
---
  drivers/dma-buf/dma-buf.c| 42 
  include/uapi/linux/dma-buf.h |  8 +++
  2 files changed, 50 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d9948d58b3f4..78f37f7c3462 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -392,6 +392,7 @@ static long dma_buf_ioctl(struct file *file,
  {
struct dma_buf *dmabuf;
struct dma_buf_sync sync;
+   struct dma_buf_sync_partial sync_p;
enum dma_data_direction direction;
int ret;
  
@@ -430,6 +431,47 @@ static long dma_buf_ioctl(struct file *file,

case DMA_BUF_SET_NAME_B:
return dma_buf_set_name(dmabuf, (const char __user *)arg);
  
+	case DMA_BUF_IOCTL_SYNC_PARTIAL:

+   if (copy_from_user(_p, (void __user *) arg, 
sizeof(sync_p)))
+   return -EFAULT;
+
+   if (sync_p.len == 0)
+   return 0;
+
+   if ((sync_p.offset % cache_line_size()) || (sync_p.len % 
cache_line_size()))
+   return -EINVAL;
+
+   if (sync_p.len > dmabuf->size || sync_p.offset > dmabuf->size - 
sync_p.len)
+   return -EINVAL;
+
+   if (sync_p.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+   return -EINVAL;
+
+   switch (sync_p.flags & DMA_BUF_SYNC_RW) {
+   case DMA_BUF_SYNC_READ:
+   direction = DMA_FROM_DEVICE;
+   break;
+   case DMA_BUF_SYNC_WRITE:
+   direction = DMA_TO_DEVICE;
+   break;
+   case DMA_BUF_SYNC_RW:
+   direction = DMA_BIDIRECTIONAL;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   if (sync_p.flags & DMA_BUF_SYNC_END)
+   ret = dma_buf_end_cpu_access_partial(dmabuf, direction,
+sync_p.offset,
+sync_p.len);
+   else
+   ret = dma_buf_begin_cpu_access_partial(dmabuf, 
direction,
+  sync_p.offset,
+  sync_p.len);
+
+   return ret;
+
default:
return -ENOTTY;
}
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
index 7f30393b92c3..6236c644624d 100644
--- a/include/uapi/linux/dma-buf.h
+++ b/include/uapi/linux/dma-buf.h
@@ -47,4 +47,12 @@ struct dma_buf_sync {
  #define DMA_BUF_SET_NAME_A_IOW(DMA_BUF_BASE, 1, u32)
  #define DMA_BUF_SET_NAME_B_IOW(DMA_BUF_BASE, 1, u64)
  
+struct dma_buf_sync_partial {

+   __u64 flags;
+   __u32 offset;
+   __u32 len;
+};
+
+#define DMA_BUF_IOCTL_SYNC_PARTIAL _IOW(DMA_BUF_BASE, 2, struct 
dma_buf_sync_partial)
+
  #endif