Re: [PATCH v4 08/35] iommu/mediatek: Use kmalloc for protect buffer

2022-02-15 Thread Tomasz Figa
On Wed, Feb 16, 2022 at 2:55 PM Yong Wu  wrote:
>
> On Thu, 2022-01-27 at 12:08 +0100, AngeloGioacchino Del Regno wrote:
> > Il 25/01/22 09:56, Yong Wu ha scritto:
> > > No need to zero the protect buffer: it is only accessed by the IOMMU HW
> > > when a translation fault happens.
> > >
> > > Signed-off-by: Yong Wu 
> >
> > I would rather keep this a devm_kzalloc instead... the cost is very
> > minimal, and this will be handy when new hardware is introduced, as it
> > may require a bigger buffer: in that case, "older" platforms will use
> > only part of it and we may get garbage data at the end.
>
> Currently this is to avoid zeroing 512 bytes on all the platforms.
>
> Sorry, I don't understand why it would be unnecessary when new hardware
> requires a bigger buffer. If the buffer becomes bigger, then clearing it
> to 0 costs even more, so this patch is even more helpful, isn't it?
>
> The content of this buffer is garbage; we don't care about or analyse
> it.

I think we should zero it for security reasons regardless of any other
aspects. With this patch it's leaking kernel data to the hardware.

At the same time, we're talking here about something executed just once,
when the driver probes. I don't think the cost really matters.
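
(Purely for illustration, a minimal sketch of the zeroed allocation being
argued for here; the define value is assumed, not taken from the patch:)

#include <linux/device.h>
#include <linux/slab.h>

#define MTK_PROTECT_PA_ALIGN	256	/* assumed size, for illustration only */

/*
 * Zeroing the protect buffer costs one memset of a few hundred bytes,
 * once, at probe time; in exchange no stale kernel data is ever handed
 * to the IOMMU hardware.
 */
static void *mtk_iommu_alloc_protect_buf(struct device *dev)
{
	return devm_kzalloc(dev, MTK_PROTECT_PA_ALIGN * 2, GFP_KERNEL);
}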

Best regards,
Tomasz


Re: [PATCH] CHROMIUM: iommu: rockchip: Make sure that page table state is coherent

2022-01-23 Thread Tomasz Figa
Hi Dafna,

On Fri, Dec 10, 2021 at 12:18 AM Dafna Hirschfeld wrote:
>
>
>
> On 23.03.15 10:38, Tomasz Figa wrote:
> > Sorry, I had to dig my way out through my backlog.
> >
> > On Tue, Mar 3, 2015 at 10:36 PM, Joerg Roedel  wrote:
> >> On Mon, Feb 09, 2015 at 08:19:21PM +0900, Tomasz Figa wrote:
> >>> Even though the code uses the dt_lock spin lock to serialize mapping
> >>> operations from different threads, it does not protect against IOMMU
> >>> accesses that might already be taking place and thus altering the state
> >>> of the IOTLB. This means that the current mapping code, which first zaps
> >>> the page table and only then updates it with the new mapping, is prone
> >>> to the mentioned race.
> >>
> >> Could you elaborate a bit on the race and why it is sufficient to zap
> >> only the first and the last iova? From the description and the comments
> >> in the patch this is not clear to me.
> >
> > Let's start with why it's sufficient to zap only first and last iova.
> >
> > While unmapping, the driver zaps all iovas belonging to the mapping,
> > so the page tables not used by any mapping won't be cached. Now when
> > the driver creates a mapping it might end up occupying several page
> > tables. However, since the mapping area is virtually contiguous, only
> > the first and last page table can be shared with different mappings.
> > This means that only first and last iovas can be already cached. In
> > fact, we could detect if first and last page tables are shared and do
> > not zap at all, but this wouldn't really optimize too much. Why
> > invalidating one iova is enough to invalidate the whole page table is
> > unclear to me as well, but it seems to be the correct way on this
> > hardware.
>
> Hi,
> It seems to me that each mapping actually needs exactly one page table,
> since (as the inline doc in rk_iommu_map states) the pgsize_bitmap
> makes sure that an iova mapping fits exactly into one page table,
> the mapping size being at most 4M.
>
> This actually means that if rk_dte_get_page_table does not allocate a
> new page table but returns one that is already partially used by previous
> mappings, then two page tables might be required, but I think the iova
> allocation somehow makes sure that this will not be the case.

Yes, that was exactly the case. Note that the zap operation is per IO
page and not per IOPT (IO page table), and there is some prefetching
going on in the TLB of this IOMMU, so neighboring mappings can interfere
with each other.
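
(For illustration only, zapping just the two boundary IOVAs boils down to
something like the sketch below; it assumes the driver-internal helpers
rk_iommu_zap_iova() and SPAGE_SIZE and may not match the patch verbatim:)

/*
 * Only the first and the last page table of a virtually contiguous
 * mapping can be shared with pre-existing mappings, so only those two
 * can already be cached in the IOTLB; zapping one IOVA per shared page
 * table is enough to drop it from the TLB on this hardware.
 */
static void rk_iommu_zap_iova_first_last(struct rk_iommu_domain *rk_domain,
					 dma_addr_t iova, size_t size)
{
	rk_iommu_zap_iova(rk_domain, iova, SPAGE_SIZE);
	if (size > SPAGE_SIZE)
		rk_iommu_zap_iova(rk_domain, iova + size - SPAGE_SIZE,
				  SPAGE_SIZE);
}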

>
> If it were the case, then the code would be buggy, because it means
> that the loop in rk_iommu_map_iova would write beyond the page table
> returned by rk_dte_get_page_table (which we didn't allocate).

Sorry, I don't see how it could write beyond the page table. Could you
give me an example?

>
> So it seems to me that the call 'rk_iommu_zap_iova(rk_domain, iova,
> SPAGE_SIZE);' as done before this patch should be kept, but moved from
> rk_dte_get_page_table to where rk_iommu_zap_iova_first_last is now.
>
> Thanks,
> Dafna
>
> >
> > As for the race, it's also kind of explained by the above. The already
> > running hardware can trigger page table look-ups in the IOMMU and so
> > caching of the page table between our zapping and updating its
> > contents. With this patch zapping is performed after updating the page
> > table so the race is gone.
> >
> > Best regards,
> > Tomasz
> >

Re: [PATCH 0/3] Allow restricted-dma-pool to customize IO_TLB_SEGSIZE

2021-11-24 Thread Tomasz Figa
Hi Robin,

On Tue, Nov 23, 2021 at 8:59 PM Robin Murphy  wrote:
>
> On 2021-11-23 11:21, Hsin-Yi Wang wrote:
> > Default IO_TLB_SEGSIZE (128) slabs may not be enough for some use cases.
> > This series adds support to customize io_tlb_segsize for each
> > restricted-dma-pool.
> >
> > Example use case:
> >
> > mtk-isp drivers[1] are controlled by mtk-scp[2] and allocate memory through
> > mtk-scp. In order to use the noncontiguous DMA API[3], we need to use
> > the swiotlb pool. mtk-scp needs to allocate memory with 2560 slabs.
> > mtk-isp drivers also need to allocate memory with 200+ slabs. Both are
> > larger than the default IO_TLB_SEGSIZE (128) slabs.
>
> Are drivers really doing streaming DMA mappings that large? If so, that
> seems like it might be worth trying to address in its own right for the
> sake of efficiency - allocating ~5MB of memory twice and copying it back
> and forth doesn't sound like the ideal thing to do.
>
> If it's really about coherent DMA buffer allocation, I thought the plan
> was that devices which expect to use a significant amount and/or size of
> coherent buffers would continue to use a shared-dma-pool for that? It's
> still what the binding implies. My understanding was that
> swiotlb_alloc() is mostly just a fallback for the sake of drivers which
> mostly do streaming DMA but may allocate a handful of pages worth of
> coherent buffers here and there. Certainly looking at the mtk_scp
> driver, that seems like it shouldn't be going anywhere near SWIOTLB at all.

First, thanks a lot for taking a look at this patch series.

The drivers would do streaming DMA within a reserved region that is
the only memory accessible to them for security reasons. This seems to
exactly match the definition of the restricted pool as merged
recently.

The new dma_alloc_noncontiguous() API would allow allocating suitable
memory directly from the pool, which would eliminate the need to copy.
However, for a restricted pool, this would exercise the SWIOTLB
allocator, which currently suffers from the limitation described by
Hsin-Yi. Since the allocator is general purpose and already used for
coherent allocations in the current restricted pool implementation, I
think it indeed makes sense to lift the limitation rather than come up
with yet another allocator.

Best regards,
Tomasz

>
> Robin.
>
> > [1] (not in upstream) 
> > https://patchwork.kernel.org/project/linux-media/cover/20190611035344.29814-1-jungo@mediatek.com/
> > [2] 
> > https://elixir.bootlin.com/linux/latest/source/drivers/remoteproc/mtk_scp.c
> > [3] 
> > https://patchwork.kernel.org/project/linux-media/cover/20210909112430.61243-1-senozhat...@chromium.org/
> >
> > Hsin-Yi Wang (3):
> >dma: swiotlb: Allow restricted-dma-pool to customize IO_TLB_SEGSIZE
> >dt-bindings: Add io-tlb-segsize property for restricted-dma-pool
> >arm64: dts: mt8183: use restricted swiotlb for scp mem
> >
> >   .../reserved-memory/shared-dma-pool.yaml  |  8 +
> >   .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi |  4 +--
> >   include/linux/swiotlb.h   |  1 +
> >   kernel/dma/swiotlb.c  | 34 ++-
> >   4 files changed, 37 insertions(+), 10 deletions(-)
> >


Re: [PATCH 6/7] dma-iommu: implement ->alloc_noncontiguous

2021-02-16 Thread Tomasz Figa
Hi Christoph


On Tue, Feb 2, 2021 at 6:51 PM Christoph Hellwig  wrote:
>
> Implement support for allocating a non-contiguous DMA region.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/dma-iommu.c | 35 +++
>  1 file changed, 35 insertions(+)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 85cb004d7a44c6..4e0b170d38d57a 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -718,6 +718,7 @@ static struct page 
> **__iommu_dma_alloc_noncontiguous(struct device *dev,
> goto out_free_sg;
>
> sgt->sgl->dma_address = iova;
> +   sgt->sgl->dma_length = size;
> return pages;
>
>  out_free_sg:
> @@ -755,6 +756,36 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> return NULL;
>  }
>
> +#ifdef CONFIG_DMA_REMAP
> +static struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev,
> +   size_t size, enum dma_data_direction dir, gfp_t gfp)
> +{
> +   struct dma_sgt_handle *sh;
> +
> +   sh = kmalloc(sizeof(*sh), gfp);
> +   if (!sh)
> +   return NULL;
> +
> +   sh->pages = __iommu_dma_alloc_noncontiguous(dev, size, &sh->sgt, gfp,
> +   PAGE_KERNEL, 0);

When working on the videobuf2 integration with Sergey I noticed that
we always pass 0 as DMA attrs here, which removes the ability for
drivers to use DMA_ATTR_ALLOC_SINGLE_PAGES.

It's quite important from a system stability point of view, because by
default the iommu_dma allocator would prefer big order allocations for
TLB locality reasons. For many devices, though, it doesn't really
affect the performance, because of random access patterns, so single
pages are good enough and reduce the risk of allocation failures or
latency due to fragmentation.

Do you think we could add the attrs parameter to the
dma_alloc_noncontiguous() API?
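
(To make the request concrete, a sketch of the kind of caller we have in
mind, assuming the signature simply grows an attrs argument; this is a
proposal, not the API as posted in this series:)

#include <linux/dma-mapping.h>

/*
 * Hypothetical videobuf2-style caller once dma_alloc_noncontiguous()
 * takes attrs: request order-0 pages because the device accesses the
 * buffer randomly anyway, and higher-order allocations only add
 * failure/latency risk under fragmentation.
 */
static struct sg_table *alloc_capture_buffer(struct device *dev, size_t size)
{
	return dma_alloc_noncontiguous(dev, size, DMA_FROM_DEVICE, GFP_KERNEL,
				       DMA_ATTR_ALLOC_SINGLE_PAGES);
}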

Best regards,
Tomasz


Re: add a new dma_alloc_noncontiguous API v2

2021-02-08 Thread Tomasz Figa
Hi Christoph,

On Mon, Feb 8, 2021 at 3:49 AM Christoph Hellwig  wrote:
>
> Any comments?
>

Sorry for the delay. The whole series looks very good to me. Thanks a lot.

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz

> On Tue, Feb 02, 2021 at 10:51:03AM +0100, Christoph Hellwig wrote:
> > Hi all,
> >
> > this series adds the new noncontiguous DMA allocation API requested by
> > various media driver maintainers.
> >
> > Changes since v1:
> >  - document that flush_kernel_vmap_range and invalidate_kernel_vmap_range
> >must be called once an allocation is mapped into KVA
> >  - add dma-debug support
> >  - remove the separate dma_handle argument, and instead create fully formed
> >DMA mapped scatterlists
> >  - use a directional allocation in uvcvideo
> >  - call invalidate_kernel_vmap_range from uvcvideo
> ---end quoted text---


Re: [PATCH v5 06/27] dt-bindings: mediatek: Add binding for mt8192 IOMMU

2021-01-29 Thread Tomasz Figa
On Mon, Jan 25, 2021 at 4:34 PM Yong Wu  wrote:
>
> On Mon, 2021-01-25 at 13:18 +0900, Tomasz Figa wrote:
> > On Wed, Jan 20, 2021 at 4:08 PM Yong Wu  wrote:
> > >
> > > On Wed, 2021-01-20 at 13:15 +0900, Tomasz Figa wrote:
> > > > On Wed, Jan 13, 2021 at 3:45 PM Yong Wu  wrote:
> > > > >
> > > > > On Wed, 2021-01-13 at 14:30 +0900, Tomasz Figa wrote:
> > > > > > On Thu, Dec 24, 2020 at 8:35 PM Yong Wu  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, 2020-12-23 at 17:18 +0900, Tomasz Figa wrote:
> > > > > > > > On Wed, Dec 09, 2020 at 04:00:41PM +0800, Yong Wu wrote:
> > > > > > > > > This patch adds descriptions for mt8192 IOMMU and SMI.
> > > > > > > > >
> > > > > > > > > mt8192 is also MTK IOMMU gen2, which uses the ARM Short-Descriptor
> > > > > > > > > translation table format. The M4U-SMI HW diagram is as below:
> > > > > > > > >
> > > > > > > > >                  EMI
> > > > > > > > >                   |
> > > > > > > > >                  M4U
> > > > > > > > >                   |
> > > > > > > > >             ------------
> > > > > > > > >              SMI Common
> > > > > > > > >             ------------
> > > > > > > > >                   |
> > > > > > > > >      +-------+------+------+------ ... ------+-------+
> > > > > > > > >      |       |      |      |                 |       |
> > > > > > > > >      |       |      |      |                 |       |
> > > > > > > > >    larb0   larb1  larb2  larb4     ...     larb19  larb20
> > > > > > > > >    disp0   disp1   mdp    vdec              IPE     IPE
> > > > > > > > >
> > > > > > > > > All the connections are HW fixed, SW can NOT adjust it.
> > > > > > > > >
> > > > > > > > > mt8192 M4U supports a 0~16GB iova range. We preassign different engines
> > > > > > > > > into different iova ranges:
> > > > > > > > >
> > > > > > > > > domain-id  module   iova-range   larbs
> > > > > > > > >     0      disp     0 ~ 4G       larb0/1
> > > > > > > > >     1      vcodec   4G ~ 8G      larb4/5/7
> > > > > > > > >     2      cam/mdp  8G ~ 12G     larb2/9/11/13/14/16/17/18/19/20
> > > > > > > >
> > > > > > > > Why do we preassign these addresses in DT? Shouldn't it be a 
> > > > > > > > user's or
> > > > > > > > integrator's decision to split the 16 GB address range into 
> > > > > > > > sub-ranges
> > > > > > > > and define which larbs those sub-ranges are shared with?
> > > > > > >
> > > > > > > The problem is that we can't split the 16GB range with the larb as
> > > > > > > the unit. For example, ccu0 (larb13 port 9/10) below is an independent
> > > > > > > range (domain), while the other ports in larb13 are in another domain.
> > > > > > >
> > > > > > > disp/vcodec/cam/mdp don't have a special iova requirement; they can
> > > > > > > access any range. vcodec could also be located at 8G~12G; it doesn't
> > > > > > > care where its iova is located. Here I preassign as shown, following
> > > > > > > our internal project setting.
> > > > > >
> > > > > > Let me try to understand this a bit more. Given the split you're
> > > > > > proposing, is there actually any isolation enforced between 
> > > > > > particular
> > > > > > domains? For example, if I program vcodec with a DMA address from
> > > 

Re: [PATCH v6 00/33] MT8192 IOMMU support

2021-01-29 Thread Tomasz Figa
Hi Yong,

On Mon, Jan 11, 2021 at 07:18:41PM +0800, Yong Wu wrote:
> This patch mainly adds support for mt8192 Multimedia IOMMU and SMI.
> 
> mt8192 also is MTK IOMMU gen2 which uses ARM Short-Descriptor translation
> table format. The M4U-SMI HW diagram is as below:
> 
>                  EMI
>                   |
>                  M4U
>                   |
>             ------------
>              SMI Common
>             ------------
>                   |
>      +-------+------+------+------ ... ------+-------+
>      |       |      |      |                 |       |
>      |       |      |      |                 |       |
>    larb0   larb1  larb2  larb4     ...     larb19  larb20
>    disp0   disp1   mdp    vdec              IPE     IPE
> 
> All the connections are HW fixed, SW can NOT adjust it.
> 
> Compared with the previous SoC, this patchset mainly adds two new functions:
> a) add iova 34 bits support.
> b) add multi domains support since several HW has the special iova
> region requirement.
> 
> change note:
> v6:a) base on v5.11-rc1. and tlb v4:
>   
> https://lore.kernel.org/linux-mediatek/20210107122909.16317-1-yong...@mediatek.com/T/#t
>  
>b) Remove the "domain id" definition in the binding header file.
>   Get the domain from dev->dma_range_map.
>   After this, Change many codes flow.
>    c) The patchset adds a new common file (mtk-smi-larb-port.h).
>       This version changes that name into mtk-memory-port.h, which reflects
>       its file path. This only changes the file name, no other change, thus
>       I keep all the Reviewed-by tags.
>       (Another reason is that we will add some iommu ports unrelated to
>       smi-larb.)
>    d) Refactor the power-domain flow suggested by Tomasz.
>    e) Some other small fixes: use different oas for different SoCs; change
>       the macro for the 34-bit iova tlb flush.
> 

Thanks for the fixes.

I still think the concept of dma-ranges is not quite right for the
problem we need to solve here, but it certainly works for the time being,
and it's possible to remove it in a follow-up patch, so I'm fine with
merging this as is.

Reviewed-by: Tomasz Figa 

I'll comment on my suggestion for a replacement for the dma-ranges that
doesn't need hardcoding arbitrary address ranges in DT in a separate
reply.

Best regards,
Tomasz


Re: [PATCH v5 06/27] dt-bindings: mediatek: Add binding for mt8192 IOMMU

2021-01-24 Thread Tomasz Figa
On Wed, Jan 20, 2021 at 4:08 PM Yong Wu  wrote:
>
> On Wed, 2021-01-20 at 13:15 +0900, Tomasz Figa wrote:
> > On Wed, Jan 13, 2021 at 3:45 PM Yong Wu  wrote:
> > >
> > > On Wed, 2021-01-13 at 14:30 +0900, Tomasz Figa wrote:
> > > > On Thu, Dec 24, 2020 at 8:35 PM Yong Wu  wrote:
> > > > >
> > > > > On Wed, 2020-12-23 at 17:18 +0900, Tomasz Figa wrote:
> > > > > > On Wed, Dec 09, 2020 at 04:00:41PM +0800, Yong Wu wrote:
> > > > > > > This patch adds descriptions for mt8192 IOMMU and SMI.
> > > > > > >
> > > > > > > mt8192 is also MTK IOMMU gen2, which uses the ARM Short-Descriptor
> > > > > > > translation table format. The M4U-SMI HW diagram is as below:
> > > > > > >
> > > > > > >                  EMI
> > > > > > >                   |
> > > > > > >                  M4U
> > > > > > >                   |
> > > > > > >             ------------
> > > > > > >              SMI Common
> > > > > > >             ------------
> > > > > > >                   |
> > > > > > >      +-------+------+------+------ ... ------+-------+
> > > > > > >      |       |      |      |                 |       |
> > > > > > >      |       |      |      |                 |       |
> > > > > > >    larb0   larb1  larb2  larb4     ...     larb19  larb20
> > > > > > >    disp0   disp1   mdp    vdec              IPE     IPE
> > > > > > >
> > > > > > > All the connections are HW fixed, SW can NOT adjust it.
> > > > > > >
> > > > > > > mt8192 M4U supports a 0~16GB iova range. We preassign different engines
> > > > > > > into different iova ranges:
> > > > > > >
> > > > > > > domain-id  module   iova-range   larbs
> > > > > > >     0      disp     0 ~ 4G       larb0/1
> > > > > > >     1      vcodec   4G ~ 8G      larb4/5/7
> > > > > > >     2      cam/mdp  8G ~ 12G     larb2/9/11/13/14/16/17/18/19/20
> > > > > >
> > > > > > Why do we preassign these addresses in DT? Shouldn't it be a user's 
> > > > > > or
> > > > > > integrator's decision to split the 16 GB address range into 
> > > > > > sub-ranges
> > > > > > and define which larbs those sub-ranges are shared with?
> > > > >
> > > > > The problem is that we can't split the 16GB range with the larb as
> > > > > the unit. For example, ccu0 (larb13 port 9/10) below is an independent
> > > > > range (domain), while the other ports in larb13 are in another domain.
> > > > >
> > > > > disp/vcodec/cam/mdp don't have a special iova requirement; they can
> > > > > access any range. vcodec could also be located at 8G~12G; it doesn't
> > > > > care where its iova is located. Here I preassign as shown, following
> > > > > our internal project setting.
> > > >
> > > > Let me try to understand this a bit more. Given the split you're
> > > > proposing, is there actually any isolation enforced between particular
> > > > domains? For example, if I program vcodec with a DMA address from
> > > > the 0-4G range, would the IOMMU actually generate a fault, even if
> > > > disp had some memory mapped at that address?
> > >
> > > In this case, we will get a fault with the current SW setting.
> > >
> >
> > Okay, thanks.
> >
> > > >
> > > > >
> > > > > Why set this in DT? It is only to simplify the code. Assume we put it
> > > > > in the platform data: we have up to 32 larbs, each larb has up to 32
> > > > > ports, and each port may be in a different iommu domain, so we would
> > > > > need a big array for this. With the DT method we only need a macro to
> > > > > get the domain.
> > > > >
> > > > > When replying this mail, I h

Re: [PATCH v5 06/27] dt-bindings: mediatek: Add binding for mt8192 IOMMU

2021-01-19 Thread Tomasz Figa
On Wed, Jan 13, 2021 at 3:45 PM Yong Wu  wrote:
>
> On Wed, 2021-01-13 at 14:30 +0900, Tomasz Figa wrote:
> > On Thu, Dec 24, 2020 at 8:35 PM Yong Wu  wrote:
> > >
> > > On Wed, 2020-12-23 at 17:18 +0900, Tomasz Figa wrote:
> > > > On Wed, Dec 09, 2020 at 04:00:41PM +0800, Yong Wu wrote:
> > > > > This patch adds descriptions for mt8192 IOMMU and SMI.
> > > > >
> > > > > mt8192 is also MTK IOMMU gen2, which uses the ARM Short-Descriptor
> > > > > translation table format. The M4U-SMI HW diagram is as below:
> > > > >
> > > > >                  EMI
> > > > >                   |
> > > > >                  M4U
> > > > >                   |
> > > > >             ------------
> > > > >              SMI Common
> > > > >             ------------
> > > > >                   |
> > > > >      +-------+------+------+------ ... ------+-------+
> > > > >      |       |      |      |                 |       |
> > > > >      |       |      |      |                 |       |
> > > > >    larb0   larb1  larb2  larb4     ...     larb19  larb20
> > > > >    disp0   disp1   mdp    vdec              IPE     IPE
> > > > >
> > > > > All the connections are HW fixed, SW can NOT adjust it.
> > > > >
> > > > > mt8192 M4U supports a 0~16GB iova range. We preassign different engines
> > > > > into different iova ranges:
> > > > >
> > > > > domain-id  module   iova-range   larbs
> > > > >     0      disp     0 ~ 4G       larb0/1
> > > > >     1      vcodec   4G ~ 8G      larb4/5/7
> > > > >     2      cam/mdp  8G ~ 12G     larb2/9/11/13/14/16/17/18/19/20
> > > >
> > > > Why do we preassign these addresses in DT? Shouldn't it be a user's or
> > > > integrator's decision to split the 16 GB address range into sub-ranges
> > > > and define which larbs those sub-ranges are shared with?
> > >
> > > The problem is that we can't split the 16GB range with the larb as the
> > > unit. For example, ccu0 (larb13 port 9/10) below is an independent
> > > range (domain), while the other ports in larb13 are in another domain.
> > >
> > > disp/vcodec/cam/mdp don't have a special iova requirement; they can
> > > access any range. vcodec could also be located at 8G~12G; it doesn't
> > > care where its iova is located. Here I preassign as shown, following
> > > our internal project setting.
> >
> > Let me try to understand this a bit more. Given the split you're
> > proposing, is there actually any isolation enforced between particular
> > domains? For example, if I program vcodec with a DMA address from
> > the 0-4G range, would the IOMMU actually generate a fault, even if
> > disp had some memory mapped at that address?
>
> In this case, we will get a fault with the current SW setting.
>

Okay, thanks.

> >
> > >
> > > Why set this in DT? It is only to simplify the code. Assume we put it in
> > > the platform data: we have up to 32 larbs, each larb has up to 32 ports,
> > > and each port may be in a different iommu domain, so we would need a big
> > > array for this. With the DT method we only need a macro to get the domain.
> > >
> > > When replying to this mail, I happened to see there is a
> > > "dev->dma_range_map" which has the "dma-ranges" information. I think I
> > > could use this value to get which domain the device belongs to; then
> > > there is no need to put the domid in DT. I will test this.
> >
> > My feeling is that the only part that needs to be enforced statically
> > is the reserved IOVA range for CCUs. The other ranges should be
> > determined dynamically, although I think I need to understand better
> > how the hardware and your proposed design work to tell what would be
> > likely the best choice here.
>
> I have removed the domid patch in v6 and get the domain id in [27/33]
> of v6.
>
> Regarding making the other ranges dynamic, the commit message of [30/33]
> in v6 should be helpful. The problem is that we have a bank_sel setting
> for iova[32:33]; currently we preassign this value, thus all the ranges
> are fixed. If you adjust this setting, you can let 

Re: [PATCH v5 06/27] dt-bindings: mediatek: Add binding for mt8192 IOMMU

2021-01-12 Thread Tomasz Figa
On Thu, Dec 24, 2020 at 8:35 PM Yong Wu  wrote:
>
> On Wed, 2020-12-23 at 17:18 +0900, Tomasz Figa wrote:
> > On Wed, Dec 09, 2020 at 04:00:41PM +0800, Yong Wu wrote:
> > > This patch adds descriptions for mt8192 IOMMU and SMI.
> > >
> > > mt8192 is also MTK IOMMU gen2, which uses the ARM Short-Descriptor
> > > translation table format. The M4U-SMI HW diagram is as below:
> > >
> > >                  EMI
> > >                   |
> > >                  M4U
> > >                   |
> > >             ------------
> > >              SMI Common
> > >             ------------
> > >                   |
> > >      +-------+------+------+------ ... ------+-------+
> > >      |       |      |      |                 |       |
> > >      |       |      |      |                 |       |
> > >    larb0   larb1  larb2  larb4     ...     larb19  larb20
> > >    disp0   disp1   mdp    vdec              IPE     IPE
> > >
> > > All the connections are HW fixed, SW can NOT adjust it.
> > >
> > > mt8192 M4U supports a 0~16GB iova range. We preassign different engines
> > > into different iova ranges:
> > >
> > > domain-id  module   iova-range   larbs
> > >     0      disp     0 ~ 4G       larb0/1
> > >     1      vcodec   4G ~ 8G      larb4/5/7
> > >     2      cam/mdp  8G ~ 12G     larb2/9/11/13/14/16/17/18/19/20
> >
> > Why do we preassign these addresses in DT? Shouldn't it be a user's or
> > integrator's decision to split the 16 GB address range into sub-ranges
> > and define which larbs those sub-ranges are shared with?
>
> The problem is that we can't split the 16GB range with the larb as the
> unit. For example, ccu0 (larb13 port 9/10) below is an independent
> range (domain), while the other ports in larb13 are in another domain.
>
> disp/vcodec/cam/mdp don't have a special iova requirement; they can
> access any range. vcodec could also be located at 8G~12G; it doesn't
> care where its iova is located. Here I preassign as shown, following
> our internal project setting.

Let me try to understand this a bit more. Given the split you're
proposing, is there actually any isolation enforced between particular
domains? For example, if I program vcodec with a DMA address from
the 0-4G range, would the IOMMU actually generate a fault, even if
disp had some memory mapped at that address?

>
> Why set this in DT? It is only to simplify the code. Assume we put it in
> the platform data: we have up to 32 larbs, each larb has up to 32 ports,
> and each port may be in a different iommu domain, so we would need a big
> array for this. With the DT method we only need a macro to get the domain.
>
> When replying to this mail, I happened to see there is a
> "dev->dma_range_map" which has the "dma-ranges" information. I think I
> could use this value to get which domain the device belongs to; then
> there is no need to put the domid in DT. I will test this.

My feeling is that the only part that needs to be enforced statically
is the reserved IOVA range for CCUs. The other ranges should be
determined dynamically, although I think I need to understand better
how the hardware and your proposed design work to tell what would be
likely the best choice here.
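
(For reference, the dev->dma_range_map lookup Yong mentions above could
look roughly like the sketch below; the region table and the helper name
are invented for illustration and are not taken from the series:)

#include <linux/device.h>
#include <linux/dma-direct.h>	/* struct bus_dma_region */

/* Hypothetical per-SoC table of the preassigned IOVA regions. */
struct mtk_iommu_iova_region {
	dma_addr_t iova_base;
	unsigned long long size;
};

/*
 * Pick a domain index for a master device by matching the first
 * dma-ranges entry, which the core has already parsed into
 * dev->dma_range_map, against the preassigned regions.
 */
static int mtk_iommu_guess_domain_id(struct device *dev,
				     const struct mtk_iommu_iova_region *rgn,
				     unsigned int nr_regions)
{
	const struct bus_dma_region *map = dev->dma_range_map;
	unsigned int i;

	if (!map)
		return 0;	/* no dma-ranges: use the default region */

	for (i = 0; i < nr_regions; i++)
		if (map->dma_start == rgn[i].iova_base &&
		    map->size == rgn[i].size)
			return i;

	return -EINVAL;
}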

Best regards,
Tomasz

>
> Thanks.
> >
> > Best regards,
> > Tomasz
> >
> > >     3      CCU0     0x4000_0000 ~ 0x43ff_ffff   larb13: port 9/10
> > >     4      CCU1     0x4400_0000 ~ 0x47ff_ffff   larb14: port 4/5
> > >
> > > The iova range for CCU0/1 (camera control unit) is a HW requirement.
> > >
> > > Signed-off-by: Yong Wu 
> > > Reviewed-by: Rob Herring 
> > > ---
> > >  .../bindings/iommu/mediatek,iommu.yaml|  18 +-
> > >  include/dt-bindings/memory/mt8192-larb-port.h | 240 ++
> > >  2 files changed, 257 insertions(+), 1 deletion(-)
> > >  create mode 100644 include/dt-bindings/memory/mt8192-larb-port.h
> > >
> [snip]


Re: [PATCH v5 04/27] dt-bindings: memory: mediatek: Add domain definition

2021-01-12 Thread Tomasz Figa
On Thu, Dec 24, 2020 at 8:27 PM Yong Wu  wrote:
>
> On Wed, 2020-12-23 at 17:15 +0900, Tomasz Figa wrote:
> > Hi Yong,
> >
> > On Wed, Dec 09, 2020 at 04:00:39PM +0800, Yong Wu wrote:
> > > In the latest SoC, there are several HW IPs that require a special iova
> > > range; mainly CCU and VPU have this requirement. Take CCU as an example:
> > > CCU requires its iova to be located in the range (0x4000_0000 ~ 0x43ff_ffff).
> >
> > Is this really a domain? Does the address range come from the design of
> > the IOMMU?
>
> It is not really a domain. The address range comes from a CCU HW
> requirement: that HW can only access this iova range, thus I create a
> special iommu domain for it.
>

I guess it's the IOMMU/DT maintainers who have the last word here, but
shouldn't DT just specify the hardware characteristics and then the
kernel configure the hardware appropriately, possibly based on some
other configuration interface (e.g. command line parameters or sysfs)?

How I'd do this is rather than enforcing those arbitrary decisions
onto the DT bindings, I'd add properties to the master devices (e.g.
CCU) that specify which IOVA range they can operate on. Then, the
exact split of the complete address space would be done at runtime,
based on kernel configuration, command line parameters and possibly
sysfs attributes if things could be reconfigured dynamically.

Best regards,
Tomasz

> >
> > Best regards,
> > Tomasz
> >
> > >
> > > In this patch we add a domain definition for the special port. In the
> > > example of CCU, if we preassign the CCU port in domain 1, then the iommu
> > > driver will prepare an independent iommu domain with the special iova
> > > range for it, and the iova obtained from dma_alloc_attrs(ccu-dev) will be
> > > located in that special range.
> > >
> > > This is a preparing patch for multi-domain support.
> > >
> > > Signed-off-by: Yong Wu 
> > > Acked-by: Krzysztof Kozlowski 
> > > Acked-by: Rob Herring 
> > > ---
> > >  include/dt-bindings/memory/mtk-smi-larb-port.h | 9 -
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/dt-bindings/memory/mtk-smi-larb-port.h 
> > > b/include/dt-bindings/memory/mtk-smi-larb-port.h
> > > index 7d64103209af..2d4c973c174f 100644
> > > --- a/include/dt-bindings/memory/mtk-smi-larb-port.h
> > > +++ b/include/dt-bindings/memory/mtk-smi-larb-port.h
> > > @@ -7,9 +7,16 @@
> > >  #define __DT_BINDINGS_MEMORY_MTK_MEMORY_PORT_H_
> > >
> > >  #define MTK_LARB_NR_MAX32
> > > +#define MTK_M4U_DOM_NR_MAX 8
> > > +
> > > +#define MTK_M4U_DOM_ID(domid, larb, port)  \
> > > +   (((domid) & 0x7) << 16 | (((larb) & 0x1f) << 5) | ((port) & 0x1f))
> > > +
> > > +/* The default dom id is 0. */
> > > +#define MTK_M4U_ID(larb, port) MTK_M4U_DOM_ID(0, larb, port)
> > >
> > > -#define MTK_M4U_ID(larb, port) (((larb) << 5) | (port))
> > >  #define MTK_M4U_TO_LARB(id)(((id) >> 5) & 0x1f)
> > >  #define MTK_M4U_TO_PORT(id)((id) & 0x1f)
> > > +#define MTK_M4U_TO_DOM(id) (((id) >> 16) & 0x7)
> > >
> > >  #endif
> > > --
> > > 2.18.0
> > >


Re: [RFC PATCH v3 0/6] Restricted DMA

2021-01-12 Thread Tomasz Figa
On Wed, Jan 13, 2021 at 12:56 PM Florian Fainelli  wrote:
>
>
>
> On 1/12/2021 6:29 PM, Tomasz Figa wrote:
> > Hi Florian,
> >
> > On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli  
> > wrote:
> >>
> >> On 1/11/21 11:48 PM, Claire Chang wrote:
> >>> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli  
> >>> wrote:
> >>>>
> >>>> On 1/7/21 9:42 AM, Claire Chang wrote:
> >>>>
> >>>>>> Can you explain how ATF gets involved and to what extent it does help,
> >>>>>> besides enforcing a secure region from the ARM CPU's perspective? Does
> >>>>>> the PCIe root complex not have an IOMMU but can somehow be denied 
> >>>>>> access
> >>>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> >>>>>> still some sort of basic protection that the HW enforces, right?
> >>>>>
> >>>>> We need the ATF support for the memory MPU (memory protection unit).
> >>>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined
> >>>>> memory region is for PCIe DMA only, but we still need the MPU to lock
> >>>>> down PCIe access to those specific regions.
> >>>>
> >>>> OK so you do have a protection unit of some sort to enforce which region
> >>>> in DRAM the PCIE bridge is allowed to access, that makes sense,
> >>>> otherwise the restricted DMA region would only be a hint but nothing you
> >>>> can really enforce. This is almost entirely analogous to our systems 
> >>>> then.
> >>>
> >>> Here is the example of setting the MPU:
> >>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> >>>
> >>>>
> >>>> There may be some value in standardizing on an ARM SMCCC call then since
> >>>> you already support two different SoC vendors.
> >>>>
> >>>>>
> >>>>>>
> >>>>>> On Broadcom STB SoCs we have had something similar for a while however
> >>>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a
> >>>>>> basic protection mechanism whereby we can configure a region in DRAM to
> >>>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
> >>>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
> >>>>>> allowed access to DRAM so we must call into a security agent to allow
> >>>>>> the PCIe bridge to access the designated DRAM region.
> >>>>>>
> >>>>>> We have done this using a private CMA area region assigned via Device
> >>>>>> Tree, assigned with a and requiring the PCIe EP driver to use
> >>>>>> dma_alloc_from_contiguous() in order to allocate from this device
> >>>>>> private CMA area. The only drawback with that approach is that it
> >>>>>> requires knowing how much memory you need up front for buffers and DMA
> >>>>>> descriptors that the PCIe EP will need to process. The problem is that
> >>>>>> it requires driver modifications and that does not scale over the 
> >>>>>> number
> >>>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
> >>>>>> need to bounce buffer. Your approach scales better across PCIe EP
> >>>>>> drivers however it does require bounce buffering which could be a
> >>>>>> performance hit.
> >>>>>
> >>>>> Only the streaming DMA (map/unmap) needs bounce buffering.
> >>>>
> >>>> True, and typically only on transmit since you don't really control
> >>>> where the sk_buff are allocated from, right? On RX since you need to
> >>>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
> >>>> them from a pool that already falls within the restricted DMA region, 
> >>>> right?
> >>>>
> >>>
> >>> Right, but applying bounce buffering to RX will make it more secure.
> >>> The device won't be able to modify the content after unmap. Just like what
> >>> iommu_unmap does.
> >>
> >> Sure, however the goals of using bounce buffe

Re: [RFC PATCH v3 0/6] Restricted DMA

2021-01-12 Thread Tomasz Figa
Hi Florian,

On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli  wrote:
>
> On 1/11/21 11:48 PM, Claire Chang wrote:
> > On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli  
> > wrote:
> >>
> >> On 1/7/21 9:42 AM, Claire Chang wrote:
> >>
>  Can you explain how ATF gets involved and to what extent it does help,
>  besides enforcing a secure region from the ARM CPU's perspective? Does
>  the PCIe root complex not have an IOMMU but can somehow be denied access
>  to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>  still some sort of basic protection that the HW enforces, right?
> >>>
> >>> We need the ATF support for the memory MPU (memory protection unit).
> >>> Restricted DMA (with reserved-memory in dts) makes sure the predefined
> >>> memory region is for PCIe DMA only, but we still need the MPU to lock
> >>> down PCIe access to those specific regions.
> >>
> >> OK so you do have a protection unit of some sort to enforce which region
> >> in DRAM the PCIE bridge is allowed to access, that makes sense,
> >> otherwise the restricted DMA region would only be a hint but nothing you
> >> can really enforce. This is almost entirely analogous to our systems then.
> >
> > Here is the example of setting the MPU:
> > https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> >
> >>
> >> There may be some value in standardizing on an ARM SMCCC call then since
> >> you already support two different SoC vendors.
> >>
> >>>
> 
>  On Broadcom STB SoCs we have had something similar for a while however
>  and while we don't have an IOMMU for the PCIe bridge, we do have a
>  basic protection mechanism whereby we can configure a region in DRAM to
>  be PCIe read/write and CPU read/write which then gets used as the PCIe
>  inbound region for the PCIe EP. By default the PCIe bridge is not
>  allowed access to DRAM so we must call into a security agent to allow
>  the PCIe bridge to access the designated DRAM region.
> 
>  We have done this using a private CMA area region assigned via Device
>  Tree, assigned with a and requiring the PCIe EP driver to use
>  dma_alloc_from_contiguous() in order to allocate from this device
>  private CMA area. The only drawback with that approach is that it
>  requires knowing how much memory you need up front for buffers and DMA
>  descriptors that the PCIe EP will need to process. The problem is that
>  it requires driver modifications and that does not scale over the number
>  of PCIe EP drivers, some we absolutely do not control, but there is no
>  need to bounce buffer. Your approach scales better across PCIe EP
>  drivers however it does require bounce buffering which could be a
>  performance hit.
> >>>
> >>> Only the streaming DMA (map/unmap) needs bounce buffering.
> >>
> >> True, and typically only on transmit since you don't really control
> >> where the sk_buff are allocated from, right? On RX since you need to
> >> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
> >> them from a pool that already falls within the restricted DMA region, 
> >> right?
> >>
> >
> > Right, but applying bounce buffering to RX will make it more secure.
> > The device won't be able to modify the content after unmap. Just like what
> > iommu_unmap does.
>
> Sure, however the goals of using bounce buffering equally applies to RX
> and TX in that this is the only layer sitting between a stack (block,
> networking, USB, etc.) and the underlying device driver that scales well
> in order to massage a dma_addr_t to be within a particular physical range.
>
> There is however room for improvement if the drivers are willing to
> change their buffer allocation strategy. When you receive Wi-Fi frames
> you need to allocate buffers for the Wi-Fi device to DMA into, and that
> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
> allocation time you could very well allocate these frames from the
> restricted DMA region without having to bounce buffer them since the
> host CPU is in control over where and when to DMA into.
>

That is, however, still a trade-off between saving that one copy and
protection from the DMA tampering with the packet contents when the
kernel is reading them. Notice how the copy effectively makes a
snapshot of the contents, guaranteeing that the kernel has a
consistent view of the packet, which is not true if the DMA could
modify the buffer contents in the middle of CPU accesses.

Best regards,
Tomasz

> The issue is that each network driver may implement its own buffer
> allocation strategy, some may simply call netdev_alloc_skb() which gives
> zero control over where the buffer comes from unless you play tricks
> with NUMA node allocations and somehow declare that your restricted DMA
> region is a different NUMA node. If the driver allocates pages and then
> attaches a SKB to 

Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2021-01-08 Thread Tomasz Figa
On Wed, Dec 23, 2020 at 8:00 PM Robin Murphy  wrote:
>
> On 2020-12-23 08:56, Tomasz Figa wrote:
> > On Wed, Dec 16, 2020 at 06:36:06PM +0800, Yong Wu wrote:
> >> In current iommu_unmap, this code is:
> >>
> >>  iommu_iotlb_gather_init(&iotlb_gather);
> >>  ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
> >>  iommu_iotlb_sync(domain, &iotlb_gather);
> >>
> >> We could gather the whole iova range in __iommu_unmap, and then do tlb
> >> synchronization in the iommu_iotlb_sync.
> >>
> >> This patch implements this: gather the range in mtk_iommu_unmap, then
> >> iommu_iotlb_sync performs the tlb synchronization for the gathered iova
> >> range. We don't call iommu_iotlb_gather_add_page since our tlb
> >> synchronization can be done regardless of the granule size.
> >>
> >> In this way, gather->start can no longer be ULONG_MAX, so remove that check.
> >>
> >> This patch aims to do tlb synchronization *once* in the iommu_unmap.
> >>
> >> Signed-off-by: Yong Wu 
> >> ---
> >>   drivers/iommu/mtk_iommu.c | 8 +---
> >>   1 file changed, 5 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> >> index db7d43adb06b..89cec51405cd 100644
> >> --- a/drivers/iommu/mtk_iommu.c
> >> +++ b/drivers/iommu/mtk_iommu.c
> >> @@ -506,7 +506,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain 
> >> *domain,
> >>struct iommu_iotlb_gather *gather)
> >>   {
> >>  struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> >> +unsigned long long end = iova + size;
> >>
> >> +if (gather->start > iova)
> >> +gather->start = iova;
> >> +if (gather->end < end)
> >> +gather->end = end;
> >
> > I don't know how common the case is, but what happens if
> > gather->start...gather->end is a disjoint range from iova...end? E.g.
> >
> >   |<--- gather --->| ..XXX.. |<---- iova ---->|
> >   |                |         |                |
> >   gather->start    |         iova             |
> >                gather->end                    end
> >
> > We would also end up invalidating the TLB for the XXX area, which could
> > affect the performance.
>
> Take a closer look at iommu_unmap() - the gather data is scoped to each
> individual call, so that can't possibly happen.
>
> > Also, why is the existing code in __arm_v7s_unmap() not enough? It seems
> > to call io_pgtable_tlb_add_page() already, so it should be batching the
> > flushes.
>
> Because if we leave io-pgtable in charge of maintenance it will also
> inject additional invalidations and syncs for the sake of strictly
> correct walk cache maintenance. Apparently we can get away without that
> on this hardware, so the fundamental purpose of this series is to
> sidestep it.
>
> It's proven to be cleaner overall to devolve this kind of "non-standard"
> TLB maintenance back to drivers rather than try to cram yet more
> special-case complexity into io-pgtable itself. I'm planning to clean up
> the remains of the TLBI_ON_MAP quirk entirely after this.
>
> Robin.
>
> >>  return dom->iop->unmap(dom->iop, iova, size, gather);
> >>   }
> >>
> >> @@ -523,9 +528,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain 
> >> *domain,
> >>  struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> >>  size_t length = gather->end - gather->start;
> >>
> >> -if (gather->start == ULONG_MAX)
> >> -return;
> >> -
> >>  mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
> >> dom->data);
> >>   }
> >> --
> >> 2.18.0
> >>


Re: [PATCH v5 18/27] iommu/mediatek: Add power-domain operation

2021-01-08 Thread Tomasz Figa
On Tue, Dec 29, 2020 at 8:06 PM Yong Wu  wrote:
>
> On Wed, 2020-12-23 at 17:36 +0900, Tomasz Figa wrote:
> > On Wed, Dec 09, 2020 at 04:00:53PM +0800, Yong Wu wrote:
> > > In the previous SoCs, the M4U HW is in the EMI power domain, which is
> > > always on. The latest M4U is in the display power domain, which may be
> > > turned on/off, thus we have to add a pm_runtime interface for it.
> > >
> > > When the engines work, they always enable the power and clocks for
> > > smi-larb/smi-common, so the M4U's power will always be powered on
> > > automatically via the device link with smi-common.
> > >
> > > Note: we don't enable the M4U power in iommu_map/unmap for the tlb flush.
> > > If its power is already on, of course it is ok. If the power is off,
> > > the main tlb will be reset when the M4U powers on, thus the tlb flush
> > > while the m4u power is off is unnecessary; just skip it.
> > >
> > > There will be one case where the pm runtime status is not as expected
> > > when flushing the tlb. After boot, the display may call dma_alloc_attrs
> > > before it calls pm_runtime_get(disp-dev), so the m4u's pm status is not
> > > active inside dma_alloc_attrs. Since this only happens right after boot,
> > > when the tlb is still clean, I think this is ok too.
> > >
> > > Signed-off-by: Yong Wu 
> > > ---
> > >  drivers/iommu/mtk_iommu.c | 41 +--
> > >  1 file changed, 35 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > > index 6fe3ee2b2bf5..0e9c03cbab32 100644
> > > --- a/drivers/iommu/mtk_iommu.c
> > > +++ b/drivers/iommu/mtk_iommu.c
> > > @@ -184,6 +184,8 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
> > > struct mtk_iommu_data *data = cookie;
> > >
> > > for_each_m4u(data) {
> > > +   if (!pm_runtime_active(data->dev))
> > > +   continue;
> >
> > Is it guaranteed that the status is still active down here? It could be
> > active in the check above, but then the process could be preempted and
> > the device suspended before execution gets here.
> >
> > Shouldn't we do something like below?
> >
> > ret = pm_runtime_get_if_active();
> > if (!ret)
> > continue;
> > if (ret < 0)
> > // handle error
> >
> > // Flush
> >
> > pm_runtime_put();
>
> Makes sense. Thanks. There is a comment in arm_smmu.c: "avoid touching
> dev->power.lock in fastpaths". To avoid that here too (many of our SoCs
> don't have a power-domain), the code would look like:
>
> bool has_pm = !!data->dev->pm_domain;
>
> if (has_pm) {
> if (pm_runtime_get_if_in_use(data->dev) <= 0)
> continue;
> }
>
> 
>
> if (has_pm)
> pm_runtime_put(data->dev);

Looks good to me, thanks.
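
Spelled out, the flush-all path would then look roughly like the sketch
below (built around the skeleton above, not the final patch):

static void mtk_iommu_tlb_flush_all(void *cookie)
{
	struct mtk_iommu_data *data = cookie;

	for_each_m4u(data) {
		bool has_pm = !!data->dev->pm_domain;

		/*
		 * Skip a powered-down M4U: its TLB is reset on the next
		 * power-up anyway, and pm_runtime_get_if_in_use() keeps us
		 * from racing with a concurrent suspend. SoCs without a
		 * power domain skip the runtime-PM call entirely to avoid
		 * touching dev->power.lock in this fast path.
		 */
		if (has_pm && pm_runtime_get_if_in_use(data->dev) <= 0)
			continue;

		writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
			       data->base + data->plat_data->inv_sel_reg);
		writel_relaxed(F_ALL_INVLD, data->base + REG_MMU_INVALIDATE);
		wmb(); /* Make sure the tlb flush all done */

		if (has_pm)
			pm_runtime_put(data->dev);
	}
}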

> >
> > Similar comment to the other places being changed by this patch.
> >
> > > writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
> > >data->base + data->plat_data->inv_sel_reg);
> > > writel_relaxed(F_ALL_INVLD, data->base + REG_MMU_INVALIDATE);
> > > @@ -200,6 +202,10 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned 
> > > long iova, size_t size,
> > > u32 tmp;
> > >
> > > for_each_m4u(data) {
> > > +   /* skip tlb flush when pm is not active. */
> > > +   if (!pm_runtime_active(data->dev))
> > > +   continue;
> > > +
> > > spin_lock_irqsave(&data->tlb_lock, flags);
> > > writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
> > >data->base + data->plat_data->inv_sel_reg);
> [snip]


Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2021-01-08 Thread Tomasz Figa
On Wed, Dec 23, 2020 at 8:00 PM Robin Murphy  wrote:
>
> On 2020-12-23 08:56, Tomasz Figa wrote:
> > On Wed, Dec 16, 2020 at 06:36:06PM +0800, Yong Wu wrote:
> >> In current iommu_unmap, this code is:
> >>
> >>  iommu_iotlb_gather_init(&iotlb_gather);
> >>  ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
> >>  iommu_iotlb_sync(domain, &iotlb_gather);
> >>
> >> We could gather the whole iova range in __iommu_unmap, and then do tlb
> >> synchronization in the iommu_iotlb_sync.
> >>
> >> This patch implements this: gather the range in mtk_iommu_unmap, then
> >> iommu_iotlb_sync performs the tlb synchronization for the gathered iova
> >> range. We don't call iommu_iotlb_gather_add_page since our tlb
> >> synchronization can be done regardless of the granule size.
> >>
> >> In this way, gather->start can no longer be ULONG_MAX, so remove that check.
> >>
> >> This patch aims to do tlb synchronization *once* in the iommu_unmap.
> >>
> >> Signed-off-by: Yong Wu 
> >> ---
> >>   drivers/iommu/mtk_iommu.c | 8 +---
> >>   1 file changed, 5 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> >> index db7d43adb06b..89cec51405cd 100644
> >> --- a/drivers/iommu/mtk_iommu.c
> >> +++ b/drivers/iommu/mtk_iommu.c
> >> @@ -506,7 +506,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain 
> >> *domain,
> >>struct iommu_iotlb_gather *gather)
> >>   {
> >>  struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> >> +unsigned long long end = iova + size;
> >>
> >> +if (gather->start > iova)
> >> +gather->start = iova;
> >> +if (gather->end < end)
> >> +gather->end = end;
> >
> > I don't know how common the case is, but what happens if
> > gather->start...gather->end is a disjoint range from iova...end? E.g.
> >
> >   |<--- gather --->| ..XXX.. |<---- iova ---->|
> >   |                |         |                |
> >   gather->start    |         iova             |
> >                gather->end                    end
> >
> > We would also end up invalidating the TLB for the XXX area, which could
> > affect the performance.
>
> Take a closer look at iommu_unmap() - the gather data is scoped to each
> individual call, so that can't possibly happen.
>
> > Also, why is the existing code in __arm_v7s_unmap() not enough? It seems
> > to call io_pgtable_tlb_add_page() already, so it should be batching the
> > flushes.
>
> Because if we leave io-pgtable in charge of maintenance it will also
> inject additional invalidations and syncs for the sake of strictly
> correct walk cache maintenance. Apparently we can get away without that
> on this hardware, so the fundamental purpose of this series is to
> sidestep it.
>
> It's proven to be cleaner overall to devolve this kind of "non-standard"
> TLB maintenance back to drivers rather than try to cram yet more
> special-case complexity into io-pgtable itself. I'm planning to clean up
> the remains of the TLBI_ON_MAP quirk entirely after this.

(Sorry, I sent an empty email accidentally.)

I see, thanks for clarifying. The patch looks good to me then.

Best regards,
Tomasz

>
> Robin.
>
> >>  return dom->iop->unmap(dom->iop, iova, size, gather);
> >>   }
> >>
> >> @@ -523,9 +528,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain 
> >> *domain,
> >>  struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> >>  size_t length = gather->end - gather->start;
> >>
> >> -if (gather->start == ULONG_MAX)
> >> -return;
> >> -
> >>  mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
> >> dom->data);
> >>   }
> >> --
> >> 2.18.0
> >>


Re: [PATCH v3 6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once

2020-12-23 Thread Tomasz Figa
On Wed, Dec 16, 2020 at 06:36:06PM +0800, Yong Wu wrote:
> In current iommu_unmap, this code is:
> 
>   iommu_iotlb_gather_init(&iotlb_gather);
>   ret = __iommu_unmap(domain, iova, size, &iotlb_gather);
>   iommu_iotlb_sync(domain, &iotlb_gather);
> 
> We could gather the whole iova range in __iommu_unmap, and then do tlb
> synchronization in the iommu_iotlb_sync.
> 
> This patch implements this: gather the range in mtk_iommu_unmap, then
> iommu_iotlb_sync performs the tlb synchronization for the gathered iova
> range. We don't call iommu_iotlb_gather_add_page since our tlb
> synchronization can be done regardless of the granule size.
> 
> In this way, gather->start can no longer be ULONG_MAX, so remove that check.
> 
> This patch aims to do tlb synchronization *once* in the iommu_unmap.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index db7d43adb06b..89cec51405cd 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -506,7 +506,12 @@ static size_t mtk_iommu_unmap(struct iommu_domain 
> *domain,
> struct iommu_iotlb_gather *gather)
>  {
>   struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> + unsigned long long end = iova + size;
>  
> + if (gather->start > iova)
> + gather->start = iova;
> + if (gather->end < end)
> + gather->end = end;

I don't know how common the case is, but what happens if
gather->start...gather->end is a disjoint range from iova...end? E.g.

 |<--- gather --->| ..XXX.. |<---- iova ---->|
 |                |         |                |
 gather->start    |         iova             |
              gather->end                    end

We would also end up invalidating the TLB for the XXX area, which could
affect the performance.

Also, why is the existing code in __arm_v7s_unmap() not enough? It seems
to call io_pgtable_tlb_add_page() already, so it should be batching the
flushes.

>   return dom->iop->unmap(dom->iop, iova, size, gather);
>  }
>  
> @@ -523,9 +528,6 @@ static void mtk_iommu_iotlb_sync(struct iommu_domain 
> *domain,
>   struct mtk_iommu_domain *dom = to_mtk_domain(domain);
>   size_t length = gather->end - gather->start;
>  
> - if (gather->start == ULONG_MAX)
> - return;
> -
>   mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
>  dom->data);
>  }
> -- 
> 2.18.0
> 


Re: [PATCH v5 18/27] iommu/mediatek: Add power-domain operation

2020-12-23 Thread Tomasz Figa
On Wed, Dec 09, 2020 at 04:00:53PM +0800, Yong Wu wrote:
> In the previous SoCs, the M4U HW is in the EMI power domain, which is
> always on. The latest M4U is in the display power domain, which may be
> turned on/off, thus we have to add a pm_runtime interface for it.
> 
> When the engines work, they always enable the power and clocks for
> smi-larb/smi-common, so the M4U's power will always be powered on
> automatically via the device link with smi-common.
> 
> Note: we don't enable the M4U power in iommu_map/unmap for the tlb flush.
> If its power is already on, of course it is ok. If the power is off,
> the main tlb will be reset when the M4U powers on, thus the tlb flush
> while the m4u power is off is unnecessary; just skip it.
> 
> There will be one case where the pm runtime status is not as expected
> when flushing the tlb. After boot, the display may call dma_alloc_attrs
> before it calls pm_runtime_get(disp-dev), so the m4u's pm status is not
> active inside dma_alloc_attrs. Since this only happens right after boot,
> when the tlb is still clean, I think this is ok too.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 41 +--
>  1 file changed, 35 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 6fe3ee2b2bf5..0e9c03cbab32 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -184,6 +184,8 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
>   struct mtk_iommu_data *data = cookie;
>  
>   for_each_m4u(data) {
> + if (!pm_runtime_active(data->dev))
> + continue;

Is it guaranteed that the status is still active down here? It could be
active in the check above, but then the process could be preempted and
the device suspended before execution gets here.

Shouldn't we do something like below?

ret = pm_runtime_get_if_active();
if (!ret)
continue;
if (ret < 0)
// handle error

// Flush

pm_runtime_put();

Similar comment to the other places being changed by this patch.

>   writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
>  data->base + data->plat_data->inv_sel_reg);
>   writel_relaxed(F_ALL_INVLD, data->base + REG_MMU_INVALIDATE);
> @@ -200,6 +202,10 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
> iova, size_t size,
>   u32 tmp;
>  
>   for_each_m4u(data) {
> + /* skip tlb flush when pm is not active. */
> + if (!pm_runtime_active(data->dev))
> + continue;
> +
>   spin_lock_irqsave(&data->tlb_lock, flags);
>   writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
>  data->base + data->plat_data->inv_sel_reg);
> @@ -384,6 +390,8 @@ static int mtk_iommu_attach_device(struct iommu_domain 
> *domain,
>  {
>   struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
>   struct mtk_iommu_domain *dom = to_mtk_domain(domain);
> + struct device *m4udev = data->dev;
> + bool pm_enabled = pm_runtime_enabled(m4udev);
>   int ret;
>  
>   if (!data)
> @@ -391,12 +399,25 @@ static int mtk_iommu_attach_device(struct iommu_domain 
> *domain,
>  
>   /* Update the pgtable base address register of the M4U HW */
>   if (!data->m4u_dom) {
> + if (pm_enabled) {
> + ret = pm_runtime_get_sync(m4udev);
> + if (ret < 0) {
> + pm_runtime_put_noidle(m4udev);
> + return ret;
> + }
> + }
>   ret = mtk_iommu_hw_init(data);
> - if (ret)
> + if (ret) {
> + if (pm_enabled)
> + pm_runtime_put(m4udev);
>   return ret;
> + }
>   data->m4u_dom = dom;
>   writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
>  data->base + REG_MMU_PT_BASE_ADDR);
> +
> + if (pm_enabled)
> + pm_runtime_put(m4udev);
>   }
>  
>   mtk_iommu_config(data, dev, true);
> @@ -747,10 +768,13 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   if (dev->pm_domain) {
>   struct device_link *link;
>  
> + pm_runtime_enable(dev);
> +
>   link = device_link_add(data->smicomm_dev, dev,
>  DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
>   if (!link) {
>   dev_err(dev, "Unable link %s.\n", 
> dev_name(data->smicomm_dev));
> + pm_runtime_disable(dev);
>   return -EINVAL;
>   }
>   }
> @@ -785,8 +809,10 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>  out_sysfs_remove:
>   iommu_device_sysfs_remove(&data->iommu);
>  out_link_remove:
> - if (dev->pm_domain)
> + if (dev->pm_domain) {
>   

Re: [PATCH v5 17/27] iommu/mediatek: Add pm runtime callback

2020-12-23 Thread Tomasz Figa
On Wed, Dec 09, 2020 at 04:00:52PM +0800, Yong Wu wrote:
> This patch adds pm runtime callback.
> 
> In pm runtime case, all the registers backup/restore and bclk are
> controlled in the pm_runtime callback, then pm_suspend is not needed in
> this case.
> 
> runtime PM is disabled when suspend, thus we call
> pm_runtime_status_suspended instead of pm_runtime_suspended.
> 
> And, m4u doesn't have its special pm runtime domain in previous SoCs; in
> this case dev->power.runtime_status is RPM_SUSPENDED by default,

This sounds wrong and could lead to hard-to-debug errors when the driver
is changed in the future. Would it be possible to make the behavior
consistent across the SoCs instead, so that runtime PM status is ACTIVE
when needed, even on SoCs without an IOMMU PM domain?
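
For example, something along these lines in probe (rough sketch only, there
may be a nicer way to express it):

	/*
	 * Make the runtime PM status reflect the hardware state on SoCs
	 * without an IOMMU power domain too, so that the PM callbacks
	 * don't need to check dev->pm_domain.
	 */
	if (!dev->pm_domain)
		pm_runtime_set_active(dev);
	pm_runtime_enable(dev);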

> thus add
> a "dev->pm_domain" checking for the SoC that has pm runtime domain.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 5614015e5b96..6fe3ee2b2bf5 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -808,7 +808,7 @@ static int mtk_iommu_remove(struct platform_device *pdev)
>   return 0;
>  }
>  
> -static int __maybe_unused mtk_iommu_suspend(struct device *dev)
> +static int __maybe_unused mtk_iommu_runtime_suspend(struct device *dev)
>  {
>   struct mtk_iommu_data *data = dev_get_drvdata(dev);
>   struct mtk_iommu_suspend_reg *reg = &data->reg;
> @@ -826,7 +826,7 @@ static int __maybe_unused mtk_iommu_suspend(struct device 
> *dev)
>   return 0;
>  }
>  
> -static int __maybe_unused mtk_iommu_resume(struct device *dev)
> +static int __maybe_unused mtk_iommu_runtime_resume(struct device *dev)
>  {
>   struct mtk_iommu_data *data = dev_get_drvdata(dev);
>   struct mtk_iommu_suspend_reg *reg = &data->reg;
> @@ -853,7 +853,25 @@ static int __maybe_unused mtk_iommu_resume(struct device 
> *dev)
>   return 0;
>  }
>  
> +static int __maybe_unused mtk_iommu_suspend(struct device *dev)
> +{
> + /* runtime PM is disabled when suspend in pm_runtime case. */
> + if (dev->pm_domain && pm_runtime_status_suspended(dev))
> + return 0;
> +
> + return mtk_iommu_runtime_suspend(dev);
> +}
> +
> +static int __maybe_unused mtk_iommu_resume(struct device *dev)
> +{
> + if (dev->pm_domain && pm_runtime_status_suspended(dev))
> + return 0;
> +
> + return mtk_iommu_runtime_resume(dev);
> +}

Wouldn't it be enough to just use pm_runtime_force_suspend() and
pm_runtime_force_resume() as system sleep ops?
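
I.e. something like the below (untested, and ignoring the noirq vs. normal
suspend phase question for a moment):

static const struct dev_pm_ops mtk_iommu_pm_ops = {
	SET_RUNTIME_PM_OPS(mtk_iommu_runtime_suspend,
			   mtk_iommu_runtime_resume, NULL)
	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
				pm_runtime_force_resume)
};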

> +
>  static const struct dev_pm_ops mtk_iommu_pm_ops = {
> + SET_RUNTIME_PM_OPS(mtk_iommu_runtime_suspend, mtk_iommu_runtime_resume, 
> NULL)
>   SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(mtk_iommu_suspend, mtk_iommu_resume)
>  };
>  
> -- 
> 2.18.0
> 


Re: [PATCH v5 16/27] iommu/mediatek: Add device link for smi-common and m4u

2020-12-23 Thread Tomasz Figa
On Wed, Dec 09, 2020 at 04:00:51PM +0800, Yong Wu wrote:
> In the latest SoC, M4U has its special power domain. Thus, if the engine
> begins to work, it should help enable the power for M4U first.
> Currently if the engine works, it always enables the power/clocks for
> smi-larbs/smi-common. This patch adds device_link for smi-common and M4U.
> Then, if smi-common power is enabled, the M4U power is also powered on
> automatically.
> 
> Normally M4U connects with several smi-larbs and their smi-common is always
> the same. In this patch it gets the smi-common dev from the first smi-larb
> device (i==0), then adds the device_link only when m4u has a power-domain.
> 
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 30 --
>  drivers/iommu/mtk_iommu.h |  1 +
>  2 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 09c8c58feb78..5614015e5b96 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -706,7 +707,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   return larb_nr;
>  
>   for (i = 0; i < larb_nr; i++) {
> - struct device_node *larbnode;
> + struct device_node *larbnode, *smicomm_node;
>   struct platform_device *plarbdev;
>   u32 id;
>  
> @@ -732,6 +733,26 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>  
>   component_match_add_release(dev, &match, release_of,
>   compare_of, larbnode);
> + if (i != 0)
> + continue;

How about using the last larb instead and moving the code below outside
of the loop?
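
Roughly like this, I mean (sketch only; it assumes larbnode is declared
outside the loop so that its last value is still visible):

	/* After the loop, larbnode points at the last larb that was added. */
	smicomm_node = of_parse_phandle(larbnode, "mediatek,smi", 0);
	if (!smicomm_node)
		return -EINVAL;

	plarbdev = of_find_device_by_node(smicomm_node);
	of_node_put(smicomm_node);
	data->smicomm_dev = &plarbdev->dev;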

> + smicomm_node = of_parse_phandle(larbnode, "mediatek,smi", 0);
> + if (!smicomm_node)
> + return -EINVAL;
> +
> + plarbdev = of_find_device_by_node(smicomm_node);
> + of_node_put(smicomm_node);
> + data->smicomm_dev = &plarbdev->dev;
> + }
> +
> + if (dev->pm_domain) {
> + struct device_link *link;
> +
> + link = device_link_add(data->smicomm_dev, dev,
> +DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
> + if (!link) {
> + dev_err(dev, "Unable link %s.\n", 
> dev_name(data->smicomm_dev));
> + return -EINVAL;
> + }
>   }
>  
>   platform_set_drvdata(pdev, data);
> @@ -739,7 +760,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   ret = iommu_device_sysfs_add(&data->iommu, dev, NULL,
>				     "mtk-iommu.%pa", &ioaddr);
>   if (ret)
> - return ret;
> + goto out_link_remove;
>  
>   iommu_device_set_ops(&data->iommu, &mtk_iommu_ops);
>   iommu_device_set_fwnode(&data->iommu, &pdev->dev.of_node->fwnode);
> @@ -763,6 +784,9 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   iommu_device_unregister(&data->iommu);
>  out_sysfs_remove:
>   iommu_device_sysfs_remove(&data->iommu);
> +out_link_remove:
> + if (dev->pm_domain)
> + device_link_remove(data->smicomm_dev, dev);
>   return ret;
>  }
>  
> @@ -777,6 +801,8 @@ static int mtk_iommu_remove(struct platform_device *pdev)
>   bus_set_iommu(&platform_bus_type, NULL);
>  
>   clk_disable_unprepare(data->bclk);
> + if (pdev->dev.pm_domain)
> + device_link_remove(data->smicomm_dev, &pdev->dev);
>   devm_free_irq(&pdev->dev, data->irq, data);
>   component_master_del(&pdev->dev, &mtk_iommu_com_ops);
>   return 0;
> diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
> index d0c93652bdbe..5e03a029c4dc 100644
> --- a/drivers/iommu/mtk_iommu.h
> +++ b/drivers/iommu/mtk_iommu.h
> @@ -68,6 +68,7 @@ struct mtk_iommu_data {
>  
>   struct iommu_device iommu;
>   const struct mtk_iommu_plat_data *plat_data;
> + struct device   *smicomm_dev;
>  
>   struct dma_iommu_mapping*mapping; /* For mtk_iommu_v1.c */
>  
> -- 
> 2.18.0
> 


Re: [PATCH v5 06/27] dt-bindings: mediatek: Add binding for mt8192 IOMMU

2020-12-23 Thread Tomasz Figa
On Wed, Dec 09, 2020 at 04:00:41PM +0800, Yong Wu wrote:
> This patch adds descriptions for mt8192 IOMMU and SMI.
> 
> mt8192 also is MTK IOMMU gen2 which uses ARM Short-Descriptor translation
> table format. The M4U-SMI HW diagram is as below:
> 
>                 EMI
>                  |
>                 M4U
>                  |
>            -------------
>             SMI Common
>            -------------
>                  |
>    +------+------+------+------ .. ------+------+
>    |      |      |      |       ..       |      |
>    |      |      |      |                |      |
>  larb0  larb1  larb2  larb4     ..     larb19  larb20
>  disp0  disp1   mdp    vdec             IPE     IPE
> 
> All the connections are HW fixed, SW can NOT adjust it.
> 
> mt8192 M4U supports 0~16GB iova range. We preassign different engines
> into different iova ranges:
> 
> domain-id   module    iova-range   larbs
>     0       disp      0 ~ 4G       larb0/1
>     1       vcodec    4G ~ 8G      larb4/5/7
>     2       cam/mdp   8G ~ 12G     larb2/9/11/13/14/16/17/18/19/20

Why do we preassign these addresses in DT? Shouldn't it be a user's or
integrator's decision to split the 16 GB address range into sub-ranges
and define which larbs those sub-ranges are shared with?

Best regards,
Tomasz

>     3       CCU0      0x4000_0000 ~ 0x43ff_ffff    larb13: port 9/10
>     4       CCU1      0x4400_0000 ~ 0x47ff_ffff    larb14: port 4/5
> 
> The iova range for CCU0/1(camera control unit) is HW requirement.
> 
> Signed-off-by: Yong Wu 
> Reviewed-by: Rob Herring 
> ---
>  .../bindings/iommu/mediatek,iommu.yaml|  18 +-
>  include/dt-bindings/memory/mt8192-larb-port.h | 240 ++
>  2 files changed, 257 insertions(+), 1 deletion(-)
>  create mode 100644 include/dt-bindings/memory/mt8192-larb-port.h
> 
> diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml 
> b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> index ba6626347381..0f26fe14c8e2 100644
> --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> @@ -76,6 +76,7 @@ properties:
>- mediatek,mt8167-m4u  # generation two
>- mediatek,mt8173-m4u  # generation two
>- mediatek,mt8183-m4u  # generation two
> +  - mediatek,mt8192-m4u  # generation two
>  
>- description: mt7623 generation one
>  items:
> @@ -115,7 +116,11 @@ properties:
>dt-binding/memory/mt6779-larb-port.h for mt6779,
>dt-binding/memory/mt8167-larb-port.h for mt8167,
>dt-binding/memory/mt8173-larb-port.h for mt8173,
> -  dt-binding/memory/mt8183-larb-port.h for mt8183.
> +  dt-binding/memory/mt8183-larb-port.h for mt8183,
> +  dt-binding/memory/mt8192-larb-port.h for mt8192.
> +
> +  power-domains:
> +maxItems: 1
>  
>  required:
>- compatible
> @@ -133,11 +138,22 @@ allOf:
>- mediatek,mt2701-m4u
>- mediatek,mt2712-m4u
>- mediatek,mt8173-m4u
> +  - mediatek,mt8192-m4u
>  
>  then:
>required:
>  - clocks
>  
> +  - if:
> +  properties:
> +compatible:
> +  enum:
> +- mediatek,mt8192-m4u
> +
> +then:
> +  required:
> +- power-domains
> +
>  additionalProperties: false
>  
>  examples:
> diff --git a/include/dt-bindings/memory/mt8192-larb-port.h 
> b/include/dt-bindings/memory/mt8192-larb-port.h
> new file mode 100644
> index ..ec1ac2ba7094
> --- /dev/null
> +++ b/include/dt-bindings/memory/mt8192-larb-port.h
> @@ -0,0 +1,240 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2020 MediaTek Inc.
> + *
> + * Author: Chao Hao 
> + * Author: Yong Wu 
> + */
> +#ifndef _DT_BINDINGS_MEMORY_MT8192_LARB_PORT_H_
> +#define _DT_BINDINGS_MEMORY_MT8192_LARB_PORT_H_
> +
> +#include 
> +
> +/*
> + * MM IOMMU:
> + * domain 0: display: larb0, larb1.
> + * domain 1: vcodec: larb4, larb5, larb7.
> + * domain 2: CAM/MDP: larb2, larb9, larb11, larb13, larb14, larb16,
> + *   larb17, larb18, larb19, larb20,
> + * domain 3: CCU0: larb13 - port9/10.
> + * domain 4: CCU1: larb14 - port4/5.
> + *
> + * larb3/6/8/10/12/15 is null.
> + */
> +
> +/* larb0 */
> +#define M4U_PORT_L0_DISP_POSTMASK0   MTK_M4U_DOM_ID(0, 0, 0)
> +#define M4U_PORT_L0_OVL_RDMA0_HDRMTK_M4U_DOM_ID(0, 0, 1)
> +#define M4U_PORT_L0_OVL_RDMA0MTK_M4U_DOM_ID(0, 0, 2)
> +#define M4U_PORT_L0_DISP_RDMA0   MTK_M4U_DOM_ID(0, 0, 3)
> +#define M4U_PORT_L0_DISP_WDMA0   MTK_M4U_DOM_ID(0, 0, 4)
> +#define M4U_PORT_L0_DISP_FAKE0   MTK_M4U_DOM_ID(0, 0, 5)
> +
> +/* larb1 */
> +#define M4U_PORT_L1_OVL_2L_RDMA0_HDR 

Re: [PATCH v5 15/27] iommu/mediatek: Add fail handle for sysfs_add and device_register

2020-12-23 Thread Tomasz Figa
On Wed, Dec 09, 2020 at 04:00:50PM +0800, Yong Wu wrote:
> Add fail handle for iommu_device_sysfs_add and iommu_device_register.
> 
> Fixes: b16c0170b53c ("iommu/mediatek: Make use of iommu_device_register 
> interface")
> Signed-off-by: Yong Wu 
> ---
>  drivers/iommu/mtk_iommu.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 39478cfbe0f1..09c8c58feb78 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -746,7 +746,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>  
>   ret = iommu_device_register(&data->iommu);
>   if (ret)
> - return ret;
> + goto out_sysfs_remove;
>  
>   spin_lock_init(&data->tlb_lock);
>   list_add_tail(&data->list, &m4ulist);
> @@ -754,7 +754,16 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   if (!iommu_present(&platform_bus_type))
>   bus_set_iommu(&platform_bus_type, &mtk_iommu_ops);
>  
> - return component_master_add_with_match(dev, &mtk_iommu_com_ops, match);
> + ret = component_master_add_with_match(dev, &mtk_iommu_com_ops, match);
> + if (ret)
> + goto out_dev_unreg;
> + return ret;
> +
> +out_dev_unreg:

Shouldn't other operations be undone as well? I can see that, above,
bus_set_iommu() is called and an entry is added to m4ulist.
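
Something like the below, perhaps (just a sketch; whether bus_set_iommu()
can simply be reverted here probably needs a closer look, since another M4U
instance may have set it already):

	ret = component_master_add_with_match(dev, &mtk_iommu_com_ops, match);
	if (ret)
		goto out_list_del;
	return 0;

out_list_del:
	list_del(&data->list);
	iommu_device_unregister(&data->iommu);
out_sysfs_remove:
	iommu_device_sysfs_remove(&data->iommu);
	return ret;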

> + iommu_device_unregister(&data->iommu);
> +out_sysfs_remove:
> + iommu_device_sysfs_remove(&data->iommu);
> + return ret;
>  }
>  
>  static int mtk_iommu_remove(struct platform_device *pdev)
> -- 
> 2.18.0
> 


Re: [PATCH v5 09/27] iommu/io-pgtable-arm-v7s: Extend PA34 for MediaTek

2020-12-23 Thread Tomasz Figa
On Wed, Dec 09, 2020 at 04:00:44PM +0800, Yong Wu wrote:
> MediaTek extend the bit5 in lvl1 and lvl2 descriptor as PA34.
> 
> Signed-off-by: Yong Wu 
> Acked-by: Will Deacon 
> Reviewed-by: Robin Murphy 
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 9 +++--
>  drivers/iommu/mtk_iommu.c  | 2 +-
>  include/linux/io-pgtable.h | 4 ++--
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index e880745ab1e8..4d0aa079470f 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -112,9 +112,10 @@
>  #define ARM_V7S_TEX_MASK 0x7
>  #define ARM_V7S_ATTR_TEX(val)(((val) & ARM_V7S_TEX_MASK) << 
> ARM_V7S_TEX_SHIFT)
>  
> -/* MediaTek extend the two bits for PA 32bit/33bit */
> +/* MediaTek extend the bits below for PA 32bit/33bit/34bit */
>  #define ARM_V7S_ATTR_MTK_PA_BIT32BIT(9)
>  #define ARM_V7S_ATTR_MTK_PA_BIT33BIT(4)
> +#define ARM_V7S_ATTR_MTK_PA_BIT34BIT(5)
>  
>  /* *well, except for TEX on level 2 large pages, of course :( */
>  #define ARM_V7S_CONT_PAGE_TEX_SHIFT  6
> @@ -194,6 +195,8 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, 
> int lvl,
>   pte |= ARM_V7S_ATTR_MTK_PA_BIT32;
>   if (paddr & BIT_ULL(33))
>   pte |= ARM_V7S_ATTR_MTK_PA_BIT33;
> + if (paddr & BIT_ULL(34))
> + pte |= ARM_V7S_ATTR_MTK_PA_BIT34;
>   return pte;
>  }
>  
> @@ -218,6 +221,8 @@ static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int 
> lvl,
>   paddr |= BIT_ULL(32);
>   if (pte & ARM_V7S_ATTR_MTK_PA_BIT33)
>   paddr |= BIT_ULL(33);
> + if (pte & ARM_V7S_ATTR_MTK_PA_BIT34)
> + paddr |= BIT_ULL(34);
>   return paddr;
>  }
>  
> @@ -754,7 +759,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
> io_pgtable_cfg *cfg,
>   if (cfg->ias > ARM_V7S_ADDR_BITS)
>   return NULL;
>  
> - if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
> + if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))
>   return NULL;
>  
>   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 6451d83753e1..ec3c87d4b172 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -320,7 +320,7 @@ static int mtk_iommu_domain_finalise(struct 
> mtk_iommu_domain *dom)
>   IO_PGTABLE_QUIRK_ARM_MTK_EXT,
>   .pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
>   .ias = 32,
> - .oas = 34,
> + .oas = 35,

Shouldn't this be set according to the real hardware capabilities,
instead of always setting it to 35?
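
E.g. with a per-SoC flag in plat_data, roughly (the flag name below is made
up, just to illustrate the idea):

	.oas = data->plat_data->has_35bit_pa ? 35 : 34,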

Best regards,
Tomasz

>   .tlb = &mtk_iommu_flush_ops,
>   .iommu_dev = data->dev,
>   };
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 4cde111e425b..1ae0757f4f94 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -77,8 +77,8 @@ struct io_pgtable_cfg {
>*  TLB maintenance when mapping as well as when unmapping.
>*
>* IO_PGTABLE_QUIRK_ARM_MTK_EXT: (ARM v7s format) MediaTek IOMMUs extend
> -  *  to support up to 34 bits PA where the bit32 and bit33 are
> -  *  encoded in the bit9 and bit4 of the PTE respectively.
> +  *  to support up to 35 bits PA where the bit32, bit33 and bit34 are
> +  *  encoded in the bit9, bit4 and bit5 of the PTE respectively.
>*
>* IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
>*  on unmap, for DMA domains using the flush queue mechanism for
> -- 
> 2.18.0
> 


Re: [PATCH v5 04/27] dt-bindings: memory: mediatek: Add domain definition

2020-12-23 Thread Tomasz Figa
Hi Yong,

On Wed, Dec 09, 2020 at 04:00:39PM +0800, Yong Wu wrote:
> In the latest SoC, there are several HW IPs that require a special iova
> range; mainly CCU and VPU have this requirement. Take CCU as an example:
> CCU requires its iova to be located in the range (0x4000_0000 ~ 0x43ff_ffff).

Is this really a domain? Does the address range come from the design of
the IOMMU?

Best regards,
Tomasz

> 
> In this patch we add a domain definition for the special port. In the
> example of CCU, If we preassign CCU port in domain1, then iommu driver
> will prepare an independent iommu domain of the special iova range for it,
> then the iova got from dma_alloc_attrs(ccu-dev) will locate in its special
> range.
> 
> This is a preparing patch for multi-domain support.
> 
> Signed-off-by: Yong Wu 
> Acked-by: Krzysztof Kozlowski 
> Acked-by: Rob Herring 
> ---
>  include/dt-bindings/memory/mtk-smi-larb-port.h | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/dt-bindings/memory/mtk-smi-larb-port.h 
> b/include/dt-bindings/memory/mtk-smi-larb-port.h
> index 7d64103209af..2d4c973c174f 100644
> --- a/include/dt-bindings/memory/mtk-smi-larb-port.h
> +++ b/include/dt-bindings/memory/mtk-smi-larb-port.h
> @@ -7,9 +7,16 @@
>  #define __DT_BINDINGS_MEMORY_MTK_MEMORY_PORT_H_
>  
>  #define MTK_LARB_NR_MAX  32
> +#define MTK_M4U_DOM_NR_MAX   8
> +
> +#define MTK_M4U_DOM_ID(domid, larb, port)\
> + (((domid) & 0x7) << 16 | (((larb) & 0x1f) << 5) | ((port) & 0x1f))
> +
> +/* The default dom id is 0. */
> +#define MTK_M4U_ID(larb, port)   MTK_M4U_DOM_ID(0, larb, port)
>  
> -#define MTK_M4U_ID(larb, port)   (((larb) << 5) | (port))
>  #define MTK_M4U_TO_LARB(id)  (((id) >> 5) & 0x1f)
>  #define MTK_M4U_TO_PORT(id)  ((id) & 0x1f)
> +#define MTK_M4U_TO_DOM(id)   (((id) >> 16) & 0x7)
>  
>  #endif
> -- 
> 2.18.0
> 


Re: [PATCH v3 5/6] media: uvcvideo: Use dma_alloc_noncontiguos API

2020-12-07 Thread Tomasz Figa
Hi Christoph,

On Tue, Dec 1, 2020 at 11:49 PM Christoph Hellwig  wrote:
>
> On Tue, Dec 01, 2020 at 12:36:58PM +0900, Sergey Senozhatsky wrote:
> > Not that I have any sound experience in this area, but the helper
> > probably won't hurt. Do you also plan to add vmap() to that helper
> > or dma_alloc_noncontiguous()/sg_alloc_table_from_pages() only?
>
> Yes, I think adding the vmap is useful, and it probably makes sense
> to do that unconditionally.  I'd also include the fallback to
> dma_alloc_pages when the noncontig version isn't supported in the
> helper.

From the media perspective, it would be good to have the vmap
optional, similarly to the DMA_ATTR_NO_KERNEL_MAPPING attribute for
coherent allocations. Actually, in the media drivers, the need to have
a kernel mapping of the DMA buffers corresponds to a minority of the
drivers. Most of them only need to map them to the userspace.

Nevertheless, that minority actually happens to be quite widely used,
e.g. the uvcvideo driver, so we can't go to the other extreme and just
drop the vmap at all.

In any case, Sergey is going to share a preliminary patch on how the
current API would be used in the V4L2 videobuf2 framework. That should
give us more input on how such a helper could look.

Other than that, again, thanks a lot for helping with this.

Best regards,
Tomasz


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-11-10 Thread Tomasz Figa
On Tue, Nov 10, 2020 at 6:33 PM Ricardo Ribalda  wrote:
>
> Hi Christoph
>
> On Tue, Nov 10, 2020 at 10:25 AM Christoph Hellwig  wrote:
> >
> > On Mon, Nov 09, 2020 at 03:53:55PM +0100, Ricardo Ribalda wrote:
> > > Hi Christoph
> > >
> > > I have started now to give a try to your patchset. Sorry for the delay.
> > >
> > > For uvc I have prepared this patch:
> > > https://github.com/ribalda/linux/commit/9094fe223fe38f8c8ff21366d893b43cbbdf0113
> > >
> > > I have tested successfully in a x86_64 noteboot..., yes I know there
> > > is no change for that platform :).
> > > I am trying to get hold of an arm device that can run the latest
> > > kernel from upstream.
> > >
> > > On the meanwhile if you could take a look to the patch to verify that
> > > this the way that you expect the drivers to use your api I would
> > > appreciate it
> >
> > This looks pretty reaosnable.
> >
>
> Great
>

Thanks Christoph for taking a look quickly.

> Also FYI, I managed to boot an ARM device with that tree. But I could
> not test the uvc driver (it was a remote device with no usb device
> attached)
>
> Hopefully I will be able to test it for real this week.
>
> Any suggestions for how to measure performance difference?

Back in time Kieran (+CC) shared a patch to add extra statistics for
packet processing and payload assembly, with results of various
approaches summarized in a spreadsheet:
https://docs.google.com/spreadsheets/d/1uPdbdVcebO9OQ0LQ8hR2LGIEySWgSnGwwhzv7LPXAlU/edit#gid=0

That and just simple CPU usage comparison would be enough.

>
> Thanks!
>
> > Note that ifdef  CONFIG_DMA_NONCOHERENT in the old code doesn't actually
> > work, as that option is an internal thing just for mips and sh..

In what sense does it not actually work? Last time I checked, some
platforms actually defined CONFIG_DMA_NONCOHERENT, so those would
instead use the kmalloc() + dma_map() path. I don't have any
background on why that was added and whether it needs to be preserved,
though. Kieran, Laurent, do you have any insight?

Best regards,
Tomasz


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-14 Thread Tomasz Figa
+CC Ricardo who will be looking into using this in the USB stack (UVC
camera driver).

On Wed, Sep 30, 2020 at 6:09 PM Christoph Hellwig  wrote:
>
> Add a new API that returns a virtually non-contigous array of pages
> and dma address.  This API is only implemented for dma-iommu and will
> not be implemented for non-iommu DMA API instances that have to allocate
> contiguous memory.  It is up to the caller to check if the API is
> available.
>
> The intent is that media drivers can use this API if either:
>
>  - no kernel mapping or only temporary kernel mappings are required.
>That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
>  - a kernel mapping is required for cached and DMA mapped pages, but
>the driver also needs the pages to e.g. map them to userspace.
>In that sense it is a replacement for some aspects of the recently
>removed and never fully implemented DMA_ATTR_NON_CONSISTENT
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/dma-iommu.c   | 73 +
>  include/linux/dma-mapping.h |  9 +
>  kernel/dma/mapping.c| 35 ++
>  3 files changed, 93 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7922f545cd5eef..158026a856622c 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -565,23 +565,12 @@ static struct page **__iommu_dma_alloc_pages(struct 
> device *dev,
> return pages;
>  }
>
> -/**
> - * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
> - * @dev: Device to allocate memory for. Must be a real device
> - *  attached to an iommu_dma_domain
> - * @size: Size of buffer in bytes
> - * @dma_handle: Out argument for allocated DMA handle
> - * @gfp: Allocation flags
> - * @prot: pgprot_t to use for the remapped mapping
> - * @attrs: DMA attributes for this allocation
> - *
> - * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
> +/*
> + * If size is less than PAGE_SIZE, then a full CPU page will be allocated,
>   * but an IOMMU which supports smaller pages might not map the whole thing.
> - *
> - * Return: Mapped virtual address, or NULL on failure.
>   */
> -static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
> -   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> +static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
> +   size_t size, dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> unsigned long attrs)
>  {
> struct iommu_domain *domain = iommu_get_dma_domain(dev);
> @@ -593,7 +582,6 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> struct page **pages;
> struct sg_table sgt;
> dma_addr_t iova;
> -   void *vaddr;
>
> *dma_handle = DMA_MAPPING_ERROR;
>
> @@ -636,17 +624,10 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> < size)
> goto out_free_sg;
>
> -   vaddr = dma_common_pages_remap(pages, size, prot,
> -   __builtin_return_address(0));
> -   if (!vaddr)
> -   goto out_unmap;
> -
> *dma_handle = iova;
> sg_free_table(&sgt);
> -   return vaddr;
> +   return pages;
>
> -out_unmap:
> -   __iommu_dma_unmap(dev, iova, size);
>  out_free_sg:
> sg_free_table(&sgt);
>  out_free_iova:
> @@ -656,6 +637,46 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> return NULL;
>  }
>
> +static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
> +   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> +   unsigned long attrs)
> +{
> +   struct page **pages;
> +   void *vaddr;
> +
> +   pages = __iommu_dma_alloc_noncontiguous(dev, size, dma_handle, gfp,
> +   prot, attrs);
> +   if (!pages)
> +   return NULL;
> +   vaddr = dma_common_pages_remap(pages, size, prot,
> +   __builtin_return_address(0));
> +   if (!vaddr)
> +   goto out_unmap;
> +   return vaddr;
> +
> +out_unmap:
> +   __iommu_dma_unmap(dev, *dma_handle, size);
> +   __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
> +   return NULL;
> +}
> +
> +#ifdef CONFIG_DMA_REMAP
> +static struct page **iommu_dma_alloc_noncontiguous(struct device *dev,
> +   size_t size, dma_addr_t *dma_handle, gfp_t gfp,
> +   unsigned long attrs)
> +{
> +   return __iommu_dma_alloc_noncontiguous(dev, size, dma_handle, gfp,
> +  PAGE_KERNEL, attrs);
> +}
> +
> +static void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
> +   struct page **pages, dma_addr_t dma_handle)
> +{
> +   __iommu_dma_unmap(dev, dma_handle, size);
> +   

Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-07 Thread Tomasz Figa
On Wed, Oct 7, 2020 at 8:21 AM Christoph Hellwig  wrote:
>
> On Tue, Oct 06, 2020 at 10:56:04PM +0200, Tomasz Figa wrote:
> > > Yes.  And make sure the API isn't implemented when VIVT caches are
> > > used, but that isn't really different from the current interface.
> >
> > Okay, thanks. Let's see if we can make necessary changes to the videobuf2.
> >
> > +Sergey Senozhatsky for awareness too.
>
> I can defer the changes a bit to see if you'd really much prefer
> the former interface.  I think for now the most important thing is
> that it works properly for the potential users, and the prime one is
> videobuf2 for now.  drm also seems like a big potential users, but I
> had a really hard time getting the developers to engage in API
> development.

My initial feeling is that it should work, but we'll give you a
definitive answer once we prototype it. :)

We might actually give it a try in the USB HCD subsystem as well, to
implement usb_alloc_noncoherent(), as an optimization for drivers
which have to perform multiple random accesses to the URB buffers. I
think you might recall discussing this by way of the pwc and
uvcvideo camera drivers.

Best regards,
Tomasz


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-06 Thread Tomasz Figa
On Mon, Oct 5, 2020 at 10:26 AM Christoph Hellwig  wrote:
>
> On Fri, Oct 02, 2020 at 05:50:40PM +, Tomasz Figa wrote:
> > Hi Christoph,
> >
> > On Wed, Sep 30, 2020 at 06:09:17PM +0200, Christoph Hellwig wrote:
> > > Add a new API that returns a virtually non-contigous array of pages
> > > and dma address.  This API is only implemented for dma-iommu and will
> > > not be implemented for non-iommu DMA API instances that have to allocate
> > > contiguous memory.  It is up to the caller to check if the API is
> > > available.
> >
> > Would you mind shedding some more light on what made the previous attempt
> > not work well? I liked the previous API because it was more consistent with
> > the regular dma_alloc_coherent().
>
> The problem is that with a dma_alloc_noncoherent that can return pages
> not in the kernel mapping we can't just use virt_to_page to fill in
> scatterlists or mmap the buffer to userspace, but would need new helpers
> and another two methods.
>
> > >  - no kernel mapping or only temporary kernel mappings are required.
> > >That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
> > >  - a kernel mapping is required for cached and DMA mapped pages, but
> > >the driver also needs the pages to e.g. map them to userspace.
> > >In that sense it is a replacement for some aspects of the recently
> > >removed and never fully implemented DMA_ATTR_NON_CONSISTENT
> >
> > What's the expected allocation and mapping flow with the latter? Would that 
> > be
> >
> > pages = dma_alloc_noncoherent(...)
> > vaddr = vmap(pages, ...);
> >
> > ?
>
> Yes.  Witht the vmap step optional for replacements of
> DMA_ATTR_NO_KERNEL_MAPPING, which is another nightmare to deal with.
>
> > Would one just use the usual dma_sync_for_{cpu,device}() for cache
> > invalidate/clean, while keeping the mapping in place?
>
> Yes.  And make sure the API isn't implemented when VIVT caches are
> used, but that isn't really different from the current interface.

Okay, thanks. Let's see if we can make necessary changes to the videobuf2.

+Sergey Senozhatsky for awareness too.

Best regrards,
Tomasz


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-02 Thread Tomasz Figa
Hi Christoph,

On Wed, Sep 30, 2020 at 06:09:17PM +0200, Christoph Hellwig wrote:
> Add a new API that returns a virtually non-contigous array of pages
> and dma address.  This API is only implemented for dma-iommu and will
> not be implemented for non-iommu DMA API instances that have to allocate
> contiguous memory.  It is up to the caller to check if the API is
> available.

Would you mind shedding some more light on what made the previous attempt
not work well? I liked the previous API because it was more consistent with
the regular dma_alloc_coherent().

> 
> The intent is that media drivers can use this API if either:

FWIW, the USB subsystem also has similar needs, and so do some DRM drivers
using DMA API rather than IOMMU API directly. Basically I believe that all
the users removed in your previous series relied on custom downstream
patches to make DMA_ATTR_NON_CONSISTENT work and could be finally made work
in upstream using this API.

> 
>  - no kernel mapping or only temporary kernel mappings are required.
>That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
>  - a kernel mapping is required for cached and DMA mapped pages, but
>the driver also needs the pages to e.g. map them to userspace.
>In that sense it is a replacement for some aspects of the recently
>removed and never fully implemented DMA_ATTR_NON_CONSISTENT

What's the expected allocation and mapping flow with the latter? Would that be

pages = dma_alloc_noncoherent(...)
vaddr = vmap(pages, ...);

?

Would one just use the usual dma_sync_for_{cpu,device}() for cache
invalidate/clean, while keeping the mapping in place?
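
In videobuf2 terms that would look roughly like the below (sketch only,
using the names from above; I'm leaving the freeing call out, since that's
exactly the part of the API still being designed):

	pages = dma_alloc_noncoherent(dev, size, &dma_addr, dir, gfp);
	vaddr = vmap(pages, PAGE_ALIGN(size) >> PAGE_SHIFT, VM_MAP, PAGE_KERNEL);

	/* ... device DMA via dma_addr ... */

	dma_sync_single_for_cpu(dev, dma_addr, size, dir);
	/* ... CPU accesses via vaddr ... */
	dma_sync_single_for_device(dev, dma_addr, size, dir);

	vunmap(vaddr);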

Best regards,
Tomasz


Re: [PATCH 17/18] dma-iommu: implement ->alloc_noncoherent

2020-09-26 Thread Tomasz Figa
On Sat, Sep 26, 2020 at 4:14 PM Christoph Hellwig  wrote:
>
> On Fri, Sep 25, 2020 at 06:46:22PM +, Tomasz Figa wrote:
> > > +static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
> > > +   dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
> > > +{
> > > +   if (!gfpflags_allow_blocking(gfp)) {
> > > +   struct page *page;
> > > +
> > > +   page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
> > > +   if (!page)
> > > +   return NULL;
> > > +   return page_address(page);
> > > +   }
> > > +
> > > +   return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
> > > +PAGE_KERNEL, 0);
> >
> > iommu_dma_alloc_remap() makes use of the DMA_ATTR_ALLOC_SINGLE_PAGES 
> > attribute
> > to optimize the allocations for devices which don't care about how 
> > contiguous
> > the backing memory is. Do you think we could add an attrs argument to this
> > function and pass it there?
> >
> > As ARM is being moved to the common iommu-dma layer as well, we'll probably
> > make use of the argument to support the DMA_ATTR_NO_KERNEL_MAPPING 
> > attribute to
> > conserve the vmalloc area.
>
> We could probably at it.  However I wonder why this is something the
> drivers should care about.  Isn't this really something that should
> be a kernel-wide policy for a given system?

There are IOMMUs out there which support huge pages and those can
benefit *some* hardware depending on what kind of accesses they
perform, possibly on a per-buffer basis. At the same time, order > 0
allocations can be expensive, significantly affecting allocation
latency, so for devices which don't care about huge pages anyone would
prefer simple single-page allocations. Currently the drivers know the
best on whether the hardware they drive would care. There are some
decision factors listed in the documentation [1].

I can imagine cases where the driver might not be the best place to decide
about this - for example, the workload could vary depending on the
userspace or a product decision regarding the performance vs
allocation latency, but we haven't seen such cases in practice yet.

[1] https://www.kernel.org/doc/html/latest/core-api/dma-attributes.html?highlight=dma_attr_alloc_single_pages#dma-attr-alloc-single-pages
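
On the driver side it then boils down to passing the attribute to the
allocation, e.g.:

	/* The device only needs page-granular access, so avoid the cost
	 * of higher-order allocations. */
	vaddr = dma_alloc_attrs(dev, size, &dma_addr, GFP_KERNEL,
				DMA_ATTR_ALLOC_SINGLE_PAGES);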

Best regards,
Tomasz


Re: [PATCH 01/18] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT flag

2020-09-25 Thread Tomasz Figa
Hi Christoph,

On Tue, Sep 15, 2020 at 05:51:05PM +0200, Christoph Hellwig wrote:
> From: Sergey Senozhatsky 
> 
> The patch partially reverts some of the UAPI bits of the buffer
> cache management hints. Namely, the queue consistency (memory
> coherency) user-space hint because, as it turned out, the kernel
> implementation of this feature was misusing DMA_ATTR_NON_CONSISTENT.
> 
> The patch revers both kernel and user space parts: removes the
> DMA consistency attr functions, rollbacks changes to v4l2_requestbuffers,
> v4l2_create_buffers structures and corresponding UAPI functions
> (plus compat32 layer) and cleanups the documentation.
> 
> Signed-off-by: Christoph Hellwig 
> Signed-off-by: Sergey Senozhatsky 
> Signed-off-by: Christoph Hellwig 
> ---
>  .../userspace-api/media/v4l/buffer.rst| 17 ---
>  .../media/v4l/vidioc-create-bufs.rst  |  6 +--
>  .../media/v4l/vidioc-reqbufs.rst  | 12 +
>  .../media/common/videobuf2/videobuf2-core.c   | 46 +++
>  .../common/videobuf2/videobuf2-dma-contig.c   | 19 
>  .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
>  .../media/common/videobuf2/videobuf2-v4l2.c   | 18 +---
>  drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 10 +---
>  drivers/media/v4l2-core/v4l2-ioctl.c  |  5 +-
>  include/media/videobuf2-core.h|  7 +--
>  include/uapi/linux/videodev2.h    | 13 +-----
>  11 files changed, 22 insertions(+), 134 deletions(-)

Acked-by: Tomasz Figa 

Best regards,
Tomasz


Re: [PATCH 17/18] dma-iommu: implement ->alloc_noncoherent

2020-09-25 Thread Tomasz Figa
Hi Christoph,

On Tue, Sep 15, 2020 at 05:51:21PM +0200, Christoph Hellwig wrote:
> Implement the alloc_noncoherent method to provide memory that is neither
> coherent not contiguous.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/dma-iommu.c | 41 +++
>  1 file changed, 37 insertions(+), 4 deletions(-)
> 

Sorry for being late to the party and thanks a lot for the patch. Please see my
comments inline.

[snip]
> @@ -1052,6 +1055,34 @@ static void *iommu_dma_alloc(struct device *dev, 
> size_t size,
>   return cpu_addr;
>  }
>  
> +#ifdef CONFIG_DMA_REMAP
> +static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
> + dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
> +{
> + if (!gfpflags_allow_blocking(gfp)) {
> + struct page *page;
> +
> + page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
> + if (!page)
> + return NULL;
> + return page_address(page);
> + }
> +
> + return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
> +  PAGE_KERNEL, 0);

iommu_dma_alloc_remap() makes use of the DMA_ATTR_ALLOC_SINGLE_PAGES attribute
to optimize the allocations for devices which don't care about how contiguous
the backing memory is. Do you think we could add an attrs argument to this
function and pass it there?

As ARM is being moved to the common iommu-dma layer as well, we'll probably
make use of the argument to support the DMA_ATTR_NO_KERNEL_MAPPING attribute to
conserve the vmalloc area.
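
I.e. roughly something like this (sketch only; it would of course also mean
plumbing attrs through the corresponding dma_map_ops callback):

static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
		dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp,
		unsigned long attrs)
{
	...
	return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
				     PAGE_KERNEL, attrs);
}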

Best regards,
Tomasz


Re: [PATCH v10 30/30] videobuf2: use sgtable-based scatterlist wrappers

2020-09-10 Thread Tomasz Figa
On Thu, Sep 10, 2020 at 11:17 AM Hans Verkuil  wrote:
>
> On 04/09/2020 15:17, Marek Szyprowski wrote:
> > Use recently introduced common wrappers operating directly on the struct
> > sg_table objects and scatterlist page iterators to make the code a bit
> > more compact, robust, easier to follow and copy/paste safe.
> >
> > No functional change, because the code already properly did all the
> > scatterlist related calls.
> >
> > Signed-off-by: Marek Szyprowski 
> > Reviewed-by: Robin Murphy 
>
> Acked-by: Hans Verkuil 
>
> Note that I agree with Marek to keep returning -EIO. If we want to propagate
> low-level errors, then that should be done in a separate patch. But I think 
> EIO
> is fine.

As I mentioned, there are 2 different cases here - UAPI and kAPI. I
agree that we should keep -EIO for UAPI, but kAPI is another story.
But if we're convinced that -EIO is also fine for the latter, I'm fine
with that.

Best regards,
Tomasz

>
> Regards,
>
> Hans
>
> > ---
> >  .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---
> >  .../media/common/videobuf2/videobuf2-dma-sg.c | 32 +++--
> >  .../common/videobuf2/videobuf2-vmalloc.c  | 12 +++
> >  3 files changed, 31 insertions(+), 47 deletions(-)
> >
> > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
> > b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > index ec3446cc45b8..1b242d844dde 100644
> > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > @@ -58,10 +58,10 @@ static unsigned long vb2_dc_get_contiguous_size(struct 
> > sg_table *sgt)
> >   unsigned int i;
> >   unsigned long size = 0;
> >
> > - for_each_sg(sgt->sgl, s, sgt->nents, i) {
> > + for_each_sgtable_dma_sg(sgt, s, i) {
> >   if (sg_dma_address(s) != expected)
> >   break;
> > - expected = sg_dma_address(s) + sg_dma_len(s);
> > + expected += sg_dma_len(s);
> >   size += sg_dma_len(s);
> >   }
> >   return size;
> > @@ -103,8 +103,7 @@ static void vb2_dc_prepare(void *buf_priv)
> >   if (!sgt)
> >   return;
> >
> > - dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
> > -buf->dma_dir);
> > + dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
> >  }
> >
> >  static void vb2_dc_finish(void *buf_priv)
> > @@ -115,7 +114,7 @@ static void vb2_dc_finish(void *buf_priv)
> >   if (!sgt)
> >   return;
> >
> > - dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, 
> > buf->dma_dir);
> > + dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
> >  }
> >
> >  /*/
> > @@ -275,8 +274,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf 
> > *dbuf,
> >* memory locations do not require any explicit cache
> >* maintenance prior or after being used by the device.
> >*/
> > - dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > -attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > + dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> > +   DMA_ATTR_SKIP_CPU_SYNC);
> >   sg_free_table(sgt);
> >   kfree(attach);
> >   db_attach->priv = NULL;
> > @@ -301,8 +300,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
> >
> >   /* release any previous cache */
> >   if (attach->dma_dir != DMA_NONE) {
> > - dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> > -attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > + dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> > +   DMA_ATTR_SKIP_CPU_SYNC);
> >   attach->dma_dir = DMA_NONE;
> >   }
> >
> > @@ -310,9 +309,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
> >* mapping to the client with new direction, no cache sync
> >* required see comment in vb2_dc_dmabuf_ops_detach()
> >*/
> > - sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, 
> > sgt->orig_nents,
> > -   dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> > - if (!sgt->nents) {
> > + if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
> > + DMA_ATTR_SKIP_CPU_SYNC)) {
> >   pr_err("failed to map scatterlist\n");
> >   mutex_unlock(lock);
> >   return ERR_PTR(-EIO);
> > @@ -455,8 +453,8 @@ static void vb2_dc_put_userptr(void *buf_priv)
> >* No need to sync to CPU, it's already synced to the CPU
> >* since the finish() memop will have been called before this.
> >*/
> > - dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
> > -buf->dma_dir, 

Re: [PATCH v10 30/30] videobuf2: use sgtable-based scatterlist wrappers

2020-09-07 Thread Tomasz Figa
On Mon, Sep 7, 2020 at 4:02 PM Marek Szyprowski
 wrote:
>
> Hi Tomasz,
>
> On 07.09.2020 15:07, Tomasz Figa wrote:
> > On Fri, Sep 4, 2020 at 3:35 PM Marek Szyprowski
> >  wrote:
> >> Use recently introduced common wrappers operating directly on the struct
> >> sg_table objects and scatterlist page iterators to make the code a bit
> >> more compact, robust, easier to follow and copy/paste safe.
> >>
> >> No functional change, because the code already properly did all the
> >> scatterlist related calls.
> >>
> >> Signed-off-by: Marek Szyprowski 
> >> Reviewed-by: Robin Murphy 
> >> ---
> >>   .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---
> >>   .../media/common/videobuf2/videobuf2-dma-sg.c | 32 +++--
> >>   .../common/videobuf2/videobuf2-vmalloc.c  | 12 +++
> >>   3 files changed, 31 insertions(+), 47 deletions(-)
> >>
> > Thanks for the patch! Please see my comments inline.
> >
> >> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
> >> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> index ec3446cc45b8..1b242d844dde 100644
> >> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> >> @@ -58,10 +58,10 @@ static unsigned long vb2_dc_get_contiguous_size(struct 
> >> sg_table *sgt)
> >>  unsigned int i;
> >>  unsigned long size = 0;
> >>
> >> -   for_each_sg(sgt->sgl, s, sgt->nents, i) {
> >> +   for_each_sgtable_dma_sg(sgt, s, i) {
> >>  if (sg_dma_address(s) != expected)
> >>  break;
> >> -   expected = sg_dma_address(s) + sg_dma_len(s);
> >> +   expected += sg_dma_len(s);
> >>  size += sg_dma_len(s);
> >>  }
> >>  return size;
> >> @@ -103,8 +103,7 @@ static void vb2_dc_prepare(void *buf_priv)
> >>  if (!sgt)
> >>  return;
> >>
> >> -   dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
> >> -  buf->dma_dir);
> >> +   dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
> >>   }
> >>
> >>   static void vb2_dc_finish(void *buf_priv)
> >> @@ -115,7 +114,7 @@ static void vb2_dc_finish(void *buf_priv)
> >>  if (!sgt)
> >>  return;
> >>
> >> -   dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, 
> >> buf->dma_dir);
> >> +   dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
> >>   }
> >>
> >>   /*/
> >> @@ -275,8 +274,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf 
> >> *dbuf,
> >>   * memory locations do not require any explicit cache
> >>   * maintenance prior or after being used by the device.
> >>   */
> >> -   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, 
> >> sgt->orig_nents,
> >> -  attach->dma_dir, 
> >> DMA_ATTR_SKIP_CPU_SYNC);
> >> +   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> >> + DMA_ATTR_SKIP_CPU_SYNC);
> >>  sg_free_table(sgt);
> >>  kfree(attach);
> >>  db_attach->priv = NULL;
> >> @@ -301,8 +300,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
> >>
> >>  /* release any previous cache */
> >>  if (attach->dma_dir != DMA_NONE) {
> >> -   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, 
> >> sgt->orig_nents,
> >> -  attach->dma_dir, 
> >> DMA_ATTR_SKIP_CPU_SYNC);
> >> +   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> >> + DMA_ATTR_SKIP_CPU_SYNC);
> >>  attach->dma_dir = DMA_NONE;
> >>  }
> >>
> >> @@ -310,9 +309,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
> >>   * mapping to the client with new direction, no cache sync
> >>   * required see comment in vb2_dc_dmabuf_ops_detach()
> >>   */
> >> -   sgt->nents = dma_map_sg_attrs(db_attac

Re: [PATCH v10 30/30] videobuf2: use sgtable-based scatterlist wrappers

2020-09-07 Thread Tomasz Figa
Hi Marek,

On Fri, Sep 4, 2020 at 3:35 PM Marek Szyprowski
 wrote:
>
> Use recently introduced common wrappers operating directly on the struct
> sg_table objects and scatterlist page iterators to make the code a bit
> more compact, robust, easier to follow and copy/paste safe.
>
> No functional change, because the code already properly did all the
> scatterlist related calls.
>
> Signed-off-by: Marek Szyprowski 
> Reviewed-by: Robin Murphy 
> ---
>  .../common/videobuf2/videobuf2-dma-contig.c   | 34 ---
>  .../media/common/videobuf2/videobuf2-dma-sg.c | 32 +++--
>  .../common/videobuf2/videobuf2-vmalloc.c  | 12 +++
>  3 files changed, 31 insertions(+), 47 deletions(-)
>

Thanks for the patch! Please see my comments inline.

> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
> b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index ec3446cc45b8..1b242d844dde 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -58,10 +58,10 @@ static unsigned long vb2_dc_get_contiguous_size(struct 
> sg_table *sgt)
> unsigned int i;
> unsigned long size = 0;
>
> -   for_each_sg(sgt->sgl, s, sgt->nents, i) {
> +   for_each_sgtable_dma_sg(sgt, s, i) {
> if (sg_dma_address(s) != expected)
> break;
> -   expected = sg_dma_address(s) + sg_dma_len(s);
> +   expected += sg_dma_len(s);
> size += sg_dma_len(s);
> }
> return size;
> @@ -103,8 +103,7 @@ static void vb2_dc_prepare(void *buf_priv)
> if (!sgt)
> return;
>
> -   dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
> -  buf->dma_dir);
> +   dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
>  }
>
>  static void vb2_dc_finish(void *buf_priv)
> @@ -115,7 +114,7 @@ static void vb2_dc_finish(void *buf_priv)
> if (!sgt)
> return;
>
> -   dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, 
> buf->dma_dir);
> +   dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
>  }
>
>  /*/
> @@ -275,8 +274,8 @@ static void vb2_dc_dmabuf_ops_detach(struct dma_buf *dbuf,
>  * memory locations do not require any explicit cache
>  * maintenance prior or after being used by the device.
>  */
> -   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> -  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> +   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> + DMA_ATTR_SKIP_CPU_SYNC);
> sg_free_table(sgt);
> kfree(attach);
> db_attach->priv = NULL;
> @@ -301,8 +300,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
>
> /* release any previous cache */
> if (attach->dma_dir != DMA_NONE) {
> -   dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
> -  attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> +   dma_unmap_sgtable(db_attach->dev, sgt, attach->dma_dir,
> + DMA_ATTR_SKIP_CPU_SYNC);
> attach->dma_dir = DMA_NONE;
> }
>
> @@ -310,9 +309,8 @@ static struct sg_table *vb2_dc_dmabuf_ops_map(
>  * mapping to the client with new direction, no cache sync
>  * required see comment in vb2_dc_dmabuf_ops_detach()
>  */
> -   sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, 
> sgt->orig_nents,
> - dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> -   if (!sgt->nents) {
> +   if (dma_map_sgtable(db_attach->dev, sgt, dma_dir,
> +   DMA_ATTR_SKIP_CPU_SYNC)) {
> pr_err("failed to map scatterlist\n");
> mutex_unlock(lock);
> return ERR_PTR(-EIO);

As opposed to dma_map_sg_attrs(), dma_map_sgtable() now returns an
error code on its own. Is it expected to ignore it and return -EIO?
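
I.e. something like this, if we wanted to propagate it (sketch):

	ret = dma_map_sgtable(db_attach->dev, sgt, dma_dir,
			      DMA_ATTR_SKIP_CPU_SYNC);
	if (ret) {
		pr_err("failed to map scatterlist\n");
		mutex_unlock(lock);
		return ERR_PTR(ret);
	}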

> @@ -455,8 +453,8 @@ static void vb2_dc_put_userptr(void *buf_priv)
>  * No need to sync to CPU, it's already synced to the CPU
>  * since the finish() memop will have been called before this.
>  */
> -   dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
> -  buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> +   dma_unmap_sgtable(buf->dev, sgt, buf->dma_dir,
> + DMA_ATTR_SKIP_CPU_SYNC);
> pages = frame_vector_pages(buf->vec);
> /* sgt should exist only if vector contains pages... */
> BUG_ON(IS_ERR(pages));
> @@ -553,9 +551,8 @@ static void *vb2_dc_get_userptr(struct device *dev, 
> unsigned long vaddr,
>   

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-09-01 Thread Tomasz Figa
On Tue, Sep 1, 2020 at 1:06 PM Christoph Hellwig  wrote:
>
> On Thu, Aug 20, 2020 at 07:33:48PM +0200, Tomasz Figa wrote:
> > > It wasn't meant to be too insulting, but I found this out when trying
> > > to figure out how to just disable it.  But it also ends up using
> > > the actual dma attr flags for it's own consistency checks, so just
> > > not setting the flag did not turn out to work that easily.
> > >
> >
> > Yes, sadly the videobuf2 ended up becoming quite counterintuitive
> > after growing over the years, and that is reflected in the design
> > of this feature as well. I think we need to do something about it.
>
> So I'm about to respin the series and wonder how we should proceed.
> I've failed to come up with a clean patch to keep the flag and make
> it a no-op.  Can you or your team give it a spin?
>

Okay, I'll take a look.

> Also I wonder if the flag should be renamed from NON_CONSISTENT
> to NON_COHERENT - the consistent thing is a weird wart from the times
> the old PCI DMA API that is mostly gone now.

It originated from the DMA_ATTR_NON_CONSISTENT flag, but agreed that
NON_COHERENT would be more consistent (pun not intended) with the rest
of the DMA API given the removal of that flag. Let me see if we can
still change it.

Best regards,
Tomasz


Re: [RFC v2 4/5] dt-bindings: of: Add plumbing for restricted DMA pool

2020-08-24 Thread Tomasz Figa
On Tue, Aug 11, 2020 at 11:15 AM Tomasz Figa  wrote:
>
> On Mon, Aug 3, 2020 at 5:15 PM Tomasz Figa  wrote:
> >
> > Hi Claire and Rob,
> >
> > On Mon, Aug 3, 2020 at 4:26 PM Claire Chang  wrote:
> > >
> > > On Sat, Aug 1, 2020 at 4:58 AM Rob Herring  wrote:
> > > >
> > > > On Tue, Jul 28, 2020 at 01:01:39PM +0800, Claire Chang wrote:
> > > > > Introduce the new compatible string, device-swiotlb-pool, for 
> > > > > restricted
> > > > > DMA. One can specify the address and length of the device swiotlb 
> > > > > memory
> > > > > region by device-swiotlb-pool in the device tree.
> > > > >
> > > > > Signed-off-by: Claire Chang 
> > > > > ---
> > > > >  .../reserved-memory/reserved-memory.txt   | 35 
> > > > > +++
> > > > >  1 file changed, 35 insertions(+)
> > > > >
> > > > > diff --git 
> > > > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > >  
> > > > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > > index 4dd20de6977f..78850896e1d0 100644
> > > > > --- 
> > > > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > > +++ 
> > > > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > > @@ -51,6 +51,24 @@ compatible (optional) - standard definition
> > > > >used as a shared pool of DMA buffers for a set of devices. 
> > > > > It can
> > > > >be used by an operating system to instantiate the 
> > > > > necessary pool
> > > > >management subsystem if necessary.
> > > > > +- device-swiotlb-pool: This indicates a region of memory 
> > > > > meant to be
> > > >
> > > > swiotlb is a Linux thing. The binding should be independent.
> > > Got it. Thanks for pointing this out.
> > >
> > > >
> > > > > +  used as a pool of device swiotlb buffers for a given 
> > > > > device. When
> > > > > +  using this, the no-map and reusable properties must not be 
> > > > > set, so the
> > > > > +  operating system can create a virtual mapping that will be 
> > > > > used for
> > > > > +  synchronization. Also, there must be a restricted-dma 
> > > > > property in the
> > > > > +  device node to specify the indexes of reserved-memory 
> > > > > nodes. One can
> > > > > +  specify two reserved-memory nodes in the device tree. One 
> > > > > with
> > > > > +  shared-dma-pool to handle the coherent DMA buffer 
> > > > > allocation, and
> > > > > +  another one with device-swiotlb-pool for regular DMA 
> > > > > to/from system
> > > > > +  memory, which would be subject to bouncing. The main 
> > > > > purpose for
> > > > > +  restricted DMA is to mitigate the lack of DMA access 
> > > > > control on
> > > > > +  systems without an IOMMU, which could result in the DMA 
> > > > > accessing the
> > > > > +  system memory at unexpected times and/or unexpected 
> > > > > addresses,
> > > > > +  possibly leading to data leakage or corruption. The 
> > > > > feature on its own
> > > > > +  provides a basic level of protection against the DMA 
> > > > > overwriting buffer
> > > > > +  contents at unexpected times. However, to protect against 
> > > > > general data
> > > > > +  leakage and system memory corruption, the system needs to 
> > > > > provide a
> > > > > +  way to restrict the DMA to a predefined memory region.
> > > >
> > > > I'm pretty sure we already support per device carveouts and I don't
> > > > understand how this is different.
> > > We use this to bounce streaming DMA in and out of a specially allocated 
> > > region.
> > > I'll try to merge this with the existing one (i.e., shared-dma-pool)
> > > to see if that
> > > makes things clearer.
> > >
> >
> > Indeed,

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa
On Thu, Aug 20, 2020 at 6:52 PM Christoph Hellwig  wrote:
>
> On Thu, Aug 20, 2020 at 12:24:31PM +0200, Tomasz Figa wrote:
> > > Of course this still uses the scatterlist structure with its annoying
> > > mix of input and output parametes, so I'd rather not expose it as
> > > an official API at the DMA layer.
> >
> > The problem with the above open coded approach is that it requires
> > explicit handling of the non-IOMMU and IOMMU cases and this is exactly
> > what we don't want to have in vb2 and what was actually the job of the
> > DMA API to hide. Is the plan to actually move the IOMMU handling out
> > of the DMA API?
> >
> > Do you think we could instead turn it into a dma_alloc_noncoherent()
> > helper, which has similar semantics as dma_alloc_attrs() and handles
> > the various corner cases (e.g. invalidate_kernel_vmap_range and
> > flush_kernel_vmap_range) to achieve the desired functionality without
> > delegating the "hell", as you called it, to the users?
>
> Yes, I guess I could do something in that direction.  At least for
> dma-iommu, which thanks to Robin should be all you'll need in the
> foreseeable future.

That would be really great. Let me know if we can help by testing with
V4L2/vb2 or in any other way.
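
For the record, here is a minimal sketch of the usage model I have in
mind, assuming a dma_alloc_noncoherent() helper with
dma_alloc_attrs()-like semantics (the name and exact signature are my
assumption for illustration, not an existing API):

#include <linux/dma-mapping.h>

/*
 * Sketch only: assumes a dma_alloc_noncoherent()-style helper that behaves
 * like dma_alloc_attrs() but returns non-coherent memory, to be synced with
 * the streaming dma_sync_single_*() helpers. Names are illustrative.
 */
static void *vb2_example_alloc(struct device *dev, size_t size,
			       dma_addr_t *dma_addr)
{
	void *vaddr;

	vaddr = dma_alloc_noncoherent(dev, size, dma_addr,
				      DMA_BIDIRECTIONAL, GFP_KERNEL);
	if (!vaddr)
		return NULL;

	/* CPU accesses to non-coherent memory need explicit syncs. */
	dma_sync_single_for_cpu(dev, *dma_addr, size, DMA_FROM_DEVICE);
	/* ... CPU reads/writes the buffer here ... */
	dma_sync_single_for_device(dev, *dma_addr, size, DMA_TO_DEVICE);

	return vaddr;
}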

Best regards,
Tomasz


Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa
On Thu, Aug 20, 2020 at 6:54 PM Christoph Hellwig  wrote:
>
> On Thu, Aug 20, 2020 at 12:05:29PM +0200, Tomasz Figa wrote:
> > The UAPI and V4L2/videobuf2 changes are in good shape and the only
> > wrong part is the use of DMA API, which was based on an earlier email
> > guidance anyway, and a change to the synchronization part . I find
> > conclusions like the above insulting for people who put many hours
> > into designing and implementing the related functionality, given the
> > complexity of the videobuf2 framework and how ill-defined the DMA API
> > was, and would feel better if such could be avoided in future
> > communication.
>
> It wasn't meant to be too insulting, but I found this out when trying
> to figure out how to just disable it.  But it also ends up using
> the actual dma attr flags for it's own consistency checks, so just
> not setting the flag did not turn out to work that easily.
>

Yes, sadly videobuf2 ended up becoming quite counterintuitive after
growing over the years, and that is reflected in the design of this
feature as well. I think we need to do something about it.

> But in general it helps to add a few more people to the Cc list for
> such things that do stranger things.  Especially if you think you did
> it based on the advice of those people.

Indeed, we should have CCed you and other DMA folks. Sergey, who worked
on this series, is quite new to these areas of the kernel (although not
to the kernel itself), and it's my fault for not explicitly letting him
know to do that.

Best regards,
Tomasz


Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa
On Thu, Aug 20, 2020 at 7:02 AM Christoph Hellwig  wrote:
>
> On Wed, Aug 19, 2020 at 03:07:04PM +0100, Robin Murphy wrote:
> >> FWIW, I asked back in time what the plan is for non-coherent
> >> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> >> dma_sync_*() was supposed to be the right thing to go with. [2] The
> >> same thread also explains why dma_alloc_pages() isn't suitable for the
> >> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
> >
> > AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT
> > and *replacing* it with something streaming-API-based - i.e. this series -
> > not encouraging mixing the existing APIs. It doesn't seem impossible to
> > implement a remapping version of this new dma_alloc_pages() for
> > IOMMU-backed ops if it's really warranted (although at that point it seems
> > like "non-coherent" vb2-dc starts to have significant conceptual overlap
> > with vb2-sg).
>
> You can alway vmap the returned pages from dma_alloc_pages, but it will
> make cache invalidation hell - you'll need to use
> invalidate_kernel_vmap_range and flush_kernel_vmap_range to properly
> handle virtually indexed caches.
>
> Or with remapping you mean using the iommu do de-scatter/gather?

Ideally, both.

For remapping in the CPU sense, there are drivers which rely on a
contiguous kernel mapping of the vb2 buffers, which was provided by
dma_alloc_attrs(). I think they could be reworked to work on single
pages, but that would significantly complicate the code. At the same
time, such drivers would actually benefit from a cached mapping,
because they often have non-bursty, random access patterns.

Then, in the IOMMU sense, the whole idea of videobuf2-dma-contig is to
rely on the DMA API to always provide device-contiguous memory, as
required by hardware that only takes a single pointer and size.

>
> You can implement that trivially implement it yourself for the iommu
> case:
>
> {
> merge_boundary = dma_get_merge_boundary(dev);
> if (!merge_boundary || merge_boundary > chunk_size - 1) {
> /* can't coalesce */
> return -EINVAL;
> }
>
>
> nents = DIV_ROUND_UP(total_size, chunk_size);
> sg = sgl_alloc();
> for_each_sgl() {
> sg->page = __alloc_pages(get_order(chunk_size))
> sg->len = chunk_size;
> }
> dma_map_sg(sg, DMA_ATTR_SKIP_CPU_SYNC);
> // you are guaranteed to get a single dma_addr out
> }
>
> Of course this still uses the scatterlist structure with its annoying
> mix of input and output parametes, so I'd rather not expose it as
> an official API at the DMA layer.

The problem with the above open coded approach is that it requires
explicit handling of the non-IOMMU and IOMMU cases and this is exactly
what we don't want to have in vb2 and what was actually the job of the
DMA API to hide. Is the plan to actually move the IOMMU handling out
of the DMA API?

Do you think we could instead turn it into a dma_alloc_noncoherent()
helper, which has similar semantics as dma_alloc_attrs() and handles
the various corner cases (e.g. invalidate_kernel_vmap_range and
flush_kernel_vmap_range) to achieve the desired functionality without
delegating the "hell", as you called it, to the users?

Best regards,
Tomasz


Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa
On Thu, Aug 20, 2020 at 6:45 AM Christoph Hellwig  wrote:
>
> On Wed, Aug 19, 2020 at 04:11:52PM +0200, Tomasz Figa wrote:
> > > > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > > > series related to the subsystem-facing DMA API changes, since
> > > > videobuf2 is one of the biggest users of it.
> > >
> > > The cc list is too long - I cc lists and key maintainers.  As a reviewer
> > > should should watch your subsystems lists closely.
> >
> > Well, I guess we can disagree on this, because there is no clear
> > policy. I'm listed in the MAINTAINERS file for the subsystem and I
> > believe the purpose of the file is to list the people to CC on
> > relevant patches. We're all overloaded with work and having to look
> > through the huge volume of mailing lists like linux-media doesn't help
> > and thus I'd still appreciate being added on CC.
>
> I'm happy to Cc and active participant in the discussion.  I'm not
> going to add all reviewers because even with the trimmed CC list
> I'm already hitting the number of receipients limit on various lists.

Fair enough.

We'll make your job easier and just turn my MAINTAINERS entry into a
maintainer. :)

Best regards,
Tomasz


Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa
On Thu, Aug 20, 2020 at 7:20 AM Christoph Hellwig  wrote:
>
> On Thu, Aug 20, 2020 at 06:43:47AM +0200, Christoph Hellwig wrote:
> > On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote:
> > > > > Could you explain what makes you think it's unused? It's a feature of
> > > > > the UAPI generally supported by the videobuf2 framework and relied on
> > > > > by Chromium OS to get any kind of reasonable performance when
> > > > > accessing V4L2 buffers in the userspace.
> > > >
> > > > Because it doesn't do anything except on PARISC and non-coherent MIPS,
> > > > so by definition it isn't used by any of these media drivers.
> > >
> > > It's still an UAPI feature, so we can't simply remove the flag, it
> > > must stay there as a no-op, until the problem is resolved.
> >
> > Ok, I'll switch to just ignoring it for the next version.
>
> So I took a deeper look.  I don't really think it qualifies as a UAPI
> in our traditional sense.  For one it only appeared in 5.9-rc1, so we
> can trivially expedite the patch into 5.9-rc and not actually make it
> show up in any released kernel version.  And even as of the current
> Linus' tree the only user is a test driver.  So I really think the best
> way to go ahead is to just revert it ASAP as the design wasn't thought
> out at all.

The UAPI and V4L2/videobuf2 changes are in good shape and the only
wrong part is the use of the DMA API, which was based on earlier email
guidance anyway, and a change to the synchronization part. I find
conclusions like the above insulting to people who put many hours
into designing and implementing the related functionality, given the
complexity of the videobuf2 framework and how ill-defined the DMA API
was, and would feel better if such conclusions could be avoided in
future communication.

That said, we can revert it on the basis of the implementation issues,
but I feel like we wouldn't gain anything by doing so, because, as I
said, the design is sane and most of the implementation is fine as
well. Instead, I'd suggest simply removing the use of the attribute
being removed, so that the feature stays a no-op until the DMA API
provides a way to implement it, or until we migrate videobuf2 to stop
using the DMA API as much as possible, like many drivers in the DRM
subsystem did.

Best regards,
Tomasz


Re: [PATCH 19/28] dma-mapping: replace DMA_ATTR_NON_CONSISTENT with dma_{alloc, free}_pages

2020-08-19 Thread Tomasz Figa
Hi Christoph,

On Wed, Aug 19, 2020 at 8:57 AM Christoph Hellwig  wrote:
>
> Add a new API to allocate and free pages that are guaranteed to be
> addressable by a device, but otherwise behave like pages allocated by
> alloc_pages.  The intended APIs to sync them for use with the device
> and cpu are dma_sync_single_for_{device,cpu} that are also used for
> streaming mappings.
>
> Switch all drivers over to this new API, but keep the usage of the
> crufty dma_cache_sync API for now, which will be cleaned up on a driver
> by driver basis.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  Documentation/core-api/dma-api.rst| 68 +++
>  Documentation/core-api/dma-attributes.rst |  8 ---
>  arch/alpha/kernel/pci_iommu.c |  2 +
>  arch/arm/mm/dma-mapping-nommu.c   |  2 +
>  arch/arm/mm/dma-mapping.c |  4 ++
>  arch/ia64/hp/common/sba_iommu.c   |  2 +
>  arch/mips/jazz/jazzdma.c  |  7 +--
>  arch/powerpc/kernel/dma-iommu.c   |  2 +
>  arch/powerpc/platforms/ps3/system-bus.c   |  4 ++
>  arch/powerpc/platforms/pseries/vio.c  |  2 +
>  arch/s390/pci/pci_dma.c   |  2 +
>  arch/x86/kernel/amd_gart_64.c |  2 +
>  drivers/iommu/dma-iommu.c |  2 +
>  drivers/iommu/intel/iommu.c   |  4 ++
>  drivers/net/ethernet/i825xx/lasi_82596.c  | 13 ++---
>  drivers/net/ethernet/seeq/sgiseeq.c   | 12 ++--
>  drivers/parisc/ccio-dma.c |  2 +
>  drivers/parisc/sba_iommu.c|  2 +
>  drivers/scsi/53c700.c |  8 +--
>  drivers/scsi/sgiwd93.c| 12 ++--
>  drivers/xen/swiotlb-xen.c |  2 +
>  include/linux/dma-direct.h|  5 ++
>  include/linux/dma-mapping.h   | 29 --
>  include/linux/dma-noncoherent.h   |  3 -
>  kernel/dma/direct.c   | 51 -
>  kernel/dma/mapping.c  | 43 +-
>  kernel/dma/ops_helpers.c  | 35 
>  kernel/dma/virt.c |  2 +
>  sound/mips/hal2.c | 20 +++
>  29 files changed, 254 insertions(+), 96 deletions(-)
>

Thanks for the patch. The general design looks quite nice, but please
see my comments inline.


> diff --git a/Documentation/core-api/dma-api.rst 
> b/Documentation/core-api/dma-api.rst
> index 90239348b30f6f..047fcfffa0e5cf 100644
> --- a/Documentation/core-api/dma-api.rst
> +++ b/Documentation/core-api/dma-api.rst
> @@ -516,48 +516,53 @@ routines, e.g.:::
> }
>
>
> -Part II - Advanced dma usage
> -
> +Part II - Non-coherent DMA allocations
> +--
>
> -Warning: These pieces of the DMA API should not be used in the
> -majority of cases, since they cater for unlikely corner cases that
> -don't belong in usual drivers.
> +These APIs allow to allocate pages that can be used like normal pages
> +in the kernel direct mapping, but are guaranteed to be DMA addressable.

Could we elaborate a bit more on what "like normal pages in the kernel
direct mapping" means from the driver perspective?

>
>  If you don't understand how cache line coherency works between a
>  processor and an I/O device, you should not be using this part of the
> -API at all.
> +API.
>
>  ::
>
> void *
> -   dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t 
> *dma_handle,
> -   gfp_t flag, unsigned long attrs)
> +   dma_alloc_pages(struct device *dev, size_t size, dma_addr_t 
> *dma_handle,
> +   enum dma_data_direction dir, gfp_t gfp)
> +
> +This routine allocates a region of  bytes of consistent memory.  It
> +returns a pointer to the allocated region (in the processor's virtual address
> +space) or NULL if the allocation failed. The returned memory is guanteed to
> +behave like memory allocated using alloc_pages.

There is one aspect that the existing dma_alloc_attrs() handles, but
this new function doesn't: IOMMU support. The new function will always
allocate a physically contiguous block of memory, which is a costly
operation and not even guaranteed to succeed, even if enough free
memory is available.

Modern SoCs employ IOMMUs to avoid the need to allocate
physically contiguous memory, and those also happen to be the devices
that could benefit a lot from non-coherent allocations. One of the
tasks of the DMA API was making it possible to allocate suitable
memory for a given device, without having the driver know about the
SoC integration details, such as the presence of an IOMMU.
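
To illustrate, a minimal sketch of what this looks like from the driver
side today with dma_alloc_attrs(); whether the buffer is physically
contiguous or only made device-contiguous by an IOMMU is decided behind
this call:

#include <linux/dma-mapping.h>

/*
 * Sketch only: the driver asks the DMA API for a device-addressable buffer
 * and does not need to know whether an IOMMU is present. With dma-iommu
 * based dma_ops the pages may be physically discontiguous, yet still
 * contiguous in the device's address space.
 */
static void *example_alloc_buffer(struct device *dev, size_t size,
				  dma_addr_t *dma_addr)
{
	return dma_alloc_attrs(dev, size, dma_addr, GFP_KERNEL, 0);
}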

Today, dma_alloc_attrs() uses the .alloc callback of the dma_ops
struct, and IOMMU-aware implementations, like the dma-iommu helpers
[1], would allocate discontiguous pages. Therefore, while I see the
DMA-aware page allocation functionality as useful on its own for
scatter-gather-capable hardware, I believe it is 

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
On Wed, Aug 19, 2020 at 4:07 PM Robin Murphy  wrote:
>
> On 2020-08-19 13:49, Tomasz Figa wrote:
> > On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy  wrote:
> >>
> >> Hi Tomasz,
> >>
> >> On 2020-08-19 12:16, Tomasz Figa wrote:
> >>> Hi Christoph,
> >>>
> >>> On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:
> >>>>
> >>>> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >>>
> >>> Could you explain what makes you think it's unused? It's a feature of
> >>> the UAPI generally supported by the videobuf2 framework and relied on
> >>> by Chromium OS to get any kind of reasonable performance when
> >>> accessing V4L2 buffers in the userspace.
> >>>
> >>>> and causes
> >>>> weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
> >>>> unimplemented except on PARISC and some MIPS configs, and about to be
> >>>> removed.
> >>>
> >>> It is implemented by the generic DMA mapping layer [1], which is used
> >>> by a number of architectures including ARM64 and supposed to be used
> >>> by new architectures going forward.
> >>
> >> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
> >> controling whether DMA_ATTR_NON_CONSISTENT is added to 
> >> vb2_queue::dma_attrs.
> >>
> >> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
> >> all on arm64?
> >>
> >
> > With the default config it doesn't, but with
> > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> > the pgprot value as is, without enforcing coherence attributes.
>
> How active are the PA-RISC and MIPS ports of Chromium OS?

Not active. We enable CONFIG_DMA_NONCOHERENT_CACHE_SYNC for ARM64,
given the directions received back in April when discussing the
noncoherent memory functionality on the mailing list in the thread I
pointed out in my previous message and no clarification on why it is
disabled for ARM64 in upstream, despite making several attempts to get
some.

>
> Hacking CONFIG_DMA_NONCOHERENT_CACHE_SYNC into an architecture that
> doesn't provide dma_cache_sync() is wrong, since at worst it may break
> other drivers. If downstream is wildly misusing an API then so be it,
> but it's hardly a strong basis for an upstream argument.

I guess it means that we're wildly misusing the API, but it still does
work. Could you explain how it could break other drivers?

>
> >> Also, I posit that videobuf2 is not actually relying on
> >> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
> >>
> >> "By using this API, you are guaranteeing to the platform
> >> that you have all the correct and necessary sync points for this memory
> >> in the driver should it choose to return non-consistent memory."
> >>
> >> $ git grep dma_cache_sync drivers/media
> >> $
> >
> > AFAIK dma_cache_sync() isn't the only way to perform the cache
> > synchronization. The earlier patch series that I reviewed relied on
> > dma_get_sgtable() and then dma_sync_sg_*() (which existed in the
> > vb2-dc since forever [1]). However, it looks like with the final code
> > the sgtable isn't acquired and the synchronization isn't happening, so
> > you have a point.
>
> Using the streaming sync calls on coherent allocations has also always
> been wrong per the API, regardless of the bodies of code that have
> happened to get away with it for so long.
>
> > FWIW, I asked back in time what the plan is for non-coherent
> > allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> > dma_sync_*() was supposed to be the right thing to go with. [2] The
> > same thread also explains why dma_alloc_pages() isn't suitable for the
> > users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
>
> AFAICS even back then Christoph was implying getting rid of
> NON_CONSISTENT and *replacing* it with something streaming-API-based -

That's not how I read his reply from the thread I pointed to, but that
might of course be my misunderstanding.

> i.e. this series - not encouraging mixing the existing APIs. It doesn't
> seem impossible to implement a remapping version of this new
> dma_alloc_pages() for IOMMU-backed ops if it's really warranted
> (although at that point it seems like "non-coherent" vb2-dc starts to
> have significant conceptual overlap with vb2-sg).

No, there is no overlap between vb2-dc and vb2-sg. They differ on
another level - the former is to be used

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
On Wed, Aug 19, 2020 at 3:57 PM Christoph Hellwig  wrote:
>
> On Wed, Aug 19, 2020 at 02:49:01PM +0200, Tomasz Figa wrote:
> > With the default config it doesn't, but with
> > CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
> > the pgprot value as is, without enforcing coherence attributes.
>
> Which isn't selected on arm64, and that is for a good reason.
>
> > AFAIK dma_cache_sync() isn't the only way to perform the cache
> > synchronization.
>
> Yes, it is the only documented way to do it.  And if you read the whole
> series instead of screaming you'd see that it provides a proper way
> to deal with non-coherent memory which will also work with arm64.
> instead of screaming
>

I'm sorry if I have offended you in any way, but I would also appreciate
it if a less aggressive tone were directed towards me as well.

I have valid reasons to object to this patch, as stated in my previous
emails. The fact that the original feature has problems is of course
another story and, as I mentioned too, I'm willing to look into fixing
them.

I'm of course happy to review the rest of the series and even more
happy to help migrate this code to whatever is added there, as long
as the functionality is preserved.

> > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > series related to the subsystem-facing DMA API changes, since
> > videobuf2 is one of the biggest users of it.
>
> The cc list is too long - I cc lists and key maintainers.  As a reviewer
> should should watch your subsystems lists closely.

Well, I guess we can disagree on this, because there is no clear
policy. I'm listed in the MAINTAINERS file for the subsystem and I
believe the purpose of the file is to list the people to CC on
relevant patches. We're all overloaded with work, and having to look
through the huge volume of mailing lists like linux-media doesn't help,
so I'd still appreciate being added on CC.

Best regards,
Tomasz


Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
On Wed, Aug 19, 2020 at 3:55 PM Christoph Hellwig  wrote:
>
> On Wed, Aug 19, 2020 at 01:16:51PM +0200, Tomasz Figa wrote:
> > Hi Christoph,
> >
> > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:
> > >
> > > The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >
> > Could you explain what makes you think it's unused? It's a feature of
> > the UAPI generally supported by the videobuf2 framework and relied on
> > by Chromium OS to get any kind of reasonable performance when
> > accessing V4L2 buffers in the userspace.
>
> Because it doesn't do anything except on PARISC and non-coherent MIPS,
> so by definition it isn't used by any of these media drivers.

It's still a UAPI feature, so we can't simply remove the flag; it
must stay there as a no-op until the problem is resolved.

Also, it of course might be disputable as an out-of-tree usage, but
selecting CONFIG_DMA_NONCOHERENT_CACHE_SYNC makes the flag actually do
something on other platforms, including ARM64.

Best regards,
Tomasz


Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
On Wed, Aug 19, 2020 at 1:51 PM Robin Murphy  wrote:
>
> Hi Tomasz,
>
> On 2020-08-19 12:16, Tomasz Figa wrote:
> > Hi Christoph,
> >
> > On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:
> >>
> >> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,
> >
> > Could you explain what makes you think it's unused? It's a feature of
> > the UAPI generally supported by the videobuf2 framework and relied on
> > by Chromium OS to get any kind of reasonable performance when
> > accessing V4L2 buffers in the userspace.
> >
> >> and causes
> >> weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
> >> unimplemented except on PARISC and some MIPS configs, and about to be
> >> removed.
> >
> > It is implemented by the generic DMA mapping layer [1], which is used
> > by a number of architectures including ARM64 and supposed to be used
> > by new architectures going forward.
>
> AFAICS all that V4L2_FLAG_MEMORY_NON_CONSISTENT does is end up
> controling whether DMA_ATTR_NON_CONSISTENT is added to vb2_queue::dma_attrs.
>
> Please can you point to where DMA_ATTR_NON_CONSISTENT does anything at
> all on arm64?
>

With the default config it doesn't, but with
CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled it makes dma_pgprot() keep
the pgprot value as is, without enforcing coherence attributes.
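
To be more specific, the logic I'm referring to is roughly the
following (a paraphrased sketch based on my reading of the 5.9-rc1
code, not a verbatim copy):

#include <linux/mm.h>
#include <linux/dma-mapping.h>
#include <linux/dma-noncoherent.h>

/*
 * Paraphrased sketch of the dma_pgprot() behaviour discussed here: with
 * CONFIG_DMA_NONCOHERENT_CACHE_SYNC enabled, DMA_ATTR_NON_CONSISTENT keeps
 * the (cacheable) pgprot as-is instead of forcing it to be non-cacheable.
 */
static pgprot_t example_dma_pgprot(struct device *dev, pgprot_t prot,
				   unsigned long attrs)
{
	if (dev_is_dma_coherent(dev) ||
	    (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) &&
	     (attrs & DMA_ATTR_NON_CONSISTENT)))
		return prot;

	return pgprot_noncached(prot);
}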


> Also, I posit that videobuf2 is not actually relying on
> DMA_ATTR_NON_CONSISTENT anyway, since it's clearly not using it properly:
>
> "By using this API, you are guaranteeing to the platform
> that you have all the correct and necessary sync points for this memory
> in the driver should it choose to return non-consistent memory."
>
> $ git grep dma_cache_sync drivers/media
> $

AFAIK dma_cache_sync() isn't the only way to perform the cache
synchronization. The earlier patch series that I reviewed relied on
dma_get_sgtable() and then dma_sync_sg_*() (which have existed in
vb2-dc since forever [1]). However, it looks like with the final code
the sgtable isn't acquired and the synchronization isn't happening, so
you have a point.
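
For clarity, the pattern I mean is roughly the following (a sketch,
not the exact vb2-dc code; names are mine):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Sketch of the synchronization approach described above: export an sgtable
 * for a buffer obtained from dma_alloc_attrs(..., DMA_ATTR_NON_CONSISTENT)
 * and use the streaming sync helpers around CPU accesses.
 */
static int example_begin_cpu_access(struct device *dev, void *vaddr,
				    dma_addr_t dma_addr, size_t size,
				    struct sg_table *sgt)
{
	int ret;

	ret = dma_get_sgtable(dev, sgt, vaddr, dma_addr, size);
	if (ret < 0)
		return ret;

	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->orig_nents, DMA_FROM_DEVICE);
	/* ... CPU touches the buffer ... */
	dma_sync_sg_for_device(dev, sgt->sgl, sgt->orig_nents, DMA_TO_DEVICE);

	sg_free_table(sgt);
	return 0;
}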

FWIW, I asked some time back what the plan was for non-coherent
allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
dma_sync_*() were supposed to be the right things to go with. [2] The
same thread also explains why dma_alloc_pages() isn't suitable for the
users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.

I think we could make a deal here. We could revert the parts using
DMA_ATTR_NON_CONSISTENT, keeping the UAPI intact but rendering it a
no-op, since it's just a hint after all. Then you would propose a
proper replacement for dma_alloc_attrs(..., DMA_ATTR_NON_CONSISTENT)
that is functionally equivalent and works on ARM64, which we could
then use to enable the functionality expected by this UAPI. Does that
sound like something that could work as a way forward here?

By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
series related to the subsystem-facing DMA API changes, since
videobuf2 is one of the biggest users of it.

[1] 
https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/media/common/videobuf2/videobuf2-dma-contig.c#L98
[2] https://patchwork.kernel.org/comment/23312203/

Best regards,
Tomasz


>
> Robin.
>
> > [1] 
> > https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341
> >
> > When removing features from generic kernel code, I'd suggest first
> > providing viable alternatives for its users, rather than killing the
> > users altogether.
> >
> > Given the above, I'm afraid I have to NAK this.
> >
> > Best regards,
> > Tomasz
> >
> >>
> >> Signed-off-by: Christoph Hellwig 
> >> ---
> >>   .../userspace-api/media/v4l/buffer.rst| 17 -
> >>   .../media/v4l/vidioc-reqbufs.rst  |  1 -
> >>   .../media/common/videobuf2/videobuf2-core.c   | 36 +--
> >>   .../common/videobuf2/videobuf2-dma-contig.c   | 19 --
> >>   .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
> >>   .../media/common/videobuf2/videobuf2-v4l2.c   | 12 ---
> >>   include/media/videobuf2-core.h|  3 +-
> >>   include/uapi/linux/videodev2.h|  2 --
> >>   8 files changed, 3 insertions(+), 90 deletions(-)
> >>
> >> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst 
> >> b/Documentation/userspace-api/media/v4l/buffer.rst
> >> index 57e752aaf414a7..2044ed13cd9d7d 100644
> >> --- a/Documentation/userspace-api/media/v4l/buffer.rst
> >> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> >> @@ -701,23 +701,6 @@ Me

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-19 Thread Tomasz Figa
Hi Christoph,

On Wed, Aug 19, 2020 at 8:56 AM Christoph Hellwig  wrote:
>
> The V4L2-FLAG-MEMORY-NON-CONSISTENT flag is entirely unused,

Could you explain what makes you think it's unused? It's a feature of
the UAPI generally supported by the videobuf2 framework and relied on
by Chromium OS to get any kind of reasonable performance when
accessing V4L2 buffers in the userspace.

> and causes
> weird gymanstics with the DMA_ATTR_NON_CONSISTENT flag, which is
> unimplemented except on PARISC and some MIPS configs, and about to be
> removed.

It is implemented by the generic DMA mapping layer [1], which is used
by a number of architectures, including ARM64, and is supposed to be
used by new architectures going forward.

[1] https://elixir.bootlin.com/linux/v5.9-rc1/source/kernel/dma/mapping.c#L341

When removing features from generic kernel code, I'd suggest first
providing viable alternatives for their users, rather than killing the
users altogether.

Given the above, I'm afraid I have to NAK this.

Best regards,
Tomasz

>
> Signed-off-by: Christoph Hellwig 
> ---
>  .../userspace-api/media/v4l/buffer.rst| 17 -
>  .../media/v4l/vidioc-reqbufs.rst  |  1 -
>  .../media/common/videobuf2/videobuf2-core.c   | 36 +--
>  .../common/videobuf2/videobuf2-dma-contig.c   | 19 --
>  .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
>  .../media/common/videobuf2/videobuf2-v4l2.c   | 12 ---
>  include/media/videobuf2-core.h|  3 +-
>  include/uapi/linux/videodev2.h|  2 --
>  8 files changed, 3 insertions(+), 90 deletions(-)
>
> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst 
> b/Documentation/userspace-api/media/v4l/buffer.rst
> index 57e752aaf414a7..2044ed13cd9d7d 100644
> --- a/Documentation/userspace-api/media/v4l/buffer.rst
> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> @@ -701,23 +701,6 @@ Memory Consistency Flags
>  :stub-columns: 0
>  :widths:   3 1 4
>
> -* .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`:
> -
> -  - ``V4L2_FLAG_MEMORY_NON_CONSISTENT``
> -  - 0x0001
> -  - A buffer is allocated either in consistent (it will be automatically
> -   coherent between the CPU and the bus) or non-consistent memory. The
> -   latter can provide performance gains, for instance the CPU cache
> -   sync/flush operations can be avoided if the buffer is accessed by the
> -   corresponding device only and the CPU does not read/write to/from that
> -   buffer. However, this requires extra care from the driver -- it must
> -   guarantee memory consistency by issuing a cache flush/sync when
> -   consistency is needed. If this flag is set V4L2 will attempt to
> -   allocate the buffer in non-consistent memory. The flag takes effect
> -   only if the buffer is used for :ref:`memory mapping ` I/O and 
> the
> -   queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
> -   ` capability.
> -
>  .. c:type:: v4l2_memory
>
>  enum v4l2_memory
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst 
> b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> index 75d894d9c36c42..3180c111d368ee 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> @@ -169,7 +169,6 @@ aborting or finishing any DMA in progress, an implicit
>- This capability is set by the driver to indicate that the queue 
> supports
>  cache and memory management hints. However, it's only valid when the
>  queue is used for :ref:`memory mapping ` streaming I/O. See
> -:ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT 
> `,
>  :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 
> ` and
>  :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN `.
>
> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c 
> b/drivers/media/common/videobuf2/videobuf2-core.c
> index f544d3393e9d6b..66a41cef33c1b1 100644
> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q,
>  }
>  EXPORT_SYMBOL(vb2_verify_memory_type);
>
> -static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem)
> -{
> -   q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT;
> -
> -   if (!vb2_queue_allows_cache_hints(q))
> -   return;
> -   if (!consistent_mem)
> -   q->dma_attrs |= DMA_ATTR_NON_CONSISTENT;
> -}
> -
> -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem)
> -{
> -   bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT);
> -
> -   if (consistent_mem != queue_is_consistent) {
> -   dprintk(q, 1, "memory consistency model mismatch\n");
> -   return false;
> -   }
> -   return true;
> -}
> -
>  int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>

Re: [RFC v2 4/5] dt-bindings: of: Add plumbing for restricted DMA pool

2020-08-11 Thread Tomasz Figa
On Mon, Aug 3, 2020 at 5:15 PM Tomasz Figa  wrote:
>
> Hi Claire and Rob,
>
> On Mon, Aug 3, 2020 at 4:26 PM Claire Chang  wrote:
> >
> > On Sat, Aug 1, 2020 at 4:58 AM Rob Herring  wrote:
> > >
> > > On Tue, Jul 28, 2020 at 01:01:39PM +0800, Claire Chang wrote:
> > > > Introduce the new compatible string, device-swiotlb-pool, for restricted
> > > > DMA. One can specify the address and length of the device swiotlb memory
> > > > region by device-swiotlb-pool in the device tree.
> > > >
> > > > Signed-off-by: Claire Chang 
> > > > ---
> > > >  .../reserved-memory/reserved-memory.txt   | 35 +++
> > > >  1 file changed, 35 insertions(+)
> > > >
> > > > diff --git 
> > > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
> > > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > index 4dd20de6977f..78850896e1d0 100644
> > > > --- 
> > > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > +++ 
> > > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > > @@ -51,6 +51,24 @@ compatible (optional) - standard definition
> > > >used as a shared pool of DMA buffers for a set of devices. 
> > > > It can
> > > >be used by an operating system to instantiate the necessary 
> > > > pool
> > > >management subsystem if necessary.
> > > > +- device-swiotlb-pool: This indicates a region of memory meant 
> > > > to be
> > >
> > > swiotlb is a Linux thing. The binding should be independent.
> > Got it. Thanks for pointing this out.
> >
> > >
> > > > +  used as a pool of device swiotlb buffers for a given device. 
> > > > When
> > > > +  using this, the no-map and reusable properties must not be 
> > > > set, so the
> > > > +  operating system can create a virtual mapping that will be 
> > > > used for
> > > > +  synchronization. Also, there must be a restricted-dma 
> > > > property in the
> > > > +  device node to specify the indexes of reserved-memory nodes. 
> > > > One can
> > > > +  specify two reserved-memory nodes in the device tree. One 
> > > > with
> > > > +  shared-dma-pool to handle the coherent DMA buffer 
> > > > allocation, and
> > > > +  another one with device-swiotlb-pool for regular DMA to/from 
> > > > system
> > > > +  memory, which would be subject to bouncing. The main purpose 
> > > > for
> > > > +  restricted DMA is to mitigate the lack of DMA access control 
> > > > on
> > > > +  systems without an IOMMU, which could result in the DMA 
> > > > accessing the
> > > > +  system memory at unexpected times and/or unexpected 
> > > > addresses,
> > > > +  possibly leading to data leakage or corruption. The feature 
> > > > on its own
> > > > +  provides a basic level of protection against the DMA 
> > > > overwriting buffer
> > > > +  contents at unexpected times. However, to protect against 
> > > > general data
> > > > +  leakage and system memory corruption, the system needs to 
> > > > provide a
> > > > +  way to restrict the DMA to a predefined memory region.
> > >
> > > I'm pretty sure we already support per device carveouts and I don't
> > > understand how this is different.
> > We use this to bounce streaming DMA in and out of a specially allocated 
> > region.
> > I'll try to merge this with the existing one (i.e., shared-dma-pool)
> > to see if that
> > makes things clearer.
> >
>
> Indeed, from the firmware point of view, this is just a carveout, for
> which we have the "shared-dma-pool" compatible string defined already.
>
> However, depending on the device and firmware setup, the way the
> carevout is used may change. I can see the following scenarios:
>
> 1) coherent DMA (dma_alloc_*) within a reserved pool and no
> non-coherent DMA (dma_map_*).
>
> This is how the "memory-region" property is handled today in Linux for
> devices which can only DMA from/to the given memory region. However,
>

Re: [RFC v2 4/5] dt-bindings: of: Add plumbing for restricted DMA pool

2020-08-03 Thread Tomasz Figa
Hi Claire and Rob,

On Mon, Aug 3, 2020 at 4:26 PM Claire Chang  wrote:
>
> On Sat, Aug 1, 2020 at 4:58 AM Rob Herring  wrote:
> >
> > On Tue, Jul 28, 2020 at 01:01:39PM +0800, Claire Chang wrote:
> > > Introduce the new compatible string, device-swiotlb-pool, for restricted
> > > DMA. One can specify the address and length of the device swiotlb memory
> > > region by device-swiotlb-pool in the device tree.
> > >
> > > Signed-off-by: Claire Chang 
> > > ---
> > >  .../reserved-memory/reserved-memory.txt   | 35 +++
> > >  1 file changed, 35 insertions(+)
> > >
> > > diff --git 
> > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
> > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > index 4dd20de6977f..78850896e1d0 100644
> > > --- 
> > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > +++ 
> > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > @@ -51,6 +51,24 @@ compatible (optional) - standard definition
> > >used as a shared pool of DMA buffers for a set of devices. It 
> > > can
> > >be used by an operating system to instantiate the necessary 
> > > pool
> > >management subsystem if necessary.
> > > +- device-swiotlb-pool: This indicates a region of memory meant 
> > > to be
> >
> > swiotlb is a Linux thing. The binding should be independent.
> Got it. Thanks for pointing this out.
>
> >
> > > +  used as a pool of device swiotlb buffers for a given device. 
> > > When
> > > +  using this, the no-map and reusable properties must not be 
> > > set, so the
> > > +  operating system can create a virtual mapping that will be 
> > > used for
> > > +  synchronization. Also, there must be a restricted-dma property 
> > > in the
> > > +  device node to specify the indexes of reserved-memory nodes. 
> > > One can
> > > +  specify two reserved-memory nodes in the device tree. One with
> > > +  shared-dma-pool to handle the coherent DMA buffer allocation, 
> > > and
> > > +  another one with device-swiotlb-pool for regular DMA to/from 
> > > system
> > > +  memory, which would be subject to bouncing. The main purpose 
> > > for
> > > +  restricted DMA is to mitigate the lack of DMA access control on
> > > +  systems without an IOMMU, which could result in the DMA 
> > > accessing the
> > > +  system memory at unexpected times and/or unexpected addresses,
> > > +  possibly leading to data leakage or corruption. The feature on 
> > > its own
> > > +  provides a basic level of protection against the DMA 
> > > overwriting buffer
> > > +  contents at unexpected times. However, to protect against 
> > > general data
> > > +  leakage and system memory corruption, the system needs to 
> > > provide a
> > > +  way to restrict the DMA to a predefined memory region.
> >
> > I'm pretty sure we already support per device carveouts and I don't
> > understand how this is different.
> We use this to bounce streaming DMA in and out of a specially allocated 
> region.
> I'll try to merge this with the existing one (i.e., shared-dma-pool)
> to see if that
> makes things clearer.
>

Indeed, from the firmware point of view, this is just a carveout, for
which we have the "shared-dma-pool" compatible string defined already.

However, depending on the device and firmware setup, the way the
carveout is used may change. I can see the following scenarios:

1) coherent DMA (dma_alloc_*) within a reserved pool and no
non-coherent DMA (dma_map_*).

This is how the "memory-region" property is handled today in Linux for
devices which can only DMA from/to the given memory region. However,
I'm not sure if the absence of non-coherent DMA is actually enforced in
any way by the DMA subsystem.

2) coherent DMA from a reserved pool and non-coherent DMA from system memory

This is the case for the systems which have some dedicated part of
memory which is guaranteed to be coherent with the DMA, but still can
do non-coherent DMA to any part of the system memory. Linux handles it
the same way as 1), which is what made me believe that 1) might not
actually be handled correctly.

3) coherent DMA and bounced non-coherent DMA within a reserved pool
4) coherent DMA within one pool and bounced non-coherent within another pool

These are the two cases we're interested in. Basically they make it
possible for non-coherent DMA from arbitrary system memory to be
bounced through a reserved pool, which the device has access to. The
current series implements 4), but I'd argue that it:

- is problematic from the firmware point of view, because on most of
the systems, both pools would be just some carveouts and the fact that
Linux would use one for coherent and the other for non-coherent DMA
would be an OS implementation detail,
- suffers from the static memory split 

Re: [PATCH 10/11] media: exynos4-is: Prevent duplicate call to media_pipeline_stop

2020-07-27 Thread Tomasz Figa
On Sat, Jul 25, 2020 at 1:46 AM Jonathan Bakker  wrote:
>
> Hi Tomasz,
>
> On 2020-07-20 6:10 a.m., Tomasz Figa wrote:
> > On Sat, Jul 11, 2020 at 8:17 PM Jonathan Bakker  wrote:
> >>
> >> Hi Tomasz,
> >>
> >> On 2020-07-07 11:44 a.m., Tomasz Figa wrote:
> >>> Hi Jonathan,
> >>>
> >>> On Sat, Apr 25, 2020 at 07:26:49PM -0700, Jonathan Bakker wrote:
> >>>> media_pipeline_stop can be called from both release and streamoff,
> >>>> so make sure they're both protected under the streaming flag and
> >>>> not just one of them.
> >>>
> >>> First of all, thanks for the patch.
> >>>
> >>> Shouldn't it be that release calls streamoff, so that only streamoff
> >>> is supposed to have the call to media_pipeline_stop()?
> >>>
> >>
> >> I can't say that I understand the whole media subsystem enough to know :)
> >> Since media_pipeline_start is called in streamon, it makes sense that 
> >> streamoff
> >> should have the media_pipeline_stop call.  However, even after removing 
> >> the call
> >> in fimc_capture_release I'm still getting a backtrace such as
> >>
> >> [   73.843117] [ cut here ]
> >> [   73.843251] WARNING: CPU: 0 PID: 1575 at 
> >> drivers/media/mc/mc-entity.c:554 media_pipeline_stop+0x20/0x2c [mc]
> >> [   73.843265] Modules linked in: s5p_fimc v4l2_fwnode exynos4_is_common 
> >> videobuf2_dma_contig pvrsrvkm_s5pv210_sgx540_120 videobuf2_memops 
> >> v4l2_mem2mem brcmfmac videobuf2_v4l2 videobuf2_common hci_uart 
> >> sha256_generic libsha256 btbcm bluetooth cfg80211 brcmutil ecdh_generic 
> >> ecc ce147 libaes s5ka3dfx videodev atmel_mxt_ts mc pwm_vibra rtc_max8998
> >> [   73.843471] CPU: 0 PID: 1575 Comm: v4l2-ctl Not tainted 
> >> 5.7.0-14534-g2b33418b254e-dirty #669
> >> [   73.843487] Hardware name: Samsung S5PC110/S5PV210-based board
> >> [   73.843562] [] (unwind_backtrace) from [] 
> >> (show_stack+0x10/0x14)
> >> [   73.843613] [] (show_stack) from [] 
> >> (__warn+0xbc/0xd4)
> >> [   73.843661] [] (__warn) from [] 
> >> (warn_slowpath_fmt+0x60/0xb8)
> >> [   73.843734] [] (warn_slowpath_fmt) from [] 
> >> (media_pipeline_stop+0x20/0x2c [mc])
> >> [   73.843867] [] (media_pipeline_stop [mc]) from [] 
> >> (fimc_cap_streamoff+0x38/0x48 [s5p_fimc])
> >> [   73.844109] [] (fimc_cap_streamoff [s5p_fimc]) from 
> >> [] (__video_do_ioctl+0x220/0x448 [videodev])
> >> [   73.844308] [] (__video_do_ioctl [videodev]) from 
> >> [] (video_usercopy+0x114/0x498 [videodev])
> >> [   73.844438] [] (video_usercopy [videodev]) from [] 
> >> (ksys_ioctl+0x20c/0xa10)
> >> [   73.844484] [] (ksys_ioctl) from [] 
> >> (ret_fast_syscall+0x0/0x54)
> >> [   73.844505] Exception stack(0xe5083fa8 to 0xe5083ff0)
> >> [   73.844546] 3fa0:   0049908d bef8f8c0 0003 40045613 
> >> bef8d5ac 004c1d16
> >> [   73.844590] 3fc0: 0049908d bef8f8c0 bef8f8c0 0036 bef8d5ac  
> >> b6d6b320 bef8faf8
> >> [   73.844620] 3fe0: 004e3ed4 bef8c718 004990bb b6f00d0a
> >> [   73.844642] ---[ end trace e6a4a8b2f20addd4 ]---
> >>
> >> The command I'm using for testing is
> >>
> >> v4l2-ctl --verbose -d 1 --stream-mmap=3 --stream-skip=2 
> >> --stream-to=./test.yuv --stream-count=1
> >>
> >> Since I noticed that the streaming flag was being checked 
> >> fimc_capture_release
> >> but not in fimc_cap_streamoff, I assumed that it was simply a missed 
> >> check.  Comparing
> >> with other drivers, they seem to call media_pipeline_stop in their vb2_ops 
> >> stop_streaming
> >> callback.
> >
> > vb2 does a lot of state handling internally and makes sure that driver
> > ops are not called when unnecessary, preventing double calls for
> > example. I suppose it could be a better place to stop the pipeline
> > indeed. However, ...
> >
> >>
> >> I'm willing to test various options
> >>
> >
> > I think it could make sense to add something like WARN_ON(1) inside
> > media_pipeline_stop() and then check where the first call came from.
>
> Here's the results of that:
>
> [   69.876823] [ cut here ]
> [   69.876962] WARNING: CPU: 0 PID: 1566 at drivers/media/mc/mc-entity.c:550 
> __media_pipeline_stop+0x24/0xfc [mc]
> [   69.876976] Modules linked in: s5p_fimc v4l2_fwno

Re: [PATCH] iommu/mediatek: Move the tlb_sync into tlb_flush

2019-10-09 Thread Tomasz Figa
On Wed, Oct 9, 2019 at 10:38 PM Yong Wu  wrote:
>
> On Wed, 2019-10-09 at 16:56 +0900, Tomasz Figa wrote:
> > On Tue, Oct 8, 2019 at 5:09 PM Yong Wu  wrote:
> > >
> > > Hi Tomasz,
> > >
> > > Sorry for reply late.
> > >
> > > On Wed, 2019-10-02 at 14:18 +0900, Tomasz Figa wrote:
> > > > Hi Yong,
> > > >
> > > > On Mon, Sep 30, 2019 at 2:42 PM Yong Wu  wrote:
> > > > >
> > > > > The commit 4d689b619445 ("iommu/io-pgtable-arm-v7s: Convert to IOMMU 
> > > > > API
> > > > > TLB sync") help move the tlb_sync of unmap from v7s into the iommu
> > > > > framework. It helps add a new function "mtk_iommu_iotlb_sync", But it
> > > > > lacked the dom->pgtlock, then it will cause the variable
> > > > > "tlb_flush_active" may be changed unexpectedly, we could see this 
> > > > > warning
> > > > > log randomly:
> > > > >
> > > >
> > > > Thanks for the patch! Please see my comments inline.
> > > >
> > > > > mtk-iommu 10205000.iommu: Partial TLB flush timed out, falling back to
> > > > > full flush
> > > > >
> > > > > To fix this issue, we can add dom->pgtlock in the 
> > > > > "mtk_iommu_iotlb_sync".
> > > > > And when checking this issue, we find that __arm_v7s_unmap call
> > > > > io_pgtable_tlb_add_flush consecutively when it is 
> > > > > supersection/largepage,
> > > > > this also is potential unsafe for us. There is no tlb flush queue in 
> > > > > the
> > > > > MediaTek M4U HW. The HW always expect the tlb_flush/tlb_sync one by 
> > > > > one.
> > > > > If v7s don't always gurarantee the sequence, Thus, In this patch I 
> > > > > move
> > > > > the tlb_sync into tlb_flush(also rename the function deleting 
> > > > > "_nosync").
> > > > > and we don't care if it is leaf, rearrange the callback functions. 
> > > > > Also,
> > > > > the tlb flush/sync was already finished in v7s, then iotlb_sync and
> > > > > iotlb_sync_all is unnecessary.
> > > >
> > > > Performance-wise, we could do much better. Instead of synchronously
> > > > syncing at the end of mtk_iommu_tlb_add_flush(), we could sync at the
> > > > beginning, if there was any previous flush still pending. We would
> > > > also have to keep the .iotlb_sync() callback, to take care of waiting
> > > > for the last flush. That would allow better pipelining with CPU in
> > > > cases like this:
> > > >
> > > > for (all pages in range) {
> > > >change page table();
> > > >flush();
> > > > }
> > > >
> > > > "change page table()" could execute while the IOMMU is flushing the
> > > > previous change.
> > >
> > > Do you mean adding a new tlb_sync before tlb_flush_no_sync, like below:
> > >
> > > mtk_iommu_tlb_add_flush_nosync {
> > >+ mtk_iommu_tlb_sync();
> > >tlb_flush_no_sync();
> > >data->tlb_flush_active = true;
> > > }
> > >
> > > mtk_iommu_tlb_sync {
> > > if (!data->tlb_flush_active)
> > > return;
> > > tlb_sync();
> > > data->tlb_flush_active = false;
> > > }
> > >
> > > This way look improve the flow, But adjusting the flow is not the root
> > > cause of this issue. the problem is "data->tlb_flush_active" may be
> > > changed from mtk_iommu_iotlb_sync which don't have a dom->pglock.
> >
> > That was not the only problem with existing code. Existing code also
> > assumed that add_flush and sync always go in pairs, but that's not
> > true.
>
> Yes. Thus I put the tlb_flush always followed by tlb_sync to make sure
> they always go in pairs.
>
> >
> > My suggestion is to fix the locking in the driver and keep the sync
> > deferred as much as possible, so that performance is not degraded. I
>
> I really didn't get this timeout warning log in previous kernel(Many
> tlb_flush followed by one tlb_sync),

Locking issues typically lead to timing problems (race conditions), so
it might just be that the sequence or timing of calls changed between
kernel versions, enough to trigger the issue.

> But deferring the sync is not
> sug

Re: [PATCH] iommu/mediatek: Move the tlb_sync into tlb_flush

2019-10-09 Thread Tomasz Figa
On Tue, Oct 8, 2019 at 5:09 PM Yong Wu  wrote:
>
> Hi Tomasz,
>
> Sorry for reply late.
>
> On Wed, 2019-10-02 at 14:18 +0900, Tomasz Figa wrote:
> > Hi Yong,
> >
> > On Mon, Sep 30, 2019 at 2:42 PM Yong Wu  wrote:
> > >
> > > The commit 4d689b619445 ("iommu/io-pgtable-arm-v7s: Convert to IOMMU API
> > > TLB sync") help move the tlb_sync of unmap from v7s into the iommu
> > > framework. It helps add a new function "mtk_iommu_iotlb_sync", But it
> > > lacked the dom->pgtlock, then it will cause the variable
> > > "tlb_flush_active" may be changed unexpectedly, we could see this warning
> > > log randomly:
> > >
> >
> > Thanks for the patch! Please see my comments inline.
> >
> > > mtk-iommu 10205000.iommu: Partial TLB flush timed out, falling back to
> > > full flush
> > >
> > > To fix this issue, we can add dom->pgtlock in the "mtk_iommu_iotlb_sync".
> > > And when checking this issue, we find that __arm_v7s_unmap call
> > > io_pgtable_tlb_add_flush consecutively when it is supersection/largepage,
> > > this also is potential unsafe for us. There is no tlb flush queue in the
> > > MediaTek M4U HW. The HW always expect the tlb_flush/tlb_sync one by one.
> > > If v7s don't always gurarantee the sequence, Thus, In this patch I move
> > > the tlb_sync into tlb_flush(also rename the function deleting "_nosync").
> > > and we don't care if it is leaf, rearrange the callback functions. Also,
> > > the tlb flush/sync was already finished in v7s, then iotlb_sync and
> > > iotlb_sync_all is unnecessary.
> >
> > Performance-wise, we could do much better. Instead of synchronously
> > syncing at the end of mtk_iommu_tlb_add_flush(), we could sync at the
> > beginning, if there was any previous flush still pending. We would
> > also have to keep the .iotlb_sync() callback, to take care of waiting
> > for the last flush. That would allow better pipelining with CPU in
> > cases like this:
> >
> > for (all pages in range) {
> >change page table();
> >flush();
> > }
> >
> > "change page table()" could execute while the IOMMU is flushing the
> > previous change.
>
> Do you mean adding a new tlb_sync before tlb_flush_no_sync, like below:
>
> mtk_iommu_tlb_add_flush_nosync {
>+ mtk_iommu_tlb_sync();
>tlb_flush_no_sync();
>data->tlb_flush_active = true;
> }
>
> mtk_iommu_tlb_sync {
> if (!data->tlb_flush_active)
> return;
> tlb_sync();
> data->tlb_flush_active = false;
> }
>
> This way look improve the flow, But adjusting the flow is not the root
> cause of this issue. the problem is "data->tlb_flush_active" may be
> changed from mtk_iommu_iotlb_sync which don't have a dom->pglock.

That was not the only problem with the existing code. It also assumed
that add_flush and sync always go in pairs, but that's not true.

My suggestion is to fix the locking in the driver and keep the sync
deferred as much as possible, so that performance is not degraded. I
changed my mind, though. I think we would need to make more changes to
the driver to implement the flushing efficiently, so let's go
with the current simple approach for now and improve incrementally.
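
Just for reference, the deferred variant I had in mind is roughly the
following (a sketch with made-up names; proper locking around the
flush state is assumed):

#include <linux/types.h>

/*
 * Sketch only, with made-up names: defer the sync until either the next
 * flush request or an explicit iotlb_sync() callback, so page table updates
 * can overlap with the hardware flushing the previous range. Locking of
 * tlb_flush_active is assumed to be handled by the driver.
 */
struct example_iommu_data {
	bool tlb_flush_active;
	/* ... register base, spinlock, etc. ... */
};

static void example_hw_trigger_range_flush(struct example_iommu_data *data,
					   unsigned long iova, size_t size);
static void example_hw_wait_for_flush_done(struct example_iommu_data *data);

static void example_iotlb_sync(struct example_iommu_data *data)
{
	if (!data->tlb_flush_active)
		return;
	example_hw_wait_for_flush_done(data);
	data->tlb_flush_active = false;
}

static void example_tlb_add_flush(struct example_iommu_data *data,
				  unsigned long iova, size_t size)
{
	/* Only wait for the previous flush when a new one is being issued. */
	example_iotlb_sync(data);
	example_hw_trigger_range_flush(data, iova, size);
	data->tlb_flush_active = true;
}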

>
> Currently the synchronisation of the tlb_flush/tlb_sync flow are
> controlled by the variable "data->tlb_flush_active".
>
> In this patch putting the tlb_flush/tlb_sync together looks make
> the flow simpler:
> a) Don't need the sensitive variable "tlb_flush_active".
> b) Remove mtk_iommu_iotlb_sync, Don't need add lock in it.
> c) Simplify the tlb_flush_walk/tlb_flush_leaf.
> is it ok?
>

Okay, let's do so as a first step to fix the issue. Then we can
optimize in follow-up patches.

> >
> > >
> > > Besides, there are two minor changes:
> > > a) Use writel for the register F_MMU_INV_RANGE which is for triggering the
> > > HW work. We expect all the setting(iova_start/iova_end...) have already
> > > been finished before F_MMU_INV_RANGE.
> > > b) Reduce the tlb timeout value from 10us to 1000us. the original 
> > > value
> > > is so long that affect the multimedia performance.
> >
> > By definition, timeout is something that should not normally happen.
> > Too long timeout affecting multimedia performance would suggest that
> > the timeout was actually happening, which is the core problem, not the
> > length of the timeout. Could you provide more details on this?
>
> As description above, this issue is because there is no dom->pgtlock in
> the mtk_iommu_iotlb_sync. I have tried that the issue will disappear
> after adding lock in it.
>
> Although the issue is fixed after this patch, I still would like to
> reduce the timeout value for somehow error happen in the future. 100ms
> is unnecessary for us. It looks a minor improvement rather than fixing
> the issue. I will use a new patch for it.
>

Okay, makes sense.

Best regards,
Tomasz


Re: [V2, 2/2] media: i2c: Add DW9768 VCM driver

2019-10-08 Thread Tomasz Figa
Hi Dongchun,

On Thu, Sep 5, 2019 at 4:22 PM  wrote:
>
> From: Dongchun Zhu 
>
> This patch adds a V4L2 sub-device driver for DW9768 lens voice coil,
> and provides control to set the desired focus.
>
> The DW9768 is a 10 bit DAC with 100mA output current sink capability
> from Dongwoon, designed for linear control of voice coil motor,
> and controlled via I2C serial interface.
>
> Signed-off-by: Dongchun Zhu 
> ---
>  MAINTAINERS|   1 +
>  drivers/media/i2c/Kconfig  |  10 ++
>  drivers/media/i2c/Makefile |   1 +
>  drivers/media/i2c/dw9768.c | 349 
> +
>  4 files changed, 361 insertions(+)
>  create mode 100644 drivers/media/i2c/dw9768.c
>

Please see my further comments inline.

[snip]
> +struct regval_list {
> +   unsigned char reg_num;
> +   unsigned char value;

nit: Since we have strictly sized values here, should we use u8 for
both fields instead?

> +};
> +
> +static struct regval_list dw9768_init_regs[] = {
> +   {0x02, 0x02},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +   {0x06, 0x41},
> +   {0x07, 0x39},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +};
> +
> +static struct regval_list dw9768_release_regs[] = {
> +   {0x02, 0x00},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +   {0x01, 0x00},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +};
> +
> +static int dw9768_write_smbus(struct dw9768 *dw9768, unsigned char reg,
> + unsigned char value)

Should we use u8 for the last two arguments here as well?

> +{
> +   struct i2c_client *client = v4l2_get_subdevdata(&dw9768->sd);
> +   int ret;
> +
> +   if (reg == DW9768_CMD_DELAY  && value == DW9768_CMD_DELAY)
> +   usleep_range(DW9768_CTRL_DELAY_US,
> +DW9768_CTRL_DELAY_US + 100);

ret will be uninitialized if we take this path.

> +   else
> +   ret = i2c_smbus_write_byte_data(client, reg, value);
> +   return ret;
> +}
> +
> +static int dw9768_write_array(struct dw9768 *dw9768, struct regval_list 
> *vals,
> + u32 len)

Since len is an array size, should we use size_t instead?

> +{
> +   unsigned int i;

size_t?

> +   int ret;
> +
> +   for (i = 0; i < len; i++) {
> +   ret = dw9768_write_smbus(dw9768, vals->reg_num, vals->value);

This should refer to vals[i] instead.

> +   if (ret < 0)
> +   return ret;
> +   }
> +   return 0;
> +}
> +
> +static int dw9768_set_position(struct dw9768 *dw9768, u16 val)
> +{
> +   struct i2c_client *client = v4l2_get_subdevdata(&dw9768->sd);
> +   u8 addr[2];
> +
> +   addr[0] = (val >> DW9768_DAC_SHIFT) & DW9768_REG_MASK_MSB;
> +   addr[1] = val & DW9768_REG_MASK_LSB;
> +
> +   return i2c_smbus_write_block_data(client, DW9768_SET_POSITION_ADDR,
> + ARRAY_SIZE(addr), addr);

As we discovered earlier, i2c_smbus_write_block_data() uses a
different protocol from what we expected. Please change to
i2c_smbus_write_word_data(), as per our downstream changes.
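For example (a sketch only - this assumes DW9768_SET_POSITION_ADDR is the MSB
register and the device auto-increments to the LSB one; SMBus word writes put
the low byte on the wire first, hence the swab16()):

        return i2c_smbus_write_word_data(client, DW9768_SET_POSITION_ADDR,
                                         swab16(val));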

> +}
> +
> +static int dw9768_release(struct dw9768 *dw9768)
> +{
> +   return dw9768_write_array(dw9768, dw9768_release_regs,
> + ARRAY_SIZE(dw9768_release_regs));
> +}
> +
> +static int dw9768_init(struct dw9768 *dw9768)
> +{
> +   return dw9768_write_array(dw9768, dw9768_init_regs,
> + ARRAY_SIZE(dw9768_init_regs));
> +}
> +
> +/* Power handling */
> +static int dw9768_power_off(struct dw9768 *dw9768)
> +{
> +   struct i2c_client *client = v4l2_get_subdevdata(&dw9768->sd);
> +   int ret;
> +
> +   ret = dw9768_release(dw9768);
> +   if (ret)
> +   dev_err(&client->dev, "dw9768 release failed!\n");
> +
> +   ret = regulator_disable(dw9768->vin);
> +   if (ret)
> +   return ret;
> +
> +   return regulator_disable(dw9768->vdd);
> +}
> +
> +static int dw9768_power_on(struct dw9768 *dw9768)
> +{
> +   int ret;
> +
> +   ret = regulator_enable(dw9768->vin);
> +   if (ret < 0)
> +   return ret;
> +
> +   ret = regulator_enable(dw9768->vdd);
> +   if (ret < 0)
> +   return ret;

There is at least T_opr = 200 us of delay needed here. Would you be
able to add a comment and a corresponding usleep_range() call? I guess
the range of (300, 400) would be enough on the safe side.
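For example, right after the vdd regulator_enable() (just a sketch, with the
bounds picked on the safe side):

        /* T_opr: the datasheet requires at least 200 us after power-up */
        usleep_range(300, 400);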

Best regards,
Tomasz


Re: [PATCH] iommu/mediatek: Move the tlb_sync into tlb_flush

2019-10-01 Thread Tomasz Figa
Hi Yong,

On Mon, Sep 30, 2019 at 2:42 PM Yong Wu  wrote:
>
> The commit 4d689b619445 ("iommu/io-pgtable-arm-v7s: Convert to IOMMU API
> TLB sync") moved the tlb_sync of unmap from v7s into the iommu
> framework and added a new function, "mtk_iommu_iotlb_sync". But that
> function lacks the dom->pgtlock, which can cause the variable
> "tlb_flush_active" to be changed unexpectedly; we could see this warning
> log randomly:
>

Thanks for the patch! Please see my comments inline.

> mtk-iommu 10205000.iommu: Partial TLB flush timed out, falling back to
> full flush
>
> To fix this issue, we can add dom->pgtlock in the "mtk_iommu_iotlb_sync".
> While checking this issue, we also found that __arm_v7s_unmap calls
> io_pgtable_tlb_add_flush consecutively for supersections/largepages,
> which is also potentially unsafe for us. There is no tlb flush queue in the
> MediaTek M4U HW; the HW always expects tlb_flush/tlb_sync one by one.
> Since v7s doesn't always guarantee that sequence, this patch moves the
> tlb_sync into tlb_flush (also renaming the function to drop "_nosync")
> and, since we don't care whether it is a leaf, rearranges the callback
> functions. Also, as the tlb flush/sync already completes in v7s, iotlb_sync
> and iotlb_sync_all become unnecessary.

Performance-wise, we could do much better. Instead of synchronously
syncing at the end of mtk_iommu_tlb_add_flush(), we could sync at the
beginning, if there was any previous flush still pending. We would
also have to keep the .iotlb_sync() callback, to take care of waiting
for the last flush. That would allow better pipelining with CPU in
cases like this:

for (all pages in range) {
   change page table();
   flush();
}

"change page table()" could execute while the IOMMU is flushing the
previous change.
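A rough sketch of that idea, reusing the names from the code below (locking
around tlb_flush_active is omitted for brevity):

static void mtk_iommu_tlb_sync(void *cookie)
{
        struct mtk_iommu_data *data = cookie;

        if (!data->tlb_flush_active)
                return;
        /* ... poll REG_MMU_CPE_DONE as today, then clear tlb_flush_active ... */
}

static void mtk_iommu_tlb_add_flush(unsigned long iova, size_t size,
                                    size_t granule, void *cookie)
{
        struct mtk_iommu_data *data = cookie;

        /* Finish the previous flush, if one is still pending */
        mtk_iommu_tlb_sync(cookie);

        /* ... program REG_MMU_INVLD_START_A/END_A and trigger F_MMU_INV_RANGE ... */
        data->tlb_flush_active = true;
}

The .iotlb_sync() callback would then just call mtk_iommu_tlb_sync() to wait
for the last outstanding flush.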

>
> Besides, there are two minor changes:
> a) Use writel for the register F_MMU_INV_RANGE, which triggers the
> HW work. We expect all the settings (iova_start/iova_end...) to have already
> been completed before F_MMU_INV_RANGE.
> b) Reduce the tlb timeout value from 100000us to 1000us. The original value
> is so long that it affects the multimedia performance.

By definition, timeout is something that should not normally happen.
Too long timeout affecting multimedia performance would suggest that
the timeout was actually happening, which is the core problem, not the
length of the timeout. Could you provide more details on this?

>
> Fixes: 4d689b619445 ("iommu/io-pgtable-arm-v7s: Convert to IOMMU API TLB 
> sync")
> Signed-off-by: Chao Hao 
> Signed-off-by: Yong Wu 
> ---
> This patch looks like it breaks the logic for tlb_flush and tlb_sync. I'm not
> sure if it is reasonable. If someone has concerns, I could change:
> a) Add dom->pgtlock in the mtk_iommu_iotlb_sync
> b) Add a io_pgtable_tlb_sync in [1].
>
> [1]
> https://elixir.bootlin.com/linux/v5.3-rc1/source/drivers/iommu/io-pgtable-arm-v7s.c#L655
>
> This patch rebase on Joerg's mediatek-smmu-merge branch which has mt8183
> and Will's "Rework IOMMU API to allow for batching of invalidation".
> ---
>  drivers/iommu/mtk_iommu.c | 74 
> ---
>  drivers/iommu/mtk_iommu.h |  1 -
>  2 files changed, 19 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 6066272..e13cc56 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -173,11 +173,12 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
> }
>  }
>
> -static void mtk_iommu_tlb_add_flush_nosync(unsigned long iova, size_t size,
> -  size_t granule, bool leaf,
> -  void *cookie)
> +static void mtk_iommu_tlb_add_flush(unsigned long iova, size_t size,
> +   size_t granule, void *cookie)
>  {
> struct mtk_iommu_data *data = cookie;
> +   int ret;
> +   u32 tmp;
>
> for_each_m4u(data) {
> writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
> @@ -186,25 +187,15 @@ static void mtk_iommu_tlb_add_flush_nosync(unsigned 
> long iova, size_t size,
> writel_relaxed(iova, data->base + REG_MMU_INVLD_START_A);
> writel_relaxed(iova + size - 1,
>data->base + REG_MMU_INVLD_END_A);
> -   writel_relaxed(F_MMU_INV_RANGE,
> -  data->base + REG_MMU_INVALIDATE);
> -   data->tlb_flush_active = true;
> -   }
> -}
> -
> -static void mtk_iommu_tlb_sync(void *cookie)
> -{
> -   struct mtk_iommu_data *data = cookie;
> -   int ret;
> -   u32 tmp;
> -
> -   for_each_m4u(data) {
> -   /* Avoid timing out if there's nothing to wait for */
> -   if (!data->tlb_flush_active)
> -   return;
> +   writel(F_MMU_INV_RANGE, data->base + REG_MMU_INVALIDATE);
>
> +   /*
> +* There is no tlb flush queue in the HW, the HW 

Re: [PATCH] media: i2c: ov5695: Modify the function of async register subdev related devices

2019-09-28 Thread Tomasz Figa
On Fri, Sep 27, 2019 at 4:18 PM Dongchun Zhu  wrote:
>
> This patch adds support for registering a sensor sub-device to the async
> sub-device framework and parsing/setting up common
> sensor-related devices such as actuator/VCM.

nit: The description should be wrapped around the 80th column.

Sakari, do we need to resend, or could you just rewrap when applying?

>
> Signed-off-by: Dongchun Zhu 
> ---
>  drivers/media/i2c/ov5695.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/media/i2c/ov5695.c b/drivers/media/i2c/ov5695.c
> index e65a943..b6ee62c 100644
> --- a/drivers/media/i2c/ov5695.c
> +++ b/drivers/media/i2c/ov5695.c
> @@ -1328,7 +1328,7 @@ static int ov5695_probe(struct i2c_client *client,
> goto err_power_off;
>  #endif
>
> -   ret = v4l2_async_register_subdev(sd);
> +   ret = v4l2_async_register_subdev_sensor_common(sd);
> if (ret) {
> dev_err(dev, "v4l2 async register subdev failed\n");
>     goto err_clean_entity;
> --
> 2.9.2
>

Otherwise:

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz


Re: [RFC PATCH V3 4/5] platform: mtk-isp: Add Mediatek DIP driver

2019-09-19 Thread Tomasz Figa
On Thu, Sep 19, 2019 at 6:41 PM Frederic Chen
 wrote:
>
> Dear Tomasz,
>
>
> On Thu, 2019-09-12 at 14:58 +0900, Tomasz Figa wrote:
> > On Thu, Sep 12, 2019 at 2:41 AM Frederic Chen
> >  wrote:
> > >
> > > Hi Tomasz,
> > >
> > > I appreciate your helpful comments.
> > >
> > >
> > > On Tue, 2019-09-10 at 13:04 +0900, Tomasz Figa wrote:
> > > > Hi Frederic,
> > > >
> > > > On Tue, Sep 10, 2019 at 4:23 AM  wrote:
> > > > >
> > > > > From: Frederic Chen 
> > > > >
> > > > > This patch adds the driver of Digital Image Processing (DIP)
> > > > > unit in Mediatek ISP system, providing image format
> > > > > conversion, resizing, and rotation features.
> > > > >
> > > > > The mtk-isp directory will contain drivers for multiple IP
> > > > > blocks found in Mediatek ISP system. It will include ISP
> > > > > Pass 1 driver(CAM), sensor interface driver, DIP driver and
> > > > > face detection driver.
> > > > >
> > > > > Signed-off-by: Frederic Chen 
> > > > > ---
> > > > >  drivers/media/platform/mtk-isp/Makefile   |7 +
> > > > >  .../media/platform/mtk-isp/isp_50/Makefile|7 +
> > > > >  .../platform/mtk-isp/isp_50/dip/Makefile  |   18 +
> > > > >  .../platform/mtk-isp/isp_50/dip/mtk_dip-dev.c |  650 +
> > > > >  .../platform/mtk-isp/isp_50/dip/mtk_dip-dev.h |  566 +
> > > > >  .../platform/mtk-isp/isp_50/dip/mtk_dip-hw.h  |  156 ++
> > > > >  .../platform/mtk-isp/isp_50/dip/mtk_dip-sys.c |  521 
> > > > >  .../mtk-isp/isp_50/dip/mtk_dip-v4l2.c | 2255 
> > > > > +
> > > > >  8 files changed, 4180 insertions(+)
> > > > >  create mode 100644 drivers/media/platform/mtk-isp/Makefile
> > > > >  create mode 100644 drivers/media/platform/mtk-isp/isp_50/Makefile
> > > > >  create mode 100644 drivers/media/platform/mtk-isp/isp_50/dip/Makefile
> > > > >  create mode 100644 
> > > > > drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-dev.c
> > > > >  create mode 100644 
> > > > > drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-dev.h
> > > > >  create mode 100644 
> > > > > drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-hw.h
> > > > >  create mode 100644 
> > > > > drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > >  create mode 100644 
> > > > > drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-v4l2.c
> > > > >
> > > >
> > > > Thanks for sending v3!
> > > >
> > > > I'm going to do a full review a bit later, but please check one
> > > > comment about power handling below.
> > > >
> > > > Other than that one comment, from a quick look, I think we only have a
> > > > number of style issues left. Thanks for the hard work!
> > > >
> > > > [snip]
> > > > > +static void dip_runner_func(struct work_struct *work)
> > > > > +{
> > > > > +   struct mtk_dip_request *req = 
> > > > > mtk_dip_hw_mdp_work_to_req(work);
> > > > > +   struct mtk_dip_dev *dip_dev = req->dip_pipe->dip_dev;
> > > > > +   struct img_config *config_data =
> > > > > +   (struct img_config 
> > > > > *)req->working_buf->config_data.vaddr;
> > > > > +
> > > > > +   /*
> > > > > +* Call MDP/GCE API to do HW excecution
> > > > > +* Pass the framejob to MDP driver
> > > > > +*/
> > > > > +   pm_runtime_get_sync(dip_dev->dev);
> > > > > +   mdp_cmdq_sendtask(dip_dev->mdp_pdev, config_data,
> > > > > + &req->img_fparam.frameparam, NULL, false,
> > > > > + dip_mdp_cb_func, req);
> > > > > +}
> > > > [snip]
> > > > > +static void dip_composer_workfunc(struct work_struct *work)
> > > > > +{
> > > > > +   struct mtk_dip_request *req = mtk_dip_hw_fw_work_to_req(work);
> > > > > +   struct mtk_dip_dev *dip_dev = req->dip_pipe->dip_dev;
> > > > > +   struct img_ipi_param ipi_param;
> > > > > +   

Re: [RFC PATCH V3 4/5] platform: mtk-isp: Add Mediatek DIP driver

2019-09-12 Thread Tomasz Figa
On Thu, Sep 12, 2019 at 2:41 AM Frederic Chen
 wrote:
>
> Hi Tomasz,
>
> I appreciate your helpful comments.
>
>
> On Tue, 2019-09-10 at 13:04 +0900, Tomasz Figa wrote:
> > Hi Frederic,
> >
> > On Tue, Sep 10, 2019 at 4:23 AM  wrote:
> > >
> > > From: Frederic Chen 
> > >
> > > This patch adds the driver of Digital Image Processing (DIP)
> > > unit in Mediatek ISP system, providing image format
> > > conversion, resizing, and rotation features.
> > >
> > > The mtk-isp directory will contain drivers for multiple IP
> > > blocks found in Mediatek ISP system. It will include ISP
> > > Pass 1 driver(CAM), sensor interface driver, DIP driver and
> > > face detection driver.
> > >
> > > Signed-off-by: Frederic Chen 
> > > ---
> > >  drivers/media/platform/mtk-isp/Makefile   |7 +
> > >  .../media/platform/mtk-isp/isp_50/Makefile|7 +
> > >  .../platform/mtk-isp/isp_50/dip/Makefile  |   18 +
> > >  .../platform/mtk-isp/isp_50/dip/mtk_dip-dev.c |  650 +
> > >  .../platform/mtk-isp/isp_50/dip/mtk_dip-dev.h |  566 +
> > >  .../platform/mtk-isp/isp_50/dip/mtk_dip-hw.h  |  156 ++
> > >  .../platform/mtk-isp/isp_50/dip/mtk_dip-sys.c |  521 
> > >  .../mtk-isp/isp_50/dip/mtk_dip-v4l2.c | 2255 +
> > >  8 files changed, 4180 insertions(+)
> > >  create mode 100644 drivers/media/platform/mtk-isp/Makefile
> > >  create mode 100644 drivers/media/platform/mtk-isp/isp_50/Makefile
> > >  create mode 100644 drivers/media/platform/mtk-isp/isp_50/dip/Makefile
> > >  create mode 100644 
> > > drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-dev.c
> > >  create mode 100644 
> > > drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-dev.h
> > >  create mode 100644 drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-hw.h
> > >  create mode 100644 
> > > drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > >  create mode 100644 
> > > drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-v4l2.c
> > >
> >
> > Thanks for sending v3!
> >
> > I'm going to do a full review a bit later, but please check one
> > comment about power handling below.
> >
> > Other than that one comment, from a quick look, I think we only have a
> > number of style issues left. Thanks for the hard work!
> >
> > [snip]
> > > +static void dip_runner_func(struct work_struct *work)
> > > +{
> > > +   struct mtk_dip_request *req = mtk_dip_hw_mdp_work_to_req(work);
> > > +   struct mtk_dip_dev *dip_dev = req->dip_pipe->dip_dev;
> > > +   struct img_config *config_data =
> > > +   (struct img_config *)req->working_buf->config_data.vaddr;
> > > +
> > > +   /*
> > > +* Call MDP/GCE API to do HW excecution
> > > +* Pass the framejob to MDP driver
> > > +*/
> > > +   pm_runtime_get_sync(dip_dev->dev);
> > > +   mdp_cmdq_sendtask(dip_dev->mdp_pdev, config_data,
> > > + &req->img_fparam.frameparam, NULL, false,
> > > + dip_mdp_cb_func, req);
> > > +}
> > [snip]
> > > +static void dip_composer_workfunc(struct work_struct *work)
> > > +{
> > > +   struct mtk_dip_request *req = mtk_dip_hw_fw_work_to_req(work);
> > > +   struct mtk_dip_dev *dip_dev = req->dip_pipe->dip_dev;
> > > +   struct img_ipi_param ipi_param;
> > > +   struct mtk_dip_hw_subframe *buf;
> > > +   int ret;
> > > +
> > > +   down(&dip_dev->sem);
> > > +
> > > +   buf = mtk_dip_hw_working_buf_alloc(req->dip_pipe->dip_dev);
> > > +   if (!buf) {
> > > +   dev_err(req->dip_pipe->dip_dev->dev,
> > > +   "%s:%s:req(%p): no free working buffer 
> > > available\n",
> > > +   __func__, req->dip_pipe->desc->name, req);
> > > +   }
> > > +
> > > +   req->working_buf = buf;
> > > +   
> > > mtk_dip_wbuf_to_ipi_img_addr(&req->img_fparam.frameparam.subfrm_data,
> > > +&buf->buffer);
> > > +   memset(buf->buffer.vaddr, 0, DIP_SUB_FRM_SZ);
> > > +   
> > > mtk_dip_wbuf_to_ipi_img_sw_addr(&req->img_fparam.frameparam.config_data,
> > > +   >config_

Re: [V2, 2/2] media: i2c: Add more sensor modes for ov8856 camera sensor

2019-09-11 Thread Tomasz Figa
Hi Sakari,

On Tue, Sep 10, 2019 at 10:05 PM  wrote:
>
> From: Dongchun Zhu 
>
> This patch mainly adds two more sensor modes for OV8856 CMOS image sensor.
> That is, the resolution of 1632*1224 and 3264*2448, corresponding to the 
> bayer order of BGGR.
> The sensor revision also differs in some OTP register.
>
> Signed-off-by: Dongchun Zhu 
> ---
>  drivers/media/i2c/ov8856.c | 654 
> +++--
>  1 file changed, 639 insertions(+), 15 deletions(-)
>

What do you think about the approach taken by this patch?

My understanding is that the register arrays being added by it can be
only used with 24MHz input clock, while the existing ones are for
19.2MHz. That means that this patch makes the driver expose completely
different modes (resolutions, mbus formats) depending on the input
clock. Are we okay with this?

Best regards,
Tomasz

> diff --git a/drivers/media/i2c/ov8856.c b/drivers/media/i2c/ov8856.c
> index cd347d6..9ad0b73 100644
> --- a/drivers/media/i2c/ov8856.c
> +++ b/drivers/media/i2c/ov8856.c
> @@ -1,12 +1,15 @@
>  // SPDX-License-Identifier: GPL-2.0
>  // Copyright (c) 2019 Intel Corporation.
>
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -18,10 +21,15 @@
>  #define OV8856_LINK_FREQ_360MHZ36000ULL
>  #define OV8856_LINK_FREQ_180MHZ18000ULL
>  #define OV8856_SCLK14400ULL
> -#define OV8856_MCLK1920
> +#define OV8856_XVCLK   1920
> +#define OV8856_XVCLK_TYP   2400
>  #define OV8856_DATA_LANES  4
>  #define OV8856_RGB_DEPTH   10
>
> +#define REG_X_ADDR_START   0x3808
> +#define X_OUTPUT_FULL_SIZE 0x0cc0
> +#define X_OUTPUT_BINNING_SIZE  0x0660
> +
>  #define OV8856_REG_CHIP_ID 0x300a
>  #define OV8856_CHIP_ID 0x00885a
>
> @@ -29,6 +37,22 @@
>  #define OV8856_MODE_STANDBY0x00
>  #define OV8856_MODE_STREAMING  0x01
>
> +/* define 1B module revision */
> +#define OV8856_1B_MODULE   0x02
> +
> +/* the OTP read-out buffer is at 0x7000 and 0xf is the offset
> + * of the byte in the OTP that means the module revision
> + */
> +#define OV8856_MODULE_REVISION 0x700f
> +#define OV8856_OTP_MODE_CTRL   0x3d84
> +#define OV8856_OTP_LOAD_CTRL   0x3d81
> +#define OV8856_OTP_MODE_AUTO   0x00
> +#define OV8856_OTP_LOAD_CTRL_ENABLEBIT(0)
> +
> +/* Analog control register that decided by module revision */
> +#define OV8856_ANAL_MODE_CTRL  0x3614
> +#define OV8856_ANAL_1B_VAL 0x20
> +
>  /* vertical-timings from sensor */
>  #define OV8856_REG_VTS 0x380e
>  #define OV8856_VTS_MAX 0x7fff
> @@ -64,6 +88,14 @@
>
>  #define to_ov8856(_sd) container_of(_sd, struct ov8856, sd)
>
> +static const char * const ov8856_supply_names[] = {
> +   "dovdd",/* Digital I/O power */
> +   "avdd", /* Analog power */
> +   "dvdd", /* Digital core power */
> +};
> +
> +#define OV8856_NUM_SUPPLIES ARRAY_SIZE(ov8856_supply_names)
> +
>  enum {
> OV8856_LINK_FREQ_720MBPS,
> OV8856_LINK_FREQ_360MBPS,
> @@ -195,11 +227,11 @@ static const struct ov8856_reg mode_3280x2464_regs[] = {
> {0x3800, 0x00},
> {0x3801, 0x00},
> {0x3802, 0x00},
> -   {0x3803, 0x06},
> +   {0x3803, 0x07},
> {0x3804, 0x0c},
> {0x3805, 0xdf},
> {0x3806, 0x09},
> -   {0x3807, 0xa7},
> +   {0x3807, 0xa6},
> {0x3808, 0x0c},
> {0x3809, 0xd0},
> {0x380a, 0x09},
> @@ -211,7 +243,7 @@ static const struct ov8856_reg mode_3280x2464_regs[] = {
> {0x3810, 0x00},
> {0x3811, 0x00},
> {0x3812, 0x00},
> -   {0x3813, 0x01},
> +   {0x3813, 0x00},
> {0x3814, 0x01},
> {0x3815, 0x01},
> {0x3816, 0x00},
> @@ -316,6 +348,209 @@ static const struct ov8856_reg mode_3280x2464_regs[] = {
> {0x5e00, 0x00}
>  };
>
> +static const struct ov8856_reg mode_3264x2448_regs[] = {
> +   {0x0103, 0x01},
> +   {0x0302, 0x3c},
> +   {0x0303, 0x01},
> +   {0x031e, 0x0c},
> +   {0x3000, 0x20},
> +   {0x3003, 0x08},
> +   {0x300e, 0x20},
> +   {0x3010, 0x00},
> +   {0x3015, 0x84},
> +   {0x3018, 0x72},
> +   {0x3021, 0x23},
> +   {0x3033, 0x24},
> +   {0x3500, 0x00},
> +   {0x3501, 0x9a},
> +   {0x3502, 0x20},
> +   {0x3503, 0x08},
> +   {0x3505, 0x83},
> +   {0x3508, 0x01},
> +   {0x3509, 0x80},
> +   {0x350c, 0x00},
> +   {0x350d, 0x80},
> +   {0x350e, 0x04},
> +   {0x350f, 0x00},
> +   {0x3510, 0x00},
> +   {0x3511, 0x02},
> +   {0x3512, 0x00},
> +   {0x3600, 0x72},
> +   {0x3601, 0x40},
> +   {0x3602, 0x30},
> +  

Re: [V1, 2/2] media: i2c: Add more sensor mode for ov8856 camera sensor

2019-09-10 Thread Tomasz Figa
Hi Dongchun,

On Mon, Sep 9, 2019 at 6:27 PM Dongchun Zhu  wrote:
>
> Hi Tomasz,
>
> On Fri, 2019-08-23 at 19:01 +0900, Tomasz Figa wrote:
> > Hi Dongchun,
> >
> > On Thu, Aug 08, 2019 at 05:22:15PM +0800, dongchun@mediatek.com wrote:
[snip]
> > > +
> > >  /* vertical-timings from sensor */
> > >  #define OV8856_REG_VTS 0x380e
> > >  #define OV8856_VTS_MAX 0x7fff
> > > @@ -64,6 +80,14 @@
> > >
> > >  #define to_ov8856(_sd) container_of(_sd, struct 
> > > ov8856, sd)
> > >
> > > +static const char * const ov8856_supply_names[] = {
> > > +   "dovdd",/* Digital I/O power */
> > > +   "avdd", /* Analog power */
> > > +   "dvdd", /* Digital core power */
> > > +};
> > > +
> > > +#define OV8856_NUM_SUPPLIES ARRAY_SIZE(ov8856_supply_names)
> > > +
> > >  enum {
> > > OV8856_LINK_FREQ_720MBPS,
> > > OV8856_LINK_FREQ_360MBPS,
> > > @@ -316,6 +340,208 @@ static const struct ov8856_reg 
> > > mode_3280x2464_regs[] = {
> > > {0x5e00, 0x00}
> > >  };
> > >
> > > +static const struct ov8856_reg mode_3264x2448_regs[] = {
[snip]
> > > +};
> > > +
> >
> > It would be better if we could find the differences between the two arrays
> > and handle them incrementally.
> >
>
> This approach is not recommended.
>

Not recommended by whom? :) I myself recommend that approach.

I'm sorry, but I'm going to NACK this patch (including the
chromeos-4.19 tree), unless there is a very good technical reason not
to do it the way I'm suggesting. The other drivers do it that way and
I see no reason why this one should be an exception.

> For these two arrays, sensor input clock frequencies (19.2MHz, 24MHz)
> are different, corresponding to different PLL register setting.
>
> Besides, there are also some differences in image resolution and
> hts/vts, including 0x3614 register that reflecting sensor revision.
>

What would be the reason preventing us from handling that in driver code?

Note that I do _not_ mean just taking addresses and values that are
different and putting them to a separate array. What I'm asking for is
to handle the differences in a programmatic way - with dedicated code
in the driver setting appropriate registers.
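As a rough illustration of what I mean (a sketch only - the helper and table
names here are hypothetical, not from this driver):

static int ov8856_set_pll(struct ov8856 *ov8856)
{
        const struct ov8856_reg_list *pll;

        switch (clk_get_rate(ov8856->xvclk)) {
        case 19200000:
                pll = &ov8856_pll_19_2mhz;      /* existing settings */
                break;
        case 24000000:
                pll = &ov8856_pll_24mhz;        /* new settings */
                break;
        default:
                return -EINVAL;
        }

        return ov8856_write_reg_list(ov8856, pll);
}

The mode arrays themselves would then carry only the timing/size registers
that really differ between the modes.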

[snip]

> > > +   fmt->code = MEDIA_BUS_FMT_SBGGR10_1X10;
> > > +   else
> > > +   fmt->code = MEDIA_BUS_FMT_SGRBG10_1X10;
> > > +
> > > fmt->field = V4L2_FIELD_NONE;
> > >  }
> > >
> > > @@ -850,6 +1333,17 @@ static int ov8856_start_streaming(struct ov8856 
> > > *ov8856)
> > > return ret;
> > > }
> > >
> > > +   /* update R3614 for 1B module */
> >
> > What's R3614?
> >
>
> R3614 is the register 0x3614, which reflects the sensor revision.
> For instance, it would be 0x20 for 1B module, while 0x60 for 2A module.
>

My point is - this comment doesn't mean anything for a person reading
it. The code below is actually more meaningful - you can see that the
clock settings register is written with a value for 1B.

> > > +   if (ov8856->is_1B_module) {
> > > +   ret = ov8856_write_reg(ov8856, OV8856_CLK_REG,
> > > +  OV8856_REG_VALUE_08BIT,
> > > +  OV8856_CLK_REG_1B_VAL);

Please define this value according to what it means, not a fixed
constant for 1B sensor revision.

> > > +   if (ret) {
> > > +   dev_err(>dev, "failed to set R3614");
> > > +   return ret;
> > > +   }
> > > +   }
> > > +
> > > ret = __v4l2_ctrl_handler_setup(ov8856->sd.ctrl_handler);
> > > if (ret)
> > > return ret;
> > > @@ -882,6 +1376,8 @@ static int ov8856_set_stream(struct v4l2_subdev *sd, 
> > > int enable)
> > > if (ov8856->streaming == enable)
> > > return 0;
> > >
> > > +   dev_dbg(>dev, "hardware version: (%d)\n", 
> > > ov8856->is_1B_module);
> > > +
> > > mutex_lock(>mutex);
> > > if (enable) {
> > > ret = pm_runtime_get_sync(>dev);
> > > @@ -908,6 +1404,54 @@ static int ov8856_set_stream(struct v4l2_subdev 
> > > *sd, int enable)
> > > return ret;
> > >  }
> > >
> > > +/* Calculate the delay in us by c

Re: [RFC PATCH V3 4/5] platform: mtk-isp: Add Mediatek DIP driver

2019-09-09 Thread Tomasz Figa
Hi Frederic,

On Tue, Sep 10, 2019 at 4:23 AM  wrote:
>
> From: Frederic Chen 
>
> This patch adds the driver of Digital Image Processing (DIP)
> unit in Mediatek ISP system, providing image format
> conversion, resizing, and rotation features.
>
> The mtk-isp directory will contain drivers for multiple IP
> blocks found in Mediatek ISP system. It will include ISP
> Pass 1 driver(CAM), sensor interface driver, DIP driver and
> face detection driver.
>
> Signed-off-by: Frederic Chen 
> ---
>  drivers/media/platform/mtk-isp/Makefile   |7 +
>  .../media/platform/mtk-isp/isp_50/Makefile|7 +
>  .../platform/mtk-isp/isp_50/dip/Makefile  |   18 +
>  .../platform/mtk-isp/isp_50/dip/mtk_dip-dev.c |  650 +
>  .../platform/mtk-isp/isp_50/dip/mtk_dip-dev.h |  566 +
>  .../platform/mtk-isp/isp_50/dip/mtk_dip-hw.h  |  156 ++
>  .../platform/mtk-isp/isp_50/dip/mtk_dip-sys.c |  521 
>  .../mtk-isp/isp_50/dip/mtk_dip-v4l2.c | 2255 +
>  8 files changed, 4180 insertions(+)
>  create mode 100644 drivers/media/platform/mtk-isp/Makefile
>  create mode 100644 drivers/media/platform/mtk-isp/isp_50/Makefile
>  create mode 100644 drivers/media/platform/mtk-isp/isp_50/dip/Makefile
>  create mode 100644 drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-dev.c
>  create mode 100644 drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-dev.h
>  create mode 100644 drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-hw.h
>  create mode 100644 drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
>  create mode 100644 drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-v4l2.c
>

Thanks for sending v3!

I'm going to do a full review a bit later, but please check one
comment about power handling below.

Other than that one comment, from a quick look, I think we only have a
number of style issues left. Thanks for the hard work!

[snip]
> +static void dip_runner_func(struct work_struct *work)
> +{
> +   struct mtk_dip_request *req = mtk_dip_hw_mdp_work_to_req(work);
> +   struct mtk_dip_dev *dip_dev = req->dip_pipe->dip_dev;
> +   struct img_config *config_data =
> +   (struct img_config *)req->working_buf->config_data.vaddr;
> +
> +   /*
> +* Call MDP/GCE API to do HW excecution
> +* Pass the framejob to MDP driver
> +*/
> +   pm_runtime_get_sync(dip_dev->dev);
> +   mdp_cmdq_sendtask(dip_dev->mdp_pdev, config_data,
> > > + &req->img_fparam.frameparam, NULL, false,
> + dip_mdp_cb_func, req);
> +}
[snip]
> +static void dip_composer_workfunc(struct work_struct *work)
> +{
> +   struct mtk_dip_request *req = mtk_dip_hw_fw_work_to_req(work);
> +   struct mtk_dip_dev *dip_dev = req->dip_pipe->dip_dev;
> +   struct img_ipi_param ipi_param;
> +   struct mtk_dip_hw_subframe *buf;
> +   int ret;
> +
> +   down(&dip_dev->sem);
> +
> +   buf = mtk_dip_hw_working_buf_alloc(req->dip_pipe->dip_dev);
> +   if (!buf) {
> +   dev_err(req->dip_pipe->dip_dev->dev,
> +   "%s:%s:req(%p): no free working buffer available\n",
> +   __func__, req->dip_pipe->desc->name, req);
> +   }
> +
> +   req->working_buf = buf;
> +   mtk_dip_wbuf_to_ipi_img_addr(&req->img_fparam.frameparam.subfrm_data,
> +&buf->buffer);
> +   memset(buf->buffer.vaddr, 0, DIP_SUB_FRM_SZ);
> +   
> mtk_dip_wbuf_to_ipi_img_sw_addr(&req->img_fparam.frameparam.config_data,
> +   &buf->config_data);
> +   memset(buf->config_data.vaddr, 0, DIP_COMP_SZ);
> +
> +   if (!req->img_fparam.frameparam.tuning_data.present) {
> +   /*
> +* When user enqueued without tuning buffer,
> +* it would use driver internal buffer.
> +*/
> +   dev_dbg(dip_dev->dev,
> +   "%s: frame_no(%d) has no tuning_data\n",
> +   __func__, req->img_fparam.frameparam.frame_no);
> +
> +   mtk_dip_wbuf_to_ipi_tuning_addr
> +   (&req->img_fparam.frameparam.tuning_data,
> +&buf->tuning_buf);
> +   memset(buf->tuning_buf.vaddr, 0, DIP_TUNING_SZ);
> +   }
> +
> +   mtk_dip_wbuf_to_ipi_img_sw_addr(&req->img_fparam.frameparam.self_data,
> +   &buf->frameparam);
> +   memcpy(buf->frameparam.vaddr, &req->img_fparam.frameparam,
> +  sizeof(req->img_fparam.frameparam));
> +   ipi_param.usage = IMG_IPI_FRAME;
> +   ipi_param.frm_param.handle = req->id;
> +   ipi_param.frm_param.scp_addr = (u32)buf->frameparam.scp_daddr;
> +
> +   mutex_lock(_dev->hw_op_lock);
> +   atomic_inc(_dev->num_composing);
> +   ret = scp_ipi_send(dip_dev->scp_pdev, SCP_IPI_DIP, &ipi_param,
> +  sizeof(ipi_param), 0);

We're not holding the pm_runtime enable count here
(pm_runtime_get_sync() wasn't 

Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-06 Thread Tomasz Figa
On Fri, Sep 6, 2019 at 10:33 AM Dongchun Zhu  wrote:
>
> On Fri, 2019-09-06 at 06:58 +0800, Nicolas Boichat wrote:
> > On Fri, Sep 6, 2019 at 12:05 AM Sakari Ailus
> >  wrote:
> > >
> > > On Thu, Sep 05, 2019 at 07:53:37PM +0900, Tomasz Figa wrote:
> > > > On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
> > > >  wrote:
> > > > >
> > > > > Hi Dongchun,
> > > > >
> > > > > On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
> > > > >
> > > > > ...
> > > > >
> > > > > > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, 
> > > > > > > > ov02a10->supplies);
> > > > > > > > + if (ret < 0) {
> > > > > > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > > > > > + goto disable_clk;
> > > > > > > > + }
> > > > > > > > + msleep_range(7);
> > > > > > >
> > > > > > > This has some potential of clashing with more generic functions 
> > > > > > > in the
> > > > > > > future. Please use usleep_range directly, or msleep.
> > > > > > >
> > > > > >
> > > > > > Did you mean using usleep_range(7*1000, 8*1000), as used in patch 
> > > > > > v1?
> > > > > > https://patchwork.kernel.org/patch/10957225/
> > > > >
> > > > > Yes, please.
> > > >
> > > > Why not just msleep()?
> > >
> > > msleep() is usually less accurate. I'm not sure it makes a big different 
> > > in
> > > this case. Perhaps, if someone wants that the sensor is powered on and
> > > streaming as soon as possible.
> >
> > https://elixir.bootlin.com/linux/latest/source/Documentation/timers/timers-howto.txt#L70
> >
> > Use usleep_range for delays up to 20ms (at least that's what the
> > documentation (still) says?)
> >
>
> Thank you for your clarifications.
> From the doc,
> "msleep(1~20) may not do what the caller intends, and
> will often sleep longer (~20 ms actual sleep for any
> value given in the 1~20ms range). In many cases this
> is not the desired behavior."
>
> So usleep_range is supposed to be used for shorter sleeps,
> such as 5ms.

Thanks for double checking. usleep_range() sounds good then. Sorry for
the noise.

Best regards,
Tomasz


Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-09-05 Thread Tomasz Figa
On Thu, Sep 5, 2019 at 7:45 PM Sakari Ailus
 wrote:
>
> Hi Dongchun,
>
> On Thu, Sep 05, 2019 at 05:41:05PM +0800, Dongchun Zhu wrote:
>
> ...
>
> > > > + ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, ov02a10->supplies);
> > > > + if (ret < 0) {
> > > > + dev_err(dev, "Failed to enable regulators\n");
> > > > + goto disable_clk;
> > > > + }
> > > > + msleep_range(7);
> > >
> > > This has some potential of clashing with more generic functions in the
> > > future. Please use usleep_range directly, or msleep.
> > >
> >
> > Did you mean using usleep_range(7*1000, 8*1000), as used in patch v1?
> > https://patchwork.kernel.org/patch/10957225/
>
> Yes, please.

Why not just msleep()?


Re: [V2, 2/2] media: i2c: Add DW9768 VCM driver

2019-09-05 Thread Tomasz Figa
Hi Dongchun,

On Thu, Sep 5, 2019 at 4:22 PM  wrote:
>
> From: Dongchun Zhu 
>
> This patch adds a V4L2 sub-device driver for DW9768 lens voice coil,
> and provides control to set the desired focus.
>
> The DW9768 is a 10 bit DAC with 100mA output current sink capability
> from Dongwoon, designed for linear control of voice coil motor,
> and controlled via I2C serial interface.
>
> Signed-off-by: Dongchun Zhu 
> ---
>  MAINTAINERS|   1 +
>  drivers/media/i2c/Kconfig  |  10 ++
>  drivers/media/i2c/Makefile |   1 +
>  drivers/media/i2c/dw9768.c | 349 
> +
>  4 files changed, 361 insertions(+)
>  create mode 100644 drivers/media/i2c/dw9768.c
>

Thanks for v2! Please see my comments inline.

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 192a671..c5c9a0e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4976,6 +4976,7 @@ M:Dongchun Zhu 
>  L: linux-me...@vger.kernel.org
>  T: git git://linuxtv.org/media_tree.git
>  S: Maintained
> +F: drivers/media/i2c/dw9768.c
>  F: Documentation/devicetree/bindings/media/i2c/dongwoon,dw9768.txt
>
>  DONGWOON DW9807 LENS VOICE COIL DRIVER
> diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig
> index 79ce9ec..dfb665c 100644
> --- a/drivers/media/i2c/Kconfig
> +++ b/drivers/media/i2c/Kconfig
> @@ -1016,6 +1016,16 @@ config VIDEO_DW9714
>   capability. This is designed for linear control of
>   voice coil motors, controlled via I2C serial interface.
>
> +config VIDEO_DW9768
> +   tristate "DW9768 lens voice coil support"
> +   depends on I2C && VIDEO_V4L2 && MEDIA_CONTROLLER
> +   depends on VIDEO_V4L2_SUBDEV_API
> +   help
> + This is a driver for the DW9768 camera lens voice coil.
> + DW9768 is a 10 bit DAC with 100mA output current sink
> + capability. This is designed for linear control of
> + voice coil motors, controlled via I2C serial interface.
> +
>  config VIDEO_DW9807_VCM
> tristate "DW9807 lens voice coil support"
> depends on I2C && VIDEO_V4L2 && MEDIA_CONTROLLER
> diff --git a/drivers/media/i2c/Makefile b/drivers/media/i2c/Makefile
> index fd4ea86..2561239 100644
> --- a/drivers/media/i2c/Makefile
> +++ b/drivers/media/i2c/Makefile
> @@ -24,6 +24,7 @@ obj-$(CONFIG_VIDEO_SAA6752HS) += saa6752hs.o
>  obj-$(CONFIG_VIDEO_AD5820)  += ad5820.o
>  obj-$(CONFIG_VIDEO_AK7375)  += ak7375.o
>  obj-$(CONFIG_VIDEO_DW9714)  += dw9714.o
> +obj-$(CONFIG_VIDEO_DW9768)  += dw9768.o
>  obj-$(CONFIG_VIDEO_DW9807_VCM)  += dw9807-vcm.o
>  obj-$(CONFIG_VIDEO_ADV7170) += adv7170.o
>  obj-$(CONFIG_VIDEO_ADV7175) += adv7175.o
> diff --git a/drivers/media/i2c/dw9768.c b/drivers/media/i2c/dw9768.c
> new file mode 100644
> index 000..66d1712
> --- /dev/null
> +++ b/drivers/media/i2c/dw9768.c
> @@ -0,0 +1,349 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2019 MediaTek Inc.
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DW9768_NAME"dw9768"
> +#define DW9768_MAX_FOCUS_POS   1023
> +/*
> + * This sets the minimum granularity for the focus positions.
> + * A value of 1 gives maximum accuracy for a desired focus position
> + */
> +#define DW9768_FOCUS_STEPS 1
> +/*
> + * DW9768 separates two registers to control the VCM position.
> + * One for MSB value, another is LSB value.
> + */
> +#define DW9768_REG_MASK_MSB0x03
> +#define DW9768_REG_MASK_LSB0xff
> +#define DW9768_SET_POSITION_ADDR0x03
> +
> +#define DW9768_CMD_DELAY   0xff
> +#define DW9768_CTRL_DELAY_US   5000
> +
> +#define DW9768_DAC_SHIFT   8
> +
> +/* dw9768 device structure */
> +struct dw9768 {
> +   struct v4l2_ctrl_handler ctrls;
> +   struct v4l2_subdev sd;
> +   struct regulator *vin;
> +   struct regulator *vdd;
> +};
> +
> +static inline struct dw9768 *to_dw9768_vcm(struct v4l2_ctrl *ctrl)
> +{
> +   return container_of(ctrl->handler, struct dw9768, ctrls);
> +}
> +
> +static inline struct dw9768 *sd_to_dw9768_vcm(struct v4l2_subdev *subdev)
> +{
> +   return container_of(subdev, struct dw9768, sd);
> +}
> +
> +struct regval_list {
> +   unsigned char reg_num;
> +   unsigned char value;
> +};
> +
> +static struct regval_list dw9768_init_regs[] = {
> +   {0x02, 0x02},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +   {0x06, 0x41},
> +   {0x07, 0x39},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +};
> +
> +static struct regval_list dw9768_release_regs[] = {
> +   {0x02, 0x00},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +   {0x01, 0x00},
> +   {DW9768_CMD_DELAY, DW9768_CMD_DELAY},
> +};
> +
> +static int dw9768_write_smbus(struct dw9768 *dw9768, unsigned char reg,
> + unsigned char 

Re: [PATCH 2/2] media: i2c: dw9768: Add DW9768 VCM driver

2019-09-02 Thread Tomasz Figa
Hi Dongchun,

On Tue, Sep 3, 2019 at 12:02 AM Dongchun Zhu  wrote:
>
> Hi Tomasz,
>
> On Fri, 2019-08-23 at 17:17 +0900, Tomasz Figa wrote:
> > Hi Dongchun,
> >
> > On Mon, Jul 08, 2019 at 06:06:41PM +0800, dongchun@mediatek.com wrote:
> > > From: Dongchun Zhu 
> > >
> > > This patch adds a V4L2 sub-device driver for DW9768 lens voice coil,
> > > and provides control to set the desired focus.
> > >
> > > The DW9807 is a 10 bit DAC from Dongwoon, designed for linear
> > > control of voice coil motor.
> > >
> > > Signed-off-by: Dongchun Zhu 
> > > ---
> > >  MAINTAINERS|   1 +
> > >  drivers/media/i2c/Kconfig  |  10 +
> > >  drivers/media/i2c/Makefile |   1 +
> > >  drivers/media/i2c/dw9768.c | 458 
> > > +
> > >  4 files changed, 470 insertions(+)
> > >  create mode 100644 drivers/media/i2c/dw9768.c
> > >
> >
> > Thanks for the patch! Please see my comments inline.
> >
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 8f6ac93..17152d7 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -4877,6 +4877,7 @@ M:Dongchun Zhu 
> > >  L: linux-me...@vger.kernel.org
> > >  T: git git://linuxtv.org/media_tree.git
> > >  S: Maintained
> > > +F: drivers/media/i2c/dw9768.c
> > >  F: Documentation/devicetree/bindings/media/i2c/dongwoon,dw9768.txt
> > >
> > >  DONGWOON DW9807 LENS VOICE COIL DRIVER
> > > diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig
> > > index 7793358..8ff6c95 100644
> > > --- a/drivers/media/i2c/Kconfig
> > > +++ b/drivers/media/i2c/Kconfig
> > > @@ -1014,6 +1014,16 @@ config VIDEO_DW9714
> > >   capability. This is designed for linear control of
> > >   voice coil motors, controlled via I2C serial interface.
> > >
> > > +config VIDEO_DW9768
> > > +   tristate "DW9768 lens voice coil support"
> > > +   depends on I2C && VIDEO_V4L2 && MEDIA_CONTROLLER
> > > +   depends on VIDEO_V4L2_SUBDEV_API
> > > +   help
> > > + This is a driver for the DW9768 camera lens voice coil.
> > > + DW9768 is a 10 bit DAC with 100mA output current sink
> > > + capability. This is designed for linear control of
> > > + voice coil motors, controlled via I2C serial interface.
> > > +
> > >  config VIDEO_DW9807_VCM
> > > tristate "DW9807 lens voice coil support"
> > > depends on I2C && VIDEO_V4L2 && MEDIA_CONTROLLER
> > > diff --git a/drivers/media/i2c/Makefile b/drivers/media/i2c/Makefile
> > > index d8ad9da..944fbf6 100644
> > > --- a/drivers/media/i2c/Makefile
> > > +++ b/drivers/media/i2c/Makefile
> > > @@ -24,6 +24,7 @@ obj-$(CONFIG_VIDEO_SAA6752HS) += saa6752hs.o
> > >  obj-$(CONFIG_VIDEO_AD5820)  += ad5820.o
> > >  obj-$(CONFIG_VIDEO_AK7375)  += ak7375.o
> > >  obj-$(CONFIG_VIDEO_DW9714)  += dw9714.o
> > > +obj-$(CONFIG_VIDEO_DW9768)  += dw9768.o
> > >  obj-$(CONFIG_VIDEO_DW9807_VCM)  += dw9807-vcm.o
> > >  obj-$(CONFIG_VIDEO_ADV7170) += adv7170.o
> > >  obj-$(CONFIG_VIDEO_ADV7175) += adv7175.o
> > > diff --git a/drivers/media/i2c/dw9768.c b/drivers/media/i2c/dw9768.c
> > > new file mode 100644
> > > index 000..f5b5591
> > > --- /dev/null
> > > +++ b/drivers/media/i2c/dw9768.c
> > > @@ -0,0 +1,458 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * Copyright (c) 2018 MediaTek Inc.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#define DW9768_VOLTAGE_ANALOG  280
> >
> > This is a platform detail and should be defined in the platform data, for
> > example DTS on platforms using DT.
> >
>
> Thanks for your reminder.
> This would be fixed in next release.
>
> > > +#define DW9768_NAME"dw9768"
> >
> > The chip we seem to be using this driver for is called gt9769. Shouldn't we
> > call the driver the same?
> >
>
> It is also called DW9768 in the camera module specification, which was
> initially confirmed with the vendor.
>

Okay, thanks for clarifying.

Best regards,
Tomasz


Re: [V3, 2/2] media: i2c: Add Omnivision OV02A10 camera sensor driver

2019-08-26 Thread Tomasz Figa
On Wed, Aug 21, 2019 at 8:05 PM Sakari Ailus
 wrote:
>
> Hi Tomasz,
>
> On Wed, Aug 21, 2019 at 07:30:38PM +0900, Tomasz Figa wrote:
[snip]
> > Is it really correct to enable the clock before the regulators?
> >
> > According to the datasheet, it should be:
> >  - PD pin HIGH,
> >  - nRST pin LOW,
> >  - DVDDIO and AVDD28 power up and stabilize,
> >  - clock enabled,
> >  - min 5 ms delay,
> >  - PD pin LOW,
> >  - min 4 ms delay,
> >  - nRST pin HIGH,
> >  - min 5 ms delay,
> >  - I2C interface ready.
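For reference, a minimal power-on sketch following that sequence (the
regulator/clock/GPIO field names are assumptions, and the GPIOs are driven
with their logical values, leaving any inversion to the DT flags):

static int ov02a10_power_on(struct ov02a10 *ov02a10)
{
        int ret;

        gpiod_set_value_cansleep(ov02a10->pd_gpio, 1);          /* PD high */
        gpiod_set_value_cansleep(ov02a10->n_rst_gpio, 0);       /* nRST low */

        ret = regulator_bulk_enable(OV02A10_NUM_SUPPLIES, ov02a10->supplies);
        if (ret < 0)
                return ret;

        ret = clk_prepare_enable(ov02a10->eclk);
        if (ret < 0)
                goto disable_regulators;

        usleep_range(5000, 6000);
        gpiod_set_value_cansleep(ov02a10->pd_gpio, 0);          /* PD low */
        usleep_range(4000, 5000);
        gpiod_set_value_cansleep(ov02a10->n_rst_gpio, 1);       /* nRST high */
        usleep_range(5000, 6000);       /* I2C interface ready after this */

        return 0;

disable_regulators:
        regulator_bulk_disable(OV02A10_NUM_SUPPLIES, ov02a10->supplies);
        return ret;
}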
> >
> > > +
> > > +   /* Note: set 0 is high, set 1 is low */
> >
> > Why is that? If there is some inverter on the way that should be handled
> > outside of this driver. (GPIO DT bindings have flags for this purpose.
> >
> > If the pins are nRESET and nPOWERDOWN in the hardware datasheet, we should
> > call them like this in the driver too (+/- the lowercase and underscore
> > convention).
> >
> > According to the datasheet, the reset pin is called RST and inverted, so we 
> > should
> > call it n_rst, but the powerdown signal, called PD, is not inverted, so pd
> > would be the right name.
>
> For what it's worth sensors generally have xshutdown (or reset) pin that is
> active high. Looking at the code, it is not the case here. It's a bit odd
> since the usual arrangement saves power when the camera is not in use; it's
> not a lot but still. Oh well.
>

I guess we could drive powerdown low after disabling the regulators
and clocks, but that wouldn't work for the cases where the regulators
are actually shared with something else, especially if that is not
related to the same camera module.

> ...
>
> > > +static struct i2c_driver ov02a10_i2c_driver = {
> > > +   .driver = {
> > > +   .name = "ov02a10",
> > > +   .pm = _pm_ops,
> > > +   .of_match_table = ov02a10_of_match,
> >
> > Please use of_match_ptr() wrapper.
>
> Not really needed; the driver does expect regulators, GPIOs etc., but by
> leaving out of_match_ptr(), the driver will also probe on ACPI based
> systems.

Good point, I always keep forgetting about the ability to probe OF
drivers from ACPI. Then we also need to remove the #if
IS_ENABLED(CONFIG_OF) from ov02a10_of_match.

Best regards,
Tomasz


Re: [PATCH v8 05/14] media: rkisp1: add Rockchip ISP1 subdev driver

2019-08-15 Thread Tomasz Figa
On Thu, Aug 15, 2019 at 5:25 PM Sakari Ailus
 wrote:
>
> Hi Helen,
>
> On Wed, Aug 14, 2019 at 09:58:05PM -0300, Helen Koike wrote:
>
> ...
>
> > >> +static int rkisp1_isp_sd_set_fmt(struct v4l2_subdev *sd,
> > >> +   struct v4l2_subdev_pad_config *cfg,
> > >> +   struct v4l2_subdev_format *fmt)
> > >> +{
> > >> +  struct rkisp1_device *isp_dev = sd_to_isp_dev(sd);
> > >> +  struct rkisp1_isp_subdev *isp_sd = &isp_dev->isp_sdev;
> > >> +  struct v4l2_mbus_framefmt *mf = &fmt->format;
> > >> +
> > >
> > > Note that for sub-device nodes, the driver is itself responsible for
> > > serialising the access to its data structures.
> >
> > But looking at subdev_do_ioctl_lock(), it seems that it serializes the
> > ioctl calls for subdevs, no? Or I'm misunderstanding something (which is
> > most probably) ?
>
> Good question. I had missed this change --- subdev_do_ioctl_lock() is
> relatively new. But setting that lock is still not possible as the struct
> is allocated in the framework and the device is registered before the
> driver gets hold of it. It's a good idea to provide the same serialisation
> for subdevs as well.
>
> I'll get back to this later.
>
> ...
>
> > >> +static int rkisp1_isp_sd_s_power(struct v4l2_subdev *sd, int on)
> > >
> > > If you support runtime PM, you shouldn't implement the s_power op.
> >
> > Is is ok to completly remove the usage of runtime PM then?
> > Like this http://ix.io/1RJb ?
>
> Please use runtime PM instead. In the long run we should get rid of the
> s_power op. Drivers themselves know better when the hardware they control
> should be powered on or off.
>

One also needs to use runtime PM to handle power domains and power
dependencies on auxiliary devices, e.g. IOMMU.

> >
> > tbh I'm not that familar with runtime PM and I'm not sure what is the
> > difference of it and using s_power op (and 
> > Documentation/power/runtime_pm.rst
> > is not being that helpful tbh).
>
> You can find a simple example e.g. in
> drivers/media/platform/atmel/atmel-isi.c .
>
> >
> > >
> > > You'll still need to call s_power on external subdevs though.
> > >
> > >> +{
> > >> +  struct rkisp1_device *isp_dev = sd_to_isp_dev(sd);
> > >> +  int ret;
> > >> +
> > >> +  v4l2_dbg(1, rkisp1_debug, _dev->v4l2_dev, "s_power: %d\n", on);
> > >> +
> > >> +  if (on) {
> > >> +  ret = pm_runtime_get_sync(isp_dev->dev);
> >
> > If this is not ok to remove suport for runtime PM, then where should I put
> > the call to pm_runtime_get_sync() if not in this s_power op ?
>
> Basically the runtime_resume and runtime_suspend callbacks are where the
> device power state changes are implemented, and pm_runtime_get_sync and
> pm_runtime_put are how the driver controls the power state.
>
> So you no longer need the s_power() op at all. The op needs to be called on
> the pipeline however, as there are drivers that still use it.
>

For this driver, I suppose we would _get_sync() when we start
streaming (in the hardware, i.e. we want the ISP to start capturing
frames) and _put() when we stop; the driver shouldn't perform any
access to the hardware when streaming is not active.
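For example (only a sketch - the rkisp1_device naming follows this series,
while the helper names are hypothetical):

static int rkisp1_isp_start_streaming(struct rkisp1_device *rkisp1)
{
        int ret;

        /* runtime_resume powers up the ISP, its power domain and IOMMU */
        ret = pm_runtime_get_sync(rkisp1->dev);
        if (ret < 0) {
                pm_runtime_put_noidle(rkisp1->dev);
                return ret;
        }

        /* ... program the hardware and start capturing ... */
        return 0;
}

static void rkisp1_isp_stop_streaming(struct rkisp1_device *rkisp1)
{
        /* ... stop the hardware; no register access past this point ... */
        pm_runtime_put(rkisp1->dev);    /* runtime_suspend powers it down */
}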

Best regards,
Tomasz


Re: [RFC,v3 7/9] media: platform: Add Mediatek ISP P1 device driver

2019-08-07 Thread Tomasz Figa
On Wed, Aug 7, 2019 at 11:11 AM Jungo Lin  wrote:
>
> Hi, Tomasz:
>
> On Tue, 2019-08-06 at 18:47 +0900, Tomasz Figa wrote:
> > Hi Jungo,
> >
> > On Fri, Jul 26, 2019 at 4:24 PM Jungo Lin  wrote:
> > >
> > > Hi, Tomasz:
> > >
> > > On Thu, 2019-07-25 at 18:23 +0900, Tomasz Figa wrote:
> > > > .Hi Jungo,
> > > >
> > > > On Sat, Jul 20, 2019 at 6:58 PM Jungo Lin  
> > > > wrote:
> > > > >
> > > > > Hi, Tomasz:
> > > > >
> > > > > On Wed, 2019-07-10 at 18:56 +0900, Tomasz Figa wrote:
> > > > > > Hi Jungo,
> > > > > >
> > > > > > On Tue, Jun 11, 2019 at 11:53:42AM +0800, Jungo Lin wrote:
> > [snip]
>
> I just keep some questions to be clarified.
> [snip]
>
> > > > > > > +   isp_dev->meta0_vb2_index = meta0_vb2_index;
> > > > > > > +   isp_dev->meta1_vb2_index = meta1_vb2_index;
> > > > > > > +   } else {
> > > > > > > +   if (irq_status & SOF_INT_ST) {
> > > > > > > +   isp_dev->current_frame = hw_frame_num;
> > > > > > > +   isp_dev->meta0_vb2_index = meta0_vb2_index;
> > > > > > > +   isp_dev->meta1_vb2_index = meta1_vb2_index;
> > > > > > > +   }
> > > > > > > +   irq_handle_notify_event(isp_dev, irq_status, 
> > > > > > > dma_status, 1);
> > > > > > > +   }
> > > > > >
> > > > > > The if and else blocks do almost the same things just in different 
> > > > > > order. Is
> > > > > > it really expected?
> > > > > >
> > > > >
> > > > > If we receive HW_PASS1_DON_ST & SOF_INT_ST IRQ events at the same 
> > > > > time,
> > > > > the correct sequence should be handle HW_PASS1_DON_ST firstly to check
> > > > > any de-queued frame and update the next frame setting later.
> > > > > Normally, this is a corner case or system performance issue.
> > > >
> > > > So it sounds like HW_PASS1_DON_ST means that all data from current
> > > > frame has been written, right? If I understand your explanation above
> > > > correctly, that would mean following handling of each interrupt:
> > > >
> > > > HW_PASS1_DON_ST:
> > > >  - CQ executes with next CQ buffer to prepare for next frame. <- how
> > > > is this handled? does the CQ hardware automatically receive this event
> > > > from the ISP hadware?
> > > >  - return VB2 buffers,
> > > >  - complete requests.
> > > >
> > > > SOF_INT_ST:
> > > >  - send VSYNC event to userspace,
> > > >  - program next CQ buffer to CQ,
> > > >
> > > > SW_PASS1_DON_ST:
> > > >  - reclaim CQ buffer and enqueue next frame to composing if available
> > > >
> > >
> > > Sorry for our implementation of HW_PASS1_DON_ST.
> > > It is confusing.
> > > Below is the revised version based on your conclusion.
> > > So in our new implemmenation, we just handle SOF_INT_ST &
> > > SW_PASS1_DON_ST events. We just add one warning message for
> > > HW_PASS1_DON_ST
> > >
> > > HW_PASS1_DON_ST:
> > > - CQ executes with next CQ buffer to prepare for next frame.
> > >
> > > SOF_INT_ST:
> > > - send VSYNC event to userspace,
> > > - program next CQ buffer to CQ,
> > >
> > > SW_PASS1_DON_ST:
> > > - reclaim CQ buffer and enqueue next frame to composing if available
> > > - return VB2 buffers,
> > > - complete requests.
> > >
> > > For CQ HW operations, it is listed below:
> > >
> > > a. The CQ buffer has two kinds of information:
> > >  - Which ISP registers need to be updated.
> > >  - Where the corresponding ISP register data is to be read from.
> > > b. The CQ buffer loading procedure is triggered by the HW_PASS1_DON_ST IRQ
> > > event periodically.
> > >  - Normally, when the ISP HW receives a completed frame, it will
> > > trigger the HW_PASS1_DON_ST IRQ and perform CQ buffer loading immediately.
> > >  - So the CQ buffer loading is performed by the ISP HW automatically.
> > > c. The ISP HW will read CQ base address register(REG_CQ_THR0_BASEADDR)
> > > to decide whi

Re: [RFC,v3 7/9] media: platform: Add Mediatek ISP P1 device driver

2019-08-06 Thread Tomasz Figa
Hi Jungo,

On Fri, Jul 26, 2019 at 4:24 PM Jungo Lin  wrote:
>
> Hi, Tomasz:
>
> On Thu, 2019-07-25 at 18:23 +0900, Tomasz Figa wrote:
> > .Hi Jungo,
> >
> > On Sat, Jul 20, 2019 at 6:58 PM Jungo Lin  wrote:
> > >
> > > Hi, Tomasz:
> > >
> > > On Wed, 2019-07-10 at 18:56 +0900, Tomasz Figa wrote:
> > > > Hi Jungo,
> > > >
> > > > On Tue, Jun 11, 2019 at 11:53:42AM +0800, Jungo Lin wrote:
[snip]
> > > > > +
> > > > > +   err_status = irq_status & INT_ST_MASK_CAM_ERR;
> > > > > +
> > > > > +   /* Sof, done order check */
> > > > > +   if ((irq_status & SOF_INT_ST) && (irq_status & HW_PASS1_DON_ST)) {
> > > > > +   dev_dbg(dev, "sof_done block cnt:%d\n", 
> > > > > isp_dev->sof_count);
> > > > > +
> > > > > +   /* Notify IRQ event and enqueue frame */
> > > > > +   irq_handle_notify_event(isp_dev, irq_status, dma_status, 
> > > > > 0);
> > > > > +   isp_dev->current_frame = hw_frame_num;
> > > >
> > > > What exactly is hw_frame_num? Shouldn't we assign it before notifying 
> > > > the
> > > > event?
> > > >
> > >
> > > This is a another spare register for frame sequence number usage.
> > > It comes from struct p1_frame_param:frame_seq_no which is sent by
> > > SCP_ISP_FRAME IPI command. We will rename this to dequeue_frame_seq_no.
> > > Is it a better understanding?
> >
> > I'm sorry, unfortunately it's still not clear to me. Is it the
> > sequence number of the frame that was just processed and returned to
> > the kernel or the next frame that is going to be processed from now
> > on?
> >
>
> It is the next frame that is going to be processed.
> We simplified the implementation of the isp_irq_cam function. The hw_frame_num
> is renamed to dequeue_frame_seq_no, and its value is read from the HW at
> SOF_INT_ST. Since it is obtained in the SOF_INT_ST event, it is the
> next frame to be processed. When SW_PASS1_DON_ST arrives, it means this
> frame has been processed. We use this value to de-queue the frame request
> and return buffers to VB2.
>
> The normal IRQ sequence is SOF_INT_ST => SW_PASS1_DON_ST &
> HW_PASS1_DON_ST.
>
> a. SW_PASS1_DON_ST is designed for the DMA-done event.
> If there are no available DMA buffers en-queued into the HW, there is no
> SW_PASS1_DON_ST.
>
> b. HW_PASS1_DON_ST is designed to trigger the CQ buffer load procedure.
> It is paired with the SOF IRQ event, even if there are no available DMA
> buffers.
>
> static void isp_irq_handle_sof(struct mtk_isp_p1_device *p1_dev,
>unsigned int dequeue_frame_seq_no)
> {
> dma_addr_t base_addr = p1_dev->composer_iova;
> int composed_frame_seq_no =
> atomic_read(&p1_dev->composed_frame_seq_no);
> unsigned int addr_offset;
>
> /* Send V4L2_EVENT_FRAME_SYNC event */
> mtk_cam_dev_event_frame_sync(&p1_dev->cam_dev, dequeue_frame_seq_no);
>
> p1_dev->sof_count += 1;
> /* Save dequeue frame information */
> p1_dev->dequeue_frame_seq_no = dequeue_frame_seq_no;
>
> /* Update CQ base address if needed */
> if (composed_frame_seq_no <= dequeue_frame_seq_no) {
> dev_dbg(p1_dev->dev,
> "SOF_INT_ST, no update, cq_num:%d, frame_seq:%d",
> composed_frame_seq_no, dequeue_frame_seq_no);
> return;
> }
> addr_offset = MTK_ISP_CQ_ADDRESS_OFFSET *
> (dequeue_frame_seq_no % MTK_ISP_CQ_BUFFER_COUNT);
> writel(base_addr + addr_offset, p1_dev->regs + REG_CQ_THR0_BASEADDR);
> dev_dbg(p1_dev->dev,
> "SOF_INT_ST, update next, cq_num:%d, frame_seq:%d 
> cq_addr:0x%x",
> composed_frame_seq_no, dequeue_frame_seq_no, addr_offset);
> }
>
> void mtk_cam_dev_dequeue_req_frame(struct mtk_cam_dev *cam,
>unsigned int frame_seq_no)
> {
> struct mtk_cam_dev_request *req, *req_prev;
> unsigned long flags;
>
> spin_lock_irqsave(&cam->running_job_lock, flags);
> list_for_each_entry_safe(req, req_prev, &cam->running_job_list, list)
> {
> dev_dbg(cam->dev, "frame_seq:%d, de-queue frame_seq:%d\n",
> req->frame_params.frame_seq_no, frame_seq_no);
>
> /* Match by the en-queued request numbe

Re: [RFC,v3 6/9] media: platform: Add Mediatek ISP P1 V4L2 functions

2019-08-05 Thread Tomasz Figa
Hi Jungo,

On Tue, Jul 30, 2019 at 10:45 AM Jungo Lin  wrote:
>
> On Mon, 2019-07-29 at 19:04 +0900, Tomasz Figa wrote:
> > On Mon, Jul 29, 2019 at 10:18 AM Jungo Lin  wrote:
> > > On Fri, 2019-07-26 at 14:49 +0900, Tomasz Figa wrote:
> > > > On Wed, Jul 24, 2019 at 1:31 PM Jungo Lin  
> > > > wrote:
> > > > > On Tue, 2019-07-23 at 19:21 +0900, Tomasz Figa wrote:
> > > > > > On Thu, Jul 18, 2019 at 1:39 PM Jungo Lin  
> > > > > > wrote:
> > > > > > > On Wed, 2019-07-10 at 18:54 +0900, Tomasz Figa wrote:
> > > > > > > > On Tue, Jun 11, 2019 at 11:53:41AM +0800, Jungo Lin wrote:
[snip]
> > > > > > > > > +
> > > > > > > > > +   dev_dbg(dev, "%s: node:%d fd:%d idx:%d\n",
> > > > > > > > > +   __func__,
> > > > > > > > > +   node->id,
> > > > > > > > > +   buf->vbb.request_fd,
> > > > > > > > > +   buf->vbb.vb2_buf.index);
> > > > > > > > > +
> > > > > > > > > +   /* For request buffers en-queue, handled in 
> > > > > > > > > mtk_cam_req_try_queue */
> > > > > > > > > +   if (vb->vb2_queue->uses_requests)
> > > > > > > > > +   return;
> > > > > > > >
> > > > > > > > I'd suggest removing non-request support from this driver. Even 
> > > > > > > > if we end up
> > > > > > > > with a need to provide compatibility for non-request mode, then 
> > > > > > > > it should be
> > > > > > > > built on top of the requests mode, so that the driver itself 
> > > > > > > > doesn't have to
> > > > > > > > deal with two modes.
> > > > > > > >
> > > > > > >
> > > > > > > The purpose of non-request function in this driver is needed by
> > > > > > > our camera middle-ware design. It needs 3A statistics buffers 
> > > > > > > before
> > > > > > > image buffers en-queue. So we need to en-queue 3A statistics with
> > > > > > > non-request mode in this driver. After MW got the 3A statistics 
> > > > > > > data, it
> > > > > > > will en-queue the images, tuning buffer and other meta buffers 
> > > > > > > with
> > > > > > > request mode. Based on this requirement, do you have any 
> > > > > > > suggestion?
> > > > > > > For upstream driver, should we only consider request mode?
> > > > > > >
> > > > > >
> > > > > > Where does that requirement come from? Why the timing of queuing of
> > > > > > the buffers to the driver is important?
> > > > > >
> > > > > > [snip]
> > > > >
> > > > > Basically, this requirement comes from our internal camera
> > > > > middle-ware/3A hal in user space. Since this is not generic 
> > > > > requirement,
> > > > > we will follow your original suggestion to keep the request mode only
> > > > > and remove other non-request design in other files. For upstream 
> > > > > driver,
> > > > > it should support request mode only.
> > > > >
> > > >
> > > > Note that Chromium OS will use the "upstream driver" and we don't want
> > > > to diverge, so please make the userspace also use only requests. I
> > > > don't see a reason why there would be any need to submit any buffers
> > > > outside of a request.
> > > >
> > > > [snip]
> > >
> > > Ok, I have raised your concern to our colleagues and let him to discuss
> > > with you in another communication channel.
> > >
> >
> > Thanks!
> >
> > Best regards,
> > Tomasz
>
> Our colleague is preparing material to explain our 3A/MW design. Once
> he is ready, he will discuss this with you.

Thanks!

>
> The original plan was to deliver the P1 v4 patch set tomorrow (31st
> Jul.). But there are some comments still waiting for other experts' input.
> Do you suggest it is better to resolve all comments before submitting the
> v4 patch set, or to continue discussing these comments on v4?

For the remaining v4l2-compliance issues, we can postpone them and
keep them on a TODO list in the next version.

Best regards,
Tomasz


Re: [RFC,v3 6/9] media: platform: Add Mediatek ISP P1 V4L2 functions

2019-07-29 Thread Tomasz Figa
On Mon, Jul 29, 2019 at 10:18 AM Jungo Lin  wrote:
> On Fri, 2019-07-26 at 14:49 +0900, Tomasz Figa wrote:
> > On Wed, Jul 24, 2019 at 1:31 PM Jungo Lin  wrote:
> > > On Tue, 2019-07-23 at 19:21 +0900, Tomasz Figa wrote:
> > > > On Thu, Jul 18, 2019 at 1:39 PM Jungo Lin  
> > > > wrote:
> > > > > On Wed, 2019-07-10 at 18:54 +0900, Tomasz Figa wrote:
> > > > > > On Tue, Jun 11, 2019 at 11:53:41AM +0800, Jungo Lin wrote:
[snip]
> > > dev_dbg(cam->dev, "jobs are full\n");
> > > spin_unlock_irqrestore(&cam->pending_job_lock, flags);
> > > return;
> > > }
> > > list_for_each_entry_safe(req, req_prev, &cam->pending_job_list, 
> > > list) {
> >
> > Could we instead check the counter here and break if it's >=
> > MTK_ISP_MAX_RUNNING_JOBS?
> > Then we could increment it here too to simplify the code.
> >
>
> Thanks for your advice.
> We simplified this function as below:
>
> void mtk_cam_dev_req_try_queue(struct mtk_cam_dev *cam)
> {
> struct mtk_cam_dev_request *req, *req_prev;
> unsigned long flags;
>
> if (!cam->streaming) {
> dev_dbg(cam->dev, "stream is off\n");
> return;
> }
>
> spin_lock_irq(&cam->pending_job_lock);
> spin_lock_irqsave(&cam->running_job_lock, flags);

Having the inner call spin_lock_irqsave() doesn't really do anything
useful, because the outer spin_lock_irq() disables the IRQs and flags
would always have the IRQ disabled state. Please use irqsave for the
outer call.
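
Just to illustrate, a minimal reordering of the locking above could be:

	spin_lock_irqsave(&cam->pending_job_lock, flags);
	spin_lock(&cam->running_job_lock);
	/* ... walk pending_job_list, move entries to running_job_list ... */
	spin_unlock(&cam->running_job_lock);
	spin_unlock_irqrestore(&cam->pending_job_lock, flags);

so that the outer call is the one saving and restoring the IRQ flags, and the
inner one is a plain spin_lock().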

[snip]
> > > > > > > +
> > > > > > > +static struct v4l2_subdev *
> > > > > > > +mtk_cam_cio_get_active_sensor(struct mtk_cam_dev *cam_dev)
> > > > > > > +{
> > > > > > > +   struct media_device *mdev = 
> > > > > > > cam_dev->seninf->entity.graph_obj.mdev;
> > > > > > > +   struct media_entity *entity;
> > > > > > > +   struct device *dev = &cam_dev->pdev->dev;
> > > > > > > +   struct v4l2_subdev *sensor;
> > > > > >
> > > > > > This variable would be uninitialized if there is no streaming sensor. 
> > > > > > Was
> > > > > > there no compiler warning generated for this?
> > > > > >
> > > > >
> > > > > No, there is no compiler warning.
> > > > > But, we will assign sensor to NULL to avoid unnecessary compiler 
> > > > > warning
> > > > > with different compiler options.
> > > > >
> > > >
> > > > Thanks. It would be useful if you could check why the compiler you're
> > > > using doesn't show a warning here. We might be missing other
> > > > uninitialized variables.
> > > >
> > >
> > > We will feed this back to your project team to check the possible reason for
> > > the compiler warning issue.
> > >
> >
> > Do you mean that it was the Clang toolchain used on Chromium OS (e.g.
> > emerge chromeos-kernel-4_19)?
>
> > [snip]
>
> Yes, I checked this comment in the Chromium OS build environment.
> But, I think I have made the mistake here. I need to check the build
> status in Mediatek's kernel upstream environment. I will pay
> attention to this in the next patch set upstream.
>

Thanks a lot. I will recheck this in the Chromium OS toolchain too.

> > > > > > > +
> > > > > > > +   dev_dbg(dev, "%s: node:%d fd:%d idx:%d\n",
> > > > > > > +   __func__,
> > > > > > > +   node->id,
> > > > > > > +   buf->vbb.request_fd,
> > > > > > > +   buf->vbb.vb2_buf.index);
> > > > > > > +
> > > > > > > +   /* For request buffers en-queue, handled in 
> > > > > > > mtk_cam_req_try_queue */
> > > > > > > +   if (vb->vb2_queue->uses_requests)
> > > > > > > +   return;
> > > > > >
> > > > > > I'd suggest removing non-request support from this driver. Even if 
> > > > > > we end up
> > > > > > with a need to provide compatibility for non-request mode, then it 
> > > > > > should be
> > > > > > built on top of the requests mode, so that the driver itself 
> > > > > > doesn't ha

Re: [RFC PATCH V2 0/6] media: platform: Add support for Digital Image Processing (DIP) on mt8183 SoC

2019-07-27 Thread Tomasz Figa
Hi Hans,

On Mon, Jul 8, 2019 at 8:05 PM  wrote:
>
> Hello,
>
> This RFC patch series added Digital Image Processing (DIP) driver on Mediatek
> mt8183 SoC. It belongs to the Mediatek's ISP driver series based on V4L2 and
> media controller framework. I posted the main part of the DIP driver as RFC to
> discuss first and would like some review comments.
>
> I appreciate the helpful comment of Tomasz, Rob and Shik in RFC V1. The RFC V2
> patch addressed on the review issues in V1. There are 2 V4L2 compliance test
> issues still under discussion and I will do the corresponding modification in
> the next patch after we come to the conclusion.
>
> 1. Request API test doesn't know which buffers of the video devices are
> required so we got failed in testRequests()
>

Perhaps the test should check the media topology and infer the video
devices required from there?

> 2. V4L2 compliance test check if the driver return error when passing an
> invalid image size, but in vb2_create_bufs() case, we don't know if the
> size check is required or not.

In current VB2 API, we don't get the format given to CREATE_BUFS
anymore, we just get the requested sizeimage. How do we validate if
the sizeimage is big enough for the format? Should we implement that
check in the .vidioc_create_bufs op before calling vb2_create_bufs()?
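
(If the check does end up in the driver, one pattern I've seen in other vb2
drivers is to reject too-small sizes in queue_setup() on the CREATE_BUFS path,
i.e. when *num_planes is already set. A rough sketch; the size lookup helper
is hypothetical and driver specific:

	static int mtk_dip_vb2_queue_setup(struct vb2_queue *vq,
					   unsigned int *num_buffers,
					   unsigned int *num_planes,
					   unsigned int sizes[],
					   struct device *alloc_devs[])
	{
		/* sizeimage required by the currently set format;
		 * mtk_dip_current_sizeimage() is a placeholder name */
		unsigned int size = mtk_dip_current_sizeimage(vq);

		if (*num_planes) {
			/* VIDIOC_CREATE_BUFS: sizes[] holds the user's sizeimage */
			return sizes[0] < size ? -EINVAL : 0;
		}

		*num_planes = 1;
		sizes[0] = size;
		return 0;
	}

)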

>
> Please see the following URL for the detail.
> http://lists.infradead.org/pipermail/linux-mediatek/2019-June/020884.html
>

Would you be able to help us with the above 2 issues? They are worked
around in the driver, so the tests pass below, but we don't want such
workarounds in driver code.

Best regards,
Tomasz

> ==
>  Introduction
> ==
>
> Digital Image Processing (DIP) unit can accept the tuning parameters and
> adjust the image content in Mediatek ISP system. Furthermore, it performs
> demosaicing and noise reduction on the image to support the advanced camera
> features of the application. The DIP driver also supports image format
> conversion, resizing and rotation with its hardware path.
>
> The driver is implemented with V4L2 and media controller framework. We
> have the following entities describing the DIP path. Since a DIP frame has
> multiple buffers, the driver uses Request API to control the multiple
> buffer's enqueue flow.
>
> 1. Meta (output video device): connects to DIP sub device. It accepts the
> input tuning buffer from userspace. The metadata interface used currently
> is only a temporary solution to kick off driver development and is not
> ready for review yet.
>
> 2. RAW (output video device): connects to DIP sub device. It accepts input
> image buffer from userspace.
>
> 3. DIP (sub device): connects to MDP-0 and MDP-1. When processing an image,
> DIP hardware support multiple output images with different size and format
> so it needs two capture video devices to return the streaming data to the
> user.
>
> 4. MDP-0 (capture video device): return the processed image data.
>
> 5. MDP-1 (capture video device): return the processed image data, the
> image size and format can be different from the ones of MDP-0.
>
> The overall file structure of the DIP driver is as following:
>
> * mtk_dip-v4l2.c: implements DIP platform driver, V4L2 and vb2 operations.
>
> * mtk_dip-sys.c: implements the hardware job handling flow including the part 
> of
> interaction with the SCP and MDP.
>
> * mtk_dip-dev.c: implements dip pipe utilities. DIP driver supports 3 software
> pipes (preview, capture and reprocessing) at the same time. All
> the pipes share the same DIP hardware to process the images.
>
> ==
>  Changes in v2
> ==
> * mtk_dip-smem.c
> 1. Removed mtk_dip-smem.c and the custom code of SCP and DIP's share memory
> operation, and uses SCP device as the allocation device instead. (SCP creates
> the shared DMA pool of DMA buffers and can hook to DMA mapping APIs)
>
> * mtk_dip-ctrl.c
> 1. Merged mtk_dip-ctrl.c into mtk_dip-v4l2.c since we only have a HW ctrl.
> (V4L2_CID_ROTATE)
>
> * mtk_dip-sys.c:
> 1. Removed struct mtk_dip_hw_work, mtk_dip_hw_submit_work and the related 
> memory
> management flow (use mtk_dip_request instead)
>
> 2. Uses workqueue mdp_wq instead of dip_runner_thread kthread to simplify the
> design
>
> 3. Removed dip_gcejoblist and use mtk_dip_job_info list instead
>
> 4. Removed framejob and mtk_dip_hw_mdpcb_work and the related alloc and free
> since it already embedded in the new struct mtk_dip_request
>
> 5. Integrated struct mtk_dip_hw_user_id and struct with mtk_dip_pipe, and
> removed dip_hw->dip_useridlist
>
> 6. Pass mtk_dip_request to mdp_cmdq_sendtask() as cb data
>
> 7. Use spinlock instead of mutex as struct mtk_dip_hw_queue's queuelock so 
> that
> we can use direct function call instead of mdpcb_workqueue works
>
> 8. Removed dip_send() and use scp_ipi_send() directly
>
> 9. Removed composing_wq and the related macro, we use semaphore instead
>
> 10. Use array to keep constant number of 

Re: [RFC, v3 9/9] media: platform: Add Mediatek ISP P1 shared memory device

2019-07-26 Thread Tomasz Figa
On Fri, Jul 26, 2019 at 8:59 PM Jungo Lin  wrote:
>
> Hi Robin:
>
> On Fri, 2019-07-26 at 12:04 +0100, Robin Murphy wrote:
> > On 26/07/2019 08:42, Tomasz Figa wrote:
> > > On Fri, Jul 26, 2019 at 4:41 PM Christoph Hellwig  
> > > wrote:
> > >>
> > >> On Fri, Jul 26, 2019 at 02:15:14PM +0900, Tomasz Figa wrote:
> > >>> Could you try dma_get_sgtable() with the SCP struct device and then
> > >>> dma_map_sg() with the P1 struct device?
> > >>
> > >> Please don't do that.  dma_get_sgtable is a pretty broken API (see
> > >> the common near the arm implementation) and we should not add more
> > >> users of it.  If you want a piece of memory that can be mapped to
> > >> multiple devices allocate it using alloc_pages and then just map
> > >> it to each device.
> > >
> > > Thanks for taking a look at this thread.
> > >
> > > Unfortunately that wouldn't work. We have a specific reserved memory
> > > pool that is the only memory area accessible to one of the devices.
> > > Any idea how to handle this?
> >
> > If it's reserved in the sense of being outside struct-page-backed
> > "kernel memory", then provided you have a consistent CPU physical
> > address it might be reasonable for other devices to access it via
> > dma_map_resource().
> >
> > Robin.
>
> Thank you for your suggestion.
>
> After revising to use dma_map_resource(), it worked. Below is the
> current implementation. Please kindly help us check if there is any
> misunderstanding.
>
> #define MTK_ISP_COMPOSER_MEM_SIZE   0x20
>
> /*
>  * Allocate coherent reserved memory for SCP firmware usage.
>  * The size of SCP composer's memory is fixed to 0x20
>  * for the requirement of firmware.
>  */
> ptr = dma_alloc_coherent(p1_dev->cam_dev.smem_dev,
>  MTK_ISP_COMPOSER_MEM_SIZE, &addr, 
> GFP_KERNEL);
> if (!ptr) {
> dev_err(dev, "failed to allocate compose memory\n");
> return -ENOMEM;
> }
> p1_dev->composer_scp_addr = addr;
> p1_dev->composer_virt_addr = ptr;
> dev_dbg(dev, "scp addr:%pad va:%pK\n", , ptr);
>
> /*
>  * This reserved memory is also be used by ISP P1 HW.
>  * Need to get iova address for ISP P1 DMA.
>  */
> addr = dma_map_resource(dev, addr, MTK_ISP_COMPOSER_MEM_SIZE,
> DMA_BIDIRECTIONAL, DMA_ATTR_SKIP_CPU_SYNC);

This is still incorrect, because addr is a DMA address, but the second
argument to dma_map_resource() is a physical address.

> if (dma_mapping_error(dev, addr)) {
> dev_err(dev, "Failed to map scp iova\n");
> ret = -ENOMEM;
> goto fail_free_mem;
> }
> p1_dev->composer_iova = addr;
> dev_info(dev, "scp iova addr:%pad\n", );
>
> Moreover, I appreciate Tomasz's and Christoph's help on this issue.

Robin, the memory is specified using the reserved-memory DT binding
and managed by the coherent DMA pool framework. We can allocate from
it using dma_alloc_coherent(), which gives us a DMA address, not CPU
physical address (although in practice on this platform they are equal
numerically).
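
One way to avoid mixing the two types up would be to take the CPU physical base
of the reserved region from its DT node and map the whole region once for the
P1 device, deriving per-buffer IOVAs by offset. A rough sketch only, with error
paths trimmed and the device/field names assumed from the snippets above:

	struct device_node *np;
	struct resource r;
	dma_addr_t region_iova;

	/* reserved region attached to the SCP device via "memory-region" */
	np = of_parse_phandle(p1_dev->cam_dev.smem_dev->of_node,
			      "memory-region", 0);
	if (!np || of_address_to_resource(np, 0, &r))
		return -ENODEV;

	/* map the whole region for the P1 IOMMU; a buffer's IOVA is then
	 * region_iova + (the buffer's offset inside the pool) */
	region_iova = dma_map_resource(dev, r.start, resource_size(&r),
				       DMA_BIDIRECTIONAL, DMA_ATTR_SKIP_CPU_SYNC);
	if (dma_mapping_error(dev, region_iova))
		return -ENOMEM;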

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC, v3 9/9] media: platform: Add Mediatek ISP P1 shared memory device

2019-07-26 Thread Tomasz Figa
On Fri, Jul 26, 2019 at 4:41 PM Christoph Hellwig  wrote:
>
> On Fri, Jul 26, 2019 at 02:15:14PM +0900, Tomasz Figa wrote:
> > Could you try dma_get_sgtable() with the SCP struct device and then
> > dma_map_sg() with the P1 struct device?
>
> Please don't do that.  dma_get_sgtable is a pretty broken API (see
> the common near the arm implementation) and we should not add more
> users of it.  If you want a piece of memory that can be mapped to
> multiple devices allocate it using alloc_pages and then just map
> it to each device.

Thanks for taking a look at this thread.

Unfortunately that wouldn't work. We have a specific reserved memory
pool that is the only memory area accessible to one of the devices.
Any idea how to handle this?

Best regards,
Tomasz


Re: [RFC,v3 6/9] media: platform: Add Mediatek ISP P1 V4L2 functions

2019-07-25 Thread Tomasz Figa
On Wed, Jul 24, 2019 at 1:31 PM Jungo Lin  wrote:
>
> Hi, Tomasz:
>
> On Tue, 2019-07-23 at 19:21 +0900, Tomasz Figa wrote:
> > Hi Jungo,
> >
> > On Thu, Jul 18, 2019 at 1:39 PM Jungo Lin  wrote:
> > >
> > > Hi, Tomasz:
> > >
> > > On Wed, 2019-07-10 at 18:54 +0900, Tomasz Figa wrote:
> > > > Hi Jungo,
> > > >
> > > > On Tue, Jun 11, 2019 at 11:53:41AM +0800, Jungo Lin wrote:
> > [snip]
> > > > > +static void mtk_cam_req_try_isp_queue(struct mtk_cam_dev *cam_dev,
> > > > > + struct media_request *new_req)
> > > > > +{
> > > > > +   struct mtk_cam_dev_request *req, *req_safe, *cam_dev_req;
> > > > > +   struct device *dev = _dev->pdev->dev;
> > > > > +
> > > > > +   dev_dbg(dev, "%s new req:%d", __func__, !new_req);
> > > > > +
> > > > > +   if (!cam_dev->streaming) {
> > > > > +   cam_dev_req = mtk_cam_req_to_dev_req(new_req);
> > > > > +   spin_lock(_dev->req_lock);
> > > > > +   list_add_tail(_dev_req->list, _dev->req_list);
> > > > > +   spin_unlock(_dev->req_lock);
> > > > > +   dev_dbg(dev, "%s: stream off, no ISP enqueue\n", 
> > > > > __func__);
> > > > > +   return;
> > > > > +   }
> > > > > +
> > > > > +   /* Normal enqueue flow */
> > > > > +   if (new_req) {
> > > > > +   mtk_isp_req_enqueue(dev, new_req);
> > > > > +   return;
> > > > > +   }
> > > > > +
> > > > > +   /* Flush all media requests when first stream on */
> > > > > +   list_for_each_entry_safe(req, req_safe, _dev->req_list, list) 
> > > > > {
> > > > > +   list_del(>list);
> > > > > +   mtk_isp_req_enqueue(dev, >req);
> > > > > +   }
> > > > > +}
> > > >
> > > > This will have to be redone, as per the other suggestions, but 
> > > > generally one
> > > > would have a function that tries to queue as much as possible from a 
> > > > list to
> > > > the hardware and another function that adds a request to the list and 
> > > > calls
> > > > the first function.
> > > >
> > >
> > > We revised this function as below.
> > > First to check the en-queue conditions:
> > > a. stream on
> > > b. The composer buffers in SCP are 3, so we can only have 3 jobs
> > > at the same time.
> > >
> > >
> > > Second, try to en-queue the frames in the pending job if possible and
> > > move them into running job list if possible.
> > >
> > > The request has been inserted into pending job in mtk_cam_req_validate
> > > which is used to validate media_request.
> >
> > Thanks for replying to each of the comments, that's very helpful.
> > Snipped out the parts that I agreed with.
> >
> > Please note that req_validate is not supposed to change any driver
> > state. It's only supposed to validate the request. req_queue is the
> > right callback to insert the request into some internal driver
> > bookkeeping structures.
> >
>
> Yes, in req_validate function, we don't change any driver state.
> Below is the function's implementation.
>
> a. Call vb2_request_validate(req) to verify media request.
> b. Update the buffer internal structure buffer.
> c. Insert the request into pending_job_list to prepare en-queue.
>

Adding to a list is changing driver state. The callback must not
modify anything else than the request itself.

Queuing to driver's list should happen in req_queue instead.
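
For reference, a rough sketch of how the split could look; the container_of()
field name is an assumption, the rest reuses the helpers and locks already in
your code:

	static int mtk_cam_req_validate(struct media_request *req)
	{
		/* no driver state is touched here */
		return vb2_request_validate(req);
	}

	static void mtk_cam_req_queue(struct media_request *req)
	{
		struct mtk_cam_dev_request *cam_req = mtk_cam_req_to_dev_req(req);
		struct mtk_cam_dev *cam = container_of(req->mdev,
						       struct mtk_cam_dev,
						       media_dev);

		vb2_request_queue(req);

		spin_lock(&cam->pending_job_lock);
		list_add_tail(&cam_req->list, &cam->pending_job_list);
		spin_unlock(&cam->pending_job_lock);

		mtk_cam_dev_req_try_queue(cam);
	}

	static const struct media_device_ops mtk_cam_media_ops = {
		.req_validate = mtk_cam_req_validate,
		.req_queue = mtk_cam_req_queue,
	};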

[snip]
> > >
> > > void mtk_cam_dev_req_try_queue(struct mtk_cam_dev *cam_dev)
> > > {
> > > struct mtk_cam_dev_request *req, *req_prev;
> > > struct list_head enqueue_job_list;
> > > int buffer_cnt = atomic_read(&cam_dev->running_job_count);
> > > unsigned long flags;
> > >
> > > if (!cam_dev->streaming ||
> > > buffer_cnt >= MTK_ISP_MAX_RUNNING_JOBS) {
> >
> > Do we have a guarantee that cam_dev->running_job_count doesn't
> > decrement between the atomic_read() above and this line?
> >
>
> Ok, we will use cam->pending_job_lock to protect
> cam_dev->running_job_count acces

Re: [RFC, v3 9/9] media: platform: Add Mediatek ISP P1 shared memory device

2019-07-25 Thread Tomasz Figa
On Tue, Jul 23, 2019 at 5:22 PM Jungo Lin  wrote:
>
> Hi, Tomasz:
>
> On Tue, 2019-07-23 at 16:20 +0900, Tomasz Figa wrote:
> > Hi Jungo,
> >
> > On Fri, Jul 5, 2019 at 4:59 PM Jungo Lin  wrote:
> > >
> > > Hi Tomasz:
> > >
> > > On Fri, 2019-07-05 at 13:22 +0900, Tomasz Figa wrote:
> > > > Hi Jungo,
> > > >
> > > > On Fri, Jul 5, 2019 at 12:33 PM Jungo Lin  
> > > > wrote:
> > > > >
> > > > > Hi Tomasz,
> > >
> > > [snip]
> > >
> > > > > After applying your suggestion in SCP device driver, we could remove
> > > > > mtk_cam-smem.h/c. Currently, we use dma_alloc_coherent with SCP device
> > > > > to get SCP address. We could touch the buffer with this SCP address in
> > > > > SCP processor.
> > > > >
> > > > > After that, we use dma_map_page_attrs with P1 device which supports
> > > > > IOMMU domain to get IOVA address. For this address, we will assign
> > > > > it to our ISP HW device to proceed.
> > > > >
> > > > > Below is the snippet for ISP P1 compose buffer initialization.
> > > > >
> > > > > ptr = dma_alloc_coherent(p1_dev->cam_dev.smem_dev,
> > > > >  MAX_COMPOSER_SIZE, , 
> > > > > GFP_KERNEL);
> > > > > if (!ptr) {
> > > > > dev_err(dev, "failed to allocate compose memory\n");
> > > > > return -ENOMEM;
> > > > > }
> > > > > isp_ctx->scp_mem_pa = addr;
> > > >
> > > > addr contains a DMA address, not a physical address. Could we call it
> > > > scp_mem_dma instead?
> > > >
> > > > > dev_dbg(dev, "scp addr:%pad\n", );
> > > > >
> > > > > /* get iova address */
> > > > > addr = dma_map_page_attrs(dev, phys_to_page(addr), 0,
> > > >
> > > > addr is a DMA address, so phys_to_page() can't be called on it. The
> > > > simplest thing here would be to use dma_map_single() with ptr as the
> > > > CPU address expected.
> > > >
> > >
> We have changed to use dma_map_single() with ptr, but encountered an IOMMU
> > > error. From the debug log of iommu_dma_map_page[3], we got
> > > 0x5480 instead of expected address: 0x5080[2].
> > > There is a address offset(0x400). If we change to use
> > > dma_map_page_attrs with phys_to_page(addr), the address is correct as we
> > > expected[2]. Do you have any suggestion on this issue? Do we miss
> > > something?
> >
> > Sorry for the late reply. Could you show me the code changes you made
> > to use dma_map_single()? It would sound like the virtual address
> > passed to dma_map_single() isn't correct.
> >
> > Best regards,
> > Tomasz
> >
>
>
> Please check the below code snippet in today's testing.
>
> p1_dev->cam_dev.smem_dev = &p1_dev->scp_pdev->dev;
> ptr = dma_alloc_coherent(p1_dev->cam_dev.smem_dev,
>  MTK_ISP_COMPOSER_MEM_SIZE, &addr, 
> GFP_KERNEL);
> if (!ptr) {
> dev_err(dev, "failed to allocate compose memory\n");
> return -ENOMEM;
> }
> p1_dev->composer_scp_addr = addr;
> p1_dev->composer_virt_addr = ptr;
> dev_info(dev, "scp addr:%pad va:%pK\n", , ptr);
>
> /* get iova address */
> addr = dma_map_single(dev, ptr, MTK_ISP_COMPOSER_MEM_SIZE,
> DMA_BIDIRECTIONAL);
> if (dma_mapping_error(dev, addr)) {
> dma_free_coherent(p1_dev->cam_dev.smem_dev,
>   MTK_ISP_COMPOSER_MEM_SIZE,
>   ptr, p1_dev->composer_scp_addr);
> dev_err(dev, "Failed to map scp iova\n");
> ret = -ENOMEM;
> goto fail_free_mem;
> }
> p1_dev->composer_iova = addr;
> dev_info(dev, "scp iova addr:%pad\n", );
>
> Moreover, below is extracted log[2].
>
> We guess the virtual address which is returned by dma_alloc_coherent
> function is not valid kernel logical address. It is actually returned by
> memremap() in dma_init_coherent_memory(). Moreover, dma_map_single()
> will call virt_to_page() function. For virt_to_page function, it
> requires a

Re: [RFC,v3 8/9] media: platform: Add Mediatek ISP P1 SCP communication

2019-07-25 Thread Tomasz Figa
Hi Jungo,

On Sun, Jul 21, 2019 at 11:18 AM Jungo Lin  wrote:
[snip]
> > > +   wake_up_interruptible(_ctx->composer_tx_thread.wq);
> > > +   isp_ctx->composer_tx_thread.thread = NULL;
> > > +   }
> > > +
> > > +   if (isp_ctx->composer_deinit_thread.thread) {
> > > +   wake_up(_ctx->composer_deinit_thread.wq);
> > > +   isp_ctx->composer_deinit_thread.thread = NULL;
> > > +   }
> > > +   mutex_unlock(_ctx->lock);
> > > +
> > > +   pm_runtime_put_sync(_dev->pdev->dev);
> >
> > No need to use the sync variant.
> >
>
> We don't get this point. If we will call pm_runtime_get_sync in
> mtk_isp_hw_init function, will we need to call
> pm_runtime_put_sync_autosuspend in mtk_isp_hw_release in next patch?
> As we know, we should call runtime pm functions in pair.
>

My point is that pm_runtime_put_sync() is only needed if one wants the
runtime count to be decremented after the function returns. Normally
there is no need to do so and one would call pm_runtime_put(), or if
autosuspend is used, pm_runtime_put_autosuspend() (note there is no
"sync" in the name).

[snip]
> > +static void isp_composer_handler(void *data, unsigned int len, void *priv)
> > > +{
> > > +   struct mtk_isp_p1_ctx *isp_ctx = (struct mtk_isp_p1_ctx *)priv;
> > > +   struct isp_p1_device *p1_dev = p1_ctx_to_dev(isp_ctx);
> > > +   struct device *dev = _dev->pdev->dev;
> > > +   struct mtk_isp_scp_p1_cmd *ipi_msg;
> > > +
> > > +   ipi_msg = (struct mtk_isp_scp_p1_cmd *)data;
> >
> > Should we check that len == sizeof(*ipi_msg)? (Or at least >=, if data could
> > contain some extra bytes at the end.)
> >
>
> The len parameter is the actual number of bytes sent from the SCP to the kernel.
> At runtime, it is only 6 bytes for the isp_ack_info command.
> However, sizeof(*ipi_msg) is large because struct mtk_isp_scp_p1_cmd is a
> union structure.
>

That said we still should check if len is enough to cover the data
we're accessing below.
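
A bound check as simple as the following would do; the exact limit depends on
the real struct layout, so take the field path as an example only:

	/* reject messages too short to contain the fields read below */
	if (len < offsetofend(struct mtk_isp_scp_p1_cmd, ack_info.cmd_id) ||
	    len < offsetofend(struct mtk_isp_scp_p1_cmd, ack_info.frame_seq_no))
		return;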

> > > +
> > > +   if (ipi_msg->cmd_id != ISP_CMD_ACK)
> > > +   return;
> > > +
> > > +   if (ipi_msg->ack_info.cmd_id == ISP_CMD_FRAME_ACK) {
> > > +   dev_dbg(dev, "ack frame_num:%d",
> > > +   ipi_msg->ack_info.frame_seq_no);
> > > +   atomic_set(_ctx->composed_frame_id,
> > > +  ipi_msg->ack_info.frame_seq_no);
> >
> > I suppose we are expecting here that ipi_msg->ack_info.frame_seq_no would be
> > just isp_ctx->composed_frame_id + 1, right? If not, we probably dropped some
> > frames and we should handle that somehow.
> >
>
> No, we use isp_ctx->composed_frame_id to record which frame sequence
> number has been composed in the SCP. In the new design, we will move this
> from isp_ctx to p1_dev.

But we compose the frames in order, don't we? Wouldn't every composed
frame be just the previous frame ID + 1?
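
If so, a cheap sanity check in the handler could catch dropped ACKs; sketch
only, reusing the composed_frame_id bookkeeping you already have:

	if (ipi_msg->ack_info.frame_seq_no !=
	    atomic_read(&isp_ctx->composed_frame_id) + 1)
		dev_warn(dev, "compose ack out of order: got %d, expected %d\n",
			 ipi_msg->ack_info.frame_seq_no,
			 atomic_read(&isp_ctx->composed_frame_id) + 1);
	atomic_set(&isp_ctx->composed_frame_id, ipi_msg->ack_info.frame_seq_no);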

[snip]
> > > +void isp_composer_hw_init(struct device *dev)
> > > +{
> > > +   struct mtk_isp_scp_p1_cmd composer_tx_cmd;
> > > +   struct isp_p1_device *p1_dev = get_p1_device(dev);
> > > +   struct mtk_isp_p1_ctx *isp_ctx = _dev->isp_ctx;
> > > +
> > > +   memset(_tx_cmd, 0, sizeof(composer_tx_cmd));
> > > +   composer_tx_cmd.cmd_id = ISP_CMD_INIT;
> > > +   composer_tx_cmd.frameparam.hw_module = isp_ctx->isp_hw_module;
> > > +   composer_tx_cmd.frameparam.cq_addr.iova = isp_ctx->scp_mem_iova;
> > > +   composer_tx_cmd.frameparam.cq_addr.scp_addr = isp_ctx->scp_mem_pa;
> >
> > Should we also specify the size of the buffer? Otherwise we could end up
> > with some undetectable overruns.
> >
>
> The size of SCP composer's memory is fixed to 0x20.
> Is it necessary to specify the size of this buffer?
>
> #define MTK_ISP_COMPOSER_MEM_SIZE 0x20
>
> ptr = dma_alloc_coherent(p1_dev->cam_dev.smem_dev,
> MTK_ISP_COMPOSER_MEM_SIZE, &addr, GFP_KERNEL);
>

Okay, but please add a comment saying that this is an implicit
requirement of the firmware.
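
Something along these lines above the allocation would be enough (wording up
to you):

	/*
	 * MTK_ISP_COMPOSER_MEM_SIZE is an implicit requirement of the SCP
	 * composer firmware; it has to match the firmware build.
	 */
	ptr = dma_alloc_coherent(p1_dev->cam_dev.smem_dev,
				 MTK_ISP_COMPOSER_MEM_SIZE, &addr, GFP_KERNEL);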

Best regards,
Tomasz


Re: [RFC,v3 7/9] media: platform: Add Mediatek ISP P1 device driver

2019-07-25 Thread Tomasz Figa
Hi Jungo,

On Sat, Jul 20, 2019 at 6:58 PM Jungo Lin  wrote:
>
> Hi, Tomasz:
>
> On Wed, 2019-07-10 at 18:56 +0900, Tomasz Figa wrote:
> > Hi Jungo,
> >
> > On Tue, Jun 11, 2019 at 11:53:42AM +0800, Jungo Lin wrote:
> > > This patch adds the Mediatek ISP P1 HW control device driver.
> > > It handles the ISP HW configuration, provides interrupt handling and
> > > initializes the V4L2 device nodes and other functions.
> > >
> > > (The current metadata interface used in meta input and partial
> > > meta nodes is only a temporary solution to kick off the driver
> > > development and is not ready to be reviewed yet.)
> > >
> > > Signed-off-by: Jungo Lin 
> > > ---
> > >  .../platform/mtk-isp/isp_50/cam/Makefile  |1 +
> > >  .../mtk-isp/isp_50/cam/mtk_cam-regs.h |  126 ++
> > >  .../platform/mtk-isp/isp_50/cam/mtk_cam.c | 1087 +
> > >  .../platform/mtk-isp/isp_50/cam/mtk_cam.h |  243 
> > >  4 files changed, 1457 insertions(+)
> > >  create mode 100644 
> > > drivers/media/platform/mtk-isp/isp_50/cam/mtk_cam-regs.h
> > >  create mode 100644 drivers/media/platform/mtk-isp/isp_50/cam/mtk_cam.c
> > >  create mode 100644 drivers/media/platform/mtk-isp/isp_50/cam/mtk_cam.h
> > >
> >
> > Thanks for the patch! Please see my comments inline.
> >
> > [snip]
> >
>
> Thanks for your comments. Please check my replies inline.
>

Thanks! I'll snip anything I don't have further comments on.

[snip]
> > > +/* META */
> > > +#define REG_META0_VB2_INDEX0x14dc
> > > +#define REG_META1_VB2_INDEX0x151c
> >
> > I don't believe these registers are really for VB2 indexes.
> >
>
> MTK P1 ISP HW supports frame header spare registers for each DMA, such
> as CAM_DMA_FH_AAO_SPARE or CAM_DMA_FH_AFO_SPARE. We could save some
> frame information in these ISP registers. In this case, we save META0
> VB2 index into AAO FH spare register and META1 VB2 index into AFO FH
> spare register. This implementation is designed for non-request 3A
> DMAs. These VB2 indexes are sent in ISP_CMD_ENQUEUE_META command of
> mtk_isp_enqueue function. So we just call CAM_DMA_FH_AAO_SPARE as
> REG_META0_VB2_INDEX for easy understanding.

Unfortunately it's not a good idea to mix hardware concepts with
naming specific to the OS the driver is written for. Better to keep
the hardware naming, e.g. CAM_DMA_FH_AAO_SPARE.
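
i.e. something like (offsets as in the header quoted above):

	/* DMA frame header spare registers, named after the hardware */
	#define CAM_DMA_FH_AAO_SPARE		0x14dc
	#define CAM_DMA_FH_AFO_SPARE		0x151c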

> Moreover, if we only need to
> support request mode, we should remove this here.
>
> cmd_params.cmd_id = ISP_CMD_ENQUEUE_META;
> cmd_params.meta_frame.enabled_dma = dma_port;
> cmd_params.meta_frame.vb_index = buffer->vbb.vb2_buf.index;
> cmd_params.meta_frame.meta_addr.iova = buffer->daddr;
> cmd_params.meta_frame.meta_addr.scp_addr = buffer->scp_addr;
>

Okay, removing sounds good to me. Let's keep the code simple.

[snip]
> > > +
> > > +   err_status = irq_status & INT_ST_MASK_CAM_ERR;
> > > +
> > > +   /* Sof, done order check */
> > > +   if ((irq_status & SOF_INT_ST) && (irq_status & HW_PASS1_DON_ST)) {
> > > +   dev_dbg(dev, "sof_done block cnt:%d\n", isp_dev->sof_count);
> > > +
> > > +   /* Notify IRQ event and enqueue frame */
> > > +   irq_handle_notify_event(isp_dev, irq_status, dma_status, 0);
> > > +   isp_dev->current_frame = hw_frame_num;
> >
> > What exactly is hw_frame_num? Shouldn't we assign it before notifying the
> > event?
> >
>
> This is another spare register for frame sequence number usage.
> It comes from struct p1_frame_param:frame_seq_no which is sent by
> SCP_ISP_FRAME IPI command. We will rename this to dequeue_frame_seq_no.
> Would that be easier to understand?

I'm sorry, unfortunately it's still not clear to me. Is it the
sequence number of the frame that was just processed and returned to
the kernel or the next frame that is going to be processed from now
on?

>
> Below is our frame request handling in current design.
>
> 1. Buffer preparation
> - Combined image buffers (IMGO/RRZO) + meta input buffer (Tuning) +
> other meta histogram buffers (LCSO/LMVO) into one request.
> - Accumulated one unique frame sequence number to each request and send
> this request to the SCP composer to compose CQ (Command queue) buffer
> via SCP_ISP_FRAME IPI command.
> - CQ buffer is frame registers set. If ISP registers should be updated
> per frame, these registers are configured in the CQ buffer, such as
> frame sequence number, DMA addresses and tuning ISP registers.
>

Re: [RFC,v3 6/9] media: platform: Add Mediatek ISP P1 V4L2 functions

2019-07-23 Thread Tomasz Figa
Hi Jungo,

On Thu, Jul 18, 2019 at 1:39 PM Jungo Lin  wrote:
>
> Hi, Tomasz:
>
> On Wed, 2019-07-10 at 18:54 +0900, Tomasz Figa wrote:
> > Hi Jungo,
> >
> > On Tue, Jun 11, 2019 at 11:53:41AM +0800, Jungo Lin wrote:
[snip]
> > > +static void mtk_cam_req_try_isp_queue(struct mtk_cam_dev *cam_dev,
> > > + struct media_request *new_req)
> > > +{
> > > +   struct mtk_cam_dev_request *req, *req_safe, *cam_dev_req;
> > > +   struct device *dev = _dev->pdev->dev;
> > > +
> > > +   dev_dbg(dev, "%s new req:%d", __func__, !new_req);
> > > +
> > > +   if (!cam_dev->streaming) {
> > > +   cam_dev_req = mtk_cam_req_to_dev_req(new_req);
> > > +   spin_lock(_dev->req_lock);
> > > +   list_add_tail(_dev_req->list, _dev->req_list);
> > > +   spin_unlock(_dev->req_lock);
> > > +   dev_dbg(dev, "%s: stream off, no ISP enqueue\n", __func__);
> > > +   return;
> > > +   }
> > > +
> > > +   /* Normal enqueue flow */
> > > +   if (new_req) {
> > > +   mtk_isp_req_enqueue(dev, new_req);
> > > +   return;
> > > +   }
> > > +
> > > +   /* Flush all media requests when first stream on */
> > > +   list_for_each_entry_safe(req, req_safe, _dev->req_list, list) {
> > > +   list_del(>list);
> > > +   mtk_isp_req_enqueue(dev, >req);
> > > +   }
> > > +}
> >
> > This will have to be redone, as per the other suggestions, but generally one
> > would have a function that tries to queue as much as possible from a list to
> > the hardware and another function that adds a request to the list and calls
> > the first function.
> >
>
> We revised this function as below.
> First to check the en-queue conditions:
> a. stream on
> b. The composer buffers in SCP are 3, so we can only have 3 jobs
> at the same time.
>
>
> Second, try to en-queue the frames in the pending job if possible and
> move them into running job list if possible.
>
> The request has been inserted into pending job in mtk_cam_req_validate
> which is used to validate media_request.

Thanks for replying to each of the comments, that's very helpful.
Snipped out the parts that I agreed with.

Please note that req_validate is not supposed to change any driver
state. It's only supposed to validate the request. req_queue is the
right callback to insert the request into some internal driver
bookkeeping structures.

>
> void mtk_cam_dev_req_try_queue(struct mtk_cam_dev *cam_dev)
> {
> struct mtk_cam_dev_request *req, *req_prev;
> struct list_head enqueue_job_list;
> int buffer_cnt = atomic_read(&cam_dev->running_job_count);
> unsigned long flags;
>
> if (!cam_dev->streaming ||
> buffer_cnt >= MTK_ISP_MAX_RUNNING_JOBS) {

Do we have a guarantee that cam_dev->running_job_count doesn't
decrement between the atomic_read() above and this line?

> dev_dbg(cam_dev->dev, "stream off or buffers are full:%d\n",
> buffer_cnt);
> return;
> }
>
> INIT_LIST_HEAD(&enqueue_job_list);
>
> spin_lock(&cam_dev->pending_job_lock);
> list_for_each_entry_safe(req, req_prev,
>  &cam_dev->pending_job_list, list) {
> list_del(&req->list);
> list_add_tail(&req->list, &enqueue_job_list);

What's the reason to use the second list? Could we just take one job
from pending_job_list, enqueue it and then iterate again?

> if (atomic_inc_return(&cam_dev->running_job_count) >=
> MTK_ISP_MAX_RUNNING_JOBS)
> break;
> }
> spin_unlock(&cam_dev->pending_job_lock);
>
> list_for_each_entry_safe(req, req_prev,
>  &enqueue_job_list, list) {
> list_del(&req->list);
> spin_lock_irqsave(&cam_dev->running_job_lock, flags);
> list_add_tail(&req->list, &cam_dev->running_job_list);
> spin_unlock_irqrestore(&cam_dev->running_job_lock, flags);
>

Do we have a guarantee that another thread doesn't run the same
function ending up calling mtk_isp_req_enqueue() with another request
before this one and thus making the order of running_job_list
incorrect?

> mtk_isp_req_enqueue(cam_dev, req);
> }
> }
>
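
One way to address all three points above at once would be to serialize the
whole function and take one job at a time. Just a sketch, and queue_lock would
be a new mutex added for this purpose:

	mutex_lock(&cam_dev->queue_lock);
	while (atomic_read(&cam_dev->running_job_count) <
	       MTK_ISP_MAX_RUNNING_JOBS) {
		spin_lock_irqsave(&cam_dev->pending_job_lock, flags);
		req = list_first_entry_or_null(&cam_dev->pending_job_list,
					       struct mtk_cam_dev_request, list);
		if (req)
			list_del(&req->list);
		spin_unlock_irqrestore(&cam_dev->pending_job_lock, flags);
		if (!req)
			break;

		atomic_inc(&cam_dev->running_job_count);
		spin_lock_irqsave(&cam_dev->running_job_lock, flags);
		list_add_tail(&req->list, &cam_dev->running_job_list);
		spin_unlock_irqrestore(&cam_dev->running_job_lock, flags);

		mtk_isp_req_enqueue(cam_dev, req);
	}
	mutex_unlock(&cam_dev->queue_lock);

The mutex keeps the counter check and the running_job_list order consistent
even with concurrent callers, assuming mtk_isp_req_enqueue() may sleep and so
cannot be called under a spinlock.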
[snip]
> > > +   stride = DIV_ROUND_UP(stride * pixel_byte, 8);
> > > +
> > > +   if (pix_fmt 

Re: [RFC,v3 9/9] media: platform: Add Mediatek ISP P1 shared memory device

2019-07-23 Thread Tomasz Figa
Hi Jungo,

On Fri, Jul 5, 2019 at 4:59 PM Jungo Lin  wrote:
>
> Hi Tomasz:
>
> On Fri, 2019-07-05 at 13:22 +0900, Tomasz Figa wrote:
> > Hi Jungo,
> >
> > On Fri, Jul 5, 2019 at 12:33 PM Jungo Lin  wrote:
> > >
> > > Hi Tomasz,
>
> [snip]
>
> > > After applying your suggestion in SCP device driver, we could remove
> > > mtk_cam-smem.h/c. Currently, we use dma_alloc_coherent with SCP device
> > > to get SCP address. We could touch the buffer with this SCP address in
> > > SCP processor.
> > >
> > > After that, we use dma_map_page_attrs with P1 device which supports
> > > IOMMU domain to get IOVA address. For this address, we will assign
> > > it to our ISP HW device to proceed.
> > >
> > > Below is the snippet for ISP P1 compose buffer initialization.
> > >
> > > ptr = dma_alloc_coherent(p1_dev->cam_dev.smem_dev,
> > >  MAX_COMPOSER_SIZE, , GFP_KERNEL);
> > > if (!ptr) {
> > > dev_err(dev, "failed to allocate compose memory\n");
> > > return -ENOMEM;
> > > }
> > > isp_ctx->scp_mem_pa = addr;
> >
> > addr contains a DMA address, not a physical address. Could we call it
> > scp_mem_dma instead?
> >
> > > dev_dbg(dev, "scp addr:%pad\n", );
> > >
> > > /* get iova address */
> > > addr = dma_map_page_attrs(dev, phys_to_page(addr), 0,
> >
> > addr is a DMA address, so phys_to_page() can't be called on it. The
> > simplest thing here would be to use dma_map_single() with ptr as the
> > CPU address expected.
> >
>
> We have changed to use dma_map_single() with ptr, but encountered an IOMMU
> error. From the debug log of iommu_dma_map_page[3], we got
> 0x5480 instead of expected address: 0x5080[2].
> There is a address offset(0x400). If we change to use
> dma_map_page_attrs with phys_to_page(addr), the address is correct as we
> expected[2]. Do you have any suggestion on this issue? Do we miss
> something?

Sorry for the late reply. Could you show me the code changes you made
to use dma_map_single()? It would sound like the virtual address
passed to dma_map_single() isn't correct.

Best regards,
Tomasz

>
> [1]
> [1.344786] __dma_alloc_from_coherent: 0x80 PAGE_SHIFT:12
> device_base:0x5000 dma:0x5080
> virt_base:ff801400 va:ff801480
>
> [1.346890] mtk-cam 1a00.camisp: scp addr:0x5080
> va:ff801480
>
> [1.347864] iommu_dma_map_page:0x5480 offset:0
> [1.348562] mtk-cam 1a00.camisp: iova addr:0xfde0
>
> [2]
> [1.346738] __dma_alloc_from_coherent: 0x80 PAGE_SHIFT:12
> device_base:0x5000 dma:0x5080
> virt_base:ff801400 va:ff801480
> [1.348841] mtk-cam 1a00.camisp: scp addr:0x5080
> va:ff801480
> [1.349816] iommu_dma_map_page:0x5080 offset:0
> [1.350514] mtk-cam 1a00.camisp: iova addr:0xfde0
>
>
> [3]
> dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> unsigned long offset, size_t size, int prot)
> {
> phys_addr_t phys = page_to_phys(page);
> pr_err("iommu_dma_map_page:%pa offset:%lu\n", , offset);
>
> return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
> iommu_get_dma_domain(dev));
> }
>
> [snip]
>
> Best regards,
>
> Jungo
>


Re: [PATCH 0/2] media: add support for DW9768 VCM driver

2019-07-22 Thread Tomasz Figa
On Mon, Jul 8, 2019 at 7:12 PM  wrote:
>
> From: Dongchun Zhu 
>
> Hello,
>
> Add a v4l2 sub-device driver for Dongwoon's DW9768 lens voice coil.
> This is a voice coil module using the i2c bus to control the focus position.
>
> The DW9768 can control the position with 10 bits value and
> consists of two 8 bit registers show as below:
> register 0x04(DW9768_REG_POSITION):
> +---+---+---+---+---+---+---+---+
> |D07|D06|D05|D04|D03|D02|D01|D00|
> +---+---+---+---+---+---+---+---+
> register 0x03:
> +---+---+---+---+---+---+---+---+
> |---|---|---|---|---|---|D09|D08|
> +---+---+---+---+---+---+---+---+
>
> This driver support :
>  - set DW9768 to standby mode once suspend and turn it back to active if 
> resume
>  - set the position via V4L2_CID_FOCUS_ABSOLUTE ctrl
>
> Dongchun Zhu (2):
>   media: i2c: dw9768: Add DT support and MAINTAINERS entry
>   media: i2c: dw9768: Add DW9768 VCM driver
>
>  .../bindings/media/i2c/dongwoon,dw9768.txt |   9 +
>  MAINTAINERS|   8 +
>  drivers/media/i2c/Kconfig  |  10 +
>  drivers/media/i2c/Makefile |   1 +
>  drivers/media/i2c/dw9768.c | 458 
> +
>  5 files changed, 486 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/media/i2c/dongwoon,dw9768.txt
>  create mode 100644 drivers/media/i2c/dw9768.c
>
> --
> 2.9.2
>

Gentle ping. Some help with review would be appreciated!

Best regards,
Tomasz


Re: [RFC,v3 9/9] media: platform: Add Mediatek ISP P1 shared memory device

2019-07-04 Thread Tomasz Figa
Hi Jungo,

On Fri, Jul 5, 2019 at 12:33 PM Jungo Lin  wrote:
>
> Hi Tomasz,
>
> On Mon, 2019-07-01 at 16:25 +0900, Tomasz Figa wrote:
> > Hi Jungo,
> >
> > On Tue, Jun 11, 2019 at 11:53:44AM +0800, Jungo Lin wrote:
> > > The purpose of this child device is to provide shared
> > > memory management for exchanging tuning data between co-processor
> > > and the Pass 1 unit of the camera ISP system, including cache
> > > buffer handling.
> > >
> >
> > Looks like we haven't really progressed on getting this replaced with
> > something that doesn't require so much custom code. Let me propose something
> > better then.
> >
> > We already have a reserved memory mode in DT. If it has a compatible string
> > of "shared-dma-pool", it would be registered in the coherent DMA framework
> > [1]. That would make it available for consumer devices to look-up.
> >
> > Now if we add a "memory-region" property to the SCP device node and point it
> > to our reserved memory node, the SCP driver could look it up and hook to the
> > DMA mapping API using of_reserved_mem_device_init_by_idx[2].
> >
> > That basically makes any dma_alloc_*(), dma_map_*(), etc. calls on the SCP
> > struct device use the coherent DMA ops, which operate on the assigned memory
> > pool. With that, the P1 driver could just directly use those calls to
> > manage the memory, without any custom code.
> >
> > There is an example how this setup works in the s5p-mfc driver[3], but it
> > needs to be noted that it creates child nodes, because it can have more than
> > 1 DMA port, which may need its own memory pool. In our case, we wouldn't
> > need child nodes and could just use the SCP device directly.
> >
> > [1] 
> > https://elixir.bootlin.com/linux/v5.2-rc7/source/kernel/dma/coherent.c#L345
> > [2] 
> > https://elixir.bootlin.com/linux/v5.2-rc7/source/drivers/of/of_reserved_mem.c#L312
> > [3] 
> > https://elixir.bootlin.com/linux/v5.2-rc7/source/drivers/media/platform/s5p-mfc/s5p_mfc.c#L1075
> >
> > Let me also post some specific comments below, in case we end up still
> > needing any of the code.
> >
>
> Thanks your suggestions.
>
> After applying your suggestion in SCP device driver, we could remove
> mtk_cam-smem.h/c. Currently, we use dma_alloc_coherent with SCP device
> to get SCP address. We could touch the buffer with this SCP address in
> SCP processor.
>
> After that, we use dma_map_page_attrs with P1 device which supports
> IOMMU domain to get IOVA address. For this address, we will assign
> it to our ISP HW device to proceed.
>
> Below is the snippet for ISP P1 compose buffer initialization.
>
> ptr = dma_alloc_coherent(p1_dev->cam_dev.smem_dev,
>  MAX_COMPOSER_SIZE, &addr, GFP_KERNEL);
> if (!ptr) {
> dev_err(dev, "failed to allocate compose memory\n");
> return -ENOMEM;
> }
> isp_ctx->scp_mem_pa = addr;

addr contains a DMA address, not a physical address. Could we call it
scp_mem_dma instead?

> dev_dbg(dev, "scp addr:%pad\n", );
>
> /* get iova address */
> addr = dma_map_page_attrs(dev, phys_to_page(addr), 0,

addr is a DMA address, so phys_to_page() can't be called on it. The
simplest thing here would be to use dma_map_single() with ptr as the
CPU address expected.

>   MAX_COMPOSER_SIZE, DMA_BIDIRECTIONAL,
>   DMA_ATTR_SKIP_CPU_SYNC);
> if (dma_mapping_error(dev, addr)) {
> isp_ctx->scp_mem_pa = 0;

We also need to free the allocated memory.

> dev_err(dev, "Failed to map scp iova\n");
> return -ENOMEM;
> }
> isp_ctx->scp_mem_iova = addr;
>
> Moreover, we have another meta input buffer usage.
> For this kind of buffer, it will be allocated by V4L2 framework
> with dma_alloc_coherent with SCP device. In order to get IOVA,
> we will add dma_map_page_attrs in vb2_ops' buf_init function.
> In buf_cleanup function, we will call dma_unmap_page_attrs function.

As per above, we don't have access to the struct page we want to map.
We probably want to get the CPU VA using vb2_plane_vaddr() and call
dma_map_single() instead.

>
> Based on these current implementation, do you think it is correct?
> If we got any wrong, please let us know.
>
> Btw, we also use the DMA_ATTR_NO_KERNEL_MAPPING DMA attribute to
> avoid dma_sync_sg_for_device. Otherwise, it will hit a KE (kernel exception).
> Maybe we could not get the correct sg_table.
> Do you think it is a bug and 

Re: [RFC v2 4/4] media: platform: mtk-mdp3: Add Mediatek MDP3 driver

2019-06-25 Thread Tomasz Figa
On Thu, Jun 20, 2019 at 1:48 PM Alexandre Courbot  wrote:
>
> On Tue, Jun 4, 2019 at 8:20 PM Tomasz Figa  wrote:
> > > +
> > > + ret = mdp_vpu_get_locked(mdp);
> > > + if (ret < 0)
> > > + goto err_load_vpu;
> >
> > This shouldn't happen in open(), but rather the latest possible point in
> > time. If one needs to keep the VPU running for the time of streaming, then
> > it should be start_streaming. If one can safely turn the VPU off if there is
> > no frame queued for long time, it should be just in m2m job_run.
> >
> > Generally the userspace should be able to
> > just open an m2m device for querying it, without any side effects like
> > actually powering on the hardware or grabbing a hardware instance (which
> > could block some other processes, trying to grab one too).
>
> OTOH looking at the code of mdp_vpu_get_locked(), we do the whole
> rproc_boot and VPU init procedure if we were the only user. So I can
> understand we want to avoid doing this too often.
>
> Maybe mdp_vpu_get_locked() can be reorganized in a better way. I feel
> like the call to mdp_vpu_register() should be done in probe, and maybe
> we can use runtime PM (with a reasonable timeout) to control the rproc
> and VPU init?

I think it depends on when exactly the rproc and VPU need stay
initialized. In general, we want to turn off as much as possible as
quickly as possible, but keeping in mind any turn on latencies.

For example. if it takes 10 ms to boot rproc/VPU, we probably
shouldn't turn it off unless we already spent 20-30 ms idling, which
could be handled with runtime PM with (delayed) autosuspend. However,
things like clock gating are normally very fast, so we could just stop
any clocks as soon as frame processing ends and restart when next
frame is getting scheduled and if we use autosuspend, we wouldn't be
able to do it using PM runtime.
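
For completeness, the autosuspend flavour would look roughly like this (the
delay value is arbitrary):

	/* probe: let the VPU/rproc stay up for a while after the last job */
	pm_runtime_set_autosuspend_delay(dev, 30);	/* ms */
	pm_runtime_use_autosuspend(dev);
	pm_runtime_enable(dev);

	/* per job: */
	pm_runtime_get_sync(dev);
	/* ... run the job ... */
	pm_runtime_mark_last_busy(dev);
	pm_runtime_put_autosuspend(dev);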

My point was that just open() is not the right place for doing this.
Any later stage should be okay, as long as it suits the hardware
architecture.

Best regards,
Tomasz


Re: [RFC PATCH V1 6/6] platform: mtk-isp: Add Mediatek DIP driver

2019-06-25 Thread Tomasz Figa
Hi Frederic,

On Tue, Jun 25, 2019 at 9:16 PM Frederic Chen
 wrote:
>
> Dear Tomasz,
>
> Would you comment on the following points in further? Thank you for the
> review.
>
> On Thu, 2019-05-09 at 18:48 +0900, Tomasz Figa wrote:
> > Hi Frederic,
> >
>
> [snip]
>
> > > +int mtk_dip_pipe_job_start(struct mtk_dip_pipe *dip_pipe,
> > > +  struct mtk_dip_pipe_job_info *pipe_job_info)
> > > +{
> > > +   struct platform_device *pdev = dip_pipe->dip_dev->pdev;
> > > +   int ret;
> > > +   int out_img_buf_idx;
> > > +   struct img_ipi_frameparam dip_param;
> > > +   struct mtk_dip_dev_buffer *dev_buf_in;
> > > +   struct mtk_dip_dev_buffer *dev_buf_out;
> > > +   struct mtk_dip_dev_buffer *dev_buf_tuning;
> > > +
> > > +   if (!pipe_job_info) {
> > > +   dev_err(>dev,
> > > +   "pipe_job_info(%p) in start can't be NULL\n",
> > > +   pipe_job_info);
> > > +   return -EINVAL;
> > > +   }
> >
> > This should be impossible to happen.
> >
> > > +
> > > +   /* We need RAW and at least MDP0 or MDP 1 buffer */
> > > +   if (!pipe_job_info->buf_map[MTK_DIP_VIDEO_NODE_ID_RAW_OUT] ||
> > > +   (!pipe_job_info->buf_map[MTK_DIP_VIDEO_NODE_ID_MDP0_CAPTURE] 
> > > &&
> > > +
> > > !pipe_job_info->buf_map[MTK_DIP_VIDEO_NODE_ID_MDP1_CAPTURE])){
> > > +   struct mtk_dip_dev_buffer **map = pipe_job_info->buf_map;
> > > +
> > > +   dev_dbg(>dev,
> > > +   "can't trigger job: raw(%p), mdp0(%p), 
> > > mdp1(%p)\n",
> > > +   map[MTK_DIP_VIDEO_NODE_ID_RAW_OUT],
> > > +   map[MTK_DIP_VIDEO_NODE_ID_MDP0_CAPTURE],
> > > +   map[MTK_DIP_VIDEO_NODE_ID_MDP1_CAPTURE]);
> > > +   return -EINVAL;
> >
> > This must be validated at the time of request_validate. We can't fail at
> > this stage anymore.
>
> After the modification about checking the required buffers in
> req_validate(), we got failed in the following testRequests()
> of V4L2 compliance test. The V4L2 compliance test case doesn't know
> which buffers of the video devices are required and expects that the
> MEDIA_REQUEST_IOC_QUEUE succeed when the request has any valid buffer.
>
> For example, when the request has an MDP 0 buffer only, the DIP's
> req_validate() should return an error since it also needs a buffer
> from the RAW video device, but that makes the compliance test fail.
>
> May I still check the required buffers in req_validate() in the next
> patch? I will add a note to explain that the failed compliance test
> item is related to this limitation.
>
> ===
> int testRequests(struct node *node, bool test_streaming)
> // ..
> if (i)
> fail_on_test(!buf.qbuf(node));
> buf.s_flags(buf.g_flags() | V4L2_BUF_FLAG_REQUEST_FD);
> buf.s_request_fd(buf_req_fds[i]);
> buf.s_field(V4L2_FIELD_ANY);
> fail_on_test(buf.qbuf(node));
> if (v4l_type_is_video(buf.g_type()) && v4l_type_is_output(buf.g_type()))
> fail_on_test(buf.g_field() == V4L2_FIELD_ANY);
> fail_on_test(buf.querybuf(node, i));
>
> // ..
>
> // LINE 1807 in v4l2-test-buffers.cpp, we will get the failure here.
> // Since we need one RAW and one MDP0 buffer at least.
> // v4l2-test-buffers.cpp(1807): doioctl_fd(buf_req_fds[i],
> // MEDIA_REQUEST_IOC_QUEUE, 0)
> //  test Requests: FAIL
> fail_on_test(doioctl_fd(buf_req_fds[i], MEDIA_REQUEST_IOC_QUEUE, 0));
> ===
>

Sounds like a limitation of the compliance test. Request API testing
there is still new and possibly just made for simple mem-to-mem
devices.

Hans, the driver always requires some buffers to be given, like the
raw frame input, while others, e.g. the downscaled output, are optional.
Any ideas?

> > > +
> > > +static int mtk_dip_vb2_queue_setup(struct vb2_queue *vq,
> > > +  unsigned int *num_buffers,
> > > +  unsigned int *num_planes,
> > > +  unsigned int sizes[],
> > > +  struct device *alloc_devs[])
> > > +{
> > > +   struct mtk_dip_pipe *dip_pipe = vb2_get_drv_priv(vq);
> > > +   struct mtk_dip_video_dev

Re: [PATCH v7 14/21] iommu/mediatek: Add mmu1 support

2019-06-18 Thread Tomasz Figa
On Tue, Jun 18, 2019 at 9:09 PM Yong Wu  wrote:
>
> On Tue, 2019-06-18 at 15:19 +0900, Tomasz Figa wrote:
> > On Mon, Jun 10, 2019 at 9:21 PM Yong Wu  wrote:
> > >
> > > Normally the M4U HW connect EMI with smi. the diagram is like below:
> > >   EMI
> > >|
> > >   M4U
> > >|
> > > smi-common
> > >|
> > >-
> > >||| |...
> > > larb0 larb1  larb2 larb3
> > >
> > > Actually there are 2 mmu cells in the M4U HW, like this diagram:
> > >
> > >   EMI
> > >-
> > > | |
> > >mmu0  mmu1 <- M4U
> > > | |
> > >-
> > >|
> > > smi-common
> > >|
> > >-
> > >||| |...
> > > larb0 larb1  larb2 larb3
> > >
> > > This patch add support for mmu1. In order to get better performance,
> > > we could adjust some larbs go to mmu1 while the others still go to
> > > mmu0. This is controlled by a SMI COMMON register SMI_BUS_SEL(0x220).
> > >
> > > mt2712, mt8173 and mt8183 M4U HW all have 2 mmu cells. the default
> > > value of that register is 0 which means all the larbs go to mmu0
> > > defaultly.
> > >
> > > This is a preparing patch for adjusting SMI_BUS_SEL for mt8183.
> > >
> > > Signed-off-by: Yong Wu 
> > > Reviewed-by: Evan Green 
> > > ---
> > >  drivers/iommu/mtk_iommu.c | 46 
> > > +-
> > >  1 file changed, 29 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > > index 3a14301..ec4ce74 100644
> > > --- a/drivers/iommu/mtk_iommu.c
> > > +++ b/drivers/iommu/mtk_iommu.c
> > > @@ -72,26 +72,32 @@
> > >  #define F_INT_CLR_BIT  BIT(12)
> > >
> > >  #define REG_MMU_INT_MAIN_CONTROL   0x124
> > > -#define F_INT_TRANSLATION_FAULTBIT(0)
> > > -#define F_INT_MAIN_MULTI_HIT_FAULT BIT(1)
> > > -#define F_INT_INVALID_PA_FAULT BIT(2)
> > > -#define F_INT_ENTRY_REPLACEMENT_FAULT  BIT(3)
> > > -#define F_INT_TLB_MISS_FAULT   BIT(4)
> > > -#define F_INT_MISS_TRANSACTION_FIFO_FAULT  BIT(5)
> > > -#define F_INT_PRETETCH_TRANSATION_FIFO_FAULT   BIT(6)
> > > +   /* mmu0 | mmu1 */
> > > +#define F_INT_TRANSLATION_FAULT(BIT(0) | BIT(7))
> > > +#define F_INT_MAIN_MULTI_HIT_FAULT (BIT(1) | BIT(8))
> > > +#define F_INT_INVALID_PA_FAULT (BIT(2) | BIT(9))
> > > +#define F_INT_ENTRY_REPLACEMENT_FAULT  (BIT(3) | BIT(10))
> > > +#define F_INT_TLB_MISS_FAULT   (BIT(4) | BIT(11))
> > > +#define F_INT_MISS_TRANSACTION_FIFO_FAULT  (BIT(5) | BIT(12))
> > > +#define F_INT_PRETETCH_TRANSATION_FIFO_FAULT   (BIT(6) | BIT(13))
> >
> > If there are two IOMMUs, shouldn't we have two driver instances handle
> > them, instead of making the driver combine them two internally?
>
> Actually it means only one IOMMU (M4U) HW here. Each M4U HW has two
> small iommu cells which have independent MTLBs. As in the diagram above, the M4U
> contains mmu0 and mmu1.
>
> MT8173 and MT8183 have only one M4U HW while MT2712 have 2 M4U HWs(two
> driver instances).
>
> >
> > And, what is even more important from security point of view actually,
> > have two separate page tables (aka IOMMU groups) for them?
>
> Each IOMMU (M4U) has its own pagetable; thus, mt8183 has only one
> pagetable while mt2712 has two.

I see, thanks for clarifying.

Best regards,
Tomasz


Re: [PATCH v7 14/21] iommu/mediatek: Add mmu1 support

2019-06-18 Thread Tomasz Figa via iommu
On Mon, Jun 10, 2019 at 9:21 PM Yong Wu  wrote:
>
> Normally the M4U HW connect EMI with smi. the diagram is like below:
>   EMI
>|
>   M4U
>|
> smi-common
>|
>-
>||| |...
> larb0 larb1  larb2 larb3
>
> Actually there are 2 mmu cells in the M4U HW, like this diagram:
>
>   EMI
>-
> | |
>mmu0  mmu1 <- M4U
> | |
>-
>|
> smi-common
>|
>-
>||| |...
> larb0 larb1  larb2 larb3
>
> This patch add support for mmu1. In order to get better performance,
> we could adjust some larbs go to mmu1 while the others still go to
> mmu0. This is controlled by a SMI COMMON register SMI_BUS_SEL(0x220).
>
> mt2712, mt8173 and mt8183 M4U HW all have 2 mmu cells. the default
> value of that register is 0 which means all the larbs go to mmu0
> defaultly.
>
> This is a preparing patch for adjusting SMI_BUS_SEL for mt8183.
>
> Signed-off-by: Yong Wu 
> Reviewed-by: Evan Green 
> ---
>  drivers/iommu/mtk_iommu.c | 46 +-
>  1 file changed, 29 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 3a14301..ec4ce74 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -72,26 +72,32 @@
>  #define F_INT_CLR_BIT  BIT(12)
>
>  #define REG_MMU_INT_MAIN_CONTROL   0x124
> -#define F_INT_TRANSLATION_FAULTBIT(0)
> -#define F_INT_MAIN_MULTI_HIT_FAULT BIT(1)
> -#define F_INT_INVALID_PA_FAULT BIT(2)
> -#define F_INT_ENTRY_REPLACEMENT_FAULT  BIT(3)
> -#define F_INT_TLB_MISS_FAULT   BIT(4)
> -#define F_INT_MISS_TRANSACTION_FIFO_FAULT  BIT(5)
> -#define F_INT_PRETETCH_TRANSATION_FIFO_FAULT   BIT(6)
> +   /* mmu0 | mmu1 */
> +#define F_INT_TRANSLATION_FAULT(BIT(0) | BIT(7))
> +#define F_INT_MAIN_MULTI_HIT_FAULT (BIT(1) | BIT(8))
> +#define F_INT_INVALID_PA_FAULT (BIT(2) | BIT(9))
> +#define F_INT_ENTRY_REPLACEMENT_FAULT  (BIT(3) | BIT(10))
> +#define F_INT_TLB_MISS_FAULT   (BIT(4) | BIT(11))
> +#define F_INT_MISS_TRANSACTION_FIFO_FAULT  (BIT(5) | BIT(12))
> +#define F_INT_PRETETCH_TRANSATION_FIFO_FAULT   (BIT(6) | BIT(13))

If there are two IOMMUs, shouldn't we have two driver instances handle
them, instead of making the driver combine them two internally?

And, what is even more important from security point of view actually,
have two separate page tables (aka IOMMU groups) for them?

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v2 4/4] media: platform: mtk-mdp3: Add Mediatek MDP3 driver

2019-06-11 Thread Tomasz Figa
Hi Daoyuan,

On Tue, Jun 11, 2019 at 6:20 PM Daoyuan Huang
 wrote:
>
> hi Tomasz:
>
> Thanks for your review comments, the corresponding modification
> & explanation is under preparation, will update soon.
>
> Thanks.

Thanks.

Note that Alexandre may already be reviewing the rest of this patch,
so I'd consult with him if sending a next revision or waiting for his
review is preferred.

Best regards,
Tomasz


Re: [RFC PATCH V1 6/6] platform: mtk-isp: Add Mediatek DIP driver

2019-06-11 Thread Tomasz Figa
On Tue, Jun 11, 2019 at 7:07 PM Frederic Chen
 wrote:
>
> Hi Tomasz,
>
>
> On Tue, 2019-06-11 at 17:59 +0900, Tomasz Figa wrote:
> > On Tue, Jun 11, 2019 at 5:48 PM Frederic Chen
> >  wrote:
> > >
> > > Dear Tomasz,
> > >
> > > I'd like to elaborate more about the tuning_data.va.
> > > Would you like to give us some advice about our improvement proposal 
> > > inline?
> > >
> > > Thank you very much.
> > >
> > >
> > > On Wed, 2019-05-22 at 03:14 +0800, Frederic Chen wrote:
> > > > Dear Tomasz,
> > > >
> > > > I appreciate your comment. It is very helpful for us.
> > > >
> > > >
> > > > > > diff --git 
> > > > > > a/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c 
> > > > > > b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > > > new file mode 100644
> > > > > > index ..54d2b5f5b802
> > > > > > --- /dev/null
> > > > > > +++ b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > > > @@ -0,0 +1,1384 @@
> > >
> > > [snip]
> > >
> > > > > > +static void dip_submit_worker(struct work_struct *work)
> > > > > > +{
> > > > > > +   struct mtk_dip_hw_submit_work *dip_submit_work =
> > > > > > +   container_of(work, struct mtk_dip_hw_submit_work, 
> > > > > > frame_work);
> > > > > > +   struct mtk_dip_hw *dip_hw = dip_submit_work->dip_hw;
> > > > > > +   struct mtk_dip_dev *dip_dev = mtk_dip_hw_to_dev(dip_hw);
> > > > > > +   struct mtk_dip_hw_work *dip_work;
> > > > > > +   struct mtk_dip_hw_subframe *buf;
> > > > > > +   u32 len, num;
> > > > > > +   int ret;
> > > > > > +
> > > > > > +   num  = atomic_read(_hw->num_composing);
> > > > > > +
> > > > > > +   mutex_lock(_hw->dip_worklist.queuelock);
> > > > > > +   dip_work = list_first_entry(_hw->dip_worklist.queue,
> > >
> > > [snip]
> > >
> > > > > > +
> > > > > > +   if (dip_work->frameparams.tuning_data.pa == 0) {
> > > > > > +   dev_dbg(_dev->pdev->dev,
> > > > > > +   "%s: frame_no(%d) has no tuning_data\n",
> > > > > > +   __func__, dip_work->frameparams.frame_no);
> > > > > > +
> > > > > > +   memcpy(_work->frameparams.tuning_data,
> > > > > > +  >tuning_buf, sizeof(buf->tuning_buf));
> > > > >
> > > > > Ditto.
> > > > >
> > > >
> > > > I got it.
> > > >
> > > > > > +   memset((char *)buf->tuning_buf.va, 0, 
> > > > > > DIP_TUNING_SZ);
> > > > >
> > > > > Ditto.
> > > >
> > > > I got it.
> > > >
> > > > >
> > > > > > +   /*
> > > > > > +* When user enqueued without tuning buffer,
> > > > > > +* it would use driver internal buffer.
> > > > > > +* So, tuning_data.va should be 0
> > > > > > +*/
> > > > > > +   dip_work->frameparams.tuning_data.va = 0;
> > > > >
> > > > > I don't understand this. We just zeroed the buffer via this kernel VA 
> > > > > few
> > > > > lines above, so why would it have to be set to 0?
> > > > >
> > > >
> > > > I will remove this unnecessary line.
> > > >
> > > > > > +   }
> > >
> > > After confirming the firmware part, I found that we use this field
> > > (tuning_data.va) to notify firmware if there is no tuning data from
> > > user.
> > >
> > > - frameparams.tuning_data.va is 0: use the default tuning data in
> > >SCP, but we still need to pass
> > >frameparams.tuning_data.pa because
> > >the buffer contains some working
> > >buffer requir

Re: [RFC PATCH V1 6/6] platform: mtk-isp: Add Mediatek DIP driver

2019-06-11 Thread Tomasz Figa
On Tue, Jun 11, 2019 at 5:48 PM Frederic Chen
 wrote:
>
> Dear Tomasz,
>
> I'd like to elaborate more about the tuning_data.va.
> Would you like to give us some advice about our improvement proposal inline?
>
> Thank you very much.
>
>
> On Wed, 2019-05-22 at 03:14 +0800, Frederic Chen wrote:
> > Dear Tomasz,
> >
> > I appreciate your comment. It is very helpful for us.
> >
> >
> > > > diff --git a/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c 
> > > > b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > new file mode 100644
> > > > index ..54d2b5f5b802
> > > > --- /dev/null
> > > > +++ b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > @@ -0,0 +1,1384 @@
>
> [snip]
>
> > > > +static void dip_submit_worker(struct work_struct *work)
> > > > +{
> > > > +   struct mtk_dip_hw_submit_work *dip_submit_work =
> > > > +   container_of(work, struct mtk_dip_hw_submit_work, 
> > > > frame_work);
> > > > +   struct mtk_dip_hw *dip_hw = dip_submit_work->dip_hw;
> > > > +   struct mtk_dip_dev *dip_dev = mtk_dip_hw_to_dev(dip_hw);
> > > > +   struct mtk_dip_hw_work *dip_work;
> > > > +   struct mtk_dip_hw_subframe *buf;
> > > > +   u32 len, num;
> > > > +   int ret;
> > > > +
> > > > +   num  = atomic_read(&dip_hw->num_composing);
> > > > +
> > > > +   mutex_lock(&dip_hw->dip_worklist.queuelock);
> > > > +   dip_work = list_first_entry(&dip_hw->dip_worklist.queue,
>
> [snip]
>
> > > > +
> > > > +   if (dip_work->frameparams.tuning_data.pa == 0) {
> > > > +   dev_dbg(&dip_dev->pdev->dev,
> > > > +   "%s: frame_no(%d) has no tuning_data\n",
> > > > +   __func__, dip_work->frameparams.frame_no);
> > > > +
> > > > +   memcpy(&dip_work->frameparams.tuning_data,
> > > > +  &buf->tuning_buf, sizeof(buf->tuning_buf));
> > >
> > > Ditto.
> > >
> >
> > I got it.
> >
> > > > +   memset((char *)buf->tuning_buf.va, 0, DIP_TUNING_SZ);
> > >
> > > Ditto.
> >
> > I got it.
> >
> > >
> > > > +   /*
> > > > +* When user enqueued without tuning buffer,
> > > > +* it would use driver internal buffer.
> > > > +* So, tuning_data.va should be 0
> > > > +*/
> > > > +   dip_work->frameparams.tuning_data.va = 0;
> > >
> > > I don't understand this. We just zeroed the buffer via this kernel VA few
> > > lines above, so why would it have to be set to 0?
> > >
> >
> > I will remove this unnecessary line.
> >
> > > > +   }
>
> After confirming the firmware part, I found that we use this field
> (tuning_data.va) to notify firmware if there is no tuning data from
> user.
>
> - frameparams.tuning_data.va is 0: use the default tuning data in
>SCP, but we still need to pass
>frameparams.tuning_data.pa because
>the buffer contains some
>required working buffers.
> - frameparams.tuning_data.va is not 0: the tuning data was passed from
>the user
>
> Since we should not pass a CPU address to the SCP, could I rename tuning_data.va
> to tuning_data.cookie, and write a constant value to indicate whether SCP
> should use its internal default setting or not here?
>
> For example,
> /* SCP uses tuning data passed from userspace*/
> dip_work->frameparams.tuning_data.cookie = MTK_DIP_USER_TUNING_DATA;
>
> /* SCP uses internal tuning data */
> dip_work->frameparams.tuning_data.cookie = MTK_DIP_DEFAULT_TUNING_DATA;

Perhaps we could just call it "present" and set it to true or false?
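
Just to illustrate what I mean (a rough sketch with made-up names, not
the actual driver structs or the real SCP ABI):

/* Hypothetical layout, only to show the idea. */
struct img_tuning_data {
        u32 pa;       /* firmware-visible address of the buffer */
        u32 present;  /* non-zero: tuning data supplied by user space */
} __packed;

/* user space enqueued a tuning buffer */
dip_work->frameparams.tuning_data.present = 1;

/* no tuning buffer from user space, SCP falls back to its defaults */
dip_work->frameparams.tuning_data.present = 0;

That way no CPU address is carried in the firmware interface at all.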

Best regards,
Tomasz


Re: [PATCH v4 01/14] dt-bindings: Add binding for MT2712 MIPI-CSI2

2019-06-10 Thread Tomasz Figa
On Mon, Jun 10, 2019 at 4:51 PM CK Hu  wrote:
>
> Hi, Tomasz:
>
> On Mon, 2019-06-10 at 12:32 +0900, Tomasz Figa wrote:
> > Hi CK, Stu,
> >
> > On Mon, Jun 10, 2019 at 11:34 AM CK Hu  wrote:
> > >
> > > Hi, Stu:
> > >
> > > "mediatek,mt2712-mipicsi" and "mediatek,mt2712-mipicsi-common" have many
> > > common part with "mediatek,mt8183-seninf", and I've a discussion in [1],
> > > so I would like these two to be merged together.
> > >
> > > [1] https://patchwork.kernel.org/patch/10979131/
> > >
> >
> > Thanks CK for spotting this.
> >
> > I also noticed that the driver in fact handles two hardware blocks at
> > the same time - SenInf and CamSV. Unless the architecture is very
> > different from MT8183, I'd suggest splitting it.
> >
> > On a general note, the MT8183 SenInf driver has received several
> > rounds of review comments already, but I couldn't find any comments
> > posted for this one.
> >
> > Given the two aspects above and also based on my quick look at code
> > added by this series, I'd recommend adding MT2712 support on top of
> > the MT8183 series.
>
> In [1], "mediatek,mt8183-seninf" use one device to control multiple csi
> instance, so it duplicate many register definition. In [2], one
> "mediatek,mt2712-mipicsi" device control one csi instance, so there are
> multiple device and the register definition does not duplicate.

I guess we didn't catch that in the review yet. It should be fixed.

> You
> recommend adding MT2712 support on top of the MT8183 series; do you mean
> that "mediatek,mt2712-mipicsi" should use one device to control multiple
> csi instances and duplicate the register settings?

There are some aspects of the MT8183 series that are done better than
in the MT2712 series, but apparently there are also some aspects done
better in MT2712. We should take the best of both series. :)
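
To illustrate the register duplication point (sketch only, with made-up
register names rather than the real SenInf/mipicsi layout): the offsets
can be defined once and applied relative to a per-instance base, instead
of spelling out CSI0_*/CSI1_* copies of every register.

#include <linux/bits.h>
#include <linux/io.h>

/* Hypothetical offsets, shared by every CSI instance. */
#define CSI_CTRL        0x000
#define CSI_CTRL_EN     BIT(0)

struct csi_instance {
        void __iomem *base;     /* ioremapped base of this instance */
};

static void csi_enable(struct csi_instance *csi)
{
        /* The same offsets work for CSI0, CSI1, ...; nothing duplicated. */
        writel(readl(csi->base + CSI_CTRL) | CSI_CTRL_EN,
               csi->base + CSI_CTRL);
}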

Best regards,
Tomasz

>
> [1] https://patchwork.kernel.org/patch/10979121/
> [2] https://patchwork.kernel.org/patch/10974573/
>
> Regards,
> CK
>
> >
> > Best regards,
> > Tomasz
>
>


Re: [PATCH v4 01/14] dt-bindings: Add binding for MT2712 MIPI-CSI2

2019-06-09 Thread Tomasz Figa
Hi CK, Stu,

On Mon, Jun 10, 2019 at 11:34 AM CK Hu  wrote:
>
> Hi, Stu:
>
> "mediatek,mt2712-mipicsi" and "mediatek,mt2712-mipicsi-common" have many
> common part with "mediatek,mt8183-seninf", and I've a discussion in [1],
> so I would like these two to be merged together.
>
> [1] https://patchwork.kernel.org/patch/10979131/
>

Thanks CK for spotting this.

I also noticed that the driver in fact handles two hardware blocks at
the same time - SenInf and CamSV. Unless the architecture is very
different from MT8183, I'd suggest splitting it.

On a general note, the MT8183 SenInf driver has received several
rounds of review comments already, but I couldn't find any comments
posted for this one.

Given the two aspects above and also based on my quick look at code
added by this series, I'd recommend adding MT2712 support on top of
the MT8183 series.

Best regards,
Tomasz


Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-05 Thread Tomasz Figa
On Mon, Jun 3, 2019 at 7:48 PM Rob Clark  wrote:
>
> On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa  wrote:
> >
> > On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
> > >
> > > On Fri, May 10, 2019 at 7:35 AM Rob Clark  wrote:
> > > >
> > > > On Tue, Dec 4, 2018 at 2:29 PM Rob Herring  wrote:
> > > > >
> > > > > On Sat, Dec 1, 2018 at 10:54 AM Rob Clark  wrote:
> > > > > >
> > > > > > This solves a problem we see with drm/msm, caused by getting
> > > > > > iommu_dma_ops while we attach our own domain and manage it directly 
> > > > > > at
> > > > > > the iommu API level:
> > > > > >
> > > > > >   [0038] user address but active_mm is swapper
> > > > > >   Internal error: Oops: 9605 [#1] PREEMPT SMP
> > > > > >   Modules linked in:
> > > > > >   CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 
> > > > > > 4.19.3 #90
> > > > > >   Hardware name: xxx (DT)
> > > > > >   Workqueue: events deferred_probe_work_func
> > > > > >   pstate: 80c9 (Nzcv daif +PAN +UAO)
> > > > > >   pc : iommu_dma_map_sg+0x7c/0x2c8
> > > > > >   lr : iommu_dma_map_sg+0x40/0x2c8
> > > > > >   sp : ff80095eb4f0
> > > > > >   x29: ff80095eb4f0 x28: 
> > > > > >   x27: ffc0f9431578 x26: 
> > > > > >   x25:  x24: 0003
> > > > > >   x23: 0001 x22: ffc0fa9ac010
> > > > > >   x21:  x20: ffc0fab40980
> > > > > >   x19: ffc0fab40980 x18: 0003
> > > > > >   x17: 01c4 x16: 0007
> > > > > >   x15: 000e x14: 
> > > > > >   x13:  x12: 0028
> > > > > >   x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> > > > > >   x9 :  x8 : ffc0fab409a0
> > > > > >   x7 :  x6 : 0002
> > > > > >   x5 : 0001 x4 : 
> > > > > >   x3 : 0001 x2 : 0002
> > > > > >   x1 : ffc0f9431578 x0 : 
> > > > > >   Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb)
> > > > > >   Call trace:
> > > > > >iommu_dma_map_sg+0x7c/0x2c8
> > > > > >__iommu_map_sg_attrs+0x70/0x84
> > > > > >get_pages+0x170/0x1e8
> > > > > >msm_gem_get_iova+0x8c/0x128
> > > > > >_msm_gem_kernel_new+0x6c/0xc8
> > > > > >msm_gem_kernel_new+0x4c/0x58
> > > > > >dsi_tx_buf_alloc_6g+0x4c/0x8c
> > > > > >msm_dsi_host_modeset_init+0xc8/0x108
> > > > > >msm_dsi_modeset_init+0x54/0x18c
> > > > > >_dpu_kms_drm_obj_init+0x430/0x474
> > > > > >dpu_kms_hw_init+0x5f8/0x6b4
> > > > > >msm_drm_bind+0x360/0x6c8
> > > > > >try_to_bring_up_master.part.7+0x28/0x70
> > > > > >component_master_add_with_match+0xe8/0x124
> > > > > >msm_pdev_probe+0x294/0x2b4
> > > > > >platform_drv_probe+0x58/0xa4
> > > > > >really_probe+0x150/0x294
> > > > > >driver_probe_device+0xac/0xe8
> > > > > >__device_attach_driver+0xa4/0xb4
> > > > > >bus_for_each_drv+0x98/0xc8
> > > > > >__device_attach+0xac/0x12c
> > > > > >device_initial_probe+0x24/0x30
> > > > > >bus_probe_device+0x38/0x98
> > > > > >deferred_probe_work_func+0x78/0xa4
> > > > > >process_one_work+0x24c/0x3dc
> > > > > >worker_thread+0x280/0x360
> > > > > >kthread+0x134/0x13c
> > > > > >ret_from_fork+0x10/0x18
> > > > > >   Code: d284 91000725 6b17039f 5400048a (f9401f40)
> > > > > >   ---[ end trace f22dda57f3648e2c ]---
> > > > > >   Kernel panic - not syncing: Fatal exception
> > > > > >   SMP: stopping secondary CPUs
> > > > > >   Kernel Offset: disabled
> > > > > >   CPU features: 0x0,22802a18
> > 

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2019-06-03 Thread Tomasz Figa
On Mon, Jun 3, 2019 at 4:40 AM Rob Clark  wrote:
>
> On Fri, May 10, 2019 at 7:35 AM Rob Clark  wrote:
> >
> > On Tue, Dec 4, 2018 at 2:29 PM Rob Herring  wrote:
> > >
> > > On Sat, Dec 1, 2018 at 10:54 AM Rob Clark  wrote:
> > > >
> > > > This solves a problem we see with drm/msm, caused by getting
> > > > iommu_dma_ops while we attach our own domain and manage it directly at
> > > > the iommu API level:
> > > >
> > > >   [0038] user address but active_mm is swapper
> > > >   Internal error: Oops: 9605 [#1] PREEMPT SMP
> > > >   Modules linked in:
> > > >   CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: GW 4.19.3 
> > > > #90
> > > >   Hardware name: xxx (DT)
> > > >   Workqueue: events deferred_probe_work_func
> > > >   pstate: 80c9 (Nzcv daif +PAN +UAO)
> > > >   pc : iommu_dma_map_sg+0x7c/0x2c8
> > > >   lr : iommu_dma_map_sg+0x40/0x2c8
> > > >   sp : ff80095eb4f0
> > > >   x29: ff80095eb4f0 x28: 
> > > >   x27: ffc0f9431578 x26: 
> > > >   x25:  x24: 0003
> > > >   x23: 0001 x22: ffc0fa9ac010
> > > >   x21:  x20: ffc0fab40980
> > > >   x19: ffc0fab40980 x18: 0003
> > > >   x17: 01c4 x16: 0007
> > > >   x15: 000e x14: 
> > > >   x13:  x12: 0028
> > > >   x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> > > >   x9 :  x8 : ffc0fab409a0
> > > >   x7 :  x6 : 0002
> > > >   x5 : 0001 x4 : 
> > > >   x3 : 0001 x2 : 0002
> > > >   x1 : ffc0f9431578 x0 : 
> > > >   Process kworker/7:1 (pid: 70, stack limit = 0x17d08ffb)
> > > >   Call trace:
> > > >iommu_dma_map_sg+0x7c/0x2c8
> > > >__iommu_map_sg_attrs+0x70/0x84
> > > >get_pages+0x170/0x1e8
> > > >msm_gem_get_iova+0x8c/0x128
> > > >_msm_gem_kernel_new+0x6c/0xc8
> > > >msm_gem_kernel_new+0x4c/0x58
> > > >dsi_tx_buf_alloc_6g+0x4c/0x8c
> > > >msm_dsi_host_modeset_init+0xc8/0x108
> > > >msm_dsi_modeset_init+0x54/0x18c
> > > >_dpu_kms_drm_obj_init+0x430/0x474
> > > >dpu_kms_hw_init+0x5f8/0x6b4
> > > >msm_drm_bind+0x360/0x6c8
> > > >try_to_bring_up_master.part.7+0x28/0x70
> > > >component_master_add_with_match+0xe8/0x124
> > > >msm_pdev_probe+0x294/0x2b4
> > > >platform_drv_probe+0x58/0xa4
> > > >really_probe+0x150/0x294
> > > >driver_probe_device+0xac/0xe8
> > > >__device_attach_driver+0xa4/0xb4
> > > >bus_for_each_drv+0x98/0xc8
> > > >__device_attach+0xac/0x12c
> > > >device_initial_probe+0x24/0x30
> > > >bus_probe_device+0x38/0x98
> > > >deferred_probe_work_func+0x78/0xa4
> > > >process_one_work+0x24c/0x3dc
> > > >worker_thread+0x280/0x360
> > > >kthread+0x134/0x13c
> > > >ret_from_fork+0x10/0x18
> > > >   Code: d284 91000725 6b17039f 5400048a (f9401f40)
> > > >   ---[ end trace f22dda57f3648e2c ]---
> > > >   Kernel panic - not syncing: Fatal exception
> > > >   SMP: stopping secondary CPUs
> > > >   Kernel Offset: disabled
> > > >   CPU features: 0x0,22802a18
> > > >   Memory Limit: none
> > > >
> > > > The problem is that when drm/msm does it's own iommu_attach_device(),
> > > > now the domain returned by iommu_get_domain_for_dev() is drm/msm's
> > > > domain, and it doesn't have domain->iova_cookie.
> > > >
> > > > We kind of avoided this problem prior to sdm845/dpu because the iommu
> > > > was attached to the mdp node in dt, which is a child of the toplevel
> > > > mdss node (which corresponds to the dev passed in dma_map_sg()).  But
> > > > with sdm845, now the iommu is attached at the mdss level so we hit the
> > > > iommu_dma_ops in dma_map_sg().
> > > >
> > > > But auto allocating/attaching a domain before the driver is probed was
> > > > already a blocking problem for enabling per-context pagetables for the
> > > > GPU.  This problem is also now solved with this patch.
> > > >
> > > > Fixes: 97890ba9289c dma-mapping: detect and configure IOMMU in 
> > > > of_dma_configure
> > > > Tested-by: Douglas Anderson 
> > > > Signed-off-by: Rob Clark 
> > > > ---
> > > > This is an alternative/replacement for [1].  What it lacks in elegance
> > > > it makes up for in practicality ;-)
> > > >
> > > > [1] https://patchwork.freedesktop.org/patch/264930/
> > > >
> > > >  drivers/of/device.c | 22 ++
> > > >  1 file changed, 22 insertions(+)
> > > >
> > > > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > > > index 5957cd4fa262..15ffee00fb22 100644
> > > > --- a/drivers/of/device.c
> > > > +++ b/drivers/of/device.c
> > > > @@ -72,6 +72,14 @@ int of_device_add(struct platform_device *ofdev)
> > > > return device_add(>dev);
> > > >  }
> > > >
> > > > +static const struct of_device_id iommu_blacklist[] = {
> > > > +   { .compatible = "qcom,mdp4" },
> > > > +

Re: [RFC PATCH V1 6/6] platform: mtk-isp: Add Mediatek DIP driver

2019-05-28 Thread Tomasz Figa
On Thu, May 23, 2019 at 10:46 PM Frederic Chen
 wrote:
>
> Dear Tomasz,
>
> Thank you for your comments.
>
>
> On Wed, 2019-05-22 at 19:25 +0900, Tomasz Figa wrote:
> > Hi Frederic,
> >
> > On Wed, May 22, 2019 at 03:14:15AM +0800, Frederic Chen wrote:
> > > Dear Tomasz,
> > >
> > > I appreciate your comment. It is very helpful for us.
> > >
> >
> > You're welcome. Thanks for replying to all the comments. I'll skip those
> > resolved in my reply to keep the message shorter.
> >
> > >
> > > On Thu, 2019-05-09 at 18:48 +0900, Tomasz Figa wrote:
> > > > Hi Frederic,
> > > >
> > > > On Wed, Apr 17, 2019 at 7:45 PM Frederic Chen 
> > > >  wrote:
[snip]
> > > > Also a general note - a work can be queued only once. This means that
> > > > current code races when two dip_works are attempted to be queued very
> > > > quickly one after another (or even at the same time from different 
> > > > threads).
> > > >
> > > > I can think of two potential options for fixing this:
> > > >
> > > > 1) Loop in the work function until there is nothing to queue to the 
> > > > hardware
> > > >anymore - but this needs tricky synchronization, because there is 
> > > > still
> > > >short time at the end of the work function when a new dip_work could 
> > > > be
> > > >added.
> > > >
> > > > 2) Change this to a kthread that just keeps running in a loop waiting 
> > > > for
> > > >some available dip_work to show up and then sending it to the 
> > > > firmware.
> > > >This should be simpler, as the kthread shouldn't have a chance to 
> > > > miss
> > > >any dip_work queued.
> > > >
> > > > I'm personally in favor of option 2, as it should simplify the
> > > > synchronization.
> > > >
> > >
> > > I would like to re-design this part with a kthread in the next patch.
> >
> > Actually I missed another option. We could have 1 work_struct for 1
> > request and then we could keep using a workqueue. Perhaps that could be
> > simpler than a kthread.
> >
> > Actually, similar approach could be used for the dip_runner_func.
> > Instead of having a kthread looping, we could just have another
> > workqueue and 1 dip_runner_work per 1 request. Then we wouldn't need to
> > do the waiting loop ourselves anymore.
> >
> > Does it make sense?
>
> Yes, it makes sense. Let me summarize the modifications to the flow.
>
> First, we will have two work_struct in mtk_dip_request.
>
> struct mtk_dip_request {
> struct media_request request;
> //...
> /* Prepare DIP part hardware configuration */
> struct mtk_dip_hw_submit_work submit_work;
> /* Replace dip_running thread jobs*/
> struct mtk_dip_hw_composing_work composing_work;
> /* Only for composing error handling */
> struct mtk_dip_hw_mdpcb_timeout_work timeout_work;
> };
>
> Second, the overall flow of handling each request is :
>
> 1. mtk_dip_hw_enqueue calls queue_work() to put submit_work into its
>workqueue
> 2. submit_work sends IMG_IPI_FRAME command to SCP to prepare DIP
>hardware configuration
> 3. dip_scp_handler receives the IMG_IPI_FRAME result from SCP
> 4. dip_scp_handler calls queue_work() to put composing_work (instead
>of original dip_running thread jobs) into its workqueue
> 5. composing_work calls dip_mdp_cmdq_send() to finish the mdp part tasks
> 6. dip_mdp_cb_func(), triggered by the MDP driver, calls vb2_buffer_done to
>return the buffer (no workqueue required here)
>

Sounds good to me, but then simply making the workqueues freezable
doesn't solve the suspend/resume problem, because the work functions
no longer wait for the firmware/hardware completion. That's also okay,
but in this case we need to add some code to the suspend path to wait
for any pending operations to complete.
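
As a rough illustration of that last point (a sketch only; it assumes
the driver keeps a counter of in-flight requests and a waitqueue that is
woken when it drops to zero - the flush_waitq field and the exact struct
layout are made up):

static int mtk_dip_suspend(struct device *dev)
{
        struct mtk_dip_dev *dip_dev = dev_get_drvdata(dev);
        int ret;

        /* Wait until every request handed to the SCP/MDP has completed. */
        ret = wait_event_timeout(dip_dev->flush_waitq,
                                 !atomic_read(&dip_dev->dip_hw.num_composing),
                                 msecs_to_jiffies(1000));
        if (!ret) {
                dev_err(dev, "timed out waiting for in-flight requests\n");
                return -EBUSY;
        }

        return 0;
}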

Best regards,
Tomasz


Re: [PATCH 1/1] iommu/arm-smmu: Add support to use Last level cache

2018-12-12 Thread Tomasz Figa
On Fri, Dec 7, 2018 at 6:25 PM Vivek Gautam  wrote:
>
> Hi Robin,
>
> On Tue, Dec 4, 2018 at 8:51 PM Robin Murphy  wrote:
> >
> > On 04/12/2018 11:01, Vivek Gautam wrote:
> > > Qualcomm SoCs have an additional level of cache called as
> > > System cache, aka. Last level cache (LLC). This cache sits right
> > > before the DDR, and is tightly coupled with the memory controller.
> > > The cache is available to all the clients present in the SoC system.
> > > The clients request their slices from this system cache, make it
> > > active, and can then start using it.
> > > For these clients with smmu, to start using the system cache for
> > > buffers and, related page tables [1], memory attributes need to be
> > > set accordingly.
> > > This change updates the MAIR and TCR configurations with correct
> > > attributes to use this system cache.
> > >
> > > To explain a little about memory attribute requirements here:
> > >
> > > Non-coherent I/O devices can't look-up into inner caches. However,
> > > coherent I/O devices can. But both can allocate in the system cache
> > > based on system policy and configured memory attributes in page
> > > tables.
> > > CPUs can access both inner and outer caches (including system cache,
> > > aka. Last level cache), and can allocate into system cache too
> > > based on memory attributes, and system policy.
> > >
> > > Further looking at memory types, we have following -
> > > a) Normal uncached :- MAIR 0x44, inner non-cacheable,
> > >outer non-cacheable;
> > > b) Normal cached :-   MAIR 0xff, inner read write-back non-transient,
> > >outer read write-back non-transient;
> > >attribute setting for coherenet I/O devices.
> > >
> > > and, for non-coherent i/o devices that can allocate in system cache
> > > another type gets added -
> > > c) Normal sys-cached/non-inner-cached :-
> > >MAIR 0xf4, inner non-cacheable,
> > >outer read write-back non-transient
> > >
> > > So, CPU will automatically use the system cache for memory marked as
> > > normal cached. The normal sys-cached is downgraded to normal non-cached
> > > memory for CPUs.
> > > Coherent I/O devices can use system cache by marking the memory as
> > > normal cached.
> > > Non-coherent I/O devices, to use system cache, should mark the memory as
> > > normal sys-cached in page tables.
> > >
> > > This change is a realisation of following changes
> > > from downstream msm-4.9:
> > > iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
> > > iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]
> > >
> > > [1] https://patchwork.kernel.org/patch/10302791/
> > > [2] 
> > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=bf762276796e79ca90014992f4d9da5593fa7d51
> > > [3] 
> > > https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9=d4c72c413ea27c43f60825193d4de9cb8ffd9602
> > >
> > > Signed-off-by: Vivek Gautam 
> > > ---
> > >
> > > Changes since v1:
> > >   - Addressed Tomasz's comments for basing the change on
> > > "NO_INNER_CACHE" concept for non-coherent I/O devices
> > > rather than capturing "SYS_CACHE". This is to indicate
> > > clearly the intent of non-coherent I/O devices that
> > > can't access inner caches.
> >
> > That seems backwards to me - there is already a fundamental assumption
> > that non-coherent devices can't access caches. What we're adding here is
> > a weird exception where they *can* use some level of cache despite still
> > being non-coherent overall.
> >
> > In other words, it's not a case of downgrading coherent devices'
> > accesses to bypass inner caches, it's upgrading non-coherent devices'
> > accesses to hit the outer cache. That's certainly the understanding I
> > got from talking with Pratik at Plumbers, and it does appear to fit with
> > your explanation above despite the final conclusion you draw being
> > different.
>
> Thanks for the thorough review of the change.
> Right, I guess it's rather an upgrade for non-coherent devices to use
> an outer cache than a downgrade for coherent devices.
>

Note that it was not my suggestion to use "NO_INNER_CACHE" for
enabling the system cache, sorry for not being clear. What I was
asking about in my comment was the previous patch disabling the inner
cache whenever the system cache is requested, which may not make sense
for coherent devices, since they could benefit from using both the
inner and the system cache.

So note that there are several cases here:
 - coherent, IC, system cache alloc,
 - coherent. non-IC, system cache alloc,
 - coherent, IC, system cache look-up,
 - noncoherent device, non-IC, system cache alloc,
 - noncoherent device, non-IC, system cache look-up.

Given the presence or lack of coherency for the device, which of the
2/3 options is the best depends on the use case, e.g. DMA/CPU access
pattern, sharing memory between multiple devices, etc.
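
Just to put the three memory types from the commit message next to each
other (a sketch only; this is not how io-pgtable-arm actually encodes
its attribute indices, and the policy function is only one possible
mapping of the description above):

#include <linux/types.h>

/* MAIR encodings as listed in the commit message. */
#define MAIR_ATTR_NORMAL_NC     0x44    /* inner NC, outer NC */
#define MAIR_ATTR_NORMAL_WB     0xff    /* inner WB, outer WB */
#define MAIR_ATTR_NORMAL_OWB    0xf4    /* inner NC, outer WB ("sys-cached") */

static u8 pick_mair_attr(bool coherent, bool use_sys_cache)
{
        if (coherent)
                return MAIR_ATTR_NORMAL_WB;     /* can allocate in all levels */
        if (use_sys_cache)
                return MAIR_ATTR_NORMAL_OWB;    /* outer allocation only */
        return MAIR_ATTR_NORMAL_NC;
}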

Best regards,

Re: [PATCH] of/device: add blacklist for iommu dma_ops

2018-12-02 Thread Tomasz Figa
\n",
> iommu ? " " : " not ");
>
> +   /*
> +* There is at least one case where the driver wants to directly
> +* manage the IOMMU, but if we end up with iommu dma_ops, that
> +* interferes with the drivers ability to use dma_map_sg() for
> +* cache operations.  Since we don't currently have a better
> +* solution, and this code runs before the driver is probed and
> +* has a chance to intervene, use a simple blacklist to avoid
> +* ending up with iommu dma_ops:
> +*/
> +   if (of_match_device(iommu_blacklist, dev)) {
> +   dev_dbg(dev, "skipping iommu hookup\n");
> +   iommu = NULL;
> +   }
> +
> arch_setup_dma_ops(dev, dma_addr, size, iommu, coherent);
>
> return 0;
> --
> 2.19.2
>

+Marek Szyprowski, who I believe had a similar problem with Exynos DRM before.

Reviewed-by: Tomasz Figa 

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3] drm/rockchip: update cursors asynchronously through atomic.

2018-11-27 Thread Tomasz Figa
Hi Gustavo,

On Tue, Nov 27, 2018 at 8:54 AM Gustavo Padovan
 wrote:
>
> Hi Tomasz,
>
> On 11/23/18 12:27 AM, Tomasz Figa wrote:
> > Hi Helen,
> >
> > On Fri, Nov 23, 2018 at 8:31 AM Helen Koike  
> > wrote:
> >> Hi Tomasz,
> >>
> >> On 11/20/18 4:48 AM, Tomasz Figa wrote:
> >>> Hi Helen,
> >>>
> >>> On Tue, Nov 20, 2018 at 4:08 AM Helen Koike  
> >>> wrote:
> >>>> From: Enric Balletbo i Serra 
> >>>>
> >>>> Add support to async updates of cursors by using the new atomic
> >>>> interface for that.
> >>>>
> >>>> Signed-off-by: Enric Balletbo i Serra 
> >>>> [updated for upstream]
> >>>> Signed-off-by: Helen Koike 
> >>>>
> >>>> ---
> >>>> Hello,
> >>>>
> >>>> This is the third version of the async-plane update suport to the
> >>>> Rockchip driver.
> >>>>
> >>> Thanks for a quick respin. Please see my comments inline. (I'll try to
> >>> be better at responding from now on...)
> >>>
> >>>> I tested running igt kms_cursor_legacy and kms_atomic tests using a 
> >>>> 96Boards Ficus.
> >>>>
> >>>> Note that before the patch, the following igt tests failed:
> >>>>
> >>>>  basic-flip-before-cursor-atomic
> >>>>  basic-flip-before-cursor-legacy
> >>>>  cursor-vs-flip-atomic
> >>>>  cursor-vs-flip-legacy
> >>>>  cursor-vs-flip-toggle
> >>>>  flip-vs-cursor-atomic
> >>>>  flip-vs-cursor-busy-crc-atomic
> >>>>  flip-vs-cursor-busy-crc-legacy
> >>>>  flip-vs-cursor-crc-atomic
> >>>>  flip-vs-cursor-crc-legacy
> >>>>  flip-vs-cursor-legacy
> >>>>
> >>>> Full log: https://people.collabora.com/~koike/results-4.20/html/
> >>>>
> >>>> Now with the patch applied the following were fixed:
> >>>>  basic-flip-before-cursor-atomic
> >>>>  basic-flip-before-cursor-legacy
> >>>>  flip-vs-cursor-atomic
> >>>>  flip-vs-cursor-legacy
> >>>>
> >>>> Full log: https://people.collabora.com/~koike/results-4.20-async/html/
> >>> Could you also test modetest, with the -C switch to test the legacy
> >>> cursor API? I remember it triggering crashes due to synchronization
> >>> issues easily.
> >> Sure. I tested with
> >> $ modetest -M rockchip -s 37:1920x1080 -C
> >>
> >> I also vary the mode but I couldn't trigger any crashes.
> >>
> >>>> Tomasz, as you mentioned in v2 about waiting for the hardware before updating
> >>>> the framebuffer, now I call the loop you pointed out in the async path,
> >>>> was that what you had in mind? Or do you think it would make sense to
> >>>> call vop_crtc_atomic_flush() instead of just exposing that loop?
> >>>>
> >>>> Thanks
> >>>> Helen
> >>>>
> >>>> Changes in v3:
> >>>> - Rebased on top of drm-misc
> >>>> - Fix missing include in rockchip_drm_vop.c
> >>>> - New function vop_crtc_atomic_commit_flush
> >>>>
> >>>> Changes in v2:
> >>>> - v2: https://patchwork.freedesktop.org/patch/254180/
> >>>> - Change the framebuffer as well to cover jumpy cursor when hovering
> >>>>text boxes or hyperlink. (Tomasz)
> >>>> - Use the PSR inhibit mechanism when accessing VOP hardware instead of
> >>>>PSR flushing (Tomasz)
> >>>>
> >>>> Changes in v1:
> >>>> - Rebased on top of drm-misc
> >>>> - In async_check call drm_atomic_helper_check_plane_state to check that
> >>>>the desired plane is valid and update various bits of derived state
> >>>>(clipped coordinates etc.)
> >>>> - In async_check allow to configure new scaling in the fast path.
> >>>> - In async_update force to flush all registered PSR encoders.
> >>>> - In async_update call atomic_update directly.
> >>>> - In async_update call vop_cfg_done needed to set the vop registers and 
> >>>> take effect.
> 

Re: [RESEND PATCH v17 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant

2018-11-25 Thread Tomasz Figa
On Sat, Nov 24, 2018 at 3:34 AM Will Deacon  wrote:
>
> On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote:
> > On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa  wrote:
> > > On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
> > >  wrote:
> > > > On Wed, Nov 21, 2018 at 11:09 PM Will Deacon  
> > > > wrote:
> > > > > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> > > > > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, 
> > > > > > ARM_SMMU_V1_64K, GENERIC_SMMU);
> > > > > >  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
> > > > > >  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> > > > > >
> > > > > > +static const char * const qcom_smmuv2_clks[] = {
> > > > > > + "bus", "iface",
> > > > > > +};
> > > > > > +
> > > > > > +static const struct arm_smmu_match_data qcom_smmuv2 = {
> > > > > > + .version = ARM_SMMU_V2,
> > > > > > + .model = QCOM_SMMUV2,
> > > > > > + .clks = qcom_smmuv2_clks,
> > > > > > + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> > > > > > +};
> > > > >
> > > > > These seems redundant if we go down the route proposed by Thor, where 
> > > > > we
> > > > > just pull all of the clocks out of the device-tree. In which case, why
> > > > > do we need this match_data at all?
> > > >
> > > > Which is better? Driver relying solely on the device tree to tell
> > > > which all clocks
> > > > are required to be enabled,
> > > > or, the driver deciding itself based on the platform's match data,
> > > > that it should
> > > > have X, Y, & Z clocks that should be supplied from the device tree.
> > >
> > > The former would simplify the driver, but would also make it
> > > impossible to spot mistakes in DT, which would ultimately surface
> > > as very hard-to-debug bugs (likely complete system lockups).
> >
> > Thanks.
> > Yeah, this is how I understand things presently. Relying on the device tree
> > puts things out of the driver's control.
>
> But it also has the undesirable effect of having to update the driver
> code whenever we want to add support for a new SMMU implementation. If
> we do this all in the DT, as Thor is trying to do, then older kernels
> will work well with new hardware.

Fair enough, if you're okay with that. Obviously one would still have
to change the DT bindings to list the exact set of clocks for the new
hardware variant, unless the convention changed recently.

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RESEND PATCH v17 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant

2018-11-23 Thread Tomasz Figa
Hi Vivek, Will,

On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
 wrote:
>
> Hi Will,
>
> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon  wrote:
> >
> > [+Thor]
> >
> > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> > > qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
> > > clock and power requirements.
> > > On msm8996, multiple cores, viz. mdss, video, etc. use this
> > > smmu. On sdm845, this smmu is used with gpu.
> > > Add bindings for the same.
> > >
> > > Signed-off-by: Vivek Gautam 
> > > Reviewed-by: Rob Herring 
> > > Reviewed-by: Tomasz Figa 
> > > Tested-by: Srinivas Kandagatla 
> > > Reviewed-by: Robin Murphy 
> > > ---
> > >  drivers/iommu/arm-smmu.c | 13 +
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index 2098c3141f5f..d315ca637097 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -120,6 +120,7 @@ enum arm_smmu_implementation {
> > >   GENERIC_SMMU,
> > >   ARM_MMU500,
> > >   CAVIUM_SMMUV2,
> > > + QCOM_SMMUV2,
> > >  };
> > >
> > >  struct arm_smmu_s2cr {
> > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, 
> > > GENERIC_SMMU);
> > >  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
> > >  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> > >
> > > +static const char * const qcom_smmuv2_clks[] = {
> > > + "bus", "iface",
> > > +};
> > > +
> > > +static const struct arm_smmu_match_data qcom_smmuv2 = {
> > > + .version = ARM_SMMU_V2,
> > > + .model = QCOM_SMMUV2,
> > > + .clks = qcom_smmuv2_clks,
> > > + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> > > +};
> >
> > These seems redundant if we go down the route proposed by Thor, where we
> > just pull all of the clocks out of the device-tree. In which case, why
> > do we need this match_data at all?
>
> Which is better? Driver relying solely on the device tree to tell
> which all clocks
> are required to be enabled,
> or, the driver deciding itself based on the platform's match data,
> that it should
> have X, Y, & Z clocks that should be supplied from the device tree.

The former would simplify the driver, but would also make it
impossible to spot mistakes in DT, which would ultimately surface as
very hard-to-debug bugs (likely complete system lockups).

For qcom_smmuv2, I believe we're eventually going to end up with
platform-specific quirks anyway, so specifying the clocks too wouldn't
hurt. Given that, I'd recommend sticking to the latter, i.e. what this
patch does.
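
To make the trade-off concrete, the match-data approach boils down to
something like the sketch below (simplified, not the exact arm-smmu
code): the driver states which clocks it expects, the DT only provides
the handles, and a clock missing from the DT fails loudly at probe time.

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/slab.h>

static const char * const qcom_smmuv2_clks[] = { "bus", "iface" };

static int smmu_init_clks(struct device *dev, struct clk_bulk_data **out)
{
        struct clk_bulk_data *clks;
        int i, ret;

        clks = devm_kcalloc(dev, ARRAY_SIZE(qcom_smmuv2_clks),
                            sizeof(*clks), GFP_KERNEL);
        if (!clks)
                return -ENOMEM;

        for (i = 0; i < ARRAY_SIZE(qcom_smmuv2_clks); i++)
                clks[i].id = qcom_smmuv2_clks[i];

        /* Errors out if the DT does not provide one of the expected clocks. */
        ret = devm_clk_bulk_get(dev, ARRAY_SIZE(qcom_smmuv2_clks), clks);
        if (ret)
                return ret;

        *out = clks;
        return 0;
}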

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


  1   2   3   >