Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-29 Thread Vivek Gautam
On Tue, Jan 29, 2019 at 8:34 PM Ard Biesheuvel
 wrote:
>
> (+ Bjorn)
>
> On Mon, 28 Jan 2019 at 12:27, Vivek Gautam  
> wrote:
> >
> > Hi Ard,
> >
> > On Thu, Jan 24, 2019 at 1:25 PM Ard Biesheuvel
> >  wrote:
> > >
> > > On Thu, 24 Jan 2019 at 07:58, Vivek Gautam  
> > > wrote:
> > > >
> > > > On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel
> > > >  wrote:
> > > > >
> > > > > On Mon, 21 Jan 2019 at 14:56, Robin Murphy  
> > > > > wrote:
> > > > > >
> > > > > > On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > > > > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  
> > > > > > > wrote:
> > > > > > >>
> > > > > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > > > > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam 
> > > > > > >>>  wrote:
> > > > > > 
> > > > > >  Hi,
> > > > > > 
> > > > > > 
> > > > > >  On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> > > > > >   wrote:
> > > > > > >
> > > > > > > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> > > > > > >  wrote:
> > > > > > >>
> > > > > > > >> Qualcomm SoCs have an additional level of cache called the
> > > > > > > >> system cache, a.k.a. the last level cache (LLC). This cache sits right
> > > > > > > >> before the DDR and is tightly coupled with the memory controller.
> > > > > > > >> Clients using this cache request their slices from the
> > > > > > > >> system cache, activate them, and can then start using them.
> > > > > > > >> For clients behind an SMMU to start using the system cache for
> > > > > > > >> buffers and the related page tables [1], memory attributes need to be
> > > > > > > >> set accordingly. This series adds the required support.
> > > > > > >>
> > > > > > >
> > > > > > > Does this actually improve performance on reads from a 
> > > > > > > device? The
> > > > > > > non-cache coherent DMA routines perform an unconditional 
> > > > > > > D-cache
> > > > > > > invalidate by VA to the PoC before reading from the buffers 
> > > > > > > filled by
> > > > > > > the device, and I would expect the PoC to be defined as lying 
> > > > > > > beyond
> > > > > > > the LLC to still guarantee the architected behavior.
> > > > > > 
> > > > > >  We have seen performance improvements when running Manhattan
> > > > > >  GFXBench benchmarks.
> > > > > > 
> > > > > > >>>
> > > > > > >>> Ah ok, that makes sense, since in that case, the data flow is 
> > > > > > >>> mostly
> > > > > > >>> to the device, not from the device.
> > > > > > >>>
> > > > > >  As for the PoC, from my knowledge on sdm845 the system cache, 
> > > > > >  aka
> > > > > >  Last level cache (LLC) lies beyond the point of coherency.
> > > > > >  Non-cache coherent buffers will not be cached to system cache 
> > > > > >  also, and
> > > > > >  no additional software cache maintenance ops are required for 
> > > > > >  system cache.
> > > > > >  Pratik can add more if I am missing something.
> > > > > > 
> > > > > >  To take care of the memory attributes from DMA APIs side, we 
> > > > > >  can add a
> > > > > >  DMA_ATTR definition to take care of any dma non-coherent APIs 
> > > > > >  calls.
> > > > > > 
> > > > > > >>>
> > > > > > >>> So does the device use the correct inner non-cacheable, outer
> > > > > > >>> writeback cacheable attributes if the SMMU is in pass-through?
> > > > > > >>>
> > > > > > >>> We have been looking into another use case where the fact that 
> > > > > > >>> the
> > > > > > >>> SMMU overrides memory attributes is causing issues (WC mappings 
> > > > > > >>> used
> > > > > > >>> by the radeon and amdgpu driver). So if the SMMU would honour 
> > > > > > >>> the
> > > > > > >>> existing attributes, would you still need the SMMU changes?
> > > > > > >>
> > > > > > >> Even if we could force a stage 2 mapping with the weakest 
> > > > > > >> pagetable
> > > > > > >> attributes (such that combining would work), there would still 
> > > > > > >> need to
> > > > > > >> be a way to set the TCR attributes appropriately if this 
> > > > > > >> behaviour is
> > > > > > >> wanted for the SMMU's own table walks as well.
> > > > > > >>
> > > > > > >
> > > > > > > Isn't that just a matter of implementing support for SMMUs that 
> > > > > > > lack
> > > > > > > the 'dma-coherent' attribute?
> > > > > >
> > > > > > Not quite - in general they need INC-ONC attributes in case there
> > > > > > actually is something in the architectural outer-cacheable domain.
> > > > >
> > > > > But is it a problem to use INC-ONC attributes for the SMMU PTW on this
> > > > > chip? AIUI, the reason for the SMMU changes is to avoid the
> > > > > performance hit of snooping, which is more expensive than cache
> > > > > maintenance of SMMU page tables. So are you saying the by-VA cache
> > > > > maintenance is not relayed to this system cache, resulting in 

Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-29 Thread Ard Biesheuvel
(+ Bjorn)

On Mon, 28 Jan 2019 at 12:27, Vivek Gautam  wrote:
>
> Hi Ard,
>
> On Thu, Jan 24, 2019 at 1:25 PM Ard Biesheuvel
>  wrote:
> >
> > On Thu, 24 Jan 2019 at 07:58, Vivek Gautam  
> > wrote:
> > >
> > > On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel
> > >  wrote:
> > > >
> > > > On Mon, 21 Jan 2019 at 14:56, Robin Murphy  wrote:
> > > > >
> > > > > On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > > > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  
> > > > > > wrote:
> > > > > >>
> > > > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > > > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam 
> > > > > >>>  wrote:
> > > > > 
> > > > >  Hi,
> > > > > 
> > > > > 
> > > > >  On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> > > > >   wrote:
> > > > > >
> > > > > > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> > > > > >  wrote:
> > > > > >>
> > > > > > >> Qualcomm SoCs have an additional level of cache called the
> > > > > > >> system cache, a.k.a. the last level cache (LLC). This cache sits right
> > > > > > >> before the DDR and is tightly coupled with the memory controller.
> > > > > > >> Clients using this cache request their slices from the
> > > > > > >> system cache, activate them, and can then start using them.
> > > > > > >> For clients behind an SMMU to start using the system cache for
> > > > > > >> buffers and the related page tables [1], memory attributes need to be
> > > > > > >> set accordingly. This series adds the required support.
> > > > > >>
> > > > > >
> > > > > > Does this actually improve performance on reads from a device? 
> > > > > > The
> > > > > > non-cache coherent DMA routines perform an unconditional D-cache
> > > > > > invalidate by VA to the PoC before reading from the buffers 
> > > > > > filled by
> > > > > > the device, and I would expect the PoC to be defined as lying 
> > > > > > beyond
> > > > > > the LLC to still guarantee the architected behavior.
> > > > > 
> > > > >  We have seen performance improvements when running Manhattan
> > > > >  GFXBench benchmarks.
> > > > > 
> > > > > >>>
> > > > > >>> Ah ok, that makes sense, since in that case, the data flow is 
> > > > > >>> mostly
> > > > > >>> to the device, not from the device.
> > > > > >>>
> > > > >  As for the PoC, from my knowledge on sdm845 the system cache, aka
> > > > >  Last level cache (LLC) lies beyond the point of coherency.
> > > > >  Non-cache coherent buffers will not be cached to system cache 
> > > > >  also, and
> > > > >  no additional software cache maintenance ops are required for 
> > > > >  system cache.
> > > > >  Pratik can add more if I am missing something.
> > > > > 
> > > > >  To take care of the memory attributes from DMA APIs side, we can 
> > > > >  add a
> > > > >  DMA_ATTR definition to take care of any dma non-coherent APIs 
> > > > >  calls.
> > > > > 
> > > > > >>>
> > > > > >>> So does the device use the correct inner non-cacheable, outer
> > > > > >>> writeback cacheable attributes if the SMMU is in pass-through?
> > > > > >>>
> > > > > >>> We have been looking into another use case where the fact that the
> > > > > >>> SMMU overrides memory attributes is causing issues (WC mappings 
> > > > > >>> used
> > > > > >>> by the radeon and amdgpu driver). So if the SMMU would honour the
> > > > > >>> existing attributes, would you still need the SMMU changes?
> > > > > >>
> > > > > >> Even if we could force a stage 2 mapping with the weakest pagetable
> > > > > >> attributes (such that combining would work), there would still 
> > > > > >> need to
> > > > > >> be a way to set the TCR attributes appropriately if this behaviour 
> > > > > >> is
> > > > > >> wanted for the SMMU's own table walks as well.
> > > > > >>
> > > > > >
> > > > > > Isn't that just a matter of implementing support for SMMUs that lack
> > > > > > the 'dma-coherent' attribute?
> > > > >
> > > > > Not quite - in general they need INC-ONC attributes in case there
> > > > > actually is something in the architectural outer-cacheable domain.
> > > >
> > > > But is it a problem to use INC-ONC attributes for the SMMU PTW on this
> > > > chip? AIUI, the reason for the SMMU changes is to avoid the
> > > > performance hit of snooping, which is more expensive than cache
> > > > maintenance of SMMU page tables. So are you saying the by-VA cache
> > > > maintenance is not relayed to this system cache, resulting in page
> > > > table updates to be invisible to masters using INC-ONC attributes?
> > >
> > > The reason for these SMMU changes is that non-coherent devices
> > > can't access the inner caches at all, but they do have a way to allocate
> > > and look up in the system cache.
> > >
> > > The CPU will by default make use of the system cache when the inner-cacheable
> > > and outer-cacheable memory 

Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-28 Thread Vivek Gautam
Hi Ard,

On Thu, Jan 24, 2019 at 1:25 PM Ard Biesheuvel
 wrote:
>
> On Thu, 24 Jan 2019 at 07:58, Vivek Gautam  
> wrote:
> >
> > On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel
> >  wrote:
> > >
> > > On Mon, 21 Jan 2019 at 14:56, Robin Murphy  wrote:
> > > >
> > > > On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  
> > > > > wrote:
> > > > >>
> > > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam 
> > > > >>>  wrote:
> > > > 
> > > >  Hi,
> > > > 
> > > > 
> > > >  On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> > > >   wrote:
> > > > >
> > > > > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> > > > >  wrote:
> > > > >>
> > > > > >> Qualcomm SoCs have an additional level of cache called the
> > > > > >> system cache, a.k.a. the last level cache (LLC). This cache sits right
> > > > > >> before the DDR and is tightly coupled with the memory controller.
> > > > > >> Clients using this cache request their slices from the
> > > > > >> system cache, activate them, and can then start using them.
> > > > > >> For clients behind an SMMU to start using the system cache for
> > > > > >> buffers and the related page tables [1], memory attributes need to be
> > > > > >> set accordingly. This series adds the required support.
> > > > >>
> > > > >
> > > > > Does this actually improve performance on reads from a device? The
> > > > > non-cache coherent DMA routines perform an unconditional D-cache
> > > > > invalidate by VA to the PoC before reading from the buffers 
> > > > > filled by
> > > > > the device, and I would expect the PoC to be defined as lying 
> > > > > beyond
> > > > > the LLC to still guarantee the architected behavior.
> > > > 
> > > >  We have seen performance improvements when running Manhattan
> > > >  GFXBench benchmarks.
> > > > 
> > > > >>>
> > > > >>> Ah ok, that makes sense, since in that case, the data flow is mostly
> > > > >>> to the device, not from the device.
> > > > >>>
> > > >  As for the PoC, from my knowledge on sdm845 the system cache, aka
> > > >  Last level cache (LLC) lies beyond the point of coherency.
> > > >  Non-cache coherent buffers will not be cached to system cache 
> > > >  also, and
> > > >  no additional software cache maintenance ops are required for 
> > > >  system cache.
> > > >  Pratik can add more if I am missing something.
> > > > 
> > > >  To take care of the memory attributes from DMA APIs side, we can 
> > > >  add a
> > > >  DMA_ATTR definition to take care of any dma non-coherent APIs 
> > > >  calls.
> > > > 
> > > > >>>
> > > > >>> So does the device use the correct inner non-cacheable, outer
> > > > >>> writeback cacheable attributes if the SMMU is in pass-through?
> > > > >>>
> > > > >>> We have been looking into another use case where the fact that the
> > > > >>> SMMU overrides memory attributes is causing issues (WC mappings used
> > > > >>> by the radeon and amdgpu driver). So if the SMMU would honour the
> > > > >>> existing attributes, would you still need the SMMU changes?
> > > > >>
> > > > >> Even if we could force a stage 2 mapping with the weakest pagetable
> > > > >> attributes (such that combining would work), there would still need 
> > > > >> to
> > > > >> be a way to set the TCR attributes appropriately if this behaviour is
> > > > >> wanted for the SMMU's own table walks as well.
> > > > >>
> > > > >
> > > > > Isn't that just a matter of implementing support for SMMUs that lack
> > > > > the 'dma-coherent' attribute?
> > > >
> > > > Not quite - in general they need INC-ONC attributes in case there
> > > > actually is something in the architectural outer-cacheable domain.
> > >
> > > But is it a problem to use INC-ONC attributes for the SMMU PTW on this
> > > chip? AIUI, the reason for the SMMU changes is to avoid the
> > > performance hit of snooping, which is more expensive than cache
> > > maintenance of SMMU page tables. So are you saying the by-VA cache
> > > maintenance is not relayed to this system cache, resulting in page
> > > table updates to be invisible to masters using INC-ONC attributes?
> >
> > The reason for these SMMU changes is that non-coherent devices
> > can't access the inner caches at all, but they do have a way to allocate
> > and look up in the system cache.
> >
> > The CPU will by default make use of the system cache when the inner-cacheable
> > and outer-cacheable memory attributes are set.
> >
> > So for SMMU page tables to be visible to the PTW:
> > -- For IO-coherent clients, CPU cache maintenance operations are not
> > required for buffers marked Normal Cached to achieve a coherent view of
> > memory. However, client-specific cache maintenance may still be
> > required for devices with local caches (for example, compute DSP local L1 

Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-23 Thread Ard Biesheuvel
On Thu, 24 Jan 2019 at 07:58, Vivek Gautam  wrote:
>
> On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel
>  wrote:
> >
> > On Mon, 21 Jan 2019 at 14:56, Robin Murphy  wrote:
> > >
> > > On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  wrote:
> > > >>
> > > >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam 
> > > >>>  wrote:
> > > 
> > >  Hi,
> > > 
> > > 
> > >  On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> > >   wrote:
> > > >
> > > > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> > > >  wrote:
> > > >>
> > > > >> Qualcomm SoCs have an additional level of cache called the
> > > > >> system cache, a.k.a. the last level cache (LLC). This cache sits right
> > > > >> before the DDR and is tightly coupled with the memory controller.
> > > > >> Clients using this cache request their slices from the
> > > > >> system cache, activate them, and can then start using them.
> > > > >> For clients behind an SMMU to start using the system cache for
> > > > >> buffers and the related page tables [1], memory attributes need to be
> > > > >> set accordingly. This series adds the required support.
> > > >>
> > > >
> > > > Does this actually improve performance on reads from a device? The
> > > > non-cache coherent DMA routines perform an unconditional D-cache
> > > > invalidate by VA to the PoC before reading from the buffers filled 
> > > > by
> > > > the device, and I would expect the PoC to be defined as lying beyond
> > > > the LLC to still guarantee the architected behavior.
> > > 
> > >  We have seen performance improvements when running Manhattan
> > >  GFXBench benchmarks.
> > > 
> > > >>>
> > > >>> Ah ok, that makes sense, since in that case, the data flow is mostly
> > > >>> to the device, not from the device.
> > > >>>
> > >  As for the PoC, from my knowledge on sdm845 the system cache, aka
> > >  Last level cache (LLC) lies beyond the point of coherency.
> > >  Non-cache coherent buffers will not be cached to system cache also, 
> > >  and
> > >  no additional software cache maintenance ops are required for system 
> > >  cache.
> > >  Pratik can add more if I am missing something.
> > > 
> > >  To take care of the memory attributes from DMA APIs side, we can add 
> > >  a
> > >  DMA_ATTR definition to take care of any dma non-coherent APIs calls.
> > > 
> > > >>>
> > > >>> So does the device use the correct inner non-cacheable, outer
> > > >>> writeback cacheable attributes if the SMMU is in pass-through?
> > > >>>
> > > >>> We have been looking into another use case where the fact that the
> > > >>> SMMU overrides memory attributes is causing issues (WC mappings used
> > > >>> by the radeon and amdgpu driver). So if the SMMU would honour the
> > > >>> existing attributes, would you still need the SMMU changes?
> > > >>
> > > >> Even if we could force a stage 2 mapping with the weakest pagetable
> > > >> attributes (such that combining would work), there would still need to
> > > >> be a way to set the TCR attributes appropriately if this behaviour is
> > > >> wanted for the SMMU's own table walks as well.
> > > >>
> > > >
> > > > Isn't that just a matter of implementing support for SMMUs that lack
> > > > the 'dma-coherent' attribute?
> > >
> > > Not quite - in general they need INC-ONC attributes in case there
> > > actually is something in the architectural outer-cacheable domain.
> >
> > But is it a problem to use INC-ONC attributes for the SMMU PTW on this
> > chip? AIUI, the reason for the SMMU changes is to avoid the
> > performance hit of snooping, which is more expensive than cache
> > maintenance of SMMU page tables. So are you saying the by-VA cache
> > maintenance is not relayed to this system cache, resulting in page
> > table updates to be invisible to masters using INC-ONC attributes?
>
> The reason for these SMMU changes is that non-coherent devices
> can't access the inner caches at all, but they do have a way to allocate
> and look up in the system cache.
>
> The CPU will by default make use of the system cache when the inner-cacheable
> and outer-cacheable memory attributes are set.
>
> So for SMMU page tables to be visible to the PTW:
> -- For IO-coherent clients, CPU cache maintenance operations are not
> required for buffers marked Normal Cached to achieve a coherent view of
> memory. However, client-specific cache maintenance may still be
> required for devices with local caches (for example, compute DSP local
> L1 or L2).

Why would devices need to access the SMMU page tables?

> -- For non-IO-coherent clients, CPU cache maintenance operations (cleans
> and/or invalidates) are required at buffer handoff points for buffers marked as
> Normal Cached in any CPU page table in order to observe the latest updates.
>

Indeed, and this is what your non-coherent 

Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-23 Thread Vivek Gautam
On Mon, Jan 21, 2019 at 7:55 PM Ard Biesheuvel
 wrote:
>
> On Mon, 21 Jan 2019 at 14:56, Robin Murphy  wrote:
> >
> > On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  wrote:
> > >>
> > >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam 
> > >>>  wrote:
> > 
> >  Hi,
> > 
> > 
> >  On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> >   wrote:
> > >
> > > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> > >  wrote:
> > >>
> > > >> Qualcomm SoCs have an additional level of cache called the
> > > >> system cache, a.k.a. the last level cache (LLC). This cache sits right
> > > >> before the DDR and is tightly coupled with the memory controller.
> > > >> Clients using this cache request their slices from the
> > > >> system cache, activate them, and can then start using them.
> > > >> For clients behind an SMMU to start using the system cache for
> > > >> buffers and the related page tables [1], memory attributes need to be
> > > >> set accordingly. This series adds the required support.
> > >>
> > >
> > > Does this actually improve performance on reads from a device? The
> > > non-cache coherent DMA routines perform an unconditional D-cache
> > > invalidate by VA to the PoC before reading from the buffers filled by
> > > the device, and I would expect the PoC to be defined as lying beyond
> > > the LLC to still guarantee the architected behavior.
> > 
> >  We have seen performance improvements when running Manhattan
> >  GFXBench benchmarks.
> > 
> > >>>
> > >>> Ah ok, that makes sense, since in that case, the data flow is mostly
> > >>> to the device, not from the device.
> > >>>
> >  As for the PoC, from my knowledge on sdm845 the system cache, aka
> >  Last level cache (LLC) lies beyond the point of coherency.
> >  Non-cache coherent buffers will not be cached to system cache also, and
> >  no additional software cache maintenance ops are required for system 
> >  cache.
> >  Pratik can add more if I am missing something.
> > 
> >  To take care of the memory attributes from DMA APIs side, we can add a
> >  DMA_ATTR definition to take care of any dma non-coherent APIs calls.
> > 
> > >>>
> > >>> So does the device use the correct inner non-cacheable, outer
> > >>> writeback cacheable attributes if the SMMU is in pass-through?
> > >>>
> > >>> We have been looking into another use case where the fact that the
> > >>> SMMU overrides memory attributes is causing issues (WC mappings used
> > >>> by the radeon and amdgpu driver). So if the SMMU would honour the
> > >>> existing attributes, would you still need the SMMU changes?
> > >>
> > >> Even if we could force a stage 2 mapping with the weakest pagetable
> > >> attributes (such that combining would work), there would still need to
> > >> be a way to set the TCR attributes appropriately if this behaviour is
> > >> wanted for the SMMU's own table walks as well.
> > >>
> > >
> > > Isn't that just a matter of implementing support for SMMUs that lack
> > > the 'dma-coherent' attribute?
> >
> > Not quite - in general they need INC-ONC attributes in case there
> > actually is something in the architectural outer-cacheable domain.
>
> But is it a problem to use INC-ONC attributes for the SMMU PTW on this
> chip? AIUI, the reason for the SMMU changes is to avoid the
> performance hit of snooping, which is more expensive than cache
> maintenance of SMMU page tables. So are you saying the by-VA cache
> maintenance is not relayed to this system cache, resulting in page
> table updates to be invisible to masters using INC-ONC attributes?

The reason for these SMMU changes is that non-coherent devices
can't access the inner caches at all, but they do have a way to allocate
and look up in the system cache.

The CPU will by default make use of the system cache when the inner-cacheable
and outer-cacheable memory attributes are set.

So for SMMU page tables to be visible to the PTW:
-- For IO-coherent clients, CPU cache maintenance operations are not
required for buffers marked Normal Cached to achieve a coherent view of
memory. However, client-specific cache maintenance may still be
required for devices with local caches (for example, compute DSP local
L1 or L2).
-- For non-IO-coherent clients, CPU cache maintenance operations (cleans
and/or invalidates) are required at buffer handoff points for buffers marked as
Normal Cached in any CPU page table in order to observe the latest updates.
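
The two cases above can be summarized as a small decision model. This is
only an illustrative sketch in Python, not kernel code: the names `Client`,
`needs_cpu_cache_maintenance` and `needs_client_cache_maintenance` are
hypothetical and do not correspond to any kernel API, and the example
clients are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    """Hypothetical description of a DMA master (not a kernel structure)."""
    name: str
    io_coherent: bool       # can the client see a coherent view via the CPU caches?
    has_local_cache: bool   # e.g. a compute DSP with its own L1 or L2


def needs_cpu_cache_maintenance(client: Client, normal_cached: bool) -> bool:
    """CPU cleans/invalidates at buffer handoff are needed only when a
    non-IO-coherent client shares a buffer the CPU maps as Normal Cached."""
    return normal_cached and not client.io_coherent


def needs_client_cache_maintenance(client: Client) -> bool:
    """Client-specific maintenance may still be required when the client
    has caches of its own."""
    return client.has_local_cache
```

With this model, a non-IO-coherent client sharing a Normal Cached buffer
requires CPU cache maintenance at handoff, while an IO-coherent client with
local caches requires only its own client-specific maintenance.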


Regards
Vivek

>
> > The
> > case of the outer cacheability being not that but a hint to control
> > non-CPU traffic through some not-quite-transparent cache behind the PoC
> > definitely stays wrapped up in qcom-specific magic ;)
> >
>
> I'm not surprised ...



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux 

Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Robin Murphy

On 21/01/2019 14:24, Ard Biesheuvel wrote:

On Mon, 21 Jan 2019 at 14:56, Robin Murphy  wrote:


On 21/01/2019 13:36, Ard Biesheuvel wrote:

On Mon, 21 Jan 2019 at 14:25, Robin Murphy  wrote:


On 21/01/2019 10:50, Ard Biesheuvel wrote:

On Mon, 21 Jan 2019 at 11:17, Vivek Gautam  wrote:


Hi,


On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
 wrote:


On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  wrote:


Qualcomm SoCs have an additional level of cache called the
system cache, a.k.a. the last level cache (LLC). This cache sits right
before the DDR and is tightly coupled with the memory controller.
Clients using this cache request their slices from the
system cache, activate them, and can then start using them.
For clients behind an SMMU to start using the system cache for
buffers and the related page tables [1], memory attributes need to be
set accordingly. This series adds the required support.



Does this actually improve performance on reads from a device? The
non-cache coherent DMA routines perform an unconditional D-cache
invalidate by VA to the PoC before reading from the buffers filled by
the device, and I would expect the PoC to be defined as lying beyond
the LLC to still guarantee the architected behavior.


We have seen performance improvements when running Manhattan
GFXBench benchmarks.



Ah ok, that makes sense, since in that case, the data flow is mostly
to the device, not from the device.


As for the PoC, from my knowledge on sdm845 the system cache, aka
Last level cache (LLC) lies beyond the point of coherency.
Non-cache coherent buffers will not be cached to system cache also, and
no additional software cache maintenance ops are required for system cache.
Pratik can add more if I am missing something.

To take care of the memory attributes from DMA APIs side, we can add a
DMA_ATTR definition to take care of any dma non-coherent APIs calls.



So does the device use the correct inner non-cacheable, outer
writeback cacheable attributes if the SMMU is in pass-through?

We have been looking into another use case where the fact that the
SMMU overrides memory attributes is causing issues (WC mappings used
by the radeon and amdgpu driver). So if the SMMU would honour the
existing attributes, would you still need the SMMU changes?


Even if we could force a stage 2 mapping with the weakest pagetable
attributes (such that combining would work), there would still need to
be a way to set the TCR attributes appropriately if this behaviour is
wanted for the SMMU's own table walks as well.



Isn't that just a matter of implementing support for SMMUs that lack
the 'dma-coherent' attribute?


Not quite - in general they need INC-ONC attributes in case there
actually is something in the architectural outer-cacheable domain.


But is it a problem to use INC-ONC attributes for the SMMU PTW on this
chip? AIUI, the reason for the SMMU changes is to avoid the
performance hit of snooping, which is more expensive than cache
maintenance of SMMU page tables. So are you saying the by-VA cache
maintenance is not relayed to this system cache, resulting in page
table updates to be invisible to masters using INC-ONC attributes?


I only have a relatively vague impression of how this Qcom interconnect 
actually behaves, but AIUI the outer attribute has no correctness impact 
(it's effectively mismatched between CPU and devices already), only some 
degree of latency improvement which is effectively the opposite of 
no-snoop, in allowing certain non-coherent device traffic to still 
allocate in the LLC. I'm assuming that if that latency matters for the 
device accesses themselves, it might also matter for the associated 
table walks depending on the TLB miss rate.
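
For reference, the "inner non-cacheable, outer write-back cacheable"
combination discussed in this thread can be written as an ARMv8 MAIR-style
attribute byte, where bits [7:4] carry the outer attribute and bits [3:0]
the inner one. The sketch below is illustrative Python rather than driver
code; the helper name `mair_attr` and the constant names are made up for
this example, and the choice of read/write-allocate for the outer cache is
an assumption.

```python
# Normal-memory cacheability nibbles as used in ARMv8 MAIR attribute fields.
NON_CACHEABLE = 0b0100        # Normal memory, Non-cacheable
WRITE_BACK_RW_ALLOC = 0b1111  # Write-Back non-transient, read/write allocate


def mair_attr(outer: int, inner: int) -> int:
    """Pack outer (bits [7:4]) and inner (bits [3:0]) attributes into one byte."""
    if not (0 <= outer <= 0xF and 0 <= inner <= 0xF):
        raise ValueError("attribute nibbles must fit in 4 bits")
    return (outer << 4) | inner


# Inner non-cacheable, outer write-back: lets traffic allocate in an outer cache.
INC_OWB = mair_attr(outer=WRITE_BACK_RW_ALLOC, inner=NON_CACHEABLE)
# Inner non-cacheable, outer non-cacheable (the "INC-ONC" case).
INC_ONC = mair_attr(outer=NON_CACHEABLE, inner=NON_CACHEABLE)

print(hex(INC_OWB), hex(INC_ONC))  # prints "0xf4 0x44"
```

The only difference between the two encodings is the outer nibble, which
matches the point that the outer attribute here acts as an allocation hint
rather than something that affects correctness.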


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 14:56, Robin Murphy  wrote:
>
> On 21/01/2019 13:36, Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 14:25, Robin Murphy  wrote:
> >>
> >> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> >>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam  
> >>> wrote:
> 
>  Hi,
> 
> 
>  On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
>   wrote:
> >
> > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam 
> >  wrote:
> >>
> >> Qualcomm SoCs have an additional level of cache called the
> >> system cache, a.k.a. the last level cache (LLC). This cache sits right
> >> before the DDR and is tightly coupled with the memory controller.
> >> Clients using this cache request their slices from the
> >> system cache, activate them, and can then start using them.
> >> For clients behind an SMMU to start using the system cache for
> >> buffers and the related page tables [1], memory attributes need to be
> >> set accordingly. This series adds the required support.
> >>
> >
> > Does this actually improve performance on reads from a device? The
> > non-cache coherent DMA routines perform an unconditional D-cache
> > invalidate by VA to the PoC before reading from the buffers filled by
> > the device, and I would expect the PoC to be defined as lying beyond
> > the LLC to still guarantee the architected behavior.
> 
>  We have seen performance improvements when running Manhattan
>  GFXBench benchmarks.
> 
> >>>
> >>> Ah ok, that makes sense, since in that case, the data flow is mostly
> >>> to the device, not from the device.
> >>>
>  As for the PoC, from my knowledge on sdm845 the system cache, aka
>  Last level cache (LLC) lies beyond the point of coherency.
>  Non-cache coherent buffers will not be cached to system cache also, and
>  no additional software cache maintenance ops are required for system 
>  cache.
>  Pratik can add more if I am missing something.
> 
>  To take care of the memory attributes from DMA APIs side, we can add a
>  DMA_ATTR definition to take care of any dma non-coherent APIs calls.
> 
> >>>
> >>> So does the device use the correct inner non-cacheable, outer
> >>> writeback cacheable attributes if the SMMU is in pass-through?
> >>>
> >>> We have been looking into another use case where the fact that the
> >>> SMMU overrides memory attributes is causing issues (WC mappings used
> >>> by the radeon and amdgpu driver). So if the SMMU would honour the
> >>> existing attributes, would you still need the SMMU changes?
> >>
> >> Even if we could force a stage 2 mapping with the weakest pagetable
> >> attributes (such that combining would work), there would still need to
> >> be a way to set the TCR attributes appropriately if this behaviour is
> >> wanted for the SMMU's own table walks as well.
> >>
> >
> > Isn't that just a matter of implementing support for SMMUs that lack
> > the 'dma-coherent' attribute?
>
> Not quite - in general they need INC-ONC attributes in case there
> actually is something in the architectural outer-cacheable domain.

But is it a problem to use INC-ONC attributes for the SMMU PTW on this
chip? AIUI, the reason for the SMMU changes is to avoid the
performance hit of snooping, which is more expensive than cache
maintenance of SMMU page tables. So are you saying the by-VA cache
maintenance is not relayed to this system cache, resulting in page
table updates to be invisible to masters using INC-ONC attributes?

> The
> case of the outer cacheability being not that but a hint to control
> non-CPU traffic through some not-quite-transparent cache behind the PoC
> definitely stays wrapped up in qcom-specific magic ;)
>

I'm not surprised ...
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Robin Murphy

On 21/01/2019 13:36, Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 14:25, Robin Murphy  wrote:
>>
>> On 21/01/2019 10:50, Ard Biesheuvel wrote:
>>> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam  wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
>>>>  wrote:
>>>>>
>>>>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  wrote:
>>>>>>
>>>>>> Qualcomm SoCs have an additional level of cache called the
>>>>>> System cache, aka Last level cache (LLC). This cache sits right
>>>>>> before the DDR, and is tightly coupled with the memory controller.
>>>>>> The clients using this cache request their slices from this
>>>>>> system cache, make it active, and can then start using it.
>>>>>> For clients with an SMMU to start using the system cache for
>>>>>> buffers and related page tables [1], memory attributes need to be
>>>>>> set accordingly. This series adds the required support.
>>>>>>
>>>>>
>>>>> Does this actually improve performance on reads from a device? The
>>>>> non-cache coherent DMA routines perform an unconditional D-cache
>>>>> invalidate by VA to the PoC before reading from the buffers filled by
>>>>> the device, and I would expect the PoC to be defined as lying beyond
>>>>> the LLC to still guarantee the architected behavior.
>>>>
>>>> We have seen performance improvements when running Manhattan
>>>> GFXBench benchmarks.
>>>>
>>>
>>> Ah ok, that makes sense, since in that case, the data flow is mostly
>>> to the device, not from the device.
>>>
>>>> As for the PoC, to my knowledge the system cache, aka Last level
>>>> cache (LLC), on sdm845 lies beyond the point of coherency.
>>>> Non-cache-coherent buffers will not be cached in the system cache
>>>> either, and no additional software cache maintenance ops are
>>>> required for the system cache.
>>>> Pratik can add more if I am missing something.
>>>>
>>>> To handle the memory attributes on the DMA API side, we can add a
>>>> DMA_ATTR definition to cover any non-coherent DMA API calls.
>>>>
>>>
>>> So does the device use the correct inner non-cacheable, outer
>>> writeback cacheable attributes if the SMMU is in pass-through?
>>>
>>> We have been looking into another use case where the fact that the
>>> SMMU overrides memory attributes is causing issues (WC mappings used
>>> by the radeon and amdgpu driver). So if the SMMU would honour the
>>> existing attributes, would you still need the SMMU changes?
>>
>> Even if we could force a stage 2 mapping with the weakest pagetable
>> attributes (such that combining would work), there would still need to
>> be a way to set the TCR attributes appropriately if this behaviour is
>> wanted for the SMMU's own table walks as well.
>>
>
> Isn't that just a matter of implementing support for SMMUs that lack
> the 'dma-coherent' attribute?

Not quite - in general they need INC-ONC attributes in case there
actually is something in the architectural outer-cacheable domain. The
case of the outer cacheability being not that but a hint to control
non-CPU traffic through some not-quite-transparent cache behind the PoC
definitely stays wrapped up in qcom-specific magic ;)


Robin.


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 14:25, Robin Murphy  wrote:
>
> On 21/01/2019 10:50, Ard Biesheuvel wrote:
> > On Mon, 21 Jan 2019 at 11:17, Vivek Gautam  
> > wrote:
> >>
> >> Hi,
> >>
> >>
> >> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
> >>  wrote:
> >>>
> >>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  
> >>> wrote:
> >>>>
> >>>> Qualcomm SoCs have an additional level of cache called the
> >>>> System cache, aka Last level cache (LLC). This cache sits right
> >>>> before the DDR, and is tightly coupled with the memory controller.
> >>>> The clients using this cache request their slices from this
> >>>> system cache, make it active, and can then start using it.
> >>>> For clients with an SMMU to start using the system cache for
> >>>> buffers and related page tables [1], memory attributes need to be
> >>>> set accordingly. This series adds the required support.
> >>>>
> >>>
> >>> Does this actually improve performance on reads from a device? The
> >>> non-cache coherent DMA routines perform an unconditional D-cache
> >>> invalidate by VA to the PoC before reading from the buffers filled by
> >>> the device, and I would expect the PoC to be defined as lying beyond
> >>> the LLC to still guarantee the architected behavior.
> >>
> >> We have seen performance improvements when running Manhattan
> >> GFXBench benchmarks.
> >>
> >
> > Ah ok, that makes sense, since in that case, the data flow is mostly
> > to the device, not from the device.
> >
> >> As for the PoC, to my knowledge the system cache, aka Last level
> >> cache (LLC), on sdm845 lies beyond the point of coherency.
> >> Non-cache-coherent buffers will not be cached in the system cache
> >> either, and no additional software cache maintenance ops are
> >> required for the system cache.
> >> Pratik can add more if I am missing something.
> >>
> >> To handle the memory attributes on the DMA API side, we can add a
> >> DMA_ATTR definition to cover any non-coherent DMA API calls.
> >>
> >
> > So does the device use the correct inner non-cacheable, outer
> > writeback cacheable attributes if the SMMU is in pass-through?
> >
> > We have been looking into another use case where the fact that the
> > SMMU overrides memory attributes is causing issues (WC mappings used
> > by the radeon and amdgpu driver). So if the SMMU would honour the
> > existing attributes, would you still need the SMMU changes?
>
> Even if we could force a stage 2 mapping with the weakest pagetable
> attributes (such that combining would work), there would still need to
> be a way to set the TCR attributes appropriately if this behaviour is
> wanted for the SMMU's own table walks as well.
>

Isn't that just a matter of implementing support for SMMUs that lack
the 'dma-coherent' attribute?


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Robin Murphy

On 21/01/2019 10:50, Ard Biesheuvel wrote:
> On Mon, 21 Jan 2019 at 11:17, Vivek Gautam  wrote:
>>
>> Hi,
>>
>>
>> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
>>  wrote:
>>>
>>> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  wrote:
>>>>
>>>> Qualcomm SoCs have an additional level of cache called the
>>>> System cache, aka Last level cache (LLC). This cache sits right
>>>> before the DDR, and is tightly coupled with the memory controller.
>>>> The clients using this cache request their slices from this
>>>> system cache, make it active, and can then start using it.
>>>> For clients with an SMMU to start using the system cache for
>>>> buffers and related page tables [1], memory attributes need to be
>>>> set accordingly. This series adds the required support.
>>>>
>>>
>>> Does this actually improve performance on reads from a device? The
>>> non-cache coherent DMA routines perform an unconditional D-cache
>>> invalidate by VA to the PoC before reading from the buffers filled by
>>> the device, and I would expect the PoC to be defined as lying beyond
>>> the LLC to still guarantee the architected behavior.
>>
>> We have seen performance improvements when running Manhattan
>> GFXBench benchmarks.
>>
>
> Ah ok, that makes sense, since in that case, the data flow is mostly
> to the device, not from the device.
>
>> As for the PoC, to my knowledge the system cache, aka Last level
>> cache (LLC), on sdm845 lies beyond the point of coherency.
>> Non-cache-coherent buffers will not be cached in the system cache
>> either, and no additional software cache maintenance ops are
>> required for the system cache.
>> Pratik can add more if I am missing something.
>>
>> To handle the memory attributes on the DMA API side, we can add a
>> DMA_ATTR definition to cover any non-coherent DMA API calls.
>>
>
> So does the device use the correct inner non-cacheable, outer
> writeback cacheable attributes if the SMMU is in pass-through?
>
> We have been looking into another use case where the fact that the
> SMMU overrides memory attributes is causing issues (WC mappings used
> by the radeon and amdgpu driver). So if the SMMU would honour the
> existing attributes, would you still need the SMMU changes?


Even if we could force a stage 2 mapping with the weakest pagetable 
attributes (such that combining would work), there would still need to 
be a way to set the TCR attributes appropriately if this behaviour is 
wanted for the SMMU's own table walks as well.
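
As a rough illustration of what "setting the TCR attributes" involves, the sketch below packs the walk cacheability and shareability fields using the ARM LPAE TCR layout (IRGN0 at bits [9:8], ORGN0 at bits [11:10], SH0 at bits [13:12]). The macro and function names are made up for this example and are not taken from the patches; the combination shown (inner non-cacheable, outer write-back write-allocate, outer shareable) is only one plausible choice.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions of the table-walk attributes in an LPAE-format TCR. */
#define DEMO_TCR_IRGN0_SHIFT	8	/* inner cacheability of walks */
#define DEMO_TCR_ORGN0_SHIFT	10	/* outer cacheability of walks */
#define DEMO_TCR_SH0_SHIFT	12	/* shareability of walks */

/* Cacheability encodings shared by IRGN0 and ORGN0. */
#define DEMO_TCR_RGN_NC		0u	/* non-cacheable */
#define DEMO_TCR_RGN_WBWA	1u	/* write-back, write-allocate */

/* Shareability encodings for SH0. */
#define DEMO_TCR_SH_NS		0u	/* non-shareable */
#define DEMO_TCR_SH_OS		2u	/* outer shareable */
#define DEMO_TCR_SH_IS		3u	/* inner shareable */

/* Build only the walk-attribute bits of a TCR value (hypothetical helper). */
static inline uint64_t demo_tcr_walk_attrs(unsigned int sh, unsigned int orgn,
					   unsigned int irgn)
{
	assert(sh <= 3 && orgn <= 3 && irgn <= 3);
	return ((uint64_t)sh << DEMO_TCR_SH0_SHIFT) |
	       ((uint64_t)orgn << DEMO_TCR_ORGN0_SHIFT) |
	       ((uint64_t)irgn << DEMO_TCR_IRGN0_SHIFT);
}
```

A caller would OR the returned bits into the rest of the TCR value (address sizes, granule and so on), which this sketch leaves out.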


Robin.


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 11:17, Vivek Gautam  wrote:
>
> Hi,
>
>
> On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
>  wrote:
> >
> > On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  
> > wrote:
> > >
> > > > Qualcomm SoCs have an additional level of cache called the
> > > > System cache, aka Last level cache (LLC). This cache sits right
> > > > before the DDR, and is tightly coupled with the memory controller.
> > > > The clients using this cache request their slices from this
> > > > system cache, make it active, and can then start using it.
> > > > For clients with an SMMU to start using the system cache for
> > > > buffers and related page tables [1], memory attributes need to be
> > > > set accordingly. This series adds the required support.
> > >
> >
> > Does this actually improve performance on reads from a device? The
> > non-cache coherent DMA routines perform an unconditional D-cache
> > invalidate by VA to the PoC before reading from the buffers filled by
> > the device, and I would expect the PoC to be defined as lying beyond
> > the LLC to still guarantee the architected behavior.
>
> We have seen performance improvements when running Manhattan
> GFXBench benchmarks.
>

Ah ok, that makes sense, since in that case, the data flow is mostly
to the device, not from the device.

> As for the PoC, to my knowledge the system cache, aka Last level
> cache (LLC), on sdm845 lies beyond the point of coherency.
> Non-cache-coherent buffers will not be cached in the system cache
> either, and no additional software cache maintenance ops are
> required for the system cache.
> Pratik can add more if I am missing something.
>
> To handle the memory attributes on the DMA API side, we can add a
> DMA_ATTR definition to cover any non-coherent DMA API calls.
>

So does the device use the correct inner non-cacheable, outer
writeback cacheable attributes if the SMMU is in pass-through?

We have been looking into another use case where the fact that the
SMMU overrides memory attributes is causing issues (WC mappings used
by the radeon and amdgpu driver). So if the SMMU would honour the
existing attributes, would you still need the SMMU changes?
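
For reference, the "inner non-cacheable, outer writeback cacheable" combination asked about above can be written down as an ARMv8 MAIR attribute byte: the outer attribute occupies bits [7:4] and the inner attribute bits [3:0]. The sketch below is illustrative only; the identifiers are invented for this example and are not the ones used by the series.

```c
#include <assert.h>
#include <stdint.h>

/* Cacheability nibbles of a MAIR Attr<n> byte for Normal memory. */
enum demo_mair_nibble {
	DEMO_MAIR_NC     = 0x4,	/* 0b0100: Normal, non-cacheable */
	DEMO_MAIR_WB_RWA = 0xf,	/* 0b1111: write-back, read+write allocate */
};

/* Outer attribute goes in bits [7:4], inner attribute in bits [3:0]. */
static inline uint8_t demo_mair_attr(enum demo_mair_nibble outer,
				     enum demo_mair_nibble inner)
{
	return (uint8_t)(((unsigned int)outer << 4) | (unsigned int)inner);
}

/* Store an attribute byte at index idx (0..7) of a 64-bit MAIR value. */
static inline uint64_t demo_mair_set(uint64_t mair, unsigned int idx,
				     uint8_t attr)
{
	assert(idx < 8);
	mair &= ~(0xffULL << (idx * 8));
	return mair | ((uint64_t)attr << (idx * 8));
}
```

With these helpers, demo_mair_attr(DEMO_MAIR_WB_RWA, DEMO_MAIR_NC) gives the byte 0xf4, i.e. outer write-back with inner non-cacheable.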


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-21 Thread Vivek Gautam
Hi,


On Mon, Jan 21, 2019 at 12:56 PM Ard Biesheuvel
 wrote:
>
> On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  
> wrote:
> >
> > Qualcomm SoCs have an additional level of cache called the
> > System cache, aka Last level cache (LLC). This cache sits right
> > before the DDR, and is tightly coupled with the memory controller.
> > The clients using this cache request their slices from this
> > system cache, make it active, and can then start using it.
> > For clients with an SMMU to start using the system cache for
> > buffers and related page tables [1], memory attributes need to be
> > set accordingly. This series adds the required support.
> >
>
> Does this actually improve performance on reads from a device? The
> non-cache coherent DMA routines perform an unconditional D-cache
> invalidate by VA to the PoC before reading from the buffers filled by
> the device, and I would expect the PoC to be defined as lying beyond
> the LLC to still guarantee the architected behavior.

We have seen performance improvements when running Manhattan
GFXBench benchmarks.

As for the PoC, to my knowledge the system cache, aka Last level
cache (LLC), on sdm845 lies beyond the point of coherency.
Non-cache-coherent buffers will not be cached in the system cache
either, and no additional software cache maintenance ops are
required for the system cache.
Pratik can add more if I am missing something.

To handle the memory attributes on the DMA API side, we can add a
DMA_ATTR definition to cover any non-coherent DMA API calls.

Regards
Vivek
>
>
>
> > This change is a realisation of following changes from downstream msm-4.9:
> > iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
> > iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]
> >
> > Changes since v2:
> >  - Split the patches into io-pgtable-arm driver and arm-smmu driver.
> >  - Converted smmu domain attributes to a bitmap, so multiple attributes
> >    can be managed easily.
> >  - With addition of non-coherent page table mapping support [4], this
> >    patch series now aligns with the understanding of upgrading the
> >    non-coherent devices to use some level of outer cache.
> >  - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE.
> >  - QCOM_SYS_CACHE can still be used at stage 2, so that doesn't depend on
> >    stage-1 mapping.
> >  - Added change to disable the attribute from arm_smmu_domain_set_attr()
> >    when needed.
> >  - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API
> >    level.
> >
> > Goes on top of the non-coherent page tables support patch series [4]
> >
> > [1] https://patchwork.kernel.org/patch/10302791/
> > [2] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51
> > [3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602
> > [4] https://lore.kernel.org/patchwork/cover/1032938/
> >
> > Vivek Gautam (3):
> >   iommu/arm-smmu: Move to bitmap for arm_smmu_domain atrributes
> >   iommu/io-pgtable-arm: Add support to use system cache
> >   iommu/arm-smmu: Add support to use system cache
> >
> >  drivers/iommu/arm-smmu.c   | 28 
> >  drivers/iommu/io-pgtable-arm.c | 15 +--
> >  drivers/iommu/io-pgtable.h |  4 
> >  include/linux/iommu.h  |  2 ++
> >  4 files changed, 43 insertions(+), 6 deletions(-)
> >
> > --
> > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> > of Code Aurora Forum, hosted by The Linux Foundation
> >
> >
> > ___
> > linux-arm-kernel mailing list
> > linux-arm-ker...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-20 Thread Ard Biesheuvel
On Mon, 21 Jan 2019 at 06:54, Vivek Gautam  wrote:
>
> Qualcomm SoCs have an additional level of cache called the
> System cache, aka Last level cache (LLC). This cache sits right
> before the DDR, and is tightly coupled with the memory controller.
> The clients using this cache request their slices from this
> system cache, make it active, and can then start using it.
> For clients with an SMMU to start using the system cache for
> buffers and related page tables [1], memory attributes need to be
> set accordingly. This series adds the required support.
>

Does this actually improve performance on reads from a device? The
non-cache coherent DMA routines perform an unconditional D-cache
invalidate by VA to the PoC before reading from the buffers filled by
the device, and I would expect the PoC to be defined as lying beyond
the LLC to still guarantee the architected behavior.



> This change is a realisation of following changes from downstream msm-4.9:
> iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
> iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]
>
> Changes since v2:
>  - Split the patches into io-pgtable-arm driver and arm-smmu driver.
>  - Converted smmu domain attributes to a bitmap, so multiple attributes
>    can be managed easily.
>  - With addition of non-coherent page table mapping support [4], this
>    patch series now aligns with the understanding of upgrading the
>    non-coherent devices to use some level of outer cache.
>  - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE.
>  - QCOM_SYS_CACHE can still be used at stage 2, so that doesn't depend on
>    stage-1 mapping.
>  - Added change to disable the attribute from arm_smmu_domain_set_attr()
>    when needed.
>  - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API
>    level.
>
> Goes on top of the non-coherent page tables support patch series [4]
>
> [1] https://patchwork.kernel.org/patch/10302791/
> [2] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51
> [3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602
> [4] https://lore.kernel.org/patchwork/cover/1032938/
>
> Vivek Gautam (3):
>   iommu/arm-smmu: Move to bitmap for arm_smmu_domain atrributes
>   iommu/io-pgtable-arm: Add support to use system cache
>   iommu/arm-smmu: Add support to use system cache
>
>  drivers/iommu/arm-smmu.c   | 28 
>  drivers/iommu/io-pgtable-arm.c | 15 +--
>  drivers/iommu/io-pgtable.h |  4 
>  include/linux/iommu.h  |  2 ++
>  4 files changed, 43 insertions(+), 6 deletions(-)
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


[PATCH 0/3] iommu/arm-smmu: Add support to use Last level cache

2019-01-20 Thread Vivek Gautam
Qualcomm SoCs have an additional level of cache called the
System cache, aka Last level cache (LLC). This cache sits right
before the DDR, and is tightly coupled with the memory controller.
The clients using this cache request their slices from this
system cache, make it active, and can then start using it.
For clients with an SMMU to start using the system cache for
buffers and related page tables [1], memory attributes need to be
set accordingly. This series adds the required support.

This change is a realisation of following changes from downstream msm-4.9:
iommu: io-pgtable-arm: Support DOMAIN_ATTRIBUTE_USE_UPSTREAM_HINT[2]
iommu: io-pgtable-arm: Implement IOMMU_USE_UPSTREAM_HINT[3]

Changes since v2:
 - Split the patches into io-pgtable-arm driver and arm-smmu driver.
 - Converted smmu domain attributes to a bitmap, so multiple attributes
   can be managed easily.
 - With addition of non-coherent page table mapping support [4], this
   patch series now aligns with the understanding of upgrading the
   non-coherent devices to use some level of outer cache.
 - Updated the macros and comments to reflect the use of QCOM_SYS_CACHE.
 - QCOM_SYS_CACHE can still be used at stage 2, so that doesn't depend on
   stage-1 mapping.
 - Added change to disable the attribute from arm_smmu_domain_set_attr()
   when needed.
 - Removed the page protection controls for QCOM_SYS_CACHE at the DMA API
   level.
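
The bitmap conversion mentioned in the change list can be pictured roughly as follows. This is a minimal user-space sketch, not the code from the patches: the enum values, the struct and the helper names are all hypothetical, and the real driver would more likely use the kernel's own bitmap helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute bit numbers; not the names used by the series. */
enum demo_domain_attr {
	DEMO_ATTR_NON_STRICT,
	DEMO_ATTR_SYS_CACHE,
	DEMO_ATTR_MAX,
};

struct demo_domain {
	unsigned long attributes;	/* one bit per attribute */
};

/* Enable or disable a single attribute without touching the others. */
static inline void demo_set_attr(struct demo_domain *d,
				 enum demo_domain_attr attr, bool enable)
{
	assert(attr < DEMO_ATTR_MAX);
	if (enable)
		d->attributes |= 1UL << attr;
	else
		d->attributes &= ~(1UL << attr);
}

/* Report whether an attribute is currently enabled. */
static inline bool demo_test_attr(const struct demo_domain *d,
				  enum demo_domain_attr attr)
{
	assert(attr < DEMO_ATTR_MAX);
	return (d->attributes & (1UL << attr)) != 0;
}
```

Keeping the attributes in a single word means a new attribute only needs a new enum entry, and the disable path is the same helper called with enable set to false.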

Goes on top of the non-coherent page tables support patch series [4]

[1] https://patchwork.kernel.org/patch/10302791/
[2] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=bf762276796e79ca90014992f4d9da5593fa7d51
[3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?h=msm-4.9&id=d4c72c413ea27c43f60825193d4de9cb8ffd9602
[4] https://lore.kernel.org/patchwork/cover/1032938/

Vivek Gautam (3):
  iommu/arm-smmu: Move to bitmap for arm_smmu_domain atrributes
  iommu/io-pgtable-arm: Add support to use system cache
  iommu/arm-smmu: Add support to use system cache

 drivers/iommu/arm-smmu.c   | 28 
 drivers/iommu/io-pgtable-arm.c | 15 +--
 drivers/iommu/io-pgtable.h |  4 
 include/linux/iommu.h  |  2 ++
 4 files changed, 43 insertions(+), 6 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
