Re: [PATCH v7 0/3] make dma_alloc_coherent NUMA-aware by per-NUMA CMA

2020-08-21 Thread Mike Kravetz
On 8/21/20 1:47 PM, Song Bao Hua (Barry Song) wrote:
> 
> 
>> -----Original Message-----
>> From: Song Bao Hua (Barry Song)
>> Sent: Saturday, August 22, 2020 7:27 AM
>> To: 'Mike Kravetz' ; h...@lst.de;
>> m.szyprow...@samsung.com; robin.mur...@arm.com; w...@kernel.org;
>> ganapatrao.kulka...@cavium.com; catalin.mari...@arm.com;
>> a...@linux-foundation.org
>> Cc: iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org;
>> linux-ker...@vger.kernel.org; Zengtao (B) ;
>> huangdaode ; Linuxarm 
>> Subject: RE: [PATCH v7 0/3] make dma_alloc_coherent NUMA-aware by
>> per-NUMA CMA
>>
>>
>>
>>> -----Original Message-----
>>> From: Mike Kravetz [mailto:mike.krav...@oracle.com]
>>> Sent: Saturday, August 22, 2020 5:53 AM
>>> To: Song Bao Hua (Barry Song) ; h...@lst.de;
>>> m.szyprow...@samsung.com; robin.mur...@arm.com; w...@kernel.org;
>>> ganapatrao.kulka...@cavium.com; catalin.mari...@arm.com;
>>> a...@linux-foundation.org
>>> Cc: iommu@lists.linux-foundation.org; linux-arm-ker...@lists.infradead.org;
>>> linux-ker...@vger.kernel.org; Zengtao (B) ;
>>> huangdaode ; Linuxarm
>>> Subject: Re: [PATCH v7 0/3] make dma_alloc_coherent NUMA-aware by
>>> per-NUMA CMA
>>>
>>> Hi Barry,
>>> Sorry for jumping in so late.
>>>
>>> On 8/21/20 4:33 AM, Barry Song wrote:
>>>>
>>>> With per-NUMA CMA, the SMMU will get memory from the local NUMA node to
>>>> store command queues and page tables. That means dma_unmap latency will
>>>> shrink significantly.
>>>
>>> Since per-node CMA areas for hugetlb were introduced, I have been thinking
>>> about the limited number of CMA areas.  In most configurations, I believe
>>> it is limited to 7.  And, IIRC it is not something that can be changed at
>>> runtime; you need to reconfig and rebuild to increase the number.  In
>>> contrast, some configs have NODES_SHIFT set to 10.  I wasn't too worried
>>> because of the limited hugetlb use case.  However, this series is adding
>>> another user of per-node CMA areas.
>>>
>>> With more users, should we try to sync up the number of CMA areas and the
>>> number of nodes?  Or perhaps I am worrying about nothing?
>>
>> Hi Mike,
>> The current limit is 8. If the server has 4 nodes and we enable both
>> per-NUMA CMA and hugetlb CMA, the last node will fail to get a CMA area,
>> as the default global CMA area takes 1 of the 8. So users need to raise
>> CONFIG_CMA_AREAS in menuconfig. If the server has 8 nodes and we enable
>> either per-NUMA CMA or hugetlb CMA, one node will fail to get a CMA area.
>>
>> We could set the default number of CMA areas to 8 + MAX_NUMNODES (if
>> hugetlb is enabled) + MAX_NUMNODES (if per-NUMA CMA is enabled) if we
>> don't expect users to change the config, but right now hugetlb CMA has no
>> Kconfig option to enable or disable it, unlike per-NUMA CMA, which has
>> DMA_PERNUMA_CMA.
> 
> I would prefer we make some changes like:
> 
> config CMA_AREAS
>   int "Maximum count of the CMA areas"
>   depends on CMA
> + default 19 if NUMA
>   default 7
>   help
>     CMA allows to create CMA areas for particular purpose, mainly,
>     used as device private area. This parameter sets the maximum
>     number of CMA area in the system.
> 
> -   If unsure, leave the default value "7".
> +   If unsure, leave the default value "7" or "19" if NUMA is used.
> 
> 1 + CONFIG_CMA_AREAS (20 areas with this default) should be enough for
> almost all servers on the market.
> 
> With 2 NUMA nodes and both hugetlb CMA and per-NUMA CMA enabled, we need
> 2*2 + 1 = 5.
> With 4 NUMA nodes and both enabled, we need 2*4 + 1 = 9 (the default
> ARM64 config).
> With 8 NUMA nodes and both enabled, we need 2*8 + 1 = 17.
> 
> The default value supports the most common cases. It is not meant to
> support servers with NODES_SHIFT=10; they can make their own config, just
> as users need to increase CMA_AREAS if they add many CMA areas in the
> device tree, even on a system without NUMA.
> 
> What do you think, Mike?

I'm OK with that.  I really did not want to sidetrack this series.  It is
just something I thought about when looking at the hugetlb code.  My 'to do'
list includes looking at a way to make the number of CMA areas dynamic.
-- 
Mike Kravetz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 0/3] make dma_alloc_coherent NUMA-aware by per-NUMA CMA

2020-08-21 Thread Mike Kravetz
Hi Barry,
Sorry for jumping in so late.

On 8/21/20 4:33 AM, Barry Song wrote:
> 
> With per-NUMA CMA, the SMMU will get memory from the local NUMA node to store
> command queues and page tables. That means dma_unmap latency will shrink
> significantly.

Since per-node CMA areas for hugetlb were introduced, I have been thinking
about the limited number of CMA areas.  In most configurations, I believe
it is limited to 7.  And, IIRC it is not something that can be changed at
runtime; you need to reconfig and rebuild to increase the number.  In
contrast, some configs have NODES_SHIFT set to 10.  I wasn't too worried
because of the limited hugetlb use case.  However, this series is adding
another user of per-node CMA areas.

With more users, should we try to sync up the number of CMA areas and the
number of nodes?  Or perhaps I am worrying about nothing?
-- 
Mike Kravetz