Re: [PATCH 1/1] arm64/hugetlb: clear PG_dcache_clean if the page is dirty when munmap

2016-07-19 Thread Leizhen (ThunderTown)


On 2016/7/12 23:35, Catalin Marinas wrote:
> On Mon, Jul 11, 2016 at 08:43:32PM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/7/9 0:13, Catalin Marinas wrote:
>>> On Fri, Jul 08, 2016 at 11:24:26PM +0800, Leizhen (ThunderTown) wrote:
>>>> On 2016/7/8 21:54, Catalin Marinas wrote:
>>>>> 8<
>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>>>>> index dbd12ea8ce68..c753fa804165 100644
>>>>> --- a/arch/arm64/mm/flush.c
>>>>> +++ b/arch/arm64/mm/flush.c
>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long addr)
>>>>>   if (!page_mapping(page))
>>>>>   return;
>>>>>  
>>>>> - if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>> + if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||
>>>>> + PageDirty(page))
>>>>>   sync_icache_aliases(page_address(page),
>>>>>   PAGE_SIZE << compound_order(page));
>>>>>   else if (icache_is_aivivt())
>>>>> 8<-
Hi, Catalin:
  Do you plan to send this patch? My colleagues told me that if our patches are 
quite
different, it should be Signed-off-by you.

  I searched all Linux source code, __sync_icache_dcache is only called by 
set_pte_at,
and some check conditions(especially pte_exec) will limit its impact.

if (pte_user(pte) && pte_exec(pte) && !pte_special(pte))
__sync_icache_dcache(pte, addr);

>>>>>
>>>>> BTW, can you make your tests (source) available somewhere?
>>>>
>>>> Both cases worked well with this patch.
>>>
>>> Now I'm even more confused ;). IIUC, after an msync() in user space we
>>> should flush the pages to disk via write_cache_pages(). This function
>>> calls clear_page_dirty_for_io() after which PageDirty() is no longer
>>> true. I can't tell how a subsequent mmap() can see the written pages as
>>> dirty.
>>
>> As my tracing, both cases invoked empty function.
>>
>> int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int 
>> datasync)
>>  ..
>>  return file->f_op->fsync(file, start, end, datasync);
>> }
>>
>> const struct file_operations hugetlbfs_file_operations = {
>>  .fsync  = noop_fsync,
>>
>> static const struct file_operations shmem_file_operations = {
>>  .mmap   = shmem_mmap,
>> #ifdef CONFIG_TMPFS
>>  .fsync  = noop_fsync,
> 
> I was referring to standard filesystem (e.g. ext4) writes where, IIUC,
> the PageDirty() status is cleared after I/O but it's not necessarily
> removed from the page cache.
> 



Re: [RFC] Arm64 boot fail with numa enable in BIOS

2016-09-19 Thread Leizhen (ThunderTown)


On 2016/9/19 22:45, Will Deacon wrote:
> On Mon, Sep 19, 2016 at 03:07:19PM +0100, Mark Rutland wrote:
>> [adding LAKML, arm64 maintainers]
> 
> I've also looped in Euler ThunderTown, since (a) he's at Huawei and is
> assumedly testing this stuff and (b) he has a fairly big NUMA patch
> series doing the rounds (some of which I've queued).
In my patch series, only one is used to resolve crashed problem, but it's 
related to device-tree.

> 
>> On Mon, Sep 19, 2016 at 09:05:26PM +0800, Yisheng Xie wrote:
>> In future, please make sure to Cc LAKML along with relevant parties when
>> sending arm64 patches/queries.
>>
>> For everyone newly Cc'd, the original message (with attachments) can be
>> found at:
>>
>> http://lkml.kernel.org/r/7618d76d-bfa8-d8aa-59aa-06f9d90c1...@huawei.com
>>
>>> When I enable NUMA in BIOS for arm64, it failed to boot on 
>>> v4.8-rc4-162-g071e31e.
>>
>> That commit ID doesn't seem to be in mainline (I can't find it in my
>> local tree). Which tree are you using? Do you have local patches
>> applied?
> 
> That commit is in mainline:
> 
>   http://git.kernel.org/linus/071e31e
> 
> It would be nice to know if the problem also exists on the arm64
> for-next/core branch.
> 
> Will
> 
> 
>> I take it that by "enable NUMA in BIOS", you mean exposing SRAT to the
>> OS?
>>
>>> For the crash log, it seems caused by error number of cpumask.
>>> Any ideas about it?
>>
>> Much earlier in your log, there was a (non-fatal) warning, as below. Do
>> you see this without NUMA/SRAT enabled in your FW? I don't see how the
>> SRAT should affect the secondaries we try to bring online.
>>
>> Given your MPIDRs have Aff2 bits set, I wonder if we've conflated a
>> logical ID with a physical ID somewhere, and it just so happens that the
>> NUMA code is more likely to poke something based on that.
>>
>> Can you modify the warning in cpumask.h to dump the bad CPU number? That
>> would make it fairly clear if that's the case.
>>
>> Thanks,
>> Mark.
>>
>>> [0.297337] Detected PIPT I-cache on CPU1
>>> [0.297347] GICv3: CPU1: found redistributor 10001 region 
>>> 1:0x4d14
>>> [0.297356] CPU1: Booted secondary processor [410fd082]
>>> [0.297375] [ cut here ]
>>> [0.320390] WARNING: CPU: 1 PID: 0 at ./include/linux/cpumask.h:121 
>>> gic_raise_softirq+0x128/0x17c
>>> [0.329356] Modules linked in:
>>> [0.332434] 
>>> [0.333932] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
>>> 4.8.0-rc4-00163-g803ea3a #21
>>> [0.341581] Hardware name: Hisilicon Hi1616 Evaluation Board (DT)
>>> [0.347735] task: 8013e9dd task.stack: 8013e9dcc000
>>> [0.353714] PC is at gic_raise_softirq+0x128/0x17c
>>> [0.358550] LR is at gic_raise_softirq+0xa0/0x17c
>>> [0.363298] pc : [] lr : [] pstate: 
>>> 21c5
>>> [0.370770] sp : 8013e9dcfde0
>>> [0.374112] x29: 8013e9dcfde0 x28:  
>>> [0.379476] x27: 0083207c x26: 08ca5d70 
>>> [0.384841] x25: 00010001 x24: 08d63ff3 
>>> [0.390205] x23:  x22: 08cb 
>>> [0.395569] x21: 0884edb0 x20: 0001 
>>> [0.400933] x19: 0001 x18:  
>>> [0.406298] x17:  x16: 03010066 
>>> [0.411661] x15: 08ca8000 x14: 0013 
>>> [0.417025] x13:  x12: 0013 
>>> [0.422389] x11: 0013 x10: 02e92aa7 
>>> [0.427754] x9 :  x8 : 8413eb6ca668 
>>> [0.433118] x7 : 8413eb6ca690 x6 :  
>>> [0.438482] x5 : fffe x4 :  
>>> [0.443845] x3 : 0040 x2 : 0041 
>>> [0.449209] x1 :  x0 : 0001 
>>> [0.454573] 
>>> [0.456069] ---[ end trace b58e70f3295a8cd7 ]---
>>> [0.460730] Call trace:
>>> [0.463193] Exception stack(0x8013e9dcfc10 to 0x8013e9dcfd40)
>>> [0.469699] fc00:   0001 
>>> 0001
>>> [0.477611] fc20: 8013e9dcfde0 0838c124 08d72228 
>>> 8013e9dcff70
>>> [0.485524] fc40: 08d72608 08ab02a4  
>>> 
>>> [0.493436] fc60:  3464313430303030  
>>> 
>>> [0.501348] fc80: 8013e9dcfc90 0836e678 8013e9dcfca0 
>>> 0836e910
>>> [0.509259] fca0: 8013e9dcfd30 0836ec10 0001 
>>> 
>>> [0.517171] fcc0: 0041 0040  
>>> fffe
>>> [0.525083] fce0:  8413eb6ca690 8413eb6ca668 
>>> 
>>> [0.532995] fd00: 02e92aa7 0013 0013 
>>> 
>>> [0.540907] fd20: 0013 08ca8000 03010066 
>>> 
>>> [0.54881

Re: [PATCH v4 00/14] fix some type infos and bugs for arm64/of numa

2016-06-19 Thread Leizhen (ThunderTown)


On 2016/6/14 22:22, Catalin Marinas wrote:
> On Wed, Jun 08, 2016 at 04:59:03PM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/6/7 21:58, Will Deacon wrote:
>>> On Tue, Jun 07, 2016 at 04:08:04PM +0800, Zhen Lei wrote:
>>>> v3 -> v4:
>>>> 1. Packed three patches of Kefeng Wang, patch6-8.
>>>> 2. Add 6 new patches(9-15) to enhance the numa on arm64.
>>>>
>>>> v2 -> v3:
>>>> 1. Adjust patch2 and patch5 according to Matthias Brugger's advice, to 
>>>> make the
>>>>patches looks more well. The final code have no change. 
>>>>
>>>> v1 -> v2:
>>>> 1. Base on https://lkml.org/lkml/2016/5/24/679
>>>
>>> If you want bug fixes to land in 4.7, you'll need to base them on a
>>> mainline kernel.
>>
>> I heared that David Daney's acpi numa patch series was accepted and
>> put into next branch(Linux 4.8).
>> Otherwise I will suggest him sending his patch6-7 to mainline first.
>> So that, only a very small conflict will be exist.
>>
>> I also tested that:
>> 1. git am David Daney's patch6-7, then git am all of my patches on a
>> branch, named branch A.
>> 2. git am David Daney's patch6-7 on another branch, named branch B.
>> 3. when I git merge B into branch A, it's still conflict. So I guess
>> git merge is based on source code, rather than patches.
>>
>> So at present, unless the maintainers are willing to resolve the
>> conflict, otherwise I update my patches will not work.
> 
> It usually depends on how complex the conflict is and whether your
> patches functionally depend on the other patches. I have no idea what
> the dependency is here since I haven't tried applying them to mainline.
> 
>> Fortunately, these patches are not particularly urgent. So I think I
>> can wait until Linux 4.8 start, then send these patches again. But I'm
>> not sure whether these patches can be merged into Linux 4.8, I really
>> hope.
> 
> If there are fixes to the arm64 ACPI NUMA patches that Rafael queued
> into linux-next, they should be sent to him and potentially being queued
> on top ahead of the 4.8 merging window or shortly after 4.8-rc1.
> Non-ACPI NUMA patches (as I can see, most of these patches are DT
> specific) could be merged independently.
> 
> So how many patches do you have in each category below:
> 
> 1. NUMA fixes against current mainline (4.7-rc3)
> 2. NUMA fixes against the arm64 ACPI NUMA patches queued by Rafael
My patches have not fixed any bugs for ACPI NUMA, but just based on it.
There are only three related patches:
[PATCH v7 06_15] arm64, numa  rework numa_add_memblk()
[PATCH v7 07_15] arm64, numa  Cleanup NUMA disabled messages.
[PATCH v7 14_15] arm64, acpi, numa  NUMA support based on SRAT and SLIT

arch/arm64/mm/numa.c  |  28 --
drivers/of/of_numa.c  |   4 +-

My patches 1-5, 8, 11 will confict with it.

> 3. New functionality or clean-up. Are these against mainline or ACPI
>NUMA patches?
Hi, Catalin
I'm sorry to reply this email too late. Because I have been thinking if
there are any other solutions.

I try to adjust the sequence of my patches as below:
1. New functionality//queued in your branch  (my patches 9-14, and 
6, 6 is clean-up)
2. 4.8-rc1  //apci numa series and my new functionality had 
been merged
3. bug fixes//other 4.8-rc versions  (my patches 1-5)
4. clean-up (pr_fmt)//queued in 4.9  (my patches 7-8)

And there only one confliction exist:
++<<<<<<< HEAD
 +static u8 numa_distance[MAX_NUMNODES][MAX_NUMNODES];  
//choose this
 +static int numa_off;
++===
+ static int numa_distance_cnt;
+ static u8 *numa_distance;
+ static bool numa_off; 
//choose this
++>>>>>>> acpi

> 



Re: [PATCH v4 00/14] fix some type infos and bugs for arm64/of numa

2016-06-21 Thread Leizhen (ThunderTown)


On 2016/6/20 14:39, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/6/14 22:22, Catalin Marinas wrote:
>> On Wed, Jun 08, 2016 at 04:59:03PM +0800, Leizhen (ThunderTown) wrote:
>>> On 2016/6/7 21:58, Will Deacon wrote:
>>>> On Tue, Jun 07, 2016 at 04:08:04PM +0800, Zhen Lei wrote:
>>>>> v3 -> v4:
>>>>> 1. Packed three patches of Kefeng Wang, patch6-8.
>>>>> 2. Add 6 new patches(9-15) to enhance the numa on arm64.
>>>>>
>>>>> v2 -> v3:
>>>>> 1. Adjust patch2 and patch5 according to Matthias Brugger's advice, to 
>>>>> make the
>>>>>patches looks more well. The final code have no change. 
>>>>>
>>>>> v1 -> v2:
>>>>> 1. Base on https://lkml.org/lkml/2016/5/24/679
>>>>
>>>> If you want bug fixes to land in 4.7, you'll need to base them on a
>>>> mainline kernel.
>>>
>>> I heared that David Daney's acpi numa patch series was accepted and
>>> put into next branch(Linux 4.8).
>>> Otherwise I will suggest him sending his patch6-7 to mainline first.
>>> So that, only a very small conflict will be exist.
>>>
>>> I also tested that:
>>> 1. git am David Daney's patch6-7, then git am all of my patches on a
>>> branch, named branch A.
>>> 2. git am David Daney's patch6-7 on another branch, named branch B.
>>> 3. when I git merge B into branch A, it's still conflict. So I guess
>>> git merge is based on source code, rather than patches.
>>>
>>> So at present, unless the maintainers are willing to resolve the
>>> conflict, otherwise I update my patches will not work.
>>
>> It usually depends on how complex the conflict is and whether your
>> patches functionally depend on the other patches. I have no idea what
>> the dependency is here since I haven't tried applying them to mainline.
>>
>>> Fortunately, these patches are not particularly urgent. So I think I
>>> can wait until Linux 4.8 start, then send these patches again. But I'm
>>> not sure whether these patches can be merged into Linux 4.8, I really
>>> hope.
>>
>> If there are fixes to the arm64 ACPI NUMA patches that Rafael queued
>> into linux-next, they should be sent to him and potentially being queued
>> on top ahead of the 4.8 merging window or shortly after 4.8-rc1.
>> Non-ACPI NUMA patches (as I can see, most of these patches are DT
>> specific) could be merged independently.
>>
>> So how many patches do you have in each category below:
>>
>> 1. NUMA fixes against current mainline (4.7-rc3)
>> 2. NUMA fixes against the arm64 ACPI NUMA patches queued by Rafael
> My patches have not fixed any bugs for ACPI NUMA, but just based on it.
> There are only three related patches:
> [PATCH v7 06_15] arm64, numa  rework numa_add_memblk()
> [PATCH v7 07_15] arm64, numa  Cleanup NUMA disabled messages.
> [PATCH v7 14_15] arm64, acpi, numa  NUMA support based on SRAT and SLIT
> 
> arch/arm64/mm/numa.c  |  28 --
> drivers/of/of_numa.c  |   4 +-
> 
> My patches 1-5, 8, 11 will confict with it.
> 
>> 3. New functionality or clean-up. Are these against mainline or ACPI
>>NUMA patches?
> Hi, Catalin
> I'm sorry to reply this email too late. Because I have been thinking if
> there are any other solutions.
> 
> I try to adjust the sequence of my patches as below:
> 1. New functionality  //queued in your branch  (my patches 9-14, and 
> 6, 6 is clean-up)
> 2. 4.8-rc1//apci numa series and my new functionality had 
> been merged
> 3. bug fixes  //other 4.8-rc versions  (my patches 1-5)
> 4. clean-up (pr_fmt)  //queued in 4.9  (my patches 7-8)

Hi, Catalin
  What about your opinion? Are you agree?

> 
> And there only one confliction exist:
> ++<<<<<<< HEAD
>  +static u8 numa_distance[MAX_NUMNODES][MAX_NUMNODES];
> //choose this
>  +static int numa_off;
> ++===
> + static int numa_distance_cnt;
> + static u8 *numa_distance;
> + static bool numa_off;   
> //choose this
> ++>>>>>>> acpi
> 
>>



Re: [PATCH v8 00/16] fix some type infos and bugs for arm64/of numa

2016-09-08 Thread Leizhen (ThunderTown)


On 2016/9/8 19:01, Will Deacon wrote:
> On Thu, Sep 01, 2016 at 02:54:51PM +0800, Zhen Lei wrote:
>> v7 -> v8:
>> Updated patches according to Will Deacon's review comments, thanks.
>>
>> The changed patches is: 3, 5, 8, 9, 10, 11, 12, 13, 15
>> Patch 3 requires an ack from Rob Herring.
>> Patch 10 requires an ack from linux-mm.
>>
>> Hi, Will:
>> Something should still be clarified:
>> Patch 5, I modified it according to my last reply. BTW, The last sentence
>>  "srat_disabled() ? -EINVAL : 0" of arm64_acpi_numa_init should be 
>> moved
>>  into acpi_numa_init, I think.
>>  
>> Patch 9, I still leave the code in arch/arm64.
>>  1) the implementation of setup_per_cpu_areas on all platforms are 
>> different.
>>  2) Although my implementation referred to PowerPC, but still 
>> something different.
>>
>> Patch 15, I modified the description again. Can you take a look at it? If 
>> this patch is
>>dropped, the patch 14 should also be dropped.
>>
>> Patch 16, How many times the function node_distance to be called rely on the 
>> APP(need many tasks
>>   to be scheduled), I have not prepared yet, so I give up this patch 
>> as your advise. 
> 
> Ok, I'm trying to pick the pieces out of this patch series and it's not
> especially easy. As far as I can tell:
> 
>   Patch 3 needs an ack from the device-tree folks
Rob just acked.

> 
>   Patch 10 needs an ack from the memblock folks
I'll immediately send a email to remind them.

> 
>   Patch 11 depends on patch 10
> 
>   Patches 14,15,16 can wait for the time being (I still don't see their
>   value).
OK, that's no problem. So I put them in the end beforehand.

> 
> So, I could pick up patches 1-2, 4-9 and 12-13 but it's not clear whether
Now you can also add patch 3.

> that makes any sense. The whole series seems to be a mix of trivial printk
The most valueable patches are: patch 2, 9, 11. The other is just because of a 
programmer wants the code to be nice.

> cleanups, a bunch of core OF stuff, some new features and then some
> questionable changes at the end.
> 
> Please throw me a clue,
> 
> Will
> 
> .
> 



Re: [PATCH v8 10/16] mm/memblock: add a new function memblock_alloc_near_nid

2016-09-08 Thread Leizhen (ThunderTown)
Hi, linux-mm folks:
Can somebody help me to review this patch?
I ran scripts/get_maintainer.pl -f mm/memblock.c and 
scripts/get_maintainer.pl -f mm/, but
the results showed me that there is no maintainer.
To understand this patch should also read patch 11.

On 2016/9/1 14:55, Zhen Lei wrote:
> If HAVE_MEMORYLESS_NODES is selected, and some memoryless numa nodes are
> actually exist. The percpu variable areas and numa control blocks of that
> memoryless numa nodes must be allocated from the nearest available node
> to improve performance.
> 
> Signed-off-by: Zhen Lei 
> ---
>  include/linux/memblock.h |  1 +
>  mm/memblock.c| 28 
>  2 files changed, 29 insertions(+)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 2925da2..8e866e0 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -290,6 +290,7 @@ static inline int memblock_get_region_node(const struct 
> memblock_region *r)
> 
>  phys_addr_t memblock_alloc_nid(phys_addr_t size, phys_addr_t align, int nid);
>  phys_addr_t memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int 
> nid);
> +phys_addr_t memblock_alloc_near_nid(phys_addr_t size, phys_addr_t align, int 
> nid);
> 
>  phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align);
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 483197e..6578fff 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1189,6 +1189,34 @@ again:
>   return ret;
>  }
> 
> +phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size, phys_addr_t 
> align, int nid)
> +{
> + int i, best_nid, distance;
> + u64 pa;
> + DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
> +
> + bitmap_zero(nodes_map, MAX_NUMNODES);
> +
> +find_nearest_node:
> + best_nid = NUMA_NO_NODE;
> + distance = INT_MAX;
> +
> + for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
> + if (node_distance(nid, i) < distance) {
> + best_nid = i;
> + distance = node_distance(nid, i);
> + }
> +
> + pa = memblock_alloc_nid(size, align, best_nid);
> + if (!pa) {
> + BUG_ON(best_nid == NUMA_NO_NODE);
> + bitmap_set(nodes_map, best_nid, 1);
> + goto find_nearest_node;
> + }
> +
> + return pa;
> +}
> +
>  phys_addr_t __init __memblock_alloc_base(phys_addr_t size, phys_addr_t 
> align, phys_addr_t max_addr)
>  {
>   return memblock_alloc_base_nid(size, align, max_addr, NUMA_NO_NODE,
> --
> 2.5.0
> 
> 
> 
> .
> 



Re: [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed

2016-08-30 Thread Leizhen (ThunderTown)


On 2016/8/31 1:51, Will Deacon wrote:
> On Sat, Aug 27, 2016 at 04:54:56PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/8/26 20:47, Will Deacon wrote:
>>> On Wed, Aug 24, 2016 at 03:44:44PM +0800, Zhen Lei wrote:
>>>> numa_init(of_numa_init) may returned error because of numa configuration
>>>> error. So "No NUMA configuration found" is inaccurate. In fact, specific
>>>> configuration error information should be immediately printed by the
>>>> testing branch.
>>>>
>>>> Signed-off-by: Zhen Lei 
>>>> ---
>>>>  arch/arm64/mm/numa.c | 6 +++---
>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>> index 5bb15ea..d97c6e2 100644
>>>> --- a/arch/arm64/mm/numa.c
>>>> +++ b/arch/arm64/mm/numa.c
>>>> @@ -335,8 +335,10 @@ static int __init numa_init(int (*init_func)(void))
>>>>if (ret < 0)
>>>>return ret;
>>>>
>>>> -  if (nodes_empty(numa_nodes_parsed))
>>>> +  if (nodes_empty(numa_nodes_parsed)) {
>>>> +  pr_info("No NUMA configuration found\n");
>>>>return -EINVAL;
>>>
>>> Hmm, but dummy_numa_init calls node_set(nid, numa_nodes_parsed) for a
>>> completely artificial setup, created by adding all memblocks to node 0,
>>> so this new message will be suppressed even though things really did go
>>> wrong.
>> It will be printed by the former: numa_init(of_numa_init)
> 
> Does that print an error for every possible failure case? What about the
> acpi path?
I think acpi path should print error by itself. The reason maybe:
1. In numa_init and its sub function, all error paths printed error 
immediately, except arm64_acpi_numa_init.
2. Suppose numa_init returns error, we do not print the returned error code, so 
the user don't known what problem cause acpi numa failed.


> 
>>> In that case, don't we want to print *something* (like we do today in
>>> dummy_numa_init) but maybe not "No NUMA configuration found"? What
>>> exactly do you find inaccurate about the current message?
>> For example:
>> [0.00] NUMA: No distance-matrix property in distance-map
>> [0.00] No NUMA configuration found
>>
>> So if of_numa_init or arm64_acpi_numa_init returned error, because of
>> some numa configuration error had been found, it's no good to print "No
>> NUMA ...".
> 
> Sure, I'm all for changing the message. I just think removing it is
> probably unhelpful. Something like:
> 
> "NUMA: Failed to initialise from firmware"
I think adding this into arm64_acpi_numa_init will be better, maybe we should 
print 'ret' further:

int __init arm64_acpi_numa_init(void)
{
int ret;

ret = acpi_numa_init();
if (ret) {
+   pr_info("Failed to initialise from firmware\n");
return ret;
}

> 
> might do the trick?
> 
> Will
> 
> .
> 



Re: [PATCH v7 14/14] Documentation: remove the constraint on the distances of node pairs

2016-08-30 Thread Leizhen (ThunderTown)


On 2016/8/31 1:55, Will Deacon wrote:
> On Sat, Aug 27, 2016 at 06:44:39PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/8/26 23:35, Will Deacon wrote:
>>> On Wed, Aug 24, 2016 at 03:44:53PM +0800, Zhen Lei wrote:
>>>> Update documentation. This limit is unneccessary.
>>>>
>>>> Signed-off-by: Zhen Lei 
>>>> Acked-by: Rob Herring 
>>>> ---
>>>>  Documentation/devicetree/bindings/numa.txt | 1 -
>>>>  1 file changed, 1 deletion(-)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/numa.txt 
>>>> b/Documentation/devicetree/bindings/numa.txt
>>>> index 21b3505..c0ea4a7 100644
>>>> --- a/Documentation/devicetree/bindings/numa.txt
>>>> +++ b/Documentation/devicetree/bindings/numa.txt
>>>> @@ -48,7 +48,6 @@ distance (memory latency) between all numa nodes.
>>>>
>>>>Note:
>>>>1. Each entry represents distance from first node to second node.
>>>> -  The distances are equal in either direction.
>>>
>>> Hmm, so what happens now if firmware provides a description where both
>>> distances (in either direction) are supplied, but are different?
>> I have not known any hardware that the distances of two direction are
>> different yet
> 
> Then let's not add support for this just yet. When we have systems that
> actually need it, we'll be in a much better position to assess the
> suitability of any patches. At the moment, the whole thing is pretty
> questionable and it adds needless complication to the code.
How about I changed to:
To simplify the configuration, the distance of the opposite direction is the 
same to it by default.

> 
> Will
> 
> .
> 



Re: [PATCH 1/1] arm64/hugetlb: clear PG_dcache_clean if the page is dirty when munmap

2016-08-23 Thread Leizhen (ThunderTown)


On 2016/8/24 1:28, Catalin Marinas wrote:
> On Mon, Aug 22, 2016 at 12:19:04PM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/7/20 17:19, Catalin Marinas wrote:
>>> On Wed, Jul 20, 2016 at 10:46:27AM +0800, Leizhen (ThunderTown) wrote:
>>>>>>>> On 2016/7/8 21:54, Catalin Marinas wrote:
>>>>>>>>> 8<
>>>>>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>>>>>>>>> index dbd12ea8ce68..c753fa804165 100644
>>>>>>>>> --- a/arch/arm64/mm/flush.c
>>>>>>>>> +++ b/arch/arm64/mm/flush.c
>>>>>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long 
>>>>>>>>> addr)
>>>>>>>>>   if (!page_mapping(page))
>>>>>>>>>   return;
>>>>>>>>>  
>>>>>>>>> - if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>>>>>> + if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||
>>>>>>>>> + PageDirty(page))
>>>>>>>>>   sync_icache_aliases(page_address(page),
>>>>>>>>>   PAGE_SIZE << compound_order(page));
>>>>>>>>>   else if (icache_is_aivivt())
>>>>>>>>> 8<-
>>>>
>>>> Do you plan to send this patch? My colleagues told me that if our
>>>> patches are quite different, it should be Signed-off-by you.
>>>
>>> The reason I'm not sending it is that I don't fully understand how it
>>> solves the problem for a shared file mmap(), not just hugetlbfs. As I
>>> said in an earlier email: after an msync() in user space we
>>> should flush the pages to disk via write_cache_pages(). This function
>> Hi Catalin:
>>I'm so sorry for my fault. The previous small pages test result I 
>> actually ran on ramfs.
>> Today, I ran the case on harddisk fs, it worked well without this patch.
>>
>> Summarized as follows:
>> small pages on ramfs: need this patch
>> small pages on harddisk fs: no need this patch
>> hugetlbfs: need this patch
> 
> I would add:
> 
> small pages over nfs: fails with or without this patch
> 
> (tested on Juno, Cortex-A57; seems to be fixed if I remove the
> PG_dcache_clean test altogether but, well, we end up over-flushing)
> 
> I assume that when using a hard drive, it goes through the block I/O
> layer and we may have a flush_dcache_page() called when the kernel is
> about to read a page that has been mapped in user space. This would
> clear the PG_dcache_clean bit and subsequent __sync_icache_dcache()
> would perform cache maintenance.
> 
> Could you try on your system the test case without the msync() call? I'm
> not sure whether munmap() would trigger an immediate write-back, in
> which case we may see the issue even with the filesystem on a hard
> drive.
OK, no problem. I will do it today or tomorrow.

> 



Re: [PATCH 1/1] arm64/hugetlb: clear PG_dcache_clean if the page is dirty when munmap

2016-08-24 Thread Leizhen (ThunderTown)


On 2016/8/24 1:28, Catalin Marinas wrote:
> On Mon, Aug 22, 2016 at 12:19:04PM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/7/20 17:19, Catalin Marinas wrote:
>>> On Wed, Jul 20, 2016 at 10:46:27AM +0800, Leizhen (ThunderTown) wrote:
>>>>>>>> On 2016/7/8 21:54, Catalin Marinas wrote:
>>>>>>>>> 8<
>>>>>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>>>>>>>>> index dbd12ea8ce68..c753fa804165 100644
>>>>>>>>> --- a/arch/arm64/mm/flush.c
>>>>>>>>> +++ b/arch/arm64/mm/flush.c
>>>>>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned long 
>>>>>>>>> addr)
>>>>>>>>>   if (!page_mapping(page))
>>>>>>>>>   return;
>>>>>>>>>  
>>>>>>>>> - if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>>>>>> + if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||
>>>>>>>>> + PageDirty(page))
>>>>>>>>>   sync_icache_aliases(page_address(page),
>>>>>>>>>   PAGE_SIZE << compound_order(page));
>>>>>>>>>   else if (icache_is_aivivt())
>>>>>>>>> 8<-
>>>>
>>>> Do you plan to send this patch? My colleagues told me that if our
>>>> patches are quite different, it should be Signed-off-by you.
>>>
>>> The reason I'm not sending it is that I don't fully understand how it
>>> solves the problem for a shared file mmap(), not just hugetlbfs. As I
>>> said in an earlier email: after an msync() in user space we
>>> should flush the pages to disk via write_cache_pages(). This function
>> Hi Catalin:
>>I'm so sorry for my fault. The previous small pages test result I 
>> actually ran on ramfs.
>> Today, I ran the case on harddisk fs, it worked well without this patch.
>>
>> Summarized as follows:
>> small pages on ramfs: need this patch
>> small pages on harddisk fs: no need this patch
>> hugetlbfs: need this patch
> 
> I would add:
> 
> small pages over nfs: fails with or without this patch
> 
> (tested on Juno, Cortex-A57; seems to be fixed if I remove the
> PG_dcache_clean test altogether but, well, we end up over-flushing)
> 
> I assume that when using a hard drive, it goes through the block I/O
> layer and we may have a flush_dcache_page() called when the kernel is
> about to read a page that has been mapped in user space. This would
> clear the PG_dcache_clean bit and subsequent __sync_icache_dcache()
> would perform cache maintenance.
> 
> Could you try on your system the test case without the msync() call? I'm
According to my test results: without msync, the test case may failed.

10-175-112-211:~ # ./tst_small_page_no_msync
Test is Failed: The result is 0x316b9, expect = 0x365a5
10-175-112-211:~ # ./tst_small_page_no_msync
Test is Failed: The result is 0x31023, expect = 0x31efa
10-175-112-211:~ # ./tst_small_page_no_msync
Test is Passed: The result is 0x31efa, expect = 0x31efa

10-175-112-211:~ # ./tst_small_page
Test is Passed: The result is 0x31eb7, expect = 0x31eb7
10-175-112-211:~ # ./tst_small_page
Test is Passed: The result is 0x3111f, expect = 0x3111f
10-175-112-211:~ # ./tst_small_page
Test is Passed: The result is 0x3111f, expect = 0x3111f

> not sure whether munmap() would trigger an immediate write-back, in
> which case we may see the issue even with the filesystem on a hard
> drive.
> 



Re: [PATCH 1/1] arm64/hugetlb: clear PG_dcache_clean if the page is dirty when munmap

2016-08-24 Thread Leizhen (ThunderTown)


On 2016/8/24 18:30, Catalin Marinas wrote:
> On Wed, Aug 24, 2016 at 05:00:50PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/8/24 1:28, Catalin Marinas wrote:
>>> On Mon, Aug 22, 2016 at 12:19:04PM +0800, Leizhen (ThunderTown) wrote:
>>>> On 2016/7/20 17:19, Catalin Marinas wrote:
>>>>> On Wed, Jul 20, 2016 at 10:46:27AM +0800, Leizhen (ThunderTown) wrote:
>>>>>>>>>> On 2016/7/8 21:54, Catalin Marinas wrote:
>>>>>>>>>>> 8<
>>>>>>>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>>>>>>>>>>> index dbd12ea8ce68..c753fa804165 100644
>>>>>>>>>>> --- a/arch/arm64/mm/flush.c
>>>>>>>>>>> +++ b/arch/arm64/mm/flush.c
>>>>>>>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned 
>>>>>>>>>>> long addr)
>>>>>>>>>>> if (!page_mapping(page))
>>>>>>>>>>> return;
>>>>>>>>>>>  
>>>>>>>>>>> -   if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>>>>>>>> +   if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||
>>>>>>>>>>> +   PageDirty(page))
>>>>>>>>>>> sync_icache_aliases(page_address(page),
>>>>>>>>>>> PAGE_SIZE << compound_order(page));
>>>>>>>>>>> else if (icache_is_aivivt())
>>>>>>>>>>> 8<-
>>>>>>
>>>>>> Do you plan to send this patch? My colleagues told me that if our
>>>>>> patches are quite different, it should be Signed-off-by you.
>>>>>
>>>>> The reason I'm not sending it is that I don't fully understand how it
>>>>> solves the problem for a shared file mmap(), not just hugetlbfs. As I
>>>>> said in an earlier email: after an msync() in user space we
>>>>> should flush the pages to disk via write_cache_pages(). This function
>>>> Hi Catalin:
>>>>I'm so sorry for my fault. The previous small pages test result I 
>>>> actually ran on ramfs.
>>>> Today, I ran the case on harddisk fs, it worked well without this patch.
>>>>
>>>> Summarized as follows:
>>>> small pages on ramfs: need this patch
>>>> small pages on harddisk fs: no need this patch
>>>> hugetlbfs: need this patch
>>>
>>> I would add:
>>>
>>> small pages over nfs: fails with or without this patch
>>>
>>> (tested on Juno, Cortex-A57; seems to be fixed if I remove the
>>> PG_dcache_clean test altogether but, well, we end up over-flushing)
>>>
>>> I assume that when using a hard drive, it goes through the block I/O
>>> layer and we may have a flush_dcache_page() called when the kernel is
>>> about to read a page that has been mapped in user space. This would
>>> clear the PG_dcache_clean bit and subsequent __sync_icache_dcache()
>>> would perform cache maintenance.
>>>
>>> Could you try on your system the test case without the msync() call? I'm
>>
>> According to my test results: without msync, the test case may failed.
> 
> Thanks. Just to be clear, does the test generate a file on on a hard
> drive?
Yes. I checked that the intermediate file had been generated.

> 
>> 10-175-112-211:~ # ./tst_small_page_no_msync
>> Test is Failed: The result is 0x316b9, expect = 0x365a5
>> 10-175-112-211:~ # ./tst_small_page_no_msync
>> Test is Failed: The result is 0x31023, expect = 0x31efa
>> 10-175-112-211:~ # ./tst_small_page_no_msync
>> Test is Passed: The result is 0x31efa, expect = 0x31efa
>>
>> 10-175-112-211:~ # ./tst_small_page
>> Test is Passed: The result is 0x31eb7, expect = 0x31eb7
>> 10-175-112-211:~ # ./tst_small_page
>> Test is Passed: The result is 0x3111f, expect = 0x3111f
>> 10-175-112-211:~ # ./tst_small_page
>> Test is Passed: The result is 0x3111f, expect = 0x3111f
> 
> How many tests did you run for the "passed" case? With NFS it may
I ran ./tst_small_page_no_msync and ./tst_small_page 10 times for each.

> sometime take minutes before a failure (I use the "watch" command with a
> slightly modified test to return non-zero in case of value mismatch).
> 
> While we indeed see failures on multiple filesystem types, I wonder
> whether this test case is actually expected to work. If I modify the
> test to pass O_TRUNC to open(), I can no longer see failures. So any
> standard tool that copies/creates executable files (gcc, dpkg, cp, rsync
> etc.) wouldn't encounter such issues since they truncate the original
> file and old page cache pages would be removed.
> 
> Do you have a real use-case where a task mmap's an executable file,
> modifies it in place and expects another task to see the new
> instructions without user-space cache maintenance?
No, it's just a test case created by testers.

> 



Re: [PATCH 1/1] arm64/hugetlb: clear PG_dcache_clean if the page is dirty when munmap

2016-08-25 Thread Leizhen (ThunderTown)


On 2016/8/25 17:30, Catalin Marinas wrote:
> On Thu, Aug 25, 2016 at 09:42:26AM +0800, Leizhen (ThunderTown) wrote:
>> On 2016/8/24 18:30, Catalin Marinas wrote:
>>>>>>>>>>>> On 2016/7/8 21:54, Catalin Marinas wrote:
>>>>>>>>>>>>> 8<
>>>>>>>>>>>>> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
>>>>>>>>>>>>> index dbd12ea8ce68..c753fa804165 100644
>>>>>>>>>>>>> --- a/arch/arm64/mm/flush.c
>>>>>>>>>>>>> +++ b/arch/arm64/mm/flush.c
>>>>>>>>>>>>> @@ -75,7 +75,8 @@ void __sync_icache_dcache(pte_t pte, unsigned 
>>>>>>>>>>>>> long addr)
>>>>>>>>>>>>>   if (!page_mapping(page))
>>>>>>>>>>>>>   return;
>>>>>>>>>>>>>  
>>>>>>>>>>>>> - if (!test_and_set_bit(PG_dcache_clean, &page->flags))
>>>>>>>>>>>>> + if (!test_and_set_bit(PG_dcache_clean, &page->flags) ||
>>>>>>>>>>>>> + PageDirty(page))
>>>>>>>>>>>>>   sync_icache_aliases(page_address(page),
>>>>>>>>>>>>>   PAGE_SIZE << compound_order(page));
>>>>>>>>>>>>>   else if (icache_is_aivivt())
>>>>>>>>>>>>> 8<-
> [...]
>>> While we indeed see failures on multiple filesystem types, I wonder
>>> whether this test case is actually expected to work. If I modify the
>>> test to pass O_TRUNC to open(), I can no longer see failures. So any
>>> standard tool that copies/creates executable files (gcc, dpkg, cp, rsync
>>> etc.) wouldn't encounter such issues since they truncate the original
>>> file and old page cache pages would be removed.
>>>
>>> Do you have a real use-case where a task mmap's an executable file,
>>> modifies it in place and expects another task to see the new
>>> instructions without user-space cache maintenance?
>>
>> No, it's just a test case created by testers.
> 
> In this case I propose we ignore this patch and you adjust the test to
> use O_TRUNC, at least until we find a real scenario where this would
> matter.
OK, thanks. We currently add __clear_cache in user space.

> 



Re: [PATCH 0/5] arm-smmu: performance optimization

2017-08-17 Thread Leizhen (ThunderTown)


On 2017/8/17 22:36, Will Deacon wrote:
> Thunder, Nate, Robin,
> 
> On Mon, Jun 26, 2017 at 09:38:45PM +0800, Zhen Lei wrote:
>> I described the optimization more detail in patch 1 and 2, and patch 3-5 are
>> the implementation on arm-smmu/arm-smmu-v3 of patch 2.
>>
>> Patch 1 is v2. In v1, I directly replaced writel with writel_relaxed in
>> queue_inc_prod. But Robin figured that it may lead SMMU consume stale
>> memory contents. I thought more than 3 whole days and got this one.
>>
>> This patchset is based on Robin Murphy's [PATCH v2 0/8] io-pgtable lock 
>> removal.
> 
> For the time being, I think we should focus on the new TLB flushing
> interface posted by Joerg:
> 
> http://lkml.kernel.org/r/1502974596-23835-1-git-send-email-j...@8bytes.org
> 
> which looks like it can give us most of the benefits of this series. Once
> we've got that, we can see what's left in the way of performance and focus
> on the cmdq batching separately (because I'm still not convinced about it).
OK, this is a good news.

But I have a review comment(sorry, I have not subscribed it yet, so can not 
directly reply it):
I don't think we should add tlb sync for map operation
1. at init time, all tlbs will be invalidated
2. when we try to map a new range, there are no related ptes bufferd in tlb, 
because of above 1 and below 3
3. when we unmap the above range, make sure all related ptes bufferd in tlb to 
be invalidated before unmap finished

> 
> Thanks,
> 
> Will
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH 1/1] iommu/arm-smmu-v3: replace writel with writel_relaxed in queue_inc_prod

2017-06-20 Thread Leizhen (ThunderTown)


On 2017/6/20 19:35, Robin Murphy wrote:
> On 20/06/17 12:04, Zhen Lei wrote:
>> This function is protected by spinlock, and the latter will do memory
>> barrier implicitly. So that we can safely use writel_relaxed. In fact, the
>> dmb operation will lengthen the time protected by lock, which indirectly
>> increase the locking confliction in the stress scene.
> 
> If you remove the DSB between writing the commands (to Normal memory)
> and writing the pointer (to Device memory), how can you guarantee that
> the complete command is visible to the SMMU and it isn't going to try to
> consume stale memory contents? The spinlock is irrelevant since it's
> taken *before* the command is written.
OK, I see, thanks. Let's me see if there are any other methods. And I think
that this may should be done well by hardware.

> 
> Robin.
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 380969a..d2fbee3 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -728,7 +728,7 @@ static void queue_inc_prod(struct arm_smmu_queue *q)
>>  u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1;
>>
>>  q->prod = Q_OVF(q, q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
>> -writel(q->prod, q->prod_reg);
>> +writel_relaxed(q->prod, q->prod_reg);
>>  }
>>
>>  /*
>> --
>> 2.5.0
>>
>>
> 
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH v2 1/3] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction

2017-10-18 Thread Leizhen (ThunderTown)


On 2017/10/18 20:58, Will Deacon wrote:
> Hi Thunder,
> 
> On Tue, Sep 12, 2017 at 09:00:36PM +0800, Zhen Lei wrote:
>> Because all TLBI commands should be followed by a SYNC command, to make
>> sure that it has been completely finished. So we can just add the TLBI
>> commands into the queue, and put off the execution until meet SYNC or
>> other commands. To prevent the followed SYNC command waiting for a long
>> time because of too many commands have been delayed, restrict the max
>> delayed number.
>>
>> According to my test, I got the same performance data as I replaced writel
>> with writel_relaxed in queue_inc_prod.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 42 +-
>>  1 file changed, 37 insertions(+), 5 deletions(-)
> 
> If we want to go down the route of explicit command batching, I'd much
> rather do it by implementing the iotlb_range_add callback in the driver,
> and have a fixed-length array of batched ranges on the domain. We could
I think even if iotlb_range_add callback is implemented, this patch is still 
valuable. The main purpose
of this patch is to reduce dsb operation. So in the scenario with 
iotlb_range_add implemented:
.iotlb_range_add:
spin_lock_irqsave(&smmu->cmdq.lock, flags);
...
add tlbi range-1 to cmq-queue
...
add tlbi range-n to cmq-queue   //n
dsb
...
spin_unlock_irqrestore(&smmu->cmdq.lock, flags);

.iotlb_sync
spin_lock_irqsave(&smmu->cmdq.lock, flags);
...
add cmd_sync to cmq-queue
dsb
...
spin_unlock_irqrestore(&smmu->cmdq.lock, flags);

Although iotlb_range_add can reduce n-1 dsb operations, but there are still 1 
left. If n is not large enough,
this patch is helpful.


> potentially toggle this function pointer based on the compatible string too,
> if it shows only to benefit some systems.
[
On 2017/9/19 12:31, Nate Watterson wrote:
I tested these (2) patches on QDF2400 hardware and saw performance
improvements in line with those I reported when testing the original
series.
]

I'm not sure whether this patch can improve performance on QDF2400, because 
there are two patches. But at least
it seems harmless, maybe the other hardware platforms are the same.

> 
> Will
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH v2 2/3] iommu/arm-smmu-v3: add support for unmap an iova range with only one tlb sync

2017-10-18 Thread Leizhen (ThunderTown)


On 2017/10/18 21:00, Will Deacon wrote:
> On Tue, Sep 12, 2017 at 09:00:37PM +0800, Zhen Lei wrote:
>> This patch is base on: 
>> (add02cfdc9bc2 "iommu: Introduce Interface for IOMMU TLB Flushing")
>>
>> Because iotlb_sync is moved out of ".unmap = arm_smmu_unmap", some interval
>> ".unmap" calls should explicitly followed by a iotlb_sync operation.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c| 10 ++
>>  drivers/iommu/io-pgtable-arm.c | 30 --
>>  drivers/iommu/io-pgtable.h |  1 +
>>  3 files changed, 31 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index ef42c4b..e92828e 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -1772,6 +1772,15 @@ arm_smmu_unmap(struct iommu_domain *domain, unsigned 
>> long iova, size_t size)
>>  return ops->unmap(ops, iova, size);
>>  }
>>  
>> +static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>> +{
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> +
>> +if (ops && ops->iotlb_sync)
>> +ops->iotlb_sync(ops);
>> +}
>> +
>>  static phys_addr_t
>>  arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
>>  {
>> @@ -1991,6 +2000,7 @@ static struct iommu_ops arm_smmu_ops = {
>>  .attach_dev = arm_smmu_attach_dev,
>>  .map= arm_smmu_map,
>>  .unmap  = arm_smmu_unmap,
>> +.iotlb_sync = arm_smmu_iotlb_sync,
>>  .map_sg = default_iommu_map_sg,
>>  .iova_to_phys   = arm_smmu_iova_to_phys,
>>  .add_device = arm_smmu_add_device,
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index e8018a3..805efc9 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -304,6 +304,8 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable 
>> *data,
>>  WARN_ON(!selftest_running);
>>  return -EEXIST;
>>  } else if (iopte_type(pte, lvl) == ARM_LPAE_PTE_TYPE_TABLE) {
>> +size_t unmapped;
>> +
>>  /*
>>   * We need to unmap and free the old table before
>>   * overwriting it with a block entry.
>> @@ -312,7 +314,9 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable 
>> *data,
>>  size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>>  
>>  tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
>> -if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
>> +unmapped = __arm_lpae_unmap(data, iova, sz, lvl, tblp);
>> +io_pgtable_tlb_sync(&data->iop);
>> +if (WARN_ON(unmapped != sz))
>>  return -EINVAL;
>>  }
>>  
>> @@ -584,7 +588,6 @@ static int __arm_lpae_unmap(struct arm_lpae_io_pgtable 
>> *data,
>>  /* Also flush any partial walks */
>>  io_pgtable_tlb_add_flush(iop, iova, size,
>>  ARM_LPAE_GRANULE(data), false);
>> -io_pgtable_tlb_sync(iop);
>>  ptep = iopte_deref(pte, data);
>>  __arm_lpae_free_pgtable(data, lvl + 1, ptep);
>>  } else {
>> @@ -609,7 +612,6 @@ static int __arm_lpae_unmap(struct arm_lpae_io_pgtable 
>> *data,
>>  static int arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>>size_t size)
>>  {
>> -size_t unmapped;
>>  struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>>  arm_lpae_iopte *ptep = data->pgd;
>>  int lvl = ARM_LPAE_START_LVL(data);
>> @@ -617,11 +619,14 @@ static int arm_lpae_unmap(struct io_pgtable_ops *ops, 
>> unsigned long iova,
>>  if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
>>  return 0;
>>  
>> -unmapped = __arm_lpae_unmap(data, iova, size, lvl, ptep);
>> -if (unmapped)
>> -io_pgtable_tlb_sync(&data->iop);
>> +return __arm_lpae_unmap(data, iova, size, lvl, ptep);
>> +}
> 
> This change is already queued in Joerg's tree, due to a patch from Robin.
Yes, I see. So this one can be skipped.

> 
> Will
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH 2/2] arm64: to allow EFI_RTC can be selected on ARM64

2015-10-08 Thread Leizhen (ThunderTown)


On 2015/9/28 17:44, Leizhen (ThunderTown) wrote:
>> > 
>>> >> --drivers/char/Kconfig--
>>> >> if RTC_LIB=n
>>> >>
>>> >> config RTC
>>> >> tristate "Enhanced Real Time Clock Support (legacy PC RTC driver)"
>>> >>
>>> >> ...
>>> >>
>>> >> config EFI_RTC
>>> >> bool "EFI Real Time Clock Services"
>>> >> depends on IA64 || ARM64
>>> >>
>>> >> ...
>>> >>
>>> >> endif # RTC_LIB
>> > 
>> > The driver you want is RTC_DRV_EFI, not EFI_RTC.
> OK, I will try it tommorrow.

Sorry, the coach called me to learn to drive these days. I opened RTC_DRV_EFI 
and the RTC worked fine, thanks a lot.

> 
>> > 
>> >Arnd
>> > 
>> > .
>> > 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 5/5] iommu/arm-smmu-v3: add bootup option "iommu.non_strict"

2018-08-27 Thread Leizhen (ThunderTown)



On 2018/8/23 1:02, Robin Murphy wrote:
> On 15/08/18 02:28, Zhen Lei wrote:
>> Add a bootup option to make the system manager can choose which mode to
>> be used. The default mode is strict.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>   Documentation/admin-guide/kernel-parameters.txt | 13 +
>>   drivers/iommu/arm-smmu-v3.c | 22 +-
>>   2 files changed, 34 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index 5cde1ff..cb9d043e 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1720,6 +1720,19 @@
>>   nobypass[PPC/POWERNV]
>>   Disable IOMMU bypass, using IOMMU for PCI devices.
>>
>> +iommu.non_strict=[ARM64]
>> +Format: { "0" | "1" }
>> +0 - strict mode, default.
>> +Release IOVAs after the related TLBs are invalid
>> +completely.
>> +1 - non-strict mode.
>> +Put off TLBs invalidation and release memory first.
>> +It's good for scatter-gather performance but lacks
>> +full isolation, an untrusted device can access the
>> +reused memory because the TLBs may still valid.
>> +Please takefull consideration before choosing this
>> +mode. Note that, VFIO will always use strict mode.
>> +
>>   iommu.passthrough=
>>   [ARM64] Configure DMA to bypass the IOMMU by default.
>>   Format: { "0" | "1" }
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 61eb7ec..0eda90e 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -631,6 +631,26 @@ struct arm_smmu_option_prop {
>>   { 0, NULL},
>>   };
>>
>> +static bool smmu_non_strict __read_mostly;
>> +
>> +static int __init arm_smmu_setup(char *str)
>> +{
>> +int ret;
>> +
>> +ret = kstrtobool(str, &smmu_non_strict);
>> +if (ret)
>> +return ret;
>> +
>> +if (smmu_non_strict) {
>> +pr_warn("WARNING: iommu non-strict mode is chosen.\n"
>> +"It's good for scatter-gather performance but lacks full 
>> isolation\n");
>> +add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
>> +}
>> +
>> +return 0;
>> +}
>> +early_param("iommu.non_strict", arm_smmu_setup);
> 
> As I said on v3, the option should be parsed by iommu-dma, since that's where 
> it takes effect, and I'm sure SMMUv2 users will be interested in trying it 
> out too.

OK, I am so sorry that I have not understood your opinion correctly.

> 
> In other words, if iommu_dma_init_domain() does something like:
> 
> if (iommu_dma_non_strict && domain->ops->flush_iotlb_all) {
> domain->non_strict = true;
> cookie->domain = domain;
> init_iova_flush_queue(...);
> }
> 
>> +
>>   static inline void __iomem *arm_smmu_page1_fixup(unsigned long offset,
>>struct arm_smmu_device *smmu)
>>   {
>> @@ -1622,7 +1642,7 @@ static int arm_smmu_domain_finalise(struct 
>> iommu_domain *domain)
>>   if (smmu->features & ARM_SMMU_FEAT_COHERENCY)
>>   pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
>>
>> -if (domain->type == IOMMU_DOMAIN_DMA) {
>> +if ((domain->type == IOMMU_DOMAIN_DMA) && smmu_non_strict) {
>>   domain->non_strict = true;
>>   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
>>   }
> 
> ...then all the driver should need to do is:
> 
> if (domain->non_strict)
> pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
> 
> 
> Now, that would make it possible to request non-strict mode even with drivers 
> which *don't* understand it, but I think that's not actually harmful, just 
> means that some TLBIs will still get issued synchronously and the flush queue 
> might not do much. If you wanted to avoid even that, you could replace 
> domain->non_strict with an iommu_domain_set_attr() call, so iommu_dma could 
> tell up-front whether the driver understands non-strict mode and it's worth 
> setting the queue up or not.

OK, I will think seriously about it, thanks. I've been busy these days, I will 
reply to you as soon as possible.

> 
> Robin.
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH v5 3/5] iommu/io-pgtable-arm: add support for non-strict mode

2018-08-27 Thread Leizhen (ThunderTown)



On 2018/8/23 1:52, Robin Murphy wrote:
> On 15/08/18 02:28, Zhen Lei wrote:
>> To support the non-strict mode, now we only tlbi and sync for the strict
>> mode. But for the non-leaf case, always follow strict mode.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>   drivers/iommu/io-pgtable-arm.c | 20 ++--
>>   drivers/iommu/io-pgtable.h |  3 +++
>>   2 files changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index 010a254..20d3e98 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -538,6 +538,7 @@ static size_t arm_lpae_split_blk_unmap(struct 
>> arm_lpae_io_pgtable *data,
>>   phys_addr_t blk_paddr;
>>   size_t tablesz = ARM_LPAE_GRANULE(data);
>>   size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>> +size_t unmapped = size;
>>   int i, unmap_idx = -1;
>>
>>   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
>> @@ -575,11 +576,16 @@ static size_t arm_lpae_split_blk_unmap(struct 
>> arm_lpae_io_pgtable *data,
>>   tablep = iopte_deref(pte, data);
>>   }
>>
>> -if (unmap_idx < 0)
> 
> [ side note: the more I see this test the more it looks slightly wrong, but 
> that's a separate issue. I'll have to sit down and really remember what's 
> going on here... ]
> 
>> -return __arm_lpae_unmap(data, iova, size, lvl, tablep);
>> +if (unmap_idx < 0) {
>> +unmapped = __arm_lpae_unmap(data, iova, size, lvl, tablep);
>> +if (!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT))
>> +return unmapped;
>> +}
> 
> I don't quite get this change - we should only be recursing back into 
> __arm_lpae_unmap() here if nothing's actually been unmapped at this point - 
> the block entry is simply replaced by a full next-level table and we're going 
> to come back and split another block at that next level, or we raced against 
> someone else splitting the same block and that's their table instead. Since 
> that's reverting back to a "regular" unmap, I don't see where the need to 
> introduce an additional flush to that path comes from (relative to the 
> existing behaviour, at least).

The old block mapping maybe cached in TLBs, it should be invalidated completely 
before the new next-level mapping to be used. Just ensure that.

In fact, I think the code of arm_lpae_split_blk_unmap may has some mistakes. 
For example:
if (size == split_sz)
unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);

It means that "the size" can only be the block/page size of the next-level. 
Suppose current level is 2M block, but we may unmap 12K, and
the above "if" will limit us only be able to unmap 4K.

Furthermore, the situation "if (unmap_idx < 0)" should not appear.

Maybe my analysis is wrong, I will try to test it.


> 
>>   io_pgtable_tlb_add_flush(&data->iop, iova, size, size, true);
> 
> This is the flush which corresponds to whatever page split_blk_unmap() 
> actually unmapped itself (and also covers any recursively-split 
> intermediate-level entries)...
> 
>> -return size;
>> +io_pgtable_tlb_sync(&data->iop);
> 
> ...which does want this sync, but only technically for non-strict mode, since 
> it's otherwise covered by the sync in iommu_unmap().

Because split_blk_unmap() is rarely to be called, it has little impact on the 
overall performance,
so I ommitted the if statement of non-strict, I will add it back.

> 
> I'm not *against* tightening up the TLB maintenance here in general, but if 
> so that should be a separately-reasoned patch, not snuck in with other 
> changes.

OK

> 
> Robin.
> 
>> +
>> +return unmapped;
>>   }
>>
>>   static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>> @@ -609,7 +615,7 @@ static size_t __arm_lpae_unmap(struct 
>> arm_lpae_io_pgtable *data,
>>   io_pgtable_tlb_sync(iop);
>>   ptep = iopte_deref(pte, data);
>>   __arm_lpae_free_pgtable(data, lvl + 1, ptep);
>> -} else {
>> +} else if (!(iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT)) {
>>   io_pgtable_tlb_add_flush(iop, iova, size, size, true);
>>   }
>>
>> @@ -771,7 +777,8 @@ static void arm_lpae_restrict_pgsizes(struct 
>> io_pgtable_cfg *cfg)
>>   u64 reg;
>>   struct arm_lpae_io_pgtable *data;
>>
>> -if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
>> +if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
>> +IO_PGTABLE_QUIRK_NON_STRICT))
>>   return NULL;
>>
>>   data = arm_lpae_alloc_pgtable(cfg);
>> @@ -863,7 +870,8 @@ static void arm_lpae_restrict_pgsizes(struct 
>> io_pgtable_cfg *cfg)
>>   struct arm_lpae_io_pgtable *data;
>>
>>   /* The NS quirk doesn't apply at stage 2 */
>> -if (cfg->quirks & ~IO_PGTABLE_QUIRK_NO_DMA)
>> +if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
>> +IO_PGTABLE_QUIRK_NON_STRICT))
>>   retu

Re: [PATCH v2 1/1] iommu/arm-smmu-v3: eliminate a potential memory corruption on Hi16xx soc

2018-10-29 Thread Leizhen (ThunderTown)



On 2018/10/30 1:59, Will Deacon wrote:
> On Sat, Oct 20, 2018 at 03:36:54PM +0800, Zhen Lei wrote:
>> The standard GITS_TRANSLATER register in ITS is only 4 bytes, but
>> Hisilicon expands the next 4 bytes to carry some IMPDEF information. That
>> means, total 8 bytes data will be written to MSIAddress each time.
>>
>> MSIAddr: |4bytes|4bytes|
>>   |MSIData   |IMPDEF|
>>
>> There is no problem for ITS, because the next 4 bytes space is reserved
>> in ITS. But it will overwrite the 4 bytes memory following "sync_count".
>> It's very fortunately that the previous and the next neighbour of the
>> "sync_count" are both aligned by 8 bytes, so no problem is met now.
>>
>> It's good to explicitly add a workaround:
>> 1. Add gcc __attribute__((aligned(8))) to make sure that "sync_count" is
>>always aligned by 8 bytes.
>> 2. Add a "int" struct member to make sure the 4 bytes padding is always
>>exist.
>>
>> There is no functional change.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 15 ++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 5059d09..624fdd0 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -586,7 +586,20 @@ struct arm_smmu_device {
>>  
>>  struct arm_smmu_strtab_cfg  strtab_cfg;
>>  
>> -u32 sync_count;
>> +/*
>> + * The alignment and padding is required by Hi16xx of Hisilicon.
>> + * Because the ITS hardware on Hi16xx will truncate the MSIAddress(Here
>> + * it's the address of "sync_count") to 8 bytes boundary first, then
>> + * write 32 bits MSIdata at offset 0, and 32 bits IMPDEF data at offset
>> + * 4. Without this workaround, the adjacent member maybe overwritten.
>> + *
>> + *|---4bytes---|---4bytes---|
>> + * MSIAddress & (~0x7):   MSIdata  | IMPDEF data|
>> + */
>> +struct {
>> +u32 sync_count;
>> +int padding;
>> +} __attribute__((aligned(8)));
> 
> I thought the conclusion after reviewing your original patch was to maintain
> the union and drop the alignment directive? e.g.
> 
>   union {
>   u32 sync_count;
>   u64 padding; /* Hi16xx writes an extra 32 bits of goodness 
> */
>   };
OK, I will sent v3.

> 
> Will
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH 1/1] iommu/arm-smmu-v3: eliminate a potential memory corruption on Hi16xx soc

2018-10-16 Thread Leizhen (ThunderTown)



On 2018/10/15 19:17, John Garry wrote:
> On 15/10/2018 09:36, Zhen Lei wrote:
>> ITS translation register map:
>> 0x-0x003CReserved
>> 0x0040GITS_TRANSLATER
>> 0x0044-0xFFFCReserved
>>
> 
> Can you add a better opening than the ITS translation register map?

OK

> 
>> The standard GITS_TRANSLATER register in ITS is only 4 bytes, but Hisilicon
>> expands the next 4 bytes to carry some IMPDEF information. That means, 8 
>> bytes
>> data will be written to MSIAddress each time.
>>
>> MSIAddr: |4bytes|4bytes|
>>  |MSIData   |IMPDEF|
>>
>> There is no problem for ITS, because the next 4 bytes space is reserved in 
>> ITS.
>> But it will overwrite the 4 bytes memory following "sync_count". It's very
> 
> I think arm_smmu_device.sync_count is better, or "sync_count member in the 
> the smmu driver control struct".

OK, I will use "struct" in v2.

+   struct {
u32 sync_count;
+   u32 padding;
+   } __attribute__((aligned(8)));

> 
>> luckly that the previous and the next neighbour of "sync_count" are both 
>> aligned
> 
> /s/luckly/luckily or fortunately/

OK, thanks

> 
>> by 8 bytes, so no problem is met now.
>>
>> It's good to explicitly add a workaround:
>> 1. Add gcc __attribute__((aligned(8))) to make sure that "sync_count" is 
>> always
>>aligned by 8 bytes.
>> 2. Add a "u64" union member to make sure the 4 bytes padding is always exist.
>>
>> There is no functional change.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 5059d09..a07bc0d 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -586,7 +586,10 @@ struct arm_smmu_device {
>>
>>  struct arm_smmu_strtab_cfgstrtab_cfg;
>>
>> +union {
>> +u64padding; /* workaround for Hisilicon */
> 
> I think that a more detailed comment is required.

OK, I will try to describe it more clearly.

> 
>>  u32sync_count;
> 
> Can you indent these 2 members? However - as discussed internally - this may 
> have endian issue so better to declare full 64b struct.

These indent is inherited, to keep aligning with other members.

There is no endian issue, I have tested it on both little-endian and big-endian.

$gdb vmlinux
..
(gdb) p &((struct arm_smmu_device *)0)->sync_count
$1 = (u32 *) 0x4178
(gdb) p &((struct arm_smmu_device *)0)->tst1
$2 = (int *) 0x4170
(gdb) p &((struct arm_smmu_device *)0)->tst2
$3 = (int *) 0x4180

testcase

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 5059d09..7c6f7ac 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -586,7 +586,14 @@ struct arm_smmu_device {

struct arm_smmu_strtab_cfg  strtab_cfg;

+ int tst1;
+
+ union {
+ u64 padding;
u32 sync_count;
+ } __attribute__((aligned(8)));
+
+ int tst2;

/* IOMMU core code handle */
struct iommu_device iommu;

> 
>> +} __attribute__((aligned(8)));
>>
>>  /* IOMMU core code handle */
>>  struct iommu_deviceiommu;
>>
> Thanks
> 
> 
> 
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH 1/1] iommu/arm-smmu-v3: eliminate a potential memory corruption on Hi16xx soc

2018-10-16 Thread Leizhen (ThunderTown)



On 2018/10/15 20:46, Andrew Murray wrote:
> Hi Zhen,
> 
> On Mon, Oct 15, 2018 at 04:36:16PM +0800, Zhen Lei wrote:
>> ITS translation register map:
>> 0x-0x003CReserved
>> 0x0040   GITS_TRANSLATER
>> 0x0044-0xFFFCReserved
>>
>> The standard GITS_TRANSLATER register in ITS is only 4 bytes, but Hisilicon
>> expands the next 4 bytes to carry some IMPDEF information. That means, 8 
>> bytes
>> data will be written to MSIAddress each time.
>>
>> MSIAddr: |4bytes|4bytes|
>>   |MSIData   |IMPDEF|
>>
>> There is no problem for ITS, because the next 4 bytes space is reserved in 
>> ITS.
>> But it will overwrite the 4 bytes memory following "sync_count". It's very
>> luckly that the previous and the next neighbour of "sync_count" are both 
>> aligned
>> by 8 bytes, so no problem is met now.
> 
> My understanding is that MSI's are 32bit memory writes and as such the SMMU
> performs a 32bit write in response to the MSI. If so then what is different
> with the Hi16xx that causes a problem? Have you been able to able to adjust
> the layout of the arm_smmu_device struct to demonstrate this?

In normal, only 32bits MSIdata will be written into sync_count:
|4bytes|4bytes|
|  sync_count  |  |

But for Hi16xx, the ITS hardware will write extra 32bits IMDDEF data into 
"". If
"" is the space of the next struct member, its value will be overwritten.

> 
> Thanks,
> 
> Andrew Murray
> 
>>
>> It's good to explicitly add a workaround:
>> 1. Add gcc __attribute__((aligned(8))) to make sure that "sync_count" is 
>> always
>>aligned by 8 bytes.
>> 2. Add a "u64" union member to make sure the 4 bytes padding is always exist.
>>
>> There is no functional change.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 5059d09..a07bc0d 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -586,7 +586,10 @@ struct arm_smmu_device {
>>  
>>  struct arm_smmu_strtab_cfg  strtab_cfg;
>>  
>> +union {
>> +u64 padding; /* workaround for Hisilicon */
>>  u32 sync_count;
>> +} __attribute__((aligned(8)));
>>  
>>  /* IOMMU core code handle */
>>  struct iommu_device iommu;
>> -- 
>> 1.8.3
>>
>>
>> ___
>> iommu mailing list
>> io...@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH 2/3] kallsyms: Add APIs to match symbol without .llmv. suffix.

2024-08-01 Thread Leizhen (ThunderTown)



On 2024/7/31 9:00, Song Liu wrote:
> Hi Masami, 
> 
>> On Jul 30, 2024, at 6:03 AM, Masami Hiramatsu  wrote:
>>
>> On Mon, 29 Jul 2024 17:54:32 -0700
>> Song Liu  wrote:
>>
>>> With CONFIG_LTO_CLANG=y, the compiler may add suffix to function names
>>> to avoid duplication. This causes confusion with users of kallsyms.
>>> On one hand, users like livepatch are required to match the symbols
>>> exactly. On the other hand, users like kprobe would like to match to
>>> original function names.
>>>
>>> Solve this by splitting kallsyms APIs. Specifically, existing APIs now
>>> should match the symbols exactly. Add two APIs that matches the full
>>> symbol, or only the part without .llvm.suffix. Specifically, the following
>>> two APIs are added:
>>>
>>> 1. kallsyms_lookup_name_or_prefix()
>>> 2. kallsyms_on_each_match_symbol_or_prefix()
>>
>> Since this API only removes the suffix, "match prefix" is a bit confusing.
>> (this sounds like matching "foo" with "foo" and "foo_bar", but in reality,
>> it only matches "foo" and "foo.llvm.*")
>> What about the name below?
>>
>> kallsyms_lookup_name_without_suffix()
>> kallsyms_on_each_match_symbol_without_suffix()
> 
> I am open to name suggestions. I named it as xx or prefix to highlight
> that these two APIs will try match full name first, and they only match
> the symbol without suffix when there is no full name match. 
> 
> Maybe we can call them: 
> - kallsyms_lookup_name_or_without_suffix()
> - kallsyms_on_each_match_symbol_or_without_suffix()
> 
> Again, I am open to any name selections here. 

Only static functions have suffixes. In my opinion, explicitly marking static
might be a little clearer.
kallsyms_lookup_static_name()
kallsyms_on_each_match_static_symbol()

> 
>>
>>>
>>> These APIs will be used by kprobe.
>>
>> No other user need this?
> 
> AFACIT, kprobe is the only use case here. Sami, please correct 
> me if I missed any users. 
> 
> 
> More thoughts on this: 
> 
> I actually hope we don't need these two new APIs, as they are 
> confusing. Modern compilers can do many things to the code 
> (inlining, etc.). So when we are tracing a function, we are not 
> really tracing "function in the source code". Instead, we are 
> tracing "function in the binary". If a function is inlined, it 
> will not show up in the binary. If a function is _partially_ 
> inlined (inlined by some callers, but not by others), it will 
> show up in the binary, but we won't be tracing it as it appears
> in the source code. Therefore, tracing functions by their names 
> in the source code only works under certain assumptions. And 
> these assumptions may not hold with modern compilers. Ideally, 
> I think we cannot promise the user can use name "ping_table" to
> trace function "ping_table.llvm.15394922576589127018"
> 
> Does this make sense?
> 
> Thanks,
> Song
> 
> 
> [...]
> 

-- 
Regards,
  Zhen Lei




Re: [PATCH 2/3] kallsyms: Add APIs to match symbol without .llmv. suffix.

2024-08-01 Thread Leizhen (ThunderTown)



On 2024/8/2 11:45, Song Liu wrote:
> 
> 
>> On Aug 1, 2024, at 6:18 PM, Leizhen (ThunderTown) 
>>  wrote:
>>
>> On 2024/7/31 9:00, Song Liu wrote:
>>> Hi Masami, 
>>>
>>>> On Jul 30, 2024, at 6:03 AM, Masami Hiramatsu  wrote:
>>>>
>>>> On Mon, 29 Jul 2024 17:54:32 -0700
>>>> Song Liu  wrote:
>>>>
>>>>> With CONFIG_LTO_CLANG=y, the compiler may add suffix to function names
>>>>> to avoid duplication. This causes confusion with users of kallsyms.
>>>>> On one hand, users like livepatch are required to match the symbols
>>>>> exactly. On the other hand, users like kprobe would like to match to
>>>>> original function names.
>>>>>
>>>>> Solve this by splitting kallsyms APIs. Specifically, existing APIs now
>>>>> should match the symbols exactly. Add two APIs that matches the full
>>>>> symbol, or only the part without .llvm.suffix. Specifically, the following
>>>>> two APIs are added:
>>>>>
>>>>> 1. kallsyms_lookup_name_or_prefix()
>>>>> 2. kallsyms_on_each_match_symbol_or_prefix()
>>>>
>>>> Since this API only removes the suffix, "match prefix" is a bit confusing.
>>>> (this sounds like matching "foo" with "foo" and "foo_bar", but in reality,
>>>> it only matches "foo" and "foo.llvm.*")
>>>> What about the name below?
>>>>
>>>> kallsyms_lookup_name_without_suffix()
>>>> kallsyms_on_each_match_symbol_without_suffix()
>>>
>>> I am open to name suggestions. I named it as xx or prefix to highlight
>>> that these two APIs will try match full name first, and they only match
>>> the symbol without suffix when there is no full name match. 
>>>
>>> Maybe we can call them: 
>>> - kallsyms_lookup_name_or_without_suffix()
>>> - kallsyms_on_each_match_symbol_or_without_suffix()
>>>
>>> Again, I am open to any name selections here.
>>
>> Only static functions have suffixes. In my opinion, explicitly marking static
>> might be a little clearer.
>> kallsyms_lookup_static_name()
>> kallsyms_on_each_match_static_symbol()
> 
> While these names are shorter, I think they are more confusing. Not all
> folks know that only static functions can have suffixes. 
> 
> Maybe we should not hide the "try match full name first first" in the
> API, and let the users handle it. Then, we can safely call the new APIs
> *_without_suffix(), as Masami suggested. 

Yes, that would be clearer.

> 
> If there is no objections, I will send v2 based on this direction. 
> 
> Thanks,
> Song
> 

-- 
Regards,
  Zhen Lei




Re: [PATCH 1/1] uprobe: fix comment of uprobe_apply()

2024-08-20 Thread Leizhen (ThunderTown)



On 2024/8/20 22:30, Oleg Nesterov wrote:
> On 08/20, Zhen Lei wrote:
>>
>> Depending on the argument 'add', uprobe_apply() may be registering or
>> unregistering a probe.
> 
> ...
> 
>>  /*
>> - * uprobe_apply - unregister an already registered probe.
>> - * @inode: the file in which the probe has to be removed.
>> + * uprobe_apply - register a probe or unregister an already registered 
>> probe.
> 
> Not really.
> 
> See the commit 3c83a9ad0295eb63bd ("uprobes: make uprobe_register() return 
> struct uprobe *")
> in tip/perf/core which changed this description
> 
>   * uprobe_apply - add or remove the breakpoints according to @uc->filter
> 
> still looks confusing, yes...

OK, I got it. I mistakenly thought the comment was based on 
register_for_each_vma.
It seems necessary to rename 'register_for_each_vma' to 'apply_for_each_vma',
or some other more appropriate name.

> 
> Oleg.
> 
> 
> .
> 

-- 
Regards,
  Zhen Lei



Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize performance and improve readability

2024-09-09 Thread Leizhen (ThunderTown)



On 2024/9/10 2:41, Thomas Gleixner wrote:
> On Wed, Sep 04 2024 at 21:41, Zhen Lei wrote:
> 
>> Currently, there are multiple instances where several nodes are extracted
>> from one list and added to another list. One by one extraction, and then
>> one by one splicing, not only low efficiency, readability is also poor.
>> The work can be done well with hlist_cut_number() and hlist_splice_init(),
>> which move the entire sublist at once.
>>
>> When the number of nodes expected to be moved is less than or equal to 0,
>> or the source list is empty, hlist_cut_number() safely returns 0. The
>> splicing is performed only when the return value of hlist_cut_number() is
>> greater than 0.
>>
>> For two calls to hlist_cut_number() in __free_object(), the result is
>> obviously positive, the check of the return value is omitted.
> 
> Sure but hlist_cut_number() suffers from the same problem as the current
> code. If is a massive cache line chase as you actually have to walk the
> list to figure out where to cut it off.
> 
> All related functions have this problem and all of this code is very
> strict about boundaries. Instead of accurately doing the refill, purge
> etc. we should look into proper batch mode mechanisms. Let me think
> about it.

It may be helpful to add several arrays to record the first node of each batch
in each free list. Take 'percpu_pool' as an example:

struct debug_percpu_free {
struct hlist_head   free_objs;
int obj_free;
+   int batch_idx;
+   struct hlist_node   *batch_first[4]; // ODEBUG_POOL_PERCPU_SIZE / 
ODEBUG_BATCH_SIZE
};

A new free node is added to the header of the list, and the batch is cut from 
the tail
of the list.
  NodeA<-->...<-->NodeB<-->...<-->NodeC<-->NodeD<--> free_objs
|---one batch---|---one batch---|
|   |
batch_first[0]  batch_first[1]

__free_object():
//add obj into percpu_pool
obj_free++;
if (obj_free % ODEBUG_BATCH_SIZE == 0) {
idx = 0x3 & (batch_idx + (obj_free / ODEBUG_BATCH_SIZE) - 1);
//update batch_first[idx]
}

if (obj_free >= ODEBUG_POOL_PERCPU_SIZE) {
//move one batch
//cut batch at 'batch_idx' into obj_to_free (or obj_pool, if 
less than debug_objects_pool_min_level)
batch_idx++;
obj_free -= ODEBUG_BATCH_SIZE
}

> 
> Thanks,
> 
> tglx
> 
> 
> .
> 

-- 
Regards,
  Zhen Lei



Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize performance and improve readability

2024-09-11 Thread Leizhen (ThunderTown)



On 2024/9/10 19:44, Thomas Gleixner wrote:
> On Tue, Sep 10 2024 at 12:00, Leizhen wrote:
>> On 2024/9/10 2:41, Thomas Gleixner wrote:
>>> All related functions have this problem and all of this code is very
>>> strict about boundaries. Instead of accurately doing the refill, purge
>>> etc. we should look into proper batch mode mechanisms. Let me think
>>> about it.
>> It may be helpful to add several arrays to record the first node of each 
>> batch
>> in each free list. Take 'percpu_pool' as an example:
>>
>> struct debug_percpu_free {
>>  struct hlist_head   free_objs;
>>  int obj_free;
>> +int batch_idx;
>> +struct hlist_node   *batch_first[4]; // ODEBUG_POOL_PERCPU_SIZE / 
>> ODEBUG_BATCH_SIZE
>> };
>>
>> A new free node is added to the header of the list, and the batch is cut 
>> from the tail
>> of the list.
>>   NodeA<-->...<-->NodeB<-->...<-->NodeC<-->NodeD<--> free_objs
>> |---one batch---|---one batch---|
>> |   |
>> batch_first[0]  batch_first[1]
> The current data structures are not fit for the purpose. Glueing
> workarounds into the existing mess makes it just worse.
> 
> So the data structures need to be redesigned from ground up to be fit
> for the purpose.
> 
> allocation:
> 
>1) Using the global pool for single object allocations is wrong
> 
>   During boot this can be a completely disconnected list, which does
>   not need any accounting, does not need pool_lock and can be just
>   protected with irqsave like the per CPU pools. It's effectivly a
>   per CPU pool because at that point there is only one CPU and
>   everything is single threaded.
> 
>2) The per CPU pool handling is backwards
> 
>   If the per CPU pool is empty, then the pool needs to be refilled
>   with a batch from the global pool and allocated from there.
> 
>   Allocation then always happens from the active per CPU batch slot.
> 
> free:
> 
>1) Early boot
> 
>   Just put it back on the dedicated boot list and be done
> 
>2) After obj_cache is initialized
> 
>   Put it back to the per CPU pool into the active batch slot.  If
>   the slot becomes full then make the next slot the active slot. It
>   the full slot was the top most slot then move that slot either
>   into the global pool when there is a free slot, or move it to the
>   to_free pool.
> 
> That means the per CPU pool is different from the global pools as it can
> allocate/free single objects, while the global pools are strictly stacks
> of batches. Any movement between per CPU pools and global pools is batch
> based and just moves lists from one head to another.
> 
> That minimizes the pool lock contention and the cache foot print. The
> global to free pool must have an extra twist to accomodate non-batch
> sized drops and to handle the all slots are full case, but that's just a
> trivial detail.

That's great. I really admire you for completing the refactor in such a
short of time. But I have a few minor comments.
1. When kmem_cache_zalloc() is called to allocate objs for filling,
   if less than one batch of objs are allocated, all of them can be
   pushed to the local CPU. That's, call pcpu_free() one by one.
   In this way, the number of free objs cached by pool_global and
   pool_to_free is always an integer multiple of ODEBUG_BATCH_SIZE.
2. Member tot_cnt of struct global_pool can be deleted. We can get it
   simply and quickly through (slot_idx * ODEBUG_BATCH_SIZE). Avoid
   redundant maintenance.
3. debug_objects_pool_min_level also needs to be adjusted accordingly,
   the number of batches of the min level.

> 
> See the completely untested combo patch against tip core/debugobjects
> below.
> 
> Thanks,
> 
> tglx

-- 
Regards,
  Zhen Lei



Re: [PATCH 3/3] debugobjects: Use hlist_cut_number() to optimize performance and improve readability

2024-09-11 Thread Leizhen (ThunderTown)



On 2024/9/11 16:54, Thomas Gleixner wrote:
> On Wed, Sep 11 2024 at 15:44, Leizhen wrote:
>> On 2024/9/10 19:44, Thomas Gleixner wrote:
>>> That minimizes the pool lock contention and the cache foot print. The
>>> global to free pool must have an extra twist to accomodate non-batch
>>> sized drops and to handle the all slots are full case, but that's just a
>>> trivial detail.
>>
>> That's great. I really admire you for completing the refactor in such a
>> short of time.
> 
> The trick is to look at it from the data model and not from the
> code. You need to sit down and think about which data model is required
> to achieve what you want. So the goal was batching, right?

Yes, when I found a hole in the road, I thought about how to fill it. But
you think more deeply, why is there a pit, is there a problem with the
foundation? I've benefited a lot from communicating with you these days.

> 
> That made it clear that the global pools need to be stacks of batches
> and never handle single objects because that makes it complex. As a
> consequence the per cpu pool is the one which does single object
> alloc/free and then either gets a full batch from the global pool or
> drops one into it. The rest is just mechanical.
> 
>> But I have a few minor comments.
>> 1. When kmem_cache_zalloc() is called to allocate objs for filling,
>>if less than one batch of objs are allocated, all of them can be
>>pushed to the local CPU. That's, call pcpu_free() one by one.
> 
> If that's the case then we should actually immediately give them back
> because thats a sign of memory pressure.

Yes, that makes sense, and that's a solution too.

> 
>> 2. Member tot_cnt of struct global_pool can be deleted. We can get it
>>simply and quickly through (slot_idx * ODEBUG_BATCH_SIZE). Avoid
>>redundant maintenance.
> 
> Agreed.
> 
>> 3. debug_objects_pool_min_level also needs to be adjusted accordingly,
>>the number of batches of the min level.
> 
> Sure. There are certainly more problems with that code. As I said, it's
> untested and way to big to be reviewed. I'll split it up into more
> manageable bits and pieces.

Looking forward to...

> 
> Thanks,
> 
> tglx
> .
> 

-- 
Regards,
  Zhen Lei



Re: [PATCH v2 2/2] ASoC: dt-bindings: renesas, rsnd: Clear warning 'ports' does not match any of the regexes

2021-04-08 Thread Leizhen (ThunderTown)



On 2021/4/7 10:04, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/4/2 4:20, Rob Herring wrote:
>> On Wed, Mar 31, 2021 at 05:16:16PM +0800, Zhen Lei wrote:
>>> Currently, if there are more than two ports, or if there is only one port
>>> but other properties(such as "#address-cells") is required, these ports
>>> are placed under the "ports" node. So add the schema of property "ports".
>>
>> A given binding should just use 'ports' or 'port' depending on it's 
>> need. Supporting both forms is needless complexity.

Hi Rob:
I don't think of a good way to avoid "port" and "ports" to be used at the same 
time.
Should I disable the use of "port"? Convert the two usages of "port" into 
"ports".
But usually no one will use both of them in one dts file. And even if it's used 
at
the same time, it's not a big mistake. So I decided not to test it.

> 
> Right, I'll adjust this patch again.
> 
>>
>>> Otherwise, warnings similar to the following will be reported:
>>> arch/arm64/boot/dts/renesas/r8a774a1-beacon-rzg2m-kit.dt.yaml: \
>>> sound@ec50: 'ports' does not match any of the regexes: \
>>> '^rcar_sound,ctu$', '^rcar_sound,dai$', '^rcar_sound,dvc$', ...
>>>
>>> Signed-off-by: Zhen Lei 
>>> ---
>>>  Documentation/devicetree/bindings/sound/renesas,rsnd.yaml | 5 +
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/Documentation/devicetree/bindings/sound/renesas,rsnd.yaml 
>>> b/Documentation/devicetree/bindings/sound/renesas,rsnd.yaml
>>> index 384191ee497f534..a42992fa687d3f3 100644
>>> --- a/Documentation/devicetree/bindings/sound/renesas,rsnd.yaml
>>> +++ b/Documentation/devicetree/bindings/sound/renesas,rsnd.yaml
>>> @@ -115,6 +115,11 @@ properties:
>>>  $ref: audio-graph-port.yaml#
>>>  unevaluatedProperties: false
>>>  
>>> +  ports:
>>
>>$ref: /schemas/graph.yaml#/properties/ports
> 
> OK, thanks
> 
>>
>>> +patternProperties:
>>> +  '^port@[0-9]':
>>> +$ref: "#/properties/port"
>>
>> Then this should be: $ref: audio-graph-port.yaml#
> 
> OK, thanks
> 
>>
>> Also, what each port is should be defined, but that's a separate 
>> problem.
>>
>>> +
>>>  # use patternProperties to avoid naming "xxx,yyy" issue
>>>  patternProperties:
>>>"^rcar_sound,dvc$":
>>> -- 
>>> 1.8.3
>>>
>>>
>>
>> .
>>



[Question] Can we use SIGRTMIN when vdso disabled on X86?

2018-06-05 Thread Leizhen (ThunderTown)
After I executed "echo 0 > /proc/sys/abi/vsyscall32" to disable vdso, the 
rt_sigaction01 test case from ltp_2015 failed.
The test case source code please refer to the attachment, and the output as 
blow:

-
./rt_sigaction01
rt_sigaction010  TINFO  :  signal: 34
rt_sigaction011  TPASS  :  rt_sigaction call succeeded: result = 0
rt_sigaction010  TINFO  :  sa.sa_flags = SA_RESETHAND|SA_SIGINFO
rt_sigaction010  TINFO  :  Signal Handler Called with signal number 34

Segmentation fault
--


Is this the desired result? In function ia32_setup_rt_frame, I found below code:

if (ksig->ka.sa.sa_flags & SA_RESTORER)
restorer = ksig->ka.sa.sa_restorer;
else
restorer = current->mm->context.vdso +
vdso_image_32.sym___kernel_rt_sigreturn;
put_user_ex(ptr_to_compat(restorer), &frame->pretcode);

Because the vdso is disabled, so current->mm->context.vdso is NULL, which cause 
the result of frame->pretcode invalid.

I'm not sure whether this is a kernel bug or just an error of test case itself. 
Can anyone help me?

-- 
Thanks!
BestRegards
/**/
/* Copyright (c) Crackerjack Project., 2007   */
/**/
/* This program is free software;  you can redistribute it and/or modify  */
/* it under the terms of the GNU General Public License as published by   */
/* the Free Software Foundation; either version 2 of the License, or  */
/* (at your option) any later version.*/
/**/
/* This program is distributed in the hope that it will be useful,*/
/* but WITHOUT ANY WARRANTY;  without even the implied warranty of*/
/* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See  */
/* the GNU General Public License for more details.   */
/**/
/* You should have received a copy of the GNU General Public License  */
/* along with this program;  if not, write to the Free Software Foundation,   */
/* Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA   */
/**/
/* History: Porting from Crackerjack to LTP is done by*/
/*  Manas Kumar Nayak makna...@in.ibm.com>*/
/**/

/**/
/* Description: This tests the rt_sigaction() syscall */
/*  rt_sigaction alters an action taken by a process on receipt   */
/*  of a particular signal. The action is specified by the*/
/*  sigaction structure. The previous action on the signal is */
/*  saved in oact.sigsetsize should indicate the size of a*/
/*  sigset_t type.*/
/**/

#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include "test.h"
#include "linux_syscall_numbers.h"
#include "lapi/rt_sigaction.h"

char *TCID = "rt_sigaction01";
static int testno;
int TST_TOTAL = 1;

static void cleanup(void)
{
tst_rmdir();
}

static void setup(void)
{
TEST_PAUSE;
tst_tmpdir();
}

static int test_flags[] =
{ SA_RESETHAND | SA_SIGINFO, SA_RESETHAND, SA_RESETHAND | SA_SIGINFO,
SA_RESETHAND | SA_SIGINFO, SA_NOMASK };
char *test_flags_list[] =
{ "SA_RESETHAND|SA_SIGINFO", "SA_RESETHAND", "SA_RESETHAND|SA_SIGINFO",
"SA_RESETHAND|SA_SIGINFO", "SA_NOMASK" };

static void handler(int sig)
{
tst_resm(TINFO, "Signal Handler Called with signal number %d\n", sig);
return;
}

static int set_handler(int sig, int sig_to_mask, int mask_flags)
{
struct sigaction sa, oldaction;

sa.sa_handler = (void *)handler;
sa.sa_flags = mask_flags;
sigemptyset(&sa.sa_mask);
sigaddset(&sa.sa_mask, sig);

return ltp_rt_sigaction(sig, &sa, &oldaction, SIGSETSIZE);
}

int main(int ac, char **av)
{
unsigned int flag;
int signal;
int lc;

tst_parse_opts(ac, av, NULL, NULL);

setup();

for (lc = 0; TEST_LOOPING(lc); ++lc) {

tst_count = 0;

for (testno = 0; testno < TST_TOTAL; ++testno) {

for (signal = SIGRTMIN; signal <= SIGRTMAX; signal++) {
for (flag = 0;
 flag <
 

Is this a kernel BUG? ///Re: [Question] Can we use SIGRTMIN when vdso disabled on X86?

2018-06-06 Thread Leizhen (ThunderTown)



On 2018/6/5 19:24, Leizhen (ThunderTown) wrote:
> After I executed "echo 0 > /proc/sys/abi/vsyscall32" to disable vdso, the 
> rt_sigaction01 test case from ltp_2015 failed.
> The test case source code please refer to the attachment, and the output as 
> blow:
> 
> -
> ./rt_sigaction01
> rt_sigaction010  TINFO  :  signal: 34
> rt_sigaction011  TPASS  :  rt_sigaction call succeeded: result = 0
> rt_sigaction010  TINFO  :  sa.sa_flags = SA_RESETHAND|SA_SIGINFO
> rt_sigaction010  TINFO  :  Signal Handler Called with signal number 34
> 
> Segmentation fault
> --
> 
> 
> Is this the desired result? In function ia32_setup_rt_frame, I found below 
> code:
> 
>   if (ksig->ka.sa.sa_flags & SA_RESTORER)
>   restorer = ksig->ka.sa.sa_restorer;
>   else
>   restorer = current->mm->context.vdso +
>   vdso_image_32.sym___kernel_rt_sigreturn;
>   put_user_ex(ptr_to_compat(restorer), &frame->pretcode);
> 
> Because the vdso is disabled, so current->mm->context.vdso is NULL, which 
> cause the result of frame->pretcode invalid.
> 
> I'm not sure whether this is a kernel bug or just an error of test case 
> itself. Can anyone help me?
> 

-- 
Thanks!
BestRegards



Re: Is this a kernel BUG? ///Re: [Question] Can we use SIGRTMIN when vdso disabled on X86?

2018-06-06 Thread Leizhen (ThunderTown)
I found that glibc has already dealt with this case. So this issue must have 
been met before, should it be maintained by libc/user?

if (GLRO(dl_sysinfo_dso) == NULL)
{
kact.sa_flags |= SA_RESTORER;

kact.sa_restorer = ((act->sa_flags & SA_SIGINFO)
? &restore_rt : &restore);
}


On 2018/6/6 15:52, Leizhen (ThunderTown) wrote:
> 
> 
> On 2018/6/5 19:24, Leizhen (ThunderTown) wrote:
>> After I executed "echo 0 > /proc/sys/abi/vsyscall32" to disable vdso, the 
>> rt_sigaction01 test case from ltp_2015 failed.
>> The test case source code please refer to the attachment, and the output as 
>> blow:
>>
>> -
>> ./rt_sigaction01
>> rt_sigaction010  TINFO  :  signal: 34
>> rt_sigaction011  TPASS  :  rt_sigaction call succeeded: result = 0
>> rt_sigaction010  TINFO  :  sa.sa_flags = SA_RESETHAND|SA_SIGINFO
>> rt_sigaction010  TINFO  :  Signal Handler Called with signal number 34
>>
>> Segmentation fault
>> --
>>
>>
>> Is this the desired result? In function ia32_setup_rt_frame, I found below 
>> code:
>>
>>  if (ksig->ka.sa.sa_flags & SA_RESTORER)
>>  restorer = ksig->ka.sa.sa_restorer;
>>  else
>>  restorer = current->mm->context.vdso +
>>  vdso_image_32.sym___kernel_rt_sigreturn;
>>  put_user_ex(ptr_to_compat(restorer), &frame->pretcode);
>>
>> Because the vdso is disabled, so current->mm->context.vdso is NULL, which 
>> cause the result of frame->pretcode invalid.
>>
>> I'm not sure whether this is a kernel bug or just an error of test case 
>> itself. Can anyone help me?
>>
> 

-- 
Thanks!
BestRegards



Re: Is this a kernel BUG? ///Re: [Question] Can we use SIGRTMIN when vdso disabled on X86?

2018-06-06 Thread Leizhen (ThunderTown)



On 2018/6/7 1:48, h...@zytor.com wrote:
> On June 6, 2018 2:17:42 AM PDT, "Leizhen (ThunderTown)" 
>  wrote:
>> I found that glibc has already dealt with this case. So this issue must
>> have been met before, should it be maintained by libc/user?
>>
>>  if (GLRO(dl_sysinfo_dso) == NULL)
>>  {
>>  kact.sa_flags |= SA_RESTORER;
>>
>>  kact.sa_restorer = ((act->sa_flags & SA_SIGINFO)
>>          ? &restore_rt : &restore);
>>  }
>>
>>
>> On 2018/6/6 15:52, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2018/6/5 19:24, Leizhen (ThunderTown) wrote:
>>>> After I executed "echo 0 > /proc/sys/abi/vsyscall32" to disable
>> vdso, the rt_sigaction01 test case from ltp_2015 failed.
>>>> The test case source code please refer to the attachment, and the
>> output as blow:
>>>>
>>>> -
>>>> ./rt_sigaction01
>>>> rt_sigaction010  TINFO  :  signal: 34
>>>> rt_sigaction011  TPASS  :  rt_sigaction call succeeded: result =
>> 0
>>>> rt_sigaction010  TINFO  :  sa.sa_flags = SA_RESETHAND|SA_SIGINFO
>>>> rt_sigaction010  TINFO  :  Signal Handler Called with signal
>> number 34
>>>>
>>>> Segmentation fault
>>>> --
>>>>
>>>>
>>>> Is this the desired result? In function ia32_setup_rt_frame, I found
>> below code:
>>>>
>>>>if (ksig->ka.sa.sa_flags & SA_RESTORER)
>>>>restorer = ksig->ka.sa.sa_restorer;
>>>>else
>>>>restorer = current->mm->context.vdso +
>>>>vdso_image_32.sym___kernel_rt_sigreturn;
>>>>put_user_ex(ptr_to_compat(restorer), &frame->pretcode);
>>>>
>>>> Because the vdso is disabled, so current->mm->context.vdso is NULL,
>> which cause the result of frame->pretcode invalid.
>>>>
>>>> I'm not sure whether this is a kernel bug or just an error of test
>> case itself. Can anyone help me?
>>>>
>>>
> 
> The use of signals without SA_RESTORER is considered obsolete, but it's 
> somewhat surprising that the vdso isn't there; it should be mapped even for 
> static binaries esp. on i386 since it is the preferred way to do system calls 
> (you don't need to parse the ELF for that.) Are you explicitly disabling the 
> VDSO? If so, Don't Do That.

Yes, the vdso was explicitly disabled by the tester. Thanks.

> 

-- 
Thanks!
BestRegards



Re: Is this a kernel BUG? ///Re: [Question] Can we use SIGRTMIN when vdso disabled on X86?

2018-06-06 Thread Leizhen (ThunderTown)



On 2018/6/7 1:01, Andy Lutomirski wrote:
> On Wed, Jun 6, 2018 at 2:18 AM Leizhen (ThunderTown)
>  wrote:
>>
>> I found that glibc has already dealt with this case. So this issue must have 
>> been met before, should it be maintained by libc/user?
>>
>> if (GLRO(dl_sysinfo_dso) == NULL)
>> {
>> kact.sa_flags |= SA_RESTORER;
>>
>> kact.sa_restorer = ((act->sa_flags & SA_SIGINFO)
>> ? &restore_rt : &restore);
>> }
>>
>>
>> On 2018/6/6 15:52, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2018/6/5 19:24, Leizhen (ThunderTown) wrote:
>>>> After I executed "echo 0 > /proc/sys/abi/vsyscall32" to disable vdso, the 
>>>> rt_sigaction01 test case from ltp_2015 failed.
>>>> The test case source code please refer to the attachment, and the output 
>>>> as blow:
>>>>
>>>> -
>>>> ./rt_sigaction01
>>>> rt_sigaction010  TINFO  :  signal: 34
>>>> rt_sigaction011  TPASS  :  rt_sigaction call succeeded: result = 0
>>>> rt_sigaction010  TINFO  :  sa.sa_flags = SA_RESETHAND|SA_SIGINFO
>>>> rt_sigaction010  TINFO  :  Signal Handler Called with signal number 34
>>>>
>>>> Segmentation fault
>>>> --
>>>>
>>>>
>>>> Is this the desired result? In function ia32_setup_rt_frame, I found below 
>>>> code:
>>>>
>>>>  if (ksig->ka.sa.sa_flags & SA_RESTORER)
>>>>  restorer = ksig->ka.sa.sa_restorer;
>>>>  else
>>>>  restorer = current->mm->context.vdso +
>>>>  vdso_image_32.sym___kernel_rt_sigreturn;
>>>>  put_user_ex(ptr_to_compat(restorer), &frame->pretcode);
>>>>
>>>> Because the vdso is disabled, so current->mm->context.vdso is NULL, which 
>>>> cause the result of frame->pretcode invalid.
>>>>
>>>> I'm not sure whether this is a kernel bug or just an error of test case 
>>>> itself. Can anyone help me?
>>>>
>>>
>>
>>
> 
> I can't tell from your email what you're testing, what behavior you
> expect, and what you saw.  A program that sets up a signal handler
> without supplying a restorer will not work if the vDSO is off, and
> this is by design.
OK, so that the user should take care whether the vDSO is disabled by itself or 
not, and use different strategies to process it appropriately, like glibc.

> 
> (FWIW, there is a very longstanding libc bug that causes this case to
> get severely screwed up if the user's SS is not the expected value,
> and that bug was just fixed very recently.  But I doubt this is what
> you're seeing.)
> 
> I suppose we could improve the kernel to at least push NULL instead of
> some random address a bit above 0, but it'll still crash.
Should we add a warning? Which may help the user to aware this error in time.

> 
> .
> 

-- 
Thanks!
BestRegards



Re: Is this a kernel BUG? ///Re: [Question] Can we use SIGRTMIN when vdso disabled on X86?

2018-06-06 Thread Leizhen (ThunderTown)



On 2018/6/7 10:39, Andy Lutomirski wrote:
> 
> 
>> On Jun 6, 2018, at 7:05 PM, Leizhen (ThunderTown) 
>>  wrote:
>>
>>
>>
>>> On 2018/6/7 1:01, Andy Lutomirski wrote:
>>> On Wed, Jun 6, 2018 at 2:18 AM Leizhen (ThunderTown)
>>>  wrote:
>>>>
>>>> I found that glibc has already dealt with this case. So this issue must 
>>>> have been met before, should it be maintained by libc/user?
>>>>
>>>>if (GLRO(dl_sysinfo_dso) == NULL)
>>>>{
>>>>kact.sa_flags |= SA_RESTORER;
>>>>
>>>>    kact.sa_restorer = ((act->sa_flags & SA_SIGINFO)
>>>>? &restore_rt : &restore);
>>>>}
>>>>
>>>>
>>>>> On 2018/6/6 15:52, Leizhen (ThunderTown) wrote:
>>>>>
>>>>>
>>>>>> On 2018/6/5 19:24, Leizhen (ThunderTown) wrote:
>>>>>> After I executed "echo 0 > /proc/sys/abi/vsyscall32" to disable vdso, 
>>>>>> the rt_sigaction01 test case from ltp_2015 failed.
>>>>>> The test case source code please refer to the attachment, and the output 
>>>>>> as blow:
>>>>>>
>>>>>> -
>>>>>> ./rt_sigaction01
>>>>>> rt_sigaction010  TINFO  :  signal: 34
>>>>>> rt_sigaction011  TPASS  :  rt_sigaction call succeeded: result = 0
>>>>>> rt_sigaction010  TINFO  :  sa.sa_flags = SA_RESETHAND|SA_SIGINFO
>>>>>> rt_sigaction010  TINFO  :  Signal Handler Called with signal number 
>>>>>> 34
>>>>>>
>>>>>> Segmentation fault
>>>>>> --
>>>>>>
>>>>>>
>>>>>> Is this the desired result? In function ia32_setup_rt_frame, I found 
>>>>>> below code:
>>>>>>
>>>>>> if (ksig->ka.sa.sa_flags & SA_RESTORER)
>>>>>> restorer = ksig->ka.sa.sa_restorer;
>>>>>> else
>>>>>> restorer = current->mm->context.vdso +
>>>>>> vdso_image_32.sym___kernel_rt_sigreturn;
>>>>>> put_user_ex(ptr_to_compat(restorer), &frame->pretcode);
>>>>>>
>>>>>> Because the vdso is disabled, so current->mm->context.vdso is NULL, 
>>>>>> which cause the result of frame->pretcode invalid.
>>>>>>
>>>>>> I'm not sure whether this is a kernel bug or just an error of test case 
>>>>>> itself. Can anyone help me?
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>> I can't tell from your email what you're testing, what behavior you
>>> expect, and what you saw.  A program that sets up a signal handler
>>> without supplying a restorer will not work if the vDSO is off, and
>>> this is by design.
>> OK, so that the user should take care whether the vDSO is disabled by itself 
>> or not, and use different strategies to process it appropriately, like glibc.
>>
>>>
>>> (FWIW, there is a very longstanding libc bug that causes this case to
>>> get severely screwed up if the user's SS is not the expected value,
>>> and that bug was just fixed very recently.  But I doubt this is what
>>> you're seeing.)
>>>
>>> I suppose we could improve the kernel to at least push NULL instead of
>>> some random address a bit above 0, but it'll still crash.
>> Should we add a warning? Which may help the user to aware this error in time.
>>
> 
> It’s entirely valid to have a non working restorer if you never plan to 
> return from a signal handler. And anyone who writes their own libc should be 
> able to figure this out on their own, I think.

OK. Thanks a lot.

> 
>>>
>>> .
>>>
>>
>> -- 
>> Thanks!
>> BestRegards
>>
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH RFC 1/1] iommu: set the default iommu-dma mode as non-strict

2019-02-28 Thread Leizhen (ThunderTown)



On 2019/2/26 20:36, Hanjun Guo wrote:
> Hi Jean,
> 
> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>> Hi,
>>
>> On 31/01/2019 13:52, Zhen Lei wrote:
>>> Currently, many peripherals are faster than before. For example, the top
>>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>>> when iommu page-table mapping enabled, it's hard to reach the top speed
>>> in strict mode, because of frequently map and unmap operations. In order
>>> to keep abreast of the times, I think it's better to set non-strict as
>>> default.
>>
>> Most users won't be aware of this relaxation and will have their system
>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>> Invalidation in
>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
Hi Jean,

   In fact, we have discussed the vulnerable of deferred invalidation before 
upstream
the non-strict patches. The attacks maybe possible because of an untrusted 
device or
the mistake of the device driver. And we limited the VFIO to still use strict 
mode.
   As mentioned in the pdf, limit the freed memory with deferred invalidation 
only to
be reused by the device, can mitigate the vulnerability. But it's too hard to 
implement
it now.
   A compromise maybe we only apply non-strict to (1) dma_free_coherent, 
because the
memory is controlled by DMA common module, so we can make the memory to be 
freed after
the global invalidation in the timer handler. (2) And provide some new APIs 
related to
iommu_unmap_page/sg, these new APIs deferred invalidation. And the candiate 
device
drivers update the APIs if they want to improve performance. (3) Make sure that 
only
the trusted devices and trusted drivers can apply (1) and (2). For example, the 
driver
must be built into kernel Image.
   So that some high-end trusted devices use non-strict mode, and keep others 
still using
strict mode. The drivers who want to use non-strict mode, should change to use 
new APIs
by themselves.


>>
>> Why not keep the policy to secure by default, as we do for
>> iommu.passthrough? And maybe add something similar to
>> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
>> command-line argument or change the default config.
> 
> Sorry for the late reply, it was Chinese new year, and we had a long 
> discussion
> internally, we are fine to add a Kconfig but not sure OS vendors will set it
> to default y.
> 
> OS vendors seems not happy to pass a command-line argument, to be honest,
> this is our motivation to enable non-strict as default. Hope OS vendors
> can see this email thread, and give some input here.
> 
> Thanks
> Hanjun
> 
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH 2/5] iommu/arm-smmu-v3: make smmu can be enabled in kdump kernel

2019-03-01 Thread Leizhen (ThunderTown)
It seems that the picture is too big, I change it from jpg to png.


On 2019/3/1 17:02, Leizhen (ThunderTown) wrote:
> Hi All,
> I drew a flowchart, hope this can help you to understand my method.
> 
> On 2019/2/19 15:54, Zhen Lei wrote:
>> To reduce the risk of further crash, the device_shutdown() was not called
>> by the first kernel. That means some devices may still working in the
>> secondary kernel. For example, a netcard may still using ring buffer to
>> receive the broadcast messages in the kdump kernel. No events are reported
>> utill the related smmu reinitialized by the kdump kernel.
>>
>> commit b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is
>> enabled in kdump kernel") set SMMU_GBPA.ABORT to prevent the unexpected
>> devices accessing, but it also prevent the devices accessing which we
>> needed, like hard disk, netcard.
>>
>> In fact, we can use STE.config=0b000 to abort the unexpected devices
>> accessing only. As below:
>> 1. In the first kernel, all buffers used by the "unexpected" devices are
>>correctly mapped, and it will not be used by the secondary kernel
>>because the latter has its dedicated reserved memory.
>> 2. In the secondary kernel, set SMMU_GBPA.ABORT=1 before "disable smmu".
>> 3. In the secondary kernel, after the smmu was disabled, preset all
>>STE.config=0b000. For 2-level Stream Table, make all L1STD.l2ptr
>>pointer to a dummy L2ST. The dummy L2ST is shared by all L1STDs.
>> 4. In the secondary kernel, enable smmu. For the needed devices, allocate
>>new L2STs accordingly.
>>
>> For phase 1 and 2, the unexpected devices base the old mapping access
>> memory, it will not corrupt others. For phase 3, SMMU_GBPA abort it. For
>> phase 4 STE abort it.
>>
>> Fixes: commit b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions ...")
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 72 
>> -
>>  1 file changed, 51 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 2072897..c3c4ff2 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -1219,35 +1219,57 @@ static void arm_smmu_init_bypass_stes(u64 *strtab, 
>> unsigned int nent)
>>  }
>>  }
>>
>> -static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>> +static int __arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid,
>> + struct arm_smmu_strtab_l1_desc *desc)
>>  {
>> -size_t size;
>>  void *strtab;
>>  struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>> -struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> 
>> STRTAB_SPLIT];
>>
>> -if (desc->l2ptr)
>> -return 0;
>> -
>> -size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
>>  strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
>>
>> -desc->span = STRTAB_SPLIT + 1;
>> -desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
>> -  GFP_KERNEL | __GFP_ZERO);
>>  if (!desc->l2ptr) {
>> -dev_err(smmu->dev,
>> -"failed to allocate l2 stream table for SID %u\n",
>> -sid);
>> -return -ENOMEM;
>> +size_t size;
>> +
>> +size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
>> +desc->l2ptr = dmam_alloc_coherent(smmu->dev, size,
>> +  &desc->l2ptr_dma,
>> +  GFP_KERNEL | __GFP_ZERO);
>> +if (!desc->l2ptr) {
>> +dev_err(smmu->dev,
>> +"failed to allocate l2 stream table for SID 
>> %u\n",
>> +sid);
>> +return -ENOMEM;
>> +}
>> +
>> +desc->span = STRTAB_SPLIT + 1;
>> +arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
>>  }
>>
>> -arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
>>  arm_smmu_write_strtab_l1_desc(strtab, desc);
>> +return 0;
>> +}
>> +
>> +static int arm_smmu_init_l2_strtab(struct arm_smmu_de

Re: [PATCH RFC 1/1] iommu: set the default iommu-dma mode as non-strict

2019-03-01 Thread Leizhen (ThunderTown)



On 2019/3/1 19:07, Jean-Philippe Brucker wrote:
> Hi Leizhen,
> 
> On 01/03/2019 04:44, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2019/2/26 20:36, Hanjun Guo wrote:
>>> Hi Jean,
>>>
>>> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>>>> Hi,
>>>>
>>>> On 31/01/2019 13:52, Zhen Lei wrote:
>>>>> Currently, many peripherals are faster than before. For example, the top
>>>>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>>>>> when iommu page-table mapping enabled, it's hard to reach the top speed
>>>>> in strict mode, because of frequently map and unmap operations. In order
>>>>> to keep abreast of the times, I think it's better to set non-strict as
>>>>> default.
>>>>
>>>> Most users won't be aware of this relaxation and will have their system
>>>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>>>> Invalidation in
>>>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
>> Hi Jean,
>>
>>In fact, we have discussed the vulnerable of deferred invalidation before 
>> upstream
>> the non-strict patches. The attacks maybe possible because of an untrusted 
>> device or
>> the mistake of the device driver. And we limited the VFIO to still use 
>> strict mode.
>>As mentioned in the pdf, limit the freed memory with deferred 
>> invalidation only to
>> be reused by the device, can mitigate the vulnerability. But it's too hard 
>> to implement
>> it now.
>>A compromise maybe we only apply non-strict to (1) dma_free_coherent, 
>> because the
>> memory is controlled by DMA common module, so we can make the memory to be 
>> freed after
>> the global invalidation in the timer handler. (2) And provide some new APIs 
>> related to
>> iommu_unmap_page/sg, these new APIs deferred invalidation. And the candiate 
>> device
>> drivers update the APIs if they want to improve performance. (3) Make sure 
>> that only
>> the trusted devices and trusted drivers can apply (1) and (2). For example, 
>> the driver
>> must be built into kernel Image.
> 
> Do we have a notion of untrusted kernel drivers? A userspace driver
It seems impossible to have such driver. The modules insmod by root users 
should be
guaranteed by themselves.

> (VFIO) is untrusted, ok. But a malicious driver loaded into the kernel
> address space would have much easier ways to corrupt the system than to
> exploit lazy mode...
Yes, so that we have no need to consider untrusted drivers.

> 
> For (3), I agree that we should at least disallow lazy mode if
> pci_dev->untrusted is set. At the moment it means that we require the
> strictest IOMMU configuration for external-facing PCI ports, but it can
> be extended to blacklist other vulnerable devices or locations.
I plan to add an attribute file for each device, espcially for hotplug devices. 
And
let the root users to decide which mode should be used, strict or non-strict. 
Becasue
they should known whether the hot-plug divice is trusted or not.

> 
> If you do (3) then maybe we don't need (1) and (2), which require a
> tonne of work in the DMA and IOMMU layers (but would certainly be nice
> to see, since it would also help handle ATS invalidation timeouts)
> 
> Thanks,
> Jean
> 
>>So that some high-end trusted devices use non-strict mode, and keep 
>> others still using
>> strict mode. The drivers who want to use non-strict mode, should change to 
>> use new APIs
>> by themselves.
>>
>>
>>>>
>>>> Why not keep the policy to secure by default, as we do for
>>>> iommu.passthrough? And maybe add something similar to
>>>> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
>>>> command-line argument or change the default config.
>>>
>>> Sorry for the late reply, it was Chinese new year, and we had a long 
>>> discussion
>>> internally, we are fine to add a Kconfig but not sure OS vendors will set it
>>> to default y.
>>>
>>> OS vendors seems not happy to pass a command-line argument, to be honest,
>>> this is our motivation to enable non-strict as default. Hope OS vendors
>>> can see this email thread, and give some input here.
>>>
>>> Thanks
>>> Hanjun
>>>
>>>
>>> .
>>>
>>
> 
> 
> .
> 

-- 
Thanks!
BestRegards



Re: [PATCH 0/5] iommu/arm-smmu-v3: make smmu can be enabled in kdump kernel

2019-02-26 Thread Leizhen (ThunderTown)
Hi Will, Robin:
   Do you have time to review these patches? Hope you can give me some opinions.


On 2019/2/19 15:54, Zhen Lei wrote:
> This patch series include two parts:
> 1. Patch1-2 use dummy STE tables with "ste abort" hardware feature to abort 
> unexpected
>devices accessing. For more details, see the description in patch 2.
> 2. If the "ste abort" feature is not support, force the unexpected devices in 
> the
>secondary kernel to use the memory maps which it used in the first kernel. 
> For more
>details, see patch 5.
> 
> 
> Zhen Lei (5):
>   iommu/arm-smmu-v3: make sure the stale caching of L1STD are invalid
>   iommu/arm-smmu-v3: make smmu can be enabled in kdump kernel
>   iommu/arm-smmu-v3: add macro xxx_SIZE to replace xxx_DWORDS shift
>   iommu/arm-smmu-v3: move arm_smmu_get_step_for_sid() a little ahead
>   iommu/arm-smmu-v3: workaround for STE abort in kdump kernel
> 
>  drivers/iommu/arm-smmu-v3.c | 243 
> +++-
>  1 file changed, 194 insertions(+), 49 deletions(-)
> 

-- 
Thanks!
BestRegards



Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

2021-01-28 Thread Leizhen (ThunderTown)



On 2021/1/28 22:24, Arnd Bergmann wrote:
> On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei  wrote:
>> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
>> +
>> +static void l3cache_maint_common(u32 range, u32 op_type)
>> +{
>> +   u32 reg;
>> +
>> +   reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>> +   reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>> +   reg |= range | op_type;
>> +   reg |= L3_MAINT_STATUS_START;
>> +   writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
> 
> Are there contents of L3_MAINT_CTRL that need to be preserved
> across calls and can not be inferred? A 'readl()' is often expensive,
> so it might be more efficient if you can avoid that.

Right, this readl() can be replaced with readl_relaxed(). Thanks.

I'll check and correct the readl() and writel() in other places.

> 
>> +static inline void l3cache_flush_all_nolock(void)
>> +{
>> +   l3cache_maint_common(L3_MAINT_RANGE_ALL, L3_MAINT_TYPE_FLUSH);
>> +}
>> +
>> +static void l3cache_flush_all(void)
>> +{
>> +   unsigned long flags;
>> +
>> +   spin_lock_irqsave(&l3cache_lock, flags);
>> +   l3cache_flush_all_nolock();
>> +   spin_unlock_irqrestore(&l3cache_lock, flags);
>> +}
> 
> I see that cache-l2x0 uses raw_spin_lock_irqsave() instead of
> spin_lock_irqsave(), to avoid preemption in the middle of a cache
> operation. This is probably a good idea here as well.

I don't think there's any essential difference between the two! I don't know
if the compiler or tool will do anything extra. I checked the git log of the
l2x0 driver and it used raw_spin_lock_irqsave() at the beginning. Maybe
there's a description in 2.6. Since you mentioned this potential risk, I'll
change it to raw_spin_lock_irqsave.

include/linux/spinlock.h:
static __always_inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
{
return &lock->rlock;
}

#define spin_lock_irqsave(lock, flags)  \
do {\
raw_spin_lock_irqsave(spinlock_check(lock), flags); \
} while (0)

> 
> I also see that l2x0 uses readl_relaxed(), to avoid a deadlock
> in l2x0_cache_sync(). This may also be beneficial for performance
> reasons, so it might be helpful to compare performance
> overhead. On the other hand, readl()/writel() are usually the
> safe choice, as those avoid the need to argue over whether
> the relaxed versions are safe in all corner cases.
> 
>> +static int __init l3cache_init(void)
>> +{
>> +   u32 reg;
>> +   struct device_node *node;
>> +
>> +   node = of_find_matching_node(NULL, l3cache_ids);
>> +   if (!node)
>> +   return -ENODEV;
> 
> I think the initcall should return '0' to indicate success when running
> a kernel with this driver built-in on a platform that does not have
> this device.

I have added "depends on ARCH_KUNPENG50X" for this driver. But it's OK to
return 0.

> 
>> diff --git a/arch/arm/mm/cache-kunpeng-l3.h b/arch/arm/mm/cache-kunpeng-l3.h
>> new file mode 100644
>> index 000..9ef6a53e7d4db49
>> --- /dev/null
>> +++ b/arch/arm/mm/cache-kunpeng-l3.h
>> @@ -0,0 +1,30 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef __CACHE_KUNPENG_L3_H
>> +#define __CACHE_KUNPENG_L3_H
>> +
>> +#define L3_CACHE_LINE_SHITF6
> 
> I would suggest moving the contents of the header file into the .c file,
> since there is only a single user of these macros.

Okay, I'll move it.

> 
>   Arnd
> 
> .
> 



Re: [PATCH v3 2/3] perf/smmuv3: Add a MODULE_SOFTDEP() to indicate dependency on SMMU

2021-01-29 Thread Leizhen (ThunderTown)



On 2021/1/30 1:03, Robin Murphy wrote:
> On 2021-01-29 15:34, John Garry wrote:
>> On 29/01/2021 15:12, Robin Murphy wrote:
>>> On 2021-01-27 11:32, Zhen Lei wrote:
 The MODULE_SOFTDEP() gives user space a hint of the loading sequence. And
 when command "modprobe arm_smmuv3_pmu" is executed, the arm_smmu_v3.ko is
 automatically loaded in advance.
>>>
>>> Why do we need this? If probe order doesn't matter when both drivers are 
>>> built-in, why should module load order?
>>>
>>> TBH I'm not sure why we even have a Kconfig dependency on ARM_SMMU_V3, 
>>> given that the drivers operate completely independently :/
>>
>> Can that Kconfig dependency just be removed? I think that it was added under 
>> the idea that there is no point in having the SMMUv3 PMU driver without the 
>> SMMUv3 driver.
> 
> A PMCG *might* be usable for simply counting transactions to measure device 
> activity regardless of its associated SMMU being enabled.

If that's the case, the SOFTDEP really shouldn't be added. I wasn't trying to 
make sure they were loaded in order, just to make sure that the SMMU was not 
forgotten to load.

> Either way, it's not really Kconfig's job to decide what makes sense (beyond 
> the top-level "can this driver *ever* be used on this platform" visibility 
> choices). Imagine if we gave every PCI/USB/etc. device driver an explicit 
> ?dependency on at least one host controller driver being enabled...
> 
> Robin.
> 
> .
> 



Re: [PATCH v3 1/3] perf/smmuv3: Don't reserve the PMCG register spaces

2021-01-29 Thread Leizhen (ThunderTown)



On 2021/1/29 23:06, Robin Murphy wrote:
> On 2021-01-27 11:32, Zhen Lei wrote:
>> According to the SMMUv3 specification:
>> Each PMCG counter group is represented by one 4KB page (Page 0) with one
>> optional additional 4KB page (Page 1), both of which are at IMPLEMENTATION
>> DEFINED base addresses.
>>
>> This means that the PMCG register spaces may be within the 64KB pages of
>> the SMMUv3 register space. When both the SMMU and PMCG drivers reserve
>> their own resources, a resource conflict occurs.
>>
>> To avoid this conflict, don't reserve the PMCG regions.
> 
> I'm still not a fan of this get_and_ioremap notion in general, especially 
> when the "helper" function ends up over twice the size of all the code it 
> replaces[1], but for the actual functional change here,

OK,I'll change it to [1] and add some comments.

> 
> Reviewed-by: Robin Murphy 
> 
>> Suggested-by: Robin Murphy 
>> Signed-off-by: Zhen Lei 
>> ---
>>   drivers/perf/arm_smmuv3_pmu.c | 27 +--
>>   1 file changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>> index 74474bb322c3f26..e5e505a0804fe53 100644
>> --- a/drivers/perf/arm_smmuv3_pmu.c
>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>> @@ -761,6 +761,29 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu 
>> *smmu_pmu)
>>   dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
>>   }
>>   +static void __iomem *
>> +smmu_pmu_get_and_ioremap_resource(struct platform_device *pdev,
>> +  unsigned int index,
>> +  struct resource **res)
>> +{
>> +    void __iomem *base;
>> +    struct resource *r;
>> +
>> +    r = platform_get_resource(pdev, IORESOURCE_MEM, index);
>> +    if (!r) {
>> +    dev_err(&pdev->dev, "invalid resource\n");
>> +    return ERR_PTR(-EINVAL);
>> +    }
>> +    if (res)
>> +    *res = r;
>> +
>> +    base = devm_ioremap(&pdev->dev, r->start, resource_size(r));
>> +    if (!base)
>> +    return ERR_PTR(-ENOMEM);
>> +
>> +    return base;
>> +}
>> +
>>   static int smmu_pmu_probe(struct platform_device *pdev)
>>   {
>>   struct smmu_pmu *smmu_pmu;
>> @@ -793,7 +816,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>   .capabilities    = PERF_PMU_CAP_NO_EXCLUDE,
>>   };
>>   -    smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, 
>> &res_0);
>> +    smmu_pmu->reg_base = smmu_pmu_get_and_ioremap_resource(pdev, 0, &res_0);
>>   if (IS_ERR(smmu_pmu->reg_base))
>>   return PTR_ERR(smmu_pmu->reg_base);
>>   @@ -801,7 +824,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>     /* Determine if page 1 is present */
>>   if (cfgr & SMMU_PMCG_CFGR_RELOC_CTRS) {
>> -    smmu_pmu->reloc_base = devm_platform_ioremap_resource(pdev, 1);
>> +    smmu_pmu->reloc_base = smmu_pmu_get_and_ioremap_resource(pdev, 1, 
>> NULL);
>>   if (IS_ERR(smmu_pmu->reloc_base))
>>   return PTR_ERR(smmu_pmu->reloc_base);
>>   } else {
>>
> 
> [1]
> ->8-
>  drivers/perf/arm_smmuv3_pmu.c | 35 +--
>  1 file changed, 9 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> index e5e505a0804f..c9adbc7b55a1 100644
> --- a/drivers/perf/arm_smmuv3_pmu.c
> +++ b/drivers/perf/arm_smmuv3_pmu.c
> @@ -761,33 +761,10 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu 
> *smmu_pmu)
>  dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
>  }
> 
> -static void __iomem *
> -smmu_pmu_get_and_ioremap_resource(struct platform_device *pdev,
> -  unsigned int index,
> -  struct resource **res)
> -{
> -    void __iomem *base;
> -    struct resource *r;
> -
> -    r = platform_get_resource(pdev, IORESOURCE_MEM, index);
> -    if (!r) {
> -    dev_err(&pdev->dev, "invalid resource\n");
> -    return ERR_PTR(-EINVAL);
> -    }
> -    if (res)
> -    *res = r;
> -
> -    base = devm_ioremap(&pdev->dev, r->start, resource_size(r));
> -    if (!base)
> -    return ERR_PTR(-ENOMEM);
> -
> -    return base;
> -}
> -
>  static int smmu_pmu_probe(struct platform_device *pdev)
>  {
>  struct smmu_pmu *smmu_pmu;
> -    struct resource *res_0;
> +    struct resource *res_0, *res_1;
>  u32 cfgr, reg_size;
>  u64 ceid_64[2];
>  int irq, err;
> @@ -816,7 +793,10 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>  .capabilities    = PERF_PMU_CAP_NO_EXCLUDE,
>  };
> 
> -    smmu_pmu->reg_base = smmu_pmu_get_and_ioremap_resource(pdev, 0, &res_0);
> +    res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +    if (!res_0)
> +    return ERR_PTR(-EINVAL);
> +    smmu_pmu->reg_base = devm_ioremap(dev, res_0->start, 
> resource_size(res_0));
>  if (IS_ERR(smmu_pmu->reg_base))
>  return PTR_ERR(smmu_pmu->reg_base);
> 
> @@ -824,7 +804,10 @@ static int sm

Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

2021-01-30 Thread Leizhen (ThunderTown)



On 2021/1/29 18:33, Russell King - ARM Linux admin wrote:
> On Fri, Jan 29, 2021 at 11:26:38AM +0100, Arnd Bergmann wrote:
>> Another clarification, as there are actually two independent
>> points here:
>>
>> * if you can completely remove the readl() above and just write a
>>   hardcoded value into the register, or perhaps read the original
>>   value once at boot time, that is probably a win because it
>>   avoids one of the barriers in the beginning. The datasheet should
>>   tell you if there are any bits in the register that have to be
>>   preserved
>>
>> * Regarding the _relaxed() accessors, it's a lot harder to know
>>   whether that is safe, as you first have to show, in particular in case
>>   any of the accesses stop being guarded by the spinlock in that
>>   case, and whether there may be a case where you have to
>>   serialize the memory access against accesses that are still in the
>>   store queue or prefetched.
>>
>> Whether this matters at all depends mostly on the type of devices
>> you are driving on your SoC. If you have any high-speed network
>> interfaces that are unable to do cache coherent DMA, any extra
>> instruction here may impact the number of packets you can transfer,
>> but if all your high-speed devices are connected to a coherent
>> interconnect, I would just go with the obvious approach and use
>> the safe MMIO accessors everywhere.
> 
> For L2 cache code, I would say the opposite, actually, because it is
> all too easy to get into a deadlock otherwise.
> 
> If you implement the sync callback, that will be called from every
> non-relaxed accessor, which means if you need to take some kind of
> lock in the sync callback and elsewhere in the L2 cache code, you will
> definitely deadlock.
> 
> It is safer to put explicit barriers where it is necessary.
> 
> Also remember that the barrier in readl() etc is _after_ the read, not
> before, and the barrier in writel() is _before_ the write, not after.
> The point is to ensure that DMA memory accesses are properly ordered
> with the IO-accessing instructions.

Yes, I known it. writel() must be used for the write operations that control
"start/stop" or "enable/disable" function, to ensure that the data of previous
write operations reaches the target. I've met this kind of problem before.

> 
> So, using readl_relaxed() with a read-modify-write is entirely sensible
> provided you do not access DMA memory inbetween.

Actually, I don't think this register is that complicated. I copied the code
back below. All the bits of L3_MAINT_CTRL are not affected by DMA access 
operations.
The software change the "range | op_type" to specify the operation type and 
scope,
the set the bit "L3_MAINT_STATUS_START" to start the operation. Then wait for 
that
bit to change from 1 to 0 by hardware.

+   reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+   reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
+   reg |= range | op_type;
+   reg |= L3_MAINT_STATUS_START;
+   writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
+
+   /* Wait until the hardware maintenance operation is complete. */
+   do {
+   cpu_relax();
+   reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
+   } while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END);

> 



Re: [PATCH v4 1/2] perf/smmuv3: Don't reserve the PMCG register spaces

2021-01-30 Thread Leizhen (ThunderTown)
Hi, Robin:
  Can you review this patch again?


On 2021/1/30 15:14, Zhen Lei wrote:
> According to the SMMUv3 specification:
> Each PMCG counter group is represented by one 4KB page (Page 0) with one
> optional additional 4KB page (Page 1), both of which are at IMPLEMENTATION
> DEFINED base addresses.
> 
> This means that the PMCG register spaces may be within the 64KB pages of
> the SMMUv3 register space. When both the SMMU and PMCG drivers reserve
> their own resources, a resource conflict occurs.
> 
> To avoid this conflict, don't reserve the PMCG regions.
> 
> Suggested-by: Robin Murphy 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/perf/arm_smmuv3_pmu.c | 25 +++--
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
> index 74474bb322c3f26..5e894f957c7b935 100644
> --- a/drivers/perf/arm_smmuv3_pmu.c
> +++ b/drivers/perf/arm_smmuv3_pmu.c
> @@ -793,17 +793,30 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
>   };
>  
> - smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, 
> &res_0);
> - if (IS_ERR(smmu_pmu->reg_base))
> - return PTR_ERR(smmu_pmu->reg_base);
> + /*
> +  * The register spaces of the PMCG may be in the register space of
> +  * other devices. For example, SMMU. Therefore, the PMCG resources are
> +  * not reserved to avoid resource conflicts with other drivers.
> +  */
> + res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!res_0)
> + return ERR_PTR(-EINVAL);
> + smmu_pmu->reg_base = devm_ioremap(dev, res_0->start, 
> resource_size(res_0));
> + if (!smmu_pmu->reg_base)
> + return ERR_PTR(-ENOMEM);
>  
>   cfgr = readl_relaxed(smmu_pmu->reg_base + SMMU_PMCG_CFGR);
>  
>   /* Determine if page 1 is present */
>   if (cfgr & SMMU_PMCG_CFGR_RELOC_CTRS) {
> - smmu_pmu->reloc_base = devm_platform_ioremap_resource(pdev, 1);
> - if (IS_ERR(smmu_pmu->reloc_base))
> - return PTR_ERR(smmu_pmu->reloc_base);
> + struct resource *res_1;
> +
> + res_1 = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> + if (!res_1)
> + return ERR_PTR(-EINVAL);
> + smmu_pmu->reloc_base = devm_ioremap(dev, res_1->start, 
> resource_size(res_1));
> + if (!smmu_pmu->reloc_base)
> + return ERR_PTR(-ENOMEM);
>   } else {
>   smmu_pmu->reloc_base = smmu_pmu->reg_base;
>   }
> 



Re: [PATCH v5 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

2021-01-30 Thread Leizhen (ThunderTown)



On 2021/1/29 21:54, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/1/29 18:26, Arnd Bergmann wrote:
>> On Fri, Jan 29, 2021 at 9:16 AM Arnd Bergmann  wrote:
>>> On Fri, Jan 29, 2021 at 8:23 AM Leizhen (ThunderTown)
>>>  wrote:
>>>> On 2021/1/28 22:24, Arnd Bergmann wrote:
>>>>> On Sat, Jan 16, 2021 at 4:27 AM Zhen Lei  
>>>>> wrote:
>>>>>> diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
>>>>>> +
>>>>>> +static void l3cache_maint_common(u32 range, u32 op_type)
>>>>>> +{
>>>>>> +   u32 reg;
>>>>>> +
>>>>>> +   reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>>>>>> +   reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>>>>>> +   reg |= range | op_type;
>>>>>> +   reg |= L3_MAINT_STATUS_START;
>>>>>> +   writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
>>>>>
>>>>> Are there contents of L3_MAINT_CTRL that need to be preserved
>>>>> across calls and can not be inferred? A 'readl()' is often expensive,
>>>>> so it might be more efficient if you can avoid that.
>>>>
>>>> Right, this readl() can be replaced with readl_relaxed(). Thanks.
>>>>
>>>> I'll check and correct the readl() and writel() in other places.
>>>
>>> What I meant is that if you want to replace them, you should provide
>>> performance numbers that show how much difference this makes
>>> and add comments in the source code explaining how you proved that
>>> the _relaxed() version is actually correct.
>>
>> Another clarification, as there are actually two independent
>> points here:
>>
>> * if you can completely remove the readl() above and just write a
>>   hardcoded value into the register, or perhaps read the original
>>   value once at boot time, that is probably a win because it
>>   avoids one of the barriers in the beginning. The datasheet should
>>   tell you if there are any bits in the register that have to be
>>   preserved
> 
> Code coupling will become very strong.
> 
>>
>> * Regarding the _relaxed() accessors, it's a lot harder to know
>>   whether that is safe, as you first have to show, in particular in case
>>   any of the accesses stop being guarded by the spinlock in that
>>   case, and whether there may be a case where you have to
>>   serialize the memory access against accesses that are still in the
>>   store queue or prefetched.
>>
>> Whether this matters at all depends mostly on the type of devices
>> you are driving on your SoC. If you have any high-speed network
>> interfaces that are unable to do cache coherent DMA, any extra
>> instruction here may impact the number of packets you can transfer,
>> but if all your high-speed devices are connected to a coherent
>> interconnect, I would just go with the obvious approach and use
>> the safe MMIO accessors everywhere.
> 
> In fact, this driver has been running on an earlier version for several years
> and has not received any feedback about the performance issue. So I didn't
> try to optimize it when I first sent these patches. I had to reconsider it
> until you noticed it.
> 
> How about keeping it unchanged for the moment? It'll take a lot of time and
> energy to retest.

In the spirit of code excellence, it's still necessary to optimize it.
Yesterday, my family urged me to go back, I wrote it in a hurry.

> 
>>
>>Arnd
>>
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>>
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 



Re: [PATCH v3 3/3] iommu/arm-smmu-v3: Reserving the entire SMMU register space

2021-01-30 Thread Leizhen (ThunderTown)



On 2021/1/29 23:27, Robin Murphy wrote:
> On 2021-01-27 11:32, Zhen Lei wrote:
>> commit 52f3fab0067d ("iommu/arm-smmu-v3: Don't reserve implementation
>> defined register space") only reserves the basic SMMU register space. So
>> the ECMDQ register space is not covered, it should be mapped again. Due
>> to the size of this ECMDQ resource is not fixed, depending on
>> SMMU_IDR6.CMDQ_CONTROL_PAGE_LOG2NUMQ. Processing its resource reservation
>> to avoid resource conflict with PMCG is a bit more complicated.
>>
>> Therefore, the resources of the PMCG are not reserved, and the entire SMMU
>> resources are reserved.
> 
> This is the opposite of what I suggested. My point was that it will make the 
> most sense to map the ECMDQ pages as a separate request anyway, therefore 
> there is no reason to stop mapping page 0 and page 1 separately either.

I don't understand why the ECMDQ mapping must be performed separately. If the 
conflict with PMCG resources is eliminated. ECMDQ cannot be a separate driver 
like PMCG.

> 
> If we need to expand the page 0 mapping to cover more of page 0 to reach the 
> SMMU_CMDQ_CONTROL_PAGE_* registers, we can do that when we actually add ECMDQ 
> support.
> 
> Robin.
> 
>> Suggested-by: Robin Murphy 
>> Signed-off-by: Zhen Lei 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 --
>>   2 files changed, 4 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index f04c55a7503c790..fde24403b06a9e3 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3476,14 +3476,6 @@ static int arm_smmu_set_bus_ops(struct iommu_ops *ops)
>>   return err;
>>   }
>>   -static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t 
>> start,
>> -  resource_size_t size)
>> -{
>> -    struct resource res = DEFINE_RES_MEM(start, size);
>> -
>> -    return devm_ioremap_resource(dev, &res);
>> -}
>> -
>>   static int arm_smmu_device_probe(struct platform_device *pdev)
>>   {
>>   int irq, ret;
>> @@ -3519,22 +3511,14 @@ static int arm_smmu_device_probe(struct 
>> platform_device *pdev)
>>   }
>>   ioaddr = res->start;
>>   -    /*
>> - * Don't map the IMPLEMENTATION DEFINED regions, since they may contain
>> - * the PMCG registers which are reserved by the PMU driver.
>> - */
>> -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>> +    smmu->base = devm_ioremap_resource(dev, res);
>>   if (IS_ERR(smmu->base))
>>   return PTR_ERR(smmu->base);
>>   -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
>> -    smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>> -   ARM_SMMU_REG_SZ);
>> -    if (IS_ERR(smmu->page1))
>> -    return PTR_ERR(smmu->page1);
>> -    } else {
>> +    if (arm_smmu_resource_size(smmu) > SZ_64K)
>> +    smmu->page1 = smmu->base + SZ_64K;
>> +    else
>>   smmu->page1 = smmu->base;
>> -    }
>>     /* Interrupt lines */
>>   diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index da525f46dab4fc1..63f2b476987d6ae 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -152,8 +152,6 @@
>>   #define ARM_SMMU_PRIQ_IRQ_CFG1    0xd8
>>   #define ARM_SMMU_PRIQ_IRQ_CFG2    0xdc
>>   -#define ARM_SMMU_REG_SZ    0xe00
>> -
>>   /* Common MSI config fields */
>>   #define MSI_CFG0_ADDR_MASK    GENMASK_ULL(51, 2)
>>   #define MSI_CFG2_SH    GENMASK(5, 4)
>>
> 
> .
> 



Re: [PATCH v6 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

2021-02-01 Thread Leizhen (ThunderTown)



On 2021/2/1 16:31, Arnd Bergmann wrote:
> On Mon, Feb 1, 2021 at 4:36 AM Zhen Lei  wrote:
>>
>> Add support for the Hisilicon Kunpeng L3 cache controller as used with
>> Kunpeng506 and Kunpeng509 SoCs.
>>
>> These Hisilicon SoCs support LPAE, so the physical addresses is wider than
>> 32-bits, but the actual bit width does not exceed 36 bits. When the cache
>> operation is performed based on the address range, the upper 30 bits of
>> the physical address are recorded in registers L3_MAINT_START and
>> L3_MAINT_END, and ignore the lower 6 bits cacheline offset.
>>
>> Signed-off-by: Zhen Lei 
> 
> Reviewed-by: Arnd Bergmann 
> 
> If you add one more thing:
> 
>> +static void l3cache_maint_common(u32 range, u32 op_type)
>> +{
>> +   u32 reg;
>> +
>> +   reg = readl_relaxed(l3_ctrl_base + L3_MAINT_CTRL);
>> +   reg &= ~(L3_MAINT_RANGE_MASK | L3_MAINT_TYPE_MASK);
>> +   reg |= range | op_type;
>> +   reg |= L3_MAINT_STATUS_START;
>> +   writel(reg, l3_ctrl_base + L3_MAINT_CTRL);
>> +
>> +   /* Wait until the hardware maintenance operation is complete. */
>> +   do {
>> +   cpu_relax();
>> +   reg = readl(l3_ctrl_base + L3_MAINT_CTRL);
>> +   } while ((reg & L3_MAINT_STATUS_MASK) != L3_MAINT_STATUS_END);
>> +}
>> +
>> +static void l3cache_maint_range(phys_addr_t start, phys_addr_t end, u32 
>> op_type)
>> +{
>> +   start = start >> L3_CACHE_LINE_SHITF;
>> +   end = ((end - 1) >> L3_CACHE_LINE_SHITF) + 1;
>> +
>> +   writel_relaxed(start, l3_ctrl_base + L3_MAINT_START);
>> +   writel_relaxed(end, l3_ctrl_base + L3_MAINT_END);
>> +
>> +   l3cache_maint_common(L3_MAINT_RANGE_ADDR, op_type);
>> +}
> 
> As mentioned, I'd like to see a code comment that explains the use
> the of relaxed() vs non-relaxed MMIO accessors, as it will be impossible
> for a reader to later understand why you picked a mix of the two,
> and it also ensures that you have considered which one is the best
> option to use here and that your explanation matches what you do.

OK, I'll test the performance and add the comment.

> 
> Based on Russell's comments, I had expected that you would use
> only relaxed accessors, plus explicit barriers if you change it, matching
> what l2x0 does (l2x0 has to do it because of __l2c210_cache_sync(),
> while you don't have a sync callback and don't need to).

I might have been a little conservative, I'll change all of them to _relaxed 
and then test it. Thanks.

> 
>   Arnd
> 
> .
> 



Re: [PATCH v6 2/4] ARM: hisi: add support for Kunpeng50x SoC

2021-02-01 Thread Leizhen (ThunderTown)



On 2021/2/1 16:35, Arnd Bergmann wrote:
> On Mon, Feb 1, 2021 at 4:35 AM Zhen Lei  wrote:
>>
>> Enable support for the Hisilicon Kunpeng506 and Kunpeng509 SoC.
>>
>> Signed-off-by: Zhen Lei 
> 
> Reviewed-by: Arnd Bergmann 
> 
> Russell, do you have a preference for how to get this series merged
> after the last comments are resolved?
> 
> I think there is no technical problem in having patch two merged through
> the soc tree, while merging the other three through your tree, but it
> seems more logical to keep all four together in either location.

Wait, wait. I've coordinated resources urgently. I can run test cases for new 
changes tonight.

> 
>Arnd
> 
> .
> 



Re: [PATCH v3 3/3] iommu/arm-smmu-v3: Reserving the entire SMMU register space

2021-02-01 Thread Leizhen (ThunderTown)



On 2021/2/1 19:44, Robin Murphy wrote:
> On 2021-01-30 01:54, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/1/29 23:27, Robin Murphy wrote:
>>> On 2021-01-27 11:32, Zhen Lei wrote:
>>>> commit 52f3fab0067d ("iommu/arm-smmu-v3: Don't reserve implementation
>>>> defined register space") only reserves the basic SMMU register space. So
>>>> the ECMDQ register space is not covered, it should be mapped again. Due
>>>> to the size of this ECMDQ resource is not fixed, depending on
>>>> SMMU_IDR6.CMDQ_CONTROL_PAGE_LOG2NUMQ. Processing its resource reservation
>>>> to avoid resource conflict with PMCG is a bit more complicated.
>>>>
>>>> Therefore, the resources of the PMCG are not reserved, and the entire SMMU
>>>> resources are reserved.
>>>
>>> This is the opposite of what I suggested. My point was that it will make 
>>> the most sense to map the ECMDQ pages as a separate request anyway, 
>>> therefore there is no reason to stop mapping page 0 and page 1 separately 
>>> either.
>>
>> I don't understand why the ECMDQ mapping must be performed separately. If 
>> the conflict with PMCG resources is eliminated. ECMDQ cannot be a separate 
>> driver like PMCG.
> 
> I mean in terms of the basic practice of not mapping megabytes worth of 
> IMP-DEF crap that this driver doesn't need or even understand. If we don't 
> have ECMDQ, we definitely don't need anything beyond page 1, so there's no 
> point mapping it all, and indeed it's safest not to anyway. Even if we do 
> have ECMDQ, it's still safer not to map all the unknown stuff that may be in 
> between, and until we've mapped page 0 we don't know whether we have ECMDQ or 
> not.
> 
> Therefore the most sensible thing to do either way is to map the basic 
> page(s) first, then map the ECMDQ pages specifically if we determine that we 
> need to. And either way we don't even need to think about this until adding 
> ECMDQ support.

Okay, I got it. We really don't have to write code ahead of time for what might 
happen in the future.

> 
> Robin.
> 
>>> If we need to expand the page 0 mapping to cover more of page 0 to reach 
>>> the SMMU_CMDQ_CONTROL_PAGE_* registers, we can do that when we actually add 
>>> ECMDQ support.
>>>
>>> Robin.
>>>
>>>> Suggested-by: Robin Murphy 
>>>> Signed-off-by: Zhen Lei 
>>>> ---
>>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 
>>>> 
>>>>    drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 --
>>>>    2 files changed, 4 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> index f04c55a7503c790..fde24403b06a9e3 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>>> @@ -3476,14 +3476,6 @@ static int arm_smmu_set_bus_ops(struct iommu_ops 
>>>> *ops)
>>>>    return err;
>>>>    }
>>>>    -static void __iomem *arm_smmu_ioremap(struct device *dev, 
>>>> resource_size_t start,
>>>> -  resource_size_t size)
>>>> -{
>>>> -    struct resource res = DEFINE_RES_MEM(start, size);
>>>> -
>>>> -    return devm_ioremap_resource(dev, &res);
>>>> -}
>>>> -
>>>>    static int arm_smmu_device_probe(struct platform_device *pdev)
>>>>    {
>>>>    int irq, ret;
>>>> @@ -3519,22 +3511,14 @@ static int arm_smmu_device_probe(struct 
>>>> platform_device *pdev)
>>>>    }
>>>>    ioaddr = res->start;
>>>>    -    /*
>>>> - * Don't map the IMPLEMENTATION DEFINED regions, since they may 
>>>> contain
>>>> - * the PMCG registers which are reserved by the PMU driver.
>>>> - */
>>>> -    smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ);
>>>> +    smmu->base = devm_ioremap_resource(dev, res);
>>>>    if (IS_ERR(smmu->base))
>>>>    return PTR_ERR(smmu->base);
>>>>    -    if (arm_smmu_resource_size(smmu) > SZ_64K) {
>>>> -    smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K,
>>>> -   ARM_SMMU_REG_SZ);
>>>> -    if (IS_ERR(smmu->page1))
>>>> -    return PTR_ERR(smmu->page1);
>>>> -    } else {
>>>> +    if (arm_smmu_resource_size(smmu) > SZ_64K)
>>>> +    smmu->page1 = smmu->base + SZ_64K;
>>>> +    else
>>>>    smmu->page1 = smmu->base;
>>>> -    }
>>>>      /* Interrupt lines */
>>>>    diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
>>>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> index da525f46dab4fc1..63f2b476987d6ae 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> @@ -152,8 +152,6 @@
>>>>    #define ARM_SMMU_PRIQ_IRQ_CFG1    0xd8
>>>>    #define ARM_SMMU_PRIQ_IRQ_CFG2    0xdc
>>>>    -#define ARM_SMMU_REG_SZ    0xe00
>>>> -
>>>>    /* Common MSI config fields */
>>>>    #define MSI_CFG0_ADDR_MASK    GENMASK_ULL(51, 2)
>>>>    #define MSI_CFG2_SH    GENMASK(5, 4)
>>>>
>>>
>>> .
>>>
>>
> 
> .
> 



Re: [PATCH v4 1/2] perf/smmuv3: Don't reserve the PMCG register spaces

2021-02-01 Thread Leizhen (ThunderTown)



On 2021/2/1 20:54, Will Deacon wrote:
> On Sat, Jan 30, 2021 at 03:14:13PM +0800, Zhen Lei wrote:
>> According to the SMMUv3 specification:
>> Each PMCG counter group is represented by one 4KB page (Page 0) with one
>> optional additional 4KB page (Page 1), both of which are at IMPLEMENTATION
>> DEFINED base addresses.
>>
>> This means that the PMCG register spaces may be within the 64KB pages of
>> the SMMUv3 register space. When both the SMMU and PMCG drivers reserve
>> their own resources, a resource conflict occurs.
>>
>> To avoid this conflict, don't reserve the PMCG regions.
>>
>> Suggested-by: Robin Murphy 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/perf/arm_smmuv3_pmu.c | 25 +++--
>>  1 file changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>> index 74474bb322c3f26..5e894f957c7b935 100644
>> --- a/drivers/perf/arm_smmuv3_pmu.c
>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>> @@ -793,17 +793,30 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>  .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
>>  };
>>  
>> -smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, 
>> &res_0);
>> -if (IS_ERR(smmu_pmu->reg_base))
>> -return PTR_ERR(smmu_pmu->reg_base);
>> +/*
>> + * The register spaces of the PMCG may be in the register space of
>> + * other devices. For example, SMMU. Therefore, the PMCG resources are
>> + * not reserved to avoid resource conflicts with other drivers.
>> + */
>> +res_0 = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> +if (!res_0)
>> +return ERR_PTR(-EINVAL);
> 
> I tried to apply this, but you've got your return type in a muddle:

I'm dizzy. I don't know how this bug patch came out. I just pinched my leg, 
like I'm still in the real world.

The "ERR_PTR()" of the four ERR_PTR(xxx) should be removed. Can you help me? Or 
I send a new one.

> 
> @@ @@
> +drivers/perf/arm_smmuv3_pmu.c: In function ‘smmu_pmu_probe’:
> +drivers/perf/arm_smmuv3_pmu.c:803:10: warning: returning ‘void *’ from a 
> function with return type ‘int’ makes integer from pointer without a cast 
> [-Wint-conversion]
> +  803 |   return ERR_PTR(-EINVAL);
> +  |  ^~~~
> +drivers/perf/arm_smmuv3_pmu.c:803:31: warning: incorrect type in return 
> expression (different base types) [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:803:31:expected int [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:803:31:got void * [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:806:10: warning: returning ‘void *’ from a 
> function with return type ‘int’ makes integer from pointer without a cast 
> [-Wint-conversion]
> +  806 |   return ERR_PTR(-ENOMEM);
> +  |  ^~~~
> +drivers/perf/arm_smmuv3_pmu.c:806:31: warning: incorrect type in return 
> expression (different base types) [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:806:31:expected int [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:806:31:got void * [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:816:11: warning: returning ‘void *’ from a 
> function with return type ‘int’ makes integer from pointer without a cast 
> [-Wint-conversion]
> +  816 |return ERR_PTR(-EINVAL);
> +  |   ^~~~
> +drivers/perf/arm_smmuv3_pmu.c:816:39: warning: incorrect type in return 
> expression (different base types) [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:816:39:expected int [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:816:39:got void * [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:819:11: warning: returning ‘void *’ from a 
> function with return type ‘int’ makes integer from pointer without a cast 
> [-Wint-conversion]
> +  819 |return ERR_PTR(-ENOMEM);
> +  |   ^~~~
> +drivers/perf/arm_smmuv3_pmu.c:819:39: warning: incorrect type in return 
> expression (different base types) [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:819:39:expected int [sparse]
> +drivers/perf/arm_smmuv3_pmu.c:819:39:got void * [sparse]
> 
> Will
> 
> .
> 



Re: [PATCH v5 0/1] perf/smmuv3: Don't reserve the PMCG register spaces

2021-02-01 Thread Leizhen (ThunderTown)



On 2021/2/1 23:50, Will Deacon wrote:
> On Mon, Feb 01, 2021 at 09:27:49PM +0800, Zhen Lei wrote:
>> v4 --> v5:
>> 1. Give up doing the mapping for the entire SMMU register space.
>> 2. Fix some compile warnings. Sorry. So sorry.
> 
> That's alright, these things happen. However, this came in slightly too
> late for 5.12, so please resend at -rc1 and we'll aim for 5.13.

Okay, thanks for your tolerance.

> 
> Will
> 
> .
> 



Re: [PATCH v7 4/4] ARM: Add support for Hisilicon Kunpeng L3 cache controller

2021-02-02 Thread Leizhen (ThunderTown)



On 2021/2/2 16:44, Arnd Bergmann wrote:
> On Tue, Feb 2, 2021 at 8:16 AM Zhen Lei  wrote:
>> +
>> +/*
>> + * All read and write operations on L3 cache registers are protected by the
>> + * spinlock, except for l3cache_init(). Each time the L3 cache operation is
>> + * performed, all related information is filled into its registers. 
>> Therefore,
>> + * there is no memory order problem when only _relaxed() functions are used.
> 
> Thank you for including the text.
> 
> I don't think the explanation with the spin_lock() explains why this
> can be considered safe though, as spin_lock() only contains serialization
> against other CPUs (smp_mb()) rather than the stronger DMA barriers
> implied by readl and writel. As Russell previously explained, these
> barriers are the L1 cache operations (e.g. v7_dma_inv_range) do
> include stronger barriers, so it would be better to come up with a
> justification based on those.

Okay, I'll correct the description.

> 
>> + * This can help us achieve some performance improvement:
>> + * 1) The readl_relaxed() is about 20ns faster than readl().
>> + * 2) The writel_relaxed() is about 123ns faster than writel().
> 
> These are not really the performance numbers I asked for, as a
> low-level benchmark comparing the instructions is rather meaningless.
> The time spent waiting for the barrier depends on what else is going
> on around the barrier. Also, most of the time would likely be
> spent spinning in the loop around readl() while the cache operations
> are in progress, so the latency of a single readl() is not necessarily
> significant.
> 
> To have a more useful performance number, try mentioning the
> most performance sensitive non-coherent DMA master on one
> of the chips that has this cache controller, and a high-level
> performance number such as "1.2% more network packets per
> second" if that is something you can measure easily.

It's not easy. My board only have debugging NIC, only the downstream
products have high-speed service NIC. Software needs to be packaged
layer by layer.

> 
> Of course, if all high-speed DMA masters on this chip are
> cache coherent, there is no need for performance numbers, just
> mention that we don't care about speed in that case.

It's not cache coherent, otherwise, the L3 cache does not need to be
operated.

> 
> Arnd
> 
> .
> 



Re: [PATCH 1/3] libnvdimm: Fix memory leaks in of_pmem.c

2020-08-18 Thread Leizhen (ThunderTown)



On 8/19/2020 3:00 AM, Markus Elfring wrote:
>> The memory priv->bus_desc.provider_name allocated by kstrdup() should be
>> freed.
> 
> * Would an imperative wording be preferred for the change description?
> 
> * I propose to add the tag “Fixes” to the commit message.
Thanks for your suggestion, I will add it in v2.

Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus 
provider")


> 
> Regards,
> Markus
> 
> 



Re: [PATCH v2 0/4] bug fix and optimize for drivers/nvdimm

2020-08-19 Thread Leizhen (ThunderTown)



On 8/19/2020 8:20 PM, Markus Elfring wrote:
>> v1 --> v2:
>> 1. Add Fixes for Patch 1-2
>> 2. Slightly change the subject and description of Patch 1
> …
>>   libnvdimm: fix memmory leaks in of_pmem.c
> …
> 
> I suggest to avoid a typo in such a patch subject.

OK, Thanks for reminding me.

> 
> Regards,
> Markus
> 
> 



Re: [PATCH v2 1/4] libnvdimm: Fix memory leaks in of_pmem.c

2020-08-19 Thread Leizhen (ThunderTown)



On 8/19/2020 8:28 PM, Markus Elfring wrote:
>> The memory priv->bus_desc.provider_name allocated by kstrdup() is not
>> freed correctly.
> 
> How do you think about to choose an imperative wording for
> a corresponding change description?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=18445bf405cb331117bc98427b1ba6f12418ad17#n151

OK, thanks. I think I known what "imperative wording" means now.
I will rewrite the descriptions.

> 
> Regards,
> Markus
> 
> 



Re: [PATCH v2 3/4] libnvdimm/bus: simplify walk_to_nvdimm_bus()

2020-08-19 Thread Leizhen (ThunderTown)



On 8/19/2020 8:40 PM, Markus Elfring wrote:
>> … when is_nvdimm_bus(dev) successed.
> 
> I imagine that that an other wording will be more appropriate here.

OK, I will rewrite it.

> 
> Regards,
> Markus
> 
> 



Re: [PATCH v2 1/4] libnvdimm: Fix memory leaks in of_pmem.c

2020-08-19 Thread Leizhen (ThunderTown)



On 8/19/2020 9:35 PM, Oliver O'Halloran wrote:
> On Wed, Aug 19, 2020 at 10:28 PM Markus Elfring  wrote:
>>
>>> The memory priv->bus_desc.provider_name allocated by kstrdup() is not
>>> freed correctly.
> 
> Personally I thought his commit message was perfectly fine. A little
> unorthodox, but it works.
> 
>> How do you think about to choose an imperative wording for
>> a corresponding change description?
> 
> ...but this! This is word salad.

Talented students are trained by strict teacher. All of us is trying to make 
things better.

> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=18445bf405cb331117bc98427b1ba6f12418ad17#n151
>>
>> Regards,
>> Markus
> 
> 



Re: [PATCH v2 1/4] libnvdimm: fix memmory leaks in of_pmem.c

2020-08-19 Thread Leizhen (ThunderTown)



On 8/19/2020 9:37 PM, Oliver O'Halloran wrote:
> On Wed, Aug 19, 2020 at 12:05 PM Zhen Lei  wrote:
>>
>> The memory priv->bus_desc.provider_name allocated by kstrdup() is not
>> freed correctly.
>>
>> Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus 
>> provider")
>> Signed-off-by: Zhen Lei 
> 
> Yep, that's a bug.
> 
> Reviewed-by: Oliver O'Halloran 

Thanks for your review.

> 
>> ---
>>  drivers/nvdimm/of_pmem.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
>> index 10dbdcdfb9ce913..1292ffca7b2ecc0 100644
>> --- a/drivers/nvdimm/of_pmem.c
>> +++ b/drivers/nvdimm/of_pmem.c
>> @@ -36,6 +36,7 @@ static int of_pmem_region_probe(struct platform_device 
>> *pdev)
>>
>> priv->bus = bus = nvdimm_bus_register(&pdev->dev, &priv->bus_desc);
>> if (!bus) {
>> +   kfree(priv->bus_desc.provider_name);
>> kfree(priv);
>> return -ENODEV;
>> }
>> @@ -83,6 +84,7 @@ static int of_pmem_region_remove(struct platform_device 
>> *pdev)
>> struct of_pmem_private *priv = platform_get_drvdata(pdev);
>>
>> nvdimm_bus_unregister(priv->bus);
>> +   kfree(priv->bus_desc.provider_name);
>> kfree(priv);
>>
>> return 0;
>> --
>> 1.8.3
>>
>>
> 
> .
> 



Re: [PATCH 1/1] spi: cadence-quadspi: Fix a compilation warning for 64-bit platform

2021-01-12 Thread Leizhen (ThunderTown)



On 2021/1/12 18:16, Pratyush Yadav wrote:
> Hi Zhen,
> 
> On 12/01/21 06:06PM, Zhen Lei wrote:
>> The __typecheck() requires that the two arguments of max() must be of the
>> same type. For the current max(), the type of the first parameter "len" is
>> size_t. But the type of size_t is not fixed, it's "unsigned int" on 32-bit
>> platforms and "unsigned long" on 64-bit platforms. So both the suffix "U"
>> and "UL" are not appropriate for the second constant parameter. Therefore,
>> forcible type conversion is used.
>>
>> Fixes: 8728a81b8f10 ("spi: Fix distinct pointer types warning for ARCH=mips")
>> Fixes: 0920a32cf6f2 ("spi: cadence-quadspi: Wait at least 500 ms for direct 
>> reads")
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/spi/spi-cadence-quadspi.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/spi/spi-cadence-quadspi.c 
>> b/drivers/spi/spi-cadence-quadspi.c
>> index 576610ba11184c6..eb40b8d46b56b0c 100644
>> --- a/drivers/spi/spi-cadence-quadspi.c
>> +++ b/drivers/spi/spi-cadence-quadspi.c
>> @@ -1150,7 +1150,7 @@ static int cqspi_direct_read_execute(struct 
>> cqspi_flash_pdata *f_pdata,
>>
>>  dma_async_issue_pending(cqspi->rx_chan);
>>  if (!wait_for_completion_timeout(&cqspi->rx_dma_complete,
>> - msecs_to_jiffies(max(len, 500U {
>> + msecs_to_jiffies(max_t(size_t, len, 500 {
> 
> I sent a patch with this exact fix already [0]. It has made it in Mark's 
> for-next branch.

OK,I don't known it.

> 
> [0] https://lore.kernel.org/linux-spi/20210108181457.30291-1-p.ya...@ti.com/
> 
>>  dmaengine_terminate_sync(cqspi->rx_chan);
>>  dev_err(dev, "DMA wait_for_completion_timeout\n");
>>  ret = -ETIMEDOUT;
>> --
>> 1.8.3
>>
>>
> 



Re: [PATCH v3 2/3] dt-bindings: arm: hisilicon: Add binding for L3 cache controller

2021-01-12 Thread Leizhen (ThunderTown)



On 2021/1/12 16:46, Arnd Bergmann wrote:
> On Tue, Jan 12, 2021 at 2:56 AM Zhen Lei  wrote:
> 
>> +---
>> +$id: http://devicetree.org/schemas/arm/hisilicon/l3cache.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Hisilicon L3 cache controller
>> +
>> +maintainers:
>> +  - Wei Xu 
>> +
>> +description: |
>> +  The Hisilicon L3 outer cache controller supports a maximum of 36-bit 
>> physical
>> +  addresses. The data cached in the L3 outer cache can be operated based on 
>> the
>> +  physical address range or the entire cache.
>> +
>> +properties:
>> +  compatible:
>> +items:
>> +  - const: hisilicon,l3cache
>> +
> 
> The compatible string needs to be a little more specific, I'm sure
> you cannot guarantee that this is the only L3 cache controller ever
> designed in the past or future by HiSilicon.
> 
> Normally when you have an IP block that is itself unnamed but that is specific
> to one or a few SoCs but that has no na, the convention is to include the name
> of the first SoC that contained it.

Right, thanks for your suggestion, I will rename it to 
"hisilicon,hi1381-l3cache"
and "hisilicon,hi1215-l3cache".

> 
> Can you share which products actually use this L3 cache controller?

This L3 cache controller is used on Hi1381 and Hi1215 board. I don't know where
these two boards are used. Our company is too large. Software is delivered level
by level. I'm only involved in the Kernel-related part.

> 
> On a related note, what does the memory map look like on this chip?

memory@a0 {
 device_type = "memory";
 reg = <0x0 0xa0 0x0 0x1aa0>, <0x1 0xe000 0x0 0x1d00>, <0x0 
0x1f40 0x0 0xb5c0>;
};

Currently, the DTS is being maintained by ourselves, I'll try to upstream it 
later.

> Do you support more than 4GB of total installed memory? If you

Currently, the total size does not exceed 4 GB. However, the physical address 
is wider than 32 bits.

> do, this becomes a problem in the future as highmem support
> winds down. In fact  anything more than 1GB on a 32-bit system
> requires more work on the kernel to be completed before we remove
> highmem, and will incur a slowdown. If the total is under 4GB but the
> memory is not in a contiguous physical address range. See my
> Linaro connect presentation[1] for further information on the topic.

Great.

> 
>Arnd
> 
> [1] https://connect.linaro.org/resources/lvc20/lvc20-106/
> 
> .
> 



Re: [PATCH v2] dt-bindings: leds: Document commonly used LED triggers

2020-12-12 Thread Leizhen (ThunderTown)



On 2020/12/10 16:24, Manivannan Sadhasivam wrote:
> This commit documents the LED triggers used commonly in the SoCs. Not
> all triggers are documented as some of them are very application specific.
> Most of the triggers documented here are currently used in devicetrees
> of many SoCs.
> 
> While at it, let's also sort the triggers in ascending order.
> 
> Signed-off-by: Manivannan Sadhasivam 
> ---
> 
> Changes in v2:
> 
> * Added more triggers, fixed the regex
> * Sorted triggers in ascending order
> 
>  .../devicetree/bindings/leds/common.yaml  | 78 ++-
>  1 file changed, 60 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/leds/common.yaml 
> b/Documentation/devicetree/bindings/leds/common.yaml
> index f1211e7045f1..3c2e2208c1da 100644
> --- a/Documentation/devicetree/bindings/leds/common.yaml
> +++ b/Documentation/devicetree/bindings/leds/common.yaml
> @@ -79,24 +79,66 @@ properties:
>the LED.
>  $ref: /schemas/types.yaml#definitions/string
>  
> -enum:
> -# LED will act as a back-light, controlled by the framebuffer system
> -  - backlight
> -# LED will turn on (but for leds-gpio see "default-state" property in
> -# Documentation/devicetree/bindings/leds/leds-gpio.yaml)
> -  - default-on
> -# LED "double" flashes at a load average based rate
> -  - heartbeat
> -# LED indicates disk activity
> -  - disk-activity
> -# LED indicates IDE disk activity (deprecated), in new 
> implementations
> -# use "disk-activity"
> -  - ide-disk
> -# LED flashes at a fixed, configurable rate
> -  - timer
> -# LED alters the brightness for the specified duration with one 
> software
> -# timer (requires "led-pattern" property)
> -  - pattern
> +oneOf:
> +  - items:
> +  - enum:
> +# LED indicates mic mute state
> +  - audio-micmute
> +# LED indicates audio mute state
> +  - audio-mute
> +# LED will act as a back-light, controlled by the 
> framebuffer system
> +  - backlight
> +# LED indicates bluetooth power state
> +  - bluetooth-power
> +# LED indicates activity of all CPUs
> +  - cpu
> +# LED will turn on (but for leds-gpio see "default-state" 
> property in
> +# Documentation/devicetree/bindings/leds/leds-gpio.yaml)
> +  - default-on
> +# LED indicates disk activity
> +  - disk-activity
> +# LED indicates disk read activity
> +  - disk-read
> +# LED indicates disk write activity
> +  - disk-write
> +# LED indicates camera flash state
> +  - flash
> +# LED "double" flashes at a load average based rate
> +  - heartbeat
> +# LED indicates IDE disk activity (deprecated), in new 
> implementations
> +# use "disk-activity"
> +  - ide-disk
> +# LED indicates MTD memory activity
> +  - mtd
> +# LED indicates NAND memory activity (deprecated),
> +# in new implementations use "mtd"
> +  - nand-disk
> +# No trigger assigned to the LED. This is the default mode
> +# if trigger is absent
> +  - none
> +# LED alters the brightness for the specified duration with 
> one software
> +# timer (requires "led-pattern" property)
> +  - pattern
> +# LED flashes at a fixed, configurable rate
> +  - timer
> +# LED indicates camera torch state
> +  - torch
> +# LED indicates USB gadget activity
> +  - usb-gadget
> +# LED indicates USB host activity
> +  - usb-host
> +  - items:
> +# LED indicates activity of [N]th CPU
> +  - pattern: "^cpu[0-9]{1,2}$"
> +  - items:
> +# LED indicates power status of [N]th Bluetooth HCI device
> +  - pattern: "^hci[0-9]{1,2}-power$"
> +  - items:
> +# LED indicates [N]th MMC storage activity
> +  - pattern: "^mmc[0-9]{1,2}$"
> +  - items:
> +# LED indicates [N]th WLAN Tx activity
> +  - pattern: "^phy[0-9]{1,2}tx$"

Only the last three are not listed:
phy0rx
ir-power-click
ir-user-click

And the first one is easily supported by:
-# LED indicates [N]th WLAN Tx activity
-  - pattern: "^phy[0-9]{1,2}tx$"
+# LED indicates [N]th WLAN Tx/Rx activity
+  - pattern: "^phy[0-9]{1,2}(tx|rx)$"

Tested-by: Zhen Lei 

>  
>led-pattern:
>  description: |
> 



Re: [PATCH v2] dt-bindings: leds: Document commonly used LED triggers

2020-12-12 Thread Leizhen (ThunderTown)



On 2020/12/13 10:39, Leizhen (ThunderTown) wrote:
> 
> 
> On 2020/12/10 16:24, Manivannan Sadhasivam wrote:
>> This commit documents the LED triggers used commonly in the SoCs. Not
>> all triggers are documented as some of them are very application specific.
>> Most of the triggers documented here are currently used in devicetrees
>> of many SoCs.
>>
>> While at it, let's also sort the triggers in ascending order.
>>
>> Signed-off-by: Manivannan Sadhasivam 
>> ---
>>
>> Changes in v2:
>>
>> * Added more triggers, fixed the regex
>> * Sorted triggers in ascending order
>>
>>  .../devicetree/bindings/leds/common.yaml  | 78 ++-
>>  1 file changed, 60 insertions(+), 18 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/leds/common.yaml 
>> b/Documentation/devicetree/bindings/leds/common.yaml
>> index f1211e7045f1..3c2e2208c1da 100644
>> --- a/Documentation/devicetree/bindings/leds/common.yaml
>> +++ b/Documentation/devicetree/bindings/leds/common.yaml
>> @@ -79,24 +79,66 @@ properties:
>>the LED.
>>  $ref: /schemas/types.yaml#definitions/string
>>  
>> -enum:
>> -# LED will act as a back-light, controlled by the framebuffer system
>> -  - backlight
>> -# LED will turn on (but for leds-gpio see "default-state" property 
>> in
>> -# Documentation/devicetree/bindings/leds/leds-gpio.yaml)
>> -  - default-on
>> -# LED "double" flashes at a load average based rate
>> -  - heartbeat
>> -# LED indicates disk activity
>> -  - disk-activity
>> -# LED indicates IDE disk activity (deprecated), in new 
>> implementations
>> -# use "disk-activity"
>> -  - ide-disk
>> -# LED flashes at a fixed, configurable rate
>> -  - timer
>> -# LED alters the brightness for the specified duration with one 
>> software
>> -# timer (requires "led-pattern" property)
>> -  - pattern
>> +oneOf:
>> +  - items:
>> +  - enum:
>> +# LED indicates mic mute state
>> +  - audio-micmute
>> +# LED indicates audio mute state
>> +  - audio-mute
>> +# LED will act as a back-light, controlled by the 
>> framebuffer system
>> +  - backlight
>> +# LED indicates bluetooth power state
>> +  - bluetooth-power
>> +# LED indicates activity of all CPUs
>> +  - cpu
>> +# LED will turn on (but for leds-gpio see "default-state" 
>> property in
>> +# Documentation/devicetree/bindings/leds/leds-gpio.yaml)
>> +  - default-on
>> +# LED indicates disk activity
>> +  - disk-activity
>> +# LED indicates disk read activity
>> +  - disk-read
>> +# LED indicates disk write activity
>> +  - disk-write
>> +# LED indicates camera flash state
>> +  - flash
>> +# LED "double" flashes at a load average based rate
>> +  - heartbeat
>> +# LED indicates IDE disk activity (deprecated), in new 
>> implementations
>> +# use "disk-activity"
>> +  - ide-disk
>> +# LED indicates MTD memory activity
>> +  - mtd
>> +# LED indicates NAND memory activity (deprecated),
>> +# in new implementations use "mtd"
>> +  - nand-disk
>> +# No trigger assigned to the LED. This is the default mode
>> +# if trigger is absent
>> +  - none
>> +# LED alters the brightness for the specified duration with 
>> one software
>> +# timer (requires "led-pattern" property)
>> +  - pattern
>> +# LED flashes at a fixed, configurable rate
>> +  - timer
>> +# LED indicates camera torch state
>> +  - torch
>> +# LED indicates USB gadget activity
>> +  - usb-gadget
>> +# LED indicates USB host activity
>> +  - usb-host
>> +  - items:
>> +# LED indicates activity of [N]th CPU
>> +  - patte

Re: [PATCH 0/4] dt-bindings: media: eliminate yamllint warnings

2020-12-07 Thread Leizhen (ThunderTown)



On 2020/12/7 17:08, Jacopo Mondi wrote:
> Hi Zhen,
> 
> On Mon, Dec 07, 2020 at 12:23:56PM +0800, Zhen Lei wrote:
>> These patches are based on the latest linux-next code.
>>
>> Zhen Lei (4):
>>   dt-bindings: media: adv7604: eliminate yamllint warnings
>>   dt-bindings: media: nokia,smia: eliminate yamllint warnings
>>   dt-bindings: media: ov772x: eliminate yamllint warnings
>>   dt-bindings: media: imx214: eliminate yamllint warnings
> 
> The adv7604, ov772x and imx214 bits have been addressed by:
> https://www.spinics.net/lists/linux-media/msg181093.html

OK

> 
> Thanks
>   j
> 
>>
>>  Documentation/devicetree/bindings/media/i2c/adv7604.yaml |  4 ++--
>>  Documentation/devicetree/bindings/media/i2c/mipi-ccs.yaml| 11 
>> ++-
>>  Documentation/devicetree/bindings/media/i2c/ovti,ov772x.yaml | 12 
>> ++--
>>  Documentation/devicetree/bindings/media/i2c/sony,imx214.yaml | 12 
>> ++--
>>  4 files changed, 20 insertions(+), 19 deletions(-)
>>
>> --
>> 1.8.3
>>
>>
> 
> .
> 



Re: [PATCH v2 1/3] reset: hisilicon: correct vendor prefix

2020-12-07 Thread Leizhen (ThunderTown)



On 2020/12/8 7:08, Rob Herring wrote:
> On Fri, Dec 04, 2020 at 09:42:34AM +0800, Zhen Lei wrote:
>> The vendor prefix of "Hisilicon Limited" is "hisilicon", it is clearly
>> stated in "vendor-prefixes.yaml".
> 
> Yes, but you can't fix this as changing it breaks compability between 
> DTBs and kernels.
> 
> hisi has to be documented and marked 'deprecated'.

I searched, and this is the only place that uses the hisi prefix. Currently,
YAML check will report warnings due to mismatch any of the regexes defined
in Documentation/devicetree/bindings/vendor-prefixes.yaml

Is it not good to add hisi prefix to clear warnings?
"^hisi,.*":
  description: Hisilicon Limited, deprecated.


By the way, Hi Rob:
The license information in some YAML files is incorrect. Can you correct it in 
v5.11?
For example:
WARNING: DT binding documents should be licensed (GPL-2.0-only OR BSD-2-Clause)
#26: FILE: 
Documentation/devicetree/bindings/thermal/allwinner,sun8i-a83t-ths.yaml:1:
+# SPDX-License-Identifier: GPL-2.0

> 
>>
>> Fixes: 1527058736fa ("reset: hisilicon: add reset-hi3660")
>> Fixes: 35ca8168133c ("arm64: dts: Add dts files for Hisilicon Hi3660 SoC")
>> Fixes: dd8c7b78c11b ("arm64: dts: Add devicetree for Hisilicon Hi3670 SoC")
>> Signed-off-by: Zhen Lei 
>> Cc: Zhangfei Gao 
>> Cc: Chen Feng 
>> Cc: Manivannan Sadhasivam 
>> ---
>>  arch/arm64/boot/dts/hisilicon/hi3660.dtsi | 4 ++--
>>  arch/arm64/boot/dts/hisilicon/hi3670.dtsi | 2 +-
>>  drivers/reset/hisilicon/reset-hi3660.c| 2 +-
>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi 
>> b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi
>> index 49c19c6879f95ce..bfb1375426d2b58 100644
>> --- a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi
>> +++ b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi
>> @@ -345,7 +345,7 @@
>>  crg_rst: crg_rst_controller {
>>  compatible = "hisilicon,hi3660-reset";
>>  #reset-cells = <2>;
>> -hisi,rst-syscon = <&crg_ctrl>;
>> +hisilicon,rst-syscon = <&crg_ctrl>;
>>  };
>>  
>>  
>> @@ -376,7 +376,7 @@
>>  
>>  iomcu_rst: reset {
>>  compatible = "hisilicon,hi3660-reset";
>> -hisi,rst-syscon = <&iomcu>;
>> +hisilicon,rst-syscon = <&iomcu>;
>>  #reset-cells = <2>;
>>  };
>>  
>> diff --git a/arch/arm64/boot/dts/hisilicon/hi3670.dtsi 
>> b/arch/arm64/boot/dts/hisilicon/hi3670.dtsi
>> index 85b0dfb35d6d396..5c5a5dc964ea848 100644
>> --- a/arch/arm64/boot/dts/hisilicon/hi3670.dtsi
>> +++ b/arch/arm64/boot/dts/hisilicon/hi3670.dtsi
>> @@ -155,7 +155,7 @@
>>  compatible = "hisilicon,hi3670-reset",
>>   "hisilicon,hi3660-reset";
>>  #reset-cells = <2>;
>> -hisi,rst-syscon = <&crg_ctrl>;
>> +hisilicon,rst-syscon = <&crg_ctrl>;
>>  };
>>  
>>  pctrl: pctrl@e8a09000 {
>> diff --git a/drivers/reset/hisilicon/reset-hi3660.c 
>> b/drivers/reset/hisilicon/reset-hi3660.c
>> index a7d4445924e558c..8f1953159a65b31 100644
>> --- a/drivers/reset/hisilicon/reset-hi3660.c
>> +++ b/drivers/reset/hisilicon/reset-hi3660.c
>> @@ -83,7 +83,7 @@ static int hi3660_reset_probe(struct platform_device *pdev)
>>  if (!rc)
>>  return -ENOMEM;
>>  
>> -rc->map = syscon_regmap_lookup_by_phandle(np, "hisi,rst-syscon");
>> +rc->map = syscon_regmap_lookup_by_phandle(np, "hisilicon,rst-syscon");
>>  if (IS_ERR(rc->map)) {
>>  dev_err(dev, "failed to get hi3660,rst-syscon\n");
>>  return PTR_ERR(rc->map);
>> -- 
>> 1.8.3
>>
>>
> 
> .
> 



Re: [PATCH v2 3/3] dt-bindings: reset: convert Hisilicon reset controller bindings to json-schema

2020-12-07 Thread Leizhen (ThunderTown)



On 2020/12/8 7:10, Rob Herring wrote:
> On Fri, Dec 04, 2020 at 09:42:36AM +0800, Zhen Lei wrote:
>> Convert the Hisilicon reset controller binding to DT schema format using
>> json-schema.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  .../bindings/reset/hisilicon,hi3660-reset.txt  | 44 -
>>  .../bindings/reset/hisilicon,hi3660-reset.yaml | 77 
>> ++
>>  2 files changed, 77 insertions(+), 44 deletions(-)
>>  delete mode 100644 
>> Documentation/devicetree/bindings/reset/hisilicon,hi3660-reset.txt
>>  create mode 100644 
>> Documentation/devicetree/bindings/reset/hisilicon,hi3660-reset.yaml
>>
>> diff --git 
>> a/Documentation/devicetree/bindings/reset/hisilicon,hi3660-reset.txt 
>> b/Documentation/devicetree/bindings/reset/hisilicon,hi3660-reset.txt
>> deleted file mode 100644
>> index aefd26710f9e87d..000
>> --- a/Documentation/devicetree/bindings/reset/hisilicon,hi3660-reset.txt
>> +++ /dev/null
>> @@ -1,44 +0,0 @@
>> -Hisilicon System Reset Controller
>> -==
>> -
>> -Please also refer to reset.txt in this directory for common reset
>> -controller binding usage.
>> -
>> -The reset controller registers are part of the system-ctl block on
>> -hi3660 and hi3670 SoCs.
>> -
>> -Required properties:
>> -- compatible: should be one of the following:
>> - "hisilicon,hi3660-reset" for HI3660
>> - "hisilicon,hi3670-reset", "hisilicon,hi3660-reset" for HI3670
>> -- hisilicon,rst-syscon: phandle of the reset's syscon.
>> -- #reset-cells : Specifies the number of cells needed to encode a
>> -  reset source.  The type shall be a  and the value shall be 2.
>> -
>> - Cell #1 : offset of the reset assert control
>> -   register from the syscon register base
>> -   offset + 4: deassert control register
>> -   offset + 8: status control register
>> - Cell #2 : bit position of the reset in the reset control register
>> -
>> -Example:
>> -iomcu: iomcu@ffd7e000 {
>> -compatible = "hisilicon,hi3660-iomcu", "syscon";
>> -reg = <0x0 0xffd7e000 0x0 0x1000>;
>> -};
>> -
>> -iomcu_rst: iomcu_rst_controller {
>> -compatible = "hisilicon,hi3660-reset";
>> -hisilicon,rst-syscon = <&iomcu>;
>> -#reset-cells = <2>;
>> -};
>> -
>> -Specifying reset lines connected to IP modules
>> -==
>> -example:
>> -
>> -i2c0: i2c@. {
>> -...
>> -resets = <&iomcu_rst 0x20 3>; /* offset: 0x20; bit: 3 */
>> -...
>> -};
>> diff --git 
>> a/Documentation/devicetree/bindings/reset/hisilicon,hi3660-reset.yaml 
>> b/Documentation/devicetree/bindings/reset/hisilicon,hi3660-reset.yaml
>> new file mode 100644
>> index 000..9bf40952e5b7d28
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/reset/hisilicon,hi3660-reset.yaml
>> @@ -0,0 +1,77 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/reset/hisilicon,hi3660-reset.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Hisilicon System Reset Controller
>> +
>> +maintainers:
>> +  - Wei Xu 
>> +
>> +description: |
>> +  Please also refer to reset.txt in this directory for common reset
>> +  controller binding usage.
>> +  The reset controller registers are part of the system-ctl block on
>> +  hi3660 and hi3670 SoCs.
>> +
>> +properties:
>> +  compatible:
>> +oneOf:
>> +  - items:
>> +  - const: hisilicon,hi3660-reset
>> +  - items:
>> +  - const: hisilicon,hi3670-reset
>> +  - const: hisilicon,hi3660-reset
>> +
>> +  hisilicon,rst-syscon:
>> +description: phandle of the reset's syscon.
>> +$ref: /schemas/types.yaml#/definitions/phandle
>> +
>> +  '#reset-cells':
>> +description: |
>> +  Specifies the number of cells needed to encode a reset source.
>> +  Cell #1 : offset of the reset assert control register from the syscon
>> +register base
>> +offset + 4: deassert control register
>> +offset + 8: status control register
>> +  Cell #2 : bit position of the reset in the reset control register
>> +const: 2
>> +
>> +required:
>> +  - compatible
>> +
>> +additionalProperties: false
>> +
>> +examples:
>> +  - |
>> +#include 
>> +#include 
>> +#include 
>> +
>> +iomcu: iomcu@ffd7e000 {
>> +compatible = "hisilicon,hi3660-iomcu", "syscon";
>> +reg = <0xffd7e000 0x1000>;
>> +};
>> +
>> +iomcu_rst: iomcu_rst_controller {
>> +compatible = "hisilicon,hi3660-reset";
>> +hisilicon,rst-syscon = <&iomcu>;
> 
> Really, if you are going to break things, this node should be a child of 
> iomcu instead and you don't need this property (just get the parent). Or 
> just add '#reset-cells' to iomcu.

There are two c

Re: [PATCH v2 1/3] reset: hisilicon: correct vendor prefix

2020-12-08 Thread Leizhen (ThunderTown)



On 2020/12/8 17:25, Philipp Zabel wrote:
> Hi Zhen,
> 
> On Fri, 2020-12-04 at 09:42 +0800, Zhen Lei wrote:
>> The vendor prefix of "Hisilicon Limited" is "hisilicon", it is clearly
>> stated in "vendor-prefixes.yaml".
>>
>> Fixes: 1527058736fa ("reset: hisilicon: add reset-hi3660")
>> Fixes: 35ca8168133c ("arm64: dts: Add dts files for Hisilicon Hi3660 SoC")
>> Fixes: dd8c7b78c11b ("arm64: dts: Add devicetree for Hisilicon Hi3670 SoC")
>> Signed-off-by: Zhen Lei 
>> Cc: Zhangfei Gao 
>> Cc: Chen Feng 
>> Cc: Manivannan Sadhasivam 
>> ---
>>  arch/arm64/boot/dts/hisilicon/hi3660.dtsi | 4 ++--
>>  arch/arm64/boot/dts/hisilicon/hi3670.dtsi | 2 +-
>>  drivers/reset/hisilicon/reset-hi3660.c| 2 +-
> 
> Please keep device tree patches and reset driver patch separate, as they
> were in v1.

OK

> 
>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi 
>> b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi
>> index 49c19c6879f95ce..bfb1375426d2b58 100644
>> --- a/arch/arm64/boot/dts/hisilicon/hi3660.dtsi
>> +++ b/arch/arm64/boot/dts/hisilicon/hi3660.dtsi
>> @@ -345,7 +345,7 @@
>>  crg_rst: crg_rst_controller {
>>  compatible = "hisilicon,hi3660-reset";
>>  #reset-cells = <2>;
>> -hisi,rst-syscon = <&crg_ctrl>;
>> +hisilicon,rst-syscon = <&crg_ctrl>;
>>  };
>>  
>>  
>> @@ -376,7 +376,7 @@
>>  
>>  iomcu_rst: reset {
>>  compatible = "hisilicon,hi3660-reset";
>> -hisi,rst-syscon = <&iomcu>;
>> +hisilicon,rst-syscon = <&iomcu>;
>>  #reset-cells = <2>;
>>  };
>>  
>> diff --git a/arch/arm64/boot/dts/hisilicon/hi3670.dtsi 
>> b/arch/arm64/boot/dts/hisilicon/hi3670.dtsi
>> index 85b0dfb35d6d396..5c5a5dc964ea848 100644
>> --- a/arch/arm64/boot/dts/hisilicon/hi3670.dtsi
>> +++ b/arch/arm64/boot/dts/hisilicon/hi3670.dtsi
>> @@ -155,7 +155,7 @@
>>  compatible = "hisilicon,hi3670-reset",
>>   "hisilicon,hi3660-reset";
>>  #reset-cells = <2>;
>> -hisi,rst-syscon = <&crg_ctrl>;
>> +hisilicon,rst-syscon = <&crg_ctrl>;
>>  };
>>  
>>  pctrl: pctrl@e8a09000 {
>> diff --git a/drivers/reset/hisilicon/reset-hi3660.c 
>> b/drivers/reset/hisilicon/reset-hi3660.c
>> index a7d4445924e558c..8f1953159a65b31 100644
>> --- a/drivers/reset/hisilicon/reset-hi3660.c
>> +++ b/drivers/reset/hisilicon/reset-hi3660.c
>> @@ -83,7 +83,7 @@ static int hi3660_reset_probe(struct platform_device *pdev)
>>  if (!rc)
>>  return -ENOMEM;
>>  
>> -rc->map = syscon_regmap_lookup_by_phandle(np, "hisi,rst-syscon");
>> +rc->map = syscon_regmap_lookup_by_phandle(np, "hisilicon,rst-syscon");
> 
> This should fall back to the deprecated compatible, for example:
> 
>   rc->map = syscon_regmap_lookup_by_phandle(np, "hisilicon,rst-syscon");
>   if (PTR_ERR(rc->map) == -ENODEV)
>   rc->map = syscon_regmap_lookup_by_phandle(np, "hisi,rst-
> syscon");

Oh, thanks. I misunderstood your suggestion the other day. I'll fix it right 
away.

> 
>>  if (IS_ERR(rc->map)) {
>>  dev_err(dev, "failed to get hi3660,rst-syscon\n");
>>  return PTR_ERR(rc->map);
> 
> regards
> Philipp
> 
> .
> 



Re: [PATCH v3 2/3] dt-bindings: arm: hisilicon: Add binding for L3 cache controller

2021-01-12 Thread Leizhen (ThunderTown)



On 2021/1/12 21:55, Arnd Bergmann wrote:
> On Tue, Jan 12, 2021 at 1:35 PM Leizhen (ThunderTown)
>  wrote:
>> On 2021/1/12 16:46, Arnd Bergmann wrote:
>>> On Tue, Jan 12, 2021 at 2:56 AM Zhen Lei  wrote:
>>>
>>>> +---
>>>> +$id: http://devicetree.org/schemas/arm/hisilicon/l3cache.yaml#
>>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>>> +
>>>> +title: Hisilicon L3 cache controller
>>>> +
>>>> +maintainers:
>>>> +  - Wei Xu 
>>>> +
>>>> +description: |
>>>> +  The Hisilicon L3 outer cache controller supports a maximum of 36-bit 
>>>> physical
>>>> +  addresses. The data cached in the L3 outer cache can be operated based 
>>>> on the
>>>> +  physical address range or the entire cache.
>>>> +
>>>> +properties:
>>>> +  compatible:
>>>> +items:
>>>> +  - const: hisilicon,l3cache
>>>> +
>>>
>>> The compatible string needs to be a little more specific, I'm sure
>>> you cannot guarantee that this is the only L3 cache controller ever
>>> designed in the past or future by HiSilicon.
>>>
>>> Normally when you have an IP block that is itself unnamed but that is 
>>> specific
>>> to one or a few SoCs but that has no na, the convention is to include the 
>>> name
>>> of the first SoC that contained it.
>>
>> Right, thanks for your suggestion, I will rename it to 
>> "hisilicon,hi1381-l3cache"
>> and "hisilicon,hi1215-l3cache".

Sorry, Just received a response from the hardware developers, the SoC names 
need to
be changed:
hi1381 --> kunpeng509
hi1215 --> kunpeng506

So I want to rename the compatible string to "hisilicon,kunpeng-l3v1", Kunpeng 
L3
cache controller version 1. This is enough to distinguish other versions of 
cache
controller. It also facilitates the naming of the config option and files.

> 
> Sounds good.
> 
>>> Can you share which products actually use this L3 cache controller?
>>
>> This L3 cache controller is used on Hi1381 and Hi1215 board. I don't know 
>> where
>> these two boards are used. Our company is too large. Software is delivered 
>> level
>> by level. I'm only involved in the Kernel-related part.
>>
>>>
>>> On a related note, what does the memory map look like on this chip?
>>
>> memory@a0 {
>>  device_type = "memory";
>>  reg = <0x0 0xa0 0x0 0x1aa0>, <0x1 0xe000 0x0 0x1d00>, 
>> <0x0 0x1f40 0x0 0xb5c0>;
>> };
>>
>> Currently, the DTS is being maintained by ourselves, I'll try to upstream it 
>> later.
>>
>>> Do you support more than 4GB of total installed memory? If you
>>
>> Currently, the total size does not exceed 4 GB. However, the physical 
>> address is wider than 32 bits.
> 
> Ok, so it appears that the memory is actually contiguous in the first
> 3.5GB (with a few holes), plus the remaining 0.5GB being offset in
> the physical memory by 4GB (starting at 0x1e000 instead of
> 0xe000), presumably to allow the use of 32-bit DMA addresses.
> 
> This works fine for the moment, but it does require support for
> a nonlinear virt_to_phys()/phys_to_virt() translation after highmem
> gets removed, and you would get at most 3.75GB anyway, so it
> might be easier at that point to just drop the entire last block at
> 0x1e000, but this will depend on how well we get the 4G:4G
> code to work, and whether the users will still need kernel updates for
> this platform then.>
>  Arnd
> 
> .
> 



Re: [PATCH v3 2/3] dt-bindings: arm: hisilicon: Add binding for L3 cache controller

2021-01-13 Thread Leizhen (ThunderTown)



On 2021/1/13 15:44, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/1/12 21:55, Arnd Bergmann wrote:
>> On Tue, Jan 12, 2021 at 1:35 PM Leizhen (ThunderTown)
>>  wrote:
>>> On 2021/1/12 16:46, Arnd Bergmann wrote:
>>>> On Tue, Jan 12, 2021 at 2:56 AM Zhen Lei  
>>>> wrote:
>>>>
>>>>> +---
>>>>> +$id: http://devicetree.org/schemas/arm/hisilicon/l3cache.yaml#
>>>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>>>> +
>>>>> +title: Hisilicon L3 cache controller
>>>>> +
>>>>> +maintainers:
>>>>> +  - Wei Xu 
>>>>> +
>>>>> +description: |
>>>>> +  The Hisilicon L3 outer cache controller supports a maximum of 36-bit 
>>>>> physical
>>>>> +  addresses. The data cached in the L3 outer cache can be operated based 
>>>>> on the
>>>>> +  physical address range or the entire cache.
>>>>> +
>>>>> +properties:
>>>>> +  compatible:
>>>>> +items:
>>>>> +  - const: hisilicon,l3cache
>>>>> +
>>>>
>>>> The compatible string needs to be a little more specific, I'm sure
>>>> you cannot guarantee that this is the only L3 cache controller ever
>>>> designed in the past or future by HiSilicon.
>>>>
>>>> Normally when you have an IP block that is itself unnamed but that is 
>>>> specific
>>>> to one or a few SoCs but that has no na, the convention is to include the 
>>>> name
>>>> of the first SoC that contained it.
>>>
>>> Right, thanks for your suggestion, I will rename it to 
>>> "hisilicon,hi1381-l3cache"
>>> and "hisilicon,hi1215-l3cache".
> 
> Sorry, Just received a response from the hardware developers, the SoC names 
> need to
> be changed:
> hi1381 --> kunpeng509
> hi1215 --> kunpeng506
> 
> So I want to rename the compatible string to "hisilicon,kunpeng-l3v1", 
> Kunpeng L3

I thought about it. Let's name it "hisilicon,kunpeng-l3cache", and then add v2 
in
the future. Maybe the SoC name is changed later, and v2 is not required.

> cache controller version 1. This is enough to distinguish other versions of 
> cache
> controller. It also facilitates the naming of the config option and files.
> 
>>
>> Sounds good.
>>
>>>> Can you share which products actually use this L3 cache controller?
>>>
>>> This L3 cache controller is used on Hi1381 and Hi1215 board. I don't know 
>>> where
>>> these two boards are used. Our company is too large. Software is delivered 
>>> level
>>> by level. I'm only involved in the Kernel-related part.
>>>
>>>>
>>>> On a related note, what does the memory map look like on this chip?
>>>
>>> memory@a0 {
>>>  device_type = "memory";
>>>  reg = <0x0 0xa0 0x0 0x1aa0>, <0x1 0xe000 0x0 0x1d00>, 
>>> <0x0 0x1f40 0x0 0xb5c0>;
>>> };
>>>
>>> Currently, the DTS is being maintained by ourselves, I'll try to upstream 
>>> it later.
>>>
>>>> Do you support more than 4GB of total installed memory? If you
>>>
>>> Currently, the total size does not exceed 4 GB. However, the physical 
>>> address is wider than 32 bits.
>>
>> Ok, so it appears that the memory is actually contiguous in the first
>> 3.5GB (with a few holes), plus the remaining 0.5GB being offset in
>> the physical memory by 4GB (starting at 0x1e000 instead of
>> 0xe000), presumably to allow the use of 32-bit DMA addresses.
>>
>> This works fine for the moment, but it does require support for
>> a nonlinear virt_to_phys()/phys_to_virt() translation after highmem
>> gets removed, and you would get at most 3.75GB anyway, so it
>> might be easier at that point to just drop the entire last block at
>> 0x1e000, but this will depend on how well we get the 4G:4G
>> code to work, and whether the users will still need kernel updates for
>> this platform then.>
>>  Arnd
>>
>> .
>>
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 



Re: [PATCH v3 2/3] dt-bindings: arm: hisilicon: Add binding for L3 cache controller

2021-01-13 Thread Leizhen (ThunderTown)



On 2021/1/13 19:15, Arnd Bergmann wrote:
> On Wed, Jan 13, 2021 at 9:13 AM Leizhen (ThunderTown)
>  wrote:
>> On 2021/1/13 15:44, Leizhen (ThunderTown) wrote:
>>> On 2021/1/12 21:55, Arnd Bergmann wrote:
>>>> On Tue, Jan 12, 2021 at 1:35 PM Leizhen (ThunderTown)
>>>>  wrote:
>>>>> On 2021/1/12 16:46, Arnd Bergmann wrote:
>>>>>> On Tue, Jan 12, 2021 at 2:56 AM Zhen Lei  
>>>>>> wrote:
>>>>>>
>>>>>>> +---
>>>>>>> +$id: http://devicetree.org/schemas/arm/hisilicon/l3cache.yaml#
>>>>>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>>>>>> +
>>>>>>> +title: Hisilicon L3 cache controller
>>>>>>> +
>>>>>>> +maintainers:
>>>>>>> +  - Wei Xu 
>>>>>>> +
>>>>>>> +description: |
>>>>>>> +  The Hisilicon L3 outer cache controller supports a maximum of 36-bit 
>>>>>>> physical
>>>>>>> +  addresses. The data cached in the L3 outer cache can be operated 
>>>>>>> based on the
>>>>>>> +  physical address range or the entire cache.
>>>>>>> +
>>>>>>> +properties:
>>>>>>> +  compatible:
>>>>>>> +items:
>>>>>>> +  - const: hisilicon,l3cache
>>>>>>> +
>>>>>>
>>>>>> The compatible string needs to be a little more specific, I'm sure
>>>>>> you cannot guarantee that this is the only L3 cache controller ever
>>>>>> designed in the past or future by HiSilicon.
>>>>>>
>>>>>> Normally when you have an IP block that is itself unnamed but that is 
>>>>>> specific
>>>>>> to one or a few SoCs but that has no na, the convention is to include 
>>>>>> the name
>>>>>> of the first SoC that contained it.
>>>>>
>>>>> Right, thanks for your suggestion, I will rename it to 
>>>>> "hisilicon,hi1381-l3cache"
>>>>> and "hisilicon,hi1215-l3cache".
>>>
>>> Sorry, Just received a response from the hardware developers, the SoC names 
>>> need to
>>> be changed:
>>> hi1381 --> kunpeng509
>>> hi1215 --> kunpeng506
>>>
>>> So I want to rename the compatible string to "hisilicon,kunpeng-l3v1", 
>>> Kunpeng L3
>>
>> I thought about it. Let's name it "hisilicon,kunpeng-l3cache", and then add 
>> v2 in
>> the future. Maybe the SoC name is changed later, and v2 is not required.
> 
> I would prefer the more specific name to be listed as well. You can
> use the generic
> "hisilicon,kunpeng-l3cache" as the key that the driver uses, but
> please also include
> the chip specific one here.

Oh, yes. Sometimes, the "syscon" is used this way . The first string describes
the component information,and the second string is used to match the driver.

compatible = "hisilicon,kunpeng506-l3cache", "hisilicon,kunpeng-l3cache"


> We tend to use the chip identifiers
> (hi1381, ...), but if
> the marketing names (kunpeng509, ...) are now what they are known as in the

The hardware developers told me that hi1381 is the internal chip identifier,
and should be deprecated. kunpeng509 is both chip identifier and marketing name.

Kunpeng is the pinyin of two Chinese characters. They are two mythical animals.

> data sheet, then use that. The problem with marketing names is that they are
> more often unrelated to the technology underneath. It's possible that there
> might be e.g. kunpeng507 chip that sold to the same customers but very 
> different
> internally from kunpeng506/kunpeng509. This also happens with the chip 
> numbers,

It shouldn't make a big difference,unless the first two numbers are different.

> but those tend to be more stable (at least for other manufacturers).
> 
>Arnd
> 
> .
> 



Re: [PATCH 1/2] perf/smmuv3: Don't reserve the register space that overlaps with the SMMUv3

2021-01-19 Thread Leizhen (ThunderTown)



On 2021/1/19 20:32, Robin Murphy wrote:
> On 2021-01-19 01:59, Zhen Lei wrote:
>> Some SMMUv3 implementation embed the Perf Monitor Group Registers (PMCG)
>> inside the first 64kB region of the SMMU. Since SMMU and PMCG are managed
>> by two separate drivers, and this driver depends on ARM_SMMU_V3, so the
>> SMMU driver reserves the corresponding resource first, this driver should
>> not reserve the corresponding resource again. Otherwise, a resource
>> reservation conflict is reported during boot.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>   drivers/perf/arm_smmuv3_pmu.c | 42 
>> --
>>   1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
>> index 74474bb322c3f26..dcce085431c6ce8 100644
>> --- a/drivers/perf/arm_smmuv3_pmu.c
>> +++ b/drivers/perf/arm_smmuv3_pmu.c
>> @@ -761,6 +761,44 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu 
>> *smmu_pmu)
>>   dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options);
>>   }
>>   +static void __iomem *
>> +smmu_pmu_get_and_ioremap_resource(struct platform_device *pdev,
>> +  unsigned int index,
>> +  struct resource **out_res)
>> +{
>> +    int ret;
>> +    void __iomem *base;
>> +    struct resource *res;
>> +
>> +    res = platform_get_resource(pdev, IORESOURCE_MEM, index);
>> +    if (!res) {
>> +    dev_err(&pdev->dev, "invalid resource\n");
>> +    return IOMEM_ERR_PTR(-EINVAL);
>> +    }
>> +    if (out_res)
>> +    *out_res = res;
>> +
>> +    ret = region_intersects(res->start, resource_size(res),
>> +    IORESOURCE_MEM, IORES_DESC_NONE);
>> +    if (ret == REGION_INTERSECTS) {
>> +    /*
>> + * The resource has already been reserved by the SMMUv3 driver.
>> + * Don't reserve it again, just do devm_ioremap().
>> + */
>> +    base = devm_ioremap(&pdev->dev, res->start, resource_size(res));
>> +    } else {
>> +    /*
>> + * The resource may have not been reserved by any driver, or
>> + * has been reserved but not type IORESOURCE_MEM. In the latter
>> + * case, devm_ioremap_resource() reports a conflict and returns
>> + * IOMEM_ERR_PTR(-EBUSY).
>> + */
>> +    base = devm_ioremap_resource(&pdev->dev, res);
>> +    }
> 
> What if the PMCG driver simply happens to probe first?

There are 4 cases:
1) ARM_SMMU_V3=m, ARM_SMMU_V3_PMU=y
   It's not allowed. Becase: ARM_SMMU_V3_PMU depends on ARM_SMMU_V3
   config ARM_SMMU_V3_PMU
 tristate "ARM SMMUv3 Performance Monitors Extension"
 depends on ARM64 && ACPI && ARM_SMMU_V3

2) ARM_SMMU_V3=y, ARM_SMMU_V3_PMU=m
   No problem, SMMUv3 will be initialized first.

3) ARM_SMMU_V3=y, ARM_SMMU_V3_PMU=y
   vi drivers/Makefile
   60 obj-y   += iommu/
   172 obj-$(CONFIG_PERF_EVENTS)   += perf/

   This link sequence ensure that SMMUv3 driver will be initialized first.
   They are currently at the same initialization level.

4) ARM_SMMU_V3=m, ARM_SMMU_V3_PMU=m
   Sorry, I thought module dependencies were generated based on "depends on".
   But I tried it today,module dependencies are generated only when symbol
   dependencies exist. I should use MODULE_SOFTDEP() to explicitly mark the
   dependency. I will send V2 later.


> 
> Robin.
> 
>> +
>> +    return base;
>> +}
>> +
>>   static int smmu_pmu_probe(struct platform_device *pdev)
>>   {
>>   struct smmu_pmu *smmu_pmu;
>> @@ -793,7 +831,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>   .capabilities    = PERF_PMU_CAP_NO_EXCLUDE,
>>   };
>>   -    smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, 
>> &res_0);
>> +    smmu_pmu->reg_base = smmu_pmu_get_and_ioremap_resource(pdev, 0, &res_0);
>>   if (IS_ERR(smmu_pmu->reg_base))
>>   return PTR_ERR(smmu_pmu->reg_base);
>>   @@ -801,7 +839,7 @@ static int smmu_pmu_probe(struct platform_device *pdev)
>>     /* Determine if page 1 is present */
>>   if (cfgr & SMMU_PMCG_CFGR_RELOC_CTRS) {
>> -    smmu_pmu->reloc_base = devm_platform_ioremap_resource(pdev, 1);
>> +    smmu_pmu->reloc_base = smmu_pmu_get_and_ioremap_resource(pdev, 1, 
>> NULL);
>>   if (IS_ERR(smmu_pmu->reloc_base))
>>   return PTR_ERR(smmu_pmu->reloc_base);
>>   } else {
>>
> 
> .
> 



Re: [v2] Old platforms: bring out your dead

2021-01-15 Thread Leizhen (ThunderTown)



On 2021/1/15 17:26, Arnd Bergmann wrote:
> On Fri, Jan 15, 2021 at 8:08 AM Wei Xu  wrote:
>> On 2021/1/14 0:14, Arnd Bergmann wrote:
>>> On Fri, Jan 8, 2021 at 11:55 PM Arnd Bergmann  wrote:
>>> * mmp -- added in 2009, DT support is active, but board files might go
>>> * cns3xxx -- added in 2010, last fixed in 2019, probably no users left
>>> * hisi (hip01/hip05) -- servers added in 2013, replaced with arm64 in 2016
>>
>> I think it is OK to drop the support of the hip01(arm32) and hip05(arm64).
>> Could you also help to drop the support of the hip04(arm32) which I think 
>> nobody use as well?
> 
> Thank you for your reply! I actually meant to write hip04 instead of hip05,
> so I was only asking about the two 32-bit targets. I would expect that
> hip05 still has a few users, but wouldn't mind removing that as well if you
> are sure there are none.
> 
> Since Zhen Lei is starting to upstream Kunpeng506 and Kunpeng509
> support, can you clarify how much reuse of IP blocks there is between
> hip04 and those? In particular, hip04 has custom code for (at least)
> platmcpm, clk, irqchip, ethernet, and hw_rng, probably more as those
> were only the ones I see on a quick grep.
> 
> If we remove hip04, should we remove all these drivers right away,
> or keep some of them around?

I think the drivers should be kept. Currently, at least hip04_eth.c and
irq-hip04.c are used. These drivers were originally written for Hip04, but
the drivers used by other boards maybe similar to them. Therefore, these
drivers are extended without adding new drivers.

> 
>   Arnd
> 
> .
> 



Re: [PATCH 1/1] ARM: LPAE: use phys_addr_t instead of unsigned long in outercache hooks

2020-12-28 Thread Leizhen (ThunderTown)



On 2020/12/26 20:13, Russell King - ARM Linux admin wrote:
> On Fri, Dec 25, 2020 at 07:44:58PM +0800, Zhen Lei wrote:
>> The outercache of some Hisilicon SOCs support physical addresses wider
>> than 32-bits. The unsigned long datatype is not sufficient for mapping
>> physical addresses >= 4GB. The commit ad6b9c9d78b9 ("ARM: 6671/1: LPAE:
>> use phys_addr_t instead of unsigned long in outercache functions") has
>> already modified the outercache functions. But the parameters of the
>> outercache hooks are not changed. This patch use phys_addr_t instead of
>> unsigned long in outercache hooks: inv_range, clean_range, flush_range.
>>
>> To ensure the outercache that does not support LPAE works properly, do
>> cast phys_addr_t to unsigned long by adding a middle-tier function.
> 
> Please don't do that. The cast can be done inside the L2 functions
> themselves without needing all these additional functions.

OK. At first, I wanted to fit in like this:

-static void l2c220_inv_range(unsigned long start, unsigned long end)
+static void l2c220_inv_range(phys_addr_t lpae_start, phys_addr_t lpae_end)
 {
+  unsigned long start = lpae_start;
+  unsigned long end = lpae_end;


> 
> We probably ought to also add some protection against addresses > 4GB,
> although these are hot paths, so we don't want to add tests in these
> functions. Maybe instead checking whether the system has memory above
> 4GB while the L2 cache is being initialised would be a good idea?
> 

I'm sorry, I didn't quite understand what you meant. Currently, the
biggest problem is the compilation problem. The sizeof(long) may be
32, and the 64-bit physical address cannot be transferred from outcache
functions to outcache hooks.



Re: [PATCH 1/1] ARM: LPAE: use phys_addr_t instead of unsigned long in outercache hooks

2020-12-28 Thread Leizhen (ThunderTown)



On 2020/12/26 20:15, Russell King - ARM Linux admin wrote:
> On Sat, Dec 26, 2020 at 10:18:08AM +0800, Leizhen (ThunderTown) wrote:
>> On 2020/12/25 19:44, Zhen Lei wrote:
>>> The outercache of some Hisilicon SOCs support physical addresses wider
>>> than 32-bits. The unsigned long datatype is not sufficient for mapping
>>> physical addresses >= 4GB. The commit ad6b9c9d78b9 ("ARM: 6671/1: LPAE:
>>> use phys_addr_t instead of unsigned long in outercache functions") has
>>> already modified the outercache functions. But the parameters of the
>>> outercache hooks are not changed. This patch use phys_addr_t instead of
>>> unsigned long in outercache hooks: inv_range, clean_range, flush_range.
>>>
>>> To ensure the outercache that does not support LPAE works properly, do
>>> cast phys_addr_t to unsigned long by adding a middle-tier function.
>>
>> This patch will impact the outercache drivers that have not been merged into
>> the kernel. They should also update the datatype of the outercache hooks.
> 
> This isn't much of a concern to mainline. If it's that big a problem
> for you, then please consider merging your code into mainline so that
> everyone can benefit from it.

All right, I got it.

> 



Re: [PATCH 1/1] ARM: LPAE: use phys_addr_t instead of unsigned long in outercache hooks

2020-12-28 Thread Leizhen (ThunderTown)



On 2020/12/28 15:00, Arnd Bergmann wrote:
> On Fri, Dec 25, 2020 at 12:48 PM Zhen Lei  wrote:
>>
>> The outercache of some Hisilicon SOCs support physical addresses wider
>> than 32-bits. The unsigned long datatype is not sufficient for mapping
>> physical addresses >= 4GB. The commit ad6b9c9d78b9 ("ARM: 6671/1: LPAE:
>> use phys_addr_t instead of unsigned long in outercache functions") has
>> already modified the outercache functions. But the parameters of the
>> outercache hooks are not changed. This patch use phys_addr_t instead of
>> unsigned long in outercache hooks: inv_range, clean_range, flush_range.
>>
>> To ensure the outercache that does not support LPAE works properly, do
>> cast phys_addr_t to unsigned long by adding a middle-tier function.
>> For example:
>> -static void l2c220_inv_range(unsigned long start, unsigned long end)
>> +static void __l2c220_inv_range(unsigned long start, unsigned long end)
>>  {
>> ...
>>  }
>> +static void l2c220_inv_range(phys_addr_t start, phys_addr_t end)
>> +{
>> +  __l2c220_inv_range(start, end);
>> +}
>>
>> Note that the outercache functions have been doing this cast before this
>> patch. So now, the cast is just moved to the middle-tier function.
>>
>> No functional change.
>>
>> Signed-off-by: Zhen Lei 
> 
> This looks reasonable in principle, but it would be helpful to
> understand better which SoCs are affected. In which way is
> this specific to Hisilicon implementations, and why would others
> not need this?

I answered at the end.

> 
> Wouldn't this also be needed by an Armada XP that supports
> more than 4GB of RAM but has an outer cache?

I don't know about the armada XP environment.

> 
> I suppose those SoCs using off-the-shelf Arm cores are either
> pre-LPAE and cannot address memory above 4GB, or they do
> not need the outer_cache interfaces.

I think so.

> 
>> diff --git a/arch/arm/mm/cache-feroceon-l2.c 
>> b/arch/arm/mm/cache-feroceon-l2.c
>> index 5c1b7a7b9af6300..ab1d8051bf832c9 100644
>> --- a/arch/arm/mm/cache-feroceon-l2.c
>> +++ b/arch/arm/mm/cache-feroceon-l2.c
>> @@ -168,7 +168,7 @@ static unsigned long calc_range_end(unsigned long start, 
>> unsigned long end)
>> return range_end;
>>  }
>>
>> -static void feroceon_l2_inv_range(unsigned long start, unsigned long end)
>> +static void __feroceon_l2_inv_range(unsigned long start, unsigned long end)
>>  {
>> /*
>>  * Clean and invalidate partial first cache line.
>> @@ -198,7 +198,12 @@ static void feroceon_l2_inv_range(unsigned long start, 
>> unsigned long end)
>> dsb();
>>  }
>>
>> -static void feroceon_l2_clean_range(unsigned long start, unsigned long end)
>> +static void feroceon_l2_inv_range(phys_addr_t start, phys_addr_t end)
>> +{
>> +   __feroceon_l2_inv_range(start, end);
>> +}
>> +
> 
> What is this indirection for? It looks like you do this for all 
> implementations,
> so the actual address gets truncated here.

Because these environments are all 32-bit physical addresses or only the lower
32-bit physical addresses need to be operated. But my environment operates 
64-bit
physical address and sizeof(long) is 32. So need to change the datatype of the
outchache hooks.

 struct outer_cache_fns {
-   void (*inv_range)(unsigned long, unsigned long);
-   void (*clean_range)(unsigned long, unsigned long);
-   void (*flush_range)(unsigned long, unsigned long);
+   void (*inv_range)(phys_addr_t, phys_addr_t);
+   void (*clean_range)(phys_addr_t, phys_addr_t);
+   void (*flush_range)(phys_addr_t, phys_addr_t);
void (*flush_all)(void);

I added middle-tier function for all implementations, just to ensure that the
above changes do not have side effects on them.

> 
>Arnd
> 
> .
> 



Re: [PATCH 1/1] ARM: LPAE: use phys_addr_t instead of unsigned long in outercache hooks

2020-12-30 Thread Leizhen (ThunderTown)



On 2020/12/29 18:51, Russell King - ARM Linux admin wrote:
> On Tue, Dec 29, 2020 at 02:30:56PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2020/12/26 20:13, Russell King - ARM Linux admin wrote:
>>> On Fri, Dec 25, 2020 at 07:44:58PM +0800, Zhen Lei wrote:
>>>> The outercache of some Hisilicon SOCs support physical addresses wider
>>>> than 32-bits. The unsigned long datatype is not sufficient for mapping
>>>> physical addresses >= 4GB. The commit ad6b9c9d78b9 ("ARM: 6671/1: LPAE:
>>>> use phys_addr_t instead of unsigned long in outercache functions") has
>>>> already modified the outercache functions. But the parameters of the
>>>> outercache hooks are not changed. This patch use phys_addr_t instead of
>>>> unsigned long in outercache hooks: inv_range, clean_range, flush_range.
>>>>
>>>> To ensure the outercache that does not support LPAE works properly, do
>>>> cast phys_addr_t to unsigned long by adding a middle-tier function.
>>>
>>> Please don't do that. The cast can be done inside the L2 functions
>>> themselves without needing all these additional functions.
>>
>> OK. At first, I wanted to fit in like this:
>>
>> -static void l2c220_inv_range(unsigned long start, unsigned long end)
>> +static void l2c220_inv_range(phys_addr_t lpae_start, phys_addr_t lpae_end)
>>  {
>> +  unsigned long start = lpae_start;
>> +  unsigned long end = lpae_end;
> 
> It sounds like there should be a "but..." clause here. This is exactly
> what I'm suggesting you should be doing. Currently, there's a silent
> narrowing cast in every single caller of the outer_.*_range() functions
> and you're only moving it from the callsites to inside the called
> functions.

Okay, I will send v2 based on this idea.

> 
>>> We probably ought to also add some protection against addresses > 4GB,
>>> although these are hot paths, so we don't want to add tests in these
>>> functions. Maybe instead checking whether the system has memory above
>>> 4GB while the L2 cache is being initialised would be a good idea?
>>
>> I'm sorry, I didn't quite understand what you meant. Currently, the
>> biggest problem is the compilation problem. The sizeof(long) may be
>> 32, and the 64-bit physical address cannot be transferred from outcache
>> functions to outcache hooks.
> 
> What I mean is that we really ought to warn if the L2C310 code tries to
> initialise on a system where memory is above 4GB. However, it's very
> unlikely that such a system exists, so it's probably fine not implement
> a check, but it just feels fragile to be truncating the 64-bit address
> to 32-bit on a kernel that _could_ support higher addresses, even though
> that's exactly what is happening today (kind of by accident - I don't
> think anyone realised.)
> 



Re: [PATCH v2 1/1] ARM: LPAE: use phys_addr_t instead of unsigned long in outercache hooks

2020-12-30 Thread Leizhen (ThunderTown)



On 2020/12/30 17:54, Russell King - ARM Linux admin wrote:
> On Wed, Dec 30, 2020 at 04:28:05PM +0800, Zhen Lei wrote:
>> The outercache of some Hisilicon SOCs support physical addresses wider
>> than 32-bits. The unsigned long datatype is not sufficient for mapping
>> physical addresses >= 4GB. The commit ad6b9c9d78b9 ("ARM: 6671/1: LPAE:
>> use phys_addr_t instead of unsigned long in outercache functions") has
>> already modified the outercache functions. But the parameters of the
>> outercache hooks are not changed. This patch use phys_addr_t instead of
>> unsigned long in outercache hooks: inv_range, clean_range, flush_range.
>>
>> To ensure the outercache that does not support LPAE works properly, do
>> cast phys_addr_t to unsigned long by adding a group of temporary
>> variables. For example:
>> -static void l2c220_inv_range(unsigned long start, unsigned long end)
>> +static void l2c220_inv_range(phys_addr_t pa_start, phys_addr_t pa_end)
>>  {
>> +unsigned long start = pa_start;
>> +unsigned long end = pa_end;
>>
>> Note that the outercache functions have been doing this cast before this
>> patch. So now, the cast is just moved into the outercache hook functions.
>>
>> No functional change.
>>
>> Signed-off-by: Zhen Lei 
> 
> This is fine, but there really needs to be a patch that makes use of
> this change before we accept it into mainline kernels.
> 

OK, I will send the outcache driver of Hisilicon after New Year's Day.



Re: [PATCH 1/1] ARM: LPAE: use phys_addr_t instead of unsigned long in outercache hooks

2020-12-25 Thread Leizhen (ThunderTown)



On 2020/12/25 19:44, Zhen Lei wrote:
> The outercache of some Hisilicon SOCs support physical addresses wider
> than 32-bits. The unsigned long datatype is not sufficient for mapping
> physical addresses >= 4GB. The commit ad6b9c9d78b9 ("ARM: 6671/1: LPAE:
> use phys_addr_t instead of unsigned long in outercache functions") has
> already modified the outercache functions. But the parameters of the
> outercache hooks are not changed. This patch use phys_addr_t instead of
> unsigned long in outercache hooks: inv_range, clean_range, flush_range.
> 
> To ensure the outercache that does not support LPAE works properly, do
> cast phys_addr_t to unsigned long by adding a middle-tier function.
> For example:
> -static void l2c220_inv_range(unsigned long start, unsigned long end)
> +static void __l2c220_inv_range(unsigned long start, unsigned long end)
>  {
>   ...
>  }
> +static void l2c220_inv_range(phys_addr_t start, phys_addr_t end)
> +{
> +  __l2c220_inv_range(start, end);
> +}
> 
> Note that the outercache functions have been doing this cast before this
> patch. So now, the cast is just moved to the middle-tier function.
> 
> No functional change.

This patch will impact the outercache drivers that have not been merged into
the kernel. They should also update the datatype of the outercache hooks.

Another compatible solution is to add three new outercache hooks, as follows:

diff --git a/arch/arm/include/asm/outercache.h 
b/arch/arm/include/asm/outercache.h
index 3364637755e86aa..83344d0428fa5b6 100644
--- a/arch/arm/include/asm/outercache.h
+++ b/arch/arm/include/asm/outercache.h
@@ -17,6 +17,9 @@ struct outer_cache_fns {
 void (*inv_range)(unsigned long, unsigned long);
 void (*clean_range)(unsigned long, unsigned long);
 void (*flush_range)(unsigned long, unsigned long);
+  void (*lpae_inv_range)(phys_addr_t, phys_addr_t);
+  void (*lpae_clean_range)(phys_addr_t, phys_addr_t);
+  void (*lpae_flush_range)(phys_addr_t, phys_addr_t);
 void (*flush_all)(void);
 void (*disable)(void);
 #ifdef CONFIG_OUTER_CACHE_SYNC
@@ -41,6 +44,8 @@ static inline void outer_inv_range(phys_addr_t start, 
phys_addr_t end)
 {
 if (outer_cache.inv_range)
 outer_cache.inv_range(start, end);
+  else if (outer_cache.lpae_inv_range)
+  outer_cache.lpae_inv_range(start, end);
 }

 /**
@@ -52,6 +57,8 @@ static inline void outer_clean_range(phys_addr_t start, 
phys_addr_t end)
 {
 if (outer_cache.clean_range)
 outer_cache.clean_range(start, end);
+  else if (outer_cache.lpae_clean_range)
+  outer_cache.lpae_clean_range(start, end);
 }

 /**
@@ -63,6 +70,8 @@ static inline void outer_flush_range(phys_addr_t start, 
phys_addr_t end)
 {
 if (outer_cache.flush_range)
 outer_cache.flush_range(start, end);
+  else if (outer_cache.lpae_flush_range)
+  outer_cache.lpae_flush_range(start, end);
 }

 /**



> 
> Signed-off-by: Zhen Lei 
> ---
>  arch/arm/include/asm/outercache.h |  6 +--
>  arch/arm/mm/cache-feroceon-l2.c   | 21 --
>  arch/arm/mm/cache-l2x0.c  | 83 
> ---
>  arch/arm/mm/cache-tauros2.c   | 21 --
>  arch/arm/mm/cache-uniphier.c  |  6 +--
>  arch/arm/mm/cache-xsc3l2.c| 21 --
>  6 files changed, 129 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arm/include/asm/outercache.h 
> b/arch/arm/include/asm/outercache.h
> index 3364637755e86aa..4cee1ea0c15449a 100644
> --- a/arch/arm/include/asm/outercache.h
> +++ b/arch/arm/include/asm/outercache.h
> @@ -14,9 +14,9 @@
>  struct l2x0_regs;
>  
>  struct outer_cache_fns {
> - void (*inv_range)(unsigned long, unsigned long);
> - void (*clean_range)(unsigned long, unsigned long);
> - void (*flush_range)(unsigned long, unsigned long);
> + void (*inv_range)(phys_addr_t, phys_addr_t);
> + void (*clean_range)(phys_addr_t, phys_addr_t);
> + void (*flush_range)(phys_addr_t, phys_addr_t);
>   void (*flush_all)(void);
>   void (*disable)(void);
>  #ifdef CONFIG_OUTER_CACHE_SYNC
> diff --git a/arch/arm/mm/cache-feroceon-l2.c b/arch/arm/mm/cache-feroceon-l2.c
> index 5c1b7a7b9af6300..ab1d8051bf832c9 100644
> --- a/arch/arm/mm/cache-feroceon-l2.c
> +++ b/arch/arm/mm/cache-feroceon-l2.c
> @@ -168,7 +168,7 @@ static unsigned long calc_range_end(unsigned long start, 
> unsigned long end)
>   return range_end;
>  }
>  
> -static void feroceon_l2_inv_range(unsigned long start, unsigned long end)
> +static void __feroceon_l2_inv_range(unsigned long start, unsigned long end)
>  {
>   /*
>* Clean and invalidate partial first cache line.
> @@ -198,7 +198,12 @@ static void feroceon_l2_inv_range(unsigned long start, 
> unsigned long end)
>   dsb();
>  }
>  
> -static void feroceon_l2_clean_range(unsigned long start, unsigned long end)
> +static void feroceon_l2_inv_range(phys_addr_t start, phys_addr_t end)
> +{
> + __feroceon_l2_in

Re: [PATCH 1/1] watchdog: dw_wdt: Remove duplicated header file inclusion

2021-03-30 Thread Leizhen (ThunderTown)



On 2021/3/28 3:50, Guenter Roeck wrote:
> On 3/26/21 6:20 PM, Zhen Lei wrote:
>> The header file  is already included above and can be
>> removed here.
>>
>> Signed-off-by: Zhen Lei 
> 
> Already submitted:
> 
> https://patchwork.kernel.org/project/linux-watchdog/patch/20210325112916.865510-1-wanjiab...@vivo.com/
> 
> In general, when removing duplicate or unnecessary header files,
> please retain or improve alphabetic order. This patch makes it worse.

Okay, thanks for the heads-up. I'll pay attention next time.

> 
> Guenter
> 
>> ---
>>  drivers/watchdog/dw_wdt.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/watchdog/dw_wdt.c b/drivers/watchdog/dw_wdt.c
>> index 32d0e1781e63c4e..b1642e2d9175584 100644
>> --- a/drivers/watchdog/dw_wdt.c
>> +++ b/drivers/watchdog/dw_wdt.c
>> @@ -19,7 +19,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>  #include 
>>  #include 
>>  #include 
>>
> 
> 
> .
> 



Re: [PATCH 1/2] ASoC: dt-bindings: renesas, rsnd: Clear warning 'dais' is a required property

2021-03-30 Thread Leizhen (ThunderTown)



On 2021/3/31 6:45, Rob Herring wrote:
> On Tue, Mar 30, 2021 at 11:06:30AM +0800, Zhen Lei wrote:
>> When I do dt_binding_check, below warning is reported:
>> Documentation/devicetree/bindings/sound/renesas,rsnd.example.dt.yaml: \
>> sound@ec50: 'dais' is a required property
>>
>> I looked at all the dts files in the "arch/arm64/boot/dts/renesas/"
>> directory, I found that all nodes that contain the "dais" property have
>> compatible string: "audio-graph-card". So I can be sure that the
>> "$ref: audio-graph.yaml#" should be corrected to
>> "$ref: audio-graph-card.yaml#".
>>
>> In addition, not all nodes have compatible string "audio-graph-card", so
>> the "$ref: audio-graph-card.yaml#" should be described as "anyOf". To
>> ensure the validation of "anyOf" always passes, group it with the "if"
>> statement, because the result of the "if" statement is always not empty.
> 
> 'anyOf' is probably not right here.

Oh, yes, I think I should use "if" statement to enumerate cases where
"audio-graph-card.yaml" is required.

> 
> In any case, the is going to conflict with my series[1].

OK, I'll rework my patch based on your series.

> 
> Rob
> 
> [1] https://lore.kernel.org/r/20210323163634.877511-1-r...@kernel.org/
> 
> .
> 



Re: [PATCH 1/1] perf map: Fix error return code in maps__clone()

2021-04-15 Thread Leizhen (ThunderTown)



On 2021/4/15 20:42, Arnaldo Carvalho de Melo wrote:
> Em Thu, Apr 15, 2021 at 05:27:44PM +0800, Zhen Lei escreveu:
>> Although 'err' has been initialized to -ENOMEM, but it will be reassigned
>> by the "err = unwind__prepare_access(...)" statement in the for loop. So
>> that, the value of 'err' is unknown when map__clone() failed.
> 
> You forgot to research and add this:
> 
> Fixes: 6c502584438bda63 ("perf unwind: Call unwind__prepare_access for forked 
> thread")
> 
> So that the sta...@kernel.org guys can pick this up automagically and
> apply this fix to the stable kernels.
> 
> I've added it.

OK, thank you very much.

> 
> Thanks, applied.
> 
> - Arnaldo
>  
>> Reported-by: Hulk Robot 
>> Signed-off-by: Zhen Lei 
>> ---
>>  tools/perf/util/map.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
>> index fbc40a2c17d4dca..8af693d9678cefe 100644
>> --- a/tools/perf/util/map.c
>> +++ b/tools/perf/util/map.c
>> @@ -840,15 +840,18 @@ int maps__fixup_overlappings(struct maps *maps, struct 
>> map *map, FILE *fp)
>>  int maps__clone(struct thread *thread, struct maps *parent)
>>  {
>>  struct maps *maps = thread->maps;
>> -int err = -ENOMEM;
>> +int err;
>>  struct map *map;
>>  
>>  down_read(&parent->lock);
>>  
>>  maps__for_each_entry(parent, map) {
>>  struct map *new = map__clone(map);
>> -if (new == NULL)
>> +
>> +if (new == NULL) {
>> +err = -ENOMEM;
>>  goto out_unlock;
>> +}
>>  
>>  err = unwind__prepare_access(maps, new, NULL);
>>  if (err)
>> -- 
>> 2.26.0.106.g9fadedd
>>
>>
> 



Re: [PATCH 1/1] char: hpet: Remove unused local variable 'm' in hpet_interrupt()

2021-04-15 Thread Leizhen (ThunderTown)



On 2021/4/15 22:53, Greg Kroah-Hartman wrote:
> On Thu, Apr 15, 2021 at 10:24:04PM +0800, Zhen Lei wrote:
>> Commit 273ef9509b79 ("drivers/char/hpet.c: fix periodic-emulation for
>> delayed interrupt") removed the reference to local variable 'm', but
>> forgot to remove the definition and assignment of it. Due to
>> read_counter() indirectly calls "read barrier", the performance is
>> slightly degraded.
>>
>> Since the following comments give some description based on 'm', so move
>> the assignment of 'm' into it.
>>
>> Fixes: 273ef9509b79 ("drivers/char/hpet.c: fix periodic-emulation for 
>> delayed interrupt")
>> Reported-by: Hulk Robot 
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/char/hpet.c | 8 +---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
>> index ed3b7dab678dbd1..46950a0cda181a1 100644
>> --- a/drivers/char/hpet.c
>> +++ b/drivers/char/hpet.c
>> @@ -156,14 +156,16 @@ static irqreturn_t hpet_interrupt(int irq, void *data)
>>   * This has the effect of treating non-periodic like periodic.
>>   */
>>  if ((devp->hd_flags & (HPET_IE | HPET_PERIODIC)) == HPET_IE) {
>> -unsigned long m, t, mc, base, k;
>> +unsigned long t, mc, base, k;
>>  struct hpet __iomem *hpet = devp->hd_hpet;
>>  struct hpets *hpetp = devp->hd_hpets;
>>  
>>  t = devp->hd_ireqfreq;
>> -m = read_counter(&devp->hd_timer->hpet_compare);
>>  mc = read_counter(&hpet->hpet_mc);
>> -/* The time for the next interrupt would logically be t + m,
>> +/*
>> + * m = read_counter(&devp->hd_timer->hpet_compare);
> 
> Why did you comment this out?
> 
> And are you sure that yuou are not required to actually read that
> counter, even if you do not do anything with the value?  Lots of
> hardware works in odd ways...
> 
> Have you tested and verified that this still works properly?

Sorry, I didn't actually test it. I didn't see any dependency on
this read operation for other members' reads and writes. If this
read operation is potentially required, hopefully there is an
explanatory note next to it.

> 
> thanks,
> 
> greg k-h
> 
> .
> 



Re: [PATCH 5/8] iommu: fix a couple of spelling mistakes

2021-04-16 Thread Leizhen (ThunderTown)



On 2021/4/16 23:55, John Garry wrote:
> On 26/03/2021 06:24, Zhen Lei wrote:
>> There are several spelling mistakes, as follows:
>> funcions ==> functions
>> distiguish ==> distinguish
>> detroyed ==> destroyed
>>
>> Signed-off-by: Zhen Lei
> 
> I think that there should be a /s/appropriatley/appropriately/ in iommu.c

OK, I will fix it in v2.

> 
> Thanks,
> john
> 
> .
> 



Re: [PATCH 0/8] iommu: fix a couple of spelling mistakes detected by codespell tool

2021-04-16 Thread Leizhen (ThunderTown)



On 2021/4/16 23:24, Joerg Roedel wrote:
> On Fri, Mar 26, 2021 at 02:24:04PM +0800, Zhen Lei wrote:
>> This detection and correction covers the entire driver/iommu directory.
>>
>> Zhen Lei (8):
>>   iommu/pamu: fix a couple of spelling mistakes
>>   iommu/omap: Fix spelling mistake "alignement" -> "alignment"
>>   iommu/mediatek: Fix spelling mistake "phyiscal" -> "physical"
>>   iommu/sun50i: Fix spelling mistake "consits" -> "consists"
>>   iommu: fix a couple of spelling mistakes
>>   iommu/amd: fix a couple of spelling mistakes
>>   iommu/arm-smmu: Fix spelling mistake "initally" -> "initially"
>>   iommu/vt-d: fix a couple of spelling mistakes
> 
> This patch-set doesn't apply. Please re-send it as a single patch when
> v5.13-rc1 is released.

OK

> 
> Thanks,
> 
>   Joerg
> 
> .
> 



Re: [PATCH 0/3] scsi: mptfusion: Clear the warnings indicating that the variable is not used

2021-04-13 Thread Leizhen (ThunderTown)



On 2021/4/13 13:07, Martin K. Petersen wrote:
> 
> Zhen,
> 
>> Zhen Lei (3):
>>   scsi: mptfusion: Remove unused local variable 'time_count'
>>   scsi: mptfusion: Remove unused local variable 'port'
>>   scsi: mptfusion: Fix error return code of mptctl_hp_hostinfo()
> 
> I applied patches 1+2. I hesitate making functional changes to such an
> old driver.

I think patch 3 does not change any functions. The current modification only
ensures that error codes are correctly returned in the error branch. In the
previous version, 0 is always returned.

> 



Re: [PATCH v2 2/2] ASoC: dt-bindings: renesas, rsnd: Clear warning 'ports' does not match any of the regexes

2021-04-06 Thread Leizhen (ThunderTown)



On 2021/4/2 4:20, Rob Herring wrote:
> On Wed, Mar 31, 2021 at 05:16:16PM +0800, Zhen Lei wrote:
>> Currently, if there are more than two ports, or if there is only one port
>> but other properties(such as "#address-cells") is required, these ports
>> are placed under the "ports" node. So add the schema of property "ports".
> 
> A given binding should just use 'ports' or 'port' depending on it's 
> need. Supporting both forms is needless complexity.

Right, I'll adjust this patch again.

> 
>> Otherwise, warnings similar to the following will be reported:
>> arch/arm64/boot/dts/renesas/r8a774a1-beacon-rzg2m-kit.dt.yaml: \
>> sound@ec50: 'ports' does not match any of the regexes: \
>> '^rcar_sound,ctu$', '^rcar_sound,dai$', '^rcar_sound,dvc$', ...
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  Documentation/devicetree/bindings/sound/renesas,rsnd.yaml | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/sound/renesas,rsnd.yaml 
>> b/Documentation/devicetree/bindings/sound/renesas,rsnd.yaml
>> index 384191ee497f534..a42992fa687d3f3 100644
>> --- a/Documentation/devicetree/bindings/sound/renesas,rsnd.yaml
>> +++ b/Documentation/devicetree/bindings/sound/renesas,rsnd.yaml
>> @@ -115,6 +115,11 @@ properties:
>>  $ref: audio-graph-port.yaml#
>>  unevaluatedProperties: false
>>  
>> +  ports:
> 
>$ref: /schemas/graph.yaml#/properties/ports

OK, thanks

> 
>> +patternProperties:
>> +  '^port@[0-9]':
>> +$ref: "#/properties/port"
> 
> Then this should be: $ref: audio-graph-port.yaml#

OK, thanks

> 
> Also, what each port is should be defined, but that's a separate 
> problem.
> 
>> +
>>  # use patternProperties to avoid naming "xxx,yyy" issue
>>  patternProperties:
>>"^rcar_sound,dvc$":
>> -- 
>> 1.8.3
>>
>>
> 
> .
> 



Re: [PATCH v2 2/2] ASoC: dt-bindings: renesas, rsnd: Clear warning 'ports' does not match any of the regexes

2021-04-11 Thread Leizhen (ThunderTown)



On 2021/4/9 4:42, Rob Herring wrote:
> On Thu, Apr 08, 2021 at 08:28:08PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2021/4/7 10:04, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2021/4/2 4:20, Rob Herring wrote:
>>>> On Wed, Mar 31, 2021 at 05:16:16PM +0800, Zhen Lei wrote:
>>>>> Currently, if there are more than two ports, or if there is only one port
>>>>> but other properties(such as "#address-cells") is required, these ports
>>>>> are placed under the "ports" node. So add the schema of property "ports".
>>>>
>>>> A given binding should just use 'ports' or 'port' depending on it's 
>>>> need. Supporting both forms is needless complexity.
>>
>> Hi Rob:
>> I don't think of a good way to avoid "port" and "ports" to be used at the 
>> same time.
>> Should I disable the use of "port"? Convert the two usages of "port" into 
>> "ports".
>> But usually no one will use both of them in one dts file. And even if it's 
>> used at
>> the same time, it's not a big mistake. So I decided not to test it.
> 
> I think the Renesas folks need to comment on this.

Hi All:
  I've figured out a way to avoid both. I'll send v3 right away.

> 
> Rob
> 
> .
> 



Re: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()

2020-11-18 Thread Leizhen (ThunderTown)



On 2020/11/18 16:41, Zhen Lei wrote:
> The badrange to be reported should always cover mce->addr.
Maybe I should change this description to:
Make sure the badrange to be reported can always cover mce->addr.

> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/acpi/nfit/mce.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index ee8d9973f60b..053e719c7bea 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -63,7 +63,7 @@ static int nfit_handle_mce(struct notifier_block *nb, 
> unsigned long val,
>  
>   /* If this fails due to an -ENOMEM, there is little we can do */
>   nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
> - ALIGN(mce->addr, L1_CACHE_BYTES),
> + ALIGN_DOWN(mce->addr, L1_CACHE_BYTES),
>   L1_CACHE_BYTES);
>   nvdimm_region_notify(nfit_spa->nd_region,
>   NVDIMM_REVALIDATE_POISON);
> 



Re: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()

2020-11-18 Thread Leizhen (ThunderTown)



On 2020/11/19 3:16, Dan Williams wrote:
> On Wed, Nov 18, 2020 at 12:55 AM Leizhen (ThunderTown)
>  wrote:
>>
>>
>>
>> On 2020/11/18 16:41, Zhen Lei wrote:
>>> The badrange to be reported should always cover mce->addr.
>> Maybe I should change this description to:
>> Make sure the badrange to be reported can always cover mce->addr.
> 
> Yes, I like that better. Can you also say a bit more about how you
> found this bug? As far as I can see this looks like -stable material.

I found it when I was learning the code. I'm looking at it carefully.

> 
> 



Re: [PATCH 1/1] ACPI/nfit: correct the badrange to be reported in nfit_handle_mce()

2020-11-18 Thread Leizhen (ThunderTown)



On 2020/11/19 4:50, Verma, Vishal L wrote:
> On Wed, 2020-11-18 at 16:41 +0800, Zhen Lei wrote:
>> The badrange to be reported should always cover mce->addr.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  drivers/acpi/nfit/mce.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Ah good find, agreed with Dan that this is stable material.
> Minor nit, I'd recommend rewording the subject line to something like:
> 
> "acpi/nfit: fix badrange insertion in nfit_handle_mce()"

OK, I will send v2.

> 
> Otherwise, looks good to me.
> Reviewed-by: Vishal Verma 
> 
>>
>> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
>> index ee8d9973f60b..053e719c7bea 100644
>> --- a/drivers/acpi/nfit/mce.c
>> +++ b/drivers/acpi/nfit/mce.c
>> @@ -63,7 +63,7 @@ static int nfit_handle_mce(struct notifier_block *nb, 
>> unsigned long val,
>>  
>>  /* If this fails due to an -ENOMEM, there is little we can do */
>>  nvdimm_bus_add_badrange(acpi_desc->nvdimm_bus,
>> -ALIGN(mce->addr, L1_CACHE_BYTES),
>> +ALIGN_DOWN(mce->addr, L1_CACHE_BYTES),
>>  L1_CACHE_BYTES);
>>  nvdimm_region_notify(nfit_spa->nd_region,
>>  NVDIMM_REVALIDATE_POISON);
> 



Re: [PATCH v2 1/1] ARM: dts: mmp2-olpc-xo-1-75: clear the warnings when make dtbs

2020-12-08 Thread Leizhen (ThunderTown)



On 2020/12/8 21:58, Arnd Bergmann wrote:
> On Mon, Dec 7, 2020 at 9:47 AM Zhen Lei  wrote:
>>
>> The check_spi_bus_bridge() in scripts/dtc/checks.c requires that the node
>> have "spi-slave" property must with "#address-cells = <0>" and
>> "#size-cells = <0>". But currently both "#address-cells" and "#size-cells"
>> properties are deleted, the corresponding default values are 2 and 1. As a
>> result, the check fails and below warnings is displayed.
>>
>> arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): \
>> /soc/apb@d400/spi@d4037000: incorrect #address-cells for SPI bus
>>   also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
>> arch/arm/boot/dts/mmp2.dtsi:472.23-480.6: Warning (spi_bus_bridge): \
>> /soc/apb@d400/spi@d4037000: incorrect #size-cells for SPI bus
>>   also defined at arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts:225.7-237.3
>> arch/arm/boot/dts/mmp2-olpc-xo-1-75.dtb: Warning (spi_bus_reg): \
>> Failed prerequisite 'spi_bus_bridge'
>>
>> Because the value of "#size-cells" is already defined as zero in the node
>> "ssp3: spi@d4037000" in arch/arm/boot/dts/mmp2.dtsi. So we only need to
>> explicitly add "#address-cells = <0>" and keep "#size-cells" no change.
>>
>> Signed-off-by: Zhen Lei 
> 
> Right, I already sent the same patch earlier.

Oh, sorry, I don't known it. If you send it earlier, please apply your patch!

> 
> Lubomir, can I apply this to the fixes branch?

This fix is really should be considered to merge into v5.10.

> 
>>  arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts 
>> b/arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts
>> index adde62d6fce73b9..82da44dacba7172 100644
>> --- a/arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts
>> +++ b/arch/arm/boot/dts/mmp2-olpc-xo-1-75.dts
>> @@ -224,7 +224,7 @@
>>
>>  &ssp3 {
>> /delete-property/ #address-cells;
>> -   /delete-property/ #size-cells;
>> +   #address-cells = <0>;
>> spi-slave;
>> status = "okay";
>> ready-gpios = <&gpio 125 GPIO_ACTIVE_HIGH>;
>> --
>> 1.8.3
>>
>>
> 
> .
> 



Re: [RESEND PATCH v3 1/4] iommu/iova: Add free_all_cpu_cached_iovas()

2020-12-09 Thread Leizhen (ThunderTown)
On 2020/11/17 18:25, John Garry wrote:
> Add a helper function to free the CPU rcache for all online CPUs.
> 
> There also exists a function of the same name in
> drivers/iommu/intel/iommu.c, but the parameters are different, and there
> should be no conflict.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/iommu/iova.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 30d969a4c5fd..81b7399dd5e8 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -227,6 +227,14 @@ static int __alloc_and_insert_iova_range(struct 
> iova_domain *iovad,
>   return -ENOMEM;
>  }
>  
> +static void free_all_cpu_cached_iovas(struct iova_domain *iovad)
> +{
> + unsigned int cpu;
> +
> + for_each_online_cpu(cpu)
> + free_cpu_cached_iovas(cpu, iovad);
> +}
> +
>  static struct kmem_cache *iova_cache;
>  static unsigned int iova_cache_users;
>  static DEFINE_MUTEX(iova_cache_mutex);
> @@ -422,15 +430,12 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned 
> long size,
>  retry:
>   new_iova = alloc_iova(iovad, size, limit_pfn, true);
>   if (!new_iova) {
> - unsigned int cpu;
> -
>   if (!flush_rcache)
>   return 0;
>  
>   /* Try replenishing IOVAs by flushing rcache. */
>   flush_rcache = false;
> - for_each_online_cpu(cpu)
> - free_cpu_cached_iovas(cpu, iovad);
> + free_all_cpu_cached_iovas(iovad);
>   goto retry;
>   }
>  

Reviewed-by: Zhen Lei 

> 



Re: [RESEND PATCH v3 2/4] iommu/iova: Avoid double-negatives in magazine helpers

2020-12-09 Thread Leizhen (ThunderTown)



On 2020/11/17 18:25, John Garry wrote:
> A similar crash to the following could be observed if initial CPU rcache
> magazine allocations fail in init_iova_rcaches():
> 
> Unable to handle kernel NULL pointer dereference at virtual address 
> 
> Mem abort info:
>ESR = 0x9604
>EC = 0x25: DABT (current EL), IL = 32 bits
>SET = 0, FnV = 0
>EA = 0, S1PTW = 0
> Data abort info:
>ISV = 0, ISS = 0x0004
>CM = 0, WnR = 0
> [] user address but active_mm is swapper
> Internal error: Oops: 9604 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 11 PID: 696 Comm: irq/40-hisi_sas Not tainted 5.9.0-rc7-dirty #109
> Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 
> 03/15/2019
> Call trace:
>   free_iova_fast+0xfc/0x280
>   iommu_dma_free_iova+0x64/0x70
>   __iommu_dma_unmap+0x9c/0xf8
>   iommu_dma_unmap_sg+0xa8/0xc8
>   dma_unmap_sg_attrs+0x28/0x50
>   cq_thread_v3_hw+0x2dc/0x528
>   irq_thread_fn+0x2c/0xa0
>   irq_thread+0x130/0x1e0
>   kthread+0x154/0x158
>   ret_from_fork+0x10/0x34
> 
> Code: f9400060 f102001f 54000981 d421 (f9400043)
> 
>  ---[ end trace 4afcbdfc61b60467 ]---
> 
> The issue is that expression !iova_magazine_full(NULL) evaluates true; this
> falls over in in __iova_rcache_insert() when we attempt to cache a mag
> and cpu_rcache->loaded == NULL:
> 
> if (!iova_magazine_full(cpu_rcache->loaded)) {
>   can_insert = true;
> ...
> 
> if (can_insert)
>   iova_magazine_push(cpu_rcache->loaded, iova_pfn);
> 
> As above, can_insert is evaluated true, which it shouldn't be, and we try
> to insert pfns in a NULL mag, which is not safe.
> 
> To avoid this, stop using double-negatives, like !iova_magazine_full() and
> !iova_magazine_empty(), and use positive tests, like
> iova_magazine_has_space() and iova_magazine_has_pfns(), respectively; these
> can safely deal with cpu_rcache->{loaded, prev} = NULL.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/iommu/iova.c | 29 +
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 81b7399dd5e8..1f3f0f8b12e0 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -827,14 +827,18 @@ iova_magazine_free_pfns(struct iova_magazine *mag, 
> struct iova_domain *iovad)
>   mag->size = 0;
>  }
>  
> -static bool iova_magazine_full(struct iova_magazine *mag)
> +static bool iova_magazine_has_space(struct iova_magazine *mag)
>  {
> - return (mag && mag->size == IOVA_MAG_SIZE);
> + if (!mag)
> + return false;
> + return mag->size < IOVA_MAG_SIZE;
>  }
>  
> -static bool iova_magazine_empty(struct iova_magazine *mag)
> +static bool iova_magazine_has_pfns(struct iova_magazine *mag)
>  {
> - return (!mag || mag->size == 0);
> + if (!mag)
> + return false;
> + return mag->size;
>  }
>  
>  static unsigned long iova_magazine_pop(struct iova_magazine *mag,
> @@ -843,7 +847,7 @@ static unsigned long iova_magazine_pop(struct 
> iova_magazine *mag,
>   int i;
>   unsigned long pfn;
>  
> - BUG_ON(iova_magazine_empty(mag));
> + BUG_ON(!iova_magazine_has_pfns(mag));
>  
>   /* Only fall back to the rbtree if we have no suitable pfns at all */
>   for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--)
> @@ -859,7 +863,7 @@ static unsigned long iova_magazine_pop(struct 
> iova_magazine *mag,
>  
>  static void iova_magazine_push(struct iova_magazine *mag, unsigned long pfn)
>  {
> - BUG_ON(iova_magazine_full(mag));
> + BUG_ON(!iova_magazine_has_space(mag));
>  
>   mag->pfns[mag->size++] = pfn;
>  }
> @@ -905,9 +909,9 @@ static bool __iova_rcache_insert(struct iova_domain 
> *iovad,
>   cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
>   spin_lock_irqsave(&cpu_rcache->lock, flags);
>  
> - if (!iova_magazine_full(cpu_rcache->loaded)) {
> + if (iova_magazine_has_space(cpu_rcache->loaded)) {
>   can_insert = true;
> - } else if (!iova_magazine_full(cpu_rcache->prev)) {
> + } else if (iova_magazine_has_space(cpu_rcache->prev)) {
>   swap(cpu_rcache->prev, cpu_rcache->loaded);
>   can_insert = true;
>   } else {
> @@ -916,8 +920,9 @@ static bool __iova_rcache_insert(struct iova_domain 
> *iovad,
>   if (new_mag) {
>   spin_lock(&rcache->lock);
>   if (rcache->depot_size < MAX_GLOBAL_MAGS) {
> - rcache->depot[rcache->depot_size++] =
> - cpu_rcache->loaded;
> + if (cpu_rcache->loaded)

Looks like it just needs to change this place. Compiler ensures that mag->size
will not be accessed when mag is NULL.

static bool iova_magazine_full(struct iova_magazine *mag)
{
return (mag && mag->size == IOVA_MAG_SIZE);
}

static bool iova_magazine_empty(struct iova_magazine *mag)
{
return (!mag || m

Re: [RESEND PATCH v3 3/4] iommu/iova: Flush CPU rcache for when a depot fills

2020-12-09 Thread Leizhen (ThunderTown)



On 2020/11/17 18:25, John Garry wrote:
> Leizhen reported some time ago that IOVA performance may degrade over time
> [0], but unfortunately his solution to fix this problem was not given
> attention.
> 
> To summarize, the issue is that as time goes by, the CPU rcache and depot
> rcache continue to grow. As such, IOVA RB tree access time also continues
> to grow.
> 
> At a certain point, a depot may become full, and also some CPU rcaches may
> also be full when inserting another IOVA is attempted. For this scenario,
> currently the "loaded" CPU rcache is freed and a new one is created. This
> freeing means that many IOVAs in the RB tree need to be freed, which
> makes IO throughput performance fall off a cliff in some storage scenarios:
> 
> Jobs: 12 (f=12): [] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 
> iops]
> Jobs: 12 (f=12): [] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 
> iops]
> 
> And continue in this fashion, without recovering. Note that in this
> example it was required to wait 16 hours for this to occur. Also note that
> IO throughput also becomes gradually becomes more unstable leading up to
> this point.
> 
> This problem is only seen for non-strict mode. For strict mode, the rcaches
> stay quite compact.
> 
> As a solution to this issue, judge that the IOVA caches have grown too big
> when cached magazines need to be free, and just flush all the CPUs rcaches
> instead.
> 
> The depot rcaches, however, are not flushed, as they can be used to
> immediately replenish active CPUs.
> 
> In future, some IOVA compaction could be implemented to solve the
> instabilty issue, which I figure could be quite complex to implement.
> 
> [0] 
> https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/
> 
> Analyzed-by: Zhen Lei 
> Reported-by: Xiang Chen 
> Signed-off-by: John Garry 
> ---
>  drivers/iommu/iova.c | 16 ++--
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 1f3f0f8b12e0..386005055aca 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -901,7 +901,6 @@ static bool __iova_rcache_insert(struct iova_domain 
> *iovad,
>struct iova_rcache *rcache,
>unsigned long iova_pfn)
>  {
> - struct iova_magazine *mag_to_free = NULL;
>   struct iova_cpu_rcache *cpu_rcache;
>   bool can_insert = false;
>   unsigned long flags;
> @@ -923,13 +922,12 @@ static bool __iova_rcache_insert(struct iova_domain 
> *iovad,
>   if (cpu_rcache->loaded)
>   rcache->depot[rcache->depot_size++] =
>   cpu_rcache->loaded;
> - } else {
> - mag_to_free = cpu_rcache->loaded;
> + can_insert = true;
> + cpu_rcache->loaded = new_mag;
>   }
>   spin_unlock(&rcache->lock);
> -
> - cpu_rcache->loaded = new_mag;
> - can_insert = true;
> + if (!can_insert)
> + iova_magazine_free(new_mag);
>   }
>   }
>  
> @@ -938,10 +936,8 @@ static bool __iova_rcache_insert(struct iova_domain 
> *iovad,
>  
>   spin_unlock_irqrestore(&cpu_rcache->lock, flags);
>  
> - if (mag_to_free) {
> - iova_magazine_free_pfns(mag_to_free, iovad);
> - iova_magazine_free(mag_to_free);
mag_to_free has been stripped out, that's why lock protection is not required 
here.

> - }
> +   

Re: [RESEND PATCH v3 3/4] iommu/iova: Flush CPU rcache for when a depot fills

2020-12-09 Thread Leizhen (ThunderTown)



On 2020/12/9 19:22, John Garry wrote:
> On 09/12/2020 09:13, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2020/11/17 18:25, John Garry wrote:
>>> Leizhen reported some time ago that IOVA performance may degrade over time
>>> [0], but unfortunately his solution to fix this problem was not given
>>> attention.
>>>
>>> To summarize, the issue is that as time goes by, the CPU rcache and depot
>>> rcache continue to grow. As such, IOVA RB tree access time also continues
>>> to grow.
>>>
>>> At a certain point, a depot may become full, and also some CPU rcaches may
>>> also be full when inserting another IOVA is attempted. For this scenario,
>>> currently the "loaded" CPU rcache is freed and a new one is created. This
>>> freeing means that many IOVAs in the RB tree need to be freed, which
>>> makes IO throughput performance fall off a cliff in some storage scenarios:
>>>
>>> Jobs: 12 (f=12): [] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 
>>> iops]
>>> Jobs: 12 (f=12): [] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 
>>> iops]
>>>
>>> And continue in this fashion, without recovering. Note that in this
>>> example it was required to wait 16 hours for this to occur. Also note that
>>> IO throughput also becomes gradually becomes more unstable leading up to
>>> this point.
>>>
>>> This problem is only seen for non-strict mode. For strict mode, the rcaches
>>> stay quite compact.
>>>
>>> As a solution to this issue, judge that the IOVA caches have grown too big
>>> when cached magazines need to be free, and just flush all the CPUs rcaches
>>> instead.
>>>
>>> The depot rcaches, however, are not flushed, as they can be used to
>>> immediately replenish active CPUs.
>>>
>>> In future, some IOVA compaction could be implemented to solve the
>>> instabilty issue, which I figure could be quite complex to implement.
>>>
>>> [0] 
>>> https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/
>>>
>>> Analyzed-by: Zhen Lei 
>>> Reported-by: Xiang Chen 
>>> Signed-off-by: John Garry 
> 
> Thanks for having a look
> 
>>> ---
>>>   drivers/iommu/iova.c | 16 ++--
>>>   1 file changed, 6 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 1f3f0f8b12e0..386005055aca 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -901,7 +901,6 @@ static bool __iova_rcache_insert(struct iova_domain 
>>> *iovad,
>>>    struct iova_rcache *rcache,
>>>    unsigned long iova_pfn)
>>>   {
>>> -    struct iova_magazine *mag_to_free = NULL;
>>>   struct iova_cpu_rcache *cpu_rcache;
>>>   bool can_insert = false;
>>>   unsigned long flags;
>>> @@ -923,13 +922,12 @@ static bool __iova_rcache_insert(struct iova_dom

Re: [RESEND PATCH v3 2/4] iommu/iova: Avoid double-negatives in magazine helpers

2020-12-09 Thread Leizhen (ThunderTown)



On 2020/12/9 19:39, John Garry wrote:
> On 09/12/2020 09:03, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2020/11/17 18:25, John Garry wrote:
>>> A similar crash to the following could be observed if initial CPU rcache
>>> magazine allocations fail in init_iova_rcaches():
>>>
>>> Unable to handle kernel NULL pointer dereference at virtual address 
>>> 
>>> Mem abort info:
>>>     ESR = 0x9604
>>>     EC = 0x25: DABT (current EL), IL = 32 bits
>>>     SET = 0, FnV = 0
>>>     EA = 0, S1PTW = 0
>>> Data abort info:
>>>     ISV = 0, ISS = 0x0004
>>>     CM = 0, WnR = 0
>>> [] user address but active_mm is swapper
>>> Internal error: Oops: 9604 [#1] PREEMPT SMP
>>> Modules linked in:
>>> CPU: 11 PID: 696 Comm: irq/40-hisi_sas Not tainted 5.9.0-rc7-dirty #109
>>> Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 - V1.16.01 
>>> 03/15/2019
>>> Call trace:
>>>    free_iova_fast+0xfc/0x280
>>>    iommu_dma_free_iova+0x64/0x70
>>>    __iommu_dma_unmap+0x9c/0xf8
>>>    iommu_dma_unmap_sg+0xa8/0xc8
>>>    dma_unmap_sg_attrs+0x28/0x50
>>>    cq_thread_v3_hw+0x2dc/0x528
>>>    irq_thread_fn+0x2c/0xa0
>>>    irq_thread+0x130/0x1e0
>>>    kthread+0x154/0x158
>>>    ret_from_fork+0x10/0x34
>>>
>>> Code: f9400060 f102001f 54000981 d421 (f9400043)
>>>
>>>   ---[ end trace 4afcbdfc61b60467 ]---
>>>
>>> The issue is that expression !iova_magazine_full(NULL) evaluates true; this
>>> falls over in in __iova_rcache_insert() when we attempt to cache a mag
>>> and cpu_rcache->loaded == NULL:
>>>
>>> if (!iova_magazine_full(cpu_rcache->loaded)) {
>>> can_insert = true;
>>> ...
>>>
>>> if (can_insert)
>>> iova_magazine_push(cpu_rcache->loaded, iova_pfn);
>>>
>>> As above, can_insert is evaluated true, which it shouldn't be, and we try
>>> to insert pfns in a NULL mag, which is not safe.
>>>
>>> To avoid this, stop using double-negatives, like !iova_magazine_full() and
>>> !iova_magazine_empty(), and use positive tests, like
>>> iova_magazine_has_space() and iova_magazine_has_pfns(), respectively; these
>>> can safely deal with cpu_rcache->{loaded, prev} = NULL.
>>>
>>> Signed-off-by: John Garry 
> 
> Thanks for checking here...
> 
>>> ---
>>>   drivers/iommu/iova.c | 29 +
>>>   1 file changed, 17 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>>> index 81b7399dd5e8..1f3f0f8b12e0 100644
>>> --- a/drivers/iommu/iova.c
>>> +++ b/drivers/iommu/iova.c
>>> @@ -827,14 +827,18 @@ iova_magazine_free_pfns(struct iova_magazine *mag, 
>>> struct iova_domain *iovad)
>>>   mag->size = 0;
>>>   }
>>>   -static bool iova_magazine_full(struct iova_magazine *mag)
>>> +static bool iova_magazine_has_space(struct iova_magazine *mag)
>>>   {
>>> -    return (mag && mag->size == IOVA_MAG_SIZE);
>>> +    if (!mag)
>>> +    return false;
>>> +    return mag->size < IOVA_MAG_SIZE;
>>>   }
>>>   -static bool iova_magazine_empty(struct iova_magazine *mag)
>>> +static bool iova_magazine_has_pfns(struct iova_magazine *mag)
>>>   {
>>> -    return (!mag || mag->size == 0);
>>> +    if (!mag)
>>> +    return false;
>>> +    return mag->size;
>>>   }
>>>     static unsigned long iova_magazine_pop(struct iova_magazine *mag,
>>> @@ -843,7 +847,7 @@ static unsigned long iova_magazine_pop(struct 
>>> iova_magazine *mag,
>>>   int i;
>>>   unsigned long pfn;
>>>   -    BUG_ON(iova_magazine_empty(mag));
>>> +    BUG_ON(!iova_magazine_has_pfns(mag));
>>>     /* Only fall back to the rbtree if we have no suitable pfns at all 
>>> */
>>>   for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--)
>>> @@ -859,7 +863,7 @@ static unsigned long iova_magazine_pop(struct 
>>> iova_magazine *mag,
>>>     static void iova_magazine_push(struct iova_magazine *mag, unsigned long 
>>> pfn)
>>>   {
>>> -    BUG_ON(iova_magazine_full(mag));
>>> +    BUG_ON(!iova_magazine_has_space(mag));
>>>     mag->pfns[mag->size++] = pfn;
>>>   }

Re: [PATCH 1/1] dt-bindings: leds: add onboard LED triggers of 96Boards

2020-12-09 Thread Leizhen (ThunderTown)



On 2020/12/10 11:31, Manivannan Sadhasivam wrote:
> Hi,
> 
> On Thu, Dec 10, 2020 at 11:12:03AM +0800, Zhen Lei wrote:
>> For all 96Boards, the following standard is used for onboard LEDs.
>>
>> green:user1  default-trigger: heartbeat
>> green:user2  default-trigger: mmc0/disk-activity(onboard-storage)
>> green:user3  default-trigger: mmc1 (SD-card)
>> green:user4  default-trigger: none, panic-indicator
>> yellow:wlan  default-trigger: phy0tx
>> blue:bt  default-trigger: hci0-power
>>
>> Link to 96Boards CE Specification: https://linaro.co/ce-specification
>>
> 
> This is just a board configuration and there is absolutely no need to document
> this in common LED binding. But if your intention is to document the missing
No, I don't think so. The common just means the property linux,default-trigger
is common, but not it values. This can be proved by counter-proving:none of
the triggerrs currently defined in common.yaml is used by 96Boards.

> triggers, then you should look at the patch I submitted long ago.

I'm just trying to eliminate the warnings related to Hisilicon that YAML 
detected.
So I didn't pay attention to other missing triggers.

> 
> https://lore.kernel.org/patchwork/patch/1146359/
> 
> Maybe I should resubmit it again in YAML format. (thanks for reminding me :P)

Yes, I hope that you will resubmit it. After all, these false positives are
entirely due to YAML's failure to list all triggers. The DTS itself is fine.

By the way, the description of this patch I copied from your patch:
953d9f390365 arm64: dts: rockchip: Add on-board LED support on rk3399-rock960

That's why I Cc to you.

> 
> Thanks,
> Mani
> 
>> Signed-off-by: Zhen Lei 
>> Cc: Darshak Patel 
>> Cc: Manivannan Sadhasivam 
>> Cc: Shawn Guo 
>> Cc: Dong Aisheng 
>> Cc: Guodong Xu 
>> Cc: Wei Xu 
>> Cc: Linus Walleij 
>> Cc: Lad Prabhakar 
>> Cc: Marian-Cristian Rotariu 
>> Cc: Geert Uytterhoeven 
>> Cc: Heiko Stuebner 
>> ---
>>  Documentation/devicetree/bindings/leds/common.yaml | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/leds/common.yaml 
>> b/Documentation/devicetree/bindings/leds/common.yaml
>> index f1211e7045f12f3..525752d6c5c84fd 100644
>> --- a/Documentation/devicetree/bindings/leds/common.yaml
>> +++ b/Documentation/devicetree/bindings/leds/common.yaml
>> @@ -97,6 +97,16 @@ properties:
>>  # LED alters the brightness for the specified duration with one 
>> software
>>  # timer (requires "led-pattern" property)
>>- pattern
>> +#For all 96Boards, Green, disk-activity(onboard-storage)
>> +  - mmc0
>> +#For all 96Boards, Green, SD-card
>> +  - mmc1
>> +#For all 96Boards, Green, panic-indicator
>> +  - none
>> +#For all 96Boards, Yellow, WiFi activity LED
>> +  - phy0tx
>> +#For all 96Boards, Blue, Bluetooth activity LED
>> +  - hci0-power
>>  
>>led-pattern:
>>  description: |
>> -- 
>> 1.8.3
>>
>>
> 
> .
> 



Re: [PATCH] dt-bindings: leds: Document commonly used LED triggers

2020-12-09 Thread Leizhen (ThunderTown)



On 2020/12/10 14:14, Manivannan Sadhasivam wrote:
> This commit documents the LED triggers used commonly in the SoCs. Not
> all triggers are documented as some of them are very application specific.
> Most of the triggers documented here are currently used in devicetrees
> of many SoCs.
> 
> Signed-off-by: Manivannan Sadhasivam 
> ---
>  .../devicetree/bindings/leds/common.yaml  | 72 ++-
>  1 file changed, 54 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/leds/common.yaml 
> b/Documentation/devicetree/bindings/leds/common.yaml
> index f1211e7045f1..eee4eb7a4535 100644
> --- a/Documentation/devicetree/bindings/leds/common.yaml
> +++ b/Documentation/devicetree/bindings/leds/common.yaml
> @@ -79,24 +79,60 @@ properties:
>the LED.
>  $ref: /schemas/types.yaml#definitions/string
>  
> -enum:
> -# LED will act as a back-light, controlled by the framebuffer system
> -  - backlight
> -# LED will turn on (but for leds-gpio see "default-state" property in
> -# Documentation/devicetree/bindings/leds/leds-gpio.yaml)
> -  - default-on
> -# LED "double" flashes at a load average based rate
> -  - heartbeat
> -# LED indicates disk activity
> -  - disk-activity
> -# LED indicates IDE disk activity (deprecated), in new 
> implementations
> -# use "disk-activity"
> -  - ide-disk
> -# LED flashes at a fixed, configurable rate
> -  - timer
> -# LED alters the brightness for the specified duration with one 
> software
> -# timer (requires "led-pattern" property)
> -  - pattern
> +oneOf:
> +  - items:
> +  - enum:
> +# LED will act as a back-light, controlled by the 
> framebuffer system
> +  - backlight
> +# LED will turn on (but for leds-gpio see "default-state" 
> property in
> +# Documentation/devicetree/bindings/leds/leds-gpio.yaml)
> +  - default-on
> +# LED "double" flashes at a load average based rate
> +  - heartbeat
> +# LED indicates disk activity
> +  - disk-activity
> +# LED indicates IDE disk activity (deprecated), in new 
> implementations
> +# use "disk-activity"
> +  - ide-disk
> +# LED flashes at a fixed, configurable rate
> +  - timer
> +# LED alters the brightness for the specified duration with 
> one software
> +# timer (requires "led-pattern" property)
> +  - pattern
> +# LED indicates camera flash state
> +  - flash
> +# LED indicates camera torch state
> +  - torch
> +# LED indicates audio mute state
> +  - audio-mute
> +# LED indicates mic mute state
> +  - audio-micmute
> +# LED indicates bluetooth power state
> +  - bluetooth-power
> +# LED indicates USB gadget activity
> +  - usb-gadget
> +# LED indicates USB host activity
> +  - usb-host
> +# LED indicates MTD memory activity
> +  - mtd
> +# LED indicates NAND memory activity (deprecated),
> +# in new implementations use "mtd"
> +  - nand-disk
> +# LED indicates disk read activity
> +  - disk-read
> +# LED indicates disk write activity
> +  - disk-write
> +# No trigger assigned to the LED. This is the default mode
> +# if trigger is absent
> +  - none
> +# LED indicates activity of all CPUs
> +  - cpu
The triggers phy0tx and hci0-power are missed.

Since you've rewritten it, please consider sorting these property strings
in ascending alphabetical order.

> +  - items:
> +# LED indicates activity of [N]th CPU
> +  - pattern: "^cpu[0-9][0-9]$"
should be ^cpu[0-9]{1,2}$, otherwise, it always requires two digit.

> +  - items:
> +# LED indicates [N]th MMC storage activity
> +  - pattern: '^mmc[0-9][0-9]$'
should be '^mmc[0-9]{1,2}$'

Why CPU use "", and mmc use '',It's better to keep them consistent.

>  
>led-pattern:
>  description: |
> 



Re: [PATCH 1/1] block: move the PAGE_SECTORS definition into

2020-11-19 Thread Leizhen (ThunderTown)



On 2020/11/20 9:27, John Dorminy wrote:
> Greetings;
> 
> There are a lot of uses of PAGE_SIZE/SECTOR_SIZE scattered around, and
> it seems like a medium improvement to be able to refer to it as
> PAGE_SECTORS everywhere instead of just inside dm, bcache, and
> null_blk. Did this change progress forward somewhere?

Actually, I'm trying to make further replacements after this patch is applied.
But there was no response except Coly Li.

> 
> Thanks!
> 
> John Dorminy
> 
> 
> On Mon, Sep 7, 2020 at 3:40 AM Leizhen (ThunderTown)
>  wrote:
>>
>> Hi, Jens Axboe, Alasdair Kergon, Mike Snitzer:
>>   What's your opinion?
>>
>>
>> On 2020/8/21 15:05, Coly Li wrote:
>>> On 2020/8/21 14:48, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 8/21/2020 12:11 PM, Coly Li wrote:
>>>>> On 2020/8/21 10:03, Zhen Lei wrote:
>>>>>> There are too many PAGE_SECTORS definitions, and all of them are the
>>>>>> same. It looks a bit of a mess. So why not move it into ,
>>>>>> to achieve a basic and unique definition.
>>>>>>
>>>>>> Signed-off-by: Zhen Lei 
>>>>>
>>>>>
>>>>> A lazy question about page size > 4KB: currently in bcache code the
>>>>> sector size is assumed to be 512 sectors, if kernel page > 4KB, it is
>>>>> possible that PAGE_SECTORS in bcache will be a number > 8 ?
>>>>
>>>> Sorry, I don't fully understand your question. I known that the sector size
>>>> can be 512 or 4K, and the PAGE_SIZE can be 4K or 64K. So even if sector 
>>>> size
>>>> is 4K, isn't it greater than 8 for 64K pages?
>>>>
>>>> I'm not sure if the question you're asking is the one Matthew Wilcox has
>>>> answered before:
>>>> https://www.spinics.net/lists/raid/msg64345.html
>>>
>>> Thank you for the above information. Currently bcache code assumes
>>> sector size is always 512 bytes, you may see how many "<< 9" or ">> 9"
>>> are used. Therefore I doubt whether current code may stably work on e.g.
>>> 4Kn SSDs (this is only doubt because I don't have such SSD).
>>>
>>> Anyway your patch is fine to me. This change to bcache doesn't make
>>> thins worse or better, maybe it can be helpful to trigger my above
>>> suspicious early if people do have this kind of problem on 4Kn sector SSDs.
>>>
>>> For the bcache part of this patch, you may add,
>>> Acked-by: Coly Li 
>>>
>>> Thanks.
>>>
>>> Coly Li
>>>
>>>>>> ---
>>>>>>  drivers/block/brd.c   | 1 -
>>>>>>  drivers/block/null_blk_main.c | 1 -
>>>>>>  drivers/md/bcache/util.h  | 2 --
>>>>>>  include/linux/blkdev.h| 5 +++--
>>>>>>  include/linux/device-mapper.h | 1 -
>>>>>>  5 files changed, 3 insertions(+), 7 deletions(-)
>>>>>>
>>>>>
>>>>> [snipped]
>>>>>
>>>>>> diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
>>>>>> index c029f7443190805..55196e0f37c32c6 100644
>>>>>> --- a/drivers/md/bcache/util.h
>>>>>> +++ b/drivers/md/bcache/util.h
>>>>>> @@ -15,8 +15,6 @@
>>>>>>
>>>>>>  #include "closure.h"
>>>>>>
>>>>>> -#define PAGE_SECTORS  (PAGE_SIZE / 512)
>>>>>> -
>>>>>>  struct closure;
>>>>>>
>>>>>>  #ifdef CONFIG_BCACHE_DEBUG
>>>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>>>>> index bb5636cc17b91a7..b068dfc5f2ef0ab 100644
>>>>>> --- a/include/linux/blkdev.h
>>>>>> +++ b/include/linux/blkdev.h
>>>>>> @@ -949,11 +949,12 @@ static inline struct request_queue 
>>>>>> *bdev_get_queue(struct block_device *bdev)
>>>>>>   * multiple of 512 bytes. Hence these two constants.
>>>>>>   */
>>>>>>  #ifndef SECTOR_SHIFT
>>>>>> -#define SECTOR_SHIFT 9
>>>>>> +#define SECTOR_SHIFT  9
>>>>>>  #endif
>>>>>>  #ifndef SECTOR_SIZE
>>>>>> -#define SECTOR_SIZE (1 << SECTOR_SHIFT)
>>>>>> +#define SECTOR_SIZE   (1 << SECTOR_SHIFT)
>>>>>>  #endif
>>>>>> +#define PAGE_SECTORS  (PAGE_SIZE / SECTOR_SIZE)
>>>>>>
>>>>>>  /*
>>>>>>   * blk_rq_pos()   : the current sector
>>>>> [snipped]
>>>>>
>>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
> 
> 
> .
> 



Re: [PATCH 1/1] device-dax: delete a redundancy check in dev_dax_validate_align()

2020-11-20 Thread Leizhen (ThunderTown)



On 2020/11/20 17:20, Zhen Lei wrote:
> After we have done the alignment check for the length of each range, the
> alignment check for dev_dax_size(dev_dax) is no longer needed, because it
> get the sum of the length of each range.

For example:
x/M + y/M = (x + y)/M
If x/M is a integer and y/M is also a integer, then (x + y)/M must be a integer.

> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/dax/bus.c | 7 ---
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 1efae11d947a..35f9238c0139 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1109,16 +1109,9 @@ static ssize_t align_show(struct device *dev,
>  
>  static ssize_t dev_dax_validate_align(struct dev_dax *dev_dax)
>  {
> - resource_size_t dev_size = dev_dax_size(dev_dax);
>   struct device *dev = &dev_dax->dev;
>   int i;
>  
> - if (dev_size > 0 && !alloc_is_aligned(dev_dax, dev_size)) {
> - dev_dbg(dev, "%s: align %u invalid for size %pa\n",
> - __func__, dev_dax->align, &dev_size);
> - return -EINVAL;
> - }
> -
>   for (i = 0; i < dev_dax->nr_range; i++) {
>   size_t len = range_len(&dev_dax->ranges[i].range);
>  
> 



Re: [PATCH 1/2] xfs: check the return value of krealloc()

2020-11-24 Thread Leizhen (ThunderTown)



On 2020/11/24 19:51, Christoph Hellwig wrote:
> On Tue, Nov 24, 2020 at 06:45:30PM +0800, Zhen Lei wrote:
>> krealloc() may fail to expand the memory space. Add sanity checks to it,
>> and WARN() if that really happened.
> 
> What part of the __GFP_NOFAIL semantics isn't clear enough?

Oh, sorry, I didn't notice __GFP_NOFAIL flag.

> 
> 



  1   2   3   4   >