Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-02-20 Thread Kani, Toshi
On Tue, 2018-02-20 at 14:54 +0530, Chintan Pandya wrote:
> 
> On 12/28/2017 4:54 PM, Hanjun Guo wrote:
> > From: Hanjun Guo 
> > 
> > When we using iounmap() to free the 4K mapping, it just clear the PTEs
> > but leave P4D/PUD/PMD unchanged, also will not free the memory of page
> > tables.
> > 
> > This will cause issues on ARM64 platform (not sure if other archs have
> > the same issue) for this case:
> > 
> > 1. ioremap a 4K size, valid page table will build,
> > 2. iounmap it, pte0 will set to 0;
> > 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> > then set the a new value for pmd;
> > 4. pte0 is leaked;
> > 5. CPU may meet exception because the old pmd is still in TLB,
> > which will lead to kernel panic.
> > 
> > Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
> > zero.
> > 
> 
> One obvious problem I see here is, once any 2nd level entry has 3rd 
> level mapping, this entry can't map 2M section ever in future. This way, 
> we will fragment entire virtual space over time.
> 
> The code you are changing is common between 32-bit systems as well (I 
> think). And running out of section mapping would be a reality in 
> practical terms.
> 
> So, if we can do the following as a fix up, we would be saved.
> 1) Invalidate 2nd level entry from TLB, and
> 2) Free the page which holds last level page table
> 
> BTW, is there any further discussion going on this topic which I am 
> missing ?

Yes, I suggested to free up a pte table in my last reply.
https://patchwork.kernel.org/patch/10134581/

Thanks,
-Toshi


Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-02-20 Thread Kani, Toshi
On Tue, 2018-02-20 at 14:54 +0530, Chintan Pandya wrote:
> 
> On 12/28/2017 4:54 PM, Hanjun Guo wrote:
> > From: Hanjun Guo 
> > 
> > When we using iounmap() to free the 4K mapping, it just clear the PTEs
> > but leave P4D/PUD/PMD unchanged, also will not free the memory of page
> > tables.
> > 
> > This will cause issues on ARM64 platform (not sure if other archs have
> > the same issue) for this case:
> > 
> > 1. ioremap a 4K size, valid page table will build,
> > 2. iounmap it, pte0 will set to 0;
> > 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> > then set the a new value for pmd;
> > 4. pte0 is leaked;
> > 5. CPU may meet exception because the old pmd is still in TLB,
> > which will lead to kernel panic.
> > 
> > Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
> > zero.
> > 
> 
> One obvious problem I see here is, once any 2nd level entry has 3rd 
> level mapping, this entry can't map 2M section ever in future. This way, 
> we will fragment entire virtual space over time.
> 
> The code you are changing is common between 32-bit systems as well (I 
> think). And running out of section mapping would be a reality in 
> practical terms.
> 
> So, if we can do the following as a fix up, we would be saved.
> 1) Invalidate 2nd level entry from TLB, and
> 2) Free the page which holds last level page table
> 
> BTW, is there any further discussion going on this topic which I am 
> missing ?

Yes, I suggested to free up a pte table in my last reply.
https://patchwork.kernel.org/patch/10134581/

Thanks,
-Toshi


Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-02-20 Thread Chintan Pandya



On 12/28/2017 4:54 PM, Hanjun Guo wrote:

From: Hanjun Guo 

When we using iounmap() to free the 4K mapping, it just clear the PTEs
but leave P4D/PUD/PMD unchanged, also will not free the memory of page
tables.

This will cause issues on ARM64 platform (not sure if other archs have
the same issue) for this case:

1. ioremap a 4K size, valid page table will build,
2. iounmap it, pte0 will set to 0;
3. ioremap the same address with 2M size, pgd/pmd is unchanged,
then set the a new value for pmd;
4. pte0 is leaked;
5. CPU may meet exception because the old pmd is still in TLB,
which will lead to kernel panic.

Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
zero.

One obvious problem I see here is, once any 2nd level entry has 3rd 
level mapping, this entry can't map 2M section ever in future. This way, 
we will fragment entire virtual space over time.


The code you are changing is common between 32-bit systems as well (I 
think). And running out of section mapping would be a reality in 
practical terms.


So, if we can do the following as a fix up, we would be saved.
1) Invalidate 2nd level entry from TLB, and
2) Free the page which holds last level page table

BTW, is there any further discussion going on this topic which I am 
missing ?



Reported-by: Lei Li 
Signed-off-by: Hanjun Guo 
Cc: Toshi Kani 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: Michal Hocko 
Cc: Andrew Morton 
Cc: Xuefeng Wang 
---

Not sure if this is the right direction, this patch has a obvious
side effect that a mapped address with 4K will not back to 2M.  I may
miss something and just wrong, so this is just a RFC version, comments
are welcomed.

  lib/ioremap.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/ioremap.c b/lib/ioremap.c
index b808a39..4e6f19a 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -89,7 +89,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long 
addr,
do {
next = pmd_addr_end(addr, end);
  
-		if (ioremap_pmd_enabled() &&

+   if (ioremap_pmd_enabled() && pmd_none(*pmd) &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
@@ -115,7 +115,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned 
long addr,
do {
next = pud_addr_end(addr, end);
  
-		if (ioremap_pud_enabled() &&

+   if (ioremap_pud_enabled() && pud_none(*pud) &&
((next - addr) == PUD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
@@ -141,7 +141,7 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned 
long addr,
do {
next = p4d_addr_end(addr, end);
  
-		if (ioremap_p4d_enabled() &&

+   if (ioremap_p4d_enabled() && p4d_none(*p4d) &&
((next - addr) == P4D_SIZE) &&
IS_ALIGNED(phys_addr + addr, P4D_SIZE)) {
if (p4d_set_huge(p4d, phys_addr + addr, prot))



Chintan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation
Collaborative Project


Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-02-20 Thread Chintan Pandya



On 12/28/2017 4:54 PM, Hanjun Guo wrote:

From: Hanjun Guo 

When we using iounmap() to free the 4K mapping, it just clear the PTEs
but leave P4D/PUD/PMD unchanged, also will not free the memory of page
tables.

This will cause issues on ARM64 platform (not sure if other archs have
the same issue) for this case:

1. ioremap a 4K size, valid page table will build,
2. iounmap it, pte0 will set to 0;
3. ioremap the same address with 2M size, pgd/pmd is unchanged,
then set the a new value for pmd;
4. pte0 is leaked;
5. CPU may meet exception because the old pmd is still in TLB,
which will lead to kernel panic.

Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
zero.

One obvious problem I see here is, once any 2nd level entry has 3rd 
level mapping, this entry can't map 2M section ever in future. This way, 
we will fragment entire virtual space over time.


The code you are changing is common between 32-bit systems as well (I 
think). And running out of section mapping would be a reality in 
practical terms.


So, if we can do the following as a fix up, we would be saved.
1) Invalidate 2nd level entry from TLB, and
2) Free the page which holds last level page table

BTW, is there any further discussion going on this topic which I am 
missing ?



Reported-by: Lei Li 
Signed-off-by: Hanjun Guo 
Cc: Toshi Kani 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: Michal Hocko 
Cc: Andrew Morton 
Cc: Xuefeng Wang 
---

Not sure if this is the right direction, this patch has a obvious
side effect that a mapped address with 4K will not back to 2M.  I may
miss something and just wrong, so this is just a RFC version, comments
are welcomed.

  lib/ioremap.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/ioremap.c b/lib/ioremap.c
index b808a39..4e6f19a 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -89,7 +89,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long 
addr,
do {
next = pmd_addr_end(addr, end);
  
-		if (ioremap_pmd_enabled() &&

+   if (ioremap_pmd_enabled() && pmd_none(*pmd) &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
@@ -115,7 +115,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned 
long addr,
do {
next = pud_addr_end(addr, end);
  
-		if (ioremap_pud_enabled() &&

+   if (ioremap_pud_enabled() && pud_none(*pud) &&
((next - addr) == PUD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
@@ -141,7 +141,7 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned 
long addr,
do {
next = p4d_addr_end(addr, end);
  
-		if (ioremap_p4d_enabled() &&

+   if (ioremap_p4d_enabled() && p4d_none(*p4d) &&
((next - addr) == P4D_SIZE) &&
IS_ALIGNED(phys_addr + addr, P4D_SIZE)) {
if (p4d_set_huge(p4d, phys_addr + addr, prot))



Chintan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation
Collaborative Project


Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-01-08 Thread Kani, Toshi
On Sat, 2018-01-06 at 17:46 +0800, Hanjun Guo wrote:
> On 2018/1/6 6:15, Kani, Toshi wrote:
> > On Thu, 2017-12-28 at 19:24 +0800, Hanjun Guo wrote:
> > > From: Hanjun Guo 
> > > 
> > > When we using iounmap() to free the 4K mapping, it just clear the PTEs
> > > but leave P4D/PUD/PMD unchanged, also will not free the memory of page
> > > tables.
> > > 
> > > This will cause issues on ARM64 platform (not sure if other archs have
> > > the same issue) for this case:
> > > 
> > > 1. ioremap a 4K size, valid page table will build,
> > > 2. iounmap it, pte0 will set to 0;
> > > 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> > >then set the a new value for pmd;
> > > 4. pte0 is leaked;
> > > 5. CPU may meet exception because the old pmd is still in TLB,
> > >which will lead to kernel panic.
> > > 
> > > Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
> > > zero.
> > 
> > Hi Hanjun,
> > 
> > I tested the above steps on my x86 box, but was not able to reproduce
> > your kernel panic.  On x86, a 4K vaddr gets allocated from a small
> > fragmented free range, whereas a 2MB vaddr is from a larger free range. 
> > Their addrs have different alignments (4KB & 2MB) as well.  So, the
> > steps did not lead to use a same pmd entry.
> 
> Thanks for the testing, I can only reproduce this on my ARM64 platform
> which the CPU will cache the PMD in TLB, from my knowledge, only Cortex-A75
> will do this, so ARM64 platforms which are not A75 based can't be reproduced
> either.
> 
> Catalin, Will, I can reproduce this issue in about 3 minutes with following
> simplified test case [1], and can trigger panic as [2], could you take a look
> as well?

Yes, the test case looks good to me. (nit - it should check if vir_addr
is not NULL.)

> > However, I agree that zero'd pte entries will be leaked when a pmd map
> > is set if they are present under the pmd.
> 
> Thanks for the confirm.
> 
> > 
> > I also tested your patch on my x86 box.  Unfortunately, it effectively
> > disabled 2MB mappings.  While a 2MB vaddr gets allocated from a larger
> > free range, it sill comes from a free range covered by zero'd pte
> > entries.  So, it ends up with 4KB mappings with your changes.
> > 
> > I think we need to come up with other approach.
> 
> Yes, As I said in my patch, this is just RFC, comments are welcomed :)

I am wondering if we can follow the same approach in
arch/x86/mm/pageattr.c.  Like the ioremap case, populate_pmd() does not
check if there is a pte table under the pmd.  But its free function,
unmap_pte_range() calls try_to_free_pte_page() so that a pte table is
freed when all pte entries are zero'd.  It then calls pmd_clear().
iounmap()'s free function, vunmap_pte_range() does not free up a pte
table even if all pte entries are zero'd.

Thanks,
-Toshi


Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-01-08 Thread Kani, Toshi
On Sat, 2018-01-06 at 17:46 +0800, Hanjun Guo wrote:
> On 2018/1/6 6:15, Kani, Toshi wrote:
> > On Thu, 2017-12-28 at 19:24 +0800, Hanjun Guo wrote:
> > > From: Hanjun Guo 
> > > 
> > > When we using iounmap() to free the 4K mapping, it just clear the PTEs
> > > but leave P4D/PUD/PMD unchanged, also will not free the memory of page
> > > tables.
> > > 
> > > This will cause issues on ARM64 platform (not sure if other archs have
> > > the same issue) for this case:
> > > 
> > > 1. ioremap a 4K size, valid page table will build,
> > > 2. iounmap it, pte0 will set to 0;
> > > 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> > >then set the a new value for pmd;
> > > 4. pte0 is leaked;
> > > 5. CPU may meet exception because the old pmd is still in TLB,
> > >which will lead to kernel panic.
> > > 
> > > Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
> > > zero.
> > 
> > Hi Hanjun,
> > 
> > I tested the above steps on my x86 box, but was not able to reproduce
> > your kernel panic.  On x86, a 4K vaddr gets allocated from a small
> > fragmented free range, whereas a 2MB vaddr is from a larger free range. 
> > Their addrs have different alignments (4KB & 2MB) as well.  So, the
> > steps did not lead to use a same pmd entry.
> 
> Thanks for the testing, I can only reproduce this on my ARM64 platform
> which the CPU will cache the PMD in TLB, from my knowledge, only Cortex-A75
> will do this, so ARM64 platforms which are not A75 based can't be reproduced
> either.
> 
> Catalin, Will, I can reproduce this issue in about 3 minutes with following
> simplified test case [1], and can trigger panic as [2], could you take a look
> as well?

Yes, the test case looks good to me. (nit - it should check if vir_addr
is not NULL.)

> > However, I agree that zero'd pte entries will be leaked when a pmd map
> > is set if they are present under the pmd.
> 
> Thanks for the confirm.
> 
> > 
> > I also tested your patch on my x86 box.  Unfortunately, it effectively
> > disabled 2MB mappings.  While a 2MB vaddr gets allocated from a larger
> > free range, it sill comes from a free range covered by zero'd pte
> > entries.  So, it ends up with 4KB mappings with your changes.
> > 
> > I think we need to come up with other approach.
> 
> Yes, As I said in my patch, this is just RFC, comments are welcomed :)

I am wondering if we can follow the same approach in
arch/x86/mm/pageattr.c.  Like the ioremap case, populate_pmd() does not
check if there is a pte table under the pmd.  But its free function,
unmap_pte_range() calls try_to_free_pte_page() so that a pte table is
freed when all pte entries are zero'd.  It then calls pmd_clear().
iounmap()'s free function, vunmap_pte_range() does not free up a pte
table even if all pte entries are zero'd.

Thanks,
-Toshi


Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-01-06 Thread Hanjun Guo
On 2018/1/6 6:15, Kani, Toshi wrote:
> On Thu, 2017-12-28 at 19:24 +0800, Hanjun Guo wrote:
>> From: Hanjun Guo 
>>
>> When we using iounmap() to free the 4K mapping, it just clear the PTEs
>> but leave P4D/PUD/PMD unchanged, also will not free the memory of page
>> tables.
>>
>> This will cause issues on ARM64 platform (not sure if other archs have
>> the same issue) for this case:
>>
>> 1. ioremap a 4K size, valid page table will build,
>> 2. iounmap it, pte0 will set to 0;
>> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>>then set the a new value for pmd;
>> 4. pte0 is leaked;
>> 5. CPU may meet exception because the old pmd is still in TLB,
>>which will lead to kernel panic.
>>
>> Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
>> zero.
> 
> Hi Hanjun,
> 
> I tested the above steps on my x86 box, but was not able to reproduce
> your kernel panic.  On x86, a 4K vaddr gets allocated from a small
> fragmented free range, whereas a 2MB vaddr is from a larger free range. 
> Their addrs have different alignments (4KB & 2MB) as well.  So, the
> steps did not lead to use a same pmd entry.

Thanks for the testing, I can only reproduce this on my ARM64 platform
which the CPU will cache the PMD in TLB, from my knowledge, only Cortex-A75
will do this, so ARM64 platforms which are not A75 based can't be reproduced
either.

Catalin, Will, I can reproduce this issue in about 3 minutes with following
simplified test case [1], and can trigger panic as [2], could you take a look
as well?

> 
> However, I agree that zero'd pte entries will be leaked when a pmd map
> is set if they are present under the pmd.

Thanks for the confirm.

> 
> I also tested your patch on my x86 box.  Unfortunately, it effectively
> disabled 2MB mappings.  While a 2MB vaddr gets allocated from a larger
> free range, it sill comes from a free range covered by zero'd pte
> entries.  So, it ends up with 4KB mappings with your changes.
> 
> I think we need to come up with other approach.

Yes, As I said in my patch, this is just RFC, comments are welcomed :)

[1]:
#include 
#include 
#include 
#include 
#include 


void pcie_io_remap_test(u32 times_ms)
{
u64 phy_addr = 0xd000;
unsigned long timeout = jiffies + msecs_to_jiffies(times_ms);
int rand_type;
u32 mem_size;
void *vir_addr;

do {
get_random_bytes(_type, sizeof(u32));

rand_type %= 6;
switch (rand_type) {
case 0:
mem_size = 0x1000;
break;
case 1:
mem_size = 0x4000;
break;
case 2:
mem_size = 0x20;
break;
case 3:
mem_size = 0x30;
break;
case 4:
mem_size = 0x400;
break;
case 5:
mem_size = 0x40;
break;
default:
mem_size = 0x400;
break;
}

vir_addr = ioremap(phy_addr, mem_size);
readl(vir_addr);

iounmap(vir_addr);
schedule();

} while (time_before(jiffies, timeout));
}

[2]:

[ 2062.300139] Unable to handle kernel paging request at virtual address 
1860
[ 2062.300139] Unable to handle kernel paging request at virtual address 
1860
[ 2062.327051] Mem abort info:
[ 2062.327051] Mem abort info:
[ 2062.337134]   Exception class = DABT (current EL), IL = 32 bits
[ 2062.337134]   Exception class = DABT (current EL), IL = 32 bits
[ 2062.354614]   SET = 0, FnV = 0
[ 2062.354614]   SET = 0, FnV = 0
[ 2062.363818]   EA = 0, S1PTW = 0
[ 2062.363818]   EA = 0, S1PTW = 0
[ 2062.373131] Data abort info:
[ 2062.373131] Data abort info:
[ 2062.381671]   ISV = 0, ISS = 0x0007
[ 2062.381671]   ISV = 0, ISS = 0x0007
[ 2062.393099]   CM = 0, WnR = 0
[ 2062.393099]   CM = 0, WnR = 0
[ 2062.402155] swapper pgtable: 4k pages, 48-bit VAs, pgd = 09cf6000
[ 2062.402155] swapper pgtable: 4k pages, 48-bit VAs, pgd = 09cf6000
[ 2062.421477] [1860] *pgd=0021e003, *pud=0021d003, 
*pmd=00e8d705
[ 2062.421477] [1860] *pgd=0021e003, *pud=0021d003, 
*pmd=00e8d705
[ 2062.447913] Internal error: Oops: 9607 [#1] SMP
[ 2062.447913] Internal error: Oops: 9607 [#1] SMP
[ 2062.540141] CPU: 1 PID: 2149 Comm: unibsp.out Tainted: P   OEL  
4.14.0 #1
[ 2062.540141] CPU: 1 PID: 2149 Comm: unibsp.out Tainted: P   OEL  
4.14.0 #1
[ 2062.560853] task: 8021e6f8a100 task.stack: 10aa
[ 2062.560853] task: 8021e6f8a100 task.stack: 10aa
[ 2062.580070] PC is at 

Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-01-06 Thread Hanjun Guo
On 2018/1/6 6:15, Kani, Toshi wrote:
> On Thu, 2017-12-28 at 19:24 +0800, Hanjun Guo wrote:
>> From: Hanjun Guo 
>>
>> When we using iounmap() to free the 4K mapping, it just clear the PTEs
>> but leave P4D/PUD/PMD unchanged, also will not free the memory of page
>> tables.
>>
>> This will cause issues on ARM64 platform (not sure if other archs have
>> the same issue) for this case:
>>
>> 1. ioremap a 4K size, valid page table will build,
>> 2. iounmap it, pte0 will set to 0;
>> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>>then set the a new value for pmd;
>> 4. pte0 is leaked;
>> 5. CPU may meet exception because the old pmd is still in TLB,
>>which will lead to kernel panic.
>>
>> Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
>> zero.
> 
> Hi Hanjun,
> 
> I tested the above steps on my x86 box, but was not able to reproduce
> your kernel panic.  On x86, a 4K vaddr gets allocated from a small
> fragmented free range, whereas a 2MB vaddr is from a larger free range. 
> Their addrs have different alignments (4KB & 2MB) as well.  So, the
> steps did not lead to use a same pmd entry.

Thanks for the testing, I can only reproduce this on my ARM64 platform
which the CPU will cache the PMD in TLB, from my knowledge, only Cortex-A75
will do this, so ARM64 platforms which are not A75 based can't be reproduced
either.

Catalin, Will, I can reproduce this issue in about 3 minutes with following
simplified test case [1], and can trigger panic as [2], could you take a look
as well?

> 
> However, I agree that zero'd pte entries will be leaked when a pmd map
> is set if they are present under the pmd.

Thanks for the confirm.

> 
> I also tested your patch on my x86 box.  Unfortunately, it effectively
> disabled 2MB mappings.  While a 2MB vaddr gets allocated from a larger
> free range, it sill comes from a free range covered by zero'd pte
> entries.  So, it ends up with 4KB mappings with your changes.
> 
> I think we need to come up with other approach.

Yes, As I said in my patch, this is just RFC, comments are welcomed :)

[1]:
#include 
#include 
#include 
#include 
#include 


void pcie_io_remap_test(u32 times_ms)
{
u64 phy_addr = 0xd000;
unsigned long timeout = jiffies + msecs_to_jiffies(times_ms);
int rand_type;
u32 mem_size;
void *vir_addr;

do {
get_random_bytes(_type, sizeof(u32));

rand_type %= 6;
switch (rand_type) {
case 0:
mem_size = 0x1000;
break;
case 1:
mem_size = 0x4000;
break;
case 2:
mem_size = 0x20;
break;
case 3:
mem_size = 0x30;
break;
case 4:
mem_size = 0x400;
break;
case 5:
mem_size = 0x40;
break;
default:
mem_size = 0x400;
break;
}

vir_addr = ioremap(phy_addr, mem_size);
readl(vir_addr);

iounmap(vir_addr);
schedule();

} while (time_before(jiffies, timeout));
}

[2]:

[ 2062.300139] Unable to handle kernel paging request at virtual address 
1860
[ 2062.300139] Unable to handle kernel paging request at virtual address 
1860
[ 2062.327051] Mem abort info:
[ 2062.327051] Mem abort info:
[ 2062.337134]   Exception class = DABT (current EL), IL = 32 bits
[ 2062.337134]   Exception class = DABT (current EL), IL = 32 bits
[ 2062.354614]   SET = 0, FnV = 0
[ 2062.354614]   SET = 0, FnV = 0
[ 2062.363818]   EA = 0, S1PTW = 0
[ 2062.363818]   EA = 0, S1PTW = 0
[ 2062.373131] Data abort info:
[ 2062.373131] Data abort info:
[ 2062.381671]   ISV = 0, ISS = 0x0007
[ 2062.381671]   ISV = 0, ISS = 0x0007
[ 2062.393099]   CM = 0, WnR = 0
[ 2062.393099]   CM = 0, WnR = 0
[ 2062.402155] swapper pgtable: 4k pages, 48-bit VAs, pgd = 09cf6000
[ 2062.402155] swapper pgtable: 4k pages, 48-bit VAs, pgd = 09cf6000
[ 2062.421477] [1860] *pgd=0021e003, *pud=0021d003, 
*pmd=00e8d705
[ 2062.421477] [1860] *pgd=0021e003, *pud=0021d003, 
*pmd=00e8d705
[ 2062.447913] Internal error: Oops: 9607 [#1] SMP
[ 2062.447913] Internal error: Oops: 9607 [#1] SMP
[ 2062.540141] CPU: 1 PID: 2149 Comm: unibsp.out Tainted: P   OEL  
4.14.0 #1
[ 2062.540141] CPU: 1 PID: 2149 Comm: unibsp.out Tainted: P   OEL  
4.14.0 #1
[ 2062.560853] task: 8021e6f8a100 task.stack: 10aa
[ 2062.560853] task: 8021e6f8a100 task.stack: 10aa
[ 2062.580070] PC is at pcie_io_remap_test+0x9c/0x228
[ 

Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-01-05 Thread Kani, Toshi
On Thu, 2017-12-28 at 19:24 +0800, Hanjun Guo wrote:
> From: Hanjun Guo 
> 
> When we using iounmap() to free the 4K mapping, it just clear the PTEs
> but leave P4D/PUD/PMD unchanged, also will not free the memory of page
> tables.
> 
> This will cause issues on ARM64 platform (not sure if other archs have
> the same issue) for this case:
> 
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
>which will lead to kernel panic.
> 
> Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
> zero.

Hi Hanjun,

I tested the above steps on my x86 box, but was not able to reproduce
your kernel panic.  On x86, a 4K vaddr gets allocated from a small
fragmented free range, whereas a 2MB vaddr is from a larger free range. 
Their addrs have different alignments (4KB & 2MB) as well.  So, the
steps did not lead to use a same pmd entry.

However, I agree that zero'd pte entries will be leaked when a pmd map
is set if they are present under the pmd.

I also tested your patch on my x86 box.  Unfortunately, it effectively
disabled 2MB mappings.  While a 2MB vaddr gets allocated from a larger
free range, it sill comes from a free range covered by zero'd pte
entries.  So, it ends up with 4KB mappings with your changes.

I think we need to come up with other approach.
Thanks,
-Toshi


Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2018-01-05 Thread Kani, Toshi
On Thu, 2017-12-28 at 19:24 +0800, Hanjun Guo wrote:
> From: Hanjun Guo 
> 
> When we using iounmap() to free the 4K mapping, it just clear the PTEs
> but leave P4D/PUD/PMD unchanged, also will not free the memory of page
> tables.
> 
> This will cause issues on ARM64 platform (not sure if other archs have
> the same issue) for this case:
> 
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
>which will lead to kernel panic.
> 
> Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
> zero.

Hi Hanjun,

I tested the above steps on my x86 box, but was not able to reproduce
your kernel panic.  On x86, a 4K vaddr gets allocated from a small
fragmented free range, whereas a 2MB vaddr is from a larger free range. 
Their addrs have different alignments (4KB & 2MB) as well.  So, the
steps did not lead to use a same pmd entry.

However, I agree that zero'd pte entries will be leaked when a pmd map
is set if they are present under the pmd.

I also tested your patch on my x86 box.  Unfortunately, it effectively
disabled 2MB mappings.  While a 2MB vaddr gets allocated from a larger
free range, it sill comes from a free range covered by zero'd pte
entries.  So, it ends up with 4KB mappings with your changes.

I think we need to come up with other approach.
Thanks,
-Toshi


Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2017-12-29 Thread Hanjun Guo
oops, the title of this patch is wrong, should be:
ioremap: skip setting up huge I/O mappings when p4d/pud/pmd is zero

On 2017/12/28 19:24, Hanjun Guo wrote:
> From: Hanjun Guo 
> 
> When we using iounmap() to free the 4K mapping, it just clear the PTEs
> but leave P4D/PUD/PMD unchanged, also will not free the memory of page
> tables.
> 
> This will cause issues on ARM64 platform (not sure if other archs have
> the same issue) for this case:
> 
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
>which will lead to kernel panic.
> 
> Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
> zero.
> 
> Reported-by: Lei Li 
> Signed-off-by: Hanjun Guo 
> Cc: Toshi Kani 
> Cc: Mark Rutland 
> Cc: Will Deacon 
> Cc: Catalin Marinas 
> Cc: Michal Hocko 
> Cc: Andrew Morton 
> Cc: Xuefeng Wang 
> ---
> 
> Not sure if this is the right direction, this patch has a obvious
> side effect that a mapped address with 4K will not back to 2M.  I may
> miss something and just wrong, so this is just a RFC version, comments
> are welcomed.
> 
>  lib/ioremap.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/ioremap.c b/lib/ioremap.c
> index b808a39..4e6f19a 100644
> --- a/lib/ioremap.c
> +++ b/lib/ioremap.c
> @@ -89,7 +89,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned 
> long addr,
>   do {
>   next = pmd_addr_end(addr, end);
>  
> - if (ioremap_pmd_enabled() &&
> + if (ioremap_pmd_enabled() && pmd_none(*pmd) &&
>   ((next - addr) == PMD_SIZE) &&
>   IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
>   if (pmd_set_huge(pmd, phys_addr + addr, prot))
> @@ -115,7 +115,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned 
> long addr,
>   do {
>   next = pud_addr_end(addr, end);
>  
> - if (ioremap_pud_enabled() &&
> + if (ioremap_pud_enabled() && pud_none(*pud) &&
>   ((next - addr) == PUD_SIZE) &&
>   IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
>   if (pud_set_huge(pud, phys_addr + addr, prot))
> @@ -141,7 +141,7 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned 
> long addr,
>   do {
>   next = p4d_addr_end(addr, end);
>  
> - if (ioremap_p4d_enabled() &&
> + if (ioremap_p4d_enabled() && p4d_none(*p4d) &&
>   ((next - addr) == P4D_SIZE) &&
>   IS_ALIGNED(phys_addr + addr, P4D_SIZE)) {
>   if (p4d_set_huge(p4d, phys_addr + addr, prot))
> 



Re: [RFC patch] ioremap: don't set up huge I/O mappings when p4d/pud/pmd is zero

2017-12-29 Thread Hanjun Guo
oops, the title of this patch is wrong, should be:
ioremap: skip setting up huge I/O mappings when p4d/pud/pmd is zero

On 2017/12/28 19:24, Hanjun Guo wrote:
> From: Hanjun Guo 
> 
> When we using iounmap() to free the 4K mapping, it just clear the PTEs
> but leave P4D/PUD/PMD unchanged, also will not free the memory of page
> tables.
> 
> This will cause issues on ARM64 platform (not sure if other archs have
> the same issue) for this case:
> 
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
>which will lead to kernel panic.
> 
> Fix it by skip setting up the huge I/O mappings when p4d/pud/pmd is
> zero.
> 
> Reported-by: Lei Li 
> Signed-off-by: Hanjun Guo 
> Cc: Toshi Kani 
> Cc: Mark Rutland 
> Cc: Will Deacon 
> Cc: Catalin Marinas 
> Cc: Michal Hocko 
> Cc: Andrew Morton 
> Cc: Xuefeng Wang 
> ---
> 
> Not sure if this is the right direction, this patch has a obvious
> side effect that a mapped address with 4K will not back to 2M.  I may
> miss something and just wrong, so this is just a RFC version, comments
> are welcomed.
> 
>  lib/ioremap.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/ioremap.c b/lib/ioremap.c
> index b808a39..4e6f19a 100644
> --- a/lib/ioremap.c
> +++ b/lib/ioremap.c
> @@ -89,7 +89,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned 
> long addr,
>   do {
>   next = pmd_addr_end(addr, end);
>  
> - if (ioremap_pmd_enabled() &&
> + if (ioremap_pmd_enabled() && pmd_none(*pmd) &&
>   ((next - addr) == PMD_SIZE) &&
>   IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
>   if (pmd_set_huge(pmd, phys_addr + addr, prot))
> @@ -115,7 +115,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned 
> long addr,
>   do {
>   next = pud_addr_end(addr, end);
>  
> - if (ioremap_pud_enabled() &&
> + if (ioremap_pud_enabled() && pud_none(*pud) &&
>   ((next - addr) == PUD_SIZE) &&
>   IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
>   if (pud_set_huge(pud, phys_addr + addr, prot))
> @@ -141,7 +141,7 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned 
> long addr,
>   do {
>   next = p4d_addr_end(addr, end);
>  
> - if (ioremap_p4d_enabled() &&
> + if (ioremap_p4d_enabled() && p4d_none(*p4d) &&
>   ((next - addr) == P4D_SIZE) &&
>   IS_ALIGNED(phys_addr + addr, P4D_SIZE)) {
>   if (p4d_set_huge(p4d, phys_addr + addr, prot))
>