BUG: unable to handle kernel paging request at fffffc0000000000

2018-03-14 Thread chenjiankang


Hello everyone:
My kernel version is 3.10.0-327.62.59.101.x86_64. Why is this KASAN shadow
memory lost?

Thanks.
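
For context, the shadow lookup that faults here is just a shift plus a constant
offset, so a bogus pointer handed to __asan_load4() produces a shadow address
that may itself be unmapped. A minimal userspace sketch of the translation
follows; the offset is the conventional x86_64 value and is an assumption, not
something read from this exact 3.10 backport:

#include <stdio.h>

/* Sketch of kasan_mem_to_shadow(); KASAN_SHADOW_OFFSET is the usual
 * x86_64 value and is an assumption about this particular tree. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000UL

static unsigned long kasan_mem_to_shadow(unsigned long addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

int main(void)
{
	/* A wild or already-freed pointer translates to a shadow address
	 * with no backing mapping, so the shadow load itself faults; that
	 * usually means the pointer being checked is bad, not that the
	 * shadow memory was lost. The value below is hypothetical. */
	unsigned long wild = 0xdead000000000000UL;

	printf("shadow(0x%lx) = 0x%lx\n", wild, kasan_mem_to_shadow(wild));
	return 0;
}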

BUG: unable to handle kernel paging request at fc00
IP: [] kasan_mem_to_shadow include/linux/kasan.h:20 [inline]
IP: [] memory_is_poisoned_4 mm/kasan/kasan.c:122 [inline]
IP: [] memory_is_poisoned mm/kasan/kasan.c:244 [inline]
IP: [] check_memory_region_inline mm/kasan/kasan.c:270 
[inline]
IP: [] __asan_load4+0x2b/0x80 mm/kasan/kasan.c:524
PGD 0
Oops:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 21826 Comm: syz-executor0 Tainted: GB   --- T 
3.10.0-327.62.59.101.x86_64+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
task: 8802337ae680 ti: 880212dc8000 task.ti: 880212dc8000
RIP: 0010:[]  [] kasan_mem_to_shadow 
include/linux/kasan.h:20 [inline]
RIP: 0010:[]  [] memory_is_poisoned_4 
mm/kasan/kasan.c:122 [inline]
RIP: 0010:[]  [] memory_is_poisoned 
mm/kasan/kasan.c:244 [inline]
RIP: 0010:[]  [] check_memory_region_inline 
mm/kasan/kasan.c:270 [inline]
RIP: 0010:[]  [] __asan_load4+0x2b/0x80 
mm/kasan/kasan.c:524
RSP: 0018:880212dcfba0  EFLAGS: 00010286
RAX: fbff RBX: 8802286ddd60 RCX: 8167b601
RDX: dc00 RSI: 0008 RDI: fff8
RBP: 880212dcfba0 R08: 0007 R09: 
R10: 8800 R11:  R12: 8802286da980
R13:  R14: fff8 R15: 81c9b370
FS:  () GS:8800bb10() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fc00 CR3: 0255a000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Call Trace:
 [] crypto_ahash_digestsize include/crypto/hash.h:148 [inline]
 [] hash_sock_destruct+0x81/0x160 crypto/algif_hash.c:270
 [] __sk_free+0x44/0x330 net/core/sock.c:1392
 [] sk_free+0x2d/0x40 net/core/sock.c:1422
 [] sock_put include/net/sock.h:1722 [inline]
 [] af_alg_release+0x55/0x70 crypto/af_alg.c:123
 [] sock_release+0x5c/0x190 net/socket.c:570
 [] sock_close+0x1b/0x20 net/socket.c:1161
 [] __fput+0x1bb/0x560 fs/file_table.c:246
 [] fput+0x1a/0x20 fs/file_table.c:283
 [] task_work_run+0x11f/0x1e0 kernel/task_work.c:87
 [] exit_task_work include/linux/task_work.h:21 [inline]
 [] do_exit+0x68b/0x1b40 kernel/exit.c:815
 [] do_group_exit+0x91/0x1f0 kernel/exit.c:948
 [] SYSC_exit_group kernel/exit.c:959 [inline]
 [] SyS_exit_group+0x22/0x30 kernel/exit.c:957
 [] system_call_fastpath+0x16/0x1b
Code: 55 48 b8 ff ff ff ff ff 7f ff ff 48 39 c7 48 89 e5 48 8b 4d 08 76 43 48 
89 f8 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 48 01 d0 <66> 83 38 00 75 07 5d 
c3 0f 1f 44 00 00 48 8d 77 03 49 89 f0 49
RIP  [] kasan_mem_to_shadow include/linux/kasan.h:20 [inline]
RIP  [] memory_is_poisoned_4 mm/kasan/kasan.c:122 [inline]
RIP  [] memory_is_poisoned mm/kasan/kasan.c:244 [inline]
RIP  [] check_memory_region_inline mm/kasan/kasan.c:270 
[inline]
RIP  [] __asan_load4+0x2b/0x80 mm/kasan/kasan.c:524
 RSP 
CR2: fc00



Re: a racy access flag clearing warning when calling mmap system call

2017-12-11 Thread chenjiankang

> On Fri, Dec 08, 2017 at 11:19:52AM +0800, chenjiankang wrote:
>> On 2017/12/7 21:23, Will Deacon wrote:
>>> diff --git a/arch/arm64/include/asm/pgtable.h 
>>> b/arch/arm64/include/asm/pgtable.h
>>> index 149d05fb9421..8fe103b1e101 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -42,6 +42,8 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>> +#include 
>>>  
>>>  extern void __pte_error(const char *file, int line, unsigned long val);
>>>  extern void __pmd_error(const char *file, int line, unsigned long val);
>>> @@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>>> }
>>>  }
>>>  
>>> -struct mm_struct;
>>> -struct vm_area_struct;
>>> -
>>>  extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
>>>  
>>>  /*
>>> @@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, 
>>> unsigned long addr,
>>>  * hardware updates of the pte (ptep_set_access_flags safely changes
>>>  * valid ptes without going through an invalid entry).
>>>  */
>>> -   if (pte_valid(*ptep) && pte_valid(pte)) {
>>> +   if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
>>> +  (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
>>> VM_WARN_ONCE(!pte_young(pte),
>>>  "%s: racy access flag clearing: 0x%016llx -> 
>>> 0x%016llx",
>>>  __func__, pte_val(*ptep), pte_val(pte));
> [...]

Hi Will:
I constructed a simple test case that reproduces this failure,

like this:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define LEN (1024 * 1024 * 100)

int main(void)
{
	int *addr = NULL;
	int pid = -1;

	/* Map 100 MB of anonymous memory and ask for transparent huge pages. */
	addr = (int *)mmap(NULL, LEN, PROT_READ | PROT_WRITE,
			   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	madvise(addr, LEN, MADV_HUGEPAGE);
	memset(addr, 1, LEN);

	pid = fork();
	if (pid == 0) {
		printf("wow! I am a child!\n");
	} else {
		printf("I am a father!\n");
		/* Remap the first 10 MB in place, forcing the huge page
		 * mapping to be split. */
		mmap(addr, 1024 * 1024 * 10, PROT_READ | PROT_WRITE,
		     MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
	}

	return 0;
}
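
(Built with plain gcc, no special flags. The fork clears the access flag on the
huge-page pmd via copy_huge_pmd(), and the parent's MAP_FIXED mmap over part of
that range then splits the huge page, which is where the warning fires.)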

Then I applied Will's modification, which solves this problem.
Will, should this patch be sent upstream?

>> From the printed information, the only difference between pte and *ptep is
>> the PTE_SPECIAL bit, and the PTE access bit is zero in both.
>> Diff below. Should the access bit of the existing *ptep also be checked, to
>> eliminate the false positive?
> [...]
>> diff --git a/arch/arm64/include/asm/pgtable.h 
>> b/arch/arm64/include/asm/pgtable.h
>> index 2987d5a..3c1b0c6 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -206,7 +206,7 @@ static inline void set_pte_at(struct mm_struct *mm, 
>> unsigned long addr,
>>  * valid ptes without going through an invalid entry).
>>  */
>> if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
>> -   VM_WARN_ONCE(!pte_young(pte),
>> +   VM_WARN_ONCE(!pte_young(pte) && pte_young(*ptep),
>>  "%s: racy access flag clearing: %016llx -> 
>> %016llx",
>>  __func__, pte_val(*ptep), pte_val(pte));
> 
> It's actually the other way around: *ptep being "old" (AF = 0) could at
> any point be made "young" by the hardware (AF = 1). This is racing with
> the software update which keeps the AF bit 0.
> 
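
A toy userspace model of the interleaving Will describes, with a plain integer
standing in for the live pte; bit 10 matches the arm64 AF position, and the
values are made up:

#include <stdio.h>

#define PTE_AF (1UL << 10)	/* arm64 access-flag bit, for illustration */

int main(void)
{
	unsigned long live_pte = 0x0e1;              /* valid entry, AF clear */
	unsigned long new_pte  = live_pte & ~PTE_AF; /* software's "old" value */

	live_pte |= PTE_AF;	/* hardware marks the page accessed */
	live_pte  = new_pte;	/* racing software store: AF is 0 again */

	/* The hardware's access-flag update has been silently discarded. */
	printf("AF after the race: %lu\n", (live_pte & PTE_AF) ? 1UL : 0UL);
	return 0;
}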



Re: a racy access flag clearing warning when calling mmap system call

2017-12-07 Thread chenjiankang


On 2017/12/7 21:23, Will Deacon wrote:
> On Thu, Dec 07, 2017 at 09:46:59AM +0800, Yisheng Xie wrote:
>> On 2017/12/1 21:18, Will Deacon wrote:
>>> On Fri, Dec 01, 2017 at 03:38:04PM +0800, chenjiankang wrote:
>>>> [ cut here ]  
>>>> WARNING: at 
>>>> ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
>>>
>>> Given that this is a fairly old 4.1 kernel, could you try to reproduce the
>>> failure with something more recent, please? We've fixed many bugs since
>>> then, some of them involving huge pages.
>>
>> Yeah, this is an old kernel, but I have found a scenario that can cause this
>> warn_on:
>> When fork and dup_mmap, it will call copy_huge_pmd() and clear the Access 
>> Flag.
>>   dup_mmap
>> -> copy_page_range
>>  -> copy_pud_range
>>   -> copy_pmd_range
>>-> copy_huge_pmd
>> -> pmd_mkold
>>
>> If we do not have any access after dup_mmap and then start to split this THP,
>> it will cause this call trace in the old kernel, right?
>>
>> It seems this is a normal scenario, but it triggers the call trace on this old
>> kernel; therefore, for this old kernel, we should just remove this
>> WARN_ON_ONCE, right?
> 
> Whilst racy clearing of the access flag should be safe in practice, I like
> having the warning around because it does indicate that we're setting
> something to old which could immediately be made young again by the CPU.
> 
> In this case, it looks like the mm isn't even live, so a better approach
> would probably be to predicate that conditional on mm == current->active_mm
> or something like that. That also avoids us getting false positive for
> the dirty bit case, which would be harmful if the table was installed.
> 
> diff below. It's still racy with concurrent fork, but I don't want this
> check to become a generic "does my caller hold all the locks to protect
> against a concurrent walk" predicate and it just means we won't catch all
> possible races.
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/include/asm/pgtable.h 
> b/arch/arm64/include/asm/pgtable.h
> index 149d05fb9421..8fe103b1e101 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -42,6 +42,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  extern void __pte_error(const char *file, int line, unsigned long val);
>  extern void __pmd_error(const char *file, int line, unsigned long val);
> @@ -207,9 +209,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>   }
>  }
>  
> -struct mm_struct;
> -struct vm_area_struct;
> -
>  extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
>  
>  /*
> @@ -238,7 +237,8 @@ static inline void set_pte_at(struct mm_struct *mm, 
> unsigned long addr,
>* hardware updates of the pte (ptep_set_access_flags safely changes
>* valid ptes without going through an invalid entry).
>*/
> - if (pte_valid(*ptep) && pte_valid(pte)) {
> + if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(*ptep) && pte_valid(pte) &&
> +(mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
>   VM_WARN_ONCE(!pte_young(pte),
>"%s: racy access flag clearing: 0x%016llx -> 
> 0x%016llx",
>__func__, pte_val(*ptep), pte_val(pte));
> 
> 
> .
> 

Hi Will,

From the printed information, the only difference between pte and *ptep is the
PTE_SPECIAL bit, and the PTE access bit is zero in both.
Diff below. Should the access bit of the existing *ptep also be checked, to
eliminate the false positive?

Thanks 

Jiankang

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2987d5a..3c1b0c6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -206,7 +206,7 @@ static inline void set_pte_at(struct mm_struct *mm, 
unsigned long addr,
 * valid ptes without going through an invalid entry).
 */
if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
-   VM_WARN_ONCE(!pte_young(pte),
+   VM_WARN_ONCE(!pte_young(pte) && pte_young(*ptep),
 "%s: racy access flag clearing: %016llx -> 
%016llx",
 __func__, pte_val(*ptep), pte_val(pte));
VM_WARN_ONCE(pte_write(*ptep) && !pte_dirty(pte),
-- 
1.7.12.4




Re: a racy access flag clearing warning when calling mmap system call

2017-12-05 Thread chenjiankang


On 2017/12/1 21:18, Will Deacon wrote:
> On Fri, Dec 01, 2017 at 03:38:04PM +0800, chenjiankang wrote:
>> I found a warning during a syzkaller test.
>>
>> When the mmap syscall is called to create a virtual memory area, it first
>> removes an old huge-page mapping.
>> Before the huge page is split, the pmd of the huge page is set up, but the
>> PTE_AF bit belonging to the current huge-page pmd is zero.
>> I suspect that when the child process is created, PTE_AF is cleared in
>> copy_one_pte(), so set_pte_at() can emit a warning.
>>
>> So, should set_pte_at() check PTE_AF?
>>
>> Thanks
>> The log:
>>
>> set_pte_at: racy access flag clearing: 00e09d200bd1 -> 
>> 01e09d200bd1
>> [ cut here ]  
>> WARNING: at 
>> ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
> 
> Given that this is a fairly old 4.1 kernel, could you try to reproduce the
> failure with something more recent, please? We've fixed many bugs since
> then, some of them involving huge pages.
> 
> Thanks,
> 
> Will

Hi Will:

It is more difficult to reproduce the failure with the old 4.1 kernel.
Does the warning have any impact, or does it just mean a PTE_AF flag is lost?

Thanks

Jiankang

> 
>> Modules linked in: 
>> CPU: 0 PID: 3665 Comm: syz-executor7 Not tainted 4.1.44+ #1
>> Hardware name: linux,dummy-virt (DT)   
>> task: ffc06a873fc0 ti: ffc05aefc000 task.ti: ffc05aefc000
>> PC is at pmdp_splitting_flush+0x194/0x1b0
>> LR is at pmdp_splitting_flush+0x194/0x1b0
>> pc : [] lr : [] pstate: 8145
>> sp : ffc05aeff770 
>> x29: ffc05aeff770 x28: ffc05ae45800 
>> x27: 2020 x26: ffc061fdf450 
>> x25: 0002 x24: 0001 
>> x23: ffc06333d9c8 x22: ffc0014ba000 
>> x21: 01e09d200bd1 x20: 00e09d200bd1 
>> x19: ffc05ae45800 x18:  
>> x17: 004b4000 x16: ffc00017fdc0 
>> x15: 038ee280 x14: 3030653130203e2d 
>> x13: 2031646230303264 x12: 3930303030306530 
>> x11: 30203a676e697261 x10: 656c632067616c66 
>> x9 : 2073736563636120 x8 : 79636172203a7461 
>> x7 : ffc05aeff430 x6 : ffc00015f38c 
>> x5 : 0003 x4 :  
>> x3 : 0003 x2 : 0001 
>> x1 : ff9005a03000 x0 : 004b 
>> Call trace:
>> [] pmdp_splitting_flush+0x194/0x1b0
>> [] split_huge_page_to_list+0x168/0xdb0
>> [] __split_huge_page_pmd+0x1b0/0x510
>> [] split_huge_page_pmd_mm+0x84/0x88
>> [] split_huge_page_address+0xc4/0xe8
>> [] __vma_adjust_trans_huge+0x15c/0x190
>> [] vma_adjust+0x884/0x9f0
>> [] __split_vma.isra.5+0x200/0x310
>> [] do_munmap+0x5e0/0x608 
>> [] mmap_region+0x12c/0x900
>> [] do_mmap_pgoff+0x484/0x540
>> [] vm_mmap_pgoff+0x128/0x158
>> [] SyS_mmap_pgoff+0x188/0x300
>> [] sys_mmap+0x58/0x80   
>>
> 
> .
> 



a racy access flag clearing warning when calling mmap system call

2017-11-30 Thread chenjiankang

Hi Will,

I found a warning during a syzkaller test.

When the mmap syscall is called to create a virtual memory area, it first
removes an old huge-page mapping.
Before the huge page is split, the pmd of the huge page is set up, but the
PTE_AF bit belonging to the current huge-page pmd is zero.
I suspect that when the child process is created, PTE_AF is cleared in
copy_one_pte(), so set_pte_at() can emit a warning.

So, should set_pte_at() check PTE_AF?
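
For reference, the check at pgtable.h:211 that produces the warning looks
roughly like the following; this is a sketch reconstructed from the hunks
quoted in the replies above, not copied from the 4.1 tree:

	/* set_pte_at() sanity check in the 4.1-era arm64 pgtable.h (approximate). */
	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_valid(*ptep)) {
		VM_WARN_ONCE(!pte_young(pte),
			     "%s: racy access flag clearing: %016llx -> %016llx",
			     __func__, pte_val(*ptep), pte_val(pte));
		/* ...followed by a similar VM_WARN_ONCE for racy dirty-state
		 * clearing when pte_write(*ptep) && !pte_dirty(pte). */
	}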

Thanks.

The log:

set_pte_at: racy access flag clearing: 00e09d200bd1 -> 01e09d200bd1
[ cut here ]  
WARNING: at ../../../../../kernel/linux-4.1/arch/arm64/include/asm/pgtable.h:211
Modules linked in: 
CPU: 0 PID: 3665 Comm: syz-executor7 Not tainted 4.1.44+ #1
Hardware name: linux,dummy-virt (DT)   
task: ffc06a873fc0 ti: ffc05aefc000 task.ti: ffc05aefc000
PC is at pmdp_splitting_flush+0x194/0x1b0
LR is at pmdp_splitting_flush+0x194/0x1b0
pc : [] lr : [] pstate: 8145
sp : ffc05aeff770 
x29: ffc05aeff770 x28: ffc05ae45800 
x27: 2020 x26: ffc061fdf450 
x25: 0002 x24: 0001 
x23: ffc06333d9c8 x22: ffc0014ba000 
x21: 01e09d200bd1 x20: 00e09d200bd1 
x19: ffc05ae45800 x18:  
x17: 004b4000 x16: ffc00017fdc0 
x15: 038ee280 x14: 3030653130203e2d 
x13: 2031646230303264 x12: 3930303030306530 
x11: 30203a676e697261 x10: 656c632067616c66 
x9 : 2073736563636120 x8 : 79636172203a7461 
x7 : ffc05aeff430 x6 : ffc00015f38c 
x5 : 0003 x4 :  
x3 : 0003 x2 : 0001 
x1 : ff9005a03000 x0 : 004b 
Call trace:
[] pmdp_splitting_flush+0x194/0x1b0
[] split_huge_page_to_list+0x168/0xdb0
[] __split_huge_page_pmd+0x1b0/0x510
[] split_huge_page_pmd_mm+0x84/0x88
[] split_huge_page_address+0xc4/0xe8
[] __vma_adjust_trans_huge+0x15c/0x190
[] vma_adjust+0x884/0x9f0
[] __split_vma.isra.5+0x200/0x310
[] do_munmap+0x5e0/0x608 
[] mmap_region+0x12c/0x900
[] do_mmap_pgoff+0x484/0x540
[] vm_mmap_pgoff+0x128/0x158
[] SyS_mmap_pgoff+0x188/0x300
[] sys_mmap+0x58/0x80   



Re: [PATCH] kernel/kprobes: add check to avoid kprobe memory leak

2017-10-25 Thread chenjiankang
> On Tue, 24 Oct 2017 20:17:02 +0800
> JianKang Chen  wrote:
> 
>> The function register_kretprobe() is used to initialize a struct
>> kretprobe and allocate a list table of kretprobe instances.
>> However, in this function, there is a memory leak.
>>
>> The test case:
>>
>> static struct kretprobe rp;
>> struct kretprobe *rps[10] = { &rp, &rp, &rp, &rp, &rp,
>>                               &rp, &rp, &rp, &rp, &rp };
> 
> What? This is buggy code; you must not list the same kretprobe twice.
> But, yeah, since register_kprobe() already has similar protection against
> re-registration, register_kretprobe() should do so as well.
> 
> [..]
>>  raw_spin_lock_init(&rp->lock);
>> +
>> +if (!hlist_empty(&rp->free_instances))
>> +return -EBUSY;
>> +
> 
> Hmm, but can you use check_kprobe_rereg() before raw_spin_lock_init()?
> If the user reuses rp after it has been started, rp->lock may already be in use.

Hmm, your advice is very good; we can call check_kprobe_rereg() at
the beginning of register_kretprobe().

For example:

int register_kretprobe(struct kretprobe *rp)
{
	int ret = 0;
	struct kretprobe_instance *inst;
	int i;
	void *addr;

	/* Reject re-registration before rp->lock and the instance list
	 * are re-initialized. */
	ret = check_kprobe_rereg(&rp->kp);
	if (ret)
		return ret;
	...

Thank you!
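
For completeness, a hedged sketch of how the quoted test case could be driven
from a test module; register_kretprobes() and the kretprobe fields are the
standard kprobes API, but the handler, probe symbol, and maxactive value are
assumptions:

#include <linux/module.h>
#include <linux/kprobes.h>

static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	return 0;
}

static struct kretprobe rp = {
	.handler        = ret_handler,
	.maxactive      = 20,
	.kp.symbol_name = "_do_fork",	/* assumed probe target */
};

static struct kretprobe *rps[10] = {
	&rp, &rp, &rp, &rp, &rp, &rp, &rp, &rp, &rp, &rp,
};

static int __init kretprobe_leak_init(void)
{
	/*
	 * The first array entry registers rp and allocates rp.maxactive
	 * instances onto rp.free_instances.  The second attempt re-initializes
	 * that list before failing, orphaning the first allocation; that is
	 * the leak the patch guards against.  register_kretprobes() unwinds
	 * the partial registration on failure, so nothing stays registered.
	 */
	int ret = register_kretprobes(rps, 10);

	pr_info("register_kretprobes() returned %d\n", ret);
	return 0;
}

static void __exit kretprobe_leak_exit(void)
{
	/* Nothing left registered; see the comment in init. */
}

module_init(kretprobe_leak_init);
module_exit(kretprobe_leak_exit);
MODULE_LICENSE("GPL");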