Re: kernel BUG in split_huge_page_to_list

2021-01-24 Thread Li Xinhai




On 1/24/21 8:01 PM, syzbot wrote:

Hello,

syzbot found the following issue on:

HEAD commit:647060f3 Add linux-next specific files for 20210120
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16f0353f50
kernel config:  https://syzkaller.appspot.com/x/.config?x=8f8a72b7e5067002
dashboard link: https://syzkaller.appspot.com/bug?extid=9b83ff893245a25c320e
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=143d7e3b50
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17652feb50



It looks like a new page has been mapped by the old vma, which happens
- after the existing page table entries are moved from the old vma to the new vma;
- before the anon_vma is unlinked from the old vma.

So this new page's ->mapping still points to the anon_vma that has since been
unlinked from the old vma. Later, the rmap path is not able to find the
mapped entries of this page.

Any hints about the above scenario? I suppose it is only possible for THP
pages and does not happen through the page fault path. Thanks.



The issue was bisected to:

commit fbdbae3da30a149a55a5f1883bbbe17a27660e05
Author: Li Xinhai 
Date:   Tue Jan 19 21:54:00 2021 +

 mm: mremap: unlink anon_vmas when mremap with MREMAP_DONTUNMAP success

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16cad100d0
final oops: https://syzkaller.appspot.com/x/report.txt?x=15cad100d0
console output: https://syzkaller.appspot.com/x/log.txt?x=11cad100d0

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9b83ff893245a25c3...@syzkaller.appspotmail.com
Fixes: fbdbae3da30a ("mm: mremap: unlink anon_vmas when mremap with MREMAP_DONTUNMAP success")

head:091c6650 order:9 compound_mapcount:0 compound_pincount:0
memcg:888010d0a000
anon flags: 0xfff009001d(locked|uptodate|dirty|lru|head|swapbacked)
raw: 00fff009001d eabc51c8 888010201800 88802575d801
raw: 00020e00  01fc 888010d0a000
page dumped because: VM_BUG_ON_PAGE(!unmap_success)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2351!
invalid opcode:  [#1] PREEMPT SMP KASAN
CPU: 0 PID: 8483 Comm: syz-executor525 Not tainted 5.11.0-rc4-next-20210120-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:unmap_page mm/huge_memory.c:2351 [inline]
RIP: 0010:split_huge_page_to_list+0x1f02/0x43b0 mm/huge_memory.c:2720
Code: ef e8 82 46 ea ff 0f 0b e8 ab 69 b9 ff 4c 8d 73 ff e9 56 ea ff ff e8 9d 69 b9 
ff 48 c7 c6 40 69 57 89 48 89 ef e8 5e 46 ea ff <0f> 0b e8 87 69 b9 ff 4c 8d 75 
ff e9 28 e9 ff ff e8 79 69 b9 ff 49
RSP: 0018:c9000168f7a0 EFLAGS: 00010282
RAX:  RBX:  RCX: 
RDX: 88801e2d5400 RSI: 88bcc6c7 RDI: f520002d1e8e
RBP: eaca8000 R08: 0033 R09: 
R10: 815b136e R11:  R12: 888010d0ae60
R13: eaca8000 R14: 018c R15: 
FS:  0154e880() GS:8880b9e0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fcc655666c0 CR3: 12e9a000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
  split_huge_page include/linux/huge_mm.h:187 [inline]
  madvise_free_pte_range+0x736/0x1ee0 mm/madvise.c:633
  walk_pmd_range mm/pagewalk.c:89 [inline]
  walk_pud_range mm/pagewalk.c:160 [inline]
  walk_p4d_range mm/pagewalk.c:193 [inline]
  walk_pgd_range mm/pagewalk.c:229 [inline]
  __walk_page_range+0xe20/0x1ea0 mm/pagewalk.c:331
  walk_page_range+0x20d/0x400 mm/pagewalk.c:427
  madvise_free_single_vma+0x383/0x550 mm/madvise.c:731
  madvise_dontneed_free mm/madvise.c:819 [inline]
  madvise_vma mm/madvise.c:936 [inline]
  do_madvise.part.0+0x4e4/0x1ed0 mm/madvise.c:1132
  do_madvise mm/madvise.c:1158 [inline]
  __do_sys_madvise mm/madvise.c:1158 [inline]
  __se_sys_madvise mm/madvise.c:1156 [inline]
  __x64_sys_madvise+0x113/0x150 mm/madvise.c:1156
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x440219
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 
48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 13 
fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7ffc51b58b98 EFLAGS: 0246 ORIG_RAX: 001c
RAX: ffda RBX: 004002c8 RCX: 00440219
RDX: 0008 RSI: 00c0 RDI: 2040
RBP: 006ca018 R08:  R09: 
R10: 20ffc000 R11: 0246 R12: 00401a20
R13: 00401ab0 R14:  R15: 
Modules linked in:
---[ end trace 7812a1

tools/bpf: build failed with defconfig(x86_64) on v5.6 and v5.7

2020-06-24 Thread Li Xinhai
- machine information
Linux localhost.localdomain 4.18.0-193.6.3.el8_2.x86_64 #1 SMP Wed Jun 10 11:09:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

- configurations
make defconfig
make kvmconfig

- failure logs on v5.6
```
  LINK     /mnt/build/1_build/05_build_v5.6/bpf/bpftool//libbpf/libbpf.a
  LINK     /mnt/build/1_build/05_build_v5.6/bpf/bpftool/bpftool
  DESCEND  runqslower
  GEN      /mnt/build/0_code/0_linux/linux/tools/bpf/runqslower/.output/bpf_helper_defs.h
make[4]: *** No rule to make target '/mnt/build/0_code/0_linux/linux/tools/include/linux/build_bug.h', needed by '/mnt/build/0_code/0_linux/linux/tools/bpf/runqslower/.output/staticobjs/libbpf.o'.  Stop.
make[3]: *** [Makefile:183: /mnt/build/0_code/0_linux/linux/tools/bpf/runqslower/.output/staticobjs/libbpf-in.o] Error 2
make[2]: *** [Makefile:79: .output/libbpf.a] Error 2
make[1]: *** [Makefile:119: runqslower] Error 2
make: *** [Makefile:68: bpf] Error 2
```

- failure logs on v5.7
```
In file included from /mnt/build/0_code/0_linux/linux/tools/include/linux/build_bug.h:5,
                 from /mnt/build/0_code/0_linux/linux/tools/include/linux/kernel.h:8,
                 from /mnt/build/0_code/0_linux/linux/kernel/bpf/disasm.h:10,
                 from /mnt/build/0_code/0_linux/linux/kernel/bpf/disasm.c:8:
/mnt/build/0_code/0_linux/linux/kernel/bpf/disasm.c: In function ‘__func_get_name’:
/mnt/build/0_code/0_linux/linux/tools/include/linux/compiler.h:37:38: warning: nested extern declaration of ‘__compiletime_assert_0’ [-Wnested-externs]
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                      ^
/mnt/build/0_code/0_linux/linux/tools/include/linux/compiler.h:16:15: note: in definition of macro ‘__compiletime_assert’
   extern void prefix ## suffix(void) __compiletime_error(msg); \
               ^~
/mnt/build/0_code/0_linux/linux/tools/include/linux/compiler.h:37:2: note: in expansion of macro ‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^~~
/mnt/build/0_code/0_linux/linux/tools/include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                     ^~
/mnt/build/0_code/0_linux/linux/tools/include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  ^~~~
/mnt/build/0_code/0_linux/linux/kernel/bpf/disasm.c:20:2: note: in expansion of macro ‘BUILD_BUG_ON’
  BUILD_BUG_ON(ARRAY_SIZE(func_id_str) != __BPF_FUNC_MAX_ID);
  ^~~~
```

and 
```
  LINK     /mnt/build/0_code/0_linux/linux/tools/bpf/runqslower/.output/libbpf.a
  GEN      vmlinux.h
  BPF      runqslower.bpf.o
In file included from runqslower.bpf.c:3:
.output/vmlinux.h:5:15: error: attribute 'preserve_access_index' is not supported by '#pragma clang attribute'
#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
              ^
.output/vmlinux.h:98607:15: error: '#pragma clang attribute pop' with no matching '#pragma clang attribute push'
#pragma clang attribute pop
              ^
2 errors generated.
make[2]: *** [Makefile:57: .output/runqslower.bpf.o] Error 1
make[1]: *** [Makefile:119: runqslower] Error 2
make: *** [Makefile:68: bpf] Error 2
```

On the same machine and with the same configuration, I also tried v5.4 and
v5.5; both build without failures.




Re: [PATCH] tracing: Fix events.rst section numbering

2020-05-21 Thread Li Xinhai
On 2020-05-19 at 02:29 Tom Zanussi wrote:
>The in-kernel trace event API should have its own section, and the
>duplicate section numbers need fixing as well.
>
>Signed-off-by: Tom Zanussi 
>Reported-by: Li Xinhai 
>---
> Documentation/trace/events.rst | 28 ++--
> 1 file changed, 14 insertions(+), 14 deletions(-)
>
>diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
>index ed79b220bd07..1a3b7762cb0f 100644
>--- a/Documentation/trace/events.rst
>+++ b/Documentation/trace/events.rst
>@@ -526,8 +526,8 @@ The following commands are supported:
>
>   See Documentation/trace/histogram.rst for details and examples.
>
>-6.3 In-kernel trace event API
>--
>+7. In-kernel trace event API
>+
>
> In most cases, the command-line interface to trace events is more than
> sufficient.  Sometimes, however, applications might find the need for
>@@ -559,8 +559,8 @@ following:
>   - tracing synthetic events from in-kernel code
>   - the low-level "dynevent_cmd" API
>
>-6.3.1 Dyamically creating synthetic event definitions
>--
>+7.1 Dyamically creating synthetic event definitions
>+---
>
> There are a couple ways to create a new synthetic event from a kernel
> module or other kernel code.
>@@ -665,8 +665,8 @@ registered by calling the synth_event_gen_cmd_end() 
>function:
> At this point, the event object is ready to be used for tracing new
> events.
>
>-6.3.3 Tracing synthetic events from in-kernel code
>---
>+7.2 Tracing synthetic events from in-kernel code
>+
>
> To trace a synthetic event, there are several options.  The first
> option is to trace the event in one call, using synth_event_trace()
>@@ -677,8 +677,8 @@ synth_event_trace_start() and synth_event_trace_end() 
>along with
> synth_event_add_next_val() or synth_event_add_val() to add the values
> piecewise.
>
>-6.3.3.1 Tracing a synthetic event all at once
>--
>+7.2.1 Tracing a synthetic event all at once
>+---
>
> To trace a synthetic event all at once, the synth_event_trace() or
> synth_event_trace_array() functions can be used.
>@@ -779,8 +779,8 @@ remove the event:
>
>    ret = synth_event_delete("schedtest");
>
>-6.3.3.1 Tracing a synthetic event piecewise
>
>+7.2.2 Tracing a synthetic event piecewise
>+-
>
> To trace a synthetic using the piecewise method described above, the
> synth_event_trace_start() function is used to 'open' the synthetic
>@@ -863,8 +863,8 @@ Note that synth_event_trace_end() must be called at the 
>end regardless
> of whether any of the add calls failed (say due to a bad field name
> being passed in).
>
>-6.3.4 Dyamically creating kprobe and kretprobe event definitions
>-
>+7.3 Dyamically creating kprobe and kretprobe event definitions
>+--
>
> To create a kprobe or kretprobe trace event from kernel code, the
> kprobe_event_gen_cmd_start() or kretprobe_event_gen_cmd_start()
>@@ -940,8 +940,8 @@ used to give the kprobe event file back and delete the 
>event:
>
>   ret = kprobe_event_delete("gen_kprobe_test");
>
>-6.3.4 The "dynevent_cmd" low-level API
>-------
>+7.4 The "dynevent_cmd" low-level API
>+
>
> Both the in-kernel synthetic event and kprobe interfaces are built on
> top of a lower-level "dynevent_cmd" interface.  This interface is
>--
>2.17.1
> 

It looks correct to me.
Reviewed-by: Li Xinhai 


Re: Documentation/trace/events.rst: wrong numbering of sections

2020-05-17 Thread Li Xinhai
>Hi,
>
>On Fri, 2020-05-15 at 09:11 -0400, Steven Rostedt wrote:
>> It's best to Cc the maintainers of the file. Nobody reads linux-kernel (it
>> produces 800 emails a day!). Luckily, I happen to monitor the
>> linux-trace-devel list (which is mostly for userland tools), otherwise this
>> email would have been lost to the LKML abyss.
>>
>> On Fri, 15 May 2020 15:43:43 +0800
>> "Li Xinhai"  wrote:
>>
>> > This document has below numbering of its sections:
>> >
>> > 1. Introduction
>> > 2. Using Event Tracing
>> > 2.1 Via the 'set_event' interface
>> > 2.2 Via the 'enable' toggle
>> > 2.3 Boot option
>> > 3. Defining an event-enabled tracepoint
>> > 4. Event formats
>> > 5. Event filtering
>> > 5.1 Expression syntax
>> > 5.2 Setting filters
>> > 5.3 Clearing filters
>> > 5.3 Subsystem filters
>> > 5.4 PID filtering
>> > 6. Event triggers
>> > 6.1 Expression syntax
>> > 6.2 Supported trigger commands
>> > 6.3 In-kernel trace event API
>> > 6.3.1 Dyamically creating synthetic event definitions
>> > 6.3.3 Tracing synthetic events from in-kernel code
>> > 6.3.3.1 Tracing a synthetic event all at once
>> > 6.3.3.1 Tracing a synthetic event piecewise
>> > 6.3.4 Dyamically creating kprobe and kretprobe event definitions
>> > 6.3.4 The "dynevent_cmd" low-level API
>> >
>> > It seems wrong numbering within 6.3 section.
>> > or, would it be better to have separated chapter #7, for 'In-kernel
>> > trace
>> > event API'? it seems not belong to 'Event triggers'.
>>
>> Yeah, 6.3.4 (both of them) probably should have been under a new top
>> level section. (#7).
>>
>
>Yeah, aside from duplicate numbering in a couple of places, it would
>make more sense for everything starting from '6.3 In-kernel trace event
>API' to be in a section 7.
>
>Would you like to submit a patch for that, Li, or should I?
> 
I am not sure about the correct organization of these parts, so it may be
better for you to fix it. Thanks.

>Thanks,
>
>Tom
>
>> -- Steve
>

Documentation/trace/events.rst: wrong numbering of sections

2020-05-15 Thread Li Xinhai
This document numbers its sections as follows:

1. Introduction
2. Using Event Tracing
2.1 Via the 'set_event' interface
2.2 Via the 'enable' toggle
2.3 Boot option
3. Defining an event-enabled tracepoint
4. Event formats
5. Event filtering
5.1 Expression syntax
5.2 Setting filters
5.3 Clearing filters
5.3 Subsystem filters
5.4 PID filtering
6. Event triggers
6.1 Expression syntax
6.2 Supported trigger commands
6.3 In-kernel trace event API
6.3.1 Dyamically creating synthetic event definitions
6.3.3 Tracing synthetic events from in-kernel code
6.3.3.1 Tracing a synthetic event all at once
6.3.3.1 Tracing a synthetic event piecewise
6.3.4 Dyamically creating kprobe and kretprobe event definitions
6.3.4 The "dynevent_cmd" low-level API

The numbering within section 6.3 seems wrong.
Or would it be better to have a separate chapter #7 for 'In-kernel trace
event API'? It does not seem to belong under 'Event triggers'.
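
The kind of numbering problem listed above can also be found mechanically. The following is only a rough sketch, not an existing kernel tool: the function names are made up for illustration, and it assumes that every heading starts with a dotted section number, as in the list above. It reports numbers that are used twice and sibling numbers that are skipped.

```python
import re
from collections import Counter

# Leading section number such as "6", "6.3" or "6.3.3.1",
# optionally followed by a trailing dot ("1. Introduction").
NUMBER_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")

# Headings as listed in the report above (sections 5 and 6 only).
HEADINGS = [
    "5. Event filtering",
    "5.1 Expression syntax",
    "5.2 Setting filters",
    "5.3 Clearing filters",
    "5.3 Subsystem filters",
    "5.4 PID filtering",
    "6. Event triggers",
    "6.1 Expression syntax",
    "6.2 Supported trigger commands",
    "6.3 In-kernel trace event API",
    "6.3.1 Dyamically creating synthetic event definitions",
    "6.3.3 Tracing synthetic events from in-kernel code",
    "6.3.3.1 Tracing a synthetic event all at once",
    "6.3.3.1 Tracing a synthetic event piecewise",
    "6.3.4 Dyamically creating kprobe and kretprobe event definitions",
    '6.3.4 The "dynevent_cmd" low-level API',
]


def section_numbers(headings):
    """Return the leading section number of each numbered heading, in order."""
    numbers = []
    for line in headings:
        match = NUMBER_RE.match(line.strip())
        if match:
            numbers.append(match.group(1))
    return numbers


def find_duplicates(numbers):
    """Return section numbers used more than once, in first-seen order."""
    counts = Counter(numbers)
    return [n for n in dict.fromkeys(numbers) if counts[n] > 1]


def find_gaps(numbers):
    """Return (previous, current) pairs where a sibling number is skipped."""
    gaps = []
    last_child = {}  # parent number -> index of the last child seen
    for number in numbers:
        parent, _, last = number.rpartition(".")
        index = int(last)
        previous = last_child.get(parent)
        # Only compare against an earlier sibling under the same parent.
        if previous is not None and index > previous + 1:
            gaps.append((f"{parent}.{previous}" if parent else str(previous), number))
        last_child[parent] = index
    return gaps


if __name__ == "__main__":
    numbers = section_numbers(HEADINGS)
    print("duplicates:", find_duplicates(numbers))  # ['5.3', '6.3.3.1', '6.3.4']
    print("gaps:", find_gaps(numbers))              # [('6.3.1', '6.3.3')]
```

For the list above it flags 5.3, 6.3.3.1 and 6.3.4 as duplicated, and the jump from 6.3.1 to 6.3.3 as a skipped number. It says nothing about whether 'In-kernel trace event API' belongs under 'Event triggers'; that remains a judgment for the maintainers.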