Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage
On Fri, Sep 25, 2020 at 11:30:49AM -0400, Steven Rostedt wrote:
> On Fri, 25 Sep 2020 17:12:45 +0200
> Greg Kroah-Hartman wrote:
>
> > > Specifically, commits:
> > >
> > > a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2
> > > corruption")
> > > 6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent
> > > clobbering of saved CR2 value")
> > > b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to
> > > do_async_page_fault()")
> > >
> > > (which are in 5.4 but not 4.19)
> > >
> > > But again, is this too intrusive? There was a workaround that was
> > > originally proposed, but Peter didn't want any more band-aids, and did
> > > the restructuring, but as you can see from the two other patches, it
> > > makes it a bit more high risk.
> >
> > If those are known to work, why can't I take them as-is?
>
> If they apply without tweaks, I say "Go for it" ;-)
>
> My worry is that they may have other unknown dependencies. And I only
> looked at what was applied between 4.19 and 5.4 mainline. I haven't
> looked at what else may have been backported to fix the above three
> commits.

I tried to backport the above series, and quickly gave up, as yes, you
are right, the dependencies are deep and messy from what I can tell.

What's wrong with just moving to 5.4? :)

thanks,

greg k-h
Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage
On Fri, 25 Sep 2020 17:12:45 +0200
Greg Kroah-Hartman wrote:

> > Specifically, commits:
> >
> > a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2
> > corruption")
> > 6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering
> > of saved CR2 value")
> > b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to
> > do_async_page_fault()")
> >
> > (which are in 5.4 but not 4.19)
> >
> > But again, is this too intrusive? There was a workaround that was
> > originally proposed, but Peter didn't want any more band-aids, and did
> > the restructuring, but as you can see from the two other patches, it
> > makes it a bit more high risk.
>
> If those are known to work, why can't I take them as-is?

If they apply without tweaks, I say "Go for it" ;-)

My worry is that they may have other unknown dependencies. And I only
looked at what was applied between 4.19 and 5.4 mainline. I haven't
looked at what else may have been backported to fix the above three
commits.

-- Steve
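
[Editor's sketch] One way to look for follow-up fixes to the three commits
named above is to scan later commit messages for "Fixes:" trailers that
reference them. The snippet below is a minimal sketch of that idea, not
anything used in this thread. It assumes the commit messages have already
been collected as (sha, message) pairs, for example from git log output
(not shown), and the function name find_followup_fixes is hypothetical.

```python
import re

# A "Fixes:" trailer names the commit being fixed by an abbreviated or
# full SHA-1, e.g.:  Fixes: a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption")
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})\b",
                      re.IGNORECASE | re.MULTILINE)


def find_followup_fixes(targets, commits):
    """Map each target SHA to the commits whose Fixes: trailer names it.

    targets: iterable of full 40-character SHA-1 strings.
    commits: iterable of (sha, commit_message) pairs to scan.
    """
    targets = list(targets)
    result = {target: [] for target in targets}
    for sha, message in commits:
        for ref in FIXES_RE.findall(message):
            ref = ref.lower()
            for target in targets:
                # An abbreviated SHA matches if it is a prefix of the target.
                if target.lower().startswith(ref):
                    result[target].append(sha)
    return result
```

This only finds fixes that carry a Fixes: trailer. Dependencies that the
three commits need in order to apply at all (the concern raised above)
would not show up this way and still have to be found by trying the
backport.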
Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage
On Fri, Sep 25, 2020 at 11:07:06AM -0400, Steven Rostedt wrote:
> On Fri, 25 Sep 2020 10:59:14 -0400
> Steven Rostedt wrote:
>
> > On Fri, 25 Sep 2020 10:54:58 -0400
> > Steven Rostedt wrote:
> >
> > > The crash looks like it's cr3 related, which I believe Peter Zijlstra
> >
> > s/cr3/cr2/
>
> Specifically, commits:
>
> a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2
> corruption")
> 6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering
> of saved CR2 value")
> b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to
> do_async_page_fault()")
>
> (which are in 5.4 but not 4.19)
>
> But again, is this too intrusive? There was a workaround that was
> originally proposed, but Peter didn't want any more band-aids, and did
> the restructuring, but as you can see from the two other patches, it
> makes it a bit more high risk.

If those are known to work, why can't I take them as-is?

thanks,

greg k-h
Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage
On Fri, 25 Sep 2020 10:59:14 -0400
Steven Rostedt wrote:

> On Fri, 25 Sep 2020 10:54:58 -0400
> Steven Rostedt wrote:
>
> > The crash looks like it's cr3 related, which I believe Peter Zijlstra
>
> s/cr3/cr2/

Specifically, commits:

a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2 corruption")
6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering of saved CR2 value")
b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to do_async_page_fault()")

(which are in 5.4 but not 4.19)

But again, is this too intrusive? There was a workaround that was
originally proposed, but Peter didn't want any more band-aids, and did
the restructuring, but as you can see from the two other patches, it
makes it a bit more high risk.

-- Steve
Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage
On Fri, 25 Sep 2020 10:54:58 -0400
Steven Rostedt wrote:

> The crash looks like it's cr3 related, which I believe Peter Zijlstra

s/cr3/cr2/

-- Steve

> did a restructuring of that code to not let it be an issue anymore.
> I'll have to look deeper. The rework may be too intrusive to backport,
> but we do have other workarounds for this issue if that would be
> acceptable for backporting.
Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage
On Fri, 25 Sep 2020 12:55:13 +0530
Naresh Kamboju wrote:

> On Fri, 25 Sep 2020 at 10:45, Greg Kroah-Hartman
> wrote:
> >
> > On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:
> > > From stable rc 4.18.1 onwards to today's stable rc 4.19.147
> > >
> > > There are two problems while running LTP tracing tests
> > > 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> > > 2) " segfault at 0 ip " and "Code: Bad RIP value" on x86_64 and
> > > qemu_x86_64 [2]
> > > Please refer to the full test logs from below links.
> > >
> > > The first bad commit found by git bisect.
> > > commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
> > > tracing: Centralize preemptirq tracepoints and unify their usage
> > >
> > > Reported-by: Naresh Kamboju
> >
> > So this also is reproducible in 5.4 and Linus's tree right now?
>
> No.
> The reported issues are not reproducible on 5.4, 5.8 and Linus's tree.

The crash looks like it's cr3 related, which I believe Peter Zijlstra
did a restructuring of that code to not let it be an issue anymore.
I'll have to look deeper. The rework may be too intrusive to backport,
but we do have other workarounds for this issue if that would be
acceptable for backporting.

> >
> > Or are newer kernels working fine?
>
> No.
> There are different issues while testing LTP tracing on 5.4, 5.8 and
> Linus's 5.9.
>
> NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
> WARNING: CPU: 1 PID: 331 at net/sched/sch_generic.c:442
> dev_watchdog+0x4c7/0x4d0
> https://lore.kernel.org/stable/CA+G9fYtS_nAX=spv8ztts-nodpj4uxk9sqehoznus4wlvbc...@mail.gmail.com/
>
> I see this on 5.4, 5.8 and Linus's 5.9.
> rcu: INFO: rcu_sched self-detected stall on CPU
> ? ftrace_graph_caller+0xc0/0xc0
> https://lore.kernel.org/stable/ca+g9fysdtlrj55_bvod8sf+0zvk0rrmp5+fejcox5oacako...@mail.gmail.com/T/#u

I've seen that too and couldn't bisect it down to any such commit.
I'm not sure if it is even a bug per se, because in my test suite, I've
commented out the warning, and the system still remains stable.

-- Steve
Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage
On Fri, 25 Sep 2020 at 10:45, Greg Kroah-Hartman wrote:
>
> On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:
> > From stable rc 4.18.1 onwards to today's stable rc 4.19.147
> >
> > There are two problems while running LTP tracing tests
> > 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> > 2) " segfault at 0 ip " and "Code: Bad RIP value" on x86_64 and qemu_x86_64
> > [2]
> > Please refer to the full test logs from below links.
> >
> > The first bad commit found by git bisect.
> > commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
> > tracing: Centralize preemptirq tracepoints and unify their usage
> >
> > Reported-by: Naresh Kamboju
>
> So this also is reproducible in 5.4 and Linus's tree right now?

No.
The reported issues are not reproducible on 5.4, 5.8 and Linus's tree.

>
> Or are newer kernels working fine?

No.
There are different issues while testing LTP tracing on 5.4, 5.8 and
Linus's 5.9.

NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
WARNING: CPU: 1 PID: 331 at net/sched/sch_generic.c:442
dev_watchdog+0x4c7/0x4d0
https://lore.kernel.org/stable/CA+G9fYtS_nAX=spv8ztts-nodpj4uxk9sqehoznus4wlvbc...@mail.gmail.com/

I see this on 5.4, 5.8 and Linus's 5.9.
rcu: INFO: rcu_sched self-detected stall on CPU
? ftrace_graph_caller+0xc0/0xc0
https://lore.kernel.org/stable/ca+g9fysdtlrj55_bvod8sf+0zvk0rrmp5+fejcox5oacako...@mail.gmail.com/T/#u

>
> thanks,
>
> greg k-h

- Naresh
Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage
On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:
> From stable rc 4.18.1 onwards to today's stable rc 4.19.147
>
> There are two problems while running LTP tracing tests
> 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> 2) " segfault at 0 ip " and "Code: Bad RIP value" on x86_64 and qemu_x86_64
> [2]
> Please refer to the full test logs from below links.
>
> The first bad commit found by git bisect.
> commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
> tracing: Centralize preemptirq tracepoints and unify their usage
>
> Reported-by: Naresh Kamboju

So this also is reproducible in 5.4 and Linus's tree right now?

Or are newer kernels working fine?

thanks,

greg k-h
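
[Editor's sketch] The "segfault at ... ip ... sp ... error N" lines in the
report follow the message the x86 kernel prints for an unhandled user-space
fault. As far as I can tell, the trailing number is the hardware page-fault
error code printed in hexadecimal, so "error 14" is 0x14, not decimal 14.
The snippet below is a minimal reading aid that splits such a line into
fields and names the error-code bits. The function names and flag strings
are hypothetical; the bit layout is the architectural x86 page-fault error
code (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 3
reserved bit, bit 4 instruction fetch).

```python
import re

# (mask, meaning when set, meaning when clear) for the low three bits.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "read"),
    (0x04, "user mode", "kernel mode"),
]
PF_RSVD = 0x08   # reserved bit set in a paging-structure entry
PF_INSTR = 0x10  # fault was caused by an instruction fetch

SEGV_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Return human-readable flags for an x86 page-fault error code."""
    flags = [on if code & mask else off for mask, on, off in PF_BITS]
    if code & PF_RSVD:
        flags.append("reserved bit set")
    if code & PF_INSTR:
        flags.append("instruction fetch")
    return flags


def parse_segfault(line):
    """Parse one kernel 'segfault at' line; return None if it does not match."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: the kernel prints this in hex
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "sp": int(m.group("sp"), 16),
        "error": err,
        "flags": decode_error(err),
    }
```

Read this way, the "error 4" lines are user-mode reads of a non-present
page, and the "error 14" line (the one followed by "Code: Bad RIP value")
is a user-mode instruction fetch. The "at" address is the faulting address
taken from CR2, so if CR2 is being clobbered, as suspected in this thread,
that field may not be the address that actually faulted.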
[stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage
From stable rc 4.18.1 onwards to today's stable rc 4.19.147

There are two problems while running LTP tracing tests
1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
2) " segfault at 0 ip " and "Code: Bad RIP value" on x86_64 and qemu_x86_64 [2]
Please refer to the full test logs from below links.

The first bad commit found by git bisect.
commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
tracing: Centralize preemptirq tracepoints and unify their usage

Reported-by: Naresh Kamboju

easily reproducible on qemu
steps to reproduce:
# Boot qemu x86_64 with trace configs enabled.
# cd /opt/ltp
# ./runltp -f tracing

metadata:
  git branch: linux-4.19.y
  git repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
  make_kernelversion: 4.19.147
  kernel-config: https://builds.tuxbuild.com/lOpUmeYR2e1pzvYdlLgGqw/kernel.config

Crash log on qemu_i386 -
ftrace-stress-test 1 TINFO: Start pid15=2414 /opt/ltp/testcases/bin/ftrace_stress/ftrace_buffer_size_kb.sh
ftrace-stress-test 1 TINFO: Start pid16=2415 /opt/ltp/testcases/bin/ftrace_stress/ftrace_tracing_cpumask.sh
ftrace-stress-test 1 TINFO: Start pid17=2416 /opt/ltp/testcases/bin/ftrace_stress/ftrace_set_ftrace_filter.sh
[ 38.479869] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1
Sep 23 18:39:40 intel-core2-32 user.warn kernel: [ 38.479869] Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and stat_runtime require the kernel parameter schedstats=enable or kernel.sched_schedstats=1
[ 38.549712] cat[2583]: segfault at 0 ip b7f81767 sp bfbb3a20 error 4 in ld-2.27.so[b7f6c000+25000]
[ 38.550427] sh[2582]: segfault at 467 ip b7fba0d8 sp bfacdb04 error 4 in ld-2.27.so[b7f9f000+25000]
[ 38.551386] Code: 50 8d 86 84 62 ff ff 50 e8 86 a9 ff ff 83 c4 10 89 c2 83 f8 ff 0f 84 72 01 00 00 8b b6 e4 08 00 00 83 fe 10 0f 86 56 01 00 00 <81> 38 6c 64 2e 73 0f 85 1d 01 00 00 81 78 04 6f 2d 31 2e 0f 85 10
[ 38.552710] Code: 40 38 d5 74 ea 80 fd 00 74 12 c1 e9 10 40 38 d1 74 dd 80 f9 00 74 05 40 38 d5 74 d3 31 c0 eb cf 66 90 8b 4c 24 04 8b 54 24 08 <8a> 01 3a 02 75 09 41 42 84 c0 75 f4 31 c0 c3 b8 01 00 00 00 b9 ff
[ 38.556010] systemd-journal[1327]: segfault at 5e ip b7c61e12 sp bff45044 error 6 in libc-2.27.so[b7b29000+1cc000]
[ 38.558971] sh[2584]: segfault at 0 ip b7f30c15 sp bfbce710 error 14
[ 38.559387] audit: type=1701 audit(1600886380.372:3): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2582 comm="sh" exe="/bin/bash.bash" sig=11 res=1
[ 38.559411] audit: type=1701 audit(1600886380.372:4): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2583 comm="cat" exe="/bin/cat.coreutils" sig=11 res=1
[ 38.560079] Code: 66 0f 7f 5c 3a f0 72 30 66 0f 6f 54 38 10 83 e9 20 66 0f 6f 5c 38 20 66 0f 6f cb 66 0f 3a 0f da 08 66 0f 3a 0f d4 08 8d 7f 20 <66> 0f 7f 54 3a e0 66 0f 7f 5c 3a f0 73 a0 8d 49 20 01 cf 01 fa 8d
[ 38.560811] audit: type=1701 audit(1600886380.373:5): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1327 comm="systemd-journal" exe="/lib/systemd/systemd-journald" sig=11 res=1
[ 38.561615] Code: Bad RIP value.
[ 38.564712] Core dump to |/bin/false pipe failed
[ 38.566144] Core dump to |/bin/false pipe failed
[ 38.566213] Core dump to |/bin/false pipe failed
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.549712] cat[2583]: segfault at 0 ip b7f81767 sp bfbb3a20 error 4 in ld-2.27.so[b7f6c000+25000]
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.550427] sh[2582]: segfault at 467 ip b7fba0d8 sp bfacdb04 error 4 in ld-2.27.so[b7f9f000+25000]
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.551386] Code: 50 8d 86 84 62 ff ff 50 e8 86 a9 ff ff 83 c4 10 89 c2 83 f8 ff 0f 84 72 01 00 00 8b b6 e4 08 00 00 83 fe 10 0f 86 56 01 00 00 <81> 38 6c 64 2e 73 0f 85 1d 01 00 00 81 78 04 6f 2d 31 2e 0f 85 10
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.552710] Code: 40 38 d5 74 ea 80 fd 00 74 12 c1 e9 10 40 38 d1 74 dd 80 f9 00 74 05 40 38 d5 74 d3 31 c0 eb cf 66 90 8b 4c 24 04 8b 54 24 08 <8a> 01 3a 02 75 09 41 42 84 c0 75[ 38.582519] systemd[1]: segfault at 1 ip b7de036e sp bfd888e0 error 7 in libsystemd-shared-237.so[b7cd4000+1e2000] f4 31 c0 c3 b8 [ 38.584227] Code: 46 18 83 e0 1f 83 c8 20 88 46 18 89 f0 e8 ba da ff ff 85 c0 89 c3 0f 88 e0 00 00 00 8b 44 24 24 31 db 85 c0 74 06 8b 44 24 24 <89> 30 83 c4 0c 89 d8 5b 5e 5f 5d c3 8d b6 00 00 00 00 8d 83 1c 1c 01 00 00 00 b9 ff
Sep 23 18:39:40 intel-core2-32 user.info kernel: [ 38.556010] systemd-journal[1327]: segfau[ 38.587783] systemd[1]: segfault at 0 ip b7a9fbe3 sp bfd88000 error 7 in libc-2.27.so[b79e5000+1cc000] lt at 5e ip b7c6[ 38.589349] Code: 14 8b 4c 24 10 8b 5c 24 0c b8 72 00 00 00 65 ff 15 10 00 00 00 5b 5e 3d 01 f0 ff ff 0f 83 75 e3 f5 ff c3 66 90 66 90 55 57 56 <53> 83 ec 1c 8b 5c 24 30 8b 4c 24 34 8b 54 24 38 8b