Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-10-05 Thread Greg Kroah-Hartman
On Fri, Sep 25, 2020 at 11:30:49AM -0400, Steven Rostedt wrote:
> On Fri, 25 Sep 2020 17:12:45 +0200
> Greg Kroah-Hartman  wrote:
> 
> > > Specifically, commits:
> > > 
> > > a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2 
> > > corruption")
> > > 6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent 
> > > clobbering of saved CR2 value")
> > > b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to 
> > > do_async_page_fault()")
> > > 
> > > (which are in 5.4 but not 4.19)
> > > 
> > > But again, is this too intrusive? There was a workaround that was
> > > originally proposed, but Peter didn't want any more band-aids, and did
> > > the restructuring, but as you can see from the two other patches, it
> > > makes it a bit more high risk.  
> > 
> > If those are known to work, why can't I take them as-is?
> 
> If they apply without tweaks, I say "Go for it" ;-)
> 
> My worry is that they may have other unknown dependencies. And I only
> looked at what was applied between 4.19 and 5.4 mainline. I haven't
> looked at what else may have been backported to fix the above three
> commits.

I tried to backport the above series, and quickly gave up, as yes, you
are right, the dependencies are deep and messy from what I can tell.

What's wrong with just moving to 5.4? :)

thanks,

greg k-h


Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-25 Thread Steven Rostedt
On Fri, 25 Sep 2020 17:12:45 +0200
Greg Kroah-Hartman  wrote:

> > Specifically, commits:
> > 
> > a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2 
> > corruption")
> > 6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering 
> > of saved CR2 value")
> > b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to 
> > do_async_page_fault()")
> > 
> > (which are in 5.4 but not 4.19)
> > 
> > But again, is this too intrusive? There was a workaround that was
> > originally proposed, but Peter didn't want any more band-aids, and did
> > the restructuring, but as you can see from the two other patches, it
> > makes it a bit more high risk.  
> 
> If those are known to work, why can't I take them as-is?

If they apply without tweaks, I say "Go for it" ;-)

My worry is that they may have other unknown dependencies. And I only
looked at what was applied between 4.19 and 5.4 mainline. I haven't
looked at what else may have been backported to fix the above three
commits.

-- Steve


Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-25 Thread Greg Kroah-Hartman
On Fri, Sep 25, 2020 at 11:07:06AM -0400, Steven Rostedt wrote:
> On Fri, 25 Sep 2020 10:59:14 -0400
> Steven Rostedt  wrote:
> 
> > On Fri, 25 Sep 2020 10:54:58 -0400
> > Steven Rostedt  wrote:
> > 
> > 
> > > The crash looks like it's cr3 related, which I believe Peter Zijlstra
> > 
> > s/cr3/cr2/
> > 
> 
> Specifically, commits:
> 
> a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2 
> corruption")
> 6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering 
> of saved CR2 value")
> b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to 
> do_async_page_fault()")
> 
> (which are in 5.4 but not 4.19)
> 
> But again, is this too intrusive? There was a workaround that was
> originally proposed, but Peter didn't want any more band-aids, and did
> the restructuring, but as you can see from the two other patches, it
> makes it a bit more high risk.

If those are known to work, why can't I take them as-is?

thanks,

greg k-h


Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-25 Thread Steven Rostedt
On Fri, 25 Sep 2020 10:59:14 -0400
Steven Rostedt  wrote:

> On Fri, 25 Sep 2020 10:54:58 -0400
> Steven Rostedt  wrote:
> 
> 
> > The crash looks like it's cr3 related, which I believe Peter Zijlstra
> 
> s/cr3/cr2/
> 

Specifically, commits:

a0d14b8909de55139b8702fe0c7e80b69763dcfb ("x86/mm, tracing: Fix CR2 corruption")
6879298bd0673840cadd1fb36d7225485504ceb4 ("x86/entry/64: Prevent clobbering of 
saved CR2 value")
b8f70953c1251d8b16276995816a95639f598e70 ("x86/entry/32: Pass cr2 to 
do_async_page_fault()")

(which are in 5.4 but not 4.19)

But again, is this too intrusive? There was a workaround that was
originally proposed, but Peter didn't want any more band-aids, and did
the restructuring, but as you can see from the two other patches, it
makes it a bit more high risk.

-- Steve
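
The "(which are in 5.4 but not 4.19)" statement above can be checked from a kernel clone. What follows is only a minimal sketch, assuming a checkout that carries both the `v4.19` and `v5.4` tags. The helper names `commit_in_ref` and `report_commits` are made up for illustration; the only real git interface used is `git merge-base --is-ancestor`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any kernel or LKFT tooling.

# Succeeds when commit $1 is reachable from ref $2.
# Errors such as an unknown commit or ref are folded into "not reachable".
commit_in_ref() {
    git merge-base --is-ancestor "$1" "$2" 2>/dev/null
}

# Prints one line per commit saying whether it is contained in the given ref.
report_commits() {
    ref=$1
    shift
    for commit in "$@"; do
        if commit_in_ref "$commit" "$ref"; then
            echo "$commit present in $ref"
        else
            echo "$commit missing from $ref"
        fi
    done
}

# Example, to be run inside a kernel clone:
#   report_commits v5.4  a0d14b8909de55139b8702fe0c7e80b69763dcfb
#   report_commits v4.19 a0d14b8909de55139b8702fe0c7e80b69763dcfb
```

A check like this only answers whether a commit is contained in a ref. It says nothing about the dependencies those commits rely on, which is the concern raised in the replies.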


Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-25 Thread Steven Rostedt
On Fri, 25 Sep 2020 10:54:58 -0400
Steven Rostedt  wrote:


> The crash looks like it's cr3 related, which I believe Peter Zijlstra

s/cr3/cr2/

-- Steve


> did a restructuring of that code to not let it be an issue anymore.
> I'll have to look deeper. The rework may be too intrusive to backport,
> but we do have other workarounds for this issue if that would be
> acceptable for backporting.
> 


Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-25 Thread Steven Rostedt
On Fri, 25 Sep 2020 12:55:13 +0530
Naresh Kamboju  wrote:

> On Fri, 25 Sep 2020 at 10:45, Greg Kroah-Hartman
>  wrote:
> >
> > On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:  
> > > From stable rc 4.18.1 onwards to today's stable rc 4.19.147
> > >
> > > There are two problems while running LTP tracing tests
> > > 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> > > 2) "segfault at 0 ip" and "Code: Bad RIP value" on x86_64 and
> > > qemu_x86_64 [2]
> > > Please refer to the full test logs at the links below.
> > >
> > > The first bad commit found by git bisect.
> > >    commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
> > >    tracing: Centralize preemptirq tracepoints and unify their usage
> > >
> > > Reported-by: Naresh Kamboju   
> >
> > So this also is reproducible in 5.4 and Linus's tree right now?
> 
> No.
> The reported issues are not reproducible on 5.4, 5.8 and Linus's tree.

The crash looks like it's cr3 related, which I believe Peter Zijlstra
did a restructuring of that code to not let it be an issue anymore.
I'll have to look deeper. The rework may be too intrusive to backport,
but we do have other workarounds for this issue if that would be
acceptable for backporting.

> 
> >
> > Or are newer kernels working fine?  
> 
> No.
> There are different issues while testing LTP tracing on 5.4, 5.8 and
> Linus's 5.9.
> 
> NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
> WARNING: CPU: 1 PID: 331 at net/sched/sch_generic.c:442 
> dev_watchdog+0x4c7/0x4d0
> https://lore.kernel.org/stable/CA+G9fYtS_nAX=spv8ztts-nodpj4uxk9sqehoznus4wlvbc...@mail.gmail.com/
> 
> I see this on 5.4, 5.8 and Linus's 5.9.
> rcu: INFO: rcu_sched self-detected stall on CPU
> ? ftrace_graph_caller+0xc0/0xc0
> https://lore.kernel.org/stable/ca+g9fysdtlrj55_bvod8sf+0zvk0rrmp5+fejcox5oacako...@mail.gmail.com/T/#u

I've seen that too and couldn't bisect it down to any such commit. I'm
not sure if it is even a bug per se, because in my test suite, I've
commented out the warning, and the system still remains stable.

-- Steve


Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-25 Thread Naresh Kamboju
On Fri, 25 Sep 2020 at 10:45, Greg Kroah-Hartman
 wrote:
>
> On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:
> > From stable rc 4.18.1 onwards to today's stable rc 4.19.147
> >
> > There are two problems while running LTP tracing tests
> > 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> > 2) "segfault at 0 ip" and "Code: Bad RIP value" on x86_64 and
> > qemu_x86_64 [2]
> > Please refer to the full test logs at the links below.
> >
> > The first bad commit found by git bisect.
> >    commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
> >    tracing: Centralize preemptirq tracepoints and unify their usage
> >
> > Reported-by: Naresh Kamboju 
>
> So this also is reproducible in 5.4 and Linus's tree right now?

No.
The reported issues are not reproducible on 5.4, 5.8 and Linus's tree.

>
> Or are newer kernels working fine?

No.
There are different issues while testing LTP tracing on 5.4, 5.8 and
Linus's 5.9.

NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out
WARNING: CPU: 1 PID: 331 at net/sched/sch_generic.c:442 dev_watchdog+0x4c7/0x4d0
https://lore.kernel.org/stable/CA+G9fYtS_nAX=spv8ztts-nodpj4uxk9sqehoznus4wlvbc...@mail.gmail.com/

I see this on 5.4, 5.8 and Linus's 5.9.
rcu: INFO: rcu_sched self-detected stall on CPU
? ftrace_graph_caller+0xc0/0xc0
https://lore.kernel.org/stable/ca+g9fysdtlrj55_bvod8sf+0zvk0rrmp5+fejcox5oacako...@mail.gmail.com/T/#u

>
> thanks,
>
> greg k-h

- Naresh


Re: [stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-24 Thread Greg Kroah-Hartman
On Fri, Sep 25, 2020 at 10:13:05AM +0530, Naresh Kamboju wrote:
> From stable rc 4.18.1 onwards to today's stable rc 4.19.147
>
> There are two problems while running LTP tracing tests
> 1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
> 2) "segfault at 0 ip" and "Code: Bad RIP value" on x86_64 and
> qemu_x86_64 [2]
> Please refer to the full test logs at the links below.
>
> The first bad commit found by git bisect.
>    commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
>    tracing: Centralize preemptirq tracepoints and unify their usage
> 
> Reported-by: Naresh Kamboju 

So this also is reproducible in 5.4 and Linus's tree right now?

Or are newer kernels working fine?

thanks,

greg k-h


[stable 4.19] [PANIC]: tracing: Centralize preemptirq tracepoints and unify their usage

2020-09-24 Thread Naresh Kamboju
From stable rc 4.18.1 onwards to today's stable rc 4.19.147

There are two problems while running LTP tracing tests
1) kernel panic on i386, qemu_i386, x86_64 and qemu_x86_64 [1]
2) "segfault at 0 ip" and "Code: Bad RIP value" on x86_64 and qemu_x86_64 [2]
Please refer to the full test logs at the links below.
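
For readers of the "segfault at ... error N" lines in the crash log further down: on x86 the trailing "error" field is the page-fault error code, printed by the kernel in hexadecimal. The following is only a rough sketch for decoding it. The function name `decode_pf_error` is made up for illustration, and only the present, write, user and instruction-fetch bits are handled.

```shell
#!/bin/sh
# Hypothetical helper: decode the hex "error" field of an x86 segfault line.
#   bit 0 (0x01): set = protection violation, clear = page not present
#   bit 1 (0x02): set = write access, clear = read access
#   bit 2 (0x04): set = fault happened in user mode
#   bit 4 (0x10): set = fault was an instruction fetch
decode_pf_error() {
    code=$(( 0x$1 ))

    if [ $(( code & 4 )) -ne 0 ]; then mode="user"; else mode="kernel"; fi

    if [ $(( code & 16 )) -ne 0 ]; then
        access="instruction fetch"
    elif [ $(( code & 2 )) -ne 0 ]; then
        access="write"
    else
        access="read"
    fi

    if [ $(( code & 1 )) -ne 0 ]; then
        cause="protection violation"
    else
        cause="page not present"
    fi

    echo "$mode-mode $access, $cause"
}

# Example: decode_pf_error 4
```

Under this reading, "error 4" is a user-mode read of a page that is not present, and "error 14" (hex) is a user-mode instruction fetch from a page that is not present.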

The first bad commit found by git bisect.
   commit: c3bc8fd637a9623f5c507bd18f9677effbddf584
   tracing: Centralize preemptirq tracepoints and unify their usage

Reported-by: Naresh Kamboju 

easily reproducible on qemu
steps to reproduce:
# Boot qemu x86_64 with trace configs enabled.
# cd /opt/ltp
# ./runltp -f tracing

metadata:
  git branch: linux-4.19.y
  git repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
  make_kernelversion: 4.19.147
  kernel-config:
https://builds.tuxbuild.com/lOpUmeYR2e1pzvYdlLgGqw/kernel.config
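
The reproduction steps ask for a kernel "with trace configs enabled" but do not list the options. As a rough sketch, a downloaded kernel.config could be checked like this. The helper name `missing_config_opts` and the option list in the example are assumptions, not taken from the report.

```shell
#!/bin/sh
# Hypothetical helper: print every option from the argument list that is
# not built in (=y) in the given kernel config file.
missing_config_opts() {
    config_file=$1
    shift
    for opt in "$@"; do
        # A built-in option appears as a line of the form CONFIG_FOO=y.
        grep -q "^${opt}=y\$" "$config_file" || echo "$opt"
    done
}

# Example (the option list is an assumption, adjust as needed):
#   missing_config_opts kernel.config \
#       CONFIG_FTRACE CONFIG_FUNCTION_TRACER \
#       CONFIG_IRQSOFF_TRACER CONFIG_PREEMPT_TRACER
```

No output means every listed option is set to `y` in that file.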


Crash log on qemu_i386
----------------------

ftrace-stress-test 1 TINFO: Start pid15=2414
/opt/ltp/testcases/bin/ftrace_stress/ftrace_buffer_size_kb.sh
ftrace-stress-test 1 TINFO: Start pid16=2415
/opt/ltp/testcases/bin/ftrace_stress/ftrace_tracing_cpumask.sh
ftrace-stress-test 1 TINFO: Start pid17=2416
/opt/ltp/testcases/bin/ftrace_stress/ftrace_set_ftrace_filter.sh
[   38.479869] Scheduler tracepoints stat_sleep, stat_iowait,
stat_blocked and stat_runtime require the kernel parameter
schedstats=enable or kernel.sched_schedstats=1
Sep 23 18:39:40 intel-core2-32 user.warn kernel: [   38.479869]
Scheduler tracepoints stat_sleep, stat_iowait, stat_blocked and
stat_runtime require the kernel parameter schedstats=enable or
kernel.sched_schedstats=1
[   38.549712] cat[2583]: segfault at 0 ip b7f81767 sp bfbb3a20 error
4 in ld-2.27.so[b7f6c000+25000]
[   38.550427] sh[2582]: segfault at 467 ip b7fba0d8 sp bfacdb04 error
4 in ld-2.27.so[b7f9f000+25000]
[   38.551386] Code: 50 8d 86 84 62 ff ff 50 e8 86 a9 ff ff 83 c4 10
89 c2 83 f8 ff 0f 84 72 01 00 00 8b b6 e4 08 00 00 83 fe 10 0f 86 56
01 00 00 <81> 38 6c 64 2e 73 0f 85 1d 01 00 00 81 78 04 6f 2d 31 2e 0f
85 10
[   38.552710] Code: 40 38 d5 74 ea 80 fd 00 74 12 c1 e9 10 40 38 d1
74 dd 80 f9 00 74 05 40 38 d5 74 d3 31 c0 eb cf 66 90 8b 4c 24 04 8b
54 24 08 <8a> 01 3a 02 75 09 41 42 84 c0 75 f4 31 c0 c3 b8 01 00 00 00
b9 ff
[   38.556010] systemd-journal[1327]: segfault at 5e ip b7c61e12 sp
bff45044 error 6 in libc-2.27.so[b7b29000+1cc000]
[   38.558971] sh[2584]: segfault at 0 ip b7f30c15 sp bfbce710 error 14
[   38.559387] audit: type=1701 audit(1600886380.372:3):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2582
comm=\"sh\" exe=\"/bin/bash.bash\" sig=11 res=1
[   38.559411] audit: type=1701 audit(1600886380.372:4):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=2583
comm=\"cat\" exe=\"/bin/cat.coreutils\" sig=11 res=1
[   38.560079] Code: 66 0f 7f 5c 3a f0 72 30 66 0f 6f 54 38 10 83 e9
20 66 0f 6f 5c 38 20 66 0f 6f cb 66 0f 3a 0f da 08 66 0f 3a 0f d4 08
8d 7f 20 <66> 0f 7f 54 3a e0 66 0f 7f 5c 3a f0 73 a0 8d 49 20 01 cf 01
fa 8d
[   38.560811] audit: type=1701 audit(1600886380.373:5):
auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=1327
comm=\"systemd-journal\" exe=\"/lib/systemd/systemd-journald\" sig=11
res=1
[   38.561615] Code: Bad RIP value.
[   38.564712] Core dump to |/bin/false pipe failed
[   38.566144] Core dump to |/bin/false pipe failed
[   38.566213] Core dump to |/bin/false pipe failed
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.549712]
cat[2583]: segfault at 0 ip b7f81767 sp bfbb3a20 error 4 in
ld-2.27.so[b7f6c000+25000]
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.550427]
sh[2582]: segfault at 467 ip b7fba0d8 sp bfacdb04 error 4 in
ld-2.27.so[b7f9f000+25000]
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.551386] Code:
50 8d 86 84 62 ff ff 50 e8 86 a9 ff ff 83 c4 10 89 c2 83 f8 ff 0f 84
72 01 00 00 8b b6 e4 08 00 00 83 fe 10 0f 86 56 01 00 00 <81> 38 6c 64
2e 73 0f 85 1d 01 00 00 81 78 04 6f 2d 31 2e 0f 85 10
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.552710] Code:
40 38 d5 74 ea 80 fd 00 74 12 c1 e9 10 40 38 d1 74 dd 80 f9 00 74 05
40 38 d5 74 d3 31 c0 eb cf 66 90 8b 4c 24 04 8b 54 24 08 <8a> 01 3a 02
75 09 41 42 84 c0 75[   38.582519] systemd[1]: segfault at 1 ip
b7de036e sp bfd888e0 error 7 in
libsystemd-shared-237.so[b7cd4000+1e2000]
 f4 31 c0 c3 b8 [   38.584227] Code: 46 18 83 e0 1f 83 c8 20 88 46 18
89 f0 e8 ba da ff ff 85 c0 89 c3 0f 88 e0 00 00 00 8b 44 24 24 31 db
85 c0 74 06 8b 44 24 24 <89> 30 83 c4 0c 89 d8 5b 5e 5f 5d c3 8d b6 00
00 00 00 8d 83 1c 1c
01 00 00 00 b9 ff
Sep 23 18:39:40 intel-core2-32 user.info kernel: [   38.556010]
systemd-journal[1327]: segfau[   38.587783] systemd[1]: segfault at 0
ip b7a9fbe3 sp bfd88000 error 7 in libc-2.27.so[b79e5000+1cc000]
lt at 5e ip b7c6[   38.589349] Code: 14 8b 4c 24 10 8b 5c 24 0c b8 72
00 00 00 65 ff 15 10 00 00 00 5b 5e 3d 01 f0 ff ff 0f 83 75 e3 f5 ff
c3 66 90 66 90 55 57 56 <53> 83 ec 1c 8b 5c 24 30 8b 4c 24 34 8b 54 24
38 8b