Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Tue, Oct 08, 2019 at 02:36:01PM +0200, Oleg Nesterov wrote:
> On 10/08, Bruce Ashfield wrote:
> >
> > So I've been looking through the config deltas and late last night, I was
> > able to move the runtime back to a failed 4 minute state by adding the
> > CONFIG_PREEMPT settings that we have by default in our reference
> > kernel.

Ah, yeah, I don't have CONFIG_PREEMPT on any of my machines.
Good catch, Bruce!

>
> Aha... Can you try the patch below?
>
> Oleg.
>
> --- x/kernel/signal.c
> +++ x/kernel/signal.c
> @@ -2205,8 +2205,8 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
>  		 */
>  		preempt_disable();
>  		read_unlock(&tasklist_lock);
> -		preempt_enable_no_resched();
>  		cgroup_enter_frozen();
> +		preempt_enable_no_resched();
>  		freezable_schedule();
>  		cgroup_leave_frozen(true);
>  	} else {
>

That was fast! Thank you, Oleg!
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Tue, Oct 8, 2019 at 8:36 AM Oleg Nesterov wrote:
>
> On 10/08, Bruce Ashfield wrote:
> >
> > So I've been looking through the config deltas and late last night, I was
> > able to move the runtime back to a failed 4 minute state by adding the
> > CONFIG_PREEMPT settings that we have by default in our reference
> > kernel.
>
> Aha... Can you try the patch below?

Confirmed. 4 second runtime with that change, 4 minutes with it in the
original position.

.. I'm kind of shocked I didn't just try that myself, since I spent
plenty of time staring at the innards of cgroup_enter_frozen(), long
enough to at least get an inkling to try that.

I'll run this through some additional testing, but initial results are
good. I'm not familiar enough with the semantics at play to even guess at
any possible side effects.

But do let me know if I can do anything else on this .. and thanks for
everyone's patience.

Bruce

>
> Oleg.
>
> --- x/kernel/signal.c
> +++ x/kernel/signal.c
> @@ -2205,8 +2205,8 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
>  		 */
>  		preempt_disable();
>  		read_unlock(&tasklist_lock);
> -		preempt_enable_no_resched();
>  		cgroup_enter_frozen();
> +		preempt_enable_no_resched();
>  		freezable_schedule();
>  		cgroup_leave_frozen(true);
>  	} else {
>

--
- Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end
- "Use the force Harry" - Gandalf, Star Trek II
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On 10/08, Bruce Ashfield wrote:
>
> So I've been looking through the config deltas and late last night, I was
> able to move the runtime back to a failed 4 minute state by adding the
> CONFIG_PREEMPT settings that we have by default in our reference
> kernel.

Aha... Can you try the patch below?

Oleg.

--- x/kernel/signal.c
+++ x/kernel/signal.c
@@ -2205,8 +2205,8 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
 		 */
 		preempt_disable();
 		read_unlock(&tasklist_lock);
-		preempt_enable_no_resched();
 		cgroup_enter_frozen();
+		preempt_enable_no_resched();
 		freezable_schedule();
 		cgroup_leave_frozen(true);
 	} else {
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Mon, Oct 7, 2019 at 7:28 PM Roman Gushchin wrote:
>
> On Mon, Oct 07, 2019 at 04:11:07PM -0400, Bruce Ashfield wrote:
> > On Mon, Oct 7, 2019 at 8:54 AM Bruce Ashfield wrote:
> > [...]
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Mon, Oct 07, 2019 at 04:11:07PM -0400, Bruce Ashfield wrote:
> On Mon, Oct 7, 2019 at 8:54 AM Bruce Ashfield wrote:
> >
> > On Thu, Oct 3, 2019 at 8:09 PM Roman Gushchin wrote:
> > [...]
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Mon, Oct 7, 2019 at 8:54 AM Bruce Ashfield wrote:
>
> On Thu, Oct 3, 2019 at 8:09 PM Roman Gushchin wrote:
> >
> > On Wed, Oct 02, 2019 at 05:59:36PM -0400, Bruce Ashfield wrote:
> > [...]
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Thu, Oct 3, 2019 at 8:09 PM Roman Gushchin wrote:
>
> On Wed, Oct 02, 2019 at 05:59:36PM -0400, Bruce Ashfield wrote:
> > On Wed, Oct 2, 2019 at 2:19 PM Roman Gushchin wrote:
> > [...]
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Wed, Oct 02, 2019 at 05:59:36PM -0400, Bruce Ashfield wrote:
> On Wed, Oct 2, 2019 at 2:19 PM Roman Gushchin wrote:
> >
> > On Wed, Oct 02, 2019 at 12:18:54AM -0400, Bruce Ashfield wrote:
> > [...]
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Wed, Oct 2, 2019 at 2:19 PM Roman Gushchin wrote: > > On Wed, Oct 02, 2019 at 12:18:54AM -0400, Bruce Ashfield wrote: > > On Tue, Oct 1, 2019 at 10:01 PM Roman Gushchin wrote: > > > > > > On Tue, Oct 01, 2019 at 12:14:18PM -0400, Bruce Ashfield wrote: > > > > Hi all, > > > > > > > > > > Hi Bruce! > > > > > > > The Yocto project has an upcoming release this fall, and I've been > > > > trying to > > > > sort through some issues that are happening with kernel 5.2+ .. although > > > > there is a specific yocto kernel, I'm testing and seeing this with > > > > normal / vanilla > > > > mainline kernels as well. > > > > > > > > I'm running into an issue that is *very* similar to the one discussed > > > > in the > > > > [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e) > > > > thread from this past may: https://lkml.org/lkml/2019/5/12/272 > > > > > > > > I can confirm that I have the proposed fix for the initial regression > > > > report in > > > > my build (05b2892637 [signal: unconditionally leave the frozen state > > > > in ptrace_stop()]), > > > > but yet I'm still seeing 3 or 4 minute runtimes on a test that used to > > > > take 3 or > > > > 4 seconds. > > > > > > So, the problem is that you're experiencing a severe performance > > > regression > > > in some test, right? > > > > Hi Roman, > > > > Correct. In particular, running some of the tests that ship with strace > > itself. > > The performance change is so drastic, that it definitely makes you wonder > > "What have I done wrong? Since everyone must be seeing this" .. and I > > always blame myself first. > > > > > > > > > > > > > This isn't my normal area of kernel hacking, so I've so far come up > > > > empty > > > > at either fixing it myself, or figuring out a viable workaround. (well, > > > > I can > > > > "fix" it by remove the cgroup_enter_frozen() call in ptrace_stop ... > > > > but obviously, > > > > that is just me trying to figure out what could be causing the issue). 
> > > > > > > > As part of the release, we run tests that come with various > > > > applications. The > > > > ptrace test that is causing us issues can be boiled down to this: > > > > > > > > $ cd /usr/lib/strace/ptest/tests > > > > $ time ../strace -o log -qq -esignal=none -e/clock > > > > ./printpath-umovestr>ttt > > > > > > > > (I can provide as many details as needed, but I wanted to keep this > > > > initial > > > > email relatively short). > > > > > > > > I'll continue to debug and attempt to fix this myself, but I grabbed the > > > > email list from the regression report in May to see if anyone has any > > > > ideas > > > > or angles that I haven't covered in my search for a fix. > > > > > > I'm definitely happy to help, but it's a bit hard to say anything from > > > what > > > you've provided. I'm not aware of any open issues with the freezer except > > > some spurious cgroup frozen<->not frozen transitions which can happen in > > > some > > > cases. If you'll describe how can I reproduce the issue, and I'll try to > > > take > > > a look asap. > > > > That would be great. > > > > I'll attempt to remove all of the build system specifics out of this > > (and Richard Purdie > > on the cc' of this can probably help provide more details / setup info as > > well). > > > > We are running the built-in tests of strace. So here's a cut and paste of > > what I > > did to get the tests available (ignore/skip what is common sense or isn't > > needed > > in your test rig). > > > > % git clone https://github.com/strace/strace.git > > % cd strace > > % ./bootstrap > > # the --enable flag isn't strictly required, but may break on some > > build machines > > % ./configure --enable-mpers=no > > % make > > % make check-TESTS > > > > That last step will not only build the tests, but run them all .. so > > ^c the run once > > it starts, since it is a lot of noise (we carry a patch to strace that > > allows us to build > > the tests without running them). 
> > > > % cd tests > > % time strace -o log -qq -esignal=none -e/clock ./printpath-umovestr > fff > > real 0m2.566s > > user 0m0.284s > > sys 0m2.519 > > > > On pre-cgroup2 freezer kernels, you see a run time similar to what I have > > above. > > On the newer kernels we are testing, it is taking 3 or 4 minutes to > > run the test. > > > > I hope that is simple enough to setup and try. Since I've been seeing > > this on both > > mainline kernels and the yocto reference kernels, I don't think it is > > something that > > I'm carrying in the distro/reference kernel that is causing this (but > > again, I always > > blame myself first). If you don't see that same run time, then that > > does point the finger > > back at what we are doing and I'll have to apologize for chewing up some of > > your > > time. > > Thank you for the detailed description! > I'll try to reproduce the issue and will be back > by the end of the week. Thanks again! While discussing the issue with a few yocto folks today,
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Wed, Oct 02, 2019 at 12:18:54AM -0400, Bruce Ashfield wrote: > On Tue, Oct 1, 2019 at 10:01 PM Roman Gushchin wrote: > > > > On Tue, Oct 01, 2019 at 12:14:18PM -0400, Bruce Ashfield wrote: > > > Hi all, > > > > > > > Hi Bruce! > > > > > The Yocto project has an upcoming release this fall, and I've been trying > > > to > > > sort through some issues that are happening with kernel 5.2+ .. although > > > there is a specific yocto kernel, I'm testing and seeing this with > > > normal / vanilla > > > mainline kernels as well. > > > > > > I'm running into an issue that is *very* similar to the one discussed in > > > the > > > [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e) > > > thread from this past may: https://lkml.org/lkml/2019/5/12/272 > > > > > > I can confirm that I have the proposed fix for the initial regression > > > report in > > > my build (05b2892637 [signal: unconditionally leave the frozen state > > > in ptrace_stop()]), > > > but yet I'm still seeing 3 or 4 minute runtimes on a test that used to > > > take 3 or > > > 4 seconds. > > > > So, the problem is that you're experiencing a severe performance regression > > in some test, right? > > Hi Roman, > > Correct. In particular, running some of the tests that ship with strace > itself. > The performance change is so drastic, that it definitely makes you wonder > "What have I done wrong? Since everyone must be seeing this" .. and I > always blame myself first. > > > > > > > > > This isn't my normal area of kernel hacking, so I've so far come up empty > > > at either fixing it myself, or figuring out a viable workaround. (well, I > > > can > > > "fix" it by remove the cgroup_enter_frozen() call in ptrace_stop ... > > > but obviously, > > > that is just me trying to figure out what could be causing the issue). > > > > > > As part of the release, we run tests that come with various applications. 
> > > The > > > ptrace test that is causing us issues can be boiled down to this: > > > > > > $ cd /usr/lib/strace/ptest/tests > > > $ time ../strace -o log -qq -esignal=none -e/clock > > > ./printpath-umovestr>ttt > > > > > > (I can provide as many details as needed, but I wanted to keep this > > > initial > > > email relatively short). > > > > > > I'll continue to debug and attempt to fix this myself, but I grabbed the > > > email list from the regression report in May to see if anyone has any > > > ideas > > > or angles that I haven't covered in my search for a fix. > > > > I'm definitely happy to help, but it's a bit hard to say anything from what > > you've provided. I'm not aware of any open issues with the freezer except > > some spurious cgroup frozen<->not frozen transitions which can happen in > > some > > cases. If you'll describe how can I reproduce the issue, and I'll try to > > take > > a look asap. > > That would be great. > > I'll attempt to remove all of the build system specifics out of this > (and Richard Purdie > on the cc' of this can probably help provide more details / setup info as > well). > > We are running the built-in tests of strace. So here's a cut and paste of > what I > did to get the tests available (ignore/skip what is common sense or isn't > needed > in your test rig). > > % git clone https://github.com/strace/strace.git > % cd strace > % ./bootstrap > # the --enable flag isn't strictly required, but may break on some > build machines > % ./configure --enable-mpers=no > % make > % make check-TESTS > > That last step will not only build the tests, but run them all .. so > ^c the run once > it starts, since it is a lot of noise (we carry a patch to strace that > allows us to build > the tests without running them). 
> > % cd tests > % time strace -o log -qq -esignal=none -e/clock ./printpath-umovestr > fff > real 0m2.566s > user 0m0.284s > sys 0m2.519 > > On pre-cgroup2 freezer kernels, you see a run time similar to what I have > above. > On the newer kernels we are testing, it is taking 3 or 4 minutes to > run the test. > > I hope that is simple enough to setup and try. Since I've been seeing > this on both > mainline kernels and the yocto reference kernels, I don't think it is > something that > I'm carrying in the distro/reference kernel that is causing this (but > again, I always > blame myself first). If you don't see that same run time, then that > does point the finger > back at what we are doing and I'll have to apologize for chewing up some of > your > time. Thank you for the detailed description! I'll try to reproduce the issue and will be back by the end of the week. Thank you! Roman
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Tue, Oct 1, 2019 at 10:01 PM Roman Gushchin wrote: > > On Tue, Oct 01, 2019 at 12:14:18PM -0400, Bruce Ashfield wrote: > > Hi all, > > > > Hi Bruce! > > > The Yocto project has an upcoming release this fall, and I've been trying to > > sort through some issues that are happening with kernel 5.2+ .. although > > there is a specific yocto kernel, I'm testing and seeing this with > > normal / vanilla > > mainline kernels as well. > > > > I'm running into an issue that is *very* similar to the one discussed in the > > [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e) > > thread from this past may: https://lkml.org/lkml/2019/5/12/272 > > > > I can confirm that I have the proposed fix for the initial regression > > report in > > my build (05b2892637 [signal: unconditionally leave the frozen state > > in ptrace_stop()]), > > but yet I'm still seeing 3 or 4 minute runtimes on a test that used to take > > 3 or > > 4 seconds. > > So, the problem is that you're experiencing a severe performance regression > in some test, right? Hi Roman, Correct. In particular, running some of the tests that ship with strace itself. The performance change is so drastic, that it definitely makes you wonder "What have I done wrong? Since everyone must be seeing this" .. and I always blame myself first. > > > > > This isn't my normal area of kernel hacking, so I've so far come up empty > > at either fixing it myself, or figuring out a viable workaround. (well, I > > can > > "fix" it by remove the cgroup_enter_frozen() call in ptrace_stop ... > > but obviously, > > that is just me trying to figure out what could be causing the issue). > > > > As part of the release, we run tests that come with various applications. 
> > The > > ptrace test that is causing us issues can be boiled down to this: > > > > $ cd /usr/lib/strace/ptest/tests > > $ time ../strace -o log -qq -esignal=none -e/clock ./printpath-umovestr>ttt > > > > (I can provide as many details as needed, but I wanted to keep this initial > > email relatively short). > > > > I'll continue to debug and attempt to fix this myself, but I grabbed the > > email list from the regression report in May to see if anyone has any ideas > > or angles that I haven't covered in my search for a fix. > > I'm definitely happy to help, but it's a bit hard to say anything from what > > you've provided. I'm not aware of any open issues with the freezer except > > some spurious cgroup frozen<->not frozen transitions which can happen in some > > cases. If you'll describe how can I reproduce the issue, and I'll try to take > > a look asap. That would be great. I'll attempt to strip all of the build system specifics out of this (and Richard Purdie, who is on the cc of this, can probably help provide more details / setup info as well). We are running the built-in tests of strace. So here's a cut and paste of what I did to get the tests available (ignore/skip what is common sense or isn't needed in your test rig). % git clone https://github.com/strace/strace.git % cd strace % ./bootstrap # the --enable-mpers=no flag isn't strictly required, but without it the build may break on some build machines % ./configure --enable-mpers=no % make % make check-TESTS That last step will not only build the tests, but also run them all .. so ^c the run once it starts, since it is a lot of noise (we carry a patch to strace that allows us to build the tests without running them). % cd tests % time strace -o log -qq -esignal=none -e/clock ./printpath-umovestr > fff real 0m2.566s user 0m0.284s sys 0m2.519 On pre-cgroup2 freezer kernels, you see a run time similar to what I have above. On the newer kernels we are testing, it is taking 3 or 4 minutes to run the test. I hope that is simple enough to set up and try.
Since I've been seeing this on both mainline kernels and the yocto reference kernels, I don't think it is something that I'm carrying in the distro/reference kernel that is causing this (but again, I always blame myself first). If you don't see that same run time, then that does point the finger back at what we are doing and I'll have to apologize for chewing up some of your time. Cheers, Bruce > > Roman -- - Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end - "Use the force Harry" - Gandalf, Star Trek II
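For anyone who wants to poke at this without building strace, here is a minimal C sketch of the same kind of load. It is not part of the thread and is not what the strace test suite does; it only assumes Linux with glibc. A tracer resumes a child with PTRACE_SYSCALL, so every syscall entry and exit becomes a ptrace stop, and those stops are reported through ptrace_stop() in kernel/signal.c, which is where the v5.2+ kernels call cgroup_enter_frozen(). The helper name count_syscall_stops() and the getppid() loop are hypothetical choices made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Kill and reap a still-running traced child after an error. */
static long abort_trace(pid_t child)
{
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return -1;
}

/*
 * Hypothetical helper (not from strace or the thread): fork a child that
 * makes n_calls getppid() syscalls while the parent resumes it with
 * PTRACE_SYSCALL, so each syscall entry and exit is a ptrace stop.
 * Returns the number of syscall stops seen (or -1 on error) and stores
 * the wall-clock time of the traced run in *elapsed_ms.
 */
long count_syscall_stops(int n_calls, double *elapsed_ms)
{
    assert(n_calls >= 0 && elapsed_ms != NULL);

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: become a tracee, then stop until the parent is ready. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        raise(SIGSTOP);
        for (int i = 0; i < n_calls; i++)
            syscall(SYS_getppid);
        _exit(0);
    }

    int status;
    if (waitpid(child, &status, 0) < 0)
        return abort_trace(child);
    if (!WIFSTOPPED(status))
        return -1; /* child already gone, e.g. PTRACE_TRACEME was refused */

    /* Ask the kernel to report syscall stops as SIGTRAP | 0x80. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0)
        return abort_trace(child);

    struct timespec t0, t1;
    long stops = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (;;) {
        /* Resume until the next syscall entry/exit; pending signals are dropped. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0)
            return abort_trace(child);
        if (waitpid(child, &status, 0) < 0)
            return abort_trace(child);
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *elapsed_ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
    return stops;
}
```

Calling count_syscall_stops(1000, &ms) should count at least 2000 stops (one entry stop and one exit stop per getppid()). Dividing ms by that count gives a rough per-stop cost that could be compared between a pre-cgroup2-freezer kernel and a v5.2+ kernel. Nobody in the thread ran this program, so there are no reference numbers for it.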
Re: ptrace/strace and freezer oddities and v5.2+ kernels
On Tue, Oct 01, 2019 at 12:14:18PM -0400, Bruce Ashfield wrote: > Hi all, > Hi Bruce! > The Yocto project has an upcoming release this fall, and I've been trying to > sort through some issues that are happening with kernel 5.2+ .. although > there is a specific yocto kernel, I'm testing and seeing this with > normal / vanilla > mainline kernels as well. > > I'm running into an issue that is *very* similar to the one discussed in the > [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e) > thread from this past may: https://lkml.org/lkml/2019/5/12/272 > > I can confirm that I have the proposed fix for the initial regression report > in > my build (05b2892637 [signal: unconditionally leave the frozen state > in ptrace_stop()]), > but yet I'm still seeing 3 or 4 minute runtimes on a test that used to take 3 > or > 4 seconds. So, the problem is that you're experiencing a severe performance regression in some test, right? > > This isn't my normal area of kernel hacking, so I've so far come up empty > at either fixing it myself, or figuring out a viable workaround. (well, I can > "fix" it by remove the cgroup_enter_frozen() call in ptrace_stop ... > but obviously, > that is just me trying to figure out what could be causing the issue). > > As part of the release, we run tests that come with various applications. The > ptrace test that is causing us issues can be boiled down to this: > > $ cd /usr/lib/strace/ptest/tests > $ time ../strace -o log -qq -esignal=none -e/clock ./printpath-umovestr>ttt > > (I can provide as many details as needed, but I wanted to keep this initial > email relatively short). > > I'll continue to debug and attempt to fix this myself, but I grabbed the > email list from the regression report in May to see if anyone has any ideas > or angles that I haven't covered in my search for a fix. I'm definitely happy to help, but it's a bit hard to say anything from what you've provided. 
I'm not aware of any open issues with the freezer except some spurious cgroup frozen<->not frozen transitions, which can happen in some cases. If you describe how I can reproduce the issue, I'll try to take a look asap. Roman
ptrace/strace and freezer oddities and v5.2+ kernels
Hi all, The Yocto project has an upcoming release this fall, and I've been trying to sort through some issues that are happening with kernel 5.2+ .. although there is a specific yocto kernel, I'm testing and seeing this with normal / vanilla mainline kernels as well. I'm running into an issue that is *very* similar to the one discussed in the [REGRESSION] ptrace broken from "cgroup: cgroup v2 freezer" (76f969e) thread from this past May: https://lkml.org/lkml/2019/5/12/272 I can confirm that I have the proposed fix for the initial regression report in my build (05b2892637 [signal: unconditionally leave the frozen state in ptrace_stop()]), but I'm still seeing 3 or 4 minute runtimes on a test that used to take 3 or 4 seconds. This isn't my normal area of kernel hacking, so I've so far come up empty both at fixing it myself and at figuring out a viable workaround. (Well, I can "fix" it by removing the cgroup_enter_frozen() call in ptrace_stop() ... but obviously that is just me trying to figure out what could be causing the issue.) As part of the release, we run tests that come with various applications. The ptrace test that is causing us issues can be boiled down to this: $ cd /usr/lib/strace/ptest/tests $ time ../strace -o log -qq -esignal=none -e/clock ./printpath-umovestr>ttt (I can provide as many details as needed, but I wanted to keep this initial email relatively short). I'll continue to debug and attempt to fix this myself, but I grabbed the email list from the regression report in May to see if anyone has any ideas or angles that I haven't covered in my search for a fix. Cheers, Bruce -- - Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end - "Use the force Harry" - Gandalf, Star Trek II