Re: [x86-tip] strange nr_cpus= boot regression
On Tue, 27 Sep 2016, Dou Liyang wrote:

> In fact, it's my fault.
> I should re-base my patches after the commit c291b0151585 in time.

Definitely not your fault. As a maintainer I should have been more careful
and checked whether Linus' tree or x86/urgent had any modifications to that
area before applying your patches.

> I learned a lot from it.

We all did.

Thanks,

	tglx
Re: [x86-tip] strange nr_cpus= boot regression
Mike Galbraith wrote:

>Whew, no mythical creature infestation. Thanks, next encounter with
>such an artifact should provide markedly less entertainment.

Oh, no worries. Next time it'll be something else. There's no dull day with
this kernel thing ;-)

--
Sent from a small device: formatting sucks and brevity is inevitable.
Re: [x86-tip] strange nr_cpus= boot regression
On Mon, 2016-09-26 at 15:35 -0400, Thomas Gleixner wrote:
> On Mon, 26 Sep 2016, Thomas Gleixner wrote:
> > Can you please provide your .config and the dmesg of a bad and a good run?
>
> Don't bother. I found it.
>
> It's a merge artifact. So git bisect pointing at the merge commit is
> entirely correct.
>
> mainline moves
>
>	num_processors++;
>
> to a different place in the function. See commit c291b0151585.
>
> Now the nodeid patch set in x86/apic does not have this commit and so
> f7c28833c2520 removes num_processors++ from the original location before
> c291b0151585.

Whew, no mythical creature infestation. Thanks, next encounter with such an
artifact should provide markedly less entertainment.

	-Mike
Re: [x86-tip] strange nr_cpus= boot regression
Hi tglx,

I'm sorry for the late reply. Awfully sorry that I could not do anything
to help.

In fact, it's my fault.
I should re-base my patches after the commit c291b0151585 in time.

I learned a lot from it.

Thanks a lot, and once again my apologies.

Thanks,
Dou

At 09/27/2016 01:36 AM, Thomas Gleixner wrote:
> CC'ed: Dou Liyang
>
> On Mon, 26 Sep 2016, Mike Galbraith wrote:
>
> > I've encountered a strange regression in tip, symptom is that if you
> > boot with nr_cpus=nr_you_have, what actually boots is nr_you_have/2.
> > Do not pass nr_cpus=, and all is well.
>
> What's the number of possible cpus in your system?
>
> > Bisection repeatedly goes as below, pointing to the nodeid merge,
> > despite both timers/core and x86/apic (nodeid) being fine. Take tip
> > HEAD, extract all of the commits from nodeid (plus the fix), and revert
> > them in a quilt tree, the tree remains busted.
>
> So you remove all the nodeid commits from tip/master and it's still broken?
>
> > Checkout the timers/core merge commit, and merge nodeid with that, it is
> > indeed bad. Bisecting takes you right the merge commit, with no commit
> > being 'bad', see logs.
>
> That's more than strange. An empty merge commit being the culprit.
>
> Thanks,
>
>	tglx
Re: [x86-tip] strange nr_cpus= boot regression
On Mon, 26 Sep 2016, Thomas Gleixner wrote:
> Here is a patch against tip/master which fixes the issue at least for
> Boris. I'm going to merge that other commit into x86/apic and fix it up so
> we don't end up with that mess again.

tip/x86/apic and tip/master are updated now.

Thanks,

	tglx
Re: [x86-tip] strange nr_cpus= boot regression
On Mon, 26 Sep 2016, Thomas Gleixner wrote:
> Can you please provide your .config and the dmesg of a bad and a good run?

Don't bother. I found it.

It's a merge artifact. So git bisect pointing at the merge commit is
entirely correct.

mainline moves

	num_processors++;

to a different place in the function. See commit c291b0151585.

Now the nodeid patch set in x86/apic does not have this commit and so
f7c28833c2520 removes num_processors++ from the original location before
c291b0151585.

Now merging both branches does not conflict because both remove it from the
original location. Though both add it to new locations and it ends up with
both instances of num_processors++ in place. Which of course makes each
invocation increment twice and therefore cuts the number of cpus in half.

So it's my fault that I did not merge x86/urgent into x86/apic before I
added the nodeid bits. And of course because the thing did not reject and
the merge of it into master gave no conflicts I did not notice.

Here is a patch against tip/master which fixes the issue at least for
Boris. I'm going to merge that other commit into x86/apic and fix it up so
we don't end up with that mess again.

Thanks,

	tglx

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 46bb29958509..f266b8a92a9e 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2171,8 +2171,6 @@ int __generic_processor_info(int apicid, int version, bool enabled)
 		return -ENOSPC;
 	}
 
-	num_processors++;
-
 	/*
 	 * Validate version
 	 */
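
For readers who want to see why a duplicated increment halves the CPU count, here is a minimal sketch in plain C. It is not the kernel's __generic_processor_info(); the struct, the function names and the `duplicated_increment` flag are all hypothetical. It assumes only that registration is gated by comparing a running counter against a CPU limit, so a counter that advances twice per CPU reaches the limit after half as many CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified, hypothetical model of CPU registration. This is NOT the
 * kernel code; every name here is made up for illustration.
 */
struct cpu_registry {
	int num_processors;	/* counter compared against the limit */
	int registered;		/* CPUs that were actually accepted */
};

/* Try to register one CPU; returns true if it was accepted. */
static bool register_cpu(struct cpu_registry *r, int nr_cpu_limit,
			 bool duplicated_increment)
{
	if (r->num_processors >= nr_cpu_limit)
		return false;		/* limit reached, CPU is ignored */

	r->num_processors++;		/* the intended increment */
	if (duplicated_increment)
		r->num_processors++;	/* second copy left behind by the merge */

	r->registered++;
	return true;
}

/* Register `present` CPUs against `nr_cpu_limit`; return how many were accepted. */
static int count_registered(int present, int nr_cpu_limit,
			    bool duplicated_increment)
{
	struct cpu_registry r = { 0, 0 };

	assert(present >= 0 && nr_cpu_limit >= 0);
	for (int i = 0; i < present; i++)
		register_cpu(&r, nr_cpu_limit, duplicated_increment);
	return r.registered;
}
```

In this model, count_registered(8, 8, false) accepts all 8 CPUs, while count_registered(8, 8, true) accepts only 4, which matches the nr_you_have/2 symptom. With a limit of at least twice the number of present CPUs, the duplicated increment goes unnoticed in this sketch.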
Re: [x86-tip] strange nr_cpus= boot regression
Mike,

On Mon, 26 Sep 2016, Thomas Gleixner wrote:
> On Mon, 26 Sep 2016, Mike Galbraith wrote:
>
> > I've encountered a strange regression in tip, symptom is that if you
> > boot with nr_cpus=nr_you_have, what actually boots is nr_you_have/2.
> > Do not pass nr_cpus=, and all is well.
>
> What's the number of possible cpus in your system?

Can you please provide your .config and the dmesg of a bad and a good run?

Thanks,

	tglx
Re: [x86-tip] strange nr_cpus= boot regression
CC'ed: Dou Liyang

On Mon, 26 Sep 2016, Mike Galbraith wrote:

> I've encountered a strange regression in tip, symptom is that if you
> boot with nr_cpus=nr_you_have, what actually boots is nr_you_have/2.
> Do not pass nr_cpus=, and all is well.

What's the number of possible cpus in your system?

> Bisection repeatedly goes as below, pointing to the nodeid merge,
> despite both timers/core and x86/apic (nodeid) being fine. Take tip
> HEAD, extract all of the commits from nodeid (plus the fix), and revert
> them in a quilt tree, the tree remains busted.

So you remove all the nodeid commits from tip/master and it's still broken?

> Checkout the timers/core merge commit, and merge nodeid with that, it is
> indeed bad.
> Bisecting takes you right the merge commit, with no commit
> being 'bad', see logs.

That's more than strange. An empty merge commit being the culprit.

Thanks,

	tglx
Re: [x86-tip] strange nr_cpus= boot regression
On Mon, 2016-09-26 at 15:40 +0200, Borislav Petkov wrote:
> On Mon, Sep 26, 2016 at 02:39:53PM +0200, Mike Galbraith wrote:
> > On Mon, 2016-09-26 at 14:29 +0200, Mike Galbraith wrote:
> >
> > > Checkout timers/core, and merge nodeid, and all is well. I'm currently
> > > bisecting the result against HEAD.. which will likely be about as
> > > useful as the last five bisections, but ya never know. (ok git, finger
> > > somebody already [hotplug] and call it a day)
> >
> > That came out backward, though it shouldn't matter. I checked out
> > nodeid, and merged timers/core into it, which works fine.
>
> Ok, I can confirm the same observation without *any* command line
> options diminishing the number of CPUs. I'm booting with halved CPU
> numbers. Here's a coarse bisect of the merge commits of tip/master:
>
> ...
> e47f2e50ead9 Merge tag 'configfs-for-4.8-2' of git://git.infradead.org/users/hch/configfs
> d24efa23e93c Merge branch 'x86/apic'
>
> <--- v4.8-rc7-714-gcdcef608a13d: bad
>
> cdcef608a13d Merge branch 'linus'
> 450df5cb5cad Merge branch 'x86/vdso'
> 2f2c2f35ddbf Merge branch 'x86/timers'
> 782795482a9e Merge branch 'x86/platform'
> 1200484cdb51 Merge branch 'x86/cleanups'
> b1591e618894 Merge branch 'x86/boot'
> 5b8b44cc2d19 Merge branch 'x86/asm'
>
> <--- v4.8-rc7-496-g20eefd15d70: bad
>
> 20eefd15d70f Merge branch 'x86/apic'
>
> <--- v4.8-rc7-475-gb468c89ee756: OK
>
> b468c89ee756 Merge branch 'timers/core'
>
> <--- v4.8-rc7-454-g4be273986460: OK
>
> 4be273986460 Merge branch 'smp/hotplug'
>
> <--- v4.8-rc7-405-g1eaa05b665b0: OK
>
> 1eaa05b665b0 Merge branch 'sched/core'
> b820db731394 Merge branch 'ras/core'
> 689d1113c7ba Merge branch 'perf/core'
> f93add128c73 Merge branch 'mm/pkeys'
> f0be67cb6c64 Merge branch 'locking/urgent'
> ab5f266b0b55 Merge branch 'locking/core'
> c69a1dec2a36 Merge branch 'irq/core'
> 87671cd5fb60 Merge branch 'efi/core'
> 51e5767069c8 Merge branch 'core/smp'
> 85202a2289d2 Merge branch 'core/rcu'
> ...

Arrrg.

20eefd15d70f297c7e193e087ed4103b47c2e312 is the first bad commit

git bisect start
# good: [bd0c06c62f40fb2c1443729e42956a6e4045e791] Merge remote-tracking branch 'x86-tip/timers/core' into nodeid
git bisect good bd0c06c62f40fb2c1443729e42956a6e4045e791
# bad: [a04c42e40060130bb05eecb64df2c1cc8f48ab43] Merge branch 'linus'
git bisect bad a04c42e40060130bb05eecb64df2c1cc8f48ab43
# good: [b536fd587044af02183b3c02690431b93154f0fa] timekeeping: Include the correct header for errno definitions
git bisect good b536fd587044af02183b3c02690431b93154f0fa
# good: [fd74da217df7d4bd25e95411da64e0b92762842e] acpi: Validate processor id when mapping the processor
git bisect good fd74da217df7d4bd25e95411da64e0b92762842e
# good: [739f1bcd045f473d79358aac94439722d41a2650] Merge branch 'perf/urgent' into perf/core, to pick up fixes
git bisect good 739f1bcd045f473d79358aac94439722d41a2650
# good: [1eaa05b665b08cd13235f6f430a5214d8b8a3c87] Merge branch 'sched/core'
git bisect good 1eaa05b665b08cd13235f6f430a5214d8b8a3c87
# bad: [2f2c2f35ddbf49372db8056b25c2221161d02e75] Merge branch 'x86/timers'
git bisect bad 2f2c2f35ddbf49372db8056b25c2221161d02e75
# good: [71f5443ebb1227c22e8decbcd28a1ea6deaf8257] x86/dumpstack: Fix show_stack() task pointer regression
git bisect good 71f5443ebb1227c22e8decbcd28a1ea6deaf8257
# good: [e476d3129100ba18daea2224f38fdd7195118d4b] mips/loongson/smp: Convert to hotplug state machine
git bisect good e476d3129100ba18daea2224f38fdd7195118d4b
# good: [58cbbee2391ce3876e6eee80a4f2a7f025859c52] x86/platform/mellanox: Introduce support for Mellanox systems platform
git bisect good 58cbbee2391ce3876e6eee80a4f2a7f025859c52
# bad: [5b8b44cc2d19500e981f492d3d96f85a7007e4ff] Merge branch 'x86/asm'
git bisect bad 5b8b44cc2d19500e981f492d3d96f85a7007e4ff
# good: [0bf71e4d02ffec8ab9a6adecca61d3eed74fc99d] Merge branch 'smp/for-block' into smp/hotplug
git bisect good 0bf71e4d02ffec8ab9a6adecca61d3eed74fc99d
# good: [b468c89ee7568999dfc575da56841694c118164a] Merge branch 'timers/core'
git bisect good b468c89ee7568999dfc575da56841694c118164a
# bad: [20eefd15d70f297c7e193e087ed4103b47c2e312] Merge branch 'x86/apic'
git bisect bad 20eefd15d70f297c7e193e087ed4103b47c2e312
# first bad commit: [20eefd15d70f297c7e193e087ed4103b47c2e312] Merge branch 'x86/apic'

Well, that helped.. not at all. (headbutts box.. nope not the right tool
either, maybe traceprintk;)
Re: [x86-tip] strange nr_cpus= boot regression
On Mon, Sep 26, 2016 at 03:40:34PM +0200, Borislav Petkov wrote:
> <--- v4.8-rc7-496-g20eefd15d70: bad
>
> 20eefd15d70f Merge branch 'x86/apic'
>
> <--- v4.8-rc7-475-gb468c89ee756: OK
>
> b468c89ee756 Merge branch 'timers/core'

Ok, it points to the merge commit here too:

git bisect start
# bad: [20eefd15d70f297c7e193e087ed4103b47c2e312] Merge branch 'x86/apic'
git bisect bad 20eefd15d70f297c7e193e087ed4103b47c2e312
# good: [b468c89ee7568999dfc575da56841694c118164a] Merge branch 'timers/core'
git bisect good b468c89ee7568999dfc575da56841694c118164a
# good: [5035da41996d346c648a65c1d7a9f6469c7d358a] x86/apic: Update comment about disabling processor focus
git bisect good 5035da41996d346c648a65c1d7a9f6469c7d358a
# good: [f7c28833c252031bc68a29e26a18a661797cf3a3] x86/acpi: Enable acpi to register all possible cpus at boot time
git bisect good f7c28833c252031bc68a29e26a18a661797cf3a3
# good: [dc6db24d2476cd09c0ecf2b8d80313539f737a89] x86/acpi: Set persistent cpuid <-> nodeid mapping when booting
git bisect good dc6db24d2476cd09c0ecf2b8d80313539f737a89
# good: [fd74da217df7d4bd25e95411da64e0b92762842e] acpi: Validate processor id when mapping the processor
git bisect good fd74da217df7d4bd25e95411da64e0b92762842e
# first bad commit: [20eefd15d70f297c7e193e087ed4103b47c2e312] Merge branch 'x86/apic'

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
Re: [x86-tip] strange nr_cpus= boot regression
On Mon, Sep 26, 2016 at 02:39:53PM +0200, Mike Galbraith wrote:
> On Mon, 2016-09-26 at 14:29 +0200, Mike Galbraith wrote:
>
> > Checkout timers/core, and merge nodeid, and all is well. I'm currently
> > bisecting the result against HEAD.. which will likely be about as
> > useful as the last five bisections, but ya never know. (ok git, finger
> > somebody already [hotplug] and call it a day)
>
> That came out backward, though it shouldn't matter. I checked out
> nodeid, and merged timers/core into it, which works fine.

Ok, I can confirm the same observation without *any* command line
options diminishing the number of CPUs. I'm booting with halved CPU
numbers. Here's a coarse bisect of the merge commits of tip/master:

...
e47f2e50ead9 Merge tag 'configfs-for-4.8-2' of git://git.infradead.org/users/hch/configfs
d24efa23e93c Merge branch 'x86/apic'

<--- v4.8-rc7-714-gcdcef608a13d: bad

cdcef608a13d Merge branch 'linus'
450df5cb5cad Merge branch 'x86/vdso'
2f2c2f35ddbf Merge branch 'x86/timers'
782795482a9e Merge branch 'x86/platform'
1200484cdb51 Merge branch 'x86/cleanups'
b1591e618894 Merge branch 'x86/boot'
5b8b44cc2d19 Merge branch 'x86/asm'

<--- v4.8-rc7-496-g20eefd15d70: bad

20eefd15d70f Merge branch 'x86/apic'

<--- v4.8-rc7-475-gb468c89ee756: OK

b468c89ee756 Merge branch 'timers/core'

<--- v4.8-rc7-454-g4be273986460: OK

4be273986460 Merge branch 'smp/hotplug'

<--- v4.8-rc7-405-g1eaa05b665b0: OK

1eaa05b665b0 Merge branch 'sched/core'
b820db731394 Merge branch 'ras/core'
689d1113c7ba Merge branch 'perf/core'
f93add128c73 Merge branch 'mm/pkeys'
f0be67cb6c64 Merge branch 'locking/urgent'
ab5f266b0b55 Merge branch 'locking/core'
c69a1dec2a36 Merge branch 'irq/core'
87671cd5fb60 Merge branch 'efi/core'
51e5767069c8 Merge branch 'core/smp'
85202a2289d2 Merge branch 'core/rcu'
...

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
Re: [x86-tip] strange nr_cpus= boot regression
On Mon, 2016-09-26 at 14:29 +0200, Mike Galbraith wrote:

> Checkout timers/core, and merge nodeid, and all is well. I'm currently
> bisecting the result against HEAD.. which will likely be about as
> useful as the last five bisections, but ya never know. (ok git, finger
> somebody already [hotplug] and call it a day)

That came out backward, though it shouldn't matter. I checked out
nodeid, and merged timers/core into it, which works fine.
[x86-tip] strange nr_cpus= boot regression
Hi Ingo,

I've encountered a strange regression in tip, symptom is that if you
boot with nr_cpus=nr_you_have, what actually boots is nr_you_have/2.
Do not pass nr_cpus=, and all is well.

Bisection repeatedly goes as below, pointing to the nodeid merge,
despite both timers/core and x86/apic (nodeid) being fine. Take tip
HEAD, extract all of the commits from nodeid (plus the fix), and revert
them in a quilt tree, the tree remains busted.

Checkout the timers/core merge commit, and merge nodeid with that, it is
indeed bad. Bisecting that tree takes you right to the merge commit, with
no commit being 'bad', see logs.

20eefd15d70f297c7e193e087ed4103b47c2e312 is the first bad commit

git bisect start
# good: [3be7988674ab33565700a37b210f502563d932e6] Linux 4.8-rc7
git bisect good 3be7988674ab33565700a37b210f502563d932e6
# bad: [a04c42e40060130bb05eecb64df2c1cc8f48ab43] Merge branch 'linus'
git bisect bad a04c42e40060130bb05eecb64df2c1cc8f48ab43
# good: [b820db7313942f6ae55637e118b19ed9739a12af] Merge branch 'ras/core'
git bisect good b820db7313942f6ae55637e118b19ed9739a12af
# bad: [5b8b44cc2d19500e981f492d3d96f85a7007e4ff] Merge branch 'x86/asm'
git bisect bad 5b8b44cc2d19500e981f492d3d96f85a7007e4ff
# good: [4be273986460867b37701a8185f8118ae201b39c] Merge branch 'smp/hotplug'
git bisect good 4be273986460867b37701a8185f8118ae201b39c
# good: [c6c314a613cd7d03fb97713e0d642b493de42e69] sched/core: Add try_get_task_stack() and put_task_stack()
git bisect good c6c314a613cd7d03fb97713e0d642b493de42e69
# good: [b468c89ee7568999dfc575da56841694c118164a] Merge branch 'timers/core'
git bisect good b468c89ee7568999dfc575da56841694c118164a
# good: [dc6db24d2476cd09c0ecf2b8d80313539f737a89] x86/acpi: Set persistent cpuid <-> nodeid mapping when booting
git bisect good dc6db24d2476cd09c0ecf2b8d80313539f737a89
# good: [7c7900f89770d7fba96100d8a9e18043a1af3973] x86/unwind: Add new unwind interface and implementations
git bisect good 7c7900f89770d7fba96100d8a9e18043a1af3973
# good: [c8fe4609827aedc9c4b45de80e7cdc8ccfa8541b] x86/dumpstack: Remove dump_trace() and related callbacks
git bisect good c8fe4609827aedc9c4b45de80e7cdc8ccfa8541b
# good: [fd74da217df7d4bd25e95411da64e0b92762842e] acpi: Validate processor id when mapping the processor
git bisect good fd74da217df7d4bd25e95411da64e0b92762842e
# good: [71f5443ebb1227c22e8decbcd28a1ea6deaf8257] x86/dumpstack: Fix show_stack() task pointer regression
git bisect good 71f5443ebb1227c22e8decbcd28a1ea6deaf8257
# bad: [20eefd15d70f297c7e193e087ed4103b47c2e312] Merge branch 'x86/apic'
git bisect bad 20eefd15d70f297c7e193e087ed4103b47c2e312
# first bad commit: [20eefd15d70f297c7e193e087ed4103b47c2e312] Merge branch 'x86/apic'

92365cbaa9a781b6016fa27b03e3ba2fedbea22d is the first bad commit

git bisect start
# good: [b468c89ee7568999dfc575da56841694c118164a] Merge branch 'timers/core'
git bisect good b468c89ee7568999dfc575da56841694c118164a
# bad: [92365cbaa9a781b6016fa27b03e3ba2fedbea22d] Merge branch 'nodeid' into test
git bisect bad 92365cbaa9a781b6016fa27b03e3ba2fedbea22d
# good: [5035da41996d346c648a65c1d7a9f6469c7d358a] x86/apic: Update comment about disabling processor focus
git bisect good 5035da41996d346c648a65c1d7a9f6469c7d358a
# good: [f7c28833c252031bc68a29e26a18a661797cf3a3] x86/acpi: Enable acpi to register all possible cpus at boot time
git bisect good f7c28833c252031bc68a29e26a18a661797cf3a3
# good: [dc6db24d2476cd09c0ecf2b8d80313539f737a89] x86/acpi: Set persistent cpuid <-> nodeid mapping when booting
git bisect good dc6db24d2476cd09c0ecf2b8d80313539f737a89
# good: [fd74da217df7d4bd25e95411da64e0b92762842e] acpi: Validate processor id when mapping the processor
git bisect good fd74da217df7d4bd25e95411da64e0b92762842e
# first bad commit: [92365cbaa9a781b6016fa27b03e3ba2fedbea22d] Merge branch 'nodeid' into test
# good: [fd74da217df7d4bd25e95411da64e0b92762842e] acpi: Validate processor id when mapping the processor
git bisect good fd74da217df7d4bd25e95411da64e0b92762842e
# first bad commit: [92365cbaa9a781b6016fa27b03e3ba2fedbea22d] Merge branch 'nodeid' into test

Checkout timers/core, and merge nodeid, and all is well. I'm currently
bisecting the result against HEAD.. which will likely be about as
useful as the last five bisections, but ya never know. (ok git, finger
somebody already [hotplug] and call it a day)

	-Mike