Re: Proper way to allocate DMA buffer within a single 4GB block?
On Wed, 20 Sep 2017 17:18:09 -0500 Timur Tabi wrote:

> I have a device that requires that I allocate a few buffers for DMA. The
> problem is that this device has only one register for the upper 32
> bits of all of the buffers. That is, all of the buffers must reside
> within the same 4GB block of memory. In other words,
>
>   end = start + size - 1;
>   if (upper_32_bits(start) != upper_32_bits(end))
>           // Oh no, the buffer spans across a 4GB boundary!
>
> The buffer is typically less than 16KB in size, so we've never seen
> it actually span across a 4GB boundary. However, I want to ensure
> that it's impossible. I wrote this function that re-tries the
> allocation if the first one is invalid, but I suspect that it's too
> hackish. Is there a better way?
>

Allocate a buffer twice as big as what you really need. If the first
half doesn't work, the second half will.
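The "twice as big" suggestion can be sketched in plain C. This is only an illustration under stated assumptions: `crosses_4g()` and `pick_half()` are hypothetical helper names, not kernel APIs; the arithmetic runs on a bare 64-bit bus address rather than on a real DMA allocation; and `size` is assumed to be non-zero and no larger than 2GB, which is what keeps the second half from reaching the next boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same idea as the kernel's upper_32_bits(), redefined here so the
 * sketch builds outside the kernel tree. */
static uint32_t upper_32_bits(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* True if [start, start + size - 1] straddles a 4GB boundary. */
static bool crosses_4g(uint64_t start, uint64_t size)
{
	uint64_t end = start + size - 1;

	return upper_32_bits(start) != upper_32_bits(end);
}

/*
 * Hypothetical helper: given an allocation of 2 * size bytes beginning
 * at 'start', return the address of a size-byte half that stays inside
 * one 4GB block.
 */
static uint64_t pick_half(uint64_t start, uint64_t size)
{
	assert(size > 0 && size <= (1ULL << 31));

	if (!crosses_4g(start, size))
		return start;		/* first half is usable as-is */
	return start + size;		/* boundary fell inside the first half */
}
```

For a 16KB buffer whose doubled allocation starts at 0xfffff000, the first half ends at 0x100002fff and crosses the boundary, so `pick_half()` returns 0x100003000. The cost is that half of every allocation is wasted, which seems acceptable for buffers this small.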
Re: [regression 4.14rc] 74def747bcd0 (genirq: Restrict effective affinity to interrupts actually using it)
On Tue, 19 Sep 2017 16:51:06 +0100 Marc Zyngier wrote:

> On 19/09/17 16:40, Yanko Kaneti wrote:
> > On Tue, 2017-09-19 at 16:33 +0100, Marc Zyngier wrote:
> >> On 19/09/17 16:12, Yanko Kaneti wrote:
> >>> Hello,
> >>>
> >>> Fedora rawhide config here.
> >>> AMD FX-8370E
> >>>
> >>> Bisected a problem to:
> >>> 74def747bcd0 (genirq: Restrict effective affinity to interrupts
> >>> actually using it)
> >>>
> >>> It seems to be causing stalls, short lived or long lived lockups
> >>> very shortly after boot. Everything becomes jerky.
> >>>
> >>> The only visible in the log indication is something like :
> >>>
> >>> [ 59.802129] clocksource: timekeeping watchdog on CPU3: Marking
> >>> clocksource 'tsc' as unstable because the skew is too large:
> >>> [ 59.802134] clocksource: 'hpet' wd_now:
> >>> 3326e7aa wd_last: 329956f8 mask: [ 59.802137]
> >>> clocksource: 'tsc' cs_now: 423662bc6f
> >>> cs_last: 41dfc91650 mask: [ 59.802140] tsc:
> >>> Marking TSC unstable due to clocksource watchdog [ 59.802158]
> >>> TSC found unstable after boot, most likely due to broken BIOS.
> >>> Use 'tsc=unstable'. [ 59.802161] sched_clock: Marking unstable
> >>> (59802142067, 15510)<-(59920871789, -118714277) [ 60.015604]
> >>> clocksource: Switched to clocksource hpet [ 89.015994] INFO:
> >>> NMI handler (perf_event_nmi_handler) took too long to run:
> >>> 209.660 msecs [ 89.016003] perf: interrupt took too long
> >>> (1638003 > 2500), lowering kernel.perf_event_max_sample_rate to
> >>> 1000
> >>>
> >>> Just reverting that commit on top of linus mainline cures all the
> >>> symptoms
> >>
> >> Interesting. Do you still get HPET interrupts?
> >
> > Sorry, I might need some basic help here (i.e where do I count
> > them...)
>
> /proc/interrupts should display them.
>
> > After the watchdog switches the clocksource to hpet the system is
> > still somewhat alive, so I'll guess some clock is still
> > ticking
>
> Probably, but I suspect they're not hitting the right CPU, hence the
> lockups.
>
> Unfortunately, my x86-foo is pretty minimal, and I'm about to drop off
> the net for a few days.
>
> Thomas, any insight?

Looking at flat_cpu_mask_to_apicid(), I don't see how 74def747bcd0 can
be correct:

	struct cpumask *effmsk = irq_data_get_effective_affinity_mask(irqdata);
	unsigned long cpu_mask = cpumask_bits(mask)[0] & APIC_ALL_CPUS;

	if (!cpu_mask)
		return -EINVAL;
	*apicid = (unsigned int)cpu_mask;
	cpumask_bits(effmsk)[0] = cpu_mask;

Before that patch, this function wrote to the effective mask
unconditionally. After, it only writes to effective_mask if it is
already non-zero.
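To make the point about the unconditional write concrete, here is a user-space model of the quoted lines. It is a sketch, not the kernel function: masks are plain `unsigned long` values instead of `struct cpumask`, `flat_mask_to_apicid_model()` and `APIC_ALL_CPUS_MODEL` are made-up names, and the value 0xFF is an assumption standing in for the kernel's APIC_ALL_CPUS.

```c
#include <errno.h>

#define APIC_ALL_CPUS_MODEL 0xFFUL	/* assumed value, see the note above */

/*
 * Model of the quoted logic: the effective mask is overwritten on every
 * successful call, regardless of what it held before.
 */
static int flat_mask_to_apicid_model(unsigned long mask,
				     unsigned long *effmsk,
				     unsigned int *apicid)
{
	unsigned long cpu_mask = mask & APIC_ALL_CPUS_MODEL;

	if (!cpu_mask)
		return -EINVAL;		/* no usable CPU in the request */
	*apicid = (unsigned int)cpu_mask;
	*effmsk = cpu_mask;		/* written even if *effmsk was zero */
	return 0;
}
```

In this model a zeroed effective mask is filled in by the first successful call. A variant that skipped the store whenever `*effmsk` was zero would leave it zero forever, which is the behaviour the message above is questioning.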
Re: 319554f284dd ("inet: don't use sk_v6_rcv_saddr directly") causes bind port regression
On Wed, 13 Sep 2017 17:28:25 + Josef Bacik wrote:

> Sorry I thought I had made this other fix, can you apply this on top
> of the other one and try that? I have more things to try if this
> doesn’t work, sorry you are playing go between, but I want to make
> sure I know _which_ fix actually fixes the problem, and then clean up
> in followup patches. Thanks,
>
> Josef
>
> On 9/13/17, 8:45 AM, "Laura Abbott" wrote:
>
> On 09/12/2017 04:12 PM, Josef Bacik wrote:
> > First I’m super sorry for the top post, I’m at plumbers and I
> > forgot to upload my muttrc to my new cloud instance, so I’m screwed
> > using outlook.
> >
> > I have a completely untested, uncompiled patch that I think will
> > fix the problem, would you mind giving it a go? Thanks,
> >
> > Josef
>
> Thanks for the quick turnaround. Unfortunately, the problem is still
> reproducible according to the reporter.
>
> Thanks,
> Laura

I am confused by the patch that originally caused this:

 	if (sk->sk_family == AF_INET6)
 		return ipv6_rcv_saddr_equal(&sk->sk_v6_rcv_saddr,
-					    &sk2->sk_v6_rcv_saddr,
+					    inet6_rcv_saddr(sk2),
 					    sk->sk_rcv_saddr,
 					    sk2->sk_rcv_saddr,

Shouldn't the first argument also be changed to use inet6_rcv_saddr()?
[tip:sched/core] sched/x86: Fix typo in __switch_to() comments
Commit-ID:  558a65bc31a0c7811b34dad32f51f47c55a4
Gitweb:     http://git.kernel.org/tip/558a65bc31a0c7811b34dad32f51f47c55a4
Author:     Chuck Ebbert <cebbert.l...@gmail.com>
AuthorDate: Wed, 14 Oct 2015 14:31:19 -0400
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Mon, 19 Oct 2015 10:18:53 +0200

sched/x86: Fix typo in __switch_to() comments

Fix obvious mistake: FS/GS should be DS/ES.

Signed-off-by: Chuck Ebbert <cebbert.l...@gmail.com>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brge...@gmail.com>
Cc: Denys Vlasenko <dvlas...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Link: http://lkml.kernel.org/r/20151014143119.78858eeb@r5
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/kernel/process_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index d7f1d5c..e835d26 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -332,7 +332,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Switch FS and GS.
 	 *
-	 * These are even more complicated than FS and GS: they have
+	 * These are even more complicated than DS and ES: they have
 	 * 64-bit bases are that controlled by arch_prctl. Those bases
 	 * only differ from the values in the GDT or LDT if the selector
 	 * is 0.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched, x86: Fix typo in comments in __switch_to()
Fix obvious mistake: FS/GS should be DS/ES.

Signed-off-by: Chuck Ebbert <cebbert.l...@gmail.com>

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index d7f1d5c..e835d26 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -332,7 +332,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/*
 	 * Switch FS and GS.
 	 *
-	 * These are even more complicated than FS and GS: they have
+	 * These are even more complicated than DS and ES: they have
 	 * 64-bit bases are that controlled by arch_prctl. Those bases
 	 * only differ from the values in the GDT or LDT if the selector
 	 * is 0.
Re: [PATCH] locking/static_keys: fix a silly typo
On Tue, 08 Sep 2015 12:05:04 -0400 Jason Baron wrote:

> On 09/07/2015 03:18 PM, Jonathan Corbet wrote:
> > 412758cb2670 (jump label, locking/static_keys: Update docs) introduced a
> > typo that might as well get fixed.
> >
> > Signed-off-by: Jonathan Corbet
> > ---
> >  Documentation/static-keys.txt | 2 +-
> >  include/linux/jump_label.h    | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt
> > index f4cb0b2..ec91158 100644
> > --- a/Documentation/static-keys.txt
> > +++ b/Documentation/static-keys.txt
> > @@ -16,7 +16,7 @@ The updated API replacements are:
> >  DEFINE_STATIC_KEY_TRUE(key);
> >  DEFINE_STATIC_KEY_FALSE(key);
> >  static_key_likely()
> > -statick_key_unlikely()
> > +static_key_unlikely()
> >
> >  0) Abstract
> >
> > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> > index 7f653e8..0684bd3 100644
> > --- a/include/linux/jump_label.h
> > +++ b/include/linux/jump_label.h
> > @@ -22,7 +22,7 @@
> >   * DEFINE_STATIC_KEY_TRUE(key);
> >   * DEFINE_STATIC_KEY_FALSE(key);
> >   * static_key_likely()
> > - * statick_key_unlikely()
> > + * static_key_unlikely()
> >   *
> >   * Jump labels provide an interface to generate dynamic branches using
> >   * self-modifying code. Assuming toolchain and architecture support, if we
> >
>
> Thanks. I actually messed this up further. That's supposed to be,
> 'static_branch_likely()', and 'static_branch_unlikely()'. So:
>
> s/static_key_likely()/static_branch_likely()
>
> and
>
> s/static_key_unlikely()/static_branch_unlikely()
>

I sent a patch to fix that part on August 25:

https://lkml.org/lkml/2015/8/25/288

Did I send it to the wrong person?
Re: stop breaking dosemu (Re: x86/kconfig/32: Rename CONFIG_VM86 and default it to 'n')
On Fri, 4 Sep 2015 00:28:04 +0300 Stas Sergeev wrote:

> 03.09.2015 21:51, Austin S Hemmelgarn wrote:
> > There are servers out there that have this enabled and _never_ use it
> > at all,
> Unless I am mistaken, servers usually use special flavour of the
> distro (different from desktop install), where of course this will
> be disabled _compile time_.
>

Many (most?) distros use just one kernel for everything, because it's
just too much work to have a separate flavor for servers.
Re: [GIT PULL] Ext3 removal, quota & udf fixes
On Tue, 1 Sep 2015 15:39:45 -0400 Austin S Hemmelgarn wrote:

> On 2015-09-01 06:29, Albino B Neto wrote:
> > 2015-08-31 19:31 GMT-03:00 Raymond Jennings:
> >> I think also that we should remove the ext2 driver before we remove the
> >> ext3 driver.
> >
> > Yes. It is logical to remove the old ext2 drive, because there are
> > more computers with ext3 that ext2. Ext2 is obsolete by existing
> > technologies.
> >
> NO, it is not logical. A vast majority of Android smartphones in the
> wild use ext2, as do a very significant portion of embedded systems that
> don't have room for the few hundred kilobytes of extra code that the
> ext4 driver has in comparison to ext2.

Would it be possible to discard the code used for ext4 and ext3 features
at module init time? So you could do something like:

	modprobe ext4 no_ext4 no_ext3

and all the space used by those functions would be freed and the
filesystem driver would mark all the ext3/ext4 features as unsupported.
[BUG 4.2-rc8] Interrupt occurs while apply_alternatives() is patching the handler
This is from https://bugzilla.redhat.com/show_bug.cgi?id=1258223

[0.036000] BUG: unable to handle kernel paging request at 55501e06
[0.036000] IP: [c0aae48b] common_interrupt+0xb/0x38
[0.036000] *pde =
[0.036000] Oops: [#1] SMP
[0.036000] Modules linked in:
[0.036000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-0.rc8.git3.1.fc24.i686 #1
[0.036000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[0.036000] task: c0d49ac0 ti: c0d42000 task.ti: c0d42000
[0.036000] EIP: 0060:[c0aae48b] EFLAGS: 00200046 CPU: 0
[0.036000] EIP is at common_interrupt+0xb/0x38
[0.036000] EAX: c0aae480 EBX: 008d ECX: c0ab1c83 EDX: e4af6810
[0.036000] ESI: 029a7802 EDI: 0003 EBP: c0d43e68 ESP: c0d43e44
[0.036000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[0.036000] CR0: 8005003b CR2: 55501e06 CR3: 00ebd000 CR4: 0690
[0.036000] DR0: DR1: DR2: DR3:
[0.036000] DR6: DR7:
[0.036000] Stack:
[0.036000] 004f c0409c80 0060 00200202 00200046 c0d43e60 c0ea150c 029a7802
[0.036000] c0d43fb8 c040a054 c07f1cf0 6c0a1000 0006 00200046 0043
[0.036000] c0ed0bc0 c0d43e98 c071a6fc c0d43ea8 c0d43ec4 c0ea4c73 c0ea4c7f
[0.036000] Call Trace:
[0.036000] [c0409c80] ? add_nops+0x90/0xa0
[0.036000] [c040a054] apply_alternatives+0x274/0x630
[0.036000] [c07f1cf0] ? wait_for_xmitr+0xa0/0xa0
[0.036000] [c071a6fc] ? sprintf+0x1c/0x20
[0.036000] [c0aae480] ? irq_entries_start+0x698/0x698
[0.036000] [c071be4b] ? memcpy+0xb/0x30
[0.036000] [c07f3950] ? serial8250_set_termios+0x20/0x20
[0.036000] [c0aad4e3] ? _raw_write_unlock_irqrestore+0x13/0x20
[0.036000] [c0aad4e3] ? _raw_write_unlock_irqrestore+0x13/0x20
[0.036000] [c0aad4fd] ? _raw_spin_unlock_irqrestore+0xd/0x10
[0.036000] [c04b17b9] ? console_unlock+0x2e9/0x610
[0.036000] [c04b03cd] ? log_store+0x1cd/0x210
[0.036000] [c04b1d7e] ? vprintk_emit+0x29e/0x570
[0.036000] [c04b21e1] ? vprintk_default+0x41/0x60
[0.036000] [c0aa7725] ? printk+0x17/0x19
[0.036000] [c0dfdd48] ? identify_boot_cpu+0x7b/0x80
[0.036000] [c0dfca47] alternative_instructions+0x17/0xc1
[0.036000] [c0dfdda9] check_bugs+0x32/0x39
[0.036000] [c0df6b57] start_kernel+0x3ca/0x40a
[0.036000] [c0df62e3] i386_start_kernel+0x91/0x95
[0.036000] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 8d 90 90 83 04 24 80 fc 0f a8 0f <a0> 06 1e 50 55 57 56 52 51 53 ba 7b 00 00 00 8e da 8e c2 ba d8

   0:	8d 90 90 83 04 24    	lea    0x24048390(%eax),%edx
   6:	80 fc 0f             	cmp    $0xf,%ah
   9:	a8 0f                	test   $0xf,%al
>> b:	a0 06 1e 50 55       	mov    0x55501e06,%al
  10:	57                   	push   %edi
  11:	56                   	push   %esi

Interrupt 0x30 occurred while the alternatives code was replacing the
initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the optimized
version, 0x8d,0x76,0x00. Only the first byte has been replaced so far,
and it makes a mess out of the insn decoding.

I have no clue how to fix this.
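The half-patched state described above can be reproduced with a few lines of C. This is a toy model under stated assumptions, not how apply_alternatives() is implemented: `patch_bytes_partial()` is a hypothetical helper that copies a replacement one byte at a time and stops after `done` bytes, standing in for an interrupt arriving mid-patch.

```c
#include <stddef.h>

/*
 * Hypothetical model of a byte-wise text patch that is cut short:
 * copy at most 'done' of the 'len' replacement bytes into 'text'.
 */
static void patch_bytes_partial(unsigned char *text,
				const unsigned char *repl,
				size_t len, size_t done)
{
	for (size_t i = 0; i < len && i < done; i++)
		text[i] = repl[i];
}
```

Starting from the bytes 90 90 90 83 04 24 and stopping after one byte of the replacement 8d 76 00 leaves 8d 90 90 83 04 24, which is the sequence at the start of the disassembly above. The CPU then decodes 8d as the opcode of a six-byte lea and every following instruction is misaligned.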
[PATCH -next] static-keys: Better error checking for static_key_enable/disable
The warnings for static_key_enable/disable don't catch common errors.
For example, starting with a default enabled key and calling enable
doesn't cause a warning until the next enable or disable. Check
explicitly for zero or one instead of allowing both values in every
case. Generated code should be smaller too.

Signed-off-by: Chuck Ebbert <cebbert.l...@gmail.com>

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 7f653e8..ba9ca0c 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -225,7 +225,7 @@ static inline void static_key_enable(struct static_key *key)
 {
 	int count = static_key_count(key);
 
-	WARN_ON_ONCE(count < 0 || count > 1);
+	WARN_ON_ONCE(count);
 
 	if (!count)
 		static_key_slow_inc(key);
@@ -235,7 +235,7 @@ static inline void static_key_disable(struct static_key *key)
 {
 	int count = static_key_count(key);
 
-	WARN_ON_ONCE(count < 0 || count > 1);
+	WARN_ON_ONCE(count != 1);
 
 	if (count)
 		static_key_slow_dec(key);
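The difference between the old and new checks is easier to see as three predicates. This is a user-space sketch only: the function names are invented for illustration, and each one returns whether the corresponding WARN_ON_ONCE() condition in the patch would be true for a given key count.

```c
#include <stdbool.h>

/* Old condition, shared by enable and disable: only counts outside
 * 0..1 are flagged. */
static bool old_warns(int count)
{
	return count < 0 || count > 1;
}

/* Patched static_key_enable(): any non-zero count is flagged, i.e. the
 * key is expected to be disabled when enable is called. */
static bool new_enable_warns(int count)
{
	return count != 0;
}

/* Patched static_key_disable(): anything other than exactly one is
 * flagged, i.e. the key is expected to be enabled exactly once. */
static bool new_disable_warns(int count)
{
	return count != 1;
}
```

With a default-enabled key (count of 1), `old_warns(1)` is false while `new_enable_warns(1)` is true, so a redundant enable is reported immediately. The mirror case, disabling a key whose count is already 0, is likewise caught only by the new check.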
Re: ip_rcv_finish() NULL pointer and possibly related Oopses
On Wed, 26 Aug 2015 08:46:59 + Shaun Crampton wrote:

> Testing our app at scale on Google's GCE, running ~1000 CoreOS hosts: over
> approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses
> below and then reboot (I'm not sure if the different oopses are related to
> each other).
>
> The app is Project Calico, which is a datacenter networking fabric.
> calico-felix, the process named below, is our per-host agent. The
> per-host agent is responsible for reading the network information from a
> central server and applying "ip route" and "iptables" updates to the
> kernel. We're running on CoreOS, with about 100 docker containers/veths
> pairs running on each host. calico-felix is running inside one of those
> containers. We also run the BIRD BGP stack to redistribute routes around
> the datacenter. The errors happen more frequently while Calico is under
> load.
>
> I'm not sure where to go from here. I can reproduce these issues easily
> at that scale but I haven't managed to boil it down to a small-scale repro
> scenario for further investigation (yet).
>

What in the world is going on with those call traces? E.g.:

> [ 4513.712008] <IRQ>
> [ 4513.712008] [81486751] ? ip_rcv_finish+0x81/0x360
> [ 4513.712008] [814870e4] ip_rcv+0x2a4/0x400
> [ 4513.712008] [814866d0] ? inet_del_offload+0x40/0x40
> [ 4513.712008] [814491b3] __netif_receive_skb_core+0x6c3/0x9a0
> [ 4513.712008] [8143b667] ? build_skb+0x17/0x90
> [ 4513.712008] [814494a8] __netif_receive_skb+0x18/0x60
> [ 4513.712008] [81449523] netif_receive_skb_internal+0x33/0xa0
> [ 4513.712008] [814495ac] netif_receive_skb_sk+0x1c/0x70
> [ 4513.712008] [a00f772b] 0xa00f772b
> [ 4513.712008] [814491b3] ? __netif_receive_skb_core+0x6c3/0x9a0
> [ 4513.712008] [a00f7d81] 0xa00f7d81
> [ 4513.712008] [81449979] net_rx_action+0x159/0x340
> [ 4513.712008] [810715f4] __do_softirq+0xf4/0x290
> [ 4513.712008] [810719fd] irq_exit+0xad/0xc0
> [ 4513.712008] [815528ba] do_IRQ+0x5a/0xf0
> [ 4513.712008] [815507ae] common_interrupt+0x6e/0x6e
> [ 4513.712008] <EOI>

There are two functions in the call trace that the kernel knows nothing
about. How did they get in there?

And there is really executable code in there, as can be seen from a later
trace:

> [ 4123.003006] <IRQ>
> [ 4123.003006] [8147d477] nf_iterate+0x57/0x80
> [ 4123.003006] [8147d537] nf_hook_slow+0x97/0x100
> [ 4123.003006] [81486e32] ip_local_deliver+0x92/0xa0
> [ 4123.003006] [81486a30] ? ip_rcv_finish+0x360/0x360
> [ 4123.003006] [81486751] ip_rcv_finish+0x81/0x360
> [ 4123.003006] [814870e4] ip_rcv+0x2a4/0x400
> [ 4123.003006] [814866d0] ? inet_del_offload+0x40/0x40
> [ 4123.003006] [814491b3] __netif_receive_skb_core+0x6c3/0x9a0
> [ 4123.003006] [8143b667] ? build_skb+0x17/0x90
> [ 4123.003006] [814494a8] __netif_receive_skb+0x18/0x60
> [ 4123.003006] [81449523] netif_receive_skb_internal+0x33/0xa0
> [ 4123.003006] [814495ac] netif_receive_skb_sk+0x1c/0x70
> [ 4123.003006] [a00d472b] 0xa00d472b
> [ 4123.003006] [a00d4d81] 0xa00d4d81
> [ 4123.003006] [81449979] net_rx_action+0x159/0x340
> [ 4123.003006] [810715f4] __do_softirq+0xf4/0x290
> [ 4123.003006] [810719fd] irq_exit+0xad/0xc0
> [ 4123.003006] [815528ba] do_IRQ+0x5a/0xf0
> [ 4123.003006] [815507ae] common_interrupt+0x6e/0x6e
> [ 4123.003006] <EOI>
> [ 4123.003006] [81483a3d] ? __ip_route_output_key+0x31d/0x860
> [ 4123.003006] [814e2e95] ? xfrm_lookup_route+0x5/0x70
> [ 4123.003006] [81484224] ? ip_route_output_flow+0x54/0x60
> [ 4123.003006] [8148ca6a] ip_queue_xmit+0x36a/0x3d0
> [ 4123.003006] [814a4799] tcp_transmit_skb+0x4b9/0x990
> [ 4123.003006] [814a4d85] tcp_write_xmit+0x115/0xe90
> [ 4123.003006] [814a5d72] __tcp_push_pending_frames+0x32/0xd0
> [ 4123.003006] [8149443f] tcp_push+0xef/0x120
> [ 4123.003006] [81497cb5] tcp_sendmsg+0xc5/0xb20
> [ 4123.003006] [810d74c9] ? lock_hrtimer_base.isra.22+0x29/0x50
> [ 4123.003006] [814c2d04] inet_sendmsg+0x64/0xa0
> [ 4123.003006] [811e94b5] ? __fget_light+0x25/0x70
> [ 4123.003006] [8142d74d] sock_sendmsg+0x3d/0x50
> [ 4123.003006] [8142dc12] SYSC_sendto+0x102/0x1a0
> [ 4123.003006] [8110f864] ? __audit_syscall_entry+0xb4/0x110
> [ 4123.003006] [810224fc] ? do_audit_syscall_entry+0x6c/0x70
> [ 4123.003006] [81023cf3] ? syscall_trace_enter_phase1+0x103/0x160
> [ 4123.003006] [8142e75e] SyS_sendto+0xe/0x10
> [ 4123.003006] [8154fc6e] system_call_fastpath+0x12/0x71
> [ 4123.003006] Code: <48> 8b 88 40 03 00 00 e8 1d dd dd ff 5d c3 0f 1f 00
> 41 83 b9 80 00
> [ 4123.003006] RIP [a0233027] 0xa0233027
> [ 4123.003006] RSP

Presumably the same two functions as before (loaded at a different base
address but same offsets, 0xd81 and 0x72b). And then nf_iterate calls into
another unknown function, and there really is code there and it's
consistent with the oops. And the kernel thinks it's outside of any normal
text section, so it does not try to dump any code from before the
instruction pointer.

   0:	48 8b 88 40 03 00 00 	mov    0x340(%rax),%rcx
   7:	e8 1d dd dd ff       	callq  0xffffffffffdddd29
   c:	5d                   	pop    %rbp
   d:	c3                   	retq

Did you write your own module loader or something?
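The two bits of arithmetic in the message above can be checked with a small userspace sketch. It is illustrative only and not from the thread: the helper names page_offset() and call_rel32_target() are made up, the addresses are used exactly as printed in the traces, and it assumes 4 KiB pages and a page-aligned load address for the unknown code.

```c
#include <stdint.h>

/* Offset of an address within its 4 KiB page (assumption: 4 KiB pages). */
static inline uint64_t page_offset(uint64_t addr)
{
	return addr & 0xfffULL;
}

/*
 * Target of an x86-64 "call rel32" (opcode 0xe8).  The signed 32-bit
 * little-endian displacement is relative to the address of the next
 * instruction, i.e. the call's own address plus its 5-byte length.
 */
static inline uint64_t call_rel32_target(uint64_t insn_addr,
					 const uint8_t insn[5])
{
	uint32_t raw = (uint32_t)insn[1] |
		       ((uint32_t)insn[2] << 8) |
		       ((uint32_t)insn[3] << 16) |
		       ((uint32_t)insn[4] << 24);

	/* Sign-extend the displacement before adding it. */
	return insn_addr + 5 + (uint64_t)(int64_t)(int32_t)raw;
}
```

Fed the unknown return addresses from both traces, page_offset() gives 0x72b and 0xd81 each time, which is what suggests the same code loaded at a different base. Fed the bytes e8 1d dd dd ff at offset 7, call_rel32_target() gives the callq target shown in the disassembly, relative to the start of the dumped code.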
[PATCH -next] static-keys: Fix documentation
Fix some mistakes and typos, clean up text a bit.

Signed-off-by: Chuck Ebbert

diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt
index f4cb0b2..127391c 100644
--- a/Documentation/static-keys.txt
+++ b/Documentation/static-keys.txt
@@ -3,8 +3,8 @@
 
 DEPRECATED API:
 
-The use of 'struct static_key' directly, is now DEPRECATED. In addition
-static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following:
+The use of 'struct static_key' directly is now DEPRECATED. In addition
+static_key_{true,false}() is also DEPRECATED. I.e. DO NOT use the following:
 
 struct static_key false = STATIC_KEY_INIT_FALSE;
 struct static_key true = STATIC_KEY_INIT_TRUE;
@@ -15,8 +15,8 @@ The updated API replacements are:
 
 DEFINE_STATIC_KEY_TRUE(key);
 DEFINE_STATIC_KEY_FALSE(key);
-static_key_likely()
-statick_key_unlikely()
+static_branch_likely()
+static_branch_unlikely()
 
 0) Abstract
 
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 7f653e8..dd89266 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -9,8 +9,8 @@
  *
  * DEPRECATED API:
  *
- * The use of 'struct static_key' directly, is now DEPRECATED. In addition
- * static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following:
+ * The use of 'struct static_key' directly is now DEPRECATED. In addition
+ * static_key_{true,false}() is also DEPRECATED. I.e. DO NOT use the following:
  *
  * struct static_key false = STATIC_KEY_INIT_FALSE;
  * struct static_key true = STATIC_KEY_INIT_TRUE;
@@ -21,8 +21,8 @@
  *
  * DEFINE_STATIC_KEY_TRUE(key);
  * DEFINE_STATIC_KEY_FALSE(key);
- * static_key_likely()
- * statick_key_unlikely()
+ * static_branch_likely()
+ * static_branch_unlikely()
  *
  * Jump labels provide an interface to generate dynamic branches using
  * self-modifying code. Assuming toolchain and architecture support, if we
@@ -90,7 +90,7 @@ extern bool static_key_initialized;
 
 struct static_key {
 	atomic_t enabled;
-/* Set lsb bit to 1 if branch is default true, 0 ot */
+/* Set lsb to 1 if branch is default true, 0 otherwise */
 	struct jump_entry *entries;
 #ifdef CONFIG_MODULES
 	struct static_key_mod *next;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Hello everyone <3
On Sun, 16 Aug 2015 02:00:34 +0200 noi...@a6.25u.com wrote:

> Question: Wouldn't it be a good idea to enforce the Linux trademark
> (somewhen) in a way that all these streamlined operating systems use the
> word "Linux" more carefully (or not at all) in their promotional
> material? To make sure "correlation" isn't (deliberately) twisted into
> "causation" by the media /if/ the streamlining trend starts to cause
> serious regressions in transparency and reliability?
>
> Or is that too much politics for the weekend?

Concern troll is concerned.
Re: fs: out of bounds on stack in iov_iter_advance
On Wed, 12 Aug 2015 10:13:24 -0400 Sasha Levin wrote:

> While fuzzing with trinity inside a KVM tools guest running -next I've
> stumbled on the following:
>
> [64092.216447] ==
> [64092.217840] BUG: KASan: out of bounds on stack in iov_iter_advance+0x3b7/0x480 at addr 88040506fd48
> [64092.219314] Read of size 8 by task trinity-c194/11387
> [64092.220114] page:ea0010141bc0 count:0 mapcount:0 mapping: (null) index:0x2
> [64092.221354] flags: 0x46f8000()
> [64092.221998] page dumped because: kasan: bad access detected
> [64092.222879] CPU: 4 PID: 11387 Comm: trinity-c194 Not tainted 4.2.0-rc6-next-20150810-sasha-00040-g12ad0db3-dirty #2427
> [64092.224537] 88040506fd30 88040506fa88 9ce7763b 88040506fb10
> [64092.225763] 88040506fb00 9376b1be 880270108600
> [64092.226992] 0282
> [64092.228221] Call Trace:
> [64092.228679] dump_stack (lib/dump_stack.c:52)
> [64092.231252] kasan_report_error (mm/kasan/report.c:132 mm/kasan/report.c:193)
> [64092.232219] __asan_report_load8_noabort (mm/kasan/report.c:251)
> [64092.234167] iov_iter_advance (lib/iov_iter.c:511)
> [64092.235105] generic_file_read_iter (mm/filemap.c:1743)
> [64092.241532] blkdev_read_iter (fs/block_dev.c:1649)
> [64092.242448] __vfs_read (fs/read_write.c:423 fs/read_write.c:434)
> [64092.246949] vfs_read (fs/read_write.c:454)
> [64092.247743] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594)
> [64092.250445] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
> [64092.251440] Memory state around the buggy address:
> [64092.252221]  88040506fc00: 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f4 f4 f4 f3
> [64092.253340]  88040506fc80: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
> [64092.254456] >88040506fd00: 00 00 f1 f1 f1 f1 00 00 f4 f4 f2 f2 f2 f2 00 00
> [64092.255566]                                           ^
> [64092.256432]  88040506fd80: 00 00 00 f4 f4 f4 f2 f2 f2 f2 00 00 00 00 00 f4
> [64092.257557]  88040506fe00: f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
> [64092.258684] ==
>

I tried to debug this but kasan doesn't print much useful information for
stack out of bounds access. It shows the address that's being accessed but
it doesn't show the value of the boundary that was exceeded. And the stack
dump doesn't show any addresses either - just contents. It would be nice
to see a full stack frame dump showing where all the parent frames are
too.

Also, the file and line number (lib/iov_iter.c:511) are completely useless
because of inlining, though that's not kasan's fault.
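The boundary can at least be estimated by hand from the "Memory state" rows. A minimal userspace sketch follows, not from the thread; the helper names are made up. It assumes one shadow byte per 8 bytes of memory, which matches the 0x80 stride between 16-byte rows in the dump, and it uses the addresses as printed in the report.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one shadow byte describes 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT 3

/* Index, within one row of the dump, of the shadow byte covering addr. */
static inline size_t shadow_index_in_row(uint64_t row_base, uint64_t addr)
{
	return (size_t)((addr - row_base) >> SHADOW_SCALE_SHIFT);
}

/* First memory address described by shadow byte idx of a row. */
static inline uint64_t shadow_index_to_addr(uint64_t row_base, size_t idx)
{
	return row_base + ((uint64_t)idx << SHADOW_SCALE_SHIFT);
}
```

For the row starting at 88040506fd00 and the faulting address 88040506fd48 this gives index 9, whose shadow value is f4. The nearest addressable bytes (shadow 00) are at indexes 6 and 7, i.e. 88040506fd30 through 88040506fd3f, so the 8-byte read appears to start 8 bytes past the end of a 16-byte stack object. Reading f4 as a stack redzone marker is an assumption about the KASAN version in use.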
Re: Disable input device
On Sat, 29 Nov 2014 18:24:03 +0100 Pali Rohár wrote:

> What do you think about adding new sysfs file "disable" (accept
> values 1 or 0) for every input device? With "1" it cause that
> kernel will drop all events from specific input device and if
> driver provide some function is can be called (e.g. for power
> management or disabling device at hardware level).
>

Yeah, I'd like to see this too. I am using xinput to disable the notebook
keyboard so the cats walking across it don't cause any problems. It would
be nice to have a better solution.
Re: frequent lockups in 3.18rc4
On Fri, 5 Dec 2014 13:48:08 -0500 Dave Jones wrote:

> [ 1611.749570] [81007948] do_nmi+0xb8/0xf0
> [ 1611.750438] [8179dd2a] end_repeat_nmi+0x1e/0x2e
> [ 1611.751312] [810a12c8] ? preempt_count_add+0x18/0xb0
> [ 1611.752177] [810a12c8] ? preempt_count_add+0x18/0xb0
> [ 1611.753025] [810a12c8] ? preempt_count_add+0x18/0xb0
> [ 1611.753861] <<EOE>> [810fee07] is_module_text_address+0x17/0x50
> [ 1611.754734] [81092ab8] __kernel_text_address+0x58/0x80
> [ 1611.755575] [81006b5f] print_context_stack+0x8f/0x100
> [ 1611.756410] [81005540] dump_trace+0x140/0x370
> [ 1611.757242] [811e797f] ? getname_flags+0x4f/0x1a0
> [ 1611.758072] [811e797f] ? getname_flags+0x4f/0x1a0
> [ 1611.758895] [810137cb] save_stack_trace+0x2b/0x50
> [ 1611.759720] [811c29a0] set_track+0x70/0x140
> [ 1611.760541] [8178d993] alloc_debug_processing+0x92/0x118
> [ 1611.761366] [8178e5d6] __slab_alloc+0x45f/0x56f
> [ 1611.762195] [811e797f] ? getname_flags+0x4f/0x1a0
> [ 1611.763024] [8178dd57] ? __slab_free+0x114/0x309
> [ 1611.763853] [8137187e] ? debug_check_no_obj_freed+0x17e/0x270
> [ 1611.764712] [811e797f] ? getname_flags+0x4f/0x1a0
> [ 1611.765539] [811c6b26] kmem_cache_alloc+0x1f6/0x270

So, every time there is a slab allocation the entire stack trace gets
saved as human readable text. And for each line in the trace,
is_module_text_address() can be called, which has huge overhead walking
the entire list of loaded modules. No wonder there are timeouts...
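To put a rough shape on that cost, here is a toy userspace model of a per-address linear walk over module ranges. It is a sketch under stated assumptions, not the kernel's implementation: struct mod_range, addr_in_modules() and walk_cost() are hypothetical names, and the real lookup has additional fast paths this ignores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one loaded module's text range. */
struct mod_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Linear search over every module range, counting how many ranges were
 * examined.  A miss always examines the whole table.
 */
static inline bool addr_in_modules(const struct mod_range *mods, size_t nmods,
				   uint64_t addr, size_t *examined)
{
	for (size_t i = 0; i < nmods; i++) {
		(*examined)++;
		if (addr >= mods[i].start &&
		    addr - mods[i].start < mods[i].size)
			return true;
	}
	return false;
}

/* Total ranges examined while checking every word of a saved stack. */
static inline size_t walk_cost(const struct mod_range *mods, size_t nmods,
			       const uint64_t *stack, size_t nwords)
{
	size_t examined = 0;

	for (size_t i = 0; i < nwords; i++)
		addr_in_modules(mods, nmods, stack[i], &examined);
	return examined;
}
```

In this model the work per allocation grows as (stack words checked) x (modules loaded) whenever the words miss, which is the pattern described above.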
Re: [PATCH v2] all arches, signal: Move restart_block to struct task_struct
Should the completely pointless supervisor_stack[] be removed as well? I
had a patch to do that but I never sent it.
[PATCH] x86: fix disabling XSAVE feature when CPUID level is capped
When the x86 XSAVE CPU feature is disabled because of capped CPUID level,
disable the dependent features the same way as the "noxsave" command line
option. Without this fix, the raid6 code oopses in the speed test when it
tries to test the AVX functions with a capped CPUID level.

Signed-off-by: Chuck Ebbert
---
Compile tested only. I don't have an AVX-capable machine to test on.

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cfa9b5b..f668e09 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -144,15 +144,26 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
 } };
 EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
 
+/* CPU features that must be cleared when disabling XSAVE */
+static const u32 xsave_features[] = {
+	X86_FEATURE_XSAVE,
+	X86_FEATURE_XSAVEOPT,
+	X86_FEATURE_XSAVES,
+	X86_FEATURE_AVX,
+	X86_FEATURE_AVX2,
+	0
+};
+
 static int __init x86_xsave_setup(char *s)
 {
+	const u32 *feature;
+
 	if (strlen(s))
 		return 0;
-	setup_clear_cpu_cap(X86_FEATURE_XSAVE);
-	setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT);
-	setup_clear_cpu_cap(X86_FEATURE_XSAVES);
-	setup_clear_cpu_cap(X86_FEATURE_AVX);
-	setup_clear_cpu_cap(X86_FEATURE_AVX2);
+
+	for (feature = xsave_features; *feature; feature++)
+		setup_clear_cpu_cap(*feature);
+
 	return 1;
 }
 __setup("noxsave", x86_xsave_setup);
@@ -308,19 +319,39 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 /*
  * Some CPU features depend on higher CPUID levels, which may not always
  * be available due to CPUID level capping or broken virtualization
- * software. Add those features to this table to auto-disable them.
+ * software. Add those to cpuid_dependent_features[] to auto-disable them.
+ *
+ * Disabling some features may require additional actions like disabling
+ * dependent features as well. Add functions below to do this and set
+ * clear_fn if needed.
  */
 struct cpuid_dependent_feature {
+	/* CPU feature dependent on cpuid level. */
 	u32 feature;
+
+	/* Above feature requires at least this cpuid level. */
 	u32 level;
+
+	/* Function to call instead of just disabling feature. */
+	void (*clear_fn)(struct cpuinfo_x86 *);
 };
 
+static void clear_xsave(struct cpuinfo_x86 *c)
+{
+	const u32 *feature;
+
+	for (feature = xsave_features; *feature; feature++) {
+		if (cpu_has(c, *feature))
+			clear_cpu_cap(c, *feature);
+	}
+}
+
 static const struct cpuid_dependent_feature
 cpuid_dependent_features[] = {
-	{ X86_FEATURE_MWAIT,		0x0005 },
-	{ X86_FEATURE_DCA,		0x0009 },
-	{ X86_FEATURE_XSAVE,		0x000d },
-	{ 0, 0 }
+	{ X86_FEATURE_MWAIT,		0x0005, NULL },
+	{ X86_FEATURE_DCA,		0x0009, NULL },
+	{ X86_FEATURE_XSAVE,		0x000d, clear_xsave },
+	{ 0, 0, NULL }
 };
 
 static void filter_cpuid_features(struct cpuinfo_x86 *c, bool warn)
@@ -343,7 +374,11 @@ static void filter_cpuid_features(struct cpuinfo_x86 *c, bool warn)
 		     (s32)df->level > (s32)c->cpuid_level))
 			continue;
 
-		clear_cpu_cap(c, df->feature);
+		if (df->clear_fn)
+			df->clear_fn(c);
+		else
+			clear_cpu_cap(c, df->feature);
+
 		if (!warn)
 			continue;
 
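The table-plus-callback pattern the patch introduces can be tried outside the kernel with a small userspace model. This is a sketch only: the FEAT_* numbers, struct cpu, and all function names below are hypothetical stand-ins, not kernel definitions, and the level values are just examples. As in the patch, 0 terminates the lists, so no real feature may use number 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature numbers; 0 is reserved as the list terminator. */
enum { FEAT_MWAIT = 1, FEAT_XSAVE, FEAT_XSAVEOPT, FEAT_AVX, FEAT_AVX2 };

struct cpu {
	uint32_t level;	/* highest supported "cpuid level" */
	uint32_t caps;	/* one bit per feature number */
};

static inline int has_cap(const struct cpu *c, uint32_t f)
{
	return (c->caps >> f) & 1u;
}

static inline void clear_cap(struct cpu *c, uint32_t f)
{
	c->caps &= ~(1u << f);
}

/* Features that go away together with XSAVE, zero-terminated. */
static const uint32_t xsave_group[] = {
	FEAT_XSAVE, FEAT_XSAVEOPT, FEAT_AVX, FEAT_AVX2, 0
};

static void clear_xsave_group(struct cpu *c)
{
	for (const uint32_t *f = xsave_group; *f; f++)
		clear_cap(c, *f);
}

struct dependent_feature {
	uint32_t feature;
	uint32_t level;			/* minimum level the feature needs */
	void (*clear_fn)(struct cpu *);	/* optional: clears a whole group */
};

static const struct dependent_feature dependent_features[] = {
	{ FEAT_MWAIT, 0x5, NULL },
	{ FEAT_XSAVE, 0xd, clear_xsave_group },
	{ 0, 0, NULL }
};

/* Drop every feature whose required level exceeds what the CPU reports. */
static inline void filter_features(struct cpu *c)
{
	const struct dependent_feature *df;

	for (df = dependent_features; df->feature; df++) {
		if (!has_cap(c, df->feature) || df->level <= c->level)
			continue;
		if (df->clear_fn)
			df->clear_fn(c);
		else
			clear_cap(c, df->feature);
	}
}
```

With the level capped at 0x5 in this model, MWAIT survives while XSAVE and everything listed in xsave_group are cleared in one step, which is the behaviour the patch is after for AVX.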
Re: [PATCH 3.17 00/25] 3.17.1-stable review
On Mon, 13 Oct 2014 08:28:35 -0300 Henrique de Moraes Holschuh wrote:

> On Mon, 13 Oct 2014, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.17.1
> > release. There are 25 patches in this series, all will be posted
> > as a response to this one. If anyone has any issues with these
> > being applied, please let me know.
>
> Actually, 3.17 is a filesystem killer for some unlucky users due to
> the libata blacklist being disabled by a bug.
>
> Patch:
> https://git.kernel.org/cgit/linux/kernel/git/tj/libata.git/patch/?id=1c40279960bcd7d52dbdf1d466b20d24b99176c8
> 1c40279960bcd7d52dbdf1d466b20d24b99176c8 (libata: Un-break ATA
> blacklist)
>

There are some more bugs too:

RCU stalls, fix has been in rcu/urgent for 7 days:
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=rcu/urgent&id=789cbbeca4eb7141cbd748ee93772471101b507b

blk-mq bio merging (still no good fix available):
https://lkml.org/lkml/2014/10/9/729
(The fix sitting in the block tree:
https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/commit/?h=for-3.18/core&id=764f612c6c3c231b9c12cfae7c328ccc9c453258
is wrong according to that message.)

Stalls when using nohz, requires series of six patches to fix that didn't
make 3.17 and weren't marked for -stable:

From a80e49e2cc3145af014a8ae44f575829cc236192 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker
Subject: nohz: Move nohz full init call to tick init

From c5c38ef3d70377dc504a6a3f611a3ec814bc757b Mon Sep 17 00:00:00 2001
From: Peter Zijlstra
Subject: irq_work: Introduce arch_irq_work_has_interrupt()

From 76a33061b9323b7fdb220ae5fa116c10833ec22e Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker
Subject: irq_work: Force raised irq work to run on irq work interrupt

From 3010279f0fc36f0388872203e63ca49912f648fd Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker
Subject: x86: Tell irq work about self IPI support

From 09f6edd424218eb69078551b2ecfada1f2d098eb Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker
Subject: arm: Tell irq work about self IPI support

From 3631073659d0aafeaa52227bb61a100efaf901dc Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker
Subject: arm64: Tell irq work about self IPI support
Re: CRASH during boot 3.16.3+
On Mon, 13 Oct 2014 12:14:40 +0200
Udo van den Heuvel wrote:

> On 2014-10-12 19:42, Peter Hurley wrote:
> > On 10/12/2014 09:57 AM, Udo van den Heuvel wrote:
> >> The problem:
> >> During the first few seconds of bootup the kernel gets into some sort of
> >> loop and rapidly prints loads of register-like things and then a load of
> >> rubbish.
> >> I did `make clean` and then a rebuild etc but this did not help.
> >>
> >> How can I capture the logging to find the point where things go wrong?
> >> How can I find out what is wrong?
> >
> > Start with git bisect between good=3.16.2 and bad=3.16.3.
> > And dmesg from 3.16.2.
>
> Incomplete dmesg attached for 3.16.2.
>
> > What happens after the 'then a load of rubbish.'?
>
> I press reset as this is unexpected and looks like it will not result in
> a successful boot.
>
> > And rubbish is not very descriptive. Please include a sample, if you
> > can't catch all the console prints.
>
> How can I capture the output easily?

Add "boot_delay=3000" to the kernel command line. This will add a
3 second delay between each line. Then take pictures of the screen
while it boots. And try to time the shots so you take them during
the delays.
Re: [PATCH] x86: Clean up stack access code in irq_32.c
On Sun, 12 Oct 2014 10:13:33 -0700
"H. Peter Anvin" wrote:

> That's not a justification for change. Claiming no harm is necessary but not
> sufficient.
>

The optimization is also a little better with GCC when using C instead
of asm for current_stack_pointer. Probably not enough better to do
different macros for gcc and other compilers though.

clang actually moves %esp to memory and then into another register
instead of moving it directly when using the current macro. Their
optimizer really needs some work...

> On October 12, 2014 9:53:32 AM PDT, Chuck Ebbert wrote:
> >On Sun, 12 Oct 2014 09:47:53 -0700
> >"H. Peter Anvin" wrote:
> >
> >[replying to the list this time, sigh]
> >
> >> We changed this to asm because the C broke some compilers. Why are
> >> you changing it back?
> >>
> >
> >The C broke some compilers because it was using a global register
> >variable. This is a local one, which the clang documentation says is
> >supported. And I compiled it with clang with no problem.
> >
> >> On October 12, 2014 9:43:53 AM PDT, Chuck Ebbert wrote:
> >> >Use C instead of asm for accessing the stack pointer. And define some
> >> >macros to make the code easier to understand.
> >> >
Re: [PATCH] x86: Clean up stack access code in irq_32.c
On Sun, 12 Oct 2014 12:00:03 -0500
Jeff Epler wrote:

> It looks like the proposed variant still miscompiles in clang 3.4 and 3.5, the
> two versions I had handy to test.
>
> I extracted your code to a simple standalone C translation unit and
> inspected various compilers' results via objdump.
>

Wow, my little test program below worked with clang by accident. I was
building it with both printf() calls enabled and it printed out the
same results on both output lines. But commenting out the first line
reveals that it simply leaves whatever junk is on the stack there for
the first arg when it calls printf().

#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>
#include <unistd.h>

#define current_stack_pointer ({		\
	unsigned long sp;			\
	asm("mov %%esp,%0" : "=g" (sp));	\
	sp;					\
})

#define current_stack_pointer2 ({		\
	register unsigned long sp asm("esp");	\
	sp;					\
})

int main(int argc, char **argv)
{
//	printf("%X %X\n", current_stack_pointer , __builtin_frame_address(0));
	printf("%X %X\n", current_stack_pointer2, __builtin_frame_address(0));
	return 0;
}
Re: [PATCH] x86: Clean up stack access code in irq_32.c
On Sun, 12 Oct 2014 09:47:53 -0700
"H. Peter Anvin" wrote:

[replying to the list this time, sigh]

> We changed this to asm because the C broke some compilers. Why are you
> changing it back?
>

The C broke some compilers because it was using a global register
variable. This is a local one, which the clang documentation says is
supported. And I compiled it with clang with no problem.

> On October 12, 2014 9:43:53 AM PDT, Chuck Ebbert wrote:
> >Use C instead of asm for accessing the stack pointer. And define some
> >macros to make the code easier to understand.
> >
[PATCH] x86: Clean up stack access code in irq_32.c
Use C instead of asm for accessing the stack pointer. And define some
macros to make the code easier to understand.

Signed-off-by: Chuck Ebbert

diff --git a/arch/x86/include/asm/page_32_types.h b/arch/x86/include/asm/page_32_types.h
index f48b17d..a8ca0cb 100644
--- a/arch/x86/include/asm/page_32_types.h
+++ b/arch/x86/include/asm/page_32_types.h
@@ -19,6 +19,8 @@
 
 #define THREAD_SIZE_ORDER	1
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
+#define THREAD_SIZE_MASK	(THREAD_SIZE - 1)
+#define CURRENT_MASK		(~THREAD_SIZE_MASK)
 
 #define STACKFAULT_STACK 0
 #define DOUBLEFAULT_STACK 1
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6782051..ded89b0 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -2,8 +2,9 @@
 #define _ASM_X86_PAGE_64_DEFS_H
 
 #define THREAD_SIZE_ORDER	2
-#define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
-#define CURRENT_MASK (~(THREAD_SIZE - 1))
+#define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
+#define THREAD_SIZE_MASK	(THREAD_SIZE - 1)
+#define CURRENT_MASK		(~THREAD_SIZE_MASK)
 
 #define EXCEPTION_STACK_ORDER 0
 #define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index 63ce838..bef90fc 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -27,6 +27,12 @@ EXPORT_PER_CPU_SYMBOL(irq_stat);
 DEFINE_PER_CPU(struct pt_regs *, irq_regs);
 EXPORT_PER_CPU_SYMBOL(irq_regs);
 
+/* how to get the current stack pointer from C */
+#define current_stack_pointer ({		\
+	register unsigned long sp asm("esp");	\
+	sp;					\
+})
+
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
 
 int sysctl_panic_on_stackoverflow __read_mostly;
@@ -34,12 +40,8 @@ int sysctl_panic_on_stackoverflow __read_mostly;
 /* Debugging check for stack overflow: is there less than 1KB free? */
 static int check_stack_overflow(void)
 {
-	long sp;
-
-	__asm__ __volatile__("andl %%esp,%0" :
-			     "=r" (sp) : "0" (THREAD_SIZE - 1));
-
-	return sp < (sizeof(struct thread_info) + STACK_WARN);
+	return (current_stack_pointer & THREAD_SIZE_MASK)
+		< sizeof(struct thread_info) + STACK_WARN;
 }
 
 static void print_stack_overflow(void)
@@ -69,16 +71,9 @@ static void call_on_stack(void *func, void *stack)
 		     : "memory", "cc", "edx", "ecx", "eax");
 }
 
-/* how to get the current stack pointer from C */
-#define current_stack_pointer ({		\
-	unsigned long sp;			\
-	asm("mov %%esp,%0" : "=g" (sp));	\
-	sp;					\
-})
-
 static inline void *current_stack(void)
 {
-	return (void *)(current_stack_pointer & ~(THREAD_SIZE - 1));
+	return (void *)(current_stack_pointer & CURRENT_MASK);
 }
 
 static inline int
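For anyone who wants to see what the two masks in this patch compute, here is a small user-space sketch. PAGE_SIZE and THREAD_SIZE_ORDER are set to assumed 32-bit values, and the helper names (stack_offset, stack_base, stack_nearly_full) are made up for illustration; in the kernel the reserve would be sizeof(struct thread_info) + STACK_WARN.

```c
#include <assert.h>

/* Assumed values for illustration only (4 KB pages, order-1 stacks). */
#define PAGE_SIZE		4096UL
#define THREAD_SIZE_ORDER	1
#define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
#define THREAD_SIZE_MASK	(THREAD_SIZE - 1)
#define CURRENT_MASK		(~THREAD_SIZE_MASK)

/* Bytes between sp and the lowest address of its stack. */
unsigned long stack_offset(unsigned long sp)
{
	return sp & THREAD_SIZE_MASK;
}

/* Lowest address of the THREAD_SIZE-aligned stack that contains sp. */
unsigned long stack_base(unsigned long sp)
{
	return sp & CURRENT_MASK;
}

/* Nonzero when fewer than `reserve` bytes are left below sp. */
int stack_nearly_full(unsigned long sp, unsigned long reserve)
{
	assert(reserve <= THREAD_SIZE);
	return stack_offset(sp) < reserve;
}
```

For a stack pointer of 0xc1234f80 this gives a base of 0xc1234000 and an offset of 0xf80, so a 1 KB reserve is not yet violated; at 0xc1234100 only 0x100 bytes remain and the check fires.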
Re: [sched] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
On Sat, 11 Oct 2014 13:15:30 +0800
Fengguang Wu wrote:

> FYI, we noticed the below changes on commit
>
> 445d95d7c384741d133251a9adac935866591c92 ("sched: Remove
> update_rq_runnable_avg")
>
> [   67.303839] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> [   67.304014] IP: [<810b1d52>] print_cfs_rq+0x4a3/0xa96

Well that one's pretty obvious:

--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -68,14 +68,6 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 #define PN(F) \
 	SEQ_printf(m, "  .%-30s: %lld.%06ld\n", #F, SPLIT_NS((long long)F))
 
-	if (!se) {
-		struct sched_avg *avg = &cpu_rq(cpu)->avg;
-		P(avg->runnable_avg_sum);
-		P(avg->runnable_avg_period);
-		return;
-	}
-
-
 	PN(se->exec_start);
 	PN(se->vruntime);
 	PN(se->sum_exec_runtime);

You can remove the P() calls from that if statement, but you can't
remove the whole thing because you will try to dereference a NULL se
immediately afterward if you do.
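A stripped-down illustration of the point, in user-space C. The struct and function names here (entity_stats, format_entity_stats) are hypothetical and only mimic the shape of print_cfs_group_stats(): the early return has to stay even when it no longer prints anything, because the code after it dereferences se unconditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for struct sched_entity. */
struct entity_stats {
	long long exec_start;
	long long vruntime;
};

/*
 * Format the per-entity stats into buf. Returns the number of characters
 * written, or 0 when there is no entity to report.
 */
int format_entity_stats(char *buf, size_t len, const struct entity_stats *se)
{
	assert(buf != NULL);

	/* Keep the guard: everything below dereferences se. */
	if (!se)
		return 0;

	return snprintf(buf, len, "exec_start=%lld vruntime=%lld",
			se->exec_start, se->vruntime);
}
```

Dropping the `if (!se)` block entirely would turn the NULL case into the same kind of oops as in the report above.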
Re: [PATCH 4/4] x86: Use the page tables to look up kernel addresses in backtrace
On Fri, 10 Oct 2014 16:25:17 -0700
Andi Kleen wrote:

> From: Andi Kleen
>
> On my workstation which has a lot of modules loaded:
>
> $ lsmod | wc -l
> 80
>
> backtrace from the NMI for perf record -g can take a quite long time.
>
> This leads to frequent messages like:
> perf interrupt took too long (7852 > 7812), lowering kernel.perf_event_max_sample_rate to 16000
>
> One larger part of the PMI cost is each text address check during
> the backtrace taking up to 3us, like this:
>
>  1)               |  print_context_stack_bp() {
>  1)               |    __kernel_text_address() {
>  1)               |      is_module_text_address() {
>  1)               |        __module_text_address() {
>  1)   1.611 us    |          __module_address();
>  1)   1.939 us    |        }
>  1)   2.296 us    |      }
>  1)   2.659 us    |    }
>  1)               |    __kernel_text_address() {
>  1)               |      is_module_text_address() {
>  1)               |        __module_text_address() {
>  1)   0.724 us    |          __module_address();
>  1)   1.064 us    |        }
>  1)   1.430 us    |      }
>  1)   1.798 us    |    }
>  1)               |    __kernel_text_address() {
>  1)               |      is_module_text_address() {
>  1)               |        __module_text_address() {
>  1)   0.656 us    |          __module_address();
>  1)   1.012 us    |        }
>  1)   1.356 us    |      }
>  1)   1.761 us    |    }
>
> So just with a reasonably sized backtrace easily 10-20us can be spent
> on just checking the frame pointer IPs.
>
> So essentially currently the module lookup is N-MODULES*M-length of backtrace
>
> This patch uses the NX bits in the page tables to check for
> valid kernel addresses instead. This can be done in any context
> because kernel page tables are not removed (if they were it could
> be handled by RCU like the user page tables)
>
> The lookup here is 2-4 memory accesses bounded.
>
> Anything with no NX bit set and is in kernel space is a valid
> kernel executable. Unlike the previous scheme this will also
> handle cases like the profiler hitting BIOS code or similar
> (e.g. the PCI BIOS on 32bit)
>
> On systems without NX we fall back to the previous scheme.
>
> Signed-off-by: Andi Kleen
> ---
>  arch/x86/kernel/dumpstack.c | 38 +-
>  1 file changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index b74ebc7..9279549 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -90,6 +90,42 @@ static inline int valid_stack_ptr(struct thread_info *tinfo,
>  	return p > t && p < t + THREAD_SIZE - size;
>  }
>
> +/*
> + * Check if the address is in a executable page.
> + * This can be much faster than looking it up in the module
> + * table list when many modules are loaded.
> + *
> + * This is safe in any context because kernel page tables
> + * are never removed.
> + */
> +static bool addr_is_executable(unsigned long addr)

That name is confusing. Maybe call it x86_kernel_text_address()?

> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +
> +	if (!(__supported_pte_mask & _PAGE_NX))
> +		return __kernel_text_address(addr);
> +	if (addr < __PAGE_OFFSET)
> +		return false;

Can't you check __PAGE_OFFSET first? That would speed up the non-NX
case...

> +	pgd = pgd_offset_k(addr);
> +	if (!pgd_present(*pgd))
> +		return false;
> +	pud = pud_offset(pgd, addr);
> +	if (!pud_present(*pud))
> +		return false;
> +	if (pud_large(*pud))
> +		return pte_exec(*(pte_t *)pud);
> +	pmd = pmd_offset(pud, addr);
> +	if (!pmd_present(*pmd))
> +		return false;
> +	if (pmd_large(*pmd))
> +		return pte_exec(*(pte_t *)pmd);
> +	pte = pte_offset_kernel(pmd, addr);
> +	return pte_present(*pte) && pte_exec(*pte);
> +}
> +
>  unsigned long
>  print_context_stack(struct thread_info *tinfo,
>  		unsigned long *stack, unsigned long bp,
> @@ -102,7 +138,7 @@ print_context_stack(struct thread_info *tinfo,
>  		unsigned long addr;
>
>  		addr = *stack;
> -		if (__kernel_text_address(addr)) {
> +		if (addr_is_executable(addr)) {
>  			if ((unsigned long) stack == bp + sizeof(long)) {
>  				ops->address(data, addr, 1);
>  				frame = frame->next_frame;
Re: acpitool - /proc/acpi/wakeup
On Thu, 9 Oct 2014 22:16:11 +0200
Frans Klaver wrote:

> On Thu, Oct 9, 2014 at 9:58 PM, Marc Burkhardt wrote:
> >
> >
> >>On Thu, Oct 9, 2014 at 9:42 PM, Marc Burkhardt
> >>wrote:
> >>> I upgraded from 3.10 on that machine. 3.12 didn't work for me due to
> >>> a hibernation bug. The rest was left out... :/
> >>
> >>If you still have the 3.12 kernel around, could you test if acpitool
> >>-e worked there?
> >
> > Let me ask you a question: does it make sense to test 3.12 again because
> > you know there's something changed regarding /proc/acpi/... or because it's
> > the kernel I broke up on upgrading?
>
> Never mind. It broke after 3.14. I'll bisect.
>

The below patch fixes it for me. Looks like the line sizes changed and
some are now exactly the right length to make it loop forever reading
/proc/acpi/wakeup:

--- a/src/acpitool.cpp
+++ b/src/acpitool.cpp
@@ -417,7 +417,7 @@ int Do_Fan_Info(int verbose)
 int Show_WakeUp_Devices(int verbose)
 {
     ifstream file_in;
-    char *filename, str[40];
+    char *filename, str[80];

     filename = "/proc/acpi/wakeup";

@@ -438,13 +438,13 @@ int Show_WakeUp_Devices(int verbose)
     }
     else
     {
-	file_in.getline(str, 40);       // first line are just headers //
+	file_in.getline(str, 80);       // first line are just headers //
	cout<<" "
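Some background on why a too-small buffer can hang such a loop: as far as I know, istream::getline() sets failbit when the buffer fills up before the newline is found, and after that every further read fails without ever reaching EOF. The sketch below is not acpitool code and swaps in a different technique: it does the same line-by-line read in C with fgets(), where an over-long line simply arrives in several chunks. count_lines() is a made-up helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the lines in `in`, reading through a buffer of `cap` bytes.
 * A line longer than the buffer arrives in several fgets() chunks; only
 * the chunk ending in '\n' (or the last chunk before EOF) completes it.
 */
size_t count_lines(FILE *in, char *buf, size_t cap)
{
	size_t lines = 0;
	int partial = 0;

	assert(in != NULL && buf != NULL && cap > 1);

	while (fgets(buf, (int)cap, in)) {
		size_t n = strlen(buf);

		if (n > 0 && buf[n - 1] == '\n') {
			lines++;
			partial = 0;
		} else {
			partial = 1;	/* line continues in the next chunk */
		}
	}
	/* A final line without a trailing newline still counts. */
	return lines + (partial ? 1 : 0);
}
```

The loop ends when fgets() returns NULL, so the buffer size only affects how many chunks a line takes, not whether the loop terminates.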
Re: i915.ko WC writes are slow after ea8596bb2d8d379
On Thu, 9 Oct 2014 14:00:47 +0100
Chris Wilson wrote:

> On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote:
> > Could you try installing x86info and running "x86info --mtrr
> > --all-cpus" while running the broken kernel?
>
> # /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed
> IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64)
> Time to read 16k through a GTT map:          318.643µs
> Time to write 16k through a GTT map:         203.103µs
> Time to clear 16k through a GTT map:          53.098µs
> Time to clear 16k through a cached GTT map:   49.925µs
>
> (i.e. bad kernel)
>
> # x86info --mtrr --all-cpus
> x86info v1.30. Dave Jones 2001-2011
> Feedback to .
>
> Found 4 CPUs.
> CPU #1:
> Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7
> Type: 0 (Original OEM)
> CPU Model (x86info's best guess): Unknown model.
> Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz
>
> MTRR registers:
> MTRRcap (0xfe): 0x0d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10))
> MTRRphysBase0 (0x200): 0x0006 (physbase field:0x00 type field: 0x06 (write-back))
> MTRRphysMask0 (0x201): 0x000f8800 (physmask field:0xf8 valid flag: 1)
> MTRRphysBase1 (0x202): 0x8006 (physbase field:0x08 type field: 0x06 (write-back))
> MTRRphysMask1 (0x203): 0x000ff800 (physmask field:0xff valid flag: 1)
> MTRRphysBase2 (0x204): 0x8e00 (physbase field:0x08e000 type field: 0x00 (uncacheable))
> MTRRphysMask2 (0x205): 0x000ffe000800 (physmask field:0xffe000 valid flag: 1)
> MTRRphysBase3 (0x206): 0x8d00 (physbase field:0x08d000 type field: 0x00 (uncacheable))
> MTRRphysMask3 (0x207): 0x000fff000800 (physmask field:0xfff000 valid flag: 1)
> MTRRphysBase4 (0x208): 0x00010006 (physbase field:0x10 type field: 0x06 (write-back))
> MTRRphysMask4 (0x209): 0x000f8800 (physmask field:0xf8 valid flag: 1)
> MTRRphysBase5 (0x20a): 0x00017000 (physbase field:0x17 type field: 0x00 (uncacheable))
> MTRRphysMask5 (0x20b): 0x000ff800 (physmask field:0xff valid flag: 1)
> MTRRphysBase6 (0x20c): 0x00016f00 (physbase field:0x16f000 type field: 0x00 (uncacheable))
> MTRRphysMask6 (0x20d): 0x000fff000800 (physmask field:0xfff000 valid flag: 1)
> MTRRphysBase7 (0x20e): 0x00016e80 (physbase field:0x16e800 type field: 0x00 (uncacheable))
> MTRRphysMask7 (0x20f): 0x000fff800800 (physmask field:0xfff800 valid flag: 1)
> MTRRfix64K_0 (0x250): 0x0606060606060606
> MTRRfix16K_8 (0x258): 0x0606060606060606
> MTRRfix16K_A (0x259): 0x
> MTRRfix4K_C8000 (0x269): 0x0505050505050505
> MTRRfix4K_D 0x26a: 0x
> MTRRfix4K_D8000 0x26b: 0x
> MTRRfix4K_E 0x26c: 0x
> MTRRfix4K_E8000 0x26d: 0x0505050505050505
> MTRRfix4K_F 0x26e: 0x0505050505050505
> MTRRfix4K_F8000 0x26f: 0x0505050505050505
> MTRRdefType (0x2ff): 0x0c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable))
>

Well they're all the same.

Hmm, x86info is not dumping all the variable MTRRs. You have 10, but
it only prints the first 8. I don't know if it will show anything
different, but can you try fixing it with this patch?
--- a/mtrr.c +++ b/mtrr.c @@ -75,19 +75,23 @@ printf("0x%016llx\n", val); } -static void decode_mtrrcap(int cpu, int msr) +unsigned int decode_mtrrcap(int cpu, int msr) { unsigned long long val; + unsigned int vcnt = 0; int ret; ret = mtrr_value(cpu,msr,); if (ret) { + vcnt = (unsigned int)(val & IA32_MTRRCAP_VCNT); printf("0x%016llx ", val); printf("(smrr flag: 0x%01x, ",(unsigned int) (val & IA32_MTRRCAP_SMRR) >> 11 ); printf("wc flag: 0x%01x, ",(unsigned int) (val_MTRRCAP_WC) >> 10); printf("fix flag: 0x%01x, ",(unsigned int) (val_MTRRCAP_FIX) >> 8); - printf("vcnt field: 0x%02x (%d))\n",(unsigned int) (val_MTRRCAP_VCNT) , (int) (val_MTRRCAP_VCNT)); + printf("vcnt field: 0x%02x (%u))\n", vcnt, vcnt); } + + return vcnt; } static void decode_mtrr_deftype(int cpu, int msr) @@ -142,7 +146,7 @@ void dump_mtrrs(struct cpudata *cpu) { unsigned long long val = 0; - unsigned int i; + unsigned int i, vcnt; if (!(cpu->flags_edx & (X86_FEATURE_MTRR))) return; @@ -157,11 +161,11 @@ printf("M
Re: i915.ko WC writes are slow after ea8596bb2d8d379
On Thu, 9 Oct 2014 07:53:31 +0100 Chris Wilson wrote:

> # cat /proc/mtrr
> reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
> reg01: base=0x08000 ( 2048MB), size= 256MB, count=1: write-back
> reg02: base=0x08e00 ( 2272MB), size= 32MB, count=1: uncachable
> reg03: base=0x08d00 ( 2256MB), size= 16MB, count=1: uncachable
> reg04: base=0x1 ( 4096MB), size= 2048MB, count=1: write-back
> reg05: base=0x17000 ( 5888MB), size= 256MB, count=1: uncachable
> reg06: base=0x16f00 ( 5872MB), size= 16MB, count=1: uncachable
> reg07: base=0x16e80 ( 5864MB), size=8MB, count=1: uncachable
> reg08: base=0x16e60 ( 5862MB), size=2MB, count=1: uncachable
>

Well that's what the kernel thinks is in every CPU. Could you try installing x86info and running "x86info --mtrr --all-cpus" while running the broken kernel?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: i915.ko WC writes are slow after ea8596bb2d8d379
On Thu, 9 Oct 2014 14:00:47 +0100 Chris Wilson ch...@chris-wilson.co.uk wrote: On Thu, Oct 09, 2014 at 07:44:16AM -0500, Chuck Ebbert wrote: Could you try installing x86info and running x86info --mtrr --all-cpus while running the broken kernel? # /opt/xorg/src/intel-gpu-tools/tests/gem_gtt_speed IGT-Version: 1.8-g32a0308 (x86_64) (Linux: 3.17.0+ x86_64) Time to read 16k through a GTT map: 318.643µs Time to write 16k through a GTT map:203.103µs Time to clear 16k through a GTT map: 53.098µs Time to clear 16k through a cached GTT map: 49.925µs (i.e. bad kernel) # x86info --mtrr --all-cpus x86info v1.30. Dave Jones 2001-2011 Feedback to da...@redhat.com. Found 4 CPUs. CPU #1: Extended Family: 0 Extended Model: 2 Family: 6 Model: 42 Stepping: 7 Type: 0 (Original OEM) CPU Model (x86info's best guess): Unknown model. Processor name string (BIOS programmed): Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz MTRR registers: MTRRcap (0xfe): 0x0d0a (smrr flag: 0x1, wc flag: 0x1, fix flag: 0x1, vcnt field: 0x0a (10)) MTRRphysBase0 (0x200): 0x0006 (physbase field:0x00 type field: 0x06 (write-back)) MTRRphysMask0 (0x201): 0x000f8800 (physmask field:0xf8 valid flag: 1) MTRRphysBase1 (0x202): 0x8006 (physbase field:0x08 type field: 0x06 (write-back)) MTRRphysMask1 (0x203): 0x000ff800 (physmask field:0xff valid flag: 1) MTRRphysBase2 (0x204): 0x8e00 (physbase field:0x08e000 type field: 0x00 (uncacheable)) MTRRphysMask2 (0x205): 0x000ffe000800 (physmask field:0xffe000 valid flag: 1) MTRRphysBase3 (0x206): 0x8d00 (physbase field:0x08d000 type field: 0x00 (uncacheable)) MTRRphysMask3 (0x207): 0x000fff000800 (physmask field:0xfff000 valid flag: 1) MTRRphysBase4 (0x208): 0x00010006 (physbase field:0x10 type field: 0x06 (write-back)) MTRRphysMask4 (0x209): 0x000f8800 (physmask field:0xf8 valid flag: 1) MTRRphysBase5 (0x20a): 0x00017000 (physbase field:0x17 type field: 0x00 (uncacheable)) MTRRphysMask5 (0x20b): 0x000ff800 (physmask field:0xff valid flag: 1) MTRRphysBase6 (0x20c): 0x00016f00 
(physbase field:0x16f000 type field: 0x00 (uncacheable)) MTRRphysMask6 (0x20d): 0x000fff000800 (physmask field:0xfff000 valid flag: 1) MTRRphysBase7 (0x20e): 0x00016e80 (physbase field:0x16e800 type field: 0x00 (uncacheable)) MTRRphysMask7 (0x20f): 0x000fff800800 (physmask field:0xfff800 valid flag: 1) MTRRfix64K_0 (0x250): 0x0606060606060606 MTRRfix16K_8 (0x258): 0x0606060606060606 MTRRfix16K_A (0x259): 0x MTRRfix4K_C8000 (0x269): 0x0505050505050505 MTRRfix4K_D 0x26a: 0x MTRRfix4K_D8000 0x26b: 0x MTRRfix4K_E 0x26c: 0x MTRRfix4K_E8000 0x26d: 0x0505050505050505 MTRRfix4K_F 0x26e: 0x0505050505050505 MTRRfix4K_F8000 0x26f: 0x0505050505050505 MTRRdefType (0x2ff): 0x0c00 (fixed-range flag: 0x1, mtrr flag: 0x1, type field: 0x00 (uncacheable)) snip Well they're all the same. Hmm, x86info is not dumping all the variable MTRRs. You have 10, but it only prints the first 8. I don't know if it will show anything different, but can you try fixing it with this patch? --- a/mtrr.c +++ b/mtrr.c @@ -75,19 +75,23 @@ printf(0x%016llx\n, val); } -static void decode_mtrrcap(int cpu, int msr) +unsigned int decode_mtrrcap(int cpu, int msr) { unsigned long long val; + unsigned int vcnt = 0; int ret; ret = mtrr_value(cpu,msr,val); if (ret) { + vcnt = (unsigned int)(val IA32_MTRRCAP_VCNT); printf(0x%016llx , val); printf((smrr flag: 0x%01x, ,(unsigned int) (val IA32_MTRRCAP_SMRR) 11 ); printf(wc flag: 0x%01x, ,(unsigned int) (valIA32_MTRRCAP_WC) 10); printf(fix flag: 0x%01x, ,(unsigned int) (valIA32_MTRRCAP_FIX) 8); - printf(vcnt field: 0x%02x (%d))\n,(unsigned int) (valIA32_MTRRCAP_VCNT) , (int) (valIA32_MTRRCAP_VCNT)); + printf(vcnt field: 0x%02x (%u))\n, vcnt, vcnt); } + + return vcnt; } static void decode_mtrr_deftype(int cpu, int msr) @@ -142,7 +146,7 @@ void dump_mtrrs(struct cpudata *cpu) { unsigned long long val = 0; - unsigned int i; + unsigned int i, vcnt; if (!(cpu-flags_edx (X86_FEATURE_MTRR))) return; @@ -157,11 +161,11 @@ printf(MTRR registers:\n); printf(MTRRcap (0xfe): ); - 
decode_mtrrcap(cpu-number, 0xfe); + vcnt = decode_mtrrcap(cpu-number, 0xfe); set_max_phy_addr(cpu); - for (i = 0; i 16; i+=2) { + for (i = 0; i 2 * vcnt; i += 2) { printf(MTRRphysBase%u (0x%x): , i/2, (unsigned int) 0x200+i); decode_mtrr_physbase(cpu
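For reference, a minimal sketch of the arithmetic the patch relies on, assuming the IA32_MTRRCAP field layout from the Intel SDM (VCNT in bits 7:0, FIX at bit 8, WC at bit 10, SMRR at bit 11). The names `MtrrCap`, `decode_mtrrcap_value` and `variable_mtrr_msrs` are made up for illustration and are not part of x86info. With the MTRRcap value 0xd0a from the dump above, VCNT comes out as 10, so the loop has to walk ten base/mask pairs rather than the hard-coded eight:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// IA32_MTRRCAP (MSR 0xfe) field masks, per the Intel SDM layout (assumed here).
constexpr std::uint64_t kMtrrcapVcnt = 0xffu;     // bits 7:0, number of variable ranges
constexpr std::uint64_t kMtrrcapFix  = 1u << 8;   // fixed-range MTRRs supported
constexpr std::uint64_t kMtrrcapWc   = 1u << 10;  // write-combining type supported
constexpr std::uint64_t kMtrrcapSmrr = 1u << 11;  // SMRR supported

// Hypothetical holder for the decoded fields.
struct MtrrCap {
    unsigned vcnt;
    bool fix;
    bool wc;
    bool smrr;
};

// Split a raw MTRRcap value into its fields.
MtrrCap decode_mtrrcap_value(std::uint64_t val)
{
    MtrrCap cap;
    cap.vcnt = static_cast<unsigned>(val & kMtrrcapVcnt);
    cap.fix  = (val & kMtrrcapFix) != 0;
    cap.wc   = (val & kMtrrcapWc) != 0;
    cap.smrr = (val & kMtrrcapSmrr) != 0;
    return cap;
}

// One variable-range MTRR: its index and the MSR numbers of its base/mask pair.
struct MtrrPair {
    unsigned index;
    std::uint32_t base_msr;
    std::uint32_t mask_msr;
};

// Same loop shape as the patched dump_mtrrs(): step by two from MSR 0x200,
// bounded by 2 * vcnt instead of a fixed 16.
std::vector<MtrrPair> variable_mtrr_msrs(unsigned vcnt)
{
    assert(vcnt <= 0xff);  // VCNT is an 8-bit field
    std::vector<MtrrPair> out;
    for (unsigned i = 0; i < 2 * vcnt; i += 2)
        out.push_back(MtrrPair{i / 2, 0x200u + i, 0x201u + i});
    return out;
}
```

Feeding 0xd0a through `decode_mtrrcap_value()` and then `variable_mtrr_msrs()` gives pairs 0 through 9, the last one at MSRs 0x212/0x213, which are the two registers the unpatched loop never reaches.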
Re: acpitool - /proc/acpi/wakeup
On Thu, 9 Oct 2014 22:16:11 +0200 Frans Klaver franskla...@gmail.com wrote:

On Thu, Oct 9, 2014 at 9:58 PM, Marc Burkhardt m...@osknowledge.org wrote:

On Thu, Oct 9, 2014 at 9:42 PM, Marc Burkhardt m...@osknowledge.org wrote:

I upgraded from 3.10 on that machine. 3.12 didn't work for me due to a hibernation bug. The rest was left out... :/

If you still have the 3.12 kernel around, could you test if acpitool -e worked there?

Let me ask you a question: does it make sense to test 3.12 again because you know there's something changed regarding /proc/acpi/... or because it's the kernel I broke up on upgrading?

Never mind. It broke after 3.14. I'll bisect.

The below patch fixes it for me. Looks like the line sizes changed and some are now exactly the right length to make it loop forever reading /proc/acpi/wakeup:

--- a/src/acpitool.cpp
+++ b/src/acpitool.cpp
@@ -417,7 +417,7 @@ int Do_Fan_Info(int verbose)
 int Show_WakeUp_Devices(int verbose)
 {
 ifstream file_in;
-char *filename, str[40];
+char *filename, str[80];
 
 filename = "/proc/acpi/wakeup";
 
@@ -438,13 +438,13 @@ int Show_WakeUp_Devices(int verbose)
 }
 else
 {
- file_in.getline(str, 40); // first line are just headers //
+ file_in.getline(str, 80); // first line are just headers //
 cout << str << endl;
 cout << "---" << endl;
 int t = 1;
 while(!file_in.eof())
 {
- file_in.getline(str, 40);
+ file_in.getline(str, 80);
 if (strlen(str)!=0) // avoid printing last empty line //
 {
 cout << t << ". " << str << endl;
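A minimal sketch of why the fixed buffer can spin, assuming standard iostream semantics: `istream::getline(buf, n)` sets failbit when a line does not fit in the buffer, every later call then fails immediately without consuming input, and eofbit is never reached, so a `while (!eof())` loop never exits. The helper names below are hypothetical and the iteration cap exists only so the demonstration terminates. The second function shows one way to sidestep the buffer size entirely with `std::getline` into a `std::string`; that is an alternative to the patch above, not what the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Same loop shape as Show_WakeUp_Devices(), plus a cap on iterations.
// Returns how many times getline() was called; hitting max_iters means
// the loop would have run forever.
std::size_t fixed_buffer_loop(std::istream& in, std::size_t bufsize,
                              std::size_t max_iters)
{
    assert(bufsize > 0);
    std::vector<char> buf(bufsize);
    std::size_t iters = 0;
    while (!in.eof() && iters < max_iters) {
        // Sets failbit if the line is too long for buf; after that the
        // stream stays failed and never reaches end-of-file.
        in.getline(buf.data(), static_cast<std::streamsize>(buf.size()));
        ++iters;
    }
    return iters;
}

// Length-independent variant: loop on the stream state and let std::string
// grow as needed. Empty lines are skipped, as in the original code.
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty())
            lines.push_back(line);
    }
    return lines;
}
```

With a 50-character line in the input, `fixed_buffer_loop(in, 40, cap)` runs until the cap while a buffer of 80 finishes normally; `read_lines()` does not care how long the kernel makes the lines in a future release.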
Re: vdso_standalone_test_x86.c build failure on Linus' tree
On Wed, 08 Oct 2014 14:21:00 -0700 "H. Peter Anvin" wrote:

> On 10/08/2014 02:09 PM, Chuck Ebbert wrote:
> >>
> >> Breaking cross-compilation is not okay, though, regardless of what
> >> Fedora does. It should be okay to, for example, build an i386 kernel on
> >> an ARM box.
> >>
> >
> > I think they tried that for a while, and ended up chasing compiler
> > and makefile bugs all day. And then there's the software that wants
> > to run self-tests as part of its build...
> >
>
> That we can't solve, but it is not okay to break the kernel build.
>

Also, as Andy pointed out, when building for x86_64 we probably want to build both 32-bit and 64-bit versions of most test programs like this one.
Re: vdso_standalone_test_x86.c build failure on Linus' tree
On Wed, 08 Oct 2014 13:55:00 -0700 "H. Peter Anvin" wrote:

> On 10/08/2014 01:46 PM, Chuck Ebbert wrote:
> >
> > Fedora doesn't cross-compile i686 builds because of problems like
> > this. It sets up an i386 chroot and runs all native tools inside of
> > it.
> >
>
> Breaking cross-compilation is not okay, though, regardless of what
> Fedora does. It should be okay to, for example, build an i386 kernel on
> an ARM box.
>

I think they tried that for a while, and ended up chasing compiler and makefile bugs all day. And then there's the software that wants to run self-tests as part of its build...
Re: vdso_standalone_test_x86.c build failure on Linus' tree
On Wed, 8 Oct 2014 12:16:11 -0700 Andy Lutomirski wrote:

> On Wed, Oct 8, 2014 at 11:52 AM, Josh Boyer wrote:
> > I'm seeing the following build failure on a 32-bit x86 build in Fedora
> > based on Linux v3.17-2860-gef0625b70dac:
> >
> > Documentation/vDSO/vdso_standalone_test_x86.o: In function `to_base10':
> > vdso_standalone_test_x86.c:(.text+0xcc): undefined reference to `__umoddi3'
> > vdso_standalone_test_x86.c:(.text+0xea): undefined reference to `__udivdi3'
> > collect2: error: ld returned 1 exit status
> > scripts/Makefile.host:100: recipe for target
> > 'Documentation/vDSO/vdso_standalone_test_x86' failed
> > make[2]: *** [Documentation/vDSO/vdso_standalone_test_x86] Error 1
> > scripts/Makefile.build:404: recipe for target 'Documentation/vDSO' failed
> > make[1]: *** [Documentation/vDSO] Error 2
> > make[1]: *** Waiting for unfinished jobs
> > Makefile:922: recipe for target 'vmlinux' failed
> > make: *** [vmlinux] Error 2
> >
> It should build and work on 32-bit.
>
> Except that the makefile is totally bogus. vdso_standalone_test isn't
> a hostprog at all. It's a target prog. But kbuild doesn't understand
> that, so I have no idea what, if anything, that makefile is supposed
> to do.

Heh, I was wondering why I got a 64-bit program with ARCH=i386

> I would argue that the whole documentation build system should be
> fixed to cross-compile or should just be disabled for cross-builds if
> glibc isn't available.
>

Fedora doesn't cross-compile i686 builds because of problems like this. It sets up an i386 chroot and runs all native tools inside of it.
Re: i915.ko WC writes are slow after ea8596bb2d8d379
On Wed, 8 Oct 2014 10:03:36 +0100 Chris Wilson wrote:

> and adding that back into the current build, e.g.
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3632743..48a8a69 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -87,6 +87,7 @@ config X86
>   select HAVE_USER_RETURN_NOTIFIER
>   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>   select HAVE_ARCH_JUMP_LABEL
> + select STOP_MACHINE
>   select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>   select SPARSE_IRQ
>   select GENERIC_FIND_FIRST_BIT
>
> fixes the regression.
>

Looking closer at this, it seems most configs work by accident, because they have MOD_UNLOAD and/or HOTPLUG_CPU enabled. I take it you disabled both of those?

stop_machine() is called from all kinds of places and almost none of them make sure STOP_MACHINE is selected.

$ find -name Kconf\* | xargs grep STOP_MACHINE
./init/Kconfig:config STOP_MACHINE

All these places use stop_machine():

mm/page_alloc.c, line 3886
drivers/xen/manage.c, line 130
drivers/char/hw_random/intel-rng.c, line 373
arch/powerpc/mm/numa.c, lines 1616 and 1623
arch/powerpc/platforms/powernv/subcore.c, line 324
arch/arm/kernel/kprobes.c, line 165
arch/arm/kernel/patch.c, lines 64 and 71
arch/s390/kernel/jump_label.c, line 61
arch/s390/kernel/kprobes.c, lines 311 and 320
arch/s390/kernel/time.c, lines 820 and 1590
arch/x86/kernel/cpu/mtrr/main.c, line 231
arch/arm64/kernel/insn.c, line 181
kernel/time/timekeeping.c, line 892
kernel/trace/ftrace.c, line 2219
kernel/module.c, lines 770 and 1861
Re: i915.ko WC writes are slow after ea8596bb2d8d379
On Wed, 8 Oct 2014 10:03:36 +0100 Chris Wilson wrote:

>
> I ran into a problem on a Sandybridge i5-2500s whilst measuring the
> performance of GTT write-combining access. I found subsequent runs were
> about 10-40x slower than the first. For example,
>
> igt/gem_gtt_speed:
>
> Time to read 16k through a GTT map: 325.285µs
> Time to write 16k through a GTT map: 4.729µs
> Time to clear 16k through a GTT map: 4.584µs
> Time to clear 16k through a cached GTT map: 1.342µs
>
> on the second run became:
>
> Time to read 16k through a GTT map: 332.148µs
> Time to write 16k through a GTT map: 209.411µs
> Time to clear 16k through a GTT map: 56.460µs
> Time to clear 16k through a cached GTT map: 50.897µs
>
> Naively I would say that we lost the wc on our ioremap.
> /sys/kernel/debug/x86/pat_memtype_list remained the same across repeated
> runs.
>
> A bisection pointed to
>
> commit ea8596bb2d8d37957f3e92db9511c50801689180
> Author: Masami Hiramatsu
> Date: Thu Jul 18 20:47:53 2013 +0900
>
> kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch()
> functions
>
> of which the active ingredient was just
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b32ebf9..f4001e0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2334,7 +2334,6 @@ config HAVE_ATOMIC_IOMAP
>
>   config HAVE_TEXT_POKE_SMP
>   bool
> - select STOP_MACHINE if SMP
>
>   config X86_DEV_DMA_OPS
>   bool
>
> and adding that back into the current build, e.g.

Hmm, set_mtrr() uses stop_machine(). I wonder if your MTRRs are out of sync and your results depend on which CPU the test runs on?

>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 3632743..48a8a69 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -87,6 +87,7 @@ config X86
>   select HAVE_USER_RETURN_NOTIFIER
>   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
>   select HAVE_ARCH_JUMP_LABEL
> + select STOP_MACHINE
>   select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
>   select SPARSE_IRQ
>   select GENERIC_FIND_FIRST_BIT
>
> fixes the regression.
>
> For the record, this kernel build doesn't use modules, which seems relevant
> in light of ea8596bb2 "fixes a Kconfig dependency issue on STOP_MACHINE
> in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD".
Re: [GIT PULL] ring-buffer: Fix infinite spin in reading buffer
On Sun, 5 Oct 2014 16:49:43 -0700 Greg Kroah-Hartman wrote:

> On Fri, Oct 03, 2014 at 04:20:43PM -0400, Steven Rostedt wrote:
...
> > Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page"
> > Signed-off-by: Steven Rostedt
>
> Next time, please also add a Cc: stable... here so that my tools pick
> it up automatically.
>

Can you add "Fixes:" to the list of keywords your tools pick up, and determine if the patch is needed in -stable by looking at the commit ID that's being fixed? Some authors might not remember if the thing being fixed has made it into an older release via -stable.
Re: [PATCH resend] MIPS: Allow FPU emulator to use non-stack area.
On Tue, 7 Oct 2014 16:59:03 -0700 David Daney wrote:

> On 10/07/2014 04:20 PM, Ralf Baechle wrote:
> > On Mon, Oct 06, 2014 at 02:18:19PM -0700, David Daney wrote:
> >
> >>> As an alternative, if the space of possible instruction with a delay
> >>> slot is sufficiently small, all such instructions could be mapped as
> >>> immutable code in a shared mapping, each at a fixed offset in the
> >>> mapping. I suspect this would be borderline-impractical (multiple
> >>> megabytes?), but it is the cleanest solution otherwise.
> >>>
> >>
> >> Yes, there are 2^32 possible instructions. Each one is 4 bytes, plus you
> >> need a way to exit after the instruction has executed, which would require
> >> another instruction. So you would need 32GB of memory to hold all those
> >> instructions, larger than the 32-bit virtual address space.
> >
> > Plus errata support for some older CPUs requires no other instructions
> > that might cause an exception to be present in the same cache line inflating
> > the size to 32 bytes per instruction.
> >
> > I've contemplated a full emulation - but that would require an emulator that
> > is capable of most of the instruction set. With all the random ASEs around
> > that would be hard to implement while the FPU emulator trampoline as
> > currently used has the advantage of automatically supporting ASEs, known
> > and unknown. So it's a huge bonus for maintenance.
> >
>
> Unfortunately it breaks when our friends at Imgtec introduce their PC
> relative instructions in mipsr6, so an emulator may be unavoidable.
>

The x86 kprobes code deals with executing relocated insns with PC-relative offsets by adjusting the offset in a relocated instruction before executing it.

See arch/x86/kernel/kprobes/core.c::__copy_instruction()
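The idea, as a rough sketch only: a PC-relative operand encodes target minus the address of the next instruction, so when the instruction is copied from `src` to `dest` the displacement has to grow by `src - dest` to keep pointing at the same target. This is plain integer arithmetic with hypothetical helper names; it is not the kernel code, and it assumes a 32-bit signed displacement as used by x86-64 RIP-relative addressing. A real implementation also has to find the displacement field inside the instruction bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Address a PC-relative operand refers to: next-instruction address plus
// the signed displacement.
std::uint64_t pc_relative_target(std::uint64_t insn_addr, unsigned insn_len,
                                 std::int32_t disp)
{
    assert(insn_len > 0 && insn_len <= 15);  // x86 instructions are 1..15 bytes
    return insn_addr + insn_len +
           static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}

// Compute the displacement to store in the copy at dest so that it still
// refers to the original target. The instruction length cancels out:
//   src + len + disp == dest + len + new_disp  =>  new_disp = disp + (src - dest)
// Returns false if the result no longer fits in 32 bits, in which case the
// instruction cannot be relocated that far.
bool relocate_disp32(std::uint64_t src, std::uint64_t dest, std::int32_t disp,
                     std::int32_t& new_disp)
{
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    const std::int64_t delta = static_cast<std::int64_t>(src - dest);

    // Reject far moves first so the addition below cannot overflow.
    if (delta < 2 * lo || delta > 2 * hi)
        return false;

    const std::int64_t adjusted = static_cast<std::int64_t>(disp) + delta;
    if (adjusted < lo || adjusted > hi)
        return false;

    new_disp = static_cast<std::int32_t>(adjusted);
    return true;
}
```

The range check is the part that matters for a trampoline design: the copy has to live within reach of everything the original instruction can address, which constrains where the out-of-line slot may be allocated.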
Re: Linux 3.17
On Tue, 7 Oct 2014 23:45:20 +0300 (EEST) Meelis Roos wrote:

> > Anyway, back to 3.17. Nothing major happened during the last week, as
> > you can see from the appended shortlog. Mostly drivers (i915, nouveau,
> > ethernet, scsi, sound) and some networking fixes. With some misc
> > noise all over.
> >
> > Go out and test,
>
> Unfortunately my computer still livelocks with watchdog timeouts. The
> previous reports are here: https://lkml.org/lkml/2014/9/28/40 and
> https://lkml.org/lkml/2014/9/30/217 and this time the dmesg is like that
> (config is also below):
>

Unfortunately the fixes for this did not make 3.17, even though they were available Sept. 25th:

http://marc.info/?t=14116557614=1=2

> [87785.429264] [ cut here ]
> [87785.429276] WARNING: CPU: 2 PID: 19638 at kernel/watchdog.c:267
> watchdog_overflow_callback+0x90/0xc0()
> [87785.429278] Watchdog detected hard LOCKUP on cpu 2
> [87785.429280] Modules linked in:
> [87785.429282] ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat dm_mod nfsv2
> cpufreq_stats cpufreq_powersave cpufreq_userspace cpufreq_conservative
> snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event
> snd_rawmidi snd_seq snd_seq_device binfmt_misc nfsd auth_rpcgss oid_registry
> nfs_acl nfs lockd sunrpc bridge stp llc tun joydev hid_generic usbhid hid
> x86_pkg_temp_thermal kvm_intel kvm crc32c_intel aesni_intel i915 aes_x86_64
> glue_helper lrw gf128mul snd_hda_codec_realtek snd_hda_codec_generic
> ablk_helper cryptd i2c_algo_bit video drm_kms_helper iTCO_wdt psmouse pcspkr
> iTCO_vendor_support drm snd_hda_intel evdev snd_hda_controller microcode
> snd_hda_codec e1000e sr_mod cdrom snd_hwdep snd_pcm_oss ehci_pci
> snd_mixer_oss parport_pc snd_pcm ehci_hcd parport snd_timer snd usbcore sg
> soundcore
> [87785.429341] tpm_tis nuvoton_cir i2c_i801 rc_core floppy tpm ptp pps_core
> lpc_ich button mfd_core processor usb_common thermal_sys msr w83627ehf
> hwmon_vid coretemp hwmon eeprom i2c_core loop fuse autofs4
> [87785.429359] CPU: 2 PID: 19638 Comm: less Not tainted 3.17.0 #130
> [87785.429361] Hardware name: /DQ67OW, BIOS
> SWQ6710H.86A.0066.2012.1105.1504 11/05/2012
> [87785.429363] 0009 814636f3 88023e306cd0
> 8104494d
> [87785.429366] 880232e24000 88023e306d20 88023e306db8
> 88023e306ef8
> [87785.429369] 810449b7 816e7fc0
> 0020
> [87785.429373] Call Trace:
> [87785.429375][] ? dump_stack+0x41/0x51
> [87785.429385] [] ? warn_slowpath_common+0x6d/0x90
> [87785.429388] [] ? warn_slowpath_fmt+0x47/0x50
> [87785.429392] [] ? watchdog_overflow_callback+0x90/0xc0
> [87785.429397] [] ? __perf_event_overflow+0x87/0x2c0
> [87785.429401] [] ? intel_pmu_handle_irq+0x1d5/0x390
> [87785.429406] [] ? perf_event_nmi_handler+0x26/0x40
> [87785.429410] [] ? nmi_handle+0x61/0xc0
> [87785.429413] [] ? do_nmi+0xc3/0x340
> [87785.429418] [] ? end_repeat_nmi+0x1e/0x2e
> [87785.429423] [] ? lock_hrtimer_base.isra.29+0x1b/0x40
> [87785.429427] [] ? lock_hrtimer_base.isra.29+0x1b/0x40
> [87785.429430] [] ? lock_hrtimer_base.isra.29+0x1b/0x40
> [87785.429432] <>[] ?
> hrtimer_try_to_cancel+0x1a/0x80
> [87785.429439] [] ? hrtimer_cancel+0x1a/0x20
> [87785.429442] [] ? tick_nohz_restart+0xd/0x80
> [87785.429445] [] ? __tick_nohz_full_check+0x9c/0xa0
> [87785.429448] [] ? irq_work_run_list+0x3c/0x70
> [87785.429451] [] ? irq_work_run+0x15/0x40
> [87785.429454] [] ? update_process_times+0x4f/0x60
> [87785.429457] [] ? tick_sched_timer+0x37/0x60
> [87785.429461] [] ? __run_hrtimer.isra.32+0x41/0xf0
> [87785.429464] [] ? hrtimer_interrupt+0xef/0x240
> [87785.429468] [] ? smp_apic_timer_interrupt+0x36/0x50
> [87785.429472] [] ? apic_timer_interrupt+0x6a/0x70
> [87785.429473]
> [87785.429476] ---[ end trace c05bae025e1c336d ]---
Re: [PATCH v2] perf tools: fix off-by-one error in maps
On Tue, 7 Oct 2014 11:00:50 -0300
Arnaldo Carvalho de Melo <a...@redhat.com> wrote:

> I keep thinking that this change is making things unclear.
>
> I.e. the _start_ of a map (map->start) is _in_ the map, and the _end_
> of a map (map->end) is _in_ the map as well.
>
> 	if (addr > m->end)
>
> is shorter than:
>
> 	if (addr >= m->end)
>
> "start" and "end" should have the same rule applied, i.e. if one is in,
> the other is in as well.
>
> Etc.
>

But the convention used in the memory management code is that "end" is
the next byte after the memory region. This gives you:

	size = end - start
	end = start + size

Using a different convention here will just confuse people used to the
way it's done everywhere else.
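The "end is the next byte after the region" convention can be illustrated with a small C sketch. The struct and helper names here (region, region_size, region_contains, regions_adjacent) are made up for illustration and are not the perf tools' struct map or any kernel API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical region type for illustration only. */
struct region {
	uint64_t start;	/* first byte inside the region */
	uint64_t end;	/* first byte past the region (exclusive) */
};

/* With an exclusive end, the size needs no +1/-1 adjustment. */
static inline uint64_t region_size(const struct region *r)
{
	return r->end - r->start;
}

/* An address is inside when start <= addr < end. */
static inline bool region_contains(const struct region *r, uint64_t addr)
{
	return addr >= r->start && addr < r->end;
}

/* Adjacent regions share a boundary value without sharing a byte. */
static inline bool regions_adjacent(const struct region *a,
				    const struct region *b)
{
	return a->end == b->start;
}
```

With start = 0x1000 and end = 0x2000, the size is 0x1000, address 0x1fff is inside and 0x2000 is not, so the "outside" test under this convention is addr >= end.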
Re: [PATCH resend] MIPS: Allow FPU emulator to use non-stack area.
On Tue, 7 Oct 2014 16:59:03 -0700
David Daney <dda...@caviumnetworks.com> wrote:

> On 10/07/2014 04:20 PM, Ralf Baechle wrote:
> > On Mon, Oct 06, 2014 at 02:18:19PM -0700, David Daney wrote:
> >
> >>> As an alternative, if the space of possible instruction with a delay
> >>> slot is sufficiently small, all such instructions could be mapped as
> >>> immutable code in a shared mapping, each at a fixed offset in the
> >>> mapping. I suspect this would be borderline-impractical (multiple
> >>> megabytes?), but it is the cleanest solution otherwise.
> >>
> >> Yes, there are 2^32 possible instructions. Each one is 4 bytes, plus
> >> you need a way to exit after the instruction has executed, which would
> >> require another instruction. So you would need 32GB of memory to hold
> >> all those instructions, larger than the 32-bit virtual address space.
> >
> > Plus errata support for some older CPUs requires no other instructions
> > that might cause an exception to be present in the same cache line
> > inflating the size to 32 bytes per instruction.
> >
> > I've contemplated a full emulation - but that would require an emulator
> > that is capable of most of the instruction set. With all the random
> > ASEs around that would be hard to implement while the FPU emulator
> > trampoline as currently used has the advantage of automatically
> > supporting ASEs, known and unknown. So it's a huge bonus for
> > maintenance.
>
> Unfortunately it breaks when our friends at Imgtec introduce their PC
> relative instructions in mipsr6, so an emulator may be unavoidable.
>

The x86 kprobes code deals with executing relocated insns with
PC-relative offsets by adjusting the offset in a relocated instruction
before executing it.

See arch/x86/kernel/kprobes/core.c::__copy_instruction()
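The size estimate quoted in this thread can be checked with a few lines of C. This is only a sketch of the arithmetic: table_bytes() and bytes_to_gib() are hypothetical helper names, and the 128 GiB figure for 32-byte slots is the same multiplication carried one step further, not a number stated in the thread.

```c
#include <stdint.h>

/* Every 32-bit encoding is a candidate instruction. */
#define INSN_COUNT (UINT64_C(1) << 32)

/* Total bytes needed when each instruction occupies one fixed-size slot. */
static inline uint64_t table_bytes(uint64_t bytes_per_slot)
{
	return INSN_COUNT * bytes_per_slot;
}

/* Convert a byte count to whole GiB (2^30 bytes). */
static inline uint64_t bytes_to_gib(uint64_t bytes)
{
	return bytes >> 30;
}
```

A slot holding the 4-byte instruction plus a 4-byte exit instruction gives table_bytes(8), which is 32 GiB; padding each slot to a 32-byte cache line gives table_bytes(32), which is 128 GiB. Either way the table is larger than the 4 GiB a 32-bit virtual address space can hold.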
Re: [GIT PULL] ring-buffer: Fix infinite spin in reading buffer
On Sun, 5 Oct 2014 16:49:43 -0700
Greg Kroah-Hartman <gre...@linuxfoundation.org> wrote:

> On Fri, Oct 03, 2014 at 04:20:43PM -0400, Steven Rostedt wrote:
> > ...
> > Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page"
> > Signed-off-by: Steven Rostedt <rost...@goodmis.org>
>
> Next time, please also add a "Cc: stable..." here so that my tools pick
> it up automatically.
>

Can you add "Fixes:" to the list of keywords your tools pick up, and
determine if the patch is needed in -stable by looking at the commit ID
that's being fixed? Some authors might not remember if the thing being
fixed has made it into an older release via -stable.
Re: perf: 3.17 another perf_fuzzer lockup
On Mon, 6 Oct 2014 11:55:11 -0400 (EDT)
Vince Weaver <vincent.wea...@maine.edu> wrote:

> On Mon, 6 Oct 2014, Vince Weaver wrote:
>
> > [ 843.700042] general protection fault: [#1] SMP
> > ...
> > [ 843.704001] task: 88011a874000 ti: 8800bc0ec000 task.ti:
> > 8800bc0ec000
> > [ 843.704001] RIP: 0010:[<810cd913>] [<810cd913>]
> > perf_event_context_sched_in.isra.75+0x1f/0x90
>
> For what it's worth, this is
>
> kernel/events/core.c:2646
>
> 	if (atomic_read(&__get_cpu_var(perf_cgroup_events)))
> 		perf_cgroup_sched_in(prev, task);
>
>
> 810cd902:	53			push   %rbx
> 810cd903:	48 8b 07		mov    (%rdi),%rax
> 810cd906:	48 8b 58 40		mov    0x40(%rax),%rbx
> 810cd90a:	65 48 03 1c 25 08 ce	add    %gs:0xce08,%rbx
> 810cd911:	00 00
> 810cd913:	48 39 bb d8 00 00 00	cmp    %rdi,0xd8(%rbx)
> 810cd91a:	74 63			je     810cd97f <perf_event_context_sched_in.isra.75+0x8b>
>

Actually it's:

static void perf_event_context_sched_in(struct perf_event_context *ctx,
					struct task_struct *task)
{
	struct perf_cpu_context *cpuctx;

	cpuctx = __get_cpu_context(ctx);
	if (cpuctx->task_ctx == ctx)	<=== oops
		return;

cpuctx is in %rbx (=1001e742c000) and that's not even a legal address,
which is what caused the general protection fault

>
> > [ 843.704001] RSP: 0018:8800bc0efd50 EFLAGS: 00010087
> > [ 843.704001] RAX: ea0002ba2d68 RBX: 1001e742c000 RCX:
> > 038f
> > [ 843.704001] RDX: 88011fc95b30 RSI: 880037d0eb00 RDI:
> > 880037d0e700
> > [ 843.704001] RBP: 8800bc0efd60 R08: 8800bc0ec000 R09:
> > baff
> > [ 843.704001] R10: 0006 R11: 09bc R12:
> > 880037d0e700
> > [ 843.704001] R13: 8800c944f400 R14: 0001 R15:
> > 88011b340800
> > [ 843.704001] FS: 7ffc76c17700() GS:88011fc8()
> > knlGS:
> > [ 843.704001] CS: 0010 DS: ES: CR0: 8005003b
> > [ 843.704001] DR0: 01c7b000 DR1: DR2:
> > 01c7b000
> > [ 843.704001] DR3: DR6: 0ff0 DR7:
> > 0600
> > [ 843.704001] Stack:
> > [ 843.704001] 88011a874000 88011b340800 8800bc0efd90
> > 810cd9bb
> > [ 843.704001] 88011a8744e8 88011b340800 88011fc929c0
> > 8800c944f400
> > [ 843.704001] 8800bc0efdc0 8105ae62 88011fc929c0
> > 8800c944f400
> > [ 843.704001] Call Trace:
> > [ 843.704001] [<810cd9bb>] __perf_event_task_sched_in+0x37/0xf4
> > [ 843.704001] [<8105ae62>] finish_task_switch+0x9b/0xa6
> > [ 843.704001] [<815209c1>] __schedule+0x309/0x4a5
> > [ 843.704001] [<81520df3>] _cond_resched+0x28/0x3b
> > [ 843.704001] [<815217b5>] mutex_lock+0x12/0x2f
> > [ 843.704001] [<810cb9bb>] find_get_context+0xfc/0x170
> > [ 843.704001] [<810cff77>] SYSC_perf_event_open+0x47b/0x7f5
> > [ 843.704001] [<810d0651>] SyS_perf_event_open+0xe/0x10
> > [ 843.704001] [<81523ad1>] tracesys+0xd4/0xd9
> > [ 843.704001] Code: 89 e7 e8 65 fe ff ff 5b 41 5c 5d c3 e8 c7 7e 45 00 55
> > 48 89 e5 41 54 49 89 fc 53 48 8b 07 48 8b 58 40 65 48 03 1c 25 08 ce 00 00
> > <48> 39 bb d8 00 00 00 74 63 48 89 fe 48 89 df e8 f0 b4 ff ff 49
> > [ 843.704001] RIP [<810cd913>]
> > perf_event_context_sched_in.isra.75+0x1f/0x90
> > [ 843.704001] RSP <8800bc0efd50>
Re: [PATCH 3.16 000/357] 3.16.4-stable review
On Sun, 5 Oct 2014 13:39:14 -0700
Greg Kroah-Hartman <gre...@linuxfoundation.org> wrote:

> On Sat, Oct 04, 2014 at 07:38:39AM -0500, Chuck Ebbert wrote:
> > On Fri, 3 Oct 2014 14:26:26 -0700
> > Greg Kroah-Hartman <gre...@linuxfoundation.org> wrote:
> >
> > > -------------
> > > Note: This is a big stable release. Mostly my fault for being on the
> > > road last week, combined with an unusually large number of patches being
> > > tagged for the stable tree. Anyway, I've caught up with all pending
> > > patches before 3.17-rc7, so if you have marked something for the stable
> > > tree that I have not applied, or emailed the stable@v.k.o list asking
> > > for a patch, that is not here, please let me know.
> > > -------------
> > >
> > > This is the start of the stable review cycle for the 3.16.4 release.
> > > There are 357 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Sun Oct 5 21:28:42 UTC 2014.
> > > Anything received after that time might be too late.
> > >
> >
> > Should also include this one, which fixes CVE-2014-6410:
> >
> > commit c03aa9f6e1f938618e6db2e23afef0574efeeb65
> > Author: Jan Kara <j...@suse.cz>
> > Date: Thu Sep 4 14:06:55 2014 +0200
> >
> > udf: Avoid infinite loop when processing indirect ICBs
>
> Thanks, I'll queue that up for the next stable releases.
>
> greg k-h

I built 3.16.4-rc1 for Fedora, which already has this patch included.
It boots and runs on Fedora 21, and that patch has been in Fedora 21
for a while. Why not include it in 3.16.4?
Re: perf & rasd integration plan
On Sun, 5 Oct 2014 20:24:42 +0200
Jiri Olsa <jo...@redhat.com> wrote:

> On Sun, Oct 05, 2014 at 07:48:01PM +0200, Borislav Petkov wrote:
> > Top-posting on purpose:
> >
> > Btw, jolsa, if you get your LCE proposal for the perf splitting
> > approved, please post the time here so people can come.
>
> yep, it got accepted, the schedule is:
> Friday, October 17, 2014 from 9:00am – Noon
> Room 2
>
> https://pdxplumbers.osuosl.org/2014/ocw/events/LPC2014/tracks/351
>

The SSL cert for that site is comic.

1. Expired over three years ago.
2. Was issued to a different domain.
3. Issued by an un-verifiable CA.

This is a joke, right?
Re: (Song) Fk SystemD
On Sun, 5 Oct 2014 10:29:24 +
Gregory Smith <gregorysmith19...@gmail.com> wrote:

> Fuck you.
>
> This is what you systemd shitheads say to everything.
> It's either your way or the highway.
>

Did you even *read* Al's reply? He's objecting to the tactics, not the
underlying message that systemd is crap. Sheesh.

>
> On 10/4/14, Al Viro <v...@zeniv.linux.org.uk> wrote:
> > On Sat, Oct 04, 2014 at 06:49:30PM +0200, Tom Collins wrote:
> >
> > [snip masturbation]
> >
> > To quote Jordan Hubbard,
> >
> > Your brand of "advocacy" is akin to having the KKK show up at one's
> > wedding to congratulate the happy couple on their choice of marrying
> > within their race. Some kinds of "support" you just don't need if
> > all it leaves you with the desire to take a couple of dozen showers.
> >
> > That was about a different dipshit in a different flamefest, but it applies
> > to you nicely. Incidentally, as you obviously understand and don't give
> > a fuck about, you are actively helping the Fine Piece Of Software in
> > question,
> > letting the pushers of said FPOS to paint everyone who has objections with
> > your, er, output...
> >
> > Go play in the traffic, kid. Remember to arrange a video...
> >
> >
> > --
> > To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org
> > with a subject of "unsubscribe". Trouble? Contact
> > listmas...@lists.debian.org
> > Archive: https://lists.debian.org/20141004174018.gw7...@zeniv.linux.org.uk
> >
> >
Re: [PATCH] cma: make default CMA area size zero for x86
On Sun, 5 Oct 2014 15:02:56 +0900
Akinobu Mita <akinobu.m...@gmail.com> wrote:

> This makes CMA memory area size zero for x86 in default configuration
> (doesn't change on the other architectures). If default CMA size is
> zero, DMA_CMA is disabled. It can be enabled by passing cma= to the
> kernel.
>
> This makes less impact on x86. Because there is no mainline driver that
> requires it for x86, and Peter Hurley reported the performance
> regression, as this is trying to drive _all_ dma mapping allocations
> through a _very_ small window.
>
> Signed-off-by: Akinobu Mita <akinobu.m...@gmail.com>
> Reported-by: Peter Hurley <pe...@hurleysoftware.com>
> Cc: Peter Hurley <pe...@hurleysoftware.com>
> Cc: Chuck Ebbert <cebbert.l...@gmail.com>
> Cc: Marek Szyprowski <m.szyprow...@samsung.com>
> Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> Cc: David Woodhouse <dw...@infradead.org>
> Cc: Don Dutile <ddut...@redhat.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: "H. Peter Anvin" <h...@zytor.com>
> Cc: Andi Kleen <a...@firstfloor.org>
> Cc: Yinghai Lu <ying...@kernel.org>
> Cc: x...@kernel.org
> Cc: io...@lists.linux-foundation.org
> ---
>  drivers/base/Kconfig | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 4e7f0ff..92a5987e 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -240,6 +240,7 @@ comment "Default contiguous memory area size:"
>  config CMA_SIZE_MBYTES
>  	int "Size in Mega Bytes"
>  	depends on !CMA_SIZE_SEL_PERCENTAGE
> +	default 0 if X86
>  	default 16
>  	help
>  	  Defines the size (in MiB) of the default memory area for Contiguous
> @@ -248,6 +249,7 @@ config CMA_SIZE_MBYTES
>  config CMA_SIZE_PERCENTAGE
>  	int "Percentage of total memory"
>  	depends on !CMA_SIZE_SEL_MBYTES
> +	default 0 if X86
>  	default 10
>  	help
>  	  Defines the size of the default memory area for Contiguous Memory

You probably need to add some documentation too. Jean Delvare proposed
the below, before your change. If the default is going to be zero on
x86, that information and some further help should be added to this.

From: Jean Delvare <jdelv...@suse.de>
Subject: [PATCH] CMA: Document cma=0

It isn't obvious that CMA can be disabled on the kernel's command
line, so document it.

Signed-off-by: Jean Delvare <jdelv...@suse.de>
Cc: Joonsoo Kim <iamjoonsoo@lge.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
 Documentation/kernel-parameters.txt |    3 ++-
 drivers/base/Kconfig                |    3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

--- linux-3.17-rc7.orig/Documentation/kernel-parameters.txt	2014-09-23 13:19:06.644838292 +0200
+++ linux-3.17-rc7/Documentation/kernel-parameters.txt	2014-10-04 14:10:03.257579721 +0200
@@ -656,7 +656,8 @@ bytes respectively. Such letter suffixes
 			Sets the size of kernel global memory area for
 			contiguous memory allocations and optionally the
 			placement constraint by the physical address range of
-			memory allocations. For more information, see
+			memory allocations. A value of 0 disables CMA
+			altogether. For more information, see
 			include/linux/dma-contiguous.h
 
 	cmo_free_hint=	[PPC] Format: { yes | no }
--- linux-3.17-rc7.orig/drivers/base/Kconfig	2014-09-12 16:23:14.911353676 +0200
+++ linux-3.17-rc7/drivers/base/Kconfig	2014-10-04 13:41:37.672347240 +0200
@@ -231,6 +231,9 @@ config DMA_CMA
 	  to allocate big physically-contiguous blocks of memory for use with
 	  hardware components that do not support I/O map nor scatter-gather.
 
+	  You can disable CMA by specifying "cma=0" on the kernel's command
+	  line.
+
 	  For more information see <include/linux/dma-contiguous.h>.
 	  If unsure, say "n".
 
Re: [PATCH] fs: use kfree_rcu instead of i_callback
On Sat, 4 Oct 2014 23:00:42 -0400
John de la Garza <j...@jjdev.com> wrote:

> Since the callback is doing nothing more than calling kfree() we can
> use kfree_rcu() instead of having to use a callback.
>
> Signed-off-by: John de la Garza <j...@jjdev.com>
> ---
>  fs/inode.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 26753ba..51deccd 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -250,12 +250,6 @@ void __destroy_inode(struct inode *inode)
>  }
>  EXPORT_SYMBOL(__destroy_inode);
>
> -static void i_callback(struct rcu_head *head)
> -{
> -	struct inode *inode = container_of(head, struct inode, i_rcu);
> -	kmem_cache_free(inode_cachep, inode);
> -}
> -
>  static void destroy_inode(struct inode *inode)
>  {
>  	BUG_ON(!list_empty(&inode->i_lru));
> @@ -263,7 +257,7 @@ static void destroy_inode(struct inode *inode)
>  	if (inode->i_sb->s_op->destroy_inode)
>  		inode->i_sb->s_op->destroy_inode(inode);
>  	else
> -		call_rcu(&inode->i_rcu, i_callback);
> +		kfree(inode, i_rcu);

Your description says "use kfree_rcu()" but that is kfree(). This won't
even compile.

And as Al pointed out, kfree() is not the same as kmem_cache_free(). So
you'd need to invent kmem_cache_free_rcu() also. Not sure that would
have any value elsewhere.

>  }
>
>  /**
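For readers unfamiliar with the pattern in the i_callback() that the patch removes: an RCU callback only receives a pointer to the embedded rcu_head, and uses container_of() to get back to the enclosing object before returning it to the allocator that owns it (here the inode slab cache, via kmem_cache_free()). The following is a userspace sketch of that pointer arithmetic only. The fake_rcu_head and fake_inode types and the inode_from_rcu() helper are hypothetical stand-ins, and the macro is a simplified version of the kernel's container_of() without its type checking.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel macro: step back from a
 * member pointer to the start of the structure that embeds it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; the real rcu_head and inode differ. */
struct fake_rcu_head {
	void (*func)(struct fake_rcu_head *head);
};

struct fake_inode {
	unsigned long i_ino;
	struct fake_rcu_head i_rcu;
};

/* Same shape as the first line of i_callback(): recover the object
 * from the embedded head that the callback was handed. */
static inline struct fake_inode *inode_from_rcu(struct fake_rcu_head *head)
{
	return container_of(head, struct fake_inode, i_rcu);
}
```

Given a struct fake_inode named ino, inode_from_rcu(&ino.i_rcu) returns &ino. What happens to the recovered pointer afterwards is the point of the review comment: an object that came from a dedicated slab cache has to go back through kmem_cache_free() for that cache.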
Re: [PATCH] fs: use kfree_rcu instead of i_callback
On Sat, 4 Oct 2014 23:00:42 -0400 John de la Garza j...@jjdev.com wrote: Since the callback is doing nothing more than calling kfree() we can use kfree_rcu() instead of having to use a callback. Signed-off-by: John de la Garza j...@jjdev.com --- fs/inode.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 26753ba..51deccd 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -250,12 +250,6 @@ void __destroy_inode(struct inode *inode) } EXPORT_SYMBOL(__destroy_inode); -static void i_callback(struct rcu_head *head) -{ - struct inode *inode = container_of(head, struct inode, i_rcu); - kmem_cache_free(inode_cachep, inode); -} - static void destroy_inode(struct inode *inode) { BUG_ON(!list_empty(inode-i_lru)); @@ -263,7 +257,7 @@ static void destroy_inode(struct inode *inode) if (inode-i_sb-s_op-destroy_inode) inode-i_sb-s_op-destroy_inode(inode); else - call_rcu(inode-i_rcu, i_callback); + kfree(inode, i_rcu); Your description says use kfree_rcu() but that is kfree(). This won't even compile. And as Al pointed out, kfree() is not the same as kmem_cache_free(). So you'd need to invent kmem_cache_free_rcu() also. Not sure that would have any value elsewhere. } /** -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cma: make default CMA area size zero for x86
On Sun, 5 Oct 2014 15:02:56 +0900 Akinobu Mita akinobu.m...@gmail.com wrote: This makes CMA memory area size zero for x86 in default configuration (doesn't change on the other architectures). If default CMA size is zero, DMA_CMA is disabled. It can be enabled by passing cma= to the kernel. This makes less impact on x86. Because there is no mainline driver that requires it for x86, and Peter Hurley reported the performance regression, as this is trying to drive _all_ dma mapping allocations through a _very_ small window. Signed-off-by: Akinobu Mita akinobu.m...@gmail.com Reported-by: Peter Hurley pe...@hurleysoftware.com Cc: Peter Hurley pe...@hurleysoftware.com Cc: Chuck Ebbert cebbert.l...@gmail.com Cc: Marek Szyprowski m.szyprow...@samsung.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: David Woodhouse dw...@infradead.org Cc: Don Dutile ddut...@redhat.com Cc: Thomas Gleixner t...@linutronix.de Cc: Ingo Molnar mi...@redhat.com Cc: H. Peter Anvin h...@zytor.com Cc: Andi Kleen a...@firstfloor.org Cc: Yinghai Lu ying...@kernel.org Cc: x...@kernel.org Cc: io...@lists.linux-foundation.org --- drivers/base/Kconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 4e7f0ff..92a5987e 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -240,6 +240,7 @@ comment Default contiguous memory area size: config CMA_SIZE_MBYTES int Size in Mega Bytes depends on !CMA_SIZE_SEL_PERCENTAGE + default 0 if X86 default 16 help Defines the size (in MiB) of the default memory area for Contiguous @@ -248,6 +249,7 @@ config CMA_SIZE_MBYTES config CMA_SIZE_PERCENTAGE int Percentage of total memory depends on !CMA_SIZE_SEL_MBYTES + default 0 if X86 default 10 help Defines the size of the default memory area for Contiguous Memory You probably need to add some documentation too. Jean Delvare proposed the below, before your change. 
If the default is going to be zero on x86, that information and some further help should be added to this. From: Jean Delvare jdelv...@suse.de Subject: [PATCH] CMA: Document cma=0 It isn't obvious that CMA can be disabled on the kernel's command line, so document it. Signed-off-by: Jean Delvare jdelv...@suse.de Cc: Joonsoo Kim iamjoonsoo@lge.com Cc: Greg Kroah-Hartman gre...@linuxfoundation.org --- Documentation/kernel-parameters.txt |3 ++- drivers/base/Kconfig|3 +++ 2 files changed, 5 insertions(+), 1 deletion(-) --- linux-3.17-rc7.orig/Documentation/kernel-parameters.txt 2014-09-23 13:19:06.644838292 +0200 +++ linux-3.17-rc7/Documentation/kernel-parameters.txt 2014-10-04 14:10:03.257579721 +0200 @@ -656,7 +656,8 @@ bytes respectively. Such letter suffixes Sets the size of kernel global memory area for contiguous memory allocations and optionally the placement constraint by the physical address range of - memory allocations. For more information, see + memory allocations. A value of 0 disables CMA + altogether. For more information, see include/linux/dma-contiguous.h cmo_free_hint= [PPC] Format: { yes | no } --- linux-3.17-rc7.orig/drivers/base/Kconfig2014-09-12 16:23:14.911353676 +0200 +++ linux-3.17-rc7/drivers/base/Kconfig 2014-10-04 13:41:37.672347240 +0200 @@ -231,6 +231,9 @@ config DMA_CMA to allocate big physically-contiguous blocks of memory for use with hardware components that do not support I/O map nor scatter-gather. + You can disable CMA by specifying cma=0 on the kernel's command + line. + For more information see include/linux/dma-contiguous.h. If unsure, say n. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: (Song) Fk SystemD
On Sun, 5 Oct 2014 10:29:24 + Gregory Smith gregorysmith19...@gmail.com wrote: Fuck you. This is what you systemd shitheads say to everything. It's either your way or the highway. Did you even *read* Al's reply? He's objecting to the tactics, not the underlying message that systemd is crap. Sheesh. On 10/4/14, Al Viro v...@zeniv.linux.org.uk wrote: On Sat, Oct 04, 2014 at 06:49:30PM +0200, Tom Collins wrote: [snip masturbation] To quote Jordan Hubbard, Your brand of advocacy is akin to having the KKK show up at one's wedding to congradulate the happy couple on their choice of marrying within their race. Some kinds of support you just don't need if all it leaves you with the desire to take a couple of dozen showers. That was about a different dipshit in a different flamefest, but it applies to you nicely. Incidentally, as you obviously understand and don't give a fuck about, you are actively helping the Fine Piece Of Software in question, letting the pushers of said FPOS to paint everyone who has objections with your, er, output... Go play in the traffic, kid. Remember to arrange a video... -- To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/20141004174018.gw7...@zeniv.linux.org.uk -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: perf rasd integration plan
On Sun, 5 Oct 2014 20:24:42 +0200 Jiri Olsa jo...@redhat.com wrote: On Sun, Oct 05, 2014 at 07:48:01PM +0200, Borislav Petkov wrote: Top-posting on purpose: Btw, jolsa, if you get your LCE proposal for the perf splitting approved, please post the time here so people can come. yep, it got accepted, ther schedule is: Friday, October 17, 2014 from 9:00am – Noon Room 2 https://pdxplumbers.osuosl.org/2014/ocw/events/LPC2014/tracks/351 The SSL cert for that site is comic. 1. Expired over three years ago. 2. Was issued to a different domain. 3. Issued by an un-verifiable CA. This is a joke, right? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.16 000/357] 3.16.4-stable review
On Sun, 5 Oct 2014 13:39:14 -0700 Greg Kroah-Hartman gre...@linuxfoundation.org wrote:
> On Sat, Oct 04, 2014 at 07:38:39AM -0500, Chuck Ebbert wrote:
> > On Fri, 3 Oct 2014 14:26:26 -0700 Greg Kroah-Hartman gre...@linuxfoundation.org wrote:
> > > -
> > > Note: This is a big stable release. Mostly my fault for being on the road last week, combined with an unusually large number of patches being tagged for the stable tree. Anyway, I've caught up with all pending patches before 3.17-rc7, so if you have marked something for the stable tree that I have not applied, or emailed the stable@v.k.o list asking for a patch, that is not here, please let me know.
> > > -
> > >
> > > This is the start of the stable review cycle for the 3.16.4 release. There are 357 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
> > >
> > > Responses should be made by Sun Oct 5 21:28:42 UTC 2014.
> > > Anything received after that time might be too late.
> >
> > Should also include this one, which fixes CVE-2014-6410:
> >
> > commit c03aa9f6e1f938618e6db2e23afef0574efeeb65
> > Author: Jan Kara j...@suse.cz
> > Date: Thu Sep 4 14:06:55 2014 +0200
> >
> > udf: Avoid infinite loop when processing indirect ICBs
>
> Thanks, I'll queue that up for the next stable releases.
>
> greg k-h
I built 3.16.4-rc1 for Fedora, which already has this patch included. It boots and runs on Fedora 21, and that patch has been in Fedora 21 for a while. Why not include it in 3.16.4?
Re: Slowdown due to threads bouncing between HT cores
On Fri, 3 Oct 2014 21:44:29 +0200 "Steinar H. Gunderson" wrote:
> Hi,
>
> I did a chess benchmark of my new machine (2x E5-2650v3, so 20x2.3GHz
> Haswell-EP), and it performed a bit worse than comparable Windows setups.
> It looks like the scheduler somehow doesn't perform as well with
> hyperthreading; HT is on in the BIOS, but I'm only using 20 threads
> (chess scales sublinearly, so using all 40 usually isn't a good idea),
> so really, the threads should just get one core each and that's it.
> It looks like they are bouncing between cores, reducing overall performance
> by ~20% for some reason. (The machine is otherwise generally idle.)
>
Try playing with /proc/sys/kernel/sched_migration_cost_ns. This sets the number of nanoseconds the kernel will wait before considering moving a thread to another CPU. I have mine set to 5000.
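For anyone who wants to script the tuning suggested in the reply above, here is a minimal sketch in C. It only reads and writes a single decimal value in a sysctl-style file. The helper names `read_tunable` and `write_tunable` are made up for illustration; the path is the one quoted in the message and may not exist on every kernel version, and writing under /proc/sys normally needs root.

```c
#include <assert.h>
#include <stdio.h>

/* Path quoted in the message above; availability depends on the kernel. */
#define MIGRATION_COST_PATH "/proc/sys/kernel/sched_migration_cost_ns"

/* Hypothetical helper: read one decimal value from a sysctl-style file.
 * Returns 0 and stores the value in *out on success, -1 on failure. */
int read_tunable(const char *path, long *out)
{
    assert(path && out);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: write one decimal value followed by a newline.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int write_tunable(const char *path, long value)
{
    assert(path);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = (fprintf(f, "%ld\n", value) > 0);
    if (fclose(f) != 0)   /* a failed flush surfaces here */
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would typically do `read_tunable(MIGRATION_COST_PATH, &old)` to record the current setting, then `write_tunable(MIGRATION_COST_PATH, 5000)` to try the value mentioned above, and rerun the benchmark before deciding whether to keep it.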
Re: [PATCH 3.16 000/357] 3.16.4-stable review
On Fri, 3 Oct 2014 14:26:26 -0700 Greg Kroah-Hartman wrote:
> -
> Note: This is a big stable release. Mostly my fault for being on the
> road last week, combined with an unusually large number of patches being
> tagged for the stable tree. Anyway, I've caught up with all pending
> patches before 3.17-rc7, so if you have marked something for the stable
> tree that I have not applied, or emailed the stable@v.k.o list asking
> for a patch, that is not here, please let me know.
> -
>
> This is the start of the stable review cycle for the 3.16.4 release.
> There are 357 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun Oct 5 21:28:42 UTC 2014.
> Anything received after that time might be too late.
>
Should also include this one, which fixes CVE-2014-6410:
commit c03aa9f6e1f938618e6db2e23afef0574efeeb65
Author: Jan Kara
Date: Thu Sep 4 14:06:55 2014 +0200
udf: Avoid infinite loop when processing indirect ICBs
Re: [x86, locking/rwlocks, btrfs] INFO: rcu_sched self-detected stall on CPU
On Fri, 03 Oct 2014 23:27:58 -0400 Waiman Long wrote:
> On 10/03/2014 09:33 AM, Fengguang Wu wrote:
> > Hi Waiman,
> >
> > FYI, we noticed the below changes on commit
> >
> > bd01ec1a13f9a327950c8e3080096446c7804753 ("x86, locking/rwlocks: Enable qrwlocks on x86")
> >
> > +----------------------------------------------+------------+------------+
> > |                                              | 70af2f8a4f | bd01ec1a13 |
> > +----------------------------------------------+------------+------------+
> > | boot_successes                               | 3          | 2          |
> > | boot_failures                                | 7          | 13         |
> > | BUG:kernel_test_crashed                      | 7          | 8          |
> > | INFO:rcu_sched_self-detected_stall_on_CPU    | 0          | 4          |
> > | RIP:intel_idle                               | 0          | 4          |
> > | RIP:queue_write_lock_slowpath                | 0          | 4          |
> > | RIP:queue_read_lock_slowpath                 | 0          | 4          |
> > | RIP:sys_imageblit_sysimgblt                  | 0          | 2          |
> > | RIP:default_send_IPI_mask_sequence_phys      | 0          | 1          |
> > | RIP:memcpy                                   | 0          | 1          |
> > | RIP:delay_tsc                                | 0          | 4          |
> > | backtrace:cpu_startup_entry                  | 0          | 3          |
> > | backtrace:do_fsync                           | 0          | 4          |
> > | backtrace:SyS_fsync                          | 0          | 4          |
> > | backtrace:normal_work_helper                 | 0          | 1          |
> > | backtrace:vfs_write                          | 0          | 3          |
> > | backtrace:SyS_write                          | 0          | 3          |
> > | backtrace:do_sys_open                        | 0          | 4          |
> > | backtrace:SyS_open                           | 0          | 4          |
> > | backtrace:flush_to_ldisc                     | 0          | 1          |
> > | RIP:cpu_startup_entry                        | 0          | 1          |
> > | RIP:native_read_tsc                          | 0          | 2          |
> > | RIP:rcu_eqs_exit_common                      | 0          | 1          |
> > | INFO:rcu_sched_detected_stalls_on_CPUs/tasks | 0          | 1          |
> > +----------------------------------------------+------------+------------+
> >
> > run: /lkp/lkp/src/monitors/wrapper sched_debug {"interval"=>"10"}
> > run: /usr/bin/time -v -o /lkp/lkp/src/tmp/time /lkp/lkp/src/tests/wrapper fsmark{"filesize"=>"9B", "test_size"=>"400M", "sync_method"=>"fsyncBeforeClose", "nr_directories"=>"16d", "nr_files_per_directory"=>"256fpd"}
> > run: /lkp/lkp/src/monitors/wrapper pmeter {}
> > [ 125.656200] INFO: rcu_sched self-detected stall on CPU
> > [ 125.657199] INFO: rcu_sched self-detected stall on CPUINFO: rcu_sched self-detected stall on CPUINFO: rcu_sched self-detected stall on CPUINFO: rcu_sched self-detected stall on CPU { { { { 0 9 7 14} } } } (t=10 jiffies g=1792 c=1791 q=0)
> > [ 125.657218] (t=10 jiffies g=1792 c=1791 q=0)
> > [ 125.657219] (t=10 jiffies g=1792 c=1791 q=0)
> > [ 125.657221] (t=10 jiffies g=1792 c=1791 q=0)
> > [ 125.657222] sending NMI to all CPUs:
> > [ 125.657224] NMI backtrace for cpu 0
> > [ 125.657227] CPU: 0 PID: 3025 Comm: fs_mark Not tainted 3.16.0 #1
> > [ 125.657227] Hardware name: Intel Corporation S5520UR/S5520UR, BIOS S5500.86B.01.00.0050.050620101605 05/06/2010
> > [ 125.657228] task: 88007ef58000 ti: 88007ef54000 task.ti: 88007ef54000
> > [ 125.657229] RIP: 0010:[] [] native_read_tsc+0x6/0x20
> > [ 125.657236] RSP: 0018:88036fc03d20 EFLAGS: 0002
> > [ 125.657237] RAX: 3f172acf RBX: 3f172ab0 RCX: 0028
> > [ 125.657238] RDX: 14e5 RSI: 0018 RDI: 0004773a
> > [ 125.657238] RBP: 88036fc03d20 R08: 81da2200 R09: 0092
> > [ 125.657239] R10: 14e53edc9480 R11: 0008 R12: 0004773a
> > [ 125.657239] R13: R14: 0002 R15: 0001
> > [ 125.657241] FS: 01ee0880(0063) GS:88036fc0() knlGS:
> > [ 125.657241] CS: 0010 DS: ES: CR0: 8005003b
> > [ 125.657242] CR2: 0061c000 CR3: 7ef3c000 CR4: 07f0
> > [ 125.657243] Stack:
> > [ 125.657243] 88036fc03d48 813f85e3 1000 03e9
> > [ 125.657244] 0400 88036fc03d58 813f8538 88036fc03d78
> > [ 125.657246] 81046d1a b032 81da2200 88036fc03dc0
> > [ 125.657247] Call Trace:
> > [ 125.657247] [] delay_tsc+0x43/0x90
> > [ 125.657253] []
Re: [tip:x86/asm] x86: Speed up ___preempt_schedule*() by using THUNK helpers
On Fri, 3 Oct 2014 23:41:24 +0200 Oleg Nesterov wrote:
> On 10/03, Chuck Ebbert wrote:
> >
> > > [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> > > [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
> > > [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> > > [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
> > > [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> > > [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
> > > [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> > > [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
> > > [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> > > [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
> > >
> >
> > I *think* this is because RBP isn't being saved across task switch
> > anymore?
> >
> > Without CONFIG_FRAME_POINTERS that might not be a problem...
>
> Could you please spell?
>
> I don't even understand "RBP isn't being saved", SAVE_CONTEXT/RESTORE_CONTEXT
> do push/pop %rbp?
>
SAVE_ARGS/RESTORE_ARGS, which is what THUNK uses, do not push/pop %rbp.
Before, SAVE_ALL/RESTORE_ALL were being used around the call to preempt_schedule(). So from the symptoms I thought this was the problem.
Re: [tip:x86/asm] x86: Speed up ___preempt_schedule*() by using THUNK helpers
On Fri, 03 Oct 2014 00:50:13 -0400 Sasha Levin wrote:
> On 09/24/2014 11:02 AM, tip-bot for Oleg Nesterov wrote:
> > Commit-ID: 0ad6e3c5199be12c9745da8f8b9e3c9f8066c235
> > Gitweb: http://git.kernel.org/tip/0ad6e3c5199be12c9745da8f8b9e3c9f8066c235
> > Author: Oleg Nesterov
> > AuthorDate: Sun, 21 Sep 2014 20:41:53 +0200
> > Committer: Ingo Molnar
> > CommitDate: Wed, 24 Sep 2014 15:15:38 +0200
> >
> > x86: Speed up ___preempt_schedule*() by using THUNK helpers
> >
> > ___preempt_schedule() does SAVE_ALL/RESTORE_ALL but this is
> > suboptimal, we do not need to save/restore the callee-saved
> > register. And we already have arch/x86/lib/thunk_*.S which
> > implements the similar asm wrappers, so it makes sense to
> > redefine ___preempt_schedule() as "THUNK ..." and remove
> > preempt.S altogether.
> >
> > Signed-off-by: Oleg Nesterov
> > Reviewed-by: Andy Lutomirski
> > Cc: Denys Vlasenko
> > Cc: Peter Zijlstra
> > Cc: Linus Torvalds
> > Link: http://lkml.kernel.org/r/20140921184153.ga23...@redhat.com
> > Signed-off-by: Ingo Molnar
> > ---
>
> Hi Oleg,
>
> I *think* that this patch is causing the following trace (arch/x86/lib/thunk_64.S:44
> is new code introduced by this patch):
>
> [ 921.908530] kernel BUG at kernel/sched/core.c:2702!
> [ 921.909159] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [ 921.910084] Dumping ftrace buffer:
> [ 921.910626](ftrace buffer empty)
> [ 921.911178] Modules linked in:
> [ 921.915690] CPU: 18 PID: 9489 Comm: trinity-c195 Not tainted 3.17.0-rc7-next-20141002-sasha-00031-gbdb4244 #1273
> [ 921.917016] task: 8802bd748000 ti: 8802bda3c000 task.ti: 8802bda3c000
> [ 921.917752] RIP: __schedule (kernel/sched/core.c:2702 kernel/sched/core.c:2808)
> [ 921.917752] RSP: 0018:8802bda3c360 EFLAGS: 00010297
> [ 921.917752] RAX: 8802bda3c000 RBX: 8808501e2a00 RCX: 0001
> [ 921.917752] RDX: RSI: RDI: 0286
> [ 921.917752] RBP: 8802bda3c3c0 R08: 0001aa50 R09:
> [ 921.917752] R10: R11: 0001 R12: 0012
> [ 921.917752] R13: 8808501e2a00 R14: 0002 R15: 8802bda3c428
> [ 921.917752] FS: 7f5475cc2700() GS:88085000() knlGS:
> [ 921.917752] CS: 0010 DS: ES: CR0: 8005003b
> [ 921.917752] CR2: 7f5475abe60c CR3: 0002bebab000 CR4: 06a0
> [ 921.917752] DR0: 006f DR1: DR2:
> [ 921.917752] DR3: DR6: 0ff0 DR7: 0600
> [ 921.917752] Stack:
> [ 921.917752] 0001aa50 8802bd748000 8802bda3ffd8 001e2a00
> [ 921.917752] 001e2a00 8802bd748000 8802bda3c3a0 001e2a00
> [ 921.917752] 8802bd748000 0001a9ea 0002 8802bda3c428
> [ 921.917752] Call Trace:
> [ 921.917752] schedule_user (kernel/sched/core.c:2894 include/linux/jump_label.h:114 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:20 kernel/sched/core.c:2909)
> [ 921.917752] int_careful (arch/x86/kernel/entry_64.S:560)
> [ 921.917752] ? retint_careful (arch/x86/kernel/entry_64.S:889)
> [ 921.917752] ? preempt_schedule (./arch/x86/include/asm/preempt.h:80 (discriminator 1) kernel/sched/core.c:2943 (discriminator 1))
> [ 921.917752] ? preempt_schedule_context (./arch/x86/include/asm/preempt.h:75 kernel/context_tracking.c:143)
> [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
> [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
> [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
> [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
> [ 921.917752] ? ___preempt_schedule_context (arch/x86/lib/thunk_64.S:44)
> [ 921.917752] ? preempt_schedule_context (kernel/context_tracking.c:145)
I *think* this is because RBP isn't being saved across task switch anymore?
Without CONFIG_FRAME_POINTERS that might not be a problem...
Re: [PATCH v2] vfs: Don't exchange "short" filenames unconditionally.
On Wed, 1 Oct 2014 01:16:15 +0100 Al Viro wrote:
Can we get the below added somewhere in Documentation/filesystems/? I don't see anything there that covers all this.
>
> Huh? copy_name() does copy a _reference_, not the name itself. All the
> copying involved is source->d_name.name = target->d_name.name. And those
> are simply unsigned char *.
>
> write_seqcount_begin() is irrelevant here. Look: all callers of
> __d_move(x, y) are holding references both to x and y. Contributing to
> the refcount of dentries themselves, that is, not the names.
>
> That gives exclusion between __d_move() and free_dentry() - the latter cannot
> be called until dentry refcount reaches zero. RCU is completely irrelevant
> here. In fact, no call chain leads to __d_move() under rcu_read_lock().
> You must hold the target dentry hard, or it could simply be freed right
> under you.
>
> And __d_move() is taking ->d_lock on all dentries involved (in
> addition to rename_lock serializing it system-wide).
>
> What could possibly lead to refcount zero being observed on target of
> __d_move()? The history of any dentry is this:
> * it is created by __d_alloc(). Nobody can see it until __d_alloc()
> returns. Dentry refcount (not to be confused with refcount of external
> name) is 1.
> * it passes through some (usually - zero) __d_move() calls.
> Some - as the first argument, some - as the second one. All those
> calls are serialized by global seqlock - callers must hold rename_lock.
> And all of them are done by somebody who is holding a counting reference
> to dentries in question.
> * counting references to dentry might be taken and dropped;
> eventually refcount reaches zero (under ->d_lock) and no further
> counting references can be taken after that. See __dentry_kill() - the
> first thing it does is poisoning the refcount, so that any future
> attempt to increment it would fail. __dentry_kill() (still under ->d_lock
> of dentry, ->d_lock of its parent and ->i_lock of its inode) removes
> dentry from the tree, from hash and from the alias list of inode;
> Then it drops the locks. At that point the only search structure dentry
> might be found in is shrink list; if it's not on such list, free_dentry()
> is called immediately, otherwise it's marked so that the code processing
> the shrink list in question would, as soon as it gets to that sucker,
> remove it from the shrink list and call the same free_dentry(). And that's
> the only thing done to such dentry by somebody finding it via a shrink list.
> * once free_dentry() has been reached, dentry can only be seen
> by RCU lookups, and after the grace period ends it gets physically freed.
>
> free_dentry() isn't allowed to overlap __d_move(); to have that happen is
> a serious dentry refcounting bug. No __d_move() is allowed _after_
> free_dentry() has been entered, either. Again, it would take a refcounting
> bug for dentries to have that happen - basically, double dput() somewhere.
> If that happens, all bets are off, of course - if dentry gets unexpectedly
> freed under somebody who has grabbed a reference to it and has not dropped
> it yet, we are fucked.
>
> Nothing outside of __d_move() is allowed to change ->d_name.name. RCU-critical
> code is allowed to fetch and dereference it, and such code relies upon
> a) freeing of name seen by somebody who'd done rcu_read_lock() being
> delayed until after the matching rcu_read_unlock()
> b) store of terminating NUL done by __d_alloc() (and never overwritten
> afterwards) being seen by RCU-critical code that has found the pointer to
> that name in dentry->d_name.name
>
> All other code accessing ->d_name.name is required to hold one of the locks
> that are held by __d_move() and its callers. Grabbing any of those leads
> to smp_mb() on alpha, which serves as data dependency barrier there, so
> we don't need explicit barrier there as we do in RCU-critical places -
> guarding NUL will be seen.
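The refcount lifecycle described in the quoted message (counting references are taken and dropped, the last drop poisons the count, and every later attempt to take a reference fails) can be modelled in a few lines of userspace C. This is only a sketch with C11 atomics, not the kernel's dentry code, which does this under ->d_lock; the names `struct obj`, `obj_get_not_dead`, `obj_put` and the poison value `REF_DEAD` are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical poison value: any negative count means "dead". */
#define REF_DEAD (-128)

struct obj {
    atomic_int refcount;
};

/* A new object starts with one counting reference held by its creator. */
void obj_init(struct obj *o)
{
    atomic_init(&o->refcount, 1);
}

/* Take a counting reference unless the object has already been killed.
 * Returns false once the count has been poisoned. */
bool obj_get_not_dead(struct obj *o)
{
    int old = atomic_load(&o->refcount);
    while (old > 0) {
        /* On failure 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a counting reference. When the last one goes away the count is
 * poisoned in the same atomic step, so no new reference can sneak in.
 * Returns true if the caller dropped the last reference. */
bool obj_put(struct obj *o)
{
    int old = atomic_load(&o->refcount);
    for (;;) {
        /* Dropping a reference nobody holds is a refcounting bug. */
        assert(old > 0);
        int next = (old == 1) ? REF_DEAD : old - 1;
        if (atomic_compare_exchange_weak(&o->refcount, &old, next))
            return old == 1;
    }
}
```

The assertion in `obj_put()` is the model's counterpart of the "double dput()" case above: a second drop on a dead object trips it instead of silently corrupting the count.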
Re: pipe/page fault oddness.
On Wed, 01 Oct 2014 23:32:15 -0400 Sasha Levin wrote:
> On 10/01/2014 06:28 PM, Chuck Ebbert wrote:
> > On Wed, 01 Oct 2014 18:08:30 -0400
> > Sasha Levin wrote:
> >
> > > On 10/01/2014 04:20 PM, Linus Torvalds wrote:
> > > > So I'm really sending this patch out in the hope that it will get
> > > > comments, fixup and possibly even testing by people who actually know
> > > > the NUMA balancing code. Rik? Anybody?
> > >
> > > Hi Linus,
> > >
> > > I've tried this patch on the same configuration that was triggering
> > > the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it
> > > ran fine for ~20 minutes before exploding with:
> > >
> > > [ 2781.566206] kernel BUG at mm/huge_memory.c:1293!
> >
> > That's:
> > BUG_ON(is_huge_zero_page(page));
> >
> > Can you change your scripts to show the source code line when
> > the error is a BUG_ON()? The machine code disassembly after the
> > oops message doesn't really help.
> >
>
> Hum? The source code line is the first line in the trace:
>
> [ 2781.566206] kernel BUG at mm/huge_memory.c:1293!
>
I meant, display the contents of that line so we can see what the BUG_ON() was triggered by. In some cases you might have a custom patch applied or be running a version that some people don't have handy.