Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Wed 23-07-14 01:29:32, Andreas Bombe wrote: > On Mon, Jul 21, 2014 at 12:04:34PM +0200, Jan Kara wrote: > > On Sat 19-07-14 00:50:05, Andreas Bombe wrote: > > > I don't see anything in /sys/kernel/debug/tracing/trace_pipe or > > > .../trace (besides the header) with your patch applied. In case you > > > meant to test it with the problematic printk change, I also tried with > > > the revert reverted. That still hangs as before without any error report > > > to see. > > Yes, I meant testing my lockdep patch with the problematic printk change. > > Thanks for having a look. I'm puzzled why it didn't help. > > > > > I checked the kernel logs and there is also no lockdep report anywhere. > > > I get the "trace_printk() being used" notice but nothing else of > > > interest around there. Though the notice should mean trace_printk() was > > > used at least once? > > Yes. Anyway, I'd be grateful if you could run one more test for me so > > that I can better understand what's going on. Can you take recent vanilla > > kernel (with the revert) and apply attached patch to it? It again enables > > interrupts when calling console_unlock() but keeps lockdep coverage > > unchanged. It helped Sasha so I want to see whether your case is similar or > > different. Thanks! > > Applied on top of 15ba2236f, works fine. Great. Thanks for testing. I'll send the patch to Andrew. > I still don't see what printing could have triggered the problem. The > only thing that is a warning is from the PCI code about some missing > pcie-to-pci bridge (which I really should report some time). That isn't > the culprit however since I tested a build with that WARN_ON_ONCE > removed and it still hung. > > Okay, there's ACPI errors, but these seem to be rather late to matter, I > think? Anyway, here's the log of this working boot from the start to > just before initrd gets started: The prints you can see are those that are fine ;). The thing is that when lockdep covers more of the printk & console code itself (as was the case with my original patch), it finds something which makes it crash the machine. I'm not sure why my patch to make lockdep use trace_printk didn't really help, maybe the nature of the crash is different than I thought. Sasha Levin was able to reproduce the problem with a virtual machine so I'll ask him for a config and will hope I'll be able to reproduce myself and experiment with it. Thanks again for testing! Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Wed 23-07-14 01:29:32, Andreas Bombe wrote: On Mon, Jul 21, 2014 at 12:04:34PM +0200, Jan Kara wrote: On Sat 19-07-14 00:50:05, Andreas Bombe wrote: I don't see anything in /sys/kernel/debug/tracing/trace_pipe or .../trace (besides the header) with your patch applied. In case you meant to test it with the problematic printk change, I also tried with the revert reverted. That still hangs as before without any error report to see. Yes, I meant testing my lockdep patch with the problematic printk change. Thanks for having a look. I'm puzzled why it didn't help. I checked the kernel logs and there is also no lockdep report anywhere. I get the trace_printk() being used notice but nothing else of interest around there. Though the notice should mean trace_printk() was used at least once? Yes. Anyway, I'd be grateful if you could run one more test for me so that I can better understand what's going on. Can you take recent vanilla kernel (with the revert) and apply attached patch to it? It again enables interrupts when calling console_unlock() but keeps lockdep coverage unchanged. It helped Sasha so I want to see whether your case is similar or different. Thanks! Applied on top of 15ba2236f, works fine. Great. Thanks for testing. I'll send the patch to Andrew. I still don't see what printing could have triggered the problem. The only thing that is a warning is from the PCI code about some missing pcie-to-pci bridge (which I really should report some time). That isn't the culprit however since I tested a build with that WARN_ON_ONCE removed and it still hung. Okay, there's ACPI errors, but these seem to be rather late to matter, I think? Anyway, here's the log of this working boot from the start to just before initrd gets started: The prints you can see are those that are fine ;). The thing is that when lockdep covers more of the printk console code itself (as was the case with my original patch), it finds something which makes it crash the machine. I'm not sure why my patch to make lockdep use trace_printk didn't really help, maybe the nature of the crash is different than I thought. Sasha Levin was able to reproduce the problem with a virtual machine so I'll ask him for a config and will hope I'll be able to reproduce myself and experiment with it. Thanks again for testing! Honza -- Jan Kara j...@suse.cz SUSE Labs, CR -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Sat 19-07-14 00:50:05, Andreas Bombe wrote: > On Thu, Jul 17, 2014 at 10:31:37AM +0200, Jan Kara wrote: > > On Wed 16-07-14 23:34:08, Andreas Bombe wrote: > > > On Mon, Jul 14, 2014 at 10:35:27AM +0200, Jan Kara wrote: > > > > On Sun 29-06-14 00:50:50, Andreas Bombe wrote: > > > > > None of the post 3.15 kernel boot for me. They all hang at the GRUB > > > > > screen telling me it loaded and started the kernel, but the kernel > > > > > itself stops before it prints anything (or even replaces the GRUB > > > > > background graphics). > > > > > > > > > > I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 > > > > > "printk: > > > > > enable interrupts before calling console_trylock_for_printk()". > > > > > Reverting that patch on the latest kernel (git 24b414d5a7) allows me > > > > > to > > > > > boot normally. I fixed the conflict in the revert by leaving in the > > > > > "if > > > > > (in_sched) return printed_len;". > > > > > > > > > > I have the "early printk via the EFI framebuffer" option enabled, > > > > > disabling it made no difference however. > > > > Thanks for report. I've been on vacation so I'm replying with a > > > > delay. I > > > > believe this is one of the issues where this patch just uncovers > > > > underlying > > > > problem - I belive lockdep tries to report some locking issue in console > > > > driver code (this patch increased lockdep coverage of console driver > > > > code) > > > > however we are holding some locks in printk code which make lockdep > > > > deadlock. Can you try running with the attached patch? > > > > > > EUNABLE > > > > > > You forgot to attach a patch. > > Bah, sorry. Attaching now. > > I don't see anything in /sys/kernel/debug/tracing/trace_pipe or > .../trace (besides the header) with your patch applied. In case you > meant to test it with the problematic printk change, I also tried with > the revert reverted. That still hangs as before without any error report > to see. Yes, I meant testing my lockdep patch with the problematic printk change. Thanks for having a look. I'm puzzled why it didn't help. > I checked the kernel logs and there is also no lockdep report anywhere. > I get the "trace_printk() being used" notice but nothing else of > interest around there. Though the notice should mean trace_printk() was > used at least once? Yes. Anyway, I'd be grateful if you could run one more test for me so that I can better understand what's going on. Can you take recent vanilla kernel (with the revert) and apply attached patch to it? It again enables interrupts when calling console_unlock() but keeps lockdep coverage unchanged. It helped Sasha so I want to see whether your case is similar or different. Thanks! Honza -- Jan Kara SUSE Labs, CR >From c73d4093c3fd5013dbf5baccd495d6db24de31cc Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 4 Jun 2014 16:11:37 -0700 Subject: [PATCH] printk: enable interrupts before calling console_trylock_for_printk() We need interrupts disabled when calling console_trylock_for_printk() only so that cpu id we pass to can_use_console() remains valid (for other things console_sem provides all the exclusion we need and deadlocks on console_sem due to interrupts are impossible because we use down_trylock()). However if we are rescheduled, we are guaranteed to run on an online cpu so we can easily just get the cpu id in can_use_console(). We can lose a bit of performance when we enable interrupts in vprintk_emit() and then disable them again in console_unlock() but OTOH it can somewhat reduce interrupt latency caused by console_unlock(). We avoid calling console_unlock() from vprintk_emit() with lockdep enabled as that has unveiled quite some bugs leading to system freezes during boot (e.g. https://lkml.org/lkml/2014/5/30/242). Signed-off-by: Jan Kara --- kernel/printk/printk.c | 31 +-- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 13e839dbca07..fe4154d83fe4 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1416,10 +1416,9 @@ static int have_callable_console(void) /* * Can we actually use the console at this time on this cpu? * - * Console drivers may assume that per-cpu resources have - * been allocated. So unless they're explicitly marked as - * being able to cope (CON_ANYTIME) don't call them until - * this CPU is officially up. + * Console drivers may assume that per-cpu resources have been allocated. So + * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't + * call them until this CPU is officially up. */ static inline int can_use_console(unsigned int cpu) { @@ -1432,8 +1431,10 @@ static inline int can_use_console(unsigned int cpu) * console_lock held, and 'console_locked' set) if it * is successful, false otherwise. */ -static int console_trylock_for_printk(unsigned int
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Sat 19-07-14 00:50:05, Andreas Bombe wrote: On Thu, Jul 17, 2014 at 10:31:37AM +0200, Jan Kara wrote: On Wed 16-07-14 23:34:08, Andreas Bombe wrote: On Mon, Jul 14, 2014 at 10:35:27AM +0200, Jan Kara wrote: On Sun 29-06-14 00:50:50, Andreas Bombe wrote: None of the post 3.15 kernel boot for me. They all hang at the GRUB screen telling me it loaded and started the kernel, but the kernel itself stops before it prints anything (or even replaces the GRUB background graphics). I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 printk: enable interrupts before calling console_trylock_for_printk(). Reverting that patch on the latest kernel (git 24b414d5a7) allows me to boot normally. I fixed the conflict in the revert by leaving in the if (in_sched) return printed_len;. I have the early printk via the EFI framebuffer option enabled, disabling it made no difference however. Thanks for report. I've been on vacation so I'm replying with a delay. I believe this is one of the issues where this patch just uncovers underlying problem - I belive lockdep tries to report some locking issue in console driver code (this patch increased lockdep coverage of console driver code) however we are holding some locks in printk code which make lockdep deadlock. Can you try running with the attached patch? EUNABLE You forgot to attach a patch. Bah, sorry. Attaching now. I don't see anything in /sys/kernel/debug/tracing/trace_pipe or .../trace (besides the header) with your patch applied. In case you meant to test it with the problematic printk change, I also tried with the revert reverted. That still hangs as before without any error report to see. Yes, I meant testing my lockdep patch with the problematic printk change. Thanks for having a look. I'm puzzled why it didn't help. I checked the kernel logs and there is also no lockdep report anywhere. I get the trace_printk() being used notice but nothing else of interest around there. Though the notice should mean trace_printk() was used at least once? Yes. Anyway, I'd be grateful if you could run one more test for me so that I can better understand what's going on. Can you take recent vanilla kernel (with the revert) and apply attached patch to it? It again enables interrupts when calling console_unlock() but keeps lockdep coverage unchanged. It helped Sasha so I want to see whether your case is similar or different. Thanks! Honza -- Jan Kara j...@suse.cz SUSE Labs, CR From c73d4093c3fd5013dbf5baccd495d6db24de31cc Mon Sep 17 00:00:00 2001 From: Jan Kara j...@suse.cz Date: Wed, 4 Jun 2014 16:11:37 -0700 Subject: [PATCH] printk: enable interrupts before calling console_trylock_for_printk() We need interrupts disabled when calling console_trylock_for_printk() only so that cpu id we pass to can_use_console() remains valid (for other things console_sem provides all the exclusion we need and deadlocks on console_sem due to interrupts are impossible because we use down_trylock()). However if we are rescheduled, we are guaranteed to run on an online cpu so we can easily just get the cpu id in can_use_console(). We can lose a bit of performance when we enable interrupts in vprintk_emit() and then disable them again in console_unlock() but OTOH it can somewhat reduce interrupt latency caused by console_unlock(). We avoid calling console_unlock() from vprintk_emit() with lockdep enabled as that has unveiled quite some bugs leading to system freezes during boot (e.g. https://lkml.org/lkml/2014/5/30/242). Signed-off-by: Jan Kara j...@suse.cz --- kernel/printk/printk.c | 31 +-- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 13e839dbca07..fe4154d83fe4 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1416,10 +1416,9 @@ static int have_callable_console(void) /* * Can we actually use the console at this time on this cpu? * - * Console drivers may assume that per-cpu resources have - * been allocated. So unless they're explicitly marked as - * being able to cope (CON_ANYTIME) don't call them until - * this CPU is officially up. + * Console drivers may assume that per-cpu resources have been allocated. So + * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't + * call them until this CPU is officially up. */ static inline int can_use_console(unsigned int cpu) { @@ -1432,8 +1431,10 @@ static inline int can_use_console(unsigned int cpu) * console_lock held, and 'console_locked' set) if it * is successful, false otherwise. */ -static int console_trylock_for_printk(unsigned int cpu) +static int console_trylock_for_printk(void) { + unsigned int cpu = smp_processor_id(); + if (!console_trylock())
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Thu, Jul 17, 2014 at 10:31:37AM +0200, Jan Kara wrote: > On Wed 16-07-14 23:34:08, Andreas Bombe wrote: > > On Mon, Jul 14, 2014 at 10:35:27AM +0200, Jan Kara wrote: > > > On Sun 29-06-14 00:50:50, Andreas Bombe wrote: > > > > None of the post 3.15 kernel boot for me. They all hang at the GRUB > > > > screen telling me it loaded and started the kernel, but the kernel > > > > itself stops before it prints anything (or even replaces the GRUB > > > > background graphics). > > > > > > > > I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 "printk: > > > > enable interrupts before calling console_trylock_for_printk()". > > > > Reverting that patch on the latest kernel (git 24b414d5a7) allows me to > > > > boot normally. I fixed the conflict in the revert by leaving in the "if > > > > (in_sched) return printed_len;". > > > > > > > > I have the "early printk via the EFI framebuffer" option enabled, > > > > disabling it made no difference however. > > > Thanks for report. I've been on vacation so I'm replying with a delay. I > > > believe this is one of the issues where this patch just uncovers > > > underlying > > > problem - I belive lockdep tries to report some locking issue in console > > > driver code (this patch increased lockdep coverage of console driver code) > > > however we are holding some locks in printk code which make lockdep > > > deadlock. Can you try running with the attached patch? > > > > EUNABLE > > > > You forgot to attach a patch. > Bah, sorry. Attaching now. I don't see anything in /sys/kernel/debug/tracing/trace_pipe or .../trace (besides the header) with your patch applied. In case you meant to test it with the problematic printk change, I also tried with the revert reverted. That still hangs as before without any error report to see. I checked the kernel logs and there is also no lockdep report anywhere. I get the "trace_printk() being used" notice but nothing else of interest around there. Though the notice should mean trace_printk() was used at least once? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Thu, Jul 17, 2014 at 10:31:37AM +0200, Jan Kara wrote: On Wed 16-07-14 23:34:08, Andreas Bombe wrote: On Mon, Jul 14, 2014 at 10:35:27AM +0200, Jan Kara wrote: On Sun 29-06-14 00:50:50, Andreas Bombe wrote: None of the post 3.15 kernel boot for me. They all hang at the GRUB screen telling me it loaded and started the kernel, but the kernel itself stops before it prints anything (or even replaces the GRUB background graphics). I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 printk: enable interrupts before calling console_trylock_for_printk(). Reverting that patch on the latest kernel (git 24b414d5a7) allows me to boot normally. I fixed the conflict in the revert by leaving in the if (in_sched) return printed_len;. I have the early printk via the EFI framebuffer option enabled, disabling it made no difference however. Thanks for report. I've been on vacation so I'm replying with a delay. I believe this is one of the issues where this patch just uncovers underlying problem - I belive lockdep tries to report some locking issue in console driver code (this patch increased lockdep coverage of console driver code) however we are holding some locks in printk code which make lockdep deadlock. Can you try running with the attached patch? EUNABLE You forgot to attach a patch. Bah, sorry. Attaching now. I don't see anything in /sys/kernel/debug/tracing/trace_pipe or .../trace (besides the header) with your patch applied. In case you meant to test it with the problematic printk change, I also tried with the revert reverted. That still hangs as before without any error report to see. I checked the kernel logs and there is also no lockdep report anywhere. I get the trace_printk() being used notice but nothing else of interest around there. Though the notice should mean trace_printk() was used at least once? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Wed 16-07-14 23:34:08, Andreas Bombe wrote: > On Mon, Jul 14, 2014 at 10:35:27AM +0200, Jan Kara wrote: > > On Sun 29-06-14 00:50:50, Andreas Bombe wrote: > > > None of the post 3.15 kernel boot for me. They all hang at the GRUB > > > screen telling me it loaded and started the kernel, but the kernel > > > itself stops before it prints anything (or even replaces the GRUB > > > background graphics). > > > > > > I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 "printk: > > > enable interrupts before calling console_trylock_for_printk()". > > > Reverting that patch on the latest kernel (git 24b414d5a7) allows me to > > > boot normally. I fixed the conflict in the revert by leaving in the "if > > > (in_sched) return printed_len;". > > > > > > I have the "early printk via the EFI framebuffer" option enabled, > > > disabling it made no difference however. > > Thanks for report. I've been on vacation so I'm replying with a delay. I > > believe this is one of the issues where this patch just uncovers underlying > > problem - I belive lockdep tries to report some locking issue in console > > driver code (this patch increased lockdep coverage of console driver code) > > however we are holding some locks in printk code which make lockdep > > deadlock. Can you try running with the attached patch? > > EUNABLE > > You forgot to attach a patch. Bah, sorry. Attaching now. Honza -- Jan Kara SUSE Labs, CR >From 93930c92c664d8b834e1e04b92c00023626fc07a Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 19 Jun 2014 15:36:09 +0200 Subject: [PATCH] lockdep: Dump info via tracing Signed-off-by: Jan Kara --- kernel/locking/lockdep.c | 707 +++ 1 file changed, 402 insertions(+), 305 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index d24e4339b46d..b15e7dec55f6 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -77,6 +77,26 @@ module_param(lock_stat, int, 0644); */ static arch_spinlock_t lockdep_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; +static void sprint_ip_sym(char *buf, unsigned long ip) +{ + sprintf(buf, "[<%p>] %pS\n", (void *) ip, (void *) ip); +} + +static void trace_print_stack_trace(struct stack_trace *trace, int spaces) +{ + int i, n; + char buf[256]; + + if (!trace->entries) + return; + + for (i = 0; i < trace->nr_entries; i++) { + n = sprintf(buf, "%*c", 1 + spaces, ' '); + sprint_ip_sym(buf + n, trace->entries[i]); + trace_printk(buf); + } +} + static int graph_lock(void) { arch_spin_lock(_lock); @@ -382,9 +402,9 @@ static unsigned long stack_trace[MAX_STACK_TRACE_ENTRIES]; static void print_lockdep_off(const char *bug_msg) { - printk(KERN_DEBUG "%s\n", bug_msg); - printk(KERN_DEBUG "turning off the locking correctness validator.\n"); - printk(KERN_DEBUG "Please attach the output of /proc/lock_stat to the bug report\n"); + trace_printk("%s\n", bug_msg); + trace_printk("turning off the locking correctness validator.\n"); + trace_printk("Please attach the output of /proc/lock_stat to the bug report\n"); } static int save_trace(struct stack_trace *trace) @@ -417,7 +437,7 @@ static int save_trace(struct stack_trace *trace) return 0; print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!"); - dump_stack(); + trace_dump_stack(0); return 0; } @@ -506,7 +526,7 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS]) usage[i] = '\0'; } -static void __print_lock_name(struct lock_class *class) +static void __sprint_lock_name(char *buf, struct lock_class *class) { char str[KSYM_NAME_LEN]; const char *name; @@ -514,28 +534,28 @@ static void __print_lock_name(struct lock_class *class) name = class->name; if (!name) { name = __get_key_name(class->key, str); - printk("%s", name); + strcpy(buf, name); } else { - printk("%s", name); + strcpy(buf, name); if (class->name_version > 1) - printk("#%d", class->name_version); + sprintf(buf + strlen(buf), "#%d", class->name_version); if (class->subclass) - printk("/%d", class->subclass); + sprintf(buf + strlen(buf), "/%d", class->subclass); } } -static void print_lock_name(struct lock_class *class) +static void sprint_lock_name(char *buf, struct lock_class *class) { char usage[LOCK_USAGE_CHARS]; get_usage_chars(class, usage); - printk(" ("); - __print_lock_name(class); - printk("){%s}", usage); + strcpy(buf, " ("); + __sprint_lock_name(buf, class); + sprintf(buf + strlen(buf), "){%s}", usage); } -static void print_lockdep_cache(struct lockdep_map *lock) +static void sprint_lockdep_cache(char *buf, struct lockdep_map *lock) { const char *name; char str[KSYM_NAME_LEN]; @@ -544,14 +564,14 @@ static void print_lockdep_cache(struct lockdep_map *lock) if (!name) name = __get_key_name(lock->key->subkeys, str); - printk("%s", name); + strcpy(buf, name); } -static void
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Wed 16-07-14 23:34:08, Andreas Bombe wrote: On Mon, Jul 14, 2014 at 10:35:27AM +0200, Jan Kara wrote: On Sun 29-06-14 00:50:50, Andreas Bombe wrote: None of the post 3.15 kernel boot for me. They all hang at the GRUB screen telling me it loaded and started the kernel, but the kernel itself stops before it prints anything (or even replaces the GRUB background graphics). I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 printk: enable interrupts before calling console_trylock_for_printk(). Reverting that patch on the latest kernel (git 24b414d5a7) allows me to boot normally. I fixed the conflict in the revert by leaving in the if (in_sched) return printed_len;. I have the early printk via the EFI framebuffer option enabled, disabling it made no difference however. Thanks for report. I've been on vacation so I'm replying with a delay. I believe this is one of the issues where this patch just uncovers underlying problem - I belive lockdep tries to report some locking issue in console driver code (this patch increased lockdep coverage of console driver code) however we are holding some locks in printk code which make lockdep deadlock. Can you try running with the attached patch? EUNABLE You forgot to attach a patch. Bah, sorry. Attaching now. Honza -- Jan Kara j...@suse.cz SUSE Labs, CR From 93930c92c664d8b834e1e04b92c00023626fc07a Mon Sep 17 00:00:00 2001 From: Jan Kara j...@suse.cz Date: Thu, 19 Jun 2014 15:36:09 +0200 Subject: [PATCH] lockdep: Dump info via tracing Signed-off-by: Jan Kara j...@suse.cz --- kernel/locking/lockdep.c | 707 +++ 1 file changed, 402 insertions(+), 305 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index d24e4339b46d..b15e7dec55f6 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -77,6 +77,26 @@ module_param(lock_stat, int, 0644); */ static arch_spinlock_t lockdep_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; +static void sprint_ip_sym(char *buf, unsigned long ip) +{ + sprintf(buf, [%p] %pS\n, (void *) ip, (void *) ip); +} + +static void trace_print_stack_trace(struct stack_trace *trace, int spaces) +{ + int i, n; + char buf[256]; + + if (!trace-entries) + return; + + for (i = 0; i trace-nr_entries; i++) { + n = sprintf(buf, %*c, 1 + spaces, ' '); + sprint_ip_sym(buf + n, trace-entries[i]); + trace_printk(buf); + } +} + static int graph_lock(void) { arch_spin_lock(lockdep_lock); @@ -382,9 +402,9 @@ static unsigned long stack_trace[MAX_STACK_TRACE_ENTRIES]; static void print_lockdep_off(const char *bug_msg) { - printk(KERN_DEBUG %s\n, bug_msg); - printk(KERN_DEBUG turning off the locking correctness validator.\n); - printk(KERN_DEBUG Please attach the output of /proc/lock_stat to the bug report\n); + trace_printk(%s\n, bug_msg); + trace_printk(turning off the locking correctness validator.\n); + trace_printk(Please attach the output of /proc/lock_stat to the bug report\n); } static int save_trace(struct stack_trace *trace) @@ -417,7 +437,7 @@ static int save_trace(struct stack_trace *trace) return 0; print_lockdep_off(BUG: MAX_STACK_TRACE_ENTRIES too low!); - dump_stack(); + trace_dump_stack(0); return 0; } @@ -506,7 +526,7 @@ void get_usage_chars(struct lock_class *class, char usage[LOCK_USAGE_CHARS]) usage[i] = '\0'; } -static void __print_lock_name(struct lock_class *class) +static void __sprint_lock_name(char *buf, struct lock_class *class) { char str[KSYM_NAME_LEN]; const char *name; @@ -514,28 +534,28 @@ static void __print_lock_name(struct lock_class *class) name = class-name; if (!name) { name = __get_key_name(class-key, str); - printk(%s, name); + strcpy(buf, name); } else { - printk(%s, name); + strcpy(buf, name); if (class-name_version 1) - printk(#%d, class-name_version); + sprintf(buf + strlen(buf), #%d, class-name_version); if (class-subclass) - printk(/%d, class-subclass); + sprintf(buf + strlen(buf), /%d, class-subclass); } } -static void print_lock_name(struct lock_class *class) +static void sprint_lock_name(char *buf, struct lock_class *class) { char usage[LOCK_USAGE_CHARS]; get_usage_chars(class, usage); - printk( (); - __print_lock_name(class); - printk(){%s}, usage); + strcpy(buf, (); + __sprint_lock_name(buf, class); + sprintf(buf + strlen(buf), ){%s}, usage); } -static void print_lockdep_cache(struct lockdep_map *lock) +static void sprint_lockdep_cache(char *buf, struct lockdep_map *lock) { const char *name; char str[KSYM_NAME_LEN]; @@ -544,14 +564,14 @@ static void print_lockdep_cache(struct lockdep_map *lock) if (!name) name = __get_key_name(lock-key-subkeys, str); - printk(%s, name); + strcpy(buf, name); } -static void print_lock(struct held_lock *hlock) +static void sprint_lock(char *buf, struct
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Mon, Jul 14, 2014 at 10:35:27AM +0200, Jan Kara wrote: > On Sun 29-06-14 00:50:50, Andreas Bombe wrote: > > None of the post 3.15 kernel boot for me. They all hang at the GRUB > > screen telling me it loaded and started the kernel, but the kernel > > itself stops before it prints anything (or even replaces the GRUB > > background graphics). > > > > I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 "printk: > > enable interrupts before calling console_trylock_for_printk()". > > Reverting that patch on the latest kernel (git 24b414d5a7) allows me to > > boot normally. I fixed the conflict in the revert by leaving in the "if > > (in_sched) return printed_len;". > > > > I have the "early printk via the EFI framebuffer" option enabled, > > disabling it made no difference however. > Thanks for report. I've been on vacation so I'm replying with a delay. I > believe this is one of the issues where this patch just uncovers underlying > problem - I belive lockdep tries to report some locking issue in console > driver code (this patch increased lockdep coverage of console driver code) > however we are holding some locks in printk code which make lockdep > deadlock. Can you try running with the attached patch? EUNABLE You forgot to attach a patch. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Mon, Jul 14, 2014 at 10:35:27AM +0200, Jan Kara wrote: On Sun 29-06-14 00:50:50, Andreas Bombe wrote: None of the post 3.15 kernel boot for me. They all hang at the GRUB screen telling me it loaded and started the kernel, but the kernel itself stops before it prints anything (or even replaces the GRUB background graphics). I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 printk: enable interrupts before calling console_trylock_for_printk(). Reverting that patch on the latest kernel (git 24b414d5a7) allows me to boot normally. I fixed the conflict in the revert by leaving in the if (in_sched) return printed_len;. I have the early printk via the EFI framebuffer option enabled, disabling it made no difference however. Thanks for report. I've been on vacation so I'm replying with a delay. I believe this is one of the issues where this patch just uncovers underlying problem - I belive lockdep tries to report some locking issue in console driver code (this patch increased lockdep coverage of console driver code) however we are holding some locks in printk code which make lockdep deadlock. Can you try running with the attached patch? EUNABLE You forgot to attach a patch. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Mon, Jun 30, 2014 at 01:20:30PM -0700, Andrew Morton wrote: > On Sun, 29 Jun 2014 00:50:50 +0200 Andreas Bombe wrote: > > > None of the post 3.15 kernel boot for me. They all hang at the GRUB > > screen telling me it loaded and started the kernel, but the kernel > > itself stops before it prints anything (or even replaces the GRUB > > background graphics). > > > > I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 "printk: > > enable interrupts before calling console_trylock_for_printk()". > > Reverting that patch on the latest kernel (git 24b414d5a7) allows me to > > boot normally. > > OK, thanks, I queued the below reversion. I'll hold off for a week or > two in case we come up with a fix. > > > I fixed the conflict in the revert by leaving in the "if > > (in_sched) return printed_len;". > > hm, that might have been buggy - we still need to the lockdep_on() and > local_irq_restore(). You're a victim of > someone-stuck-a-random-return-in-the-middle-of-a-function. To be honest, I just randomly picked one possible conflict resolution to confirm whether the revert actually helps. > Please review this and if possible, run-time test it? Against current > mainline. I've applied it (actually I picked the one from your patch series but they are identical I think) and it does boot. I'm writing this on the latest kernel + your revert. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Mon, Jun 30, 2014 at 01:20:30PM -0700, Andrew Morton wrote: On Sun, 29 Jun 2014 00:50:50 +0200 Andreas Bombe a...@debian.org wrote: None of the post 3.15 kernel boot for me. They all hang at the GRUB screen telling me it loaded and started the kernel, but the kernel itself stops before it prints anything (or even replaces the GRUB background graphics). I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 printk: enable interrupts before calling console_trylock_for_printk(). Reverting that patch on the latest kernel (git 24b414d5a7) allows me to boot normally. OK, thanks, I queued the below reversion. I'll hold off for a week or two in case we come up with a fix. I fixed the conflict in the revert by leaving in the if (in_sched) return printed_len;. hm, that might have been buggy - we still need to the lockdep_on() and local_irq_restore(). You're a victim of someone-stuck-a-random-return-in-the-middle-of-a-function. To be honest, I just randomly picked one possible conflict resolution to confirm whether the revert actually helps. Please review this and if possible, run-time test it? Against current mainline. I've applied it (actually I picked the one from your patch series but they are identical I think) and it does boot. I'm writing this on the latest kernel + your revert. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Sun, 29 Jun 2014 00:50:50 +0200 Andreas Bombe wrote: > None of the post 3.15 kernel boot for me. They all hang at the GRUB > screen telling me it loaded and started the kernel, but the kernel > itself stops before it prints anything (or even replaces the GRUB > background graphics). > > I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 "printk: > enable interrupts before calling console_trylock_for_printk()". > Reverting that patch on the latest kernel (git 24b414d5a7) allows me to > boot normally. OK, thanks, I queued the below reversion. I'll hold off for a week or two in case we come up with a fix. > I fixed the conflict in the revert by leaving in the "if > (in_sched) return printed_len;". hm, that might have been buggy - we still need to the lockdep_on() and local_irq_restore(). You're a victim of someone-stuck-a-random-return-in-the-middle-of-a-function. Please review this and if possible, run-time test it? Against current mainline. From: Andrew Morton Subject: kernel/printk/printk.c: revert "printk: enable interrupts before calling console_trylock_for_printk()" Revert 939f04bec1a4 ("printk: enable interrupts before calling console_trylock_for_printk()"). Andreas reported: : None of the post 3.15 kernel boot for me. They all hang at the GRUB : screen telling me it loaded and started the kernel, but the kernel : itself stops before it prints anything (or even replaces the GRUB : background graphics). 939f04bec1a4 is modest latency reduction. Revert it until we understand the reason for these failures. Reported-by: Andreas Bombe Cc: Jan Kara Cc: Steven Rostedt Signed-off-by: Andrew Morton --- kernel/printk/printk.c | 44 +++ 1 file changed, 18 insertions(+), 26 deletions(-) diff -puN kernel/printk/printk.c~kernel-printk-printkc-revert-printk-enable-interrupts-before-calling-console_trylock_for_printk kernel/printk/printk.c --- a/kernel/printk/printk.c~kernel-printk-printkc-revert-printk-enable-interrupts-before-calling-console_trylock_for_printk +++ a/kernel/printk/printk.c @@ -1416,9 +1416,10 @@ static int have_callable_console(void) /* * Can we actually use the console at this time on this cpu? * - * Console drivers may assume that per-cpu resources have been allocated. So - * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't - * call them until this CPU is officially up. + * Console drivers may assume that per-cpu resources have + * been allocated. So unless they're explicitly marked as + * being able to cope (CON_ANYTIME) don't call them until + * this CPU is officially up. */ static inline int can_use_console(unsigned int cpu) { @@ -1431,10 +1432,8 @@ static inline int can_use_console(unsign * console_lock held, and 'console_locked' set) if it * is successful, false otherwise. */ -static int console_trylock_for_printk(void) +static int console_trylock_for_printk(unsigned int cpu) { - unsigned int cpu = smp_processor_id(); - if (!console_trylock()) return 0; /* @@ -1609,8 +1608,7 @@ asmlinkage int vprintk_emit(int facility */ if (!oops_in_progress && !lockdep_recursing(current)) { recursion_bug = 1; - local_irq_restore(flags); - return 0; + goto out_restore_irqs; } zap_locks(); } @@ -1718,27 +1716,21 @@ asmlinkage int vprintk_emit(int facility logbuf_cpu = UINT_MAX; raw_spin_unlock(_lock); - lockdep_on(); - local_irq_restore(flags); /* If called from the scheduler, we can not call up(). */ - if (in_sched) - return printed_len; - - /* -* Disable preemption to avoid being preempted while holding -* console_sem which would prevent anyone from printing to console -*/ - preempt_disable(); - /* -* Try to acquire and then immediately release the console semaphore. -* The release will print out buffers and wake up /dev/kmsg and syslog() -* users. -*/ - if (console_trylock_for_printk()) - console_unlock(); - preempt_enable(); + if (!in_sched) { + /* +* Try to acquire and then immediately release the console +* semaphore. The release will print out buffers and wake up +* /dev/kmsg and syslog() users. +*/ + if (console_trylock_for_printk(this_cpu)) + console_unlock(); + } + lockdep_on(); +out_restore_irqs: + local_irq_restore(flags); return printed_len; } EXPORT_SYMBOL(vprintk_emit); _ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] Boot hang with 939f04bec printk: enable interrupts before calling console_trylock_for_printk()
On Sun, 29 Jun 2014 00:50:50 +0200 Andreas Bombe a...@debian.org wrote: None of the post 3.15 kernel boot for me. They all hang at the GRUB screen telling me it loaded and started the kernel, but the kernel itself stops before it prints anything (or even replaces the GRUB background graphics). I bisected it down to 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 printk: enable interrupts before calling console_trylock_for_printk(). Reverting that patch on the latest kernel (git 24b414d5a7) allows me to boot normally. OK, thanks, I queued the below reversion. I'll hold off for a week or two in case we come up with a fix. I fixed the conflict in the revert by leaving in the if (in_sched) return printed_len;. hm, that might have been buggy - we still need to the lockdep_on() and local_irq_restore(). You're a victim of someone-stuck-a-random-return-in-the-middle-of-a-function. Please review this and if possible, run-time test it? Against current mainline. From: Andrew Morton a...@linux-foundation.org Subject: kernel/printk/printk.c: revert printk: enable interrupts before calling console_trylock_for_printk() Revert 939f04bec1a4 (printk: enable interrupts before calling console_trylock_for_printk()). Andreas reported: : None of the post 3.15 kernel boot for me. They all hang at the GRUB : screen telling me it loaded and started the kernel, but the kernel : itself stops before it prints anything (or even replaces the GRUB : background graphics). 939f04bec1a4 is modest latency reduction. Revert it until we understand the reason for these failures. Reported-by: Andreas Bombe a...@debian.org Cc: Jan Kara j...@suse.cz Cc: Steven Rostedt rost...@goodmis.org Signed-off-by: Andrew Morton a...@linux-foundation.org --- kernel/printk/printk.c | 44 +++ 1 file changed, 18 insertions(+), 26 deletions(-) diff -puN kernel/printk/printk.c~kernel-printk-printkc-revert-printk-enable-interrupts-before-calling-console_trylock_for_printk kernel/printk/printk.c --- a/kernel/printk/printk.c~kernel-printk-printkc-revert-printk-enable-interrupts-before-calling-console_trylock_for_printk +++ a/kernel/printk/printk.c @@ -1416,9 +1416,10 @@ static int have_callable_console(void) /* * Can we actually use the console at this time on this cpu? * - * Console drivers may assume that per-cpu resources have been allocated. So - * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't - * call them until this CPU is officially up. + * Console drivers may assume that per-cpu resources have + * been allocated. So unless they're explicitly marked as + * being able to cope (CON_ANYTIME) don't call them until + * this CPU is officially up. */ static inline int can_use_console(unsigned int cpu) { @@ -1431,10 +1432,8 @@ static inline int can_use_console(unsign * console_lock held, and 'console_locked' set) if it * is successful, false otherwise. */ -static int console_trylock_for_printk(void) +static int console_trylock_for_printk(unsigned int cpu) { - unsigned int cpu = smp_processor_id(); - if (!console_trylock()) return 0; /* @@ -1609,8 +1608,7 @@ asmlinkage int vprintk_emit(int facility */ if (!oops_in_progress !lockdep_recursing(current)) { recursion_bug = 1; - local_irq_restore(flags); - return 0; + goto out_restore_irqs; } zap_locks(); } @@ -1718,27 +1716,21 @@ asmlinkage int vprintk_emit(int facility logbuf_cpu = UINT_MAX; raw_spin_unlock(logbuf_lock); - lockdep_on(); - local_irq_restore(flags); /* If called from the scheduler, we can not call up(). */ - if (in_sched) - return printed_len; - - /* -* Disable preemption to avoid being preempted while holding -* console_sem which would prevent anyone from printing to console -*/ - preempt_disable(); - /* -* Try to acquire and then immediately release the console semaphore. -* The release will print out buffers and wake up /dev/kmsg and syslog() -* users. -*/ - if (console_trylock_for_printk()) - console_unlock(); - preempt_enable(); + if (!in_sched) { + /* +* Try to acquire and then immediately release the console +* semaphore. The release will print out buffers and wake up +* /dev/kmsg and syslog() users. +*/ + if (console_trylock_for_printk(this_cpu)) + console_unlock(); + } + lockdep_on(); +out_restore_irqs: + local_irq_restore(flags); return printed_len; } EXPORT_SYMBOL(vprintk_emit); _ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to