Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-12 Thread Peter Zijlstra
On Thu, Oct 12, 2017 at 05:28:13PM +0800, Fengguang Wu wrote:
> Please try this:
>
> rm openwrt-trinity-i386.cgz
> wget https://github.com/fengguang/reproduce-kernel-bug/raw/master/openwrt/openwrt-trinity-i386.cgz

Yep, that makes it go. Thanks!

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-12 Thread Fengguang Wu
[ 35.721719] Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.

Well, I got the same result. The script and initrd image match my local version. I'll dig into what goes wrong. And it did d

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-12 Thread Fengguang Wu
On Thu, Oct 12, 2017 at 10:47:25AM +0200, Peter Zijlstra wrote: On Tue, Oct 03, 2017 at 10:06:34PM +0800, Fengguang Wu wrote: #!/bin/bash kernel=$1 initrd=openwrt-trinity-i386.cgz wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd kvm=( qem

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-12 Thread Peter Zijlstra
On Tue, Oct 03, 2017 at 10:06:34PM +0800, Fengguang Wu wrote:
> #!/bin/bash
>
> kernel=$1
> initrd=openwrt-trinity-i386.cgz
>
> wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd
>
> kvm=(
> qemu-system-x86_64
> -enable-kvm
> -cpu

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-11 Thread Byungchul Park
On Wed, Oct 11, 2017 at 09:56:05AM +0900, Byungchul Park wrote:
> Thank you very much for explaining it in detail.
>
> But let's shift a viewpoint. Precisely, I didn't want to work on locks
> but *waiters* because dependencies causing deadlocks only can be created
> by waiters - nevertheless I hav

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-10 Thread Byungchul Park
On Tue, Oct 10, 2017 at 09:56:26AM -0700, Linus Torvalds wrote: > On Tue, Oct 10, 2017 at 9:22 AM, Linus Torvalds > wrote: > > > > I really would like to see the sites that do cross-thread lock/unlock > > pairs themselves be annotated. > > > > So when you lock in one thread, and then unlock in ano

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-10 Thread Byungchul Park
On Tue, Oct 10, 2017 at 08:14:09PM +0200, Peter Zijlstra wrote: > On Tue, Oct 10, 2017 at 09:56:26AM -0700, Linus Torvalds wrote: > > > So I think the best model would be something like this: > > > > - T1: > > mutex_lock(&lock) > > ... > > mutex_transfer(&lock) > > > >

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-10 Thread Byungchul Park
On Wed, Oct 11, 2017 at 09:56:05AM +0900, Byungchul Park wrote:
> Thank you very much for explaining it in detail.
>
> But let's shift a viewpoint. Precisely, I didn't want to work on locks
> but *waiters* because dependencies causing deadlocks only can be created
> by waiters - nevertheless I hav

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-10 Thread Byungchul Park
On Tue, Oct 10, 2017 at 09:22:26AM -0700, Linus Torvalds wrote: > On Mon, Oct 9, 2017 at 10:48 PM, Byungchul Park > wrote: > >> > >> The place where the release is done should simply be special. > >> > >> Because we should *not* encourage the whole "acquire by one context, > >> release by another

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-10 Thread Linus Torvalds
On Tue, Oct 10, 2017 at 11:14 AM, Peter Zijlstra wrote:
>
> Ah, but that's not at all what cross-release is about. Nobody really
> does wonky ownership transfer of mutexes like that (although there might
> be someone doing something with semaphores, I didn't check). It's to
> allow detecting this d

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-10 Thread Peter Zijlstra
On Tue, Oct 10, 2017 at 09:56:26AM -0700, Linus Torvalds wrote:
> So I think the best model would be something like this:
>
>  - T1:
>      mutex_lock(&lock)
>      ...
>      mutex_transfer(&lock)
>
>  - T2:
>      mutex_receive(&lock);
>      ...
>      mutex_unlock(&lock);
>
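The quoted T1/T2 model can be illustrated with a small userspace analogy. This is only a sketch in Python, not kernel code: the class `HandoffMutex` and its bookkeeping are hypothetical, and only the operation names (lock, transfer, receive, unlock) come from the quoted proposal. It assumes the point of the proposal is that a cross-thread release must be annotated at both ends, so an unannotated unlock from the wrong thread is reported.

```python
import threading


class HandoffMutex:
    """Hypothetical mutex whose cross-thread release must be annotated."""

    def __init__(self):
        self._lock = threading.Lock()  # may be released by any thread
        self._owner = None             # ident of the owning thread, or None
        self._in_transit = False       # True between transfer() and receive()

    def lock(self):
        self._lock.acquire()
        self._owner = threading.get_ident()

    def transfer(self):
        # Annotation at the giving site: the owner gives the lock away.
        self._check_owner("transfer")
        self._owner = None
        self._in_transit = True

    def receive(self):
        # Annotation at the taking site. The caller must already be ordered
        # after transfer() by some other means (an Event in demo() below).
        if not self._in_transit:
            raise RuntimeError("receive without a pending transfer")
        self._in_transit = False
        self._owner = threading.get_ident()

    def unlock(self):
        self._check_owner("unlock")
        self._owner = None
        self._lock.release()

    def _check_owner(self, what):
        if self._owner != threading.get_ident():
            raise RuntimeError(f"{what} by a thread that does not own the lock")


def demo():
    """Run the T1/T2 sequence and return the order of events."""
    m = HandoffMutex()
    handed_over = threading.Event()
    log = []

    def t1():
        m.lock()
        log.append("T1 locked")
        m.transfer()
        handed_over.set()

    def t2():
        handed_over.wait()
        m.receive()
        log.append("T2 received")
        m.unlock()
        log.append("T2 unlocked")

    threads = [threading.Thread(target=f) for f in (t1, t2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `demo()` walks through the annotated handoff; calling `unlock()` from a thread that never called `receive()` raises `RuntimeError`, which is the kind of misuse the annotation is meant to expose.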

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-10 Thread Linus Torvalds
On Tue, Oct 10, 2017 at 9:22 AM, Linus Torvalds wrote:
>
> I really would like to see the sites that do cross-thread lock/unlock
> pairs themselves be annotated.
>
> So when you lock in one thread, and then unlock in another, I'd
> actually prefer to see something like
>
>  - T1:
>      lock_mu

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-10 Thread Linus Torvalds
On Mon, Oct 9, 2017 at 10:48 PM, Byungchul Park wrote: >> >> The place where the release is done should simply be special. >> >> Because we should *not* encourage the whole "acquire by one context, >> release by another" as being something normal and "just set the flag >> to let lockdep know". > >

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Byungchul Park
On Wed, Oct 04, 2017 at 10:34:30AM +0200, Peter Zijlstra wrote: > Right, and print_circular_bug() uses @trace before it ever can be set, > although I suspect the intention is that that only ever gets called from > commit_xhlock() where we pass in an initialized @trace. A comment > would've been goo

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Byungchul Park
On Tue, Oct 03, 2017 at 09:57:02AM -0700, Linus Torvalds wrote: > On Tue, Oct 3, 2017 at 9:54 AM, Linus Torvalds > wrote: > > > > Can we consider just reverting the crossrelease thing? > > > > The apparent stack corruption really worries me [...] > > Side note: I also think the thing is just brok

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Byungchul Park
On Tue, Oct 03, 2017 at 07:18:24PM +0200, Ingo Molnar wrote: > > * Linus Torvalds wrote: > > > On Tue, Oct 3, 2017 at 7:06 AM, Fengguang Wu wrote: > > > > > > This patch triggers a NULL-dereference bug at update_stack_state(). > > > Although its parent commit also has a NULL-dereference bug, ho

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Byungchul Park
On Tue, Oct 03, 2017 at 09:54:31AM -0700, Linus Torvalds wrote: > On Tue, Oct 3, 2017 at 7:06 AM, Fengguang Wu wrote: > > > > This patch triggers a NULL-dereference bug at update_stack_state(). > > Although its parent commit also has a NULL-dereference bug, however > > the call stack looks rather

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Fengguang Wu
On Mon, Oct 09, 2017 at 05:44:46PM +0200, Peter Zijlstra wrote: On Mon, Oct 09, 2017 at 11:41:30PM +0800, Fengguang Wu wrote: > > [ 187.855027] init: plymouth-splash main process (418) terminated with status 1 > > [ 187.953296] init: networking main process (419) terminated with status 1 > >

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Peter Zijlstra
On Mon, Oct 09, 2017 at 11:41:30PM +0800, Fengguang Wu wrote: > > > [ 187.855027] init: plymouth-splash main process (418) terminated with > > > status 1 > > > [ 187.953296] init: networking main process (419) terminated with status > > > 1 > > > [ 191.697721] [ cut here ]-

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Fengguang Wu
[ 187.855027] init: plymouth-splash main process (418) terminated with status 1
[ 187.953296] init: networking main process (419) terminated with status 1
[ 191.697721] ------------[ cut here ]------------
[ 191.699318] WARNING: CPU: 0 PID: 424 at kernel/locking/lockdep.c:3928 check_flags+0x1

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Peter Zijlstra
On Mon, Oct 09, 2017 at 10:17:06PM +0800, Fengguang Wu wrote:
> It works! I tried 500 boots and only find 1 occurrence of this error,
> which looks irrelevant to the current issue.

OK, I'll go write a Changelog for the lockdep patch.

> > [ 187.855027] init: plymouth-splash main process (418) t

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Fengguang Wu
On Mon, Oct 09, 2017 at 08:26:05AM -0500, Josh Poimboeuf wrote: On Mon, Oct 09, 2017 at 08:55:04PM +0800, Fengguang Wu wrote: On Mon, Oct 09, 2017 at 08:21:13PM +0800, Fengguang Wu wrote: > On Mon, Oct 09, 2017 at 12:50:55PM +0200, Peter Zijlstra wrote: > > > Fengguang, if you're still listening

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Josh Poimboeuf
On Mon, Oct 09, 2017 at 08:55:04PM +0800, Fengguang Wu wrote: > On Mon, Oct 09, 2017 at 08:21:13PM +0800, Fengguang Wu wrote: > > On Mon, Oct 09, 2017 at 12:50:55PM +0200, Peter Zijlstra wrote: > > > > Fengguang, if you're still listening, could you please rerun the tests > > > > on top of ce07a941

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Josh Poimboeuf
On Mon, Oct 09, 2017 at 02:54:04PM +0200, Peter Zijlstra wrote: > On Mon, Oct 09, 2017 at 08:21:13PM +0800, Fengguang Wu wrote: > > > > From e7840ad76515f0b5061fcdd098b57b7c01b61482 Mon Sep 17 00:00:00 2001 > > > > Message-Id: > > > > > > > > From: Josh Poimboeuf > > > > Date: Thu, 5 Oct 2017 09

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Fengguang Wu
On Mon, Oct 09, 2017 at 02:54:04PM +0200, Peter Zijlstra wrote: On Mon, Oct 09, 2017 at 08:21:13PM +0800, Fengguang Wu wrote: > > From e7840ad76515f0b5061fcdd098b57b7c01b61482 Mon Sep 17 00:00:00 2001 > > Message-Id: > > From: Josh Poimboeuf > > Date: Thu, 5 Oct 2017 09:43:59 -0500 > > Subjec

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Fengguang Wu
On Mon, Oct 09, 2017 at 08:21:13PM +0800, Fengguang Wu wrote: On Mon, Oct 09, 2017 at 12:50:55PM +0200, Peter Zijlstra wrote: Fengguang, if you're still listening, could you please rerun the tests on top of ce07a9415f26, with the attached patches also applied? Ping!? it would be very good to g

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Peter Zijlstra
On Mon, Oct 09, 2017 at 08:21:13PM +0800, Fengguang Wu wrote: > > > From e7840ad76515f0b5061fcdd098b57b7c01b61482 Mon Sep 17 00:00:00 2001 > > > Message-Id: > > > > > > From: Josh Poimboeuf > > > Date: Thu, 5 Oct 2017 09:43:59 -0500 > > > Subject: [PATCH 1/2] unwinder fixes > > > > > > --- > >

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Fengguang Wu
On Mon, Oct 09, 2017 at 12:50:55PM +0200, Peter Zijlstra wrote: Fengguang, if you're still listening, could you please rerun the tests on top of ce07a9415f26, with the attached patches also applied? Ping!? it would be very good to get feedback on this asap. Sorry for the delay! From e7840ad

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-09 Thread Peter Zijlstra
> Fengguang, if you're still listening, could you please rerun the tests > on top of ce07a9415f26, with the attached patches also applied? Ping!? it would be very good to get feedback on this asap. > From e7840ad76515f0b5061fcdd098b57b7c01b61482 Mon Sep 17 00:00:00 2001 > Message-Id: > > From:

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-05 Thread Josh Poimboeuf
On Thu, Oct 05, 2017 at 08:01:46AM -0500, Josh Poimboeuf wrote: > On Tue, Oct 03, 2017 at 09:54:31AM -0700, Linus Torvalds wrote: > > On Tue, Oct 3, 2017 at 7:06 AM, Fengguang Wu wrote: > > > > > > This patch triggers a NULL-dereference bug at update_stack_state(). > > > Although its parent commit

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-05 Thread Josh Poimboeuf
On Thu, Oct 05, 2017 at 08:02:33PM +0900, Tetsuo Handa wrote: > Josh Poimboeuf wrote: > > On Wed, Oct 04, 2017 at 06:44:50AM +0900, Tetsuo Handa wrote: > > > Josh Poimboeuf wrote: > > > > On Tue, Oct 03, 2017 at 11:28:15AM -0500, Josh Poimboeuf wrote: > > > > > There are two bugs: > > > > > > > >

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-05 Thread Josh Poimboeuf
On Tue, Oct 03, 2017 at 09:54:31AM -0700, Linus Torvalds wrote: > On Tue, Oct 3, 2017 at 7:06 AM, Fengguang Wu wrote: > > > > This patch triggers a NULL-dereference bug at update_stack_state(). > > Although its parent commit also has a NULL-dereference bug, however > > the call stack looks rather

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-05 Thread Tetsuo Handa
Josh Poimboeuf wrote: > On Wed, Oct 04, 2017 at 06:44:50AM +0900, Tetsuo Handa wrote: > > Josh Poimboeuf wrote: > > > On Tue, Oct 03, 2017 at 11:28:15AM -0500, Josh Poimboeuf wrote: > > > > There are two bugs: > > > > > > > > 1) Somebody -- presumably lockdep -- is corrupting the stack. Need the

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-04 Thread Josh Poimboeuf
On Wed, Oct 04, 2017 at 06:44:50AM +0900, Tetsuo Handa wrote: > Josh Poimboeuf wrote: > > On Tue, Oct 03, 2017 at 11:28:15AM -0500, Josh Poimboeuf wrote: > > > There are two bugs: > > > > > > 1) Somebody -- presumably lockdep -- is corrupting the stack. Need the > > >lockdep people to look at

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-04 Thread Josh Poimboeuf
On Wed, Oct 04, 2017 at 02:30:42PM -0700, Linus Torvalds wrote: > On Wed, Oct 4, 2017 at 2:06 PM, Josh Poimboeuf wrote: > > > > I compiled the same kernel with a similar version of GCC. It turns out > > that GCC *does* create unaligned stacks with frame pointers enabled: > > Christ. What a piece

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-04 Thread Linus Torvalds
On Wed, Oct 4, 2017 at 2:06 PM, Josh Poimboeuf wrote:
>
> I compiled the same kernel with a similar version of GCC. It turns out
> that GCC *does* create unaligned stacks with frame pointers enabled:

Christ. What a piece of crap. It doesn't even seem to make any sense. Spill room for the "u16 i

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-04 Thread Josh Poimboeuf
On Wed, Oct 04, 2017 at 06:44:50AM +0900, Tetsuo Handa wrote: > Josh Poimboeuf wrote: > > On Tue, Oct 03, 2017 at 11:28:15AM -0500, Josh Poimboeuf wrote: > > > There are two bugs: > > > > > > 1) Somebody -- presumably lockdep -- is corrupting the stack. Need the > > >lockdep people to look at

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-04 Thread Josh Poimboeuf
On Wed, Oct 04, 2017 at 11:20:52AM +0200, Peter Zijlstra wrote: > On Tue, Oct 03, 2017 at 07:18:24PM +0200, Ingo Molnar wrote: > > Yes, I'll do that tomorrow. I was always a bit unhappy about cross-release, > > because it breaks the 'owner task owns the lock' model. > > Still, you can get real dea

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-04 Thread Ingo Molnar
* Peter Zijlstra wrote: > On Tue, Oct 03, 2017 at 07:18:24PM +0200, Ingo Molnar wrote: > > Yes, I'll do that tomorrow. I was always a bit unhappy about cross-release, > > because it breaks the 'owner task owns the lock' model. > > Still, you can get real deadlocks with completions... > > > Plu

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-04 Thread Peter Zijlstra
On Tue, Oct 03, 2017 at 07:18:24PM +0200, Ingo Molnar wrote:
> Yes, I'll do that tomorrow. I was always a bit unhappy about cross-release,
> because it breaks the 'owner task owns the lock' model.

Still, you can get real deadlocks with completions...

> Plus I don't think we found that many real b
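The remark about deadlocks with completions can be made concrete with a userspace analogy. The following is a minimal sketch in Python, not kernel code: `threading.Event` stands in for a kernel completion, and the function name and timeouts are assumptions made for illustration. One thread waits for the completion while holding a lock, and the thread that would signal the completion needs that same lock first. Timeouts are used so the sketch reports the stall instead of hanging.

```python
import threading


def completion_deadlock_demo(timeout=0.2):
    """Return True if the waiter timed out, i.e. the pattern would deadlock."""
    lock = threading.Lock()
    done = threading.Event()             # stands in for a kernel completion
    waiter_has_lock = threading.Event()  # orders the two threads for the demo
    result = {}

    def waiter():
        with lock:
            waiter_has_lock.set()
            # Waits for the completion while still holding the lock.
            result["completed"] = done.wait(timeout)

    def completer():
        waiter_has_lock.wait()
        # Needs the same lock before it can signal the completion, so it
        # cannot make progress until the waiter gives up.
        if lock.acquire(timeout=timeout * 2):
            try:
                done.set()
            finally:
                lock.release()

    t1 = threading.Thread(target=waiter)
    t2 = threading.Thread(target=completer)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return not result["completed"]
```

Without the timeouts both threads would block forever. No single thread acquires and releases the completion here, which is why a checker built on the "owner task owns the lock" model does not see the dependency.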

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-04 Thread Peter Zijlstra
On Tue, Oct 03, 2017 at 10:05:38AM -0500, Josh Poimboeuf wrote: > I don't know the lockdep code, but one more comment from the peanut > gallery. This code looks suspect to me: > > > /* >* Stop saving stack_trace if save_trace() was >

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-03 Thread Tetsuo Handa
Josh Poimboeuf wrote: > On Tue, Oct 03, 2017 at 11:28:15AM -0500, Josh Poimboeuf wrote: > > There are two bugs: > > > > 1) Somebody -- presumably lockdep -- is corrupting the stack. Need the > >lockdep people to look at that. > > > > 2) The 32-bit FP unwinder isn't handling the corrupt stack

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-03 Thread Josh Poimboeuf
On Tue, Oct 03, 2017 at 11:28:15AM -0500, Josh Poimboeuf wrote:
> There are two bugs:
>
> 1) Somebody -- presumably lockdep -- is corrupting the stack. Need the
>    lockdep people to look at that.
>
> 2) The 32-bit FP unwinder isn't handling the corrupt stack very well,
>    It's blindly derefe

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-03 Thread Ingo Molnar
* Linus Torvalds wrote: > On Tue, Oct 3, 2017 at 7:06 AM, Fengguang Wu wrote: > > > > This patch triggers a NULL-dereference bug at update_stack_state(). > > Although its parent commit also has a NULL-dereference bug, however > > the call stack looks rather different. Both dmesg files are attac

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-03 Thread Linus Torvalds
On Tue, Oct 3, 2017 at 9:54 AM, Linus Torvalds wrote:
>
> Can we consider just reverting the crossrelease thing?
>
> The apparent stack corruption really worries me [...]

Side note: I also think the thing is just broken. Any actual cross-releaser should be way more annotated than just "set cross

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-03 Thread Linus Torvalds
On Tue, Oct 3, 2017 at 7:06 AM, Fengguang Wu wrote: > > This patch triggers a NULL-dereference bug at update_stack_state(). > Although its parent commit also has a NULL-dereference bug, however > the call stack looks rather different. Both dmesg files are attached. > > It also triggers this warnin

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-03 Thread Josh Poimboeuf
On Tue, Oct 03, 2017 at 10:05:38AM -0500, Josh Poimboeuf wrote: > I don't know the lockdep code, but one more comment from the peanut > gallery. This code looks suspect to me: > > > /* >* Stop saving stack_trace if save_trace() was >

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-03 Thread Josh Poimboeuf
On Tue, Oct 03, 2017 at 09:41:36AM -0500, Josh Poimboeuf wrote: > On Tue, Oct 03, 2017 at 09:31:47AM -0500, Josh Poimboeuf wrote: > > On Tue, Oct 03, 2017 at 10:06:34PM +0800, Fengguang Wu wrote: > > > Hi Byungchul, > > > > > > This patch triggers a NULL-dereference bug at update_stack_state(). >

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-03 Thread Josh Poimboeuf
On Tue, Oct 03, 2017 at 09:31:47AM -0500, Josh Poimboeuf wrote: > On Tue, Oct 03, 2017 at 10:06:34PM +0800, Fengguang Wu wrote: > > Hi Byungchul, > > > > This patch triggers a NULL-dereference bug at update_stack_state(). > > Although its parent commit also has a NULL-dereference bug, however > >

Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2

2017-10-03 Thread Josh Poimboeuf
On Tue, Oct 03, 2017 at 10:06:34PM +0800, Fengguang Wu wrote: > Hi Byungchul, > > This patch triggers a NULL-dereference bug at update_stack_state(). > Although its parent commit also has a NULL-dereference bug, however > the call stack looks rather different. Both dmesg files are attached. > > I