Re: [9fans] lock diagnostics on Fossil server

2010-10-26 Thread erik quanstrom
On Mon Oct 25 22:03:53 EDT 2010, cinap_len...@gmx.de wrote: hm... wouldnt it just crash if mh->mount is nil? perhaps you are reading the diff backwards? it used to crash when mh->mount was nil. leading to a lock loop. i added the test to see that mh->mount != nil after the rlock on mh->lock is

Re: [9fans] lock diagnostics on Fossil server

2010-10-26 Thread Lucio De Re
On Tue, Oct 26, 2010 at 08:44:37AM -0400, erik quanstrom wrote: On Mon Oct 25 22:03:53 EDT 2010, cinap_len...@gmx.de wrote: hm... wouldnt it just crash if mh->mount is nil? perhaps you are reading the diff backwards? it used to crash when mh->mount was nil. leading to a lock loop. i

Re: [9fans] lock diagnostics on Fossil server

2010-10-26 Thread Russ Cox
sounds familiar.  this patch needs to be applied to the kernel: Like Lucio and Cinap, I am skeptical that this is the fix. It's a real bug and a correct fix, as we've discussed before, but if the kernel loses this race I believe it will crash dereferencing nil. Lucio showed a kernel that was

Re: [9fans] lock diagnostics on Fossil server

2010-10-26 Thread erik quanstrom
I was hoping you'd follow up on that, I needed a seed message and my mailbox has recently overflowed :-( I'm curious what you call crash in this case and I think Cinap is too. Basically, exactly what happens in the situation when a nil pointer is dereferenced in the kernel? How does the

Re: [9fans] lock diagnostics on Fossil server

2010-10-26 Thread erik quanstrom
It's a real bug and a correct fix, as we've discussed before, but if the kernel loses this race I believe it will crash dereferencing nil. Lucio showed a kernel that was very much still running. you are correct. i was confused. the bug reported looks like a missing waserror(). - erik

Re: [9fans] lock diagnostics on Fossil server

2010-10-26 Thread Lucio De Re
On Tue, Oct 26, 2010 at 07:28:57AM -0700, Russ Cox wrote: Like Lucio and Cinap, I am skeptical that this is the fix. It's a real bug and a correct fix, as we've discussed before, but if the kernel loses this race I believe it will crash dereferencing nil. Lucio showed a kernel that was

Re: [9fans] lock diagnostics on Fossil server

2010-10-26 Thread erik quanstrom
I can re-create the problem if anybody wants me to help diagnose it. please do. - erik

Re: [9fans] lock diagnostics on Fossil server

2010-10-26 Thread lucio
I can re-create the problem if anybody wants me to help diagnose it. please do. Looks like I don't need to: I left the machines running last night and I note two more instances this morning, using the patched kernel. So the problem is much less common now, but still present. That is

Re: [9fans] lock diagnostics on Fossil server

2010-10-26 Thread erik quanstrom
Looks like I don't need to: I left the machines running last night and I note two more instances this morning, using the patched kernel. So the problem is much less common now, but still present. That is positively weird. I thought I posted a request for some help debugging the kernel

Re: [9fans] lock diagnostics on Fossil server

2010-10-25 Thread erik quanstrom
I had a couple of CPU sessions running on it. The most obvious trigger seems to have been exportfs, I eventually turned off the stats report sounds familiar. this patch needs to be applied to the kernel: /n/sources/plan9//sys/src/9/port/chan.c:1012,1018 - chan.c:1012,1020

Re: [9fans] lock diagnostics on Fossil server

2010-10-25 Thread Lucio De Re
On Mon, Oct 25, 2010 at 09:20:51AM -0400, erik quanstrom wrote: sounds familiar. this patch needs to be applied to the kernel: /n/sources/plan9//sys/src/9/port/chan.c:1012,1018 - chan.c:1012,1020 /* * mh->mount->to == c, so start

Re: [9fans] lock diagnostics on Fossil server

2010-10-25 Thread cinap_lenrek
hm... wouldnt it just crash if mh->mount is nil? -- cinap ---BeginMessage--- I had a couple of CPU sessions running on it. The most obvious trigger seems to have been exportfs, I eventually turned off the stats report sounds familiar. this patch needs to be applied to the kernel:

[9fans] lock diagnostics on Fossil server

2010-10-24 Thread Lucio De Re
I keep getting errors in this fashion (I'm afraid I have no serial console, so this is manually copied stuff): lock 0xf045c390 loop key 0xdeaddead pc 0xf017728a held by pc 0xf017728a proc 100 117: timesync pc f01ef88c dbgpc 916f Open (Running) ut 22 st 173 bss 1b000 qpc f01c733d nl 0 nd