Re: panic in RELENG_5 UMA - two new stack traces

2005-07-05 Thread Gary Mu1der
Gleb Smirnoff wrote: G> How often does it crash? Does debug.mpsafenet=0 increase stability? G> G> I can reproduce the crash within 60 seconds of firing off 30+ ping/arp G> -d scripts, all running in parallel. G> G> debug.mpsafenet=0 seems to have solved the problem. I'm running 100+ G> instances

Re: panic in RELENG_5 UMA - two new stack traces

2005-07-01 Thread Gleb Smirnoff
On Tue, Jun 28, 2005 at 11:24:47AM -0400, Gary Mu1der wrote: G> I spent the day yesterday trying to reproduce the crash that I posted G> last week and you kindly replied to. This is due to the fact that I G> stupidly managed to overwrite the kernel.debug that I used to generate G> the stack trace.

Re: panic in RELENG_5 UMA - two new stack traces

2005-07-01 Thread Gary Mu1der
Gleb Smirnoff wrote: On Tue, Jun 28, 2005 at 11:24:47AM -0400, Gary Mu1der wrote: G> I spent the day yesterday trying to reproduce the crash that I posted G> last week and you kindly replied to. This is due to the fact that I G> stupidly managed to overwrite the kernel.debug that I used to

Re: panic in RELENG_5 UMA - two new stack traces

2005-07-01 Thread Gleb Smirnoff
On Fri, Jul 01, 2005 at 01:54:59PM -0400, Gary Mu1der wrote: G> On Tue, Jun 28, 2005 at 11:24:47AM -0400, Gary Mu1der wrote: G> G> I spent the day yesterday trying to reproduce the crash that I posted G> G> last week and you kindly replied to. This is due to the fact that I G> G> stupidly managed to

Re: panic in RELENG_5 UMA - two new stack traces

2005-07-01 Thread Gary Mu1der
Gleb Smirnoff wrote: G> I can reproduce the crash within 60 seconds of firing off 30+ ping/arp G> -d scripts, all running in parallel. G> G> debug.mpsafenet=0 seems to have solved the problem. I'm running 100+ G> instances of the above script and the system has been stable for over an G> hour.

Re: panic in RELENG_5 UMA - two new stack traces

2005-07-01 Thread Gleb Smirnoff
On Fri, Jul 01, 2005 at 04:32:38PM -0400, Gary Mu1der wrote: G> G> I can reproduce the crash within 60 seconds of firing off 30+ ping/arp G> G> -d scripts, all running in parallel. G> G> G> G> debug.mpsafenet=0 seems to have solved the problem. I'm running 100+ G> G> instances of the above script and the

Re: panic in RELENG_5 UMA - two new stack traces

2005-06-28 Thread Gary Mu1der
Gleb, Thank you very much for your reply. I spent the day yesterday trying to reproduce the crash that I posted last week and you kindly replied to. This is due to the fact that I stupidly managed to overwrite the kernel.debug that I used to generate the stack trace. Sadly I could not cause

Re: panic in RELENG_5 UMA

2005-06-27 Thread Gleb Smirnoff
On Fri, Jun 24, 2005 at 03:28:34PM -0400, Gary Mu1der wrote: G> Can someone confirm that the following stack trace is showing the same G> problem, or not? G> I can reproduce the problem with the custom kernel config included below G> (which is basically GENERIC stripped of devices I don't have or

Re: panic in RELENG_5 UMA

2005-06-24 Thread Gary Mu1der
All, Can someone confirm that the following stack trace is showing the same problem, or not? I can reproduce the problem with the custom kernel config included below (which is basically GENERIC stripped of devices I don't have or need and IPFILTER added), but not with a stock GENERIC

Re: panic in RELENG_5 UMA

2005-06-24 Thread Gary Mu1der
Sorry, I forgot to add that this is a Tyan Thunder K8SPRO w/dual AMD Opteron Processors, model no. 246, 4GB of RAM and an Adaptec 2200S RAID controller. The NIC being used is the onboard Broadcom Gigabit Ethernet (bge). Thanks, Gary

Re: panic in RELENG_5 UMA

2005-06-23 Thread Gleb Smirnoff
On Wed, Jun 22, 2005 at 03:03:53PM +0200, Andre Oppermann wrote: A> Fixing this one is harder. We take la from unlocked rtentry obtained via A> rt_check(), or from arplookup(). The latter drops lock on rtentry, too. A> Then we do some work and use this la. It may have already been freed in A>

Re: panic in RELENG_5 UMA

2005-06-23 Thread Jeremie Le Hen
Gleb, What about fixing it step by step? The patch attached to my previous message fixes the panic reported by Jeremie, I suppose. It is a race between the output path and the input path that can occur at any time at runtime. FYI, I compiled my kernel with your patch and I have had no panic since then.

Re: panic in RELENG_5 UMA

2005-06-22 Thread Andre Oppermann
Gleb Smirnoff wrote: [ cc'ing parties involved in this part of code] On Tue, Jun 21, 2005 at 01:07:01PM +0400, Gleb Smirnoff wrote: T> On Tue, Jun 21, 2005 at 09:04:27AM +0200, Jeremie Le Hen wrote: T> J> #25 0xc05a0a0b in m_freem (mb=0x0) at uma.h:304 T> J> No locals. T> J> #26 0xc05ee0d5 in

Re: panic in RELENG_5 UMA

2005-06-22 Thread Gleb Smirnoff
On Wed, Jun 22, 2005 at 03:03:53PM +0200, Andre Oppermann wrote: A> Fixing this one is harder. We take la from unlocked rtentry obtained via A> rt_check(), or from arplookup(). The latter drops lock on rtentry, too. A> Then we do some work and use this la. It may have already been freed in A>

Re: panic in RELENG_5 UMA

2005-06-22 Thread Andre Oppermann
Gleb Smirnoff wrote: On Wed, Jun 22, 2005 at 03:03:53PM +0200, Andre Oppermann wrote: A> Fixing this one is harder. We take la from unlocked rtentry obtained via A> rt_check(), or from arplookup(). The latter drops lock on rtentry, too. A> Then we do some work and use this la. It may have

panic in RELENG_5 UMA

2005-06-21 Thread Jeremie Le Hen
Hi list, I caught a panic last night on my RELENG_5. The kernel was compiled on 2005/05/21. Please feel free to ask for further information (and include me explicitly in the recipients list since I'm not subscribed to this list). kgdb stacktrace: %%% #22 0xc0566d1d in panic (

Re: panic in RELENG_5 UMA

2005-06-21 Thread Jeremie Le Hen
Hi, I caught a panic last night on my RELENG_5. The kernel was compiled on 2005/05/21. Please feel free to ask for further information (and include me explicitly in the recipients list since I'm not subscribed to this list). kgdb stacktrace: %%% [snip] %%% I was a little bit

Re: panic in RELENG_5 UMA

2005-06-21 Thread Gleb Smirnoff
On Tue, Jun 21, 2005 at 09:04:27AM +0200, Jeremie Le Hen wrote: J> #25 0xc05a0a0b in m_freem (mb=0x0) at uma.h:304 J> No locals. J> #26 0xc05ee0d5 in arpresolve (ifp=0xc1a5b000, rt0=0xc1d44000, m=0xc1be7200, J> dst=0xd6d3fa94, desten=0xd6d3fa2c /??]??w??) J> at

Re: panic in RELENG_5 UMA

2005-06-21 Thread Jeremie Le Hen
Hi Gleb, IMHO, this looks like a race. The route is not locked when its llinfo is edited. Probably the mbuf was freed when the arp reply arrived and la_hold was sent. Look into in_arpinput() near 736: (*ifp->if_output)(ifp, la->la_hold, rt_key(rt), rt);
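The race suspected above, two paths touching the same held packet without a lock, can be shown with a small user-space sketch in C. This is not the kernel code and not the patch discussed in the thread. The names struct held_entry, held_entry_park() and held_entry_take() are hypothetical, and a pthread mutex stands in for whatever lock the kernel would use. The point is only that the held pointer is swapped or detached under the lock, so exactly one caller ends up owning each packet and may send or free it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an ARP entry that holds one pending packet. */
struct held_entry {
    pthread_mutex_t lock;   /* protects 'hold' */
    void *hold;             /* packet parked until the address resolves */
};

void held_entry_init(struct held_entry *e)
{
    pthread_mutex_init(&e->lock, NULL);
    e->hold = NULL;
}

/*
 * Output path: park a new packet. Returns the packet that was held
 * before (or NULL); the caller now owns it and is the only one
 * allowed to free it.
 */
void *held_entry_park(struct held_entry *e, void *pkt)
{
    pthread_mutex_lock(&e->lock);
    void *old = e->hold;
    e->hold = pkt;
    pthread_mutex_unlock(&e->lock);
    return old;
}

/*
 * Input path: detach the held packet for sending. The slot is cleared
 * under the same lock, so a concurrent park() can never hand the same
 * pointer to a second owner.
 */
void *held_entry_take(struct held_entry *e)
{
    pthread_mutex_lock(&e->lock);
    void *pkt = e->hold;
    e->hold = NULL;
    pthread_mutex_unlock(&e->lock);
    return pkt;
}
```

Without the lock, the output path could read the old pointer and free it while the input path passes the same pointer on for sending, which is the kind of double ownership the stack trace hints at.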

Re: panic in RELENG_5 UMA

2005-06-21 Thread Gleb Smirnoff
On Tue, Jun 21, 2005 at 11:28:36AM +0200, Jeremie Le Hen wrote: J> IMHO, this looks like a race. The route is not locked when J> its llinfo is edited. J> J> Probably the mbuf was freed when the arp reply arrived and la_hold was sent. J> Look into in_arpinput() near 736: J> J>

Re: panic in RELENG_5 UMA

2005-06-21 Thread Gleb Smirnoff
[ cc'ing parties involved in this part of code] On Tue, Jun 21, 2005 at 01:07:01PM +0400, Gleb Smirnoff wrote: T> On Tue, Jun 21, 2005 at 09:04:27AM +0200, Jeremie Le Hen wrote: T> J> #25 0xc05a0a0b in m_freem (mb=0x0) at uma.h:304 T> J> No locals. T> J> #26 0xc05ee0d5 in arpresolve (ifp=0xc1a5b000,