Re: scheduler panic
On Sunday, December 25, 2011 10:48:32 am Larry Rosenman wrote:
> On Fri, 23 Dec 2011, Larry Rosenman wrote:
> > On 12/23/2011 8:54 AM, John Baldwin wrote:
> >> The sloppiest fix might be to do this:
> >>
> >> Index: sched_ule.c
> >> ===
> >> --- sched_ule.c (revision 228777)
> >> +++ sched_ule.c (working copy)
> >> @@ -1434,7 +1434,8 @@ sched_priority(struct thread *td)
> >>          } else {
> >>                  pri = SCHED_PRI_MIN;
> >>                  if (td->td_sched->ts_ticks)
> >> -                        pri += SCHED_PRI_TICKS(td->td_sched);
> >> +                        pri += min(SCHED_PRI_TICKS(td->td_sched),
> >> +                            SCHED_PRI_RANGE);
> >>                  pri += SCHED_PRI_NICE(td->td_proc->p_nice);
> >>                  KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
> >>                      ("sched_priority: invalid priority %d: nice %d, "
> >
> > I've applied this to both the host and the guest, and am recompiling
> > the guest kernel (hopefully it'll stay up long enough...).
> >
> > I'll report back.
> >
> > Do y'all (FreeBSD Devs) want a PR?
>
> I've run 2 complete buildworld/buildkernel cycles with the patch applied
> in the guest, and it's made it all the way through. It wouldn't
> do that without it.
>
> Can we get this (or something else like it) applied?
>
> Do I need to file a PR?

I've committed this. I do think the root problem is an issue with the clock
interrupts, but this seems to be a common enough problem I think a workaround
is warranted.

--
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: scheduler panic
On Fri, 23 Dec 2011, Larry Rosenman wrote:
> On 12/23/2011 8:54 AM, John Baldwin wrote:
>> The sloppiest fix might be to do this:
>>
>> Index: sched_ule.c
>> ===
>> --- sched_ule.c (revision 228777)
>> +++ sched_ule.c (working copy)
>> @@ -1434,7 +1434,8 @@ sched_priority(struct thread *td)
>>          } else {
>>                  pri = SCHED_PRI_MIN;
>>                  if (td->td_sched->ts_ticks)
>> -                        pri += SCHED_PRI_TICKS(td->td_sched);
>> +                        pri += min(SCHED_PRI_TICKS(td->td_sched),
>> +                            SCHED_PRI_RANGE);
>>                  pri += SCHED_PRI_NICE(td->td_proc->p_nice);
>>                  KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
>>                      ("sched_priority: invalid priority %d: nice %d, "
>
> I've applied this to both the host and the guest, and am recompiling
> the guest kernel (hopefully it'll stay up long enough...).
>
> I'll report back.
>
> Do y'all (FreeBSD Devs) want a PR?

I've run 2 complete buildworld/buildkernel cycles with the patch applied
in the guest, and it's made it all the way through. It wouldn't do that
without it.

Can we get this (or something else like it) applied?

Do I need to file a PR?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Re: scheduler panic
On 12/23/2011 8:54 AM, John Baldwin wrote:
> The sloppiest fix might be to do this:
>
> Index: sched_ule.c
> ===
> --- sched_ule.c (revision 228777)
> +++ sched_ule.c (working copy)
> @@ -1434,7 +1434,8 @@ sched_priority(struct thread *td)
>          } else {
>                  pri = SCHED_PRI_MIN;
>                  if (td->td_sched->ts_ticks)
> -                        pri += SCHED_PRI_TICKS(td->td_sched);
> +                        pri += min(SCHED_PRI_TICKS(td->td_sched),
> +                            SCHED_PRI_RANGE);
>                  pri += SCHED_PRI_NICE(td->td_proc->p_nice);
>                  KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
>                      ("sched_priority: invalid priority %d: nice %d, "

I've applied this to both the host and the guest, and am recompiling
the guest kernel (hopefully it'll stay up long enough...).

I'll report back.

Do y'all (FreeBSD Devs) want a PR?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Re: scheduler panic
On Friday, December 23, 2011 8:21:41 am Larry Rosenman wrote:
> I've been getting these in a VirtualBox VM. I'm not sure what to do.
>
> I CAN give VNC access to this VM in this state.
>
> panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
> ftick 1213618 itick 1214628 tick pri 159

In the past this happened because the 'ticks' value was bananas. Priority
values should only be from 0 to 255, so 331 is definitely too large.

The priority is computed like so:

        pri = SCHED_PRI_MIN;
        if (td->td_sched->ts_ticks)
                pri += SCHED_PRI_TICKS(td->td_sched);
        pri += SCHED_PRI_NICE(td->td_proc->p_nice);
        KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
            ("sched_priority: invalid priority %d: nice %d, "
            "ticks %d ftick %d ltick %d tick pri %d",
            pri, td->td_proc->p_nice, td->td_sched->ts_ticks,
            td->td_sched->ts_ftick, td->td_sched->ts_ltick,
            SCHED_PRI_TICKS(td->td_sched)));

Note that you have:

kern/sched_ule.c:

#define PRI_TIMESHARE_RANGE (PRI_MAX_TIMESHARE - PRI_MIN_TIMESHARE + 1)
#define PRI_INTERACT_RANGE  ((PRI_TIMESHARE_RANGE - SCHED_PRI_NRESV) / 2)
#define PRI_BATCH_RANGE     (PRI_TIMESHARE_RANGE - PRI_INTERACT_RANGE)

#define PRI_MIN_INTERACT    PRI_MIN_TIMESHARE
#define PRI_MAX_INTERACT    (PRI_MIN_TIMESHARE + PRI_INTERACT_RANGE - 1)
#define PRI_MIN_BATCH       (PRI_MIN_TIMESHARE + PRI_INTERACT_RANGE)
#define PRI_MAX_BATCH       PRI_MAX_TIMESHARE

#define SCHED_PRI_NRESV     (PRIO_MAX - PRIO_MIN)

sys/resource.h:

#define PRIO_MIN            -20
#define PRIO_MAX            20

sys/priority.h:

#define PRI_MIN_TIMESHARE   (120)
#define PRI_MAX_TIMESHARE   (PRI_MIN_IDLE - 1)
#define PRI_MIN_IDLE        (224)

So PRI_MAX_BATCH is 223. PRI_MIN_BATCH is
120 + (((223 - 120 + 1) - (20 - -20)) / 2), which is 152.

So given SCHED_PRI_TICKS() of 159, you end up with 152 + 159 = 311, and since
your nice is 0, SCHED_PRI_NICE() ends up being 20, hence 331.
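
The arithmetic above can be checked mechanically. What follows is only a
minimal sketch in C built from the constants quoted in this message; the two
helper functions (pri_in_batch_range and pri_as_described) are hypothetical
names made up for illustration and do not exist in sched_ule.c. The second
helper simply adds up the three terms the way the text breaks them down
(152 + 159 + 20) without checking that breakdown against the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Constants as quoted in the message (sys/resource.h, sys/priority.h). */
#define PRIO_MIN            (-20)
#define PRIO_MAX            20
#define PRI_MIN_TIMESHARE   120
#define PRI_MIN_IDLE        224
#define PRI_MAX_TIMESHARE   (PRI_MIN_IDLE - 1)

/* Derived ranges as quoted from kern/sched_ule.c. */
#define SCHED_PRI_NRESV     (PRIO_MAX - PRIO_MIN)
#define PRI_TIMESHARE_RANGE (PRI_MAX_TIMESHARE - PRI_MIN_TIMESHARE + 1)
#define PRI_INTERACT_RANGE  ((PRI_TIMESHARE_RANGE - SCHED_PRI_NRESV) / 2)
#define PRI_MIN_BATCH       (PRI_MIN_TIMESHARE + PRI_INTERACT_RANGE)
#define PRI_MAX_BATCH       PRI_MAX_TIMESHARE

/* Hypothetical helper: the same range test the KASSERT performs. */
static bool
pri_in_batch_range(int pri)
{
	return (pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH);
}

/*
 * Hypothetical helper: the sum as the message describes it, i.e.
 * PRI_MIN_BATCH plus the tick priority plus one further term.
 */
static int
pri_as_described(int tick_pri, int extra)
{
	return (PRI_MIN_BATCH + tick_pri + extra);
}
```

With the numbers from the panic, pri_as_described(159, 20) comes to 331, and
pri_in_batch_range(331) is false because the batch range is 152 to 223. That
is the condition the KASSERT reports.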

It seems the largest value SCHED_PRI_TICKS() should ever generate is
(PRI_BATCH_RANGE - SCHED_PRI_NRESV), though ULE doesn't quite compute it that
way (it might be off by one):

#define SCHED_PRI_NRESV  (PRIO_MAX - PRIO_MIN)
#define SCHED_PRI_NHALF  (SCHED_PRI_NRESV / 2)
#define SCHED_PRI_MIN    (PRI_MIN_BATCH + SCHED_PRI_NHALF)
#define SCHED_PRI_MAX    (PRI_MAX_BATCH - SCHED_PRI_NHALF)
#define SCHED_PRI_RANGE  (SCHED_PRI_MAX - SCHED_PRI_MIN + 1)

However, it's not clear that SCHED_PRI_TICKS() will cap its value to
SCHED_PRI_RANGE:

#define SCHED_PRI_TICKS(ts) \
    (SCHED_TICK_HZ((ts)) / \
    (roundup(SCHED_TICK_TOTAL((ts)), SCHED_PRI_RANGE) / SCHED_PRI_RANGE))

The sloppiest fix might be to do this:

Index: sched_ule.c
===
--- sched_ule.c (revision 228777)
+++ sched_ule.c (working copy)
@@ -1434,7 +1434,8 @@ sched_priority(struct thread *td)
         } else {
                 pri = SCHED_PRI_MIN;
                 if (td->td_sched->ts_ticks)
-                        pri += SCHED_PRI_TICKS(td->td_sched);
+                        pri += min(SCHED_PRI_TICKS(td->td_sched),
+                            SCHED_PRI_RANGE);
                 pri += SCHED_PRI_NICE(td->td_proc->p_nice);
                 KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
                     ("sched_priority: invalid priority %d: nice %d, "

--
John Baldwin
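
To make the capping question concrete, here is a rough sketch in C of the
same division that SCHED_PRI_TICKS() performs, with and without the min()
from the patch. It is an illustration under assumptions, not the kernel code:
SCHED_TICK_HZ() and SCHED_TICK_TOTAL() are not shown in this thread, so their
results are passed in as plain integers, and pri_ticks(), pri_ticks_clamped()
and roundup_int() are hypothetical names. The inputs 5088 and 1000 are
invented to reproduce a tick priority of 159; they are not taken from the
panic.

```c
#include <assert.h>

/* Values worked out in the message above. */
#define PRI_MIN_BATCH   152
#define PRI_MAX_BATCH   223
#define SCHED_PRI_NRESV 40

/* Definitions as quoted from kern/sched_ule.c. */
#define SCHED_PRI_NHALF (SCHED_PRI_NRESV / 2)
#define SCHED_PRI_MIN   (PRI_MIN_BATCH + SCHED_PRI_NHALF)
#define SCHED_PRI_MAX   (PRI_MAX_BATCH - SCHED_PRI_NHALF)
#define SCHED_PRI_RANGE (SCHED_PRI_MAX - SCHED_PRI_MIN + 1)

/* Round x up to the next multiple of y (y > 0). */
static int
roundup_int(int x, int y)
{
	return (((x + y - 1) / y) * y);
}

/*
 * Same shape as SCHED_PRI_TICKS(), but taking the two tick quantities
 * as arguments. tick_total must be greater than zero.
 */
static int
pri_ticks(int tick_hz, int tick_total)
{
	return (tick_hz /
	    (roundup_int(tick_total, SCHED_PRI_RANGE) / SCHED_PRI_RANGE));
}

/* The patched form: never return more than SCHED_PRI_RANGE. */
static int
pri_ticks_clamped(int tick_hz, int tick_total)
{
	int t = pri_ticks(tick_hz, tick_total);

	return (t < SCHED_PRI_RANGE ? t : SCHED_PRI_RANGE);
}
```

As long as tick_hz does not exceed tick_total the quotient stays within
SCHED_PRI_RANGE (32 here), for example pri_ticks(1000, 1000) is 31. Once the
two get out of step, as in pri_ticks(5088, 1000), the result is 159 and
nothing in the macro stops it. The clamped version returns 32 instead, which
puts SCHED_PRI_MIN + 32 at 204, one above SCHED_PRI_MAX; that lines up with
the "might be off by one" remark.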
scheduler panic
I've been getting these in a VirtualBox VM. I'm not sure what to do.

I CAN give VNC access to this VM in this state.

panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
ftick 1213618 itick 1214628 tick pri 159
cpuid = 0
KDB: enter: panic

Ideas?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Re: scheduler panic
On Fri, 23 Dec 2011, Ivan Klymenko wrote:
> On Fri, 23 Dec 2011 07:38:21 -0600, Larry Rosenman wrote:
>> BORG-DTRACE
>
> Show, please, the kernel config BORG-DTRACE

include GENERIC
ident BORG-DTRACE
options KDTRACE_HOOKS    # all architectures - enable general DTrace hooks
options DDB_CTF          # all architectures - kernel ELF linker loads CTF data
options KDTRACE_FRAME    # amd64 - ensure frames are compiled in
#makeoptions DEBUG="-g"  # amd64? - build kernel with gdb(1) debug symbols
makeoptions WITH_CTF=1
#options COMPAT_FREEBSD8
nooptions WITNESS
nodevice mvs
nodevice siis
nodevice ahc
nodevice ahd
nodevice amd
nodevice hptiop
nodevice isp
nodevice mpt
nodevice mps
nodevice sym
nodevice trm
nodevice adv
nodevice adw
nodevice aic
nodevice bt
nodevice amr
nodevice arcmsr
nodevice asr
nodevice ciss
nodevice dpt
nodevice hptmv
nodevice hptrr
nodevice iir
nodevice ips
nodevice mly
nodevice twa
nodevice aac
nodevice aacp
nodevice ida
nodevice mfi
nodevice mlx
nodevice twe
nodevice tws
nodevice cbb
nodevice pccard
nodevice cardbus
nodevice plip
nodevice puc
nodevice bxe
nodevice de
nodevice igb
nodevice ixgbe
nodevice le
nodevice ti
nodevice txp
nodevice vx
nodevice ae
nodevice age
nodevice alc
nodevice ale
nodevice bce
nodevice bfe
nodevice bge
nodevice dc
nodevice et
nodevice fxp
nodevice jme
nodevice lge
nodevice msk
nodevice mge
nodevice pcn
nodevice re
nodevice rl
nodevice sf
nodevice sge
nodevice sis
nodevice sk
nodevice ste
nodevice stge
nodevice tl
nodevice tx
nodevice vge
nodevice vr
nodevice wb
nodevice xl
nodevice cs
nodevice ed
nodevice ex
nodevice ep
nodevice fe
nodevice sn
nodevice xe
nodevice wlan
nodevice wlan_wep
nodevice wlan_ccmp
nodevice wlan_tkip
nodevice wlan_amrr
nodevice an
nodevice ath
nodevice ath_pci
nodevice ath_hal
nodevice ath_rate_sample
nodevice ipw
nodevice iwi
nodevice iwn
nodevice malo
nodevice mwl
nodevice ral
nodevice wi
nodevice wpi
nodevice urio        # Diamond Rio 500 MP3 player
# USB Serial devices
nodevice u3g         # USB-based 3G modems (Option, Huawei, Sierra)
nodevice uark        # Technologies ARK3116 based serial adapters
nodevice ubsa        # Belkin F5U103 and compatible serial adapters
nodevice uftdi       # For FTDI usb serial adapters
nodevice uipaq       # Some WinCE based devices
nodevice uplcom      # Prolific PL-2303 serial adapters
nodevice uslcom      # SI Labs CP2101/CP2102 serial adapters
nodevice uvisor      # Visor and Palm devices
nodevice uvscom      # USB serial support for DDI pocket's PHS
# USB Ethernet, requires miibus
nodevice aue         # ADMtek USB Ethernet
nodevice axe         # ASIX Electronics USB Ethernet
nodevice cdce        # Generic USB over Ethernet
nodevice cue         # CATC USB Ethernet
nodevice kue         # Kawasaki LSI USB Ethernet
nodevice rue         # RealTek RTL8150 USB Ethernet
nodevice udav        # Davicom DM9601E USB
# USB Wireless
nodevice rum         # Ralink Technology RT2501USB wireless NICs
nodevice run         # Ralink Technology RT2700/RT2800/RT3000 NICs.
nodevice uath        # Atheros AR5523 wireless NICs
nodevice upgt        # Conexant/Intersil PrismGT wireless NICs.
nodevice ural        # Ralink Technology RT2500USB wireless NICs
nodevice urtw        # Realtek RTL8187B/L wireless NICs
nodevice zyd         # ZyDAS zd1211/zd1211b wireless NICs
# FireWire support
nodevice firewire    # FireWire bus code
nodevice sbp         # SCSI over FireWire (Requires scbus and da)
nodevice fwe         # Ethernet over FireWire (non-standard!)
nodevice fwip        # IP over FireWire (RFC 2734,3146)
nodevice dcons       # Dumb console driver
nodevice dcons_crom  # Configuration ROM for dcons
# Sound support
nodevice sound       # Generic sound driver (required)
nodevice snd_es137x  # Ensoniq AudioPCI ES137x
nodevice snd_hda     # Intel High Definition Audio
nodevice snd_ich     # Intel, NVidia and other ICH AC'97 Audio
nodevice snd_uaudio  # USB Audio
nodevice snd_via8233 # VIA VT8233x Audio
device netmap
options FFCLOCK

I've also seen it with GENERIC, FWIW.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Re: scheduler panic
On Fri, 23 Dec 2011 07:38:21 -0600, Larry Rosenman wrote:
> BORG-DTRACE

Show, please, the kernel config BORG-DTRACE
Re: scheduler panic
On 12/23/2011 7:31 AM, Ivan Klymenko wrote:
> On Fri, 23 Dec 2011 07:21:41 -0600, Larry Rosenman wrote:
>
>> I've been getting these in a VirtualBox VM. I'm not sure what to
>> do.
>>
>> I CAN give VNC access to this VM in this state.
>>
>> panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
>> ftick 1213618 itick 1214628 tick pri 159
>> cpuid = 0
>> KDB: enter: panic
>>
>> Ideas? (repost without the screenshot).
>
> uname -a ???

Oops. It's running the same kernel as its host:

$ uname -a
FreeBSD borg.lerctr.org 10.0-CURRENT FreeBSD 10.0-CURRENT #31 r228802: Thu Dec 22 11:21:25 CST 2011 r...@borg.lerctr.org:/usr/obj/usr/src/sys/BORG-DTRACE amd64
$

I can give SSH access to the host as well.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Re: scheduler panic
On Fri, 23 Dec 2011 07:21:41 -0600, Larry Rosenman wrote:
> I've been getting these in a VirtualBox VM. I'm not sure what to do.
>
> I CAN give VNC access to this VM in this state.
>
> panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
> ftick 1213618 itick 1214628 tick pri 159
> cpuid = 0
> KDB: enter: panic
>
> Ideas?
> (repost without the screenshot).

uname -a ???
scheduler panic
I've been getting these in a VirtualBox VM. I'm not sure what to do.

I CAN give VNC access to this VM in this state.

panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
ftick 1213618 itick 1214628 tick pri 159
cpuid = 0
KDB: enter: panic

Ideas?
(repost without the screenshot).

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
Re: Scheduler panic
replying to myself again

> This is the best workaround I can think of:
>
> Index: kern/kern_intr.c
> ===
> RCS file: /home/ncvs/src/sys/kern/kern_intr.c,v
> retrieving revision 1.47
> diff -u -r1.47 kern_intr.c
> --- kern/kern_intr.c    2001/02/28 02:53:43     1.47
> +++ kern/kern_intr.c    2001/03/02 02:28:08
> @@ -366,7 +366,7 @@
>           */
>          ithread->it_need = 1;
>          mtx_lock_spin(&sched_lock);
> -        if (p->p_stat == SWAIT) {
> +        if (p->p_stat == SWAIT && curproc->p_stat == SRUN) {
>                  CTR1(KTR_INTR, __func__ ": setrunqueue %d", p->p_pid);
>                  p->p_stat = SRUN;
>                  setrunqueue(p);

Heh. Sorry, this is wrong: the test for SRUN should be in the same if
statement as the do_switch, one further in. This will completely miss
interrupts if the race is ever hit...

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
Re: Scheduler panic
On 02-Mar-01 Jake Burkholder wrote:
>> > On Sun, Feb 25, 2001 at 10:29:42PM -0800, Kris Kennaway wrote:
>> > > This is on a UP system.
>> >
>> > Had another one of these, under the same conditions. Both times I was
>> > running more(1) on a stdin stream which was generated by a "find |
>> > grep | more" operation, and I suspended the process with ^Z,
>> > triggering the panic. Perhaps this will help in tracking down the
>> > root cause.
>>
>> I'm pretty sure I know what this is; I'll work up a patch tonight.
>>
>
> Sorry this is taking so long. It's turned out to be a little more
> complex to fix properly than I originally thought. We're going to
> have to change the way one of the fields of struct proc (p_pptr)
> is locked. The problem is that a process is getting preempted
> when it's not SRUN, which should be protected by the scheduler
> lock so that the preemption can't occur.
>
> This is the best workaround I can think of:
>
> Index: kern/kern_intr.c
> ===
> RCS file: /home/ncvs/src/sys/kern/kern_intr.c,v
> retrieving revision 1.47
> diff -u -r1.47 kern_intr.c
> --- kern/kern_intr.c    2001/02/28 02:53:43     1.47
> +++ kern/kern_intr.c    2001/03/02 02:28:08
> @@ -366,7 +366,7 @@
>           */
>          ithread->it_need = 1;
>          mtx_lock_spin(&sched_lock);
> -        if (p->p_stat == SWAIT) {
> +        if (p->p_stat == SWAIT && curproc->p_stat == SRUN) {
>                  CTR1(KTR_INTR, __func__ ": setrunqueue %d", p->p_pid);
>                  p->p_stat = SRUN;
>                  setrunqueue(p);
>
> Jake

Eek, this is wrong. We need to always put it on the runqueue; the trick is
that we just need to avoid the actual task switch. This is what I have here:

@@ -369,7 +374,7 @@
                 CTR1(KTR_INTR, __func__ ": setrunqueue %d", p->p_pid);
                 p->p_stat = SRUN;
                 setrunqueue(p);
-                if (do_switch) {
+                if (do_switch && curproc->p_stat == SRUN) {
                         saveintr = sched_lock.mtx_saveintr;
                         mtx_intr_enable(&sched_lock);
                         if (curproc != PCPU_GET(idleproc))

(Among other fixes.) I'll try and get this committed tonight if no one screams
bloody murder.

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
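
The difference between the two patches is easy to lose in the diffs, so here
is a toy model in C of just the control flow. It is a sketch under heavy
assumptions and not the kernel code: struct proc is cut down to two fields,
the run queue is a boolean flag, the context switch is only recorded, and
SOTHER, struct outcome, schedule_workaround() and schedule_fixed() are
hypothetical names standing in for "any state that is not SRUN" and the two
variants of the test.

```c
#include <assert.h>
#include <stdbool.h>

/* SOTHER is a made-up stand-in for any state other than SWAIT or SRUN. */
enum pstat { SWAIT, SRUN, SOTHER };

/* Drastically reduced stand-in for struct proc. */
struct proc {
	enum pstat p_stat;
	bool on_runq;
};

/* What a call did: queued the interrupt thread, and/or switched to it. */
struct outcome {
	bool queued;
	bool switched;
};

/* First workaround: test curproc's state before queueing at all. */
static struct outcome
schedule_workaround(struct proc *ithd, const struct proc *cur, bool do_switch)
{
	struct outcome o = { false, false };

	if (ithd->p_stat == SWAIT && cur->p_stat == SRUN) {
		ithd->p_stat = SRUN;
		ithd->on_runq = true;	/* stands in for setrunqueue() */
		o.queued = true;
		o.switched = do_switch;
	}
	return (o);
}

/* Second version: always queue, only skip the switch itself. */
static struct outcome
schedule_fixed(struct proc *ithd, const struct proc *cur, bool do_switch)
{
	struct outcome o = { false, false };

	if (ithd->p_stat == SWAIT) {
		ithd->p_stat = SRUN;
		ithd->on_runq = true;	/* stands in for setrunqueue() */
		o.queued = true;
		if (do_switch && cur->p_stat == SRUN)
			o.switched = true;
	}
	return (o);
}
```

When the current process is in SOTHER, schedule_workaround() leaves the
interrupt thread in SWAIT and off the run queue, which is the missed
interrupt described above. schedule_fixed() still queues the thread and only
declines to switch to it immediately.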
Re: Scheduler panic
> > On Sun, Feb 25, 2001 at 10:29:42PM -0800, Kris Kennaway wrote:
> > > This is on a UP system.
> >
> > Had another one of these, under the same conditions. Both times I was
> > running more(1) on a stdin stream which was generated by a "find |
> > grep | more" operation, and I suspended the process with ^Z,
> > triggering the panic. Perhaps this will help in tracking down the
> > root cause.
>
> I'm pretty sure I know what this is; I'll work up a patch tonight.
>

Sorry this is taking so long. It's turned out to be a little more
complex to fix properly than I originally thought. We're going to
have to change the way one of the fields of struct proc (p_pptr)
is locked. The problem is that a process is getting preempted
when it's not SRUN, which should be protected by the scheduler
lock so that the preemption can't occur.

This is the best workaround I can think of:

Index: kern/kern_intr.c
===
RCS file: /home/ncvs/src/sys/kern/kern_intr.c,v
retrieving revision 1.47
diff -u -r1.47 kern_intr.c
--- kern/kern_intr.c    2001/02/28 02:53:43     1.47
+++ kern/kern_intr.c    2001/03/02 02:28:08
@@ -366,7 +366,7 @@
          */
         ithread->it_need = 1;
         mtx_lock_spin(&sched_lock);
-        if (p->p_stat == SWAIT) {
+        if (p->p_stat == SWAIT && curproc->p_stat == SRUN) {
                 CTR1(KTR_INTR, __func__ ": setrunqueue %d", p->p_pid);
                 p->p_stat = SRUN;
                 setrunqueue(p);

Jake
Re: Scheduler panic
> On Sun, Feb 25, 2001 at 10:29:42PM -0800, Kris Kennaway wrote:
> > This is on a UP system.
>
> Had another one of these, under the same conditions. Both times I was
> running more(1) on a stdin stream which was generated by a "find |
> grep | more" operation, and I suspended the process with ^Z,
> triggering the panic. Perhaps this will help in tracking down the
> root cause.

I'm pretty sure I know what this is; I'll work up a patch tonight.
Re: Scheduler panic
On Sun, Feb 25, 2001 at 10:29:42PM -0800, Kris Kennaway wrote:
> This is on a UP system.

Had another one of these, under the same conditions. Both times I was
running more(1) on a stdin stream which was generated by a "find |
grep | more" operation, and I suspended the process with ^Z,
triggering the panic. Perhaps this will help in tracking down the
root cause.

Kris
Scheduler panic
This is on a UP system.

Kris

GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 4767744
initial pcb at 3c9740
panicstr: from debugger
panic messages:
---
panic: runq_add: proc 0xca466420 (more) not SRUN
panic: from debugger
Uptime: 3h39m42s

dumping to dev #da/0x20001, offset 262144
dump 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111
110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89
88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63
62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37
36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11
10 9 8 7 6 5 4 3 2 1
---
#0  dumpsys () at ../../kern/kern_shutdown.c:476
476             if (dumping++) {
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:476
#1  0xc01b0c3c in boot (howto=260) at ../../kern/kern_shutdown.c:319
#2  0xc01b1011 in panic (fmt=0xc032a794 "from debugger")
    at ../../kern/kern_shutdown.c:569
#3  0xc013c7dd in db_panic (addr=-1070605187, have_addr=0, count=-1,
    modif=0xca4dbca0 "") at ../../ddb/db_command.c:433
#4  0xc013c77b in db_command (last_cmdp=0xc0371cf4, cmd_table=0xc0371b54,
    aux_cmd_tablep=0xc03b6b60) at ../../ddb/db_command.c:333
#5  0xc013c842 in db_command_loop () at ../../ddb/db_command.c:455
#6  0xc013eaaf in db_trap (type=3, code=0) at ../../ddb/db_trap.c:71
#7  0xc02fda0c in kdb_trap (type=3, code=0, regs=0xca4dbda0)
    at ../../i386/i386/db_interface.c:164
#8  0xc030a418 in trap (frame={tf_fs = -901382120, tf_es = 16,
      tf_ds = -1051394032, tf_edi = -1069770656, tf_esi = 256,
      tf_ebp = -900874772, tf_isp = -900874804, tf_ebx = 2,
      tf_edx = -1070199057, tf_ecx = 32, tf_eax = 18, tf_trapno = 3,
      tf_err = 0, tf_eip = -1070605187, tf_cs = 8, tf_eflags = 86,
      tf_esp = -1070199073, tf_ss = -1070347485})
    at ../../i386/i386/trap.c:614
#9  0xc02fdc7d in Debugger (msg=0xc033cb23 "panic") at machine/cpufunc.h:60
#10 0xc01b1008 in panic (fmt=0xc033cf60 "runq_add: proc %p (%s) not SRUN")
    at ../../kern/kern_shutdown.c:567
#11 0xc01b483c in runq_add (rq=0xc03c9860, p=0xca466420)
    at ../../kern/kern_switch.c:142
#12 0xc01b47f5 in setrunqueue (p=0xca466420) at ../../kern/kern_switch.c:70
#13 0xc01a5750 in ithread_schedule (ithread=0xc1349100, do_switch=1)
    at ../../kern/kern_intr.c:376
#14 0xc030ed8d in sched_ithd (cookie=0x5) at ../../i386/isa/ithread.c:99
#15 0x8 in ?? ()
#16 0xc01b3329 in issignal (p=0xca466420) at ../../kern/kern_sig.c:1410
#17 0xc01b116a in CURSIG (p=0xca466420) at ../../kern/kern_sig.c:190
#18 0xc030981e in userret (p=0xca466420, frame=0xca4dbfa8, oticks=2)
    at ../../i386/i386/trap.c:179
#19 0xc030b3d3 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
      tf_edi = 2, tf_esi = 12, tf_ebp = -1077938440, tf_isp = -900874284,
      tf_ebx = -2, tf_edx = 22195, tf_ecx = 17, tf_eax = 0, tf_trapno = 22,
      tf_err = 2, tf_eip = 672387100, tf_cs = 31, tf_eflags = 646,
      tf_esp = -1077938484, tf_ss = 47}) at ../../i386/i386/trap.c:1239
#20 0xc02fe393 in Xint0x80_syscall ()
#21 0x804d186 in ?? ()
#22 0x80495ac in ?? ()
#23 0x804915d in ?? ()

Script done on Sun Feb 25 22:28:05 2001