Re: scheduler panic

2011-12-29 Thread John Baldwin
On Sunday, December 25, 2011 10:48:32 am Larry Rosenman wrote:
> On Fri, 23 Dec 2011, Larry Rosenman wrote:
> > On 12/23/2011 8:54 AM, John Baldwin wrote:
> >> The sloppiest fix might be to do this:
> >>
> >> Index: sched_ule.c
> >> ===
> >> --- sched_ule.c (revision 228777)
> >> +++ sched_ule.c (working copy)
> >> @@ -1434,7 +1434,8 @@ sched_priority(struct thread *td)
> >>      } else {
> >>          pri = SCHED_PRI_MIN;
> >>          if (td->td_sched->ts_ticks)
> >> -            pri += SCHED_PRI_TICKS(td->td_sched);
> >> +            pri += min(SCHED_PRI_TICKS(td->td_sched),
> >> +                SCHED_PRI_RANGE);
> >>          pri += SCHED_PRI_NICE(td->td_proc->p_nice);
> >>          KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
> >>              ("sched_priority: invalid priority %d: nice %d, "
> >>
> >
> > I've applied this to both the host and the guest, and am recompiling
> > the guest kernel (hopefully it'll stay up long enough...).
> >
> > I'll report back.
> >
> > Do y'all (FreeBSD Devs) want a PR?
> >
> >
> I've run 2 complete buildworld/buildkernel cycles with the patch applied
> in the guest, and it's made it all the way through.  It wouldn't
> do that without it.
> 
> Can we get this (or something else like it) applied?
> 
> Do I need to file a PR?

I've committed this.  I do think the root problem is an issue with the clock
interrupts, but this seems to be a common enough problem I think a workaround
is warranted.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: scheduler panic

2011-12-25 Thread Larry Rosenman

On Fri, 23 Dec 2011, Larry Rosenman wrote:
> On 12/23/2011 8:54 AM, John Baldwin wrote:
>> The sloppiest fix might be to do this:
>>
>> Index: sched_ule.c
>> ===
>> --- sched_ule.c (revision 228777)
>> +++ sched_ule.c (working copy)
>> @@ -1434,7 +1434,8 @@ sched_priority(struct thread *td)
>>      } else {
>>          pri = SCHED_PRI_MIN;
>>          if (td->td_sched->ts_ticks)
>> -            pri += SCHED_PRI_TICKS(td->td_sched);
>> +            pri += min(SCHED_PRI_TICKS(td->td_sched),
>> +                SCHED_PRI_RANGE);
>>          pri += SCHED_PRI_NICE(td->td_proc->p_nice);
>>          KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
>>              ("sched_priority: invalid priority %d: nice %d, "
>>
>
> I've applied this to both the host and the guest, and am recompiling
> the guest kernel (hopefully it'll stay up long enough...).
>
> I'll report back.
>
> Do y'all (FreeBSD Devs) want a PR?

I've run 2 complete buildworld/buildkernel cycles with the patch applied
in the guest, and it's made it all the way through.  It wouldn't
do that without it.

Can we get this (or something else like it) applied?

Do I need to file a PR?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893


Re: scheduler panic

2011-12-23 Thread Larry Rosenman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/23/2011 8:54 AM, John Baldwin wrote:
> The sloppiest fix might be to do this:
> 
> Index: sched_ule.c
> ===
> --- sched_ule.c (revision 228777)
> +++ sched_ule.c (working copy)
> @@ -1434,7 +1434,8 @@ sched_priority(struct thread *td)
>      } else {
>          pri = SCHED_PRI_MIN;
>          if (td->td_sched->ts_ticks)
> -            pri += SCHED_PRI_TICKS(td->td_sched);
> +            pri += min(SCHED_PRI_TICKS(td->td_sched),
> +                SCHED_PRI_RANGE);
>          pri += SCHED_PRI_NICE(td->td_proc->p_nice);
>          KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
>              ("sched_priority: invalid priority %d: nice %d, "
>

I've applied this to both the host and the guest, and am recompiling
the guest kernel (hopefully it'll stay up long enough...).

I'll report back.

Do y'all (FreeBSD Devs) want a PR?


- -- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJO9K9cAAoJENC8dtAvA1zmruAIAL0udaYatGWp5E/Th9YYD8Hh
FHVri/G/Va8YsivqfZLFYUZd8SyqO/0vxEIoG73iKJJmjW/CpYIjgOvCRvsCrefm
ABOYmRX0dvC8GLHDgN9XFt4J9GmNTDcneNV7rOvWKisygkHw0GlK5DxKtSo3PsE8
6MQSnUuVmUMggsVQfBUiPTyTmJigcJ9KuEdfbHQ2o7+sCWx+gAKCyfVFcwkNIrYv
M7j21dJ8hjHUteHZ3YttVjYku0/YISSmtvGVCMlm2xBGD+tTu5g2ZcqZsxzlRFst
HyLGDP3mKSQJRMHcvl+OXMmwnFO7m31fLhj04LIWardV93S3CYF0c54LNEHYEN4=
=/imM
-END PGP SIGNATURE-


Re: scheduler panic

2011-12-23 Thread John Baldwin
On Friday, December 23, 2011 8:21:41 am Larry Rosenman wrote:
> I've been getting these in a VirtualBox VM.  I'm not sure what to do.
> 
> I CAN give VNC access to this VM in this state.
> 
> panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
> ftick 1213618 itick 1214628 tick pri 159

In the past this happened because the 'ticks' value was bananas.  Priority
values should only be from 0 to 255, so 331 is definitely too large.  The
priority is computed like so:

pri = SCHED_PRI_MIN;
if (td->td_sched->ts_ticks)
pri += SCHED_PRI_TICKS(td->td_sched);
pri += SCHED_PRI_NICE(td->td_proc->p_nice);
KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
("sched_priority: invalid priority %d: nice %d, " 
"ticks %d ftick %d ltick %d tick pri %d",
pri, td->td_proc->p_nice, td->td_sched->ts_ticks,
td->td_sched->ts_ftick, td->td_sched->ts_ltick,
SCHED_PRI_TICKS(td->td_sched)));

Note that you have:

kern/sched_ule.c:

#define PRI_TIMESHARE_RANGE (PRI_MAX_TIMESHARE - PRI_MIN_TIMESHARE + 1)
#define PRI_INTERACT_RANGE  ((PRI_TIMESHARE_RANGE - SCHED_PRI_NRESV) / 2)
#define PRI_BATCH_RANGE (PRI_TIMESHARE_RANGE - PRI_INTERACT_RANGE)

#define PRI_MIN_INTERACT    PRI_MIN_TIMESHARE
#define PRI_MAX_INTERACT    (PRI_MIN_TIMESHARE + PRI_INTERACT_RANGE - 1)
#define PRI_MIN_BATCH   (PRI_MIN_TIMESHARE + PRI_INTERACT_RANGE)
#define PRI_MAX_BATCH   PRI_MAX_TIMESHARE

#define SCHED_PRI_NRESV (PRIO_MAX - PRIO_MIN)

sys/resource.h:

#define PRIO_MIN    -20
#define PRIO_MAX    20

sys/priority.h:

#define PRI_MIN_TIMESHARE   (120)
#define PRI_MAX_TIMESHARE   (PRI_MIN_IDLE - 1)

#define PRI_MIN_IDLE    (224)

So PRI_MAX_BATCH is 223.
PRI_MIN_BATCH is 120 + (((223 - 120 + 1) - (20 - -20)) / 2) which is 152.

So given SCHED_PRI_TICKS() of 159, you start from SCHED_PRI_MIN (which is
PRI_MIN_BATCH + SCHED_PRI_NHALF, so 152 + 20 = 172) and end up with
172 + 159 = 331; your nice of 0 adds nothing.  It seems the largest value
SCHED_PRI_TICKS() should ever generate is
(PRI_BATCH_RANGE - SCHED_PRI_NRESV), though ULE doesn't quite compute it
that way (it might be off by one):

#define SCHED_PRI_NRESV (PRIO_MAX - PRIO_MIN)
#define SCHED_PRI_NHALF (SCHED_PRI_NRESV / 2)
#define SCHED_PRI_MIN   (PRI_MIN_BATCH + SCHED_PRI_NHALF)
#define SCHED_PRI_MAX   (PRI_MAX_BATCH - SCHED_PRI_NHALF)
#define SCHED_PRI_RANGE (SCHED_PRI_MAX - SCHED_PRI_MIN + 1)
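
As a quick check of the arithmetic, the definitions quoted in this message can be evaluated in a small standalone C file. This is only a sketch: the macros are copied from the excerpts above, `sched_priority_model()` is a hypothetical helper rather than a kernel function, and it assumes that SCHED_PRI_NICE() simply adds the nice value (its definition is not quoted here).

```c
/* Standalone model of the priority constants quoted in this message. */
#define PRIO_MIN            (-20)
#define PRIO_MAX            20
#define PRI_MIN_TIMESHARE   (120)
#define PRI_MIN_IDLE        (224)
#define PRI_MAX_TIMESHARE   (PRI_MIN_IDLE - 1)

#define SCHED_PRI_NRESV     (PRIO_MAX - PRIO_MIN)
#define PRI_TIMESHARE_RANGE (PRI_MAX_TIMESHARE - PRI_MIN_TIMESHARE + 1)
#define PRI_INTERACT_RANGE  ((PRI_TIMESHARE_RANGE - SCHED_PRI_NRESV) / 2)
#define PRI_BATCH_RANGE     (PRI_TIMESHARE_RANGE - PRI_INTERACT_RANGE)
#define PRI_MIN_BATCH       (PRI_MIN_TIMESHARE + PRI_INTERACT_RANGE)
#define PRI_MAX_BATCH       PRI_MAX_TIMESHARE

#define SCHED_PRI_NHALF     (SCHED_PRI_NRESV / 2)
#define SCHED_PRI_MIN       (PRI_MIN_BATCH + SCHED_PRI_NHALF)
#define SCHED_PRI_MAX       (PRI_MAX_BATCH - SCHED_PRI_NHALF)
#define SCHED_PRI_RANGE     (SCHED_PRI_MAX - SCHED_PRI_MIN + 1)

/*
 * Hypothetical helper: the unclamped batch priority for a given
 * SCHED_PRI_TICKS() result and nice value.  Assumes the nice value
 * is added unchanged.
 */
static int
sched_priority_model(int tick_pri, int nice)
{
    return (SCHED_PRI_MIN + tick_pri + nice);
}
```

With these definitions PRI_MIN_BATCH is 152, SCHED_PRI_MIN is 172, SCHED_PRI_RANGE is 32, and `sched_priority_model(159, 0)` is 331, the value in the panic message.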

However, it's not clear that SCHED_PRI_TICKS() will cap its value to
SCHED_PRI_RANGE:

#define SCHED_PRI_TICKS(ts) \
(SCHED_TICK_HZ((ts)) /  \
(roundup(SCHED_TICK_TOTAL((ts)), SCHED_PRI_RANGE) / SCHED_PRI_RANGE))
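
To see why the macro is not self-limiting, here is a minimal sketch that takes its two inputs as plain integers. `pri_ticks()` is a hypothetical stand-in for SCHED_PRI_TICKS(); `tick_hz` and `tick_total` stand for whatever SCHED_TICK_HZ() and SCHED_TICK_TOTAL() return (their definitions are not quoted in this message), SCHED_PRI_RANGE is taken as 32 per the definitions above, and the sample inputs are made up for illustration.

```c
#define SCHED_PRI_RANGE 32

/* Round x up to the next multiple of y, in the style of roundup(). */
#define ROUNDUP(x, y)   ((((x) + ((y) - 1)) / (y)) * (y))

/*
 * Hypothetical stand-in for SCHED_PRI_TICKS().
 * tick_total must be at least 1, otherwise the divisor is zero.
 */
static int
pri_ticks(int tick_hz, int tick_total)
{
    return (tick_hz /
        (ROUNDUP(tick_total, SCHED_PRI_RANGE) / SCHED_PRI_RANGE));
}
```

As long as `tick_hz` does not exceed `tick_total`, the result stays at or below SCHED_PRI_RANGE (`pri_ticks(1024, 1024)` is 32). Nothing enforces that relationship, though: a `tick_hz` that has run far ahead of the window, such as `pri_ticks(5088, 1024)`, yields 159.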

The sloppiest fix might be to do this:

Index: sched_ule.c
===
--- sched_ule.c (revision 228777)
+++ sched_ule.c (working copy)
@@ -1434,7 +1434,8 @@ sched_priority(struct thread *td)
     } else {
         pri = SCHED_PRI_MIN;
         if (td->td_sched->ts_ticks)
-            pri += SCHED_PRI_TICKS(td->td_sched);
+            pri += min(SCHED_PRI_TICKS(td->td_sched),
+                SCHED_PRI_RANGE);
         pri += SCHED_PRI_NICE(td->td_proc->p_nice);
         KASSERT(pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH,
             ("sched_priority: invalid priority %d: nice %d, "
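
A rough model of the patched computation can be used to check results against the KASSERT bounds. The constants are the values that follow from the definitions quoted in this message (PRI_MIN_BATCH 152, PRI_MAX_BATCH 223, SCHED_PRI_MIN 172, SCHED_PRI_RANGE 32). `imin()`, `patched_priority()` and `in_batch_range()` are hypothetical helpers, and SCHED_PRI_NICE() is assumed to add the nice value unchanged.

```c
#define PRI_MIN_BATCH   152
#define PRI_MAX_BATCH   223
#define SCHED_PRI_MIN   172
#define SCHED_PRI_RANGE 32

static int
imin(int a, int b)
{
    return (a < b ? a : b);
}

/*
 * Hypothetical model of the patched batch branch: tick_pri is the
 * SCHED_PRI_TICKS() result, clamped to SCHED_PRI_RANGE as in the diff.
 */
static int
patched_priority(int tick_pri, int nice)
{
    int pri = SCHED_PRI_MIN;

    pri += imin(tick_pri, SCHED_PRI_RANGE);
    pri += nice;    /* assumption: SCHED_PRI_NICE(nice) == nice */
    return (pri);
}

/* The condition the KASSERT checks. */
static int
in_batch_range(int pri)
{
    return (pri >= PRI_MIN_BATCH && pri <= PRI_MAX_BATCH);
}
```

With the inputs from the panic, `patched_priority(159, 0)` is 204, which is inside the range. In this model the extreme corner of a saturated tick value and nice +20 gives 224, one past PRI_MAX_BATCH, so clamping to SCHED_PRI_RANGE - 1 may be the safer bound. That is a reading of the arithmetic here, not a tested kernel change.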

-- 
John Baldwin


scheduler panic

2011-12-23 Thread Larry Rosenman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I've been getting these in a VirtualBox VM.  I'm not sure what to do.

I CAN give VNC access to this VM in this state.

panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
ftick 1213618 itick 1214628 tick pri 159
cpuid = 0
KDB: enter: panic

Ideas?

- -- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJO9H7aAAoJENC8dtAvA1zmXg8H/3lmAWQBszmBCPv2ucbH4JE8
c7M20HHmtJZtISal/FAkjFD324xDDAIwwZhBlB5bJZzXw3RE+BuCuJy+yYdIcGQd
3DGUvli2ryhOpE8xzkG1i9qIyBvMV8B2lxgdpnAGTtuCnMQPEMGUNPST6RrTivHs
gSk+KxtrmuEtpIowKxeg4HC2JIyF2VQikd0eximYM2b9pRQg5eYiO6HG4xoKJCxh
OQJ3hbITveoSlevd9QddKUQeD7y80KnBT2KNIZsr9HtErZCIDcZYJAXIAgcGUPDW
F9lXVTj7+vaX8YEgZc1i/WExKnyvq3qyQQQktSWSnInzHlMg8nItovZduwtE23E=
=nqba
-END PGP SIGNATURE-


paninc.PNG.sig
Description: Binary data

Re: scheduler panic

2011-12-23 Thread Larry Rosenman

On Fri, 23 Dec 2011, Ivan Klymenko wrote:

> On Fri, 23 Dec 2011 07:38:21 -0600
> Larry Rosenman wrote:
>
>> BORG-DTRACE
>
> Show, please, the kernel config BORG-DTRACE


include GENERIC
ident   BORG-DTRACE
options KDTRACE_HOOKS   # all architectures - enable general DTrace hooks
options DDB_CTF         # all architectures - kernel ELF linker loads CTF data
options KDTRACE_FRAME   # amd64 - ensure frames are compiled in
#makeoptions DEBUG="-g" # amd64? - build kernel with gdb(1) debug symbols
makeoptions WITH_CTF=1


#options COMPAT_FREEBSD8
nooptions WITNESS
nodevice mvs
nodevice siis
nodevice ahc
nodevice ahd
nodevice amd
nodevice hptiop
nodevice isp
nodevice mpt
nodevice mps
nodevice sym
nodevice trm
nodevice adv
nodevice adw
nodevice aic
nodevice bt
nodevice amr
nodevice arcmsr
nodevice asr
nodevice ciss
nodevice dpt
nodevice hptmv
nodevice hptrr
nodevice iir
nodevice ips
nodevice mly
nodevice twa
nodevice aac
nodevice aacp
nodevice ida
nodevice mfi
nodevice mlx
nodevice twe
nodevice tws
nodevice cbb
nodevice pccard
nodevice cardbus
nodevice plip
nodevice puc
nodevice bxe
nodevice de
nodevice igb
nodevice ixgbe
nodevice le
nodevice ti
nodevice txp
nodevice vx
nodevice ae
nodevice age
nodevice alc
nodevice ale
nodevice bce
nodevice bfe
nodevice bge
nodevice dc
nodevice et
nodevice fxp
nodevice jme
nodevice lge
nodevice msk
nodevice mge
nodevice pcn
nodevice re
nodevice rl
nodevice sf
nodevice sge
nodevice sis
nodevice sk
nodevice ste
nodevice stge
nodevice tl
nodevice tx
nodevice vge
nodevice vr
nodevice wb
nodevice xl
nodevice cs
nodevice ed
nodevice ex
nodevice ep
nodevice fe
nodevice sn
nodevice xe

nodevice wlan 
nodevice wlan_wep

nodevice wlan_ccmp
nodevice wlan_tkip
nodevice wlan_amrr
nodevice an
nodevice ath
nodevice ath_pci
nodevice ath_hal
nodevice ath_rate_sample
nodevice ipw
nodevice iwi
nodevice iwn
nodevice malo
nodevice mwl
nodevice ral
nodevice wi
nodevice wpi


nodevice  urio         # Diamond Rio 500 MP3 player
# USB Serial devices
nodevice  u3g          # USB-based 3G modems (Option, Huawei, Sierra)
nodevice  uark         # Technologies ARK3116 based serial adapters
nodevice  ubsa         # Belkin F5U103 and compatible serial adapters
nodevice  uftdi        # For FTDI usb serial adapters
nodevice  uipaq        # Some WinCE based devices
nodevice  uplcom       # Prolific PL-2303 serial adapters
nodevice  uslcom       # SI Labs CP2101/CP2102 serial adapters
nodevice  uvisor       # Visor and Palm devices
nodevice  uvscom       # USB serial support for DDI pocket's PHS
# USB Ethernet, requires miibus
nodevice  aue          # ADMtek USB Ethernet
nodevice  axe          # ASIX Electronics USB Ethernet
nodevice  cdce         # Generic USB over Ethernet
nodevice  cue          # CATC USB Ethernet
nodevice  kue          # Kawasaki LSI USB Ethernet

nodevice  rue          # RealTek RTL8150 USB Ethernet
nodevice  udav         # Davicom DM9601E USB
# USB Wireless
nodevice  rum          # Ralink Technology RT2501USB wireless NICs
nodevice  run          # Ralink Technology RT2700/RT2800/RT3000 NICs.
nodevice  uath         # Atheros AR5523 wireless NICs
nodevice  upgt         # Conexant/Intersil PrismGT wireless NICs.
nodevice  ural         # Ralink Technology RT2500USB wireless NICs
nodevice  urtw         # Realtek RTL8187B/L wireless NICs
nodevice  zyd          # ZyDAS zd1211/zd1211b wireless NICs

# FireWire support
nodevice  firewire     # FireWire bus code
nodevice  sbp          # SCSI over FireWire (Requires scbus and da)
nodevice  fwe          # Ethernet over FireWire (non-standard!)
nodevice  fwip         # IP over FireWire (RFC 2734,3146)
nodevice  dcons        # Dumb console driver
nodevice  dcons_crom   # Configuration ROM for dcons

# Sound support
nodevice  sound        # Generic sound driver (required)
nodevice  snd_es137x   # Ensoniq AudioPCI ES137x
nodevice  snd_hda      # Intel High Definition Audio
nodevice  snd_ich      # Intel, NVidia and other ICH AC'97 Audio
nodevice  snd_uaudio   # USB Audio
nodevice  snd_via8233  # VIA VT8233x Audio

device    netmap
options   FFCLOCK


I've also seen it with GENERIC, FWIW.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893


Re: scheduler panic

2011-12-23 Thread Ivan Klymenko
On Fri, 23 Dec 2011 07:38:21 -0600
Larry Rosenman wrote:

> BORG-DTRACE
Show, please, the kernel config BORG-DTRACE


Re: scheduler panic

2011-12-23 Thread Larry Rosenman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/23/2011 7:31 AM, Ivan Klymenko wrote:
> On Fri, 23 Dec 2011 07:21:41 -0600 Larry Rosenman
> wrote:
> 
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>> 
>> I've been getting these in a VirtualBox VM.  I'm not sure what to
>> do.
>> 
>> I CAN give VNC access to this VM in this state.
>> 
>> panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
>> ftick 1213618 itick 1214628 tick pri 159
>> cpuid = 0
>> KDB: enter: panic
>> 
>> Ideas? (repost without the screenshot).
> 
> uname -a ???
Oops.  It's running the same kernel as its host:
$ uname -a
FreeBSD borg.lerctr.org 10.0-CURRENT FreeBSD 10.0-CURRENT #31 r228802:
Thu Dec 22 11:21:25 CST 2011
r...@borg.lerctr.org:/usr/obj/usr/src/sys/BORG-DTRACE  amd64
$

I can also give SSH access to the host as well.


- -- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJO9IRMAAoJENC8dtAvA1zmxJEIALP7wzsx9Co9QaE+Cx3JK2vx
pCRJqLBTkpsnzdYmGsczBAUpEXJ/POx+7UsWycd48zQlT64FZubeHGi2yIIZNOzL
zpYdaY/70cacFuyouMtZyLOrCTLiJe4AVBOluA79zCNgJKIcIhGyGSObsO7CqiiR
oS2MHyWy9n5oLo6Qf79708gar4QXHDZwVkgRZ3heWeZY+wPt8CVrzX5k8uf7dSlz
Yq4+A1G9atfuprp6iRUTIT7aHKKv6IwM3QAg2wuaUqatUYsJv8ushRrsZHfJmmft
/LmMmHMlaqqsDy4Wjm0v5souid6vIuGv7zyOxfILCnk/9UnWEEThqpP31mu362k=
=skwU
-END PGP SIGNATURE-


Re: scheduler panic

2011-12-23 Thread Ivan Klymenko
On Fri, 23 Dec 2011 07:21:41 -0600
Larry Rosenman wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> I've been getting these in a VirtualBox VM.  I'm not sure what to do.
> 
> I CAN give VNC access to this VM in this state.
> 
> panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
> ftick 1213618 itick 1214628 tick pri 159
> cpuid = 0
> KDB: enter: panic
> 
> Ideas?
> (repost without the screenshot).

uname -a ???


scheduler panic

2011-12-23 Thread Larry Rosenman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I've been getting these in a VirtualBox VM.  I'm not sure what to do.

I CAN give VNC access to this VM in this state.

panic: sched_priority: invalid priority 331: nice 0, ticks 56612596
ftick 1213618 itick 1214628 tick pri 159
cpuid = 0
KDB: enter: panic

Ideas?
(repost without the screenshot).

- -- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJO9IBlAAoJENC8dtAvA1zmAJAIAL67TPUdIigtumkBLHZM1qCo
7JFfBXpyEjH8vs0bkCk+GYSCke67IGMUpiR5XeZ8UsKjiTtyyhw1SQZYIw/EiVvf
7Nf+DOxbKIYEPezeEqpaskejItfOM6h7ajZovRNTJsrNH+ha0csGgFk46iEFH5Qq
LTQ7D5GrFj+hCzNDLcbxWOiTqxGMlTboZun5C0Y6BYK09RpLqMtU6bIh/37zj7kr
u4VSh94hPW8t8qTnL5rlETMAjvmtIivphEVv/R5jOv0cGtNP/o2QaM66w3TaxyJ0
Z9ixNuzq3MAft20VRrVdUEnZ43DASv7Aisl2GNoaTNRW/MVuaULG/PdA9hj2vZ8=
=G2La
-END PGP SIGNATURE-


Re: Scheduler panic

2001-03-01 Thread Jake Burkholder

replying to myself again

> 
> This is the best workaround I can think of:
> 
> Index: kern/kern_intr.c
> ===
> RCS file: /home/ncvs/src/sys/kern/kern_intr.c,v
> retrieving revision 1.47
> diff -u -r1.47 kern_intr.c
> --- kern/kern_intr.c    2001/02/28 02:53:43     1.47
> +++ kern/kern_intr.c    2001/03/02 02:28:08
> @@ -366,7 +366,7 @@
>  */
> ithread->it_need = 1;
> mtx_lock_spin(&sched_lock);
> -   if (p->p_stat == SWAIT) {
> +   if (p->p_stat == SWAIT && curproc->p_stat == SRUN) {
> CTR1(KTR_INTR, __func__ ": setrunqueue %d", p->p_pid);
> p->p_stat = SRUN;
> setrunqueue(p);

Heh.  Sorry this is wrong, the test for SRUN should be in the
same if statement as the do_switch, one further in.  This
will completely miss interrupts if the race is ever hit...


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Scheduler panic

2001-03-01 Thread John Baldwin


On 02-Mar-01 Jake Burkholder wrote:
>> > On Sun, Feb 25, 2001 at 10:29:42PM -0800, Kris Kennaway wrote:
>> > > This is on a UP system.
>> > 
>> > Had another one of these, under the same conditions.  Both times I was
>> > running more(1) on a stdin stream which was generated by a "find |
>> > grep | more" operation, and I suspended the process with ^Z,
>> > triggering the panic.  Perhaps this will help in tracking down the
>> > root cause.
>> 
>> I'm pretty sure I know what this is; I'll work up a patch tonight.
>> 
> 
> Sorry this is taking so long.  It's turned out to be a little more
> complex to fix properly than I originally thought.  We're going to
> have to change the way one of the fields of struct proc (p_pptr)
> is locked.  The problem is that a process is getting preempted
> when it's not SRUN, which should be protected by the scheduler
> lock so that the preemption can't occur.
> 
> This is the best workaround I can think of:
> 
> Index: kern/kern_intr.c
> ===
> RCS file: /home/ncvs/src/sys/kern/kern_intr.c,v
> retrieving revision 1.47
> diff -u -r1.47 kern_intr.c
> --- kern/kern_intr.c    2001/02/28 02:53:43     1.47
> +++ kern/kern_intr.c    2001/03/02 02:28:08
> @@ -366,7 +366,7 @@
>  */
> ithread->it_need = 1;
> mtx_lock_spin(&sched_lock);
> -   if (p->p_stat == SWAIT) {
> +   if (p->p_stat == SWAIT && curproc->p_stat == SRUN) {
> CTR1(KTR_INTR, __func__ ": setrunqueue %d", p->p_pid);
> p->p_stat = SRUN;
> setrunqueue(p);
> 
> Jake

Eek, this is wrong.  We need to always put it on the runqueue; the trick
is that we just need to avoid the actual task switch.  This is what I have here:

@@ -369,7 +374,7 @@
CTR1(KTR_INTR, __func__ ": setrunqueue %d", p->p_pid);
p->p_stat = SRUN;
setrunqueue(p);
-   if (do_switch) {
+   if (do_switch && curproc->p_stat == SRUN) {
saveintr = sched_lock.mtx_saveintr;
mtx_intr_enable(&sched_lock);
if (curproc != PCPU_GET(idleproc))

(Among other fixes.)  I'll try and get this committed tonight if no one screams
bloody murder.
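
To make the difference between the two variants concrete, here is a toy user-space model. Everything in it is hypothetical scaffolding (the `struct proc_model` type, the state enum, and both function names); it only mirrors the control flow of the two diffs, not the real kernel code or its locking.

```c
#include <stdbool.h>

/* Illustrative process states; only "SRUN or not" matters here. */
enum pstate { SWAIT, SRUN, SSTOP };

struct proc_model {
    enum pstate p_stat;
    bool on_runq;
};

struct sched_result {
    bool queued;    /* interrupt thread was put on the run queue */
    bool switched;  /* an immediate switch to it was requested */
};

/* Variant 1: the curproc test guards the whole block. */
static struct sched_result
schedule_outer_check(struct proc_model *ithd, const struct proc_model *cur,
    bool do_switch)
{
    struct sched_result r = { false, false };

    if (ithd->p_stat == SWAIT && cur->p_stat == SRUN) {
        ithd->p_stat = SRUN;
        ithd->on_runq = true;
        r.queued = true;
        r.switched = do_switch;
    }
    return (r);
}

/* Variant 2: always queue; the curproc test guards only the switch. */
static struct sched_result
schedule_inner_check(struct proc_model *ithd, const struct proc_model *cur,
    bool do_switch)
{
    struct sched_result r = { false, false };

    if (ithd->p_stat == SWAIT) {
        ithd->p_stat = SRUN;
        ithd->on_runq = true;
        r.queued = true;
        r.switched = do_switch && cur->p_stat == SRUN;
    }
    return (r);
}
```

When the current process is in any state other than SRUN, the first variant leaves the interrupt thread in SWAIT and off the run queue, which is the missed interrupt described above. The second variant still queues it and only skips the immediate switch.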

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/




Re: Scheduler panic

2001-03-01 Thread Jake Burkholder

> > On Sun, Feb 25, 2001 at 10:29:42PM -0800, Kris Kennaway wrote:
> > > This is on a UP system.
> > 
> > Had another one of these, under the same conditions.  Both times I was
> > running more(1) on a stdin stream which was generated by a "find |
> > grep | more" operation, and I suspended the process with ^Z,
> > triggering the panic.  Perhaps this will help in tracking down the
> > root cause.
> 
> I'm pretty sure I know what this is; I'll work up a patch tonight.
> 

Sorry this is taking so long.  It's turned out to be a little more
complex to fix properly than I originally thought.  We're going to
have to change the way one of the fields of struct proc (p_pptr)
is locked.  The problem is that a process is getting preempted
when it's not SRUN, which should be protected by the scheduler
lock so that the preemption can't occur.

This is the best workaround I can think of:

Index: kern/kern_intr.c
===
RCS file: /home/ncvs/src/sys/kern/kern_intr.c,v
retrieving revision 1.47
diff -u -r1.47 kern_intr.c
--- kern/kern_intr.c    2001/02/28 02:53:43     1.47
+++ kern/kern_intr.c    2001/03/02 02:28:08
@@ -366,7 +366,7 @@
 */
ithread->it_need = 1;
mtx_lock_spin(&sched_lock);
-   if (p->p_stat == SWAIT) {
+   if (p->p_stat == SWAIT && curproc->p_stat == SRUN) {
CTR1(KTR_INTR, __func__ ": setrunqueue %d", p->p_pid);
p->p_stat = SRUN;
setrunqueue(p);

Jake





Re: Scheduler panic

2001-02-26 Thread Jake Burkholder

> On Sun, Feb 25, 2001 at 10:29:42PM -0800, Kris Kennaway wrote:
> > This is on a UP system.
> 
> Had another one of these, under the same conditions.  Both times I was
> running more(1) on a stdin stream which was generated by a "find |
> grep | more" operation, and I suspended the process with ^Z,
> triggering the panic.  Perhaps this will help in tracking down the
> root cause.

I'm pretty sure I know what this is; I'll work up a patch tonight.





Re: Scheduler panic

2001-02-26 Thread Kris Kennaway

On Sun, Feb 25, 2001 at 10:29:42PM -0800, Kris Kennaway wrote:
> This is on a UP system.

Had another one of these, under the same conditions.  Both times I was
running more(1) on a stdin stream which was generated by a "find |
grep | more" operation, and I suspended the process with ^Z,
triggering the panic.  Perhaps this will help in tracking down the
root cause.

Kris



Scheduler panic

2001-02-25 Thread Kris Kennaway

This is on a UP system.

Kris

GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 4767744
initial pcb at 3c9740
panicstr: from debugger
panic messages:
---
panic: runq_add: proc 0xca466420 (more) not SRUN
panic: from debugger
Uptime: 3h39m42s

dumping to dev #da/0x20001, offset 262144
dump 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 
108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 
82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 
53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 
24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
---
#0  dumpsys () at ../../kern/kern_shutdown.c:476
476 if (dumping++) {
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:476
#1  0xc01b0c3c in boot (howto=260) at ../../kern/kern_shutdown.c:319
#2  0xc01b1011 in panic (fmt=0xc032a794 "from debugger")
at ../../kern/kern_shutdown.c:569
#3  0xc013c7dd in db_panic (addr=-1070605187, have_addr=0, count=-1, 
modif=0xca4dbca0 "") at ../../ddb/db_command.c:433
#4  0xc013c77b in db_command (last_cmdp=0xc0371cf4, cmd_table=0xc0371b54, 
aux_cmd_tablep=0xc03b6b60) at ../../ddb/db_command.c:333
#5  0xc013c842 in db_command_loop () at ../../ddb/db_command.c:455
#6  0xc013eaaf in db_trap (type=3, code=0) at ../../ddb/db_trap.c:71
#7  0xc02fda0c in kdb_trap (type=3, code=0, regs=0xca4dbda0)
at ../../i386/i386/db_interface.c:164
#8  0xc030a418 in trap (frame={tf_fs = -901382120, tf_es = 16, 
  tf_ds = -1051394032, tf_edi = -1069770656, tf_esi = 256, 
  tf_ebp = -900874772, tf_isp = -900874804, tf_ebx = 2, 
  tf_edx = -1070199057, tf_ecx = 32, tf_eax = 18, tf_trapno = 3, 
  tf_err = 0, tf_eip = -1070605187, tf_cs = 8, tf_eflags = 86, 
  tf_esp = -1070199073, tf_ss = -1070347485}) at ../../i386/i386/trap.c:614
#9  0xc02fdc7d in Debugger (msg=0xc033cb23 "panic") at machine/cpufunc.h:60
#10 0xc01b1008 in panic (fmt=0xc033cf60 "runq_add: proc %p (%s) not SRUN")
at ../../kern/kern_shutdown.c:567
#11 0xc01b483c in runq_add (rq=0xc03c9860, p=0xca466420)
at ../../kern/kern_switch.c:142
#12 0xc01b47f5 in setrunqueue (p=0xca466420) at ../../kern/kern_switch.c:70
---Type <return> to continue, or q <return> to quit---
#13 0xc01a5750 in ithread_schedule (ithread=0xc1349100, do_switch=1)
at ../../kern/kern_intr.c:376
#14 0xc030ed8d in sched_ithd (cookie=0x5) at ../../i386/isa/ithread.c:99
#15 0x8 in ?? ()
#16 0xc01b3329 in issignal (p=0xca466420) at ../../kern/kern_sig.c:1410
#17 0xc01b116a in CURSIG (p=0xca466420) at ../../kern/kern_sig.c:190
#18 0xc030981e in userret (p=0xca466420, frame=0xca4dbfa8, oticks=2)
at ../../i386/i386/trap.c:179
#19 0xc030b3d3 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
  tf_edi = 2, tf_esi = 12, tf_ebp = -1077938440, tf_isp = -900874284, 
  tf_ebx = -2, tf_edx = 22195, tf_ecx = 17, tf_eax = 0, tf_trapno = 22, 
  tf_err = 2, tf_eip = 672387100, tf_cs = 31, tf_eflags = 646, 
  tf_esp = -1077938484, tf_ss = 47}) at ../../i386/i386/trap.c:1239
#20 0xc02fe393 in Xint0x80_syscall ()
#21 0x804d186 in ?? ()
#22 0x80495ac in ?? ()
#23 0x804915d in ?? ()

Script done on Sun Feb 25 22:28:05 2001
