Re: boot() called on cpu #1 - hang

2001-09-11 Thread Michael Class


On Mon, 10 Sep 2001 [EMAIL PROTECTED] wrote:

  Hello Tor,
 
  thank you for your quick response, unfortunately your patch did
  not fix the problem.


 Your machine seems to hang too early for the patch to have any effect.
 (the patch affects a hang that occurs after the kernel has printed
   cpu reset called on cpu#1
   cpu_reset: Stopping other CPUs
 )

  I have now tested a little bit more with the following sequence:
 
boot machine to single-user
reboot
 
  I did this more than 10 times. It got stuck every time,
  approx. 8 times with
 
boot() called on cpu #1
W
 
  And 3 times with
 
boot() called on cpu #0
Wa
 
  or
 
boot() called on cpu #0
Waiting (max
 
  It looks to me as if the kernel printf somehow gets stuck.


 Did you use -O2 when compiling the kernel ?  That sometimes causes
 strange problems.

 The kernel doesn't appear to do much before printing the

   Waiting (max %d seconds) for system process `%s' to stop

 message in kproc_shutdown.


 boot() in /usr/src/sys/kern/kern_shutdown.c contains

 #ifdef SMP
         if (smp_active)
                 printf("boot() called on cpu#%d\n", PCPU_GET(cpuid));
 #endif
 /*
  * Do any callouts that should be done BEFORE syncing the filesystems.
  */
 EVENTHANDLER_INVOKE(shutdown_pre_sync, howto);


 where the EVENTHANDLER_INVOKE macro expands to a lockmgr() call and
 invocation of the two events associated with shutdown_pre_sync:

   kproc_shutdown(bufdaemonproc, howto)
   kproc_shutdown(updateproc, howto)

 The normal output is

   Waiting (max 60 seconds) for system process `bufdaemon' to stop...stopped
   Waiting (max 60 seconds) for system process `syncer' to stop...stopped

 If the lockmgr lock for the event list is damaged, further damage
 elsewhere might occur due to the lockmgr call.  If a debug printf
 before the lockmgr call in EVENTHANDLER_INVOKE() works while a debug
 printf after the lockmgr call isn't properly printed, then the
 probability for the problem being related to the lockmgr call is
 increased (cf. /usr/src/sys/sys/eventhandler.h)

 - Tor Egge
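
The bracketing test suggested above can be rehearsed in userland before touching the kernel. The following is only a minimal sketch under stated assumptions: it is not the real EVENTHANDLER_INVOKE() macro, and every name in it (struct handler, checkpoint(), list_lock(), invoke_traced(), count_handler()) is hypothetical. It shows the idea of putting a checkpoint on each side of the suspect lock call so that the last message printed narrows down where execution stopped.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
typedef void (*handler_fn)(void *arg, int howto);

struct handler {
    handler_fn fn;
    void *arg;
};

static int lock_held;   /* toy lock: 1 while the handler list is locked */
static int trace_step;  /* number of the last checkpoint reached */

/* Record and print a checkpoint; stderr is unbuffered by default. */
static void checkpoint(int step, const char *msg)
{
    trace_step = step;
    fprintf(stderr, "%s\n", msg);
}

static void list_lock(void)   { assert(!lock_held); lock_held = 1; }
static void list_unlock(void) { assert(lock_held);  lock_held = 0; }

/*
 * Invoke every handler in the list, with a checkpoint before and after
 * the lock call. Returns the number of handlers that were run.
 */
static int invoke_traced(struct handler *list, int n, int howto)
{
    int i;

    checkpoint(1, "before lock");
    list_lock();
    checkpoint(2, "after lock");
    for (i = 0; i < n; i++)
        list[i].fn(list[i].arg, howto);
    list_unlock();
    checkpoint(3, "handlers done");
    return i;
}

/* Example handler: counts how often it was called. */
static void count_handler(void *arg, int howto)
{
    (void)howto;
    ++*(int *)arg;
}
```

If the last line printed is "after lock", the stall is in one of the handlers rather than in the lock call itself, which matches the kind of conclusion the kernel test is meant to allow.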



Hello Tor,

I have added a printf right before and right after the lockmgr call in the
EVENTHANDLER_INVOKE() macro in /usr/src/sys/sys/eventhandler.h.
But both of these printfs work! The output I am getting then is:

Boot() called on cpu #1
before lockmgr
after lockmgr
W

What else could I test?

Michael


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: boot() called on cpu #1 - hang

2001-09-09 Thread Michael Class

Hello Tor,

thank you for your quick response, unfortunately your patch did
not fix the problem.

I have now tested a little bit more with the following sequence:

  boot machine to single-user
  reboot

I did this more than 10 times. It got stuck every time,
approx. 8 times with

  boot() called on cpu #1
  W

And 3 times with

  boot() called on cpu #0
  Wa

or

  boot() called on cpu #0
  Waiting (max

It looks to me as if the kernel printf somehow gets stuck.

Michael


On Sun, 9 Sep 2001 [EMAIL PROTECTED] wrote:

  Hello,
 
  on a 5.0-current i386-SMP system of today I am still getting on about
  every second reboot the message:
 
  boot() called on cpu #1
  W

 Try applying the enclosed patch.

 - Tor Egge




-
michael class, viktor-renner str. 39, 72074 tuebingen, frg
E-Mail: [EMAIL PROTECTED]
 Phone: +49 7031 14-3707 (work) +49 7071 81950 (private)
-





boot() called on cpu #1 - hang

2001-09-08 Thread Michael Class

Hello,

on a 5.0-current i386-SMP system of today I am still getting on about
every second reboot the message:

boot() called on cpu #1
W

and then the system hangs. When boot is called on cpu #0 everything works
as expected.

I think this started roughly two weeks ago. But I am not sure whether,
back then, boot was only called on cpu #0 or boot also worked on cpu #1.

Any suggestions?

Micha


dmesg from the system is:


Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Sat Sep  8 10:04:26 MEST 2001
[EMAIL PROTECTED]:/usr/src/sys/i386/compile/MCSMP2
Timecounter i8254  frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (998.36-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x686  Stepping = 6
  
Features=0x383fbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE
real memory  = 1073676288 (1048512K bytes)
avail memory = 1040457728 (1016072K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00178011, at 0xfec0
Preloaded elf kernel kernel at 0xc04c7000.
Preloaded elf module acpi.ko at 0xc04c709c.
Pentium Pro MTRR support enabled
Using $PIR table, 8 entries at 0xc00fdbc0
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: VIA694 AWRDACPI on motherboard
acpi0: power button is handled as a fixed feature programming model.
Timecounter ACPI  frequency 3579545 Hz
acpi_timer0: 24-bit timer at 3.579545MHz port 0x4008-0x400b on acpi0
acpi_cpu0: CPU on acpi0
acpi_cpu1: CPU on acpi0
acpi_button0: Power Button on acpi0
acpi_pcib0: Host-PCI bridge port 0x6000-0x607f,0x5000-0x500f,0x4080-0x40ff,0x4000-0x407f,0xcf8-0xcff on acpi0
IOAPIC #0 intpin 19 - irq 2
IOAPIC #0 intpin 16 - irq 5
IOAPIC #0 intpin 17 - irq 10
IOAPIC #0 intpin 18 - irq 11
pci0: PCI bus on acpi_pcib0
agp0: VIA 82C691 (Apollo Pro) host to PCI bridge mem 0xd000-0xd3ff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: VIA 82C686 ATA100 controller port 0xc000-0xc00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: serial bus, USB at device 7.2 (no driver attached)
pci0: serial bus, USB at device 7.3 (no driver attached)
pci0: bridge, PCI-unknown at device 7.4 (no driver attached)
pcm0: Creative EMU10K1 port 0xcc00-0xcc1f irq 5 at device 9.0 on pci0
sym0: 810 port 0xd400-0xd4ff mem 0xda003000-0xda0030ff irq 10 at device 10.0 on pci0
sym0: No NVRAM, ID 7, Fast-10, SE, parity checking
bktr0: BrookTree 878 mem 0xda00-0xda000fff irq 2 at device 12.0 on pci0
bti2c0: bt848 Hard/Soft I2C controller
iicbb0: I2C generic bit-banging driver on bti2c0
iicbus0: Philips I2C bus on iicbb0 master-only
smbus0: System Management Bus on bti2c0
smb0: SMBus general purpose I/O on smbus0
bktr0: Hauppauge Model 61344 D121
bktr0: Detected a MSP3410D-B4 at 0x80
bktr0: Hauppauge WinCast/TV, Philips FR1216 PAL FM tuner, msp3400c stereo, remote control.
pci0: multimedia at device 12.1 (no driver attached)
xl0: 3Com 3c905B-TX Fast Etherlink XL port 0xd800-0xd87f mem 0xda002000-0xda00207f irq 11 at device 13.0 on pci0
xl0: Ethernet address: 00:10:5a:d7:dd:9c
miibus0: MII bus on xl0
xlphy0: 3Com internal media interface on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
atapci1: HighPoint HPT370 ATA100 controller port 0xec00-0xecff,0xe800-0xe803,0xe400-0xe407,0xe000-0xe003,0xdc00-0xdc07 irq 11 at device 14.0 on pci0
ata2: at 0xdc00 on atapci1
ata3: at 0xe400 on atapci1
fdc0: NEC 72065B or clone port 0x3f7,0x3f0-0x3f5 irq 6 on acpi0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive on fdc0 drive 0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
sio1 port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0 port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppc1: cannot reserve I/O port range
atkbdc0: Keyboard controller (i8042) port 0x64,0x60 irq 1 on acpi0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
ppc1: cannot reserve I/O port range
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
orm0: Option ROM at iomem 0xc-0xc7fff on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
linprocfs registered
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
IPv6 packet filtering initialized, default to accept, 

Re: boot() called on cpu #1 - hang

2001-09-08 Thread Tor . Egge

 Hello,
 
 on a 5.0-current i386-SMP system of today I am still getting on about
 every second reboot the message:
 
 boot() called on cpu #1
 W

Try applying the enclosed patch.

- Tor Egge




Index: vm_machdep.c
===
RCS file: /home/ncvs/src/sys/i386/i386/vm_machdep.c,v
retrieving revision 1.169
diff -u -r1.169 vm_machdep.c
--- vm_machdep.c 4 Sep 2001 08:36:46 -   1.169
+++ vm_machdep.c 4 Sep 2001 19:58:38 -
@@ -424,8 +433,13 @@
 {
 
        cpu_reset_proxy_active = 1;
+       wbinvd();
        while (cpu_reset_proxy_active == 1)
                ;       /* Wait for other cpu to see that we've started */
+       cpu_reset_proxy_active = 3;
+       wbinvd();
+       while (cpu_reset_proxy_active == 3)
+               ;       /* Wait for other cpu to enable interrupts */
        stop_cpus((1<<cpu_reset_proxyid));
        printf("cpu_reset_proxy: Stopped CPU %d\n", cpu_reset_proxyid);
        DELAY(100);
@@ -463,6 +477,7 @@
        cpu_reset_proxyid = PCPU_GET(cpuid);
        cpustop_restartfunc = cpu_reset_proxy;
        cpu_reset_proxy_active = 0;
+       wbinvd();
        printf("cpu_reset: Restarting BSP\n");
        started_cpus = (1<<0);          /* Restart CPU #0 */
 
@@ -471,9 +486,19 @@
                cnt++;  /* Wait for BSP to announce restart */
        if (cpu_reset_proxy_active == 0)
                printf("cpu_reset: Failed to restart BSP\n");
-       enable_intr();
+       disable_intr();
        cpu_reset_proxy_active = 2;
-
+       wbinvd();
+       cnt = 0;
+       while (cpu_reset_proxy_active == 2 && cnt < 1000)
+               cnt++;  /* Wait for BSP to stop APs */
+       if (cpu_reset_proxy_active == 2) {
+               printf("cpu_reset: BSP did not stop APs\n");
+               cpu_reset_real();
+       }
+       cpu_reset_proxy_active = 4;
+       wbinvd();
+       enable_intr();
        while (1);
        /* NOTREACHED */
 }
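
Read as a whole, the patch appears to turn cpu_reset_proxy_active into a five-value handshake (0 through 4) between the CPU that called cpu_reset() and the BSP running cpu_reset_proxy(). The sketch below is a single-threaded userland model of that reading, not kernel code: the two sides are stepped in turn, the AP-side timeout counter and the wbinvd() calls are left out, and all names (struct sim, ap_step(), bsp_step(), run_handshake()) are hypothetical.

```c
#include <assert.h>

/* Hypothetical program counters for the two sides of the handshake. */
enum { AP_WAIT_ANNOUNCE, AP_WAIT_ACK, AP_SPIN };
enum { BSP_START, BSP_WAIT_AP, BSP_WAIT_INTR, BSP_DONE };

struct sim {
    int active;           /* models cpu_reset_proxy_active */
    int ap_pc;            /* where the calling CPU is */
    int bsp_pc;           /* where the BSP is */
    int ap_intr_enabled;  /* models the calling CPU's interrupt flag */
};

/* One step of the CPU that called cpu_reset(). */
static void ap_step(struct sim *s)
{
    switch (s->ap_pc) {
    case AP_WAIT_ANNOUNCE:            /* wait for BSP to announce restart */
        if (s->active != 0) {
            s->ap_intr_enabled = 0;   /* disable_intr() */
            s->active = 2;
            s->ap_pc = AP_WAIT_ACK;
        }
        break;
    case AP_WAIT_ACK:                 /* wait for BSP to leave state 2 */
        if (s->active != 2) {
            s->active = 4;
            s->ap_intr_enabled = 1;   /* enable_intr() */
            s->ap_pc = AP_SPIN;
        }
        break;
    default:                          /* while (1); */
        break;
    }
}

/* One step of the BSP running the proxy function. */
static void bsp_step(struct sim *s)
{
    switch (s->bsp_pc) {
    case BSP_START:
        s->active = 1;
        s->bsp_pc = BSP_WAIT_AP;
        break;
    case BSP_WAIT_AP:                 /* wait for other cpu to see we started */
        if (s->active != 1) {
            s->active = 3;
            s->bsp_pc = BSP_WAIT_INTR;
        }
        break;
    case BSP_WAIT_INTR:               /* wait for other cpu to enable interrupts */
        if (s->active != 3)
            s->bsp_pc = BSP_DONE;     /* stop_cpus() would follow here */
        break;
    default:
        break;
    }
}

/* Returns the final state value, or -1 if the handshake did not finish. */
static int run_handshake(int max_steps)
{
    struct sim s = { 0, AP_WAIT_ANNOUNCE, BSP_START, 1 };
    int i;

    for (i = 0; i < max_steps && s.bsp_pc != BSP_DONE; i++) {
        ap_step(&s);
        bsp_step(&s);
    }
    return (s.bsp_pc == BSP_DONE && s.ap_intr_enabled) ? s.active : -1;
}
```

In this model the BSP only proceeds to the point where it would stop the other CPU once the state has reached 4, that is, after the calling CPU has re-enabled interrupts. Whether that ordering is the actual intent of the patch is an assumption drawn from its comments.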