Re: boot() called on cpu #1 - hang
On Mon, 10 Sep 2001 [EMAIL PROTECTED] wrote:

>> Hello Tor,
>>
>> thank you for your quick response; unfortunately, your patch did not fix
>> the problem.
>
> Your machine seems to hang too early for the patch to have any effect.
> (The patch affects a hang that occurs after the kernel has printed
> "cpu reset called on cpu#1" and "cpu_reset: Stopping other CPUs".)
>
>> I have now tested a little bit more with the following sequence:
>>
>>   boot machine to single-user
>>   reboot
>>
>> I did this more than 10 times. It now got stuck every time: approx.
>> 8 times with
>>
>>   boot() called on cpu #1
>>   W
>>
>> and 3 times with
>>
>>   boot() called on cpu #0
>>   Wa
>>
>> or
>>
>>   boot() called on cpu #0
>>   Waiting (max
>>
>> It looks to me as if the kernel printf somehow gets stuck.
>
> Did you use -O2 when compiling the kernel? That sometimes causes strange
> problems.
>
> The kernel doesn't appear to do much before printing the
> "Waiting (max %d seconds) for system process `%s' to stop" message in
> kproc_shutdown(). boot() in /usr/src/sys/kern/kern_shutdown.c contains
>
>   #ifdef SMP
>   	if (smp_active)
>   		printf("boot() called on cpu#%d\n", PCPU_GET(cpuid));
>   #endif
>
>   	/*
>   	 * Do any callouts that should be done BEFORE syncing the filesystems.
>   	 */
>   	EVENTHANDLER_INVOKE(shutdown_pre_sync, howto);
>
> where the EVENTHANDLER_INVOKE macro expands to a lockmgr() call and the
> invocation of the two events associated with shutdown_pre_sync:
>
>   kproc_shutdown(bufdaemonproc, howto)
>   kproc_shutdown(updateproc, howto)
>
> The normal output is
>
>   Waiting (max 60 seconds) for system process `bufdaemon' to stop...stopped
>   Waiting (max 60 seconds) for system process `syncer' to stop...stopped
>
> If the lockmgr lock for the event list is damaged, the lockmgr() call
> might cause further damage elsewhere. If a debug printf before the
> lockmgr() call in EVENTHANDLER_INVOKE() works while a debug printf after
> it isn't printed properly, then it becomes more likely that the problem
> is related to the lockmgr() call (cf. /usr/src/sys/sys/eventhandler.h).
>
> - Tor Egge

Hello Tor,

I have added a printf right before and right after the lockmgr() call in
the EVENTHANDLER_INVOKE() macro in /usr/src/sys/sys/eventhandler.h, but
both printfs work! The output I am getting then is:

  boot() called on cpu #1
  before lockmgr
  after lockmgr
  W

What else could I test?

Michael

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
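Tor's suggestion is a bisection by checkpoints: print a marker on each side of the lock and see which markers survive. A userspace sketch of that idea in C is below. It is not the real EVENTHANDLER_INVOKE() from /usr/src/sys/sys/eventhandler.h: struct event_list, list_lock(), list_unlock(), TRACE(), INVOKE_TRACED(), fake_kproc_shutdown() and run_pre_sync() are hypothetical stand-ins, and a plain flag replaces the lockmgr() lock.

```c
#include <assert.h>
#include <stdio.h>

typedef void (*handler_fn)(const char *name, int howto);

/* Hypothetical event list: a plain flag stands in for the lockmgr lock. */
struct event_list {
	int locked;
	int nhandlers;
	handler_fn fn[4];
	const char *name[4];
};

static int last_checkpoint;	/* highest checkpoint reached so far */

/* Print a marker and flush it at once, so a later hang cannot swallow it. */
#define TRACE(n, msg) do {					\
	printf("%s\n", (msg));					\
	fflush(stdout);						\
	last_checkpoint = (n);					\
} while (0)

static void list_lock(struct event_list *el)
{
	/* In the kernel a damaged lock could block here; the model asserts. */
	assert(!el->locked);
	el->locked = 1;
}

static void list_unlock(struct event_list *el)
{
	assert(el->locked);
	el->locked = 0;
}

/* Checkpoints bracket the lock; then the registered handlers run. */
#define INVOKE_TRACED(el, howto) do {				\
	TRACE(1, "before lockmgr");				\
	list_lock(el);						\
	TRACE(2, "after lockmgr");				\
	for (int i_ = 0; i_ < (el)->nhandlers; i_++)		\
		(el)->fn[i_]((el)->name[i_], (howto));		\
	list_unlock(el);					\
	TRACE(3, "after handlers");				\
} while (0)

/* Prints the normal per-process output Tor quotes for kproc_shutdown(). */
static void fake_kproc_shutdown(const char *name, int howto)
{
	(void)howto;
	printf("Waiting (max 60 seconds) for system process `%s' to stop...stopped\n",
	    name);
}

/* Runs the two shutdown_pre_sync handlers; returns the last checkpoint. */
static int run_pre_sync(void)
{
	struct event_list el = {
		.nhandlers = 2,
		.fn = { fake_kproc_shutdown, fake_kproc_shutdown },
		.name = { "bufdaemon", "syncer" },
	};

	last_checkpoint = 0;
	INVOKE_TRACED(&el, 0);
	return last_checkpoint;
}
```

Reading the markers as Tor suggests: "before lockmgr" without "after lockmgr" points at the lock; both markers followed by output that stops inside the first handler's message, as in Michael's "before lockmgr / after lockmgr / W", suggests the hang lies past the lock, in the handler and the printf it is executing.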
Re: boot() called on cpu #1 - hang
Hello Tor,

thank you for your quick response; unfortunately, your patch did not fix
the problem.

I have now tested a little bit more with the following sequence:

  boot machine to single-user
  reboot

I did this more than 10 times. It now got stuck every time: approx.
8 times with

  boot() called on cpu #1
  W

and 3 times with

  boot() called on cpu #0
  Wa

or

  boot() called on cpu #0
  Waiting (max

It looks to me as if the kernel printf somehow gets stuck.

Michael

On Sun, 9 Sep 2001 [EMAIL PROTECTED] wrote:

>> Hello,
>>
>> on a 5.0-current i386-SMP system of today I am still getting, on about
>> every second reboot, the message:
>>
>>   boot() called on cpu #1
>>   W
>
> Try applying the enclosed patch.
>
> - Tor Egge

--
michael class, viktor-renner str. 39, 72074 tuebingen, frg
E-Mail: [EMAIL PROTECTED]
Phone: +49 7031 14-3707 (work)
       +49 7071 81950 (private)
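The truncated lines ("W", "Wa", "Waiting (max") fit a simple picture: if console output is handed over one character at a time, a hang inside the output path freezes the line mid-word. The C sketch below models only that picture and says nothing about what actually blocks in the kernel; struct console, console_putc(), console_puts(), visible_output() and the stall_after knob are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical console model: what reached the screen so far, and after
 * how many characters the output path gets stuck. */
struct console {
	char buf[128];
	size_t len;
	size_t stall_after;
};

/* Accepts one character; returns false once the output path is "stuck". */
static bool console_putc(struct console *cn, char c)
{
	if (cn->len >= cn->stall_after || cn->len + 1 >= sizeof(cn->buf))
		return false;
	cn->buf[cn->len++] = c;
	cn->buf[cn->len] = '\0';
	return true;
}

/* Character-at-a-time output: everything before the stall stays visible. */
static void console_puts(struct console *cn, const char *s)
{
	assert(cn != NULL && s != NULL);
	for (; *s != '\0'; s++)
		if (!console_putc(cn, *s))
			return;	/* a real hang would simply never return */
}

/* Resets the model, "prints" msg, and returns what is on the screen. */
static const char *visible_output(struct console *cn, const char *msg,
    size_t stall_after)
{
	memset(cn, 0, sizeof(*cn));
	cn->stall_after = stall_after;
	console_puts(cn, msg);
	return cn->buf;
}
```

With stall_after set to 1, 2 and 12, the visible text for the kproc_shutdown() message is "W", "Wa" and "Waiting (max", the three truncations reported above.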
boot() called on cpu #1 - hang
Hello,

on a 5.0-current i386-SMP system of today I am still getting, on about
every second reboot, the message:

  boot() called on cpu #1
  W

and then the system hangs. When boot() is called on cpu #0, everything
works as expected. I think this started roughly two weeks ago, but I am
not sure whether back then boot() was simply always called on cpu #0, or
whether it actually worked when called on cpu #1.

Any suggestions?

Micha

dmesg from the system is:

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Sat Sep 8 10:04:26 MEST 2001
    [EMAIL PROTECTED]:/usr/src/sys/i386/compile/MCSMP2
Timecounter i8254 frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (998.36-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x686  Stepping = 6
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 1073676288 (1048512K bytes)
avail memory = 1040457728 (1016072K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee0
 cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id: 2, version: 0x00178011, at 0xfec0
Preloaded elf kernel kernel at 0xc04c7000.
Preloaded elf module acpi.ko at 0xc04c709c.
Pentium Pro MTRR support enabled
Using $PIR table, 8 entries at 0xc00fdbc0
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: VIA694 AWRDACPI on motherboard
acpi0: power button is handled as a fixed feature programming model.
Timecounter ACPI frequency 3579545 Hz
acpi_timer0: 24-bit timer at 3.579545MHz port 0x4008-0x400b on acpi0
acpi_cpu0: CPU on acpi0
acpi_cpu1: CPU on acpi0
acpi_button0: Power Button on acpi0
acpi_pcib0: Host-PCI bridge port 0x6000-0x607f,0x5000-0x500f,0x4080-0x40ff,0x4000-0x407f,0xcf8-0xcff on acpi0
IOAPIC #0 intpin 19 - irq 2
IOAPIC #0 intpin 16 - irq 5
IOAPIC #0 intpin 17 - irq 10
IOAPIC #0 intpin 18 - irq 11
pci0: PCI bus on acpi_pcib0
agp0: VIA 82C691 (Apollo Pro) host to PCI bridge mem 0xd000-0xd3ff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: VIA 82C686 ATA100 controller port 0xc000-0xc00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: serial bus, USB at device 7.2 (no driver attached)
pci0: serial bus, USB at device 7.3 (no driver attached)
pci0: bridge, PCI-unknown at device 7.4 (no driver attached)
pcm0: Creative EMU10K1 port 0xcc00-0xcc1f irq 5 at device 9.0 on pci0
sym0: 810 port 0xd400-0xd4ff mem 0xda003000-0xda0030ff irq 10 at device 10.0 on pci0
sym0: No NVRAM, ID 7, Fast-10, SE, parity checking
bktr0: BrookTree 878 mem 0xda00-0xda000fff irq 2 at device 12.0 on pci0
bti2c0: bt848 Hard/Soft I2C controller
iicbb0: I2C generic bit-banging driver on bti2c0
iicbus0: Philips I2C bus on iicbb0 master-only
smbus0: System Management Bus on bti2c0
smb0: SMBus general purpose I/O on smbus0
bktr0: Hauppauge Model 61344 D121
bktr0: Detected a MSP3410D-B4 at 0x80
bktr0: Hauppauge WinCast/TV, Philips FR1216 PAL FM tuner, msp3400c stereo, remote control.
pci0: multimedia at device 12.1 (no driver attached)
xl0: 3Com 3c905B-TX Fast Etherlink XL port 0xd800-0xd87f mem 0xda002000-0xda00207f irq 11 at device 13.0 on pci0
xl0: Ethernet address: 00:10:5a:d7:dd:9c
miibus0: MII bus on xl0
xlphy0: 3Com internal media interface on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
atapci1: HighPoint HPT370 ATA100 controller port 0xec00-0xecff,0xe800-0xe803,0xe400-0xe407,0xe000-0xe003,0xdc00-0xdc07 irq 11 at device 14.0 on pci0
ata2: at 0xdc00 on atapci1
ata3: at 0xe400 on atapci1
fdc0: NEC 72065B or clone port 0x3f7,0x3f0-0x3f5 irq 6 on acpi0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive on fdc0 drive 0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
sio1 port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0 port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppc1: cannot reserve I/O port range
atkbdc0: Keyboard controller (i8042) port 0x64,0x60 irq 1 on acpi0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
ppc1: cannot reserve I/O port range
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
orm0: Option ROM at iomem 0xc-0xc7fff on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
linprocfs registered
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
IPv6 packet filtering initialized, default to accept,
Re: boot() called on cpu #1 - hang
> Hello,
>
> on a 5.0-current i386-SMP system of today I am still getting, on about
> every second reboot, the message:
>
>   boot() called on cpu #1
>   W

Try applying the enclosed patch.

- Tor Egge

Index: vm_machdep.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/vm_machdep.c,v
retrieving revision 1.169
diff -u -r1.169 vm_machdep.c
--- vm_machdep.c	4 Sep 2001 08:36:46 -	1.169
+++ vm_machdep.c	4 Sep 2001 19:58:38 -
@@ -424,8 +433,13 @@
 {
 	cpu_reset_proxy_active = 1;
+	wbinvd();
 	while (cpu_reset_proxy_active == 1)
 		;	/* Wait for other cpu to see that we've started */
+	cpu_reset_proxy_active = 3;
+	wbinvd();
+	while (cpu_reset_proxy_active == 3)
+		;	/* Wait for other cpu to enable interrupts */
 	stop_cpus((1<<cpu_reset_proxyid));
 	printf("cpu_reset_proxy: Stopped CPU %d\n", cpu_reset_proxyid);
 	DELAY(100);
@@ -463,6 +477,7 @@
 		cpu_reset_proxyid = PCPU_GET(cpuid);
 		cpustop_restartfunc = cpu_reset_proxy;
 		cpu_reset_proxy_active = 0;
+		wbinvd();
 		printf("cpu_reset: Restarting BSP\n");
 		started_cpus = (1<<0);		/* Restart CPU #0 */
@@ -471,9 +486,19 @@
 			cnt++;	/* Wait for BSP to announce restart */
 		if (cpu_reset_proxy_active == 0)
 			printf("cpu_reset: Failed to restart BSP\n");
-		enable_intr();
+		disable_intr();
 		cpu_reset_proxy_active = 2;
-
+		wbinvd();
+		cnt = 0;
+		while (cpu_reset_proxy_active == 2 && cnt < 1000)
+			cnt++;	/* Wait for BSP to stop APs */
+		if (cpu_reset_proxy_active == 2) {
+			printf("cpu_reset: BSP did not stop APs\n");
+			cpu_reset_real();
+		}
+		cpu_reset_proxy_active = 4;
+		wbinvd();
+		enable_intr();
 		while (1);
 			/* NOTREACHED */
 	}
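The patch turns cpu_reset_proxy_active into a handshake between the AP running cpu_reset() and the BSP running cpu_reset_proxy(): 0 (AP restarts the BSP), 1 (BSP is running), 2 (AP has disabled interrupts), 3 (BSP has seen 2), 4 (AP has re-enabled interrupts, presumably so that the BSP's stop_cpus() IPI can be taken). The C sketch below is a single-threaded model of that sequence, not kernel code: struct model, ap_step(), bsp_step(), run_handshake() and SPIN_LIMIT are hypothetical, the two CPUs are stepped alternately, and every write is assumed to be visible to the other CPU immediately (caches and wbinvd() are not modelled).

```c
#include <assert.h>
#include <stdbool.h>

#define SPIN_LIMIT 1000	/* hypothetical bound for the AP's busy-waits */

enum ap_state  { AP_RESTART_BSP, AP_WAIT_ANNOUNCE, AP_BLOCK_INTR,
                 AP_WAIT_BSP, AP_RELEASE, AP_SPIN, AP_RESET_REAL };
enum bsp_state { BSP_HALTED, BSP_WAIT_SEEN, BSP_WAIT_INTR, BSP_DONE };

struct model {
	int proxy_active;	/* plays cpu_reset_proxy_active */
	bool bsp_responds;	/* false: CPU #0 never comes back */
	bool bsp_restarted;	/* AP has asked CPU #0 to restart */
	bool ap_intr_enabled;	/* AP interrupt flag */
	bool ap_stopped;	/* BSP's stop_cpus() could reach the AP */
	enum ap_state ap;
	enum bsp_state bsp;
	int cnt;		/* AP's bounded-wait counter */
};

/* One action of the AP in cpu_reset(). */
static void ap_step(struct model *m)
{
	switch (m->ap) {
	case AP_RESTART_BSP:	/* proxy_active = 0; restart CPU #0 */
		m->proxy_active = 0;
		m->bsp_restarted = true;
		m->cnt = 0;
		m->ap = AP_WAIT_ANNOUNCE;
		break;
	case AP_WAIT_ANNOUNCE:	/* bounded wait for the BSP to write 1 */
		if (m->proxy_active != 0 || ++m->cnt >= SPIN_LIMIT)
			m->ap = AP_BLOCK_INTR;
		break;
	case AP_BLOCK_INTR:	/* disable_intr(); proxy_active = 2 */
		m->ap_intr_enabled = false;
		m->proxy_active = 2;
		m->cnt = 0;
		m->ap = AP_WAIT_BSP;
		break;
	case AP_WAIT_BSP:	/* bounded wait for the BSP to move past 2 */
		if (m->proxy_active != 2)
			m->ap = AP_RELEASE;
		else if (++m->cnt >= SPIN_LIMIT)
			m->ap = AP_RESET_REAL;	/* "BSP did not stop APs" */
		break;
	case AP_RELEASE:	/* proxy_active = 4; enable_intr() */
		m->proxy_active = 4;
		m->ap_intr_enabled = true;
		m->ap = AP_SPIN;
		break;
	default:		/* AP_SPIN: while (1); AP_RESET_REAL: reset */
		break;
	}
}

/* One action of the BSP in cpu_reset_proxy(). */
static void bsp_step(struct model *m)
{
	switch (m->bsp) {
	case BSP_HALTED:	/* once restarted: proxy_active = 1 */
		if (m->bsp_restarted && m->bsp_responds) {
			m->proxy_active = 1;
			m->bsp = BSP_WAIT_SEEN;
		}
		break;
	case BSP_WAIT_SEEN:	/* while (== 1); then proxy_active = 3 */
		if (m->proxy_active != 1) {
			m->proxy_active = 3;
			m->bsp = BSP_WAIT_INTR;
		}
		break;
	case BSP_WAIT_INTR:	/* while (== 3); then stop_cpus() */
		if (m->proxy_active != 3) {
			m->ap_stopped = m->ap_intr_enabled; /* IPI needs intr on */
			m->bsp = BSP_DONE;
		}
		break;
	default:
		break;
	}
}

/* Steps both CPUs alternately; returns true if the BSP stopped the AP. */
static bool run_handshake(struct model *m, bool bsp_responds)
{
	*m = (struct model){ .proxy_active = -1, .bsp_responds = bsp_responds,
	    .ap_intr_enabled = true };
	for (int i = 0; i < 10 * SPIN_LIMIT && m->bsp != BSP_DONE &&
	    m->ap != AP_RESET_REAL; i++) {
		ap_step(m);
		bsp_step(m);
		/* While the word reads 2, the AP must have interrupts off. */
		assert(!(m->proxy_active == 2 && m->ap_intr_enabled));
	}
	return m->ap_stopped;
}
```

run_handshake(&m, true) finishes after a few steps with proxy_active == 4 and the AP's interrupts enabled when the BSP stops it; run_handshake(&m, false), where CPU #0 never comes back, ends with the AP on its cpu_reset_real() fallback.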