Re: 4-way SMP broken ?
On Wed, 09 Jun 1999, Luoqi Chen wrote:

>> I've been trying to install 19990604-CURRENT on a couple of SC450NX
>> boxes. It works fine with 2 CPUs, but an SMP kernel with 4 CPUs
>> falls over very quickly (I think while it's setting up the APIC
>> stuff, or very shortly after - the messages about APIC bus ids
>> appear on the screen very briefly, then the machine reboots itself).
>
> Do you mean messages like these?
>
> FreeBSD/SMP: Multiprocessor motherboard
>  cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfec08000
>  cpu1 (AP): apic id: 12, version: 0x00040011, at 0xfec08000
>  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
>
> By the time you see these messages, all CPUs should have been booted
> up successfully; any crash that immediately follows is not likely to
> be SMP related. It would help to pinpoint the crash if you could
> include the last few lines from a verbose boot.

I have added more debugging messages, and the crash appears to be
inside mp_start(). I don't have a log because this is too early in the
boot to get the messages saved anywhere, and they go by too quickly to
write them down.

The evidence that this is an SMP problem is simple - with 2 CPUs
plugged in, it works fine; with 3 or 4 CPUs plugged in, it crashes.
I believe the hardware is fine because I was previously running
19990421-CURRENT with all 4 CPUs without serious problems (it was a
little unstable, but always booted ok).

>> Does anyone know
>> a) when was the last time it worked on 4 CPUs
>> b) what's changed recently which might relate to this.

So if anyone has an answer to these questions I'd still be interested.

>> Also in trying to figure this out I looked at the DRAM probing code
>> in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks as
>> though it's not safe for 2GB (e.g. comparisons of byte addresses
>> against signed int end). It would also be good if this probing

I've tried various hacks to this code, but have not succeeded in
making it work for 4GB. Changing int end to vm_offset_t end is not
sufficient. It has a tendency to say "Too many holes in address
space ...". Even defining MAXMEM does not solve the problem.

Richard Cownie (t...@ma.ikos.com)

To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-current in the body of the message
Re: 4-way SMP broken ?
> I have added more debugging messages, and the crash appears to be
> inside mp_start(). I don't have a log because this is too early in
> the boot to get the messages saved anywhere, and they go by too
> quickly to write them down.
>
> The evidence that this is an SMP problem is simple - with 2 CPUs
> plugged in, it works fine; with 3 or 4 CPUs plugged in, it crashes.

Could you narrow down the crash further inside mp_start()? I'd like to
know whether the crash occurred inside start_all_aps(). One or two
lines of debug messages would be really helpful; you don't have to
write down the exact words. Do you have options DDB enabled in the
kernel? It helps to stop the last few lines of console messages from
scrolling off the screen.

> I believe the hardware is fine because I was previously running
> 19990421-CURRENT with all 4 CPUs without serious problems (it was a
> little unstable, but always booted ok).

If possible, could you try a kernel built from sources with the
POST_SMP_VMSHARE tag? I may have broken something during the commit.

-lq
Re: 4-way SMP broken ?
On Thu, 10 Jun 1999, Luoqi Chen wrote:

> Could you narrow down the crash further inside mp_start()? I'd like
> to know whether the crash occurred inside start_all_aps(). One or two
> lines of debug messages would be really helpful; you don't have to
> write down the exact words. Do you have options DDB enabled in the
> kernel? It helps to stop the last few lines of console messages from
> scrolling off the screen.

Yes, I added more messages and it's inside start_all_aps() - it seems
to start AP #1 ok, then crashes while starting AP #2.

> If possible, could you try a kernel built from sources with the
> POST_SMP_VMSHARE tag? I may have broken something during the commit.

Have to get out the door right now, will try this either tomorrow
morning or Monday.

Thanks
Richard Cownie
4-way SMP broken ?
Hi,

I've been trying to install 19990604-CURRENT on a couple of SC450NX
boxes. It works fine with 2 CPUs, but an SMP kernel with 4 CPUs falls
over very quickly (I think while it's setting up the APIC stuff, or
very shortly after - the messages about APIC bus ids appear on the
screen very briefly, then the machine reboots itself).

Does anyone know
a) when was the last time it worked on 4 CPUs
b) what's changed recently which might relate to this.

Also in trying to figure this out I looked at the DRAM probing code in
/usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks as though
it's not safe for 2GB (e.g. comparisons of byte addresses against
signed int end). It would also be good if this probing code was
careful not to venture past 4GB-64MB (PCI device space) - then a
generic kernel could work on a 4GB machine without any tweaking, which
would simplify installation - I get nervous shuffling DIMMs in and out
of the machine ...

Thanks
Richard Cownie (t...@ma.ikos.com)
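
[Editor's sketch] The suggestion above, that the memory probe should stop short of the PCI device space just below 4GB, can be illustrated with a small clamp. This is only a sketch: the names FOUR_GB, PCI_HOLE_BYTES, PROBE_CEILING and clamp_probe_end() are hypothetical and do not come from machdep.c, and the 64MB hole size is simply the figure given in the message.

```c
#include <stdint.h>

/*
 * Hypothetical constants: the message proposes not probing past
 * 4GB - 64MB, treating the top 64MB as PCI device space.
 */
#define FOUR_GB        (UINT64_C(1) << 32)
#define PCI_HOLE_BYTES (UINT64_C(64) << 20)
#define PROBE_CEILING  (FOUR_GB - PCI_HOLE_BYTES)

/*
 * Return the exclusive upper bound the probe should use: the reported
 * end of memory, or the ceiling if the reported end lies above it.
 * 64-bit arithmetic is used so that "4GB" itself is representable.
 */
static uint64_t
clamp_probe_end(uint64_t reported_end)
{
	return (reported_end > PROBE_CEILING) ? PROBE_CEILING : reported_end;
}
```

With such a clamp, a machine reporting a full 4GB would be probed only up to 0xFC000000, while smaller configurations would be unaffected.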
Re: 4-way SMP broken ?
> Hi,
>
> I've been trying to install 19990604-CURRENT on a couple of SC450NX
> boxes. It works fine with 2 CPUs, but an SMP kernel with 4 CPUs
> falls over very quickly (I think while it's setting up the APIC
> stuff, or very shortly after - the messages about APIC bus ids
> appear on the screen very briefly, then the machine reboots itself).

Do you mean messages like these?

FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfec08000
 cpu1 (AP): apic id: 12, version: 0x00040011, at 0xfec08000
 io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0

By the time you see these messages, all CPUs should have been booted
up successfully; any crash that immediately follows is not likely to
be SMP related. It would help to pinpoint the crash if you could
include the last few lines from a verbose boot.

> Does anyone know
> a) when was the last time it worked on 4 CPUs
> b) what's changed recently which might relate to this.
>
> Also in trying to figure this out I looked at the DRAM probing code
> in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks as
> though it's not safe for 2GB (e.g. comparisons of byte addresses
> against signed int end). It would also be good if this probing code
> was careful not to venture past 4GB-64MB (PCI device space) - then a
> generic kernel could work on a 4GB machine without any tweaking,
> which would simplify installation - I get nervous shuffling DIMMs in
> and out of the machine ...
>
> Thanks
> Richard Cownie (t...@ma.ikos.com)

-lq
Re: 4-way SMP broken ?
In reply:

>> Hi,
>>
>> I've been trying to install 19990604-CURRENT on a couple of SC450NX
>> boxes. It works fine with 2 CPUs, but an SMP kernel with 4 CPUs
>> falls over very quickly (I think while it's setting up the APIC
>> stuff, or very shortly after - the messages about APIC bus ids
>> appear on the screen very briefly, then the machine reboots itself).
>
> Do you mean messages like these?
>
> FreeBSD/SMP: Multiprocessor motherboard
>  cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfec08000
>  cpu1 (AP): apic id: 12, version: 0x00040011, at 0xfec08000
>  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
>
> By the time you see these messages, all CPUs should have been booted
> up successfully; any crash that immediately follows is not likely to
> be SMP related. It would help to pinpoint the crash if you could
> include the last few lines from a verbose boot.

interesting. then why the delay in bringing up the AP? Note in the
dmesg output below, that the AP only comes up during the SCSI delay.
I have also added other comments to the following output.

--- [Last night's kernel]

Copyright (c) 1992-1999 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California. All rights reserved.
FreeBSD 4.0-CURRENT #7: Wed Jun 9 16:10:23 CDT 1999
    jbry...@wahoo:/usr/src/sys/compile/WAHOO
Timecounter i8254 frequency 1192990 Hz
CPU: Pentium II/Xeon/Celeron (686-class CPU)
  Origin = GenuineIntel Id = 0x650 Stepping=0
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory = 134217728 (131072K bytes)
avail memory = 126902272 (123928K bytes)
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee0
 cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec0
Preloaded elf kernel kernel at 0xc0393000.
Preloaded userconfig_script /boot/kernel.conf at 0xc039309c.
DEVFS: ready for devices
Pentium Pro MTRR support enabled, default memory type is uncacheable
ipl: ERROR: driver has bogus cdevsw->d_maj = -1
^^ ??
ccd0-3: Concatenated disk drivers
Probing for PnP devices:
CSN 1 Vendor ID: YMH0802 [0x0208a865] Serial 0x Comp ID: PNPb02f [0x2fb0d041]
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: PCI host bus adapter on motherboard
pci0: PCI bus on pcib0
chip0: Intel 82443LX (440 LX) host to PCI bridge at device 0.0 on pci0
pcib1: Intel 82443LX (440 LX) PCI-PCI (AGP) bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
vga-pci0: Matrox model 0521 graphics accelerator irq 2 at device 0.0 on pci1
isab0: Intel 82371AB PCI to ISA bridge at device 7.0 on pci0
chip1: Intel PIIX4 IDE controller at device 7.1 on pci0
uhci0: Intel 82371AB/EB (PIIX4) USB Host Controller irq 19 at device 7.2 on pci0
usb0: Intel 82371AB/EB (PIIX4) USB Host Controller on uhci0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
intpm0: Intel 82371AB Power management controller at device 7.3 on pci0
intpm0: I/O mapped fcb0
intpm0: intr IRQ 9 enabled revision 0
intsmb0: Intel PIIX4 SMBUS Interface
smbus0: System Management Bus on intsmb0
intpm0: PM I/O mapped fc00
ed0: NE2000 PCI Ethernet (RealTek 8029) irq 17 at device 12.0 on pci0
ed0: address 00:00:e8:4e:0e:16, type NE2000 (16 bit)
ahc0: Adaptec aic7895 Ultra SCSI adapter irq 19 at device 15.0 on pci0
ahc0: Using left over BIOS settings
ahc0: aic7895 Wide Channel A, SCSI Id=7, 255 SCBs
ahc1: Adaptec aic7895 Ultra SCSI adapter irq 16 at device 15.1 on pci0
ahc1: Using left over BIOS settings
ahc1: aic7895 Wide Channel B, SCSI Id=7, 255 SCBs
devclass_alloc_unit: ed0 already exists, using next available unit number
^ ??
isa0: ISA bus on motherboard
fdc0: NEC 72065B or clone at port 0x3f0-0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive at fdc0 drive 0
ppc0 at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
plip0: PLIP network interface on ppbus 0
lpt0: generic printer on ppbus 0
lpt0: Interrupt-driven port
ppi0: generic parallel i/o on ppbus 0
lppps0: Pulse per second Timing Interface on ppbus 0
sio0 at port 0x3f8-0x3ff irq 4 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
joy0 at port 0x201 on isa0
joy0: joystick
atkbdc0: keyboard controller (i8042) at port 0x60-0x6f on isa0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: Generic ISA VGA on isa0
sc0: System console at flags 0x6 on isa0
pca0 at port 0x40 on isa0
pca0: PC
Re: 4-way SMP broken ?
>> Do you mean messages like these?
>>
>> FreeBSD/SMP: Multiprocessor motherboard
>>  cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfec08000
>>  cpu1 (AP): apic id: 12, version: 0x00040011, at 0xfec08000
>>  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
>>
>> By the time you see these messages, all CPUs should have been booted
>> up successfully; any crash that immediately follows is not likely to
>> be SMP related. It would help to pinpoint the crash if you could
>> include the last few lines from a verbose boot.
>
> interesting. then why the delay in bringing up the AP? Note in the
> dmesg output below, that the AP only comes up during the SCSI delay.
> I have also added other comments to the following output.

The APs are up, but not fully initialized. Initializations that
require holding the giant lock are done near the end of the boot
process; until then the APs are just spinning on the lock. Tor Egge
tried once to move this to an earlier time, but it didn't work well on
some motherboards.

-lq
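
[Editor's sketch] The behaviour described above, where APs are running but keep spinning until the boot processor gives up the giant lock, can be modelled in user space with a C11 atomic flag. This is an illustration only, not the kernel's locking code: giant_lock, giant_try_acquire(), giant_release() and ap_try_finish_init() are hypothetical names, and each call to ap_try_finish_init() stands for one pass of an AP's spin loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the giant kernel lock: set means "held". */
static atomic_flag giant_lock = ATOMIC_FLAG_INIT;

/* Number of APs that have finished their late initialization. */
static int aps_initialized = 0;

/* Try once to take the lock; returns true if the caller now holds it. */
static bool
giant_try_acquire(void)
{
	return !atomic_flag_test_and_set_explicit(&giant_lock,
	    memory_order_acquire);
}

/* Release the lock so that a spinning CPU can take it. */
static void
giant_release(void)
{
	atomic_flag_clear_explicit(&giant_lock, memory_order_release);
}

/*
 * One pass of an AP's spin loop: if the lock is still held (for
 * example by the BSP during device probing), report failure so the
 * caller keeps spinning; otherwise do the lock-protected part of the
 * initialization, release the lock, and report success.
 */
static bool
ap_try_finish_init(void)
{
	if (!giant_try_acquire())
		return false;
	aps_initialized++;
	giant_release();
	return true;
}
```

In this model an AP would loop on ap_try_finish_init() until it returns true, which only happens once the lock holder has called giant_release(). That is why, in the dmesg output quoted earlier, the AP appears to come up late in the boot.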
Re: 4-way SMP broken ?
> interesting. then why the delay in bringing up the AP? Note in the
> dmesg output below, that the AP only comes up during the SCSI delay.
> I have also added other comments to the following output.

The APs need the giant kernel lock when initializing the local APIC
and printing the "launched" message.

I added code for bringing up the APs earlier, but had to disable it,
since it caused some machines to hang. (The APs were probably launched
too early, causing the BSP to attempt to send IPIs before the local
APIC was initialized.)

A revised patch for bringing up the APs early is enclosed.

- Tor Egge

Index: mp_machdep.c
===
RCS file: /home/ncvs/src/sys/i386/i386/mp_machdep.c,v
retrieving revision 1.102
diff -u -r1.102 mp_machdep.c
--- mp_machdep.c	1999/06/01 18:19:42	1.102
+++ mp_machdep.c	1999/06/08 00:27:19
@@ -494,6 +494,10 @@
 
 
 #if defined(APIC_IO)
+
+/* Wait for all APs to be fully initialized */
+extern int wait_ap(unsigned int);
+
 /*
  * Final configuration of the BSP's local APIC:
  *  - disable 'pic mode'.
@@ -526,6 +530,9 @@
 
 	if (bootverbose)
 		apic_dump("bsp_apic_configure()");
+	wait_ap(100);
+	if (smp_started == 0)
+		printf("WARNING: Failed to start all APs\n");
 }
 #endif  /* APIC_IO */
 
@@ -1743,9 +1750,6 @@
 #endif /* USE_CLOCKLOCK */
 }
 
-
-/* Wait for all APs to be fully initialized */
-extern int wait_ap(unsigned int);
 
 /*
  * start each AP in our list
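
[Editor's sketch] The patch above calls wait_ap(100) and then warns if smp_started is still zero. The general pattern, a bounded wait on a "started" flag followed by a check, might look roughly like the following. This is not the kernel's wait_ap(), whose body is not shown in the thread: wait_until_started() is a hypothetical name, and it polls a fixed number of times instead of using the kernel's delay primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Poll the flag at most max_polls times and report whether it was
 * observed non-zero. A kernel would sleep or busy-delay between polls;
 * this sketch leaves the delay out so it stays portable.
 */
static bool
wait_until_started(atomic_int *started, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (atomic_load(started) != 0)
			return true;
		/* (delay between polls would go here) */
	}
	/* One last look, so that max_polls == 0 still checks the flag. */
	return atomic_load(started) != 0;
}
```

The point of bounding the wait is that a failed AP start turns into a warning the boot processor can print, instead of a silent hang.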
Re: 4-way SMP broken ?
In article local.mail.freebsd-current/199906092225.saa01...@par28.ma.ikos.com you write:

> Also in trying to figure this out I looked at the DRAM probing code
> in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks as
> though it's not safe for 2GB (e.g. comparisons of byte addresses
> against signed int end).

I just made this into a vm_offset_t, so it should be good for up to
4GB.

--
Jonathan
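
[Editor's sketch] Why a signed int end stops working at 2GB can be shown with a tiny comparison. The functions below are hypothetical stand-ins and do not reproduce getmemsize(). The sketch assumes 32-bit addresses and the usual two's-complement wrap when 0x80000000 is converted to a signed 32-bit value (implementation-defined in C, but what mainstream compilers do).

```c
#include <stdint.h>

/*
 * Signed form: once "end" reaches 2GB (0x80000000) it becomes
 * negative, so no non-negative address ever compares below it.
 */
static int
below_end_signed(int32_t addr, int32_t end)
{
	return (addr < end);
}

/*
 * Unsigned form, the effect of switching to a 32-bit unsigned
 * address type: the comparison stays correct up to 4GB - 1.
 */
static int
below_end_unsigned(uint32_t addr, uint32_t end)
{
	return (addr < end);
}
```

For example, with addr at 1MB and end at 2GB, the unsigned comparison says the address is inside memory, while the signed one says it is not. This covers only the comparison itself; as noted elsewhere in the thread, other parts of the probe may still need changes for a 4GB machine.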
Re: 4-way SMP broken ?
In reply:

>> interesting. then why the delay in bringing up the AP? Note in the
>> dmesg output below, that the AP only comes up during the SCSI delay.
>> I have also added other comments to the following output.
>
> The APs need the giant kernel lock when initializing the local APIC
> and printing the "launched" message.
>
> I added code for bringing up the APs earlier, but had to disable it,
> since it caused some machines to hang. (The APs were probably
> launched too early, causing the BSP to attempt to send IPIs before
> the local APIC was initialized.)
>
> A revised patch for bringing up the APs early is enclosed.
>
> - Tor Egge
>
> Index: mp_machdep.c
> ===
> RCS file: /home/ncvs/src/sys/i386/i386/mp_machdep.c,v
> retrieving revision 1.102
> diff -u -r1.102 mp_machdep.c
> --- mp_machdep.c	1999/06/01 18:19:42	1.102
> +++ mp_machdep.c	1999/06/08 00:27:19

the second hunk rejects.

i show this:

-rw-r--r-- 1 root wheel 62767 Jun 1 23:38 /usr/src/sys/i386/i386/mp_machdep.c

i'll apply this by hand a little later, i'll get back with you on
compatibility. I am using a Tyan S1696DLUA Thunder2 motherboard.

jim
--
All opinions expressed are mine, if you    | I will not be pushed, stamped,
think otherwise, then go jump into turbid  | briefed, debriefed, indexed, or
radioactive waters and yell WAHOO !!!      | numbered! - #1, The Prisoner
--
Inet: jbry...@tfs.net AX.25: kc5...@wv0t.#neks.ks.usa.noam grid: EM28pw
voice: KC5VDJ - 6 2 Meters AM/FM/SSB, 70cm FM. http://www.tfs.net/~jbryant
--
HF/6M/2M: IC-706-MkII, 2M: HTX-212, 2M: HTX-202, 70cm: HTX-404, Packet: KPC-3+