Re: 4-way SMP broken ?

1999-06-10 Thread Richard Cownie
On Wed, 09 Jun 1999, Luoqi Chen wrote:
  I've been trying to install 19990604-CURRENT on a couple of SC450NX
  boxes.  It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
  falls over very quickly (I think while it's setting up the APIC
  stuff, or very shortly after - the messages about APIC bus ids appear
  on the screen very briefly, then the machine reboots itself).
  
 Do you mean messages like these?
 FreeBSD/SMP: Multiprocessor motherboard
  cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfec08000
  cpu1 (AP):  apic id: 12, version: 0x00040011, at 0xfec08000
  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
 By the time you see these messages, all cpus should have been booted up
 successfully; any crash that immediately follows is not likely to be SMP-related.
 It would help to pinpoint the crash if you could include the last few lines
 from a verbose boot.

I have added more debugging messages, and the crash appears to be inside
mp_start().  I don't have a log because this is too early in the boot
to get the messages saved anywhere, and they go by too quickly to
write them down.  The evidence that this is an SMP problem is simple -
with 2 cpu's plugged in, it works fine;  with 3 or 4 cpu's plugged in,
it crashes.

I believe the hardware is fine because I was previously running 
19990421-CURRENT with all 4 cpu's without serious problems (it was
a little unstable, but always booted ok).

  Does anyone know a) when was the last time it worked on 4 cpu's
  b) what's changed recently which might relate to this.

So if anyone has an answer to these questions I'd still be interested.

  Also in trying to figure this out I looked at the DRAM probing
  code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
  as though it's not safe for 2GB (e.g. comparisons of byte addresses
  against signed int end).  It would also be good if this probing

I've tried various hacks to this code, but have not succeeded in making it
work for 4GB.  Changing int end to vm_offset_t end is not sufficient.
It has a tendency to say "Too many holes in address space ...".  Even
defining MAXMEM does not solve the problem.

Richard Cownie (t...@ma.ikos.com)


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-current in the body of the message



Re: 4-way SMP broken ?

1999-06-10 Thread Luoqi Chen
 I have added more debugging messages, and the crash appears to be inside
 mp_start().  I don't have a log because this is too early in the boot
 to get the messages saved anywhere, and they go by too quickly to
 write them down.  The evidence that this is an SMP problem is simple -
 with 2 cpu's plugged in, it works fine;  with 3 or 4 cpu's plugged in,
 it crashes.
 
Could you narrow down the crash further inside mp_start()? I'd like to
know whether the crash occurred inside start_all_aps(). One or two lines of
debug messages would be really helpful; you don't have to write down the exact
words. Do you have options DDB enabled in the kernel? It helps to stop
the last few lines of console messages from scrolling off the screen.
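
One way to get the one or two lines of debug output asked for here is to
print a marker immediately before each AP is started, so that the last
marker left on the screen names the AP that was being started when the
machine went down.  The sketch below is only an illustration under
assumptions: start_aps_traced() and fake_start_ap() are hypothetical names,
the failing AP number is made up, and a userland printf stands in for the
kernel's.  It is not code from mp_machdep.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-AP startup step; here AP #2 "fails". */
static int fake_start_ap(int ap)
{
    return ap != 2;
}

/*
 * Start APs 1..n_aps, printing a marker before each attempt.
 * Returns 0 if all started, otherwise the number of the AP that failed.
 * In a real crash nothing returns; the last marker printed is the clue.
 */
static int start_aps_traced(int n_aps, int (*start_one)(int))
{
    assert(start_one != NULL);
    for (int ap = 1; ap <= n_aps; ap++) {
        printf("starting AP #%d\n", ap);
        fflush(stdout);             /* make sure the marker is visible first */
        if (!start_one(ap)) {
            printf("AP #%d did not start\n", ap);
            return ap;
        }
    }
    return 0;
}
```

Calling start_aps_traced(3, fake_start_ap) prints a marker for AP #1 and
AP #2 and then reports AP #2, which is the kind of information that
separates "crashed before any AP started" from "crashed on a later AP".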

 I believe the hardware is fine because I was previously running 
 19990421-CURRENT with all 4 cpu's without serious problems (it was
 a little unstable, but always booted ok).
 
If possible, could you try a kernel built from sources with the
POST_SMP_VMSHARE tag? I may have broken something during the commit.

-lq





Re: 4-way SMP broken ?

1999-06-10 Thread Richard Cownie
On Thu, 10 Jun 1999, Luoqi Chen wrote:
 Could you narrow down the crash further inside mp_start()? I'd like to
 know whether the crash occurred inside start_all_aps(). One or two lines of
 debug messages would be really helpful; you don't have to write down the exact
 words. Do you have options DDB enabled in the kernel? It helps to stop
 the last few lines of console messages from scrolling off the screen.

Yes, I added more messages and it's inside start_all_aps() - it seems
to start AP #1 ok, then crashes while starting AP #2.  

 If possible, could you try a kernel built from sources with the
 POST_SMP_VMSHARE tag? I may have broken something during the commit.

Have to get out the door right now, will try this either tomorrow morning
or Monday.

Thanks
Richard Cownie





4-way SMP broken ?

1999-06-09 Thread Richard Cownie
Hi,

I've been trying to install 19990604-CURRENT on a couple of SC450NX
boxes.  It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
falls over very quickly (I think while it's setting up the APIC
stuff, or very shortly after - the messages about APIC bus ids appear
on the screen very briefly, then the machine reboots itself).

Does anyone know a) when was the last time it worked on 4 cpu's
b) what's changed recently which might relate to this.

Also in trying to figure this out I looked at the DRAM probing
code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
as though it's not safe for 2GB (e.g. comparisons of byte addresses
against signed int end).  It would also be good if this probing
code was careful not to venture past 4GB-64MB (PCI device space) -
then a generic kernel could work on a 4GB machine without any tweaking,
which would simplify installation - I get nervous shuffling DIMMs
in and out of the machine ...
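
To make the two points above concrete, here is a minimal sketch of why a
signed 32-bit "end" stops working at 2GB, and of what capping the probe at
4GB-64MB could look like.  It is not the getmemsize() code: the function
names and the PROBE_LIMIT constant are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit: 4GB minus 64MB reserved for PCI device space. */
#define PROBE_LIMIT (0x100000000ULL - 64ULL * 1024 * 1024)

/* Comparison with both byte addresses held in signed 32-bit ints. */
static int below_end_signed(int32_t addr, int32_t end)
{
    return addr < end;
}

/* The same comparison with unsigned 32-bit byte addresses. */
static int below_end_unsigned(uint32_t addr, uint32_t end)
{
    return addr < end;
}

/* Clamp a probed end-of-memory address so probing stops short of PCI space. */
static uint32_t clamp_probe_end(uint64_t end)
{
    assert(PROBE_LIMIT <= UINT32_MAX);  /* the limit itself fits in 32 bits */
    return (uint32_t)(end > PROBE_LIMIT ? PROBE_LIMIT : end);
}
```

The bit pattern 0x80000000 (2GB) read as a signed 32-bit int is INT32_MIN,
so below_end_signed(0x7ff00000, INT32_MIN) is false even though the address
is below 2GB, while below_end_unsigned(0x7ff00000u, 0x80000000u) is true.
clamp_probe_end(0x100000000ULL) gives 0xfc000000, i.e. 4GB-64MB.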

Thanks
   Richard Cownie (t...@ma.ikos.com)





Re: 4-way SMP broken ?

1999-06-09 Thread Luoqi Chen
 Hi,
 
 I've been trying to install 19990604-CURRENT on a couple of SC450NX
 boxes.  It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
 falls over very quickly (I think while it's setting up the APIC
 stuff, or very shortly after - the messages about APIC bus ids appear
 on the screen very briefly, then the machine reboots itself).
 
Do you mean messages like these?
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfec08000
 cpu1 (AP):  apic id: 12, version: 0x00040011, at 0xfec08000
 io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
By the time you see these messages, all cpus should have been booted up
successfully; any crash that immediately follows is not likely to be SMP-related.
It would help to pinpoint the crash if you could include the last few lines
from a verbose boot.

 Does anyone know a) when was the last time it worked on 4 cpu's
 b) what's changed recently which might relate to this.
 
 Also in trying to figure this out I looked at the DRAM probing
 code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
 as though it's not safe for 2GB (e.g. comparisons of byte addresses
 against signed int end).  It would also be good if this probing
 code was careful not to venture past 4GB-64MB (PCI device space) -
 then a generic kernel could work on a 4GB machine without any tweaking,
 which would simplify installation - I get nervous shuffling DIMMs
 in and out of the machine ...
 
 Thanks
Richard Cownie (t...@ma.ikos.com)
 

-lq





Re: 4-way SMP broken ?

1999-06-09 Thread Jim Bryant
In reply:
  Hi,
  
  I've been trying to install 19990604-CURRENT on a couple of SC450NX
  boxes.  It works fine with 2 cpu's, but an SMP kernel with 4 cpu's
  falls over very quickly (I think while it's setting up the APIC
  stuff, or very shortly after - the messages about APIC bus ids appear
  on the screen very briefly, then the machine reboots itself).
  
 Do you mean messages like these?
 FreeBSD/SMP: Multiprocessor motherboard
  cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfec08000
  cpu1 (AP):  apic id: 12, version: 0x00040011, at 0xfec08000
  io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
 By the time you see these messages, all cpus should have been booted up
 successfully; any crash that immediately follows is not likely to be SMP-related.
 It would help to pinpoint the crash if you could include the last few lines
 from a verbose boot.

interesting.  then why the delay in bringing up the AP?  Note in the
dmesg output below, that the AP only comes up during the SCSI delay.  I
have also added other comments to the following output.

---
[Last night's kernel]

Copyright (c) 1992-1999 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
FreeBSD 4.0-CURRENT #7: Wed Jun  9 16:10:23 CDT 1999
jbry...@wahoo:/usr/src/sys/compile/WAHOO
Timecounter i8254  frequency 1192990 Hz
CPU: Pentium II/Xeon/Celeron (686-class CPU)
  Origin = GenuineIntel  Id = 0x650  Stepping=0
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 134217728 (131072K bytes)
avail memory = 126902272 (123928K bytes)
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec0
Preloaded elf kernel kernel at 0xc0393000.
Preloaded userconfig_script /boot/kernel.conf at 0xc039309c.
DEVFS: ready for devices
Pentium Pro MTRR support enabled, default memory type is uncacheable
ipl: ERROR: driver has bogus cdevsw->d_maj = -1
 ^^ ??
ccd0-3: Concatenated disk drivers
Probing for PnP devices:
CSN 1 Vendor ID: YMH0802 [0x0208a865] Serial 0x Comp ID: PNPb02f [0x2fb0d041]
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: PCI host bus adapter on motherboard
pci0: PCI bus on pcib0
chip0: Intel 82443LX (440 LX) host to PCI bridge at device 0.0 on pci0
pcib1: Intel 82443LX (440 LX) PCI-PCI (AGP) bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
vga-pci0: Matrox model 0521 graphics accelerator irq 2 at device 0.0 on pci1
isab0: Intel 82371AB PCI to ISA bridge at device 7.0 on pci0
chip1: Intel PIIX4 IDE controller at device 7.1 on pci0
uhci0: Intel 82371AB/EB (PIIX4) USB Host Controller irq 19 at device 7.2 on pci0
usb0: Intel 82371AB/EB (PIIX4) USB Host Controller on uhci0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
intpm0: Intel 82371AB Power management controller at device 7.3 on pci0
intpm0: I/O mapped fcb0
intpm0: intr IRQ 9 enabled revision 0
intsmb0: Intel PIIX4 SMBUS Interface
smbus0: System Management Bus on intsmb0
intpm0: PM I/O mapped fc00 
ed0: NE2000 PCI Ethernet (RealTek 8029) irq 17 at device 12.0 on pci0
ed0: address 00:00:e8:4e:0e:16, type NE2000 (16 bit) 
ahc0: Adaptec aic7895 Ultra SCSI adapter irq 19 at device 15.0 on pci0
ahc0: Using left over BIOS settings
ahc0: aic7895 Wide Channel A, SCSI Id=7, 255 SCBs
ahc1: Adaptec aic7895 Ultra SCSI adapter irq 16 at device 15.1 on pci0
ahc1: Using left over BIOS settings
ahc1: aic7895 Wide Channel B, SCSI Id=7, 255 SCBs
devclass_alloc_unit: ed0 already exists, using next available unit number
^ ??
isa0: ISA bus on motherboard
fdc0: NEC 72065B or clone at port 0x3f0-0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive at fdc0 drive 0
ppc0 at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
plip0: PLIP network interface on ppbus 0
lpt0: generic printer on ppbus 0
lpt0: Interrupt-driven port
ppi0: generic parallel i/o on ppbus 0
lppps0: Pulse per second Timing Interface on ppbus 0
sio0 at port 0x3f8-0x3ff irq 4 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
joy0 at port 0x201 on isa0
joy0: joystick
atkbdc0: keyboard controller (i8042) at port 0x60-0x6f on isa0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: Generic ISA VGA on isa0
sc0: System console at flags 0x6 on isa0
pca0 at port 0x40 on isa0
pca0: PC 

Re: 4-way SMP broken ?

1999-06-09 Thread Luoqi Chen
  Do you mean messages like these?
  FreeBSD/SMP: Multiprocessor motherboard
   cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfec08000
   cpu1 (AP):  apic id: 12, version: 0x00040011, at 0xfec08000
   io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec0
  By the time you see these messages, all cpus should have been booted up
  successfully; any crash that immediately follows is not likely to be SMP-related.
  It would help to pinpoint the crash if you could include the last few lines
  from a verbose boot.
 
 interesting.  then why the delay in bringing up the AP?  Note in the
 dmesg output below, that the AP only comes up during the SCSI delay.  I
 have also added other comments to the following output.
 
The APs are up, but not fully initialized. Initializations that require
holding the giant lock are done near the end of the boot process;
until then the APs just spin on the lock. Tor Egge once tried to move
this to an earlier point, but it didn't work well on some motherboards.
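
As a rough picture of "spinning on the lock", the sketch below uses a C11
atomic_flag as a stand-in for the giant lock.  All names here are
hypothetical, the spin is bounded so the example terminates, and the real
kernel lock is implemented differently; treat this as an illustration of
the idea only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the giant kernel lock. */
static atomic_flag giant = ATOMIC_FLAG_INIT;

/* Try once to take the lock; true on success. */
static bool giant_try_acquire(void)
{
    return !atomic_flag_test_and_set_explicit(&giant, memory_order_acquire);
}

/* Drop the lock so a spinning CPU can take it. */
static void giant_release(void)
{
    atomic_flag_clear_explicit(&giant, memory_order_release);
}

/*
 * What an AP does while the BSP still holds the lock: keep retrying.
 * Returns the number of failed attempts before success, or -1 if the
 * lock was never released within max_spins attempts.
 */
static long ap_spin_for_giant(long max_spins)
{
    assert(max_spins >= 0);
    for (long spins = 0; spins < max_spins; spins++) {
        if (giant_try_acquire())
            return spins;
    }
    return -1;
}
```

While the boot CPU holds the flag, ap_spin_for_giant() makes no progress
and returns -1; once giant_release() has been called, the next attempt
succeeds immediately and the AP can finish its own initialization.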

-lq





Re: 4-way SMP broken ?

1999-06-09 Thread Tor . Egge

 interesting.  then why the delay in bringing up the AP?  Note in the
 dmesg output below, that the AP only comes up during the SCSI delay.  I
 have also added other comments to the following output.

The APs need the giant kernel lock when initializing the 
local APIC and printing the launched message.

I added code for bringing up the APs earlier, but had to disable it,
since it caused some machines to hang.  The APs were probably launched
too early, causing the BSP to attempt to send IPIs before the local
APIC was initialized.
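
The patch in this message calls wait_ap(100) and then checks smp_started.
The body of wait_ap() and the meaning of its argument are not shown in this
thread, so the following is only a sketch of the general pattern (bounded
polling, then a check), with hypothetical names throughout: aps_ready
stands in for smp_started, and simulated_delay() replaces a real delay so
the example can run in userland.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int aps_ready;   /* hypothetical stand-in for smp_started */
static int countdown;          /* simulated delays left until the APs are ready */

/* Simulated delay: after `countdown` calls, pretend the APs finished. */
static void simulated_delay(void)
{
    if (countdown > 0 && --countdown == 0)
        atomic_store(&aps_ready, 1);
}

/*
 * Poll for readiness at most `tries` times, delaying between polls.
 * Returns 1 if the APs became ready, 0 if the wait timed out.
 */
static int wait_for_aps(unsigned int tries, void (*delay)(void))
{
    assert(delay != NULL);
    while (tries-- > 0) {
        if (atomic_load(&aps_ready))
            return 1;
        delay();
    }
    return atomic_load(&aps_ready) != 0;
}
```

With countdown set to 3, wait_for_aps(100, simulated_delay) returns 1 after
a few polls; with a countdown larger than the number of tries it returns 0,
which corresponds to the case where the patch prints its warning.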

A revised patch for bringing up the APs early is enclosed.

- Tor Egge

Index: mp_machdep.c
===
RCS file: /home/ncvs/src/sys/i386/i386/mp_machdep.c,v
retrieving revision 1.102
diff -u -r1.102 mp_machdep.c
--- mp_machdep.c	1999/06/01 18:19:42	1.102
+++ mp_machdep.c	1999/06/08 00:27:19
@@ -494,6 +494,10 @@
 
 
 #if defined(APIC_IO)
+
+/* Wait for all APs to be fully initialized */
+extern int wait_ap(unsigned int);
+
 /*
  * Final configuration of the BSP's local APIC:
  *  - disable 'pic mode'.
@@ -526,6 +530,9 @@
 
 	if (bootverbose)
 		apic_dump("bsp_apic_configure()");
+	wait_ap(100);
+	if (smp_started == 0)
+		printf("WARNING: Failed to start all APs\n");
 }
 #endif  /* APIC_IO */
 
@@ -1743,9 +1750,6 @@
 #endif /* USE_CLOCKLOCK */
 }
 
-
-/* Wait for all APs to be fully initialized */
-extern int wait_ap(unsigned int);
 
 /*
  * start each AP in our list


Re: 4-way SMP broken ?

1999-06-09 Thread Jonathan Lemon
In article local.mail.freebsd-current/199906092225.saa01...@par28.ma.ikos.com 
you write:
Also in trying to figure this out I looked at the DRAM probing
code in /usr/src/sys/i386/i386/machdep.c:getmemsize(), and it looks
as though it's not safe for 2GB (e.g. comparisons of byte addresses
against signed int end).

I just made this into a vm_offset_t, so it should be good for up to 4GB.
--
Jonathan





Re: 4-way SMP broken ?

1999-06-09 Thread Jim Bryant
In reply:
  interesting.  then why the delay in bringing up the AP?  Note in the
  dmesg output below, that the AP only comes up during the SCSI delay.  I
  have also added other comments to the following output.
 
 The APs need the giant kernel lock when initializing the 
 local APIC and printing the launched message.
 
 I added code for bringing up the APs earlier, but had to disable it,
 since it caused some machines to hang.  The APs were probably launched
 too early, causing the BSP to attempt to send IPIs before the local
 APIC was initialized.
 
 A revised patch for bringing up the APs early is enclosed.
 
 - Tor Egge
 

 Index: mp_machdep.c
 ===
 RCS file: /home/ncvs/src/sys/i386/i386/mp_machdep.c,v
 retrieving revision 1.102
 diff -u -r1.102 mp_machdep.c
 --- mp_machdep.c  1999/06/01 18:19:42 1.102
 +++ mp_machdep.c  1999/06/08 00:27:19

the second hunk rejects.

i show this:

-rw-r--r--  1 root  wheel  62767 Jun  1 23:38 /usr/src/sys/i386/i386/mp_machdep.c

i'll apply this by hand a little later, i'll get back with you on
compatibility.  I am using a Tyan S1696DLUA Thunder2 motherboard.

jim
-- 
All opinions expressed are mine, if you|  I will not be pushed, stamped,
think otherwise, then go jump into turbid  |  briefed, debriefed, indexed, or
radioactive waters and yell WAHOO !!!  |  numbered! - #1, The Prisoner
--
Inet: jbry...@tfs.netAX.25: kc5...@wv0t.#neks.ks.usa.noam grid: EM28pw
voice: KC5VDJ - 6  2 Meters AM/FM/SSB, 70cm FM.   http://www.tfs.net/~jbryant
--
HF/6M/2M: IC-706-MkII, 2M: HTX-212, 2M: HTX-202, 70cm: HTX-404, Packet: KPC-3+

