Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-18 Thread Daniel C. Sobral

[EMAIL PROTECTED] wrote:
 
> The enclosed patch implements a virtual NMI pushbutton by programming
> the IOAPIC to deliver an NMI when sio1 generates an interrupt.

This would be a nice kernel option... :-)

-- 
Daniel C. Sobral (8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"There is no spoon." -- Kiki



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-18 Thread Julian Elischer

[EMAIL PROTECTED] wrote:
 
>> Again I'll offer to run any and all code or patches to -current you
>> guys can come up with, but I simply don't have the time to sit down
>> and analyze in detail what you have been doing...
>
> The enclosed patch implements a virtual NMI pushbutton by programming
> the IOAPIC to deliver an NMI when sio1 generates an interrupt.
>
> DDB should be defined in the kernel config file.
>
> getty should not run on ttyd1 when this patch is applied.
>
> A serial console on sio0 is recommended.
>
> If you still cannot break into the kernel debugger when the machine
> locks up then a rogue device is probably blocking the system
> (or the debugger is trying to obtain a mutex held by somebody else).
>
> - Tor Egge

Cool.
What are the instructions for using this?
Should something have sio1 open?
Can a paperclip be used to generate the interrupt by connecting pins 2 and 3?
etc.



-- 
  __--_|\  Julian Elischer
 /   \ [EMAIL PROTECTED]
(   OZ) World tour 2000
--- X_.---._/  from Perth, presently in:  Budapest
v







Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-18 Thread Tor . Egge

> cool.
> What are the instructions for using this?
> should something have sio1 open?


I use conserver


conserver   conserver host   null-modem serial cables   test machine
label

test        port AA                  <->                sio0   serial console

testnmi     port BB                  <->                sio1   NMI


I start two conserver sessions, one using label test (for console
access) and one using label testnmi (for NMI).

When I need an NMI, I just press return or space in the session using
port BB.

This only works when the test machine runs an SMP kernel with DDB and
the virtual NMI pushbutton patch.

No programs on the test machine should open sio1, since that could
cause interrupts (which are now NMIs).

> can a paperclip be used to generate the interrupt by connecting pins 2 and 3?

I haven't tried that.

- Tor Egge





Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-17 Thread Jason Evans

On Wed, Jan 17, 2001 at 07:42:26PM +0100, Soren Schmidt wrote:
>> Basically if you're expecting me or the SMP team to figure out
>> what's going on without more info, you're pretty much out of luck.
>
> See above, not really possible, we have been trying to find some
> (affordable) HW that could be used to preserve a log over a boot,
> but so far I haven't been able to find anything that works, and
> is fast enough to not affect the system too much...

I did some research on this and am convinced that at least some video cards
would work as memory buffers for KTR logs.  Specifically, someone mentioned
to me yesterday that their Matrox Millennium II flashes the X desktop
during startup from a previous invocation across warm boots.  (I pursued
some alternatives and found the PCI RAM cards to be prohibitively expensive
(more than $700), and sound cards to not have enough RAM except on old
SoundBlaster AWE cards.)

For someone with device driver experience, I expect it would be a few hours
of effort to make it possible to use a second video card (or even the
primary one for that matter) as a DMA region in which KTR logs can be
saved, so that there is a way to debug even these spontaneous reboots
you're having.  Maybe I'll eventually get to implementing this myself, but
to be honest, I don't have a driving need for it right now, whereas you
do. =)

You're experiencing a stability problem that none of us (SMPng people) can
reproduce.  We'd love to fix the problem, but without more information,
your reports are only slightly more useful than the typical newbie "it's
broken" reports, though certainly more frustrating.

Jason





Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-17 Thread Soren Schmidt

It seems Jason Evans wrote:
> On Wed, Jan 17, 2001 at 07:42:26PM +0100, Soren Schmidt wrote:
>>> Basically if you're expecting me or the SMP team to figure out
>>> what's going on without more info, you're pretty much out of luck.
>>
>> See above, not really possible, we have been trying to find some
>> (affordable) HW that could be used to preserve a log over a boot,
>> but so far I haven't been able to find anything that works, and
>> is fast enough to not affect the system too much...
>
> I did some research on this and am convinced that at least some video cards
> would work as memory buffers for KTR logs.  Specifically, someone mentioned
> to me yesterday that their Matrox Millennium II flashes the X desktop
> during startup from a previous invocation across warm boots.  (I pursued
> some alternatives and found the PCI RAM cards to be prohibitively expensive
> (more than $700), and sound cards to not have enough RAM except on old
> SoundBlaster AWE cards.)

Hmm, I've been toying with this, but the el cheapo videocards I have
all lose random amounts of their video RAM over a reset, probably due
to the DRAM refresh being absent for too long...

> For someone with device driver experience, I expect it would be a few hours
> of effort to make it possible to use a second video card (or even the
> primary one for that matter) as a DMA region in which KTR logs can be
> saved, so that there is a way to debug even these spontaneous reboots
> you're having.  Maybe I'll eventually get to implementing this myself, but
> to be honest, I don't have a driving need for it right now, whereas you
> do. =)

Do you need DMA? A simple pointer to the memory should do (and is much
easier to get to work)...
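
The plain-pointer idea can be sketched in user-space C. This is only an illustration under stated assumptions: the record layout, the names (plog_entry, plog_write, plog_count_valid, PLOG_MAGIC) and the checksum are hypothetical and are not part of KTR or of any code posted in this thread. The per-record checksum is there because of the partial loss of video RAM over a reset mentioned above: a record whose bytes decayed fails the check and can be skipped.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; none of these names come from KTR or the thread. */
#define PLOG_MAGIC   0x4b54524cu  /* arbitrary marker value */
#define PLOG_SLOTS   64u          /* number of records in the ring */
#define PLOG_TEXTLEN 56u          /* bytes of message text per record */

struct plog_entry {
	uint32_t magic;               /* PLOG_MAGIC once the record is complete */
	uint32_t seq;                 /* record number, also selects the slot */
	uint32_t sum;                 /* checksum over seq and text */
	char     text[PLOG_TEXTLEN];  /* NUL-padded message */
};

/* Multiplicative checksum over the sequence number and the text bytes. */
static uint32_t
plog_sum(const struct plog_entry *e)
{
	uint32_t sum = e->seq;
	size_t i;

	for (i = 0; i < PLOG_TEXTLEN; i++)
		sum = sum * 31u + (unsigned char)e->text[i];
	return sum;
}

/*
 * Store one record through a plain pointer, no DMA involved.  In a real
 * driver "base" would point at mapped device memory; here it is any buffer.
 */
static void
plog_write(struct plog_entry *base, uint32_t seq, const char *msg)
{
	struct plog_entry *e = &base[seq % PLOG_SLOTS];

	memset(e, 0, sizeof(*e));
	e->seq = seq;
	strncpy(e->text, msg, PLOG_TEXTLEN - 1);
	e->sum = plog_sum(e);
	e->magic = PLOG_MAGIC;        /* written last: marks the record complete */
}

/* After a reboot: count the records that survived with a valid checksum. */
static unsigned
plog_count_valid(const struct plog_entry *base)
{
	unsigned i, n = 0;

	for (i = 0; i < PLOG_SLOTS; i++)
		if (base[i].magic == PLOG_MAGIC &&
		    base[i].sum == plog_sum(&base[i]))
			n++;
	return n;
}
```

A caller would hand plog_write() the base of the preserved region and a running sequence number; after the warm boot, the same base is scanned with plog_count_valid() (or a similar loop that prints the surviving records in seq order).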

> You're experiencing a stability problem that none of us (SMPng people) can
> reproduce.  We'd love to fix the problem, but without more information,
> your reports are only slightly more useful than the typical newbie "it's
> broken" reports, though certainly more frustrating.

Well, I'm not alone, that's for sure, and since this has been so for
months I've almost gotten the impression that something fundamental
must be wrong; however, until now I've just been told to go away :)

I know these problems are a bitch to find, but we need to take this
at least semi-professionally and find out what's wrong, or 5.0
will be a disaster when it hits the streets.

I don't have the time to play around with SMP for the time
being, but I do expect the SMPng group to take these problems
seriously instead of the "it works here" attitude that's been
hollering down the halls lately..

Again I'll offer to run any and all code or patches to -current you
guys can come up with, but I simply don't have the time to sit down
and analyze in detail what you have been doing...

-Søren





Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-17 Thread Matt Dillon

:I did some research on this and am convinced that at least some video cards
:would work as memory buffers for KTR logs.  Specifically, someone mentioned
:to me yesterday that their Matrox Millennium II flashes the X desktop
:during startup from a previous invocation across warm boots.  (I pursued
:some alternatives and found the PCI RAM cards to be prohibitively expensive
:(more than $700), and sound cards to not have enough RAM except on old
:SoundBlaster AWE cards.)

My Voodoo 3 2000 does the same thing... crash, reboot, bring up X,
and the original pre-boot display flashes before X reinitializes
the screen.

-Matt





Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-17 Thread Tor . Egge

> Again I'll offer to run any and all code or patches to -current you
> guys can come up with, but I simply don't have the time to sit down
> and analyze in detail what you have been doing...

The enclosed patch implements a virtual NMI pushbutton by programming
the IOAPIC to deliver an NMI when sio1 generates an interrupt.

DDB should be defined in the kernel config file.

getty should not run on ttyd1 when this patch is applied.

A serial console on sio0 is recommended.

If you still cannot break into the kernel debugger when the machine
locks up then a rogue device is probably blocking the system
(or the debugger is trying to obtain a mutex held by somebody else).

- Tor Egge
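
For readers unfamiliar with the IOAPIC, the redirection entry that the patch below programs can be sketched as a standalone C function. The bit positions follow the 82093AA redirection table entry layout; the macro and function names (RTE_*, rte_words, nmi_rte) are made up for this illustration and are not the kernel's IOART_* definitions. The patch's flag word sets only the NMI delivery mode, so the mask, trigger, polarity and destination-mode bits all stay clear.

```c
#include <stdint.h>

/* Illustrative names; bit positions per the 82093AA redirection entry. */
#define RTE_DELMODE_NMI    (4u << 8)   /* delivery mode 100b = NMI */
#define RTE_DESTMODE_LOG   (1u << 11)  /* clear = physical destination */
#define RTE_POLARITY_LOW   (1u << 13)  /* clear = active high */
#define RTE_TRIGGER_LEVEL  (1u << 15)  /* clear = edge triggered */
#define RTE_MASKED         (1u << 16)  /* clear = interrupt not masked */
#define IOAPIC_REDTBL_BASE 0x10u       /* register index of entry 0, low word */

struct rte_words {
	uint8_t  select;  /* register index of the low 32 bits */
	uint32_t lo;      /* goes to register 'select' */
	uint32_t hi;      /* goes to register 'select + 1' */
};

/*
 * Build the two 32-bit words for an unmasked, edge-triggered, active-high,
 * physical-destination NMI entry on the given IOAPIC input pin.
 */
static struct rte_words
nmi_rte(unsigned pin, unsigned apic_id, unsigned vector)
{
	struct rte_words w;

	/* each redirection entry occupies two consecutive registers */
	w.select = (uint8_t)(IOAPIC_REDTBL_BASE + 2u * pin);
	/* all other flag bits are zero for this configuration */
	w.lo = RTE_DELMODE_NMI | (vector & 0xffu);
	/* the destination APIC ID sits in the top byte of the high word */
	w.hi = (uint32_t)apic_id << 24;
	return w;
}
```

With pin 3, APIC ID 0 and an example vector of 0x23, this yields select 0x16, low word 0x423 and high word 0; the pin and vector used on a real machine depend on its interrupt routing.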




Index: sys/i386/i386/mpapic.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/mpapic.c,v
retrieving revision 1.45
diff -u -r1.45 mpapic.c
--- sys/i386/i386/mpapic.c  2001/01/10 04:43:46 1.45
+++ sys/i386/i386/mpapic.c  2001/01/18 05:44:30
@@ -269,6 +269,41 @@
 	/* return GOOD status */
 	return 0;
 }
+
+
+void
+enable_sio_NMI(int irq)
+{
+	u_char	select;		/* the select register is 8 bits */
+	u_int32_t flags;	/* the window register is 32 bits */
+	u_int32_t target;	/* the window register is 32 bits */
+	u_int32_t vector;	/* the window register is 32 bits */
+	int apic;
+	int pin;
+
+	if (irq < 0 || irq > 15) {
+		printf("Could not enable NMI for irq %d\n", irq);
+		return;
+	}
+	apic = int_to_apicintpin[irq].ioapic;
+	pin = int_to_apicintpin[irq].int_pin;
+
+	target = CPU_TO_ID(0) << 24;
+	select = IOAPIC_REDTBL0 + (2 * pin);
+	vector = TPR_FAST_INTS + irq;
+	flags =  ((u_int32_t)
+		  (IOART_INTMCLR |
+		   IOART_TRGREDG |
+		   IOART_INTAHI |
+		   IOART_DESTPHY |
+		   IOART_DELNMI));
+
+	io_apic_write(apic, select, flags | vector);
+	io_apic_write(apic, select + 1, target);
+	printf("Enabled NMI for irq %d\n", irq);
+	printf("XXX IOAPIC #%d intpin %d -> irq %d vector 0x%x (Delivery mode NMI)\n",
+	       apic, pin, irq, vector);
+}
 #undef DEFAULT_ISA_FLAGS
 #undef DEFAULT_FLAGS
 
Index: sys/i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.164
diff -u -r1.164 trap.c
--- sys/i386/i386/trap.c	2001/01/10 04:43:46	1.164
+++ sys/i386/i386/trap.c	2001/01/18 05:44:30
@@ -248,7 +248,8 @@
 
 	atomic_add_int(&cnt.v_trap, 1);
 
-	if ((frame.tf_eflags & PSL_I) == 0) {
+	if ((frame.tf_eflags & PSL_I) == 0 &&
+	    frame.tf_trapno != T_NMI) {
 		/*
 		 * Buggy application or kernel code has disabled
 		 * interrupts and then trapped.  Enabling interrupts
@@ -285,8 +286,38 @@
 		enable_intr();
 	}
 
-	mtx_enter(&Giant, MTX_DEF);
+	if (frame.tf_trapno == T_NMI) {
+		/* If we can't get Giant then forward NMI to next CPU */
+		if (mtx_try_enter(&Giant, MTX_DEF) == 0) {
+			u_long	icr_lo;
+			u_long	icr_hi;
+			int	target;
+
+			target = PCPU_GET(cpuid) + 1;
+			if (((1 << target) & PCPU_GET(other_cpus)) == 0)
+				target = 0;
+
+			/* write the destination field for the target AP */
+			icr_hi = (lapic.icr_hi & ~APIC_ID_MASK) |
+				(cpu_num_to_apic_id[target] << 24);
+			lapic.icr_hi = icr_hi;
+
+			/* write command */
+			icr_lo = (lapic.icr_lo & APIC_RESV2_MASK) |
+				APIC_DEST_DESTFLD | APIC_DELMODE_NMI | 0xff;
+			lapic.icr_lo = icr_lo;
+
+			/* wait for pending status end */
+			while (lapic.icr_lo & APIC_DELSTAT_MASK)
+				/* spin */ ;
 
+			__asm __volatile("int $0xff");
+
+			return;
+		}
+	} else
+		mtx_enter(&Giant, MTX_DEF);
+
 #if defined(I586_CPU) && !defined(NO_F00F_HACK)
 restart:
 #endif
@@ -388,6 +419,9 @@
 				 */
 				if (ddb_on_nmi) {
 					printf ("NMI ... going to debugger\n");
+					sioEATintr();
+					__asm __volatile("int $0xff");
+					enable_intr();
 					kdb_trap (type, 0, &frame);
 				}
 #endif /* DDB
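
The forwarding step in the second trap.c hunk (taken when Giant cannot be acquired) picks the CPU that receives the NMI next. A small standalone sketch of that selection follows; next_nmi_target is a hypothetical name, other_cpus is assumed to be a bitmask of every CPU except the current one, and the bounds check on the shift is an addition for the sketch that the patch itself does not have.

```c
#include <stdint.h>

/*
 * Pick the CPU to forward the NMI to: the next higher-numbered CPU if it
 * exists, otherwise wrap around to CPU 0.  "other_cpus" has one bit set
 * for every CPU other than "self".
 */
static int
next_nmi_target(int self, uint32_t other_cpus)
{
	int target = self + 1;

	/* the range check keeps the shift defined; it is not in the patch */
	if (target >= 32 || ((UINT32_C(1) << target) & other_cpus) == 0)
		target = 0;
	return target;
}
```

On a two-CPU machine this alternates between CPU 0 and CPU 1, so the NMI keeps moving until it reaches a CPU that can take Giant and enter the debugger.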