Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-18 Thread Tor Egge

> cool.
> What are the instructions for using this?
> should something have sio1 open?


I use conserver


conserver    conserver host    null-modem       test machine
label                          serial cable

test         port AA           ------------     sio0 (serial console)
testnmi      port BB           ------------     sio1 (NMI)


I start two conserver sessions, one using test (for console access)
and one using testnmi (for NMI).

When I need an NMI, I just press return or space in the session using
port BB.

This only works when the test machine runs an SMP kernel with DDB and
the virtual NMI pushbutton patch.

No programs on the test machine should open sio1, since that could
cause interrupts (which are now NMIs).
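
For anyone without conserver, the keypress can be approximated by a
small program on the controlling host that writes a single byte to the
serial port wired to sio1.  This is only a sketch: it assumes that any
received character raises the sio1 interrupt (as pressing return or
space does above), and the 9600 8N1 line settings and both function
names are my own illustrative choices, not part of the setup described.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/*
 * Put an already-open serial descriptor into raw 8N1 mode at 9600 baud.
 * The line speed is an assumption; use whatever the cable setup expects.
 * Returns 0 on success, -1 on failure (for example when fd is not a tty).
 */
static int
setup_raw_9600(int fd)
{
	struct termios t;

	if (tcgetattr(fd, &t) == -1)
		return -1;
	t.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
	    INLCR | IGNCR | ICRNL | IXON);
	t.c_oflag &= ~OPOST;
	t.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
	t.c_cflag &= ~(CSIZE | PARENB);
	t.c_cflag |= CS8 | CLOCAL | CREAD;
	cfsetispeed(&t, B9600);
	cfsetospeed(&t, B9600);
	return tcsetattr(fd, TCSANOW, &t);
}

/*
 * Write one carriage return to fd, the equivalent of pressing return
 * in the testnmi session.  Returns 0 if the byte was written, -1 if not.
 */
static int
send_trigger_byte(int fd)
{
	const char cr = '\r';

	assert(fd >= 0);
	return (write(fd, &cr, 1) == 1) ? 0 : -1;
}
```

A caller would open the serial device for port BB (the path is
site-specific), call setup_raw_9600() and then send_trigger_byte().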

> can a paperclip be used to generate the interrupt by connecting pins 2 and 3?

I haven't tried that.

- Tor Egge


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-18 Thread Julian Elischer

[EMAIL PROTECTED] wrote:
> 
> > Again I'll offer to run any and all code or patches to -current you
> > guys can come up with, but I simply don't have the time to sit down
> > and analyze in detail what you have been doing...
> 
> The enclosed patch implements a virtual NMI pushbutton by programming
> the IOAPIC to deliver an NMI when sio1 generates an interrupt.
> 
> DDB should be defined in the kernel config file.
> 
> getty should not run on ttyd1 when this patch is applied.
> 
> A serial console on sio0 is recommended.
> 
> If you still cannot break into the kernel debugger when the machine
> locks up, then a rogue device is probably blocking the system
> (or the debugger is trying to obtain a mutex held by somebody else).
> 
> - Tor Egge

cool.
What are the instructions for using this?
should something have sio1 open?
can a paperclip be used to generate the interrupt by connecting pins 2 and 3?
etc.



-- 
      __--_|\  Julian Elischer
     /       \ [EMAIL PROTECTED]
    (   OZ    ) World tour 2000
---> X_.---._/  from Perth, presently in:  Budapest
            v







Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-18 Thread Daniel C. Sobral

[EMAIL PROTECTED] wrote:
> 
> The enclosed patch implements a virtual NMI pushbutton by programming
> the IOAPIC to deliver an NMI when sio1 generates an interrupt.

This would be a nice kernel option... :-)

-- 
Daniel C. Sobral (8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"There is no spoon." -- Kiki






Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-17 Thread Tor Egge

> Again I'll offer to run any and all code or patches to -current you
> guys can come up with, but I simply don't have the time to sit down
> and analyze in detail what you have been doing...

The enclosed patch implements a virtual NMI pushbutton by programming
the IOAPIC to deliver an NMI when sio1 generates an interrupt.

DDB should be defined in the kernel config file.

getty should not run on ttyd1 when this patch is applied.

A serial console on sio0 is recommended.

If you still cannot break into the kernel debugger when the machine
locks up, then a rogue device is probably blocking the system
(or the debugger is trying to obtain a mutex held by somebody else).

- Tor Egge
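
For readers who do not have the I/O APIC register layout in their
heads, here is a standalone sketch of the redirection table entry that
a patch like this has to build.  The bit positions follow the Intel
82093AA I/O APIC datasheet; the macro names, the struct and the
function are invented for this illustration and are not the kernel's
identifiers.  The pin and vector passed in are the caller's business.

```c
#include <assert.h>
#include <stdint.h>

/* Field encodings per the 82093AA datasheet; the names are made up here. */
#define REDTBL_BASE        0x10u         /* register index of entry 0, low word */
#define RED_DELMODE_NMI    (0x4u << 8)   /* delivery mode 100b = NMI */
#define RED_DESTMODE_PHYS  (0x0u << 11)  /* physical destination mode */
#define RED_POL_ACTIVE_HI  (0x0u << 13)  /* active-high input */
#define RED_TRIG_EDGE      (0x0u << 15)  /* edge triggered */
#define RED_MASKED         (0x1u << 16)  /* set = interrupt masked */

struct redir_entry {
	uint8_t  sel_lo;	/* register index of the low 32 bits */
	uint8_t  sel_hi;	/* register index of the high 32 bits */
	uint32_t lo;		/* vector, delivery mode, trigger, mask */
	uint32_t hi;		/* destination APIC id in bits 24-31 */
};

/*
 * Build an unmasked, edge-triggered, active-high entry that delivers
 * an NMI to the CPU whose local APIC id is dest_apic_id.
 */
static struct redir_entry
make_nmi_entry(unsigned pin, uint8_t vector, uint8_t dest_apic_id)
{
	struct redir_entry e;

	assert(pin < 24);	/* the 82093AA has 24 input pins */
	e.sel_lo = (uint8_t)(REDTBL_BASE + 2 * pin);
	e.sel_hi = (uint8_t)(e.sel_lo + 1);
	e.lo = RED_DELMODE_NMI | RED_DESTMODE_PHYS |
	    RED_POL_ACTIVE_HI | RED_TRIG_EDGE | vector;
	e.hi = (uint32_t)dest_apic_id << 24;
	return e;
}
```

Each entry is 64 bits wide and is reached through two consecutive
32-bit registers, which is why the select index advances by two per
pin.  With NMI delivery the vector field is not what the CPU uses to
dispatch, so its value mostly matters for diagnostics.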




Index: sys/i386/i386/mpapic.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/mpapic.c,v
retrieving revision 1.45
diff -u -r1.45 mpapic.c
--- sys/i386/i386/mpapic.c	2001/01/10 04:43:46	1.45
+++ sys/i386/i386/mpapic.c	2001/01/18 05:44:30
@@ -269,6 +269,41 @@
 	/* return GOOD status */
 	return 0;
 }
+
+
+void
+enable_sio_NMI(int irq)
+{
+	u_char		select;	/* the select register is 8 bits */
+	u_int32_t	flags;	/* the window register is 32 bits */
+	u_int32_t	target;	/* the window register is 32 bits */
+	u_int32_t	vector;	/* the window register is 32 bits */
+	int		apic;
+	int		pin;
+
+	if (irq < 0 || irq > 15) {
+		printf("Could not enable NMI for irq %d\n", irq);
+		return;
+	}
+	apic = int_to_apicintpin[irq].ioapic;
+	pin = int_to_apicintpin[irq].int_pin;
+
+	target = CPU_TO_ID(0) << 24;
+	select = IOAPIC_REDTBL0 + (2 * pin);
+	vector = TPR_FAST_INTS + irq;
+	flags =  ((u_int32_t)
+		  (IOART_INTMCLR |
+		   IOART_TRGREDG |
+		   IOART_INTAHI |
+		   IOART_DESTPHY |
+		   IOART_DELNMI));
+
+	io_apic_write(apic, select, flags | vector);
+	io_apic_write(apic, select + 1, target);
+	printf("Enabled NMI for irq %d\n", irq);
+	printf("XXX IOAPIC #%d intpin %d ->irq %d vector 0x%x (Delivery mode NMI)\n",
+	       apic, pin, irq, vector);
+}
 #undef DEFAULT_ISA_FLAGS
 #undef DEFAULT_FLAGS
 
Index: sys/i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.164
diff -u -r1.164 trap.c
--- sys/i386/i386/trap.c	2001/01/10 04:43:46	1.164
+++ sys/i386/i386/trap.c	2001/01/18 05:44:30
@@ -248,7 +248,8 @@
 
 	atomic_add_int(&cnt.v_trap, 1);
 
-	if ((frame.tf_eflags & PSL_I) == 0) {
+	if ((frame.tf_eflags & PSL_I) == 0 &&
+	    frame.tf_trapno != T_NMI) {
 		/*
 		 * Buggy application or kernel code has disabled
 		 * interrupts and then trapped.  Enabling interrupts
@@ -285,8 +286,38 @@
 		enable_intr();
 	}
 
-	mtx_enter(&Giant, MTX_DEF);
+	if (frame.tf_trapno == T_NMI) {
+		/* If we can't get Giant then forward NMI to next CPU */
+		if (mtx_try_enter(&Giant, MTX_DEF) == 0) {
+			u_long	icr_lo;
+			u_long	icr_hi;
+			int	target;
+
+			target = PCPU_GET(cpuid) + 1;
+			if (((1 << target) & PCPU_GET(other_cpus)) == 0)
+				target = 0;
+
+			/* write the destination field for the target AP */
+			icr_hi = (lapic.icr_hi & ~APIC_ID_MASK) |
+			    (cpu_num_to_apic_id[target] << 24);
+			lapic.icr_hi = icr_hi;
+
+			/* write command */
+			icr_lo = (lapic.icr_lo & APIC_RESV2_MASK) |
+			    APIC_DEST_DESTFLD | APIC_DELMODE_NMI | 0xff;
+			lapic.icr_lo = icr_lo;
+
+			/* wait for pending status end */
+			while (lapic.icr_lo & APIC_DELSTAT_MASK)
+				/* spin */ ;
 
+			__asm __volatile("int $0xff");
+
+			return;
+		}
+	} else
+		mtx_enter(&Giant, MTX_DEF);
+
 #if defined(I586_CPU) && !defined(NO_F00F_HACK)
 restart:
 #endif
@@ -388,6 +419,9 @@
 			 */
 			if (ddb_on_nmi) {
 				printf ("NMI ... going to debugger\n");
+				sioEATintr();
+				__asm __volatile("int $0xff");
+				enable_intr();
 				kdb_trap (type, 0, &frame);
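
The forwarding step in the trap.c hunk (pass the NMI to the next CPU
when Giant cannot be taken) boils down to a small round-robin choice.
The following standalone sketch isolates just that choice so it can be
read and tested outside the kernel.  The function name is hypothetical,
and the explicit bounds check is an addition of mine that the patch
does not contain.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the CPU that should receive a forwarded NMI.
 *   cpuid:      logical id of the CPU that failed to get the lock
 *   other_cpus: bitmask of all CPUs except the current one
 * The next higher CPU is chosen if it exists, otherwise CPU 0.
 */
static int
next_nmi_target(int cpuid, uint32_t other_cpus)
{
	int target = cpuid + 1;

	assert(cpuid >= 0 && cpuid < 32);
	if (target >= 32 || ((UINT32_C(1) << target) & other_cpus) == 0)
		target = 0;
	return target;
}
```

On a two-CPU box this simply bounces the NMI between CPU 0 and CPU 1
until one of them manages to take the lock.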
  

Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-17 Thread Matt Dillon

:I did some research on this and am convinced that at least some video cards
:would work as memory buffers for KTR logs.  Specifically, someone mentioned
:to me yesterday that their Matrox Millennium II flashes the X desktop
:during startup from a previous invocation across warm boots.  (I pursued
:some alternatives and found the PCI RAM cards to be prohibitively expensive
:(more than $700), and sound cards to not have enough RAM except on old
:SoundBlaster AWE cards.)

My Voodoo 3 2000 does the same thing... crash, reboot, bring up X,
and the original pre-boot display flashes before X reinitializes
the screen.

-Matt





Re: Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-17 Thread Soren Schmidt

It seems Jason Evans wrote:
> On Wed, Jan 17, 2001 at 07:42:26PM +0100, Soren Schmidt wrote:
> > > Basically if you're expecting me or the SMP team to figure out
> > > what's going on without more info, you're pretty much out of luck.
> > 
> > See above, not really possible, we have been trying to find some
> > (affordable) HW that could be used to preserve a log over a boot,
> > but so far I haven't been able to find anything that works, and
> > is fast enough to not affect the system too much...
> 
> I did some research on this and am convinced that at least some video cards
> would work as memory buffers for KTR logs.  Specifically, someone mentioned
> to me yesterday that their Matrox Millennium II flashes the X desktop
> during startup from a previous invocation across warm boots.  (I pursued
> some alternatives and found the PCI RAM cards to be prohibitively expensive
> (more than $700), and sound cards to not have enough RAM except on old
> SoundBlaster AWE cards.)

Hmm, I've been toying with this, but the el cheapo videocards I have
all lose random amounts of their video RAM over a reset, probably due
to the DRAM refresh being absent for too long...

> For someone with device driver experience, I expect it would be a few hours
> of effort to make it possible to use a second video card (or even the
> primary one for that matter) as a DMA region in which KTR logs can be
> saved, so that there is a way to debug even these spontaneous reboots
> you're having.  Maybe I'll eventually get to implementing this myself, but
> to be honest, I don't have a driving need for it right now, whereas you
> do. =)

Do you need DMA? A simple pointer to the memory should do (and would
be much easier to get working)...
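
To make the pointer idea concrete, here is a rough sketch of a log kept
in a fixed memory region and written through an ordinary pointer.  It
assumes the region survives a warm boot at least partially; because
cheap cards may lose random parts of their RAM, each slot carries a
checksum so damaged entries can be skipped when reading the log back.
Every name, size and the magic value below is invented for
illustration, and an ordinary struct stands in for the mapped card
memory (real device memory would also want volatile accesses).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOG_MAGIC   0x4b54524cu	/* arbitrary marker, spells "KTRL" */
#define LOG_SLOTS   64u
#define LOG_MSGLEN  56u

struct log_slot {
	uint32_t seq;		/* sequence number, 0 = never written */
	uint32_t sum;		/* checksum over seq and msg */
	char     msg[LOG_MSGLEN];
};

struct log_region {
	uint32_t magic;		/* LOG_MAGIC once initialised */
	uint32_t next_seq;	/* sequence number of the next entry */
	struct log_slot slot[LOG_SLOTS];
};

/* Simple multiplicative checksum; enough to notice decayed bytes. */
static uint32_t
slot_sum(const struct log_slot *s)
{
	uint32_t sum = s->seq;

	for (unsigned i = 0; i < LOG_MSGLEN; i++)
		sum = sum * 31u + (unsigned char)s->msg[i];
	return sum;
}

/* Initialise the region unless it still carries a previous boot's log. */
static void
log_attach(struct log_region *r)
{
	if (r->magic != LOG_MAGIC) {
		memset(r, 0, sizeof(*r));
		r->magic = LOG_MAGIC;
		r->next_seq = 1;
	}
}

/* Store msg (truncated to fit) in the next slot, overwriting the oldest. */
static void
log_append(struct log_region *r, const char *msg)
{
	struct log_slot *s = &r->slot[r->next_seq % LOG_SLOTS];

	assert(r->magic == LOG_MAGIC);
	memset(s->msg, 0, LOG_MSGLEN);
	strncpy(s->msg, msg, LOG_MSGLEN - 1);
	s->seq = r->next_seq++;
	s->sum = slot_sum(s);
}

/* A slot is trusted only if it was written and its checksum still matches. */
static int
log_slot_valid(const struct log_slot *s)
{
	return s->seq != 0 && s->sum == slot_sum(s);
}
```

After a reboot, a reader would call log_attach() on the same region,
walk all slots, and print the valid ones in seq order.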

> You're experiencing a stability problem that none of us (SMPng people) can
> reproduce.  We'd love to fix the problem, but without more information,
> your reports are only slightly more useful than the typical newbie "it's
> broken" reports, though certainly more frustrating.

Well, I'm not alone, that's for sure, and since this has been so for
months I've almost gotten the impression that something fundamental
must be wrong; however, until now I've just been told to go away :)

I know these problems are a bitch to find, but we need to handle this
at least semi-professionally and find out what's wrong, or 5.0
will be a disaster when it hits the streets.

I don't have the time to play around with SMP for the time
being, but I do expect the SMPng group to take these problems
seriously instead of the "it works here" attitude that's been
hollering down the halls lately..

Again I'll offer to run any and all code or patches to -current you
guys can come up with, but I simply don't have the time to sit down
and analyze in detail what you have been doing...

-Søren





Debugging SMP instability (was Re: HEADS-UP: await/asleep removal imminent)

2001-01-17 Thread Jason Evans

On Wed, Jan 17, 2001 at 07:42:26PM +0100, Soren Schmidt wrote:
> > Basically if you're expecting me or the SMP team to figure out
> > what's going on without more info, you're pretty much out of luck.
> 
> See above, not really possible, we have been trying to find some
> (affordable) HW that could be used to preserve a log over a boot,
> but so far I haven't been able to find anything that works, and
> is fast enough to not affect the system too much...

I did some research on this and am convinced that at least some video cards
would work as memory buffers for KTR logs.  Specifically, someone mentioned
to me yesterday that their Matrox Millennium II flashes the X desktop
during startup from a previous invocation across warm boots.  (I pursued
some alternatives and found the PCI RAM cards to be prohibitively expensive
(more than $700), and sound cards to not have enough RAM except on old
SoundBlaster AWE cards.)

For someone with device driver experience, I expect it would be a few hours
of effort to make it possible to use a second video card (or even the
primary one for that matter) as a DMA region in which KTR logs can be
saved, so that there is a way to debug even these spontaneous reboots
you're having.  Maybe I'll eventually get to implementing this myself, but
to be honest, I don't have a driving need for it right now, whereas you
do. =)

You're experiencing a stability problem that none of us (SMPng people) can
reproduce.  We'd love to fix the problem, but without more information,
your reports are only slightly more useful than the typical newbie "it's
broken" reports, though certainly more frustrating.

Jason





Re: smp instability

2000-10-26 Thread Patrick Hartling

Patrick Hartling <[EMAIL PROTECTED]> wrote:

} John Baldwin <[EMAIL PROTECTED]> wrote:
} 
} } 
} } On 25-Oct-00 Chuck Robey wrote:
} } > I'm having rather extreme problems with stability on my dual PIII
} } > setup.  I know this is to be expected, but it's gotten so extreme on my
} } > system, I can't spend more than a few minutes before it locks up.
} } > 
} } > Is there any chance that I could make things better by using a sysctl to
} } > tell the box it's now a single-cpu system?  I can't read man pages at the
} } > moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
} } > someone knows the exact command to use, I'd appreciate a bit of help.
} } 
} } You can use kernel.old to compile a UP kernel.  I always keep a UP kernel
} } around just in case.  Also, when did your SMP box become unstable?  There
} } was a known problem with SMP boxes when the vm page zero'ing during the idle
} } loop was first turned on that has since been fixed with the latest commit to
} } vm_machdep.c yesterday.  Symptoms were frequent kernel panic 12's with
} } interrupts disabled.
} 
} I am having the same lockup problems as Chuck with SMP kernels built since
} October 21.  The system completely locks up after a short period of time.
} If I'm running X, it does it within 10-15 minutes, but if I don't run X
} and just leave it at the console, it can go for a few hours.  It does
} eventually lock up, though.  I haven't tried building a UP kernel, but I
} will try the latest vm_machdep.c changes.  If that doesn't work, I'll go
} the UP route since I'm tired of being unable to list my processes.  :\

To follow up on this, I rebuilt everything using sources from
approximately 11:00 am CDT yesterday (10/26), and everything is great
again.  Hooray!

 -Patrick


Patrick L. Hartling | Research Assistant, VRAC
[EMAIL PROTECTED] | 2624 Howe Hall -- (515)294-4916
http://www.137.org/patrick/ | http://www.vrac.iastate.edu/





Re: smp instability

2000-10-25 Thread Patrick Hartling

John Baldwin <[EMAIL PROTECTED]> wrote:

} 
} On 25-Oct-00 Chuck Robey wrote:
} > I'm having rather extreme problems with stability on my dual PIII
} > setup.  I know this is to be expected, but it's gotten so extreme on my
} > system, I can't spend more than a few minutes before it locks up.
} > 
} > Is there any chance that I could make things better by using a sysctl to
} > tell the box it's now a single-cpu system?  I can't read man pages at the
} > moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
} > someone knows the exact command to use, I'd appreciate a bit of help.
} 
} You can use kernel.old to compile a UP kernel.  I always keep a UP kernel
} around just in case.  Also, when did your SMP box become unstable?  There
} was a known problem with SMP boxes when the vm page zero'ing during the idle
} loop was first turned on that has since been fixed with the latest commit to
} vm_machdep.c yesterday.  Symptoms were frequent kernel panic 12's with
} interrupts disabled.

I am having the same lockup problems as Chuck with SMP kernels built since
October 21.  The system completely locks up after a short period of time.
If I'm running X, it does it within 10-15 minutes, but if I don't run X
and just leave it at the console, it can go for a few hours.  It does
eventually lock up, though.  I haven't tried building a UP kernel, but I
will try the latest vm_machdep.c changes.  If that doesn't work, I'll go
the UP route since I'm tired of being unable to list my processes.  :\

My working world+kernel was built October 4.  Normally, I update my
-current system more frequently than that, but this has been an abnormally
busy month.  Because of that, I can't narrow down exactly when the
instability began.  Right now, I'm running with a world built October 23
and the October 4 kernel which is rather unpleasant.

 -Patrick


Patrick L. Hartling | Research Assistant, VRAC
[EMAIL PROTECTED] | 2624 Howe Hall -- (515)294-4916
http://www.137.org/patrick/ | http://www.vrac.iastate.edu/





Re: smp instability

2000-10-24 Thread Chuck Robey

On Tue, 24 Oct 2000, Mike Meyer wrote:

> Chuck Robey writes:
> > I'm having rather extreme problems with stability on my dual PIII
> > setup.  I know this is to be expected, but it's gotten so extreme on my
> > system, I can't spend more than a few minutes before it locks up.
> > 
> > Is there any chance that I could make things better by using a sysctl to
> > tell the box it's now a single-cpu system?  I can't read man pages at the
> > moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
> > someone knows the exact command to use, I'd appreciate a bit of help.
> 
> Try "sysctl -w machdep.smp_active=0". It's not clear how much good
> this will do since you'll still be running an SMP kernel. Please
> let us know how that works.


With less than a full hour's history, I haven't exactly heavily tested it,
but it only lasted 10 minutes last time, and my system is still kicking
currently.

Regarding that control-C needed on booting thing: when I log in, my call
to fortune needs to be interrupted also, so I immediately went and tried a
"ktrace fortune".  I didn't need to kdump, because doing that ktrace seems
to have somehow cleared the control-C thing on all that kicked it off
before (not just fortune alone).

My system is really repeatable on that, so if it's not yet fixed, and you
have other things to try on it, I'd be willing (if my system stays up!)

In the meantime, I think that "sysctl -w machdep.smp_active=0" might
actually work for me (I did it in single user so the multiuser startup
would be cleaner).






RE: smp instability

2000-10-24 Thread Chuck Robey

On Tue, 24 Oct 2000, John Baldwin wrote:

> 
> On 25-Oct-00 Chuck Robey wrote:
> > I'm having rather extreme problems with stability on my dual PIII
> > setup.  I know this is to be expected, but it's gotten so extreme on my
> > system, I can't spend more than a few minutes before it locks up.
> > 
> > Is there any chance that I could make things better by using a sysctl to
> > tell the box it's now a single-cpu system?  I can't read man pages at the
> > moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
> > someone knows the exact command to use, I'd appreciate a bit of help.
> 
> You can use kernel.old to compile a UP kernel.  I always keep a UP kernel
> around just in case.  Also, when did your SMP box become unstable?  There
> was a known problem with SMP boxes when the vm page zero'ing during the idle
> loop was first turned on that has since been fixed with the latest commit to
> vm_machdep.c yesterday.  Symptoms were frequent kernel panic 12's with
> interrupts disabled.

No kernel panics, just lockups.  I saw the startup problems (having to hit
a lot of control-C's to get booted) and I had two kinds of lockup
problems, one a complete machine freeze (still pings, but that's all) and
also a strange one where an entire mounted filesystem would disappear.

I can back up to my kernel.gd I keep around, but I have to get me an older
mountd, netstat, ps (and others) before that older kernel is good, and
it was from before the /boot/kernel thing (I hated that idea, and still
do).  I'm going to try the sysctl route first, see if that works.  I won't
be able to report reliable results until the morning (if it lasts all
night, it's a huge fix).

As it stands now, no way can I do any compiling.

> 
> 






RE: smp instability

2000-10-24 Thread John Baldwin


On 25-Oct-00 Chuck Robey wrote:
> I'm having rather extreme problems with stability on my dual PIII
> setup.  I know this is to be expected, but it's gotten so extreme on my
> system, I can't spend more than a few minutes before it locks up.
> 
> Is there any chance that I could make things better by using a sysctl to
> tell the box it's now a single-cpu system?  I can't read man pages at the
> moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
> someone knows the exact command to use, I'd appreciate a bit of help.

You can use kernel.old to compile a UP kernel.  I always keep a UP kernel
around just in case.  Also, when did your SMP box become unstable?  There
was a known problem with SMP boxes when the vm page zero'ing during the idle
loop was first turned on that has since been fixed with the latest commit to
vm_machdep.c yesterday.  Symptoms were frequent kernel panic 12's with
interrupts disabled.

-- 

John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/





Re: smp instability

2000-10-24 Thread Mike Meyer

Chuck Robey writes:
> I'm having rather extreme problems with stability on my dual PIII
> setup.  I know this is to be expected, but it's gotten so extreme on my
> system, I can't spend more than a few minutes before it locks up.
> 
> Is there any chance that I could make things better by using a sysctl to
> tell the box it's now a single-cpu system?  I can't read man pages at the
> moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
> someone knows the exact command to use, I'd appreciate a bit of help.

Try "sysctl -w machdep.smp_active=0". It's not clear how much good
this will do since you'll still be running an SMP kernel. Please
let us know how that works.
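
If the same switch is wanted from a program rather than the command
line, sysctlbyname(3) can do it.  The sketch below is a thin wrapper
with a name of my own choosing; it assumes the knob is an int, needs
root to change anything, and simply fails with ENOSYS when built on a
system other than FreeBSD.  Whether machdep.smp_active exists at all
depends on the kernel being run.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Set an integer sysctl and hand back its previous value through *old.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
set_int_sysctl(const char *name, int value, int *old)
{
	assert(name != NULL && old != NULL);
#ifdef __FreeBSD__
	size_t oldlen = sizeof(*old);

	return sysctlbyname(name, old, &oldlen, &value, sizeof(value));
#else
	(void)value;
	errno = ENOSYS;		/* sysctlbyname(3) is a BSD interface */
	return -1;
#endif
}
```

Calling set_int_sysctl("machdep.smp_active", 0, &old) would then be the
programmatic counterpart of the command above, with the old value kept
so it can be restored later.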




smp instability

2000-10-24 Thread Chuck Robey

I'm having rather extreme problems with stability on my dual PIII
setup.  I know this is to be expected, but it's gotten so extreme on my
system, I can't spend more than a few minutes before it locks up.

Is there any chance that I could make things better by using a sysctl to
tell the box it's now a single-cpu system?  I can't read man pages at the
moment (I'm composing this on my Sparc Ultra-5) so if this might work, and
someone knows the exact command to use, I'd appreciate a bit of help.

Otherwise, I'm going to have to go to a lot of trouble to move back to a
pre-SMPNG system, and I sure don't want to do that.

Thanks

Chuck (who doesn't even have his .sig now!)


