Re: hard lock-up writing to tape

2003-11-20 Thread Mike Durian
On Wednesday 19 November 2003 02:15 pm, Bruce Evans wrote:
>
> Anyway, the stuff to the left of the slash in the above is the list
> of active consoles and the stuff to the right of the slash is the
> list of possible consoles.  You have to move stuff from one list to
> the other.  I vaguely remember that this is done using '-' to delete
> things from the left hand list and something more direct to add them.

You remember correctly.  Thanks for the info.  However, I think I'm
going to have to throw in the towel on this.  When I swap the
console output using the kern.console sysctl, I can get user application
console output to appear on the remote machine - just nothing from
the kernel.

For example, if I 'echo hello > /dev/console', hello will appear on
the remote machine.  But I never see any of the bold face messages,
such as the very frequent:

checking stopevent 2 with the following non-sleepable locks held:
exclusive sleep mutex sigacts r = 0 (0xc6b6faa8) locked @ /disk2/src/sys/kern/subr_trap.c:260


When I tried to generate a break using ~# from tip to drop into the
debugger, nothing happened, so I don't think the serial console is
fully connected in the kernel, even though the bold-face output
disappeared from the syscons console.


I do have an extra bit of information on my original tape lock-up
problem.  At one point, when I thought I had the remote console
working and was reproducing the problem, the tape backup worked
fine until I pinged the machine.  I think the machine responded
to the ping and then that was it.  It locked up solid.

mike


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: hard lock-up writing to tape

2003-11-19 Thread Bruce Evans
On Wed, 19 Nov 2003, Mike Durian wrote:

> On Tuesday 18 November 2003 08:29 pm, Bruce Evans wrote:
> > - -current has the kern.console sysctl for enabling multiple consoles
> >   (but only 1 sio one).  You can boot with a syscons console and then
> >   enable the serial, and the latter should work if it is on a working
> >   port to begin with.  Anyway, this sysctl shows which sio port can be
> >   a console, if any.
>
> Is there any documentation on this sysctl?  I'm not sure what I
> should set it to.  After a normal boot, it reads:

Only in the source code.

> kern.console: consolectl,/ttyd1,consolectl,

Not even the bug that syscons's consolectl device is printed here is
documented (the actual syscons console is on /dev/ttyv0, but this
bogusly shares a tty struct with /dev/consolectl and many things
cannot tell the difference.  This bug also messes up the columns in
pstat -t, since consolectl is too wide to fit).

Anyway, the stuff to the left of the slash in the above is the list
of active consoles and the stuff to the right of the slash is the
list of possible consoles.  You have to move stuff from one list to
the other.  I vaguely remember that this is done using '-' to delete
things from the left hand list and something more direct to add them.
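
To make the format concrete, here is a minimal sh sketch that splits a
kern.console value into the two lists described above.  The function names
are made up for illustration, and the sysctl lines in the trailing comment
are an assumption based on the '-' convention mentioned above, not something
checked against the source.

```shell
#!/bin/sh
# Split a kern.console value of the form "<active>,.../<possible>,..."
# into its two lists. Function names here are hypothetical helpers.

# Print the active consoles (left of the slash), one per line.
active_consoles() {
    printf '%s\n' "${1%%/*}" | tr ',' '\n' | sed '/^$/d'
}

# Print the possible consoles (right of the slash), one per line.
possible_consoles() {
    printf '%s\n' "${1#*/}" | tr ',' '\n' | sed '/^$/d'
}

# Value reported earlier in this thread after a normal boot.
value='consolectl,/ttyd1,consolectl,'
echo "active:";   active_consoles "$value"
echo "possible:"; possible_consoles "$value"

# Assumed usage on the running system (untested here):
#   sysctl kern.console=ttyd1         # add ttyd1 to the active list
#   sysctl kern.console=-consolectl   # remove consolectl from the active list
```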

Bruce


Re: hard lock-up writing to tape

2003-11-19 Thread Mike Durian
On Tuesday 18 November 2003 08:29 pm, Bruce Evans wrote:
>
> This could be from a speed mismatch or from kern.consmute somehow getting
> set.

I had wondered about a speed mismatch, but everything I've found says
9600.  I did not know to look at kern.consmute.  I'll check that.

> - -current has the kern.console sysctl for enabling multiple consoles
>   (but only 1 sio one).  You can boot with a syscons console and then
>   enable the serial, and the latter should work if it is on a working
>   port to begin with.  Anyway, this sysctl shows which sio port can be
>   a console, if any.

Is there any documentation on this sysctl?  I'm not sure what I
should set it to.  After a normal boot, it reads:

kern.console: consolectl,/ttyd1,consolectl,

> - RELENG_4 and -current have the machdep.conspeed sysctl for setting the
>   console speed.

That is the expected 9600.

mike




Re: hard lock-up writing to tape

2003-11-18 Thread Bruce Evans
On Tue, 18 Nov 2003, Mike Durian wrote:

> On Monday 17 November 2003 04:41 pm, Mike Durian wrote:
> >
> > I was finally able to get some partial success by setting flag 0x30
> > for sio1.  When I'd boot, I'd get console messages on my remote
> > tip session.  However, I'd only receive those messages printed
> > from user-level applications.  I would not see any of the bold-face
> > messages from the kernel.
>
> I'm still stumbling with the remote serial console.  Can someone
> who does this often test and verify they can use COM2 as the
> serial console - and then tell me what you did.

Moving the 0x10 flag from sio0 to sio1 should be sufficient for the kernel
part.  Setting the 0x20 flag for sio1 together with the 0x10 flag should
mainly save having to edit the flag for sio0.  If the kernel's serial
console is the same as the boot blocks', then it should use the same speed
as the boot blocks set it to.  Otherwise there may be a speed mismatch.
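
For reference, the 0x30 value mentioned earlier is just the two flag bits
OR'd together.  A small sh sketch of that arithmetic follows; the helper name
is hypothetical, and the comment on what 0x20 means is my reading of sio(4),
so treat it as an assumption.

```shell
#!/bin/sh
# sio flag bits discussed in this thread:
#   0x10  use this port as the kernel's serial console
#   0x20  force this port to be the console (assumed meaning, per sio(4));
#         only useful together with 0x10
CONSOLE=0x10
FORCE=0x20

# Combine any number of flag bits into one hex value for device.hints.
sio_flags() {
    result=0
    for bit in "$@"; do
        result=$((result | $bit))
    done
    printf '0x%x\n' "$result"
}

# Print the hint line for sio1 with both bits set.
printf 'hint.sio.1.flags="%s"\n' "$(sio_flags "$CONSOLE" "$FORCE")"
```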

> The best I can manage is described above and then I get neither
> the bold kernel messages nor the debugger prompt.

This could be from a speed mismatch or from kern.consmute somehow getting
set.

Some of this stuff can be configured after booting:
- RELENG_4 has non-broken boot-time configuration which allows changing
  during the boot.
- -current has the kern.console sysctl for enabling multiple consoles
  (but only 1 sio one).  You can boot with a syscons console and then
  enable the serial, and the latter should work if it is on a working
  port to begin with.  Anyway, this sysctl shows which sio port can be
  a console, if any.
- RELENG_4 and -current have the machdep.conspeed sysctl for setting the
  console speed.

Bruce


Re: hard lock-up writing to tape

2003-11-18 Thread Mike Durian
On Monday 17 November 2003 04:41 pm, Mike Durian wrote:
>
> I was finally able to get some partial success by setting flag 0x30
> for sio1.  When I'd boot, I'd get console messages on my remote
> tip session.  However, I'd only receive those messages printed
> from user-level applications.  I would not see any of the bold-face
> messages from the kernel.

I'm still stumbling with the remote serial console.  Can someone
who does this often test and verify they can use COM2 as the
serial console - and then tell me what you did.

The best I can manage is described above and then I get neither
the bold kernel messages nor the debugger prompt.

mike




Re: hard lock-up writing to tape

2003-11-17 Thread Mike Durian
On Monday 17 November 2003 02:09 pm, Doug White wrote:
>
> Set flag 0x80 on sio1 and take it off of sio0. That's what the kernel uses
> to decide which port to use.  The BOOT_COMCONSOLE_PORT is used by loader
> only.

I was finally able to get some partial success by setting flag 0x30
for sio1.  When I'd boot, I'd get console messages on my remote
tip session.  However, I'd only receive those messages printed
from user-level applications.  I would not see any of the bold-face
messages from the kernel.

I tried dropping into the kernel debugger when the machine was not
hung.  The machine would immediately become unresponsive, as you'd
expect if it was stopped in the debugger, but I never got any
prompt on the serial console.  I couldn't type anything on the
serial console to make anything happen either.

Are there some hard-coded assumptions in the kernel that force
a remote serial console to only work on sio0?

Until I can get this working, I'm not going to be much help
providing the trace backs needed to debug the tape write lock-up.

mike




Re: hard lock-up writing to tape

2003-11-17 Thread Doug White
On Mon, 17 Nov 2003, Mike Durian wrote:

> On Monday 17 November 2003 10:50 am, Doug White wrote:
> >
> > To debug this, you will need to set up a serial console with some special
> > kernel options.  Instructions for booting with serial console are in the
> > Handbook, but you will have to compile with the following kernel options:
>
> Is there a trick to setting up a serial console on sio1?  My line
> drivers are fried on sio0 and I only have sio1, sio4 and sio5 available
> for use.

Set flag 0x80 on sio1 and take it off of sio0. That's what the kernel uses
to decide which port to use.  The BOOT_COMCONSOLE_PORT is used by loader
only.

-- 
Doug White         |  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org


Re: hard lock-up writing to tape

2003-11-17 Thread Mike Durian
On Monday 17 November 2003 10:50 am, Doug White wrote:
>
> To debug this, you will need to set up a serial console with some special
> kernel options.  Instructions for booting with serial console are in the
> Handbook, but you will have to compile with the following kernel options:

Is there a trick to setting up a serial console on sio1?  My line
drivers are fried on sio0 and I only have sio1, sio4 and sio5 available
for use.

I set BOOT_COMCONSOLE_PORT= 0x2F8 in /etc/make.conf, rebuilt
sys/boot and installed.  I put the new boot blocks on disk using
bsdlabel -B /dev/ad0s2.  I edited /boot/device.hints and changed
hint.sio.0.flags="0x10" to hint.sio.1.flags="0x10".  I also
tried statically compiling the hints into the kernel.

Now when I boot and use -h or set console=comconsole in loader,
the console flips away from the vidconsole as expected, but doesn't
go to sio1.  At least not so I can tell.  I've got a null-modem
connecting sio1 to a tip session on another machine.  I've verified
the connection is good because I can tip between the two machines
manually.

What am I missing?

mike




Re: hard lock-up writing to tape

2003-11-17 Thread Doug White
On Sun, 16 Nov 2003, Mike Durian wrote:

> I'm using -current cvsup'd as of Nov 15, 2003.  When I try to do a
> dump or run the btape (fill command) program from bacula, my machine
> will lock up hard.  Doesn't respond to ping.  No access to kernel
> debugger.  Num lock doesn't come on.

Sounds like a Giant deadlock.

dwhite's Form Letter on Debugging Giant Deadlocks

If you are experiencing problems with CURRENT locking up hard, it may be
due to a deadlock against the Giant mutex, which controls large parts of
the kernel.  Symptoms include:

. No response to any input
. System video console
. Network (ping)

To debug this, you will need to set up a serial console with some special
kernel options.  Instructions for booting with serial console are in the
Handbook, but you will have to compile with the following kernel options:

options DDB
options BREAK_TO_DEBUGGER
options WITNESS
options INVARIANTS
options INVARIANTS_SUPPORT

Make sure your serial console is capable of sending a Break signal. If
not, use "ALT_BREAK_TO_DEBUGGER" instead of "BREAK_TO_DEBUGGER".
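
Before rebuilding, it can help to check that the kernel config really has
all of these.  A rough sh sketch is below; the function name is made up, the
matching is deliberately simple (it only looks at uncommented "options"
lines), and the temporary file stands in for your real config.

```shell
#!/bin/sh
# Print each debugging option from the list above that is missing from
# the given kernel config file. Either break option satisfies the check.
missing_debug_options() {
    conf=$1
    for opt in DDB WITNESS INVARIANTS INVARIANTS_SUPPORT; do
        grep -Eq "^[[:space:]]*options[[:space:]]+$opt([[:space:]]|\$)" "$conf" ||
            echo "$opt"
    done
    grep -Eq '^[[:space:]]*options[[:space:]]+(ALT_)?BREAK_TO_DEBUGGER([[:space:]]|$)' "$conf" ||
        echo "BREAK_TO_DEBUGGER or ALT_BREAK_TO_DEBUGGER"
}

# Demo against a throwaway config that lacks two of the options.
tmp=$(mktemp)
printf 'options DDB\noptions WITNESS\noptions INVARIANTS\n' > "$tmp"
missing_debug_options "$tmp"
rm -f "$tmp"
```

Point it at your own kernel config file instead of the temporary one; no
output means nothing from the list is missing.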

Enable the serial console and boot the system. Turn on terminal logging.
In loader, stop the boot and type "boot -v" at the OK prompt to get
additional info during the boot process.

Once the system is up, trigger the hang. When the system hangs, issue the
Break signal (or if you have used ALT_BREAK_TO_DEBUGGER, press Enter ~ ^E
b (tilde, Ctrl-E, b)).

If you get the db> prompt, then your hang is probably due to a Giant
deadlock. If not, then something else may be at fault.

Once in db>, run the following two commands and capture their output using
your terminal's logging capability:

show locks
tr

Take these and the boot -v output, put them on a webpage, and send a
message to [EMAIL PROTECTED] carefully explaining what you did to
trigger the hang.

Good luck!

>
> I can perform a dump or run the btape fill program when in single
> user mode, but in multi-user the machine will only stay up for
> a short while before locking.
>
> This has been happening since I got the tape system (Sparcstorage
> Library) about 3-4 weeks ago.  I don't know how long the problem
> existed before then as I didn't have a tape system to use.
>
> I've tried two types of SCSI cards: Adaptec 2930 and ASUS PCI-SC200
> (sym(4) device).  Both behave the same.
>
> I wonder if it could be network or interrupt related.  In single
> user mode, the network interface is not up.

hard lock-up writing to tape

2003-11-16 Thread Mike Durian
I'm using -current cvsup'd as of Nov 15, 2003.  When I try to do a
dump or run the btape (fill command) program from bacula, my machine
will lock up hard.  Doesn't respond to ping.  No access to kernel
debugger.  Num lock doesn't come on.

I can perform a dump or run the btape fill program when in single
user mode, but in multi-user the machine will only stay up for
a short while before locking.

This has been happening since I got the tape system (Sparcstorage
Library) about 3-4 weeks ago.  I don't know how long the problem
existed before then as I didn't have a tape system to use.

I've tried two types of SCSI cards: Adaptec 2930 and ASUS PCI-SC200
(sym(4) device).  Both behave the same.

I wonder if it could be network or interrupt related.  In single
user mode, the network interface is not up.

Dmesg from my system follows:
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #57: Sat Nov 15 15:50:50 MST 2003
[EMAIL PROTECTED]:/disk2/obj/disk2/src/sys/BOOGIE
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a93000.
Preloaded elf module "/boot/kernel/linux.ko" at 0xc0a931f4.
Preloaded elf module "/boot/kernel/snd_pcm.ko" at 0xc0a932a0.
Preloaded elf module "/boot/kernel/snd_via82c686.ko" at 0xc0a9334c.
Preloaded elf module "/boot/kernel/sym.ko" at 0xc0a93400.
Preloaded elf module "/boot/kernel/nvidia.ko" at 0xc0a934a8.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) processor (1002.28-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x642  Stepping = 2
  Features=0x183f9ff
  AMD Features=0xc044
real memory  = 1073676288 (1023 MB)
avail memory = 1033502720 (985 MB)
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
pcibios: BIOS version 2.10
Using $PIR table, 8 entries at 0xc00fde30
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
acpi_cpu0:  on acpi0
acpi_button0:  on acpi0
pcib0:  port 0x6000-0x607f,0x5000-0x500f,0x4080-0x40ff,0x4000-0x407f,0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib0: slot 7 INTD is routed to irq 11
pcib0: slot 7 INTD is routed to irq 11
pcib0: slot 7 INTC is routed to irq 10
pcib0: slot 9 INTA is routed to irq 9
pcib0: slot 9 INTA is routed to irq 9
pcib0: slot 9 INTA is routed to irq 9
pcib0: slot 9 INTA is routed to irq 9
pcib0: slot 10 INTA is routed to irq 10
pcib0: slot 11 INTA is routed to irq 11
pcib0: slot 12 INTA is routed to irq 10
pcib0: slot 13 INTA is routed to irq 11
agp0:  mem 0xd000-0xd7ff at device 0.0 on pci0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
pcib0: slot 1 INTA is routed to irq 5
pcib1: slot 0 INTA is routed to irq 5
nvidia0:  mem 0xd800-0xdfff,0xe000-0xe0ff irq 5 at device 0.0 on pci1
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci0:  port 0xa000-0xa00f at device 7.1 on pci0
atapci0: Correcting VIA config for southbridge data corruption bug
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
uhci0:  port 0xa400-0xa41f irq 11 at device 7.2 on pci0
usb0:  on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1:  port 0xa800-0xa81f irq 11 at device 7.3 on pci0
usb1:  on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
viapropm0: SMBus I/O base at 0x5000
viapropm0:  port 0x5000-0x500f at device 7.4 on pci0
viapropm0: SMBus revision code 0x40
smbus0:  on viapropm0
smb0:  on smbus0
pcm0:  port 0xb400-0xb403,0xb000-0xb003,0xac00-0xacff irq 10 at device 7.5 on pci0
pcm0: 
ohci0:  mem 0xe3006000-0xe3006fff irq 9 at device 9.0 on pci0
usb2: OHCI version 1.0, legacy support
usb2:  on ohci0
usb2: USB revision 1.0
uhub2: (0x11c1) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 1 port with 1 removable, self powered
ohci1:  mem 0xe3007000-0xe3007fff irq 9 at device 9.1 on pci0
usb3: OHCI version 1.0, legacy support
usb3:  on ohci1
usb3: USB revision 1.0
uhub3: (0x11c1) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 1 port with 1 removable, self powered
ohci2:  mem 0xe3004000-0xe3004fff irq 9 at device 9.2 on pci0
usb4: OHCI version 1.0, legacy support
usb4:  on ohci2
usb4: USB revision 1.0
uhub4: (0x11c1) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub4: 1 port with 1 removable, self powered
ohci3:  mem 0xe3005000-0xe3005fff irq 9 at device 9.3 on pci0
usb5: OHCI version 1.0, legacy support
usb5:  on ohci3
usb5: USB revision 1.0
uhub5: (0x11c1) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub5: 1 port with 1 removable, self powered
puc0:  port 0xcc00-0xcc0f,0xc800-0xc807,0xc400-0xc407,0xc000-0xc007,0xbc00-0xbc07,0xb800-0xb807 irq 10 at