Re: Assertion in zdb?

2013-10-09 Thread Richard Todd
Vitalij Satanivskij sa...@ukr.net writes:

 Hello. 

 System - 10.0-CURRENT FreeBSD 10.0-CURRENT #2 r255173

 While trying to get some statistics from zdb with

  zdb -dd disk1 > stat.log

 I get an assertion failure:

 Assertion failed: object_count == usedobjs (0x85727 == 0x3aa93d), file 
 /usr/src/cddl/usr.sbin/zdb/../../../cddl/contrib/opensolaris/cmd/zdb/zdb.c, 
 line 1767.
 zsh: abort (core dumped)  zdb -dd disk1 > stat.log

 Does anybody have an idea what this could be, and how big a problem
 it is (or whether it is a problem at all)?

Probably not a problem unless it happens reliably when you try it multiple
times.  Since zdb looks at the raw disks, if the filesystem/zpool is active, 
zdb can easily read bits of the zpool metadata off the disks at different
times and thus see an inconsistent state.  Hence trying to get stats out of 
zdb always carries a certain risk of not working.  
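
One way to act on this is to rerun the command a few times and only worry if it fails every time. The following is a minimal sketch, assuming the exit status is the only signal of interest. The helper names count_failures and is_reproducible are made up for illustration, and the demo uses the Python interpreter as a stand-in command so that it runs without a pool:

```python
import subprocess
import sys


def count_failures(cmd, attempts=3):
    """Run cmd `attempts` times and return how many runs exited non-zero.

    A process killed by a signal (such as the abort from a failed
    assertion) shows up as a negative return code, so it also counts.
    """
    failures = 0
    for _ in range(attempts):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


def is_reproducible(cmd, attempts=3):
    """True only if every single attempt failed."""
    return count_failures(cmd, attempts) == attempts


# Stand-in commands (hypothetical): one always succeeds, one always fails.
ALWAYS_OK = [sys.executable, "-c", "pass"]
ALWAYS_FAIL = [sys.executable, "-c", "raise SystemExit(1)"]

print(is_reproducible(ALWAYS_OK))
print(is_reproducible(ALWAYS_FAIL))
```

Against a real pool the command list would be ["zdb", "-dd", "disk1"], as in the report above. A failure on some runs but not others fits the inconsistent-read explanation; a failure on every run deserves a closer look.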

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Problem with firewire disks with recent -CURRENT.

2013-05-07 Thread Richard Todd
rmt...@servalan.servalan.com writes:

 Tried upgrading one of my machines to -CURRENT yesterday and got the 
 following panic when the sbp code did its probing of all the firewire 
 devices:

 panic: mutex sbp not owned at /usr/src/sys/cam/cam_xpt.c:4549
 cpuid = 0
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff81fe6837f0
 kdb_backtrace() at kdb_backtrace+0x39/frame 0xff81fe6838a0
 vpanic() at vpanic+0x126/frame 0xff81fe6838e0
 panic() at panic+0x43/frame 0xff81fe683940
 __mtx_assert() at __mtx_assert+0xc2/frame 0xff81fe683950
 xpt_compile_path() at xpt_compile_path+0xa1/frame 0xff81fe6839a0
 xpt_create_path() at xpt_create_path+0x5b/frame 0xff81fe6839f0
 sbp_do_attach() at sbp_do_attach+0xe8/frame 0xff81fe683a30

I did some further poking around in the source code trying to figure out what
went on here.  Looks to me like in the current version of xpt_find_target()
(called by xpt_compile_path() and hence, indirectly, by xpt_create_path() )
the code expects the SIM's mutex to be owned, but apparently the call from
sbp_do_attach() happens without the SIM mutex being locked.  I tried hacking
together the following patch and the resulting kernel comes up and lets the
system properly detect the drives and do I/O to them.  I don't know enough
about the CAM system and its locking to know if this patch is the Right 
Thing to do here, though.

diff -r 96ce948dd944 sys/dev/firewire/sbp.c
--- a/sys/dev/firewire/sbp.c	Sat May 04 17:23:33 2013 -0500
+++ b/sys/dev/firewire/sbp.c	Tue May 07 19:17:28 2013 -0500
@@ -1085,10 +1085,13 @@
 END_DEBUG
 	sbp_xfer_free(xfer);
 
-	if (sdev->path == NULL)
+	if (sdev->path == NULL) {
+		CAM_SIM_LOCK(target->sbp->sim);
 		xpt_create_path(&sdev->path, NULL,
 		    cam_sim_path(target->sbp->sim),
 		    target->target_id, sdev->lun_id);
+		CAM_SIM_UNLOCK(target->sbp->sim);
+	}
 
 	/*
 	 * Let CAM scan the bus if we are in the boot process.
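
The failure mode and the effect of the patch can be illustrated outside the kernel with a toy model. This is only a sketch in Python, not CAM code: OwnedMutex, create_path, attach_unlocked and attach_locked are hypothetical stand-ins for the SIM mutex, xpt_create_path(), and the unpatched and patched versions of sbp_do_attach().

```python
import threading


class OwnedMutex:
    """Toy mutex that remembers which thread holds it (not a kernel mutex)."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._owner = None

    def acquire(self):
        self._lock.acquire()
        self._owner = threading.get_ident()

    def release(self):
        self._owner = None
        self._lock.release()

    def assert_owned(self):
        # Plays the role of the ownership assertion that produced the panic.
        if self._owner != threading.get_ident():
            raise AssertionError(f"mutex {self.name} not owned")


def create_path(sim_mutex):
    """Hypothetical stand-in for a callee that requires the lock to be held."""
    sim_mutex.assert_owned()
    return "path"


def attach_unlocked(sim_mutex):
    """Models the unpatched caller: no lock taken around the call."""
    return create_path(sim_mutex)


def attach_locked(sim_mutex):
    """Models the patched caller: lock, call, unlock."""
    sim_mutex.acquire()
    try:
        return create_path(sim_mutex)
    finally:
        sim_mutex.release()


sbp = OwnedMutex("sbp")
try:
    attach_unlocked(sbp)
except AssertionError as err:
    print("unpatched:", err)
print("patched:", attach_locked(sbp))
```

The model says nothing about whether taking the SIM lock at this point is correct with respect to the rest of CAM's locking, which is the open question raised above.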



Re: Problem with firewire disks with recent -CURRENT.

2013-05-06 Thread Richard Todd
 
 What happens if you re-add the xpt_periph variable to sbp_do_attach() ?
 
 ref:
 http://svnweb.freebsd.org/base/head/sys/dev/firewire/sbp.c?r1=249468&r2=249467&pathrev=249468&diff_format=f
 
 see line 1089
 
 Sean

Tried that.  No change, still get the same panic: mutex sbp not owned
at /usr/src/sys/cam/cam_xpt.c:4549 .

Richard



Re: Firewire disk/tape access stopped working after recent CAM commit

2012-01-23 Thread Richard Todd
On Mon, Jan 23, 2012 at 11:16:05AM -0700, Kenneth D. Merry wrote:
 If you can, please try the attached patch and see if it has any impact on
 the problem.  There is a bug in that commit in that we shouldn't be
 invalidating all LUNs on a target when we get a status of
 CAM_DEV_NOT_THERE.

Just applied the patch, built new kernel, and rebooted, and all the FW
drives are showing up now.  Thanks!

 It may be that we need to do a more thorough audit of how various SIM
 drivers are using the CAM_DEV_NOT_THERE status.

So I take it the layers for the different hardware (SCSI, FW, USB,
ATA/AHCI) are handling this status differently, so that's why this bug only
showed up on the Firewire buses but not on ATA/AHCI, USB, or (on my other 
machine) SCSI buses? 

Richard



Firewire disk/tape access stopped working after recent CAM commit

2012-01-22 Thread Richard Todd
Hi.  I tried upgrading my amd64 10-CURRENT box to the most recent -CURRENT code
and found that the new kernel couldn't find my two disks and tape drive that
are on a Firewire bus.  All the USB and AHCI-attached hardware still showed
up okay, it's just the Firewire stuff that failed to show up properly on boot.
Spent today doing binary search to find the responsible commit and it looks
to be this one: 

  r23 | ken | 2012-01-11 18:41:48 -0600 (Wed, 11 Jan 2012) | 72 lines

  Fix a race condition in CAM peripheral free handling, locking
  in the CAM XPT bus traversal code, and a number of other periph level
  issues.

Not sure what in this commit triggers the problem, or why it just hits 
Firewire and not the rest of the system.   I've built kernels both right
before and right after the r23 commit, with CAM debugging turned on real
high on the firewire bus in question, bus 0 (hardwired to that number in
device.hints, if that matters)

 options CAMDEBUG
 options CAM_DEBUG_BUS=0
 options CAM_DEBUG_TARGET=-1
 options CAM_DEBUG_LUN=-1
 options CAM_DEBUG_FLAGS=CAM_DEBUG_INFO|CAM_DEBUG_TRACE|CAM_DEBUG_CDB

and got dmesgs of both the bad (r23) and good (pre-r23) kernels,
which I've put online at http://ln.servalan.com/rmtodd/bug1/dmesg.bad and
http://ln.servalan.com/rmtodd/bug1/dmesg.good, respectively.  They're a bit
lengthy, what with all that debug info.  Grepping out the info for one of
the targets (disk 0, sbp0:0:0:0) and just looking at the lines for that one,
we see that the good kernel does a lot more with that target, starting
with the (noperiph:sbp0:0:0:0): xpt_compile_path bit, that the bad
kernel doesn't do, as seen in the diff below. 
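
For anyone wanting to repeat the comparison on their own dmesg captures, the filter-then-diff step can be sketched as follows. This is a minimal sketch, assuming the debug lines carry a "(periph:sbp0:0:0:0):" prefix as in the output quoted in this message; the function names are made up for illustration.

```python
import difflib


def lines_for_target(dmesg_text, target):
    """Keep only the debug lines whose path ends in the given target.

    The periph part of the prefix varies (xpt0, noperiph, da0, pass0),
    so the match is on ":<target>)" rather than on the whole prefix.
    """
    tag = f":{target})"
    return [line for line in dmesg_text.splitlines() if tag in line]


def target_diff(bad_text, good_text, target):
    """Unified diff of the two filtered logs, bad kernel first."""
    bad = lines_for_target(bad_text, target)
    good = lines_for_target(good_text, target)
    return list(
        difflib.unified_diff(bad, good, "dbad", "dgood", lineterm="")
    )


# Tiny made-up sample in the same shape as the real logs.
bad_log = (
    "(xpt0:sbp0:0:0:0): xpt_release_path\n"
    "(xpt0:sbp0:0:1:0): xpt_action\n"
)
good_log = bad_log + "(noperiph:sbp0:0:0:0): xpt_compile_path\n"

for line in target_diff(bad_log, good_log, "sbp0:0:0:0"):
    print(line)
```

Lines that appear only in the good kernel's log come out prefixed with "+", which is how the extra activity on the target shows up in the diff.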

Not sure what's going on here, but if anyone has suggestions on more things
I can test/debug code I can add to track this down further, let me know.

--- /tmp/dbad   2012-01-22 19:08:03.0 -0600
+++ /tmp/dgood  2012-01-22 19:08:10.0 -0600
@@ -128,3 +128,1097 @@
 (xpt0:sbp0:0:0:0): xpt_action_default
 (xpt0:sbp0:0:0:0): xpt_free_path
 (xpt0:sbp0:0:0:0): xpt_release_path
+(noperiph:sbp0:0:0:0): xpt_compile_path
+(noperiph:sbp0:0:0:0): xpt_setup_ccb
+(noperiph:sbp0:0:0:0): xpt_action
+(noperiph:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): xpt_compile_path
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): xpt_done
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): xpt_schedule
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0 
+(noperiph:sbp0:0:0:0): xpt_release_path
+(da0:sbp0:0:0:0): xpt_done
+(da0:sbp0:0:0:0): camisr
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): xpt_done
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_done
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): xpt_done
+(da0:sbp0:0:0:0): daopen: disk=da0 (unit 0)
+(da0:sbp0:0:0:0): entering cdgetccb
+(da0:sbp0:0:0:0): xpt_schedule
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0 
+(da0:sbp0:0:0:0): xpt_done
+(da0:sbp0:0:0:0): camisr
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): xpt_done
+(da0:sbp0:0:0:0): xpt_schedule
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): READ(10). CDB: 28 0 22 ee c1 2f 0 0 1 0 
+(noperiph:sbp0:0:0:0): xpt_compile_path
+(noperiph:sbp0:0:0:0): xpt_setup_ccb
+(noperiph:sbp0:0:0:0): xpt_action
+(noperiph:sbp0:0:0:0): xpt_action_default
+(noperiph:sbp0:0:0:0): xpt_release_path
+(noperiph:sbp0:0:0:0): xpt_compile_path
+(noperiph:sbp0:0:0:0): xpt_setup_ccb
+(noperiph:sbp0:0:0:0): xpt_action
+(noperiph:sbp0:0:0:0): xpt_action_default
+(pass0:sbp0:0:0:0): xpt_compile_path
+(pass0:sbp0:0:0:0): xpt_setup_ccb
+(pass0:sbp0:0:0:0): xpt_action
+(pass0:sbp0:0:0:0): xpt_action_default
+(pass0:sbp0:0:0:0): xpt_done
+(pass0:sbp0:0:0:0): xpt_setup_ccb
+(pass0:sbp0:0:0:0): xpt_action
+(pass0:sbp0:0:0:0): xpt_action_default
+(noperiph:sbp0:0:0:0): xpt_release_path
+(da0:sbp0:0:0:0): xpt_done
+(da0:sbp0:0:0:0): camisr
+(da0:sbp0:0:0:0): entering cdgetccb
+(da0:sbp0:0:0:0): xpt_schedule
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action
+(da0:sbp0:0:0:0): xpt_action_default
+(da0:sbp0:0:0:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 
+(da0:sbp0:0:0:0): xpt_done
+(da0:sbp0:0:0:0): camisr
+(da0:sbp0:0:0:0): daopen: disk=da0 (unit 0)
+(da0:sbp0:0:0:0): entering cdgetccb
+(da0:sbp0:0:0:0): xpt_schedule
+(da0:sbp0:0:0:0): xpt_setup_ccb
+(da0:sbp0:0:0:0): xpt_action

Re: new interrupts not working for me

2003-11-11 Thread Richard Todd
John Baldwin wrote:
On 06-Nov-2003 Peter Schultz wrote:
 John Baldwin wrote:
 On 05-Nov-2003 Peter Schultz wrote:
 
I have a Tyan S1832DL w/dual pii 350s and it's not able to boot.  Seems 
to be having trouble with my adaptec scsi controller, I get a whole 
bunch of output like this hand transcribed bit, it comes after waiting 
15 seconds for scsi devices to settle:

ahc0 timeout SCB already complete interrupts may not be functioning
Infinite interrupt loop INTSTAT=0(probe3:ahc0:0:3:0): SCB 0x6 - timed out

Anyone else seeing this?  There are probably 100+ related lines of 
output, I'll have to configure serial debugging if you need to see it.
 
 
 The dmesg output excluding all the ahc0 errors would help figure out
 why your interrupts aren't working.  However, I just committed a patch
 that might fix your problem.
 
 Now the kernel just dies and the machine reboots right in the beginning 
 when it's setting up the ACPI/APIC stuff.  Of course, with ACPI off, 
 there's no apparent problem with the kernel.

Ok.  Did the old kernel break before with ACPI turned off?  It should
have.  By the way, I've committed a fix for the ACPI breakage.

I've got a similar motherboard to the original poster (a Tyan S1836DLUAN/GX
instead of S1832DL), and ran across essentially the same problem -- the 
interrupts for the ahc controller weren't working -- with the new interrupt 
code.  With the new kernel, booting with ACPI disabled worked okay, but 
booting with ACPI enabled caused the SCSI device probe to hang up.  This is
true even for a kernel compiled from current source today.  Below I list
the dmesg output for a boot with today's kernel with ACPI disabled.  Alas, 
I don't have a similar file for the ACPI-enabled case (since the OS doesn't
ever get up to a point where it can write to its disks, and I don't have a 
machine readily available to use as a serial console), but I can tell you that
where the non-ACPI boot said 
pcib0: slot 7 INTD routed to irq 19
pcib0: slot 17 INTA routed to irq 19
pcib0: slot 18 INTA routed to irq 16
pcib0: slot 18 INTB routed to irq 16

the booted-with-ACPI kernel said those interrupts were routed to IRQs 11 and
10, respectively, and the later ahc? probes said that ahc[01] were on irq 10 
as well.  

I'd attach the dump of the ACPI tables as well, but, um, 
ichotolot# acpidump -t 
acpidump: sysctl machdep.acpi_root does not point to RSDP
ichotolot# sysctl -a | grep acpi_root
machdep.acpi_root: 0
ichotolot# 

So you can't dump the ACPI tables for debugging purposes if you didn't
boot with ACPI? I don't recall this being the case before...

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #9: Mon Nov 10 21:13:08 CST 2003
[EMAIL PROTECTED]:/usr/src/sys/i386/compile/ICHOTOLOTSMP
Preloaded elf kernel /boot/kernel/kernel at 0xc0b3f000.
MPTable: INTEL440GX   
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x653  Stepping = 3
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 668991488 (638 MB)
avail memory = 640176128 (610 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0: Assuming intbase of 0
ioapic0 Version 1.1 irqs 0-23 on motherboard
Pentium Pro MTRR support enabled
npx0: [FAST]
npx0: math processor on motherboard
npx0: INT 16 interface
pcibios: BIOS version 2.10
pcib0: MPTable Host-PCI bridge at pcibus 0 on motherboard
pci0: PCI bus on pcib0
pcib0: slot 7 INTD routed to irq 19
pcib0: slot 17 INTA routed to irq 19
pcib0: slot 18 INTA routed to irq 16
pcib0: slot 18 INTB routed to irq 16
agp0: Intel 82443GX host to PCI bridge mem 0xf800-0xfbff at device 0.0 on pci0
pcib1: MPTable PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pcib1: slot 0 INTA routed to irq 16
pci1: display, VGA at device 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX4 UDMA33 controller port 0xffa0-0xffaf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
uhci0: Intel 82371AB/EB (PIIX4) USB controller port 0xef80-0xef9f irq 19 at device 7.2 on pci0
usb0: Intel 82371AB/EB (PIIX4) USB controller on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ums0: Cypress Sem PS2/USB Browser Combo Mouse, rev 1.00/4.9c, addr 2, iclass 3/1
ums0: 5 buttons and Z dir.
piix0: PIIX Timecounter port 0x440-0x44f at device 7.3 on pci0
Timecounter PIIX frequency 3579545 Hz quality 0
pcib2: PCI-PCI bridge at device 16.0 on pci0
pci2: PCI bus on pcib2
fxp0: Intel 82558 Pro/100 Ethernet port 0xef40-0xef5f mem 

Panic in scheduler code with SCHED_ULE during boot to multi-user.

2003-07-03 Thread Richard Todd
Hi.  Last night I upgraded to the most recent -current source and
rebuilt everything, and decided on building the kernel to try the new
SCHED_ULE scheduler (I had been using SCHED_4BSD before).  Alas, the
experiment did not go well; every time I booted the machine, I got a
panic just as the system was about to put up the login prompt.
Switching the kernel config back to SCHED_4BSD and building a kernel
with the same (last night's) sources gave me a working kernel.  This
is on a dual-processor PII/400 box.  Below I list what I've got from a
kernel coredump of the SCHED_ULE kernel; I've added my comments on the gdb
listing preceded by # signs.

ichotolot# gdb -k ./kernel.debug ./vmcore.45
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as i386-undermydesk-freebsd...
panic: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
cpuid = 1; lapic.id = 0100
fault virtual address   = 0x38
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc036835d
stack pointer           = 0x10:0xe1cfbbbc
frame pointer           = 0x10:0xe1cfbbcc
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 649 (squid)
trap number             = 12
panic: page fault
cpuid = 1; lapic.id = 0100
boot() called on cpu#1

syncing disks, buffers remaining... panic: absolutely cannot call smp_ipi_shootdown with interrupts already disabled
cpuid = 1; lapic.id = 0100
boot() called on cpu#1
Uptime: 1m12s
Dumping 638 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 
384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624
---
Reading symbols from 
/usr/src/sys/i386/compile/ICHOTOLOTSMP/modules/usr/src/sys/modules/acpi/acpi.ko.debug...done.
Loaded symbols for 
/usr/src/sys/i386/compile/ICHOTOLOTSMP/modules/usr/src/sys/modules/acpi/acpi.ko.debug
Reading symbols from 
/usr/src/sys/i386/compile/ICHOTOLOTSMP/modules/usr/src/sys/modules/linprocfs/linprocfs.ko.debug...done.
Loaded symbols for 
/usr/src/sys/i386/compile/ICHOTOLOTSMP/modules/usr/src/sys/modules/linprocfs/linprocfs.ko.debug
Reading symbols from 
/usr/src/sys/i386/compile/ICHOTOLOTSMP/modules/usr/src/sys/modules/linux/linux.ko.debug...done.
Loaded symbols for 
/usr/src/sys/i386/compile/ICHOTOLOTSMP/modules/usr/src/sys/modules/linux/linux.ko.debug
Reading symbols from /boot/kernel/green_saver.ko...done.
Loaded symbols for /boot/kernel/green_saver.ko
#0  doadump () at ../../../kern/kern_shutdown.c:240
240 dumping++;
(kgdb) bt
#0  doadump () at ../../../kern/kern_shutdown.c:240
#1  0xc03547c0 in boot (howto=260) at ../../../kern/kern_shutdown.c:372
#2  0xc0354ba6 in panic () at ../../../kern/kern_shutdown.c:550
#3  0xc050f9db in smp_tlb_shootdown (vector=0, addr1=0, addr2=0)
at ../../../i386/i386/mp_machdep.c:2387
#4  0xc050fc79 in smp_invlpg_range (addr1=0, addr2=0)
at ../../../i386/i386/mp_machdep.c:2519
#5  0xc0511df8 in pmap_invalidate_range (pmap=0xc06dc620, sva=3568271360, 
eva=1) at ../../../i386/i386/pmap.c:719
#6  0xc0512118 in pmap_qenter (sva=3568271360, m=0xe1cfb8c0, count=-1)
at ../../../i386/i386/pmap.c:943
#7  0xc03a0448 in vm_hold_load_pages (bp=0xd199d440, from=3568271360, 
to=3568279552) at ../../../kern/vfs_bio.c:3574
#8  0xc039ea5c in allocbuf (bp=0xd199d440, size=6144)
at ../../../kern/vfs_bio.c:2752
#9  0xc039e6fe in geteblk (size=6144) at ../../../kern/vfs_bio.c:2634
#10 0xc039b210 in bwrite (bp=0xd188b8e0) at ../../../kern/vfs_bio.c:818
#11 0xc039bc6c in bawrite (bp=0x0) at ../../../kern/vfs_bio.c:1153
#12 0xc03a4860 in vop_stdfsync (ap=0xe1cfba14)
at ../../../kern/vfs_default.c:742
#13 0xc031ba10 in spec_fsync (ap=0xe1cfba14)
at ../../../fs/specfs/spec_vnops.c:417
#14 0xc031ae38 in spec_vnoperate (ap=0x0)
at ../../../fs/specfs/spec_vnops.c:122
#15 0xc04ad2d7 in ffs_sync (mp=0xc5336400, waitfor=2, cred=0xc1c27e80, 
td=0xc0669ec0) at vnode_if.h:624
#16 0xc03b0b1b in sync (td=0xc0669ec0, uap=0x0)
at ../../../kern/vfs_syscalls.c:142
#17 0xc03542e2 in boot (howto=256) at ../../../kern/kern_shutdown.c:281
#18 0xc0354ba6 in panic () at ../../../kern/kern_shutdown.c:550
#19 0xc0517292 in trap_fatal (frame=0xe1cfbb7c, eva=0)
at ../../../i386/i386/trap.c:836
#20 0xc0516863 in trap (frame=
  {tf_fs = -1067057128, tf_es = 16, tf_ds = -1067974640, tf_edi = -982159056, 
tf_esi = -1066987296, tf_ebp = -506479668, tf_isp = -506479704, tf_ebx = 0, tf_edx = 
2, tf_ecx = 1, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1070169251, tf_cs = 
8, 

Re: HEADS UP: ACPI CHANGES AFFECTING MOST -CURRENT USERS

2001-09-02 Thread Richard Todd

In servalan.mailinglist.fbsd-current David Malone writes:

On Wed, Aug 29, 2001 at 07:58:59PM -0700, Mike Smith wrote:
  - The PnP BIOS is disabled and onboard peripherals are detected
using ACPI, and attach to ACPI and not isa.

With the ACPI module loaded I find that ed0, fdc0 and pca0 are no
longer detected (well, fdc0 is detected but gives an error). I have
the most recent BIOS installed and it doesn't seem to make any
difference if I twiddle BIOS settings.  Could this have something
to do with hints, or where should I be looking for the problem?

I'm seeing similar behavior, with fdc0 not functioning properly and giving
the following stuff in dmesg.  Note the 'fdc0: cmd 3 failed at out byte 1 of 3'
messages; the kernel never seems to properly detect floppy drive 0.  This
is on a Tyan Thunder 100GX motherboard. It's not got the most current rev. of
the BIOS, but I'm somewhat reluctant to try flashing a newer BIOS unless I'm
sure the lossage is in the BIOS and not in the FreeBSD kernel.  (Alas, 
trying the newer BIOS may be the only way to find out for sure.) 


Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #1: Sat Sep  1 21:43:41 CDT 2001
[EMAIL PROTECTED]:/usr/src/sys/i386/compile/ICHOTOLOTSMP
Timecounter i8254  frequency 1193182 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x653  Stepping = 3
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 134152192 (131008K bytes)
avail memory = 124178432 (121268K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec0
Preloaded elf kernel kernel at 0xc0633000.
Preloaded elf module acpi.ko at 0xc063309c.
Pentium Pro MTRR support enabled
WARNING: Driver mistake: destroy_dev on 154/0
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: TYANCP TYANTBLE on motherboard
acpi0: power button is handled as a fixed feature programming model.
Timecounter ACPI  frequency 3579545 Hz
acpi_timer0: 24-bit timer at 3.579545MHz port 0x408-0x40b on acpi0
acpi_cpu0: CPU on acpi0
acpi_cpu1: CPU on acpi0
acpi_tz0: thermal zone on acpi0
acpi_pcib0: Host-PCI bridge port 0xcf8-0xcff on acpi0
IOAPIC #0 intpin 19 - irq 2
IOAPIC #0 intpin 16 - irq 10
pci0: PCI bus on acpi_pcib0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: display, VGA at 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX4 ATA33 controller port 0xffa0-0xffaf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: Intel 82371AB/EB (PIIX4) USB controller port 0xef80-0xef9f irq 2 at device 7.2 on pci0
usb0: Intel 82371AB/EB (PIIX4) USB controller on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ums0: Cypress Sem PS2/USB Browser Combo Mouse, rev 1.00/4.9c, addr 2, iclass 3/1
ums0: 5 buttons and Z dir.
Timecounter PIIX  frequency 3579545 Hz
pci0: bridge, PCI-unknown at 7.3 (no driver attached)
pcib2: PCI-PCI bridge at device 16.0 on pci0
pci2: PCI bus on pcib2
fxp0: Intel Pro 10/100B/100+ Ethernet port 0xef40-0xef5f mem 0xfea0-0xfeaf,0xfc4ff000-0xfc4f irq 2 at device 17.0 on pci0
fxp0: Ethernet address 00:e0:81:10:47:b2
inphy0: i82555 10/100 media interface on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ahc0: Adaptec aic7895 Ultra SCSI adapter port 0xe400-0xe4ff mem 0xfebfe000-0xfebfefff irq 10 at device 18.0 on pci0
aic7895C: Ultra Wide Channel A, SCSI Id=7, 32/255 SCBs
ahc1: Adaptec aic7895 Ultra SCSI adapter port 0xe800-0xe8ff mem 0xfebff000-0xfebf irq 10 at device 18.1 on pci0
aic7895C: Ultra Wide Channel B, SCSI Id=7, 32/255 SCBs
fdc0: cmd 3 failed at out byte 1 of 3
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
sio1 port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0 port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
plip0: PLIP network interface on ppbus0
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
ppc1: cannot reserve I/O port range
fdc0: cmd 3 failed at out byte 1 of 3
ppc1: cannot reserve I/O port range
orm0: Option ROMs at iomem 0xc-0xc87ff,0xcc000-0xd07ff on isa0
atkbdc0: Keyboard controller (i8042) at port 0x60,0x64 on isa0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
ppc1: cannot reserve I/O port range
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
vga0: Generic ISA VGA 

Re: Interrupt messages from usb0 on CURRENT

2001-08-22 Thread Richard Todd

In servalan.mailinglist.fbsd-current you write:

I just upgraded to the latest sources (two hours ago) on my VAIO laptop and
I'm now getting dozens of messages:

Aug 22 15:00:07 sidhe /boot/kernel/kernel: usb0: interrupt, but not for us
Aug 22 15:00:51 sidhe last message repeated 8 times
Aug 22 15:03:02 sidhe last message repeated 19 times
Aug 22 15:12:59 sidhe last message repeated 92 times

This is apparently due to a change last night in the uhci and ohci drivers to
report interrupts the USB code sees but which don't correspond to any actual
USB activity.  I saw the same thing last night after I upgraded (to try out
jhb's latest fixes, which worked like a charm on the sound problem).  

I note that on my system the uhci0 and fxp0 are on the same IRQ:
uhci0: Intel 82371AB/EB (PIIX4) USB controller port 0xef80-0xef9f irq 2 at device 7.2 on pci0
fxp0: Intel Pro 10/100B/100+ Ethernet port 0xef40-0xef5f mem 0xfea0-0xfeaf,0xfc4ff000-0xfc4f irq 2 at device 17.0 on pci0

I wonder if the "not for us" interrupts are actually interrupts from the 
Ethernet that the USB code sees because both the USB and the Ethernet
are on the same irq.  
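
The hypothesis can be made concrete with a toy model of a shared interrupt line. This is only a sketch of the idea, not the uhci or fxp driver logic: the Device class, its pending flag and raise_irq are all made up for illustration, and the device names are borrowed from the dmesg lines above.

```python
class Device:
    """Toy device on a shared interrupt line (not real driver code)."""

    def __init__(self, name):
        self.name = name
        self.pending = False  # True when this device actually raised the IRQ

    def handle(self, log):
        """Return True if the interrupt was ours, else log and return False."""
        if not self.pending:
            log.append(f"{self.name}: interrupt, but not for us")
            return False
        self.pending = False
        return True


def raise_irq(devices_on_line, log):
    """Call every handler on the line and return the names that claimed it."""
    return [dev.name for dev in devices_on_line if dev.handle(log)]


usb = Device("usb0")
eth = Device("fxp0")
log = []

eth.pending = True  # only the Ethernet has work to do
claimed = raise_irq([usb, eth], log)
print(claimed)
print(log)
```

In this model every Ethernet interrupt also runs the USB handler, which finds nothing to do and logs the message. Whether that is what happens on the real hardware is exactly the open question.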


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Sound broken on -current again...

2001-08-20 Thread Richard Todd

In servalan.mailinglist.fbsd-current jhb writes:
On 19-Aug-01 Richard Todd wrote:
 In servalan.mailinglist.fbsd-current Maxim Sobolev writes:
I found that after reverting the following deltas (jhb's 10 August commit)
sound starts working again:
 
 [list of deltas deleted]
 
 I found much the same thing; specifically, the problematic change is this
 one:

What wait channel is the process (xmms, mpg123, whatever) in?

Looking at a core file from a known-buggy kernel that I'd forced to core
itself with ddb, I find for the madplay process:
(kgdb) proc 855
(kgdb) bt
#0  mi_switch () at ../../../kern/kern_synch.c:707
#1  0xc0273645 in msleep (ident=0xc13e0b00, mtx=0xc13d2800, priority=332, 
wmesg=0xc042bcb4 "pcmwr", timo=1) at ../../../kern/kern_synch.c:466
#2  0xc01fcad8 in chn_sleep (c=0xc13d1680, str=0xc042bcb4 "pcmwr", timeout=1)
at ../../../dev/sound/pcm/channel.c:109
#3  0xc01fcd5c in chn_write (c=0xc13d1680, buf=0xc8f1af00)
at ../../../dev/sound/pcm/channel.c:259
#4  0xc01fef40 in dsp_write (i_dev=0xc13e0f00, buf=0xc8f1af00, flag=2359297)
at ../../../dev/sound/pcm/dsp.c:381
#5  0xc0243095 in spec_write (ap=0xc8f1ae90)
at ../../../fs/specfs/spec_vnops.c:289
#6  0xc0242dc9 in spec_vnoperate (ap=0xc8f1ae90)
at ../../../fs/specfs/spec_vnops.c:119
#7  0xc02b7c5f in vn_write (fp=0xc1623ec0, uio=0xc8f1af00, cred=0xc15c2600, 
flags=0, p=0xc8e54100) at vnode_if.h:303
#8  0xc028c073 in dofilewrite (p=0xc8e54100, fp=0xc1623ec0, fd=3, 
buf=0xbfbf8b74, nbyte=4608, offset=-1, flags=0) at ../../../sys/file.h:162
#9  0xc028bf26 in write (p=0xc8e54100, uap=0xc8f1af80)
at ../../../kern/sys_generic.c:334
#10 0xc03e2fc9 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
  tf_edi = -1077965964, tf_esi = 4608, tf_ebp = -1077937536, 
  tf_isp = -923684908, tf_ebx = -1077965964, tf_edx = 1103, tf_ecx = -411, 
  tf_eax = 4, tf_trapno = 0, tf_err = 2, tf_eip = 672022312, tf_cs = 31, 
  tf_eflags = 663, tf_esp = -1077966048, tf_ss = 47})
at ../../../i386/i386/trap.c:1128
#11 0xc03cce0d in syscall_with_err_pushed ()

so apparently it was waiting on 'pcmwr'. 




Re: Sound broken on -current again...

2001-08-19 Thread Richard Todd

In servalan.mailinglist.fbsd-current Maxim Sobolev writes:
I found that after reverting the following deltas (jhb's 10 August commit)
sound starts working again:

[list of deltas deleted]

I found much the same thing; specifically, the problematic change is this one:


jhb 2001/08/10 14:08:57 PDT

  Modified files:
sys/kern kern_synch.c 
  Log:
  Work around a race between msleep() and endtsleep() where it was possible
  for endtsleep() to be executing when msleep() resumed, for endtsleep()
  to spin on sched_lock long enough for the other process to loop on
  msleep() and sleep again resulting in endtsleep() waking up the wrong
  msleep.
  
  Obtained from:BSD/OS
  
  Revision  ChangesPath
  1.154 +24 -4 src/sys/kern/kern_synch.c


Kernels built from source immediately prior to this change work; kernels 
built from source immediately after this change have the sound-related problems
mentioned in this thread. 




Re: Sound broken on -current again...

2001-08-18 Thread Richard Todd

In servalan.mailinglist.fbsd-current Daniel M. Kurry writes:

On Wed, Aug 15, 2001 at 07:01:46PM +0200, some SMTP stream spewed forth: 
 
 One gets the first DMA buffer full, then the process hangs...

Due to the lack of replies, I'll go ahead.

I am seeing sound breakage also.
My card is a 
Creative Labs SoundBlaster Live!.

xmms will play a short (less than a second) spurt of audio and then stop
responding. mpg123 will not play (any audio to the speakers) at all.

I ran a buildworld today which apparently broke it.
That puts the breakage between today and sometime less than 2 months
ago.
(I really cannot be more specific.)

I'm seeing much the same thing, on an SMP box with onboard sbc0 (Vibra16X)
sound chip.  Attempting to play sound with madplay gets about 2 seconds of 
sound and then silence, with the madplay process in an unkillable kernel
wait.  Oddly enough, the sbc0 interrupt thread continues to occasionally gather
a tick of CPU time, but apparently not enough to do anything useful.

I'm busy doing binary-search on the CVS tree, checking out source from
different times and seeing if I can localize the commit that broke it.
My current results are that a kernel built from source as of
2001/08/10 00:00 CDT (i.e. 2001/08/09 22:00:00 PDT) works, one built
from source as of 2001/08/10 15:52 PDT does not, so the bug is
somewhere in between there.  I'm now trying to narrow this down further,
to a specific commit somewhere in that region.
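
The search itself is ordinary bisection between a known-good and a known-bad point. A minimal sketch, assuming the candidate source states can be listed oldest to newest and that there is some way to test each one; find_first_bad and the is_bad callback are hypothetical names, and in practice is_bad means checking out that source, building a kernel, booting it and trying to play sound.

```python
def find_first_bad(candidates, is_bad):
    """Return the earliest candidate for which is_bad() is true.

    Assumes candidates is ordered oldest to newest, the first entry is
    known good, the last entry is known bad, and once a candidate is bad
    every later one is bad too.
    """
    lo, hi = 0, len(candidates) - 1  # candidates[lo] good, candidates[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return candidates[hi]


# Made-up example: 20 numbered snapshots, breakage introduced at number 113.
snapshots = list(range(100, 120))
print(find_first_bad(snapshots, lambda rev: rev >= 113))
```

Each test halves the remaining range, so 20 candidates need at most five kernel builds.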







Re: Couple Giant not locked at vm_object.c:261 panics I had to

2001-06-12 Thread Richard Todd

In fbsd-current John Baldwin writes:
On 09-Jun-01 Richard Todd wrote:
 Note that the first panic is somewhat muddled by the fact that, while 
 syncing disks from the vm_object.c panic, it apparently panicked again with
 Giant locked at i386/trap.c:1153.  That probably confuses the issue 
 greatly.

Yes, I need the first traceback, not the second.  One question: are you using
ktrace?  ddb is your friend here, as it can do a traceback when you have the
first panic.

Yeah, I am using ktrace, and now that I think of it, yeah, a ktraced process
was probably running when those panics occurred.  Unfortunately, ddb is
not my friend, as I'm usually running X.  :-(  

 P.S. Stupid -current question: How does one tell what process was running
 that triggered a panic?  This used to be findable with p *curproc in
 gdb, but that doesn't seem to work anymore.

You have to look at the list of per-cpu data (look at the gd_allcpu list).  In
ddb you can use 'show pcpu' to look at per-cpu data.  At some point, gdb needs
to be taught the notion of a 'current CPU' and be taught a way to access
per-cpu data of the current CPU.

Ah. Okay.  

#10 0xc042c603 in ast (framep=0xc8ce0fa8) at ../../i386/i386/trap.c:1320
#11 0xc0417b00 in doreti_ast ()

Ok, this one is the ktrace bogon that was recently brought to my attention.

Cool.  Thanks. 





Couple Giant not locked at vm_object.c:261 panics I had today....

2001-06-09 Thread Richard Todd

Backtraces posted here in hopes they might enlighten someone. 
This is with kernel source from June 6 (specifically,
Sticky Date:   2001.06.06.22.16.24 according to cvs status).  The machine
is a dual PII/400; dmesg follows the backtraces from the two panics. If you
want more information from these two core files, please let me know.

Note that the first panic is somewhat muddled by the fact that, while 
syncing disks from the vm_object.c panic, it apparently panicked again with
Giant locked at i386/trap.c:1153.  That probably confuses the issue 
greatly.

P.S. Stupid -current question: How does one tell what process was running
that triggered a panic?  This used to be findable with p *curproc in
gdb, but that doesn't seem to work anymore.


Script started on Sat Jun  9 16:02:27 2001
You have mail.
ichotolot# gdb -k kernel.debug vmcore.19
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 6516736
initial pcb at 529440
panicstr: witness_restore: lock (sleep mutex) Giant not locked
panic messages:
---
panic: mutex Giant not owned at ../../vm/vm_object.c:261
cpuid = 1; lapic.id = 0100
boot() called on cpu#1

syncing disks... exclusive (sleep mutex) Giant (0xc0576ca0) locked @ 
../../i386/i386/trap.c:1153
exclusive (spin mutex) sched lock (0xc05763e0) locked @ ../../kern/kern_mutex.c:312
panic: witness_restore: lock (sleep mutex) Giant not locked
cpuid = 1; lapic.id = 0100
boot() called on cpu#1
Uptime: 2d2h35m38s

dumping to dev da0s2b, offset 270336
dump 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 
108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 
82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 
53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 
24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
---
#0  dumpsys () at ../../kern/kern_shutdown.c:478
478 if (dumping++) {
(kgdb) p curproc
No symbol curproc in current context.
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:478
#1  0xc026b35f in boot (howto=260) at ../../kern/kern_shutdown.c:321
#2  0xc026b7d1 in panic (fmt=0xc0488ae5 "%s: lock (%s) %s not locked")
    at ../../kern/kern_shutdown.c:600
#3  0xc02878a5 in witness_restore (lock=0xc0576ca0, 
    file=0xc048bc20 "../../kern/vfs_bio.c", line=1827)
    at ../../kern/subr_witness.c:1297
#4  0xc0273836 in msleep (ident=0xc054eaec, mtx=0x0, priority=68, 
    wmesg=0xc048c09e "psleep", timo=100) at ../../kern/kern_synch.c:500
#5  0xc02ab0a5 in buf_daemon () at ../../kern/vfs_bio.c:1883
#6  0xc025af78 in fork_exit (callout=0xc02aaf20 <buf_daemon>, arg=0x0, 
    frame=0xc80cbfa8) at ../../kern/kern_fork.c:727
(kgdb) fr 6
#6  0xc025af78 in fork_exit (callout=0xc02aaf20 <buf_daemon>, arg=0x0, 
    frame=0xc80cbfa8) at ../../kern/kern_fork.c:727
727 callout(arg, frame);
(kgdb) l
722  * cpu_set_fork_handler intercepts this function call to
723  * have this call a non-return function to stay in kernel mode.
724  * initproc has its own fork handler, but it does return.
725  */
726 KASSERT(callout != NULL, ("NULL callout in fork_exit"));
727 callout(arg, frame);
728 
729 /*
730  * Check if a kernel thread misbehaved and returned from its main
731  * function.
(kgdb) l
732  */
733 PROC_LOCK(p);
734 if (p->p_flag & P_KTHREAD) {
735 PROC_UNLOCK(p);
736 mtx_lock(&Giant);
737 printf("Kernel thread \"%s\" (pid %d) exited prematurely.\n",
738 p->p_comm, p->p_pid);
739 kthread_exit(0);
740 }
741 PROC_UNLOCK(p);
(kgdb) p frame
$1 = (struct trapframe *) 0xc80cbfa8
(kgdb) p frame[0]
$2 = {tf_fs = 0, tf_es = 0, tf_ds = 0, tf_edi = 0, tf_esi = 0, tf_ebp = 0, 
  tf_isp = 0, tf_ebx = 0, tf_edx = 1, tf_ecx = 0, tf_eax = 0, tf_trapno = 0, 
  tf_err = 0, tf_eip = 0, tf_cs = 0, tf_eflags = 0, tf_esp = 0, tf_ss = 0}
(kgdb) fr 5
#5  0xc02ab0a5 in buf_daemon () at ../../kern/vfs_bio.c:1883
1883 tsleep(&bd_request, PVM, "qsleep", hz / 2);
(kgdb) l
1878 /*
1879  * We couldn't find any flushable dirty buffers but
1880  * still have too many dirty buffers, we
1881  * have to sleep and try again.  (rare)
1882  */
1883 tsleep(&bd_request, PVM, "qsleep", hz / 2);
1884   

Panic I got: mutex sx backing lock recursed at ../../kern/kern_condvar.c:198

2001-04-05 Thread Richard Todd

I'm running -CURRENT on a dual PII/400 box with 128M of RAM.  The kernel 
I'm running was built from sources current as of last night (i.e. around
9PM CDT Apr 3).  Just now, while listening to streaming audio with xmms, 
the machine crashed.  It's done that a couple times before, with recent-ish
kernels while doing streaming audio with xmms, but the other times didn't give
core dumps with usable backtraces.  *This* time I got a decent backtrace. 

If I'm reading this backtrace right, the thread handling the sound
hardware called selwakeup() (frame #19).  This called pfind() (frame
#18), which tries to lock allproc.  Somewhere in doing this,
witness_sleep() (frame #15) decides it wants to printf() a message. printf()
calls down into the tty code, which goes into ptsstart() (frame #9) and the
pty code (I'm not entirely sure why). This code then tries to do a selwakeup()
of its own (frame #7) which calls pfind() which tries (again) to lock allproc, 
leading to the "mutex recursed" panic. 
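
To make that call chain concrete, here is a toy Python model of the re-entry. It is a sketch only, not kernel code: the function names are borrowed from the backtrace, NonRecursiveLock is a made-up class that raises instead of panicking, and as a simplification the warning is printed while the lock is already held rather than from inside the acquisition path.

```python
import threading


class NonRecursiveLock:
    """Toy lock that raises, rather than deadlocking, on re-acquisition."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._owner = None

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:
            raise RuntimeError(f"mutex {self.name} recursed")
        self._lock.acquire()
        self._owner = me

    def release(self):
        self._owner = None
        self._lock.release()


allproc = NonRecursiveLock("allproc")
calls = []  # records the order in which the toy functions run


def pfind(pid, warn=False):
    calls.append("pfind")
    allproc.acquire()
    try:
        if warn:
            # Stand-in for witness deciding to print a diagnostic.
            console_printf("lock order warning")
    finally:
        allproc.release()


def selwakeup(pid, warn=False):
    calls.append("selwakeup")
    pfind(pid, warn)


def console_printf(msg):
    calls.append("printf")
    # The console is a pty here, so output wakes a reader via selwakeup().
    selwakeup(606)


if __name__ == "__main__":
    try:
        selwakeup(606, warn=True)
    except RuntimeError as exc:
        print("panic:", exc)
    print(" -> ".join(calls))
```

The second pfind() is reached while the first one still holds the lock, which is the shape of the recursion I think the backtrace shows.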

GDB output and (if it matters) kernel config file below. 

Script started on Thu Apr  5 01:12:28 2001
ichotolot# cd /usr/src/sys/compile/ICHOTOLOTSMP
ichotolot# gdb -k kernel.debug /var/crash/vmcore.7
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 6356992
initial pcb at 513860
panicstr: mutex sx backing lock recursed at ../../kern/kern_condvar.c:198
panic messages:
---
panic: mutex sx backing lock recursed at ../../kern/kern_condvar.c:198
cpuid = 0; lapic.id = 
boot() called on cpu#0

syncing disks... 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 16 
1: dev:da0s2e, flags:21021024, blkno:11469104, lblkno:11469104
2: dev:da0s2e, flags:21021024, blkno:11468864, lblkno:11468864
3: dev:da0s2e, flags:2124, blkno:2048, lblkno:2048
4: dev:da0s2e, flags:21021024, blkno:2752848, lblkno:2752848
5: dev:da0s2e, flags:21021024, blkno:2752736, lblkno:2752736
6: dev:da0s2e, flags:21021024, blkno:11468976, lblkno:11468976
7: dev:da0s2a, flags:21021024, blkno:131152, lblkno:131152
8: dev:da0s2e, flags:21021024, blkno:2294176, lblkno:2294176
9: dev:da0s2e, flags:21021024, blkno:2425120, lblkno:2425120
10: dev:da0s2a, flags:21021024, blkno:131184, lblkno:131184
11: dev:da0s2e, flags:2124, blkno:16, lblkno:16
12: dev:da0s2e, flags:21021024, blkno:2294160, lblkno:2294160
13: dev:da0s2e, flags:21021024, blkno:14221440, lblkno:14221440
14: dev:da0s2e, flags:21021024, blkno:2294192, lblkno:2294192
15: dev:da0s2e, flags:01011024, blkno:11474186, lblkno:0
16: dev:da0s2e, flags:0124, blkno:11468848, lblkno:11468848
giving up on 16 buffers
Uptime: 23h3m37s

dumping to dev da0s2b, offset 270336
dump 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 
108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 
82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 
53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 
24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
---
#0  dumpsys () at ../../kern/kern_shutdown.c:478
478 if (dumping++) {
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:478
#1  0xc0251547 in boot (howto=256) at ../../kern/kern_shutdown.c:321
#2  0xc0251a09 in panic (fmt=0xc0464a44 "mutex %s recursed at %s:%d")
at ../../kern/kern_shutdown.c:592
#3  0xc024aec3 in _mtx_assert (m=0xc054765c, what=9, 
file=0xc0462932 "../../kern/kern_condvar.c", line=198)
at ../../kern/kern_mutex.c:602
#4  0xc02369a2 in cv_wait (cvp=0xc0547698, mp=0xc054765c)
at ../../kern/kern_condvar.c:198
#5  0xc0258caa in _sx_slock (sx=0xc0547640, 
file=0xc0464e23 "../../kern/kern_proc.c", line=143)
at ../../kern/kern_sx.c:117
#6  0xc024bf48 in pfind (pid=606) at ../../kern/kern_proc.c:143
#7  0xc026ffe1 in selwakeup (sip=0xc10eea04) at ../../kern/sys_generic.c:1061
#8  0xc027accf in ptcwakeup (tp=0xc10eea20, flag=1) at ../../kern/tty_pty.c:318
#9  0xc027acaa in ptsstart (tp=0xc10eea20) at ../../kern/tty_pty.c:307
#10 0xc0278170 in ttstart (tp=0xc10eea20) at ../../kern/tty.c:1417
#11 0xc027978d in tputchar (c=46, tp=0xc10eea20) at ../../kern/tty.c:2484
#12 0xc0268813 in putchar (c=46, arg=0xc7f12e10) at ../../kern/subr_prf.c:304
#13 0xc0268fb8 in kvprintf (
fmt=0xc0468642 ":%d: %s with \"%s\" locked from %s:%d\n", 
func=0xc02687c4 putchar, arg=0xc7f12e10, radix=10, ap=0xc7f12e2c "Ç")
at ../../kern/subr_prf.c:637
#14 0xc0268740 in printf (
fmt=0xc0468640 "%s:%d: %s with \"%s\" locked from %s:%d\n")
at ../../kern/subr_prf.c:260
#15 0xc026cff9 in witness_sleep (check_only=0, lock=0xc054765c, 

Re: Tracking down problem with booting large kernels (bug in locore.s)

2001-03-13 Thread Richard Todd

In message [EMAIL PROTECTED], Peter Wemm writes:
Richard Todd wrote:

 ---> No crashes as of here
  pushl   $begin  /* jump to high virtualized address */
  ret   
 
 /* now running relocated at KERNBASE where the system is linked to run */
 begin:
 ---> crashes before it gets here!!!
  /* set up bootstrap stack */
  movl    proc0paddr,%eax /* location of in-kernel pages */

I have some suspicions..  Can you do a nm on your kernel?

peter@daintree[8:41pm]~-102 nm /boot/kernel/kernel  |grep begin
c0123689 t begin


Sure.  A working kernel (the one I'm booted off of now) shows:
55 ichotolot ~[11:49PM] Z% nm /boot/kernel.good5/kernel | grep begin
c0128c79 t begin
c0368b3f t mp_begin

and one that crashes shows:

56 ichotolot ~[11:50PM] Z% nm /boot/kernel.old/kernel | grep begin
c01290a9 t begin
c038d49f t mp_begin
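
In case it helps anyone reading along, here is a small Python sketch that splits those two "begin" addresses into page-directory index, page-table index and page offset. It assumes ordinary non-PAE i386 paging with 4K pages (10/10/12 bit split) and a kernel linked at 0xc0000000, which is my reading of the addresses above rather than something I have checked against the kernel config.

```python
PAGE_SHIFT = 12          # 4 KiB pages
PDR_SHIFT = 22           # each page-directory entry covers 4 MiB
KERNBASE = 0xC0000000    # assumed link base, suggested by the c01... addresses


def split_vaddr(vaddr):
    """Return (page-directory index, page-table index, offset within page)."""
    pde = vaddr >> PDR_SHIFT
    pte = (vaddr >> PAGE_SHIFT) & 0x3FF
    offset = vaddr & 0xFFF
    return pde, pte, offset


if __name__ == "__main__":
    symbols = (
        ("kernel.good5", 0xC0128C79),   # working kernel
        ("kernel.old", 0xC01290A9),     # kernel that crashes
    )
    for label, addr in symbols:
        pde, pte, off = split_vaddr(addr)
        print(f"{label}: begin=0x{addr:08x} pde={pde} pte={pte} "
              f"offset=0x{off:03x} from_kernbase={addr - KERNBASE:#x}")
```

Under those assumptions both symbols fall under the same page-directory entry (768), on adjacent pages (296 and 297), so the location of "begin" by itself does not obviously distinguish the two kernels.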




Tracking down problem with booting large kernels (bug in locore.s)

2001-03-12 Thread Richard Todd

On my system (dual PII/400 running -current), I've noticed for some time that
if I build a kernel with too many device drivers in it (where "too many" seems
to correspond to a text size above 3M for the resulting kernel), the system reboots
itself immediately upon booting with the new kernel.  Other people have noticed
this before (see the thread "Recent kernels won't boot" in the mailing list
archives at
http://www.freebsd.org/mail/archive/2000/freebsd-current/20001015.freebsd-current.html).
However, no fix for or cause of the problem was ever identified, and the
problem still exists in -current cvsuped as of today.   

I spent some time tonight seeing if I could localize the exact place
of the crash, and had some luck finding where it's crashing.  The
problem is annoyingly hard to track down, as even booting with DDB and
boot -d wouldn't catch the bug; the kernel reboots before DDB starts.  I 
had to resort to sticking "hlt" instructions (or calls to cpu_halt()) in 
various places and seeing if I could get the kernel to hang (telling me that
the kernel had gotten as far as where I stuck the halt.)  I narrowed the crash
down to this area of locore.s (note the arrows).

---
/* Now enable paging */
        movl    R(IdlePTD), %eax
        movl    %eax,%cr3               /* load ptd addr into mmu */
        movl    %cr0,%eax               /* get control word */
        orl     $CR0_PE|CR0_PG,%eax     /* enable paging */
        movl    %eax,%cr0               /* and let's page NOW! */

#ifdef BDE_DEBUGGER
/*
 * Complete the adjustments for paging so that we can keep tracing through
 * initi386() after the low (physical) addresses for the gdt and idt become
 * invalid.
 */
        call    bdb_commit_paging
#endif
---> No crashes as of here
        pushl   $begin                  /* jump to high virtualized address */
        ret

/* now running relocated at KERNBASE where the system is linked to run */
begin:
---> crashes before it gets here!!!
        /* set up bootstrap stack */
        movl    proc0paddr,%eax         /* location of in-kernel pages */
--

The pushl and ret is where the boot code jumps to "begin:" at its proper
virtual address after the page tables are set up.  I'm guessing that
create_pagetables is somehow losing and creating bogus page tables such that
the jump to the kernel virtual address space goes into deep space somewhere, 
but frankly the details of page tables on the i386 are beyond my expertise.
So I'm posting this in hopes that someone on here *does* know enough to figure
out what's going wrong when the kernel size is sufficiently large. 

Any takers?




Dual probing of PCI-connected hardware (was Re: xl driver

2000-09-03 Thread Richard Todd

In servalan.mailinglist.fbsd-current you write:

In message [EMAIL PROTECTED] R Joseph Wright writes:
: Sep  3 13:24:26 manatee /kernel: xl0: 3Com 3c900-COMBO Etherlink XL port 
0x6c00-0x6c3f irq 11 at device 9.0 on pci0
: Sep  3 13:24:26 manatee /kernel: xl1: 3Com 3c900-COMBO Etherlink XL port 
0x6c00-0x6c3f irq 11 at device 9.0 on pci2

Looks like your pci bus is getting probed twice!

Warner


I've been seeing similar oddities too.  Nothing crippling, but it is a little
disconcerting to see the machine think you have twice as many SCSI controllers
and ethernets as you actually have.   This is with kernel src grabbed earlier
today, i.e. after Peter Wemm's most recent fixes.  Note how the kernel thinks
it sees an fxp1 and an ahc2/3 as well as an ata2/3, but fails on the
allocation of resources/IRQs for those devices, since they're already allocated
to the "real" instances of the fxp, ahc0/1, ata0/1 devices. 
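
A quick way to spot this in a long boot log is to group probe lines by their resources and flag any set that shows up on more than one bus. The Python sketch below does that for lines shaped like the xl0/xl1 ones quoted above; the regular expression is my own guess at that one line format, not anything the kernel guarantees, and other drivers print their probe lines differently.

```python
import re
from collections import defaultdict

# Matches probe lines such as:
#   xl0: 3Com 3c900-COMBO Etherlink XL port 0x6c00-0x6c3f irq 11 at device 9.0 on pci0
PROBE_RE = re.compile(
    r"(?P<unit>[a-z]+\d+): .*?port (?P<port>0x[0-9a-f]+-0x[0-9a-f]+) "
    r"irq (?P<irq>\d+) at device (?P<dev>[\d.]+) on (?P<bus>pci\d+)"
)


def find_double_probes(lines):
    """Return {(port, irq, device): [(unit, bus), ...]} for repeated resources."""
    seen = defaultdict(list)
    for line in lines:
        m = PROBE_RE.search(line)
        if m:
            key = (m["port"], m["irq"], m["dev"])
            seen[key].append((m["unit"], m["bus"]))
    return {key: units for key, units in seen.items() if len(units) > 1}


if __name__ == "__main__":
    log = [
        "Sep  3 13:24:26 manatee /kernel: xl0: 3Com 3c900-COMBO Etherlink XL "
        "port 0x6c00-0x6c3f irq 11 at device 9.0 on pci0",
        "Sep  3 13:24:26 manatee /kernel: xl1: 3Com 3c900-COMBO Etherlink XL "
        "port 0x6c00-0x6c3f irq 11 at device 9.0 on pci2",
    ]
    for key, units in find_double_probes(log).items():
        print(key, units)
```

On the two quoted lines this reports xl0 on pci0 and xl1 on pci2 sharing the same port range, IRQ and device number, which is the signature of one card being probed twice.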


Copyright (c) 1992-2000 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #36: Sun Sep  3 18:54:17 CDT 2000
[EMAIL PROTECTED]:/usr/src/sys/compile/ICHOTOLOTSMP
Calibrating clock(s) ... TSC clock: 400853210 Hz, i8254 clock: 1193016 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter "i8254"  frequency 1193182 Hz
CLK_USE_TSC_CALIBRATION not specified - using old calibration method
CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x653  Stepping = 3
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 134217728 (131072K bytes)
Physical memory chunk(s):
0x1000 - 0x0009efff, 647168 bytes (158 pages)
0x00739000 - 0x07ff7fff, 126611456 bytes (30911 pages)
avail memory = 123305984 (120416K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
SMP: CPU0 apic_initialize():
 lint0: 0x0700 lint1: 0x00010400 TPR: 0x0010 SVR: 0x01ff
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec0
bios32: Found BIOS32 Service Directory header at 0xc00fdb50
bios32: Entry = 0xfdb60 (c00fdb60)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xf+0xdb81
pnpbios: Found PnP BIOS data at 0xc00f72e0
pnpbios: Entry = f:6984  Rev = 1.0
Other BIOS signatures found:
Preloaded elf kernel "kernel" at 0xc071d000.
random: entropy source
nulldev: null device, zero device
mem: memory  I/O
Pentium Pro MTRR support enabled
Creating DISK md0
md0: Malloc disk
Math emulator present
SMP: CPU0 bsp_apic_configure():
 lint0: 0x00010700 lint1: 0x0400 TPR: 0x0010 SVR: 0x01ff
pcib-: pcib0 exists, using next available unit number
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: Intel 82443GX host to PCI bridge on motherboard
pci0: physical bus=0
found-> vendor=0x8086, dev=0x71a0, revid=0x00
        class=06-00-00, hdrtype=0x00, mfdev=0
        subordinatebus=0        secondarybus=0
        map[10]: type 3, range 32, base f800, size 26, enabled
found-> vendor=0x8086, dev=0x71a1, revid=0x00
        class=06-04-00, hdrtype=0x01, mfdev=0
        subordinatebus=1        secondarybus=1
found-> vendor=0x8086, dev=0x7110, revid=0x02
        class=06-01-00, hdrtype=0x00, mfdev=1
        subordinatebus=0        secondarybus=0
found-> vendor=0x8086, dev=0x7111, revid=0x01
        class=01-01-80, hdrtype=0x00, mfdev=0
        subordinatebus=0        secondarybus=0
        map[20]: type 4, range 32, base ffa0, size  4, enabled
Freeing (NOT implemented) redirected PCI irq 11.
found-> vendor=0x8086, dev=0x7112, revid=0x01
        class=0c-03-00, hdrtype=0x00, mfdev=0
        subordinatebus=0        secondarybus=0
        intpin=d, irq=19
        map[20]: type 4, range 32, base ef80, size  5, enabled
found-> vendor=0x8086, dev=0x7113, revid=0x02
        class=06-80-00, hdrtype=0x00, mfdev=0
        subordinatebus=0        secondarybus=0
        map[90]: type 4, range 32, base 0440, size  4, enabled
found-> vendor=0x1011, dev=0x0024, revid=0x03
        class=06-04-00, hdrtype=0x01, mfdev=0
        subordinatebus=2        secondarybus=2
Freeing (NOT implemented) redirected PCI irq 11.
found-> vendor=0x8086, dev=0x1229, revid=0x05
        class=02-00-00, hdrtype=0x00, mfdev=0
        subordinatebus=0        secondarybus=0
        intpin=a, irq=19
        map[10]: type 3, range 32, base fc4ff000, size 12, enabled
        map[14]: type 4, range 32, base ef40, size  5, enabled
        map[18]: type 1, range 32, base fea0, size 20, enabled
Freeing (NOT implemented) redirected PCI irq 10.
found-> vendor=0x9004, dev=0x7895, revid=0x04
        class=01-00-00, hdrtype=0x00, mfdev=1
        subordinatebus=0        secondarybus=0
intpin=a, 

Re: (noperiph:ahc0:0:-1:-1): ... error

2000-07-21 Thread Richard Todd

In servalan.mailinglist.fbsd-current you write:

I am trying to run a recent (as of today) kernel and am seeing the following
error when I try to boot:

(noperiph:ahc0:0:-1:-1): SCSI bus reset delivered. 0 SCBs aborted.
panic: Bogus resid sgptr value 0xbd68609

(I copied this from the console after the boot failure, there may be
minor mistakes.)

This started happening when I started compiling kernels built from
sources cvsuped around Jul 18.

I am not sure what is causing these messages. The "noperiph" message
appears to come from xpt_print_path in /usr/src/sys/cam/cam_xpt.c while
the panic seems to be written by ahc_calc_residual in
/usr/src/sys/dev/aic7xxx/aic7xxx.c.  From a quick look at the code, the
problem is not directly in the code pointed to by the messages.

I have an Adaptec 2940UW. A much older kernel reports it as Adaptec
2940 Ultra SCSI adapter  with  aic7880 Wide Channel A, SCSI Id=7,
16/255 SCBs. The Bios on the board is version 2.20.0

I have 4 drives and a UMAX scanner connected to the bus. More details
available if needed.

I saw something similar, but not identical, when trying to boot a -current
kernel made last night.  I saw the (noperiph...) message you saw.  After that,
the machine didn't panic, but it didn't work very well, either.  It did, after
a few seconds, detect the SCSI tape drive I had (sa0), but failed on detecting
the SCSI disk and CDROM, repeatedly timing out and resetting the bus.  Alas,
I didn't have the presence of mind to write down the exact messages; I'll try
to do that tonight, assuming the bug is still present in the src I'm cvsupping
now.  This was on an SMP box (Tyan Thunder 100GX), with an aic7895 SCSI
controller, and the following three SCSI devices:
sa0 at ahc0 bus 0 target 0 lun 0
sa0: <SONY SDT-7000 0300> Removable Sequential Access SCSI-2 device
sa0: 10.000MB/s transfers (10.000MHz, offset 15)
Mounting root from ufs:/dev/da0s2a
da0 at ahc0 bus 0 target 6 lun 0
da0: <QUANTUM ATLAS IV 9 WLS 0707> Fixed Direct Access SCSI-3 device
da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 8761MB (17942584 512 byte sectors: 255H 63S/T 1116C)
cd0 at ahc0 bus 0 target 1 lun 0
cd0: <TOSHIBA CD-ROM XM-6401TA 1009> Removable CD-ROM SCSI-2 device
cd0: 20.000MB/s transfers (20.000MHz, offset 15)
cd0: Attempt to query device size failed: NOT READY, Medium not present







Re: db 1.85 -- 2.x or 3.x?

2000-05-02 Thread Richard Todd

In servalan.mailinglist.fbsd-current Brad Knowles writes:
   Besides, don't we use gcc as the system-standard compiler, and 
doesn't this likewise infect everything compiled on FreeBSD with the 
GPL?

No, because none of the gcc code appears in the resulting binary.  The
binary does include the "libgcc" code, but that code is specifically
exempted from the GPL.   Programs that link against the Berkeley DB 2.x
library, however, will end up including the DB code, and thus end up
including code covered by the 2.x licence. 

[Note: of course, if you link against a shared library, the actual code
from the library doesn't appear in your binary.  It seems to be the general
consensus opinion that the courts would treat this the same as the static
linking case, i.e. your binary would be covered under the licence "as if" you
had statically linked against the relevant library, but I don't know if this
has ever been tested in court anywhere.  If you're in a situation where the
legalities really matter, you should probably ask a real lawyer instead of
relying on the semi-informed opinion of people posting to mailing lists.]

