Re: SMP deadlock during multi-user mode transition after r204866

2010-03-12 Thread David Wolfskill
On Thu, Mar 11, 2010 at 09:04:40AM -0800, David Wolfskill wrote:
> My build machine (dmesg attached) is a dual CPU, single-core box; my
> laptop is a single CPU, single-core box.  I track head on each daily;
> while the build machine has been locking up during the transition to
> multi-user mode since Tuesday (when I had built CURRENT at r204909;
> previous was r204866, on Monday) -- and it boots to single-user mode OK
> -- the laptop has not exhibited the problem.
 ...

This appears to be resolved as of r205078 -- thanks!

 ...

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.



SMP deadlock during multi-user mode transition after r204866

2010-03-11 Thread David Wolfskill
My build machine (dmesg attached) is a dual CPU, single-core box; my
laptop is a single CPU, single-core box.  I track head on each daily;
while the build machine has been locking up during the transition to
multi-user mode since Tuesday (when I had built CURRENT at r204909;
previous was r204866, on Monday) -- and it boots to single-user mode OK
-- the laptop has not exhibited the problem.

This build machine was deployed fairly recently, and since a GENERIC
kernel had been working OK, I had left it that way (so that's the kernel
config).  I have a more customized config I had used on its predecessor;
I'm pretty sure I had that set up with DDB & assorted other goodies to
try to get something useful out of a misbehaving system, and am willing
to set that up (but probably won't have time for several hours, at
least, as I need to give a presentation at a work meeting).

One of the more peculiar symptoms is that after such a lock-up, I
power-cycle the machine, then boot to single-user mode, at which point I
typically start with 

fsck -p

However, since Tuesday, that attempt yields:

Enter full pathname of shell or RETURN for /bin/sh: 
# fsck -p
/dev/aacd0s4a: LINK COUNT DIR I=2  OWNER=root MODE=40755
/dev/aacd0s4a: SIZE=1024 MTIME=Mar 11 08:30 2010  COUNT 26 SHOULD BE 27
/dev/aacd0s4a: LINK COUNT INCREASING
/dev/aacd0s4a: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
# 

My circumvention of choice at the moment is:

# fsck -y / && fsck -p

as it appears that the root file system is the only one thus affected.
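
For anyone wanting to script that circumvention, here is a minimal sketch. It assumes fsck exits non-zero when the preen pass gives up, as in the output above. The function name and the FSCK override variable are hypothetical, added only so the logic can be dry-run without touching a real disk.

```sh
#!/bin/sh
# Hypothetical helper, not part of the original report.
# FSCK may be overridden for dry runs; it defaults to the real fsck.
FSCK=${FSCK:-fsck}

preen_with_fallback() {
    # First try the usual preen pass over all file systems.
    if $FSCK -p; then
        return 0
    fi
    # Preen gave up: repair only the root file system non-interactively,
    # then retry the preen pass for everything else.
    $FSCK -y / && $FSCK -p
}
```

Calling preen_with_fallback from the single-user shell would run the same three commands by hand only when the first preen fails.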

Is this sufficiently well understood already that I should stop
disturbing folks who are trying to fix it?  Would it be useful for me to
configure a kernel that supports DDB & provide a backtrace (and maybe
additional stuff)?

To clarify, it appears that something after r204866 but no later than
r204909 has caused the observed problem.
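
That window could be narrowed by bisection if needed. With r204866 known good and r204909 known bad there are 43 candidate revisions, so at most six build-and-boot cycles would isolate the first bad one. The sketch below shows only the search logic. The function name and the IS_BAD probe are hypothetical; in practice the probe would mean updating to the revision, building, booting, and watching for the lock-up.

```sh
#!/bin/sh
# Hypothetical probe command: must exit 0 when the given revision
# shows the lock-up, non-zero when it boots to multi-user mode.
IS_BAD=${IS_BAD:-is_bad}

first_bad_rev() {
    # Plain POSIX sh has no "local", so these are ordinary globals.
    good=$1   # known-good revision (r204866 in this report)
    bad=$2    # known-bad revision (r204909 in this report)
    # Invariant: "good" boots, "bad" locks up; halve the gap each pass.
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if $IS_BAD "$mid"; then
            bad=$mid
        else
            good=$mid
        fi
    done
    echo "$bad"
}
```

Invoked as "first_bad_rev 204866 204909", it prints the earliest revision the probe reports as bad.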

Thanks.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-CURRENT #92: Mon Mar  8 06:14:22 PST 2010
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/sys/GENERIC i386
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.60GHz (3614.54-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0xf41  Stepping = 1
  
Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE
  Features2=0x659dSSE3,DTES64,MON,DS_CPL,EST,TM2,CNXT-ID,CX16,xTPR
  AMD Features=0x2010NX,LM
  TSC: P-state invariant
real memory  = 2147483648 (2048 MB)
avail memory = 2086187008 (1989 MB)
ACPI APIC Table: PTLTD  APIC  
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  6
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
ioapic2 Version 2.0 irqs 48-71 on motherboard
kbd1 at kbdmux0
acpi0: PTLTD   RSDT on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
cpu0: ACPI CPU on acpi0
cpu1: ACPI CPU on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pci0: unknown at device 0.1 (no driver attached)
pci0: base peripheral at device 1.0 (no driver attached)
pcib1: ACPI PCI-PCI bridge irq 16 at device 2.0 on pci0
pci1: ACPI PCI bus on pcib1
pcib2: ACPI PCI-PCI bridge at device 0.0 on pci1
pci2: ACPI PCI bus on pcib2
aac0: Adaptec SCSI RAID 2200S mem 0xdc00-0xdfff irq 24 at device 1.0 on pci2
aac0: Enable Raw I/O
aac0: New comm. interface enabled
aac0: [ITHREAD]
aac0: Adaptec 2200S, aac driver 2.1.9-1
aacp0: SCSI Passthrough Bus on aac0
aacp1: SCSI Passthrough Bus on aac0
pcib3: ACPI PCI-PCI bridge at device 0.2 on pci1
pci3: ACPI PCI bus on pcib3
em0: Intel(R) PRO/1000 Network Connection 6.9.25 port 0x2000-0x203f mem 0xd820-0xd821 irq 54 at device 2.0 on pci3
em0: [FILTER]
em0: Ethernet address: 00:30:48:2d:32:6a
em1: Intel(R) PRO/1000 Network Connection 6.9.25 port 0x2040-0x207f mem 0xd822-0xd823 irq 55 at device 2.1 on pci3
em1: [FILTER]
em1: Ethernet address: 00:30:48:2d:32:6b
pcib4: ACPI PCI-PCI bridge irq 16 at device 4.0 on pci0
pci4: ACPI PCI bus on pcib4
pcib5: ACPI PCI-PCI bridge irq 16 at device 6.0 on pci0
pci5: ACPI PCI bus on pcib5
uhci0: Intel 82801EB (ICH5) USB controller USB-A port 0x1400-0x141f irq 16 at device 29.0 on pci0
uhci0: [ITHREAD]
usbus0: Intel