Panic during -CURRENT buildworld

2001-05-28 Thread David Wolfskill

This is on a system (my laptop):
FreeBSD localhost 5.0-CURRENT FreeBSD 5.0-CURRENT #13: Sun May 27 23:44:24 PDT 2001
 [EMAIL PROTECTED]:/common/C/obj/usr/src/sys/LAPTOP_30W  i386 Mon May 28 
07:27:59 PDT 2001

Recent CVSup activity:
CVSup begin from cvsup14.freebsd.org at Sat May 26 03:47:01 PDT 2001
CVSup ended from cvsup14.freebsd.org at Sat May 26 03:52:48 PDT 2001
CVSup begin from cvsup14.freebsd.org at Sun May 27 03:47:01 PDT 2001
CVSup ended from cvsup14.freebsd.org at Sun May 27 03:53:36 PDT 2001
CVSup begin from cvsup14.freebsd.org at Mon May 28 03:47:00 PDT 2001
CVSup ended from cvsup14.freebsd.org at Mon May 28 03:53:51 PDT 2001

I had tried the buildworld within X (as had been my normal practice until
the recent difficulties with swap and/or VM), and the system re-booted
itself.  Got the well-discussed symptom of an active file system failing
fsck's check of primary vs. first alternate superblock, and after fsck
got finished with the file system, soft updates got turned off, so I turned
soft updates back on again.

Since I was in single-user mode anyhow, I mounted the necessary file
systems, issued a swapon -a, and proceeded to re-try the make
buildworld (& friends) -- from within script (again, as I usually do).

Here's a transcript of the first & last parts of the typescript file:

Script started on Mon May 28 01:21:19 2001
# mount && cd /usr/src && uname -a && date && make buildworld && date && make kernel 
KERNCONF=LAPTOP_30W && date && make installworld && date && mergemaster && date && 
sync && df -k
/dev/ad0s3a on / (ufs, local, soft-updates)
devfs on /dev (devfs, local)
/dev/ad0s3e on /usr (ufs, local, soft-updates)
/dev/ad0s3g on /var (ufs, local, soft-updates)
procfs on /proc (procfs, local)
/dev/ad0s3h on /common (ufs, local, soft-updates)
FreeBSD  5.0-CURRENT FreeBSD 5.0-CURRENT #13: Sun May 27 23:44:24 PDT 2001 
[EMAIL PROTECTED]:/common/C/obj/usr/src/sys/LAPTOP_30W  i386
Mon May 28 01:22:06 PDT 2001

--
 Rebuilding the temporary build tree
--
rm -rf /usr/obj/usr/src/i386

... [elided -- dhw]

cc -pg -O -pipe  -I. -I/usr/src/lib/libncurses -I/usr/src/lib/libncurses/../../c
ontrib/ncurses/ncurses -I/usr/src/lib/libncurses/../../contrib/ncurses/include -
Wall -DFREEBSD_NATIVE -DNDEBUG -DHAVE_CONFIG_H -DTERMIOS -I/usr/obj/usr/src/i386
/usr/include  -c /usr/src/lib/libncurses/../../contrib/ncurses/ncurses/tinfo/get
env_num^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

And a hand-transcription of the panic (starting with the last command
shown on the console from the make buildworld):

cc -fpic -DPIC -O -pipe -I. -I/usr/src/lib/libncurses 
-I/usr/src/lib/libncurses/../../contrib/ncurses/ncurses 
-I/usr/src/lib/libncurses/../../contrib/ncurses/include -Wall -DFREEBSD_NATIVE 
-DNDEBUG -DHAVE_CONFIG_H -DTERMIOS -I/usr/obj/usr/src/i386/usr/include -c lib_gen.c -o 
lib_gen.So
freeing uidinfo: uid=0, proccnt=33
kernel trap 12 with interrupts disabled
panic: blockable sleep lock (sleep mutex) Giant @/usr/src/sys/vm/vm_fault.c:213
Debugger(panic)
Stopped at      Debugger+0x44:  pushl %ebx
db> trace
Debugger(c03a499b) at Debugger+0x44
panic(c03a75e0,c03a3820,c03cc4b4,c03c1d9b,d5) at panic+0x70
witness_lock(c047dda0,8,c03c1d9b,d5) at witness_lock+0x1b2
vm_fault(c04692ac,deadc000,1,0,0) at vm_fault+0xb2
trap_pfault(ce7f4e34,0,deadc2af,ce7ffa60,c0e4259c) at trap_pfault+0x5d0
trap(ce7f0018,c01f0010,c01f0010,4,c0e4259c) at trap+0x5d0
calltrap() at calltrap+0x5
--- trap 0xc, eip=0xc01d8cb6, esp=0xce7f4e74, ebp=0xce7f4e80 ---
uihold(c0e42580,c1ca3a68,c03a4235,0,98) at uihold+0x5f
crdup(c0e3d600, ce7ffb7c,ce7ffa60,2,c0445a00) at crdup+0x4c
access(ce7ffa60,ce7f4f80,806b240,806f080,805d1ce) at access+0x18
syscall(2f,2f,2f,805d1ce,806f080) at syscall+0x71d
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
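
A side note on the trace above: the third argument shown for trap_pfault
(deadc2af) and the second argument shown for vm_fault (deadc000) look like
the same fault address before and after rounding down to a page boundary.
The sketch below only checks that arithmetic.  It assumes 4 KB pages on
i386 and that those two arguments really are the fault address; trunc_page
here is a local shell helper written for illustration, not the kernel macro.

```sh
#!/bin/sh
# Assumption: i386 with 4 KB pages, so the low 12 bits are the page offset.
PAGE_MASK=4095

# trunc_page: round an address down to the start of its page and print it.
trunc_page() {
    printf '0x%x\n' $(( $1 & ~$PAGE_MASK ))
}

# Address taken from the trap_pfault line of the hand-transcribed trace.
trunc_page 0xdeadc2af
```

If that reading is right, the kernel was following a pointer somewhere in
the 0xdeadc000 page, which is not a plausible kernel address.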


In addition to the kernel.old (dated a few hours earlier the same day), I
have a saved kernel from 16 May (which is the one from which I had booted
in order to build the one from 23:50 (PDT) on 27 May), so I could try that.
Or I could try some other things, if that might help identify the problem.
I have not (yet) tried any of the posted experimental patches against
anything involving file systems, soft updates, swap, or VM.  (I do have
a small patch for keyboard control of the sound mute function, as well
as some bits & pieces of some of Doug Ambrisko's sys/dev/an patches.
However, the Cisco/Aironet card wasn't inserted at any point during the
boot that did the panic.)

Help?

Thanks,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
As a computing professional, I believe it would be unethical for me to
advise, recommend, or support the use (save possibly for personal
amusement) of any product that is or depends on any Microsoft product.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Panic during -CURRENT buildworld

2001-05-28 Thread David Wolfskill

> Date: Mon, 28 May 2001 10:46:53 -0700 (PDT)
> From: David Wolfskill [EMAIL PROTECTED]
>
> This is on a system (my laptop):
> FreeBSD localhost 5.0-CURRENT FreeBSD 5.0-CURRENT #13: Sun May 27 23:44:24 PDT 2001
>  [EMAIL PROTECTED]:/common/C/obj/usr/src/sys/LAPTOP_30W  i386 Mon May 28 
> 07:27:59 PDT 2001
>
> Recent CVSup activity:
> CVSup begin from cvsup14.freebsd.org at Sat May 26 03:47:01 PDT 2001
> CVSup ended from cvsup14.freebsd.org at Sat May 26 03:52:48 PDT 2001
> CVSup begin from cvsup14.freebsd.org at Sun May 27 03:47:01 PDT 2001
> CVSup ended from cvsup14.freebsd.org at Sun May 27 03:53:36 PDT 2001
> CVSup begin from cvsup14.freebsd.org at Mon May 28 03:47:00 PDT 2001
> CVSup ended from cvsup14.freebsd.org at Mon May 28 03:53:51 PDT 2001
>
> I had tried the buildworld within X (as had been my normal practice until
> the recent difficulties with swap and/or VM), and the system re-booted
> itself.  Got the well-discussed symptom of an active file system failing
> fsck's check of primary vs. first alternate superblock, and after fsck
> got finished with the file system, soft updates got turned off, so I turned
> soft updates back on again.



I was able to do the buildworld (& friends) by booting a saved kernel
from 16 May into single-user mode, so I'm now running:

FreeBSD dhcp-133.catwhisker.org 5.0-CURRENT FreeBSD 5.0-CURRENT #14: Mon May 28 
09:56:14 PDT 2001 root@:/common/C/obj/usr/src/sys/LAPTOP_30W  i386

And while running that kernel (no further CVSups; no further source tree
mods), I was able to do a make buildworld while running X.

Seems like an improvement to me,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
As a computing professional, I believe it would be unethical for me to
advise, recommend, or support the use (save possibly for personal
amusement) of any product that is or depends on any Microsoft product.




Re: Panic during -CURRENT buildworld

2001-05-19 Thread Niels Chr. Bank-Pedersen

On Fri, May 18, 2001 at 09:25:59PM -0700, David Wolfskill wrote:
>
> I haven't seen anyone else reporting any problems similar to what I
> experienced, so I'm not about to claim there's something that's
> definitely broken

I have seen exactly the same - the machine (IBM thinkpad T21)
freezes during buildworld (or it appears to, but as you said,
it's hard to say if it panic'ed when you run X).
This problem appears to have been introduced sometime within the
last 3-4 days.  Only reason I've been silent is that I still
haven't had the time to get a trace.

> david

Cheers,
/Niels Chr.

-- 
 Niels Christian Bank-Pedersen, NCB1-RIPE.
 Network Manager, TDC, IP-section.

 Hey, are any of you guys out there actually *using* RFC 2549?




Re: Panic during -CURRENT buildworld

2001-05-19 Thread Clive Lin


Panic w/ softupdate disappears after I grab this revision of
ffs_softdep.c:

 ident /usr/src/sys/ufs/ffs/ffs_softdep.c
/usr/src/sys/ufs/ffs/ffs_softdep.c:
 $FreeBSD: src/sys/ufs/ffs/ffs_softdep.c,v 1.97 2001/05/19 19:24:26 mckusick Exp $
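
For anyone wanting to check which revision of the file they have, here is
a rough sketch that pulls the revision number out of an ident line like
the one above.  The helper names rev_of and has_fix are made up for this
example, and the check assumes a trunk revision of the form 1.N.  That
1.97 is the revision that cures the panic is only what is reported here,
not something established.

```sh
#!/bin/sh
# rev_of: read ident(1)-style output on stdin and print the RCS revision
# number found on the $FreeBSD$ line (hypothetical helper).
rev_of() {
    sed -n 's/.*\$FreeBSD: [^ ]*,v \([0-9.]*\) .*/\1/p'
}

# has_fix: succeed if trunk revision $1 (form 1.N) is at or past 1.97.
# Assumption: 1.97 is the first revision with the change in question.
has_fix() {
    minor=${1#1.}
    [ "$minor" -ge 97 ]
}

# Sample input copied from the message above; on a real system one would
# pipe "ident /usr/src/sys/ufs/ffs/ffs_softdep.c" into rev_of instead.
rev=$(printf '%s\n' ' $FreeBSD: src/sys/ufs/ffs/ffs_softdep.c,v 1.97 2001/05/19 19:24:26 mckusick Exp $' | rev_of)
if has_fix "$rev"; then
    echo "ffs_softdep.c is at $rev (1.97 or later)"
else
    echo "ffs_softdep.c is at $rev (older than 1.97)"
fi
```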

Now it's fairly smooth to buildworld, installworld, copy many small
files between different slice/media/network (Okay, samba :D) for me.

-- 
Clive Lin (Tong-I Lin)
=P [EMAIL PROTECTED] # Family, friends, private affairs
=F [EMAIL PROTECTED] # Chinese ports, documentation
=O [EMAIL PROTECTED] # Others
=J.* # What do you think about the 'J' ?




Panic during -CURRENT buildworld

2001-05-18 Thread David Wolfskill

[This is *really* long.  Sorry.  dhw]

Running:
FreeBSD dhcp-140.catwhisker.org 5.0-CURRENT FreeBSD 5.0-CURRENT #1: Thu May 17 
09:13:03 PDT 2001 
[EMAIL PROTECTED]:/common/C/obj/usr/src/sys/LAPTOP_30W  i386

(The #1 sequence number is probably misleading; I'll go into that below.
It was around #60 or so before the events discussed there.)

Last few CVSups:
CVSup begin from cvsup14.freebsd.org at Tue May 15 03:47:00 PDT 2001
CVSup ended from cvsup14.freebsd.org at Tue May 15 03:52:15 PDT 2001
CVSup begin from cvsup14.freebsd.org at Wed May 16 03:47:01 PDT 2001
CVSup ended from cvsup14.freebsd.org at Wed May 16 03:52:26 PDT 2001
CVSup begin from cvsup14.freebsd.org at Wed May 16 07:57:56 PDT 2001
CVSup ended from cvsup14.freebsd.org at Wed May 16 08:04:15 PDT 2001
CVSup begin from cvsup14.freebsd.org at Thu May 17 03:47:00 PDT 2001
CVSup ended from cvsup14.freebsd.org at Thu May 17 03:52:58 PDT 2001
CVSup begin from cvsup14.freebsd.org at Fri May 18 03:47:01 PDT 2001
CVSup ended from cvsup14.freebsd.org at Fri May 18 03:52:34 PDT 2001

(Wednesday wasn't the best of days for me)

As some folks may recall, I've been tracking -STABLE & -CURRENT on this
machine (my laptop) since early March.  I've encountered various forms
of challenges in that, but today's the first time I got a panic during
the make buildworld.

And this was on the 3rd attempt today.  :-(  (The other 2 were done --
as usual for me -- within an X environment, so I could more easily monitor
the progress... or so I had imagined.  Each of those locked up; I
suspect, in hindsight, that they may also have been panics.)  This last
was one I did in single-user mode.  (I had already built & booted
today's -STABLE.)

FWIW, the system appears to have gotten much further while in
single-user mode -- it was in stage 4: building libraries; here are my
(hand-transcribed) notes:

cc -fpic -DPIC  ...  lib_addch.So
cc -fpic -DPIC  ...  lib_addstr.So

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xdeadc0e8
fault code              = supervisor read, page not present
instruction pointer     = 0x880 0xc0314938
   [This character may be wrong--^  sorry.  :-(  Could be 6, b, or 8.]
stack pointer           = 0x10: 0xccd8be7c
frame pointer           = 0x10: 0xccd8be7c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL=0
current process         = 16 (irq14: ata0)
kernel: type 12 trap, code=0
Stopped at      worklist_remove+0x1c:   cmpw $0, 0xa(%ecx)
db> trace
worklist_remove(deadc0de) at worklist_remove+0x1c
free_diradd(deadc0de) at free_diradd+0x26
free_newdirblk(c16a0980) at free_newdirblk+0x2e
handle_written_inodeblock(c18a0280, c688ff6c) at handle_written_inodeblock+0x2b2
softdep_disk_write_complete(c688ff6c) at softdep_disk_write_complete+0x6a
bufdone(c688ff6c, ccd8bf50, c0134df6, c688ff6c, c167c200) at bufdone+0x109
bufdonebio(c688ff6c) at bufdonebio+0xe
ad_interrupt(c1be4700, c016131c0, ccd8bf7c, c01c1db7, c167c200) at ad_interrupt+0x3ce
ata_intr(c167c200) at ata_intr+0xae
ithread_loop(c167c180, ccd8bfa8) at ithread_loop+0x413
fork_exit(c01c19a4, c167c180, ccd8bfa8) at fork_exit+0xb4
fork_trampoline() at fork_trampoline+0x8
db>
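
One detail in the notes above can be checked with plain arithmetic: the
faulting instruction reads at offset 0xa from %ecx, and the fault address
differs from the argument shown for worklist_remove() by exactly that
amount.  The sketch below just does the subtraction.  Reading 0xdeadc0de
as a "freed memory" fill pattern, and therefore this panic as a use of an
already-freed structure, is an assumption and not something the trace
proves.

```sh
#!/bin/sh
fault_va=0xdeadc0e8   # "fault virtual address" from the trap message
ptr=0xdeadc0de        # argument shown for worklist_remove() in the trace

# Offset between the faulting address and the pointer that was passed in.
printf '0x%x\n' $(( $fault_va - $ptr ))
```

The result, 0xa, matches the displacement in "cmpw $0, 0xa(%ecx)", which
suggests %ecx held 0xdeadc0de when the fault happened.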

Each of the first 2 times the make buildworld didn't complete, I ended
up power-cycling the machine.  fsck wasn't especially happy about this,
and I did take the opportunity to note that each time the -CURRENT fsck 
wanted a hand-invoked fsck, the following effects were noted:

* It complained that the primary superblock had some (presumably
  unexpected) discrepancy with the first alternate (at block 32).
  Each time, I told it to use the alternate, and to update the primary
  superblock.

* After finishing up, the soft updates flag was no longer turned on,
  so I ran tunefs -n enable while I was (still) in single-user mode.


As to the events alluded to above:  Wednesday, while trying to figure
out some breakage, I quoted the $FreeBSD$ line of a Makefile to a
correspondent (Ruslan Ermilov) who was kind enough to try to help me 
figure things out.  He noticed that the revision level on the Makefile
seemed too low; off by one, actually.  But cvs log Makefile in the
directory in question showed that the Makefile itself was current; it
was the $FreeBSD$ tag that was incorrect.

It turns out that when I had set up my CVS repository, I placed it in
/cvs, but put the FreeBSD part of it in /cvs/freebsd (assuming that I
might want to place some non-FreeBSD-specific things in there at some
point).  And I set up my (default) $CVSROOT environment variable to
/cvs.  So far, so good.

This was (as mentioned) back in early March, so there are some aspects
of this that I don't quite recall.  But when I created my -CURRENT
working directory for /usr/src, I must have done something that did 

Re: Panic during -CURRENT buildworld

2001-05-18 Thread Szilveszter Adam

Hello everybody,

attention! pure speculation and unprofessional comments follow!

These problems with the alternate superblock remind me... there were
reports about the same when fsck had problems some time ago.

But there was a common theme to all of them: The fsck raves were a whole
lot more severe if there were softupdates enabled. I for example have been
running -CURRENT for quite a while here on a daily basis (although I have
never tried to use the same file systems concurrently with -STABLE) doing
buildworlds, Mozilla bi-daily builds and other fun stuff, yet have not seen
any problems of this kind. Even when I had a crash, a manual fsck (for
safety's sake) always fixed things as it should, and never complained. And
this, although I had some very unfortunate crashes, when e.g. I crashed from
X in the middle of a Mozilla build, but even so there were almost no file
structures damaged or at least none noted by fsck. But I have never enabled
soft updates on any fs of mine, which of course means that some operations
require all the time in the world to complete, but it seems to be safer
somehow. I am probably just lucky and totally wrong, but just
speculating...

-- 
Regards:

Szilveszter ADAM
Szeged University
Szeged Hungary




Re: Panic during -CURRENT buildworld

2001-05-18 Thread David Wolfskill

> Date: Fri, 18 May 2001 10:48:19 -0700 (PDT)
> From: David Wolfskill [EMAIL PROTECTED]

[Excruciatingly long narrative of panic during buildworld for today's
-CURRENT elided; it's in the archives.  dhw]

Reporting back after getting today's -CURRENT built:
FreeBSD dhcp-140.catwhisker.org 5.0-CURRENT FreeBSD 5.0-CURRENT #2: Fri May 18 
11:31:46 PDT 2001 root@:/common/C/obj/usr/src/sys/LAPTOP_30W  i386

Taking a cue from Szilveszter Adam's response to my note, I booted into
single-user mode, turned off soft updates for the file systems on ad0s3
(all of which get used during the buildworld/installworld process, since
/usr/obj is a symlink to somewhere in /common -- df listing below for
reference), and I unmounted the file systems on ad0s1 & ad0s2.

I then did the make buildworld/kernel/installworld & mergemaster while
remaining in single-user mode, running yesterday's -CURRENT (but, as
noted, with soft updates turned off for all mounted file systems).

As noted, it appears to have completed successfully.  I have turned
soft updates back on for everything.  Tomorrow's build may prove
interesting.
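
For reference, the soft updates toggling described above can be sketched
as below.  The device list is an assumption taken from the mount output
posted elsewhere in this thread, and the function only prints the tunefs
commands rather than running them, since tunefs wants the file system
unmounted or mounted read-only (as in single-user mode).  softdep_cmds is
a made-up helper name.

```sh
#!/bin/sh
# Assumption: the ad0s3 file systems from the mount listing in this thread.
DEVICES="/dev/ad0s3a /dev/ad0s3e /dev/ad0s3g /dev/ad0s3h"

# softdep_cmds: print (not run) one tunefs command per device.
# $1 is "enable" or "disable".
softdep_cmds() {
    state=$1
    for dev in $DEVICES; do
        echo "tunefs -n $state $dev"
    done
}

# Before the build: turn soft updates off everywhere.
softdep_cmds disable
# After a successful build: turn them back on.
softdep_cmds enable
```

Printing the commands first makes it easy to review the list before
pasting it into a single-user shell.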

Here's what df -k looks like (while I'm running -CURRENT):

Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
/dev/ad0s3a     95263   73782   13860    84%    /
devfs               1       1       0   100%    /dev
/dev/ad0s1a     95263   40812   46830    47%    /S1
/dev/ad0s1e    915695  777472   64968    92%    /S1/usr
/dev/ad0s2a     95263   41030   46612    47%    /S2
/dev/ad0s2e    915727  776108   66361    92%    /S2/usr
/dev/ad0s3e    915727  733431  109038    87%    /usr
/dev/ad0s3g    254063  107557  126181    46%    /var
/dev/ad0s3h  14116697 4180188 8807174    32%    /common
procfs              4       4       0   100%    /proc
/dev/md10c     520140      16  478516     0%    /tmp

I haven't seen anyone else reporting any problems similar to what I
experienced, so I'm not about to claim there's something that's
definitely broken

Cheers,
david
-- 
David H. Wolfskill  [EMAIL PROTECTED]
As a computing professional, I believe it would be unethical for me to
advise, recommend, or support the use (save possibly for personal
amusement) of any product that is or depends on any Microsoft product.
