Re: panic: found dirty cache page 0xf046f1c0

1999-01-24 Thread Wilko Bulte
As Matthew Dillon wrote...
 I've committed one bug fix to the 'found dirty cache page' bug --
 turns out vm_map_split() was the culprit, renaming pages
 without removing them from PQ_CACHE.  The bug was introduced
 in -3.0, and hit the KASSERT() I put in -4.x.
 
 I've committed a general inlining of 'changing the page dirty
 status to VM_PAGE_BITS_ALL' and put a sanity check in the inline.
 If this problem occurs again, you will get a different panic.
 One of:
 
   vm_page_dirty: page in cache!
   vm_page_busy: page already busy!!!
   vm_page_wakeup: page not busy!!!
 
 If your box drops into DDB, please get a backtrace and report
 it to the list or to me and we should be able to track down
 any remaining dirty-pages-in-PQ_CACHE bugs.

FYI: a buildworld of -current including the above on FreeBSD/axp completed
without any incidents.

Wilko
_ __
 |   / o / /  _  Bulteemail: wi...@yedi.iaf.nl 
 |/|/ / / /( (_) Arnhem, The Netherlands  WWW  : http://www.tcja.nl
__ Powered by FreeBSD __

To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-current in the body of the message


Re: panic: found dirty cache page 0xf046f1c0

1999-01-24 Thread Matthew Dillon
:FYI: a buildworld of -current including the above on FreeBSD/axp completed
:without any incidents.
:
:Wilko
:...

:... ( other reports )

We are looking good, I've got half a dozen positive reports!

On general principles, I think it is possible to make the FreeBSD
VM system bulletproof.  The problem is that there are lots of odd
exceptions and special rules that haven't been black-boxed or even 
documented ( other than being in John's head, which isn't all that
useful to me ).  The rules tend to be laid out in code on each 
occurrence, which inevitably leads to mistakes.  The mistakes 
are further compounded by a severe lack of enforcement ( KASSERT()s )
and thus propagate from release to release, building up as time passes.

With appropriate black boxing, documentation, and enforcement, it
should be fairly easy to shorten the development cycle on finding
the bugs.  __inline procedures are a godsend because there are literally
a hundred places in the code where someone 'optimized' it by doing
a manual expansion of something from some other module in order to avoid 
a subroutine call.  This cross-module pollination tends to make
things even less readable.  Bleh.

So, for example, a few commits ago I added enforcement of the no-dirty-
pages-on-cache-queue rule and systems started to panic.  That enforcement
had to be extended to every dirtying of a page before we actually found
the bug ( which turned out to be a -3.x bug ).  More recently I have
added enforcement for PG_BUSY state changes to disallow the busying of
an already-busy page, and unbusying of a non-busy page.
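
To make the enforcement idea concrete, here is a minimal userland sketch
in C.  It is not the kernel code: every sketch_* name, the queue enum and
the 0xff mask are hypothetical stand-ins, and violations are recorded
rather than panicking so the behaviour can be checked.  The point is only
that routing each state change through one inline lets the rule be
enforced in one place:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT the kernel's definitions. */
enum sketch_queue { SQ_NONE, SQ_ACTIVE, SQ_INACTIVE, SQ_CACHE };

struct sketch_page {
	enum sketch_queue queue;	/* which paging queue the page sits on */
	bool busy;			/* models the PG_BUSY state */
	unsigned dirty;			/* bitmask of dirty blocks in the page */
};

#define SKETCH_BITS_ALL 0xffu		/* assumed "whole page dirty" mask */

/* First rule violation seen; a real kernel would panic() instead. */
static const char *sketch_violation = NULL;

static void
sketch_check(bool ok, const char *msg)
{
	if (!ok && sketch_violation == NULL)
		sketch_violation = msg;
}

/* Every dirtying goes through one place, so the rule is checked once. */
static inline void
sketch_page_dirty(struct sketch_page *m)
{
	assert(m != NULL);
	sketch_check(m->queue != SQ_CACHE, "page_dirty: page in cache!");
	m->dirty = SKETCH_BITS_ALL;
}

/* Busying an already-busy page is a violation. */
static inline void
sketch_page_busy(struct sketch_page *m)
{
	assert(m != NULL);
	sketch_check(!m->busy, "page_busy: page already busy!!!");
	m->busy = true;
}

/* Unbusying a page that is not busy is a violation. */
static inline void
sketch_page_wakeup(struct sketch_page *m)
{
	assert(m != NULL);
	sketch_check(m->busy, "page_wakeup: page not busy!!!");
	m->busy = false;
}
```

With a page on an active queue, busy / dirty / wakeup in that order
records nothing; dirtying the same page after moving it to SQ_CACHE sets
sketch_violation.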

In discussions with John, there are a number of other rules that have
been broken and need to be fixed.  Pages on PQ_CACHE are supposed to be 
unqueued prior to being busied, held, or wired, for example, but the
rule is pretty much ignored and a lot of code was hacked in to check for
and requeue ( to another queue) the busy-page-on-cache case.

Entry conditions, exit conditions, and side effects for procedures are
mostly undocumented.  biodone() sequencing is not well documented, and
struct buf's have a 'kitchen sink' mentality from being hacked up so much.

There are currently too many NFS-specific exceptions strewn all over
the code.

It all works, but it is also a mess.

-Matt

Matthew Dillon 
dil...@backplane.com




Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Matthew Dillon
:It's definately happening still, sorry. :-(  I recompiled a 100% static 
:kernel and have had three more explosions, usually after starting exmh.  
:(exmh takes 10 to 15MB of ram on this system due to my mailbox folder 
:sizes).
:
:However, a clue..  The SMP box that is doing fine is a P6, an NFS client
:and server (loading nfs.ko, it fsck's fast, so I use that box for making
:sure the modules work).  The one that is crashing, is a P5, an NFS client
:and server (static kernel), and with a MFS /tmp.  Both run softupdates (up
:to date src/contrib/sys).
:
:I suspect MFS is the key.  There's the new VOP_FREEBLKS() stuff you added, 
:and the corresponding calls to madvise to free the pages.
:
:Given madvise()'s murky history, I can't help but feel suspicious about it.
:
:I've unmounted /tmp and am about to thrash the machine.  At the 
:moment, it's sitting on:  Swap: 120M Total, 376K Used, 120M Free
:
:Cheers,
:-Peter

Hmmm.  It's possible.  A quick look at the exmh source indicates that
it uses /tmp a lot.  I've been doing make buildworld's with a 300MB
MFS /usr/obj, but those are typically nothing more than simple file
creates, reads, and writes.  Presumably exmh is doing something more
sophisticated.

Try changing the panic in vm/vm_page.c to a printf(), i.e. change

    if (m->dirty)
        panic("found dirty cache page %p", m);

to

    if (m->dirty)
        printf(
            "found dirty cache page %p (%p,%d,%x) obtype %d obflags %x",
            m,
            m->object,
            (int)m->pindex,
            (int)m->flags,
            (int)m->object->type,
            (int)m->object->flags
        );

Let's see what we get.  This should tell me what kind 
of object the page is attached to and the flags of the
page and object.

-Matt
Matthew Dillon 
dil...@backplane.com



Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Peter Wemm
Matthew Dillon wrote:
 :It's definately happening still, sorry. :-(  I recompiled a 100% static 
 :kernel and have had three more explosions, usually after starting exmh.  
 :(exmh takes 10 to 15MB of ram on this system due to my mailbox folder 
 :sizes).
 :
 :However, a clue..  The SMP box that is doing fine is a P6, an NFS client
 :and server (loading nfs.ko, it fsck's fast, so I use that box for making
 :sure the modules work).  The one that is crashing, is a P5, an NFS client
 :and server (static kernel), and with a MFS /tmp.  Both run softupdates (up
 :to date src/contrib/sys).
 :
 :I suspect MFS is the key.  There's the new VOP_FREEBLKS() stuff you added, 
 :and the corresponding calls to madvise to free the pages.
 :
 :Given madvise()'s murky history, I can't help but feel suspicious about it.
 :
 :I've unmounted /tmp and am about to thrash the machine.  At the 
 :moment, it's sitting on:  Swap: 120M Total, 376K Used, 120M Free
 :
 :Cheers,
 :-Peter
 
 Hmmm.  It's possible.  A quick look at the exmh source indicates that
 it uses /tmp a lot.  I've been doing make buildworld's with a 300MB
 MFS /usr/obj, but those are typically nothing more then simple file
 creates, reads, and writes.  Presumably exmh is doing something more
 sophisticated.

I've since disabled MFS, compiled out a couple of other things I'm not 
using very often and generally cleaned things up.  I've had three more 
panics since turning off MFS, so that wasn't it. :-(

Anyway, I've just recompiled without SMP.  There were some very strange 
things happening on the serial console again that I really do not like the 
look of.  Console output has been disappearing and getting duplicated.

 Try changing the panic in vm/vm_page.c to a printf() ( 

I'll do that.

FWIW, this has happened while the system has been nearly quiescent all the 
way through to being thrashed with parallel cvs updates etc running.  Most 
times it waits till exmh is running.  Last time (when recompiling without 
SMP) it crashed when it came to linking the kernel (and no exmh running).

I'll see if it still crashes in uniprocessor mode, if so, I'll put some 
debugging in and see if I can find anything out.  The kernel was last 
built on Jan 16, and that one works fine still, so I'm pretty sure it 
isn't hardware.

Cheers,
-Peter






Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Doug Rabson
On Sat, 23 Jan 1999, Peter Wemm wrote:

 Matthew Dillon wrote:
  :It's definately happening still, sorry. :-(  I recompiled a 100% static 
  :kernel and have had three more explosions, usually after starting exmh.  
  :(exmh takes 10 to 15MB of ram on this system due to my mailbox folder 
  :sizes).
  :
  :However, a clue..  The SMP box that is doing fine is a P6, an NFS client
  :and server (loading nfs.ko, it fsck's fast, so I use that box for making
  :sure the modules work).  The one that is crashing, is a P5, an NFS client
  :and server (static kernel), and with a MFS /tmp.  Both run softupdates (up
  :to date src/contrib/sys).
  :
  :I suspect MFS is the key.  There's the new VOP_FREEBLKS() stuff you added, 
  :and the corresponding calls to madvise to free the pages.
  :
  :Given madvise()'s murky history, I can't help but feel suspicious about it.
  :
  :I've unmounted /tmp and am about to thrash the machine.  At the 
  :moment, it's sitting on:  Swap: 120M Total, 376K Used, 120M Free
  :
  :Cheers,
  :-Peter
  
  Hmmm.  It's possible.  A quick look at the exmh source indicates that
  it uses /tmp a lot.  I've been doing make buildworld's with a 300MB
  MFS /usr/obj, but those are typically nothing more then simple file
  creates, reads, and writes.  Presumably exmh is doing something more
  sophisticated.
 
 I've since disabled MFS, compiled out a couple of other things I'm not 
 using very often and generally cleaned things up.  I've had three more 
 panics since turning off MFS, so that wasn't it. :-(
 
 Anyway, I've just recompiled without SMP.  There were some very strange 
 things happening on the serial console again that I really do not like the 
 look of.  Console output has been disappearing and getting duplicated.
 
  Try changing the panic in vm/vm_page.c to a printf() ( 
 
 I'll do that.
 
 FWIW, this has happened while the system has been nearly quiescent all the 
 way through to being thrashed with parallel cvs updates etc running.  Most 
 times it waits till exmh is running.  Last time (when recompiling without 
 SMP) it crashed when it came to linking the kernel (and no exmh running).
 
 I'll see if it still crashes in uniprocessor mode, if so, I'll put some 
 debugging in and see if I can find anything out.  The kernel was last 
 built on Jan 16, and that one works fine still, so I'm pretty sure it 
 isn't hardware.

I just had one of these on one of my alphas.  The machine is UP
(obviously), no MFS, no dynamically loaded stuff.  It was doing an
installworld with NFSv3 mounted source, local obj.  All filesystems were
using softupdates.

--
Doug Rabson Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.  Phone: +44 181 442 9037





Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Doug Rabson
On Sat, 23 Jan 1999, Doug Rabson wrote:

 I just had one of these on one of my alphas.  The machine is UP
 (obviously), no MFS, no dynamically loaded stuff.  It was doing an
 installworld with NFSv3 mounted source, local obj.  All filesystems were
 using softupdates.

I made it happen again by doing the same installworld but this time I
caught it in the debugger.  I'll leave the machine up for a while in case
someone has some idea of how to debug it.  The stacktrace looks like this:

#0  Debugger () at ../../alpha/alpha/db_interface.c:260
#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791
#4  0xfc3a13d4 in getblk () at ../../kern/vfs_bio.c:1572
#5  0xfc46a150 in ffs_balloc () at ../../ufs/ffs/ffs_balloc.c:170
#6  0xfc4772dc in ffs_write () at vnode_if.h:1015
#7  0xfc3b3c00 in vn_write () at vnode_if.h:331
#8  0xfc37f72c in write () at ../../kern/sys_generic.c:270
#9  0xfc4b0a4c in syscall () at ../../alpha/alpha/trap.c:620
#10 0xfc4a416c in XentSys () at ../../alpha/alpha/exception.s:127


--
Doug Rabson Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.  Phone: +44 181 442 9037





Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Matthew Dillon

:I made it happen again by doing the same installworld but this time I
:caught it in the debugger.  I'll leave the machine up for a while in case
:someone has some idea of how to debug it.  The stacktrace looks like this:
:
:#0  Debugger () at ../../alpha/alpha/db_interface.c:260
:#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
:#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
:#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791

The panic message should be printing the address of the vm_page_t that
it caught.

From the debugger, dump that vm_page_t with 'print'.

print *0xADDRESS

Do about 8 print's bumping the address by 4 ( in hex ) for each.

It would be even better if we could figure out the contents and type
of the underlying object.

-Matt
Matthew Dillon 
dil...@backplane.com

:--
:Doug RabsonMail:  d...@nlsystems.com
:Nonlinear Systems Ltd. Phone: +44 181 442 9037




Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Doug Rabson
On Sat, 23 Jan 1999, Matthew Dillon wrote:

 
 :I made it happen again by doing the same installworld but this time I
 :caught it in the debugger.  I'll leave the machine up for a while in case
 :someone has some idea of how to debug it.  The stacktrace looks like this:
 :
 :#0  Debugger () at ../../alpha/alpha/db_interface.c:260
 :#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
 :#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
 :#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791
 
 The panic message should be printing the address of the vm_page_t that
 it caught.
 
 From the debugger, dump that vm_page_t with 'print'.
 
 print *0xADDRESS
 
 Do about 8 print's bumping the address by 4 ( in hex ) for each.
 
 It would be even better if we could figure out the contents and type
 of the underlying object.

I have full symbols:

(gdb) fr 2
#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
1041            panic("found dirty cache page %p", m);
(gdb) l
1036     */
1037
1038    if (qtype == PQ_CACHE) {
1039    #if !defined(MAX_PERF)
1040        if (m->dirty)
1041            panic("found dirty cache page %p", m);
1042
1043    #endif
1044        vm_page_busy(m);
1045        vm_page_protect(m, VM_PROT_NONE);
(gdb) p m
$4 = (struct vm_page *) 0xfe108f40
(gdb) p *m
$5 = {pageq = {tqe_next = 0x0, tqe_prev = 0xfc52ecc8}, hnext = 0x0, 
  listq = {tqe_next = 0xfe090fe0, tqe_prev = 0xfe0bb6b8}, 
  object = 0xfe00050e2a10, pindex = 12, phys_addr = 88940544, queue = 172, 
  flags = 128, pc = 41, wire_count = 0, hold_count = 0, act_count = 5 '\005', 
  busy = 0 '\000', valid = 65535, dirty = 65535}
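
A side note on reading the dump: valid = 65535 and dirty = 65535 are both
0xffff.  Assuming the dirty field keeps one bit per 512-byte block and
the page is 8 KB (both assumptions on my part, not something the dump
itself states), that is the "every block dirty" value, i.e. what setting
the page to VM_PAGE_BITS_ALL would produce.  A small sketch of the
arithmetic in C, with hypothetical names:

```c
#include <assert.h>

/* Assumed sizes for illustration: 8 KB pages (Alpha) and 512-byte blocks. */
#define SKETCH_PAGE_SIZE	8192u
#define SKETCH_BLOCK_SIZE	512u

/* One bit per block; a mask with every bit set means "the whole page". */
static unsigned
sketch_all_bits(unsigned page_size, unsigned block_size)
{
	unsigned nbits;

	assert(block_size != 0 && page_size % block_size == 0);
	nbits = page_size / block_size;
	assert(nbits > 0 && nbits < 32);	/* keep the shift well defined */
	return (1u << nbits) - 1u;
}
```

Under those assumptions sketch_all_bits(SKETCH_PAGE_SIZE,
SKETCH_BLOCK_SIZE) is 0xffff, matching the dump; with 4 KB pages the
same formula gives 0xff.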


--
Doug Rabson Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.  Phone: +44 181 442 9037





Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Peter Wemm
Doug Rabson wrote:
 On Sat, 23 Jan 1999, Matthew Dillon wrote:
 
  
  :I made it happen again by doing the same installworld but this time I
  :caught it in the debugger.  I'll leave the machine up for a while in case
  :someone has some idea of how to debug it.  The stacktrace looks like this:
  :
  :#0  Debugger () at ../../alpha/alpha/db_interface.c:260
  :#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
  :#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
  :#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791
  
  The panic message should be printing the address of the vm_page_t that
  it caught.
  
  From the debugger, dump that vm_page_t with 'print'.
  
  print *0xADDRESS
  
  Do about 8 print's bumping the address by 4 ( in hex ) for each.
  
  It would be even better if we could figure out the contents and type
  of the underlying object.
 
 I have full symbols:
 
 (gdb) fr 2
 #2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
 1041            panic("found dirty cache page %p", m);
 (gdb) l
 1036     */
 1037
 1038    if (qtype == PQ_CACHE) {
 1039    #if !defined(MAX_PERF)
 1040        if (m->dirty)
 1041            panic("found dirty cache page %p", m);
 1042
 1043    #endif
 1044        vm_page_busy(m);
 1045        vm_page_protect(m, VM_PROT_NONE);
 (gdb) p m
 $4 = (struct vm_page *) 0xfe108f40
 (gdb) p *m
 $5 = {pageq = {tqe_next = 0x0, tqe_prev = 0xfc52ecc8}, hnext = 0x0, 
   listq = {tqe_next = 0xfe090fe0, tqe_prev = 0xfe0bb6b8}, 
   object = 0xfe00050e2a10, pindex = 12, phys_addr = 88940544, queue = 172, 
   flags = 128, pc = 41, wire_count = 0, hold_count = 0, act_count = 5 '\005', 
   busy = 0 '\000', valid = 65535, dirty = 65535}

 --

Doug, Matt wanted some things from m->object too..  If it's still there 
can you grab it?

printf(
    "found dirty cache page %p (%p,%d,%x) obtype %d obflags %x", 
    m,
    m->object,
    (int)m->pindex,
    (int)m->flags,
    (int)m->object->type,
    (int)m->object->flags
);

BTW; in vm_map.c:
/*
 * vm_map_clean
 * 
 * Push any dirty cached pages in the address range to their pager.
 * If syncio is TRUE, dirty pages are written synchronously.
 * If invalidate is TRUE, any cached pages are freed as well.
 *
 * Returns an error if any part of the specified range is not mapped.
 */
This kinda suggests that dirty cached pages might not be all that 
unusual..  but the code in question seems to be working at a different 
level.

Cheers,
-Peter
--
Peter Wemm pe...@netplex.com.au   Netplex Consulting
No coffee, No workee! :-)





Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Peter Wemm
Peter Wemm wrote:
 Matthew Dillon wrote:
[..]
  Try changing the panic in vm/vm_page.c to a printf() ( 
 
 I'll do that.

BTW; what are the dangers of this?  lost disk writes or corruption?  Can 
we (as a workaround) push the page that we found back onto a dirty queue 
and try again after some diagnostics?

 FWIW, this has happened while the system has been nearly quiescent all the 
 way through to being thrashed with parallel cvs updates etc running.  Most 
 times it waits till exmh is running.  Last time (when recompiling without 
 SMP) it crashed when it came to linking the kernel (and no exmh running).
 
 I'll see if it still crashes in uniprocessor mode, if so, I'll put some 
 debugging in and see if I can find anything out.  The kernel was last 
 built on Jan 16, and that one works fine still, so I'm pretty sure it 
 isn't hardware.

It crashed in uniprocessor mode about 60 seconds after sending this mail. 
It's got a really trimmed down kernel config and no modules loaded or in 
use.  I have not disabled softupdates yet, that's next.

This particular machine won't reboot by itself after it's been running in 
SMP mode (it's really old), so I have to manually reset it.  I went to 
sleep straight after that, and it ran the whole time I was asleep.  After 
getting up again, I started exmh, and it crashed 30 seconds later.  There 
was no swapping in progress; I have been running top -s1 to see what the 
swap and memory state is when it dies.  Unfortunately I lost the last one, 
but it generally looks like this:

last pid:  6293;  load averages:  0.51,  0.52,  0.65    up 0+01:40:54  14:19:06
40 processes:  1 running, 39 sleeping
CPU states:  4.6% user,  0.0% nice, 11.8% system,  1.5% interrupt, 82.1% idle
Mem: 19M Active, 9236K Inact, 13M Wired, 3068K Cache, 4691K Buf, 508K Free
Swap: 120M Total, 128K Used, 120M Free

This machine has 48M of ram, one swap partition only.

Oh, one other thing that occurred to me..  Under 4.0-current, I regularly 
(ie: within 30 seconds of boot) get if_de transmitter underflows.  My 
console corruption was happening at the instant that de0 was being 
configured with ifconfig.  exmh is running to a remote display over that 
de0 interface.

Under Jan 16 3.0-current, I do not get that transmitter underflow..

The only thing I can think of about if_de that's unusual and VM related
(apart from the complexity of the code) is that it uses contigmalloc().  I 
wonder if this is somehow setting the scene for the later failures?  It's 
certainly suspicious that it has done strange things when being ifconfig'ed, 
including things like trashing the serial console on no less than a dozen 
occasions.

Cheers,
-Peter





Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Doug Rabson
On Sun, 24 Jan 1999, Peter Wemm wrote:

 Doug Rabson wrote:
  On Sat, 23 Jan 1999, Matthew Dillon wrote:
  
   
   :I made it happen again by doing the same installworld but this time I
   :caught it in the debugger.  I'll leave the machine up for a while in case
   :someone has some idea of how to debug it.  The stacktrace looks like 
   this:
   :
   :#0  Debugger () at ../../alpha/alpha/db_interface.c:260
   :#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
   :#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
   :#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791
   
   The panic message should be printing the address of the vm_page_t that
   it caught.
   
   From the debugger, dump that vm_page_t with 'print'.
   
   print *0xADDRESS
   
   Do about 8 print's bumping the address by 4 ( in hex ) for each.
   
   It would be even better if we could figure out the contents and type
   of the underlying object.
  
  I have full symbols:
  
  (gdb) fr 2
  #2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
  1041            panic("found dirty cache page %p", m);
  (gdb) l
  1036     */
  1037
  1038    if (qtype == PQ_CACHE) {
  1039    #if !defined(MAX_PERF)
  1040        if (m->dirty)
  1041            panic("found dirty cache page %p", m);
  1042
  1043    #endif
  1044        vm_page_busy(m);
  1045        vm_page_protect(m, VM_PROT_NONE);
  (gdb) p m
  $4 = (struct vm_page *) 0xfe108f40
  (gdb) p *m
  $5 = {pageq = {tqe_next = 0x0, tqe_prev = 0xfc52ecc8}, hnext = 0x0, 
    listq = {tqe_next = 0xfe090fe0, tqe_prev = 0xfe0bb6b8}, 
    object = 0xfe00050e2a10, pindex = 12, phys_addr = 88940544, queue = 172, 
    flags = 128, pc = 41, wire_count = 0, hold_count = 0, act_count = 5 '\005', 
    busy = 0 '\000', valid = 65535, dirty = 65535}
 
  --
 
 Doug, Matt wanted some things from m->object too..  If it's still there 
 can you grab it?
 
 printf(
     "found dirty cache page %p (%p,%d,%x) obtype %d obflags %x", 
     m,
     m->object,
     (int)m->pindex,
     (int)m->flags,
     (int)m->object->type,
     (int)m->object->flags
 );

He sent me private mail asking for m->object which I replied to.  Here is
*m->object:

$6 = {object_list = {tqe_next = 0xfe0005369870, 
tqe_prev = 0xfe000527e0b8}, shadow_head = {tqh_first = 0x0, 
tqh_last = 0xfe00050e2a20}, shadow_list = {tqe_next = 0x0, 
tqe_prev = 0xfe00052d2020}, memq = {tqh_first = 0xfe0c8f80, 
tqh_last = 0xfe115a78}, generation = 897, type = OBJT_DEFAULT, 
  size = 23, ref_count = 1, shadow_count = 0, pg_color = 4, 
  hash_rand = -15145890, flags = 8192, paging_in_progress = 0, behavior = 0, 
  resident_page_count = 15, cache_count = 15, wire_count = 0, 
  backing_object = 0x0, backing_object_offset = 0, last_read = 0, 
  pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, handle = 0x0, 
  un_pager = {vnp = {vnp_size = 754}, devp = {devp_pglist = {
tqh_first = 0x2f2, tqh_last = 0x0}}, swp = {swp_bcount = 754}}}

 
 BTW; in vm_map.c:
 /*
  * vm_map_clean
  * 
  * Push any dirty cached pages in the address range to their pager.
  * If syncio is TRUE, dirty pages are written synchronously.
  * If invalidate is TRUE, any cached pages are freed as well.
  *
  * Returns an error if any part of the specified range is not mapped.
  */
 This kinda suggests that dirty cached pages might not be all that 
 unusual..  but the code in question seems to be working at a different 
 level.

I'm not too familiar with this code.  It is only called from msync(2) as
far as I can see.

--
Doug Rabson Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.  Phone: +44 181 442 9037





Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Matthew Dillon

:[..]
:  Try changing the panic in vm/vm_page.c to a printf() ( 
: 
: I'll do that.
:
:BTW; what are the dangers of this?  lost disk writes or corruption?  Can 
:we (as a workaround) push the page that we found back onto a dirty queue 
:and try again after some diagnostics?

That's ok, don't worry about it...  Doug's debug output gives me the
same info.

:It crashed in uniprocessor mode about 60 seconds after sending this mail. 
:It's got a really trimmed down kernel config and no modules loaded or in 
:use.  I have not disabled softupdates yet, that's next.

I don't get it.  Why can't I reproduce this problem?

Can you email me your kernel configuration?  Are you using any
special devices like vn or something ?

:Oh, one other thing that occurred to me..  Under 4.0-current, I regularly 
:(ie: within 30 seconds of boot) get if_de tranmitter underflows.  My 
:console corruption was happening at the instant that de0 was being 
:configured with ifconfig.  exmh is running to a remote display over that 
:de0 interface.
:
:Under Jan 16 3.0-current, I do not get that tranmitter underflow..
:
:The only thin I can think of about if_de that's unusual that is VM related
:(apart from the complexity of the code) is that it uses configmalloc().  I 
:wonder if this is somehow setting the scene for the later failures?  It's 
:certainly suspicious that has done strange things when being ifconfig'ed, 
:including things like trashing the serial console on no less than a dozen 
:occasions.
:
:Cheers,
:-Peter

Hmmm..  HMM.  contigmalloc, eh?   You might be onto something here.
I will investigate it.

The problem we are having is that, somehow, a vm_page_t in the PQ_CACHE
is being set dirty.  

Since vm_page_cache() panics if m->dirty is set, m->dirty must be 
getting set *after* the page has been moved to the cache.

contigmalloc() looks suspicious.


-Matt
Matthew Dillon 
dil...@backplane.com




Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Peter Wemm
Matthew Dillon wrote:
[..]
 :Oh, one other thing that occurred to me..  Under 4.0-current, I regularly 
 :(ie: within 30 seconds of boot) get if_de tranmitter underflows.  My 
 :console corruption was happening at the instant that de0 was being 
 :configured with ifconfig.  exmh is running to a remote display over that 
 :de0 interface.
 :
 :Under Jan 16 3.0-current, I do not get that tranmitter underflow..
 :
 :The only thin I can think of about if_de that's unusual that is VM related
 :(apart from the complexity of the code) is that it uses configmalloc().  I 
 :wonder if this is somehow setting the scene for the later failures?  It's 
 :certainly suspicious that has done strange things when being ifconfig'ed, 
 :including things like trashing the serial console on no less than a dozen 
 :occasions.
 :
 :Cheers,
 :-Peter
 
 Hmmm..  HMM.  contigmalloc, eh?   You might be onto something here.
 I will investigate it.
 
 The problem was are having is that, somehow, a vm_page_t in the PQ_CACHE
 is being set dirty.  
 
 Sinc vm_page_cache() panics if m-dirty is set, then m-dirty must be get
ting
 set *after* the page has been moved to the cache.
 
 contigmalloc() looks suspicious.

Damn, I must be losing my mind.  if_de doesn't use contigmalloc.. it 
either did, or was going to as a result of the problem of the transmit 
descriptor array crossing a page boundary that had to be contiguous.  In 
the end there were two separate malloc's, each less than PAGE_SIZE.

Cheers,
-Peter






Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Doug Rabson
On Sun, 24 Jan 1999, Peter Wemm wrote:

 
 Oh, one other thing that occurred to me..  Under 4.0-current, I regularly 
 (ie: within 30 seconds of boot) get if_de tranmitter underflows.  My 
 console corruption was happening at the instant that de0 was being 
 configured with ifconfig.  exmh is running to a remote display over that 
 de0 interface.
 
 Under Jan 16 3.0-current, I do not get that tranmitter underflow..

One of my alpha boxes has always got a few of these errors when it first
transmits a largish packet.  It happened under NetBSD and FreeBSD since I
bought the machine (about June last year I think).  Andrew Gallatin has
seen similar errors on OSF1.  I think it's harmless.

 
 The only thin I can think of about if_de that's unusual that is VM related
 (apart from the complexity of the code) is that it uses configmalloc().  I 
 wonder if this is somehow setting the scene for the later failures?  It's 
 certainly suspicious that has done strange things when being ifconfig'ed, 
 including things like trashing the serial console on no less than a dozen 
 occasions.

I can't see where if_de is using contigmalloc().  I thought the bus_dma
code in there wasn't used on FreeBSD.

--
Doug Rabson Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.  Phone: +44 181 442 9037





Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread N
On Sun, 24 Jan 1999, Peter Wemm wrote:

[..]
 Oh, one other thing that occurred to me..  Under 4.0-current, I regularly 
 (ie: within 30 seconds of boot) get if_de tranmitter underflows.  My 
 console corruption was happening at the instant that de0 was being 
 configured with ifconfig.  exmh is running to a remote display over that 
 de0 interface.

Here too... pretty quickly after boot on a SMP machine (current as of Jan
12) that pushes quite a bit of traffic, the following messages appear:

de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
de0: abnormal interrupt: transmit underflow (raising TX threshold to 128|512)
de0: abnormal interrupt: transmit underflow (raising TX threshold to 160|1024)

The card is:

de0: Digital 21140A Fast Ethernet rev 0x22 int a irq 16 on pci0.12.0
de0: 21140A [10-100Mb/s] pass 2.2
de0: address 00:c0:f0:1f:5d:0d
de0: enabling Full Duplex 100baseTX port

Actually a Kingston clone, not a real DEC (so 1/5th of the price - but the
receiver doesn't go audibly *click* when it's autosensing).

So far I've gotten this message once:

de0: abnormal interrupt: transmit underflow (switching to store-and-forward mode)

Any harm in them, or can I safely ignore them?  Would it be a good idea to
raise the TX threshold by default, if only to avoid the messages?
It's plugged into a Catalyst switch, if it makes any difference...


-- Niels.




Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Matthew Dillon
Yes, we're working on it in a sub-group.

Since the panic message is a new one -- it's one I added that never existed
in -3.x, it is possible that the bug is not related to my VM stuff but
related to something else going on.

I've found a number of other bugs in the greater VM system which I am 
committing fixes for, *BUT* I don't think any of them are related to this
particular panic.

I am also committing some very strict KASSERT checking to try to catch
the problem earlier.  Everyone running 4.x kernels should add the following
options to their kernel config:

options INVARIANTS
options INVARIANT_SUPPORT


-Matt
Matthew Dillon 
dil...@backplane.com



Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Matthew Dillon
:Here too... pretty quickly after boot on a SMP machine (current as of Jan
:12) that pushes quite a bit of traffic, the following messages appear:
:
:de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
:de0: abnormal interrupt: transmit underflow (raising TX threshold to 128|512)
:de0: abnormal interrupt: transmit underflow (raising TX threshold to 160|1024)
:
:The card is:
:
:de0: Digital 21140A Fast Ethernet rev 0x22 int a irq 16 on pci0.12.0
:de0: 21140A [10-100Mb/s] pass 2.2
:de0: address 00:c0:f0:1f:5d:0d
:de0: enabling Full Duplex 100baseTX port

Three people getting these panics, three people with DEC ethernet
cards.  Random complaints about card during ifconfig: speaker goes click,
console gets junked, etc etc etc.

Is there anyone having this panic who does NOT have a DEC ethernet card?

I still don't think the card is causing the problem, but it would be nice
if we could rule it out.

GRIN

-Matt

:   -- Niels.

Matthew Dillon 
dil...@backplane.com



Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread N
 Here too... pretty quickly after boot on a SMP machine (current as of Jan
 12) that pushes quite a bit of traffic, the following messages appear:
 de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
[..]

 Three people getting these panics, three people with DEC ethernet
 cards.  Random complaints about card during ifconfig: speaker goes click,
 console gets junked, etc etc etc.

Actually, I haven't had the console of that SMP machine junked yet, but
that's because there is no console worth speaking of.  Previous reboot was
because processes like tail(1) only appeared to hang, unkillable except by
-9, and after attaching monitor and keyboard, upon pressing Enter at a
login: prompt the cursor would only advance a line once...

But that was a week ago, and it's a *busy* news server (that's not hitting
swap), I was just curious about the error messages from the de driver.


-- Niels.




Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Matthew Dillon
:But that was a week ago, and it's a *busy* news server (that's not hitting
:swap), I was just curious about the error messages from the de driver.
:
:   -- Niels.

The transmit underflow messages:

de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
de0: abnormal interrupt: transmit underflow (raising TX threshold to 128|512)
de0: abnormal interrupt: transmit underflow (raising TX threshold to 160|1024)

can typically be ignored.  It simply means that the DEC card has too small
a transmit FIFO and is getting DMA underflows.  Stupid card.

-Matt

Matthew Dillon 
dil...@backplane.com



Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Brian Feldman
On Sat, 23 Jan 1999, Matthew Dillon wrote:

 Yes, we're working on it in a sub-group.
 
 Since the panic message is a new one -- it's one I added that never existed
 in -3.x, it is possible that the bug is not related to my VM stuff but
 related to something else going on.
 
 I've found a number of other bugs in the greater VM system which I am 
 committing fixes for, *BUT* I don't think any of them are related to this
 particular panic.
 
 I am also committing some very strict KASSERT checking to try to catch
 the problem earlier.  Everyone running 4.x kernels should add the following

Ahem, would you kindly define 'everyone'?

 options to your kernel config:
 
   options INVARIANTS
   options INVARIANT_SUPPORT
 
 
   -Matt
   Matthew Dillon 
   dil...@backplane.com
 

 Brian Feldman_ __  ___ ___ ___  
 gr...@unixhelp.org   _ __ ___ | _ ) __|   \ 
 http://www.freebsd.org/ _ __ ___  | _ \__ \ |) |
 FreeBSD: The Power to Serve!  _ __ ___  _ |___/___/___/ 




Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Matthew Dillon
: 
: I am also committing some very strict KASSERT checking to try to catch
: the problem earlier.  Everyone running 4.x kernels should add the following
:
:Ahem, would you kindly define 'everyone'?

Anyone, everyone, everybody, all ... any individual using the -4.x
kernels needs to understand the highly experimental nature of said
kernels.  Turning on INVARIANTS is just plain smart, for many
reasons, but I will give you the top two:

* The sanity checks could save your disks when someone
  commits a major mistake.

* The sanity checks make it easier for bugs to be found and
  fixed when they do occur.
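
To make the two points above concrete, here is a rough userland sketch of
how a KASSERT-style check behaves when INVARIANTS is defined.  It is not
the kernel's macro: MODEL_KASSERT, model_panic, panic_count and
checked_divide are hypothetical names invented for this illustration, and
the "panic" only counts and logs so that the sketch can be exercised
without halting anything.

```c
#include <assert.h>
#include <stdio.h>

/* Stands in for "options INVARIANTS" in a kernel config (assumption). */
#define INVARIANTS 1

/* Hypothetical stand-in for panic(): log and count instead of halting. */
static int panic_count;

static void model_panic(const char *msg)
{
    fprintf(stderr, "panic: %s\n", msg);
    panic_count++;
}

#ifdef INVARIANTS
/* With INVARIANTS the check is compiled in and fires on violation. */
#define MODEL_KASSERT(exp, msg) \
    do { if (!(exp)) model_panic(msg); } while (0)
#else
/* Without INVARIANTS the check compiles away to nothing. */
#define MODEL_KASSERT(exp, msg) do { } while (0)
#endif

/* Example caller: the sanity check catches a bad argument before use. */
static int checked_divide(int total, int parts)
{
    MODEL_KASSERT(parts != 0, "checked_divide: zero parts");
    return parts != 0 ? total / parts : 0;
}
```

Calling checked_divide(10, 0) in this model reports the violation at the
point where the bad value first shows up, which is the "easier to find
and fix" half of the argument.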

-4.x is just getting on its feet; nobody should be shipping
product with it for a while ( if they are, they are insane ).

-Matt
Matthew Dillon 
dil...@backplane.com
: options to your kernel config:
: 
:  options INVARIANTS
:  options INVARIANT_SUPPORT




Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Peter Wemm
Matthew Dillon wrote:
 :But that was a week ago, and it's a *busy* news server (that's not hitting
 :swap), I was just curious about the error messages from the de driver.
 :
 : -- Niels.
 
 The transmit underflow messages:
 
 de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
 de0: abnormal interrupt: transmit underflow (raising TX threshold to 128|512)
 de0: abnormal interrupt: transmit underflow (raising TX threshold to 160|1024)
 
 can typically be ignored.  It simply means that the DEC card has too small
 a transmit FIFO and is getting DMA underflows.  Stupid card.

As I understand it, what's happening is that it's reacting to PCI bus
congestion by raising the preread thresholds.  It degenerates to fetching
the entire frame into on-card (or chip) memory before beginning 
transmission.

On my system I can understand it, it's a 2xP5 with a shared L2 cache on a 
Neptune chipset - something that isn't known for speed.  Once you get two 
processors hammering the system bus, *plus* mix in an EISA SCSI 
controller, I could well imagine the memory bus getting thrashed.

I'm not sure how to read the messages. Looking at the if_pn driver as 
well, it looks like both start with a FIFO threshold of 72 bytes.  I think 
that '160|1024' (for example) means start transmitting when the FIFO has 
fetched 160 bytes and don't stop fetching unless we hit 1024 bytes in the 
fifo.

Store and forward mode (I believe) is a degenerate case where it fetches
the entire packet into the buffer before beginning transmission.

Bill Paul's if_pn driver doesn't react to an underflow at all..  it stays 
at 72/128 permanently.
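
The escalation described above can be sketched as a small state machine.
This is a userland model only, not code from if_de: the names tx_level,
tx_state and tx_underflow are made up for the illustration, the threshold
pairs are copied from the console messages quoted in this thread, and
starting at 72|128 is an assumption based on the if_pn comparison.

```c
#include <assert.h>
#include <stddef.h>

/* One escalation step: `start` is the FIFO fill level at which
 * transmission begins; `limit` is the second number in the message. */
struct tx_level {
    int start;
    int limit;
};

/* Pairs taken from the de0 messages in this thread; the first entry
 * (72|128) is an assumed starting point, not confirmed for if_de. */
static const struct tx_level tx_levels[] = {
    { 72, 128 }, { 96, 256 }, { 128, 512 }, { 160, 1024 },
};
#define TX_NLEVELS (sizeof(tx_levels) / sizeof(tx_levels[0]))

struct tx_state {
    size_t level;           /* index into tx_levels */
    int store_and_forward;  /* nonzero once whole frames are buffered */
};

/* Hypothetical handler, called once per transmit underflow. */
static void tx_underflow(struct tx_state *st)
{
    if (st->store_and_forward)
        return;                     /* already at the last resort */
    if (st->level + 1 < TX_NLEVELS)
        st->level++;                /* raise the threshold one step */
    else
        st->store_and_forward = 1;  /* fetch the entire frame first */
}
```

In this model three underflows walk through the three "raising TX
threshold" messages, and a fourth switches to store-and-forward, which
matches the order in which the messages were reported.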

For what it's worth, the de cards are the only ones I've found that can 
work at all on this system at 100Mbit.  The Realtek 8139 cards (cheap!) 
went belly-up on the spot, no surprise there.  I don't have an fxp card to 
test.

   -Matt

Cheers,
-Peter






Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Matthew Dillon
:On my system I can understand it, it's a 2xP5 with a shared L2 cache on a 
:Neptune chipset - something that isn't known for speed.  Once you get two 
:processors hammering the system bus, *plus* mix in an EISA scsi 
:controller, I could well imagine the memory bus getting thrashed.

When we started throwing together dual-P-II machines, we basically
had to throw away our DEC chipset cards...  I think that the DEC chip
cards, at least the older ones, have serious PCI spec bugs that cause
them to operate incorrectly on dual-CPU machines when more than one
CPU is populated.

-Matt





Re: panic: found dirty cache page 0xf046f1c0

1999-01-23 Thread Matthew Dillon
I've committed one bug fix to the 'found dirty cache page' bug --
turns out vm_map_split() was the culprit, renaming pages
without removing them from PQ_CACHE.  The bug was introduced
in -3.0, and hit the KASSERT() I put in -4.x.

I've committed a general inlining of 'changing the page dirty
status to VM_PAGE_BITS_ALL' and put a sanity check in the inline.
If this problem occurs again, you will get a different panic.
One of:

vm_page_dirty: page in cache!
vm_page_busy: page already busy!!!
vm_page_wakeup: page not busy!!!

If your box drops into DDB, please get a backtrace and report
it to the list or to me and we should be able to track down
any remaining dirty-pages-in-PQ_CACHE bugs.
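
As an illustration of the idea (not the committed code), the sanity check
in the inline might look roughly like the userland model below.  Every
name here (model_page, model_queue, model_page_dirty, model_page_rename,
sketch_panic) is hypothetical, the panic is replaced by a recorded
message so the sketch can be exercised, and which queue a renamed page
is moved to is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Simplified page queues; MQ_CACHE plays the role of PQ_CACHE. */
enum model_queue { MQ_NONE, MQ_ACTIVE, MQ_INACTIVE, MQ_CACHE };

/* Plays the role of VM_PAGE_BITS_ALL in this model. */
#define MODEL_PAGE_BITS_ALL 0xff

struct model_page {
    enum model_queue queue;
    unsigned char dirty;
};

/* Stand-in for panic(): remember the message instead of halting. */
static const char *last_panic;

static void sketch_panic(const char *msg)
{
    last_panic = msg;
}

/* Single place where a page is marked fully dirty, with the check. */
static inline void model_page_dirty(struct model_page *m)
{
    if (m->queue == MQ_CACHE) {
        sketch_panic("vm_page_dirty: page in cache!");
        return;
    }
    m->dirty = MODEL_PAGE_BITS_ALL;
}

/* Rename done safely: leave the cache queue before dirtying.
 * Moving to MQ_INACTIVE is an assumption made for the sketch. */
static void model_page_rename(struct model_page *m)
{
    if (m->queue == MQ_CACHE)
        m->queue = MQ_INACTIVE;
    model_page_dirty(m);
}
```

Funnelling every "mark fully dirty" through one inline means a page that
is still on the cache queue gets caught at the moment it is dirtied,
rather than much later when the allocator trips over it.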

-Matt
Matthew Dillon 
dil...@backplane.com




Re: panic: found dirty cache page 0xf046f1c0

1999-01-22 Thread Manfred Antar
At 10:34 AM 1/23/99 +0800, Peter Wemm wrote:
Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got 
lost):

panic: found dirty cache page 0xf046f1c0
mp_lock = 0101; cpuid = 1; lapic.id = 0100
Debugger(panic)
Stopped at  Debugger+0x37:  movl    $0,in_Debugger
db> trace
Debugger(f01f1806) at Debugger+0x37
panic(f01fbb50,f046f1c0,0,80,f45cbb20) at panic+0xa4
vm_page_alloc(f45f6f68,80,3,0,80) at vm_page_alloc+0x114
vm_page_grab(f45f6f68,80,83,0,80) at vm_page_grab+0x8d
_pmap_allocpte(f45cbb20,80,201df000,201df000,2a86000) at _pmap_allocpte+0x19
pmap_allocpte(f45cbb20,201df000,f02c4df4,201df000,f45cbac0) at pmap_allocpte+0x53
pmap_enter(f45cbb20,201df000,2a86000,5,0) at pmap_enter+0x3d
vm_fault(f45cbac0,201df000,1,0,f4195180) at vm_fault+0x891
trap_pfault(f45f9fbc,1,201df236) at trap_pfault+0xf2
trap(27,27,,5,efbfad38) at trap+0x1c2
calltrap() at calltrap+0x3c
--- trap 0xc, eip = 0x201df236, esp = 0xefbfac4c, ebp = 0xefbfad38 ---
db> c
boot() called on cpu#1

syncing disks... 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 giving up
1: dev:, flags:20020034, blkno:1057008, lblkno:0
[..]

This was compiled two hours ago from absolute latest -current:
FreeBSD spinner.netplex.com.au 4.0-CURRENT FreeBSD 4.0-CURRENT #385:
Sat Jan 23 08:38:42 WST 1999
pe...@spinner.netplex.com.au:/home/src/sys/compile/SPINNER  i386

My other SMP machine (2xPPro200) seems to be running fine:
FreeBSD beast.netplex.com.au 4.0-CURRENT FreeBSD 4.0-CURRENT #267:
Thu Jan 21 21:39:45 WST 1999
pe...@beast.netplex.com.au:/home/src/sys/compile/BEAST  i386

Cheers,
-Peter

I just got the same thing doing a make -j8 world.
Machine is a dual Pentium Pro Intel PR440FX.
This must be from the recent VM changes, as I could make -j8 world
continually a few days ago without problem. This is the second time it has
happened to me; the first time I was running X, so I couldn't see the
debugger message. This time, without X, I got:

panic: found dirty cache page

Manfred
=
||man...@netcom.com||
||p...@infinex.com ||
||Ph. (415) 681-6235||
=




Re: panic: found dirty cache page 0xf046f1c0

1999-01-22 Thread Peter Wemm
Peter Wemm wrote:
 Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got 
 lost):
 
 panic: found dirty cache page 0xf046f1c0
 mp_lock = 0101; cpuid = 1; lapic.id = 0100
 Debugger(panic)
 Stopped at  Debugger+0x37:  movl    $0,in_Debugger
 db> trace

This is possibly a false alarm..  Something weird was happening.  I cleaned
out the kernel and reconfigured with NFS static (it was being loaded) and
it seems to boot OK.  At least, I'm not getting console corruption (random
baud rate changes) and the SMP mutex being broken and both CPUs entering
the kernel at once.  I think I'll blame it on the 15 hour electrical 
storm. :-]

Cheers,
-Peter






Re: panic: found dirty cache page 0xf046f1c0

1999-01-22 Thread Matthew Dillon
:Peter Wemm wrote:
: Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got 
: lost):
: 
: panic: found dirty cache page 0xf046f1c0
:...
:
:This is possibly a false alarm..  Something wierd was happening.  I cleaned
:out the kernel and reconfigured with NFS static (it was being loaded) and
:it seems to boot OK.  At least, I'm not getting console corruption (random
:baud rate changes) and the SMP mutex being broken and both cpu's entering
:the kernel at once.  I think I'll blame it on the 15 hour electrical 
:storm. :-]
:
:Cheers,
:-Peter

An old nfs module would almost certainly not work with the new
kernel without at least a recompile.  I'd definitely recommend
keeping the major modules compiled in rather than dynamically 
loaded, just on principle.  In fact, in all my time at BEST and 
all my time playing with FreeBSD, I have *never* used any 
dynamic module except for the Linux compatibility thingy, and
even that was only a fluke.  If you can compile it in, compile
it in.

But, keep a watch on it.  I didn't have an SMP box to test
the new VM stuff on so it's possible there's something going
on there.

-Matt
Matthew Dillon 
dil...@backplane.com




Re: panic: found dirty cache page 0xf046f1c0

1999-01-22 Thread Matthew Dillon

:At 10:34 AM 1/23/99 +0800, Peter Wemm wrote:
:Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got 
:lost):
:
:panic: found dirty cache page 0xf046f1c0
:mp_lock = 0101; cpuid = 1; lapic.id = 0100
:...

:I just got the same thing doing a make -j8 world
:Machine is a dual pentium pro Intel PR440FX
:This must be from the recent vm changes as I could make -j8 world
:continually a 
:few days ago without problem. This is the second time it happened to me 
:the first time I was running X so I couldn't see the debugger message .
:This time without X I got the :
:
:panic: found dirty cache page
:
:Manfred

Any dynamically loaded modules?  e.g. nfs?  Did you update 
/usr/src/contrib/sys (i.e. softupdates ) along with /usr/src/sys ?
Are you using vinum?

-Matt

:=
:||man...@netcom.com||
:||p...@infinex.com ||
:||Ph. (415) 681-6235||
:=

Matthew Dillon 
dil...@backplane.com



Re: panic: found dirty cache page 0xf046f1c0

1999-01-22 Thread Brian Feldman
On Fri, 22 Jan 1999, Manfred Antar wrote:

 I just got the same thing doing a make -j8 world
 Machine is a dual pentium pro Intel PR440FX
 This must be from the recent vm changes as I could make -j8 world
 continually a 
 few days ago without problem. This is the second time it happened to me 
 the first time I was running X so I couldn't see the debugger message .
 This time without X I got the :
 
 panic: found dirty cache page

You should definitely be using DDB_UNATTENDED, by the way, if you're going
to be running X and want DDB but not to have DDB try to pop up on a panic.
I did get DDB_UNATTENDED behavior finally working as well as it should, so
there's no reason not to use it.
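
For reference, a minimal kernel config fragment for this setup might look
like the following.  It is a sketch that assumes the debugger itself is
enabled with the usual DDB option; the comments are mine, not from any
shipped config file.

```
options 	DDB			# compile in the kernel debugger
options 	DDB_UNATTENDED		# do not drop into DDB on a panic
```

With both lines present the debugger is still available on demand, but a
panic under X should not leave the machine sitting at an invisible DDB
prompt.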

 
 Manfred
 =
 ||man...@netcom.com||
 ||p...@infinex.com ||
 ||Ph. (415) 681-6235||
 =
 
 
 

 Brian Feldman_ __  ___ ___ ___  
 gr...@unixhelp.org   _ __ ___ | _ ) __|   \ 
 http://www.freebsd.org/ _ __ ___  | _ \__ \ |) |
 FreeBSD: The Power to Serve!  _ __ ___  _ |___/___/___/ 




Re: panic: found dirty cache page 0xf046f1c0

1999-01-22 Thread Peter Wemm
Matthew Dillon wrote:
 :Peter Wemm wrote:
 : Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got 
 : lost):
 : 
 : panic: found dirty cache page 0xf046f1c0
 :...
 :
 :This is possibly a false alarm..  Something wierd was happening.  I cleaned
 :out the kernel and reconfigured with NFS static (it was being loaded) and
 :it seems to boot OK.  At least, I'm not getting console corruption (random
 :baud rate changes) and the SMP mutex being broken and both cpu's entering
 :the kernel at once.  I think I'll blame it on the 15 hour electrical 
 :storm. :-]
 :
 :Cheers,
 :-Peter
 
 An old nfs module would almost certainly not work with the new
 kernel without at least a recompile.  I'd definitely recommend
 keeping the major modules compiled in rather then dynamically 
 loaded, just on principle.  In fact, in all my time at BEST and 
 all my time playing with FreeBSD, I have *never* used any 
 dynamic module except for the linux compatibility thingy, and
 even that was only a fluke.  If you can compile it in, compile
 it in.

It's definitely still happening, sorry. :-(  I recompiled a 100% static 
kernel and have had three more explosions, usually after starting exmh.  
(exmh takes 10 to 15MB of RAM on this system due to my mailbox folder 
sizes).

 But, keep a watch on it.  I didn't have an SMP box to test
 the new VM stuff on so it's possible there's something going
 on there.

However, a clue..  The SMP box that is doing fine is a P6, an NFS client
and server (loading nfs.ko, it fsck's fast, so I use that box for making
sure the modules work).  The one that is crashing is a P5, an NFS client
and server (static kernel), with an MFS /tmp.  Both run softupdates (up
to date src/contrib/sys).

I suspect MFS is the key.  There's the new VOP_FREEBLKS() stuff you added, 
and the corresponding calls to madvise to free the pages.

Given madvise()'s murky history, I can't help but feel suspicious about it.

I've unmounted /tmp and am about to thrash the machine.  At the 
moment, it's sitting on:  Swap: 120M Total, 376K Used, 120M Free

Cheers,
-Peter


