Re: panic: found dirty cache page 0xf046f1c0
As Matthew Dillon wrote...

> I've committed one bug fix to the 'found dirty cache page' bug -- turns
> out vm_map_split() was the culprit, renaming pages without removing them
> from PQ_CACHE. The bug was introduced in -3.0, and hit the KASSERT() I
> put in -4.x.
>
> I've committed a general inlining of 'changing the page dirty status to
> VM_PAGE_BITS_ALL' and put a sanity check in the inline. If this problem
> occurs again, you will get a different panic. One of:
>
>     vm_page_dirty: page in cache!
>     vm_page_busy: page already busy!!!
>     vm_page_wakeup: page not busy!!!
>
> If your box drops into DDB, please get a backtrace and report it to the
> list or to me and we should be able to track down any remaining
> dirty-pages-in-PQ_CACHE bugs.

FYI: a buildworld of -current including the above on FreeBSD/axp completed
without any incidents.

Wilko

Wilko Bulte  email: wi...@yedi.iaf.nl
Arnhem, The Netherlands  WWW: http://www.tcja.nl
Powered by FreeBSD

To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-current in the body of the message
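A minimal userland sketch of the "inline with a sanity check" idea described above, assuming a simplified page structure. The names `struct page`, `page_dirty()`, `Q_CACHE` and `PAGE_BITS_ALL` are hypothetical stand-ins, not the kernel's definitions, and the function returns an error where the kernel version would panic.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum page_queue { Q_NONE, Q_ACTIVE, Q_INACTIVE, Q_CACHE };

#define PAGE_BITS_ALL 0xffffu	/* stand-in for VM_PAGE_BITS_ALL */

struct page {
	enum page_queue queue;	/* queue the page currently sits on */
	unsigned int dirty;	/* dirty bits */
};

/*
 * The one place that marks a page fully dirty.  Returns 0 on success
 * and -1 if the page is on the cache queue; a kernel version would
 * panic at that point instead of returning.
 */
static inline int
page_dirty(struct page *m)
{
	assert(m != NULL);
	if (m->queue == Q_CACHE) {
		fprintf(stderr, "vm_page_dirty: page in cache!\n");
		return (-1);
	}
	m->dirty = PAGE_BITS_ALL;
	return (0);
}
```

Because every caller goes through one helper, a violation is reported at the point where the page is dirtied rather than much later when the page is reallocated.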
Re: panic: found dirty cache page 0xf046f1c0
:FYI: a buildworld of -current including the above on FreeBSD/axp completed
:without any incidents.
:
:Wilko
:...
:... ( other reports )

We are looking good, I've got half a dozen positive reports!

On general principles, I think it is possible to make the FreeBSD VM
system bulletproof. The problem is that there are lots of odd exceptions
and special rules that haven't been black-boxed or even documented ( other
than being in John's head, which isn't all that useful to me ). The rules
tend to be laid out in code on each occurrence, which inevitably leads to
mistakes. The mistakes are further compounded by a severe lack of
enforcement ( KASSERT()s ) and thus propagate from release to release,
building up as time passes.

With appropriate black boxing, documentation, and enforcement, it should
be fairly easy to shorten the development cycle on finding the bugs.
__inline procedures are a godsend because there are literally a hundred
places in the code where someone 'optimized' it by doing a manual
expansion of something from some other module in order to avoid a
subroutine call. This cross-module pollination tends to make things even
less readable. Bleh.

So, for example, a few commits ago I added enforcement of the
no-dirty-pages-on-cache-queue rule and systems started to panic. That
enforcement had to be extended to every dirtying of a page before we
actually found the bug ( which turned out to be a -3.x bug ). More
recently I have added enforcement for PG_BUSY state changes to disallow
the busying of an already-busy page, and unbusying of a non-busy page.

In discussions with John, there are a number of other rules that have been
broken and need to be fixed. Pages on PQ_CACHE are supposed to be unqueued
prior to being busied, held, or wired, for example, but the rule is pretty
much ignored and a lot of code was hacked in to check for and requeue ( to
another queue ) the busy-page-on-cache case.

Entry conditions, exit conditions, and side effects for procedures are
mostly undocumented. biodone() sequencing is not well documented, and
struct buf's have a 'kitchen sink' mentality from being hacked up so much.
There are currently too many NFS-specific exceptions strewn all over the
code. It all works, but it is also a mess.

-Matt
Matthew Dillon dil...@backplane.com
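The PG_BUSY enforcement described above can be sketched in userland as follows. This is only an illustration under stated assumptions: `struct page`, `PAGE_BUSY`, `page_busy()` and `page_wakeup()` are hypothetical names, and the checks return -1 where the kernel would panic with the quoted messages.

```c
#include <assert.h>
#include <stdio.h>

#define PAGE_BUSY 0x0001u	/* hypothetical stand-in for PG_BUSY */

struct page {
	unsigned int flags;
};

/* Busy a page; busying an already-busy page is a rule violation. */
static inline int
page_busy(struct page *m)
{
	assert(m != NULL);
	if (m->flags & PAGE_BUSY) {
		fprintf(stderr, "vm_page_busy: page already busy!!!\n");
		return (-1);
	}
	m->flags |= PAGE_BUSY;
	return (0);
}

/* Unbusy a page; unbusying a non-busy page is a rule violation. */
static inline int
page_wakeup(struct page *m)
{
	assert(m != NULL);
	if ((m->flags & PAGE_BUSY) == 0) {
		fprintf(stderr, "vm_page_wakeup: page not busy!!!\n");
		return (-1);
	}
	m->flags &= ~PAGE_BUSY;
	return (0);
}
```

Keeping both state transitions in small inline helpers is what makes the rule enforceable: open-coded flag manipulation scattered through other modules would bypass the checks.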
Re: panic: found dirty cache page 0xf046f1c0
:It's definitely happening still, sorry. :-(  I recompiled a 100% static
:kernel and have had three more explosions, usually after starting exmh.
:(exmh takes 10 to 15MB of ram on this system due to my mailbox folder
:sizes).
:
:However, a clue..  The SMP box that is doing fine is a P6, an NFS client
:and server (loading nfs.ko, it fsck's fast, so I use that box for making
:sure the modules work).  The one that is crashing, is a P5, an NFS client
:and server (static kernel), and with a MFS /tmp.  Both run softupdates (up
:to date src/contrib/sys).
:
:I suspect MFS is the key.  There's the new VOP_FREEBLKS() stuff you added,
:and the corresponding calls to madvise to free the pages.
:
:Given madvise()'s murky history, I can't help but feel suspicious about it.
:
:I've unmounted /tmp and am about to thrash the machine.  At the
:moment, it's sitting on:  Swap: 120M Total, 376K Used, 120M Free
:
:Cheers,
:-Peter

Hmmm. It's possible. A quick look at the exmh source indicates that it
uses /tmp a lot. I've been doing make buildworld's with a 300MB MFS
/usr/obj, but those are typically nothing more than simple file creates,
reads, and writes. Presumably exmh is doing something more sophisticated.

Try changing the panic in vm/vm_page.c to a printf(). That is, replace

    if (m->dirty)
        panic("found dirty cache page %p", m);

with

    if (m->dirty)
        printf(
            "found dirty cache page %p (%p,%d,%x) obtype %d obflags %x",
            m, m->object, (int)m->pindex, (int)m->flags,
            (int)m->object->type, (int)m->object->flags
        );

Let's see what we get. This should tell me what kind of object the page is
attached to and the flags of the page and object.

-Matt
Matthew Dillon dil...@backplane.com
Re: panic: found dirty cache page 0xf046f1c0
Matthew Dillon wrote:
> :It's definitely happening still, sorry. :-(  I recompiled a 100% static
> :kernel and have had three more explosions, usually after starting exmh.
> :(exmh takes 10 to 15MB of ram on this system due to my mailbox folder
> :sizes).
> :
> :However, a clue..  The SMP box that is doing fine is a P6, an NFS client
> :and server (loading nfs.ko, it fsck's fast, so I use that box for making
> :sure the modules work).  The one that is crashing, is a P5, an NFS client
> :and server (static kernel), and with a MFS /tmp.  Both run softupdates (up
> :to date src/contrib/sys).
> :
> :I suspect MFS is the key.  There's the new VOP_FREEBLKS() stuff you added,
> :and the corresponding calls to madvise to free the pages.
> :
> :Given madvise()'s murky history, I can't help but feel suspicious about it.
> :
> :I've unmounted /tmp and am about to thrash the machine.  At the
> :moment, it's sitting on:  Swap: 120M Total, 376K Used, 120M Free
> :
> :Cheers,
> :-Peter
>
> Hmmm. It's possible. A quick look at the exmh source indicates that it
> uses /tmp a lot. I've been doing make buildworld's with a 300MB MFS
> /usr/obj, but those are typically nothing more than simple file creates,
> reads, and writes. Presumably exmh is doing something more sophisticated.

I've since disabled MFS, compiled out a couple of other things I'm not
using very often and generally cleaned things up. I've had three more
panics since turning off MFS, so that wasn't it. :-(

Anyway, I've just recompiled without SMP. There were some very strange
things happening on the serial console again that I really do not like the
look of. Console output has been disappearing and getting duplicated.

> Try changing the panic in vm/vm_page.c to a printf() (

I'll do that.

FWIW, this has happened while the system has been nearly quiescent all the
way through to being thrashed with parallel cvs updates etc running. Most
times it waits till exmh is running. Last time (when recompiling without
SMP) it crashed when it came to linking the kernel (and no exmh running).

I'll see if it still crashes in uniprocessor mode, if so, I'll put some
debugging in and see if I can find anything out. The kernel was last built
on Jan 16, and that one works fine still, so I'm pretty sure it isn't
hardware.

Cheers,
-Peter
Re: panic: found dirty cache page 0xf046f1c0
On Sat, 23 Jan 1999, Peter Wemm wrote:

> Matthew Dillon wrote:
> > :It's definitely happening still, sorry. :-(  I recompiled a 100% static
> > :kernel and have had three more explosions, usually after starting exmh.
> > :(exmh takes 10 to 15MB of ram on this system due to my mailbox folder
> > :sizes).
> > :
> > :However, a clue..  The SMP box that is doing fine is a P6, an NFS client
> > :and server (loading nfs.ko, it fsck's fast, so I use that box for making
> > :sure the modules work).  The one that is crashing, is a P5, an NFS client
> > :and server (static kernel), and with a MFS /tmp.  Both run softupdates (up
> > :to date src/contrib/sys).
> > :
> > :I suspect MFS is the key.  There's the new VOP_FREEBLKS() stuff you added,
> > :and the corresponding calls to madvise to free the pages.
> > :
> > :Given madvise()'s murky history, I can't help but feel suspicious about it.
> > :
> > :I've unmounted /tmp and am about to thrash the machine.  At the
> > :moment, it's sitting on:  Swap: 120M Total, 376K Used, 120M Free
> > :
> > :Cheers,
> > :-Peter
> >
> > Hmmm. It's possible. A quick look at the exmh source indicates that it
> > uses /tmp a lot. I've been doing make buildworld's with a 300MB MFS
> > /usr/obj, but those are typically nothing more than simple file creates,
> > reads, and writes. Presumably exmh is doing something more sophisticated.
>
> I've since disabled MFS, compiled out a couple of other things I'm not
> using very often and generally cleaned things up. I've had three more
> panics since turning off MFS, so that wasn't it. :-(
>
> Anyway, I've just recompiled without SMP. There were some very strange
> things happening on the serial console again that I really do not like
> the look of. Console output has been disappearing and getting duplicated.
>
> > Try changing the panic in vm/vm_page.c to a printf() (
>
> I'll do that.
>
> FWIW, this has happened while the system has been nearly quiescent all
> the way through to being thrashed with parallel cvs updates etc running.
> Most times it waits till exmh is running. Last time (when recompiling
> without SMP) it crashed when it came to linking the kernel (and no exmh
> running).
>
> I'll see if it still crashes in uniprocessor mode, if so, I'll put some
> debugging in and see if I can find anything out. The kernel was last
> built on Jan 16, and that one works fine still, so I'm pretty sure it
> isn't hardware.

I just had one of these on one of my alphas. The machine is UP
(obviously), no MFS, no dynamically loaded stuff. It was doing an
installworld with NFSv3 mounted source, local obj. All filesystems were
using softupdates.

--
Doug Rabson                 Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.      Phone: +44 181 442 9037
Re: panic: found dirty cache page 0xf046f1c0
On Sat, 23 Jan 1999, Doug Rabson wrote:

> I just had one of these on one of my alphas. The machine is UP
> (obviously), no MFS, no dynamically loaded stuff. It was doing an
> installworld with NFSv3 mounted source, local obj. All filesystems were
> using softupdates.

I made it happen again by doing the same installworld but this time I
caught it in the debugger. I'll leave the machine up for a while in case
someone has some idea of how to debug it. The stacktrace looks like this:

#0  Debugger () at ../../alpha/alpha/db_interface.c:260
#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791
#4  0xfc3a13d4 in getblk () at ../../kern/vfs_bio.c:1572
#5  0xfc46a150 in ffs_balloc () at ../../ufs/ffs/ffs_balloc.c:170
#6  0xfc4772dc in ffs_write () at vnode_if.h:1015
#7  0xfc3b3c00 in vn_write () at vnode_if.h:331
#8  0xfc37f72c in write () at ../../kern/sys_generic.c:270
#9  0xfc4b0a4c in syscall () at ../../alpha/alpha/trap.c:620
#10 0xfc4a416c in XentSys () at ../../alpha/alpha/exception.s:127

--
Doug Rabson                 Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.      Phone: +44 181 442 9037
Re: panic: found dirty cache page 0xf046f1c0
:I made it happen again by doing the same installworld but this time I
:caught it in the debugger. I'll leave the machine up for a while in case
:someone has some idea of how to debug it. The stacktrace looks like this:
:
:#0  Debugger () at ../../alpha/alpha/db_interface.c:260
:#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
:#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
:#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791

The panic message should be printing the address of the vm_page_t that it
caught. From the debugger, dump that vm_page_t with 'print'.

    print *0xADDRESS

Do about 8 print's, bumping the address by 4 ( in hex ) for each.

It would be even better if we could figure out the contents and type of
the underlying object.

-Matt
Matthew Dillon dil...@backplane.com

:--
:Doug Rabson                 Mail:  d...@nlsystems.com
:Nonlinear Systems Ltd.      Phone: +44 181 442 9037
Re: panic: found dirty cache page 0xf046f1c0
On Sat, 23 Jan 1999, Matthew Dillon wrote:

> :I made it happen again by doing the same installworld but this time I
> :caught it in the debugger. I'll leave the machine up for a while in case
> :someone has some idea of how to debug it. The stacktrace looks like this:
> :
> :#0  Debugger () at ../../alpha/alpha/db_interface.c:260
> :#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
> :#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
> :#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791
>
> The panic message should be printing the address of the vm_page_t that
> it caught. From the debugger, dump that vm_page_t with 'print'.
>
>     print *0xADDRESS
>
> Do about 8 print's bumping the address by 4 ( in hex ) for each.
>
> It would be even better if we could figure out the contents and type of
> the underlying object.

I have full symbols:

(gdb) fr 2
#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
1041                    panic("found dirty cache page %p", m);
(gdb) l
1036             */
1037
1038            if (qtype == PQ_CACHE) {
1039    #if !defined(MAX_PERF)
1040                    if (m->dirty)
1041                            panic("found dirty cache page %p", m);
1042
1043    #endif
1044                    vm_page_busy(m);
1045                    vm_page_protect(m, VM_PROT_NONE);
(gdb) p m
$4 = (struct vm_page *) 0xfe108f40
(gdb) p *m
$5 = {pageq = {tqe_next = 0x0, tqe_prev = 0xfc52ecc8}, hnext = 0x0,
  listq = {tqe_next = 0xfe090fe0, tqe_prev = 0xfe0bb6b8},
  object = 0xfe00050e2a10, pindex = 12, phys_addr = 88940544, queue = 172,
  flags = 128, pc = 41, wire_count = 0, hold_count = 0,
  act_count = 5 '\005', busy = 0 '\000', valid = 65535, dirty = 65535}

--
Doug Rabson                 Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.      Phone: +44 181 442 9037
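The numeric fields in the `p *m` dump above can be read more easily with a little arithmetic. The sketch below is an illustration only: `struct page_dump` and its helpers are hypothetical, and it assumes (this is not stated in the thread) that the `queue` value is a per-queue base index plus the page colour `pc`. What is certain is that 65535 equals 0xffff, so all sixteen low bits of both `valid` and `dirty` are set.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the fields copied from the gdb dump. */
struct page_dump {
	int queue;		/* queue = 172 in the dump */
	int pc;			/* pc = 41 in the dump */
	unsigned int valid;	/* valid = 65535 in the dump */
	unsigned int dirty;	/* dirty = 65535 in the dump */
};

/* Assumption: queue index = base of the queue family + page colour. */
static int
queue_base(const struct page_dump *d)
{
	assert(d != NULL);
	return (d->queue - d->pc);
}

/* True if every bit in 'all' is set in 'mask'. */
static int
all_bits_set(unsigned int mask, unsigned int all)
{
	return ((mask & all) == all);
}

/* Print a one-line summary of the decoded values. */
static void
describe(const struct page_dump *d)
{
	printf("queue base %d, valid 0x%x, dirty 0x%x\n",
	    queue_base(d), d->valid, d->dirty);
}
```

With the values from the dump, `queue_base()` gives 172 - 41 = 131, and both masks are 0xffff: the page is fully valid and fully dirty while sitting on the queue that `vm_page_alloc()` treated as PQ_CACHE.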
Re: panic: found dirty cache page 0xf046f1c0
Doug Rabson wrote:
> On Sat, 23 Jan 1999, Matthew Dillon wrote:
>
> > :I made it happen again by doing the same installworld but this time I
> > :caught it in the debugger. I'll leave the machine up for a while in case
> > :someone has some idea of how to debug it. The stacktrace looks like this:
> > :
> > :#0  Debugger () at ../../alpha/alpha/db_interface.c:260
> > :#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
> > :#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
> > :#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791
> >
> > The panic message should be printing the address of the vm_page_t that
> > it caught. From the debugger, dump that vm_page_t with 'print'.
> >
> >     print *0xADDRESS
> >
> > Do about 8 print's bumping the address by 4 ( in hex ) for each.
> >
> > It would be even better if we could figure out the contents and type
> > of the underlying object.
>
> I have full symbols:
>
> (gdb) fr 2
> #2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
> 1041                    panic("found dirty cache page %p", m);
> (gdb) l
> 1036             */
> 1037
> 1038            if (qtype == PQ_CACHE) {
> 1039    #if !defined(MAX_PERF)
> 1040                    if (m->dirty)
> 1041                            panic("found dirty cache page %p", m);
> 1042
> 1043    #endif
> 1044                    vm_page_busy(m);
> 1045                    vm_page_protect(m, VM_PROT_NONE);
> (gdb) p m
> $4 = (struct vm_page *) 0xfe108f40
> (gdb) p *m
> $5 = {pageq = {tqe_next = 0x0, tqe_prev = 0xfc52ecc8}, hnext = 0x0,
>   listq = {tqe_next = 0xfe090fe0, tqe_prev = 0xfe0bb6b8},
>   object = 0xfe00050e2a10, pindex = 12, phys_addr = 88940544, queue = 172,
>   flags = 128, pc = 41, wire_count = 0, hold_count = 0,
>   act_count = 5 '\005', busy = 0 '\000', valid = 65535, dirty = 65535}
>
> --

Doug, Matt wanted some things from m->object too.. If it's still there can
you grab it?

    printf(
        "found dirty cache page %p (%p,%d,%x) obtype %d obflags %x",
        m, m->object, (int)m->pindex, (int)m->flags,
        (int)m->object->type, (int)m->object->flags
    );

BTW; in vm_map.c:

/*
 * vm_map_clean
 *
 * Push any dirty cached pages in the address range to their pager.
 * If syncio is TRUE, dirty pages are written synchronously.
 * If invalidate is TRUE, any cached pages are freed as well.
 *
 * Returns an error if any part of the specified range is not mapped.
 */

This kinda suggests that dirty cached pages might not be all that
unusual.. but the code in question seems to be working at a different
level.

Cheers,
-Peter
--
Peter Wemm  pe...@netplex.com.au  Netplex Consulting
No coffee, No workee! :-)
Re: panic: found dirty cache page 0xf046f1c0
Peter Wemm wrote:
> Matthew Dillon wrote:
> [..]
> > Try changing the panic in vm/vm_page.c to a printf() (
>
> I'll do that.

BTW; what are the dangers of this? Lost disk writes or corruption? Can
we (as a workaround) push the page that we found back onto a dirty queue
and try again after some diagnostics?

> FWIW, this has happened while the system has been nearly quiescent all
> the way through to being thrashed with parallel cvs updates etc running.
> Most times it waits till exmh is running. Last time (when recompiling
> without SMP) it crashed when it came to linking the kernel (and no exmh
> running).
>
> I'll see if it still crashes in uniprocessor mode, if so, I'll put some
> debugging in and see if I can find anything out. The kernel was last
> built on Jan 16, and that one works fine still, so I'm pretty sure it
> isn't hardware.

It crashed in uniprocessor mode about 60 seconds after sending this mail.
It's got a really trimmed down kernel config and no modules loaded or in
use. I have not disabled softupdates yet, that's next.

This particular machine won't reboot by itself after it's been running in
SMP mode (it's really old), so I have to manually reset it. I went to
sleep straight after that, and it ran the whole time I was asleep. After
getting up again, I started exmh, and it crashed 30 seconds later.

There was no swapping in progress. I have been running top -s1 to see
what the swap and memory state is when it dies. Unfortunately I lost the
last one, but it generally looks like this:

last pid:  6293;  load averages:  0.51,  0.52,  0.65   up 0+01:40:54  14:19:06
40 processes:  1 running, 39 sleeping
CPU states:  4.6% user,  0.0% nice, 11.8% system,  1.5% interrupt, 82.1% idle
Mem: 19M Active, 9236K Inact, 13M Wired, 3068K Cache, 4691K Buf, 508K Free
Swap: 120M Total, 128K Used, 120M Free

This machine has 48M of ram, one swap partition only.

Oh, one other thing that occurred to me.. Under 4.0-current, I regularly
(ie: within 30 seconds of boot) get if_de transmitter underflows. My
console corruption was happening at the instant that de0 was being
configured with ifconfig. exmh is running to a remote display over that
de0 interface.

Under Jan 16 3.0-current, I do not get that transmitter underflow..

The only thing I can think of about if_de that's unusual that is VM
related (apart from the complexity of the code) is that it uses
contigmalloc(). I wonder if this is somehow setting the scene for the
later failures? It's certainly suspicious that it has done strange things
when being ifconfig'ed, including things like trashing the serial console
on no less than a dozen occasions.

Cheers,
-Peter
Re: panic: found dirty cache page 0xf046f1c0
On Sun, 24 Jan 1999, Peter Wemm wrote:

> Doug Rabson wrote:
> > On Sat, 23 Jan 1999, Matthew Dillon wrote:
> >
> > > :I made it happen again by doing the same installworld but this time I
> > > :caught it in the debugger. I'll leave the machine up for a while in case
> > > :someone has some idea of how to debug it. The stacktrace looks like this:
> > > :
> > > :#0  Debugger () at ../../alpha/alpha/db_interface.c:260
> > > :#1  0xfc36c2c0 in panic () at ../../kern/kern_shutdown.c:444
> > > :#2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
> > > :#3  0xfc3a1b54 in allocbuf () at ../../kern/vfs_bio.c:1791
> > >
> > > The panic message should be printing the address of the vm_page_t
> > > that it caught. From the debugger, dump that vm_page_t with 'print'.
> > >
> > >     print *0xADDRESS
> > >
> > > Do about 8 print's bumping the address by 4 ( in hex ) for each.
> > >
> > > It would be even better if we could figure out the contents and type
> > > of the underlying object.
> >
> > I have full symbols:
> >
> > (gdb) fr 2
> > #2  0xfc4942fc in vm_page_alloc () at ../../vm/vm_page.c:1041
> > 1041                    panic("found dirty cache page %p", m);
> > (gdb) l
> > 1036             */
> > 1037
> > 1038            if (qtype == PQ_CACHE) {
> > 1039    #if !defined(MAX_PERF)
> > 1040                    if (m->dirty)
> > 1041                            panic("found dirty cache page %p", m);
> > 1042
> > 1043    #endif
> > 1044                    vm_page_busy(m);
> > 1045                    vm_page_protect(m, VM_PROT_NONE);
> > (gdb) p m
> > $4 = (struct vm_page *) 0xfe108f40
> > (gdb) p *m
> > $5 = {pageq = {tqe_next = 0x0, tqe_prev = 0xfc52ecc8}, hnext = 0x0,
> >   listq = {tqe_next = 0xfe090fe0, tqe_prev = 0xfe0bb6b8},
> >   object = 0xfe00050e2a10, pindex = 12, phys_addr = 88940544, queue = 172,
> >   flags = 128, pc = 41, wire_count = 0, hold_count = 0,
> >   act_count = 5 '\005', busy = 0 '\000', valid = 65535, dirty = 65535}
> >
> > --
>
> Doug, Matt wanted some things from m->object too.. If it's still there
> can you grab it?
>
>     printf(
>         "found dirty cache page %p (%p,%d,%x) obtype %d obflags %x",
>         m, m->object, (int)m->pindex, (int)m->flags,
>         (int)m->object->type, (int)m->object->flags
>     );

He sent me private mail asking for m->object which I replied to. Here is
*m->object:

$6 = {object_list = {tqe_next = 0xfe0005369870, tqe_prev = 0xfe000527e0b8},
  shadow_head = {tqh_first = 0x0, tqh_last = 0xfe00050e2a20},
  shadow_list = {tqe_next = 0x0, tqe_prev = 0xfe00052d2020},
  memq = {tqh_first = 0xfe0c8f80, tqh_last = 0xfe115a78}, generation = 897,
  type = OBJT_DEFAULT, size = 23, ref_count = 1, shadow_count = 0,
  pg_color = 4, hash_rand = -15145890, flags = 8192,
  paging_in_progress = 0, behavior = 0, resident_page_count = 15,
  cache_count = 15, wire_count = 0, backing_object = 0x0,
  backing_object_offset = 0, last_read = 0,
  pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, handle = 0x0,
  un_pager = {vnp = {vnp_size = 754},
    devp = {devp_pglist = {tqh_first = 0x2f2, tqh_last = 0x0}},
    swp = {swp_bcount = 754}}}

> BTW; in vm_map.c:
>
> /*
>  * vm_map_clean
>  *
>  * Push any dirty cached pages in the address range to their pager.
>  * If syncio is TRUE, dirty pages are written synchronously.
>  * If invalidate is TRUE, any cached pages are freed as well.
>  *
>  * Returns an error if any part of the specified range is not mapped.
>  */
>
> This kinda suggests that dirty cached pages might not be all that
> unusual.. but the code in question seems to be working at a different
> level.

I'm not too familiar with this code. It is only called from msync(2) as
far as I can see.

--
Doug Rabson                 Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.      Phone: +44 181 442 9037
Re: panic: found dirty cache page 0xf046f1c0
:[..]
:> > Try changing the panic in vm/vm_page.c to a printf() (
:>
:> I'll do that.
:
:BTW; what are the dangers of this? lost disk writes or corruption? Can
:we (as a workaround) push the page that we found back onto a dirty queue
:and try again after some diagnostics?

That's ok, don't worry about it... Doug's debug output gives me the same
info.

:It crashed in uniprocessor mode about 60 seconds after sending this mail.
:It's got a really trimmed down kernel config and no modules loaded or in
:use. I have not disabled softupdates yet, that's next.

I don't get it... why can't I reproduce this problem? Can you email me
your kernel configuration? Are you using any special devices like vn or
something?

:Oh, one other thing that occurred to me.. Under 4.0-current, I regularly
:(ie: within 30 seconds of boot) get if_de transmitter underflows. My
:console corruption was happening at the instant that de0 was being
:configured with ifconfig. exmh is running to a remote display over that
:de0 interface.
:
:Under Jan 16 3.0-current, I do not get that transmitter underflow..
:
:The only thing I can think of about if_de that's unusual that is VM related
:(apart from the complexity of the code) is that it uses contigmalloc(). I
:wonder if this is somehow setting the scene for the later failures? It's
:certainly suspicious that it has done strange things when being ifconfig'ed,
:including things like trashing the serial console on no less than a dozen
:occasions.
:
:Cheers,
:-Peter

Hmmm.. HMM. contigmalloc, eh? You might be onto something here. I will
investigate it.

The problem we are having is that, somehow, a vm_page_t in the PQ_CACHE
is being set dirty. Since vm_page_cache() panics if m->dirty is set, then
m->dirty must be getting set *after* the page has been moved to the cache.

contigmalloc() looks suspicious.

-Matt
Matthew Dillon dil...@backplane.com
Re: panic: found dirty cache page 0xf046f1c0
Matthew Dillon wrote:
[..]
> :Oh, one other thing that occurred to me.. Under 4.0-current, I regularly
> :(ie: within 30 seconds of boot) get if_de transmitter underflows. My
> :console corruption was happening at the instant that de0 was being
> :configured with ifconfig. exmh is running to a remote display over that
> :de0 interface.
> :
> :Under Jan 16 3.0-current, I do not get that transmitter underflow..
> :
> :The only thing I can think of about if_de that's unusual that is VM related
> :(apart from the complexity of the code) is that it uses contigmalloc(). I
> :wonder if this is somehow setting the scene for the later failures? It's
> :certainly suspicious that it has done strange things when being ifconfig'ed,
> :including things like trashing the serial console on no less than a dozen
> :occasions.
> :
> :Cheers,
> :-Peter
>
> Hmmm.. HMM. contigmalloc, eh? You might be onto something here. I will
> investigate it.
>
> The problem we are having is that, somehow, a vm_page_t in the PQ_CACHE
> is being set dirty. Since vm_page_cache() panics if m->dirty is set,
> then m->dirty must be getting set *after* the page has been moved to
> the cache.
>
> contigmalloc() looks suspicious.

Damn, I must be losing my mind. if_de doesn't use contigmalloc.. it
either did, or was going to as a result of the problem of the transmit
descriptor array crossing a page boundary that had to be contiguous. In
the end there were two separate malloc's, each less than PAGE_SIZE.

Cheers,
-Peter
Re: panic: found dirty cache page 0xf046f1c0
On Sun, 24 Jan 1999, Peter Wemm wrote:

> Oh, one other thing that occurred to me.. Under 4.0-current, I regularly
> (ie: within 30 seconds of boot) get if_de transmitter underflows. My
> console corruption was happening at the instant that de0 was being
> configured with ifconfig. exmh is running to a remote display over that
> de0 interface.
>
> Under Jan 16 3.0-current, I do not get that transmitter underflow..

One of my alpha boxes has always got a few of these errors when it first
transmits a largish packet. It happened under NetBSD and FreeBSD since I
bought the machine (about June last year I think). Andrew Gallatin has
seen similar errors on OSF1. I think it's harmless.

> The only thing I can think of about if_de that's unusual that is VM
> related (apart from the complexity of the code) is that it uses
> contigmalloc(). I wonder if this is somehow setting the scene for the
> later failures? It's certainly suspicious that it has done strange
> things when being ifconfig'ed, including things like trashing the serial
> console on no less than a dozen occasions.

I can't see where if_de is using contigmalloc(). I thought the bus_dma
code in there wasn't used on FreeBSD.

--
Doug Rabson                 Mail:  d...@nlsystems.com
Nonlinear Systems Ltd.      Phone: +44 181 442 9037
Re: panic: found dirty cache page 0xf046f1c0
On Sun, 24 Jan 1999, Peter Wemm wrote:

> [..]
> Oh, one other thing that occurred to me.. Under 4.0-current, I regularly
> (ie: within 30 seconds of boot) get if_de transmitter underflows. My
> console corruption was happening at the instant that de0 was being
> configured with ifconfig. exmh is running to a remote display over that
> de0 interface.

Here too... pretty quickly after boot on a SMP machine (current as of Jan
12) that pushes quite a bit of traffic, the following messages appear:

de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
de0: abnormal interrupt: transmit underflow (raising TX threshold to 128|512)
de0: abnormal interrupt: transmit underflow (raising TX threshold to 160|1024)

The card is:

de0: Digital 21140A Fast Ethernet rev 0x22 int a irq 16 on pci0.12.0
de0: 21140A [10-100Mb/s] pass 2.2
de0: address 00:c0:f0:1f:5d:0d
de0: enabling Full Duplex 100baseTX port

Actually a Kingston clone, not a real DEC (so 1/5th of the price - but the
receiver doesn't go audibly *click* when it's autosensing).

So far I've gotten this message once:

de0: abnormal interrupt: transmit underflow (switching to store-and-forward mode)

Any harm in them, or can I safely ignore them? Would it be a good idea to
raise the TX threshold by default, if only to avoid the messages?

It's plugged into a Catalyst switch, if it makes any difference...

 -- Niels.
Re: panic: found dirty cache page 0xf046f1c0
Yes, we're working on it in a sub-group.

Since the panic message is a new one -- it's one I added that never
existed in -3.x -- it is possible that the bug is not related to my VM
stuff but related to something else going on.

I've found a number of other bugs in the greater VM system which I am
committing fixes for, *BUT* I don't think any of them are related to this
particular panic. I am also committing some very strict KASSERT checking
to try to catch the problem earlier.

Everyone running 4.x kernels should add the following options to your
kernel config:

    options INVARIANTS
    options INVARIANT_SUPPORT

-Matt
Matthew Dillon dil...@backplane.com

:On Sun, 24 Jan 1999, Peter Wemm wrote:
:
:[..]
:> Oh, one other thing that occurred to me.. Under 4.0-current, I regularly
:> (ie: within 30 seconds of boot) get if_de transmitter underflows. My
:> console corruption was happening at the instant that de0 was being
:> configured with ifconfig. exmh is running to a remote display over that
:> de0 interface.
:
:Here too... pretty quickly after boot on a SMP machine (current as of Jan
:12) that pushes quite a bit of traffic, the following messages appear:
:
:de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
:de0: abnormal interrupt: transmit underflow (raising TX threshold to 128|512)
:de0: abnormal interrupt: transmit underflow (raising TX threshold to 160|1024)
:
:The card is:
:
:de0: Digital 21140A Fast Ethernet rev 0x22 int a irq 16 on pci0.12.0
:de0: 21140A [10-100Mb/s] pass 2.2
:de0: address 00:c0:f0:1f:5d:0d
:de0: enabling Full Duplex 100baseTX port
:
:Actually a Kingston clone, not a real DEC (so 1/5th of the price - but the
:receiver doesn't go audibly *click* when it's autosensing).
:
:So far I've gotten this message once:
:
:de0: abnormal interrupt: transmit underflow (switching to store-and-forward mode)
:
:Any harm in them, or can I safely ignore them? Would it be a good idea to
:raise the TX threshold by default, if only to avoid the messages?
:
:It's plugged into a Catalyst switch, if it makes any difference...
:
: -- Niels.
Re: panic: found dirty cache page 0xf046f1c0
:Here too... pretty quickly after boot on a SMP machine (current as of Jan
:12) that pushes quite a bit of traffic, the following messages appear:
:
:de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
:de0: abnormal interrupt: transmit underflow (raising TX threshold to 128|512)
:de0: abnormal interrupt: transmit underflow (raising TX threshold to 160|1024)
:
:The card is:
:
:de0: Digital 21140A Fast Ethernet rev 0x22 int a irq 16 on pci0.12.0
:de0: 21140A [10-100Mb/s] pass 2.2
:de0: address 00:c0:f0:1f:5d:0d
:de0: enabling Full Duplex 100baseTX port
:
: -- Niels.

Three people getting these panics, three people with DEC ethernet cards.
Random complaints about card during ifconfig: speaker goes click,
console gets junked, etc etc etc.

Is there anyone having this panic who does NOT have a DEC ethernet card?

I still don't think the card is causing the problem, but it would be nice
if we could rule it out. GRIN

-Matt
Matthew Dillon dil...@backplane.com
Re: panic: found dirty cache page 0xf046f1c0
> Here too... pretty quickly after boot on a SMP machine (current as of Jan
> 12) that pushes quite a bit of traffic, the following messages appear:
>
> de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
> [..]
>
> Three people getting these panics, three people with DEC ethernet cards.
> Random complaints about card during ifconfig: speaker goes click,
> console gets junked, etc etc etc.

Actually, I haven't had the console of that SMP machine junked yet, but
that's because there is no console worth speaking of. Previous reboot was
because processes like tail(1) only appeared to hang, unkillable except
by -9, and after attaching monitor and keyboard, upon pressing Enter at a
login: prompt the cursor would only advance a line once...

But that was a week ago, and it's a *busy* news server (that's not hitting
swap), I was just curious about the error messages from the de driver.

 -- Niels.
Re: panic: found dirty cache page 0xf046f1c0
:But that was a week ago, and it's a *busy* news server (that's not hitting
:swap), I was just curious about the error messages from the de driver.
:
: -- Niels.

The transmit underflow messages:

    de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
    de0: abnormal interrupt: transmit underflow (raising TX threshold to 128|512)
    de0: abnormal interrupt: transmit underflow (raising TX threshold to 160|1024)

can typically be ignored. It simply means that the DEC card has too small
a transmit FIFO and is getting DMA underflows. Stupid card.

-Matt
Matthew Dillon dil...@backplane.com
Re: panic: found dirty cache page 0xf046f1c0
On Sat, 23 Jan 1999, Matthew Dillon wrote:

> [..]
> I am also committing some very strict KASSERT checking to try to catch
> the problem earlier. Everyone running 4.x kernels should add the following

Ahem, would you kindly define 'everyone'?

> options to your kernel config:
>
>     options INVARIANTS
>     options INVARIANT_SUPPORT
>
> [..]

Brian Feldman  gr...@unixhelp.org
http://www.freebsd.org/  FreeBSD: The Power to Serve!
Re: panic: found dirty cache page 0xf046f1c0
: I am also committing some very strict KASSERT checking to try to catch
: the problem earlier. Everyone running 4.x kernels should add the following
:
:Ahem, would you kindly define 'everyone'?

Anyone, everyone, everybody, all ... any individual using the -4.x
kernels needs to understand the highly experimental nature of said
kernels.

Turning on INVARIANTS is just plain smart, for many reasons, but I will
give you the top two:

* The sanity checks could save your disks when someone commits a major
  mistake.

* The sanity checks make it easier for bugs to be found and fixed when
  they do occur.

-4.x is just getting on its feet; nobody should be shipping product with
it for a while ( if they are, they are insane ).

-Matt
Matthew Dillon dil...@backplane.com

: options to your kernel config:
:
:     options INVARIANTS
:     options INVARIANT_SUPPORT
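To make the second point a bit more concrete, here is a minimal user-space sketch of how an INVARIANTS-gated check behaves. It is not the kernel's code: SKETCH_KASSERT, sketch_panic(), struct sketch_page and the queue constants are hypothetical stand-ins, and the `#define INVARIANTS` line only mimics what `options INVARIANTS` would do at build time. The stand-in panic records the message instead of halting, which a real panic would not do.

```c
#include <stdio.h>

/* Stand-in for "options INVARIANTS"; remove it to compile the checks out. */
#define INVARIANTS 1

/* Hypothetical replacement for panic(): records the message, does not halt. */
const char *sketch_last_panic = NULL;

void sketch_panic(const char *msg)
{
    sketch_last_panic = msg;
    fprintf(stderr, "panic: %s\n", msg);
}

#ifdef INVARIANTS
/* With INVARIANTS the rule is enforced on every call. */
#define SKETCH_KASSERT(exp, msg) do { if (!(exp)) sketch_panic(msg); } while (0)
#else
/* Without it the check disappears and a broken rule goes unnoticed. */
#define SKETCH_KASSERT(exp, msg) do { } while (0)
#endif

/* Hypothetical, heavily reduced page: which queue it is on and whether dirty. */
enum sketch_queue { SKETCH_PQ_NONE, SKETCH_PQ_CACHE };

struct sketch_page {
    enum sketch_queue queue;
    int dirty;
};

/* Reusing a cache page is only safe if it is clean, so check before reuse. */
struct sketch_page *sketch_alloc_from_cache(struct sketch_page *m)
{
    SKETCH_KASSERT(!(m->queue == SKETCH_PQ_CACHE && m->dirty),
        "found dirty cache page");
    m->queue = SKETCH_PQ_NONE;
    return m;
}
```

Calling sketch_alloc_from_cache() on a clean cache page leaves sketch_last_panic untouched; calling it on a dirty one records the message. Without the define, the same mistake would pass silently, which is the case the two bullet points above argue against.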
Re: panic: found dirty cache page 0xf046f1c0
Matthew Dillon wrote:
> :But that was a week ago, and it's a *busy* news server (that's not hitting
> :swap), I was just curious about the error messages from the de driver.
> :
> : -- Niels.
>
> The transmit underflow messages:
>
> de0: abnormal interrupt: transmit underflow (raising TX threshold to 96|256)
> de0: abnormal interrupt: transmit underflow (raising TX threshold to 128|512)
> de0: abnormal interrupt: transmit underflow (raising TX threshold to 160|1024)
>
> can typically be ignored. It simply means that the DEC card has too small
> a transmit FIFO and is getting DMA underflows. Stupid card.
>
> -Matt

As I understand it, what's happening is that it's reacting to PCI bus
congestion by raising the preread thresholds. It degenerates to fetching
the entire frame into on-card (or chip) memory before beginning
transmission.

On my system I can understand it, it's a 2xP5 with a shared L2 cache on a
Neptune chipset - something that isn't known for speed. Once you get two
processors hammering the system bus, *plus* mix in an EISA scsi
controller, I could well imagine the memory bus getting thrashed.

I'm not sure how to read the messages. Looking at the if_pn driver as
well, it looks like both start with a FIFO threshold of 72 bytes. I think
that '160|1024' (for example) means start transmitting when the FIFO has
fetched 160 bytes and don't stop fetching unless we hit 1024 bytes in the
FIFO. Store and forward mode (I believe) is a degenerate case where it
fetches the entire packet into the buffer before beginning transmission.
Bill Paul's if_pn driver doesn't react to an underflow at all.. it stays
at 72/128 permanently.

For what it's worth, the de cards are the only ones I've found that can
work at all on this system at 100Mbit. The realtek 8139 cards (cheap!)
went belly-up on the spot, no surprise there. I don't have an fxp card to
test.

Cheers,
-Peter
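The escalation visible in the quoted log lines can be modelled as a small ladder. This is only a sketch under assumptions: the step labels are copied from the de0 messages, the first label is assumed from the 72-byte starting point mentioned above, and the type and function names are hypothetical rather than taken from the if_de driver. It deliberately does not interpret what the two numbers in each label mean, since that is exactly what is uncertain here.

```c
#include <stddef.h>

/* Labels for the steps seen in the log; the first one is an assumption. */
static const char *const sketch_tx_steps[] = {
    "72|128",    /* assumed starting point, not shown in the quoted log */
    "96|256",
    "128|512",
    "160|1024",
};

#define SKETCH_TX_NSTEPS (sizeof(sketch_tx_steps) / sizeof(sketch_tx_steps[0]))

struct sketch_tx_state {
    size_t step;            /* index into sketch_tx_steps */
    int store_and_forward;  /* nonzero once the ladder is exhausted */
};

/*
 * React to one transmit underflow: move one step up the ladder, or fall
 * back to store-and-forward when no higher threshold is left.
 * Returns a description of the new setting.
 */
const char *sketch_tx_underflow(struct sketch_tx_state *st)
{
    if (st->store_and_forward)
        return "store-and-forward";
    if (st->step + 1 < SKETCH_TX_NSTEPS) {
        st->step++;
        return sketch_tx_steps[st->step];
    }
    st->store_and_forward = 1;
    return "store-and-forward";
}
```

Starting from a zeroed state, three underflows walk through the three "raising TX threshold" labels from the log and a fourth lands on store-and-forward, which matches the order in which the messages were reported.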
Re: panic: found dirty cache page 0xf046f1c0
:On my system I can understand it, it's a 2xP5 with a shared L2 cache on a
:Neptune chipset - something that isn't known for speed. Once you get two
:processors hammering the system bus, *plus* mix in an EISA scsi
:controller, I could well imagine the memory bus getting thrashed.

When we started throwing together Dual-P-II machines, we basically had
to throw away our DEC chipset cards... I think that the DEC chip cards,
at least the older ones, have serious PCI spec bugs that cause them to
operate incorrectly on dual-cpu machines when more than one cpu is
populated.

-Matt
Re: panic: found dirty cache page 0xf046f1c0
I've committed one bug fix to the 'found dirty cache page' bug -- turns
out vm_map_split() was the culprit, renaming pages without removing them
from PQ_CACHE. The bug was introduced in -3.0, and hit the KASSERT() I
put in -4.x.

I've committed a general inlining of 'changing the page dirty status to
VM_PAGE_BITS_ALL' and put a sanity check in the inline.

If this problem occurs again, you will get a different panic. One of:

    vm_page_dirty: page in cache!
    vm_page_busy: page already busy!!!
    vm_page_wakeup: page not busy!!!

If your box drops into DDB, please get a backtrace and report it to the
list or to me and we should be able to track down any remaining
dirty-pages-in-PQ_CACHE bugs.

-Matt
Matthew Dillon dil...@backplane.com
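For readers who want to see the shape of that enforcement, here is a rough user-space sketch of the three checks named above. It is not the committed code: the sketch_ names, the reduced page structure and the recording sketch_panic() are hypothetical. Only the panic strings, PQ_CACHE and VM_PAGE_BITS_ALL come from the message, and the value given to the latter here is an assumption.

```c
#include <stdio.h>

#define SKETCH_VM_PAGE_BITS_ALL 0xffu  /* assumed value, for illustration only */

enum sketch_queue { SKETCH_PQ_NONE, SKETCH_PQ_CACHE };

struct sketch_page {
    enum sketch_queue queue;  /* which page queue the page is on */
    unsigned dirty;           /* dirty bits */
    int busy;                 /* stands in for the PG_BUSY flag */
};

const char *sketch_last_panic = NULL;

/* Hypothetical stand-in for panic(): record the message and keep running. */
void sketch_panic(const char *msg)
{
    sketch_last_panic = msg;
    fprintf(stderr, "panic: %s\n", msg);
}

/* All "mark fully dirty" call sites share this inline, so one check covers them. */
static inline void sketch_page_dirty(struct sketch_page *m)
{
    if (m->queue == SKETCH_PQ_CACHE) {
        sketch_panic("vm_page_dirty: page in cache!");
        return;  /* a real panic would stop here; the sketch just bails out */
    }
    m->dirty = SKETCH_VM_PAGE_BITS_ALL;
}

/* Busying an already-busy page is a rule violation. */
static inline void sketch_page_busy(struct sketch_page *m)
{
    if (m->busy) {
        sketch_panic("vm_page_busy: page already busy!!!");
        return;
    }
    m->busy = 1;
}

/* Unbusying a page that is not busy is a rule violation too. */
static inline void sketch_page_wakeup(struct sketch_page *m)
{
    if (!m->busy) {
        sketch_panic("vm_page_wakeup: page not busy!!!");
        return;
    }
    m->busy = 0;
}
```

The point of routing every state change through one inline is that the offending caller trips the check at the moment it breaks the rule, rather than much later when some unrelated allocation stumbles over the page.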
Re: panic: found dirty cache page 0xf046f1c0
At 10:34 AM 1/23/99 +0800, Peter Wemm wrote:
> Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got
> lost):
>
> panic: found dirty cache page 0xf046f1c0
> mp_lock = 0101; cpuid = 1; lapic.id = 0100
> Debugger(panic)
> Stopped at Debugger+0x37: movl $0,in_Debugger
> db> trace
> Debugger(f01f1806) at Debugger+0x37
> panic(f01fbb50,f046f1c0,0,80,f45cbb20) at panic+0xa4
> vm_page_alloc(f45f6f68,80,3,0,80) at vm_page_alloc+0x114
> vm_page_grab(f45f6f68,80,83,0,80) at vm_page_grab+0x8d
> _pmap_allocpte(f45cbb20,80,201df000,201df000,2a86000) at _pmap_allocpte+0x19
> pmap_allocpte(f45cbb20,201df000,f02c4df4,201df000,f45cbac0) at pmap_allocpte+0x53
> pmap_enter(f45cbb20,201df000,2a86000,5,0) at pmap_enter+0x3d
> vm_fault(f45cbac0,201df000,1,0,f4195180) at vm_fault+0x891
> trap_pfault(f45f9fbc,1,201df236) at trap_pfault+0xf2
> trap(27,27,,5,efbfad38) at trap+0x1c2
> calltrap() at calltrap+0x3c
> --- trap 0xc, eip = 0x201df236, esp = 0xefbfac4c, ebp = 0xefbfad38 ---
> db> c
> boot() called on cpu#1
>
> syncing disks... 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 232 giving up
> 1: dev:, flags:20020034, blkno:1057008, lblkno:0
> [..]
>
> This was compiled two hours ago from absolute latest -current:
> FreeBSD spinner.netplex.com.au 4.0-CURRENT FreeBSD 4.0-CURRENT #385: Sat Jan 23 08:38:42 WST 1999 pe...@spinner.netplex.com.au:/home/src/sys/compile/SPINNER i386
>
> My other SMP machine (2xPPro200) seems to be running fine:
> FreeBSD beast.netplex.com.au 4.0-CURRENT FreeBSD 4.0-CURRENT #267: Thu Jan 21 21:39:45 WST 1999 pe...@beast.netplex.com.au:/home/src/sys/compile/BEAST i386
>
> Cheers,
> -Peter

I just got the same thing doing a make -j8 world.
Machine is a dual Pentium Pro, Intel PR440FX.
This must be from the recent vm changes, as I could make -j8 world
continually a few days ago without problem. This is the second time it
happened to me; the first time I was running X so I couldn't see the
debugger message. This time without X I got:

panic: found dirty cache page

Manfred
=
||man...@netcom.com||
||p...@infinex.com ||
||Ph. (415) 681-6235||
=
Re: panic: found dirty cache page 0xf046f1c0
Peter Wemm wrote:
> Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got
> lost):
>
> panic: found dirty cache page 0xf046f1c0
> mp_lock = 0101; cpuid = 1; lapic.id = 0100
> Debugger(panic)
> Stopped at Debugger+0x37: movl $0,in_Debugger
> db> trace

This is possibly a false alarm.. Something weird was happening. I cleaned
out the kernel and reconfigured with NFS static (it was being loaded) and
it seems to boot OK. At least, I'm not getting console corruption (random
baud rate changes) and the SMP mutex being broken and both cpu's entering
the kernel at once. I think I'll blame it on the 15 hour electrical
storm. :-]

Cheers,
-Peter
Re: panic: found dirty cache page 0xf046f1c0
:Peter Wemm wrote:
: Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got
: lost):
:
: panic: found dirty cache page 0xf046f1c0
:...
:
:This is possibly a false alarm.. Something weird was happening. I cleaned
:out the kernel and reconfigured with NFS static (it was being loaded) and
:it seems to boot OK. At least, I'm not getting console corruption (random
:baud rate changes) and the SMP mutex being broken and both cpu's entering
:the kernel at once. I think I'll blame it on the 15 hour electrical
:storm. :-]
:
:Cheers,
:-Peter

An old nfs module would almost certainly not work with the new kernel
without at least a recompile. I'd definitely recommend keeping the major
modules compiled in rather than dynamically loaded, just on principle.
In fact, in all my time at BEST and all my time playing with FreeBSD,
I have *never* used any dynamic module except for the linux compatibility
thingy, and even that was only a fluke. If you can compile it in, compile
it in.

But, keep a watch on it. I didn't have an SMP box to test the new VM
stuff on so it's possible there's something going on there.

-Matt
Matthew Dillon dil...@backplane.com
Re: panic: found dirty cache page 0xf046f1c0
:At 10:34 AM 1/23/99 +0800, Peter Wemm wrote:
:Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got
:lost):
:
:panic: found dirty cache page 0xf046f1c0
:mp_lock = 0101; cpuid = 1; lapic.id = 0100
:...
:I just got the same thing doing a make -j8 world
:Machine is a dual pentium pro Intel PR440FX
:This must be from the recent vm changes as I could make -j8 world
:continually a few days ago without problem. This is the second time it
:happened to me the first time I was running X so I couldn't see the
:debugger message. This time without X I got the :
:
:panic: found dirty cache page
:
:Manfred
:=
:||man...@netcom.com||
:||p...@infinex.com ||
:||Ph. (415) 681-6235||
:=

Any dynamically loaded modules? e.g. nfs?

Did you update /usr/src/contrib/sys (i.e. softupdates) along with
/usr/src/sys?

Are you using vinum?

-Matt
Matthew Dillon dil...@backplane.com
Re: panic: found dirty cache page 0xf046f1c0
On Fri, 22 Jan 1999, Manfred Antar wrote:

> [..]
> I just got the same thing doing a make -j8 world
> Machine is a dual pentium pro Intel PR440FX
> This must be from the recent vm changes as I could make -j8 world
> continually a few days ago without problem. This is the second time it
> happened to me the first time I was running X so I couldn't see the
> debugger message. This time without X I got the :
>
> panic: found dirty cache page

You should definitely be using DDB_UNATTENDED, by the way, if you're going
to be running X and want DDB but don't want DDB to try to pop up on a
panic. I did get DDB_UNATTENDED behavior finally working as well as it
should, so there's no reason not to use it.

Brian Feldman  gr...@unixhelp.org
http://www.freebsd.org/  FreeBSD: The Power to Serve!
Re: panic: found dirty cache page 0xf046f1c0
Matthew Dillon wrote:
> :Peter Wemm wrote:
> : Dual p5-90 w/ 48M ram, doing a major cvs update/merge (which mostly got
> : lost):
> :
> : panic: found dirty cache page 0xf046f1c0
> :...
> :
> :This is possibly a false alarm.. Something weird was happening. I cleaned
> :out the kernel and reconfigured with NFS static (it was being loaded) and
> :it seems to boot OK. At least, I'm not getting console corruption (random
> :baud rate changes) and the SMP mutex being broken and both cpu's entering
> :the kernel at once. I think I'll blame it on the 15 hour electrical
> :storm. :-]
> :
> :Cheers,
> :-Peter
>
> An old nfs module would almost certainly not work with the new kernel
> without at least a recompile. I'd definitely recommend keeping the major
> modules compiled in rather than dynamically loaded, just on principle.
> In fact, in all my time at BEST and all my time playing with FreeBSD,
> I have *never* used any dynamic module except for the linux compatibility
> thingy, and even that was only a fluke. If you can compile it in, compile
> it in.

It's definitely happening still, sorry. :-( I recompiled a 100% static
kernel and have had three more explosions, usually after starting exmh.
(exmh takes 10 to 15MB of ram on this system due to my mailbox folder
sizes).

> But, keep a watch on it. I didn't have an SMP box to test the new VM
> stuff on so it's possible there's something going on there.

However, a clue.. The SMP box that is doing fine is a P6, an NFS client
and server (loading nfs.ko, it fsck's fast, so I use that box for making
sure the modules work). The one that is crashing is a P5, an NFS client
and server (static kernel), and with a MFS /tmp. Both run softupdates
(up to date src/contrib/sys).

I suspect MFS is the key. There's the new VOP_FREEBLKS() stuff you added,
and the corresponding calls to madvise to free the pages. Given
madvise()'s murky history, I can't help but feel suspicious about it.

I've unmounted /tmp and am about to thrash the machine. At the moment,
it's sitting on:

Swap: 120M Total, 376K Used, 120M Free

Cheers,
-Peter