Re: amd(8) cores dump when load high
On Sat, Dec 27, 2008 at 7:03 PM, Danny Braniss da...@cs.huji.ac.il wrote:

No, we are not running amd with -S.

# ps auxww | grep amd
root 706 0.0 0.1 7660 5416 ?? Ss Wed05PM 4:48.12 /usr/sbin/amd -p -k amd64 -x all /net amd.map

well, I'm running 7.1-PRERELEASE, what do the amd logs show?
[...]
Dec 27 10:37:01 sf-02 amd[857]: Locked process pages in memory **

Hmm, interesting. I got this

Dec 26 15:32:11 bsd2 amd[39723]: Couldn't lock process pages in memory using mlockall(): Resource temporarily unavailable

with 7-STABLE from around Sep 4. I don't put plock = no in amd.conf, so by default it's plock'ed.

Regards,
Rong-En Fan

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
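The two log lines quoted in this message differ only in whether page locking succeeded. A minimal sketch for pulling just those lines out of syslog output follows; the function name amd_lock_status and the log path in the usage comment are illustrative assumptions, not something taken from the thread.

```shell
# Print only the amd log lines that report the outcome of page locking.
# Reads syslog text on stdin, so it can be fed any log file.
amd_lock_status() {
    grep -E 'amd\[[0-9]+\]: (Locked process pages in memory|Couldn.t lock process pages)'
}

# Example (the log path is an assumption; adjust it to your syslog.conf):
#   amd_lock_status < /var/log/messages
```

Running it on both a working and a crashing host should show quickly whether the failing amd instances are the ones that could not lock their pages.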
Re: amd(8) cores dump when load high
On Tue, Dec 23, 2008 at 12:44 AM, Lin Jui-Nan Eric eric...@tamama.org wrote:

Dear listers,

We recently found that amd frequently dumps core when load is high (about 4~5) after we upgraded world and kernel from 7.0-RELEASE to 7.1-PRERELEASE. I have read -stable and the svn log of 7-STABLE, but could not find a report or a solution. Does anyone have the same issue? Thank you very much.

In my previous experience, amd 6.1.5 crashes in low-memory situations, not necessarily under high load.

Regards,
Rong-En Fan
Re: lock problem: nfs server on FreeBSD 7-stable, client on linux
On Sun, Apr 6, 2008 at 1:18 PM, Tz-Huan Huang [EMAIL PROTECTED] wrote:

Hi,

Thanks for your suggestion, but we don't accept this workaround. After a binary search, I found that this commit breaks the previously working lockd:

http://lists.freebsd.org/pipermail/cvs-src/2008-March/089037.html

I have rolled lockd.c back to 1.20 on our nfs server and it works fine as before.

Added dfr@ to the CC list. I'm curious about this change. Could you check which sockets rpc.lockd and rpc.statd bind before and after the lockd.c rev 1.21+1.22 changes?

Thanks,
Rong-En Fan

On Thu, Apr 3, 2008 at 1:02 AM, Ken Chen [EMAIL PROTECTED] wrote:

I have a similar problem with a FreeBSD 7 client + FreeBSD 6 server. For now, I use 'mount_nfs -L' on the client to do local locking only. Of course, it may cause other problems.

2008/4/2, Tz-Huan Huang [EMAIL PROTECTED]:

Hi,

We have one nfs server (Mar 27's 7-stable, AMD64) and many clients. One of the clients is also 7-stable (Mar 30's, i386), and the others are Debian Linux. The problem is that fcntl locks work fine on the FreeBSD client but not on the Linux ones. We have tested Linux server + Linux client, and that works fine. These are all the combinations we have tried:

FreeBSD server + FreeBSD client: ok
FreeBSD server + Linux client: fail
Linux server + Linux client: ok
Linux server + FreeBSD client: ok

Is there some issue with 7-stable's rpc.lockd? More information is available if necessary, thanks.
Tz-Huan
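The question in this thread about which sockets rpc.lockd and rpc.statd bind can be answered by filtering sockstat(1) output. This is only a sketch: it assumes FreeBSD's sockstat is available on the server and that the command name appears in the second column; the helper name lock_daemon_sockets is hypothetical.

```shell
# Keep only the sockstat(1) lines that belong to rpc.lockd or rpc.statd.
# Reads sockstat output on stdin; the COMMAND column is the second field.
lock_daemon_sockets() {
    awk '$2 == "rpc.lockd" || $2 == "rpc.statd"'
}

# Example on the NFS server, run once per lockd.c revision and compared by hand:
#   sockstat -l | lock_daemon_sockets
```

Saving the output for lockd.c 1.20 and for 1.22 and diffing the two files would show whether the bound addresses or protocols changed.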
Re: SchedULE vs BSD scheduler - Was: HP ProLiant DL360 G5 success stories?
On Sat, Mar 15, 2008 at 12:14 AM, Christopher Sean Hilton [EMAIL PROTECTED] wrote:

On Mar 12, 2008, at 12:05 PM, Oliver Fromme wrote:

Those machines work very well with both FreeBSD 6 and 7. If you install FreeBSD 7, remember to enable ULE instead of the default BSD scheduler.

What's the advantage of ULE / disadvantage of the default? Is it specific to this hardware?

It gives you better performance. You may want to check Kris's slides:

http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf

Regards,
Rong-En Fan
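Enabling ULE, as suggested in this thread, means selecting it in the kernel configuration and rebuilding the kernel. A minimal sketch, assuming the config being copied contains an `options SCHED_4BSD` line (the "default BSD scheduler" mentioned above); the function name use_ule and the config name MYKERNEL are hypothetical.

```shell
# Rewrite a kernel configuration read on stdin so that it selects the ULE
# scheduler instead of 4BSD; all other lines pass through unchanged.
use_ule() {
    sed -E 's/^(options[[:space:]]+)SCHED_4BSD/\1SCHED_ULE/'
}

# Example (MYKERNEL is a hypothetical custom config name):
#   use_ule < /usr/src/sys/amd64/conf/GENERIC > /usr/src/sys/amd64/conf/MYKERNEL
# After rebuilding and rebooting, `sysctl kern.sched.name` reports the
# scheduler that is actually in use.
```

Only the option name is rewritten; any trailing comment on that line is left as it was.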
panic: locking against myself on 7.0-R
It's 7.0-RELEASE amd64, GENERIC modulo some devices, using 4BSD, IPSEC, and IPFW. The backtrace seems related to the softupdates code. This box is just an NFS server that serves ~25 6.x + Linux clients. Any ideas?

Regards,
Rong-En Fan

panic: lockmgr: locking against myself
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
_lockmgr() at _lockmgr+0x85a
getblk() at getblk+0x149
breadn() at breadn+0x3f
bread() at bread+0x1e
indir_trunc() at indir_trunc+0x11f
indir_trunc() at indir_trunc+0x287
indir_trunc() at indir_trunc+0x287
handle_workitem_freeblocks() at handle_workitem_freeblocks+0x2aa
process_worklist_item() at process_worklist_item+0x293
softdep_process_worklist() at softdep_process_worklist+0xed
softdep_flush() at softdep_flush+0x12a
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xb91f1d30, rbp = 0 ---
Uptime: 8d15h33m46s
Physical memory: 3064 MB
Dumping 470 MB: 455 439 423 407 391 375 359 343 327 311 295 279 263 247 231 215 199 183 167 151 135 119 103 87 71 55 39 23 7

#0 doadump () at pcpu.h:194
194 pcpu.h: No such file or directory.
in pcpu.h
#0 doadump () at pcpu.h:194
#1 0x802b1ad8 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0x802b1f37 in panic (fmt=Variable fmt is not available.) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0x802a258a in _lockmgr (lkp=0xa65f1c38, flags=0, interlkp=Variable interlkp is not available.) at /usr/src/sys/kern/kern_lock.c:366
#4 0x80319dc9 in getblk (vp=0xff000151e7c0, blkno=21058528, size=16384, slpflag=0, slptimeo=0, flags=Variable flags is not available.) at buf.h:301
#5 0x8031aa8f in breadn (vp=0xff000151e7c0, blkno=Variable blkno is not available.) at /usr/src/sys/kern/vfs_bio.c:786
#6 0x8031abae in bread (vp=Variable vp is not available.) at /usr/src/sys/kern/vfs_bio.c:734
#7 0x803e897f in indir_trunc (freeblks=0xff009c6db600, dbn=21058528, level=0, lbn=6303756, countp=0xb91f1b10) at /usr/src/sys/ufs/ffs/ffs_softdep.c:2866
#8 0x803e8ae7 in indir_trunc (freeblks=0xff009c6db600, dbn=Variable dbn is not available.) at /usr/src/sys/ufs/ffs/ffs_softdep.c:2892
#9 0x803e8ae7 in indir_trunc (freeblks=0xff009c6db600, dbn=Variable dbn is not available.) at /usr/src/sys/ufs/ffs/ffs_softdep.c:2892
#10 0x803e8f0a in handle_workitem_freeblocks (freeblks=0xff009c6db600, flags=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:2746
#11 0x803ea473 in process_worklist_item (mp=0xff000159d978, flags=Variable flags is not available.) at /usr/src/sys/ufs/ffs/ffs_softdep.c:963
#12 0x803eb4cd in softdep_process_worklist (mp=0xff000159d978, full=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:847
#13 0x803ed42a in softdep_flush () at /usr/src/sys/ufs/ffs/ffs_softdep.c:758
#14 0x802924bf in fork_exit (callout=0x803ed300 softdep_flush, arg=0x0, frame=0xb91f1c80) at /usr/src/sys/kern/kern_fork.c:781
#15 0x8043170e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:415
Re: broken buildkernel (scsi_low and -Os) and duplicate manpages
On Wed, Feb 13, 2008 at 10:52:50AM +0100, Christian Brueffer wrote:

On Wed, Feb 13, 2008 at 11:15:29AM +0200, David Naylor wrote:

Hi,

Building the kernel with CFLAGS=-Os breaks when compiling module scsi_low. Sorry no output available. Placing CFLAGS+= -O in the Makefile fixes the problem. Last build with -O2 did work (for everything, world, kernel and ports).

From my research it appears the -Os produces code faster than -O2 and generally slower than -O3 but the smallest binary (and quicker compile times), does anyone have a better understanding of such things (performance and -O? flags).

When doing an installworld DEST=? it fails twice when trying to install duplicate man pages:
1) lib/ncurses/ncurses: tputs.3
2) share/man/man9: rman_fini.9

The rman_fini.9 one was a mistake, I've just fixed it. Thanks!

rafan@ (CCed) did the last few ncurses updates. Rong-En, could you take a look at the tputs.3 issue?

Interesting, I actually use installworld w/ DESTDIR, but it does not fail. Nevertheless, I have just removed the duplicate one (actually, both curs_terminfo and curs_termcap have tputs.3; as we use termcap in base, I just removed the one that links to curs_terminfo).

Regards,
Rong-En Fan
6.3 panic (seems hptmv related)
We have a box that ran 6.2-RELEASE smoothly. Once we boot it with 6.3-RELEASE, it panics in hptmv0. I have a kernel dump available. Any ideas?

Regards,
Rong-En Fan

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xfffb5444efc5
fault code = supervisor read data, page not present
instruction pointer = 0x8:0x8032ac66
stack pointer = 0x10:0xa5334b90
frame pointer = 0x10:0x0
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 22 (irq19: hptmv0)
trap number = 12
panic: page fault
Uptime: 2m46s
Dumping 1023 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 1023MB (261808 pages) 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0 doadump () at pcpu.h:172
172 __asm __volatile(movq %%gs:0,%0 : =r (td));
(kgdb) bt full
#0 doadump () at pcpu.h:172
No locals.
#1 0x0004 in ?? ()
No symbol table info available.
#2 0x8021a083 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
first_buf_printf = 1
#3 0x8021a686 in panic (fmt=0xff003dba2980 °6¹=) at /usr/src/sys/kern/kern_shutdown.c:565
bootopt = 260
newpanic = 0
ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0xa5334a00, reg_save_area = 0xa5334930}}
buf = page fault, '\0' repeats 245 times
#4 0x80349e41 in trap_fatal (frame=0xff003dba2980, eva=18446742975233472176) at /usr/src/sys/amd64/amd64/trap.c:669
code = 12
ss = 12
type = 12
esp = 0
softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27, ssd_dpl = 0, ssd_p = 1, ssd_long = 1, ssd_def32 = 0, ssd_gran = 1}
msg = 0x0
#5 0x8034a1b2 in trap_pfault (frame=0xa5334ae0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:580
va = 18446744053648515072
vm = (struct vmspace *) 0x0
map = 0x1
rv = 1
ftype = 1 '\001'
p = (struct proc *) 0x0
eva = 18446744053648519109
#6 0x8034a463 in trap (frame= {tf_rdi = -2144149195, tf_rsi = -2142164024, tf_rdx = 0, tf_rcx = 3175162082, tf_r8 = 1536, tf_r9 = 97, tf_rax = -20061032619, tf_rbx = -2144149195, tf_rbp = 0, tf_r10 = -2142280936, tf_r11 = -2054259168, tf_r12 = 4, tf_r13 = 0, tf_r14 = -1099503442816, tf_r15 = -1098476017280, tf_trapno = 12, tf_addr = -20061032507, tf_flags = -1098476079440, tf_err = 0, tf_rip = -2144162714, tf_cs = 8, tf_rflags = 66183, tf_rsp = -1523364960, tf_ss = 16}) at /usr/src/sys/amd64/amd64/trap.c:353
p = (struct proc *) 0xff003db936b0
sticks = 4294967295
type = 3
i = 0
ucode = 0
code = 0
#7 0x80334dbb in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
No locals.
#8 0x8032ac66 in CheckPendingCall ()
No symbol table info available.
#9 0x8035681c in hpt_intr (arg=0x8032df25) at /usr/src/sys/dev/hptmv/entry.c:2039
_vbus_p = 0x8032e135
oldspl = 0
#10 0x80200335 in ithread_loop (arg=0xff7ce480) at /usr/src/sys/kern/kern_intr.c:682
ie = (struct intr_event *) 0xff009800
#11 0x801fed83 in fork_exit (callout=0x802001f0 ithread_loop, arg=0xff7ce480, frame=0xa5334c50) at /usr/src/sys/kern/kern_fork.c:788
p = (struct proc *) 0xff003db936b0
#12 0x8033517e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:411
No locals.
#13 0x in ?? ()
No symbol table info available.
#14 0x in ?? ()
No symbol table info available.
#15 0x0001 in ?? ()
No symbol table info available.
#16 0x in ?? ()
No symbol table info available.
#17 0x in ?? ()
No symbol table info available.
#18 0x in ?? ()
No symbol table info available.
#19 0x in ?? ()
No symbol table info available.
#20 0x in ?? ()
No symbol table info available.
#21 0x in ?? ()
No symbol table info available.
#22 0x in ?? ()
No symbol table info available.
#23 0x in ?? ()
No symbol table info available.
#24 0x in ?? ()
No symbol table info available.
#25 0x in ?? ()
No symbol table info available.
#26 0x in ?? ()
No symbol table info available.
#27 0x in ?? ()
No symbol table info available.
#28 0x in ?? ()
No symbol table info available.
#29 0x in ?? ()
No symbol table info available.
#30
if you see undefined symbol '__mb_sb_limit' on 6.x
The ctype fix for UTF-8 locales unfortunately introduced some new symbols into libc. As a result, binaries built on a system with that fix cannot be used on an older system. For that reason, the fix has been backed out of 6-STABLE. If you see the undefined symbol '__mb_sb_limit', please rebuild the affected binary; everything will be fine after that. Binaries built between 20071025 and 20071030 are affected. Sorry for the inconvenience.

Regards,
Rong-En Fan
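To find out whether a particular binary needs the rebuild described above, one can look for '__mb_sb_limit' among its undefined dynamic symbols. The following is a sketch under assumptions: it expects nm(1) from binutils with the -D option, the helper name has_undefined_symbol is hypothetical, and the path in the usage comment is only an example.

```shell
# Succeed if the `nm -D` output on stdin lists the given symbol as undefined.
# $1 = symbol name. In nm output an undefined symbol has type "U" in the
# field immediately before the symbol name.
has_undefined_symbol() {
    awk -v sym="$1" 'NF >= 2 && $(NF-1) == "U" && $NF == sym { found = 1 }
                     END { exit !found }'
}

# Example (the binary path is only an illustration):
#   nm -D /usr/local/bin/someprog | has_undefined_symbol __mb_sb_limit \
#       && echo "rebuild needed"
```

A binary that matches was linked against the libc that briefly exported the symbol and will fail to start on a libc without it.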
Re: IBM xSeries 336 dual Xeon hangs on boot when APIC enabled
On Aug 13, 2006 11:41 PM, Arjan van Leeuwen [EMAIL PROTECTED] wrote:

I'm trying to boot FreeBSD 6.1-RELEASE/amd64 on an IBM xSeries 336 machine with dual Xeons 3.2GHz installed. The installation was successful, but if I try to boot the SMP kernel, it hangs after detection of SCSI and ATA devices (possibly when doing the initialization of the mpt0 RAID controller, or when it tries to start the second CPU?).

Recently, I had an opportunity to access an xSeries 336 box. With 7.0-BETA2 amd64, it boots just fine without any tuning. SMP is also working. Something must have changed in the past two years. ;-)

Regards,
Rong-En Fan
Re: HEADSUP: don't upgrade to RELENG_6 now [FIXED]
On 10/30/07, Byung-Hee HWANG [EMAIL PROTECTED] wrote:

Hello,

On Thu, 2007-10-25 at 20:51 +0800, Rong-en Fan wrote:

The breakage introduced by MFC of ctype(3) after 2007/10/24 14:23 UTC is now fixed. Make sure you have lib/Makefile rev 1.205.2.4 before upgrading your world. If it breaks already, please follow the instructions in src/UPDATING to recover.

In that case, should I re-build only userland? Or should I re-build both userland and kernel? (Yep, of course, I have updated source tree with CVSup for now)

If you have fixed your world, it should be okay. But I suggest upgrading to the latest 6-STABLE for the ctype ABI forward compatibility fix.

Regards,
Rong-En Fan

Sorry for all the troubles.

No problem, I'm always OK!

Regards,
Rong-En Fan

Byung-Hee

On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Alson van der Meulen [EMAIL PROTECTED] wrote:

Hello,

My installworld of RELENG_6 from a few hours ago failed with this error (from memory):

/lib/libncurses.so.6: undefined symbol: __mb_sb_limit

This broke everything that depended on libncurses, plus PAM. I had to force a reboot via DDB and copy /usr/obj/lib/libc/libc.so.6 to /lib using binaries from /rescue to fix it so I could run make installworld again.

I will take a look. Before that, do not upgrade your system.

I did some tests; it turns out that only RELENG_6 is affected. To be more specific, the ncurses lib is installed before libc, so it gets broken. For 7 and above, it is fine because we install libc right after csu and before everything else. One way to solve this is to install libc as early as possible, but I think that may be too risky during the release cycle, so I would like to back out this change and add an UPDATING entry, then check whether we can change the installation order of libc later.

If you cvsup'ed after 'Oct 24 14:32:33 2007 UTC', please do not upgrade until I send out an all-clear message. If you have already broken your world, do the following in *single user* mode:

/rescue/chflags noschg /lib/libc.so.6
/rescue/cp /usr/obj/usr/src/lib/libc/libc.so.6 /lib/

Then reboot, and after that continue installworld (this is what I just did).

Sorry for all the trouble.

Thanks,
Rong-En Fan

Regards,
Rong-En Fan

--
After super, can you drive me and the kids to New York in your car? That's what I came for.
-- Kay Adams and Tom Hagen, Chapter 32, page 443
Re: HEADSUP: don't upgrade to RELENG_6 now (7 is fine)
On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Alson van der Meulen [EMAIL PROTECTED] wrote:

Hello,

My installworld of RELENG_6 from a few hours ago failed with this error (from memory):

/lib/libncurses.so.6: undefined symbol: __mb_sb_limit

This broke everything that depended on libncurses, plus PAM. I had to force a reboot via DDB and copy /usr/obj/lib/libc/libc.so.6 to /lib using binaries from /rescue to fix it so I could run make installworld again.

I will take a look. Before that, do not upgrade your system.

I did some tests; it turns out that only RELENG_6 is affected. To be more specific, the ncurses lib is installed before libc, so it gets broken. For 7 and above, it is fine because we install libc right after csu and before everything else. One way to solve this is to install libc as early as possible, but I think that may be too risky during the release cycle, so I would like to back out this change and add an UPDATING entry, then check whether we can change the installation order of libc later.

If you cvsup'ed after 'Oct 24 14:32:33 2007 UTC', please do not upgrade until I send out an all-clear message. If you have already broken your world, do the following in *single user* mode:

/rescue/chflags noschg /lib/libc.so.6
/rescue/cp /usr/obj/usr/src/lib/libc/libc.so.6 /lib/

Then reboot, and after that continue installworld (this is what I just did).

Sorry for all the trouble.

An entry has been added to UPDATING and I'm working on a proper fix.

Regards,
Rong-En Fan
Re: HEADSUP: don't upgrade to RELENG_6 now (7 is fine)
On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Alson van der Meulen [EMAIL PROTECTED] wrote:

Hello,

My installworld of RELENG_6 from a few hours ago failed with this error (from memory):

/lib/libncurses.so.6: undefined symbol: __mb_sb_limit

This broke everything that depended on libncurses, plus PAM. I had to force a reboot via DDB and copy /usr/obj/lib/libc/libc.so.6 to /lib using binaries from /rescue to fix it so I could run make installworld again.

I will take a look. Before that, do not upgrade your system.

I did some tests; it turns out that only RELENG_6 is affected. To be more specific, the ncurses lib is installed before libc, so it gets broken. For 7 and above, it is fine because we install libc right after csu and before everything else. One way to solve this is to install libc as early as possible, but I think that may be too risky during the release cycle, so I would like to back out this change and add an UPDATING entry, then check whether we can change the installation order of libc later.

If you cvsup'ed after 'Oct 24 14:32:33 2007 UTC', please do not upgrade until I send out an all-clear message. If you have already broken your world, do the following in *single user* mode:

/rescue/chflags noschg /lib/libc.so.6
/rescue/cp /usr/obj/usr/src/lib/libc/libc.so.6 /lib/

Then reboot, and after that continue installworld (this is what I just did).

Sorry for all the trouble.

An entry has been added to UPDATING and I'm working on a proper fix.

If you have updated your source after Oct 24, please apply this patch

http://people.freebsd.org/~rafan/libc-order.diff

before building world.

Regards,
Rong-En Fan
Re: HEADSUP: don't upgrade to RELENG_6 now [FIXED]
The breakage introduced by MFC of ctype(3) after 2007/10/24 14:23 UTC is now fixed. Make sure you have lib/Makefile rev 1.205.2.4 before upgrading your world. If it breaks already, please follow the instructions in src/UPDATING to recover.

Sorry for all the troubles.

Regards,
Rong-En Fan

On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Alson van der Meulen [EMAIL PROTECTED] wrote:

Hello,

My installworld of RELENG_6 from a few hours ago failed with this error (from memory):

/lib/libncurses.so.6: undefined symbol: __mb_sb_limit

This broke everything that depended on libncurses, plus PAM. I had to force a reboot via DDB and copy /usr/obj/lib/libc/libc.so.6 to /lib using binaries from /rescue to fix it so I could run make installworld again.

I will take a look. Before that, do not upgrade your system.

I did some tests; it turns out that only RELENG_6 is affected. To be more specific, the ncurses lib is installed before libc, so it gets broken. For 7 and above, it is fine because we install libc right after csu and before everything else. One way to solve this is to install libc as early as possible, but I think that may be too risky during the release cycle, so I would like to back out this change and add an UPDATING entry, then check whether we can change the installation order of libc later.

If you cvsup'ed after 'Oct 24 14:32:33 2007 UTC', please do not upgrade until I send out an all-clear message. If you have already broken your world, do the following in *single user* mode:

/rescue/chflags noschg /lib/libc.so.6
/rescue/cp /usr/obj/usr/src/lib/libc/libc.so.6 /lib/

Then reboot, and after that continue installworld (this is what I just did).

Sorry for all the trouble.

Thanks,
Rong-En Fan

Regards,
Rong-En Fan
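The FIXED message asks readers to confirm that lib/Makefile is at rev 1.205.2.4 before upgrading. One way to check is to read the revision from the file's $FreeBSD$ ident line. This is a sketch that assumes the usual CVS ident layout ("$FreeBSD: path,v revision date ... $"); the function name freebsd_rev is hypothetical.

```shell
# Extract the CVS revision from the $FreeBSD$ ident line of a file read on
# stdin. Assumed form: "# $FreeBSD: src/lib/Makefile,v 1.205.2.4 ... $"
# Prints the first revision found, or nothing if there is no ident line.
freebsd_rev() {
    sed -n 's/.*\$FreeBSD: [^ ]*,v \([0-9.]*\) .*/\1/p' | head -n 1
}

# Example:
#   [ "$(freebsd_rev < /usr/src/lib/Makefile)" = "1.205.2.4" ] \
#       && echo "lib/Makefile has the fix"
```

The comparison in the usage comment is an exact string match, so a later revision on the branch would need to be checked by eye.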
HEADSUP: don't upgrade to RELENG_[67] now (Re: Installworld broken on RELENG_6 by libc commit?)
On 10/25/07, Alson van der Meulen [EMAIL PROTECTED] wrote:

Hello,

My installworld of RELENG_6 from a few hours ago failed with this error (from memory):

/lib/libncurses.so.6: undefined symbol: __mb_sb_limit

This broke everything that depended on libncurses, plus PAM. I had to force a reboot via DDB and copy /usr/obj/lib/libc/libc.so.6 to /lib using binaries from /rescue to fix it so I could run make installworld again.

I will take a look. Before that, do not upgrade your system.

Regards,
Rong-En Fan

I upgraded from RELENG_6 of October, 22. I believe I followed the procedure from /usr/src/UPDATING fairly closely, except for the reboot to single user part after installing the kernel:

mergemaster -p
make buildworld
make kernel
make installworld
mergemaster
make delete-old

I would expect libc to be installed before other libs. The securelevel was -1, so it should be no problem to overwrite libc. Did I do something wrong or is this a bug/missing entry in UPDATING?

regards,
Alson

The csup output since my last make world:

-- Running /usr/bin/csup --
Parsing supfile /usr/share/examples/cvsup/stable-supfile
Connecting to cvsup3.nl.freebsd.org
Connected to 62.250.3.15
Server software version: SNAP_16_1h
Negotiating file attribute support
Exchanging collection information
Establishing multiplexed-mode data connection
Running
Updating collection src-all/cvs
Edit src/include/_ctype.h
  Add delta 1.30.2.1 2007.10.24.14.32.32 rafan
Edit src/include/ctype.h
  Add delta 1.28.8.1 2007.10.24.14.32.32 rafan
Edit src/lib/libc/locale/big5.c
  Add delta 1.17.2.1 2007.10.24.14.32.32 rafan
Edit src/lib/libc/locale/euc.c
  Add delta 1.21.2.1 2007.10.24.14.32.32 rafan
Edit src/lib/libc/locale/gb18030.c
  Add delta 1.7.2.1 2007.10.24.14.32.32 rafan
Edit src/lib/libc/locale/gb2312.c
  Add delta 1.9.2.1 2007.10.24.14.32.33 rafan
Edit src/lib/libc/locale/gbk.c
  Add delta 1.12.2.1 2007.10.24.14.32.33 rafan
Edit src/lib/libc/locale/isctype.c
  Add delta 1.9.14.1 2007.10.24.14.32.33 rafan
Edit src/lib/libc/locale/mskanji.c
  Add delta 1.17.2.1 2007.10.24.14.32.33 rafan
Edit src/lib/libc/locale/none.c
  Add delta 1.13.2.1 2007.10.24.14.32.33 rafan
Edit src/lib/libc/locale/setrunelocale.c
  Add delta 1.45.2.1 2007.10.24.14.32.33 rafan
Edit src/lib/libc/locale/utf8.c
  Add delta 1.13.2.2 2007.10.24.14.32.33 rafan
Edit src/lib/libstand/Makefile
  Add delta 1.54.2.1 2007.10.24.11.50.06 nyan
Edit src/release/Makefile
  Add delta 1.887.2.21 2007.10.23.23.45.14 kensmith
Edit src/sbin/mount_unionfs/mount_unionfs.8
  Add delta 1.20.2.2 2007.10.23.03.37.09 daichi
Edit src/share/mklocale/UTF-8.src
  Add delta 1.1.8.2 2007.10.24.14.32.33 rafan
Edit src/sys/alpha/pci/pcibus.c
  Add delta 1.36.2.2 2007.10.24.12.36.25 jhb
Edit src/sys/boot/ficl/Makefile
  Add delta 1.41.2.2 2007.10.24.11.50.07 nyan
Edit src/sys/boot/pc98/Makefile.inc
  Add delta 1.5.8.2 2007.10.24.11.50.07 nyan
Edit src/sys/conf/newvers.sh
  Add delta 1.69.2.15 2007.10.23.23.41.24 kensmith
Edit src/sys/ddb/db_command.c
  Add delta 1.60.2.4 2007.10.23.16.07.30 obrien
Edit src/sys/fs/nullfs/null_subr.c
  Add delta 1.48.2.2 2007.10.23.03.38.31 daichi
Edit src/sys/fs/nullfs/null_vnops.c
  Add delta 1.87.2.4 2007.10.23.03.38.32 daichi
Edit src/sys/fs/unionfs/union.h
  Add delta 1.31.2.2 2007.10.23.03.28.22 daichi
  Add delta 1.31.2.3 2007.10.23.03.37.09 daichi
Edit src/sys/fs/unionfs/union_subr.c
  Add delta 1.86.2.2 2007.10.23.03.22.48 daichi
  Add delta 1.86.2.3 2007.10.23.03.28.22 daichi
Edit src/sys/fs/unionfs/union_vfsops.c
  Add delta 1.76.2.3 2007.10.23.03.32.17 daichi
  Add delta 1.76.2.4 2007.10.23.03.34.58 daichi
  Add delta 1.76.2.5 2007.10.23.03.37.09 daichi
Edit src/sys/fs/unionfs/union_vnops.c
  Add delta 1.132.2.2 2007.10.23.03.24.37 daichi
  Add delta 1.132.2.3 2007.10.23.03.26.37 daichi
  Add delta 1.132.2.4 2007.10.23.03.28.22 daichi
  Add delta 1.132.2.5 2007.10.23.03.30.13 daichi
  Add delta 1.132.2.6 2007.10.23.03.32.17 daichi
  Add delta 1.132.2.7 2007.10.23.03.33.43 daichi
  Add delta 1.132.2.8 2007.10.23.03.37.10 daichi
Shutting down connection to server
Finished successfully
Re: HEADSUP: don't upgrade to RELENG_6 now (7 is fine)
On 10/25/07, Rong-en Fan [EMAIL PROTECTED] wrote:

On 10/25/07, Alson van der Meulen [EMAIL PROTECTED] wrote:

Hello,

My installworld of RELENG_6 from a few hours ago failed with this error (from memory):

/lib/libncurses.so.6: undefined symbol: __mb_sb_limit

This broke everything that depended on libncurses, plus PAM. I had to force a reboot via DDB and copy /usr/obj/lib/libc/libc.so.6 to /lib using binaries from /rescue to fix it so I could run make installworld again.

I will take a look. Before that, do not upgrade your system.

I did some tests; it turns out that only RELENG_6 is affected. To be more specific, the ncurses lib is installed before libc, so it gets broken. For 7 and above, it is fine because we install libc right after csu and before everything else. One way to solve this is to install libc as early as possible, but I think that may be too risky during the release cycle, so I would like to back out this change and add an UPDATING entry, then check whether we can change the installation order of libc later.

If you cvsup'ed after 'Oct 24 14:32:33 2007 UTC', please do not upgrade until I send out an all-clear message. If you have already broken your world, do the following in *single user* mode:

/rescue/chflags noschg /lib/libc.so.6
/rescue/cp /usr/obj/usr/src/lib/libc/libc.so.6 /lib/

Then reboot, and after that continue installworld (this is what I just did).

Sorry for all the trouble.

Thanks,
Rong-En Fan

Regards,
Rong-En Fan

I upgraded from RELENG_6 of October, 22. I believe I followed the procedure from /usr/src/UPDATING fairly closely, except for the reboot to single user part after installing the kernel:

mergemaster -p
make buildworld
make kernel
make installworld
mergemaster
make delete-old

I would expect libc to be installed before other libs. The securelevel was -1, so it should be no problem to overwrite libc. Did I do something wrong or is this a bug/missing entry in UPDATING?
regards,
Alson

The csup output since my last make world:
[...]
Re: Installworld broken on RELENG_6 by libc commit?
On 10/25/07, David Booth [EMAIL PROTECTED] wrote:
On Wednesday 24 October 2007, Alson van der Meulen wrote:
Hello, My installworld of RELENG_6 from a few hours ago failed with this error (from memory):
/lib/libncurses.so.6: undefined symbol: __mb_sb_limit
This broke everything that depended on libncurses, plus PAM. I had to force a reboot via DDB and copy /usr/obj/lib/libc/libc.so.6 to /lib using binaries from /rescue to fix it so I could run make installworld again. I upgraded from RELENG_6 of October 22. I believe I followed the procedure from /usr/src/UPDATING fairly closely, except for the reboot to single user part after installing the kernel:
mergemaster -p
make buildworld
make kernel
make installworld
mergemaster
make delete-old
I would expect libc to be installed before other libs. The securelevel was -1, so it should be no problem to overwrite libc. Did I do something wrong or is this a bug/missing entry in UPDATING? regards, Alson
It is not something you did. I had the same problem and have just recovered from it.
Sorry for the trouble, see my HEADSUP message on [EMAIL PROTECTED] Regards, Rong-En Fan
Re: is read-write nullfs safe?
On 6/21/07, Peter Jeremy [EMAIL PROTECTED] wrote:
On 2007-Jun-19 02:58:20 -0400, Kris Kennaway [EMAIL PROTECTED] wrote:
On Tue, Jun 19, 2007 at 02:39:22PM +0800, Rong-en Fan wrote:
I was asking about nullfs because of the following lines in sys/conf/NOTES:
# NB: The NULL, PORTAL, UMAP and UNION filesystems are known to be
# buggy, and WILL panic your system if you attempt to do anything with
# them. They are included here as an incentive for some enterprising
# soul to sit down and fix them.
Yeah, that's almost completely stale for both 6.x and 7.x.
Since this issue pops up fairly regularly, would it be possible to correct, tone down or remove this warning before 6.3/7.0?
Ya, how about this one: http://people.freebsd.org/~rafan/remove-nullfs-warning.diff
It just removes NULL from the warning list. If no one objects, I will ask re@ for approval tomorrow. BTW, since umapfs is disconnected from the build, shall we axe it? Regards, Rong-En Fan
Re: is read-write nullfs safe?
On 6/21/07, Kris Kennaway [EMAIL PROTECTED] wrote:
On Thu, Jun 21, 2007 at 09:49:16PM +0800, Rong-en Fan wrote:
On 6/21/07, Peter Jeremy [EMAIL PROTECTED] wrote:
On 2007-Jun-19 02:58:20 -0400, Kris Kennaway [EMAIL PROTECTED] wrote:
On Tue, Jun 19, 2007 at 02:39:22PM +0800, Rong-en Fan wrote:
I was asking about nullfs because of the following lines in sys/conf/NOTES:
# NB: The NULL, PORTAL, UMAP and UNION filesystems are known to be
# buggy, and WILL panic your system if you attempt to do anything with
# them. They are included here as an incentive for some enterprising
# soul to sit down and fix them.
Yeah, that's almost completely stale for both 6.x and 7.x.
Since this issue pops up fairly regularly, would it be possible to correct, tone down or remove this warning before 6.3/7.0?
Ya, how about this one: http://people.freebsd.org/~rafan/remove-nullfs-warning.diff
Maybe note that UNIONFS is being maintained now and is in a much better state, although there are still some issues being resolved.
OK, I have added that. See the patch at the URL above.
It just removes NULL from the warning list. If no one objects, I will ask re@ for approval tomorrow. BTW, since umapfs is disconnected from the build, shall we axe it?
I would recommend it. Kris
Re: is read-write nullfs safe?
On 6/19/07, Josh Paetzel [EMAIL PROTECTED] wrote:
On Tuesday 19 June 2007, Rong-en Fan wrote:
I'm running 6.2-RELEASE, and I am wondering if using nullfs w/ rw is safe in a production environment? My impression is that ro nullfs is ok, but not rw. Is this still the case? Regards, Rong-En Fan
I've been using r/w nullfs in production for ages without issue...sure you're not confusing nullfs with unionfs?
I'm aware of the unionfs status, and I think it's usable in 7.x, right? I was asking about nullfs because of the following lines in sys/conf/NOTES:
# NB: The NULL, PORTAL, UMAP and UNION filesystems are known to be
# buggy, and WILL panic your system if you attempt to do anything with
# them. They are included here as an incentive for some enterprising
# soul to sit down and fix them.
Regards, Rong-En Fan
-- Thanks, Josh Paetzel
is read-write nullfs safe?
I'm running 6.2-RELEASE, and I am wondering if using nullfs w/ rw is safe in a production environment? My impression is that ro nullfs is ok, but not rw. Is this still the case? Regards, Rong-En Fan
Re: Unable to install FreeBSD from external USB cdrom
On 5/28/07, Daniel O'Connor [EMAIL PROTECTED] wrote:
Daniel O'Connor wrote:
kib@ has real mode BTX code which appears to work with affected systems of mine, however, the code has not yet made it into CVS. I spliced it into a 6.2 miniboot ISO and it worked.
Ooh ahh, please sir, can I have some more^Wit? :) I did some googling.. Is this the patch? http://people.freebsd.org/~kib/realbtx/realbtx.2.patch (Going to try it today anyway :)
[I'm CC'ing [EMAIL PROTECTED]] Yes, there is also a loader/pxeboot in the same directory. As kib@ told me, do not install this loader on your disk, as it may destroy your data. Regards, Rong-En Fan
-- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au The nice thing about standards is that there are so many of them to choose from. -- Andrew Tanenbaum GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Re: Unable to install FreeBSD from external USB cdrom
On 5/26/07, Bruce M. Simpson [EMAIL PROTECTED] wrote:
Daniel O'Connor wrote:
I believe this is most likely this issue... http://www.nabble.com/BTX-issues-when-booting-from-a-USB-CD-ROM-t3047441.html Alas no solution yet as far as I am aware :(
Forgot to Cc: my reply to the list: kib@ has real mode BTX code which appears to work with affected systems of mine, however, the code has not yet made it into CVS. I spliced it into a 6.2 miniboot ISO and it worked.
It also works on my ThinkPad X60 with a 7.x boot CD. Regards, Rong-En Fan
regards, BMS
Re: bge watchdog timeout -- resetting problem on recent update
On 4/18/07, Tom Evans [EMAIL PROTECTED] wrote:
On Wed, 2007-04-18 at 17:38 +0800, Jason Chang 張傑生 wrote:
Dear All, After a recent cvsup and make world, my server suffered from the bge "watchdog timeout -- resetting" problem. Manually reverting the bge related source to an older version and compiling a new kernel may solve the problem. So I guess the recently committed source does not go well with the bge NICs onboard IBM e326m servers.
Hi Jason, Can you add
hw.pci.enable_msi=0
hw.pci.enable_msix=0
to /boot/loader.conf, reboot, and see if that solves the problem? I saw this problem a while ago on my -CURRENT with a misbehaving MSI on my motherboard. (I may have this completely wrong; I'm not even 100% sure the MSI stuff has been MFC'ed to -STABLE).
The MSI changes have been MFC'ed to 6.x since April 1 by jhb@, but I thought that to use MSI, one MUST set hw.pci.enable_{msi,msix} to 1 in loader.conf, as the commit log said. But from the revision changes that Jason posted, the most suspicious part is MSI... Regards, Rong-En Fan
Cheers Tom
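For anyone wanting to try the tunables suggested above, here is a minimal sketch of setting them without leaving duplicate entries behind. The helper name set_tunable and the LOADER_CONF variable are assumptions made for this illustration, not anything from the thread, and the sketch writes to a scratch file unless LOADER_CONF is pointed elsewhere. Whether MSI is even enabled by default on a given 6.x build is left open in the discussion, so treat this as a diagnostic step only.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): set a tunable in a
# loader.conf-style file, replacing any earlier assignment of the same name.
set_tunable() {
    file=$1 name=$2 value=$3
    touch "$file"
    # Keep every line except an existing assignment of this tunable.
    grep -v "^${name}=" "$file" > "$file.tmp" || true
    # Append the new assignment in the usual name="value" form.
    printf '%s="%s"\n' "$name" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

# Work on a scratch file by default; set LOADER_CONF=/boot/loader.conf
# only after checking the output.
LOADER_CONF=${LOADER_CONF:-$(mktemp)}
set_tunable "$LOADER_CONF" hw.pci.enable_msi 0
set_tunable "$LOADER_CONF" hw.pci.enable_msix 0
cat "$LOADER_CONF"
```

A reboot is needed before loader tunables take effect, as the original suggestion notes.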
Re: HEADS UP: ncurses is updated
On 4/10/07, Jeremy Chadwick [EMAIL PROTECTED] wrote:
On Sat, Apr 07, 2007 at 02:05:45AM +0800, Rong-en Fan wrote:
I just merged ncurses 5.6 and wide character support from HEAD to 6.x. That means ncurses in 6.x is now up-to-date and has wide character support, i.e., ncursesw library.
I just wanted to take a moment to thank you for this. You have no idea how long I've been waiting (okay, now you do: years!), as I never felt comfortable with having two versions of ncurses installed on a single box (base + port). So far it works great. Thank you so much!
You are welcome.
The only thing I've found, though, is that dialog(1) does not appear to properly handle UTF-8 encoding. Line drawing characters show up as gibberish (alphanumeric characters). I realise dialog isn't part of ncurses, but it does rely on it. We should consider updating dialog to match this change.
You mean it displays something like tqxu instead of line drawing characters? Last time I checked, I thought it was terminal related. When I use screen, it uses line drawing characters. For PuTTY, see: http://lists.freebsd.org/pipermail/freebsd-questions/2007-April/146577.html The current dialog with UTF-8 in MacOS's Term.app seems to work just fine. I'm playing with devel/cdialog, and whether it uses ncurses or ncursesw the result is the same. I'm CCing ache@, who imported GNU's dialog into our base, and the cdialog/ncurses author; I hope they can comment :-) Regards, Rong-En Fan
--
Jeremy Chadwick, jdc at parodius.com
Parodius Networking, http://www.parodius.com/
UNIX Systems Administrator, Mountain View, CA, USA
Making life hard for others since 1977. PGP: 4BD6C0CB
Re: HEADS UP: ncurses is updated
On 4/10/07, Nikolay Pavlov [EMAIL PROTECTED] wrote:
On Monday, 9 April 2007 at 11:48:08 -0700, Jeremy Chadwick wrote:
On Mon, Apr 09, 2007 at 11:21:08AM -0700, Jeremy Chadwick wrote:
On Tue, Apr 10, 2007 at 01:49:32AM +0800, Rong-en Fan wrote:
On 4/10/07, Jeremy Chadwick [EMAIL PROTECTED] wrote:
The only thing I've found, though, is that dialog(1) does not appear to properly handle UTF-8 encoding. Line drawing characters show up as gibberish (alphanumeric characters). I realise dialog isn't part of ncurses, but it does rely on it. We should consider updating dialog to match this change.
You mean it displays something like tqxu instead of line drawing characters? Last time I checked, I thought it was terminal related. When I use screen, it uses line drawing characters. For PuTTY, see: http://lists.freebsd.org/pipermail/freebsd-questions/2007-April/146577.html
This is quite applicable. I just now got around to reading it (should've done this before I sent my previous Email).
Yep, that's the exact problem:
/usr/bin/dialog:
libdialog.so.5 => /usr/lib/libdialog.so.5 (0x3807e000)
libncurses.so.6 => /lib/libncurses.so.6 (0x38099000)
libc.so.6 => /lib/libc.so.6 (0x380dd000)
At least I have a workaround with NCURSES_NO_UTF8_ACS=1. :-)
I am not sure, but maybe this is related to the ncurses update. I am getting this when trying to run the sysinstall utility: Probing devices, please wait (this can take a while)...BARF 170 105 Then comes an EOL and it exits... It's a -CURRENT from April 6.
The ncurses update to 5.6 was in late Jan, and wide character support was enabled in late Feb. My sysinstall runs just fine on the console and in rxvt-unicode on my -CURRENT as of yesterday.
-- Best regards, Nikolay Pavlov.
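The NCURSES_NO_UTF8_ACS=1 workaround mentioned above can be applied per command rather than globally. The following is only a sketch: the wrapper name with_no_utf8_acs is made up for this illustration, and the dialog invocation in the comment is an example rather than something taken from the thread.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with NCURSES_NO_UTF8_ACS=1 in its
# environment, leaving the calling shell's environment untouched.
with_no_utf8_acs() {
    env NCURSES_NO_UTF8_ACS=1 "$@"
}

# Demo: show that the child process sees the variable.
with_no_utf8_acs sh -c 'echo "NCURSES_NO_UTF8_ACS=$NCURSES_NO_UTF8_ACS"'

# Illustrative use with dialog(1):
#   with_no_utf8_acs dialog --msgbox "hello" 5 20
```

Scoping the variable to a single command keeps other ncurses programs on their default behaviour, which may matter if only one terminal emulator shows the problem.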
Re: Call for Testers: ncurses 5.6 update
On 4/7/07, Stefan Lambrev [EMAIL PROTECTED] wrote:
Hi list,
Rong-en Fan wrote:
On 3/13/07, Stefan Lambrev [EMAIL PROTECTED] wrote:
Hello, Rong-en Fan wrote:
On 3/12/07, Stefan Lambrev [EMAIL PROTECTED] wrote:
Rong-en Fan wrote:
Hi folks, ncurses in 6.x is pretty old. We have an up-to-date ncurses in 7.x with wide character support now. The patch at http://people.freebsd.org/~rafan/ncurses/ncursesw-5.6-all-fbsd6-20070310.diff.gz gives you ncurses 5.6 and wide character support in 6.x. Please apply with 'patch -p0' under /usr/src. For more information, please visit http://people.freebsd.org/~rafan/ncurses/ You can also find individual patches, say ncurses update and wide character support, there. Feedback and suggestions are welcome. P.S. Due to some lib32 issues, the patch above contains changes made by ru@ recently for src/Makefile.inc1.
make installworld failed:
cd /usr/src; /usr/obj/usr/src/make.amd64/make -f Makefile.inc1 install32
mkdir -p /usr/lib32 # XXX add to mtree
[...]
Sorry about this. I messed up the lib32 changes in the all-in-one patch. Could you please use this one instead? http://people.freebsd.org/~rafan/ncurses/ncursesw-5.6-all-fbsd6-20070312.diff.gz
This patch doesn't seem to work anymore on 6.2-stable i386 (my previous test was on amd64). It seems that part of this patch is already in -stable :) If I'm right, the patch for src/Makefile.inc1 should be replaced by:
--- Makefile.inc1 Fri Apr 6 20:03:35 2007
+++ /root/Makefile.inc1.orig Fri Apr 6 20:03:17 2007
@@ -894,8 +894,7 @@
 bin/csh \
 bin/sh \
 ${_rescue} \
-lib/ncurses/ncurses \
-lib/ncurses/ncursesw \
+lib/libncurses \
 ${_share} \
 ${_aicasm} \
 usr.bin/awk \
@@ -1000,8 +999,7 @@
 _prebuild_libs+= lib/libbz2 lib/libcom_err lib/libcrypt lib/libexpat \
 lib/libkvm lib/libmd \
- lib/ncurses/ncurses lib/ncurses/ncursesw \
- lib/libnetgraph lib/libopie lib/libpam \
+ lib/libncurses lib/libnetgraph lib/libopie lib/libpam \
 lib/libradius \
 lib/libsbuf lib/libtacplus lib/libutil \
 lib/libz lib/msun
I'm still compiling and will let you know if things still work.
Yes, you are right. I merged the Makefile.inc1 changes two days ago. I'm going to merge the rest of the changes later. Enjoy! Regards, Rong-En Fan
HEADS UP: ncurses is updated
Hi all, I just merged ncurses 5.6 and wide character support from HEAD to 6.x. That means ncurses in 6.x is now up-to-date and has wide character support, i.e., ncursesw library. Regards, Rong-En Fan
Re: Lenovo X60 em workaround
On 1/23/07, Jack Vogel [EMAIL PROTECTED] wrote:
Hey Gleb, Acknowledge... I can do better than that, I have a fix for this problem, and it's not temporary. Here is the code change (not a patch, I'm very busy), it's in hardware_init, should be obvious how to patch:
/* Make sure we have a good EEPROM before we read from it */
if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
	/*
	** Some PCI-E parts fail the first check due to
	** the link being in sleep state, call it again,
	** if it fails a second time its a real issue.
	*/
	if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
		device_printf(dev, "The EEPROM Checksum Is Not Valid\n");
		return (EIO);
	}
}
This is already checked into my code base at Intel, I've just been too busy to do anything with it, be my guest if you wish to check it in after testing...
I accidentally found this: http://www-307.ibm.com/pc/support/site.wss/document.do?sitestyle=lenovo&lndocid=MIGR-67166 which patches the EEPROM. And it solves my problem. Regards, Rong-En Fan
Re: Call for Testers: ncurses 5.6 update
On 3/12/07, Stefan Lambrev [EMAIL PROTECTED] wrote:
Rong-en Fan wrote:
Hi folks, ncurses in 6.x is pretty old. We have an up-to-date ncurses in 7.x with wide character support now. The patch at http://people.freebsd.org/~rafan/ncurses/ncursesw-5.6-all-fbsd6-20070310.diff.gz gives you ncurses 5.6 and wide character support in 6.x. Please apply with 'patch -p0' under /usr/src. For more information, please visit http://people.freebsd.org/~rafan/ncurses/ You can also find individual patches, say ncurses update and wide character support, there. Feedback and suggestions are welcome. P.S. Due to some lib32 issues, the patch above contains changes made by ru@ recently for src/Makefile.inc1.
make installworld failed:
cd /usr/src; /usr/obj/usr/src/make.amd64/make -f Makefile.inc1 install32
mkdir -p /usr/lib32 # XXX add to mtree
[...]
Sorry about this. I messed up the lib32 changes in the all-in-one patch. Could you please use this one instead? http://people.freebsd.org/~rafan/ncurses/ncursesw-5.6-all-fbsd6-20070312.diff.gz It should solve the lib32 problem. Note that the individual patches work well. Regards, Rong-En Fan
Re: Call for Testers: ncurses 5.6 update
On 3/13/07, Stefan Lambrev [EMAIL PROTECTED] wrote:
Hello, Rong-en Fan wrote:
On 3/12/07, Stefan Lambrev [EMAIL PROTECTED] wrote:
Rong-en Fan wrote:
Hi folks, ncurses in 6.x is pretty old. We have an up-to-date ncurses in 7.x with wide character support now. The patch at http://people.freebsd.org/~rafan/ncurses/ncursesw-5.6-all-fbsd6-20070310.diff.gz gives you ncurses 5.6 and wide character support in 6.x. Please apply with 'patch -p0' under /usr/src. For more information, please visit http://people.freebsd.org/~rafan/ncurses/ You can also find individual patches, say ncurses update and wide character support, there. Feedback and suggestions are welcome. P.S. Due to some lib32 issues, the patch above contains changes made by ru@ recently for src/Makefile.inc1.
make installworld failed:
cd /usr/src; /usr/obj/usr/src/make.amd64/make -f Makefile.inc1 install32
mkdir -p /usr/lib32 # XXX add to mtree
[...]
Sorry about this. I messed up the lib32 changes in the all-in-one patch. Could you please use this one instead? http://people.freebsd.org/~rafan/ncurses/ncursesw-5.6-all-fbsd6-20070312.diff.gz
This works for me (at least make buildworld make installworld finished without problems).
Thanks for testing.
Should I recompile the kernel again, or is the patch only in contrib? :)
No, you don't need to. Regards, Rong-En Fan
Call for Testers: ncurses 5.6 update
Hi folks, ncurses in 6.x is pretty old. We have an up-to-date ncurses in 7.x with wide character support now. The patch at http://people.freebsd.org/~rafan/ncurses/ncursesw-5.6-all-fbsd6-20070310.diff.gz gives you ncurses 5.6 and wide character support in 6.x. Please apply with 'patch -p0' under /usr/src. For more information, please visit http://people.freebsd.org/~rafan/ncurses/ You can also find individual patches, say ncurses update and wide character support, there. Feedback and suggestions are welcome. P.S. Due to some lib32 issues, the patch above contains changes made by ru@ recently for src/Makefile.inc1. Regards, Rong-En Fan
Re: ncurses
On 1/18/07, Stephen Montgomery-Smith [EMAIL PROTECTED] wrote:
In the cvs repository, there has appeared src/lib/ncurses, which seems to be a copy of lib/libncurses. Is this meant to be?
Yes, it's for the upcoming ncurses update, which will occur within one week. I'm waiting for the current exp run on pointyhat to finish. Regards, Rong-En Fan
Re: using md-mounted ISO as NFS root for PXE booting and installing
On 1/12/07, Andrew N. Below [EMAIL PROTECTED] wrote:
Hello. I have a 6.1-STABLE FreeBSD box I want to use as a network boot and install server for PXE clients. At this moment I'm experimenting with the 4.11-RELEASE-i386-disc1-gnome.iso and 6.2-RC2-i386-disc1.iso images. [...] All is fine if we are booting into 4.11 (root mounted from MFS, sysinstall runs and we are able to install the OS to local disks via NFS). But in the case of 6.2-RC2, root is mounted from NFS, not MFS:
Does adding vfs.root.mountfrom=ufs:/dev/md0c to boot/loader.conf from the CD help? Of course, you have to copy the boot/ directory off the CD. Regards, Rong-En Fan
gpt device node does not show at boot
I'm running 6.2-RC1 on i386. I use gpt(8) to partition my disk. After reboot, the device node, say da1p1, does not show up until 'gpt show da1' is issued. This prevents the GPT partition from being mounted from fstab, and therefore it cannot be NFS-exported at boot time! My kernel config is simply GENERIC+QUOTA+SMP. I also noticed that it is not possible to modify an in-use disk's partition table. There is also a PR, 85772, about it. Can someone comment on it? Thanks. Regards, Rong-En Fan
Re: ips(4) in toaster mode
It seems that upgrading the ips firmware to the latest version available on ibm.com solves this problem. One changelog entry caught my eye: increase timeout when tape driver is attached to the adapter. Indeed, we have a tape drive on ahd(4), and I think ips(4) shares this adapter. However, the mystery is why ips suddenly stopped working. Perhaps I tweaked some hardware settings and forgot about it.
On 11/18/06, Scott Long [EMAIL PROTECTED] wrote:
I'll look at this. Scott
Rong-en Fan wrote:
Hi, After upgrading RELENG_6 from Jul 11 to Sep 30 on an i386 box, every time I run tar to back up my system to a mounted NFS volume, it panics with "sleeping thread" after one hour of operation. Upgrading to RELENG_6_2 does not help. Also, the console hangs completely; I cannot break into DDB at all. The only option is to power cycle. Also, the only hard disk on that host is on ips(4), so I cannot obtain a kernel dump. I'm not sure if this is a hardware failure; at least, no LED on the panel is red... OK, the only information on the console is attached below. Any suggestions are welcome. Thanks, Rong-En Fan
==
ips0: WARNING: command timeout. Adapter is in toaster mode, resetting to known state
ips0: resetting adapter, this may take up to 5 minutes
ips0: syncing config
Sleeping thread (tid 12, pid 14) owns a non-sleepable lock
sched_switch(c5feec00,0,1,8577f833,d14d2103,...) at sched_switch+0x158
mi_switch(1,0) at mi_switch+0x1d5
sleepq_switch(c60c2604,e9f77b8c,c051acd3,4,1,...) at sleepq_switch+0x93
sleepq_wait(c60c2604,c60c25e0,c06e6957,1,1,...) at sleepq_wait+0x75
cv_wait(c60c2604,c60c25e0,a,e9f77c04,5,...) at cv_wait+0x151
_sema_wait(c60c25e0,0,0,c60c2400,c60c2400,...) at _sema_wait+0x64
ips_send_config_sync_cmd(c60f5000,e9f77c08,1,c60f5000,7,...) at ips_send_config_sync_cmd+0x78
ips_clear_adapter(c60c2400,c60b6e00,0,4,c60f5000,...) at ips_clear_adapter+0x60
ips_morpheus_reinit(c60c2400,1,c053abf7,c0740100,c5feec00,...) at ips_morpheus_reinit+0x2ac
ips_timeout(c60c2400,c053a7f5,c5feec00,c5feea80,d69f60ed,...) at ips_timeout+0xf8
softclock(0,e9f77cd4,15dbe,c43ec589,c5feec00,...) at softclock+0x35d
ithread_execute_handlers(c5fed430,c6042000,0,0,0,...) at ithread_execute_handlers+0x162
ithread_loop(c5fbc880,e9f77d38,0,0,0,...) at ithread_loop+0x64
fork_exit(c050b22a,c5fbc880,e9f77d38) at fork_exit+0x7b
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe9f77d6c, ebp = 0 ---
panic: sleeping thread
cpuid = 2
KDB: stack backtrace:
kdb_backtrace(c0702303,2,c06ef01b,e9f98bf0,0,...) at kdb_backtrace+0x2f
panic(c06ef01b,,e,c051ac54,1,...) at panic+0x129
propagate_priority(c5fefd80,c5fefd80,0,0,0,...) at propagate_priority+0x69
turnstile_wait(c60c25a8,c5feec00,c610c000,c7d064a4,4,...) at turnstile_wait+0x32f
_mtx_lock_sleep(c60c25a8,c5fefd80,0,0,0,...) at _mtx_lock_sleep+0xfd
ipsd_strategy(c7d064a4,43,200,0,c04e31a1,...) at ipsd_strategy+0x70
g_disk_start(c7d1e4a4,c073bac8,24c,c06e8406,64,...) at g_disk_start+0x1b1
g_io_schedule_down(c5fefd80,4c,c5fefd80,c04e39c1,e9f98d04,...) at g_io_schedule_down+0x15f
g_down_procbody(0,e9f98d38,0,0,0,...) at g_down_procbody+0xb3
fork_exit(c04e39c1,0,e9f98d38) at fork_exit+0x7b
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe9f98d6c, ebp = 0 ---
Re: Panic in thread taskq on RELENG_6
On 11/29/06, Kevin Oberman [EMAIL PROTECTED] wrote:
From: Markus Oestreicher [EMAIL PROTECTED] Date: Tue, 28 Nov 2006 20:01:06 +0100 Sender: [EMAIL PROTECTED]
Good Day, I get a panic on latest RELENG_6 every 6-12 hours. The server is a Dual Xeon FSB800 with 2 GB RAM and aac(4)-disks running postfix and amavisd-new for SPAM scanning.
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 07
fault virtual address = 0x104
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc06774e1
stack pointer = 0x28:0xe4f93c90
frame pointer = 0x28:0xe4f93c9c
code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, def32 1, gran 1
processor eflags = resume, IOPL = 0
current process = 5 (thread taskq)
The panic is always in process thread taskq.
db> trace
_mtx_lock_sleep(cb031e5c,c63f7180) at _mtx_lock_sleep+0x9d
unp_gc(0,1) at unp_gc+0x222
taskqueue_run(c6439d80) at taskqueue_run+0x13f
taskqueue_thread_loop(c09f8988,e4f93d38) at taskqueue_thread_loop+0x92
fork_exit(c06a1bc0,c09f8988,e4f93d38) at fork_exit+0x71
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp=0xe4f93d6c, ebp = 0
FreeBSD mx.local 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Tue Nov 28 02:12:58 CET 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP i386
Does that look like a hardware problem or a software issue? I will try to swap RAM in the next few days.
You are the third person to report this panic. (I am one of the other two.)
I reported an unp_gc() panic recently. See "Re: LOR (intr table and sio) and instability" on [EMAIL PROTECTED] jhb@ told me that he also saw this and that there is currently no fix yet. Regards, Rong-En Fan
I am guessing from the name of your kernel that this is an SMP system. So are the other two. Are you running gnome-2.16 with hald? This is about all we found in common on the first two systems. Robert Watson would like some added data. Can you build a kernel with the following options and connect something to the serial port to record output?
options WITNESS
options INVARIANT_SUPPORT
options DDB
options KDB
options INVARIANTS
At the debugger prompt:
show pcpu
trace
show allpcpu
traceall
show alllocks
At least my system has been totally uncooperative in crashing when I am anywhere near it, so I have not yet collected any information other than dumps.
-- R. Kevin Oberman, Network Engineer Energy Sciences Network (ESnet) Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab) E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634 Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
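As a rough sketch of the kernel-configuration step requested above, the five options can be appended to a copy of an existing kernel config. The function name make_debug_config and every file path here are assumptions for illustration only; the thread does not say which config file the affected machines use, and building and installing the resulting kernel is not shown.

```shell
#!/bin/sh
# Hypothetical helper: copy a kernel config file and append the debugging
# options quoted in the message above. Names and paths are illustrative.
make_debug_config() {
    src=$1 dst=$2
    cp "$src" "$dst"
    cat >> "$dst" <<'EOF'

# Debugging options requested for the unp_gc() panic
options KDB
options DDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
EOF
}

# Demo on scratch files; a real run would pass an actual config file
# (for example one under /usr/src/sys/i386/conf, path assumed).
src=$(mktemp)
dst=$(mktemp)
echo 'ident SMP' > "$src"
make_debug_config "$src" "$dst"
cat "$dst"
```

Working on a copy keeps the original config intact, so the debugging kernel can be dropped again once the requested data has been collected.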
Re: ips(4) in toaster mode
On 11/18/06, Martin Blapp [EMAIL PROTECTED] wrote:
Hi,
Also, the only harddisk on that host is the ips(4), so I can not obtain a kernel dump. I'm not sure if this is a hardware failure, at least, no led on the panel is shown red...
Hmm? We do kernel dumps on ips(4) and it works. dumpdev=/dev/ipsd0s1b Martin
ips(4) can do a kernel dump, but in my case above, ips(4) is already in command timeout mode... Regards, Rong-En Fan
ips(4) in toaster mode
Hi, After upgrading RELENG_6 from Jul 11 to Sep 30 on an i386 box, every time I run tar to back up my system to a mounted NFS volume, it panics with "sleeping thread" after one hour of operation. Upgrading to RELENG_6_2 does not help. Also, the console hangs completely; I cannot break into DDB at all. The only option is to power cycle. Also, the only hard disk on that host is on ips(4), so I cannot obtain a kernel dump. I'm not sure if this is a hardware failure; at least, no LED on the panel is red... OK, the only information on the console is attached below. Any suggestions are welcome. Thanks, Rong-En Fan
==
ips0: WARNING: command timeout. Adapter is in toaster mode, resetting to known state
ips0: resetting adapter, this may take up to 5 minutes
ips0: syncing config
Sleeping thread (tid 12, pid 14) owns a non-sleepable lock
sched_switch(c5feec00,0,1,8577f833,d14d2103,...) at sched_switch+0x158
mi_switch(1,0) at mi_switch+0x1d5
sleepq_switch(c60c2604,e9f77b8c,c051acd3,4,1,...) at sleepq_switch+0x93
sleepq_wait(c60c2604,c60c25e0,c06e6957,1,1,...) at sleepq_wait+0x75
cv_wait(c60c2604,c60c25e0,a,e9f77c04,5,...) at cv_wait+0x151
_sema_wait(c60c25e0,0,0,c60c2400,c60c2400,...) at _sema_wait+0x64
ips_send_config_sync_cmd(c60f5000,e9f77c08,1,c60f5000,7,...) at ips_send_config_sync_cmd+0x78
ips_clear_adapter(c60c2400,c60b6e00,0,4,c60f5000,...) at ips_clear_adapter+0x60
ips_morpheus_reinit(c60c2400,1,c053abf7,c0740100,c5feec00,...) at ips_morpheus_reinit+0x2ac
ips_timeout(c60c2400,c053a7f5,c5feec00,c5feea80,d69f60ed,...) at ips_timeout+0xf8
softclock(0,e9f77cd4,15dbe,c43ec589,c5feec00,...) at softclock+0x35d
ithread_execute_handlers(c5fed430,c6042000,0,0,0,...) at ithread_execute_handlers+0x162
ithread_loop(c5fbc880,e9f77d38,0,0,0,...) at ithread_loop+0x64
fork_exit(c050b22a,c5fbc880,e9f77d38) at fork_exit+0x7b
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe9f77d6c, ebp = 0 ---
panic: sleeping thread
cpuid = 2
KDB: stack backtrace:
kdb_backtrace(c0702303,2,c06ef01b,e9f98bf0,0,...) at kdb_backtrace+0x2f
panic(c06ef01b,,e,c051ac54,1,...) at panic+0x129
propagate_priority(c5fefd80,c5fefd80,0,0,0,...) at propagate_priority+0x69
turnstile_wait(c60c25a8,c5feec00,c610c000,c7d064a4,4,...) at turnstile_wait+0x32f
_mtx_lock_sleep(c60c25a8,c5fefd80,0,0,0,...) at _mtx_lock_sleep+0xfd
ipsd_strategy(c7d064a4,43,200,0,c04e31a1,...) at ipsd_strategy+0x70
g_disk_start(c7d1e4a4,c073bac8,24c,c06e8406,64,...) at g_disk_start+0x1b1
g_io_schedule_down(c5fefd80,4c,c5fefd80,c04e39c1,e9f98d04,...) at g_io_schedule_down+0x15f
g_down_procbody(0,e9f98d38,0,0,0,...) at g_down_procbody+0xb3
fork_exit(c04e39c1,0,e9f98d38) at fork_exit+0x7b
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe9f98d6c, ebp = 0 ---
panic when portupgrade in jail (devfs related?)
Hi, I'm running RELENG_6 as of yesterday on an amd64 box. This host has one jail running, and every time I try to run portupgrade inside the jail, it panics. INVARIANTS does not catch anything. I don't think this happened on RELENG_6 two months ago. The panic messages and backtrace are shown below:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xfffe75b851b0
fault code = supervisor read, page not present
instruction pointer = 0x8:0x80231118
stack pointer = 0x10:0xb40a5860
frame pointer = 0x10:0xb40a5880
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 60200 (script)
[thread pid 60200 tid 100268 ]
Stopped at ptcclose+0x19: movq linesw(,%rdx,8),%rax
db> bt
Tracing pid 60200 tid 100268 td 0xff006c4ec720
ptcclose() at ptcclose+0x19
giant_close() at giant_close+0x5f
devfs_close() at devfs_close+0x28f
VOP_CLOSE_APV() at VOP_CLOSE_APV+0x6e
vn_close() at vn_close+0x90
vn_closefile() at vn_closefile+0x88
fdrop_locked() at fdrop_locked+0xa5
closef() at closef+0x35f
close() at close+0x173
syscall() at syscall+0x4a1
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (6, FreeBSD ELF64, close), rip = 0x800807f9c, rsp = 0x7fffdfa8, rbp = 0 ---
I put the box back into production. If anyone needs more information, I can reproduce this panic and gather it in ddb. BTW, I used 'call doadump' in ddb, but after rebooting, savecore complains that there is no dump. Regards, Rong-En Fan
LOR (intr table and sio) and instability
I'm running today's 6-stable on an amd64 SMP (Pentium-D) machine. With witness turned on, I got a LOR halfway through booting:
ata0-slave: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=40 wire
ata2-master: pio=PIO4 wdma=WDMA2 udma=UDMA133 cable=40 wire
ad4: 157066MB HDT722516DLA380 V43OA96A at ata2-master SATA150
ad4: 321672960 sectors [319120C/16H/63S] 16 sectors/interrupt 1 depth queue
SMP: AP CPU #1 Launched!
cpu1 AP:
ID: 0x0100 VER: 0x00050014 LDR: 0x DFR: 0x
lint0: 0x00010700 lint1: 0x0400 TPR: 0x SVR: 0x01ff
timer: 0x000200ef therm: 0x0001 err: 0x0001 pcm: 0x0001
lock order reversal:
1st 0x805643c0 intr table (intr table) @ /home/admin/usr/src/sys/amd64/amd64/intr_machdep.c:417
2nd 0x8056fb00 sio (sio) @ /home/admin/usr/src/sys/dev/sio/sio.c:2586
KDB: stack backtrace:
witness_checkorder() at witness_checkorder+0x4e1
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x5d
siocnputc() at siocnputc+0xf5
cnputc() at cnputc+0x60
putchar() at putchar+0xcb
kvprintf() at kvprintf+0x9b
printf() at printf+0xe7
intr_assign_next_cpu() at intr_assign_next_cpu+0x7a
intr_shuffle_irqs() at intr_shuffle_irqs+0x6b
mi_startup() at mi_startup+0xc0
btext() at btext+0x2c
INTR: Assigning IRQ 1 to local APIC 0
ioapic0: Assigning ISA IRQ 1 to local APIC 0
INTR: Assigning IRQ 3 to local APIC 1
ioapic0: Assigning ISA IRQ 3 to local APIC 1
INTR: Assigning IRQ 4 to local APIC 0
ioapic0: Assigning ISA IRQ 4 to local APIC 0
INTR: Assigning IRQ 9 to local APIC 1
ioapic0: Assigning ISA IRQ 9 to local APIC 1
INTR: Assigning IRQ 12 to local APIC 0
ioapic0: Assigning ISA IRQ 12 to local APIC 0
INTR: Assigning IRQ 14 to local APIC 1
ioapic0: Assigning ISA IRQ 14 to local APIC 1
INTR: Assigning IRQ 15 to local APIC 0
ioapic0: Assigning ISA IRQ 15 to local APIC 0
INTR: Assigning IRQ 17 to local APIC 1
ioapic0: Assigning PCI IRQ 17 to local APIC 1
GEOM: new disk ad4
Trying to mount root from ufs:/dev/ad4s1a
I checked the LOR page, and there is no such report.
I'm wondering if this is related to the console hang that happens every few days on this machine. When it hangs, the keyboard does not work. The only way out is to break into ddb and reset it. I hooked up the serial console just a few days ago. No hang yet, but I got a panic this noon:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x18c
fault code = supervisor read, page not present
instruction pointer = 0x8:0x801f107e
stack pointer = 0x10:0xb1839b30
frame pointer = 0x10:0xb1839b60
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 9 (thread taskq)
[thread pid 9 tid 100011 ]
Stopped at _mtx_lock_sleep+0x80: movl 0x18824(%r12),%r8d
db> bt
Tracing pid 9 tid 100011 td 0xff007b218980
_mtx_lock_sleep() at _mtx_lock_sleep+0x80
unp_gc() at unp_gc+0x4c4
taskqueue_run() at taskqueue_run+0xd5
taskqueue_thread_loop() at taskqueue_thread_loop+0x88
fork_exit() at fork_exit+0x8b
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xb1839d00, rbp = 0 ---
I searched the mail archive, but there seems to be no similar report. I called 'doadump' in ddb, but after rebooting, savecore says there is no dump there (I have dump_dev=AUTO). If you need more information, please let me know. Thanks, Rong-En Fan
Re: nfsd stuck in ufs/biord/biowr/getblk
On 10/18/06, Vivek Khera [EMAIL PROTECTED] wrote: On Oct 16, 2006, at 5:03 PM, Rong-en Fan wrote: Yesterday, I saw all my nfsd stuck in ufs/biord/biowr/getblk. I saw the same thing some time ago. I broke into ddb and did an 'alltrace': do you have an em or bge ethernet? Yes, I do have an em0. My em0 does not share an IRQ with any other device. I get a watchdog timeout message every few days, but I have not seen any since boot, including during the deadlock above. Regards, Rong-En Fan
nfsd stuck in ufs/biord/biowr/getblk
Hi, Yesterday, I saw all my nfsd stuck in ufs/biord/biowr/getblk. I saw the same thing some time ago. I broke into ddb and did an 'alltrace': http://www.rafan.org/FreeBSD/ufs/20061017.txt The system in question is running 6-STABLE as of Sep 20. It's an i386 SMP box. When all nfsd are stuck in ufs/biord/biowr/getblk, I can still log in to the system (all exported filesystems are on an external RAID). I'm not sure how to trigger this behavior. Any suggestions are welcome. If there is anything I can provide from ddb to help track this down, please let me know. Thanks, Rong-En Fan
Re: NFS locking question
On 8/15/06, Kris Kennaway [EMAIL PROTECTED] wrote: On Tue, Aug 15, 2006 at 11:59:50AM +0200, Morten A. Middelthon wrote: On Tue, Feb 28, 2006 at 05:21:50AM -0500, Kris Kennaway wrote: On Tue, Feb 28, 2006 at 11:14:53AM +0100, Patrick M. Hausen wrote: Hi, all! In our local office network we have a rather old FreeBSD 5.2.1 server acting as an NFS server for several other systems, mostly running 6.0. From time to time we experience processes on the NFS clients hanging in state D with wchan lockd when accessing files over NFS. Try the attached patch on the 6.0 machines: Index: usr.sbin/rpc.lockd/lock_proc.c snip Hi, I have been plagued with this NFS lockd issue for quite some time now. It has kept me from installing FreeBSD 6.x on our workstations at work. I just tried applying your patch to my own 6.1-RELEASE-p3 workstation, and so far it has been working nicely. Has anyone else had the same experience? If so, maybe it should go into production? I was unable to obtain confirmation from anyone else (including the submitter who previously claimed it was necessary, and my own testing) that the patch actually solved a problem. Since it involves reverting useful functionality, someone would need to obtain further debugging from your system (tcpdump traces before/after, etc) to determine what it's actually solving. Kris In my experience, rpc.lockd dies on its own on both server and client. When this happens, all processes that want to lock a file get stuck in lockd (top will show this). In my case, rpc.lockd died because a write failed and a SIGPIPE was generated. Two months ago, bin/97768 was submitted and rodrigc@ committed the fix (it was also MFC'ed to RELENG_6). The fix ignores SIGPIPE (since the code in the server/client already takes care of the failed-write case). Since applying it, I have been quite happy with NFS locking.
Regards, Rong-En Fan
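To illustrate why ignoring SIGPIPE keeps a daemon alive, here is a minimal sketch in Python (not the rpc.lockd code, which is C; the function name write_to_closed_pipe is made up for this example, and it assumes a POSIX system). With SIGPIPE ignored, a write to a pipe whose reader has gone away fails with EPIPE, which the caller can handle, instead of killing the process:

```python
import errno
import os
import signal


def write_to_closed_pipe() -> int:
    """Write to a pipe whose read end is closed and return the errno seen.

    Hypothetical helper for illustration only; it is not taken from rpc.lockd.
    """
    # Ignore SIGPIPE so that a failed write is reported as an error code
    # instead of terminating the process.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)

    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the peer goes away
    try:
        os.write(write_fd, b"lock request")
    except OSError as exc:
        # The write fails with EPIPE and the caller gets to handle it.
        return exc.errno
    finally:
        os.close(write_fd)
    return 0


if __name__ == "__main__":
    print(errno.errorcode[write_to_closed_pipe()])
```

With the default SIGPIPE disposition, the same write would terminate the process, which matches the "dies on its own" symptom described above.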
Re: NFS Locking Issue
On 6/29/06, Michael Collette [EMAIL PROTECTED] wrote: This last week I had been working on a test network to test out 6.1 prior to upgrading our production boxes from 5.4. That's when I ran across the rpc.lockd issues that have been discussed earlier. Our production setup has diskless clients running KDE, which due to this bug is now dead on 6.1. I also have my mail server delivering messages to a file server via NFS. I even have servers booting diskless with NFS provided file systems... all of which are dead on 6.1. The last discussion or bug updates I've seen on this issue were about 3 months ago. This leaves me with a number of questions I hope can be answered here on this list. Is NFS a big deal for most other users, or am I out here on the fringe using it as much as I do? Is anyone working on a fix for this? If so, is there any kind of time frame where this fix might be MFC'd to 6-STABLE? I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled. Try 6.1-STABLE; in particular, make sure you have $FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02 01:20:58 rodrigc Exp $ for usr.sbin/rpc.lockd/kern.c, and see if this helps. Regards, Rong-En Fan
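If you want to check the revision of a source file mechanically rather than by eye, something like the following Python sketch could work. It only parses the $FreeBSD$ ident line quoted above; the names IDENT_RE, parse_ident and on_branch_at_least are invented for this example, and the comparison is deliberately limited to revisions on the same CVS branch:

```python
import re

# Matches an expanded ident such as:
#   $FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02 01:20:58 rodrigc Exp $
IDENT_RE = re.compile(
    r"\$FreeBSD: (?P<path>\S+),v (?P<rev>[\d.]+) "
    r"(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<author>\S+) \S+ \$"
)


def parse_ident(text):
    """Return (path, revision tuple) from the first ident in text, or None."""
    match = IDENT_RE.search(text)
    if match is None:
        return None
    revision = tuple(int(part) for part in match.group("rev").split("."))
    return match.group("path"), revision


def on_branch_at_least(text, wanted):
    """True if the ident is on the same branch as wanted and not older.

    Revisions on different branches (e.g. 1.17 vs 1.16.2.1) are not
    comparable this way, so they yield False.
    """
    parsed = parse_ident(text)
    if parsed is None:
        return False
    revision = parsed[1]
    same_branch = len(revision) == len(wanted) and revision[:-1] == wanted[:-1]
    return same_branch and revision[-1] >= wanted[-1]


if __name__ == "__main__":
    line = ("$FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 "
            "2006/06/02 01:20:58 rodrigc Exp $")
    print(parse_ident(line), on_branch_at_least(line, (1, 16, 2, 1)))
```

In practice you would feed it the contents of the file under /usr/src; that part is left out here.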
Updating ncurses in base
Hi, [I have also CC'ed peter@ since he is the maintainer of ncurses in base] As you may know, the current ncurses in the base system is rather old (it is 4 years old). I have been working on updating ncurses to the latest version, 5.5, and enabling wide character support by default. I have put the description, goals, issues, current status, and a tarball for testing at the following URL: http://www.rafan.org/FreeBSD/ncurses/ I have been using the updated ncurses on my 7-CURRENT for some time, and everything works well. As I note on that page, there are some issues related to the build framework of libncurses and related libraries. I hope some experienced people here can show me which approach is most likely to be accepted into the base system. In addition to those issues, I hope some of you can test it and give feedback. I would really like to see ncurses in base updated in the near future. Regards, Rong-En Fan
Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/25/06, Konstantin Belousov [EMAIL PROTECTED] wrote: On Thu, May 25, 2006 at 01:19:26AM -0400, Kris Kennaway wrote: On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote: So what's changed at that delta, under the one that works vfs_lookup.c is: Edit src/sys/kern/vfs_lookup.c Add delta 1.80.2.6 2006.03.31.07.39.24 kris Under the one that fails the vfs_lookup.c is: Edit src/sys/kern/vfs_lookup.c Add delta 1.80.2.7 2006.04.30.03.57.46 kris So I stand corrected on my last post, the issue is in fact in this module, as just taking that module back to 1.80.2.6 fixes the problem with my server. I even took multiple NFS clients and gave them a heavy workload, and CPU still remained reasonable, and very responsive. As soon as I rev to the new version, NFS breaks badly and even a single client doing something like a du of a directory structure results in sluggishness and extreme CPU usage. Yep, unfortunately this commit was necessary to fix other bugs. Jeff said he should have time to look at it next week. Kris I tried to debug the problem. First, I have to admit that I cannot reproduce the problem on GENERIC kernel. Only after QUOTAS where added, and, correspondingly, UFS started to require Giant, I get described behaviour. Below are the changes to GENERIC config file I made to reproduce problem. [...] After that, server machine easily panics on KASSERT(!(debug_mpsafenet == 1 mtx_owned(Giant)), (nfssvc_nfsd(): debug.mpsafenet=1 Giant)); from nfsserver/nfs_syscalls.c, line 570. As I understand the problem, kern/vfs_lookup.c:lookup() could aquire additional locks on Giant, indicating this by GIANTHELD flag in nd. All processing in nfsserver already goes with Giant held, so, I just dropped that excessive locks after return from lookup. System with patch applied survived smoke test (client did du on mounted dir, patch was generated from exported fs, etc.). nfsd eats no more than 25% of CPU (with INVARIANTS). 
Users who reported the problem and are willing to help, please try the patch (generated against STABLE) and give feedback. [...] Hi Konstantin and others, I'm now running RELENG_6_1 source as of Apr 30 04:00 UTC + your patch. The nfsd is quite happy! After the client's du finishes, it stays idle as expected (eats 0.00% CPU). Thank you very much. Regards, Rong-En Fan
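As a rough analogy only (this is not the kernel code and not Konstantin's patch), the idea of "dropping the excess locks after lookup returns" can be sketched in Python with a recursive lock. Here giant, lookup, serve and lock_is_free are all hypothetical names: the callee takes the lock once more and reports that through a flag, and the caller has to balance that extra acquisition, otherwise the lock stays held after the caller thinks it has let go:

```python
import threading

giant = threading.RLock()  # stand-in for a recursive lock such as Giant


def lookup(lock):
    """Hypothetical callee: takes the lock again and reports that it did."""
    lock.acquire()
    return True  # plays the role of the "still held" flag


def serve(lock, drop_extra):
    """Hypothetical caller that already runs with the lock held."""
    with lock:
        still_held = lookup(lock)
        if still_held and drop_extra:
            lock.release()  # balance the callee's extra acquisition


def lock_is_free(lock):
    """Check from another thread whether the lock can be taken right now."""
    result = []

    def probe():
        got = lock.acquire(blocking=False)
        if got:
            lock.release()
        result.append(got)

    thread = threading.Thread(target=probe)
    thread.start()
    thread.join()
    return result[0]


if __name__ == "__main__":
    serve(giant, drop_extra=True)
    print("balanced:", lock_is_free(giant))

    leaky = threading.RLock()
    serve(leaky, drop_extra=False)
    print("unbalanced:", lock_is_free(leaky))
    leaky.release()  # clean up the leftover acquisition
```

The unbalanced case leaves the lock owned by the serving thread, so every other thread that needs it is stuck until the extra acquisition is released.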
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/23/06, Konstantin Belousov [EMAIL PROTECTED] wrote: On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote: On 5/14/06, Kris Kennaway [EMAIL PROTECTED] wrote: On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote: Hello All, I have been running FBSD a long while, and actually running since the 5.x releases on the server I am having troubles with. I basically have a small network and just use NIS/NFS to link my various FBSD and Solaris machines together. This has all been running fine up till a few days ago, when all of a sudden NFS came to a crawl, and CPU usage so high the box appears to freeze almost. When I had 6.1-RC running all seemed well, then came the announcement for the official 6.1 release, so I did the cvs updates, made world, kernel, and ran mergemaster to get everything up to the 6.1 stable version. Now after doing this, something is wrong with NFS. It works, it will return information and open files, just it's very very slow, and while performing a request the CPU spike is astounding. A simple du of my home directory can take minutes, and machine all but locks up if the request is done over NFS. Here is top snip: PID USERNAME THR PRI NICE SIZERES STATE C TIME WCPU COMMAND 497 root 1 40 1252K 780K - 2 50:42 188.48% nfsd This is a nice IBM eServer with dual P4-XEON's and a couple GB or RAM on a disk array, and locally is screams, heck NFS used to scream till I updated. I am not really sure what info would be useful in debugging, so won't post tons of misc junk in this eMail, but if anyone has any ideas as to how best to figure out and resolve this issue it would sure be appreicated... Use tcpdump and related tools to find out what traffic is being sent. Also verify that you did not change your system configuration in any way: there have been no changes to NFS since the release, so it is unclear why an update would cause the problem to suddenly occur. 
Kris Hi Kris and Howard, As I posted few days ago, I have similar problems like Howard's (some details in the thread 6.1-RELEASE, em0 high interrupt rate and nfsd eats lots of cpu on stable@). After binary searching the source tree, I found that RELENG_6_1, 2006.04.30.03.57 ok RELENG_6_1, 2006.04.30.04.00 bad The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91. With 04.30 03.57's source + manaully patched vfs_lookup.c rev 1.90, the same problem occurs. Let me refresh what problems I'm seeing 1. a client (no matter Linux 2.6.16 or FreeBSD 6.1) runs du on a nfs directory 2. on server-side, nfsd starts to eats lots of CPU 3. the du finishes 4. on server-side, nfsd still eats lots of CPU, but there is no nfs traffic. Wait for 5 minutes, you can still see that nfsd is running and eats lots of CPU. On FreeBSD 6.1R client, it uses UDP mount and fstab is like rw,-L,nosuid,bg,nodev. On Linux cleint, it uses UDP mount and fstab is like defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192. The server's kernel conf is at http://www.rafan.org/FreeBSD/nfs/KERNEL Some related configuration files: /etc/export /export/dir1 host1 host2... /export/dir2 host1 host2... /etc/rc.conf nfs_server_enable=YES nfs_server_flags=-u -t -n 16 mountd_enable=YES mountd_flags=-r -l -n rpc_lockd_enable=YES rpc_statd_enable=YES rpcbind_enable=YES /etc/fstab: /dev/... /export/dir1 ufs rw,nosuid,noexec 2 2 /dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2 The NFS server is also using amd to mount some backup directories from another NFS server. 
the amd.conf is [global] browsable_dirs = yes map_type = file mount_type = nfs auto_dir = /nfs fully_qualified_hosts = no log_file = syslog nfs_proto = udp nfs_allow_insecure_port = no nfs_vers = 3 # plock = yes selectors_on_default = yes restart_mounts = yes [/backup] map_options = type:=direct map_name = /etc/amd.direct /etc/amd.direct: /defaults opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192 backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host} If there are any thing I can provide to help tracking this down. Please let me know. By the way, I tried with truss/kdump to see what happens when nfsd eats lot of CPUs, but in vain. They do not return anything. I tried your recipe on 7-CURRENT with locally exported fs, remounted over nfs. I did not get the behaviour your described. As noted in my previous thread, I have another 6.1-RELEASE nfs server, which does not have this problem. Could you, please, provide the backtrace for the nfsd that eats the CPU (from the ddb). I think it would be helpful to get several backtraces (i.e., bt nfsd pid, cont, bt nfsd pid ...) to see where it running. I'm afraid that I can not do that. Last time I tried breaking into ddb (on 5.x), it hangs my serial console and the server is miles away :-( . Perhaps we
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/23/06, Howard Leadmon [EMAIL PROTECTED] wrote: If there are any thing I can provide to help tracking this down. Please let me know. By the way, I tried with truss/kdump to see what happens when nfsd eats lot of CPUs, but in vain. They do not return anything. I tried your recipe on 7-CURRENT with locally exported fs, remounted over nfs. I did not get the behaviour your described. As noted in my previous thread, I have another 6.1-RELEASE nfs server, which does not have this problem. Could you, please, provide the backtrace for the nfsd that eats the CPU (from the ddb). I think it would be helpful to get several backtraces (i.e., bt nfsd pid, cont, bt nfsd pid ...) to see where it running. I'm afraid that I can not do that. Last time I tried breaking into ddb (on 5.x), it hangs my serial console and the server is miles away :-( . Perhaps we can ask Howard to do that? I am more than willing to do that, as this machine runs here with me, so if needed I can easily get on a console, or perform a reboot. Can one of you shed a little light on exactly what I need to do, and how to do this? I ask as I have never used this ddb stuff, so not clue one on how to go about getting the information your looking to find. Guess I have been lucky, and just never had an issue that took things to this level.
At the least, you have to add the following to your kernel configuration:
options KDB
options DDB
Recompile the kernel and reboot. It is better to set up a serial console so that you can easily copy the ddb output. To do that, you have to put device sio in your kernel configuration and edit the files below:
/boot.config
-Dh
/boot/loader.conf
comconsole_speed="115200"
machdep.conspeed=115200
/etc/ttys
ttyd0 "/usr/libexec/getty std.115200" cons25 on secure
On the other machine, /etc/remote:
com1:dv=/dev/cuad0:br#115200:pa=none:
Then use tip com1 to attach to the nfs server. The above settings assume that the serial console on the nfs server is on COM1 and that the client side also uses COM1. If that's not the case, please follow the Handbook on how to set up a serial console on a port other than COM1.
To break into ddb, either use ctrl+alt+esc or send a BREAK (I think ^b will do) via the serial line. After that, you should see
db>
First use ps to find the nfsd pid (it is better to note the pid that eats lots of CPU before entering ddb). After that, do what Konstantin suggests. I have never tried cont in ddb. I guess it returns execution to the kernel, and you need to break into ddb again to do another bt pid.
By the way, could you verify that backing out vfs_lookup.c rev 1.90 helps in your situation? If not, maybe we are seeing different problems, and then I have to figure out how to make my serial console work here. Thanks, Rong-En Fan
Re: quota and snapshots in 6.1-RELEASE
On 5/23/06, Dmitriy Kirhlarov [EMAIL PROTECTED] wrote: Hi, list. Some time ago quota and, AFAIR, snapshots in 6.1-RELEASE has deadlock problems. What the current situation with this? I'm ready to test patches, if needed. WBR IIRC, some quota and snapshot changes were merged into 6.1-STABLE after 6.1-RELEASE was released, so you may want to try that. Regards, Rong-En Fan
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/23/06, Howard Leadmon [EMAIL PROTECTED] wrote: Hello Rong-en, Thanks for the info on getting the debugger configured, and on the serial console. I will have to try and play with the serial console thing more, I just tried putting in the flags and the damn thing hung, I had to boot from CD and take the stuff back out. One thing you mention below that concerns me is that you have version 1.90 of the vfs_lookup.c file. I just did a less on /usr/src/sys/kern/vfs_lookup.c and I see the following: FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the current stuff before rebuilding the kernel just now, and still I see the same thing. Is something fishy going on here, or did you by chance make a typo?? Sorry for the confusion. Rev 1.90 is the revision number on -HEAD. To back out this MFC'ed change on RELENG_6_1, please cvsup to RELENG_6_1 with date=2006.04.30.03.57.00. Then you should see that it is 1.80.2.6 2006/03/31 07:39:24 kris. To verify the effect of this revision, please run RELENG_6_1 as of 2006.04.30.03.57.00 and as of 2006.04.30.04.00.00. Regards, Rong-En Fan
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/14/06, Kris Kennaway [EMAIL PROTECTED] wrote: On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote: Hello All, I have been running FBSD a long while, and actually running since the 5.x releases on the server I am having troubles with. I basically have a small network and just use NIS/NFS to link my various FBSD and Solaris machines together. This has all been running fine up till a few days ago, when all of a sudden NFS came to a crawl, and CPU usage so high the box appears to freeze almost. When I had 6.1-RC running all seemed well, then came the announcement for the official 6.1 release, so I did the cvs updates, made world, kernel, and ran mergemaster to get everything up to the 6.1 stable version. Now after doing this, something is wrong with NFS. It works, it will return information and open files, just it's very very slow, and while performing a request the CPU spike is astounding. A simple du of my home directory can take minutes, and machine all but locks up if the request is done over NFS. Here is top snip: PID USERNAME THR PRI NICE SIZERES STATE C TIME WCPU COMMAND 497 root 1 40 1252K 780K - 2 50:42 188.48% nfsd This is a nice IBM eServer with dual P4-XEON's and a couple GB or RAM on a disk array, and locally is screams, heck NFS used to scream till I updated. I am not really sure what info would be useful in debugging, so won't post tons of misc junk in this eMail, but if anyone has any ideas as to how best to figure out and resolve this issue it would sure be appreicated... Use tcpdump and related tools to find out what traffic is being sent. Also verify that you did not change your system configuration in any way: there have been no changes to NFS since the release, so it is unclear why an update would cause the problem to suddenly occur. Kris Hi Kris and Howard, As I posted few days ago, I have similar problems like Howard's (some details in the thread 6.1-RELEASE, em0 high interrupt rate and nfsd eats lots of cpu on stable@). 
After binary searching the source tree, I found that
RELENG_6_1, 2006.04.30.03.57 ok
RELENG_6_1, 2006.04.30.04.00 bad
The only commit in between is to kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91. With the 04.30 03.57 source + a manually patched vfs_lookup.c rev 1.90, the same problem occurs. Let me recap the problem I'm seeing:
1. a client (no matter whether Linux 2.6.16 or FreeBSD 6.1) runs du on an nfs directory
2. on the server side, nfsd starts to eat lots of CPU
3. the du finishes
4. on the server side, nfsd still eats lots of CPU, but there is no nfs traffic. Wait for 5 minutes, and you can still see that nfsd is running and eating lots of CPU.
The FreeBSD 6.1R client uses a UDP mount and its fstab options are like rw,-L,nosuid,bg,nodev. The Linux client uses a UDP mount and its fstab options are like defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192. The server's kernel conf is at http://www.rafan.org/FreeBSD/nfs/KERNEL
Some related configuration files:
/etc/exports
/export/dir1 host1 host2...
/export/dir2 host1 host2...
/etc/rc.conf
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 16"
mountd_enable="YES"
mountd_flags="-r -l -n"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"
/etc/fstab:
/dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
The NFS server is also using amd to mount some backup directories from another NFS server. The amd.conf is
[global]
browsable_dirs = yes
map_type = file
mount_type = nfs
auto_dir = /nfs
fully_qualified_hosts = no
log_file = syslog
nfs_proto = udp
nfs_allow_insecure_port = no
nfs_vers = 3
# plock = yes
selectors_on_default = yes
restart_mounts = yes
[/backup]
map_options = type:=direct
map_name = /etc/amd.direct
/etc/amd.direct:
/defaults opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}
If there is anything I can provide to help track this down, please let me know.
By the way, I tried truss/kdump to see what happens when nfsd eats lots of CPU, but in vain. They do not return anything. Regards, Rong-En Fan
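For anyone who wants to repeat this kind of search, the bisection over checkout dates can be sketched as below. This is only an illustration in Python: first_bad and the candidate timestamps are made up, and in real life the is_bad test means checking out the tree at that date (e.g. via cvsup date=...), rebuilding, and running the du test by hand. It assumes the first candidate is known good, the last is known bad, and the problem never goes away once introduced:

```python
from datetime import datetime


def first_bad(timestamps, is_bad):
    """Return the earliest timestamp for which is_bad() is true.

    Assumes timestamps is sorted, timestamps[0] is known good,
    timestamps[-1] is known bad, and everything after the first bad
    timestamp is also bad.
    """
    lo, hi = 0, len(timestamps) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(timestamps[mid]):
            hi = mid
        else:
            lo = mid
    return timestamps[hi]


if __name__ == "__main__":
    # Hypothetical checkout dates; only 03:57 and 04:00 come from the report.
    candidates = [
        datetime(2006, 4, 30, 3, 0),
        datetime(2006, 4, 30, 3, 30),
        datetime(2006, 4, 30, 3, 57),
        datetime(2006, 4, 30, 4, 0),
        datetime(2006, 4, 30, 4, 30),
    ]
    # Stand-in for "build and test": the commit timestamp of 1.80.2.7.
    commit_time = datetime(2006, 4, 30, 3, 57, 46)
    print(first_bad(candidates, lambda ts: ts >= commit_time))
```

Each probe halves the remaining range, so even a search from 6.0-R to 6.1-R needs only a handful of rebuilds.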
6.1-RELEASE, em0 high interrupt rate and nfsd eats lots of cpu
root 1 -40 -159 0K 8K CPU0 0 8:19 9.18% swi2: camb
40 root 1 -160 0K 8K sdflus 1 6:04 5.13% softdepflu
The wait channels of nfsd are usually biord, biowd, ufs, RUN, CPUX, and -. The kernel conf is GENERIC without unneeded hardware + ipfw2, FAST_IPSEC, QUOTA (but I don't have any userquota or groupquota in fstab). I also tuned some sysctls:
machdep.hyperthreading_allowed=1
vm.kmem_size_max=419430400
vm.kmem_size_scale=2
net.link.ether.inet.log_arp_wrong_iface=0
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.recvspace=65536
kern.ipc.somaxconn=4096
kern.maxfiles=65535
kern.ipc.shmmax=104857600
kern.ipc.shmall=25600
net.inet.ip.random_id=1
kern.maxvnodes=10
vfs.read_max=16
kern.cam.da.retry_count=20
kern.cam.da.default_timeout=300
Is there anything I can provide to help nail this problem down? Thanks, Rong-En Fan
Re: 6.1-RELEASE, em0 high interrupt rate and nfsd eats lots of cpu
On 5/15/06, Dmitriy Kirhlarov [EMAIL PROTECTED] wrote: On Mon, May 15, 2006 at 02:15:08PM -0400, Rong-en Fan wrote: Hi, After upgrading from 5.5-PRERELEASE to 6.1-RELEASE on one nfs server today, I noticed that the load is very high, ranging from 4.x to 30.x, depends how many nfsd I run. From mrtg traffic graph, I did not notice there is high traffic. This box is 2 physical Xeon CPU w/ I have same situation today on RC2. One client installing world from nfs share. nfsd eat 91% CPU, load average 6-8. Very small disk activitie. I don't look interrupt rate. I, also, have em0. Hi, It looks to me that after rebooting the machine and running a du from a client, nfsd eats lots of CPU while the du is running. However, after du exits, nfsd still eats lots of CPU. I don't know what happened; I will give the latest RELENG_6 a shot. If that does not work, I will probably go back to 6.0-R or even 5.5-PR :( By the way, I have another nfs server (mainly for backup) running 6.1-RELEASE that does not have this behavior. When did you start to notice this problem? Since 6.1-RC, or? Thanks, Rong-En Fan
Re: 6.1-RELEASE, em0 high interrupt rate and nfsd eats lots of cpu
On 5/15/06, Dmitriy Kirhlarov [EMAIL PROTECTED] wrote: On Mon, May 15, 2006 at 02:15:08PM -0400, Rong-en Fan wrote: Hi, After upgrading from 5.5-PRERELEASE to 6.1-RELEASE on one nfs server today, I noticed that the load is very high, ranging from 4.x to 30.x, depends how many nfsd I run. From mrtg traffic graph, I did not notice there is high traffic. This box is 2 physical Xeon CPU w/ I have same situation today on RC2. One client installing world from nfs share. nfsd eat 91% CPU, load average 6-8. Very small disk activitie. I don't look interrupt rate. I, also, have em0. After some digging, I found that the CPU-eating nfsd can be triggered by running ``du'' on an nfs client (both a FreeBSD 6.1-R and a Linux box). The nfsd will eat lots of CPU, and it keeps eating lots of CPU after the client's du has finished. The workaround is to run /etc/rc.d/nfsd restart; after that, everything is fine. Besides du, on a FreeBSD 6.1-R client, buildkernel over nfs triggers the same behavior. I just downgraded this box to 6.0-RELEASE and everything works fine. Running du or buildkernel from an nfs client does not trigger the same behavior. I will try to do a binary search from 6.0-R to 6.1-R to see if I can find the related commits. By the way, I have another nfs server running 6.1-RELEASE, but it does not exhibit this behavior. The kernel conf and sysctls are basically the same on both boxes. Regards, Rong-En Fan
Re: 6.x on an IBM T42 laptop
On 5/8/06, Nik Clayton [EMAIL PROTECTED] wrote: Mark Willson wrote: I've got an IBM T42 laptop that's currently running 5.4, and it's working nicely at the moment. ACPI works well enough that suspend to RAM works ('zzz'), the audio works, USB devices are recognised, and the battery life's reasonable (with est enabled). Is anyone aware of any regressions in laptop functionality going from 5.4 to 6.x? I've been running 6-STABLE on a T42 for a while and not noticed any problems in the subjects mentioned. The addition of iwi has made life a little simpler. I think it is ok to take the leap... Thanks. I noticed that the acpi_ibm manual page talks about the suspend-to-disk functionality. Does that work? From my experience on an X31, Fn+F12 (suspend to disk) only works with apm, not with acpi. Of course, you have to create a partition first (via phdisk(?), available at IBM's site). Regards, Rong-En Fan
ypwhich -m
Hi, I found that ypwhich -m does not work on 6.1-RC. It shows: ypwhich: can't find the master of `�`: reason: No such map in server's domain. IIRC, there was a commit last year to fix this. After some searching, I think it is include/rpcsvc/yp_prot.h revision 1.13, committed by peter@ (CC'ed). As far as I can tell, ypwhich -m is also broken on 5.4 and 5.5-PRERELEASE. I have tested that revision on a 5.5-PRERELEASE machine, and it fixes ypwhich -m. I would like to see this MFC'ed to RELENG_6 and RELENG_5, so that the newer releases will have this fixed. Thanks, Rong-En Fan
Re: rpc.lockd brokenness (2)
On 4/8/06, Kris Kennaway [EMAIL PROTECTED] wrote: On Sat, Apr 08, 2006 at 01:28:55AM -0400, Rong-En Fan wrote: On 3/6/06, Jun Kuriyama [EMAIL PROTECTED] wrote: I'm not yet received enough information to track rpc.lockd problem. As Kris posted before, here is a patch to backout my suspected commit. If someone can easily reproduce this problem, please try with this patch on both of server/client side of rpc.lockd (I'm not sure which of server/client side this affects). http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/80389 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/84953 Any reports about this patch (OK or still problem) are welcome! Hi, Somehow I have problems with lockd after 3 boxes upgraded from Feb's RELENG_6 to Apr 6's. One of them has problems with lockd. For example, mutt and irssi will stuck in lockd (shown by top). I tried to back out changes in revision 1.18 for lock_proc.c, and do /etc/rc.d/nfslocking stop then a start. After backout it, mutt and irssi work well. If I put 1.18 back, mutt and irssi will stuck in lockd again. Last month, I played with the test program/script in those two PRs, found that revision 1.18 does not make any difference. I'm not 100% sure the problem I encountered now is related to rev 1.18. But it is a report that backout 1.18 really helps. For record, all my clients involved in this mail are running RELENG_6. Server is RELENG_5 as of March 9. Only IPv4 here, no IPv6. 1.18 was merged 15 months ago, so it cannot be the cause if you updated from Feb 2006. Yes , I know that. But how to explain that after back-out 1.18 and restart rpc.lockd, my mutt and irssi will work. And putting it back, they dont work? I tried backing out and putting back three times. And, if I simply restart lockd, it does not help. Rong-En Fan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
RELENG_6_1
Hi, According to the webpage [1], 6.1 was branched on April 5. However, I noticed that there is a tag called RELENG_6_1, not a branch called RELENG_6_1. For example, sys/conf/newvers.sh [2], rev 1.69.2.11, is on the RELENG_6 branch with the tags RELENG_6_1_BP and RELENG_6_1. This looks a bit strange to me: previously we had a RELENG_X_Y branch and a RELENG_X_Y_BP tag. Is there any special reason that we have a tag instead of a branch for 6.1? Regards, Rong-En Fan [1] http://www.freebsd.org/releases/6.1R/schedule.html [2] http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/conf/newvers.sh
Re: RELENG_6_1
On 4/8/06, Scott Long [EMAIL PROTECTED] wrote: Rong-En Fan wrote: Hi, According to the webpage [1], 6.1 has been branched on April 5. However, I noticed that there is a tag called RELENG_6_1, not a branch called RELENG_6_1. For example, sys/conf/newvers.sh [2], rev 1.69.2.11, is on RELENG_6 branch with tag RELENG_6_1_BP and RELENG_6_1. It is a bit strange for me. At least, we have RELENG_X_Y branch before and RELENG_X_Y_BP tag. Is there any special reason that we have a tag instead of a branch for 6.1? RELENG_6_1 is a branch tag (or at least it should have been unless I screwed it up). The _BP tag always comes before the branch tag. I just checked CVS and it appears to agree with this. Can you give an example of what is wrong? http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/conf/newvers.sh When 6.0 is branched and moves to RC, it shows Revision 1.69.2.8 / (download) - annotate - [select for diffs], Sun Oct 9 16:59:34 2005 UTC (5 months, 4 weeks ago) by scottl Branch: RELENG_6 CVS Tags: RELENG_6_0_BP Branch point for: RELENG_6_0 When 6.1 moves to RC, it shows Revision 1.69.2.11 / (download) - annotate - [select for diffs], Sat Apr 8 14:42:23 2006 UTC (9 hours, 9 minutes ago) by scottl Branch: RELENG_6 CVS Tags: RELENG_6_1_BP, RELENG_6_1 I expected to see something like the case for 6.0. I didn't see a branch point for: RELENG_6_1 here. Did I miss something or cvsweb shows the wrong information? Hope we can see 6.1 RELEASE soon :-) Thanks, Rong-En Fan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: rpc.lockd brokenness (2)
On 3/6/06, Jun Kuriyama [EMAIL PROTECTED] wrote: I'm not yet received enough information to track rpc.lockd problem. As Kris posted before, here is a patch to backout my suspected commit. If someone can easily reproduce this problem, please try with this patch on both of server/client side of rpc.lockd (I'm not sure which of server/client side this affects). http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/80389 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/84953 Any reports about this patch (OK or still problem) are welcome! Hi, Somehow I have problems with lockd after upgrading 3 boxes from Feb's RELENG_6 to Apr 6's. One of them has problems with lockd: for example, mutt and irssi get stuck in lockd (as shown by top). I tried backing out the changes in revision 1.18 of lock_proc.c, then ran /etc/rc.d/nfslocking stop followed by a start. After backing it out, mutt and irssi work well. If I put 1.18 back, mutt and irssi get stuck in lockd again. Last month, I played with the test program/script in those two PRs and found that revision 1.18 does not make any difference. I'm not 100% sure the problem I am encountering now is related to rev 1.18, but this is a report that backing out 1.18 really helps. For the record, all my clients involved in this mail are running RELENG_6. The server is RELENG_5 as of March 9. Only IPv4 here, no IPv6. Regards, Rong-En Fan
Re: ps column header case changes
On 4/5/06, Garance A Drosehn [EMAIL PROTECTED] wrote: At 10:08 PM -0400 4/5/06, Rong-En Fan wrote: I just updated my world from Feb's RELENG_6 to today's. I noticed that the column header of ps's output has changed from upper to lower case. $ ps awx -r -o user|head -1 user This used to be USER. I found that the change in ps/keyword.c rev 1.75 causes this (it is already MFC'ed). Ugh. Sometimes the simple changes are the easiest ones to screw up. That's what I get for trying to fix the previous bug between meetings, I guess. I'll look into it. Many apologies. Thanks for committing rev 1.76. It fixes the column header problem. If there are no further problems, could you please MFC it to 6.1, which is still broken? :-) Best, Rong-En Fan
ps column header case changes
Hi, I just updated my world from Feb's RELENG_6 to today's. I noticed that the column header of ps's output has changed from upper to lower case. $ ps awx -r -o user|head -1 user This used to be USER. I found that the change in ps/keyword.c rev 1.75 causes this (it is already MFC'ed). Regards, Rong-En Fan
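Scripts that match on the ps header can be made tolerant of this kind of case change by normalizing the header before comparing it. The following is only a minimal sketch for a POSIX shell; the function name normalize_header is hypothetical and is not part of ps.

```sh
#!/bin/sh
# Hypothetical helper: upper-case a ps(1) header line so that a match on
# "USER" succeeds whether ps printed "user" or "USER".
normalize_header() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Example use (not run here):
#   header=$(ps awx -r -o user | head -1)
#   case "$(normalize_header "$header")" in USER*) echo "header ok" ;; esac
```

This only hides the symptom in a script; the header case itself is decided by ps.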
NFS data corruption listed in 6.1 show stopper
Hi, We are planning to upgrade an NFS server from 5.x to 6.x. However, we found that there is a 6.1 show stopper about NFS data corruption. The todo page says this item is a work in progress. Has it been fixed already? According to the description, it was found by running fsx (in regression/fsx?). Can somebody show how to reproduce it? We would like to do some tests. Regards, Rong-En Fan
6.1 ata panic if dma enabled
Hi, Recently, we upgrade a 4.11 box to 6.1-BETA2 by reinstall+newfs everything. After that, we found that if hw.ata.ata_dma=1 at boot, then as soon as it starts fsck -p, it panics. It happens only if ad0 is setted to UDMA66 or above. My current solution is set hw.ata.ata_dma=0 in loader.conf and manually turn DMA on ad0 to UDMA33 and rest ad4~ad7 to UDMA100. In the days of 4.x, there is something wrong with DMA on ad0, but it will fall back to PIO4 automatically without problem. We have been tried to 1) change the cable 2) change from primary ata controller to the second, 3) upgrade to RELENG_6 as of March 11, but all these are failed. There is no options in bios to turn off DMA for the onboard ATA controller. The ata controller and ad0 is atapci0: VIA 82C686B UDMA100 controller port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0 atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xffa0 ata0: ATA channel 0 on atapci0 atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0x1f0 atapci0: Reserved 0x1 bytes for rid 0x14 type 4 at 0x3f6 ata0: reset tp1 mask=03 ostat0=50 ostat1=00 ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00 ata0: stat1=0x00 err=0x01 lsb=0x00 msb=0x00 ata0: reset tp2 stat0=50 stat1=00 devices=0x1ATA_MASTER ata0: [MPSAFE] ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=80 wire ad0: setting PIO4 on 82C686B chip ad0: setting UDMA100 on 82C686B chip ad0: 38166MB Seagate ST340016A 3.10 at ata0-master UDMA100 ad0: 78165360 sectors [19158C/16H/255S] 16 sectors/interrupt 1 depth queue I'm pretty sure this HD is capable of UDMA100 (by the specification on Seagate website). 
The console messages are: /dev/ad0s1e: clean, 823031 free (447 frags, 102823 blocks, 0.0% fragmentation) ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=191 ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=191 ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=131647 ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=131647 ad0: FAILURE - WRITE_DMA status=51READY,DSC,ERROR error=84ICRC,ABORTED LBA=131647 g_vfs_done():ad0s1a[WRITE(offset=67371008, length=16384)]error = 5 [...] kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode fault virtual address = 0x24 fault code = supervisor read, page not present instruction pointer = 0x20:0xc04eef95 stack pointer = 0x28:0xe4c714f0 frame pointer = 0x28:0xe4c71500 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags= resume, IOPL = 0 current process = 127 (cp) [thread pid 127 tid 100028 ] Stopped at turnstile_broadcast+0x9:movl0x24(%eax),%eax db bt Tracing pid 127 tid 100028 td 0xc474e000 turnstile_broadcast(0) at turnstile_broadcast+0x9 _mtx_unlock_sleep(c068aa60,0,0,0) at _mtx_unlock_sleep+0x6c softdep_sync_metadata(c4958880) at softdep_sync_metadata+0x7d4 ffs_syncvnode(c4958880,1) at ffs_syncvnode+0x43d ffs_truncate(c4958880,200,0,880,c4695d00,c474e000) at ffs_truncate+0x77e ufs_direnter(c4958880,c49de880,e4c7192c,e4c71bd0,0) at ufs_direnter+0x85d ufs_makeinode(81a4,c4958880,e4c71bbc,e4c71bd0) at ufs_makeinode+0x30f ufs_create(e4c71a84) at ufs_create+0x37 VOP_CREATE_APV(c0670ec0,e4c71a84) at VOP_CREATE_APV+0x3c VOP_CREATE(c4958880,e4c71bbc,e4c71bd0,e4c71ae0) at VOP_CREATE+0x34 vn_open_cred(e4c71ba8,e4c71cc4,1a4,c4695d00,4) at vn_open_cred+0x20c vn_open(e4c71ba8,e4c71cc4,1a4,4) at vn_open+0x29 kern_open(c474e000,804c1c8,0,602,21b6) at kern_open+0xd4 open(c474e000,e4c71cf0) at open+0x22 syscall(3b,3b,3b,8060100,bfbfeec4) at syscall+0x337 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall 
(5, FreeBSD ELF32, open), eip = 0x28137ccf, esp = 0xbfbfec7c, ebp = 0xbfbfecc8 --- db call doadump Cannot dump. No dump device defined. The full dmesg (with boot_verbose) is available at http://www.rafan.org/FreeBSD/ata/20060316-dmesg+db.txt I did a alltrace in ddb: http://www.rafan.org/FreeBSD/ata/20060311-dball.txt Regards, Rong-En Fan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
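The workaround described at the top of this report (DMA disabled at boot, then modes set by hand) could be written down roughly as follows. This is a dry-run sketch that only prints the commands. The "atacontrol mode <device> <mode>" form is an assumption about the 6.x atacontrol(8) syntax, and reading "ad4~ad7" as ad4, ad5, ad6 and ad7 is also an assumption; check both against the actual machine before running anything.

```sh
#!/bin/sh
# Dry-run sketch: print (do not execute) the steps of the workaround.
# Assumption: atacontrol(8) accepts "mode <device> <mode>" on this release.
# Assumption: "ad4~ad7" in the report means ad4, ad5, ad6 and ad7.
print_dma_workaround() {
    echo 'hw.ata.ata_dma=0   # goes in /boot/loader.conf'
    echo 'atacontrol mode ad0 UDMA33'
    for disk in ad4 ad5 ad6 ad7; do
        echo "atacontrol mode $disk UDMA100"
    done
}

print_dma_workaround
```

Printing rather than executing keeps the sketch safe to run on a machine without these disks.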
Re: 6.1 ata panic if dma enabled
On 3/16/06, Scott Long [EMAIL PROTECTED] wrote: Rong-En Fan wrote: Hi, Recently, we upgrade a 4.11 box to 6.1-BETA2 by reinstall+newfs everything. After that, we found that if hw.ata.ata_dma=1 at boot, then as soon as it starts fsck -p, it panics. It happens only if ad0 is setted to UDMA66 or above. My current solution is set hw.ata.ata_dma=0 in loader.conf and manually turn DMA on ad0 to UDMA33 and rest ad4~ad7 to UDMA100. In the days of 4.x, there is something wrong with DMA on ad0, but it will fall back to PIO4 automatically without problem. We have been tried to 1) change the cable 2) change from primary ata controller to the second, 3) upgrade to RELENG_6 as of March 11, but all these are failed. There is no options in bios to turn off DMA for the onboard ATA controller. Please review the release notes from the 6.1-BETA2 announcement. Fixes went into 6.1 shortly after BETA2 was released, and are in BETA3 and BETA4. Upgrade to today's RELENG_6, it is the same. I'm not quite if this is hardware problem. But however, why can't ata fall back to PIO4 is DMA write error, just like 4.x does? 
ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=80 wire ad0: setting PIO4 on 82C686B chip ad0: setting UDMA100 on 82C686B chip ad0: 38166MB Seagate ST340016A 3.10 at ata0-master UDMA100 ad0: 78165360 sectors [19158C/16H/255S] 16 sectors/interrupt 1 depth queue /dev/ad0s1d: FILE SYSTEM CLEAN; SKIPPING CHECKS /dev/ad0s1d: clean, 624587 free (28411 frags, 74522 blocks, 1.9% fragmentation) /dev/ad0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS /dev/ad0s1e: clean, 826458 free (466 frags, 103249 blocks, 0.0% fragmentation) ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=191 ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=191 ad0: FAILURE - WRITE_DMA status=51READY,DSC,ERROR error=84ICRC,ABORTED LBA=191 g_vfs_done():ad0s1a[WRITE(offset=65536, length=2048)]error = 5 mount: /dev/ad0s1a: Input/output error Mounting root filesystem rw failed, startup aborted Boot interrupted Enter root password, or ^D to go multi-user then I just continue..., finally it panics kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode fault virtual address = 0x24 fault code = supervisor read, page not present instruction pointer = 0x20:0xc045 stack pointer = 0x28:0xe4cfb4f0 frame pointer = 0x28:0xe4cfb500 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags= resume, IOPL = 0 current process = 168 (cp) [thread pid 168 tid 100044 ] Stopped at turnstile_broadcast+0x9:movl0x24(%eax),%eax db bt Tracing pid 168 tid 100044 td 0xc48de180 turnstile_broadcast(0) at turnstile_broadcast+0x9 _mtx_unlock_sleep(c068aca0,0,0,0) at _mtx_unlock_sleep+0x6c softdep_sync_metadata(c495d660) at softdep_sync_metadata+0x7d4 ffs_syncvnode(c495d660,1) at ffs_syncvnode+0x43d ffs_truncate(c495d660,200,0,880,c4695d00,c48de180) at ffs_truncate+0x77e ufs_direnter(c495d660,c49e1880,e4cfb92c,e4cfbbd0,0) at ufs_direnter+0x85d ufs_makeinode(81a4,c495d660,e4cfbbbc,e4cfbbd0) at ufs_makeinode+0x30f ufs_create(e4cfba84) at 
ufs_create+0x37 VOP_CREATE_APV(c0671100,e4cfba84) at VOP_CREATE_APV+0x3c VOP_CREATE(c495d660,e4cfbbbc,e4cfbbd0,e4cfbae0) at VOP_CREATE+0x34 vn_open_cred(e4cfbba8,e4cfbcc4,1a4,c4695d00,4) at vn_open_cred+0x20c vn_open(e4cfbba8,e4cfbcc4,1a4,4) at vn_open+0x29 kern_open(c48de180,804c1c8,0,602,21b6) at kern_open+0xd4 open(c48de180,e4cfbcf0) at open+0x22 syscall(3b,3b,3b,8060100,bfbfeec4) at syscall+0x337 Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (5, FreeBSD ELF32, open), eip = 0x28137ccf, esp = 0xbfbfec7c, ebp = 0xbfbfecc8 --- Regards, Rong-En Fan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: nfsclient process stucks in nfsaio
On 3/10/06, Rong-En Fan [EMAIL PROTECTED] wrote: With INVARIANTS and WITNESS enabled, when I tried to ^C to exit dd, it panics immediately. Some ddb and kgdb messages are below (I have KDB_TRACE, KDB_UNATTENDED). Core file is available. Any help is appreciated :-) UPDATE: sometimes, I can't ^C or kill -9 the dd process even with mpsafenet=0. In that situation, it panics with a trace similar to the one below, which is from mpsafenet=1. Hi, I have tried an SMP kernel with different combinations of debug.mpsafe{net,vm,vfs}, and a UP kernel; the result is the same in all cases. Also, I did tune
options MAXDSIZ=(2048UL*1024*1024)
options MAXSSIZ=(128UL*1024*1024)
options DFLDSIZ=(2048UL*1024*1024)
in my kernel. I don't know if this affects it or not. I will try removing them and test again. Thanks, Rong-En Fan
Re: nfsclient process stucks in nfsaio
Hi, With INVARIANT, WITNESS enabled, when I tried to ^C to exit dd, it panics immediately. Some ddb kgdb messages below (I have KDB_TRACE, KDB_UNATTENDED). Core file is available. Any help is appreciated :-) UPDATE: sometimes, I cant ^C or kill -9 the dd process even with mpsafenet=0. In that situation, a panic with similar trace as below, which is mpsafenet=1. panic: VOP_STRATEGY failed bp=0xd835acd8 vp=0xc4a1baa0 cpuid = 1 KDB: stack backtrace: kdb_backtrace(1,c05056b4,1,e7f1b7d0,1) at kdb_backtrace+0x2e panic(c061782c,d835acd8,c4a1baa0,c4a1baa0,4) at panic+0x12b bufstrategy(c4a1bb60,d835acd8,e7f1b80c,c471ee63,d835acd8) at bufstrategy+0x7d bstrategy(d835acd8,c060be84,23c,a00200a6,0) at bstrategy+0x60 nfs_writebp(d835acd8,1,c4369000,e7f1b82c,c471eb73) at nfs_writebp+0xf3 nfs_bwrite(d835acd8,e7f1b904,c471e92b,d835acd8,1dd88000) at nfs_bwrite+0x13 bwrite(d835acd8,1dd88000,0,1dd86000,0) at bwrite+0x5b nfs_flush(c4a1baa0,1,c4369000,1,e7f1b92c) at nfs_flush+0x78b nfs_fsync(e7f1b93c) at nfs_fsync+0x1c VOP_FSYNC_APV(c4735fc0,e7f1b93c) at VOP_FSYNC_APV+0x99 VOP_FSYNC(c4a1baa0,1,c4369000) at VOP_FSYNC+0x2e bufsync(c4a1bb60,1,c4369000) at bufsync+0x14 bufobj_invalbuf(c4a1bb60,1,c4369000,100,0) at bufobj_invalbuf+0xda vinvalbuf(c4a1baa0,1,c4369000,100,0) at vinvalbuf+0x1d nfs_vinvalbuf(c4a1baa0,1,c4369000,1,c04d5738) at nfs_vinvalbuf+0xda nfs_write(e7f1bbc8) at nfs_write+0x16f VOP_WRITE_APV(c4735fc0,e7f1bbc8) at VOP_WRITE_APV+0x11e VOP_WRITE(c4a1baa0,e7f1bcb0,7f0001,c49f5180) at VOP_WRITE+0x34 vn_write(c46d6ca8,e7f1bcb0,c49f5180,0,c4369000) at vn_write+0x1ad fo_write(c46d6ca8,e7f1bcb0,c49f5180,0,c4369000) at fo_write+0x1d dofilewrite(c4369000,4,c46d6ca8,e7f1bcb0,,,0) at dofilewrite+0x8e kern_writev(c4369000,4,e7f1bcb0) at kern_writev+0x41 write(c4369000,e7f1bcf0) at write+0x58 syscall(3b,3b,3b,8076000,10) at syscall+0x2cf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (4, FreeBSD ELF32, write), eip = 0x880b9813, esp = 0xbfbfeaac, ebp = 0xbfbfead8 --- Uptime: 4m18s 
Dumping 3062 MB (2 chunks) [...] (kgdb) bt full #0 0xc04a8181 in doadump () at /usr/src/sys/kern/kern_shutdown.c:233 No locals. #1 0xc04a8841 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399 first_buf_printf = 1 #2 0xc04a8bf9 in panic (fmt=0xc061782c VOP_STRATEGY failed bp=%p vp=%p) at /usr/src/sys/kern/kern_shutdown.c:555 td = (struct thread *) 0xc4369000 bootopt = 260 newpanic = 1 ap = 0xe7f1b7d0 ج5ؠ��Ġ���\004 buf = VOP_STRATEGY failed bp=0xd835acd8 vp=0xc4a1baa0, '\0' repeats 208 times #3 0xc0505689 in bufstrategy (bo=0xc4a1bb60, bp=0xd835acd8) at /usr/src/sys/kern/vfs_bio.c:3690 i = 4 vp = (struct vnode *) 0xc4a1baa0 #4 0xc471ef28 in ?? () No symbol table info available. #5 0xc4a1bb60 in ?? () No symbol table info available. #6 0xd835acd8 in ?? () No symbol table info available. #7 0xe7f1b80c in ?? () No symbol table info available. #8 0xc471ee63 in ?? () No symbol table info available. #9 0xd835acd8 in ?? () No symbol table info available. #10 0xc060be84 in __func__.2 () No symbol table info available. #11 0x023c in ?? () No symbol table info available. #12 0xa00200a6 in ?? () No symbol table info available. #13 0x in ?? () No symbol table info available. #14 0xe7f1b820 in ?? () No symbol table info available. #15 0xc471f2a3 in ?? () No symbol table info available. #16 0xd835acd8 in ?? () No symbol table info available. #17 0x0001 in ?? () No symbol table info available. #18 0xc4369000 in ?? () No symbol table info available. #19 0xe7f1b82c in ?? () No symbol table info available. #20 0xc471eb73 in ?? () No symbol table info available. #21 0xd835acd8 in ?? () No symbol table info available. #22 0xe7f1b904 in ?? () No symbol table info available. #23 0xc471e92b in ?? () No symbol table info available. #24 0xd835acd8 in ?? () No symbol table info available. #25 0x1dd88000 in ?? () No symbol table info available. #26 0x in ?? () No symbol table info available. #27 0x1dd86000 in ?? () No symbol table info available. #28 0x in ?? 
() No symbol table info available. #29 0xe7f1b858 in ?? () No symbol table info available. #30 0xc049ee97 in _mtx_assert (m=0xd835acd8, what=-1067401596, file=0x23c <Address 0x23c out of bounds>, line=-1610481498) at /usr/src/sys/kern/kern_mutex.c:754 No locals. Previous frame inner to this frame (corrupt stack?)
(kgdb) l *0xc0505689
0xc0505689 is in bufstrategy (/usr/src/sys/kern/vfs_bio.c:3691).
3686        KASSERT(vp == bo->bo_private, ("Inconsistent vnode bufstrategy"));
3687        KASSERT(vp->v_type != VCHR && vp->v_type != VBLK,
3688            ("Wrong vnode in bufstrategy(bp=%p, vp=%p)", bp, vp));
3689        i = VOP_STRATEGY(vp, bp);
3690        KASSERT(i == 0, ("VOP_STRATEGY failed bp=%p vp=%p", bp, bp->b_vp));
3691    }
3692
3693    void
3694    bufobj_wrefl(struct bufobj *bo)
3695    {
On 3/10/06, Rong-En Fan [EMAIL PROTECTED] wrote: Hi
nfsclient process stucks in nfsaio
Hi, We upgraded several of our NFS clients from 5.4-RELEASE to 6.0-RELEASE, and some are now at 6.1-PRERELEASE (as of a week ago). From time to time, we see some processes stuck in nfsaio and unkillable. These processes generate lots of traffic to the NFS server, but the server's disk is not really being written: netstat shows the client sending ~100Mbps, yet iostat on the NFS server does not show ~12.5MB/s of writes. The nfsd on the server side is either in RUN or in ufs state. The server is running 5.5-PRERELEASE as of yesterday. Client mount options: rw,nosuid,bg,intr,nodev. Both client and server are running rpc.lockd and rpc.statd. I'm sure it's not related to any locking problems. I have another NFS server/client pair, both running 6.0-RELEASE, and I can easily reproduce this situation on these two boxes just by running dd if=/dev/zero of=/nfs/ooo bs=1m If I do not add bs=1m, it works fine. On all the boxes I mentioned above, I did not do anything special to the kernel config, i.e., they are GENERIC w/o unnecessary devices and w/ firewall. Basically, I can do anything on these two boxes (they are not in production). Any suggestions are welcome. Thanks, Rong-En Fan
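The dd reproduction in this report can be wrapped in a small bounded test so that the large-block and default-block cases are easy to compare. This is only a sketch: the function name write_test and the file name ddtest are hypothetical, the block size is given in bytes (1048576) rather than as 1m so the same line also works with other dd implementations, and the count= bound is an addition that the original unbounded command did not have.

```sh
#!/bin/sh
# Hypothetical helper: write COUNT blocks of BS bytes of zeros to DIR/ddtest
# and print the resulting file size in bytes.
# The original report ran an unbounded "dd ... bs=1m"; count= is added here
# so that the test always terminates.
write_test() {
    dir=$1
    bs=$2
    count=$3
    dd if=/dev/zero of="$dir/ddtest" bs="$bs" count="$count" 2>/dev/null || return 1
    # wc -c may pad its output with spaces on some systems; strip them.
    wc -c < "$dir/ddtest" | tr -d ' '
}

# Example use against an NFS mount (not run here):
#   write_test /nfs 1048576 16    # 1 MB blocks, the case reported to hang
#   write_test /nfs 512 32768     # dd's default block size, reported to work
```

Both example calls write the same 16 MB, so any difference in behavior comes from the block size alone.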
Re: nfsclient process stucks in nfsaio
Hi, I forgot to mention that all the clients/servers here run SMP kernels. After some Googling, I found a post on current@ from 2005/01/12, NFS problems, locking up, that is highly related to my situation. A workaround is to set debug.mpsafenet=0; I just verified that this indeed works. Now I'm turning on INVARIANTS and WITNESS to see if there is some output. However, I'm afraid that I cannot get serial console access to these machines (and thus no ddb output :( ). Thanks, Rong-En Fan On 3/10/06, Rong-En Fan [EMAIL PROTECTED] wrote: Hi, We upgraded several of our NFS clients from 5.4-RELEASE to 6.0-RELEASE, and some are now at 6.1-PRERELEASE (as of a week ago). From time to time, we see some processes stuck in nfsaio and unkillable. These processes generate lots of traffic to the NFS server, but the server's disk is not really being written: netstat shows the client sending ~100Mbps, yet iostat on the NFS server does not show ~12.5MB/s of writes. The nfsd on the server side is either in RUN or in ufs state. The server is running 5.5-PRERELEASE as of yesterday. Client mount options: rw,nosuid,bg,intr,nodev. Both client and server are running rpc.lockd and rpc.statd. I'm sure it's not related to any locking problems. I have another NFS server/client pair, both running 6.0-RELEASE, and I can easily reproduce this situation on these two boxes just by running dd if=/dev/zero of=/nfs/ooo bs=1m If I do not add bs=1m, it works fine. On all the boxes I mentioned above, I did not do anything special to the kernel config, i.e., they are GENERIC w/o unnecessary devices and w/ firewall. Basically, I can do anything on these two boxes (they are not in production). Any suggestions are welcome. Thanks, Rong-En Fan
Re: nfsclient process stucks in nfsaio [SOLVED]
Hi all, I believe that this behavior is caused by the ``intr'' (-i) option to mount_nfs(8). As noted in PR/79700, Stephan Uphoff does not recommend using the intr option. After removing the option, dd works well. Sorry for the noise :-) However, I think a warning could be added to mount_nfs(8) about the use of intr and its consequences, so that this won't happen again. Regards, Rong-En Fan
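To find mounts that still carry the option, a script can test the option list for an exact ``intr'' entry. A minimal sketch, assuming a POSIX shell; the function name has_intr is hypothetical, and the option string is the one quoted earlier in this thread.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the comma-separated mount option list
# contains the exact option "intr" (not merely a longer word containing it).
has_intr() {
    case ",$1," in
        *,intr,*) return 0 ;;
        *)        return 1 ;;
    esac
}

opts="rw,nosuid,bg,intr,nodev"   # client mount options quoted in this thread
if has_intr "$opts"; then
    echo "warning: intr is set; consider removing it"
fi
```

Wrapping the list in commas lets a single pattern match the option at the start, middle, or end of the list.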
Re: RELENG_6: serial console drops back from 115200 to 9600 baud
On 2/27/06, Ruslan Ermilov [EMAIL PROTECTED] wrote: On Sun, Feb 26, 2006 at 08:26:42PM +0100, Dimitry Andric wrote: Ian Dowse wrote: Okay, but why did 4.x through 5.x through 6.x (these have all been on this particular machine) always boot with 115200 until now? :) They probably used 9600 for the boot blocks, and then switched to 115200 when /boot/loader started, so you didn't notice. Now the settings from the boot blocks get used by /boot/loader. Ah, but this still means that /boot/loader used to use a hardcoded default specified in /etc/make.conf, and now it doesn't honor that anymore. Have you checked with documentation? : comconsole_speed : Defines the speed of the serial console (i386 and amd64 only). : If the previous boot stage indicated that a serial console is : in use then this variable is initialized to the current speed : of the console serial port. Otherwise it is set to 9600 unless : this was overridden using the BOOT_COMCONSOLE_SPEED variable : when loader was compiled. Changes to the comconsole_speed : variable take effect immediately. Which way is preferred: setting comconsole_speed, -S in boot.config, or using the hard-coded BOOT_COMCONSOLE_SPEED in make.conf? If the preferred way is now -S or comconsole_speed in loader.conf, please update Handbook section 22.6.5.1, Setting a Faster Serial Port Speed, accordingly. Thanks, Rong-En Fan
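When comparing the three ways of setting the speed, it helps to see what a given loader.conf actually requests. The sketch below reads the comconsole_speed variable named in the documentation quoted above. The function name get_comconsole_speed is hypothetical, and the example path /boot/loader.conf is assumed to be the file in use.

```sh
#!/bin/sh
# Hypothetical helper: print the comconsole_speed value set in a
# loader.conf-style file, or nothing if the variable is not set there.
# If the variable is set more than once, the last setting is printed.
get_comconsole_speed() {
    sed -n 's/^[[:space:]]*comconsole_speed="\{0,1\}\([0-9]*\)"\{0,1\}.*$/\1/p' "$1" | tail -1
}

# Example use (not run here):
#   get_comconsole_speed /boot/loader.conf
```

An empty result means the file leaves the speed to the boot blocks or to the compiled-in default, as the quoted documentation describes.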
Re: IBM x236 with Serveraid
On 2/21/06, Balgansuren Batsukh [EMAIL PROTECTED] wrote: Hello, We installed FreeBSD-6.0-RELEASE and cvsuped to STABLE. We configured machine as IPFW+NAT and installed SQUID+SQUIDGUARD+DANSGUARDIAN. It works well under light load, but on heavy load suddenly no response whole machine. We guess FreeBSD-6.0 doesn't support ServeRaid-7k/7t on IBM eSeries x236 server. We have two IBM eServer xSeries 236 (Model 8841) machines running 5.4-STABLE. The ServeRAID 7k works well: ips0: Adaptec ServeRAID Adapter mem 0xcfffd000-0xcfffdfff irq 38 at device 14.0 on pci3 ips0: logical drives: 1 ips0: Logical Drive 0: RAID1 sectors: 860239872, state OK ipsd0: Logical Drive on ips0 ipsd0: Logical Drive (420039MB) rafan. How can we to solve problem? Is there anyway to get work/support ServeRAID-7k/7t controller? I check most of FreeBSD mailing list archive and search on google. Didn't find any good answer of above question. Regards, Balgaa
6.0/amd64 boot hang if apic enabled on IBM x336
Hi all, We have an IBM xSeries 336 with 2 Pentium 4 (EM64T) 3.2G CPUs and 2GB of memory. When we boot it with 6.0-RELEASE/amd64, it hangs after acd0 is shown. However, if apic is disabled, it boots. 5.4-RELEASE/amd64 works great. A 7.0-CURRENT snapshot (SNAP009, Nov/2005) shows the same behavior. To make it boot, I set the following in loader.conf:
hint.atkbd.0.disabled=0x4 # otherwise, I lose my keyboard
hint.apic.0.disabled=1
The verbose dmesg with/without apic and the asl dump (with BIOS version 1.12, the latest one found on the IBM site) are available at: http://www.rafan.org/FreeBSD/x336/ BTW, the SCSI timeout message in dmesg-noapic.txt is a bit weird. If I boot without verbose messages, it does not show up and everything works great. With verbose booting, those messages show up and it takes a long time to get into multiuser mode. Regards, Rong-En Fan
panic on RELENG_5 on em(4)
Hi, I'm running RELENG_5 around Oct 12, got a panic related to em(4). After some searching, I saw a similar panic reported on -current (his/her system is also RELENG_5) in May, but no further replies. The kernel is similar to GENERIC with IPFW and have HTT enabled in loader.conf. Box is a 2*Xeon with HTT, SMP kernel is enabled, thus there are 4 logical cpus. For some reasons, I did not have DDB compiled. The kgdb outputs are enclosed. If there are people interested to help debug this, I can send information as request. Thanks,, Rong-En Fan (kgdb and console): Fatal trap 12: page fault while in kernel mode cpuid = 2; apic id = 06 fault virtual address = 0xbfc38018 fault code = supervisor read, page not present instruction pointer = 0x8:0xc05fb49f stack pointer = 0x10:0xe6448bc0 frame pointer = 0x10:0xe6448c24 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 77 (irq16: em0) trap number = 12 panic: page fault cpuid = 2 #0 doadump () at pcpu.h:160 No locals. 
#1 0xc04c1268 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:412 first_buf_printf = 1 #2 0xc04c1616 in panic (fmt=0xc062dcbe %s) at /usr/src/sys/kern/kern_shutdown.c:568 td = (struct thread *) 0xc313bd80 bootopt = 260 newpanic = 0 ap = 0xc313bd80 L\023\020\t buf = page fault, '\0' repeats 245 times #3 0xc06121dd in trap_fatal (frame=0xe6448b80, eva=0) at /usr/src/sys/i386/i386/trap.c:817 code = 16 type = 12 ss = 16 esp = 0 softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27, ssd_dpl = 0, ssd_p = 1, ssd_xx = 1, ssd_xx1 = 0, ssd_def32 = 1, ssd_gran = 1} #4 0xc0611ed4 in trap_pfault (frame=0xe6448b80, usermode=0, eva=3217260568) at /usr/src/sys/i386/i386/trap.c:735 va = 3217260544 vm = (struct vmspace *) 0x0 map = 0xc0673280 rv = 1 ftype = 1 '\001' td = (struct thread *) 0xc313bd80 p = (struct proc *) 0xc313a54c #5 0xc0611ab9 in trap (frame= {tf_fs = -1022033896, tf_es = 16, tf_ds = -431751152, tf_edi = -10217512\ 96, tf_esi = -1017843008, tf_ebp = -431715292, tf_isp = -431715412, tf_ebx = -\ 1008379904, tf_edx = 0, tf_ecx = 234907650, tf_eax = 57350, tf_trapno = 12, tf\ _err = 0, tf_eip = -1067469665, tf_cs = 8, tf_eflags = 66055, tf_esp = -431715\ 256, tf_ss = -1021353488}) at /usr/src/sys/i386/i386/trap.c:425 td = (struct thread *) 0xc313bd80 p = (struct proc *) 0xc313a54c sticks = 3863251848 i = 0 ucode = 0 type = 12 code = 0 eva = 3217260568 #6 0xc05fdc4a in calltrap () at /usr/src/sys/i386/i386/exception.s:140 No locals. #7 0xc3150018 in ?? () No symbol table info available. #8 0x0010 in ?? () No symbol table info available. #9 0xe6440010 in ?? () No symbol table info available. #10 0xc3195000 in ?? () No symbol table info available. #11 0xc354f2c0 in ?? () No symbol table info available. #12 0xe6448c24 in ?? () No symbol table info available. #13 0xe6448bac in ?? () No symbol table info available. #14 0xc3e55800 in ?? () No symbol table info available. #15 0x in ?? () No symbol table info available. #16 0x0e006802 in ?? 
() No symbol table info available. #17 0xe006 in ?? () No symbol table info available. #18 0x000c in ?? () No symbol table info available. #19 0x in ?? () No symbol table info available. #20 0xc05fb49f in bus_dmamap_load (dmat=0xc3353400, map=0x0, buf=0xe006802, buflen=2046, callback=0xc045f8e8 em_dmamap_cb, callback_arg=0xe6448c48, flags=0) at pmap.h:200 lastaddr = 0 error = 0 nsegs = 0 195 vm_paddr_t pa; 196 197 if ((pa = PTD[va PDRSHIFT]) PG_PS) { 198 pa = (pa ~(NBPDR - 1)) | (va (NBPDR - 1)); 199 } else { 200 pa = *vtopte(va); 201 pa = (pa PG_FRAME) | (va PAGE_MASK); 202 } 203 return pa; 204 } #21 0xc04602f1 in em_get_buf (i=88, adapter=0xc3195000, nmp=0x0) at /usr/src/sys/dev/em/if_em.c:2531 mp = (struct mbuf *) 0xc3e55800 rx_buffer = (struct em_buffer *) 0xc354f2c0 ifp = (struct ifnet *) 0xc354f2c0 paddr = 3272850816 error = -1021751296 2526 2527/* 2528 * Using memory from the mbuf cluster pool, invoke the 2529 * bus_dma machinery to arrange the memory mapping. 2530 */ 2531error = bus_dmamap_load(adapter-rxtag, rx_buffer-map, 2532mtod(mp, void *), mp-m_len, 2533em_dmamap_cb, paddr, 0); 2534if (error) { 2535m_free(mp); #22 0xc0460b6e in em_process_receive_interrupts (adapter
Re: got a panic on 5.4-STABLE
On 8/28/05, Rong-En Fan [EMAIL PROTECTED] wrote:
And I also looked at the dump file. It looks like, when m_copym() is called, m->m_len is 20, off is 1500, and m->m_next is NULL. After the first iteration, m becomes NULL...
#20 0xc051d62f in m_copym (m=0x0, off0=1500, len=1480, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:389

I have turned off mpsafenet and got another panic yesterday. The panicstr is:

kmem_malloc(4096): kmem_map too small: 335544320 total allocated

The backtrace is below (sorry, no console log; it was flushed by the fsck messages):

(kgdb) bt full
#0 doadump () at pcpu.h:160
No locals.
#1 0xc04e0385 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
        first_buf_printf = 1
#2 0xc04e0733 in panic ( fmt=0xc0677392 kmem_malloc(%ld): kmem_map too small: %ld total allocated) at /usr/src/sys/kern/kern_shutdown.c:566
        td = (struct thread *) 0xc1f59c00
        bootopt = 260
        newpanic = 0
        ap = 0xc1f59c00
        buf = kmem_malloc(4096): kmem_map too small: 335544320 total allocated, '\0' repeats 191 times
#3 0xc05fb1c4 in kmem_malloc (map=0xc0c590c0, size=4096, flags=1026) at /usr/src/sys/vm/vm_kern.c:299
        offset = 0
        i = 0
        entry = 0xd603f0c0
        addr = 3253096448
        m = 0xc0c64b48
        pflags = -704384832
#4 0xc060da82 in page_alloc (zone=0xc0c52840, bytes=0, pflag=0x0, wait=0) at /usr/src/sys/vm/uma_core.c:957
        p = (void *) 0x0
#5 0xc060d4df in slab_zalloc (zone=0xc0c52840, wait=1026) at /usr/src/sys/vm/uma_core.c:827
        slabref = 0x0
        slab = 0x0
        flags = 2 '\002'
        i = -1060746424
#6 0xc060f0ec in uma_zone_slab (zone=0xc0c52840, flags=1282) at /usr/src/sys/vm/uma_core.c:1994
        slab = 0x0
        keg = 0xc0c64b40
#7 0xc060f352 in uma_zalloc_bucket (zone=0xc0c52840, flags=1282) at /usr/src/sys/vm/uma_core.c:2103
        bucket = 0xc3e6f624
        slab = 0xc0c52840
        saved = 0
        max = 128
        origflags = 1282
#8 0xc060ef3a in uma_zalloc_arg (zone=0xc0c52840, udata=0x0, flags=1282) at /usr/src/sys/vm/uma_core.c:1911
        item = (void *) 0xe4b39730
        cache = 0xc0c52878
        bucket = 0x0
        cpu = 0
#9 0xc04d3f00 in malloc (size=192,
type=0xc069d0e0, flags=1282) at uma.h:276
        indx = 4
        va = 0x800 <Address 0x800 out of bounds>
        zone = 0x0
        keg = 0xc0c64b40
#10 0xc05d7e67 in softdep_setup_freeblocks (ip=0xc439a604, length=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/ufs/ffs/ffs_softdep.c:1963
        freeblks = (struct freeblks *) 0xc2330900
        inodedep = (struct inodedep *) 0x1
        adp = (struct allocdirect *) 0x0
        vp = (struct vnode *) 0x0
        bp = (struct buf *) 0x2
        fs = (struct fs *) 0xc229f800
        extblocks = -3098686512764636259
        datablocks = Unhandled dwarf expression opcode 0x93

I read FreeBSD FAQ 5.10 and 5.11, which describe this kind of panic. This machine has 1 GB of memory, so I changed the following tunables:

kern.ipc.nmbclusters: 25600 -> 32768
vm.kmem_size_max: 335544320 -> 419430400
vm.kmem_size_scale: 3 -> 2

Now vm.kmem_size is 419430400 (it was 335544320 before). I'm wondering whether these 3 panics are related to insufficient kmem. Is there any way to monitor kmem usage?

BTW, after the first panic I turned off SACK, but it did not help. After the second one, I turned off mpsafenet.

Regards,
Rong-En Fan
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
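For reference, the change in vm.kmem_size reported above is what a simple sizing model would predict: physical memory divided by vm.kmem_size_scale, capped at vm.kmem_size_max. The sketch below is only that simplified model, not the kernel's actual code. The function name kmem_size and the exact 1 GB memory figure are assumptions for illustration, and the model ignores any lower bound or page rounding the kernel may apply.

```python
def kmem_size(physmem_bytes, scale, size_max):
    """Simplified model (assumption): physmem / scale, capped at size_max."""
    return min(physmem_bytes // scale, size_max)


ONE_GB = 1024 ** 3  # assumed physical memory; the machine reports about 1 GB

# Settings before and after the tuning described in the message.
before = kmem_size(ONE_GB, scale=3, size_max=335544320)
after = kmem_size(ONE_GB, scale=2, size_max=419430400)

print("before:", before, "bytes =", before // 2**20, "MB")
print("after: ", after, "bytes =", after // 2**20, "MB")
```

In both cases the cap, not the scale, ends up deciding the result under this model, which matches the two vm.kmem_size values quoted in the message.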
got a panic on 5.4-STABLE
Hi,

I got a panic on an i386 5.4-STABLE system (as of around Aug 28) with SMP enabled. It has 2 physical CPUs with HTT enabled (so 4 logical CPUs in total). The machine is a dedicated NFS server with an external SCSI RAID attached. The console log, kgdb output, and sysctl.conf are below. I'll keep this core, and if someone is interested, I can send any other information requested.

Regards,
Rong-En Fan

[sysctl.conf]
net.link.ether.inet.log_arp_wrong_iface=0
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.recvspace=65536
kern.ipc.somaxconn=4096
kern.maxfiles=65535
kern.ipc.shmmax=104857600
kern.ipc.shmall=25600
net.inet.ip.random_id=1
kern.cam.da.retry_count=20
kern.cam.da.default_timeout=300
kern.maxvnodes=10
vfs.read_max=16

[console log]
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xc
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc051d62f
stack pointer = 0x10:0xe7088a64
frame pointer = 0x10:0xe7088a98
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 523 (nfsd)
trap number = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 2h59m59s
Dumping 1023 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008
Dump complete
Automatic reboot in 1 seconds - press a key on the console to abort
Rebooting...
cpu_reset called on cpu#0 cpu_reset: Stopping other CPUs [kgdb output] Unread portion of the kernel message buffer: @��?� �� D0p'[EMAIL PROTECTED]' #0 doadump () at pcpu.h:160 160 __asm __volatile(movl %%fs:0,%0 : =r (td)); (kgdb) up #1 0xc04e0385 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410 410 doadump(); (kgdb) #2 0xc04e0733 in panic (fmt=0xc065d085 %s) at /usr/src/sys/kern/kern_shutdown.c:566 566 boot(bootopt); (kgdb) #3 0xc064503d in trap_fatal (frame=0xe7088a24, eva=0) at /usr/src/sys/i386/i386/trap.c:817 817 panic(%s, trap_msg[type]); (kgdb) #4 0xc0644d34 in trap_pfault (frame=0xe7088a24, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:735 735 trap_fatal(frame, eva); (kgdb) #5 0xc0644919 in trap (frame= {tf_fs = -1038614504, tf_es = -418906096, tf_ds = 16, tf_edi = 1480, tf_esi = 0, tf_ebp = -418870632, tf_isp = -418870704, tf_ebx = -1039299264, tf_edx = 20, tf_ecx = 1500, tf_eax = -1039299328, tf_trapno = 12, tf_err = 0, tf_eip = -1068378577, tf_cs = 8, tf_eflags = 66050, tf_esp = -1039299328, tf_ss = 1480}) at /usr/src/sys/i386/i386/trap.c:425 425 (void) trap_pfault(frame, FALSE, eva); (kgdb) #6 0xc063122a in calltrap () at /usr/src/sys/i386/i386/exception.s:140 140 calltrap Current language: auto; currently asm (kgdb) #7 0xc2180018 in ?? () (kgdb) #8 0xe7080010 in ?? () (kgdb) #9 0x0010 in ?? () (kgdb) #10 0x05c8 in ?? () (kgdb) #11 0x in ?? () (kgdb) #12 0xe7088a98 in ?? () (kgdb) #13 0xe7088a50 in ?? () (kgdb) #14 0xc20d8d40 in ?? () (kgdb) #15 0x0014 in ?? () (kgdb) #16 0x05dc in ?? () (kgdb) #17 0xc20d8d00 in ?? () (kgdb) #18 0x000c in ?? () (kgdb) #19 0x in ?? 
() (kgdb) #20 0xc051d62f in m_copym (m=0x0, off0=1500, len=1480, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:389 389 m = m-m_next; Current language: auto; currently c (kgdb) #21 0xc0576835 in ip_fragment (ip=0xc20d8de4, m_frag=0xe7088b48, mtu=-1039299264, if_hwassist_flags=6, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967 967 m-m_next = m_copy(m0, off, len); (kgdb) #22 0xc05764a6 in ip_output (m=0xc20d8d00, opt=0xc20d8de4, ro=0xe7088b14, flags=0, imo=0x0, inp=0xc24317bc) at /usr/src/sys/netinet/ip_output.c:796 796 error = ip_fragment(ip, m, ifp-if_mtu, ifp-if_hwassist, sw_csum); (kgdb) #23 0xc0589954 in udp_output (inp=0xc24317bc, m=0xc20d8d00, addr=0xc36fe6d0, control=0x0, td=0xc243ea80) at /usr/src/sys/netinet/udp_usrreq.c:870 870 error = ip_output(m, inp-inp_options, NULL, ipflags, (kgdb) #24 0xc058a5b2 in udp_send (so=0x0, flags=0, m=0x0, addr=0x0, control=0x0, td=0x0) at /usr/src/sys/netinet/udp_usrreq.c:1047 1047return udp_output(inp, m, addr, control, td); (kgdb) #25 0xc0520aa6 in sosend (so=0xc2443144, addr=0xc36fe6d0, uio=0x0, top=0xc5b87b00, control=0x0, flags=0, td=0xc243ea80) at /usr/src/sys/kern/uipc_socket.c:835 835 error = (*so-so_proto-pr_usrreqs-pru_send)(so, (kgdb) #26 0xc05c09f4 in nfsrv_send (so=0xc2443144, nam=0xc36fe6d0, top=0x0) at pcpu.h:157 157 { (kgdb) #27 0xc05c3b5a in nfssvc_nfsd (td=0x0) at /usr
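A note on reading the trap frame in frame #5 above: kgdb prints the saved registers as signed 32-bit integers, so they do not look like the addresses in the console log. Reinterpreting them as unsigned values lines them up again. The helper below is a small sketch for that conversion; the name to_u32 is made up for illustration and is not part of kgdb.

```python
def to_u32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


# Values copied from the trap frame and trap_pfault() arguments above.
tf_eip = -1068378577
tf_ebp = -418870632
eva = 12

print("tf_eip =", hex(to_u32(tf_eip)))  # compare with "instruction pointer"
print("tf_ebp =", hex(to_u32(tf_ebp)))  # compare with "frame pointer"
print("eva    =", hex(eva))             # compare with "fault virtual address"
```

The converted tf_eip is the same address kgdb later shows for frame #20 in m_copym().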
Re: got a panic on 5.4-STABLE
On 8/28/05, Rong-En Fan [EMAIL PROTECTED] wrote:
Hi, I got a panic on an i386 5.4-STABLE system (as of around Aug 28) with SMP enabled. It has 2 physical CPUs with HTT enabled (so 4 logical CPUs in total). The machine is a dedicated NFS server with an external SCSI RAID attached. The console log, kgdb output, and sysctl.conf are below. I'll keep this core, and if someone is interested, I can send any other information requested.

I have the following in make.conf:

CPUTYPE?= p4
CFLAGS= -O -pipe
COPTFLAGS= -O -pipe

The difference between GENERIC and my kernel is here:
http://www.rafan.org/FreeBSD/panic/m_copym/kernel-diff-against-GENERIC.txt

I also looked at the dump file. It looks like, when m_copym() is called, m->m_len is 20, off is 1500, and m->m_next is NULL. After the first iteration, m becomes NULL...

#20 0xc051d62f in m_copym (m=0x0, off0=1500, len=1480, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:389
389 m = m->m_next;
(kgdb) l
384 while (off > 0) {
385 KASSERT(m != NULL, ("m_copym, offset > size of mbuf chain"));
386 if (off < m->m_len)
387 break;
388 off -= m->m_len;
389 m = m->m_next;
390 }
391 np = &top;
392 top = 0;
393 while (len > 0) {
(kgdb) p off
$15 = 1480
(kgdb) up
#21 0xc0576835 in ip_fragment (ip=0xc20d8de4, m_frag=0xe7088b48, mtu=-1039299264, if_hwassist_flags=6, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:967
967 m->m_next = m_copy(m0, off, len);
(kgdb) p off
$9 = 1500
(kgdb) p len
$10 = 1480
(kgdb) p m0
$11 = (struct mbuf *) 0xc20d8d00
(kgdb) p *m0
$12 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xc20d8d40 E, mh_len = 20, mh_flags = 2050, mh_type = 2}, M_dat = {MH = {MH_pkthdr = {rcvif = 0x0, len = 8348, header = 0xc5b25010, csum_flags = 0, csum_data = 6, tags = {slh_first = 0x0}}, MH_dat = {MH_ext = { ext_buf = 0xc33ae000 oker/a\nonline casinos a href=http://www.lucky-777-casinos.comonline casinos/a\nviagra a href=http://www.ALL-VIAGRA.INFOvip, ext_free = 0, ext_args = 0x0, ext_size = 2048, ref_cnt = 0xdc050045, ext_type = 549021963}, MH_databuf =
\000�:�\000\000\000\000\000\000\000\000\000\b\000\000E\000\005�\vi� @\0 [EMAIL PROTECTED] \000xU�mJ\207�\000\000\000\0 001, '\0' repeats 23 times, \002\000\000\001�\000\000\000\002\000\000q�\000\000\001�\000\0 000\000\000\000\000\002\000\000\000\000\000\000\000\b\000\000\000\000\001\001\217\000\021\000\000\000\000\000\000\004\034\000\000\000\000\000\031\024�C\021m�\000\000\000\000B�[�\000\000\0 000\000B�[�\000\000\000\016\fo\2117\000\016\f^̺\b\000E\000\234 [EMAIL PROTECTED] 4p\036\034\b\001\003� \210N�}}, M_databuf = \000\000\000\000\234 \000\000\020P��\000\000\000\000\006\000\000\000\000\000 0\000\000\000�:�\000\000\000\000\000\000\000\000\000\b\000\000E\000\005�\vi� @\021��\214p\0360 [EMAIL PROTECTED] \000xU�mJ\207�\000\000\000\001, '\0' rep peats 23 times, \002\000\000\001�\000\000\000\002\000\000q�\000\000\001�\000\000\000\000\000 0\000\002\000\000\000\000\000\000\000\b\000\000\000\000\001\001\217\000\021\000\000\000\000\000\000\004\034\000\000\000\000\000\031\024�C\021m�\000\000\000\000B�[�\000\000\000\000B�[�\000 0\000\000\016\fo\2117\000\016\f^̺\b\000E\000\234 \vi\000\000@...}} (kgdb) l 962 len = ip-ip_len - off; 963 m-m_flags |= M_LASTFRAG; 964 } else 965 mhip-ip_off |= IP_MF; 966 mhip-ip_len = htons((u_short)(len + mhlen)); 967 m-m_next = m_copy(m0, off, len); 968 if (m-m_next == NULL) {/* copy failed */ 969 m_free(m); 970 error = ENOBUFS;/* ??? */ 971 ipstat.ips_odropped++; (kgdb) up #22 0xc05764a6 in ip_output (m=0xc20d8d00, opt=0xc20d8de4, ro=0xe7088b14, flags=0, imo=0x0, inp=0xc24317bc) at /usr/src/sys/netinet/ip_output.c:796 796 error = ip_fragment(ip, m, ifp-if_mtu, ifp
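To make the failure in the skip loop at uipc_mbuf.c:384-390 easier to follow, here is a small user-space model of it. This is only a sketch: the Mbuf class and skip_offset function are hypothetical stand-ins that keep just m_len and m_next, and the ValueError marks the point where the kernel, built without the KASSERT active, would dereference a NULL mbuf.

```python
class Mbuf:
    """Hypothetical stand-in for struct mbuf: only m_len and m_next."""

    def __init__(self, m_len, m_next=None):
        self.m_len = m_len
        self.m_next = m_next


def skip_offset(m, off):
    """Walk the chain until `off` falls inside an mbuf.

    Returns (mbuf, remaining offset). Raises ValueError at the point
    where the real loop would read m->m_len through a NULL pointer.
    """
    while off > 0:
        if m is None:
            raise ValueError(f"ran off the mbuf chain with off={off}")
        if off < m.m_len:
            break
        off -= m.m_len
        m = m.m_next
    return m, off


# The chain seen in the dump: one mbuf, m_len = 20, m_next = NULL.
m0 = Mbuf(m_len=20)
try:
    skip_offset(m0, 1500)
except ValueError as exc:
    print(exc)
```

With a single 20-byte mbuf and off = 1500, the first pass leaves off at 1480 and m at NULL, which agrees with the "p off" value kgdb printed in frame #20.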
panic: page fault while in kernel mode
Hi all,

This is a 5.4-STABLE system running on i386, dated about Aug 10, 4am UTC. While I was running

cat /var/log/maillog | ./log.pl

to do some log analysis, the system panicked. Here are the console log and kgdb output. I'll keep this dump for some time, so if anyone wants any information, feel free to contact me :-)

Regards,
Rong-En Fan

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1c
fault code = supervisor write, page not present
instruction pointer = 0x8:0xc05123aa
stack pointer = 0x10:0xea39b9cc
frame pointer = 0x10:0xea39b9ec
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 63117 (cat)
trap number = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 8d14h34m14s

(kgdb) bt
#0 doadump () at pcpu.h:160
#1 0xc04c141d in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2 0xc04c17cb in panic (fmt=0xc062d55e %s) at /usr/src/sys/kern/kern_shutdown.c:566
#3 0xc0611a8d in trap_fatal (frame=0xea39b98c, eva=0) at /usr/src/sys/i386/i386/trap.c:817
#4 0xc0611784 in trap_pfault (frame=0xea39b98c, usermode=0, eva=28) at /usr/src/sys/i386/i386/trap.c:735
#5 0xc0611369 in trap (frame= {tf_fs = 24, tf_es = -899547120, tf_ds = -365363184, tf_edi = -684394740, tf_esi = -684394740, tf_ebp = -365315604, tf_isp = -365315656, tf_ebx = -684394740, tf_edx = 0, tf_ecx = -899544064, tf_eax = 4, tf_trapno = 12, tf_err = 2, tf_eip = -1068424278, tf_cs = 8, tf_eflags = 66198, tf_esp = 1049856, tf_ss = 33554464}) at /usr/src/sys/i386/i386/trap.c:425
#6 0xc05fd51a in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#7 0x0018 in ?? ()
#8 0xca620010 in ?? ()
#9 0xea390010 in ?? ()
#10 0xd734f70c in ?? ()
#11 0xd734f70c in ?? ()
#12 0xea39b9ec in ?? ()
#13 0xea39b9b8 in ?? ()
#14 0xd734f70c in ?? ()
#15 0x in ?? ()
#16 0xca620c00 in ?? ()
#17 0x0004 in ?? ()
#18 0x000c in ?? ()
#19 0x0002 in ??
() #20 0xc05123aa in vfs_vmio_release (bp=0xd734f70c) at atomic.h:365 #21 0xc0512db9 in getnewbuf (slpflag=0, slptimeo=0, size=16384, maxsize=16384) at /usr/src/sys/kern/vfs_bio.c:1885 #22 0xc0514619 in getblk (vp=0xc4b96840, blkno=1019, size=16384, slpflag=0, slptimeo=0, flags=0) at /usr/src/sys/kern/vfs_bio.c:2585 #23 0xc0519810 in cluster_read (vp=0xc4b96840, filesize=25188657, lblkno=1019, size=16384, cred=0x0, totread=4096, seqcount=127, bpp=0x0) at /usr/src/sys/kern/vfs_cluster.c:123 #24 0xc05af374 in ffs_read (ap=0x0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:462 #25 0xc05324d2 in vn_read (fp=0xcab176a4, uio=0xea39bcb0, active_cred=0xc794d800, flags=0, td=0xca620c00) at vnode_if.h:398 #26 0xc04e79e0 in dofileread (td=0xca620c00, fd=0, fp=0xcab176a4, auio=0xea39bcb0, offset=Unhandled dwarf expression opcode 0x93 ) at file.h:233 #27 0xc04e7809 in kern_readv (td=0xca620c00, fd=3, auio=0x0) at /usr/src/sys/kern/sys_generic.c:191 #28 0xc04e76df in read (td=0x0, uap=0x0) at /usr/src/sys/kern/sys_generic.c:115 #29 0xc0611e6a in syscall (frame= {tf_fs = 47, tf_es = 47, tf_ds = -1078001617, tf_edi = 1, tf_esi = 4096, tf_ebp = -1077941336, tf_isp = -365314716, tf_ebx = 0, tf_edx = 134541312, tf_ecx = 1, tf_eax = 3, tf_trapno = 0, tf_err = 2, tf_eip = 671949351, tf_cs = 31, tf_eflags = 582, tf_esp = -1077941476, tf_ss = 47}) at /usr/src/sys/i386/i386/trap.c:1009 #30 0xc05fd56f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:201 #31 0x002f in ?? () #32 0x002f in ?? () #33 0xbfbf002f in ?? () #34 0x0001 in ?? () #35 0x1000 in ?? () #36 0xbfbfeba8 in ?? () #37 0xea39bd64 in ?? () #38 0x in ?? () #39 0x0804f000 in ?? () #40 0x0001 in ?? () #41 0x0003 in ?? () #42 0x in ?? () #43 0x0002 in ?? () #44 0x280d2227 in ?? () #45 0x001f in ?? () #46 0x0246 in ?? () #47 0xbfbfeb1c in ?? () #48 0x002f in ?? () #49 0x in ?? () #50 0x in ?? () #51 0x in ?? () #52 0x in ?? () #53 0x0eb8f000 in ?? () #54 0xca615e20 in ?? () #55 0xca620c00 in ?? () #56 0xea39bb30 in ?? 
() #57 0xea39bb18 in ?? () #58 0xc3097900 in ?? () #59 0xc04d46f8 in sched_switch (td=0x1000, newtd=0x0, flags=Cannot access memory at address 0xbfbfebb8 ) at /usr/src/sys/kern/sched_4bsd.c:881 Previous frame inner to this frame (corrupt stack?) (kgdb) up #20 0xc05123aa in vfs_vmio_release (bp=0xd734f70c) at atomic.h:365 365 atomic.h: No such file or directory. in atomic.h Current language: auto; currently c (kgdb) #21 0xc0512db9 in getnewbuf (slpflag=0, slptimeo=0, size=16384, maxsize=16384) at /usr/src/sys/kern/vfs_bio.c:1885 1885vfs_vmio_release(bp); (kgdb) #22 0xc0514619 in getblk (vp=0xc4b96840, blkno=1019, size=16384, slpflag=0, slptimeo=0, flags=0) at /usr/src/sys/kern
Re: panic: sbflush_locked on 5.4-p5/i386
Hi,

After upgrading to 5-STABLE (as of about Aug 6), it works very well. With mpsafenet=1, it now runs for more than a day without a panic; under 5.4-p5 it would panic within half a day or so. The bug seems to have been fixed after 5.4 was released. I'll keep watching this machine and will let you know if it still has similar panics ;-)

Regards,
Rong-En Fan

On 7/29/05, Robert Watson [EMAIL PROTECTED] wrote:
On Mon, 25 Jul 2005, Rong-En Fan wrote:
I have a 5.4-p5 running on i386. Got a panic:
panic: sbflush_locked: cc 0 || mb 0xc33bf000 || mbcnt 4294967040
It is a web server running Apache and Postfix as a backup MX. I'm using gmirror on all partitions and thus cannot get a dump (swap is on gmirror). Some ddb outputs are below.
Re: panic: sbflush_locked on 5.4-p5/i386
On 8/1/05, Alexander S. Usov [EMAIL PROTECTED] wrote:
In just 2 days of waiting I got it; however, it looks like it fired in a slightly different place.

(kgdb) bt
#0 doadump () at pcpu.h:159
#1 0xc0513899 in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2 0xc0513ede in panic (fmt=0xc06ac87f sbdrop) at ../../../kern/kern_shutdown.c:566
#3 0xc05556f4 in sbdrop_locked (sb=0xc2285ad8, len=112) at ../../../kern/uipc_socket2.c:1149
#4 0xc05b27f2 in tcp_input (m=0xc1e31800, off0=152) at ../../../netinet/tcp_input.c:2209
#5 0xc05a9b13 in ip_input (m=0xc1e31800) at ../../../netinet/ip_input.c:776
#6 0xc059215e in netisr_processqueue (ni=0xc070b0f8) at ../../../net/netisr.c:233
#7 0xc059241d in swi_net (dummy=0x0) at ../../../net/netisr.c:346
#8 0xc04fb9a1 in ithread_loop (arg=0xc1979500) at ../../../kern/kern_intr.c:547
#9 0xc04fa9dc in fork_exit (callout=0xc04fb8ea ithread_loop, arg=0x0, frame=0x0) at ../../../kern/kern_fork.c:791
#10 0xc0656a8c in fork_trampoline () at ../../../i386/i386/exception.s:209

I also got a panic: sbdrop, but via a slightly different path:
http://www.rafan.org/FreeBSD/5.4-so/panic-sbdrop (but no dump)
And another one with trap 12:
http://www.rafan.org/FreeBSD/5.4-so/panic-trap12
I have a dump for this one, but kgdb can't read it :-( It says it cannot read the PTD.

This machine is our main www server, so I have to keep it as stable as possible. I might be able to switch mpsafenet on at night. By the way, I have accf_http(4) and accf_data(4) compiled in, but with or without them, I can easily get a panic with mpsafenet in less than half a day.

I am going to keep it around for some time, so I can easily do a full bt or print variables. The corresponding dmesg and sysctl output can be found at https://kvip88.kvi.nl/~usov

--
Best regards, Alexander.
Re: panic: sbflush_locked on 5.4-p5/i386
On 7/29/05, Robert Watson [EMAIL PROTECTED] wrote:
On Mon, 25 Jul 2005, Rong-En Fan wrote:
I have a 5.4-p5 running on i386. Got a panic:
panic: sbflush_locked: cc 0 || mb 0xc33bf000 || mbcnt 4294967040
It is a web server running Apache and Postfix as a backup MX. I'm using gmirror on all partitions and thus cannot get a dump (swap is on gmirror). Some ddb outputs are below.

Is this system an SMP and/or HTT system?

SMP (2 physical CPUs) with HTT enabled, so there are 4 logical CPUs in total.

If this problem is reproducible, could I ask you to capture the following serial console output from DDB:

Will do.

Would it be possible to add an extra ATA disk to use for swap and capturing a core dump?

It's a bit hard for me to add one :-(
panic: sbflush_locked on 5.4-p5/i386
Hello,

I have a 5.4-p5 system running on i386. I got a panic:

panic: sbflush_locked: cc 0 || mb 0xc33bf000 || mbcnt 4294967040

It is a web server running Apache, plus Postfix as a backup MX. I'm using gmirror on all partitions and thus cannot get a dump (swap is on gmirror). Some ddb output is below. Google told me that

http://lists.freebsd.org/pipermail/freebsd-current/2004-December/044535.html

looks related, but the code path is different. Note that the patch in that mail is already in 5.4. If needed, I can provide the kernel config. I also tuned the following sysctls:

vfs.hirunningspace=2097152
kern.ipc.somaxconn=4096
kern.maxfiles=3
kern.maxfilesperproc=3
net.inet.ip.random_id=1
machdep.hyperthreading_allowed=1

The DDB messages go here:

cpuid = 3
KDB: enter: panic
[thread pid 61 tid 100061 ]
Stopped at kdb_enter+0x2b: nop
db> wh
Tracing pid 61 tid 100061 td 0xc311e180
kdb_enter(c05f3bc6) at kdb_enter+0x2b
panic(c05f6f09,0,c33bf000,ff00,c3a1970c) at panic+0x127
sbflush_locked(c3a1970c,c3a19654,e74aeba4,c04e4cb4,c3a1970c) at sbflush_locked+0x6f
sbrelease_locked(c3a1970c,c3a19654) at sbrelease_locked+0xd
sofree(c3a19654) at sofree+0x26c
in_pcbdetach(c371d870,c3e996f0,c3e996f0,e74aec9c,c05355df) at in_pcbdetach+0xb6
tcp_close(c3e996f0,1,1,1042e,1) at tcp_close+0x16
tcp_input(c4513400,14,1c1e708c,0,0) at tcp_input+0x2297
ip_input(c4513400) at ip_input+0x4f1
netisr_processqueue(c0643298) at netisr_processqueue+0xa3
swi_net(0) at swi_net+0xf2
ithread_loop(c3094c80,e74aed48) at ithread_loop+0x159
fork_exit(c049c138,c3094c80,e74aed48) at fork_exit+0x75
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe74aed7c, ebp = 0 ---
db> ps
61 c311ce200 0 0 204 [CPU 3] swi1: net

Regards,
Rong-En Fan
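One observation on the mbcnt value in the panic string: 4294967040 is 2^32 - 256, so it looks like a 32-bit unsigned counter that was decremented past zero rather than a genuinely huge count. The sketch below just does that arithmetic. The helper name to_s32 is made up, and the MSIZE of 256 bytes is an assumption about this kernel's mbuf size, so the "one mbuf too many" reading is a guess, not a diagnosis.

```python
def to_s32(value):
    """Reinterpret an unsigned 32-bit integer as signed."""
    return value - 2**32 if value >= 2**31 else value


MSIZE = 256  # assumed mbuf size in bytes; not confirmed by the report

mbcnt = 4294967040  # from the sbflush_locked panic message
signed = to_s32(mbcnt)

print(hex(mbcnt))       # unsigned view of the counter
print(signed)           # the same bits read as a signed value
print(signed // MSIZE)  # counter error in units of the assumed MSIZE
```

If that assumption holds, the socket buffer accounting was off by exactly one mbuf when sbflush_locked() checked it.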
Re: IBM xSeries 335 and FreeBSD 5 STABLE. SMP problem
On 7/19/05, Alexander Markov [EMAIL PROTECTED] wrote:
If you unload the kernel and load it again manually at boot, can the 335 boot? I have one 336 with 5.4 that needs this trick to boot; otherwise it hangs after ipfw2 is initialized. On the other hand, I have three 335s installed with 5.4 running SMP smoothly.

Nope, this trick doesn't work for me :-( And btw, do you have an LSI Logic SCSI controller on your 335?

Sure. It is

mpt0: LSILogic 1030 Ultra4 Adapter port 0x2300-0x23ff mem 0xfbfe-0xfbfefff

I have tried it with RAID1 (with the patch, performance is fine), and it can boot with or without the patch. Anyway, I use gmirror now.

I'll try to upgrade the BIOS today, since it seems to be the only difference between my hardware and that of the other people on the list.
Re: IBM xSeries 335 and FreeBSD 5 STABLE. SMP problem
On 7/14/05, Alexander Markov [EMAIL PROTECTED] wrote:
Hello! I've got an IBM xSeries 335 with FreeBSD 5.4 installed, which hangs during boot without a panic. If I boot it with hint.apic.0.disabled=1, everything is OK, except that only one of the two CPUs is detected. I tried kernels with and without SMP and nothing changed: booting with the APIC enabled kills the OS. Please look at the boot log below. It seems that mounting / from the LSILogic 1030 Ultra4 Adapter (da0 over mpt0) results in the hang. Cvsupping to STABLE had no effect :-( If any information or tests are required, I'd be glad to provide them.

Hello,

If you unload the kernel and load it again manually at boot, can the 335 boot? I have one 336 with 5.4 that needs this trick to boot; otherwise it hangs after ipfw2 is initialized. On the other hand, I have three 335s installed with 5.4 running SMP smoothly.

Regards,
Rong-En Fan
5.3-p16/i386 unknown reason console hang
Hello all,

Recently, one of our 5.3-p16/i386 machines has been hanging frequently. Details:

1. I can switch vtys, but can't log in (it hangs after I type the username)
2. It responds to ping, but not to other TCP/UDP services
3. I can break into ddb

It's an IBM X236 (EM64T) with 2 GB of memory, running Postfix/Amavisd/clamav and mail/openwebmail with apache2. Some time ago I also reported a similar hang on 5.4/amd64; in fact, it was the same machine, but at that time I had some non-default NFS mount options. Now the NFS mount options only include -L, nodev, and nosuid. I'm wondering if it is some kind of hardware problem, since similar things happen on 5.4/amd64 and on 5.3/5.4 i386.

Anyway, I have the kernel config, loader.conf, dmesg, and two sets of ddb output (ps, show lockedvn, show threads) at http://rafan.infor.org/tmp/236/.

By the way, one strange thing is that sometimes, when I reboot the machine after a hang, it gets through the *foreground* fsck and enters multiuser, and then hangs again after the login prompt. This time I can't break into ddb; the only solution is a power cycle.

If you need more information, please let me know :-)

Regards,
Rong-En Fan
panic: ffs_blkfree: freeing free block
Hi,

I have seen this kind of panic on 5.3/i386 (-p15) and on 4.x. The machine is an i386 box with an external hardware RAID, shared over NFS. There are 20+ NFS clients in total. When one exported filesystem gets full and some user's program keeps writing data to it (probably along with removing files), sometimes, after lots of "/dev/x is full" messages, I get this panic:

dev = da0s1d, block = 44652432, fs = /export/b1
panic: ffs_blkfree: freeing free block
cpuid = 3
KDB: enter: panic
[thread 100112]
Stopped at kdb_enter+0x30: leave
db> wh
kdb_enter(c066992d,3,c0672f0b,e4ba6ae0,5) at kdb_enter+0x30
panic(c0672f0b,c22b58a8,2a95790,0,c23638d4) at panic+0x13e
ffs_blkfree(c2363800,c23cb318,2a95790,0,4000) at ffs_blkfree+0x3d2
indir_trunc(c2690e00,aa55e20,0,1,80c) at indir_trunc+0x30d
handle_workitem_freeblocks(c2690e00,0,2,6,0) at handle_workitem_freeblocks+0x20e
process_worklist_item(0,0,42935dad,0,0) at process_worklist_item+0x1e1
softdep_process_worklist(0,1e,c1f1a320,0,0) at softdep_process_worklist+0xcc
sched_sync(0,e4ba6d48,0,0,0) at sched_sync+0x5f2
fork_exit(c0541295,0,e4ba6d48) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe4ba6d7c, ebp = 0 ---

The system disk is on ips(4), which does not support dumps in 5.3 (it is supported in 5.4), so there is no dump available. I don't know exactly what kind of access pattern causes this, so I have no idea where to start. I wonder if this can be fixed.

Cheers,
Rong-En Fan
Re: 5.4/amd64 console hang
On 4/16/05, Anders Nordby [EMAIL PROTECTED] wrote:
Hi,

On Fri, Apr 15, 2005 at 03:27:11PM +0800, Rong-En Fan wrote:
I'm using a Pentium Xeon 3.2G * 2 running 5.3/5.4 amd64 RELEASE.

That's a strange combination. Don't use FreeBSD/amd64 with Intel Pentium Xeon processors. Maybe you made a typing error or two? :-)

Those Xeons are EM64T, compatible with x86-64 :-)

By the way, I'm thinking that the more frequent hangs might be related to the large read/write block size in mount_nfs -r/-w (I use 8192; the original is 1024).

Regards,
Rong-En Fan
5.4/amd64 console hang
Hi,

I'm using a Pentium Xeon 3.2G * 2 running 5.3/5.4 amd64 RELEASE. Recently it has been hanging frequently, about 4 times in 3 days. Originally I was running 5.3-RELEASE-p5 or so and the hangs happened there, so I decided to upgrade to 5.4-RC1/RC2 and disable HTT in the BIOS. Somehow, the hangs have become more frequent since the upgrade to 5.4-RC1/RC2.

This is a mail server running Postfix with clamd/amavisd, and apache2 with some webmail applications. All users' home directories (10 in total) are mounted from another NFS server running 5.4-PRERELEASE/i386.

I have some ddb output and the kernel config here:
http://rafan.infor.org/tmp/5.4-hang/
I ran ps, show threads, and show lockedvn.

When the console hangs, the serial console does not respond. On the front console I can use alt+f? to switch vtys and the caps/numlock LEDs work, but the keyboard does not respond otherwise. I can break into ddb.

Any suggestions?

Regards,
Rong-En Fan
Re: 5.3-S (Mar 6) softdep stack backtrace from getdirtybuf()... problem?
On Apr 11, 2005 3:16 AM, Brandon S. Allbery KF8NH [EMAIL PROTECTED] wrote:
I have twice so far had the kernel syslog a stack backtrace with no other information. Inspection of the kernel source, to the best of my limited understanding, suggests that getdirtybuf() was handed a buffer without an associated vnode. Kernel config file and make.conf attached. Should I be concerned? Note that this system is an older 600MHz Athlon with only 256MB RAM, and both times this triggered it was thrashing quite a bit (that's more or less its usual state...).

I saw similar traces on a 5.4-RC1/amd64 box with 9 NFS mounts. I suspect this is an issue with a busy NFS server?

rafan.
Re: xSeries346 and FreeBSD 5.3
On Wed, 9 Mar 2005 12:10:09 +0100, pck [EMAIL PROTECTED] wrote:
Welcome, I've just bought an IBM xSeries346 server. There was no problem installing FreeBSD 5.3, but there is a problem with the LAN card, a BCM5721 (I think; the reseller told me this name). Is there any way to get this card running?

You have to install 5-STABLE, or manually get the following files

src/sys/dev/bge/*
src/sys/dev/mii/miidevs
src/sys/dev/mii/brgphy.c

and recompile the kernel. Without those two mii files, you can only run at 100baseTX.

Regards,
Rong-En Fan
Re: panic: Fatal trap 12: page fault while in kernel mode
On Wed, 16 Feb 2005 15:36:25 -0800 (PST), Doug White [EMAIL PROTECTED] wrote:
On Wed, 16 Feb 2005, Rong-En Fan wrote:
Hello, This is a 5.3-RELEASE-p5/amd64 on IBM X236 (EM64T) with 2GB RAM and a LSI 21320 rmpt(4) running at 160MB/s with a hardware RAID (da0, da1). HTT is enabled. When I run benchmark/blogbench on /da0/ I can *reproduce* this panic again and again: (I'm getting a dump now, let me fsck first) kernel conf and dmesg (boot -v) are at http://rafan.infor.org/tmp/236/

I only have a 2x244 Opteron box, so I'm not sure if this is a problem with KSE or with hyperthreading. I'll try the benchmark anyway and see if I can reproduce it. Looks like I'll need to rebuild first, I'm getting the "exiting from __thread_start" error...

If I use machdep.hlt_logical_cpus=1, I get the same panic. And when I use kgdb to read the kernel dump, I see only "#1 ?? (??)" in the backtrace.

I just reinstalled the system with 5.3-p5, i386. It did not panic and finished the test twice. I'll run it more to see if it panics.

Regards,
Rong-En Fan
panic: Fatal trap 12: page fault while in kernel mode
] irq40:
53 ff007b7712e8 b1a8d0000 0 0 204 [IWAIT] irq39:
52 ff007b7715d0 b1a8e0000 0 0 204 [IWAIT] irq38: ips0
51 ff007b7718b8 b1a8f0000 0 0 204 [IWAIT] irq37:
50 ff007b771ba0 b1a90 0 0 204 [IWAIT] irq36:
49 ff007b7d6000 b1a910000 0 0 204 [IWAIT] irq35:
48 ff007b78 b1a050000 0 0 204 [IWAIT] irq34:
47 ff007b7802e8 b1a060000 0 0 204 [IWAIT] irq33:
46 ff007b7805d0 b1a070000 0 0 204 [IWAIT] irq32:
45 ff007b7808b8 b1a080000 0 0 204 [IWAIT] irq31:
44 ff007b780ba0 b1a090000 0 0 204 [IWAIT] irq30:
43 ff007b783000 b1a460000 0 0 204 [IWAIT] irq29:
42 ff007b7832e8 b1a470000 0 0 204 [IWAIT] irq28:
41 ff007b7835d0 b1a480000 0 0 204 [IWAIT] irq27:
40 ff007b7838b8 b1a490000 0 0 204 [IWAIT] irq26:
39 ff007b783ba0 b1a4a0000 0 0 204 [IWAIT] irq25:
38 ff007b7662e8 b19c0 0 0 204 [IWAIT] irq24:
37 ff007b7665d0 b19c10000 0 0 204 [IWAIT] irq71:
36 ff007b7668b8 b19c20000 0 0 204 [IWAIT] irq70:
35 ff007b766ba0 b19c30000 0 0 204 [IWAIT] irq69:
34 ff007b7b5000 b1a00 0 0 204 [IWAIT] irq68:
33 ff007b7b52e8 b1a010000 0 0 204 [IWAIT] irq67:
32 ff007b7b55d0 b1a020000 0 0 204 [IWAIT] irq66:
31 ff007b7b58b8 b1a030000 0 0 204 [IWAIT] irq65:
30 ff007b7b5ba0 b1a040000 0 0 204 [IWAIT] irq64:
29 ff007b7dc8b8 b199a0000 0 0 204 [IWAIT] irq63:
28 ff007b7dcba0 b199b0000 0 0 204 [IWAIT] irq62:
27 ff007b784000 b199c0000 0 0 204 [IWAIT] irq61:
26 ff007b7842e8 b19bb0000 0 0 204 [IWAIT] irq60:
25 ff007b7845d0 b19bc0000 0 0 204 [IWAIT] irq59:
24 ff007b7848b8 b19bd0000 0 0 204 [IWAIT] irq58:
23 ff007b784ba0 b19be0000 0 0 204 [IWAIT] irq57:
22 ff007b766000 b19bf0000 0 0 204 [IWAIT] irq56:
21 ff007b7d42e8 b19750000 0 0 204 [IWAIT] irq55:
20 ff007b7d45d0 b19760000 0 0 204 [IWAIT] irq54:
19 ff007b7d48b8 b19950000 0 0 204 [IWAIT] irq53: mpt1
18 ff007b7d4ba0 b19960000 0 0 204 [CPU 1] irq52: mpt0
17 ff007b7dc000 b19970000 0 0 204 [IWAIT] irq51:
16 ff007b7dc2e8 b19980000 0 0 204 [IWAIT] irq50:
15 ff007b7dc5d0 b19990000 0 0 204 [IWAIT] irq49:
14 ff007b76d000 b19330000 0 0 204 [IWAIT] irq48:
13 ff007b76d2e8 b1970 0 0 20c [Can run] idle: cpu0
12 ff007b76d5d0 b19710000 0 0 20c [Can run] idle: cpu1
11 ff007b76d8b8 b19720000 0 0 20c [Can run] idle: cpu2
10 ff007b76dba0 b19730000 0 0 20c [Can run] idle: cpu3
1 ff007b7d4000 b19740000 0 1 0004200 [SLPQ wait 0xff007b7d4000][SLP] init
0 8051e580 805f50000 0 0 200 [SLPQ sched 0x8051e580][SLP] swapper

Regards, Rong-En Fan
IBM ServeRAID 7k 5.3
[just for the record]

Hi all, It seems ips(4) doesn't support ServeRAID 7k; however, I just installed 5.3-RELEASE/i386 on an IBM x236 which has a ServeRAID 7k installed. Everything looks fine here (I'm running RAID-5 over 4 HDDs). One small problem is that when an HDD fails, FreeBSD doesn't notice until I reboot and see that the ips state is DEGRADED. The pciconf and dmesg output is listed below:

[EMAIL PROTECTED]:14:0: class=0x010400 card=0x028e1014 chip=0x02509005 rev=0x07 hdr=0x00
    vendor = 'Adaptec Inc'
    class = mass storage
    subclass = RAID
ips0: Adaptec ServeRAID Adapter mem 0xcfffd000-0xcfffdfff irq 38 at device 14.0 on pci3
ips0: Reserved 0x1000 bytes for rid 0x10 type 3 at 0xcfffd000
ips0: [GIANT-LOCKED]
ips0: logical drives: 1
ips0: Logical Drive 0: RAID5 sectors: 430116864, state OK
ipsd0: Logical Drive on ips0
ipsd0: Logical Drive (210018MB)

Regards, Rong-En Fan
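As a quick cross-check of the numbers in the output above, here is a minimal Python sketch. It assumes 512-byte sectors and 1 MB = 2**20 bytes, neither of which is stated in the post, and it assumes the usual pciconf layout where the upper 16 bits of chip= and card= hold the device ID and the lower 16 bits hold the vendor ID. The helper names sectors_to_mb and split_pci_id are made up for illustration.

```python
# Sanity check of the figures in the pciconf/dmesg output.
# Assumptions (not stated in the post): 512-byte sectors, 1 MB = 2**20 bytes.

SECTOR_BYTES = 512  # assumed sector size


def sectors_to_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to whole megabytes (2**20 bytes each)."""
    return sectors * sector_bytes // (1 << 20)


def split_pci_id(value):
    """Split a 32-bit pciconf chip=/card= value into (device, vendor)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


if __name__ == "__main__":
    # Logical Drive 0 reports 430116864 sectors.
    print(sectors_to_mb(430116864))  # 210018, matching "ipsd0: ... (210018MB)"

    # chip=0x02509005 -> device 0x0250, vendor 0x9005
    device, vendor = split_pci_id(0x02509005)
    print(hex(device), hex(vendor))

    # card=0x028e1014 -> subdevice 0x028e, subvendor 0x1014
    subdevice, subvendor = split_pci_id(0x028E1014)
    print(hex(subdevice), hex(subvendor))
```

Under those assumptions the sector count works out to exactly the 210018MB that ipsd0 reports. Vendor ID 0x9005 lines up with the 'Adaptec Inc' string pciconf printed, and subvendor 0x1014 is, as far as I know, IBM's PCI vendor ID, which fits an IBM-branded ServeRAID card.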