Re: 5.4-RC2 freezing - ATA related?
In [EMAIL PROTECTED], [EMAIL PROTECTED] writes:

From: Peter Jeremy [EMAIL PROTECTED]
On Wed, 2005-May-18 06:43:37 -0600, Elliot Finley wrote:

Had the system lock up again. This is with the new ATA mkIII patches on http://people.freebsd.org/~sos/ATA. I didn't get the crashdump (forgot to set dumpdev), but I did get 'ps' and 'show lockedvnods' output from DDB. The output is in the form of screenshots combined into a single .pdf which can be accessed here http://www.efinley.com/Binder1.pdf

That shows a deadlock-to-root in your /dev/ar0s1a (presumably root) filesystem. The perl process (pid 487) has an exclusive lock on the FS mountpoint - this is blocking 130 other processes. Pid 487 is itself waiting on another filesystem lock (you can't determine the actual lock tree without more poking around kernel memory).

The vnode locks are held by processes:

  PID  name        waiting on
  487  perl        [ufs c3c1c1b4]
   57  syncer      [snaplk c535f500] (holds 2 locks)
  476  perl        [ufs c87e4f1c]
  489  perl        [snaplk c535f500] (holds 2 locks)
 3337  mksnap_ffs  [getblk d77656f4]

Looking through the process list, cron has started a dump -L which is trying to create a filesystem snapshot. That has wedged on getblk (trying to perform physical disk I/O) and is probably the root of your problem. Nothing else is waiting on physical I/O. I'd say that your first guess was right: This is a bug in the ATA code and is probably a job for sos.

I took the -L option off of my dump command in my daily dump script. I've gone two days without locking up which is unusual. I think that may be what was tickling the bug that was locking me up.

This is a filesystem lock problem, not an ATA driver problem. I analyzed it, and posted the results to -hackers last week, with the subject "snapshots and innds". The problem is that there is an invariant being broken in msync() -- Kirk describes it fully in his reply to my message.

--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8 / 37N 20' 14.9
Internet: steve @ Watt.COM Whois: SW32
Free time? There's no such thing. It just comes in varying prices...
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 5.4-RC2 freezing - ATA related?
Jamie Heckford wrote:
On Wed, May 18, 2005 at 03:54:59PM -0700, Doug White wrote:
On Wed, 18 May 2005, Jamie Heckford wrote:

Hi Peter,

On Thu, May 19, 2005 at 05:53:12AM +1000, Peter Jeremy wrote:
On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:

Managed to get a dump on our system for a similar prob we are getting:

That traceback looks like a panic, not a deadlock. What was the panic message?

Only have remote access to the box I'm afraid, is there any way I can obtain the panic message?

print msgbuf should do it

Another one... looks completely different :-(

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0 doadump () at pcpu.h:160
160 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt full
#0 doadump () at pcpu.h:160
    No locals.
#1 0xc04fac8a in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
    first_buf_printf = 1
#2 0xc04faf50 in panic (fmt=0xc06c06db "softdep_deallocate_dependencies: dangling deps") at /usr/src/sys/kern/kern_shutdown.c:566
    td = (struct thread *) 0xc357fd80
    bootopt = 260
    newpanic = 1
    ap = 0xc357fd80 \\\214\215ÃðjOÃ
    buf = "softdep_deallocate_dependencies: dangling deps", '\0' repeats 209 times
#3 0xc061cbfe in softdep_deallocate_dependencies (bp=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:5961
    No locals.
#4 0xc053c8f4 in brelse (bp=0xd77932d4) at buf.h:431
    No locals.
#5 0xc054bd24 in flushbuflist (blist=0xd77932d4, flags=0, vp=0xc4bf9630, slpflag=0, slptimeo=0, errorp=0x0) at /usr/src/sys/kern/vfs_subr.c:1101
    bp = (struct buf *) 0xd77932d4
    nbp = (struct buf *) 0xd75948f0
    found = 1
#6 0xc054b987 in vinvalbuf (vp=0xc4bf9630, flags=0, cred=0x0, td=0x0, slpflag=0, slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:987
    blist = (struct buf *) 0x0
    error = 0
    object = 0xc04efc79
#7 0xc054e85c in vclean (vp=0xc4bf9630, flags=8, td=0xc357fd80) at /usr/src/sys/kern/vfs_subr.c:2479
    active = 0
#8 0xc054eeb5 in vgonel (vp=0xc4bf9630, td=0xc357fd80) at /usr/src/sys/kern/vfs_subr.c:2697
    No locals.
#9 0xc054a9f2 in vlrureclaim (mp=0xc35b3c00) at pcpu.h:157
    vp = (struct vnode *) 0xc4bf9630
    done = 0
    trigger = 10
    usevnodes = 0
    count = 7
#10 0xc054ac66 in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:598
    mp = (struct mount *) 0xc35b3c00
    nmp = (struct mount *) 0xc35b3c00
    done = 5887
    p = (struct proc *) 0xc38d8c5c
    td = (struct thread *) 0xc357fd80
#11 0xc04e67e8 in fork_exit (callout=0xc054aa98 vnlru_proc, arg=0x0, frame=0xe68aad38) at /usr/src/sys/kern/kern_fork.c:791
    p = (struct proc *) 0xc38d8c5c
    td = (struct thread *) 0x0
#12 0xc066746c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
    No locals.
(kgdb)

panic: softdep_deallocate_dependencies: dangling deps
Uptime: 10h26m14s
Dumping 2047 MB
 16 32 48 [...] 2016 2032

Would be really grateful if anyone could suggest anything. Again, it appears to happen around the time periodic runs (but it has also happened randomly under load, so I'm not sure whether this is a red herring).

If anyone needs any more info, more than happy to oblige.

Cheers

--
Jamie Heckford
Network Manager
Trident Microsystems Ltd.
t: +44(0)1737-780790
f: +44(0)1737-771908
w: http://www.trident-uk.co.uk/
Re: 5.4-RC2 freezing - ATA related?
Jamie Heckford wrote:

Another one... looks completely different :-(

[kgdb 'bt full' output and dump log quoted unchanged from the previous message, ending in: panic: softdep_deallocate_dependencies: dangling deps, Uptime: 10h26m14s]

Would be really grateful if anyone could suggest anything, again it appears to happen around the time periodic runs (but has happened randomly under load, not sure if this is a red herring tho)

If anyone needs anymore info, more than happy to oblige.

Cheers

Is there any way this could be triggered by a filesystem becoming full?

--
Jamie Heckford
Network Manager
Trident Microsystems Ltd.
t: +44(0)1737-780790
f: +44(0)1737-771908
w: http://www.trident-uk.co.uk/
Re: 5.4-RC2 freezing - ATA related?
From: Søren Schmidt [EMAIL PROTECTED]
On 21/05/2005, at 0:52, Peter Jeremy wrote:
On Fri, 2005-May-20 14:53:09 -0600, Elliot Finley wrote:
From: Peter Jeremy [EMAIL PROTECTED]
On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:

I took the -L option off of my dump command in my daily dump script. I've gone two days without locking up which is unusual. I think that may be what was tickling the bug that was locking me up.

Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just to confirm that you don't have any unreadable blocks (though this seems unlikely).

Came up clean. Transfer went 40MB/s.

That seems to leave the finger pointing at the ATA driver. Paging Søren: Are you able to help Elliot?

No, my only advice is to use the ATA mkIII patches or better yet - current..

I'm already running with the newest ATA mkIII patches. Even with the patches, it freezes up when using the -L option on my daily dump.

Elliot
Re: 5.4-RC2 freezing - ATA related?
Elliot Finley wrote:

This has been happening since 5.3-R, I've been tuning different parameters to no avail. I've taken the disks off of the onboard ICH5 controller and put them on a Promise TX4 S150 controller, but still the same thing happens.

The system freezes, but isn't totally dead. It'll still respond to pings, the screensaver still functions, but it won't respond to a CAD at the console. But if I press 'Enter' at the console, it'll give me a 'login:' prompt, but after entering the username, it never comes back with the 'password:' prompt.

After manually resetting the system it boots and says 'Automatic file system check failed; help!' and drops into single user mode. Running fsck manually corrects errors on all volumes. Then it'll boot from that point.

This seems to be triggered by daily periodic as it happens at 3:02-3:03AM each time. But it doesn't happen *every* morning.

I've had a similar problem with an IBM Thinkpad A21p. The machine would slowly start to lock up until the only thing it would respond to were pings. This would usually occur when the filesystem was under a heavy load (like untarring openoffice). I managed to trace the problem to snapshots that were about 40 days old (I keep old snapshots around for CYA purposes). After deleting the old snapshots, the system functioned perfectly. I've been running it pretty hard now for the last few weeks and it hasn't locked up once.

Whether the snapshots were the cause of the problem or just another symptom I can't really tell, but deleting them definitely cured the problem. Right now I have a filesystem snapshot that's about a week old and it seems to be just fine.

Mark
Re: 5.4-RC2 freezing - ATA related?
On 21/05/2005, at 0:52, Peter Jeremy wrote:
On Fri, 2005-May-20 14:53:09 -0600, Elliot Finley wrote:
From: Peter Jeremy [EMAIL PROTECTED]
On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:

I took the -L option off of my dump command in my daily dump script. I've gone two days without locking up which is unusual. I think that may be what was tickling the bug that was locking me up.

Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just to confirm that you don't have any unreadable blocks (though this seems unlikely).

Came up clean. Transfer went 40MB/s.

That seems to leave the finger pointing at the ATA driver. Paging Søren: Are you able to help Elliot?

No, my only advice is to use the ATA mkIII patches or better yet - current..

- Søren
Re: 5.4-RC2 freezing - ATA related?
* Søren Schmidt ([EMAIL PROTECTED]) wrote:

No, my only advice is to use the ATA mkIII patches or better yet - current..

In a similar vein, I'm seeing the same WRITE_DMA timeouts and system lockups using ATA mkIII patches as I did using the standard RELENG_5 driver, on two separate systems.

I'm getting the WRITE_DMA retries on a multi-gmirror Athlon system using a PCI SATA card; the two PATA drives on the system are fine:

FreeBSD 5.4-STABLE #0: Thu Apr 28 06:31:53 BST 2005
atapci1: SiI 3112 SATA150 controller port 0xcc00-0xcc0f,0xc800-0xc803,0xc400-0xc407,0xc000-0xc003,0xbc00-0xbc07 mem 0xe7062000-0xe70621ff irq 11 at device 12.0 on pci0
ad4: 381554MB ST3400832AS/3.01 [775221/16/63] at ata2-master SATA150
ad6: 381554MB ST3400832AS/3.01 [775221/16/63] at ata3-master SATA150
..
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=401743679
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=781421759

It seems harmless, but results in writes freezing for several seconds every couple of hundred MB (annoying with 360G of storage as you might imagine). It normally favours a single drive, but seems to bounce between ad4 and ad6 for no apparent reason. Replacing the SATA card and cables has no effect. Attempting to drop the drives to PIO with atacontrol doesn't seem to do anything either (they remain at SATA150).

The other system where I see the lockups is an old BP6 (dual Celeron), on two different channels with two different drives. (I used to get READ/WRITE_DMA timeouts with the lockup many moons ago, which seems to have started after a system update, but for the past 6+ months or so I just get the lockup.)

FreeBSD 5.4-STABLE #2: Tue Apr 26 17:59:25 BST 2005
atapci1: HighPoint HPT366 UDMA66 controller port 0xd800-0xd8ff,0xd400-0xd403,0xd000-0xd007 irq 18 at device 19.0 on pci0
atapci2: HighPoint HPT366 UDMA66 controller port 0xe400-0xe4ff,0xe000-0xe003,0xdc00-0xdc07 irq 18 at device 19.1 on pci0
ad4: 76319MB Seagate ST380011A 3.04 at ata2-master UDMA66
ad6: 114473MB Seagate ST3120026A 3.01 at ata3-master UDMA66

Setting these drives to PIO4 resolves the stability problems (which again only occur under heavy disk activity, almost always on writes), but makes the system crawl. I'm planning on migrating it to gmirror, which I expect will make it behave more like the Athlon, but obviously I'd like to be able to use DMA reliably without resorting to RAID-1 everywhere.

Save me Søren!

--
Thomas 'Freaky' Hurst
http://hur.st/
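
To check the impression that the timeouts favour one drive, the retry lines can be tallied per device. A minimal Python sketch, assuming the console log has been saved as text in exactly the format shown in the message above; the names LOG, TIMEOUT and count_timeouts are made up for illustration and are not part of any FreeBSD tool:

```python
import re
from collections import Counter

# Two sample lines copied from the message above; a real log would be longer.
LOG = """\
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=401743679
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=781421759
"""

# Assumed line shape: "<dev>: TIMEOUT - <command> retrying (<n> retries left) LBA=<lba>"
TIMEOUT = re.compile(
    r"^(ad\d+): TIMEOUT - (\w+) retrying \((\d+) retries left\) LBA=(\d+)$",
    re.MULTILINE,
)

def count_timeouts(log_text):
    """Count timeout retries per (device, command) pair."""
    return Counter(
        (dev, cmd) for dev, cmd, _left, _lba in TIMEOUT.findall(log_text)
    )

counts = count_timeouts(LOG)
for (dev, cmd), n in sorted(counts.items()):
    print(f"{dev} {cmd}: {n}")
```

With only the two quoted lines as input, every retry lands on ad4; a longer log would show whether ad6 catches up.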
Re: 5.4-RC2 freezing - ATA related?
On 22/05/2005, at 2:36, Thomas Hurst wrote:
* Søren Schmidt ([EMAIL PROTECTED]) wrote:

No, my only advice is to use the ATA mkIII patches or better yet - current..

In a similar vein, I'm seeing the same WRITE_DMA timeouts and system lockups using ATA mkIII patches as I did using the standard RELENG_5 driver, on two separate systems.

I'm getting the WRITE_DMA retries on a multi-gmirror Athlon system using a PCI SATA card; the two PATA drives on the system are fine:

FreeBSD 5.4-STABLE #0: Thu Apr 28 06:31:53 BST 2005
atapci1: SiI 3112 SATA150 controller port 0xcc00-0xcc0f,0xc800-0xc803,0xc400-0xc407,0xc000-0xc003,0xbc00-0xbc07 mem 0xe7062000-0xe70621ff irq 11 at device 12.0 on pci0
ad4: 381554MB ST3400832AS/3.01 [775221/16/63] at ata2-master SATA150
ad6: 381554MB ST3400832AS/3.01 [775221/16/63] at ata3-master SATA150
..
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=401743679
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=781421759

It seems harmless, but results in writes freezing for several seconds every couple of hundred MB (annoying with 360G of storage as you might imagine). It normally favours a single drive, but seems to bounce between ad4 and ad6 for no apparent reason. Replacing the SATA card and cables has no effect. Attempting to drop the drives to PIO with atacontrol doesn't seem to do anything either (they remain at SATA150).

The other system where I see the lockups is an old BP6 (dual Celeron), on two different channels with two different drives. (I used to get READ/WRITE_DMA timeouts with the lockup many moons ago, which seems to have started after a system update, but for the past 6+ months or so I just get the lockup.)

FreeBSD 5.4-STABLE #2: Tue Apr 26 17:59:25 BST 2005
atapci1: HighPoint HPT366 UDMA66 controller port 0xd800-0xd8ff,0xd400-0xd403,0xd000-0xd007 irq 18 at device 19.0 on pci0
atapci2: HighPoint HPT366 UDMA66 controller port 0xe400-0xe4ff,0xe000-0xe003,0xdc00-0xdc07 irq 18 at device 19.1 on pci0
ad4: 76319MB Seagate ST380011A 3.04 at ata2-master UDMA66
ad6: 114473MB Seagate ST3120026A 3.01 at ata3-master UDMA66

Setting these drives to PIO4 resolves the stability problems (which again only occur under heavy disk activity, almost always on writes), but makes the system crawl. I'm planning on migrating it to gmirror, which I expect will make it behave more like the Athlon, but obviously I'd like to be able to use DMA reliably without resorting to RAID-1 everywhere.

Save me Søren!

You have picked some of the most dreaded HW out there, that's for sure, so I'm not sure I can do that :)

Anyhow, you should try a recent -current, since some of the race/timeout problems that are possible in 5.x have been fixed there.

- Søren
Re: 5.4-RC2 freezing - ATA related?
From: Peter Jeremy [EMAIL PROTECTED]
On Wed, 2005-May-18 06:43:37 -0600, Elliot Finley wrote:

Had the system lock up again. This is with the new ATA mkIII patches on http://people.freebsd.org/~sos/ATA. I didn't get the crashdump (forgot to set dumpdev), but I did get 'ps' and 'show lockedvnods' output from DDB. The output is in the form of screenshots combined into a single .pdf which can be accessed here http://www.efinley.com/Binder1.pdf

That shows a deadlock-to-root in your /dev/ar0s1a (presumably root) filesystem. The perl process (pid 487) has an exclusive lock on the FS mountpoint - this is blocking 130 other processes. Pid 487 is itself waiting on another filesystem lock (you can't determine the actual lock tree without more poking around kernel memory).

The vnode locks are held by processes:

  PID  name        waiting on
  487  perl        [ufs c3c1c1b4]
   57  syncer      [snaplk c535f500] (holds 2 locks)
  476  perl        [ufs c87e4f1c]
  489  perl        [snaplk c535f500] (holds 2 locks)
 3337  mksnap_ffs  [getblk d77656f4]

Looking through the process list, cron has started a dump -L which is trying to create a filesystem snapshot. That has wedged on getblk (trying to perform physical disk I/O) and is probably the root of your problem. Nothing else is waiting on physical I/O. I'd say that your first guess was right: This is a bug in the ATA code and is probably a job for sos.

I took the -L option off of my dump command in my daily dump script. I've gone two days without locking up which is unusual. I think that may be what was tickling the bug that was locking me up.

Thanks for the analysis Peter.

Elliot
Re: 5.4-RC2 freezing - ATA related?
On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:

I took the -L option off of my dump command in my daily dump script. I've gone two days without locking up which is unusual. I think that may be what was tickling the bug that was locking me up.

Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just to confirm that you don't have any unreadable blocks (though this seems unlikely).

--
Peter Jeremy
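
The dd command suggested above reads the whole device sequentially and throws the data away, so any unreadable block shows up as a read error. A rough Python equivalent, offered only as a sketch: the function name scan_readable is made up, and unlike plain dd (which stops at the first error) it records the failing offset and tries to continue with the next block:

```python
def scan_readable(path, block_size=32 * 1024):
    """Read path sequentially in block_size chunks and discard the data.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the byte
    offsets at which a read raised an I/O error.
    """
    bytes_read = 0
    bad_offsets = []
    offset = 0
    # buffering=0 gives raw, unbuffered reads, closer to what dd does.
    with open(path, "rb", buffering=0) as dev:
        while True:
            try:
                chunk = dev.read(block_size)
            except OSError:
                # Remember the failing block and skip past it.
                bad_offsets.append(offset)
                offset += block_size
                dev.seek(offset)
                continue
            if not chunk:
                break  # end of device or file
            bytes_read += len(chunk)
            offset += len(chunk)
    return bytes_read, bad_offsets
```

Calling scan_readable("/dev/ar0") as root would mirror the dd test from the message; an empty bad_offsets list corresponds to the "came up clean" result reported later in the thread.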
Re: 5.4-RC2 freezing - ATA related?
From: Peter Jeremy [EMAIL PROTECTED]
On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:

I took the -L option off of my dump command in my daily dump script. I've gone two days without locking up which is unusual. I think that may be what was tickling the bug that was locking me up.

Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just to confirm that you don't have any unreadable blocks (though this seems unlikely).

Came up clean. Transfer went 40MB/s.
Re: 5.4-RC2 freezing - ATA related?
On Fri, 2005-May-20 14:53:09 -0600, Elliot Finley wrote:
From: Peter Jeremy [EMAIL PROTECTED]
On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:

I took the -L option off of my dump command in my daily dump script. I've gone two days without locking up which is unusual. I think that may be what was tickling the bug that was locking me up.

Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just to confirm that you don't have any unreadable blocks (though this seems unlikely).

Came up clean. Transfer went 40MB/s.

That seems to leave the finger pointing at the ATA driver. Paging Søren: Are you able to help Elliot?

--
Peter Jeremy
Re: 5.4-RC2 freezing - ATA related?
Previously posted trap frame:

#5 0xc0691771 in trap (frame= {tf_fs = -1068433384, tf_es = -989790192, tf_ds = 16, tf_edi = -1066124736, tf_esi = -1066124736, tf_ebp = -323699844, tf_isp = -323699872, tf_ebx = -1007063716, tf_edx = 528, tf_ecx = -1013235680, tf_eax = 307472464, tf_trapno = 12, tf_err = 2, tf_eip = -1067870386, tf_cs = 8, tf_eflags = 66050, tf_esp = -989760240, tf_ss = -1007063716}) at /usr/src/sys/i386/i386/trap.c:425

On Thu, 2005-May-19 00:15:44 +0100, Jamie Heckford wrote:

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x214

That's a NULL pointer somewhere. The trap frame shows %edx is 528 so the code has presumably tried to dereference %edx but it's not clear how %edx wound up with that value.

fault code = supervisor write, page not present
instruction pointer = 0x8:0xc059974e
stack pointer = 0x10:0xecb4bb74
frame pointer = 0x10:0xecb4bb7c

This instruction pointer matches the trap frame but not the traceback you posted. The trap frame gives the stack pointer as 0xC5017510 (which is nonsense) with a nonsense stack segment but the frame pointer matches. Having the frame pointer above the stack pointer is also unusual. It looks like gdb is a bit confused.

You could try:

disassemble 0xc059974e
x/x 0xecb4bb74

Does the instruction either at or immediately before 0xc059974e include [%edx]? What function is it in and can you work out the line number? Does the address reported by the x/x match anything in the backtrace?

--
Peter Jeremy
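
kgdb prints the trap frame registers as signed 32-bit decimals, while the panic message gives addresses in hex, so comparing the two means converting by hand. A minimal Python sketch of that conversion, assuming a 32-bit i386 frame; the helper name u32_hex is made up for illustration:

```python
def u32_hex(value: int) -> str:
    """Reinterpret a signed 32-bit decimal (as kgdb prints it) as unsigned hex."""
    return hex(value & 0xFFFFFFFF)

# Values copied from the trap frame quoted in the message above.
frame = {
    "tf_eip": -1067870386,
    "tf_esp": -989760240,
    "tf_ebp": -323699844,
    "tf_edx": 528,
}

for name, value in frame.items():
    print(f"{name} = {u32_hex(value)}")
```

tf_eip and tf_ebp come out as 0xc059974e and 0xecb4bb7c, the instruction pointer and frame pointer from the panic message, and tf_esp as 0xc5017510, the stack pointer value called nonsense above. tf_edx is 0x210, four bytes below the fault address 0x214, which would be consistent with a write through a small offset from %edx.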
Re: 5.4-RC2 freezing - ATA related?
On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:

The system freezes, but isn't totally dead. It'll still respond to pings, the screensaver still functions, but it won't respond to a CAD at the console. But if I press 'Enter' at the console, it'll give me a 'login:' prompt, but after entering the username, it never comes back with the 'password:' prompt.
...
On my lightly loaded systems, it happens rarely. On my mailserver (fairly heavy disk load), it happens quite frequently.

This could equally be a filesystem deadlock (race-to-root) rather than something in the ATA controller. Do you know if it happens gradually (starts with one or two non-responsive, unkillable processes and gets worse until nothing happens)?

How can I troubleshoot this?

Re-compile the kernel with:

options KDB
options DDB
makeoptions DEBUG=-g

and ensure you have a dumpdev in /etc/rc.conf. When you get a lockup, drop to DDB (Ctrl-Alt-ESC) and run 'show lockedvnods', 'ps' and 'call doadump()'. If you post the output (a serial console will help here) someone might be able to provide more pointers. (The crashdump will help with later debugging).

Had the system lock up again. This is with the new ATA mkIII patches on http://people.freebsd.org/~sos/ATA. I didn't get the crashdump (forgot to set dumpdev), but I did get 'ps' and 'show lockedvnods' output from DDB. The output is in the form of screenshots combined into a single .pdf which can be accessed here http://www.efinley.com/Binder1.pdf

I hope this is helpful, I'll get a crashdump next time (probably tomorrow morning).

Elliot
Re: 5.4-RC2 freezing - ATA related?
On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:

This has been happening since 5.3-R, I've been tuning different parameters to no avail. I've taken the disks off of the onboard ICH5 controller and put them on a Promise TX4 S150 controller, but still the same thing happens.

The system freezes, but isn't totally dead. It'll still respond to pings, the screensaver still functions, but it won't respond to a CAD at the console. But if I press 'Enter' at the console, it'll give me a 'login:' prompt, but after entering the username, it never comes back with the 'password:' prompt.

After manually resetting the system it boots and says 'Automatic file system check failed; help!' and drops into single user mode. Running fsck manually corrects errors on all volumes. Then it'll boot from that point.

This seems to be triggered by daily periodic as it happens at 3:02-3:03AM each time. But it doesn't happen *every* morning.

I suspect a bug in FreeBSD because this mode of failure happens on 3 different machines, all configured similarly.

ASUS P4P800
2G RAM (though the other affected systems only have 1G)
80G Seagate Barracuda SATA drives (one system now on Promise TX4 S150 controller, others on onboard ICH5)

On my lightly loaded systems, it happens rarely. On my mailserver (fairly heavy disk load), it happens quite frequently.

How can I troubleshoot this?

Managed to get a dump on our system for a similar prob we are getting:

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0 doadump () at pcpu.h:160
160 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:160
#1 0xc05131ae in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2 0xc0513474 in panic (fmt=0xc06c3da5 "%s") at /usr/src/sys/kern/kern_shutdown.c:566
#3 0xc0691e18 in trap_fatal (frame=0xecb4bb34, eva=532) at /usr/src/sys/i386/i386/trap.c:817
#4 0xc0691b73 in trap_pfault (frame=0xecb4bb34, usermode=0, eva=532) at /usr/src/sys/i386/i386/trap.c:735
#5 0xc0691771 in trap (frame= {tf_fs = -1068433384, tf_es = -989790192, tf_ds = 16, tf_edi = -1066124736, tf_esi = -1066124736, tf_ebp = -323699844, tf_isp = -323699872, tf_ebx = -1007063716, tf_edx = 528, tf_ecx = -1013235680, tf_eax = 307472464, tf_trapno = 12, tf_err = 2, tf_eip = -1067870386, tf_cs = 8, tf_eflags = 66050, tf_esp = -989760240, tf_ss = -1007063716}) at /usr/src/sys/i386/i386/trap.c:425
#6 0xc068168a in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#7 0xc0510018 in crcopy () at /usr/src/sys/kern/kern_prot.c:1810
#8 0xc0598c77 in in_pcbdetach (inp=0xc0743a40) at /usr/src/sys/netinet/in_pcb.c:720
#9 0xc05b21a6 in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:783
#10 0xc05ae560 in tcp_input (m=0xc3a6a300, off0=20) at /usr/src/sys/netinet/tcp_input.c:2308
#11 0xc05a5aed in ip_input (m=0xc3a6a300) at /usr/src/sys/netinet/ip_input.c:776
#12 0xc0582f13 in netisr_processqueue (ni=0xc0742498) at /usr/src/sys/net/netisr.c:233
#13 0xc058310a in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:346
#14 0xc04ffa79 in ithread_loop (arg=0xc3481600) at /usr/src/sys/kern/kern_intr.c:547
#15 0xc04fed0c in fork_exit (callout=0xc04ff928 ithread_loop, arg=0xc3481600, frame=0xecb4bd38) at /usr/src/sys/kern/kern_fork.c:791
#16 0xc06816ec in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
(kgdb)

Help? ;)

--
Jamie Heckford
Network Manager
Trident Microsystems Ltd.
t: +44(0)1737-780790
f: +44(0)1737-771908
w: http://www.tridentmicrosystems.co.uk/
Re: 5.4-RC2 freezing - ATA related?
On Wednesday 18 May 2005 17:03, Jamie Heckford wrote:
On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:

This has been happening since 5.3-R, I've been tuning different parameters to no avail. I've taken the disks off of the onboard ICH5 controller and put them on a Promise TX4 S150 controller, but still the same thing happens.

The system freezes, but isn't totally dead. It'll still respond to pings, the screensaver still functions, but it won't respond to a CAD at the console. But if I press 'Enter' at the console, it'll give me a 'login:' prompt, but after entering the username, it never comes back with the 'password:' prompt.

After manually resetting the system it boots and says 'Automatic file system check failed; help!' and drops into single user mode. Running fsck manually corrects errors on all volumes. Then it'll boot from that point.

This seems to be triggered by daily periodic as it happens at 3:02-3:03AM each time. But it doesn't happen *every* morning.

I suspect a bug in FreeBSD because this mode of failure happens on 3 different machines, all configured similarly.

ASUS P4P800
2G RAM (though the other affected systems only have 1G)
80G Seagate Barracuda SATA drives (one system now on Promise TX4 S150 controller, others on onboard ICH5)

On my lightly loaded systems, it happens rarely. On my mailserver (fairly heavy disk load), it happens quite frequently.

How can I troubleshoot this?

Help? ;)

There is a bug in machine/bus.h (was: machine/bus_at386.h) that might cause random freezes, but I'm not sure if it is related:

http://www.freebsd.org/cgi/query-pr.cgi?pr=80980

--HPS
Re: 5.4-RC2 freezing - ATA related?
On Wed, 2005-May-18 06:43:37 -0600, Elliot Finley wrote:
> Had the system lock up again. This is with the new ATA mkIII patches on http://people.freebsd.org/~sos/ATA. I didn't get the crashdump (forgot to set dumpdev), but I did get 'ps' and 'show lockedvnods' output from DDB. The output is in the form of screenshots combined into a single .pdf which can be accessed here http://www.efinley.com/Binder1.pdf

That shows a deadlock-to-root in your /dev/ar0s1a (presumably root) filesystem. The perl process (pid 487) has an exclusive lock on the FS mountpoint - this is blocking 130 other processes. Pid 487 is itself waiting on another filesystem lock (you can't determine the actual lock tree without more poking around kernel memory). The vnode locks are held by processes:

   PID  name        waiting on
   487  perl        [ufs c3c1c1b4]
    57  syncer      [snaplk c535f500] (holds 2 locks)
   476  perl        [ufs c87e4f1c]
   489  perl        [snaplk c535f500] (holds 2 locks)
  3337  mksnap_ffs  [getblk d77656f4]

Looking through the process list, cron has started a dump -L which is trying to create a filesystem snapshot. That has wedged on getblk (trying to perform physical disk I/O) and is probably the root of your problem. Nothing else is waiting on physical I/O.

I'd say that your first guess was right: This is a bug in the ATA code and is probably a job for sos.

--
Peter Jeremy
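The reading of the wait list in the message above can be reproduced mechanically. The following is only a minimal sketch, assuming the five rows are transcribed by hand from the 'show lockedvnods' output; the names WAITERS, group_by_lock and physical_io_waiters are made up for illustration and are not part of DDB or any FreeBSD tool. It groups processes by the lock they wait on and picks out the ones waiting in getblk, which is the state the analysis treats as physical disk I/O.

```python
from collections import defaultdict

# Rows transcribed by hand from the list in the message:
# (pid, process name, wait message, lock address).
WAITERS = [
    (487, "perl", "ufs", "c3c1c1b4"),
    (57, "syncer", "snaplk", "c535f500"),
    (476, "perl", "ufs", "c87e4f1c"),
    (489, "perl", "snaplk", "c535f500"),
    (3337, "mksnap_ffs", "getblk", "d77656f4"),
]


def group_by_lock(rows):
    """Map (wait message, address) to the list of (pid, name) waiting on it."""
    groups = defaultdict(list)
    for pid, name, kind, addr in rows:
        groups[(kind, addr)].append((pid, name))
    return dict(groups)


def physical_io_waiters(rows):
    """Return the processes whose wait message is getblk."""
    return [(pid, name) for pid, name, kind, _ in rows if kind == "getblk"]


if __name__ == "__main__":
    for (kind, addr), procs in group_by_lock(WAITERS).items():
        print(kind, addr, procs)
    print("waiting in getblk:", physical_io_waiters(WAITERS))
```

Grouping makes it visible that syncer and perl (pid 489) wait on the same snaplk address, while mksnap_ffs is the only process in getblk. It does not recover the real lock tree; as the message says, that needs more digging in kernel memory.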
Re: 5.4-RC2 freezing - ATA related?
On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:
> Managed to get a dump on our system for a similar prob we are getting:

That traceback looks like a panic, not a deadlock. What was the panic message?

> #2 0xc0513474 in panic (fmt=0xc06c3da5 "%s") at /usr/src/sys/kern/kern_shutdown.c:566
> ...
> #7 0xc0510018 in crcopy () at /usr/src/sys/kern/kern_prot.c:1810
> #8 0xc0598c77 in in_pcbdetach (inp=0xc0743a40) at /usr/src/sys/netinet/in_pcb.c:720
> #9 0xc05b21a6 in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:783

There's something wrong here: If tcp_close() is passed NULL it will panic at this point when it tries to dereference tp.

> #10 0xc05ae560 in tcp_input (m=0xc3a6a300, off0=20) at /usr/src/sys/netinet/tcp_input.c:2308
> #11 0xc05a5aed in ip_input (m=0xc3a6a300) at /usr/src/sys/netinet/ip_input.c:776
> #12 0xc0582f13 in netisr_processqueue (ni=0xc0742498) at /usr/src/sys/net/netisr.c:233
> #13 0xc058310a in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:346

--
Peter Jeremy
Re: 5.4-RC2 freezing - ATA related?
Hi Peter,

On Thu, May 19, 2005 at 05:53:12AM +1000, Peter Jeremy wrote:
> On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:
> > Managed to get a dump on our system for a similar prob we are getting:
>
> That traceback looks like a panic, not a deadlock. What was the panic message?

I only have remote access to the box, I'm afraid. Is there any way I can obtain the panic message?

> > #2 0xc0513474 in panic (fmt=0xc06c3da5 "%s") at /usr/src/sys/kern/kern_shutdown.c:566
> > ...
> > #7 0xc0510018 in crcopy () at /usr/src/sys/kern/kern_prot.c:1810
> > #8 0xc0598c77 in in_pcbdetach (inp=0xc0743a40) at /usr/src/sys/netinet/in_pcb.c:720
> > #9 0xc05b21a6 in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:783
>
> There's something wrong here: If tcp_close() is passed NULL it will panic at this point when it tries to dereference tp.

Starting to stretch my knowledge a bit now ;)

If I can provide you with further debug output would you be able to give me some pointers?

Thanks for your help

--
Jamie Heckford
Network Manager
Trident Microsystems Ltd.
t: +44(0)1737-780790 f: +44(0)1737-771908
w: http://www.tridentmicrosystems.co.uk/
Re: 5.4-RC2 freezing - ATA related?
On Wed, 18 May 2005, Jamie Heckford wrote:
> Hi Peter,
>
> On Thu, May 19, 2005 at 05:53:12AM +1000, Peter Jeremy wrote:
> > On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:
> > > Managed to get a dump on our system for a similar prob we are getting:
> >
> > That traceback looks like a panic, not a deadlock. What was the panic message?
>
> I only have remote access to the box, I'm afraid. Is there any way I can obtain the panic message?

print msgbuf should do it

> > > #2 0xc0513474 in panic (fmt=0xc06c3da5 "%s") at /usr/src/sys/kern/kern_shutdown.c:566
> > > ...
> > > #7 0xc0510018 in crcopy () at /usr/src/sys/kern/kern_prot.c:1810
> > > #8 0xc0598c77 in in_pcbdetach (inp=0xc0743a40) at /usr/src/sys/netinet/in_pcb.c:720
> > > #9 0xc05b21a6 in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:783
> >
> > There's something wrong here: If tcp_close() is passed NULL it will panic at this point when it tries to dereference tp.
>
> Starting to stretch my knowledge a bit now ;)
>
> If I can provide you with further debug output would you be able to give me some pointers?
>
> Thanks for your help

--
Doug White                    | FreeBSD: The Power to Serve
[EMAIL PROTECTED]             | www.FreeBSD.org
Re: 5.4-RC2 freezing - ATA related?
On Wed, May 18, 2005 at 03:54:59PM -0700, Doug White wrote:
> On Wed, 18 May 2005, Jamie Heckford wrote:
> > Hi Peter,
> >
> > On Thu, May 19, 2005 at 05:53:12AM +1000, Peter Jeremy wrote:
> > > On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:
> > > > Managed to get a dump on our system for a similar prob we are getting:
> > >
> > > That traceback looks like a panic, not a deadlock. What was the panic message?
> >
> > I only have remote access to the box, I'm afraid. Is there any way I can obtain the panic message?
>
> print msgbuf should do it

(kgdb) printf "%s", (char *)msgbufp->msg_ptr

<snip>

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x214
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc059974e
stack pointer           = 0x10:0xecb4bb74
frame pointer           = 0x10:0xecb4bb7c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 59 (swi1: net)
trap number             = 12
panic: page fault
Uptime: 2h19m27s
Dumping 2047 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256
 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512
 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768
 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008 1024
 1040 1056 1072 1088 1104 1120 1136 1152 1168 1184 1200 1216 1232 1248 1264 1280
 1296 1312 1328 1344 1360 1376 1392 1408 1424 1440 1456 1472 1488 1504 1520 1536
 1552 1568 1584 1600 1616 1632 1648 1664 1680 1696 1712 1728 1744 1760 1776 1792
 1808 1824 1840 1856 1872 1888 1904 1920 1936 1952 1968 1984 2000 2016 2032
(kgdb)

Thanks :)

--
Jamie Heckford
Network Manager
Trident Microsystems Ltd.
t: +44(0)1737-780790 f: +44(0)1737-771908
w: http://www.tridentmicrosystems.co.uk/
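For anyone triaging a report like the one above, the "name = value" lines of the trap message can be pulled into a dictionary and the fault address checked. This is a rough sketch only: parse_trap and looks_like_null_deref are hypothetical helper names, the text is an excerpt of the message above, and the one-page cut-off is an assumption chosen for illustration. A fault address within the first page is commonly what a member access through a NULL pointer looks like, which would be consistent with the tp=0x0 seen in the traceback, but it proves nothing on its own.

```python
import re

# Excerpt of the trap message quoted above.
TRAP_TEXT = """\
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x214
fault code = supervisor write, page not present
instruction pointer = 0x8:0xc059974e
current process = 59 (swi1: net)
trap number = 12
"""

# i386 page size; used here as an assumed cut-off for "close to NULL".
PAGE_SIZE = 4096


def parse_trap(text):
    """Collect the 'name = value' lines of a trap message into a dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([a-z][a-z ]*?)\s*=\s*(.+)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def looks_like_null_deref(fields):
    """True if the fault address lies inside the first page of memory."""
    return int(fields["fault virtual address"], 16) < PAGE_SIZE


if __name__ == "__main__":
    trap = parse_trap(TRAP_TEXT)
    print(trap["current process"], looks_like_null_deref(trap))
```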
Re: 5.4-RC2 freezing - ATA related?
On Tue, 17 May 2005 [EMAIL PROTECTED] wrote:
> Date: Mon, 16 May 2005 06:40:01 -0600
> From: Elliot Finley [EMAIL PROTECTED]
> Subject: 5.4-RC2 freezing - ATA related?
> To: freebsd-stable@freebsd.org
> Cc: [EMAIL PROTECTED]
> Message-ID: [EMAIL PROTECTED]
> Content-Type: text/plain; charset=iso-8859-1
>
> This has been happening since 5.3-R; I've been tuning different parameters to no avail. I've taken the disks off of the onboard ICH5 controller and put them on a Promise TX4 S150 controller, but still the same thing happens.
>
> The system freezes, but isn't totally dead. It'll still respond to pings, the screensaver still functions, but it won't respond to a CAD at the console. But if I press 'Enter' at the console, it'll give me a 'login:' prompt, but after entering the username, it never comes back with the 'password:' prompt.
>
> After manually resetting the system it boots and says 'Automatic file system check failed; help!' and drops into single user mode. Running fsck manually corrects errors on all volumes. Then it'll boot from that point.
>
> This seems to be triggered by daily periodic as it happens at 3:02-3:03AM each time. But it doesn't happen *every* morning.
>
> I suspect a bug in FreeBSD because this mode of failure happens on 3 different machines, all configured similarly.

You can add a fourth. Ever since 5.1 (my first 5.x install) I have experienced the same problem, again with an Intel ICH5 ATA controller. The symptoms are exactly the same -- the hang is normally triggered during the periodic runs just after 3AM. The hang does occur at other times as well, but with nowhere near the same consistency.

The only solution I found at that time was reverting to 4.10, though that is obviously suboptimal. I could be persuaded to reinstall 5.x on the machine if I'd be sure to get someone to look into this.

Thanks,
Brent Casavant

--
Brent Casavant http://www.angeltread.org/
KD5EMB -.- -.. . . -- -...
44 54'24N 93 03'21W 907FASL EN34lv
Re: 5.4-RC2 freezing - ATA related?
On Tuesday 17 May 2005 15:58, Brent Casavant wrote:
<snip>
> You can add a fourth. Ever since 5.1 (my first 5.x install) I have experienced the same problem, again with an Intel ICH5 ATA controller. The symptoms are exactly the same -- the hang is normally triggered during the periodic runs just after 3AM. The hang does occur at other times as well, but with nowhere near the same consistency.

I've got four machines with ICH5/6 chips in, no stability problems whatsoever, and that's been the case since I installed them, around 5.2.1. Perhaps it is something to do with your workload, or another piece of hardware in the system. The machines do a lot of disc I/O during their working days.

> The only solution I found at that time was reverting to 4.10, though that is obviously suboptimal. I could be persuaded to reinstall 5.x on the machine if I'd be sure to get someone to look into this.
>
> Thanks,
> Brent Casavant

HTH,

--
Dominic
GoodforBusiness.co.uk
I.T. Services for SMEs in the UK.
Re: 5.4-RC2 freezing - ATA related?
On Tue, 17 May 2005, Dominic Marks wrote:
> I've got four machines with ICH5/6 chips in, no stability problems whatsoever, and that's been the case since I installed them, around 5.2.1. Perhaps it is something to do with your workload, or another piece of hardware in the system. The machines do a lot of disc I/O during their working days.

Not so much so in this case. It was being used only as a workstation, completely idle when I wasn't sitting in front of it. While it would occasionally lock up while in active use, that was relatively rare compared to the frequent hangs just after 3AM.

I do agree it could be some other piece of hardware, but the fact that at least two of us have identical problems, often triggered by the same event (nightly periodic runs), starts to point in the direction of a software bug.

I can't say for certain that downgrading to 4.x solved the problem, as I needed to do that install on a SCSI drive instead, in order to preserve the contents of the filesystem on the ATA drive until I could transfer everything over.

If I remember correctly (forgive me, this was about a year ago), I ran a number of disk performance benchmarks and other general stress tests on the ATA drive, and never was able to manually trigger the hang.

Anecdotally, a friend of mine who recently tried 5.3 ran into frequent filesystem/drive problems as well, which reverting to 4.x solved. However it didn't sound like exactly the same problem.

Brent

--
Brent Casavant http://www.angeltread.org/
KD5EMB -.- -.. . . -- -...
44 54'24N 93 03'21W 907FASL EN34lv
Re: 5.4-RC2 freezing - ATA related?
On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:
> This has been happening since 5.3-R; I've been tuning different parameters to no avail. I've taken the disks off of the onboard ICH5 controller and put them on a Promise TX4 S150 controller, but still the same thing happens.
>
> The system freezes, but isn't totally dead. It'll still respond to pings, the screensaver still functions, but it won't respond to a CAD at the console. But if I press 'Enter' at the console, it'll give me a 'login:' prompt, but after entering the username, it never comes back with the 'password:' prompt.
>
> After manually resetting the system it boots and says 'Automatic file system check failed; help!' and drops into single user mode. Running fsck manually corrects errors on all volumes. Then it'll boot from that point.
>
> This seems to be triggered by daily periodic as it happens at 3:02-3:03AM each time. But it doesn't happen *every* morning.
>
> I suspect a bug in FreeBSD because this mode of failure happens on 3 different machines, all configured similarly.

We are having similar problems to this on a box. I won't go into great detail at the moment, but will post results when we have finished testing.

--
Jamie Heckford
Network Manager
Trident Microsystems Ltd.
t: +44(0)1737-780790 f: +44(0)1737-771908
w: http://www.tridentmicrosystems.co.uk/
Re: 5.4-RC2 freezing - ATA related?
On Tue, 2005-May-17 09:58:33 -0500, Brent Casavant wrote:
> The only solution I found at that time was reverting to 4.10, though that is obviously suboptimal. I could be persuaded to reinstall 5.x on the machine if I'd be sure to get someone to look into this.

It doesn't work that way. You are going to need to provide much more information and do some of the work yourself. I'd suggest that you:

1) Install 5.4-RELEASE (or -STABLE), including a kernel built with debugging and DDB enabled (see my previous post and/or the handbook).
2) Confirm that the problem still exists for you.
3) Since you think it's the daily tasks, run "periodic daily" manually and try to provoke the problem.
4) Once you can provoke it, run the scripts in /etc/periodic/daily individually to identify which script is the problem. Try to narrow it down to a single command within the script.
5) Once you can identify a command (or command sequence) that provokes the problem, save a crashdump and send your dmesg, the sequence of commands you ran, as well as the DDB output from "show lockedvnods" and "ps" to this list.

That's enough information for someone to make a start on investigating the problem.

If that's all too hard and you want a fix, see http://www.freebsd.org/commercial/consult_bycat.html

--
Peter Jeremy
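Step 4 of the list above (running the daily scripts one at a time) could be automated along the following lines. This is a hedged sketch, not an existing FreeBSD tool: run_one_by_one and the log path are invented for illustration, and it assumes the scripts in the directory are directly executable. Each script name is written to the log and forced to disk before the script starts, so after a hang and reset the last START line without a matching END line names the suspect.

```python
import os
import subprocess
import sys


def run_one_by_one(script_dir, log_path):
    """Run each executable file in script_dir in name order.

    A START line is written and synced before each script runs and an END
    line afterwards. Returns a list of (script name, exit status).
    """
    results = []
    with open(log_path, "a") as log:
        for name in sorted(os.listdir(script_dir)):
            path = os.path.join(script_dir, name)
            if not (os.path.isfile(path) and os.access(path, os.X_OK)):
                continue  # skip directories and non-executable files
            log.write("START %s\n" % path)
            log.flush()
            os.fsync(log.fileno())  # make sure the marker survives a reset
            status = subprocess.call([path])
            log.write("END %s exit=%d\n" % (path, status))
            log.flush()
            os.fsync(log.fileno())
            results.append((name, status))
    return results


if __name__ == "__main__":
    # /etc/periodic/daily is the directory named in the message; the log
    # location is an assumption and is best kept on a different filesystem.
    directory = sys.argv[1] if len(sys.argv) > 1 else "/etc/periodic/daily"
    logfile = sys.argv[2] if len(sys.argv) > 2 else "periodic-trace.log"
    for script, status in run_one_by_one(directory, logfile):
        print(script, status)
```

If the hang is a filesystem deadlock, a log on the affected filesystem may itself be lost or block, so writing it to another disk, or simply watching the output on a serial console, is likely the safer choice.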
Re: 5.4-RC2 freezing - ATA related?
On Wed, 18 May 2005, Peter Jeremy wrote:
> On Tue, 2005-May-17 09:58:33 -0500, Brent Casavant wrote:
> > The only solution I found at that time was reverting to 4.10, though that is obviously suboptimal. I could be persuaded to reinstall 5.x on the machine if I'd be sure to get someone to look into this.
>
> It doesn't work that way. You are going to need to provide much more information and do some of the work yourself.

Oh certainly. Didn't mean to imply otherwise. I simply meant that I'm not qualified to look into IDE or filesystem related kernel code (the two most likely culprits), but if someone else was willing to do so I'd be happy to do any sort of testing they requested.

Sorry for muddling my original statement.

Brent

--
Brent Casavant http://www.angeltread.org/
KD5EMB -.- -.. . . -- -...
44 54'24N 93 03'21W 907FASL EN34lv
5.4-RC2 freezing - ATA related?
This has been happening since 5.3-R; I've been tuning different parameters to no avail. I've taken the disks off of the onboard ICH5 controller and put them on a Promise TX4 S150 controller, but still the same thing happens.

The system freezes, but isn't totally dead. It'll still respond to pings, the screensaver still functions, but it won't respond to a CAD (Ctrl-Alt-Del) at the console. But if I press 'Enter' at the console, it'll give me a 'login:' prompt, but after entering the username, it never comes back with the 'password:' prompt.

After manually resetting the system it boots and says 'Automatic file system check failed; help!' and drops into single user mode. Running fsck manually corrects errors on all volumes. Then it'll boot from that point.

This seems to be triggered by daily periodic as it happens at 3:02-3:03AM each time. But it doesn't happen *every* morning.

I suspect a bug in FreeBSD because this mode of failure happens on 3 different machines, all configured similarly.

ASUS P4P800
2G RAM (though the other affected systems only have 1G)
80G Seagate Barracuda SATA drives (one system now on Promise TX4 S150 controller, others on onboard ICH5)

On my lightly loaded systems, it happens rarely. On my mailserver (fairly heavy disk load), it happens quite frequently.

How can I troubleshoot this?

dmesg follows:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RC2 #2: Wed Apr 13 17:35:20 MDT 2005
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/Postmaster
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.60GHz (2605.92-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory = 2146631680 (2047 MB)
avail memory = 2095153152 (1998 MB)
ACPI APIC Table: A M I OEMAPIC
ioapic0 Version 2.0 irqs 0-23 on motherboard
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: A M I OEMXSDT on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
cpu0: ACPI CPU on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
agp0: Intel 82865 host to AGP bridge mem 0xf800-0xfbff at device 0.0 on pci0
pcib1: ACPI PCI-PCI bridge at device 1.0 on pci0
pci1: ACPI PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
uhci0: Intel 82801EB (ICH5) USB controller USB-A port 0xef00-0xef1f irq 16 at device 29.0 on pci0
usb0: Intel 82801EB (ICH5) USB controller USB-A on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: Intel 82801EB (ICH5) USB controller USB-B port 0xef20-0xef3f irq 19 at device 29.1 on pci0
usb1: Intel 82801EB (ICH5) USB controller USB-B on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: Intel 82801EB (ICH5) USB controller USB-C port 0xef40-0xef5f irq 18 at device 29.2 on pci0
usb2: Intel 82801EB (ICH5) USB controller USB-C on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: Intel 82801EB (ICH5) USB controller USB-D port 0xef80-0xef9f irq 16 at device 29.3 on pci0
usb3: Intel 82801EB (ICH5) USB controller USB-D on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: serial bus, USB at device 29.7 (no driver attached)
pcib2: ACPI PCI-PCI bridge at device 30.0 on pci0
pci2: ACPI PCI bus on pcib2
skc0: 3Com 3C940 Gigabit Ethernet port 0xd800-0xd8ff mem 0xfeafc000-0xfeaf irq 22 at device 5.0 on pci2
skc0: 3Com Gigabit LOM (3C940) rev. (0x1)
sk0: Marvell Semiconductor, Inc. Yukon on skc0
sk0: Ethernet address: 00:0c:6e:54:4b:19
miibus0: MII bus on sk0
e1000phy0: Marvell 88E1000 Gigabit PHY on miibus0
e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
atapci0: Promise PDC20319 SATA150 controller port 0xdc00-0xdc7f,0xdfa0-0xdfaf,0xdf00-0xdf3f mem 0xfeac-0xfead,0xfeafb000-0xfeafbfff irq 21 at device 9.0 on pci2
atapci0: failed: rid 0x20 is memory, requested 4
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
ata4: channel #2 on atapci0
ata5: channel #3 on atapci0
xl0: 3Com 3c905C-TX Fast Etherlink XL port 0xd480-0xd4ff mem 0xfeaf9c00-0xfeaf9c7f irq 20 at device 12.0 on pci2
miibus1: MII bus on xl0
ukphy0: Generic IEEE 802.3u media interface on miibus1
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:04:75:f1:1c:7e
isab0:
Re: 5.4-RC2 freezing - ATA related?
Elliot Finley wrote:
> This has been happening since 5.3-R; I've been tuning different parameters to no avail. I've taken the disks off of the onboard ICH5 controller and put them on a Promise TX4 S150 controller, but still the same thing happens.
>
> The system freezes, but isn't totally dead. It'll still respond to pings, the screensaver still functions, but it won't respond to a CAD at the console. But if I press 'Enter' at the console, it'll give me a 'login:' prompt, but after entering the username, it never comes back with the 'password:' prompt.
>
> After manually resetting the system it boots and says 'Automatic file system check failed; help!' and drops into single user mode. Running fsck manually corrects errors on all volumes. Then it'll boot from that point.
>
> This seems to be triggered by daily periodic as it happens at 3:02-3:03AM each time. But it doesn't happen *every* morning.

Hmm, sounds like a deadlock somewhere. On the ATA part, try the ATA mkIII patches on http://people.freebsd.org/~sos/ATA and see if that changes anything.

--
-Søren
Re: 5.4-RC2 freezing - ATA related?
On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:
> The system freezes, but isn't totally dead. It'll still respond to pings, the screensaver still functions, but it won't respond to a CAD at the console. But if I press 'Enter' at the console, it'll give me a 'login:' prompt, but after entering the username, it never comes back with the 'password:' prompt.
> ...
> On my lightly loaded systems, it happens rarely. On my mailserver (fairly heavy disk load), it happens quite frequently.

This could equally be a filesystem deadlock (race-to-root) rather than something in the ATA controller. Do you know if it happens gradually (starts with one or two non-responsive, unkillable processes and gets worse until nothing happens)?

> How can I troubleshoot this?

Re-compile the kernel with:

    options KDB
    options DDB
    makeoptions DEBUG=-g

and ensure you have a dumpdev in /etc/rc.conf. When you get a lockup, drop to DDB (Ctrl-Alt-ESC) and run "show lockedvnods", "ps" and "call doadump()". If you post the output (a serial console will help here) someone might be able to provide more pointers. (The crashdump will help with later debugging.)

Note: If you don't have another FreeBSD system handy, a hard copy of ddb(4) will be very handy if you want to play around in DDB.

--
Peter
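Since a crashdump was later missed in this thread because dumpdev had not been set, it may be worth checking the prerequisites from the message above before waiting for the next lockup. The sketch below is an illustration under stated assumptions, not an official tool: missing_kernel_options and has_dumpdev are hypothetical helpers that work on the text of a kernel configuration file and of rc.conf, and treating dumpdev="NO" as "not configured" is an assumption about the defaults of this release.

```python
# The three lines the message asks for, as (keyword, argument) pairs.
REQUIRED_KERNEL_LINES = [
    ("options", "KDB"),
    ("options", "DDB"),
    ("makeoptions", "DEBUG=-g"),
]


def _tokens(line):
    """Split a config line into words, ignoring anything after '#'."""
    return line.split("#", 1)[0].split()


def missing_kernel_options(config_text):
    """Return the required (keyword, argument) pairs absent from the config."""
    present = set()
    for line in config_text.splitlines():
        words = _tokens(line)
        if len(words) >= 2:
            present.add((words[0], words[1]))
    return [pair for pair in REQUIRED_KERNEL_LINES if pair not in present]


def has_dumpdev(rc_conf_text):
    """True if the last dumpdev= assignment names something other than NO."""
    value = ""
    for line in rc_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("dumpdev="):
            value = line[len("dumpdev="):].strip().strip("\"'")
    return value != "" and value.upper() != "NO"


if __name__ == "__main__":
    # Sample inputs; the device name below is a placeholder, not taken
    # from the thread.
    kernel_config = "options KDB\noptions DDB\n"
    rc_conf = 'dumpdev="/dev/ad0s1b"\n'
    print("missing:", missing_kernel_options(kernel_config))
    print("dumpdev set:", has_dumpdev(rc_conf))
```

In practice the two strings would be read from the kernel configuration file that was actually built and from /etc/rc.conf on the affected machine.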