Re: 5.4-RC2 freezing - ATA related?

2005-05-31 Thread Steve Watt
In [EMAIL PROTECTED], [EMAIL PROTECTED] writes:
From: Peter Jeremy [EMAIL PROTECTED]
 On Wed, 2005-May-18 06:43:37 -0600, Elliot Finley wrote:
 Had the system lock up again.  This is with the new ATA mkIII patches on
 http://people.freebsd.org/~sos/ATA.
 
 I didn't get the crashdump (forgot to set dumpdev), but I did get 'ps' and
 'show lockedvnods' output from DDB.  The output is in the form of
 screenshots combined into a single .pdf which can be accessed here
 http://www.efinley.com/Binder1.pdf

 That shows a deadlock-to-root in your /dev/ar0s1a (presumably root)
 filesystem.  The perl process (pid 487) has an exclusive lock on
 the FS mountpoint - this is blocking 130 other processes.  Pid 487
 is itself waiting on another filesystem lock (you can't determine
 the actual lock tree without more poking around kernel memory).

 The vnode locks are held by processes:
   PID  name        waiting on
   487  perl        [ufs c3c1c1b4]
    57  syncer      [snaplk c535f500]  (holds 2 locks)
   476  perl        [ufs c87e4f1c]
   489  perl        [snaplk c535f500]  (holds 2 locks)
  3337  mksnap_ffs  [getblk d77656f4]

 Looking through the process list, cron has started a dump -L which
 is trying to create a filesystem snapshot.  That has wedged on
 getblk (trying to perform physical disk I/O) and is probably the
 root of your problem.  Nothing else is waiting on physical I/O.

 I'd say that your first guess was right:  This is a bug in the ATA
 code and is probably a job for sos.
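
The reasoning in the quoted analysis (four lock waiters, one process stuck in getblk) can be reproduced with a small sketch. The table data is copied from the message; the split into "lock waits" and "other waits" is an assumption made for illustration, following Peter's reading that getblk is the I/O-side wait. It does not settle whether the driver or the filesystem is at fault.

```python
# Wait-channel table as transcribed in the quoted message.
WAITERS = [
    (487, "perl", "ufs", "c3c1c1b4"),
    (57, "syncer", "snaplk", "c535f500"),
    (476, "perl", "ufs", "c87e4f1c"),
    (489, "perl", "snaplk", "c535f500"),
    (3337, "mksnap_ffs", "getblk", "d77656f4"),
]

# Assumption for illustration: "ufs" and "snaplk" are vnode/snapshot lock
# waits; anything else (here only "getblk") is treated as a buffer wait.
LOCK_WAITS = {"ufs", "snaplk"}


def split_waiters(rows):
    """Separate processes blocked on locks from those blocked elsewhere."""
    lock_waiters = [r for r in rows if r[2] in LOCK_WAITS]
    other_waiters = [r for r in rows if r[2] not in LOCK_WAITS]
    return lock_waiters, other_waiters


lock_waiters, other_waiters = split_waiters(WAITERS)
for pid, name, chan, addr in other_waiters:
    print(f"pid {pid} ({name}) is not waiting on a lock: {chan} {addr}")
```

Only mksnap_ffs falls outside the lock waiters, which is why it was singled out as the likely root of the wedge.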

I took the -L option off of my dump command in my daily dump script.  I've
gone two days without locking up which is unusual.  I think that may be what
was tickling the bug that was locking me up.

This is a filesystem lock problem, not an ATA driver problem.  I analyzed
it, and posted the results to -hackers last week, with the subject "snapshots
and innds".

The problem is that there is an invariant being broken in msync() -- Kirk
describes it fully in his reply to my message.

-- 
Steve Watt KD6GGD  PP-ASEL-IA  ICBM: 121W 56' 57.8 / 37N 20' 14.9
 Internet: steve @ Watt.COM Whois: SW32
   Free time?  There's no such thing.  It just comes in varying prices...
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 5.4-RC2 freezing - ATA related?

2005-05-25 Thread Jamie Heckford

Jamie Heckford wrote:

On Wed, May 18, 2005 at 03:54:59PM -0700, Doug White wrote:
On Wed, 18 May 2005, Jamie Heckford wrote:

Hi Peter,

On Thu, May 19, 2005 at 05:53:12AM +1000, Peter Jeremy wrote:
On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:

Managed to get a dump on our system for a similar prob we are getting:

That traceback looks like a panic, not a deadlock.  What was the panic
message?

Only have remote access to the box, I'm afraid.  Is there any way I can
obtain the panic message?

print msgbuf should do it

Another one... looks completely different :-(

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:160
160             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt full
#0  doadump () at pcpu.h:160
No locals.
#1  0xc04fac8a in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
        first_buf_printf = 1
#2  0xc04faf50 in panic (fmt=0xc06c06db "softdep_deallocate_dependencies: dangling deps")
    at /usr/src/sys/kern/kern_shutdown.c:566
        td = (struct thread *) 0xc357fd80
        bootopt = 260
        newpanic = 1
        ap = 0xc357fd80 "\\\214\215ÃðjOÃ"
        buf = "softdep_deallocate_dependencies: dangling deps", '\0' <repeats 209 times>
#3  0xc061cbfe in softdep_deallocate_dependencies (bp=0x0)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:5961
No locals.
#4  0xc053c8f4 in brelse (bp=0xd77932d4) at buf.h:431
No locals.
#5  0xc054bd24 in flushbuflist (blist=0xd77932d4, flags=0, vp=0xc4bf9630, slpflag=0,
    slptimeo=0, errorp=0x0) at /usr/src/sys/kern/vfs_subr.c:1101
        bp = (struct buf *) 0xd77932d4
        nbp = (struct buf *) 0xd75948f0
        found = 1
#6  0xc054b987 in vinvalbuf (vp=0xc4bf9630, flags=0, cred=0x0, td=0x0, slpflag=0,
    slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:987
        blist = (struct buf *) 0x0
        error = 0
        object = 0xc04efc79
#7  0xc054e85c in vclean (vp=0xc4bf9630, flags=8, td=0xc357fd80)
    at /usr/src/sys/kern/vfs_subr.c:2479
        active = 0
#8  0xc054eeb5 in vgonel (vp=0xc4bf9630, td=0xc357fd80)
    at /usr/src/sys/kern/vfs_subr.c:2697
No locals.
#9  0xc054a9f2 in vlrureclaim (mp=0xc35b3c00) at pcpu.h:157
        vp = (struct vnode *) 0xc4bf9630
        done = 0
        trigger = 10
        usevnodes = 0
        count = 7
#10 0xc054ac66 in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:598
        mp = (struct mount *) 0xc35b3c00
        nmp = (struct mount *) 0xc35b3c00
        done = 5887
        p = (struct proc *) 0xc38d8c5c
        td = (struct thread *) 0xc357fd80
#11 0xc04e67e8 in fork_exit (callout=0xc054aa98 <vnlru_proc>, arg=0x0, frame=0xe68aad38)
    at /usr/src/sys/kern/kern_fork.c:791
        p = (struct proc *) 0xc38d8c5c
        td = (struct thread *) 0x0
#12 0xc066746c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
No locals.
(kgdb)

panic: softdep_deallocate_dependencies: dangling deps
Uptime: 10h26m14s
Dumping 2047 MB
 16 32 48 64 ... 2000 2016 2032
(kgdb)


Would be really grateful if anyone could suggest anything.  Again, it
appears to happen around the time periodic runs (but it has also happened
randomly under load, so I'm not sure whether this is a red herring).

If anyone needs any more info, I'm more than happy to oblige.

Cheers

--
Jamie Heckford
Network Manager
Trident Microsystems Ltd.

t: +44(0)1737-780790
f: +44(0)1737-771908
w: http://www.trident-uk.co.uk/


Re: 5.4-RC2 freezing - ATA related?

2005-05-25 Thread Jamie Heckford

Jamie Heckford wrote:

Another one... looks completely different :-(


Would be really grateful if anyone could suggest anything.  Again, it
appears to happen around the time periodic runs (but it has also happened
randomly under load, so I'm not sure whether this is a red herring).

If anyone needs any more info, I'm more than happy to oblige.

Cheers



Is there any way this could be triggered by a filesystem becoming full?

--
Jamie Heckford
Network Manager
Trident Microsystems Ltd.

t: +44(0)1737-780790
f: +44(0)1737-771908
w: http://www.trident-uk.co.uk/


Re: 5.4-RC2 freezing - ATA related?

2005-05-23 Thread Elliot Finley
From: Søren Schmidt [EMAIL PROTECTED]
On 21/05/2005, at 0:52, Peter Jeremy wrote:
 On Fri, 2005-May-20 14:53:09 -0600, Elliot Finley wrote:

 From: Peter Jeremy [EMAIL PROTECTED]

 On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:

 I took the -L option off of my dump command in my daily dump script.  I've
 gone two days without locking up which is unusual.  I think that may be what
 was tickling the bug that was locking me up.

 Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just
 to confirm that you don't have any unreadable blocks (though this seems
 unlikely).

 came up clean. transfer went 40MB/s.

 That seems to leave the finger pointing at the ATA driver.

 Paging Søren: Are you able to help Elliot?

++No, my only advice is to use the ATA mkIII patches or, better yet,
++-current.

I'm already running with the newest ATA mkIII patches.  Even with the
patches, it freezes up when using the -L option on my daily dump.

Elliot



Re: 5.4-RC2 freezing - ATA related?

2005-05-23 Thread Mark Pheffer



Elliot Finley wrote:

This has been happening since 5.3-R, I've been tuning different parameters
to no avail.  I've taken the disks off of the onboard ICH5 controller and
put them on a Promise TX4 S150 controller, but still the same thing happens.

The system freezes, but isn't totally dead.  It'll still respond to pings,
the screensaver still functions, but it won't respond to a CAD at the
console.  But if I press 'Enter' at the console, it'll give me a 'login:'
prompt, but after entering the username, it never comes back with the
'password:' prompt.

After manually resetting the system it boots and says 'Automatic file system
check failed; help!' and drops into single user mode.  Running fsck manually
corrects errors on all volumes.  Then it'll boot from that point.

This seems to be triggered by daily periodic as it happens at 3:02-3:03AM
each time.  But it doesn't happen *every* morning.



I've had a similar problem with an IBM Thinkpad A21p. The machine would 
slowly start to lock up until the only thing it would respond to were 
pings. This would usually occur when the filesystem was under a heavy 
load (like untarring openoffice). I managed to trace the problem to 
snapshots that were about 40 days old (I keep old snapshots around for 
CYA purposes). After deleting the old snapshots, the system functioned 
perfectly.


I've been running it pretty hard now for the last few weeks and it 
hasn't locked up once. Whether or not the snapshots were the cause of 
the problem or just another symptom I can't really tell but deleting 
them definitely cured the problem. Right now I have a filesystem 
snapshot that's about a week old and it seems to be just fine.


Mark


Re: 5.4-RC2 freezing - ATA related?

2005-05-21 Thread Søren Schmidt


On 21/05/2005, at 0:52, Peter Jeremy wrote:


On Fri, 2005-May-20 14:53:09 -0600, Elliot Finley wrote:


From: Peter Jeremy [EMAIL PROTECTED]


On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:

I took the -L option off of my dump command in my daily dump script.  I've
gone two days without locking up which is unusual.  I think that may be what
was tickling the bug that was locking me up.

Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just
to confirm that you don't have any unreadable blocks (though this seems
unlikely).

came up clean. transfer went 40MB/s.

That seems to leave the finger pointing at the ATA driver.

Paging Søren: Are you able to help Elliot?

No, my only advice is to use the ATA mkIII patches or, better yet,
-current.


- Søren





Re: 5.4-RC2 freezing - ATA related?

2005-05-21 Thread Thomas Hurst
* Søren Schmidt ([EMAIL PROTECTED]) wrote:

 No, my only advice is to use the ATA mkIII patches or, better yet,
 -current.

In a similar vein, I'm seeing the same WRITE_DMA timeouts and system
lockups using ATA mkIII patches as I did using the standard RELENG_5
driver, on two separate systems.

I'm getting the WRITE_DMA retries on a multi-gmirror Athlon system using
a PCI SATA card; the two PATA drives on the system are fine:

 FreeBSD 5.4-STABLE #0: Thu Apr 28 06:31:53 BST 2005
 atapci1: SiI 3112 SATA150 controller port
  0xcc00-0xcc0f,0xc800-0xc803,0xc400-0xc407,0xc000-0xc003,0xbc00-0xbc07
  mem 0xe7062000-0xe70621ff irq 11 at device 12.0 on pci0
 ad4: 381554MB ST3400832AS/3.01 [775221/16/63] at ata2-master SATA150
 ad6: 381554MB ST3400832AS/3.01 [775221/16/63] at ata3-master SATA150
 ..
 ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=401743679
 ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=781421759

It seems harmless, but results in writes freezing for several seconds
every couple of hundred MB (annoying with 360G of storage as you might
imagine).  It normally favours a single drive, but seems to bounce
between ad4 and ad6 for no apparent reason.  Replacing the SATA card and
cables has no effect.  Attempting to drop the drives to PIO with
atacontrol doesn't seem to do anything either (they remain at SATA150).
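
To see where on the disk the two logged timeouts fall, the LBAs can be converted to byte offsets. This is only a rough sketch: the 512-byte sector size is an assumption, and treating the reported C/H/S geometry [775221/16/63] as the drive's total sector count is an approximation.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors

# Geometry and LBAs copied from the dmesg excerpt above.
cylinders, heads, sectors_per_track = 775221, 16, 63
timeout_lbas = [401743679, 781421759]

# Approximate capacity in sectors, derived from the reported geometry.
total_sectors = cylinders * heads * sectors_per_track


def describe(lba: int) -> str:
    """Return the byte offset (in GiB) and relative position of an LBA."""
    offset_gib = lba * SECTOR_BYTES / 2**30
    percent = 100 * lba / total_sectors
    return f"LBA {lba}: {offset_gib:.1f} GiB in ({percent:.1f}% of the disk)"


for lba in timeout_lbas:
    print(describe(lba))
```

Two samples are far too few to conclude anything about a pattern; the sketch only makes the raw LBAs easier to read.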

The other system where I see the lockups (I used to get READ/WRITE_DMA
timeouts with the lockup many moons ago, which seems to have started
after a system update, but for the past 6+ months or so I just get the
lockup) is an old BP6 (dual Celeron), on two different channels on two
different drives:

 FreeBSD 5.4-STABLE #2: Tue Apr 26 17:59:25 BST 2005
 atapci1: HighPoint HPT366 UDMA66 controller port
   0xd800-0xd8ff,0xd400-0xd403,0xd000-0xd007 irq 18 at device 19.0 on pci0
 atapci2: HighPoint HPT366 UDMA66 controller port
   0xe400-0xe4ff,0xe000-0xe003,0xdc00-0xdc07 irq 18 at device 19.1 on pci0
 ad4: 76319MB Seagate ST380011A 3.04 at ata2-master UDMA66
 ad6: 114473MB Seagate ST3120026A 3.01 at ata3-master UDMA66

Setting these drives to PIO4 resolves the stability problems (which
again only occur under heavy disk activity, almost always on writes),
but makes the system crawl.  I'm planning on migrating it to gmirror,
which I expect will make it behave more like the Athlon, but obviously
I'd like to be able to use DMA reliably without resorting to RAID-1
everywhere.

Save me Søren!

-- 
Thomas 'Freaky' Hurst
http://hur.st/


Re: 5.4-RC2 freezing - ATA related?

2005-05-21 Thread Søren Schmidt


On 22/05/2005, at 2:36, Thomas Hurst wrote:


* Søren Schmidt ([EMAIL PROTECTED]) wrote:



No, my only advice is to use the ATA mkIII patches or, better yet,
-current.

In a similar vein, I'm seeing the same WRITE_DMA timeouts and system
lockups using ATA mkIII patches as I did using the standard RELENG_5
driver, on two separate systems.

I'm getting the WRITE_DMA retries on a multi-gmirror Athlon system using
a PCI SATA card; the two PATA drives on the system are fine:

 FreeBSD 5.4-STABLE #0: Thu Apr 28 06:31:53 BST 2005
 atapci1: SiI 3112 SATA150 controller port
  0xcc00-0xcc0f,0xc800-0xc803,0xc400-0xc407,0xc000-0xc003,0xbc00-0xbc07
  mem 0xe7062000-0xe70621ff irq 11 at device 12.0 on pci0
 ad4: 381554MB ST3400832AS/3.01 [775221/16/63] at ata2-master SATA150
 ad6: 381554MB ST3400832AS/3.01 [775221/16/63] at ata3-master SATA150
 ..
 ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=401743679
 ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=781421759

It seems harmless, but results in writes freezing for several seconds
every couple of hundred MB (annoying with 360G of storage as you might
imagine).  It normally favours a single drive, but seems to bounce
between ad4 and ad6 for no apparent reason.  Replacing the SATA card and
cables has no effect.  Attempting to drop the drives to PIO with
atacontrol doesn't seem to do anything either (they remain at SATA150).

The other system where I see the lockups (I used to get READ/WRITE_DMA
timeouts with the lockup many moons ago, which seems to have started
after a system update, but for the past 6+ months or so I just get the
lockup) is an old BP6 (dual Celeron), on two different channels on two
different drives:

 FreeBSD 5.4-STABLE #2: Tue Apr 26 17:59:25 BST 2005
 atapci1: HighPoint HPT366 UDMA66 controller port
   0xd800-0xd8ff,0xd400-0xd403,0xd000-0xd007 irq 18 at device 19.0 on pci0
 atapci2: HighPoint HPT366 UDMA66 controller port
   0xe400-0xe4ff,0xe000-0xe003,0xdc00-0xdc07 irq 18 at device 19.1 on pci0
 ad4: 76319MB Seagate ST380011A 3.04 at ata2-master UDMA66
 ad6: 114473MB Seagate ST3120026A 3.01 at ata3-master UDMA66

Setting these drives to PIO4 resolves the stability problems (which
again only occur under heavy disk activity, almost always on writes),
but makes the system crawl.  I'm planning on migrating it to gmirror,
which I expect will make it behave more like the Athlon, but obviously
I'd like to be able to use DMA reliably without resorting to RAID-1
everywhere.

Save me Søren!


You have picked some of the most dreaded HW out there, that's for sure,
so I'm not sure I can do that :)
Anyhow, you should try a recent -current, since some of the race/timeout
problems that are possible in 5.x have been fixed there.


- Søren





Re: 5.4-RC2 freezing - ATA related?

2005-05-20 Thread Elliot Finley
From: Peter Jeremy [EMAIL PROTECTED]
 On Wed, 2005-May-18 06:43:37 -0600, Elliot Finley wrote:
 Had the system lock up again.  This is with the new ATA mkIII patches on
 http://people.freebsd.org/~sos/ATA.
 
 I didn't get the crashdump (forgot to set dumpdev), but I did get 'ps' and
 'show lockedvnods' output from DDB.  The output is in the form of
 screenshots combined into a single .pdf which can be accessed here
 http://www.efinley.com/Binder1.pdf

 That shows a deadlock-to-root in your /dev/ar0s1a (presumably root)
 filesystem.  The perl process (pid 487) has an exclusive lock on
 the FS mountpoint - this is blocking 130 other processes.  Pid 487
 is itself waiting on another filesystem lock (you can't determine
 the actual lock tree without more poking around kernel memory).

 The vnode locks are held by processes:
   PID  name        waiting on
   487  perl        [ufs c3c1c1b4]
    57  syncer      [snaplk c535f500]  (holds 2 locks)
   476  perl        [ufs c87e4f1c]
   489  perl        [snaplk c535f500]  (holds 2 locks)
  3337  mksnap_ffs  [getblk d77656f4]

 Looking through the process list, cron has started a dump -L which
 is trying to create a filesystem snapshot.  That has wedged on
 getblk (trying to perform physical disk I/O) and is probably the
 root of your problem.  Nothing else is waiting on physical I/O.

 I'd say that your first guess was right:  This is a bug in the ATA
 code and is probably a job for sos.

I took the -L option off of my dump command in my daily dump script.  I've
gone two days without locking up which is unusual.  I think that may be what
was tickling the bug that was locking me up.

Thanks for the analysis Peter.

Elliot



Re: 5.4-RC2 freezing - ATA related?

2005-05-20 Thread Peter Jeremy
On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:
I took the -L option off of my dump command in my daily dump script.  I've
gone two days without locking up which is unusual.  I think that may be what
was tickling the bug that was locking me up.

Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just
to confirm that you don't have any unreadable blocks (though this seems
unlikely).

-- 
Peter Jeremy


Re: 5.4-RC2 freezing - ATA related?

2005-05-20 Thread Elliot Finley
From: Peter Jeremy [EMAIL PROTECTED]
 On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:
 I took the -L option off of my dump command in my daily dump script.  I've
 gone two days without locking up which is unusual.  I think that may be what
 was tickling the bug that was locking me up.

 Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just
 to confirm that you don't have any unreadable blocks (though this seems
 unlikely).

came up clean. transfer went 40MB/s.



Re: 5.4-RC2 freezing - ATA related?

2005-05-20 Thread Peter Jeremy
On Fri, 2005-May-20 14:53:09 -0600, Elliot Finley wrote:
From: Peter Jeremy [EMAIL PROTECTED]
 On Fri, 2005-May-20 08:25:58 -0600, Elliot Finley wrote:
 I took the -L option off of my dump command in my daily dump script.  I've
 gone two days without locking up which is unusual.  I think that may be what
 was tickling the bug that was locking me up.

 Sometime you might like to do a 'dd if=/dev/ar0 of=/dev/null bs=32k' just
 to confirm that you don't have any unreadable blocks (though this seems
 unlikely).

came up clean. transfer went 40MB/s.

That seems to leave the finger pointing at the ATA driver.

Paging Søren: Are you able to help Elliot?

-- 
Peter Jeremy


Re: 5.4-RC2 freezing - ATA related?

2005-05-19 Thread Peter Jeremy
Previously posted trap frame:
#5  0xc0691771 in trap (frame=
      {tf_fs = -1068433384, tf_es = -989790192, tf_ds = 16, tf_edi = -1066124736,
       tf_esi = -1066124736, tf_ebp = -323699844, tf_isp = -323699872,
       tf_ebx = -1007063716, tf_edx = 528, tf_ecx = -1013235680, tf_eax = 307472464,
       tf_trapno = 12, tf_err = 2, tf_eip = -1067870386, tf_cs = 8, tf_eflags = 66050,
       tf_esp = -989760240, tf_ss = -1007063716}) at /usr/src/sys/i386/i386/trap.c:425

On Thu, 2005-May-19 00:15:44 +0100, Jamie Heckford wrote:
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x214

That's a NULL pointer somewhere.  The trap frame shows %edx is 528 so
the code has presumably tried to dereference %edx but it's not clear
how %edx wound up with that value.

fault code  = supervisor write, page not present
instruction pointer = 0x8:0xc059974e
stack pointer   = 0x10:0xecb4bb74
frame pointer   = 0x10:0xecb4bb7c

This instruction pointer matches the trap frame but not the traceback
you posted.  The trap frame gives the stack pointer as 0xC5017510
(which is nonsense) with a nonsense stack segment but the frame
pointer matches.  Having the frame pointer above the stack pointer
is also unusual.

It looks like gdb is a bit confused.  You could try:
disassemble 0xc059974e
x/x 0xecb4bb74

Does the instruction either at or immediately before 0xc059974e
include [%edx]?  What function is it in and can you work out the
line number?  Does the address reported by the x/x match anything
in the backtrace?
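
The comparison above depends on reading gdb's signed decimal trap-frame fields as 32-bit addresses. A minimal sketch of that conversion follows, using values copied from the trap frame and the panic message; the helper name to_u32 is made up for illustration.

```python
def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


# Fields copied from the trap frame quoted in this message.
trap_frame = {
    "tf_eip": -1067870386,
    "tf_esp": -989760240,
    "tf_ebp": -323699844,
    "tf_edx": 528,
}

for name, value in trap_frame.items():
    print(f"{name} = {to_u32(value):#010x}")

# Fault virtual address from the panic message, compared with %edx.
fault_address = 0x214
offset_from_edx = fault_address - trap_frame["tf_edx"]
print(f"fault address = %edx + {offset_from_edx}")
```

tf_eip and tf_ebp convert to the instruction pointer and frame pointer printed by the panic, while tf_esp converts to the 0xC5017510 value called nonsense above.  The fault address sits a few bytes past the value in %edx, which fits a store through a small bogus pointer, though that reading is an inference rather than something the dump proves.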

-- 
Peter Jeremy


Re: 5.4-RC2 freezing - ATA related?

2005-05-18 Thread Elliot Finley
 On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:
 The system freezes, but isn't totally dead.  It'll still respond to pings,
 the screensaver still functions, but it won't respond to a CAD at the
 console.  But if I press 'Enter' at the console, it'll give me a 'login:'
 prompt, but after entering the username, it never comes back with the
 'password:' prompt.
 ...
 On my lightly loaded systems, it happens rarely.  On my mailserver (fairly
 heavy disk load), it happens quite frequently.

 This could equally be a filesystem deadlock (race-to-root) rather than
 something in the ATA controller.  Do you know if it happens gradually
 (starts with one or two non-responsive, unkillable processes and gets
 worse until nothing happens)?

 How can I troubleshoot this?

 Re-compile the kernel with:
   options KDB
   options DDB
   makeoptions DEBUG=-g
 and ensure you have a dumpdev in /etc/rc.conf.  When you get a
 lockup, drop to DDB (Ctrl-Alt-ESC) and run 'show lockedvnods', 'ps'
 and 'call doadump()'.  If you post the output (a serial console will
 help here) someone might be able to provide more pointers.  (The
 crashdump will help with later debugging).

Had the system lock up again.  This is with the new ATA mkIII patches on
http://people.freebsd.org/~sos/ATA.

I didn't get the crashdump (forgot to set dumpdev), but I did get 'ps' and
'show lockedvnods' output from DDB.  The output is in the form of
screenshots combined into a single .pdf which can be accessed here
http://www.efinley.com/Binder1.pdf

I hope this is helpful, I'll get a crashdump next time (probably tomorrow
morning).

Elliot



Re: 5.4-RC2 freezing - ATA related?

2005-05-18 Thread Jamie Heckford
On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:
 This has been happening since 5.3-R, I've been tuning different parameters
 to no avail.  I've taken the disks off of the onboard ICH5 controller and
 put them on a Promise TX4 S150 controller, but still the same thing happens.
 
 The system freezes, but isn't totally dead.  It'll still respond to pings,
 the screensaver still functions, but it won't respond to a CAD at the
 console.  But if I press 'Enter' at the console, it'll give me a 'login:'
 prompt, but after entering the username, it never comes back with the
 'password:' prompt.
 
 After manually resetting the system it boots and says 'Automatic file system
 check failed; help!' and drops into single user mode.  Running fsck manually
 corrects errors on all volumes.  Then it'll boot from that point.
 
 This seems to be triggered by daily periodic as it happens at 3:02-3:03AM
 each time.  But it doesn't happen *every* morning.
 
 I suspect a bug in FreeBSD because this mode of failure happens on 3
 different machines, all configured similarly.
 
 ASUS P4P800
 2G RAM (though the other affected systems only have 1G)
 80G Seagate Barracuda SATA drives (one system now on Promise TX4 S150
 controller, others on onboard ICH5)
 
 On my lightly loaded systems, it happens rarely.  On my mailserver (fairly
 heavy disk load), it happens quite frequently.
 
 How can I troubleshoot this?

Managed to get a dump on our system for a similar prob we are getting:

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:160
160             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:160
#1  0xc05131ae in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2  0xc0513474 in panic (fmt=0xc06c3da5 "%s") at /usr/src/sys/kern/kern_shutdown.c:566
#3  0xc0691e18 in trap_fatal (frame=0xecb4bb34, eva=532) at /usr/src/sys/i386/i386/trap.c:817
#4  0xc0691b73 in trap_pfault (frame=0xecb4bb34, usermode=0, eva=532) at /usr/src/sys/i386/i386/trap.c:735
#5  0xc0691771 in trap (frame=
      {tf_fs = -1068433384, tf_es = -989790192, tf_ds = 16, tf_edi = -1066124736,
       tf_esi = -1066124736, tf_ebp = -323699844, tf_isp = -323699872,
       tf_ebx = -1007063716, tf_edx = 528, tf_ecx = -1013235680, tf_eax = 307472464,
       tf_trapno = 12, tf_err = 2, tf_eip = -1067870386, tf_cs = 8, tf_eflags = 66050,
       tf_esp = -989760240, tf_ss = -1007063716}) at /usr/src/sys/i386/i386/trap.c:425
#6  0xc068168a in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#7  0xc0510018 in crcopy () at /usr/src/sys/kern/kern_prot.c:1810
#8  0xc0598c77 in in_pcbdetach (inp=0xc0743a40) at /usr/src/sys/netinet/in_pcb.c:720
#9  0xc05b21a6 in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:783
#10 0xc05ae560 in tcp_input (m=0xc3a6a300, off0=20) at /usr/src/sys/netinet/tcp_input.c:2308
#11 0xc05a5aed in ip_input (m=0xc3a6a300) at /usr/src/sys/netinet/ip_input.c:776
#12 0xc0582f13 in netisr_processqueue (ni=0xc0742498) at /usr/src/sys/net/netisr.c:233
#13 0xc058310a in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:346
#14 0xc04ffa79 in ithread_loop (arg=0xc3481600) at /usr/src/sys/kern/kern_intr.c:547
#15 0xc04fed0c in fork_exit (callout=0xc04ff928 <ithread_loop>, arg=0xc3481600,
    frame=0xecb4bd38) at /usr/src/sys/kern/kern_fork.c:791
#16 0xc06816ec in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
(kgdb)

Help? ;)

-- 
Jamie Heckford
Network Manager
Trident Microsystems Ltd.

t: +44(0)1737-780790
f: +44(0)1737-771908
w: http://www.tridentmicrosystems.co.uk/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 5.4-RC2 freezing - ATA related?

2005-05-18 Thread Hans Petter Selasky
On Wednesday 18 May 2005 17:03, Jamie Heckford wrote:
 On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:
  This has been happening since 5.3-R, I've been tuning different
  parameters to no avail.  I've taken the disks off of the onboard ICH5
  controller and put them on a Promise TX4 S150 controller, but still the same
  thing happens.
 
  The system freezes, but isn't totally dead.  It'll still respond to
  pings, the screensaver still functions, but it won't respond to a CAD at
  the console.  But if I press 'Enter' at the console, it'll give me a
  'login:' prompt, but after entering the username, it never comes back
  with the 'password:' prompt.
 
  After manually resetting the system it boots and says 'Automatic file
  system check failed; help!' and drops into single user mode.  Running
  fsck manually corrects errors on all volumes.  Then it'll boot from that
  point.
 
  This seems to be triggered by daily periodic as it happens at 3:02-3:03AM
  each time.  But it doesn't happen *every* morning.
 
  I suspect a bug in FreeBSD because this mode of failure happens on 3
  different machines, all configured similarly.
 
  ASUS P4P800
  2G RAM (though the other affected systems only have 1G)
  80G Seagate Barracuda SATA drives (one system now on Promise TX4 S150
  controller, others on onboard ICH5)
 
  On my lightly loaded systems, it happens rarely.  On my mailserver
  (fairly heavy disk load), it happens quite frequently.
 
  How can I troubleshoot this?


 Help? ;)

There is a bug in machine/bus.h (was: machine/bus_at386.h) that might 
cause random freezes, but I'm not sure if it is related:

http://www.freebsd.org/cgi/query-pr.cgi?pr=80980

--HPS


Re: 5.4-RC2 freezing - ATA related?

2005-05-18 Thread Peter Jeremy
On Wed, 2005-May-18 06:43:37 -0600, Elliot Finley wrote:
> Had the system lock up again.  This is with the new ATA mkIII patches on
> http://people.freebsd.org/~sos/ATA.
>
> I didn't get the crashdump (forgot to set dumpdev), but I did get 'ps' and
> 'show lockedvnods' output from DDB.  The output is in the form of
> screenshots combined into a single .pdf which can be accessed here
> http://www.efinley.com/Binder1.pdf

That shows a deadlock-to-root in your /dev/ar0s1a (presumably root)
filesystem.  The perl process (pid 487) has an exclusive lock on
the FS mountpoint - this is blocking 130 other processes.  Pid 487
is itself waiting on another filesystem lock (you can't determine
the actual lock tree without more poking around kernel memory).

The vnode locks are held by processes:
  PID  name        waiting on
  487  perl        [ufs c3c1c1b4]
   57  syncer      [snaplk c535f500]  (holds 2 locks)
  476  perl        [ufs c87e4f1c]
  489  perl        [snaplk c535f500]  (holds 2 locks)
 3337  mksnap_ffs  [getblk d77656f4]

Looking through the process list, cron has started a dump -L which
is trying to create a filesystem snapshot.  That has wedged on
getblk (trying to perform physical disk I/O) and is probably the
root of your problem.  Nothing else is waiting on physical I/O.

I'd say that your first guess was right:  This is a bug in the ATA
code and is probably a job for sos.
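
As a rough illustration of the reasoning above, the table can be filtered
for anything that is not blocked on a vnode lock.  This is only a sketch:
the function name find_non_lock_waiters is made up, and treating "ufs" and
"snaplk" as the only lock wait channels is an assumption based on this one
table, not a general rule.

```sh
#!/bin/sh
# Hypothetical helper: read "PID NAME [WCHAN ADDR]" lines on stdin and
# print the entries whose wait channel is not one of the lock channels
# seen in this thread (assumed here to be "ufs" and "snaplk").
find_non_lock_waiters() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 3 {
            wchan = $3
            sub(/^\[/, "", wchan)      # strip the leading "["
            if (wchan != "ufs" && wchan != "snaplk")
                print $1, $2, wchan
        }'
}

find_non_lock_waiters <<'EOF'
  PID  name        waiting on
  487  perl        [ufs c3c1c1b4]
   57  syncer      [snaplk c535f500]  (holds 2 locks)
  476  perl        [ufs c87e4f1c]
  489  perl        [snaplk c535f500]  (holds 2 locks)
 3337  mksnap_ffs  [getblk d77656f4]
EOF
# prints: 3337 mksnap_ffs getblk
```

Only mksnap_ffs is left, which matches the observation that nothing else
is waiting on physical I/O.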

-- 
Peter Jeremy


Re: 5.4-RC2 freezing - ATA related?

2005-05-18 Thread Peter Jeremy
On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:
> Managed to get a dump on our system for a similar prob we are getting:

That traceback looks like a panic, not a deadlock.  What was the panic
message?

> #2  0xc0513474 in panic (fmt=0xc06c3da5 "%s") at
> /usr/src/sys/kern/kern_shutdown.c:566
...
> #7  0xc0510018 in crcopy () at /usr/src/sys/kern/kern_prot.c:1810
> #8  0xc0598c77 in in_pcbdetach (inp=0xc0743a40) at
> /usr/src/sys/netinet/in_pcb.c:720
> #9  0xc05b21a6 in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:783

There's something wrong here:  If tcp_close() is passed NULL it will panic
at this point when it tries to dereference tp.

> #10 0xc05ae560 in tcp_input (m=0xc3a6a300, off0=20) at
> /usr/src/sys/netinet/tcp_input.c:2308
> #11 0xc05a5aed in ip_input (m=0xc3a6a300) at
> /usr/src/sys/netinet/ip_input.c:776
> #12 0xc0582f13 in netisr_processqueue (ni=0xc0742498) at
> /usr/src/sys/net/netisr.c:233
> #13 0xc058310a in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:346

-- 
Peter Jeremy


Re: 5.4-RC2 freezing - ATA related?

2005-05-18 Thread Jamie Heckford
Hi Peter,

On Thu, May 19, 2005 at 05:53:12AM +1000, Peter Jeremy wrote:
 On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:
 Managed to get a dump on our system for a similar prob we are getting:
 
 That traceback looks like a panic, not a deadlock.  What was the panic
 message?

Only have remote access to the box I'm afraid, is there any way I can obtain
the panic message?

 
 #2  0xc0513474 in panic (fmt=0xc06c3da5 %s) at 
 /usr/src/sys/kern/kern_shutdown.c:566
 ...
 #7  0xc0510018 in crcopy () at /usr/src/sys/kern/kern_prot.c:1810
 #8  0xc0598c77 in in_pcbdetach (inp=0xc0743a40) at 
 /usr/src/sys/netinet/in_pcb.c:720
 #9  0xc05b21a6 in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:783
 
 There's something wrong here:  If tcp_close() is passed NULL it will panic
 at this point when it tries to dereference tp.

Starting to stretch my knowledge a bit now ;)

If I can provide you with further debug output would you be able to give me some
pointers?

Thanks for your help

-- 
Jamie Heckford
Network Manager
Trident Microsystems Ltd.

t: +44(0)1737-780790
f: +44(0)1737-771908
w: http://www.tridentmicrosystems.co.uk/


Re: 5.4-RC2 freezing - ATA related?

2005-05-18 Thread Doug White
On Wed, 18 May 2005, Jamie Heckford wrote:

 Hi Peter,

 On Thu, May 19, 2005 at 05:53:12AM +1000, Peter Jeremy wrote:
  On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:
  Managed to get a dump on our system for a similar prob we are getting:
 
  That traceback looks like a panic, not a deadlock.  What was the panic
  message?

 Only have remote access to the box I'm afraid, is there any way I can obtain
 the panic message?

print msgbuf should do it


 
  #2  0xc0513474 in panic (fmt=0xc06c3da5 %s) at 
  /usr/src/sys/kern/kern_shutdown.c:566
  ...
  #7  0xc0510018 in crcopy () at /usr/src/sys/kern/kern_prot.c:1810
  #8  0xc0598c77 in in_pcbdetach (inp=0xc0743a40) at 
  /usr/src/sys/netinet/in_pcb.c:720
  #9  0xc05b21a6 in tcp_close (tp=0x0) at /usr/src/sys/netinet/tcp_subr.c:783
 
  There's something wrong here:  If tcp_close() is passed NULL it will panic
  at this point when it tries to dereference tp.

 Starting to stretch my knowledge a bit now ;)

 If I can provide you with further debug output would you be able to give me
 some pointers?

 Thanks for your help



-- 
Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED]  |  www.FreeBSD.org


Re: 5.4-RC2 freezing - ATA related?

2005-05-18 Thread Jamie Heckford
On Wed, May 18, 2005 at 03:54:59PM -0700, Doug White wrote:
 On Wed, 18 May 2005, Jamie Heckford wrote:
 
  Hi Peter,
 
  On Thu, May 19, 2005 at 05:53:12AM +1000, Peter Jeremy wrote:
   On Wed, 2005-May-18 16:03:16 +0100, Jamie Heckford wrote:
   Managed to get a dump on our system for a similar prob we are getting:
  
   That traceback looks like a panic, not a deadlock.  What was the panic
   message?
 
  Only have remote access to the box I'm afraid, is there any way I can obtain
  the panic message?
 
 print msgbuf should do it

(kgdb) printf "%s", (char *)msgbufp->msg_ptr

<snip>

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x214
fault code  = supervisor write, page not present
instruction pointer = 0x8:0xc059974e
stack pointer   = 0x10:0xecb4bb74
frame pointer   = 0x10:0xecb4bb7c
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 59 (swi1: net)
trap number = 12
panic: page fault
Uptime: 2h19m27s
Dumping 2047 MB
 16 32 48 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 
352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656 
672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 
992 1008 1024 1040 1056 1072 1088 1104 1120 1136 1152 1168 1184 1200 1216 1232 
1248 1264 1280 1296 1312 1328 1344 1360 1376 1392 1408 1424 1440 1456 1472 1488 
1504 1520 1536 1552 1568 1584 1600 1616 1632 1648 1664 1680 1696 1712 1728 1744 
1760 1776 1792 1808 1824 1840 1856 1872 1888 1904 1920 1936 1952 1968 1984 2000 
2016 2032(kgdb)
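
These numbers can be cross-checked against the trap frame in the earlier
backtrace, where kgdb prints the registers as signed decimals.  A small
sketch in sh follows; the helper names to_hex32 and decode_pf_err are made
up for illustration, and it assumes a shell with 64-bit arithmetic.

```sh
#!/bin/sh
# Convert a signed 32-bit decimal (as kgdb prints trap frame fields)
# to the unsigned hex form used in the panic message.
to_hex32() {
    printf '0x%x\n' "$(( $1 & 0xffffffff ))"
}

# Decode the low three bits of an i386 page-fault error code (tf_err):
# bit 0 = protection violation (else page not present),
# bit 1 = write (else read), bit 2 = user mode (else supervisor).
decode_pf_err() {
    err=$1
    if [ $(( err & 4 )) -ne 0 ]; then mode=user; else mode=supervisor; fi
    if [ $(( err & 2 )) -ne 0 ]; then access=write; else access=read; fi
    if [ $(( err & 1 )) -ne 0 ]; then
        cause="protection violation"
    else
        cause="page not present"
    fi
    echo "$mode $access, $cause"
}

to_hex32 -1067870386    # tf_eip  -> 0xc059974e
to_hex32 532            # eva     -> 0x214
decode_pf_err 2         # tf_err  -> supervisor write, page not present
```

tf_eip matches the instruction pointer and eva matches the fault virtual
address in the panic message; tf_err = 2 decodes to the same fault code.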

Thanks :)

-- 
Jamie Heckford
Network Manager
Trident Microsystems Ltd.

t: +44(0)1737-780790
f: +44(0)1737-771908
w: http://www.tridentmicrosystems.co.uk/


Re: 5.4-RC2 freezing - ATA related?

2005-05-17 Thread Brent Casavant
On Tue, 17 May 2005 [EMAIL PROTECTED] wrote:

 Date: Mon, 16 May 2005 06:40:01 -0600
 From: Elliot Finley [EMAIL PROTECTED]
 Subject: 5.4-RC2 freezing - ATA related?
 To: freebsd-stable@freebsd.org
 Cc: [EMAIL PROTECTED]
 Message-ID: [EMAIL PROTECTED]
 Content-Type: text/plain; charset=iso-8859-1
 
 This has been happening since 5.3-R, I've been tuning different parameters
 to no avail.  I've taken the disks off of the onboard ICH5 controller and
 put them on a Promise TX4 S150 controller, but still the same thing happens.
 
 The system freezes, but isn't totally dead.  It'll still respond to pings,
 the screensaver still functions, but it won't respond to a CAD at the
 console.  But if I press 'Enter' at the console, it'll give me a 'login:'
 prompt, but after entering the username, it never comes back with the
 'password:' prompt.
 
 After manually resetting the system it boots and says 'Automatic file system
 check failed; help!' and drops into single user mode.  Running fsck manually
 corrects errors on all volumes.  Then it'll boot from that point.
 
 This seems to be triggered by daily periodic as it happens at 3:02-3:03AM
 each time.  But it doesn't happen *every* morning.
 
 I suspect a bug in FreeBSD because this mode of failure happens on 3
 different machines, all configured similarly.

You can add a fourth.  Ever since 5.1 (my first 5.x install) I have
experienced the same problem, again with an Intel ICH5 ATA controller.
The symptoms are exactly the same -- the hang is normally triggered
during the periodic runs just after 3AM.  The hang does occur at other
times as well, but with nowhere near the same consistency.

The only solution I found at that time was reverting to 4.10, though
that is obviously suboptimal.  I could be persuaded to reinstall 5.x
on the machine if I'd be sure to get someone to look into this.

Thanks,
Brent Casavant

-- 
Brent Casavant  http://www.angeltread.org/
KD5EMB  -.- -.. . . -- -...
44 54'24N 93 03'21W 907FASL   EN34lv


Re: 5.4-RC2 freezing - ATA related?

2005-05-17 Thread Dominic Marks
On Tuesday 17 May 2005 15:58, Brent Casavant wrote:
<snip>

 You can add a fourth.  Ever since 5.1 (my first 5.x install) I have
 experienced the same problem, again with an Intel ICH5 ATA controller.
 The symptoms are exactly the same -- the hang is normally triggered
 during the periodic runs just after 3AM.  The hang does occur at other
 times as well, but with nowhere near the same consistency.

I've got four machines with ICH5/6 chips in, no stability problems whatsoever,
and that's been the case since I installed them, around 5.2.1. Perhaps it is
something to do with your workload, or another piece of hardware in the
system. The machines do a lot of disc I/O during their working days.

 The only solution I found at that time was reverting to 4.10, though
 that is obviously suboptimal.  I could be persuaded to reinstall 5.x
 on the machine if I'd be sure to get someone to look into this.

 Thanks,
 Brent Casavant

HTH,
-- 
Dominic
GoodforBusiness.co.uk
I.T. Services for SMEs in the UK.


Re: 5.4-RC2 freezing - ATA related?

2005-05-17 Thread Brent Casavant
On Tue, 17 May 2005, Dominic Marks wrote:

 I've got four machines with ICH5/6 chips in, no stability problems whatsoever,
 and that's been the case since I installed them, around 5.2.1. Perhaps it is
 something to do with your workload, or another piece of hardware in the
 system. The machines do a lot of disc I/O during their working days.

Not so much so in this case.  It was being used only as a workstation,
completely idle when I wasn't sitting in front of it.  While it would
occasionally lock up while in active use, that was relatively rare
compared to the frequent hangs just after 3AM.

I do agree it could be some other piece of hardware, but the fact that
at least two of us have identical problems, often triggered by the same
event (nightly periodic runs), starts to point in the direction of a
software bug.

I can't say for certain that downgrading to 4.x solved the problem as I
needed to do that install on a SCSI drive instead, in order to preserve
the contents of the filesystem on the ATA drive until I could transfer
everything over.

If I remember correctly (forgive me, this was about a year ago), I
ran a number of disk performance benchmarks and other general stress
tests on the ATA drive, and never was able to manually trigger the
hang.

Anecdotally, a friend of mine who recently tried 5.3 ran into frequent
filesystem/drive problems as well, which reverting to 4.x solved.
However it didn't sound like exactly the same problem.

Brent

-- 
Brent Casavant  http://www.angeltread.org/
KD5EMB  -.- -.. . . -- -...
44 54'24N 93 03'21W 907FASL   EN34lv


Re: 5.4-RC2 freezing - ATA related?

2005-05-17 Thread Jamie Heckford
On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:
 This has been happening since 5.3-R, I've been tuning different parameters
 to no avail.  I've taken the disks off of the onboard ICH5 controller and
 put them on a Promise TX4 S150 controller, but still the same thing happens.
 
 The system freezes, but isn't totally dead.  It'll still respond to pings,
 the screensaver still functions, but it won't respond to a CAD at the
 console.  But if I press 'Enter' at the console, it'll give me a 'login:'
 prompt, but after entering the username, it never comes back with the
 'password:' prompt.
 
 After manually resetting the system it boots and says 'Automatic file system
 check failed; help!' and drops into single user mode.  Running fsck manually
 corrects errors on all volumes.  Then it'll boot from that point.
 
 This seems to be triggered by daily periodic as it happens at 3:02-3:03AM
 each time.  But it doesn't happen *every* morning.
 
 I suspect a bug in FreeBSD because this mode of failure happens on 3
 different machines, all configured similarly.
 

We are having similar problems to this on a box.  We won't go into great detail
at the moment, but will post results when we have finished testing.

-- 
Jamie Heckford
Network Manager
Trident Microsystems Ltd.

t: +44(0)1737-780790
f: +44(0)1737-771908
w: http://www.tridentmicrosystems.co.uk/


Re: 5.4-RC2 freezing - ATA related?

2005-05-17 Thread Peter Jeremy
On Tue, 2005-May-17 09:58:33 -0500, Brent Casavant wrote:
> The only solution I found at that time was reverting to 4.10, though
> that is obviously suboptimal.  I could be persuaded to reinstall 5.x
> on the machine if I'd be sure to get someone to look into this.

It doesn't work that way.  You are going to need to provide much more
information and do some of the work yourself.  I'd suggest that you:
1) Install 5.4-RELEASE (or -STABLE), including a kernel built with
   debugging and DDB enabled (see my previous post and/or the handbook).
2) Confirm that the problem still exists for you.
3) Since you think it's the daily tasks, run "periodic daily" manually
   and try to provoke the problem.
4) Once you can provoke it, run the scripts in /etc/periodic/daily
   individually to identify which script is the problem.  Try to narrow
   it down to a single command within the script.
5) Once you can identify a command (or command sequence) that provokes
   the problem, save a crashdump and send your dmesg, the sequence of
   commands you ran as well as the DDB output from "show lockedvnods"
   and "ps" to this list.  That's enough information for someone to
   make a start on investigating the problem.
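
Steps 3 and 4 could be scripted roughly as follows.  This is a sketch
only: the function name and the log path in the usage note are made up,
and it assumes each script can be run directly, the way periodic(8) runs
the executable files in the directory.

```sh
#!/bin/sh
# Hypothetical helper: run every executable script in a periodic
# directory one at a time, logging each name before it starts so that
# a hang can be tied to the last script that was started.
run_periodic_one_by_one() {
    dir=$1
    log=$2
    for script in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$script" ] && [ -x "$script" ] || continue
        echo "starting $script" >> "$log"
        sync    # best effort: push the log line to disk first
        "$script" >> "$log" 2>&1
        echo "finished $script (exit $?)" >> "$log"
    done
}

# Example (as root):
#   run_periodic_one_by_one /etc/periodic/daily /var/tmp/periodic-bisect.log
```

After a lockup and reboot, a "starting" line with no matching "finished"
line names the suspect script.  The log itself may lose its tail if the
filesystem wedges, so a serial console is still the safer record.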

If that's all too hard and you want a fix, see
http://www.freebsd.org/commercial/consult_bycat.html

-- 
Peter Jeremy


Re: 5.4-RC2 freezing - ATA related?

2005-05-17 Thread Brent Casavant
On Wed, 18 May 2005, Peter Jeremy wrote:

 On Tue, 2005-May-17 09:58:33 -0500, Brent Casavant wrote:
 The only solution I found at that time was reverting to 4.10, though
 that is obviously suboptimal.  I could be persuaded to reinstall 5.x
 on the machine if I'd be sure to get someone to look into this.
 
 It doesn't work that way.  You are going to need to provide much more
 information and do some of the work yourself.

Oh certainly.  Didn't mean to imply otherwise.  I simply meant that
I'm not qualified to look into IDE or filesystem related kernel code
(the two most likely culprits), but if someone else was willing to
do so I'd be happy to do any sort of testing they requested.

Sorry for muddling my original statement.

Brent

-- 
Brent Casavant  http://www.angeltread.org/
KD5EMB  -.- -.. . . -- -...
44 54'24N 93 03'21W 907FASL   EN34lv


5.4-RC2 freezing - ATA related?

2005-05-16 Thread Elliot Finley
This has been happening since 5.3-R, I've been tuning different parameters
to no avail.  I've taken the disks off of the onboard ICH5 controller and
put them on a Promise TX4 S150 controller, but still the same thing happens.

The system freezes, but isn't totally dead.  It'll still respond to pings,
the screensaver still functions, but it won't respond to a CAD at the
console.  But if I press 'Enter' at the console, it'll give me a 'login:'
prompt, but after entering the username, it never comes back with the
'password:' prompt.

After manually resetting the system it boots and says 'Automatic file system
check failed; help!' and drops into single user mode.  Running fsck manually
corrects errors on all volumes.  Then it'll boot from that point.

This seems to be triggered by daily periodic as it happens at 3:02-3:03AM
each time.  But it doesn't happen *every* morning.

I suspect a bug in FreeBSD because this mode of failure happens on 3
different machines, all configured similarly.

ASUS P4P800
2G RAM (though the other affected systems only have 1G)
80G Seagate Barracuda SATA drives (one system now on Promise TX4 S150
controller, others on onboard ICH5)

On my lightly loaded systems, it happens rarely.  On my mailserver (fairly
heavy disk load), it happens quite frequently.

How can I troubleshoot this?

dmesg follows:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RC2 #2: Wed Apr 13 17:35:20 MDT 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/Postmaster
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.60GHz (2605.92-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0xf29  Stepping = 9

  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 2146631680 (2047 MB)
avail memory = 2095153152 (1998 MB)
ACPI APIC Table: A M I  OEMAPIC 
ioapic0 Version 2.0 irqs 0-23 on motherboard
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: A M I OEMXSDT on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
cpu0: ACPI CPU on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
agp0: Intel 82865 host to AGP bridge mem 0xf800-0xfbff at device
0.0 on pci0
pcib1: ACPI PCI-PCI bridge at device 1.0 on pci0
pci1: ACPI PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
uhci0: Intel 82801EB (ICH5) USB controller USB-A port 0xef00-0xef1f irq 16
at device 29.0 on pci0
usb0: Intel 82801EB (ICH5) USB controller USB-A on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: Intel 82801EB (ICH5) USB controller USB-B port 0xef20-0xef3f irq 19
at device 29.1 on pci0
usb1: Intel 82801EB (ICH5) USB controller USB-B on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: Intel 82801EB (ICH5) USB controller USB-C port 0xef40-0xef5f irq 18
at device 29.2 on pci0
usb2: Intel 82801EB (ICH5) USB controller USB-C on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: Intel 82801EB (ICH5) USB controller USB-D port 0xef80-0xef9f irq 16
at device 29.3 on pci0
usb3: Intel 82801EB (ICH5) USB controller USB-D on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: serial bus, USB at device 29.7 (no driver attached)
pcib2: ACPI PCI-PCI bridge at device 30.0 on pci0
pci2: ACPI PCI bus on pcib2
skc0: 3Com 3C940 Gigabit Ethernet port 0xd800-0xd8ff mem
0xfeafc000-0xfeaf irq 22 at device 5.0 on pci2
skc0: 3Com Gigabit LOM (3C940) rev. (0x1)
sk0: Marvell Semiconductor, Inc. Yukon on skc0
sk0: Ethernet address: 00:0c:6e:54:4b:19
miibus0: MII bus on sk0
e1000phy0: Marvell 88E1000 Gigabit PHY on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX,
auto
atapci0: Promise PDC20319 SATA150 controller port
0xdc00-0xdc7f,0xdfa0-0xdfaf,0xdf00-0xdf3f mem
0xfeac-0xfead,0xfeafb000-0xfeafbfff irq 21 at device 9.0 on pci2
atapci0: failed: rid 0x20 is memory, requested 4
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
ata4: channel #2 on atapci0
ata5: channel #3 on atapci0
xl0: 3Com 3c905C-TX Fast Etherlink XL port 0xd480-0xd4ff mem
0xfeaf9c00-0xfeaf9c7f irq 20 at device 12.0 on pci2
miibus1: MII bus on xl0
ukphy0: Generic IEEE 802.3u media interface on miibus1
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:04:75:f1:1c:7e
isab0: 

Re: 5.4-RC2 freezing - ATA related?

2005-05-16 Thread Søren Schmidt
Elliot Finley wrote:
> This has been happening since 5.3-R, I've been tuning different parameters
> to no avail.  I've taken the disks off of the onboard ICH5 controller and
> put them on a Promise TX4 S150 controller, but still the same thing happens.
>
> The system freezes, but isn't totally dead.  It'll still respond to pings,
> the screensaver still functions, but it won't respond to a CAD at the
> console.  But if I press 'Enter' at the console, it'll give me a 'login:'
> prompt, but after entering the username, it never comes back with the
> 'password:' prompt.
>
> After manually resetting the system it boots and says 'Automatic file system
> check failed; help!' and drops into single user mode.  Running fsck manually
> corrects errors on all volumes.  Then it'll boot from that point.
>
> This seems to be triggered by daily periodic as it happens at 3:02-3:03AM
> each time.  But it doesn't happen *every* morning.

Hmm, sounds like a deadlock somewhere.

On the ATA part, try the ATA mkIII patches on
http://people.freebsd.org/~sos/ATA and see if that changes anything.

--
-Søren


Re: 5.4-RC2 freezing - ATA related?

2005-05-16 Thread Peter Jeremy
On Mon, May 16, 2005 at 06:40:01AM -0600, Elliot Finley wrote:
> The system freezes, but isn't totally dead.  It'll still respond to pings,
> the screensaver still functions, but it won't respond to a CAD at the
> console.  But if I press 'Enter' at the console, it'll give me a 'login:'
> prompt, but after entering the username, it never comes back with the
> 'password:' prompt.
...
> On my lightly loaded systems, it happens rarely.  On my mailserver (fairly
> heavy disk load), it happens quite frequently.

This could equally be a filesystem deadlock (race-to-root) rather than
something in the ATA controller.  Do you know if it happens gradually
(starts with one or two non-responsive, unkillable processes and gets
worse until nothing happens)?

> How can I troubleshoot this?

Re-compile the kernel with:
   options KDB
   options DDB
   makeoptions DEBUG=-g
and ensure you have a dumpdev in /etc/rc.conf.  When you get a
lockup, drop to DDB (Ctrl-Alt-ESC) and run "show lockedvnods", "ps"
and "call doadump()".  If you post the output (a serial console will
help here) someone might be able to provide more pointers.  (The
crashdump will help with later debugging).

Note: If you don't have another FreeBSD system handy, a hard copy
of ddb(4) will be very handy if you want to play around in DDB.
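
The kernel configuration step above might be scripted along these lines.
It is a sketch under stated assumptions: the function name
make_debug_conf and the config name DEBUGKERN are made up, and it assumes
the base config does not already set these options.

```sh
#!/bin/sh
# Hypothetical helper: copy an existing kernel config and append the
# debugging options suggested above.
#   $1 = config directory, $2 = base config name, $3 = new config name
make_debug_conf() {
    conf_dir=$1
    base=$2
    name=$3
    cp "$conf_dir/$base" "$conf_dir/$name" || return 1
    printf '%s\n' \
        'options     KDB' \
        'options     DDB' \
        'makeoptions DEBUG=-g' >> "$conf_dir/$name"
}

# Example (as root, paths assumed for an i386 source tree):
#   make_debug_conf /usr/src/sys/i386/conf GENERIC DEBUGKERN
#   cd /usr/src && make buildkernel KERNCONF=DEBUGKERN \
#                && make installkernel KERNCONF=DEBUGKERN
```

For the dump device, a line such as dumpdev="/dev/ad0s1b" in /etc/rc.conf
would do, where the device name is only an example and should be your
own swap partition.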

-- 
Peter