mbuf leak in bpf.c

2004-12-27 Thread Johnny Eriksson
If one tries to write a datagram to a bpf device, and the datagram is
longer than the MTU on the physical interface, the write fails as it
should, but an mbuf is allocated and thrown away.  Proposed solution:

--- bpf.c.orig  Mon Dec 27 10:43:06 2004
+++ bpf.c   Mon Dec 27 10:44:16 2004
@@ -633,8 +633,10 @@
 	if (error)
 		return (error);
 
-	if (datlen > ifp->if_mtu)
+	if (datlen > ifp->if_mtu) {
+		m_freem(m);
 		return (EMSGSIZE);
+	}
 
 	if (d->bd_hdrcmplt)
 		dst.sa_family = pseudo_AF_HDRCMPLT;
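
The leak is easiest to see outside the kernel. The following is a userspace analogue, not bpf code: `struct pkt`, `pkt_alloc()`, `pkt_free()`, `pkt_write()` and the `live_pkts` counter are hypothetical names, and malloc()/free() stand in for the mbuf allocation and m_freem(). As in the patched write path, the buffer already exists when the MTU check runs, so the EMSGSIZE return has to release it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: a heap buffer holding one datagram. */
struct pkt {
	size_t len;
	unsigned char *data;
};

static int live_pkts;	/* outstanding allocations, so a leak is visible */

static struct pkt *
pkt_alloc(const void *src, size_t len)
{
	struct pkt *p = malloc(sizeof(*p));
	if (p == NULL)
		return (NULL);
	p->data = malloc(len > 0 ? len : 1);
	if (p->data == NULL) {
		free(p);
		return (NULL);
	}
	memcpy(p->data, src, len);
	p->len = len;
	live_pkts++;
	return (p);
}

static void
pkt_free(struct pkt *p)
{
	free(p->data);
	free(p);
	live_pkts--;
}

/*
 * Same shape as the patched write path: allocate first, then check the
 * MTU. The EMSGSIZE branch must free the buffer before returning.
 */
static int
pkt_write(const void *src, size_t len, size_t mtu)
{
	struct pkt *p = pkt_alloc(src, len);
	if (p == NULL)
		return (ENOMEM);
	if (p->len > mtu) {
		pkt_free(p);	/* the fix: release the buffer on the error path */
		return (EMSGSIZE);
	}
	/* A real driver would hand p to the interface here; we just drop it. */
	pkt_free(p);
	return (0);
}
```

For example, `pkt_write(buf, 2000, 1500)` returns EMSGSIZE and leaves `live_pkts` at zero; without the `pkt_free()` call on that branch the counter would grow by one per oversized write, the same kind of slow leak the patch closes.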

--Johnny
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


RE: netstat fails with memory allocation error and error in kvm_read

2004-12-27 Thread techlists
 
   You appear to be running out of kernel memory. Since you're 
   capturing the output of vmstat -m, you should check that for any 
   bins that are growing at a high rate of speed.
  
   Seems possible that it's in pf :)
 
  I've checked the numbers from just before the freeze (it's within 15
  secs) with two sets of data: from a fresh boot and from five minutes
  before the freeze.
 
 You might also log 'sysctl vm.kvm_free' and 'sysctl vm.zone'.

sysctl vm.zone is identical to vmstat -z (according to man vmstat).

I've graphed the output from iostat (idle/user/...), vmstat -i (interrupt
rate), vmstat -m (in use), vmstat -z (used), sysctl vm.kvm_free (which is
constant) and the number of pf states. The graphs are at
http://www.aub.dk/~jmp/fw/plots/. The newest data are from just after a
deadlock. Is there anything else I should graph?

IRQ 20 is the NIC on our internal network (800+ machines); IRQ 18 and IRQ 21
are NICs connected to the internet. There are a lot of changes in the vmstat
-m graphs just before midnight last night that seem to correspond with the
increase in interrupts on IRQ 18.

The only graphs I can see changing up to the deadlock are:
irq20 (internal NIC),
irq21 (primary external NIC),
the buckets (vmstat -z), which all grow (I suppose this is normal?),
the mbufs, which seem to grow, but nothing extreme,
pffrag and pffrent (but not to levels they haven't reached before).
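
Looking for bins that are "growing at a high rate" comes down to subtracting two samples and dividing by the time between them. A minimal sketch, assuming the in-use column of `vmstat -m` (or the used column of `vmstat -z`) has already been parsed into name/count pairs; the struct, the function and the threshold are hypothetical, not part of vmstat:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One hypothetical row of a memory statistics sample (name, in-use count). */
struct bin_sample {
	const char *name;
	long inuse;
};

/*
 * Store in `out` the names of bins whose in-use value grew by more than
 * `limit` units per second between two samples taken `secs` seconds apart,
 * and return how many were found. Both arrays must list the same bins in
 * the same order; `out` must have room for `nbins` entries.
 */
static size_t
count_fast_growers(const struct bin_sample *before,
    const struct bin_sample *after, size_t nbins, double secs,
    double limit, const char **out)
{
	size_t n = 0;
	for (size_t i = 0; i < nbins; i++) {
		double rate = (double)(after[i].inuse - before[i].inuse) / secs;
		if (rate > limit)
			out[n++] = after[i].name;
	}
	return (n);
}
```

With samples taken 300 seconds apart and a limit of 1 unit per second, a bin going from 50 to 650 is flagged (2 per second) while one going from 1000 to 1030 (0.1 per second) is not. The sample values are made up for illustration.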

Most notably, most of the pf graphs don't change. Where can I see the memory
used by pf/altq? If it is pfaltqpl (in vmstat -z), it doesn't change at all.

I'm in the process of setting up a serial console in the hope that I can
break to the debugger with that. I'm also trying to provoke the deadlock so
it will happen more frequently.

/Martin



urgent help

2004-12-27 Thread kalin mintchev
PLEASE REPLY TO [EMAIL PROTECTED]

upgraded from 4.6 to 4.10-RELEASE

network programs are crashing the new system: netstat, ping, the qmail tcp
server, all of them...
sshd is running but when accessing from outside it panics too...  what is it?

can i turn something off in the kernel?!






Re: urgent help

2004-12-27 Thread kalin mintchev

 PLEASE REPLY TO [EMAIL PROTECTED]

 upgraded from 4.6 to 4.10-RELEASE

 network programs are crashing the new system: netstat, ping, the qmail tcp
 server, all of them...
 sshd is running but when accessing from outside it panics too...  what is
 it?

 can i turn something off in the kernel?!






Re: urgent help

2004-12-27 Thread Bill Moran
kalin mintchev [EMAIL PROTECTED] wrote:

 PLEASE REPLY TO [EMAIL PROTECTED]
 
 upgraded from 4.6 to 4.10-RELEASE
 
 network programs are crashing the new system: netstat, ping, the qmail tcp
 server, all of them...
 sshd is running but when accessing from outside it panics too...  what is it?
 
 can i turn something off in the kernel?!

What process did you follow to update?  It sounds to me like you didn't
complete the upgrade process, skipped a step, or did it improperly.

There's no reason I can think of that upgrading should cause things to
panic, unless you did the upgrade process improperly.

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: urgent help

2004-12-27 Thread kalin mintchev
PLEASE REPLY TO [EMAIL PROTECTED]


thank you Bill for replying...

well i did it a few times with the same success. it's not the first time
i'm doing it. it's the first time with the 4.x..

i followed the handbook step by step - rebuilt devs too..  and then
cleaned up obj.. to make it all again - the same problems were happening
after every try...

the machine would come up. then netstat or ping or ssh would crash it... the
first time i had to add the sshd user and group...

i mostly installed the new etc files except the passwd, group and hosts...

i have a copy of the old etc...

what else do i need?


 kalin mintchev [EMAIL PROTECTED] wrote:

 PLEASE REPLY TO [EMAIL PROTECTED]

 upgraded from 4.6 to 4.10-RELEASE

 network programs are crashing the new system: netstat, ping, the qmail
 tcp server, all of them...
 sshd is running but when accessing from outside it panics too...  what
 is it?

 can i turn something off in the kernel?!

 What process did you follow to update?  It sounds to me like you didn't
 complete the upgrade process, skipped a step, or did it improperly.

 There's no reason I can think of that upgrading should cause things to
 panic, unless you did the upgrade process improperly.

 --
 Bill Moran
 Potential Technologies
 http://www.potentialtech.com





Re: urgent help

2004-12-27 Thread kalin mintchev
PLEASE REPLY TO [EMAIL PROTECTED]


 On Mon, Dec 27, 2004 at 02:40:34PM +0100, Andreas Widerøe Andersen typed:
 At 09:35 27.12.2004, you wrote:
  PLEASE REPLY TO [EMAIL PROTECTED]
 
  upgraded from 4.6 to 4.10-RELEASE
 
  network programs are crashing the new system: netstat, ping, the
  qmail tcp server, all of them...
  sshd is running but when accessing from outside it panics too...
  what is it?
 
  can i turn something off in the kernel?!
 Did you make world in addition to recompiling the kernel? Sounds like
 your system is out of sync.
 Here's a note about how I did it a while back:
 http://home.eunet.no/~awand/freebsd-4.6_installasjon.txt (it's in
 Norwegian, but all commands and order should be understandable).

how do i make it in sync?!

i did buildworld first - as it's in the handbook. i've done 5.x five
before without a problem...

this is for a mailserver in production...


 From this document I understand you do a make buildkernel before you
 do a make buildworld. That's not the recommended order. Build world
 before you build kernel.






Re: ggated, dvd+rw, atapicam problem

2004-12-27 Thread Pawel Jakub Dawidek
On Wed, Nov 24, 2004 at 09:25:28PM -0600, Vulpes Velox wrote:
+ Just hit an odd problem... here is what I am doing... I have a dvd+rw
+ drive that I am trying to export using ggated... and something
+ is going wrong... does anyone have any idea what is happening?
+ 
+ I think I provided all the possible info, if any one can think of any
+ thing more, please let me know.
+ 
+ 
+ [v42]:/etc# ggatec create 192.168.0.3 /dev/cd0
+ ggate0
+ [v42]:/etc# dvd+rw-mediainfo /dev/ggate0
+ /dev/ggate0: unable to open: Inappropriate ioctl for device
+ 
+ excerpt from dmesg...
+ acd0: DVDR TOSHIBA ODD-DVD SD-R5272/1030 at ata1-master UDMA33
+ cd0 at ata1 bus 0 target 0 lun 0
+ cd0: TOSHIBA ODD-DVD SD-R5272 1030 Removable CD-ROM SCSI-0 device
+ cd0: 33.000MB/s transfers
+ cd0: cd present [338104 x 2048 byte records]
+ 
+ gg.exports...
+ 192.168.0.2 RW  /dev/acd0
+ 192.168.0.2 RW  /dev/acd0t01
+ 192.168.0.2 RW  /dev/cd
+ 
+ uname -a client box...
+ FreeBSD vixen42 5.3-STABLE FreeBSD 5.3-STABLE #2: Wed Nov 24 16:04:21
+ CST 2004 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/vixen42-1  i386
+ 
+ uname -a server box...
+ FreeBSD fennec 5.3-STABLE FreeBSD 5.3-STABLE #0: Wed Nov 10 13:34:27
+ CST 2004 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/fennec-1  i386
+ 
+ 
+ btw the box I am trying to access it from does not have atapicam on it,
+ because atapicam causes this box to hardlock: it has two atapi cd
+ drives on a Promise card, and the system hardlocks if any
+ atapi drives are found hooked up to a Promise controller.

GEOM Gate can send only I/O requests; it cannot forward ioctls.
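
"Inappropriate ioctl for device" is the strerror() text for ENOTTY, which is what a caller gets when it sends an ioctl to something that does not implement it. The sketch below reproduces the effect with a temporary regular file standing in for /dev/ggate0 (an assumption made so the example runs anywhere): plain read and write requests succeed, while a terminal ioctl (TIOCGWINSZ, chosen only because it is widely available) fails with ENOTTY. `probe_ioctl()` is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Stand-in for /dev/ggate0: a plain temporary file, which (like a GEOM Gate
 * device) accepts read/write requests but none of the ioctls a DVD tool
 * would send. Sets *read_ok if a write+read round trip worked and returns
 * the errno from the ioctl (0 if it unexpectedly succeeded, -1 on setup
 * failure).
 */
static int
probe_ioctl(int *read_ok)
{
	char path[] = "/tmp/ggate-demo-XXXXXX";
	int fd = mkstemp(path);
	if (fd < 0)
		return (-1);
	unlink(path);	/* the file disappears once fd is closed */

	/* An ordinary I/O request: this is the kind of thing ggate forwards. */
	char byte = 'x';
	*read_ok = (write(fd, &byte, 1) == 1 &&
	    lseek(fd, 0, SEEK_SET) == 0 &&
	    read(fd, &byte, 1) == 1);

	/* TIOCGWINSZ is a terminal ioctl; a regular file rejects it. */
	struct winsize ws;
	int err = (ioctl(fd, TIOCGWINSZ, &ws) == -1) ? errno : 0;
	close(fd);
	return (err);
}
```

On a real GEOM Gate device the rejected call would be whatever ioctl dvd+rw-mediainfo uses to query the drive; the sketch only illustrates that data requests and ioctls take different paths, and that only the former are forwarded.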

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Questions about GEOM and MIRROR

2004-12-27 Thread Pawel Jakub Dawidek
On Mon, Dec 13, 2004 at 08:20:30PM +0100, Samuel Tardieu wrote:
+ Hi.
+ 
+ I just added two disks (ad4 & ad6, SATA 160 GB) to my FreeBSD box. I want to
+ use them in the following configuration:
+ 
+   - ad4s1 & ad6s1: geom mirror of 80 GB containing all my precious data (/,
+ /usr, /var, /home)
+ 
+   - ad4s2b: swap
+ 
+   - ad4s2x, ad6s2x: non-important data
+ 
+ On the mirror (ad4s1+ad6s1), I created partitions for /, /tmp, /usr,
+ and /var.
+ 
+ Is there any pitfall in doing so? Do I have to be careful to keep extra space
+ somewhere? (such as one sector at the end of ad4s1/ad6s1)
+ 
+ I can't seem to place bootcode at the beginning of the mirror:
+ 
+ # bsdlabel /dev/mirror/precious
+ # /dev/mirror/precious:
+ 8 partitions:
+ #        size   offset    fstype   [fsize bsize bps/cpg]
+   a:    524288       16    4.2BSD     2048 16384 32776
+   c: 167766731        0    unused        0     0         # raw part, don't edit
+   d:   2097152   524304    4.2BSD     2048 16384 28552
+   e:    524288  2621456    4.2BSD     2048 16384 32776
+   f: 164620971  3145744    4.2BSD     2048 16384 28552
+ 
+ # bsdlabel -B /dev/mirror/precious
+ bsdlabel: Geom not found
+ 
+ What does this error mean?

It could mean that there is no such device.
Hard to say, as I can't reproduce it here - 'bsdlabel -B' works for me.
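
The offsets in the quoted label can be sanity-checked mechanically: with partitions sorted by offset, each one should start at or after the end of the previous one, and none should run past the size of c. A small sketch with hypothetical names (`struct part`, `label_is_consistent()`); sizes and offsets are in 512-byte sectors, as bsdlabel prints them:

```c
#include <assert.h>
#include <stddef.h>

/* One bsdlabel entry: size and offset in sectors. */
struct part {
	char letter;
	unsigned long long size, offset;
};

/*
 * Returns 1 if every partition (given in ascending offset order) starts at
 * or after the end of the previous one and ends within the raw 'c' size,
 * 0 otherwise.
 */
static int
label_is_consistent(const struct part *p, size_t n, unsigned long long csize)
{
	unsigned long long end = 0;
	for (size_t i = 0; i < n; i++) {
		if (p[i].offset < end)
			return (0);	/* overlaps the previous partition */
		end = p[i].offset + p[i].size;
		if (end > csize)
			return (0);	/* runs past the end of the provider */
	}
	return (1);
}
```

Fed with a, d, e and f from the label above, the check passes: f ends at sector 3145744 + 164620971 = 167766715, so the last 16 sectors of c are not covered by any partition.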

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: mbuf leak in bpf.c

2004-12-27 Thread Pawel Jakub Dawidek
On Mon, Dec 27, 2004 at 12:24:49PM +, Johnny Eriksson wrote:
+ If one tries to write a datagram to a bpf device, and the datagram is
+ longer than the MTU on the physical interface, the write fails as it
+ should, but an mbuf is allocated and thrown away.  Proposed solution:

Committed to HEAD, thanks!

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: acpi boot error messages after last update (Dec 22nd)

2004-12-27 Thread Federico Galvez-Durand Besnard
Sorry, I was out for Xmas. A longer dmesg is here:
http://www.del.ufrj.br/~fico/FreeBSD/debug/dmesg03
Apparently, my notebook works well (ACPI doesn't).
I did not have these error messages before
the last big ACPI update.
Before that update, dmesg showed that ACPI was doing something
(I was debugging USB, so this dmesg was recorded):
http://www.del.ufrj.br/~fico/FreeBSD/debug/dmesg01
Thanks!
p.s.:
uname -a
FreeBSD me.HERE 5.3-STABLE FreeBSD 5.3-STABLE #20: Wed Dec 22 20:31:43 
GMT-1 2004  [EMAIL PROTECTED]  i386

Lowell Gilbert wrote:
Federico Galvez-Durand Besnard [EMAIL PROTECTED] writes:
 

Hi, just compiled a new kernel with the latest ACPI code.
I am getting boot error messages.
I set hint.acpi.0.disabled=1 in device.hints.
With acpi disabled I get this:
( partial dmesg )
...
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
unknown: PNP0c01 can't assign resources (memory)
unknown: PNP0100 can't assign resources (irq)
unknown: PNP0303 can't assign resources (port)
unknown: PNP0f13 can't assign resources (irq)
unknown: PNP0c02 can't assign resources (port)
unknown: PNP0700 can't assign resources (port)
unknown: PNP0401 can't assign resources (port)
Timecounter TSC frequency 646825914 Hz quality 800
Timecounters tick every 10.000 msec
...
   

I don't see any error messages there.
What is the actual problem?


Spinlock errors.

2004-12-27 Thread Yann Golanski
I'm getting a few errors since I tried to upgrade gtk2 and librsvg,
namely:

  Fatal error 'Spinlock called when not threaded.' at line 83 in file
  /usr/src/lib/libpthread/thread/thr_spinlock.c (errno = 0)
  gmake[3]: *** [install-data-hook] Error 134
  gmake[3]: Leaving directory
`/usr/ports/graphics/librsvg2/work/librsvg-2.8.1/gdk-pixbuf-loader'

Has anyone got any ideas as to how to fix it?   I suspect some mix-up of some
core library but I am not sure which one...

# gmake --version
GNU Make 3.80
Copyright (C) 2002  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

# uname -a
FreeBSD gridlinked.neverness.org 5.3-STABLE FreeBSD 5.3-STABLE #1: Fri
Dec  3 13:53:15 GMT 2004
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GRIDLINKED  i386

-- 
[EMAIL PROTECTED]  -=*=-  www.kierun.org
PGP:   009D 7287 C4A7 FD4F 1680  06E4 F751 7006 9DE2 6318




Re: Spinlock errors.

2004-12-27 Thread Kris Kennaway
On Mon, Dec 27, 2004 at 06:48:36PM +, Yann Golanski wrote:
 I'm getting a few errors since I tried to upgrade gtk2 and librsvg,
 namely:
 
   Fatal error 'Spinlock called when not threaded.' at line 83 in file
   /usr/src/lib/libpthread/thread/thr_spinlock.c (errno = 0)
   gmake[3]: *** [install-data-hook] Error 134
   gmake[3]: Leaving directory
 `/usr/ports/graphics/librsvg2/work/librsvg-2.8.1/gdk-pixbuf-loader'
 
 Has anyone got any ideas as to how to fix it?   I suspect some mix-up of some
 core library but I am not sure which one...

See the mailing list archives and UPDATING for extensive discussion of
this issue.

Kris




amd64/FreeBSD-5.3-STABLE (or RELEASE) crashes often

2004-12-27 Thread Troy Bowman
And it doesn't dump its core to its dump swap space, either, so I can't run
savecore after reboot to get debugging info.  I have the swap space in
fstab commented out so that it isn't activated at boot and I can manually
harvest the core, but savecore reports "savecore: no dumps found" (it doesn't
happen automatically, either).

We recently thought we'd give 5.3 a go in production, and it has been
too unstable.   When it crashes, it doesn't reboot, so it just hangs
there until someone has to drive in and push the button.  Who knows,
maybe Linux would be more stable at this point.  Sigh.

Hardware that it is running on is a Tyan s2875 with dual amd64/246
processors, and 2 GB Registered DDR RAM (Corsair).  We're also running
vinum for all of the filesystems, mirroring them all, including the root
filesystem.  The vinum is using two SATA WD Raptors.  I have one older
IDE drive plugged in to capture the kernel dumps.  

We've tried many different memory configurations to see if we can tune
it so that FreeBSD can handle it (DRAM ECC vs master ECC, bank & node
interleaving turned off/on, slowing the memory down, DRAM Scrub Redirect
off/on, etc.), to no avail.

It's usually pagedaemon that croaks, but it crashes on the keyboard irq
process and serial IO irq process for some reason also.  I guess since
it's usually the pager that dies, that's the reason why I can't get
kernel dumps.  Here are some (manually copied) panics from the console.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x88
fault code  = supervisor read, page not present
instruction pointer = 0x8:0x80389aea
stack pointer   = 0x10:0xb2051a60
frame pointer   = 0x10:0xff006b12d000
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 53 (pagedaemon)
trap number = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 10h18m49s

...

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x88
fault code  = supervisor read, page not present
instruction pointer = 0x8:0x8038a10a
frame pointer   = 0x10:0xb2051ab0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 53 (pagedaemon)
trap number = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 15h59m55s

...
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 36 (swi5: clock sio)
trap number = 12
panic: page fault
cpuid = 1
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code  = supervisor read, page not present
instruction pointer = 0x8: 0x803a40d3
stack pointer   = 0x10: 0xb1d63650
frame pointer   = 0x10: 0xff007b7f3a40
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0,pres 1, long 1, def32 0, gran 1
processor eflags= resume, IOPL = 0
current process = 30
trap number = 12
panic: page fault
cpuid = 1
spin lock sched lock held by 0xff007b8177b0 for  5 seconds

...
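
A fault virtual address as small as 0x88 (or 0x48 in the third panic), reported as "supervisor read, page not present", usually points to a NULL pointer dereference: the kernel read a structure member through a pointer that was NULL, so the faulting address is just that member's offset. The sketch below shows the arithmetic with an invented struct; `struct hypothetical_obj` and its layout are assumptions for illustration only and do not identify the structure involved in these panics, which needs a backtrace from the debugger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical structure: the layout is invented purely to show how a
 * member can sit at offset 0x88 from the start of its struct.
 */
struct hypothetical_obj {
	char header[0x88];	/* 136 bytes of earlier members */
	void *field;		/* this member lands at offset 0x88 */
};

/*
 * If code reads ptr->field while ptr is NULL, the CPU accesses address
 * 0 + offsetof(struct hypothetical_obj, field), i.e. 0x88, which is the
 * value a panic message would print as the fault virtual address.
 */
static uintptr_t
fault_address_for_null_base(void)
{
	return ((uintptr_t)offsetof(struct hypothetical_obj, field));
}
```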


What can I do to debug this more if I can't harvest the kernel dumps to
report a bug?  Is there anything the FreeBSD team can do?   Do I need to
resort to Linux for dual amd64 support for now? cringe

Thanks,

../troy




Re: amd64/FreeBSD-5.3-STABLE (or RELEASE) crashes often

2004-12-27 Thread Kris Kennaway
On Mon, Dec 27, 2004 at 02:52:45PM -0700, Troy Bowman wrote:

 And it doesn't dump its core to its dump swap space, too, so I can't run
 savecore after reboot to get debugging info.  I have the swap space in
 fstab commented out so it won't come up at boot to be able to manually
 harvest the core, as it gives savecore: no dumps found.  (it doesn't
 happen automatically, either). 

Please double-check that you're running 'dumpon'.  If you don't
configure swap at boot time, it won't be run automatically by the boot
scripts.

 Hardware that it is running on is a Tyan s2875 with dual amd64/246
 processors, and 2 GB Registered DDR RAM (Corsair).  We're also running
 vinum for all of the filesystems, mirroring them all, including the root
 filesystem.  The vinum is using two SATA WD Raptors.  I have one older
 IDE drive plugged in to capture the kernel dumps.  

There's an erratum for vinum in 5.3.

 What can I do to debug this more if I can't harvest the kernel dumps to
 report a bug?

See the chapter on kernel debugging in the developers' handbook,
available on the website.  It takes you through how to configure your
kernel with support for the debugger, and how to obtain minimal
information from it when you encounter a panic.

You might like to first update to FreeBSD 5.3-STABLE in case the bugs
are already fixed.

 Is there anything the FreeBSD team can do?

Perhaps, once you have the above information.  Also report it to the
freebsd-amd64 mailing list.

Kris



Re: slow system freeze - data

2004-12-27 Thread Peter Jeremy
On Sun, 2004-Dec-26 08:14:49 +0100, Benjamin Lutz wrote:
The freeze just happened again. I managed to get into the debugger and get 
some info.

The info you dumped shows that there's a filesystem deadlock on
ad4s1f.  This is consistent with the behaviour you reported - the
system is running normally but as soon as a process tries to access
that filesystem, it freezes.  Eventually, all processes are
frozen.

Unfortunately, it's not clear (to me) where to go next.  Printing the
locked vnodes might help but that's not easy to do without gdb.

The first app that froze as far as I could tell was xmms.

Actually, the locks suggest that the problem started with pid 678 - kdeinit.
This is unlikely to be 

-- 
Peter Jeremy


Re: slow system freeze - data

2004-12-27 Thread Benjamin Lutz
Hello Peter,

 The info you dumped shows that there's a filesystem deadlock on
 ad4s1f.

In case you haven't guessed, that'd be my /usr.

 Unfortunately, it's not clear (to me) where to go next.  Printing the
 locked vnodes might help but that's not easy to do without gdb.

You mean that's the point where I need serial console access? I hope to 
have that running after the holidays.

 The first app that froze as far as I could tell was xmms.

 Actually, the locks suggest that the problem started with pid 678 -
 kdeinit. This is unlikely to be

Well, xmms is just the first app where it became apparent :)
PID 678 is really kded (at least it is at the moment; it very likely
was then too, since these low PIDs generally seem to be assigned the
same way on each boot). kded appears to be some CORBA-related tool used
by KDE.

Btw, is my assumption that this is a kernel problem, not a problem with 
any of my applications, correct?

Anyway, many thanks for your help and insight so far, it is appreciated.

Greetings
Benjamin





Re: TIMEOUT - WRITE_DMA - A possible FIX! turn off ACPI

2004-12-27 Thread Joe Koberg
Zsolt Kúti wrote:

My system produces these messages that I already know well from this
list (as well ;):
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=213249674

Like many people I was confronted with TIMEOUT - READ_DMA
and TIMEOUT - WRITE_DMA errors on my drives. I was frustrated.
But I found a workaround: turning off ACPI.

I just received a Highpoint RocketRaid 1640 controller,
2 Maxtor 300GB drives, and a Supermicro 5-drive SATA cage.
I am testing this configuration for a storage server.

I am using an old motherboard, DTK brand, Slot 1, 300A Celeron.

Under a fresh install of 5.3-RELEASE I am unable to read or write
both drives heavily at the same time.  One drive alone seems to work
OK. When I run dd blasting both drives with sequential IO, I get
TIMEOUT - WRITE(READ)_DMA, repeatably, within 15 seconds.

However, I got a good test before I installed 5.3-R, when the box was
running 5.3-BETA. The only difference was that I booted without ACPI.

So I rebooted the freshly installed 5.3-R without ACPI, and it works!
I can read at 50MB/s per drive concurrently (hitting PCI bus speed
limit?), and write at 30MB/s per drive concurrently. No errors so
far, and it's been dd'ing for half an hour.

I hope this report helps someone!
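
The "hitting PCI bus speed limit?" guess can be checked with back-of-the-envelope arithmetic. Assuming the controller sits on the board's conventional 32-bit, 33 MHz PCI bus (the 440LX host bridge shown in the dmesg), the theoretical peak is about 33.33 MHz × 4 bytes ≈ 133 MB/s. The helper names below are hypothetical.

```c
#include <assert.h>

/*
 * Theoretical peak of a parallel PCI bus: clock (MHz) times bus width
 * (bytes per clock) gives MB/s. For the assumed 32-bit, 33 MHz bus this
 * is about 133 MB/s; real throughput is lower because of arbitration and
 * protocol overhead.
 */
static double
pci_peak_mb_per_s(double clock_mhz, int bus_bytes)
{
	return (clock_mhz * bus_bytes);
}

/* Aggregate rate of `ndrives` drives each streaming `per_drive` MB/s. */
static double
aggregate_mb_per_s(int ndrives, double per_drive)
{
	return (ndrives * per_drive);
}
```

Two drives reading at 50 MB/s each come to 100 MB/s, roughly 75% of that theoretical peak. Since sustained PCI throughput is normally well below the peak, the question mark in the report looks reasonable, though this arithmetic alone cannot prove the bus is the bottleneck.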

Joe Koberg
joe at osoft dot us


dmesg:
FreeBSD 5.3-RELEASE #0: Fri Nov  5 04:19:18 UTC 2004
   [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Pentium II/Pentium II Xeon/Celeron (307.84-MHz 686-class CPU)
 Origin = GenuineIntel  Id = 0x660  Stepping = 0
 
Features=0x183f9ffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR
real memory  = 402587648 (383 MB)
avail memory = 384270336 (366 MB)
npx0: [FAST]
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: Intel 82443LX (440 LX) host to PCI bridge pcibus 0 on motherboard
pir0: PCI Interrupt Routing Table: 7 Entries on motherboard
pci0: PCI bus on pcib0
agp0: Intel 82443LX (440 LX) host to PCI bridge mem 
0xe000-0xe3ff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX4 UDMA33 controller port 
0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
uhci0: Intel 82371AB/EB (PIIX4) USB controller port 0xb000-0xb01f irq 
10 at device 7.2 on pci0
uhci0: [GIANT-LOCKED]
usb0: Intel 82371AB/EB (PIIX4) USB controller on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ums0: Microsoft Microsoft 5-Button Mouse with IntelliEye(TM), rev 
1.10/3.00, addr 2, iclass 3/1
ums0: 5 buttons and Z dir.
pci0: bridge, PCI-unknown at device 7.3 (no driver attached)
atapci1: HighPoint HPT374 (channel 0+1) UDMA133 controller port 
0xc400-0xc4ff,0xc000-0xc003,0xbc00-0xbc07,0xb800-0xb803,0xb400-0xb407 
irq 11 at device 17.0 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
atapci2: HighPoint HPT374 (channel 2+3) UDMA133 controller port 
0xd800-0xd8ff,0xd400-0xd403,0xd000-0xd007,0xcc00-0xcc03,0xc800-0xc807 
irq 11 at device 17.1 on pci0
ata4: channel #0 on atapci2
ata5: channel #1 on atapci2
dc0: ADMtek AN985 10/100BaseTX port 0xdc00-0xdcff mem 
0xec00-0xec0003ff irq 12 at device 18.0 on pci0
miibus0: MII bus on dc0
ukphy0: Generic IEEE 802.3u media interface on miibus0
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc0: Ethernet address: 00:04:5a:56:80:76
dc0: if_start running deferred for Giant
dc0: [GIANT-LOCKED]
pci0: multimedia, audio at device 19.0 (no driver attached)
cpu0 on motherboard
orm0: ISA Option ROMs at iomem 0xcc000-0xcdfff,0xc-0xc8fff on isa0
pmtimer0 on isa0
atkbdc0: Keyboard controller (i8042) at port 0x64,0x60 on isa0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
fdc0: Enhanced floppy controller at port 0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: [FAST]
fd0: 1440-KB 3.5 drive on fdc0 drive 0
ppc0: Parallel port at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: Parallel port bus on ppc0
plip0: PLIP network interface on ppbus0
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
unknown: PNP0303 can't assign resources (port)
unknown: PNP0c02 can't assign resources (memory)
unknown: PNP0a03 can't assign resources (port)
unknown: PNP0501 can't assign resources (port)
unknown: PNP0501 can't assign resources (port)
unknown: PNP0700 can't assign resources (port)
unknown: 

Re: slow system freeze - data

2004-12-27 Thread Kris Kennaway
On Tue, Dec 28, 2004 at 12:38:44PM +1100, Peter Jeremy wrote:
 On Sun, 2004-Dec-26 08:14:49 +0100, Benjamin Lutz wrote:
 The freeze just happened again. I managed to get into the debugger and get 
 some info.
 
 The info you dumped shows that there's a filesystem deadlock on
 ad4s1f.  This is consistent with the behaviour you reported - the
 system is running normally but as soon as a process tries to access
 that filesystem, it freezes.  Eventually, all processes are
 frozen.
 
 Unfortunately, it's not clear (to me) where to go next.  Printing the
 locked vnodes might help but that's not easy to do without gdb.
 
 The first app that froze as far as I could tell was xmms.
 
 Actually, the locks suggest that the problem started with pid 678 - kdeinit.
 This is unlikely to be 

Does xmms try to run with rtprio or idprio?  Those are still broken,
and can lead to deadlocks, afaik.

Kris




Re: TIMEOUT - WRITE_DMA - A possible FIX! turn off ACPI

2004-12-27 Thread whitevamp

- Original Message - 
From: Joe Koberg [EMAIL PROTECTED]
To: Zsolt Kúti [EMAIL PROTECTED]
Cc: freebsd-current@freebsd.org; freebsd-stable@freebsd.org
Sent: Monday, December 27, 2004 6:29 PM
Subject: Re: TIMEOUT - WRITE_DMA - A possible FIX! turn off ACPI


 Zsolt Kúti wrote:

 My system produces these messages that I already know well from this
 list (as well ;):
 ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=213249674
 
 
 Like many people I was confronted with TIMEOUT - READ_DMA
 and TIMEOUT - WRITE_DMA errors on my drives. I was frustrated.
 But I found a workaround: Turning off ACPI.

 I just received a Highpoint RocketRaid 1640 controller,
 2 Maxtor 300GB drives, and a Supermicro 5-drive SATA cage.
 I am testing this configuration for a storage server.

 I am using an old motherboard, DTK brand, Slot 1. 300A Celeron.

 Under a fresh install of 5.3-RELEASE I am unable to read or write
 both drives heavily at the same time.  One drive alone seems to work
 OK. When I run dd blasting both drives with sequential IO, I get
 TIMEOUT - WRITE(READ)_DMA, repeatably, within 15 seconds.

 However, I got a good test before I installed 5.3-R, when the box was
 running 5.3-BETA. The only difference was that I booted without ACPI.

 So I rebooted the freshly installed 5.3-R without ACPI, and it works!
 I can read at 50MB/s per drive concurrently (hitting PCI bus speed
 limit?), and write at 30MB/s per drive concurrently. No errors so
 far, and it's been dd'ing for half an hour.

 I hope this report helps someone!



 Joe Koberg
 joe at osoft dot us

I too have been seeing this error since 4.9 with my Western Digital 80 GB HD.
The error message has changed a little between the two versions, but I do have
hint.acpi.0.disabled=1 in device.hints, and I still see the
error messages. Anyway, I just wanted to post and let everyone know
that turning off ACPI might not work for you.
Oh, and off-topic: I had ACPI turned off because my network cards
wouldn't work with it on.









Re: slow system freeze - data

2004-12-27 Thread Benjamin Lutz
 Does xmms try to run with rtprio or idprio?  Those are still broken,
 and can lead to deadlocks, afaik.

No; all PIDs are listed as normal by idprio and rtprio, except the
[pagezero] process, which is listed as idle priority 31 by both
programs, and I suppose that's intentional.

Greetings
Benjamin




Re: slow system freeze - data

2004-12-27 Thread Kris Kennaway
On Tue, Dec 28, 2004 at 03:52:50AM +0100, Benjamin Lutz wrote:
  Does xmms try to run with rtprio or idprio?  Those are still broken,
  and can lead to deadlocks, afaik.
 
 No; all PIDs are listed as normal by idprio and rtprio, except the
 [pagezero] process, which is listed as idle priority 31 by both
 programs, and I suppose that's intentional.

You might have already mentioned this, but there aren't any other
messages being logged on the system console or in syslog, are there?
If your drive is failing this will lead to the above symptoms as well.

Kris



