Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-20 Thread Maciej Wierzbicki

Gary Mulder wrote on 2005-07-18 23:39:

> From personal experience I can repeat what Matt has stated. It seems to be
> related to what NIC you have. I have had crashes with fxp (Intel Pro
> 100MBit) and bge (Broadcom Gigabit) NICs under moderate network load.

It seems not. I had crashes with fxp, xl, bge and em, IIRC.

> Removing ipf reduced but did not eliminate the crashes.

Removing IPF eliminated crashes on every one of my SMP boxes.

--
*   Maciej Wierzbicki * At paranoia's poison door  *
*   VOO1-RIPE   VOO1-6BONE   *
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-20 Thread Kris Kennaway
On Wed, Jul 20, 2005 at 03:57:57PM +0200, Maciej Wierzbicki wrote:
> Gary Mulder wrote on 2005-07-18 23:39:
>
> > From personal experience I can repeat what Matt has stated. It seems to be
> > related to what NIC you have. I have had crashes with fxp (Intel Pro
> > 100MBit) and bge (Broadcom Gigabit) NICs under moderate network load.
>
> It seems not. I had crashes with fxp, xl, bge and em, IIRC.
>
> > Removing ipf reduced but did not eliminate the crashes.
>
> Removing IPF eliminated crashes on every one of my SMP boxes.

Folks, you're talking about different things:

* Panics with IPF enabled on SMP and any network card (network card is
not relevant for this problem, which is IPF).  This problem is
understood, and the only current solution is 'don't use both of SMP
and IPF'.

* What Gary is talking about, which are apparently panics without IPF
enabled on several NICs.  Since this is a new problem, Gary needs to
do some additional diagnosis work so that someone can investigate
them.

Let's try to keep the issue clear :-)

Kris




Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-20 Thread Kris Kennaway
On Mon, Jul 18, 2005 at 05:39:40PM -0400, Gary Mulder wrote:
 On Mon, 18 Jul 2005, Pawel Malachowski wrote:
 
  On Mon, Jul 18, 2005 at 04:09:58PM -0400, Matt Juszczak wrote:
  
   Correct.  IPF is unstable with our SMP (most of the time) - based 5.x 
   boxes.  VERY unstable.  VERY VERY unstable.
  
  Hm, this sounds bad. What is debug.mpsafenet set to? How big is traffic?
  
  I have one SMP box with ipnat, routing some megabits (even during night
  it's more than 30-40Mbps) without problems, however, ipnat is used only
  for very small group of hosts right now.
  But we plan to use ipnat more heavily so it sounds a bit scary. ;)
  
 From personal experience I can repeat what Matt has stated. It seems to be 
 related to what NIC you have. I have had crashes with fxp (Intel Pro 
 100MBit) and bge (Broadcom Gigabit) NICs under moderate network load. 
 Removing ipf reduced but did not eliminate the crashes. debug.mpsafe also 
 reduced but did not eliminate the crashes.

No, that's different then.  Please report your bugs in the usual way
(gdb traceback, etc).

Kris




Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-19 Thread Vivek Khera


On Jul 18, 2005, at 5:39 PM, Gary Mulder wrote:

> Another person on the freebsd-amd64 list reported similar network-related
> crashes until he switched to em (Intel Gigabit Ethernet) NICs.

that was probably me... but I don't have any firewall on these boxes
as they are not hooked up to the internet -- just internal back-end
DB servers.


Vivek Khera, Ph.D.
+1-301-869-4449 x806




Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-18 Thread Gary Mulder
On Mon, 18 Jul 2005, Pawel Malachowski wrote:

 On Mon, Jul 18, 2005 at 04:09:58PM -0400, Matt Juszczak wrote:
 
  Correct.  IPF is unstable with our SMP (most of the time) - based 5.x 
  boxes.  VERY unstable.  VERY VERY unstable.
 
 Hm, this sounds bad. What is debug.mpsafenet set to? How big is traffic?
 
 I have one SMP box with ipnat, routing some megabits (even during night
 it's more than 30-40Mbps) without problems, however, ipnat is used only
 for very small group of hosts right now.
 But we plan to use ipnat more heavily so it sounds a bit scary. ;)
 
From personal experience I can repeat what Matt has stated. It seems to be 
related to what NIC you have. I have had crashes with fxp (Intel Pro 
100MBit) and bge (Broadcom Gigabit) NICs under moderate network load. 
Removing ipf reduced but did not eliminate the crashes. Changing
debug.mpsafenet also reduced but did not eliminate the crashes.
 
Another person on the freebsd-amd64 list reported similar network-related 
crashes until he switched to em (Intel Gigabit Ethernet) NICs.
 
Gary




Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-18 Thread Matt Juszczak
> For me, 5 days up time after switching from IPF to PF. Before the switch a
> couple of hours of uptime was the maximum. Seems like the crashes are caused
> by ipfilter.

Still same for me :)  Uptime almost 20 days now after switching to PF.


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-18 Thread Pawel Malachowski
On Mon, Jul 18, 2005 at 04:09:58PM -0400, Matt Juszczak wrote:

 Correct.  IPF is unstable with our SMP (most of the time) - based 5.x 
 boxes.  VERY unstable.  VERY VERY unstable.

Hm, this sounds bad. What is debug.mpsafenet set to? How big is traffic?

I have one SMP box with ipnat, routing some megabits (even during night
it's more than 30-40Mbps) without problems, however, ipnat is used only
for very small group of hosts right now.
But we plan to use ipnat more heavily so it sounds a bit scary. ;)


-- 
Paweł Małachowski


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-18 Thread Matt Juszczak

> I find these messages kind of weird. Are you saying your servers only run long
> periods of uptime with pf and *not* with ipf? I run a server and almost never
> put it down. IPF performs very well, including a lot of natting for my home
> network.

Correct.  IPF is unstable with our SMP (most of the time) - based 5.x
boxes.  VERY unstable.  VERY VERY unstable.


-Matt


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-18 Thread dick hoogendijk
On Mon, 18 Jul 2005 14:32:09 -0400 (EDT)
Matt Juszczak [EMAIL PROTECTED] wrote:

> > For me, 5 days up time after switching from IPF to PF. Before the switch a
> > couple of hours of uptime was the maximum. Seems like the crashes are
> > caused by ipfilter.
>
> Still same for me :)  Uptime almost 20 days now after switching to PF.

I find these messages kind of weird. Are you saying your servers only run long 
periods of uptime with pf and *not* with ipf? I run a server and almost never 
put it down. IPF performs very well, including a lot of natting for my home 
network.

-- 
dick -- http://nagual.st/ -- PGP/GnuPG key: F86289CE
++ Running FreeBSD 4.11-stable ++ FreeBSD 5.4
+ Nai tiruvantel ar vayuvantel i Valar tielyanna nu vilja


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-17 Thread Blaz Zupan

On Tue, 12 Jul 2005, Matt Juszczak wrote:

> So far a 13 day up time after switching from IPF to PF.  If that's not the
> problem, I hope I find it soon considering this is a production server ...
> but it seems to be more stable.

For me, 5 days up time after switching from IPF to PF. Before the switch a
couple of hours of uptime was the maximum. Seems like the crashes are caused
by ipfilter.



Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-12 Thread Blaz Zupan

> Could you try an SMP kernel without IPF support and without using the IPF module?
> Could you confirm that your SMP kernel does not crash when you do not use
> IPF?


Interesting that the box has survived almost two days now, while it was always 
crashing after at least 8 hours. Anyway, I have compiled a new kernel without 
ipfilter, I have used pf instead (the configuration changes from ipfilter to 
pf were mostly minor). We'll see how long the box survives now.
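
For anyone following the same route, the ipfilter-to-pf swap described
above comes down to a few kernel config and rc.conf lines. This is only a
sketch, not the actual change made on this box: the IPFILTER options are
the ones in the DL380 kernel config posted elsewhere in this thread, while
the pf/pflog device lines, the rc.conf knobs and the /etc/pf.conf path are
assumptions based on stock FreeBSD 5.x defaults.

```
# Kernel config: drop ipfilter ...
#options        IPFILTER
#options        IPFILTER_LOG
# ... and build pf in instead (assumed stock 5.x device names)
device          pf
device          pflog

# /etc/rc.conf (assumed default knobs and rules path)
pf_enable="YES"
pf_rules="/etc/pf.conf"
pflog_enable="YES"
```

The rules themselves still have to be translated by hand from ipf.conf
syntax to pf.conf syntax; nothing above does that for you.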


Blaz Zupan,  Medinet d.o.o, Trzaska 85, SI-2000 Maribor, Slovenia
E-mail: [EMAIL PROTECTED], Tel: +386 2 320 6320, Fax: +386 2 320 6325


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-12 Thread Maciej Wierzbicki

Blaz Zupan wrote on 2005-07-12 13:17:

> Interesting that the box has survived almost two days now, while it was always
> crashing after at least 8 hours. Anyway, I have compiled a new kernel without
> ipfilter, I have used pf instead (the configuration changes from ipfilter to
> pf were mostly minor). We'll see how long the box survives now.

Please read the thread titled "Two Options: which to choose?" (2005-06-30 by
Matt Juszczak, freebsd-stable), especially Max Laier's answer to my mail
in that thread.


--
*   Maciej Wierzbicki * At paranoia's poison door  *
*   VOO1-RIPE   VOO1-6BONE   *


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-12 Thread Matt Juszczak
> Yes, there is absolutely no difference. Disabled HTT in the BIOS and in
> FreeBSD, the box still crashes.

Matt again :)

So far a 13 day up time after switching from IPF to PF.  If that's not the
problem, I hope I find it soon considering this is a production server ...
but it seems to be more stable.


*Knock On Wood*

-Matt


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-10 Thread Blaz Zupan
In order for this problem to not get lost on the freebsd-stable mailing list, 
I have opened a PR:


http://www.freebsd.org/cgi/query-pr.cgi?pr=83220


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-10 Thread Maciej Wierzbicki
On Sun, Jul 10, 2005 at 04:58:08PM +0200, Blaz Zupan wrote:

> In order for this problem to not get lost on the freebsd-stable mailing
> list, I have opened a PR:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=83220

Could you try an SMP kernel without IPF support and without using the IPF module?
Could you confirm that your SMP kernel does not crash when you do not use
IPF?

-- 
*   Maciej Wierzbicki * At paranoia's poison door  *
*   VOO1-RIPE   VOO1-6BONE   *


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Blaz Zupan

On Fri, 1 Jul 2005, Kris Kennaway wrote:

> On Tue, Jun 28, 2005 at 11:26:06AM -0400, Matt Juszczak wrote:
>
> > After CPUID: 1, the machine locks cold and nothing else is printed to
> > the screen.
>
> Try two things:
>
> 1) adding 'options KDB_STOP_NMI' to your kernel config.
>
> I just learned that you also need to set the
> debug.kdb.stop_cpus_with_nmi=1 sysctl (e.g. in sysctl.conf).


I'm experiencing the same crashes as Matt, but on 5.4-RELEASE-p3. The machine 
is a HP DL380 G3 and it is heavily loaded (postfix mail server running 
amavisd-new with antivirus and antispam, so it has heavy IO and CPU load). It 
does not survive more than a couple of hours, while it is rock stable running 
4.11. We have four machines like this, three of them are now again running 
4.11 and we left the fourth one at 5.4. We have two other DL380 servers 
working on our outbound mail queue, but they are not SMP and they are rock 
stable on 5.4.


Without KDB_STOP_NMI, the machine was basically stuck after a crash.

Now I've finally landed in the kernel debugger and I have a trace from DDB and 
have also been able to generate a crashdump with call doadump.
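
For reference, the DDB side of this is short. The sketch below shows the
commands typed at the db> prompt plus the rc.conf settings that let
savecore(8) pick the dump up on the next boot. The dump device name is
hypothetical and the rc.conf knobs are assumed 5.x defaults; use whatever
swap partition your box really has.

```
db> trace
db> call doadump()
db> reset

# /etc/rc.conf (assumed knobs; the swap device name is hypothetical)
dumpdev="/dev/da0s1b"
dumpdir="/var/crash"
```

The dump device must be at least as large as physical memory, which is why
the vmcore from this 1 GB box comes out at about 1 GB.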


If a developer is willing to investigate, I have:
- the vmcore file from the crash (its size is 1GB)
- the corresponding kernel, compiled with debug symbols
- a GIF of the console at the time of the crash with the backtrace at the time
  of crash
- a dmesg from the box (see below)
- the kernel config file

Please contact me if you want to investigate this further.

Just in case, here is a dmesg from the box:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p3 #0: Tue Jul  5 18:37:15 CEST 2005
[EMAIL PROTECTED]:/usr/obj/usr/src5/sys/DL380
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3049.93-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0xf29  Stepping = 9
  
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 1073717248 (1023 MB)
avail memory = 1045372928 (996 MB)
ACPI APIC Table: COMPAQ 0083
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 Version 1.1 irqs 0-15 on motherboard
ioapic1 Version 1.1 irqs 16-31 on motherboard
ioapic2 Version 1.1 irqs 32-47 on motherboard
ioapic3 Version 1.1 irqs 48-63 on motherboard
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: COMPAQ P29 on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-safe frequency 3579545 Hz quality 1000
acpi_timer0: 32-bit timer at 3.579545MHz port 0x920-0x923 on acpi0
cpu0: ACPI CPU on acpi0
cpu1: ACPI CPU on acpi0
cpu2: ACPI CPU on acpi0
cpu3: ACPI CPU on acpi0
pcib0: ACPI Host-PCI bridge on acpi0
pci0: ACPI PCI bus on pcib0
pci0: display, VGA at device 3.0 (no driver attached)
pci0: base peripheral at device 4.0 (no driver attached)
pci0: base peripheral at device 4.2 (no driver attached)
isab0: PCI-ISA bridge at device 15.0 on pci0
isa0: ISA bus on isab0
atapci0: ServerWorks CSB5 UDMA100 controller port 
0x2000-0x200f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 15.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
ohci0: OHCI (generic) USB controller mem 0xf5ef-0xf5ef0fff irq 7 at 
device 15.2 on pci0
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: OHCI (generic) USB controller on ohci0
usb0: USB revision 1.0
uhub0: (0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
pcib1: ACPI Host-PCI bridge on acpi0
pci1: ACPI PCI bus on pcib1
ciss0: Compaq Smart Array 5i port 0x3000-0x30ff mem 
0xf7cf-0xf7cf3fff,0xf7dc-0xf7df irq 30 at device 3.0 on pci1
pcib2: ACPI Host-PCI bridge on acpi0
pci2: ACPI PCI bus on pcib2
bge0: Broadcom BCM5703 Gigabit Ethernet, ASIC rev. 0x1002 mem 
0xf7ef-0xf7ef irq 29 at device 1.0 on pci2
miibus0: MII bus on bge0
brgphy0: BCM5703 10/100/1000baseTX PHY on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 
1000baseTX-FDX, auto
bge0: Ethernet address: 00:0e:7f:20:22:91
bge1: Broadcom BCM5703 Gigabit Ethernet, ASIC rev. 0x1002 mem 
0xf7ee-0xf7ee irq 31 at device 2.0 on pci2
miibus1: MII bus on bge1
brgphy1: BCM5703 10/100/1000baseTX PHY on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 
1000baseTX-FDX, auto
bge1: Ethernet address: 00:0e:7f:20:22:90
pcib3: ACPI Host-PCI bridge on acpi0
pci3: ACPI PCI bus on pcib3
pcib4: ACPI Host-PCI bridge on acpi0
pci6: ACPI PCI bus on pcib4
pci6: base peripheral, PCI hot-plug controller at device 30.0 (no driver 
attached)
acpi_tz0: Thermal Zone on 

Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Claus Guttesen
> I'm experiencing the same crashes as Matt, but on 5.4-RELEASE-p3. The machine
> is a HP DL380 G3 and it is heavily loaded (postfix mail server running
> amavisd-new with antivirus and antispam, so it has heavy IO and CPU load). It
> does not survive more than a couple of hours, while it is rock stable running
> 4.11. We have four machines like this, three of them are now again running
> 4.11 and we left the fourth one at 5.4. We have two other DL380 servers
> working on our outbound mail queue, but they are not SMP and they are rock
> stable on 5.4.
>
> CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3049.93-MHz 686-class CPU)
>   Origin = GenuineIntel  Id = 0xf29  Stepping = 9
>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>   Hyperthreading: 2 logical CPUs

Have you tried disabling HTT? It doesn't give you a lot, and in some
cases it decreases performance.

regards
Claus


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Blaz Zupan

> Have you tried disabling HTT? It doesn't give you a lot, and in some
> cases it decreases performance.

Yes, there is absolutely no difference. Disabled HTT in the BIOS and in
FreeBSD, the box still crashes.




Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Kris Kennaway
On Wed, Jul 06, 2005 at 09:40:20AM +0200, Blaz Zupan wrote:

 If a developer is willing to investigate, I have:
 - the vmcore file from the crash (its size is 1GB)
 - the corresponding kernel, compiled with debug symbols

Please obtain the backtrace with kgdb.
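
A sketch of what that looks like in practice, for readers who have not
used kgdb(1) before. The build directory is the one shown in the dmesg
from this box; kernel.debug is the unstripped kernel left there by a build
with DEBUG=-g, and vmcore.0 is a hypothetical name for the file written by
savecore(8).

```
# Run as root; pass the debug kernel first, then the saved core.
kgdb /usr/obj/usr/src5/sys/DL380/kernel.debug /var/crash/vmcore.0

# Then, at the (kgdb) prompt, print the backtrace:
#   bt
```

The kernel must be the exact build that crashed, otherwise the symbols
will not line up with the core.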

Kris




Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Blaz Zupan

On Wed, 6 Jul 2005, Kris Kennaway wrote:

> Please obtain the backtrace with kgdb.


Here you go:

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined 
symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-marcel-freebsd.
#0  doadump () at pcpu.h:159
159 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:159
#1  0xc044b006 in db_fncall (dummy1=0, dummy2=0, dummy3=-1067606609, dummy4=0xe4b6c9d0 
üÉśä(\205]ŔčÉśäěÉśä\222\a)
at /usr/src5/sys/ddb/db_command.c:531
#2  0xc044ae14 in db_command (last_cmdp=0xc0674644, cmd_table=0x0, 
aux_cmd_tablep=0xc064226c, aux_cmd_tablep_end=0xc0642270)
at /usr/src5/sys/ddb/db_command.c:349
#3  0xc044aedc in db_command_loop () at /usr/src5/sys/ddb/db_command.c:455
#4  0xc044ca75 in db_trap (type=12, code=0) at /usr/src5/sys/ddb/db_main.c:221
#5  0xc04e6599 in kdb_trap (type=12, code=0, tf=0xe4b6cb3c) at 
/usr/src5/sys/kern/subr_kdb.c:468
#6  0xc05f4c79 in trap_fatal (frame=0xe4b6cb3c, eva=36) at 
/usr/src5/sys/i386/i386/trap.c:812
#7  0xc05f43e9 in trap (frame=
  {tf_fs = -1040580584, tf_es = -1029439472, tf_ds = 16, tf_edi = 
-1038000128, tf_esi = -1066898900, tf_ebp = -457782384, tf_isp = -457782424, 
tf_ebx = -1040530304, tf_edx = -1040524364, tf_ecx = -1040524544, tf_eax = 0, 
tf_trapno = 12, tf_err = 0, tf_eip = -1068574101, tf_cs = 8, tf_eflags = 65683, 
tf_esp = 180, tf_ss = 0}) at /usr/src5/sys/i386/i386/trap.c:255
#8  0xc05e283a in calltrap () at /usr/src5/sys/i386/i386/exception.s:140
#9  0xc1fa0018 in ?? ()
#10 0xc2a40010 in ?? ()
#11 0x0010 in ?? ()
#12 0xc2216000 in ?? ()
#13 0xc0686a2c in tcbinfo ()
#14 0xe4b6cb90 in ?? ()
#15 0xe4b6cb68 in ?? ()
#16 0xc1fac480 in ?? ()
#17 0xc1fadbb4 in ?? ()
#18 0xc1fadb00 in ?? ()
#19 0x in ?? ()
#20 0x000c in ?? ()
#21 0x in ?? ()
#22 0xc04eda6b in propagate_priority (td=0xc2216000) at 
/usr/src5/sys/kern/subr_turnstile.c:243
#23 0xc04ee225 in turnstile_wait (ts=0xc1fadb00, lock=0xc0686a2c, 
owner=0xc2216000)
at /usr/src5/sys/kern/subr_turnstile.c:556
#24 0xc04c5ced in _mtx_lock_sleep (m=0xc0686a2c, td=0xc1fac480, opts=0, 
file=0x0, line=0)
at /usr/src5/sys/kern/kern_mutex.c:552
#25 0xc0559ad8 in tcp_usr_rcvd (so=0x0, flags=0) at 
/usr/src5/sys/netinet/tcp_usrreq.c:602
#26 0xc0506103 in soreceive (so=0xc27bf798, psa=0x0, uio=0xe4b6cc88, mp0=0x0, 
controlp=0x0, flagsp=0x0)
at /usr/src5/sys/kern/uipc_socket.c:1395
#27 0xc04f4bd9 in soo_read (fp=0x0, uio=0xe4b6cc88, active_cred=0xc2884a80, 
flags=0, td=0xc1fac480)
at /usr/src5/sys/kern/sys_socket.c:91
#28 0xc04ee865 in dofileread (td=0xc1fac480, fp=0xc2e17bb0, fd=10, buf=0x0, 
nbyte=4096, offset=Unhandled dwarf expression opcode 0x93
) at file.h:233
#29 0xc04ee72f in read (td=0xc1fac480, uap=0xe4b6cd14) at 
/usr/src5/sys/kern/sys_generic.c:107
#30 0xc05f4fe7 in syscall (frame=
  {tf_fs = 47, tf_es = 47, tf_ds = -1078001617, tf_edi = 10, tf_esi = 300, 
tf_ebp = -1077942168, tf_isp = -457781900, tf_ebx = 134822152, tf_edx = 0, 
tf_ecx = 10, tf_eax = 3, tf_trapno = 0, tf_err = 2, tf_eip = 672556795, tf_cs = 
31, tf_eflags = 658, tf_esp = -1077942212, tf_ss = 47}) at 
/usr/src5/sys/i386/i386/trap.c:1009
#31 0xc05e288f in Xint0x80_syscall () at /usr/src5/sys/i386/i386/exception.s:201
#32 0x002f in ?? ()
#33 0x002f in ?? ()
#34 0xbfbf002f in ?? ()
#35 0x000a in ?? ()
#36 0x012c in ?? ()
#37 0xbfbfe868 in ?? ()
#38 0xe4b6cd74 in ?? ()
#39 0x08093908 in ?? ()
#40 0x in ?? ()
#41 0x000a in ?? ()
#42 0x0003 in ?? ()
#43 0x in ?? ()
#44 0x0002 in ?? ()
#45 0x281666fb in ?? ()
#46 0x001f in ?? ()
#47 0x0292 in ?? ()
#48 0xbfbfe83c in ?? ()
#49 0x002f in ?? ()
#50 0x in ?? ()
#51 0x in ?? ()
#52 0x in ?? ()
#53 0x in ?? ()
#54 0x2c75b000 in ?? ()
#55 0xc22de000 in ?? ()
#56 0xc1fac480 in ?? ()
#57 0xe4b6ccac in ?? ()
#58 0xe4b6cc94 in ?? ()
#59 0xc1f26000 in ?? ()
#60 0xc04ded13 in sched_switch (td=0x12c, newtd=0x8093908, flags=Cannot access 
memory at address 0xbfbfe878
) at /usr/src5/sys/kern/sched_4bsd.c:881
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit

Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Kris Kennaway
On Wed, Jul 06, 2005 at 06:10:20PM +0200, Blaz Zupan wrote:
 On Wed, 6 Jul 2005, Kris Kennaway wrote:
 Please obtain the backtrace with kgdb.
 
 Here you go:

 #9  0xc1fa0018 in ?? ()
 #10 0xc2a40010 in ?? ()
 #11 0x0010 in ?? ()
 #12 0xc2216000 in ?? ()
 #13 0xc0686a2c in tcbinfo ()
 #14 0xe4b6cb90 in ?? ()
 #15 0xe4b6cb68 in ?? ()
 #16 0xc1fac480 in ?? ()
 #17 0xc1fadbb4 in ?? ()
 #18 0xc1fadb00 in ?? ()
 #19 0x in ?? ()
 #20 0x000c in ?? ()
 #21 0x in ?? ()
 #22 0xc04eda6b in propagate_priority (td=0xc2216000) at 
 /usr/src5/sys/kern/subr_turnstile.c:243
 #23 0xc04ee225 in turnstile_wait (ts=0xc1fadb00, lock=0xc0686a2c, 
 owner=0xc2216000)
 at /usr/src5/sys/kern/subr_turnstile.c:556
 #24 0xc04c5ced in _mtx_lock_sleep (m=0xc0686a2c, td=0xc1fac480, opts=0, 
 file=0x0, line=0)
 at /usr/src5/sys/kern/kern_mutex.c:552
 #25 0xc0559ad8 in tcp_usr_rcvd (so=0x0, flags=0) at 
 /usr/src5/sys/netinet/tcp_usrreq.c:602

Interesting, this seems to finger the TCP code.  Are you compiling
your kernel with -O2 though (this causes bogus stack frames like you
have here)?  If so, recompile with -O and try to obtain another trace.
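
While waiting for a cleaner trace, it can help to hide the unresolved
"?? ()" frames so the named functions stand out. A small sketch, assuming
the kgdb output was saved to a file (backtrace.txt is a hypothetical
name). It only hides lines; it does not make the remaining frames any
more trustworthy.

```sh
#!/bin/sh
# Print a saved kgdb backtrace without the frames gdb could not resolve.
# Reads stdin; -F matches the literal text " in ?? ()", -v inverts the match.
resolved_frames() {
    grep -Fv ' in ?? ()'
}

# Usage (backtrace.txt is a hypothetical file holding the "bt" output):
#   resolved_frames < backtrace.txt
```

On the trace above this leaves the propagate_priority, turnstile_wait,
_mtx_lock_sleep and tcp_usr_rcvd frames next to each other, which is the
part that points at the TCP code.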

CC'ing rwatson.

Kris






Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Blaz Zupan

On Wed, 6 Jul 2005, Kris Kennaway wrote:

> Interesting, this seems to finger the TCP code.  Are you compiling
> your kernel with -O2 though (this causes bogus stack frames like you
> have here)?  If so, recompile with -O and try to obtain another trace.


Nope, no funky compile options, all at the default. The only weird thing I'm 
doing is that the world is built on a 4.11 box and is shared between all our 
boxes, so that we don't need to compile multiple times. The kernel config is 
here:


machine         i386
cpu             I686_CPU
ident           DL380
options         SCHED_4BSD              # 4BSD scheduler
options         INET                    # InterNETworking
options         INET6                   # IPv6 communications protocols
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big directories
options         MD_ROOT                 # MD is a potential root device
options         GEOM_GPT                # GUID Partition Tables.
options         COMPAT_43               # Compatible with BSD 4.3 [KEEP THIS!]
options         COMPAT_FREEBSD4         # Compatible with FreeBSD4
options         SCSI_DELAY=5000         # Delay (in ms) before probing SCSI
options         KTRACE                  # ktrace(1) support
options         SYSVSHM                 # SYSV-style shared memory
options         SYSVMSG                 # SYSV-style message queues
options         SYSVSEM                 # SYSV-style semaphores
options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options         KBD_INSTALL_CDEV        # install a CDEV entry in /dev
options         ADAPTIVE_GIANT          # Giant mutex is adaptive.
options         NMBCLUSTERS=12000
options         IPFILTER
options         IPFILTER_LOG
options         SMP
options         INCLUDE_CONFIG_FILE
options         KDB_STOP_NMI
options         KDB
options         DDB
makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols
device          apic                    # I/O APIC
device          isa
device          eisa
device          pci
device          fdc
device          ata
device          atapicd                 # ATAPI CDROM drives
options         ATA_STATIC_ID           # Static device numbering
device          scbus                   # SCSI bus (required for SCSI)
device          da                      # Direct Access (disks)
device          ciss                    # Compaq Smart RAID 5*
device          atkbdc                  # AT keyboard controller
device          atkbd                   # AT keyboard
device          psm                     # PS/2 mouse
device          vga                     # VGA video card driver
device          sc
device          agp                     # support several AGP chipsets
device          npx
device          pmtimer
device          sio                     # 8250, 16[45]50 based serial ports
device          miibus                  # MII bus support
device          bge                     # Broadcom BCM570xx Gigabit Ethernet
device          loop                    # Network loopback
device          mem                     # Memory and kernel memory devices
device          io                      # I/O device
device          random                  # Entropy device
device          ether                   # Ethernet support
device          pty                     # Pseudo-ttys (telnet etc)
device          md                      # Memory disks
device          bpf                     # Berkeley packet filter
device          ohci                    # OHCI PCI-USB interface
device          usb                     # USB Bus (required)
device          ukbd                    # Keyboard
device          ums                     # Mouse


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Kris Kennaway
On Wed, Jul 06, 2005 at 06:20:38PM +0200, Blaz Zupan wrote:
 On Wed, 6 Jul 2005, Kris Kennaway wrote:
 Interesting, this seems to finger the TCP code.  Are you compiling
 your kernel with -O2 though (this causes bogus stack frames like you
 have here)?  If so, recompile with -O and try to obtain another trace.
 
 Nope, no funky compile options, all at the default. The only weird thing 
 I'm doing is that the world is built on a 4.11 box and is shared between 
 all our boxes, so that we don't need to compile multiple times. The kernel 
 config is here:

That should be OK as long as you're not cross-compiling for different
architectures.

Kris




Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Blaz Zupan

On Wed, 6 Jul 2005, Kris Kennaway wrote:

> That should be OK as long as you're not cross-compiling for different
> architectures.


No, we only have i386 boxes.



Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Eirik Øverby


On Jul 6, 2005, at 6:29 PM, Blaz Zupan wrote:

> On Wed, 6 Jul 2005, Kris Kennaway wrote:
>
> > That should be OK as long as you're not cross-compiling for different
> > architectures.
>
> No, we only have i386 boxes.

Hi,

thanks for doing this work. I was working on preparing a similar set
of information, but have been too overworked lately.

We have ordered and had delivered a substantial number of DL380
(intel) and DL385 (amd64) machines that will all be running FreeBSD.
However, the recent reports about trouble on these systems have made
me wary. Perhaps this will give FreeBSD the solution it needs (I've
seen similar issues on other SMP systems), and me the sleep I need
before launch in September ;)

Thanks again. Now just hoping it's helpful to someone ;)

/Eirik





Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-01 Thread Kris Kennaway
On Wed, Jun 29, 2005 at 06:05:35AM -0400, Kris Kennaway wrote:
 On Tue, Jun 28, 2005 at 11:26:06AM -0400, Matt Juszczak wrote:
 
  OK, when it crashes next and is sat at the db prompt, type tr and
  press enter to get a trace.  Copy this down (or have a serial console to
  capture the output).  Also, try typing call doadump() and see if that
  succeeds in generating a crash dump.  How were you trying to generate
  one before?
  
  Gavin
   
  
  
  I can't type anything.  The machine locks up.
  
  See: http://paste.atopia.net/126
  
  After CPUID: 1, the machine locks cold and nothing else is printed to 
  the screen.
 
 Try two things:
 
 1) adding 'options KDB_STOP_NMI' to your kernel config.

I just learned that you also need to set the
debug.kdb.stop_cpus_with_nmi=1 sysctl (e.g. in sysctl.conf).
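
For reference, the two settings mentioned above could be put in place roughly as follows. This is only a sketch: persist_sysctl is a hypothetical helper (not part of FreeBSD), and the kernel config line has to go into whatever config file is actually in use, followed by a kernel rebuild.

```shell
#!/bin/sh
# Sketch only. Kernel config side (separate file, rebuild required):
#   options KDB_STOP_NMI
#
# persist_sysctl is a hypothetical helper, not a FreeBSD tool.
# It appends "name=value" to a sysctl.conf-style file unless that exact
# line is already present, so running it twice is harmless.
persist_sysctl() {
    conf_file=$1
    setting=$2
    touch "$conf_file"
    grep -qxF "$setting" "$conf_file" || printf '%s\n' "$setting" >> "$conf_file"
}

# Example (as root on the affected box):
#   persist_sysctl /etc/sysctl.conf debug.kdb.stop_cpus_with_nmi=1
#   sysctl debug.kdb.stop_cpus_with_nmi=1   # apply without rebooting
```

The sysctl only exists once a kernel built with KDB_STOP_NMI is running, so the order is: rebuild, reboot, then set it.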

Kris




RE: FreeBSD -STABLE servers repeatedly crashing

2005-06-29 Thread Piotr Gnyp

On Tue, 28 Jun 2005, Matt Juszczak [EMAIL PROTECTED] wrote:


Please try out this patch to aid the above problem with hang instead of
dump:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/i386/trap.c.diff?r1=1.275&r2=1.276
This box is now crashing once every 12 hours.  I can't apply this patch :-(. 
Does anyone have any suggestions on how I can work around this? Some have 
said its an SMP problem and some have said its a 4 GB RAM problem and some 
have said its an IPF problem  if I disabled all three of those things 
would that help this box be stable until code could be fixed?


Disabling SMP helped in my case.

--
How fortunate the man with none.
--Dead Can Dance


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-29 Thread Kris Kennaway
On Tue, Jun 28, 2005 at 11:26:06AM -0400, Matt Juszczak wrote:

 OK, when it crashes next and is sat at the db prompt, type tr and
 press enter to get a trace.  Copy this down (or have a serial console to
 capture the output).  Also, try typing call doadump() and see if that
 succeeds in generating a crash dump.  How were you trying to generate
 one before?
 
 Gavin
  
 
 
 I can't type anything.  The machine locks up.
 
 See: http://paste.atopia.net/126
 
 After CPUID: 1, the machine locks cold and nothing else is printed to 
 the screen.

Try two things:

1) adding 'options KDB_STOP_NMI' to your kernel config.

2) If you still can't get it to break to DDB, then compile up a
debugging kernel, run kgdb on it (as described in the developers'
handbook), and list *(0xblah) where that address is the value of the
instruction pointer in the trap message (e.g. 0xc6644eff in your paste
above).  That might at least be a start.
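
A rough sketch of step 2, for anyone following along. The helper kgdb_list_cmd is hypothetical and only formats the command to type at the kgdb prompt; MYKERNEL is a placeholder for the real kernel config name, and the build directory shown is an assumption about a default source build.

```shell
#!/bin/sh
# Sketch only: turn the instruction pointer from a trap message into
# the "list" command to type at the (kgdb) prompt.
# kgdb_list_cmd is a hypothetical helper, not part of kgdb.
kgdb_list_cmd() {
    case $1 in
        0x*[!0-9a-fA-F]*|0x)
            echo "not a hex address: $1" >&2
            return 1 ;;
        0x*)
            printf 'list *%s\n' "$1" ;;
        *)
            echo "not a hex address: $1" >&2
            return 1 ;;
    esac
}

# Example session (MYKERNEL is a placeholder, path is an assumption):
#   cd /usr/obj/usr/src/sys/MYKERNEL
#   kgdb kernel.debug
#   (kgdb) list *0xc6644eff
```

This only gives a useful source line if kernel.debug was built from exactly the sources of the kernel that trapped.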

Kris



Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-29 Thread Gleb Smirnoff
On Tue, Jun 28, 2005 at 01:50:48PM -0400, Matt Juszczak wrote:
M Please try out this patch to aid the above problem with hang instead of
M dump:
M 
M 
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/i386/trap.c.diff?r1=1.275&r2=1.276
M This patch wouldn't go through
M I tried patching against:
M __FBSDID("$FreeBSD: src/sys/i386/i386/trap.c,v 1.267.2.3 2005/05/01 
M 05:34:46 dwhite Exp $");
M which is -STABLE

Here is the attached patch. It should work for -STABLE. It should fix the
problem with the frozen kdb and give you the ability to obtain a crash dump.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
Index: trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.267.2.3
diff -u -r1.267.2.3 trap.c
--- trap.c	1 May 2005 05:34:46 -0000	1.267.2.3
+++ trap.c	29 Jun 2005 14:27:04 -0000
@@ -809,8 +809,15 @@
 	}
 
 #ifdef KDB
-	if (kdb_trap(type, 0, frame))
-		return;
+	{
+		register_t eflags;
+		eflags = intr_disable();
+		if (kdb_trap(type, 0, frame)) {
+			intr_restore(eflags);
+			return;
+		}
+		intr_restore(eflags);
+	}
 #endif
 	printf("trap number		= %d\n", type);
 	if (type <= MAX_TRAP_MSG)

Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-29 Thread Matt Juszczak



On Wed, 29 Jun 2005, Kris Kennaway wrote:


On Tue, Jun 28, 2005 at 11:26:06AM -0400, Matt Juszczak wrote:


OK, when it crashes next and is sat at the db prompt, type tr and
press enter to get a trace.  Copy this down (or have a serial console to
capture the output).  Also, try typing call doadump() and see if that
succeeds in generating a crash dump.  How were you trying to generate
one before?

Gavin




I can't type anything.  The machine locks up.

See: http://paste.atopia.net/126

After CPUID: 1, the machine locks cold and nothing else is printed to
the screen.


Try two things:

1) adding 'options KDB_STOP_NMI' to your kernel config.

2) If you still can't get it to break to DDB, then compile up a
debugging kernel, run kgdb on it (as described in the developers'
handbook), and list *(0xblah) where that address is the value of the
instruction pointer in the trap message (e.g. 0xc6644eff in your paste
above).  That might at least be a start.

Kris



OK :) I'll try this next time it crashes.  I actually disabled ipf a few 
nights ago and it hasn't crashed since... knock on wood.



Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-28 Thread Gleb Smirnoff
On Mon, Jun 27, 2005 at 07:58:18PM -0400, Matt Juszczak wrote:
M Can you please build kernel with debugging and obtain a crashdump?
M 
M High activity on the box today caused us to be able to crash it again 
M within 9 hours.  I configured all steps per the developers handbook, but 
M when I went to do savecore, it said no dumps.
M 
M It appears the machine is completely locked up when it does a kernel trap. 
M The keyboard is non-responsive, and the machine hangs and doesn't reboot.

Please try out this patch to aid the above problem with hang instead of
dump:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/i386/trap.c.diff?r1=1.275&r2=1.276

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-28 Thread Matt Juszczak

Gleb Smirnoff wrote:


On Mon, Jun 27, 2005 at 01:01:09AM -0400, Matt Juszczak wrote:
M About three weeks ago, I upgraded my 5.3-RELEASE boxes to 5.4-RELEASE.  
M I also turned on procmail globally on our mail server.  Here is our 
M current FreeBSD server setup:
M 
M URANUS  -  primary ldap

M CALIBAN -  secondary ldap
M ORION -  primary mail
M 
M Orion was the first one to crash, about three weeks ago.  Orion is 
M constantly talking to uranus, because uranus is our primary ldap server 
M (we have a planet scheme), and caliban is our secondary ldap server.  I 
M ran an email flood test on orion to see if I could crash it again.  This 
M time, the high requests on Uranus caused Uranus to crash. With two 
M different servers on two different hardware setups crashing, I had to 
M start thinking of what could be causing the problem.
M 
M Memory tests on both servers came back OK.  Orion had some ECC errors 
M which it was able to fix.  I wasn't able to catch orion's first crash, 
M but I was able to catch uranus's first crash:
M 
M http://paste.atopia.net/126


Can you please build kernel with debugging and obtain a crashdump?


 




Ever since I set up the debug kernel, the machine has been crashing every 12 
hours.  I think I have to switch to OpenBSD or 4.11 FreeBSD because this 
box can't keep crashing.  It refuses to do a crash dump.


-Matt


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-28 Thread Gavin Atkinson
On Tue, 2005-06-28 at 10:49 -0400, Matt Juszczak wrote:
 Gleb Smirnoff wrote:
 
 On Mon, Jun 27, 2005 at 01:01:09AM -0400, Matt Juszczak wrote:
 M About three weeks ago, I upgraded my 5.3-RELEASE boxes to 5.4-RELEASE.  
 M I also turned on procmail globally on our mail server.  Here is our 
 M current FreeBSD server setup:
 M 
 M URANUS  -  primary ldap
 M CALIBAN -  secondary ldap
 M ORION -  primary mail
 M 
 M Orion was the first one to crash, about three weeks ago.  Orion is 
 M constantly talking to uranus, because uranus is our primary ldap server 
 M (we have a planet scheme), and caliban is our secondary ldap server.  I 
 M ran an email flood test on orion to see if I could crash it again.  This 
 M time, the high requests on Uranus caused Uranus to crash. With two 
 M different servers on two different hardware setups crashing, I had to 
 M start thinking of what could be causing the problem.
 M 
 M http://paste.atopia.net/126
 
 Can you please build kernel with debugging and obtain a crashdump?
 
 Ever since I setup the debug kernel the machine is now crashing every 12 
 hours.  I think I have to switch to OpenBSD or 4.11 FreeBSD because this 
 box can't keep crashing.  It  refuses to do a crash dump.

OK, when it crashes next and is sat at the "db>" prompt, type "tr" and
press enter to get a trace.  Copy this down (or have a serial console to
capture the output).  Also, try typing "call doadump()" and see if that
succeeds in generating a crash dump.  How were you trying to generate
one before?

Gavin


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-28 Thread Matt Juszczak

Gavin Atkinson wrote:


On Tue, 2005-06-28 at 10:49 -0400, Matt Juszczak wrote:
 


Gleb Smirnoff wrote:

   


On Mon, Jun 27, 2005 at 01:01:09AM -0400, Matt Juszczak wrote:
M About three weeks ago, I upgraded my 5.3-RELEASE boxes to 5.4-RELEASE.  
M I also turned on procmail globally on our mail server.  Here is our 
M current FreeBSD server setup:
M 
M URANUS  -  primary ldap

M CALIBAN -  secondary ldap
M ORION -  primary mail
M 
M Orion was the first one to crash, about three weeks ago.  Orion is 
M constantly talking to uranus, because uranus is our primary ldap server 
M (we have a planet scheme), and caliban is our secondary ldap server.  I 
M ran an email flood test on orion to see if I could crash it again.  This 
M time, the high requests on Uranus caused Uranus to crash. With two 
M different servers on two different hardware setups crashing, I had to 
M start thinking of what could be causing the problem.
M 
M http://paste.atopia.net/126


Can you please build kernel with debugging and obtain a crashdump?
 

Ever since I setup the debug kernel the machine is now crashing every 12 
hours.  I think I have to switch to OpenBSD or 4.11 FreeBSD because this 
box can't keep crashing.  It  refuses to do a crash dump.
   



OK, when it crashes next and is sat at the db prompt, type tr and
press enter to get a trace.  Copy this down (or have a serial console to
capture the output).  Also, try typing call doadump() and see if that
succeeds in generating a crash dump.  How were you trying to generate
one before?

Gavin
 



I can't type anything.  The machine locks up.

See: http://paste.atopia.net/126

After CPUID: 1, the machine locks cold and nothing else is printed to 
the screen.


-Matt


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-28 Thread Gary Mulder

Matt Juszczak wrote:
Ever since I setup the debug kernel the machine is now crashing every 12 
hours.  I think I have to switch to OpenBSD or 4.11 FreeBSD because this 
box can't keep crashing.  It  refuses to do a crash dump.


-Matt


Matt,

Does it refuse to crash dump or is it that you can't get the core file back?

Make sure you have enough disk space in /var/crash for capturing the 
dump. You need at least as much free disk as you have memory configured.
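
The space requirement above can be checked before the next crash. A minimal sketch, assuming sizes are compared in KiB; enough_dump_space is a hypothetical helper, and the commented commands for gathering the numbers are how one might do it on FreeBSD, not something taken from this thread.

```shell
#!/bin/sh
# Sketch only: is there at least as much free space as configured RAM?
# enough_dump_space is a hypothetical helper; both arguments are in KiB.
enough_dump_space() {
    free_kb=$1
    ram_kb=$2
    [ "$free_kb" -ge "$ram_kb" ]
}

# On the FreeBSD box the inputs could be gathered like this (not run here):
#   ram_kb=$(( $(sysctl -n hw.physmem) / 1024 ))
#   free_kb=$(df -k /var/crash | awk 'NR==2 {print $4}')
#   enough_dump_space "$free_kb" "$ram_kb" || echo "/var/crash is too small"
```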


There was a post saying that fsck may be trashing core files if it 
starts using swap. To maximize the chances of recovering the core file 
boot into single user after the crash and do the following:


fsck -y # or fsck and read every question, if you're paranoid
mount -f /  # remounts root read/write
mount /var
savecore /var/crash
exit

Gary


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-28 Thread Matt Juszczak




fsck -y     # or fsck and read every question, if you're paranoid
mount -f /  # remounts root read/write
mount /var
savecore /var/crash
exit

Gary


Gary:

After it crashes, it locks up and hangs, no keyboard response, etc.  
When I reboot, I go into single user mode and do:


fsck -p
mount -a -t ufs
savecore /var/crash /dev/da0s1b (which is my swap)

It says no dump available.  These instructions are from the handbook.
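
One thing worth ruling out when savecore reports no dump is that no dump device was configured before the crash. Nothing in this thread confirms that is the case here (the hang before the dump is the more likely cause), so treat this as a sanity check only. dumpdev_configured is a hypothetical helper; the dumpdev and dumpdir variable names are the ones rc.conf uses.

```shell
#!/bin/sh
# Sketch only: report whether an rc.conf-style file sets dumpdev to a
# real device. dumpdev_configured is a hypothetical helper.
dumpdev_configured() {
    [ -r "$1" ] || return 1
    # rc.conf is plain sh syntax, so read it in a subshell.
    value=$(unset dumpdev; . "$1"; printf '%s' "${dumpdev:-NO}")
    case $value in
        [Nn][Oo]|'') return 1 ;;
        *)           return 0 ;;
    esac
}

# Expected entries in /etc/rc.conf (device name taken from the message above):
#   dumpdev="/dev/da0s1b"
#   dumpdir="/var/crash"
# To arm the dump device immediately, without a reboot (as root):
#   dumpon -v /dev/da0s1b
```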

I just got sent a patch a little while ago which apparently will help 
the system not lock up.  I'm going to try it later today and see where 
it gets me.


Thanks,

Matt


RE: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Matt Juszczak

Please try out this patch to aid the above problem with hang instead of
dump:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/i386/trap.c.diff?r1=1.275&r2=1.276


This box is now crashing once every 12 hours.  I can't apply this patch 
:-(.  Does anyone have any suggestions on how I can work around this? 
Some have said it's an SMP problem, some have said it's a 4 GB RAM 
problem, and some have said it's an IPF problem.  If I disabled all three 
of those things, would that help this box stay stable until the code can 
be fixed?


Thanks,
Matt


Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Matt Juszczak

Matt,

Sadly the FreeBSD guys will need more info before a fix is possible. I would 
suggest you revert back to FreeBSD 5.3, if you can. Even if you get a patch 
you'd want to do a whole lot of regression testing before putting it in 
production as it might break something else.


Gary,

Do you know what the chances are that this problem I'm experiencing is SMP 
related?  I don't mind turning off SMP, and I guess I could for now to see 
if that runs stable.  Otherwise, I think we're going to switch to OpenBSD, 
because these crashes are occurring so frequently (twice a day)... and as 
far as the patch and regression testing, if someone sent me a patch right 
now I would put it on the server, because the server already crashes 
daily, so a faulty patch wouldn't change much :-(.


I appreciate your response.  I'm going to do a little more research today 
before I make my decision on a platform switch.


-Matt


Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Gary Mulder

Gary,

Do you know what the chances are that this problem I'm experiencing
is SMP related?  I don't mind turning off SMP, and I guess I could
for now to see if that runs stable.  Otherwise, I think we're going
to switch to OpenBSD, because these crashes are occurring so
frequently (twice a day)... and as far as the patch and regression
testing, if someone sent me a patch right now I would put it on the
server, because the server already crashes daily, so a faulty patch
wouldn't change much :-(.

I appreciate your response.  I'm going to do a little more research 
today before I make my decision on a platform switch.


Only way to find out is to try. You could build and install the non-SMP
kernel and reboot when you can, or let it boot the new kernel next time
the system(s) crash.
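
A sketch of what that might look like on 5.4, where the stock SMP config simply includes GENERIC and adds "options SMP". config_has_smp is a hypothetical helper for checking a custom config file; the build commands in the comments are the standard ones, but the config name and paths are assumptions about a default source tree.

```shell
#!/bin/sh
# Sketch only: check whether a kernel config file enables SMP.
# config_has_smp is a hypothetical helper, not a FreeBSD tool.
config_has_smp() {
    grep -Eq '^[[:space:]]*options[[:space:]]+SMP([[:space:]]|#|$)' "$1"
}

# Building and installing a uniprocessor kernel (as root, not run here):
#   cd /usr/src
#   make buildkernel KERNCONF=GENERIC
#   make installkernel KERNCONF=GENERIC
#   shutdown -r now        # or wait for the next crash to pick it up
#
# With a custom config, comment out "options SMP" first and verify:
#   config_has_smp /usr/src/sys/i386/conf/MYKERNEL && echo "SMP still enabled"
```

Note that a config which includes another file (as the stock SMP config includes GENERIC) would need each file checked; the helper only looks at the one file it is given.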

A lot of the issues seem to be SMP-related. I really loaded up a GENERIC
5.4 kernel and wasn't able to get it to panic. What do you have to lose
at this point?

I would suggest that before committing to OpenBSD you verify that all
the hardware/software you have/use is supported under OpenBSD:

http://www.daemonnews.org/200104/bsd_family.html
http://www.monkey.org/openbsd/archive/misc/0311/msg01803.html

As an example: I'm fairly sure OpenBSD has recently dropped (or will 
drop) support for the Adaptec aac driver as Theo is not happy with 
Adaptec's response to his queries for interface specs.


From what I've heard (YMMV), OpenBSD SMP support is not very optimal, 
possibly because it was implemented extremely conservatively. 
OpenBSD MySQL with two CPUs can be slower than with one:


http://software.newsforge.com/article.pl?sid=04/12/27/1243207&from=rss

Gary

ps. it is a case of: cost, speed, reliability - choose any two.


Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Matt Juszczak

Only way to find out is to try. You could build and install the non-SMP
kernel and reboot when you can, or let it boot the new kernel next time
the system(s) crash.

A lot of the issues seem to be SMP-related. I really loaded up a GENERIC
5.4 kernel and wasn't able to get it to panic. What do you have to lose
at this point?

I would suggest that before committing to OpenBSD you verify that all
the hardware/software you have/use is supported under OpenBSD:

http://www.daemonnews.org/200104/bsd_family.html
http://www.monkey.org/openbsd/archive/misc/0311/msg01803.html

As an example: I'm fairly sure OpenBSD has recently dropped (or will drop) 
support for the Adaptec aac driver as Theo is not happy with Adaptec's 
response to his queries for interface specs.


From what I've heard (YMMV), OpenBSD SMP support is not very optimal, possibly 
because it was implemented extremely conservatively. 
OpenBSD MySQL with two CPUs can be slower than with one:


http://software.newsforge.com/article.pl?sid=04/12/27/1243207&from=rss

Gary

ps. it is a case of: cost, speed, reliability - choose any two.




Agreed, Theo just yelled at me cause I was having this discussion on the 
OpenBSD misc mailing list, which is my fault :-/ ... a lot of people were 
responding though and I think it just got out of hand.


As much as OpenBSD seems nice, my FreeBSD experience is a lot better.  I'm 
going to switch to Uniprocessor and see if that makes us more stable. 
Hopefully it will.


-Matt


Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Goran Gajic


Hi,

I have something like 20 boxes (Dell Power Edge 370, Fujitsu-Siemens 
PRIMERGY 200 and a couple of dual AMD64 Fujitsu-Siemens) servers running 
5.4-STABLE. So far, the only machine on which I have experienced freezing, 
and was unable to drop into KDB or get any sort of vmcore, is a Dell Power 
Edge 1600SC (dual Xeon 2.4GHz with 4 GB). I noticed that it had been 
running squid-2.5 linked to pthread; when I switched to oops, which was 
compiled on 5.2.1 and linked to libc_r, the machine stopped crashing (HTT 
disabled, IPFILTER also disabled, GENERIC configuration). However, I 
decided to experiment and upgraded to 6.0-CURRENT, and so far I haven't 
experienced any problems, except one panic caused by linux.ko while running 
edonkeyclc for Linux (it was just an experiment to see if it would work on 
6.0-CURRENT). I suppose there might be some SMP-related problems on 5.4. 
I don't know what you are using the problematic servers for, and I 
don't know if it is smart to use 6.0-CURRENT, but so far I have had a 
positive experience with it on the problematic server and would rather stay 
with FreeBSD than switch to NetBSD or OpenBSD.


Regards,
gg.


Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Matt Juszczak


Hi,

I have something like 20 boxes (Dell Power Edge 370, Fujitsu-Siemens PRIMERGY 
200 and a couple of dual AMD64 Fujitsu-Siemens) servers running 5.4-STABLE. 
So far, the only machine on which I have experienced freezing, and was unable 
to drop into KDB or get any sort of vmcore, is a Dell Power Edge 1600SC (dual 
Xeon 2.4GHz with 4 GB). I noticed that it had been running squid-2.5 linked 
to pthread; when I switched to oops, which was compiled on 5.2.1 and linked 
to libc_r, the machine stopped crashing (HTT disabled, IPFILTER also 
disabled, GENERIC configuration). However, I decided to experiment and 
upgraded to 6.0-CURRENT, and so far I haven't experienced any problems, 
except one panic caused by linux.ko while running edonkeyclc for Linux (it 
was just an experiment to see if it would work on 6.0-CURRENT). I suppose 
there might be some SMP-related problems on 5.4. I don't know what you are 
using the problematic servers for, and I don't know if it is smart to use 
6.0-CURRENT, but so far I have had a positive experience with it on the 
problematic server and would rather stay with FreeBSD than switch to NetBSD 
or OpenBSD.



With what you're saying, maybe my problem is that I use IPFILTER and maybe 
it isn't an SMP problem?  Should I switch to PF?


-Matt


Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Goran Gajic



Some people suggested so - pf is supposed to be faster than IPFILTER.
However, if you are experiencing machine freezes like I did on 5.4-STABLE,
I'm not sure this will help - if nothing else helps, try 6.0-CURRENT. I've 
also noticed that it runs much faster with all debugging enabled 
than regular 5.4-STABLE on the same hardware...



Regards,
gg.



On Tue, 28 Jun 2005, Matt Juszczak wrote:




With what you're saying, maybe my problem is that I use IPFILTER and maybe it 
isn't an SMP problem?  Should I switch to PF?


-Matt




Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Matt Juszczak

Some people suggested so - pf is supposed to be faster than IPFILTER.
However, if you are experiencing machine freezes like I did on 5.4-STABLE,
I'm not sure this will help - if nothing else helps, try 6.0-CURRENT. I've 
also noticed that it runs much faster with all debugging enabled than 
regular 5.4-STABLE on the same hardware...


I don't think it's a good idea to run 6.0-CURRENT in production.

I'm moving the main mail server to PF, keeping SMP on.  It's also running 
5.4-STABLE as of today.  We'll see if any of this fixes anything.


Regards,

Matt


Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Karl Denninger
FreeBSD 5.4-STABLE #11: Fri Apr  8 09:48:24 CDT 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/KSD-SMP

 5:02PM  up 80 days, 21:08, 1 user, load averages: 4.04, 3.33, 3.01

Yes, SMP is enabled, as is implied by the kernel config tag.

(Very busy compilation, web and database server)

--
-- 
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant  Kids Rights Activist
http://www.denninger.netMy home on the net - links to everything I do!
http://scubaforum.org   Your UNCENSORED place to talk about DIVING!
http://homecuda.com Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.comMusings Of A Sentient Mind


On Tue, Jun 28, 2005 at 05:55:17PM -0400, Matt Juszczak wrote:
 Some people suggested so - pf is supposed to be faster than IPFILTER.
 However, if you are experiencing machine freezes like I did on 5.4-STABLE,
 I'm not sure this will help - if nothing else helps, try 6.0-CURRENT. I've 
 also noticed that it runs much faster with all debugging enabled than 
 regular 5.4-STABLE on the same hardware...
 
 I don't think it's a good idea to run 6.0-CURRENT in production.
 
 I'm moving the main mail server to PF, keeping SMP on.  It's also running 
 5.4-STABLE as of today.  We'll see if any of this fixes anything.
 
 Regards,
 
 Matt


Re: FreeBSD -STABLE servers repeatedly crashing

2005-06-28 Thread Matt Juszczak


Yes, SMP is enabled, as is implied by the kernel config tag.

(Very busy compilation, web and database server)



Are you using PF?


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-27 Thread Gleb Smirnoff
On Mon, Jun 27, 2005 at 01:01:09AM -0400, Matt Juszczak wrote:
M About three weeks ago, I upgraded my 5.3-RELEASE boxes to 5.4-RELEASE.  
M I also turned on procmail globally on our mail server.  Here is our 
M current FreeBSD server setup:
M 
M URANUS  -  primary ldap
M CALIBAN -  secondary ldap
M ORION -  primary mail
M 
M Orion was the first one to crash, about three weeks ago.  Orion is 
M constantly talking to uranus, because uranus is our primary ldap server 
M (we have a planet scheme), and caliban is our secondary ldap server.  I 
M ran an email flood test on orion to see if I could crash it again.  This 
M time, the high requests on Uranus caused Uranus to crash. With two 
M different servers on two different hardware setups crashing, I had to 
M start thinking of what could be causing the problem.
M 
M Memory tests on both servers came back OK.  Orion had some ECC errors 
M which it was able to fix.  I wasn't able to catch orion's first crash, 
M but I was able to catch uranus's first crash:
M 
M http://paste.atopia.net/126

Can you please build kernel with debugging and obtain a crashdump?


-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE


Re: FreeBSD -STABLE servers repeatedly crashing.

2005-06-27 Thread Matt Juszczak

Can you please build kernel with debugging and obtain a crashdump?



High activity on the box today caused us to be able to crash it again 
within 9 hours.  I configured all steps per the developers handbook, but 
when I went to do savecore, it said no dumps.


It appears the machine is completely locked up when it does a kernel trap. 
The keyboard is non-responsive, and the machine hangs and doesn't reboot.


Any other suggestions would be greatly appreciated.  For now I am going to 
take the box out of SMP mode which will hopefully keep it stable until I 
can find some further instructions.


Regards,

Matt