Server locks up if more than one jail is running

2005-01-11 Artem Kuchin
I have a very bad problem. I need to set up a large number of jails
(about 100) for virtual hosting, and I am doing it on 5.3-STABLE (not
5.3-RELEASE; I mean -STABLE, updated with cvsup). With one jail,
everything runs fine for many days. If I set up a second jail, or three
or more, the server locks up (freezes, hangs, becomes totally
unresponsive) within 48 hours: sometimes after only a couple of hours,
sometimes after almost two days. To diagnose the problem I set up a cron
job that saves a lot of system statistics every minute. The statistics
from the last working minute are included below, as well as the basic
system configuration.
The jails contain complete distributions, but only very basic services
run in them: cron and sshd. adjtimezone is disabled; all other entries
in /etc/crontab are left as in the stock distribution.
I really need help with this problem: I cannot migrate to 5.3 until it
is solved, and I need 5.3 for some of its new features. Maybe someone
can help.
SYSTEM:
Multiprocessor: dual Xeon 2.667 GHz with HTT, 4 GB DDR RAM.
RAID 5 (4 x 120 GB SATA disks) on a 3ware SATA controller in a
64-bit PCI slot on a Supermicro motherboard.
 dmesg --
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.3-STABLE #4: Wed Dec 22 10:25:48 MSK 2004
   [EMAIL PROTECTED]:/usr/obj/usr/src/sys/OMNI
ACPI APIC Table: PTLTDAPIC  
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.66GHz (2665.92-MHz 686-class CPU)
 Origin = GenuineIntel  Id = 0xf25  Stepping = 5
 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
 Hyperthreading: 2 logical CPUs
real memory  = 4160225280 (3967 MB)
avail memory = 4077481984 (3888 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  1
cpu2 (AP): APIC ID:  6
cpu3 (AP): APIC ID:  7
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
ioapic2 Version 2.0 irqs 48-71 on motherboard
ioapic3 Version 2.0 irqs 72-95 on motherboard
ioapic4 Version 2.0 irqs 96-119 on motherboard
npx0: [FAST]
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: PTLTD   RSDT on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
cpu0: ACPI CPU (2 Cx states) on acpi0
cpu1: ACPI CPU (2 Cx states) on acpi0
cpu2: ACPI CPU (2 Cx states) on acpi0
cpu3: ACPI CPU (2 Cx states) on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pci0: unknown at device 0.1 (no driver attached)
pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0
pci1: ACPI PCI bus on pcib1
pci1: base peripheral, interrupt controller at device 28.0 (no driver attached)
pcib2: ACPI PCI-PCI bridge at device 29.0 on pci1
pci2: ACPI PCI bus on pcib2
pci1: base peripheral, interrupt controller at device 30.0 (no driver attached)
pcib3: ACPI PCI-PCI bridge at device 31.0 on pci1
pci3: ACPI PCI bus on pcib3
em0: Intel(R) PRO/1000 Network Connection, Version - 1.7.35 port 0x3000-0x303f mem 0xfc20-0xfc21 irq 28 at device 2.0 on pci3
em0: Ethernet address: 00:30:48:2a:2d:bc
em0:  Speed:N/A  Duplex:N/A
em1: Intel(R) PRO/1000 Network Connection, Version - 1.7.35 port 0x3040-0x307f mem 0xfc22-0xfc23 irq 29 at device 2.1 on pci3
em1: Ethernet address: 00:30:48:2a:2d:bd
em1:  Speed:N/A  Duplex:N/A
pcib4: ACPI PCI-PCI bridge at device 3.0 on pci0
pci4: ACPI PCI bus on pcib4
pci4: base peripheral, interrupt controller at device 28.0 (no driver attached)
pcib5: ACPI PCI-PCI bridge at device 29.0 on pci4
pci5: ACPI PCI bus on pcib5
pci4: base peripheral, interrupt controller at device 30.0 (no driver attached)
pcib6: ACPI PCI-PCI bridge at device 31.0 on pci4
pci6: ACPI PCI bus on pcib6
twe0: 3ware Storage Controller. Driver version 1.50.01.002 port 0x4000-0x400f mem 0xfc80-0xfcff irq 72 at device 1.0 on pci6
twe0: [GIANT-LOCKED]
twe0: 4 ports, Firmware FE7S 1.05.00.063, BIOS BE7X 1.08.00.048
uhci0: Intel 82801CA/CAM (ICH3) USB controller USB-A port 0x2000-0x201f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: Intel 82801CA/CAM (ICH3) USB controller USB-A on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: Intel 82801CA/CAM (ICH3) USB controller USB-B port 0x2020-0x203f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: Intel 82801CA/CAM (ICH3) USB controller USB-B on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: Intel 82801CA/CAM (ICH3) USB controller USB-C port 0x2040-0x205f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: 

Server locks up if more than one jail is running

2005-01-08 matrix
I have a very bad problem. I need to set up a large number of jails
(about 100) for virtual hosting, and I am doing it on 5.3-STABLE (not
5.3-RELEASE; I mean -STABLE, updated with cvsup). With one jail,
everything runs fine for many days. If I set up a second jail, or three
or more, the server locks up (freezes, hangs, becomes totally
unresponsive) within 48 hours: sometimes after only a couple of hours,
sometimes after almost two days. To diagnose the problem I set up a cron
job that saves a lot of system statistics every minute. The statistics
from the last working minute are included below, as well as the basic
system configuration.
The jails contain complete distributions, but only very basic services
run in them: cron and sshd. adjtimezone is disabled; all other entries
in /etc/crontab are left as in the stock distribution.
I really need help with this problem: I cannot migrate to 5.3 until it
is solved, and I need 5.3 for some of its new features. Maybe someone
can help.


SYSTEM:

Multiprocessor: dual Xeon 2.667 GHz with HTT, 4 GB DDR RAM.
RAID 5 (4 x 120 GB SATA disks) on a 3ware SATA controller in a
64-bit PCI slot on a Supermicro motherboard.

 dmesg --

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
 The Regents of the University of California. All rights reserved.
FreeBSD 5.3-STABLE #4: Wed Dec 22 10:25:48 MSK 2004
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/OMNI
ACPI APIC Table: PTLTDAPIC  
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.66GHz (2665.92-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0xf25  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 4160225280 (3967 MB)
avail memory = 4077481984 (3888 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
ioapic2 Version 2.0 irqs 48-71 on motherboard
ioapic3 Version 2.0 irqs 72-95 on motherboard
ioapic4 Version 2.0 irqs 96-119 on motherboard
npx0: [FAST]
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: PTLTD   RSDT on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
cpu0: ACPI CPU (2 Cx states) on acpi0
cpu1: ACPI CPU (2 Cx states) on acpi0
cpu2: ACPI CPU (2 Cx states) on acpi0
cpu3: ACPI CPU (2 Cx states) on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pci0: unknown at device 0.1 (no driver attached)
pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0
pci1: ACPI PCI bus on pcib1
pci1: base peripheral, interrupt controller at device 28.0 (no driver attached)
pcib2: ACPI PCI-PCI bridge at device 29.0 on pci1
pci2: ACPI PCI bus on pcib2
pci1: base peripheral, interrupt controller at device 30.0 (no driver attached)
pcib3: ACPI PCI-PCI bridge at device 31.0 on pci1
pci3: ACPI PCI bus on pcib3
em0: Intel(R) PRO/1000 Network Connection, Version - 1.7.35 port 0x3000-0x303f mem 0xfc20-0xfc21 irq 28 at device 2.0 on pci3
em0: Ethernet address: 00:30:48:2a:2d:bc
em0:  Speed:N/A  Duplex:N/A
em1: Intel(R) PRO/1000 Network Connection, Version - 1.7.35 port 0x3040-0x307f mem 0xfc22-0xfc23 irq 29 at device 2.1 on pci3
em1: Ethernet address: 00:30:48:2a:2d:bd
em1:  Speed:N/A  Duplex:N/A
pcib4: ACPI PCI-PCI bridge at device 3.0 on pci0
pci4: ACPI PCI bus on pcib4
pci4: base peripheral, interrupt controller at device 28.0 (no driver attached)
pcib5: ACPI PCI-PCI bridge at device 29.0 on pci4
pci5: ACPI PCI bus on pcib5
pci4: base peripheral, interrupt controller at device 30.0 (no driver attached)
pcib6: ACPI PCI-PCI bridge at device 31.0 on pci4
pci6: ACPI PCI bus on pcib6
twe0: 3ware Storage Controller. Driver version 1.50.01.002 port 0x4000-0x400f mem 0xfc80-0xfcff irq 72 at device 1.0 on pci6
twe0: [GIANT-LOCKED]
twe0: 4 ports, Firmware FE7S 1.05.00.063, BIOS BE7X 1.08.00.048
uhci0: Intel 82801CA/CAM (ICH3) USB controller USB-A port 0x2000-0x201f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: Intel 82801CA/CAM (ICH3) USB controller USB-A on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: Intel 82801CA/CAM (ICH3) USB controller USB-B port 0x2020-0x203f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: Intel 82801CA/CAM (ICH3) USB controller USB-B on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: Intel 82801CA/CAM (ICH3) USB controller USB-C port 0x2040-0x205f irq 18 at device 29.2 on pci0
uhci2: 

Re: Server locks up if more than one jail is running

2005-01-08 Robert Watson

On 8 Jan 2005 [EMAIL PROTECTED] wrote:

 I have a very bad problem. I need to set up a large number of jails (about
 100) for virtual hosting, and I am doing it on 5.3-STABLE (not
 5.3-RELEASE; I mean -STABLE, updated with cvsup). With one jail,
 everything runs fine for many days. If I set up a second jail, or three
 or more, the server locks up (freezes, hangs, becomes totally
 unresponsive) within 48 hours: sometimes after only a couple of hours,
 sometimes after almost two days. To diagnose the problem I set up a
 cron job that saves a lot of system statistics every minute. The
 statistics from the last working minute are included below, as well as
 the basic system configuration. The jails contain complete
 distributions, but only very basic services run in them: cron and
 sshd. adjtimezone is disabled; all other entries in /etc/crontab are
 left as in the stock distribution. I really need help with this
 problem: I cannot migrate to 5.3 until it is solved, and I need 5.3
 for some of its new features. Maybe someone can help.

If you haven't already, could you compile the kernel with options DDB,
options KDB, and options BREAK_TO_DEBUGGER, and see whether you can get
into the debugger using a console or (preferably) a serial break?  That
way we can attempt to diagnose the state of the system once it is
wedged.  It would also be helpful if you could run the system with
options INVARIANTS, which has some performance impact but turns on
additional self-checking in the OS.  Finally, if you can tolerate the
performance hit, running with options WITNESS (plus options
WITNESS_SKIPSPIN to reduce its cost somewhat) will allow us to generate
additional debugging information.  If you are able to get into the
debugger, it would be helpful if you could run the following commands
and, ideally, send the output captured on a serial console:

- show pcpu
- show pcpu 0
- show pcpu 1
- show pcpu 2
- show pcpu 3
- ps
- show threads
- show alllocks   # will only work with witness
- show lockedvnods

This is a substantial volume of output, so if you don't have a serial
console whose output you can copy and paste on another machine,
collecting it may be somewhat inconvenient.
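A minimal sketch of the kernel configuration described above, under stated assumptions: the lines below would go into a copy of the existing OMNI kernel configuration, replacing its ident line. The name OMNI-DEBUG is hypothetical, the location /usr/src/sys/i386/conf/ assumes an i386 build (suggested by the 686-class CPU in the dmesg output), and INVARIANT_SUPPORT is included because INVARIANTS depends on it.

```
# Hypothetical OMNI-DEBUG: a copy of the OMNI config with this ident
# line (replacing the original one) and the debugging options below.
ident           OMNI-DEBUG
options         KDB                 # kernel debugger framework
options         DDB                 # interactive in-kernel debugger (KDB backend)
options         BREAK_TO_DEBUGGER   # enter the debugger on a serial-console BREAK
options         INVARIANTS          # extra run-time self-checks (costs performance)
options         INVARIANT_SUPPORT   # required by INVARIANTS
options         WITNESS             # lock-order checking; needed for "show alllocks"
options         WITNESS_SKIPSPIN    # skip spin mutexes to reduce WITNESS overhead
```

Such a kernel could then be built and installed from /usr/src with "make buildkernel KERNCONF=OMNI-DEBUG" followed by "make installkernel KERNCONF=OMNI-DEBUG", and booted before repeating the multi-jail test.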

Robert N M Watson


 
 
 SYSTEM:
 
 Multiprocessor: dual Xeon 2.667 GHz with HTT, 4 GB DDR RAM.
 RAID 5 (4 x 120 GB SATA disks) on a 3ware SATA controller in a
 64-bit PCI slot on a Supermicro motherboard.
 
  dmesg --
 
 Copyright (c) 1992-2004 The FreeBSD Project.
 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
 FreeBSD 5.3-STABLE #4: Wed Dec 22 10:25:48 MSK 2004
 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/OMNI
 ACPI APIC Table: PTLTDAPIC  
 Timecounter i8254 frequency 1193182 Hz quality 0
 CPU: Intel(R) Xeon(TM) CPU 2.66GHz (2665.92-MHz 686-class CPU)
   Origin = GenuineIntel  Id = 0xf25  Stepping = 5
   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Hyperthreading: 2 logical CPUs
 real memory  = 4160225280 (3967 MB)
 avail memory = 4077481984 (3888 MB)
 FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  1
  cpu2 (AP): APIC ID:  6
  cpu3 (AP): APIC ID:  7
 ioapic0 Version 2.0 irqs 0-23 on motherboard
 ioapic1 Version 2.0 irqs 24-47 on motherboard
 ioapic2 Version 2.0 irqs 48-71 on motherboard
 ioapic3 Version 2.0 irqs 72-95 on motherboard
 ioapic4 Version 2.0 irqs 96-119 on motherboard
 npx0: [FAST]
 npx0: math processor on motherboard
 npx0: INT 16 interface
 acpi0: PTLTD   RSDT on motherboard
 acpi0: Power Button (fixed)
 Timecounter ACPI-fast frequency 3579545 Hz quality 1000
 acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
 cpu0: ACPI CPU (2 Cx states) on acpi0
 cpu1: ACPI CPU (2 Cx states) on acpi0
 cpu2: ACPI CPU (2 Cx states) on acpi0
 cpu3: ACPI CPU (2 Cx states) on acpi0
 pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
 pci0: ACPI PCI bus on pcib0
 pci0: unknown at device 0.1 (no driver attached)
 pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0
 pci1: ACPI PCI bus on pcib1
 pci1: base peripheral, interrupt controller at device 28.0 (no driver attached)
 pcib2: ACPI PCI-PCI bridge at device 29.0 on pci1
 pci2: ACPI PCI bus on pcib2
 pci1: base peripheral, interrupt controller at device 30.0 (no driver attached)
 pcib3: ACPI PCI-PCI bridge at device 31.0 on pci1
 pci3: ACPI PCI bus on pcib3
 em0: Intel(R) PRO/1000 Network Connection, Version - 1.7.35 port 0x3000-0x303f mem 0xfc20-0xfc21 irq 28 at device 2.0 on pci3
 em0: Ethernet address: 00:30:48:2a:2d:bc
 em0:  Speed:N/A  Duplex:N/A
 em1: Intel(R) PRO/1000 Network Connection, Version - 1.7.35 port 0x3040-0x307f mem 0xfc22-0xfc23 irq 29 at device 2.1 on pci3
 em1: Ethernet address: 00:30:48:2a:2d:bd
 em1:  Speed:N/A  Duplex:N/A
 pcib4: ACPI PCI-PCI bridge at device 3.0 on