Re: PF performance problem
Ariane van der Steldt wrote:
> On Wed, Jun 03, 2009 at 10:07:33PM -0700, patrick keshishian wrote:
>> On Wed, Jun 3, 2009 at 3:50 AM, Richard Toohey richardtoo...@paradise.net.nz wrote:
>>> On 3/06/2009, at 10:02 PM, BARDOU Pierre wrote:
>>>> Hello, I have performance issues on an OpenBSD 4.4 firewall. CPU load is OK (always below 50%), but system load is always between 1 and 1.5, it may go up to 2 sometimes.
>>> [cut]
>>> And what is the actual *problem*? What is pf failing to do? Or are you just worried about the numbers?
>>> Search the archives for high load ...
>> just for the record, i have seen a server where its typical load floats around 0.10 or so, but then something will happen and the plateau will get bumped to 1.10 and remain there. this was a 4.5 system. I have not identified what event caused this.
>> I've seen a similar issue with a couple of linux boxes at work where the load avg plateau will keep rising: it'll hover around ~3, then say ~6 then ~13. i don't think the issues are related, but could be caused by similar bugs in the kernel.
>> All systems continue to be responsive and it only seems that the reported load avg value is just bumped by a base value. It is definitely odd.
> Load on linux and load on BSD are two completely different things. On linux I recall load being the number of processes running or blocking, or something based on that.

I know this is an old thread, but I came across this and thought I'd affirm this, from http://en.wikipedia.org/wiki/Load_(computing):

"Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes are blocked in I/O due to a busy or stalled I/O system. This, for example, includes processes that are blocked due to an NFS server failure or slow media (e.g., USB 1.x storage devices), leading to an elevated load average, which does not reflect an actual increase in CPU use (but still gives an idea on how long you have to wait)."

In other words, ditto. I've always noticed (and then ignored) a difference between BSD/Solaris load average running the same processes vs Linux on the same hw. systat is much more useful, IMNSHO.

-tico

> On BSD, load is the number of processes which have (wanted to) run at least once in the most recent 5-second window, with a degradation over time. So, if you have a process that wakes up every 5 seconds and prints the time on your console, you have a load average of 1. Load is not the number of cpu cycles used.
> A high load is just that: high. It means you have a lot of processes that sometimes run. High load does not mean your performance is going down or whatever: I ran a test today which generated a load of 200, but only used 10% of the cpu and was very responsive.
> You can't compare load on linux with load on bsd, I'd really appreciate if people stopped comparing apples and oranges. :P
> If you are interested in the internals of the system: load is the black magic that keeps the scheduling fair compared to the number of processes.
> Ciao,
Re: PF performance problem
On Wed, Jun 03, 2009 at 10:07:33PM -0700, patrick keshishian wrote:
> On Wed, Jun 3, 2009 at 3:50 AM, Richard Toohey richardtoo...@paradise.net.nz wrote:
>> On 3/06/2009, at 10:02 PM, BARDOU Pierre wrote:
>>> Hello, I have performance issues on an OpenBSD 4.4 firewall. CPU load is OK (always below 50%), but system load is always between 1 and 1.5, it may go up to 2 sometimes.
>> [cut]
>> And what is the actual *problem*? What is pf failing to do? Or are you just worried about the numbers?
>> Search the archives for high load ...
> just for the record, i have seen a server where its typical load floats around 0.10 or so, but then something will happen and the plateau will get bumped to 1.10 and remain there. this was a 4.5 system. I have not identified what event caused this.
> I've seen a similar issue with a couple of linux boxes at work where the load avg plateau will keep rising: it'll hover around ~3, then say ~6 then ~13. i don't think the issues are related, but could be caused by similar bugs in the kernel.
> All systems continue to be responsive and it only seems that the reported load avg value is just bumped by a base value. It is definitely odd.

Load on linux and load on BSD are two completely different things. On linux I recall load being the number of processes running or blocking, or something based on that.

On BSD, load is the number of processes which have (wanted to) run at least once in the most recent 5-second window, with a degradation over time. So, if you have a process that wakes up every 5 seconds and prints the time on your console, you have a load average of 1. Load is not the number of cpu cycles used.

A high load is just that: high. It means you have a lot of processes that sometimes run. High load does not mean your performance is going down or whatever: I ran a test today which generated a load of 200, but only used 10% of the cpu and was very responsive.

You can't compare load on linux with load on bsd, I'd really appreciate it if people stopped comparing apples and oranges. :P

If you are interested in the internals of the system: load is the black magic that keeps the scheduling fair compared to the number of processes.

Ciao,
-- 
Ariane
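
[Editor's sketch] The "degradation over time" described in this message can be illustrated with a small simulation. This is a minimal sketch, assuming the classic BSD-style scheme in which the number of runnable processes is sampled every 5 seconds and folded into 1-, 5- and 15-minute exponentially damped averages. The function names are made up for illustration, and a real kernel uses fixed-point arithmetic and its own rules for which processes count as runnable.

```python
import math

SAMPLE_INTERVAL = 5.0            # seconds between samples (assumed, classic BSD value)
WINDOWS = (60.0, 300.0, 900.0)   # 1-, 5- and 15-minute averaging windows


def decay_factors(interval=SAMPLE_INTERVAL, windows=WINDOWS):
    """Per-sample decay factor exp(-interval/window) for each window."""
    return tuple(math.exp(-interval / w) for w in windows)


def update_load(avgs, nrun, factors):
    """Fold one run-queue sample (nrun) into the damped averages."""
    return tuple(a * f + nrun * (1.0 - f) for a, f in zip(avgs, factors))


def simulate(samples, start=(0.0, 0.0, 0.0)):
    """Apply a sequence of run-queue samples and return the final averages."""
    factors = decay_factors()
    avgs = start
    for nrun in samples:
        avgs = update_load(avgs, nrun, factors)
    return avgs


if __name__ == "__main__":
    # One process seen as runnable at every sample for 30 minutes.
    print(simulate([1] * 360))
    # One process seen as runnable at only every fourth sample.
    print(simulate([1, 0, 0, 0] * 90))
```

Under these assumptions, a process that is runnable at every sampling instant drives the 1-minute figure towards 1, while a process that is caught runnable at only a fraction of the samples settles at a correspondingly lower value. The figure therefore depends on how often the sampler finds something runnable, not on how many CPU cycles are burned.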
Re: PF performance problem
On Wed, Jun 03, 2009 at 10:07:33PM -0700, patrick keshishian wrote:
> On Wed, Jun 3, 2009 at 3:50 AM, Richard Toohey richardtoo...@paradise.net.nz wrote:
>> On 3/06/2009, at 10:02 PM, BARDOU Pierre wrote:
>>> Hello, I have performance issues on an OpenBSD 4.4 firewall. CPU load is OK (always below 50%), but system load is always between 1 and 1.5, it may go up to 2 sometimes.
>> [cut]
>> And what is the actual *problem*? What is pf failing to do? Or are you just worried about the numbers?
>> Search the archives for high load ...
> just for the record, i have seen a server where its typical load floats around 0.10 or so, but then something will happen and the plateau will get bumped to 1.10 and remain there. this was a 4.5 system.

A sudden, significant, permanent change in load merely says that something happened that may be interesting. It doesn't tell you anything about what happened or if it's even a problem.

> I have not identified what event caused this.
> I've seen a similar issue with a couple of linux boxes at work where the load avg plateau will keep rising: it'll hover around ~3, then say ~6 then ~13. i don't think the issues are related, but could be caused by similar bugs in the kernel.

I've seen this too over the years on *BSD and Linux on a variety of machines. Usually a few minutes with top(1), systat(1), et al will show you what's going on. Until you find out there's not much to do.

A change in load is like getting a billing statement with "Important: changes to your account" printed on the envelope. You can run around waving the envelope asking what changed, or you can look inside and find out.

> All systems continue to be responsive and it only seems that the reported load avg value is just bumped by a base value. It is definitely odd.

So it's not a problem... yet. It may never be a problem. Or it could be. Open the envelope and spend a few minutes reading the contents. ;-)

-- 
Darrin Chandler            |  Phoenix BSD User Group  |  MetaBUG
dwchand...@stilyagin.com   |  http://phxbug.org/      |  http://metabug.org/
http://www.stilyagin.com/  |  Daemons in the Desert   |  Global BUG Federation
Re: PF performance problem
On Thu, Jun 4, 2009 at 1:48 AM, Ariane van der Steldt ari...@stack.nl wrote:
> On Wed, Jun 03, 2009 at 10:07:33PM -0700, patrick keshishian wrote:
>> On Wed, Jun 3, 2009 at 3:50 AM, Richard Toohey richardtoo...@paradise.net.nz wrote:
>>> On 3/06/2009, at 10:02 PM, BARDOU Pierre wrote:
>>>> Hello, I have performance issues on an OpenBSD 4.4 firewall. CPU load is OK (always below 50%), but system load is always between 1 and 1.5, it may go up to 2 sometimes.
>>> [cut]
>>> And what is the actual *problem*? What is pf failing to do? Or are you just worried about the numbers?
>>> Search the archives for high load ...
>> just for the record, i have seen a server where its typical load floats around 0.10 or so, but then something will happen and the plateau will get bumped to 1.10 and remain there. this was a 4.5 system. I have not identified what event caused this.
>> I've seen a similar issue with a couple of linux boxes at work where the load avg plateau will keep rising: it'll hover around ~3, then say ~6 then ~13. i don't think the issues are related, but could be caused by similar bugs in the kernel.
>> All systems continue to be responsive and it only seems that the reported load avg value is just bumped by a base value. It is definitely odd.
> Load on linux and load on BSD are two completely different things. On linux I recall load being the number of processes running or blocking, or something based on that.

Did you even read what I wrote? If so, did you understand what I said? Because I fail to see how the information you provide or your criticism of my post is at all relevant to my post.

> On BSD, load is the number of processes which have (wanted to) run at least once in the most recent 5-second window, with a degradation over time. So, if you have a process that wakes up every 5 seconds and prints the time on your console, you have a load average of 1. Load is not the number of cpu cycles used.

Oh, really? A process running every 5 seconds and printing will cause a load average of 1? Did you even try this yourself before sending your email?

Thu Jun 4 08:29:53 PDT 2009 going to sleep 5 and run uptime
8:29AM up 12:36, 2 users, load averages: 0.27, 0.40, 0.37
Thu Jun 4 08:29:58 PDT 2009 going to sleep 5 and run uptime
8:30AM up 12:36, 2 users, load averages: 0.25, 0.39, 0.37
Thu Jun 4 08:30:03 PDT 2009 going to sleep 5 and run uptime
8:30AM up 12:37, 2 users, load averages: 0.23, 0.39, 0.37
...
Thu Jun 4 08:31:54 PDT 2009 going to sleep 5 and run uptime
8:31AM up 12:38, 2 users, load averages: 0.25, 0.33, 0.35
Thu Jun 4 08:31:59 PDT 2009 going to sleep 5 and run uptime
8:32AM up 12:38, 2 users, load averages: 0.31, 0.35, 0.35
Thu Jun 4 08:32:04 PDT 2009 going to sleep 5 and run uptime
8:32AM up 12:39, 2 users, load averages: 0.36, 0.36, 0.35
Thu Jun 4 08:32:09 PDT 2009 going to sleep 5 and run uptime
...
Thu Jun 4 08:36:11 PDT 2009 going to sleep 5 and run uptime
8:36AM up 12:43, 2 users, load averages: 0.48, 0.61, 0.48
Thu Jun 4 08:36:16 PDT 2009 going to sleep 5 and run uptime
8:36AM up 12:43, 2 users, load averages: 0.60, 0.63, 0.49
Thu Jun 4 08:36:21 PDT 2009 going to sleep 5 and run uptime
8:36AM up 12:43, 2 users, load averages: 0.55, 0.62, 0.48
...
8:37AM up 12:44, 2 users, load averages: 0.33, 0.54, 0.46
Thu Jun 4 08:37:31 PDT 2009 going to sleep 5 and run uptime
8:37AM up 12:44, 2 users, load averages: 0.31, 0.53, 0.46
Thu Jun 4 08:37:36 PDT 2009 going to sleep 5 and run uptime
8:37AM up 12:44, 2 users, load averages: 0.28, 0.52, 0.46
Thu Jun 4 08:37:41 PDT 2009 going to sleep 5 and run uptime
...
Thu Jun 4 08:39:16 PDT 2009 going to sleep 5 and run uptime
8:39AM up 12:46, 2 users, load averages: 0.22, 0.45, 0.43
Thu Jun 4 08:39:22 PDT 2009 going to sleep 5 and run uptime
8:39AM up 12:46, 2 users, load averages: 0.20, 0.44, 0.43
Thu Jun 4 08:39:27 PDT 2009 going to sleep 5 and run uptime
8:39AM up 12:46, 2 users, load averages: 0.19, 0.44, 0.43
...
Thu Jun 4 08:40:12 PDT 2009 going to sleep 5 and run uptime
8:40AM up 12:47, 2 users, load averages: 0.19, 0.40, 0.41
Thu Jun 4 08:40:17 PDT 2009 going to sleep 5 and run uptime
8:40AM up 12:47, 2 users, load averages: 0.17, 0.40, 0.41
Thu Jun 4 08:40:22 PDT 2009 going to sleep 5 and run uptime
8:40AM up 12:47, 2 users, load averages: 0.16, 0.39, 0.41
...
Thu Jun 4 08:41:02 PDT 2009 going to sleep 5 and run uptime
8:41AM up 12:48, 2 users, load averages: 0.13, 0.35, 0.39
Thu Jun 4 08:41:07 PDT 2009 going to sleep 5 and run uptime
8:41AM up 12:48, 2 users, load averages: 0.12, 0.35, 0.39
...
Thu Jun 4 08:42:57 PDT 2009 going to sleep 5 and run uptime
8:43AM up 12:49, 2 users, load averages: 0.15, 0.30, 0.37
Thu Jun 4 08:43:02 PDT 2009 going to sleep 5 and run uptime
8:43AM up 12:50, 2 users, load averages: 0.14, 0.30, 0.36
Thu Jun 4 08:43:08 PDT 2009 going to sleep 5 and run uptime
8:43AM up 12:50, 2 users, load averages: 0.12, 0.29, 0.36

and that loop is generated with at least two
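
[Editor's sketch] The script that produced the log in this message is not shown in the archive, and the message is cut off before it is described. A rough equivalent of the experiment, assuming Python with its standard `os.getloadavg()` call, could look like the following. The `sample_load` helper and its parameters are hypothetical and exist only for illustration.

```python
import os
import time


def sample_load(count, interval=5.0, sleep=time.sleep, read=os.getloadavg):
    """Sleep `interval` seconds, then record the three load averages; repeat `count` times."""
    samples = []
    for _ in range(count):
        sleep(interval)                # the process is asleep here
        samples.append(tuple(read()))  # it is only briefly runnable here
    return samples


def format_sample(sample):
    """Render one sample in the style of the 'load averages:' part of uptime(1)."""
    one, five, fifteen = sample
    return f"load averages: {one:.2f}, {five:.2f}, {fifteen:.2f}"


# Example (takes about a minute to run):
#     for s in sample_load(count=12):
#         print(format_sample(s))
```

The `sleep` and `read` parameters are injectable so that the loop can be exercised without waiting or without a real system behind it.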
Re: PF performance problem
On Thu, Jun 4, 2009 at 6:48 AM, Darrin Chandler dwchand...@stilyagin.com wrote:
> On Wed, Jun 03, 2009 at 10:07:33PM -0700, patrick keshishian wrote:
>> On Wed, Jun 3, 2009 at 3:50 AM, Richard Toohey richardtoo...@paradise.net.nz wrote:
>>> On 3/06/2009, at 10:02 PM, BARDOU Pierre wrote:
>>>> Hello, I have performance issues on an OpenBSD 4.4 firewall. CPU load is OK (always below 50%), but system load is always between 1 and 1.5, it may go up to 2 sometimes.
>>> [cut]
>>> And what is the actual *problem*? What is pf failing to do? Or are you just worried about the numbers?
>>> Search the archives for high load ...
>> just for the record, i have seen a server where its typical load floats around 0.10 or so, but then something will happen and the plateau will get bumped to 1.10 and remain there. this was a 4.5 system.
> A sudden, significant, permanent change in load merely says that something happened that may be interesting. It doesn't tell you anything about what happened or if it's even a problem.
>> I have not identified what event caused this.
>> I've seen a similar issue with a couple of linux boxes at work where the load avg plateau will keep rising: it'll hover around ~3, then say ~6 then ~13. i don't think the issues are related, but could be caused by similar bugs in the kernel.
> I've seen this too over the years on *BSD and Linux on a variety of machines. Usually a few minutes with top(1), systat(1), et al will show you what's going on. Until you find out there's not much to do.

I've only seen it on obsd once after upgrading it to 4.5. The very same box never showed anything like that when running 4.3. I'm monitoring it for another such change. I couldn't find anything interesting using any of the tools you mentioned (top, ps, systat, etc.), nor anything in the logs.

As for the linux systems, they are actually production systems at a customer site. The two are RH AS 4 boxes. Same exact server hardware configuration with RH ES 5 running same exact version of our code (though compiled for ES 5) doesn't present the same issue. We've chalked it up to a kernel bug in the linux that is shipped with that version; also, due to some other issues (including a pthread bug) in AS 4, we have dropped support for AS 4 and recommend that our customers upgrade to ES.

> A change in load is like getting a billing statement with "Important: changes to your account" printed on the envelope. You can run around waving the envelope asking what changed, or you can look inside and find out.
>> All systems continue to be responsive and it only seems that the reported load avg value is just bumped by a base value. It is definitely odd.
> So it's not a problem... yet. It may never be a problem. Or it could be. Open the envelope and spend a few minutes reading the contents. ;-)

as mentioned, I did the best I could with the tools I knew of.

Cheers,
--patrick

> -- 
> Darrin Chandler            |  Phoenix BSD User Group  |  MetaBUG
> dwchand...@stilyagin.com   |  http://phxbug.org/      |  http://metabug.org/
> http://www.stilyagin.com/  |  Daemons in the Desert   |  Global BUG Federation
Re: PF performance problem
PF works like a charm, without doubt. Besides, PF doesn't need the hard disk; the main bottlenecks are the CPU and memory (and the NIC and its driver, of course). I suspect an error in your PF logging system.

P.S. 'Urgent' means just what the word says: urgent. If you see a message related to PF in your /var/log/messages, you should consider it important (it's urgent!). See the -x flag in the pfctl man page.

-- 
Thanks,
Jordi Espasa Clofent
Re: PF performance problem
On 12:02, Wed 03 Jun 09, BARDOU Pierre wrote:
> Hello, I have performance issues on an OpenBSD 4.4 firewall. CPU load is OK (always below 50%), but system load is always between 1 and 1.5, it may go up to 2 sometimes.

Is the system really slow? Or are you basing this 'performance issue' on the loadavg number? On some setups we have a loadavg of around 10 and don't notice any performance impact.

> I suspected an I/O problem on the HDD because of pflogd, so I shut it down, and the system load is still just as high.
> Could you tell me what I should upgrade to solve this? And what does "Debug: Urgent" mean?
> Thank you
>
> Stats of PF:
> # pfctl -si
> Status: Enabled for 29 days 15:27:29             Debug: Urgent
>
> State Table                          Total             Rate
>   current entries                    16592
>   searches                     36113459933        14099.9/s
>   inserts                        286242425          111.8/s
>   removals                       286225833          111.8/s
> Counters
>   match                          794705461          310.3/s
>   bad-offset                             0            0.0/s
>   fragment                               6            0.0/s
>   short                                  0            0.0/s
>   normalize                            272            0.0/s
>   memory                                 0            0.0/s
>   bad-timestamp                          0            0.0/s
>   congestion                          6494            0.0/s
>   ip-option                             12            0.0/s
>   proto-cksum                            1            0.0/s
>   state-mismatch                    107543            0.0/s
>   state-insert                       10966            0.0/s
>   state-limit                           18            0.0/s
>   src-limit                              0            0.0/s
>   synproxy                               0            0.0/s
>
> dmesg:
> # cat /var/run/dmesg.boot
> OpenBSD 4.4 (GENERIC) #1021: Tue Aug 12 17:16:55 MDT 2008
>     dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: Intel(R) Xeon(TM) CPU 2.80GHz (GenuineIntel 686-class) 2.80 GHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,CNXT-ID,CX16,xTPR
> real mem = 1073053696 (1023MB)
> avail mem = 1029165056 (981MB)
> mainbus0 at root
> bios0 at mainbus0: AT/286+ BIOS, date 09/22/05, BIOS32 rev. 0 @ 0xffe90, SMBIOS rev. 2.3 @ 0xf9920 (87 entries)
> bios0: vendor Dell Computer Corporation version A04 date 09/22/2005
> bios0: Dell Computer Corporation PowerEdge 1850
> acpi0 at bios0: rev 0
> acpi0: tables DSDT FACP APIC SPCR HPET MCFG
> acpi0: wakeup devices PCI0(S5) PALO(S5) PBLO(S5) VPR0(S5) PBHI(S5) VPR1(S5) PICH(S5)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (PALO)
> acpiprt2 at acpi0: bus 2 (DOBA)
> acpiprt3 at acpi0: bus 3 (DOBB)
> acpiprt4 at acpi0: bus 4 (PBLO)
> acpiprt5 at acpi0: bus 8 (VPR0)
> acpiprt6 at acpi0: bus 5 (PBHI)
> acpiprt7 at acpi0: bus 6 (PXB1)
> acpiprt8 at acpi0: bus 7 (PXB2)
> acpiprt9 at acpi0: bus 9 (PICH)
> acpicpu0 at acpi0
> bios0: ROM list: 0xc/0xb000! 0xcb000/0x1000 0xcc000/0x800 0xcc800/0x1000 0xcd800/0x2600 0xd/0x1800 0xec000/0x4000!
> ipmi at mainbus0 not configured
> cpu0 at mainbus0
> pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> pchb0 at pci0 dev 0 function 0 Intel E7520 Host rev 0x09
> ppb0 at pci0 dev 2 function 0 Intel E7520 PCIE rev 0x09
> pci1 at ppb0 bus 1
> ppb1 at pci1 dev 0 function 0 Intel IOP332 PCIE-PCIX rev 0x06
> pci2 at ppb1 bus 2
> mpi0 at pci2 dev 5 function 0 Symbios Logic 53c1030 rev 0x08: irq 7
> scsibus0 at mpi0: 16 targets, initiator 7
> em0 at pci2 dev 12 function 0 Intel PRO/1000MT (82546EB) rev 0x01: irq 10, address 00:11:0a:64:32:74
> em1 at pci2 dev 12 function 1 Intel PRO/1000MT (82546EB) rev 0x01: irq 11, address 00:11:0a:64:32:75
> ppb2 at pci1 dev 0 function 2 Intel IOP332 PCIE-PCIX rev 0x06
> pci3 at ppb2 bus 3
> ami0 at pci3 dev 11 function 0 Symbios Logic MegaRAID rev 0x01: irq 3
> ami0: Dell 520, 64b/lhc, FW 351S, BIOS v1.10, 64MB RAM
> ami0: 1 channels, 0 FC loops, 1 logical drives
> scsibus1 at ami0: 40 targets, initiator 40
> sd0 at scsibus1 targ 0 lun 0: AMI, Host drive #00, SCSI2 0/direct fixed
> sd0: 34680MB, 4421 cyl, 255 head, 63 sec, 512 bytes/sec, 71024640 sec total
> scsibus2 at ami0: 16 targets, initiator 16
> safte0 at scsibus2 targ 6 lun 0: PE/PV, 1x2 SCSI BP, 1.0 SCSI2 3/processor fixed
> ppb3 at pci0 dev 4 function 0 Intel E7520 PCIE rev 0x09
> pci4 at ppb3 bus 4
> ppb4 at pci0 dev 5 function 0 Intel E7520 PCIE rev 0x09
> pci5 at ppb4 bus 5
> ppb5 at pci5 dev 0 function 0 Intel PCIE-PCIE rev 0x09
> pci6 at ppb5 bus 6
> em2 at pci6 dev 7 function 0 Intel PRO/1000MT (82541GI) rev 0x05: irq 11, address 00:14:22:21:61:6d
> ppb6 at pci5 dev 0 function 2 Intel PCIE-PCIE rev 0x09
> pci7 at ppb6 bus 7
> em3 at
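
[Editor's sketch] The Rate column in the pfctl -si output quoted in this message appears to be simply each total divided by the time PF has been enabled. The following minimal check, in Python, takes the totals as read from that output and the uptime from the "Enabled for 29 days 15:27:29" line. The helper names are made up for illustration, and the rounding to one decimal is an assumption about how the rate is displayed.

```python
def uptime_seconds(days, hours, minutes, seconds):
    """Convert a 'D days HH:MM:SS' uptime into seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def per_second(total, uptime):
    """Average rate over the whole uptime, rounded to one decimal."""
    return round(total / uptime, 1)


UPTIME = uptime_seconds(29, 15, 27, 29)  # "Enabled for 29 days 15:27:29"

# Totals as read from the quoted pfctl -si output.
TOTALS = {
    "searches": 36113459933,
    "inserts": 286242425,
    "removals": 286225833,
    "match": 794705461,
    "congestion": 6494,
    "state-mismatch": 107543,
}

if __name__ == "__main__":
    for name, total in TOTALS.items():
        print(f"{name:15s} {total:12d} {per_second(total, UPTIME):10.1f}/s")
    # Inserts minus removals should equal the current number of states.
    print("current entries:", TOTALS["inserts"] - TOTALS["removals"])
```

These are averages over a month of uptime, so they say little about short bursts such as the occasional latency spikes discussed elsewhere in the thread. Counters that show 0.0/s (congestion, state-mismatch) are still non-zero in absolute terms.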
Re: PF performance problem
On 3/06/2009, at 10:02 PM, BARDOU Pierre wrote:
> Hello, I have performance issues on an OpenBSD 4.4 firewall. CPU load is OK (always below 50%), but system load is always between 1 and 1.5, it may go up to 2 sometimes.

[cut]

And what is the actual *problem*? What is pf failing to do? Or are you just worried about the numbers?

Search the archives for high load ...

http://marc.info/?l=openbsd-misc&m=122607853731136&w=3

HTH.
Re: PF performance problem
The only problem I noticed is an abnormally long ping (usually 0.3 ms; sometimes, 3 or 4 times a day according to Nagios, up to 30 ms).

I am worried about the numbers since this firewall is highly critical. Since it protects Citrix-hosted applications, I will get instantly killed if delays are too long...

-- 
Regards,
Pierre BARDOU

-----Original Message-----
From: Richard Toohey [mailto:richardtoo...@paradise.net.nz]
Sent: Wednesday, 3 June 2009 12:50
To: BARDOU Pierre
Cc: misc@openbsd.org
Subject: Re: PF performance problem

On 3/06/2009, at 10:02 PM, BARDOU Pierre wrote:
> Hello, I have performance issues on an OpenBSD 4.4 firewall. CPU load is OK (always below 50%), but system load is always between 1 and 1.5, it may go up to 2 sometimes.

[cut]

And what is the actual *problem*? What is pf failing to do? Or are you just worried about the numbers?

Search the archives for high load ...

http://marc.info/?l=openbsd-misc&m=122607853731136&w=3

HTH.
Re: PF performance problem
BARDOU Pierre wrote:
> The only problem I noticed is an abnormally long ping (usually 0.3 ms; sometimes, 3 or 4 times a day according to Nagios, up to 30 ms).

Mmm... maybe it's not a PF-related issue. Check your network. In any case, check the ICMP rules, and use tcpdump(1) to debug it.

> I am worried about the numbers since this firewall is highly critical. Since it protects Citrix-hosted applications, I will get instantly killed if delays are too long...

I use PF in front of the network segments of a web-hosting company. And I sleep very well...

-- 
Thanks,
Jordi Espasa Clofent
Re: PF performance problem
Thanks everybody for the help.

I will stop worrying about the system load and wait for a noticeable performance problem before asking for help :)

I set pfctl -x urgent, and now I'm waiting for something in /var/log/messages...

-- 
Regards,
Pierre BARDOU
Re: PF performance problem
On Wed, Jun 3, 2009 at 3:50 AM, Richard Toohey richardtoo...@paradise.net.nz wrote:
> On 3/06/2009, at 10:02 PM, BARDOU Pierre wrote:
>> Hello, I have performance issues on an OpenBSD 4.4 firewall. CPU load is OK (always below 50%), but system load is always between 1 and 1.5, it may go up to 2 sometimes.
> [cut]
> And what is the actual *problem*? What is pf failing to do? Or are you just worried about the numbers?
> Search the archives for high load ...

just for the record, i have seen a server where its typical load floats around 0.10 or so, but then something will happen and the plateau will get bumped to 1.10 and remain there. this was a 4.5 system. I have not identified what event caused this.

I've seen a similar issue with a couple of linux boxes at work where the load avg plateau will keep rising: it'll hover around ~3, then say ~6 then ~13. i don't think the issues are related, but could be caused by similar bugs in the kernel.

All systems continue to be responsive and it only seems that the reported load avg value is just bumped by a base value. It is definitely odd.

--patrick

> http://marc.info/?l=openbsd-misc&m=122607853731136&w=3