Hi,
I'm kind of desperate here, so please try to help me.
Here's my problem:
I have a setup in production (a very dynamic website).
It consists of pfSense --> Alteon load balancer --> IBM BladeCenter
(with a Squid cluster on it).
pfSense is installed on an IBM x335 with two 2.4GHz Xeons, 2GB RAM, and
a dual-port Intel PCI-X 1Gb NIC.
I'm connected with 1Gb to the ISP.
The problem is that no matter what I do, I can't get more than 15kpps.
After that I start to get a lot of packet loss.
At first I was sure that the ISP had me on QoS, because I never saw
traffic going over 100Mb/s,
but then, to convince me, they downloaded some large files from my
servers and reached as high as 170Mb/s.
So that one was out.
Next I changed the NICs (I used the onboard Broadcom ones at first).
That saved me from needing Device Polling, and interrupts no longer
use half the CPU, but that was the only improvement.
So I upgraded to 1.2.1 RC3, and still the most I saw was 14kpps and
102Mb/s.
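
As a rough sanity check on those two peak figures, the sketch below works
out the average packet size they imply and the packet rate a full 1Gb/s
would need at that size. It is only an estimate under assumptions not
stated above: that Mb/s means 10^6 bits per second, and that the 14kpps and
102Mb/s readings were taken on the same interface, in the same direction,
at the same moment. The function names are made up for illustration.

```python
def avg_packet_bytes(bits_per_second: float, packets_per_second: float) -> float:
    """Average bytes per packet implied by a throughput and a packet rate."""
    return bits_per_second / 8 / packets_per_second


def pps_needed(bits_per_second: float, packet_bytes: float) -> float:
    """Packets per second required to carry a given bit rate at a given packet size."""
    return bits_per_second / (packet_bytes * 8)


# Peak figures quoted above (assumed to be measured together).
observed_bps = 102e6   # 102 Mb/s
observed_pps = 14_000  # 14 kpps

size = avg_packet_bytes(observed_bps, observed_pps)
print(f"implied average packet size: {size:.0f} bytes")

# Packet rate needed to fill the 1Gb/s ISP link at that average size.
print(f"pps needed for 1Gb/s: {pps_needed(1e9, size):.0f}")
```

If the implied size comes out close to a full Ethernet frame, the limit
looks more like a packet-rate ceiling than a bandwidth one; if it is much
smaller, the two counters probably were not measured on the same traffic.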
I have the maximum states set to 700000, while I never saw the actual
count go over 250000.
The files transferred are rather small, 600KB being the largest.
As for the Alteon, at first it was connected via another Broadcom fibre
NIC (the Alteon only has one fibre uplink, and it's 1Gb),
but now that I use the dual-port Intel NIC, I connected it to a Cisco
GBIC and from there to the Alteon through another fibre GBIC (don't
judge me - I don't have a gigabit switch). I know it's another possible
trap, but right now I don't have any other choice.
99% of the traffic is port 80.
I don't use NAT. All the IPs are public.
WAN is static. LAN is not used. OPT1 is used and is also static.
WAN and OPT1 are on different subnets, of course, with an additional
static route (the Squid cluster is on a third subnet).
CPU doesn't go over 30%. RAM is about 20-30%. These are peak values.
sysctl net.inet.ip.intr_queue_drops shows 0.
I have no more than 15 rules, and the first one should take care of
most of the traffic.
I tried Aggressive mode with 1.2 and it didn't help. With the current
version I'm using the Normal mode.
The biggest problem with our website is that people start hitting
refresh when the site is not functioning
properly, and that's kind of killing our web servers. It also adds
traffic to the firewall, loading it even more.
Another weird thing I noticed is that the RRD graphs suddenly show
blank gaps, like this:
------ ------ --------. The gaps appear on all the graphs at the same time.
I've also noticed that they occur at about the same time as the load
kills the website, so the two must be related.
Quality graphs are not showing. They did in the 1.2 version.
SNMP is not enabled. DHCP is (it was on by default and I just left it
there).
With version 1.2 I had ACPI disabled (long boot); now I have it
enabled (it seems to work fine with 1.2.1), although I should mention
that I never checked the ACPI setting in the BIOS (I saw a post by
someone who had this problem).
I've read hundreds of topics here and on the forum, and from what I saw
my setup should be able to handle a lot more than it does now.
So what could be wrong?
Please help!
Thanks,
Lenny.
P.S. Sorry for the length of this mail, but I figured I'd rather give
you all the details up front.