Hi,
OK, I'm back with some tests and results.
I read a lot about the em driver settings, and this is what I did:
In /etc/sysctl.conf I added:
dev.em.0.rx_processing_limit=1600
dev.em.1.rx_processing_limit=1600
although I also tried -1 and some smaller values.
In /boot/loader.conf I added:
hw.em.rxd="4096"
hw.em.txd="4096"
and I believe these took care of the interface errors I used to see.
I also decided to change these in sysctl.conf:
kern.ipc.somaxconn=1024
net.inet.ip.intr_queue_maxlen=4096
The first one was a recommendation from the FreeBSD documentation, and the
second one I changed even though net.inet.ip.intr_queue_drops was 0.
I also tried setting net.isr.direct to "0".
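For reference, this is roughly how the values above can be checked after a reboot. It is only a minimal sketch, assuming the stock FreeBSD sysctl(8) and kenv(1) tools; show_oid is a helper name made up for this example, and any OID that does not exist on a given system is reported as not available rather than treated as an error.

```shell
#!/bin/sh
# show_oid is a hypothetical helper: print the live value of one sysctl OID,
# or a note if the OID does not exist on this system.
show_oid() {
    val=$(sysctl -n "$1" 2>/dev/null) || val="(not available)"
    echo "$1 = $val"
}

# Runtime values that /etc/sysctl.conf should have applied at boot.
for oid in dev.em.0.rx_processing_limit dev.em.1.rx_processing_limit \
           kern.ipc.somaxconn net.inet.ip.intr_queue_maxlen net.isr.direct
do
    show_oid "$oid"
done

# Loader tunables from /boot/loader.conf end up in the kernel environment,
# which kenv(1) can print.
for tunable in hw.em.rxd hw.em.txd
do
    val=$(kenv "$tunable" 2>/dev/null) || val="(not set)"
    echo "$tunable = $val"
done
```

If a line comes back as "(not available)" or "(not set)", the corresponding entry most likely was not picked up at boot.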
Now, for the important part. The "emX taskq" is back (after reboot), "swi1:
net" is gone, and while I don't have any serious load right now, I can see
from the CPU percentage of this process that it will hit 100% right around
15kpps, as usual. And I should remind you that this is still a different
server, the IBM x336.
Did I mess it up too much? Would you recommend otherwise?
Thanks,
Lenny.
On Mar 16, 2009 5:37pm, Scott Ullrich <[email protected]> wrote:
On Mon, Mar 16, 2009 at 7:14 AM, Lenny <[email protected]> wrote:
> Hi again,
>
> So I did replace the server, I have an IBM x336 now instead of the x335.
> The NIC is identical, but not the same one.
> First of all, Chris, you were absolutely right - it was some sort of a
> glitch with the hardware compatibility, as with this server I'm seeing a
> completely different behavior. I started seeing interrupt taking some of
> the CPU (not too much though - about 8-10% when loaded), and I don't see
> an emX taskq at all now.
> But the thing is - the problem is still there - I had a relatively high
> load this weekend (15kpps is my high load, remember?) and once again I got
> some packet loss and a slow response time from the website.
>
> Couple of things I noticed though:
> When it happened, the quality RRD graph showed about a 35-40ms spike (from
> the usual 1-2). It was at that time that I checked the "Disable Hardware
> Checksum Offloading" option and it was back to normal within seconds. But
> I saw it climb a few other times afterwards... So maybe it was just a
> coincidence.
> Also, if I check the interface status when there is normal traffic, there
> are no errors (well, no additional errors), but the minute the load hits,
> I start seeing the counters climbing. On both interfaces, but only on
> the "In"; the "Out" is "0".
>
> And one last thing, I was thinking about maybe enforcing the negotiation
> through the config.xml. So I went through it and I saw this:
>
> em0
>
>
>
> 100
> Mb
>
>
> XXXX
> 28
> YYYY
>
>
> em1
> OPTICAL
>
>
> ZZZZ
> 29
>
>
>
>
>
> Is this normal, I mean regarding the 100Mb bandwidth? I have everything
> set to autonegotiation and the interface status shows:
> Media 1000baseTX on both, so I assume I shouldn't touch it.
> But the 100Mb confuses me.
> Anyhow, this x336 server is a loaner and I have to return it or buy it
> within a day or two, so if you have any thoughts at all, please.
Now you may be hitting a sysctl limit. Quoting BillM from earlier in
this thread:
"Check sysctl net.inet.ip.intr_queue_drops and raise
net.inet.ip.intr_queue_maxlen if it's non-zero.
Also check net.isr.drop.
The intel driver has some debugging also under the dev.em sysctl I
believe."
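A rough way to see whether those counters actually climb while the load is on is to sample them twice and compare. This is just a sketch, assuming sysctl(8) is available and the two OIDs named above exist on your build; read_counter, delta and INTERVAL are names invented for this example, and a missing OID is simply read as 0.

```shell
#!/bin/sh
# read_counter (hypothetical helper): print a counter's current value,
# falling back to 0 if the OID is missing so the arithmetic below still works.
read_counter() {
    sysctl -n "$1" 2>/dev/null || echo 0
}

# delta (hypothetical helper): difference between two samples.
delta() {
    echo $(( $2 - $1 ))
}

# Seconds between the two samples; override with INTERVAL=30 ./script.sh
INTERVAL=${INTERVAL:-2}

q_before=$(read_counter net.inet.ip.intr_queue_drops)
i_before=$(read_counter net.isr.drop)
sleep "$INTERVAL"
q_after=$(read_counter net.inet.ip.intr_queue_drops)
i_after=$(read_counter net.isr.drop)

echo "net.inet.ip.intr_queue_drops: +$(delta "$q_before" "$q_after") in ${INTERVAL}s"
echo "net.isr.drop: +$(delta "$i_before" "$i_after") in ${INTERVAL}s"
```

If either number grows during a traffic peak, that points at the queue named by that counter rather than at the NIC itself.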
Scott