Hi,

OK, I'm back with some tests and results.
I read a lot about the em driver settings, and this is what I did.
In /etc/sysctl.conf I added:
dev.em.0.rx_processing_limit=1600
dev.em.1.rx_processing_limit=1600
although I also tried -1 and some smaller values.

In /boot/loader.conf I added:
hw.em.rxd="4096"
hw.em.txd="4096"
I believe these took care of the interface errors I used to see.

I also decided to change these in sysctl.conf:
kern.ipc.somaxconn=1024
net.inet.ip.intr_queue_maxlen=4096

The first one was a recommendation from the FreeBSD documentation, and the second one I changed even though I had net.inet.ip.intr_queue_drops = 0.
I also tried changing net.isr.direct to "0".
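
One way to confirm that the sysctl.conf values above actually took effect after the reboot is to compare them against what the sysctl command reports. The following is only a minimal sketch in Python: the helper names (parse_conf, parse_live, mismatches) are made up for illustration, and the LIVE text is invented sample output, not something captured from this box.

```python
def parse_conf(text):
    """Parse sysctl.conf-style 'name=value' lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            name, value = line.split("=", 1)
            settings[name.strip()] = value.strip().strip('"')
    return settings


def parse_live(text):
    """Parse 'name: value' lines, the format the sysctl command prints."""
    live = {}
    for line in text.splitlines():
        if ": " in line:
            name, value = line.split(": ", 1)
            live[name.strip()] = value.strip()
    return live


def mismatches(conf_text, live_text):
    """Return {name: (configured, live)} for every setting that differs."""
    wanted = parse_conf(conf_text)
    live = parse_live(live_text)
    return {
        name: (value, live.get(name))
        for name, value in wanted.items()
        if live.get(name) != value
    }


# The sysctl.conf entries listed in this message.
CONF = """\
dev.em.0.rx_processing_limit=1600
dev.em.1.rx_processing_limit=1600
kern.ipc.somaxconn=1024
net.inet.ip.intr_queue_maxlen=4096
"""

# Hypothetical sysctl output, invented for illustration only.
LIVE = """\
dev.em.0.rx_processing_limit: 1600
dev.em.1.rx_processing_limit: 100
kern.ipc.somaxconn: 1024
net.inet.ip.intr_queue_maxlen: 4096
"""

if __name__ == "__main__":
    for name, (want, got) in mismatches(CONF, LIVE).items():
        print(f"{name}: configured {want}, live {got}")
```

On the real machine the LIVE text would come from running sysctl with the same names. The loader.conf tunables are left out on purpose, since they are boot-time settings and I am not assuming they can be read back the same way.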

Now, for the important part. The "emX taskq" is back (after the reboot) and "swi1: net" is gone. While I don't have any serious load right now, I can see from the CPU percentage of this process that it will hit 100% at around 15kpps, as usual. And I should remind you that this is still the different server, the IBM x336.

Did I mess it up too much? Would you recommend otherwise?

Thanks,

Lenny.


On Mar 16, 2009 5:37pm, Scott Ullrich <[email protected]> wrote:
On Mon, Mar 16, 2009 at 7:14 AM, Lenny <[email protected]> wrote:

> Hi again,
>
> So I did replace the server, I have an IBM x336 now instead of the x335. The
> NIC is an identical model, but not the same physical card.
> First of all, Chris, you were absolutely right - it was some sort of a
> glitch with the hardware compatibility, as with this server I'm seeing
> completely different behavior. I started seeing interrupt taking some of the
> CPU (not too much though - about 8-10% when loaded), and I don't see an emX
> taskq at all now.
> But the thing is - the problem is still there - I had a relatively high load
> this weekend (15kpps is my high load, remember?) and once again I got some
> packet loss and a slow response time from the website.
>
> A couple of things I noticed though:
> When it happened, the quality RRD graph showed a spike of about 35-40ms (from
> the usual 1-2). It was at that time that I checked the "Disable Hardware
> Checksum Offloading" option and it was back to normal within seconds. But I
> saw it climb a few other times afterwards... So maybe it was just a
> coincidence.
> Also, if I check the interface status when there is normal traffic - there
> are no errors (well, no additional errors), but the minute the load hits
> - I start seeing the counters climbing. On both interfaces, but only on
> the "In", the "Out" is "0".
>
> And one last thing, I was thinking about maybe enforcing the negotiation
> through the config.xml. So I went through it and I saw this:
>
> em0
> 100
> Mb
> XXXX
> 28
> YYYY
>
> em1
> OPTICAL
> ZZZZ
> 29
>
> Is this normal, I mean regarding the 100Mb bandwidth? I have everything set
> to autonegotiation and the interface status shows:
> Media 1000baseTX on both, so I assume I shouldn't touch it.
> But the 100Mb confuses me.
> Anyhow, this x336 server is a loaner and I have to return it or buy it
> within a day or two, so if you have any thoughts at all, please.



Now you may be hitting a sysctl limit. Quoting BillM from earlier in
this thread:

"Check sysctl net.inet.ip.intr_queue_drops and raise
net.inet.ip.intr_queue_maxlen if it's non-zero.

Also check net.isr.drop.

The Intel driver has some debugging also under the dev.em sysctl I believe."
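
The check described in the quote can be sketched as a small Python helper that scans sysctl output for the two drop counters and reports any that are non-zero. This is only an illustration: the function name nonzero_drops is made up, and the SAMPLE text uses invented values rather than real output from the machine in question.

```python
# The two counters named in the quoted advice.
DROP_COUNTERS = ("net.inet.ip.intr_queue_drops", "net.isr.drop")


def nonzero_drops(sysctl_output, names=DROP_COUNTERS):
    """Return {counter: value} for each watched counter that is not zero."""
    found = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name in names:
            count = int(value.strip())
            if count != 0:
                found[name] = count
    return found


# Hypothetical sysctl output, invented for illustration only.
SAMPLE = """\
net.inet.ip.intr_queue_maxlen: 50
net.inet.ip.intr_queue_drops: 1234
net.isr.drop: 0
"""

if __name__ == "__main__":
    drops = nonzero_drops(SAMPLE)
    if drops:
        for name, count in drops.items():
            print(f"{name} = {count}")
    else:
        print("no drops recorded")
```

A non-zero net.inet.ip.intr_queue_drops is the case where raising net.inet.ip.intr_queue_maxlen is suggested. Running the check before and after a load peak would show whether the counter is still climbing.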



Scott


