Re: netmap & chelsio

2018-07-12 Thread Eggert, Lars
Hi,

On 2018-7-11, at 18:53, Navdeep Parhar  wrote:
> Try changing lazy_tx_credit_flush to 0 on the running kernel with a
> debugger, or compile the driver with it set to 0 -- it's in t4_netmap.c:
> 
> int lazy_tx_credit_flush = 1;

thanks! With that, I get performance similar to the ixl cards on first try.
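
(For anyone else trying this before a tunable exists: the live-kernel route is 
roughly the following sketch; it assumes a kernel with symbols, kgdb write 
access to /dev/mem, and that the cxgbe module's symbols are visible to kgdb, 
e.g. via add-kld.

# kgdb -w /boot/kernel/kernel /dev/mem
(kgdb) set variable lazy_tx_credit_flush = 0

Otherwise, rebuilding the driver with the initializer changed to 0 in 
t4_netmap.c does the same thing.)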

> I'm surprised I don't have a tunable/sysctl for it.  I'll add one really
> soon.

That would be useful!

(Also useful would be some definitive documentation on what all the loader 
tunables are and what they do. But the current situation is already much better 
than for the Intel cards, where esp. those that have been iflib-ified seem to 
have completely undocumented tunables now.)

General/unrelated suggestion: Could the kernel spit out a warning when it 
encounters a loader tunable that doesn't do anything? That would allow one to 
at least catch when tunables are renamed/changed.
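
A crude userland approximation of that check is possible today: dump the 
kernel environment and probe each name as a sysctl. A sketch (heuristic only, 
since many legitimate tunables have no sysctl of the same name):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <kenv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
        int len = kenv(KENV_DUMP, NULL, NULL, 0);       /* size of the dump */
        char *buf;

        if (len <= 0 || (buf = malloc(len + 1)) == NULL)
                return (1);
        if (kenv(KENV_DUMP, NULL, buf, len) == -1)
                return (1);
        buf[len] = '\0';
        /* the dump is a sequence of NUL-terminated "name=value" strings */
        for (char *p = buf; *p != '\0'; p += strlen(p) + 1) {
                char name[128];
                char *eq = strchr(p, '=');

                if (eq == NULL || (size_t)(eq - p) >= sizeof(name))
                        continue;
                strlcpy(name, p, eq - p + 1);
                if (sysctlbyname(name, NULL, NULL, NULL, 0) == -1)
                        printf("tunable without a sysctl: %s\n", name);
        }
        free(buf);
        return (0);
}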

Lars


signature.asc
Description: Message signed with OpenPGP


Re: netmap & chelsio

2018-07-11 Thread Eggert, Lars
Hi,

I have netmap working with the T6 cards now.

However, performance is very poor. It seems to take several milliseconds after 
a NIOCTXSYNC ioctl before the tail is updated?
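
For reference, here is roughly how one can measure that gap with the standard 
netmap user API (sketch, untested; assumes an nm_open()ed descriptor with 
packets already queued on TX ring 0, so the tail has something to advance to):

#include <sys/ioctl.h>
#include <stdint.h>
#include <time.h>
#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>

/* Time from the NIOCTXSYNC kick until the kernel moves the TX tail. */
static double
txsync_tail_usec(struct nm_desc *nmd)
{
        struct netmap_ring *ring = NETMAP_TXRING(nmd->nifp, 0);
        uint32_t tail = ring->tail;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        ioctl(nmd->fd, NIOCTXSYNC, NULL);
        while (ring->tail == tail)      /* wait for the tail to advance */
                ioctl(nmd->fd, NIOCTXSYNC, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return ((t1.tv_sec - t0.tv_sec) * 1e6 +
            (t1.tv_nsec - t0.tv_nsec) / 1e3);
}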

In case it matters, here is what is in loader.conf:

hw.cxgbe.num_vis=2
hw.cxgbe.fl_pktshift=0
hw.cxgbe.ntxq=1
hw.cxgbe.nrxq=1
hw.cxgbe.qsize_txq=512
hw.cxgbe.qsize_rxq=512
hw.cxgbe.cong_drop=1
hw.cxgbe.pause_settings=1
hw.cxgbe.autoneg=0
hw.cxgbe.nm_rx_nframes=1
hw.cxgbe.nm_rx_ndesc=1

Lars


signature.asc
Description: Message signed with OpenPGP


Re: netmap & chelsio

2018-07-06 Thread Eggert, Lars
Hi,

On 2018-7-5, at 17:47, n...@freebsd.org wrote:
> Set hw.cxgbe.fl_pktshift=0 in loader.conf to stop the chip from doing
> this.  See cxgbe(4) for details on the knob.  It's a historic
> optimization that doesn't seem to matter on modern CPUs, so the driver
> default should probably be 0 instead of 2.

thanks, I must have missed this in the man page.

Looking at this in detail now, I wonder if there are other loader settings that 
should be set for netmap use, such as hw.cxgbe.buffer_packing=0 and/or 
hw.cxgbe.allow_mbufs_in_cluster=0?

Thanks,
Lars


signature.asc
Description: Message signed with OpenPGP


netmap & chelsio

2018-07-05 Thread Eggert, Lars
Hi,

when receiving packets via netmap (current GitHub version) on a Chelsio T6 vcc0 
device on -CURRENT, it appears that the Ethernet header starts at an offset of 
two bytes into the netmap slot. So far, I have only used netmap with various 
Intel NICs, where the Ethernet header starts at offset zero.
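
For concreteness, accounting for the offset in user code currently means 
something like this (sketch against the standard netmap user API; the 
hard-coded 2 is simply what I observe, not something I found documented):

/* nmd comes from nm_open(); look at the current slot of RX ring 0 */
struct netmap_ring *ring = NETMAP_RXRING(nmd->nifp, 0);
struct netmap_slot *slot = &ring->slot[ring->cur];
char *eth = NETMAP_BUF(ring, slot->buf_idx) + 2;   /* Ethernet header */
unsigned int len = slot->len - 2;                  /* length minus the pad */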

Is this a bug, or is it to be expected?

Thanks,
Lars


signature.asc
Description: Message signed with OpenPGP


netmap error on -CURRENT with em and igb

2017-02-16 Thread Eggert, Lars
Hi,

I can't put em or igb interfaces into netmap mode on a recent -CURRENT (ix 
interfaces work on the same machines). Here are the pkt-gen and dmesg outputs:

# sudo sysctl dev.netmap.admode=1
# sudo sysctl dev.netmap.verbose=1

# sudo ./pkt-gen -i em1
790.411737 main [2274] interface is em1
790.411753 main [2397] running on 1 cpus (have 20)
790.411823 extract_ip_range [369] range is 10.0.0.1:0 to 10.0.0.1:0
790.411828 extract_ip_range [369] range is 10.1.0.1:0 to 10.1.0.1:0
790.625117 nm_open [898] NIOCREGIF failed: Operation not permitted em1
790.625126 main [2479] Unable to open netmap:em1: Operation not permitted
790.625129 main [2560] aborting

dmesg:
790.411868 [1776] netmap_interp_ringid  em1: tx [0,2) rx [0,2) id 0
790.419431 [ 551] nm_mem_assign_group   iommu_group 0
790.425548 [1148] netmap_config_obj_allocator objsize 1024 clustsize 4096 objects 4
790.434368 [1148] netmap_config_obj_allocator objsize 36864 clustsize 36864 objects 1
790.443390 [1148] netmap_config_obj_allocator objsize 2048 clustsize 4096 objects 2
790.452219 [1270] netmap_finalize_obj_allocator Pre-allocated 25 clusters (4/100KB) for 'netmap_if'
790.463605 [1270] netmap_finalize_obj_allocator Pre-allocated 200 clusters (36/7200KB) for 'netmap_ring'
790.534919 [1270] netmap_finalize_obj_allocator Pre-allocated 81920 clusters (4/327680KB) for 'netmap_buf'
790.546113 [1377] netmap_mem_finalize_all   interfaces 100 KB, rings 7200 KB, buffers 320 MB
790.555873 [1380] netmap_mem_finalize_all   Free buffers: 163838
790.605325 [1831] netmap_mem_global_deref   active = 0
790.612967 [ 949] netmap_close  dev 0xf8000a4d7c00 fflag 0x20003 devtype 8192 td 0xf8000b899000

# sudo ./pkt-gen -i igb1
094.077695 main [2274] interface is igb1
094.077711 main [2397] running on 1 cpus (have 10)
094.077822 extract_ip_range [369] range is 10.0.0.1:0 to 10.0.0.1:0
094.077827 extract_ip_range [369] range is 10.1.0.1:0 to 10.1.0.1:0
094.280311 nm_open [898] NIOCREGIF failed: Operation not permitted igb1
094.280319 main [2479] Unable to open netmap:igb1: Operation not permitted
094.280323 main [2560] aborting

dmesg:
094.077866 [1776] netmap_interp_ringid  igb1: tx [0,8) rx [0,8) id 0
094.085425 [ 551] nm_mem_assign_group   iommu_group 0
094.091449 [1148] netmap_config_obj_allocator objsize 1024 clustsize 4096 objects 4
094.100125 [1148] netmap_config_obj_allocator objsize 36864 clustsize 36864 objects 1
094.109004 [1148] netmap_config_obj_allocator objsize 2048 clustsize 4096 objects 2
094.117705 [1270] netmap_finalize_obj_allocator Pre-allocated 25 clusters (4/100KB) for 'netmap_if'
094.128918 [1270] netmap_finalize_obj_allocator Pre-allocated 200 clusters (36/7200KB) for 'netmap_ring'
094.199456 [1270] netmap_finalize_obj_allocator Pre-allocated 81920 clusters (4/327680KB) for 'netmap_buf'
094.210482 [1377] netmap_mem_finalize_all   interfaces 100 KB, rings 7200 KB, buffers 320 MB
094.220078 [1380] netmap_mem_finalize_all   Free buffers: 163838
094.260201 [1831] netmap_mem_global_deref   active = 0
094.268364 [ 949] netmap_close  dev 0xf8000a453000 fflag 0x20003 devtype 8192 td 0xf80014d45000

Lars


signature.asc
Description: Message signed with OpenPGP


Re: Issues with ixl(4)

2016-04-20 Thread Eggert, Lars
On 2016-04-20, at 1:15, K. Macy  wrote:
> FWIW, NFLX sees performance close to that of cxgbe (by far the best
> maintained, best performing FreeBSD 40G driver) with an iflib
> converted driver. The iflib updated driver will be imported by 11 but
> won't become the default driver until 11.1 for want of QA resources at
> Intel.

Nice! I still have the cards, so will be sure to test.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Issues with ixl(4)

2016-04-18 Thread Eggert, Lars
I haven't played with lagg+vlan+bridge, but I briefly evaluated XL710 boards 
last year 
(https://lists.freebsd.org/pipermail/freebsd-net/2015-October/043584.html) and 
saw very poor throughputs and latencies even in very simple setups. As far as I 
could tell, TSO/LRO wasn't taking effect (although enabled), and so I 
ran into packet-rate issues.

I basically gave up and went with a different vendor. FWIW, the XL710 boards in 
the same machines booted into Linux performed fine.

Lars

> On 2016-04-19, at 6:06, Dustin Marquess  wrote:
> 
> I'm having some strange issues with ixl(4) and a X710-DA4 card in a new-ish
> Intel-based server.  I'm pretty much replicating an existing setup from an
> older AMD machine that used 2 x X520-DA2 cards and ixgbe(4).  This is all
> on -CURRENT.
> 
> It's meant to be a bhyve server, so the 4x10GE ports are put into a
> LACP-based lagg(4), then vlan(4) interfaces are bound to the lagg, and then
> if_bridge(4) interfaces are created to bind the vlan and tap interfaces
> together.
> 
> The X710-DA4 is running the latest NVM from Intel (5.02):
> 
> dev.ixl.3.fw_version: nvm 5.02 etid 80002284 oem 0.0.0
> dev.ixl.2.fw_version: nvm 5.02 etid 80002284 oem 0.0.0
> dev.ixl.1.fw_version: nvm 5.02 etid 80002284 oem 0.0.0
> dev.ixl.0.fw_version: nvm 5.02 etid 80002284 oem 0.0.0
> 
> I've tried both the ixl driver that comes with -CURRENT (1.4.3?) and the
> 1.4.27 driver from Intel and am having the same problem.  The problem is
> this exactly (sorry it's taken me so long to get to it!):
> 
> Using just one interface, one interface + VLANs, the lagg without VLANs,
> etc, everything works perfectly fine.  As soon as I combine
> lagg+vlan+bridge, all hell breaks loose.  One machine can ping one alias on
> the server but not the other, while other machines can.  The server itself
> can't ping the DNS server nor the default route, but can ping things
> through the default route, etc.  The behavior is very unpredictable.  ssh
> can take a few times to get in, and then once in, "svn update" will work
> for a few seconds and then bomb out, etc.
> 
> Here is the working config from the X520-DA2 system:
> 
> ifconfig_ix0="-lro -tso -txcsum up"
> ifconfig_ix1="-lro -tso -txcsum up"
> ifconfig_ix2="-lro -tso -txcsum up"
> ifconfig_ix3="-lro -tso -txcsum up"
> cloned_interfaces="lagg0 tap0 tap1 bridge0 bridge1 vlan1 vlan2"
> ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1 laggport ix2
> laggport ix3"
> ifconfig_vlan1="vlan 1 vlandev lagg0"
> ifconfig_vlan2="vlan 2 vlandev lagg0"
> ifconfig_bridge0="inet 192.168.1.100/24 addm vlan1 addm tap0"
> ifconfig_bridge1="addm vlan2 addm tap1"
> defaultrouter="192.168.1.1"
> 
> Here is the "broken" config from the X710-DA4 system:
> 
> ifconfig_ixl0="-rxcsum -txcsum -lro -tso -vlanmtu -vlanhwtag -vlanhwfilter
> -vlanhwtso -vlanhwcsum up"
> ifconfig_ixl1="-rxcsum -txcsum -lro -tso -vlanmtu -vlanhwtag -vlanhwfilter
> -vlanhwtso -vlanhwcsum up"
> ifconfig_ixl2="-rxcsum -txcsum -lro -tso -vlanmtu -vlanhwtag -vlanhwfilter
> -vlanhwtso -vlanhwcsum up"
> ifconfig_ixl3="-rxcsum -txcsum -lro -tso -vlanmtu -vlanhwtag -vlanhwfilter
> -vlanhwtso -vlanhwcsum up"
> cloned_interfaces="lagg0 tap0 tap1 bridge0 bridge1 vlan1 vlan2"
> ifconfig_lagg0="laggproto lacp laggport ixl0 laggport ixl1 laggport ixl2
> laggport ixl3"
> ifconfig_vlan1="vlan 1 vlandev lagg0"
> ifconfig_vlan2="vlan 2 vlandev lagg0"
> ifconfig_bridge0="inet 192.168.1.101/24 addm vlan1 addm tap0"
> ifconfig_bridge1="addm vlan2 addm tap1"
> defaultrouter="192.168.1.1"
> 
> I've changed the various flags in the ifconfig_ixl# lines without any
> obvious differences.  Both machines are connected to the same HPe 5820X
> switch with the same exact config, so I don't believe it's a switch issue.
> 
> Any ideas? Has anybody seen something like this before?
> 
> Thanks!
> -Dustin



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-12-10 Thread Eggert, Lars
Hi,

On 2015-12-10, at 20:42, Denis Pearson  wrote:
> I can probably find out a snapshot with the code at the time and extract a 
> diff, yes. I just don't know whether it's worth the time when the problem 
> is not reproducible on the current 1.4.8 driver which will hopefully get into 
> -CURRENT (if it's not already there?).

per my last email, I do see the same issues with 1.4.8.

This is with a single netperf TCP flow, no NIC parameter tuning and no RSS or 
PCBGROUP in the kernel.

> This is why I suggested a transceiver change or replug first.

I will test this next week. (However, the same testbed booted into Linux 
doesn't see these low netperf numbers.)

It really smells like a TSO/LRO (= packet rate) issue. If I configure 
jumbograms, performance jumps up as expected.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-12-10 Thread Eggert, Lars
On 2015-10-26, at 18:40, Eggert, Lars <l...@netapp.com> wrote:
> On 2015-10-26, at 17:08, Pieper, Jeffrey E <jeffrey.e.pie...@intel.com> wrote:
>> As a caveat, this was using default netperf message sizes.
> 
> I get the same ~3 Gb/s with the default netperf sizes and driver 1.4.5.

Now there is version 1.4.8 on the Intel website, but it doesn't change things 
for me.

> When you tcpdump during the run, do you see TSO/LRO in effect, i.e., do you 
> see "segments" > 32K in the trace?

I still see no TSO/LRO in effect when tcpdump'ing on the receiver; note how all 
the packets are 1448 bytes:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ixl0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:02:42.328782 IP 10.0.4.1.21507 > 10.0.4.2.12865: Flags [S], seq 15244366, 
win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 478099 ecr 0], length 0
17:02:42.328808 IP 10.0.4.2.12865 > 10.0.4.1.21507: Flags [S.], seq 1819579546, 
ack 15244367, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 
3553932482 ecr 478099], length 0
17:02:42.328842 IP 10.0.4.1.21507 > 10.0.4.2.12865: Flags [.], ack 1, win 1040, 
options [nop,nop,TS val 478099 ecr 3553932482], length 0
17:02:42.329804 IP 10.0.4.1.21507 > 10.0.4.2.12865: Flags [P.], seq 1:657, ack 
1, win 1040, options [nop,nop,TS val 478100 ecr 3553932482], length 656
17:02:42.331671 IP 10.0.4.2.12865 > 10.0.4.1.21507: Flags [P.], seq 1:657, ack 
657, win 1040, options [nop,nop,TS val 3553932485 ecr 478100], length 656
17:02:42.331717 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [S], seq 1387798477, 
win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 478102 ecr 0], length 0
17:02:42.331729 IP 10.0.4.2.30216 > 10.0.4.1.10449: Flags [S.], seq 4085135109, 
ack 1387798478, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 
282922 ecr 478102], length 0
17:02:42.331781 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [.], ack 1, win 1040, 
options [nop,nop,TS val 478102 ecr 282922], length 0
17:02:42.331796 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [.], seq 1:1449, ack 
1, win 1040, options [nop,nop,TS val 478102 ecr 282922], length 1448
17:02:42.331800 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [.], seq 1449:2897, 
ack 1, win 1040, options [nop,nop,TS val 478102 ecr 282922], length 1448
17:02:42.331807 IP 10.0.4.2.30216 > 10.0.4.1.10449: Flags [.], ack 2897, win 
1018, options [nop,nop,TS val 282923 ecr 478102], length 0
17:02:42.331809 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [.], seq 2897:4345, 
ack 1, win 1040, options [nop,nop,TS val 478102 ecr 282922], length 1448
17:02:42.331813 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [.], seq 4345:5793, 
ack 1, win 1040, options [nop,nop,TS val 478102 ecr 282922], length 1448
17:02:42.331817 IP 10.0.4.2.30216 > 10.0.4.1.10449: Flags [.], ack 5793, win 
1018, options [nop,nop,TS val 282923 ecr 478102], length 0
17:02:42.331818 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [.], seq 5793:7241, 
ack 1, win 1040, options [nop,nop,TS val 478102 ecr 282922], length 1448
17:02:42.331821 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [.], seq 7241:8689, 
ack 1, win 1040, options [nop,nop,TS val 478102 ecr 282922], length 1448
17:02:42.331825 IP 10.0.4.2.30216 > 10.0.4.1.10449: Flags [.], ack 8689, win 
1018, options [nop,nop,TS val 282923 ecr 478102], length 0
17:02:42.331826 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [.], seq 8689:10137, 
ack 1, win 1040, options [nop,nop,TS val 478102 ecr 282922], length 1448
17:02:42.331829 IP 10.0.4.1.10449 > 10.0.4.2.30216: Flags [.], seq 10137:11585, 
ack 1, win 1040, options [nop,nop,TS val 478102 ecr 282922], length 1448
...
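
(Each of those 1448-byte segments is exactly one MSS: the 1460 negotiated in 
the handshake above minus 12 bytes of TCP timestamp options. So nothing is 
being coalesced on either side.)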

Doing the same trace over 10G ix interfaces shows most segments in the 8-32K 
range, indicating that TSO/LRO are in use (and results in 9.9G throughput).

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Intel XL710 NVM updating and driver version

2015-11-10 Thread Eggert, Lars
On 2015-11-10, at 18:10, Steven Hartland  wrote:
> 
> Having spent 30mins searching for the FreeBSD utility to perform said update 
> I'm hitting a dead end so looking for advice on where to find this?

No idea. I ended up booting into Linux to flash.

> While searching for this I found that the latest driver on Intel's site is 
> 1.4.8, in a file named 1.4.5 and reporting 1.4.5, so it looks like the wrong 
> version was uploaded on Intel's download site.

Yep, found that too.

> The driver in HEAD is only 1.4.3 so the next question is there a reason HEAD 
> is behind?

No idea either.

If you're running -CURRENT, would you do me a favor and do a netperf run? I 
fail to get speeds over 5Gb/s, because it seems like TSO/LRO isn't in effect 
(although enabled).

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-26 Thread Eggert, Lars
On 2015-10-26, at 4:38, Kevin Oberman  wrote:
> On Sun, Oct 25, 2015 at 12:10 AM, Daniel Engberg <
> daniel.engberg.li...@pyret.net> wrote:
> 
>> One thing I've noticed that probably affects your performance benchmarks
>> somewhat is that you're using iperf(2) instead of the newer iperf3 but I
>> could be wrong...
> 
> iperf3 is not a newer version of iperf. It is a total re-write and a rather
> different tool. It has significant improvements in many areas and new
> capabilities that might be of use. That said, there is no reason to think
> that the results of tests using iperf2 are in any way inaccurate. However,
> it is entirely possible to get misleading results if options are not properly
> selected.

FWIW, I've been using netperf and tried various options.

I don't think the issue is the benchmarking tool. I think the issue is TSO/LRO 
(per my earlier email).

Lars



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-26 Thread Eggert, Lars
On 2015-10-26, at 15:38, Pieper, Jeffrey E  wrote:
> With the latest ixl component from: 
> https://downloadcenter.intel.com/download/25160/Network-Adapter-Driver-for-PCI-E-40-Gigabit-Network-Connections-under-FreeBSD-
> 
> running on 10.2 amd64, I easily get 9.6 Gb/s with one netperf stream, either 
> b2b or through a switch. This is with no driver/kernel tuning. Running 4 
> streams easily gets me 36 Gb/s.

Thanks, will test!

If the newer driver makes a difference, any chance we'll see it in -HEAD soon?

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-26 Thread Eggert, Lars
On 2015-10-26, at 17:08, Pieper, Jeffrey E  wrote:
> As a caveat, this was using default netperf message sizes.

I get the same ~3 Gb/s with the default netperf sizes and driver 1.4.5.

When you tcpdump during the run, do you see TSO/LRO in effect, i.e., do you see 
"segments" > 32K in the trace?

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-24 Thread Eggert, Lars
On 2015-10-23, at 23:36, Eric Joyner  wrote:
> I see that the sysctl does clobber the global value, but have you tried 
> lowering the interval / raising the rate? You could try something like 
> 10usecs, and see if that helps. We'll do some more investigation here -- 
> 3Gb/s on a 40Gb/s adapter using default settings is terrible, and we shouldn't let 
> that be happening.

I played with different settings, but I've never been able to get more than 
4Gb/s, whereas under Linux 4.2 without any special settings I get 13.

See my other email on TSO/LRO not looking to be effective; that would certainly 
explain it. Plausible? Anything to try here?

Lars



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-24 Thread Eggert, Lars
On 2015-10-24, at 10:32, Jack Vogel <jfvo...@gmail.com> wrote:
> 13 on a 40G interface?? I don't think that's very good for Linux either, is
> this a 4x10 adapter?

No, it's a 2x40. And I can get it into the high 30s with tuning. I just 
mentioned the value to illustrate that something seems to be seriously broken 
under FreeBSD.

Lars

> Maybe elaborating on the details of the hardware, you sure you don't have a
> bad PCI slot
> somewhere that might be throttling everything?
> 
> Cheers,
> 
> Jack
> 
> 
> On Sat, Oct 24, 2015 at 12:43 AM, Eggert, Lars <l...@netapp.com> wrote:
> 
>> On 2015-10-23, at 23:36, Eric Joyner <e...@freebsd.org> wrote:
>> 
>> I see that the sysctl does clobber the global value, but have you tried
>> lowering the interval / raising the rate? You could try something like
>> 10usecs, and see if that helps. We'll do some more investigation here --
>> 3Gb/s on a 40Gb/s adapter using default settings is terrible, and we shouldn't let
>> that be happening.
>> 
>> 
>> I played with different settings, but I've never been able to get more
>> than 4Gb/s, whereas under Linux 4.2 without any special settings I get 13.
>> 
>> See my other email on TSO/LRO not looking to be effective; that would
>> certainly explain it. Plausible? Anything to try here?
>> 
>> Lars
>> 
>> 



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-22 Thread Eggert, Lars
Hi,

for those of you following along, I did try jumbograms and throughput increased 
roughly 5x. So it looks like I'm hitting a packet-rate limit somewhere.
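
(That factor is about what a pure per-packet bottleneck would predict: a 
9000-byte MTU carries roughly 6x the payload per packet of a 1500-byte one, so 
the packet rate drops by about that much for the same goodput.)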

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-22 Thread Eggert, Lars
On 2015-10-22, at 9:38, Eggert, Lars <l...@netapp.com> wrote:
> for those of you following along, I did try jumbograms and throughput 
> increases roughly 5x. So it looks like I'm hitting a packet-rate limit 
> somewhere.

Does the ixl driver have an issue with TSO/LRO?

If I tcpdump on the receiver when testing the 10G ix interfaces, I see that 
most "packets" are up to 64KB in the traces on both sender and receiver, which 
is expected with TSO/LRO.

When I look at the traffic over the ixl interfaces, I see that most "packets" 
on the sender are much smaller (~2896 aka 2 segments; although some few are 
>40K). On the receiver, I only see 1448 byte packets.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-21 Thread Eggert, Lars
Hi Jack,

On 2015-10-21, at 16:14, Jack Vogel  wrote:
> The 40G hardware is absolutely dependent on firmware, if you have a mismatch
> for instance, it can totally bork things. So, I would work with your Intel
> rep and be sure you have the correct version for your specific hardware.

I got these tester cards from Amazon, so I don't have a rep.

I flashed the latest NVM (1.2.5), because previously the FreeBSD driver was 
complaining about the firmware being too old. But I did that before the 
experiments.

If there is anything else I should be doing, I'd appreciate being put in 
contact with someone at Intel who can help.

Thanks,
Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-21 Thread Eggert, Lars
Hi Bruce,

thanks for the very detailed analysis of the ixl sysctls!

On 2015-10-20, at 16:51, Bruce Evans  wrote:
> 
> Lowering (improving) latency always lowers (unimproves) throughput by
> increasing load.

That, I also understand. But even when I back off the itr values to something 
more reasonable, throughput still remains low.

With all the tweaking I have tried, I have yet to top 3 Gb/s with ixl cards, 
whereas they do ~13 Gb/s on Linux straight out of the box.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-20 Thread Eggert, Lars
Hi,

On 2015-10-20, at 10:24, Ian Smith <smi...@nimnet.asn.au> wrote:
> Actually, you want to set hw.acpi.cpu.cx_lowest=C1 instead.

Done.

On 2015-10-19, at 17:55, Luigi Rizzo <ri...@iet.unipi.it> wrote:
> On Mon, Oct 19, 2015 at 8:34 AM, Eggert, Lars <l...@netapp.com> wrote:
>> The only other sysctls in ixl(4) that look relevant are:
>> 
>> hw.ixl.rx_itr
>> The RX interrupt rate value, set to 8K by default.
>> 
>> hw.ixl.tx_itr
>> The TX interrupt rate value, set to 4K by default.
>> 
> 
> yes those. raise to 20-50k and see what you get in
> terms of ping latency.

While ixl(4) talks about 8K and 4K, the defaults actually seem to be:

hw.ixl.tx_itr: 122
hw.ixl.rx_itr: 62

Doubling those values *increases* flood ping latency to ~200 usec (from ~116 
usec).

Halving them to 62/31 decreases flood ping latency to ~50 usec, but still 
doesn't increase iperf throughput (still 2.8 Gb/s). Going to 31/16 further 
drops latency to 24 usec, with no change in throughput.

(Looking at the "interrupt Moderation parameters" #defines in sys/dev/ixl/ixl.h 
it seems that ixl likes to have its irq rates specified with some weird divider 
scheme.)
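
(Concretely, if I read ixl.h right, the sysctl value is the interrupt interval 
in units of 2 usec, i.e., rate = 10^6 / (2 * itr). The defaults of 122 and 62 
then work out to ~4.1K and ~8.1K interrupts/s, which reconciles them with the 
4K/8K the man page mentions, and 5 gives 100K, matching IXL_ITR_100K.)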

With 5/5 (which corresponds to IXL_ITR_100K), I get down to 16 usec. 
Unfortunately, throughput is then also down to about 2 Gb/s.

One thing I noticed in top is that one queue irq is using quite a bit of CPU 
when I run iperf:

   11  0   -92- 0K  1152K CPU22   0:19  50.98% intr{irq293: ixl1:q2}
   11  0   -92- 0K  1152K WAIT3   0:02   5.18% intr{irq294: ixl1:q3}
0  0   -920 0K  8944K -  25   0:01   1.07% kernel{ixl1 que}
   11  0   -92- 0K  1152K WAIT1   0:01   0.00% intr{irq292: ixl1:q1}
   11  0   -92- 0K  1152K WAIT0   0:00   0.00% intr{irq291: ixl1:q0}
0  0   -920 0K  8944K -  22   0:00   0.00% kernel{ixl1 adminq}
0  0   -920 0K  8944K -  31   0:00   0.00% kernel{ixl1 que}
0  0   -920 0K  8944K -  31   0:00   0.00% kernel{ixl1 que}
0  0   -920 0K  8944K -  31   0:00   0.00% kernel{ixl1 que}
   11  0   -92- 0K  1152K WAIT   -1   0:00   0.00% intr{irq290: ixl1:aq}

With 10G ix interfaces and a throughput of ~9Gb/s, the CPU load is much lower:

   11  0   -92- 0K  1152K WAIT0   0:05   7.67% intr{irq274: ix0:que }
0  0   -920 0K  8944K -  27   0:00   0.29% kernel{ix0 que}
0  0   -920 0K  8944K -  10   0:00   0.00% kernel{ix0 linkq}
   11  0   -92- 0K  1152K WAIT1   0:00   0.00% intr{irq275: ix0:que }
   11  0   -92- 0K  1152K WAIT3   0:00   0.00% intr{irq277: ix0:que }
   11  0   -92- 0K  1152K WAIT2   0:00   0.00% intr{irq276: ix0:que }
   11  0   -92- 0K  1152K WAIT   18   0:00   0.00% intr{irq278: ix0:link}
0  0   -920 0K  8944K -   0   0:00   0.00% kernel{ix0 que}
0  0   -920 0K  8944K -   0   0:00   0.00% kernel{ix0 que}
0  0   -920 0K  8944K -   0   0:00   0.00% kernel{ix0 que}

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


ixl 40G bad performance?

2015-10-19 Thread Eggert, Lars
Hi,

I'm running a few simple tests on -CURRENT with a pair of dual-port Intel XL710 
boards, which are seen by the kernel as:

ixl0:  mem 0xdc80-0xdcff,0xdd808000-0xdd80 irq 32 at device 0.0 on pci3
ixl0: Using MSIX interrupts with 33 vectors
ixl0: f4.40 a1.4 n04.53 e80001dca
ixl0: Using defaults for TSO: 65518/35/2048
ixl0: Ethernet address: 68:05:ca:32:0b:98
ixl0: PCI Express Bus: Speed 8.0GT/s Width x8
ixl0: netmap queues/slots: TX 32/1024, RX 32/1024
ixl1:  mem 0xdc00-0xdc7f,0xdd80-0xdd807fff irq 32 at device 0.1 on pci3
ixl1: Using MSIX interrupts with 33 vectors
ixl1: f4.40 a1.4 n04.53 e80001dca
ixl1: Using defaults for TSO: 65518/35/2048
ixl1: Ethernet address: 68:05:ca:32:0b:99
ixl1: PCI Express Bus: Speed 8.0GT/s Width x8
ixl1: netmap queues/slots: TX 32/1024, RX 32/1024
ixl0: link state changed to UP
ixl1: link state changed to UP

I have two identical machines connected with patch cables (no switch). iperf 
performance is bad:

# iperf -c 10.0.1.2

Client connecting to 10.0.1.2, TCP port 5001
TCP window size: 32.5 KByte (default)

[  3] local 10.0.1.1 port 19238 connected with 10.0.1.2 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  3.91 GBytes  3.36 Gbits/sec

As is flood ping latency:

# sudo ping -f 10.0.1.2
PING 10.0.1.2 (10.0.1.2): 56 data bytes
.^C
--- 10.0.1.2 ping statistics ---
41927 packets transmitted, 41926 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.084/0.116/0.145/0.002 ms

Any ideas on what's going on here? Testing 10G ix interfaces between the same 
two machines results in 9.39 Gbits/sec and flood ping latencies of 17 usec.

Thanks,
Lars

PS: Full dmesg attached.

Copyright (c) 1992-2015 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-CURRENT #2 483de3c(muclab)-dirty: Mon Oct 19 11:01:16 CEST 2015

el...@laurel.muccbc.hq.netapp.com:/usr/home/elars/obj/usr/home/elars/src/sys/MUCLAB
 amd64
FreeBSD clang version 3.7.0 (tags/RELEASE_370/final 246257) 20150906
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz (2000.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
  
Features=0xbfebfbff
  
Features2=0x1fbee3ff
  AMD Features=0x2c100800
  AMD Features2=0x1
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 137438953472 (131072 MB)
avail memory = 133484290048 (127300 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: < >
FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
FreeBSD/SMP: 2 package(s) x 8 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
 cpu8 (AP): APIC ID:  8
 cpu9 (AP): APIC ID:  9
 cpu10 (AP): APIC ID: 10
 cpu11 (AP): APIC ID: 11
 cpu12 (AP): APIC ID: 12
 cpu13 (AP): APIC ID: 13
 cpu14 (AP): APIC ID: 14
 cpu15 (AP): APIC ID: 15
 cpu16 (AP): APIC ID: 32
 cpu17 (AP): APIC ID: 33
 cpu18 (AP): APIC ID: 34
 cpu19 (AP): APIC ID: 35
 cpu20 (AP): APIC ID: 36
 cpu21 (AP): APIC ID: 37
 cpu22 (AP): APIC ID: 38
 cpu23 (AP): APIC ID: 39
 cpu24 (AP): APIC ID: 40
 cpu25 (AP): APIC ID: 41
 cpu26 (AP): APIC ID: 42
 cpu27 (AP): APIC ID: 43
 cpu28 (AP): APIC ID: 44
 cpu29 (AP): APIC ID: 45
 cpu30 (AP): APIC ID: 46
 cpu31 (AP): APIC ID: 47
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard
ioapic2  irqs 48-71 on motherboard
random: entropy device external interface
module_register_init: MOD_LOAD (vesa, 0x8094fb90, 0) error 19
netmap: loaded module
vtvga0:  on motherboard
smbios0:  at iomem 0xf04d0-0xf04ee on motherboard
smbios0: Version: 2.7, BCD Revision: 2.7
cryptosoft0:  on motherboard
aesni0:  on motherboard
acpi0:  on motherboard
acpi0: Power Button (fixed)
cpu0:  on acpi0
cpu1:  on acpi0
cpu2:  on acpi0
cpu3:  on acpi0
cpu4:  on acpi0
cpu5:  on acpi0
cpu6:  on acpi0
cpu7:  on acpi0
cpu8:  on acpi0
cpu9:  on acpi0
cpu10:  on acpi0
cpu11:  on acpi0
cpu12:  on acpi0
cpu13:  on acpi0
cpu14:  on acpi0
cpu15:  on acpi0
cpu16:  on acpi0
cpu17:  on acpi0
cpu18:  on acpi0
cpu19:  on acpi0
cpu20:  on acpi0
cpu21:  on acpi0
cpu22:  on acpi0

Re: ixl 40G bad performance?

2015-10-19 Thread Eggert, Lars
Hi,

On 2015-10-19, at 16:20, Luigi Rizzo  wrote:
> 
> i would look at the following:
> - c states and clock speed - make sure you never go below C1,
>  and fix the clock speed to max.
>  Sure these parameters also affect the 10G card, but there
>  may be strange interaction that trigger the power saving
>  modes in different ways

I already have powerd_flags="-a max -b max -n max" in rc.conf, which I hope 
should be enough.

> - interrupt moderation (may affect ping latency,
>  do not remember how it is set in ixl but probably a sysctl

ixl(4) describes two sysctls that sound like they control AIM, and they default 
to off:

hw.ixl.dynamic_tx_itr: 0
hw.ixl.dynamic_rx_itr: 0

> - number of queues (32 is a lot i wouldn't use more than 4-8),
>  may affect cpu-socket affinity

With hw.ixl.max_queues=4 in loader.conf, performance is still unchanged.

> - tso and flow director - i have seen bad effects of
>  accelerations so i would run the iperf test with
>  of these features disabled on both sides, and then enable
>  them one at a time

No change with "ifconfig -tso4 -tso6 -rxcsum -txcsum -lro".

How do I turn off flow director?

> - queue sizes - the driver seems to use 1024 slots which is
>  about 1.5 MB queued, which in turn means you have 300us
>  (and possibly half of that) to drain the queue at 40Gbit/s.
>  150-300us may seem an eternity, but if a couple of cores fall
>  into c7 your budget is gone and the loss will trigger a
>  retransmission and window halving etc.

Also no change with "hw.ixl.ringsz=256" in loader.conf.

This is really weird.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: ixl 40G bad performance?

2015-10-19 Thread Eggert, Lars
Hi,

in order to eliminate network or hardware weirdness, I've rerun the test with 
Linux 4.3rc6, where I get 13.1 Gbits/sec throughput and 52 usec flood ping 
latency. Not great either, but in line with earlier experiments with Mellanox 
NICs and an untuned Linux system.

On 2015-10-19, at 17:11, Luigi Rizzo  wrote:
> I suspect it might not touch the c states, but better check. The safest is
> disable them in the bios.

I'll try that.

>> hw.ixl.dynamic_tx_itr: 0
>> hw.ixl.dynamic_rx_itr: 0
>> 
>> 
> There must be some other control for the actual (fixed, not dynamic)
> moderation.

The only other sysctls in ixl(4) that look relevant are:

 hw.ixl.rx_itr
 The RX interrupt rate value, set to 8K by default.

 hw.ixl.tx_itr
 The TX interrupt rate value, set to 4K by default.

I'll play with those.

>> Also no change with "hw.ixl.ringsz=256" in loader.conf.
> 
> Any better success with 2048 slots?
> 3.5 gbit  is what I used to see on the ixgbe with tso disabled, probably
> hitting a CPU bound.

Will try.

Thanks!

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Testing Congestion Control Algorithms

2015-04-23 Thread Eggert, Lars
Hi,

On 2015-4-23, at 09:17, Karlis Laivins karlis.laiv...@gmail.com wrote:
 I am currently working on a modification of TCP NewReno congestion control
 algorithm. It seems that I have been able to write a working module.
 
 Now, I am looking for a way to test the performance of the built-in
 congestion control algorithms and the new algorithm. I have heard about the
 NS-2 simulator, and I am trying to compile and configure it now, but that's
 just a statistical tool (from what I hear) and the results are far from
 reality (please correct me, if I am wrong).
 
 Please recommend a tool or way I can test the performance of the congestion
 control algorithm in a real environment (sender side - 2 Computers, one
 connected to the wireless network, other to a wire, receiver - one PC,
 running FTP server, both senders each sending a big file at the same time).
 I would like to get comparable performance results from each of the
 existing congestion control algorithm as well as the new one I have created
 by modifying the NewReno algorithm.

I think you are moving away from the scope where freebsd-net is the correct 
mailing list. 

There are literally thousands of research papers comparing congestion control 
algorithms and other TCP improvements. I suggest you check some of those (Sally 
Floyd's papers http://www.icir.org/floyd/ are still a good starting point) and 
read up on what the ICCRG has done: https://irtf.org/iccrg

Lars


siftr buildkernel failure?

2015-03-24 Thread Eggert, Lars
Anyone else seeing this on -current right now?

/usr/home/elars/src/sys/modules/siftr/../../netinet/siftr.c:493:7: error: data 
argument not used by format string [-Werror,-Wformat-extra-args]
pkt_node->flowid,
^

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Intel DPDK added to FreeBSD ports collection

2014-10-29 Thread Eggert, Lars
Hi,

On 2014-10-20, at 17:40, Jim Harris jimhar...@freebsd.org wrote:
 Just wanted to send a heads-up that Intel's Data Plane Development Kit
 (DPDK) for high speed packet processing was added to the FreeBSD ports
 collection last week under net/dpdk.
...
 For any questions, please check out dpdk.org and the developer mailing list
 (d...@dpdk.org).

that list seems to be more about the development of the kit, rather than about 
using the kit?

I've started to play with DPDK under FreeBSD, and am getting the following 
warning:

EAL: WARNING: clock_gettime cannot use CLOCK_MONOTONIC_RAW and HPET is not 
available - clock timings may be less accurate.

This is with the HPET changes rpaulo@ recently committed, and /dev/hpet0 is 
present.

Lars

smime.p7s
Description: S/MIME cryptographic signature


Re: netmap on ubuntu 14.04?

2014-10-01 Thread Eggert, Lars
On 2014-9-30, at 11:25, Stefano Garzarella stefanogarzare...@gmail.com wrote:
 for linux 3.16, can you try with next branch in
 https://code.google.com/p/netmap/?

Can confirm that the next branch works (as in, drivers compile correctly for 
3.16). Not tested behavior/performance yet.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: netmap on ubuntu 14.04?

2014-09-30 Thread Eggert, Lars
On 2014-9-26, at 15:19, Leupoldt, Martin martin.leupo...@hob.de wrote:
 has anyone experience about netmap on a Ubuntu 14.04 machine?

I've compiled it with 3.13 under Debian; 3.16 fails to compile because the 
patch doesn't apply cleanly.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: netmap wishlist

2014-09-12 Thread Eggert, Lars
Hi,

On 2014-9-12, at 9:31, Luigi Rizzo ri...@iet.unipi.it wrote:
 there is something already available/in progress for some of the above,
 but here are my thoughts on the various subjects:
 
 - netmap is designed to work with large frames, by setting the buffer
   size to something suitable (using a sysctl).
...
 The downside is some waste on buffers (they are fixed size so having
   to allocate say 16K for a 64 byte frame is a bit annoying).

that's OK for what I'm doing.

 - checksums offloading can be added trivially in the *_txsync(),
   once again on a per-nic basis.
   Problem is, is we start adding per-packet features (say, checksums,
   scatter-gather I/O, segmentation) in the inner loop of *_txsync()
   we are going to lose some performance for high rate applications.

What about making these things compile-time options? I totally see that if you 
want to use netmap for fast switching, you wouldn't want these. But if you use 
netmap for operating on IP and transport protocol packets, they become really 
essential. (Esp. at 40G - which reminds me that I forgot to add netmap support 
for the ixl driver to the wishlist...)

 - the VALE switch has support for segmentation and checksum avoidance.
   Clients can register as virtio-net capable: in this case the port will
   accept/deliver large segments across that port, and do segmentation and
   checksum as required for ports that are not virtio-net enabled
   (e.g. physical NICs attached to the same VALE switch).
   This was developed earlier this year by Vincenzo Maffione.

I may look into this. I'm unclear if adding a VALE layer into the system just 
to get this feature would be worth it in terms of performance.

   We could probably leverage this code to work also on top of NICs
   connected through netmap, e.g. programming the NIC to use its own
   native offloading, but i am skeptical about the usefulness and
   concerned about the potential performance loss in *_txsync().

I totally see that, but maybe a compile-time option would work. There are 
several distinct use cases for netmap at the moment, and it's unlikely that the 
same system would need to support several of them, so compile-time 
specialization may be sufficient here.

 - Stefano Garzarella has some code to do software GSO (this is for FreeBSD,
   linux already has something similar), which will be presented at
   EuroBSDCon later this month in Sofia. This should address the
   segmentation issue on the host stack.

Nice, I will take a look.

 - on the receive side, both FreeBSD and Linux have an efficient
   RLO software fallback in case the NIC does not support it
   natively, i think we do not need this at the NIC/switch level.

OK, I need to look into this.

Oh, and my list was prioritized - I think the checksum offload would be the 
real winner when munging IP and transport packets.

Thanks,
Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: netmap extra rings and buffers

2014-09-05 Thread Eggert, Lars
Thank you!

On 2014-9-4, at 17:48, Luigi Rizzo ri...@iet.unipi.it wrote:

 On Thu, Sep 04, 2014 at 11:58:28AM +, Eggert, Lars wrote:
 Hi Luigi,
 
 I'm allocating extra rings and/or extra buffers via the nr_arg1/nr_arg3 
 parameters for NIOCREGIF.
 
 Once I've done that, how do I actually access those rings and buffers?
 
 For extra rings, the documentation and example code don't really say 
 anything.
 
 For extra buffers, the documentation says nifp->ni_bufs_head "will be the 
 index of the first buffer" but doesn't really explain how I can find the 
 buffer given its index (since it's not in a ring, the NETMAP_BUF macro 
 doesn't seem to apply?) The part about buffers being "linked to each other 
 using the first uint32_t as the index" is also unclear to me.
 
 Do you have some more text or example code that shows how to use extra rings 
 and buffers?
 
 the field to request extra rings is only important when you want
 to make sure that the memory region for a VALE port has also
 space to host some pipes. Otherwise, for physical ports (which at
 the moment all share the same address space) there is not a real
 need to specify it.
 
 For the extra buffers, remember that NETMAP_BUF() can translate
 buffer indexes for any netmap buffer, even those not in a ring.
 All it does is grab the base address of the buffer pool from the
 ring, and add the buffer index times the buffer size.
 
 So you can navigate the pool of extra buffers as follows
 
 uint32_t x = nifp->ni_bufs_head;   // index of first buf
 
void *p = NETMAP_BUF(any_ring, x); // address of the first buffer
 
x = *((uint32_t *)p);  // index of the next buffer
 
 cheers
 luigi
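
Putting that together, the whole chain can be walked like so (sketch; assumes 
nmd came from nm_open() with extra buffers requested via nr_arg3, and that the 
list is terminated by an index of 0):

struct netmap_if *nifp = nmd->nifp;
struct netmap_ring *any = NETMAP_TXRING(nifp, 0);   /* any ring will do */

for (uint32_t idx = nifp->ni_bufs_head; idx != 0;) {
        char *buf = NETMAP_BUF(any, idx);   /* address of this buffer */
        /* ... use buf ... */
        idx = *(uint32_t *)buf;             /* first word links to the next */
}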



signature.asc
Description: Message signed with OpenPGP using GPGMail


netmap extra rings and buffers

2014-09-04 Thread Eggert, Lars
Hi Luigi,

I'm allocating extra rings and/or extra buffers via the nr_arg1/nr_arg3 
parameters for NIOCREGIF.

Once I've done that, how do I actually access those rings and buffers?

For extra rings, the documentation and example code don't really say anything.

For extra buffers, the documentation says nifp->ni_bufs_head "will be the index 
of the first buffer" but doesn't really explain how I can find the buffer given 
its index (since it's not in a ring, the NETMAP_BUF macro doesn't seem to 
apply?) The part about buffers being "linked to each other using the first 
uint32_t as the index" is also unclear to me.

Do you have some more text or example code that shows how to use extra rings 
and buffers?

Thanks,
Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: igb requests for mbufs denied

2014-09-02 Thread Eggert, Lars
On 2014-9-1, at 21:09, John-Mark Gurney j...@funkthat.com wrote:
 Still waiting for the other info I requested in my email...

Sorry, which other info? (The message is not in dmesg, it's from netstat -m.)

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: igb requests for mbufs denied

2014-09-01 Thread Eggert, Lars
On 2014-8-30, at 0:32, John-Mark Gurney j...@funkthat.com wrote:
 Also, what does sysctl dev.em and sysctl dev.igb show?

The box has no em interfaces.

[root@laurel: ~] sysctl dev.igb
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=0 function=0 handle=\_SB_.PCI0.RP03.LAN1
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x1533 subvendor=0x1734 subdevice=0x11f1 class=0x02
dev.igb.0.%parent: pci4
dev.igb.0.nvm: -1
dev.igb.0.enable_aim: 1
dev.igb.0.fc: 3
dev.igb.0.rx_processing_limit: 100
dev.igb.0.dmac: 0
dev.igb.0.eee_disabled: 0
dev.igb.0.link_irq: 0
dev.igb.0.dropped: 0
dev.igb.0.tx_dma_fail: 0
dev.igb.0.rx_overruns: 0
dev.igb.0.watchdog_timeouts: 0
dev.igb.0.device_control: 135266881
dev.igb.0.rx_control: 4194304
dev.igb.0.interrupt_mask: 0
dev.igb.0.extended_int_mask: 2147483648
dev.igb.0.tx_buf_alloc: 0
dev.igb.0.rx_buf_alloc: 0
dev.igb.0.fc_high_water: 31328
dev.igb.0.fc_low_water: 31312
dev.igb.0.queue0.no_desc_avail: 0
dev.igb.0.queue0.tx_packets: 0
dev.igb.0.queue0.rx_packets: 0
dev.igb.0.queue0.rx_bytes: 0
dev.igb.0.queue0.lro_queued: 0
dev.igb.0.queue0.lro_flushed: 0
dev.igb.0.mac_stats.excess_coll: 0
dev.igb.0.mac_stats.single_coll: 0
dev.igb.0.mac_stats.multiple_coll: 0
dev.igb.0.mac_stats.late_coll: 0
dev.igb.0.mac_stats.collision_count: 0
dev.igb.0.mac_stats.symbol_errors: 0
dev.igb.0.mac_stats.sequence_errors: 0
dev.igb.0.mac_stats.defer_count: 0
dev.igb.0.mac_stats.missed_packets: 0
dev.igb.0.mac_stats.recv_no_buff: 0
dev.igb.0.mac_stats.recv_undersize: 0
dev.igb.0.mac_stats.recv_fragmented: 0
dev.igb.0.mac_stats.recv_oversize: 0
dev.igb.0.mac_stats.recv_jabber: 0
dev.igb.0.mac_stats.recv_errs: 0
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.mac_stats.alignment_errs: 0
dev.igb.0.mac_stats.coll_ext_errs: 0
dev.igb.0.mac_stats.xon_recvd: 0
dev.igb.0.mac_stats.xon_txd: 0
dev.igb.0.mac_stats.xoff_recvd: 0
dev.igb.0.mac_stats.xoff_txd: 0
dev.igb.0.mac_stats.total_pkts_recvd: 0
dev.igb.0.mac_stats.good_pkts_recvd: 0
dev.igb.0.mac_stats.bcast_pkts_recvd: 0
dev.igb.0.mac_stats.mcast_pkts_recvd: 0
dev.igb.0.mac_stats.rx_frames_64: 0
dev.igb.0.mac_stats.rx_frames_65_127: 0
dev.igb.0.mac_stats.rx_frames_128_255: 0
dev.igb.0.mac_stats.rx_frames_256_511: 0
dev.igb.0.mac_stats.rx_frames_512_1023: 0
dev.igb.0.mac_stats.rx_frames_1024_1522: 0
dev.igb.0.mac_stats.good_octets_recvd: 0
dev.igb.0.mac_stats.good_octets_txd: 0
dev.igb.0.mac_stats.total_pkts_txd: 0
dev.igb.0.mac_stats.good_pkts_txd: 0
dev.igb.0.mac_stats.bcast_pkts_txd: 0
dev.igb.0.mac_stats.mcast_pkts_txd: 0
dev.igb.0.mac_stats.tx_frames_64: 0
dev.igb.0.mac_stats.tx_frames_65_127: 0
dev.igb.0.mac_stats.tx_frames_128_255: 0
dev.igb.0.mac_stats.tx_frames_256_511: 0
dev.igb.0.mac_stats.tx_frames_512_1023: 0
dev.igb.0.mac_stats.tx_frames_1024_1522: 0
dev.igb.0.mac_stats.tso_txd: 0
dev.igb.0.mac_stats.tso_ctx_fail: 0
dev.igb.0.interrupts.asserts: 0
dev.igb.0.interrupts.rx_pkt_timer: 0
dev.igb.0.interrupts.rx_abs_timer: 0
dev.igb.0.interrupts.tx_pkt_timer: 0
dev.igb.0.interrupts.tx_abs_timer: 0
dev.igb.0.interrupts.tx_queue_empty: 0
dev.igb.0.interrupts.tx_queue_min_thresh: 0
dev.igb.0.interrupts.rx_desc_min_thresh: 0
dev.igb.0.interrupts.rx_overrun: 0
dev.igb.0.host.breaker_tx_pkt: 0
dev.igb.0.host.host_tx_pkt_discard: 0
dev.igb.0.host.rx_pkt: 0
dev.igb.0.host.breaker_rx_pkts: 0
dev.igb.0.host.breaker_rx_pkt_drop: 0
dev.igb.0.host.tx_good_pkt: 0
dev.igb.0.host.breaker_tx_pkt_drop: 0
dev.igb.0.host.rx_good_bytes: 0
dev.igb.0.host.tx_good_bytes: 0
dev.igb.0.host.length_errors: 0
dev.igb.0.host.serdes_violation_pkt: 0
dev.igb.0.host.header_redir_missed: 0
dev.igb.1.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0
dev.igb.1.%driver: igb
dev.igb.1.%location: slot=0 function=0 handle=\_SB_.PCI0.RP04.LAN2
dev.igb.1.%pnpinfo: vendor=0x8086 device=0x1533 subvendor=0x1734 subdevice=0x11f1 class=0x02
dev.igb.1.%parent: pci5
dev.igb.1.nvm: -1
dev.igb.1.enable_aim: 1
dev.igb.1.fc: 3
dev.igb.1.rx_processing_limit: 100
dev.igb.1.dmac: 0
dev.igb.1.eee_disabled: 0
dev.igb.1.link_irq: 0
dev.igb.1.dropped: 0
dev.igb.1.tx_dma_fail: 0
dev.igb.1.rx_overruns: 0
dev.igb.1.watchdog_timeouts: 0
dev.igb.1.device_control: 135266881
dev.igb.1.rx_control: 4194304
dev.igb.1.interrupt_mask: 0
dev.igb.1.extended_int_mask: 2147483648
dev.igb.1.tx_buf_alloc: 0
dev.igb.1.rx_buf_alloc: 0
dev.igb.1.fc_high_water: 31328
dev.igb.1.fc_low_water: 31312
dev.igb.1.queue0.no_desc_avail: 0
dev.igb.1.queue0.tx_packets: 0
dev.igb.1.queue0.rx_packets: 0
dev.igb.1.queue0.rx_bytes: 0
dev.igb.1.queue0.lro_queued: 0
dev.igb.1.queue0.lro_flushed: 0
dev.igb.1.mac_stats.excess_coll: 0
dev.igb.1.mac_stats.single_coll: 0
dev.igb.1.mac_stats.multiple_coll: 0
dev.igb.1.mac_stats.late_coll: 0
dev.igb.1.mac_stats.collision_count: 0
dev.igb.1.mac_stats.symbol_errors: 0
dev.igb.1.mac_stats.sequence_errors: 0
dev.igb.1.mac_stats.defer_count: 0
dev.igb.1.mac_stats.missed_packets: 0
dev.igb.1.mac_stats.recv_no_buff: 0

Re: tutorial on Netmap in Mountain View - Aug.28

2014-09-01 Thread Eggert, Lars
On 2014-8-18, at 12:29, Carlos Ferreira carlosmf...@gmail.com wrote:
 Do you have presentations or tutorial code from that tutorial, that you can
 share here?

+1

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: igb requests for mbufs denied

2014-09-01 Thread Eggert, Lars
On 2014-8-30, at 7:24, Adrian Chadd adr...@freebsd.org wrote:
 What's the output of vmstat -z ?

[root@laurel: ~] vmstat -z
ITEM   SIZE  LIMIT USED FREE  REQ FAIL SLEEP

UMA Kegs:   384,  0,  99,   1,  99,   0,   0
UMA Zones: 1152,  0,  99,   0,  99,   0,   0
UMA Slabs:   80,  0,   15827,  23,   56426,   0,   0
UMA RCntSlabs:   88,  0,2283,  12,2283,   0,   0
UMA Hash:   256,  0,   5,  10,   8,   0,   0
4 Bucket:32,  0,  40, 835,   97770,   0,   0
8 Bucket:64,  0, 168, 514,   95742,   0,   0
16 Bucket:  128,  0,  91, 715,   44919, 202,   0
32 Bucket:  256,  0,3084,1386,   32197, 106,   0
64 Bucket:  512,  0,1579,2805,   37281, 101,   0
128 Bucket:1024,  0,2043, 361,   69687,82929,   0
vmem btag:   56,  0,   48977,   12722,  346536, 869,   0
VM OBJECT:  256,  0,  154107, 348,  389148,   0,   0
RADIX NODE: 144,  0,  184412,   22624,  674520,  49,   0
MAP:240,  0,   3,  61,   3,   0,   0
KMAP ENTRY: 128,  0,   9, 394,   9,   0,   0
MAP ENTRY:  128,  0, 760,1131,  430304,   0,   0
VMSPACE:448,  0,  28, 179,   11739,   0,   0
fakepg: 104,  0,   0,   0,   0,   0,   0
mt_zone:   4112,  0, 360,   0, 360,   0,   0
16:  16,  0,4012, 506,  518238,   0,   0
32:  32,  0,3760, 615,   36818,   0,   0
64:  64,  0,   14097,   74625, 3633439,   0,   0
128:128,  0,8331,  125930, 1771231,   0,   0
256:256,  0,1027,   82298, 1877944,   0,   0
512:512,  0, 854,   95994,  345637,   0,   0
1024:  1024,  0,  84, 168,   93998,   0,   0
2048:  2048,  0, 631, 733,   22957,   0,   0
4096:  4096,  0, 501,  32,   14417,   0,   0
SLEEPQUEUE:  80,  0, 193, 458, 193,   0,   0
uint64 pcpu:  8,  0,1602,  62,1602,   0,   0
Files:   80,  0, 147, 603, 1158104,   0,   0
TURNSTILE:  136,  0, 193, 187, 193,   0,   0
rl_entry:40,  0,  86, 414,  86,   0,   0
umtx pi: 96,  0,   0,   0,   0,   0,   0
MAC labels:  40,  0,   0,   0,   0,   0,   0
PROC:  1208,  0,  53,  52,   11825,   0,   0
THREAD:1168,  0, 171,  21, 189,   0,   0
cpuset:  72,  0,  87, 463, 149,   0,   0
audit_record:  1248,  0,   0,   0,   0,   0,   0
mbuf_packet:256, 52127340,3069, 791, 4919006,3110,   0
mbuf:   256, 52127340,   1,2259, 5017862,1247,   0
mbuf_cluster:  2048, 8144896,3860, 682,  583821,3377,   0
mbuf_jumbo_page:   4096, 1018097,   0,  12,  150937,   5,   0
mbuf_jumbo_9k: 9216, 301658,   0,   0,   0,   0,   0
mbuf_jumbo_16k:   16384, 169682,   0,   0,   0,   0,   0
mbuf_ext_refcnt:  4,  0,   0,   0,   0,   0,   0
g_bio:  248,  0,   0,   11088,12456274,   0,   0
ttyinq: 160,  0, 210, 240, 705,   0,   0
ttyoutq:256,  0, 109, 206, 367,   0,   0
ata_request:336,  0,   0,   0,   0,   0,   0
vtnet_tx_hdr:24,  0,   0,   0,   0,   0,   0
cryptop: 88,  0,   0,   0,   0,   0,   0
cryptodesc:  72,  0,   0,   0,   0,   0,   0
FPU_save_area:  832,  0,   0,   0,   0,   0,   0
VNODE:  472,  0,  308616, 152, 2156908,   0,   0
VNODEPOLL:  112,  0,   0,   0,   0,   0,   0
BUF TRIE:   144,  0, 951,  104997,  271764,   0,   0
NAMEI: 1024,  0,   0,  68, 6635607,   0,   0
S VFS Cache:108,  0,  284884,   25741, 2028838,   0,   0
STS VFS Cache:  148,  0,   0,   0,   0,   0,   0
L VFS Cache:328,  0,   33803, 805,  275904,   0,   0
LTS VFS Cache:  368,  0,  10,  80,  30,   0,   0
DIRHASH:   1024,  0,2828,   8,4099,   0,   0
NCLNODE:528,  0,  14,  28,  14,   0,   0
fuse_ticket:224,  0,   0,

Re: igb requests for mbufs denied

2014-08-29 Thread Eggert, Lars
On 2014-8-28, at 21:07, Steven Hartland kill...@multiplay.co.uk wrote:
 When you say you've bumped mbclusters and mbufs, was that in
 /boot/loader.conf or /etc/sysctl.conf. If the latter then that's
 too late for driver init so try the former.

loader.conf

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


igb requests for mbufs denied

2014-08-28 Thread Eggert, Lars
Hi,

no matter what value I bump kern.ipc.nmbclusters and kern.ipc.nmbufs to, I 
still get "requests for mbufs denied" with igb interfaces, and the occasional 
connection stall, even when dialing down hw.igb.num_queues=1:

[root@laurel: ~] netstat -m
3070/1355/4425 mbufs in use (current/cache/total)
3069/773/3842/8144896 mbuf clusters in use (current/cache/total/max)
3069/772 mbuf+clusters out of packet secondary zone in use (current/cache)
0/6/6/1018097 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/301658 9k jumbo clusters in use (current/cache/total/max)
0/0/0/169682 16k jumbo clusters in use (current/cache/total/max)
6905K/1908K/8814K bytes allocated to network (current/cache/total)
73/3831/3091 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
6/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

The box in question has six igb interfaces, 2x 'I210 Gigabit Network 
Connection' and 4x '82580 Gigabit Network Connection' and is running:

FreeBSD laurel.muccbc.hq.netapp.com 10.0-RELEASE-p7 FreeBSD 10.0-RELEASE-p7 #0: 
Tue Jul  8 06:37:44 UTC 2014 
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

Anyone have any ideas what else to try?

Thanks,
Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: igb requests for mbufs denied

2014-08-28 Thread Eggert, Lars
On 2014-8-28, at 13:17, Alexander V. Chernikov melif...@freebsd.org wrote:
 Do you have jumbo frames turned on?

Nope.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: igb requests for mbufs denied

2014-08-28 Thread Eggert, Lars
On 2014-8-28, at 13:02, Mike Tancsa m...@sentex.net wrote:
 hw.igb.enable_msix=0

Doesn't change things either, unfortunately.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: igb requests for mbufs denied

2014-08-28 Thread Eggert, Lars
On 2014-8-28, at 10:31, Eggert, Lars l...@netapp.com wrote:
 73/3831/3091 requests for mbufs denied (mbufs/clusters/mbuf+clusters)

I just noticed that these are already there just after boot, and then also 
don't seem to be increasing anymore (or only very slowly). Just in case that 
gives someone an idea.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: igb requests for mbufs denied

2014-08-28 Thread Eggert, Lars
Hi,

also just noticed that there is a version 2.4.2 driver on Intel's site for 
these cards (https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=15815) 
whereas FreeBSD (incl. -CURRENT) is at 2.4.0.

No changelog, unfortunately.

Lars


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00

2014-08-27 Thread Eggert, Lars
It would be great if people could also review Aris' PRR patch - RFC6937 has 
been out for a while.

Lars



prr.patch
Description: Binary data


On 2014-8-26, at 20:09, Adrian Chadd adr...@freebsd.org wrote:

 Hi!
 
 I'm going to merge Tom's work in a week unless someone gives me a
 really good reason not to.
 
 I think there's been enough work and discussion about it since the
 first post from Lars in Feburary and enough review opportunity.
 
 
 -a
 
 
 On 26 August 2014 07:55, Tom Jones jo...@sdf.org wrote:
 On Tue, Aug 26, 2014 at 02:43:49PM +, Eggert, Lars wrote:
 Hi,
 
 the newcwv patch is probably stale now with Tom Jones' recent patch based on
 a more up-to-date version of the Internet-Draft, but the PRR patch should
 still be useful?
 
 My newcwv patch is much more up to date than Aris's, but it is slightly
 different in implementation. I have had a few suggestions from Adrian, but he
 couldn't comment on how it relates to the tcp internals.
 
 There is a PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191520
 
 The biggest difference in structure between mine and Aris's patch is the use 
 of tcp timers. It would be good to hear if my approach or Aris's is preferred.
 
 On 2014-6-19, at 23:35, George Neville-Neil g...@neville-neil.com wrote:
 
 On 4 Feb 2014, at 1:38, Eggert, Lars wrote:
 
 Hi,
 
 below are two patches that implement RFC6937 (Proportional Rate 
 Reduction for TCP) and draft-ietf-tcpm-newcwv-00 (Updating TCP to 
 support Rate-Limited Traffic). They were done by Aris Angelogiannopoulos 
 for his MS thesis, which is at 
 https://eggert.org/students/angelogiannopoulos-thesis.pdf.
 
 The patches should apply to -CURRENT as of Sep 17, 2013. (Sorry for the 
 delay in sending them, we'd been trying to get some feedback from 
 committers first, without luck.)
 
 Please note that newcwv is still a work in progress in the IETF, and the 
 patch has some limitations with regards to the pipeACK Sampling Period 
 mentioned in the Internet-Draft. Aris says this in his thesis about what 
 exactly he implemented:
 
 The second implementation choice, is in regards with the measurement of 
 pipeACK. This variable is the most important introduced by the method and 
 is used to compute the phase that the sender currently lies in. In order 
 to compute pipeACK the approach suggested by the Internet Draft (ID) is 
 followed [ncwv]. During initialization, pipeACK is set to the maximum 
 possible value. A helper variable prevHighACK is introduced that is 
 initialized to the initial sequence number (iss). prevHighACK holds the 
 value of the highest acknowledged byte so far. pipeACK is measured once 
 per RTT meaning that when an ACK covering prevHighACK is received, 
 pipeACK becomes the difference between the current ACK and prevHighACK. 
 This is called a pipeACK sample.  A newer version of the draft suggests 
 that multiple pipeACK samples can be used during the pipeACK sampling 
 period.
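 
 (Worked example with made-up numbers: if prevHighACK is 1000 and, one RTT 
 later, an ACK for byte 15600 arrives, the pipeACK sample is 15600 - 1000 = 
 14600 bytes, and prevHighACK then advances to 15600 for the next sampling 
 period.)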
 
 Lars
 
 
 [prr.patch]
 
 [newcwv.patch]
 
 Apologies for not looking at this as yet.  It is now closer to the top of 
 my list.
 
 Best,
 George
 
 
 
 
 --
 Tom
 @adventureloop
 adventurist.me
 
 :wq





Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00

2014-08-27 Thread Eggert, Lars
Not as far as I know. 

Lars

On 2014-8-27, at 9:39, Adrian Chadd adr...@freebsd.org wrote:

 Is there a PR for it?
 
 
 -a
 
 
 On 27 August 2014 00:23, Eggert, Lars l...@netapp.com wrote:
 It would be great if people could also review Aris' PRR patch - RFC6937 has 
 been out for a while.
 
 Lars
 
 
 
 
 On 2014-8-26, at 20:09, Adrian Chadd adr...@freebsd.org wrote:
 
 Hi!
 
 I'm going to merge Tom's work in a week unless someone gives me a
 really good reason not to.
 
 I think there's been enough work and discussion about it since the
 first post from Lars in Feburary and enough review opportunity.
 
 
 -a
 
 
 On 26 August 2014 07:55, Tom Jones jo...@sdf.org wrote:
 On Tue, Aug 26, 2014 at 02:43:49PM +, Eggert, Lars wrote:
 Hi,
 
 the newcwv patch is probably stale now with Tom Jones' recent patch based 
 on
 a more up-to-date version of the Internet-Draft, but the PRR patch should
 still be useful?
 
 My newcwv patch is much more up to date than Aris's, but it is slightly
 different in implementation. I have had a few suggestions from Adrian, but 
 he
 couldn't comment on how it relates to the tcp internals.
 
 There is a PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191520
 
 The biggest difference in structure between mine and Aris's patch is the 
 use of
 tcp timers. It would be good to hear if my approach or Aris's is preferred.
 
 On 2014-6-19, at 23:35, George Neville-Neil g...@neville-neil.com wrote:
 
 On 4 Feb 2014, at 1:38, Eggert, Lars wrote:
 
 Hi,
 
 below are two patches that implement RFC6937 (Proportional Rate 
 Reduction for TCP) and draft-ietf-tcpm-newcwv-00 (Updating TCP to 
 support Rate-Limited Traffic). They were done by Aris 
 Angelogiannopoulos for his MS thesis, which is at 
 https://eggert.org/students/angelogiannopoulos-thesis.pdf.
 
 The patches should apply to -CURRENT as of Sep 17, 2013. (Sorry for the 
 delay in sending them, we'd been trying to get some feedback from 
 committers first, without luck.)
 
 Please note that newcwv is still a work in progress in the IETF, and 
 the patch has some limitations with regards to the pipeACK Sampling 
 Period mentioned in the Internet-Draft. Aris says this in his thesis 
 about what exactly he implemented:
 
 The second implementation choice, is in regards with the measurement 
 of pipeACK. This variable is the most important introduced by the 
 method and is used to compute the phase that the sender currently lies 
 in. In order to compute pipeACK the approach suggested by the Internet 
 Draft (ID) is followed [ncwv]. During initialization, pipeACK is set to 
 the maximum possible value. A helper variable prevHighACK is introduced 
 that is initialized to the initial sequence number (iss). prevHighACK 
 holds the value of the highest acknowledged byte so far. pipeACK is 
 measured once per RTT meaning that when an ACK covering prevHighACK is 
 received, pipeACK becomes the difference between the current ACK and 
 prevHighACK. This is called a pipeACK sample.  A newer version of the 
 draft suggests that multiple pipeACK samples can be used during the 
 pipeACK sampling period.
 
 Lars
 
 
 [prr.patch]
 
 [newcwv.patch]
 
 Apologies for not looking at this as yet.  It is now closer to the top 
 of my list.
 
 Best,
 George
 
 
 
 
 --
 Tom
 @adventureloop
 adventurist.me
 
 :wq
 
 





Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00

2014-08-27 Thread Eggert, Lars
Yep

On 2014-8-27, at 9:53, Adrian Chadd adr...@freebsd.org wrote:

 Ok. Is it the same patch you sent out in Feb?
 
 
 -a
 
 
 On 27 August 2014 00:43, Eggert, Lars l...@netapp.com wrote:
 Not as far as I know.
 
 Lars
 
 On 2014-8-27, at 9:39, Adrian Chadd adr...@freebsd.org wrote:
 
 Is there a PR for it?
 
 
 -a
 
 
 On 27 August 2014 00:23, Eggert, Lars l...@netapp.com wrote:
 It would be great if people could also review Aris' PRR patch - RFC6937 
 has been out for a while.
 
 Lars
 
 
 
 
 On 2014-8-26, at 20:09, Adrian Chadd adr...@freebsd.org wrote:
 
 Hi!
 
 I'm going to merge Tom's work in a week unless someone gives me a
 really good reason not to.
 
 I think there's been enough work and discussion about it since the
 first post from Lars in Feburary and enough review opportunity.
 
 
 -a
 
 
 On 26 August 2014 07:55, Tom Jones jo...@sdf.org wrote:
 On Tue, Aug 26, 2014 at 02:43:49PM +, Eggert, Lars wrote:
 Hi,
 
 the newcwv patch is probably stale now with Tom Jones' recent patch 
 based on
 a more up-to-date version of the Internet-Draft, but the PRR patch 
 should
 still be useful?
 
 My newcwv patch is much more up to date than Aris's, but it is slightly
 different in implementation. I have had a few suggestions from Adrian, 
 but he
 couldn't comment on how it relates to the tcp internals.
 
 There is a PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191520
 
 The biggest difference in structure between mine and Aris's patch is the 
 use of
 tcp timers. It would be good to hear if my approach or Aris's is 
 preferred.
 
 On 2014-6-19, at 23:35, George Neville-Neil g...@neville-neil.com 
 wrote:
 
 On 4 Feb 2014, at 1:38, Eggert, Lars wrote:
 
 Hi,
 
 below are two patches that implement RFC6937 (Proportional Rate 
 Reduction for TCP) and draft-ietf-tcpm-newcwv-00 (Updating TCP to 
 support Rate-Limited Traffic). They were done by Aris 
 Angelogiannopoulos for his MS thesis, which is at 
 https://eggert.org/students/angelogiannopoulos-thesis.pdf.
 
 The patches should apply to -CURRENT as of Sep 17, 2013. (Sorry for 
 the delay in sending them, we'd been trying to get some feedback from 
 committers first, without luck.)
 
 Please note that newcwv is still a work in progress in the IETF, and 
 the patch has some limitations with regards to the pipeACK Sampling 
 Period mentioned in the Internet-Draft. Aris says this in his thesis 
 about what exactly he implemented:
 
 The second implementation choice, is in regards with the measurement 
 of pipeACK. This variable is the most important introduced by the 
 method and is used to compute the phase that the sender currently 
 lies in. In order to compute pipeACK the approach suggested by the 
 Internet Draft (ID) is followed [ncwv]. During initialization, 
 pipeACK is set to the maximum possible value. A helper variable 
 prevHighACK is introduced that is initialized to the initial sequence 
 number (iss). prevHighACK holds the value of the highest acknowledged 
 byte so far. pipeACK is measured once per RTT meaning that when an 
 ACK covering prevHighACK is received, pipeACK becomes the difference 
 between the current ACK and prevHighACK. This is called a pipeACK 
 sample.  A newer version of the draft suggests that multiple pipeACK 
 samples can be used during the pipeACK sampling period.
 
 Lars
 
 
 [prr.patch]
 
 [newcwv.patch]
 
 Apologies for not looking at this as yet.  It is now closer to the top 
 of my list.
 
 Best,
 George
 
 
 
 
 --
 Tom
 @adventureloop
 adventurist.me
 
 :wq
 
 
 





Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00

2014-08-26 Thread Eggert, Lars
Hi,

the newcwv patch is probably stale now with Tom Jones' recent patch based on a 
more up-to-date version of the Internet-Draft, but the PRR patch should still 
be useful?

Lars

On 2014-6-19, at 23:35, George Neville-Neil g...@neville-neil.com wrote:

 On 4 Feb 2014, at 1:38, Eggert, Lars wrote:
 
 Hi,
 
 below are two patches that implement RFC6937 (Proportional Rate Reduction 
 for TCP) and draft-ietf-tcpm-newcwv-00 (Updating TCP to support 
 Rate-Limited Traffic). They were done by Aris Angelogiannopoulos for his MS 
 thesis, which is at 
 https://eggert.org/students/angelogiannopoulos-thesis.pdf.
 
 The patches should apply to -CURRENT as of Sep 17, 2013. (Sorry for the 
 delay in sending them, we'd been trying to get some feedback from committers 
 first, without luck.)
 
 Please note that newcwv is still a work in progress in the IETF, and the 
 patch has some limitations with regards to the pipeACK Sampling Period 
 mentioned in the Internet-Draft. Aris says this in his thesis about what 
 exactly he implemented:
 
 The second implementation choice, is in regards with the measurement of 
 pipeACK. This variable is the most important introduced by the method and is 
 used to compute the phase that the sender currently lies in. In order to 
 compute pipeACK the approach suggested by the Internet Draft (ID) is 
 followed [ncwv]. During initialization, pipeACK is set to the maximum 
 possible value. A helper variable prevHighACK is introduced that is 
 initialized to the initial sequence number (iss). prevHighACK holds the 
 value of the highest acknowledged byte so far. pipeACK is measured once per 
 RTT meaning that when an ACK covering prevHighACK is received, pipeACK 
 becomes the difference between the current ACK and prevHighACK. This is 
 called a pipeACK sample.  A newer version of the draft suggests that 
 multiple pipeACK samples can be used during the pipeACK sampling period.
 
 Lars
 
 
 [prr.patch]
 
 [newcwv.patch]
 
 Apologies for not looking at this as yet.  It is now closer to the top of my 
 list.
 
 Best,
 George





Re: Regression test suite for TCP

2014-08-21 Thread Eggert, Lars
On 2014-8-20, at 22:14, vijju.singh vijju.si...@gmail.com wrote:
 Have you looked at packetdrill from Google? 

packetdrill is great, but Google keeps their (extensive) library of regression 
tests private.

Lars




Re: A problem on TCP in High RTT Environment.

2014-08-12 Thread Eggert, Lars
Hi,

On 2014-8-12, at 1:52, hiren panchasara hiren.panchas...@gmail.com wrote:
 On Mon, Aug 11, 2014 at 12:59 PM, Michael Tuexen
 michael.tue...@lurchi.franken.de wrote:
 If I remember correctly, I increased
 kern.ipc.nmbufs and kern.ipc.nmbclusters in /boot/loader.conf
 
 I believe, you just need to set kern.ipc.nmbclusters (max mbuf
 clusters allowed) and kern.ipc.nmbufs (max mbufs allowed) should be
 adjusted based on that.

I bumped kern.ipc.nmbclusters by a factor of 100 (from 2036224 to 203622400). 
As Hiren said, kern.ipc.nmbufs auto-adjusted (from 13031835 to 205111860).

However, I still see requests for mbufs denied immediately after reboot.

root@laurel:~ # netstat -m
12280/1580/13860 mbufs in use (current/cache/total)
12279/827/13106/203622400 mbuf clusters in use (current/cache/total/max)
12279/819 mbuf+clusters out of packet secondary zone in use (current/cache)
0/3/3/1018111 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/301662 9k jumbo clusters in use (current/cache/total/max)
0/0/0/169685 16k jumbo clusters in use (current/cache/total/max)
27628K/2061K/29689K bytes allocated to network (current/cache/total)
253/5481/12473 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

I just noticed that the total mbufs in use didn't seem to have increase when 
I did the 100x scaling of kern.ipc.nmbclusters (and kern.ipc.nmbufs 
auto-adjusted). Neither did bytes allocated to network. Is that expected?

Lars




Re: A problem on TCP in High RTT Environment.

2014-08-12 Thread Eggert, Lars
On 2014-8-12, at 12:31, Michael Tuexen michael.tue...@lurchi.franken.de wrote:
 On 12 Aug 2014, at 10:02, Eggert, Lars l...@netapp.com wrote:
 I bumped kern.ipc.nmbclusters by a factor of 100 (from 2036224 to 
 203622400). As Hiren said, kern.ipc.nmbufs auto-adjusted (from 13031835 to 
 205111860).
 Just to double check: You changed it in /boot/loader.conf, right?

Yep, and it has taken effect:

# sysctl -a | egrep 'nmb|mbuf'
kern.ipc.maxmbufmem: 16680744960
kern.ipc.nmbclusters: 203622400
kern.ipc.nmbjumbop: 1018111
kern.ipc.nmbjumbo9: 904986
kern.ipc.nmbjumbo16: 678740
kern.ipc.nmbufs: 205111860
net.inet.sctp.max_chained_mbufs: 5

 I just noticed that the total mbufs in use didn't seem to have increase 
 when I did the 100x scaling of kern.ipc.nmbclusters (and kern.ipc.nmbufs 
 auto-adjusted). Neither did bytes allocated to network. Is that expected?
 I don't think so...

Am I hitting some other kernel limit on how much memory in total the stack can 
use?

Lars





Re: A problem on TCP in High RTT Environment.

2014-08-11 Thread Eggert, Lars
Hi,

On 2014-8-10, at 5:48, Niu Zhixiong kaia...@gmail.com wrote:
 I am using Intel I350-T4 NIC.

igb driver?

I've been having weird issues with this driver under 10-RELEASE, too. On one 
machine, I had to limit hw.igb.num_queues=2 in order to get any sort of useful 
connectivity. On another machine, I had to severely bump kern.ipc.nmbclusters & 
friends. I'm not sure this is the issue here, since SCTP seems to be working 
OK, but I'm not trusting igb NICs at the moment.

Lars




Re: A problem on TCP in High RTT Environment.

2014-08-11 Thread Eggert, Lars
On 2014-8-11, at 9:17, Michael Tuexen michael.tue...@lurchi.franken.de wrote:
 Was there any suspicious output provided by netstat -m when the problems 
 occur?

root@laurel:~ # netstat -m
8186/2179/10365 mbufs in use (current/cache/total)
8184/1214/9398/2036224 mbuf clusters in use (current/cache/total/max)
8184/885 mbuf+clusters out of packet secondary zone in use (current/cache)
0/5/5/1018111 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/301662 9k jumbo clusters in use (current/cache/total/max)
0/0/0/169685 16k jumbo clusters in use (current/cache/total/max)
18414K/2992K/21407K bytes allocated to network (current/cache/total)
544/57/8194 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

root@laurel:~ # uptime
 2:12PM  up 37 mins, 3 users, load averages: 0.20, 0.25, 0.15

Lars




Re: A problem on TCP in High RTT Environment.

2014-08-11 Thread Eggert, Lars
Hi,

On 2014-8-11, at 21:27, Michael Tuexen michael.tue...@lurchi.franken.de wrote:
 On 11 Aug 2014, at 14:12, Eggert, Lars l...@netapp.com wrote:
 544/57/8194 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
 I guess the above is the problem. The card wants a lot of mbufs...
 So the problem should go away if you increase the number of mbufs/clusters,
 which means no requests are denied and you don't experience any performance
 issue.

And I have six of those ports in that box...

So I bump kern.ipc.nmbclusters? Any additional sysctls I should bump?
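
(I.e., presumably something like

kern.ipc.nmbclusters=1000000

in /boot/loader.conf; the value is picked out of thin air and would need to 
be scaled to the machine's memory.)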

Thanks,
Lars





Re: A problem on TCP in High RTT Environment.

2014-08-09 Thread Eggert, Lars
At 400 ms RTT @ 20 Mbps, you are probably receive-window limited. Bump 
net.inet.tcp.recvspace. (Your net.inet.sctp.recvspace is much larger, which 
probably explains the performance difference.)
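
(Back-of-the-envelope: 20 Mbit/s x 0.4 s RTT is a bandwidth-delay product of 
about 1 MB, while the 65536-byte default receive window caps TCP at roughly 
65536 * 8 / 0.4 = 1.3 Mbit/s. A sketch of the tuning, sizes illustrative:

sysctl net.inet.tcp.recvspace=1048576
sysctl net.inet.tcp.sendspace=1048576

Alternatively, enable net.inet.tcp.recvbuf_auto/sendbuf_auto and raise 
net.inet.tcp.recvbuf_max/sendbuf_max; kern.ipc.maxsockbuf must also be large 
enough to allow the bigger buffers.)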

On 2014-8-8, at 14:34, Niu Zhixiong kaia...@gmail.com wrote:

 Dear all,
 
 Last month, I sent problems related to FTP/TCP in a high-RTT environment.
 After that, I set up a simulation environment (dummynet) to test TCP and SCTP
 in a high-delay environment. After finishing the tests, I can see that TCP is
 always slower than SCTP, which I think should not be possible. (Please see the
 figure in the attachment.) When the delay is 200 ms (meaning RTT = 400 ms),
 TCP is extremely slow.
 
 ALL BW=20Mbps, DELAY= 0 ~ 200MS, Packet LOSS = 0 (by dummynet)
 
 This is my parameters:
 FreeBSD vfreetest0 10.0-RELEASE FreeBSD 10.0-RELEASE #0: Thu Aug  7
 11:04:15 HKT 2014
 
 sysctl net.inet.tcp
 net.inet.tcp.rfc1323: 1
 net.inet.tcp.mssdflt: 536
 net.inet.tcp.keepidle: 720
 net.inet.tcp.keepintvl: 75000
 net.inet.tcp.sendspace: 32768
 net.inet.tcp.recvspace: 65536
 net.inet.tcp.keepinit: 75000
 net.inet.tcp.delacktime: 100
 net.inet.tcp.v6mssdflt: 1220
 net.inet.tcp.cc.algorithm: newreno
 net.inet.tcp.cc.available: newreno
 net.inet.tcp.hostcache.cachelimit: 15360
 net.inet.tcp.hostcache.hashsize: 512
 net.inet.tcp.hostcache.bucketlimit: 30
 net.inet.tcp.hostcache.count: 0
 net.inet.tcp.hostcache.expire: 3600
 net.inet.tcp.hostcache.prune: 300
 net.inet.tcp.hostcache.purge: 0
 net.inet.tcp.log_in_vain: 0
 net.inet.tcp.blackhole: 0
 net.inet.tcp.delayed_ack: 1
 net.inet.tcp.drop_synfin: 0
 net.inet.tcp.rfc3042: 1
 net.inet.tcp.rfc3390: 1
 net.inet.tcp.experimental.initcwnd10: 1
 net.inet.tcp.rfc3465: 1
 net.inet.tcp.abc_l_var: 2
 net.inet.tcp.ecn.enable: 0
 net.inet.tcp.ecn.maxretries: 1
 net.inet.tcp.insecure_rst: 0
 net.inet.tcp.recvbuf_auto: 0
 net.inet.tcp.recvbuf_inc: 16384
 net.inet.tcp.recvbuf_max: 2097152
 net.inet.tcp.path_mtu_discovery: 1
 net.inet.tcp.tso: 1
 net.inet.tcp.sendbuf_auto: 0
 net.inet.tcp.sendbuf_inc: 8192
 net.inet.tcp.sendbuf_max: 2097152
 net.inet.tcp.reass.maxsegments: 15900
 net.inet.tcp.reass.cursegments: 0
 net.inet.tcp.reass.overflows: 0
 net.inet.tcp.sack.enable: 1
 net.inet.tcp.sack.maxholes: 128
 net.inet.tcp.sack.globalmaxholes: 65536
 net.inet.tcp.sack.globalholes: 0
 net.inet.tcp.minmss: 216
 net.inet.tcp.log_debug: 0
 net.inet.tcp.tcbhashsize: 32768
 net.inet.tcp.do_tcpdrain: 1
 net.inet.tcp.pcbcount: 4
 net.inet.tcp.icmp_may_rst: 1
 net.inet.tcp.isn_reseed_interval: 0
 net.inet.tcp.soreceive_stream: 0
 net.inet.tcp.syncookies: 1
 net.inet.tcp.syncookies_only: 0
 net.inet.tcp.syncache.bucketlimit: 30
 net.inet.tcp.syncache.cachelimit: 15375
 net.inet.tcp.syncache.count: 0
 net.inet.tcp.syncache.hashsize: 512
 net.inet.tcp.syncache.rexmtlimit: 3
 net.inet.tcp.syncache.rst_on_sock_fail: 1
 net.inet.tcp.msl: 3
 net.inet.tcp.rexmit_min: 30
 net.inet.tcp.rexmit_slop: 200
 net.inet.tcp.always_keepalive: 1
 net.inet.tcp.fast_finwait2_recycle: 0
 net.inet.tcp.finwait2_timeout: 6
 net.inet.tcp.keepcnt: 8
 net.inet.tcp.rexmit_drop_options: 0
 net.inet.tcp.per_cpu_timers: 0
 net.inet.tcp.timer_race: 0
 net.inet.tcp.maxtcptw: 26070
 net.inet.tcp.nolocaltimewait: 0
 
 kern.ipc.maxsockbuf: 400
 kern.ipc.sockbuf_waste_factor: 8
 kern.ipc.max_linkhdr: 16
 kern.ipc.max_protohdr: 60
 kern.ipc.max_hdr: 76
 kern.ipc.max_datalen: 92
 kern.ipc.maxmbufmem: 2073962496
 kern.ipc.nmbclusters: 253170
 kern.ipc.nmbjumbop: 126584
 kern.ipc.nmbjumbo9: 112518
 kern.ipc.nmbjumbo16: 84388
 kern.ipc.nmbufs: 1620285
 kern.ipc.maxpipekva: 66736128
 kern.ipc.pipekva: 16384
 kern.ipc.pipefragretry: 0
 kern.ipc.pipeallocfail: 0
 kern.ipc.piperesizefail: 0
 kern.ipc.piperesizeallowed: 1
 kern.ipc.msgmax: 16384
 kern.ipc.msgmni: 40
 kern.ipc.msgmnb: 2048
 kern.ipc.msgtql: 40
 kern.ipc.msgssz: 8
 kern.ipc.msgseg: 2048
 kern.ipc.semmni: 50
 kern.ipc.semmns: 340
 kern.ipc.semmnu: 150
 kern.ipc.semmsl: 340
 kern.ipc.semopm: 100
 kern.ipc.semume: 50
 kern.ipc.semusz: 632
 kern.ipc.semvmx: 32767
 kern.ipc.semaem: 16384
 kern.ipc.shmmax: 536870912
 kern.ipc.shmmin: 1
 kern.ipc.shmmni: 192
 kern.ipc.shmseg: 128
 kern.ipc.shmall: 131072
 kern.ipc.shm_use_phys: 0
 kern.ipc.shm_allow_removed: 0
 kern.ipc.soacceptqueue: 128
 kern.ipc.numopensockets: 14
 kern.ipc.maxsockets: 130350
 kern.ipc.sendfile.readahead: 1
 
 sysctl net.inet.sctp
 net.inet.sctp.sendspace: 2097152
 net.inet.sctp.recvspace: 2097152
 net.inet.sctp.auto_asconf: 1
 net.inet.sctp.ecn_enable: 1
 net.inet.sctp.strict_sacks: 1
 net.inet.sctp.peer_chkoh: 256
 net.inet.sctp.maxburst: 4
 net.inet.sctp.fr_maxburst: 4
 net.inet.sctp.maxchunks: 31646
 net.inet.sctp.tcbhashsize: 1024
 net.inet.sctp.pcbhashsize: 256
 net.inet.sctp.min_split_point: 2904
 net.inet.sctp.chunkscale: 10
 net.inet.sctp.delayed_sack_time: 200
 net.inet.sctp.sack_freq: 2
 net.inet.sctp.sys_resource: 1000
 net.inet.sctp.asoc_resource: 10
 net.inet.sctp.heartbeat_interval: 3
 

Re: [rfc] tcp timer update for RSS

2014-07-01 Thread Eggert, Lars
Hi,

[elars@stanley:/home/elars/src] 1 ⌀ grep -r IP_RSSCPUID sys
sys/netinet/in.h:/* 71 - XXX was IP_RSSCPUID - can recycle whenever */
sys/netinet/ip_output.c:case IP_RSSCPUID:

kernel compilation with RSS currently fails, because IP_RSSCPUID is still used 
in ip_output.c.

Lars




Re: Patches for RFC6937 and draft-ietf-tcpm-newcwv-00

2014-04-11 Thread Eggert, Lars
Hi,

since folks are playing with Midori's DCTCP patch, I wanted to make sure that 
you were also aware of the patches that Aris did for PRR and NewCWV...

Lars

On 2014-2-4, at 10:38, Eggert, Lars l...@netapp.com wrote:

 Hi,
 
 below are two patches that implement RFC6937 (Proportional Rate Reduction 
 for TCP) and draft-ietf-tcpm-newcwv-00 (Updating TCP to support 
 Rate-Limited Traffic). They were done by Aris Angelogiannopoulos for his MS 
 thesis, which is at https://eggert.org/students/angelogiannopoulos-thesis.pdf.
 
 The patches should apply to -CURRENT as of Sep 17, 2013. (Sorry for the delay 
 in sending them, we'd been trying to get some feedback from committers first, 
 without luck.)
 
 Please note that newcwv is still a work in progress in the IETF, and the 
 patch has some limitations with regards to the pipeACK Sampling Period 
 mentioned in the Internet-Draft. Aris says this in his thesis about what 
 exactly he implemented:
 
 The second implementation choice, is in regards with the measurement of 
 pipeACK. This variable is the most important introduced by the method and is 
 used to compute the phase that the sender currently lies in. In order to 
 compute pipeACK the approach suggested by the Internet Draft (ID) is followed 
 [ncwv]. During initialization, pipeACK is set to the maximum possible value. 
 A helper variable prevHighACK is introduced that is initialized to the 
 initial sequence number (iss). prevHighACK holds the value of the highest 
 acknowledged byte so far. pipeACK is measured once per RTT meaning that when 
 an ACK covering prevHighACK is received, pipeACK becomes the difference 
 between the current ACK and prevHighACK. This is called a pipeACK sample.  A 
 newer version of the draft suggests that multiple pipeACK samples can be used 
 during the pipeACK sampling period.
 
 Lars
 
 prr.patch  newcwv.patch





Re: DCTCP implementation

2014-04-01 Thread Eggert, Lars
Hi,

On 2014-3-31, at 7:37, Midori Kato kat...@sfc.wide.ad.jp wrote:
  I will send an ECN marking implementation in dummynet and test scripts 
 personally to you.

I think you can send the dummynet ECN patch also to the list. I know Luigi is 
reviewing it for a merge, but that lets people play with it already.

Lars




Re: TCP Initial Window 10 MFC (was: Re: svn commit: r252789 - stable/9/sys/netinet)

2013-08-14 Thread Eggert, Lars
Hi,

On Aug 14, 2013, at 10:36, Lawrence Stewart lstew...@freebsd.org wrote:
 I don't think this change should have been MFCed, at least not in its
 current form.

FYI, Google's own data as presented in the HTTPBIS working group of the recent 
Berlin IETF shows that 10 is too high for ~25% of their web connections: see 
slide 2 of http://www.ietf.org/proceedings/87/slides/slides-87-httpbis-5.pdf

(That slide shows a CDF of CWND values the server used at the end of a web 
transaction.)

Lars




Re: TCP Initial Window 10 MFC

2013-08-14 Thread Eggert, Lars
Hi,

On Aug 14, 2013, at 17:27, Lawrence Stewart lstew...@freebsd.org
 wrote:
 Do you recall if they said
 how many flows made up the CDF?

I think very many - check out the audio archive or the minutes of the 
meeting, it should have the details.

Lars




Re: TCP Initial Window 10 MFC (was: Re: svn commit: r252789 - stable/9/sys/netinet)

2013-08-14 Thread Eggert, Lars
Oh: The other interesting bit is that Chrome defaulted to telling the server to 
use IW32 if it had no cached value...

I think Google are still heavily tweaking the mechanisms.

Lars

On Aug 14, 2013, at 16:46, Eggert, Lars l...@netapp.com wrote:

 Hi,
 
 On Aug 14, 2013, at 10:36, Lawrence Stewart lstew...@freebsd.org wrote:
 I don't think this change should have been MFCed, at least not in its
 current form.
 
 FYI, Google's own data as presented in the HTTPBIS working group of the 
 recent Berlin IETF shows that 10 is too high for ~25% of their web 
 connections: see slide 2 of 
 http://www.ietf.org/proceedings/87/slides/slides-87-httpbis-5.pdf
 
 (That slide shows a CDF of CWND values the server used at the end of a web 
 transaction.)
 
 Lars





hw.igb.num_queues default

2013-06-20 Thread Eggert, Lars
Hi,

I just popped a new four-port igb card into a -STABLE system and encountered 
severe issues even when unloaded right after boot, to the point where I 
couldn't even ssh into the system anymore. The box has 2x4 cores:

CPU: Intel(R) Xeon(R) CPU   X5450  @ 3.00GHz (2992.60-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Family = 0x6  Model = 0x17  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 8589934592 (8192 MB)
avail memory = 8239513600 (7857 MB)
MPTable: <DELL PE 01B2 >
Event timer "LAPIC" quality 400
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)

By default, the igb driver seems to set up one queue per detected CPU. Googling 
around, people seemed to suggest that limiting the number of queues makes 
things work better. I can confirm that setting hw.igb.num_queues=2 seems to 
have fixed the issue. (Two was the first value I tried, maybe other values 
other than 0 would work, too.)
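
(For the record, the workaround is a single line in /boot/loader.conf:

hw.igb.num_queues=2

which the driver picks up at boot.)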

In order to uphold POLA, should the igb driver maybe default to a conservative 
value for hw.igb.num_queues that may not deliver optimal performance, but at 
least works out of the box?

Lars


Re: hw.igb.num_queues default

2013-06-20 Thread Eggert, Lars
Hi,

On Jun 20, 2013, at 16:29, Andre Oppermann an...@freebsd.org wrote:
 On 20.06.2013 15:37, Eugene Grosbein wrote:
 Or, better, make nmbclusters auto-tuning smarter, if any.
 I mean, use more nmbclusters for machines with large amounts of memory.
 
 That has already been done in HEAD.

the box in question is running -CURRENT, so that may also still help.

Frankly, I don't really care what the correct fix is. I just want to be able to 
plop this NIC in and be able to connect to it without the need to diddle any 
loader knobs.

Lars


Re: hw.igb.num_queues default

2013-06-20 Thread Eggert, Lars
On Jun 20, 2013, at 17:51, Eggert, Lars l...@netapp.com wrote:
 the box in question is running -CURRENT, so that may also still help.

s/CURRENT/STABLE/

 
 Frankly, I don't really care what the correct fix is. I just want to be able 
 to plop this NIC in and be able to connect to it without the need to diddle 
 any loader knobs.
 
 Lars


TCP RST with gpxe

2013-06-07 Thread Eggert, Lars
Hi,

when loading things from gpxe over HTTP from a FreeBSD server, FreeBSD resets 
the connection after the GET. A Linux HTTP server doesn't. Dump attached.

Any clues as to why this is happening?

Thanks,
Lars



dump3
Description: dump3

Re: enable tcpdump GUESS_TSO flag?

2013-04-09 Thread Eggert, Lars
Hi,

On Apr 8, 2013, at 7:18, YongHyeon PYUN pyu...@gmail.com wrote:
  I don't have a strong opinion on enabling that flag, but I think it
 would be even better to have an option to enable/disable that
 feature (default off).

I agree that would be the better solution, but it's a larger patch to the 
source.

  em(4) controllers require that the IP length be 0 before the controller
 performs the TSO operation. fxp(4) controllers require that the IP length be
 set to the IP length of the first TCP segment after the TSO operation. bpf
 listeners see the modified packet, so it can confuse them. AFAIK, no
 controllers in tree other than em(4)/fxp(4) have such a limitation with TSO.
 Enabling the GUESS_TSO flag may make it hard to debug network/driver issues,
 I guess.

Not sure. Yeah, you wouldn't see IP bad-len 0 messages anymore in traces, but 
instead those packets would contain garbage. That should still be noticeable to 
someone looking into the dump.

Lars



enable tcpdump GUESS_TSO flag?

2013-04-04 Thread Eggert, Lars
Hi,

I wonder whether it'd be a good idea to enable tcpdump's GUESS_TSO flag by 
default? It enables a heuristic that lets tcpdump understand pcaps that include 
segments generated by TCP TSO (which otherwise show up as IP bad-len 0.)

See the dicussion at 
http://www.mail-archive.com/tcpdump-workers@lists.tcpdump.org/msg01051.html for 
details.

Lars

diff --git a/usr.sbin/tcpdump/tcpdump/Makefile b/usr.sbin/tcpdump/tcpdump/Makefile
index ca8ec4c..5fd73a1 100644
--- a/usr.sbin/tcpdump/tcpdump/Makefile
+++ b/usr.sbin/tcpdump/tcpdump/Makefile
@@ -45,6 +45,10 @@ CFLAGS+= -I${.CURDIR} -I${TCPDUMP_DISTDIR}
 CFLAGS+= -DHAVE_CONFIG_H
 CFLAGS+= -D_U_=__attribute__((unused))
 
+# Enable tcpdump heuristic to identify TSO-generated packets; see
+# http://www.mail-archive.com/tcpdump-workers@lists.tcpdump.org/msg01051.html
+CFLAGS+= -DGUESS_TSO
+
 .if ${MK_INET6_SUPPORT} != no
 SRCS+= print-ip6.c print-ip6opts.c print-mobility.c print-ripng.c \
print-icmp6.c print-babel.c print-frag6.c print-rt6.c print-ospf6.c \



Re: ntpd bind() failure: Can't assign requested address

2013-03-26 Thread Eggert, Lars
Hi,

 I confirm I have the same issue on 9.1 r247912, as below:

same here, on FreeBSD 10.0-CURRENT #5 r+16848a4-dirty:

Mar 26 11:43:17  ntpd[2783]: bind() fd 23, family AF_INET6, port 123, scope 1, 
addr fe80::92e2:baff:fe2b:3a00, mcast=0 flags=0x11 fails: Can't assign 
requested address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on ix0 (3) for 
fe80::92e2:baff:fe2b:3a00#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 23, family AF_INET6, port 123, scope 0, 
addr fd00:cafe:cafe:200::7, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on ix0 (4) for 
fd00:cafe:cafe:200::7#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 24, family AF_INET6, port 123, scope 2, 
addr fe80::92e2:baff:fe2b:3a01, mcast=0 flags=0x11 fails: Can't assign 
requested address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on ix1 (6) for 
fe80::92e2:baff:fe2b:3a01#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 24, family AF_INET6, port 123, scope 0, 
addr fd00:cafe:cafe:201::7, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on ix1 (7) for 
fd00:cafe:cafe:201::7#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 25, family AF_INET6, port 123, scope 3, 
addr fe80::21b:21ff:fea8:a534, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on em0 (9) for 
fe80::21b:21ff:fea8:a534#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 25, family AF_INET6, port 123, scope 0, 
addr fd00:cafe:cafe:100::7, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on em0 (10) for 
fd00:cafe:cafe:100::7#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 26, family AF_INET6, port 123, scope 4, 
addr fe80::21b:21ff:fea8:a535, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on em1 (12) for 
fe80::21b:21ff:fea8:a535#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 26, family AF_INET6, port 123, scope 0, 
addr fd00:cafe:cafe:101::7, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on em1 (13) for 
fd00:cafe:cafe:101::7#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 27, family AF_INET6, port 123, scope 5, 
addr fe80::21b:21ff:fea8:a536, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on em2 (15) for 
fe80::21b:21ff:fea8:a536#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 27, family AF_INET6, port 123, scope 0, 
addr fd00:cafe:cafe:102::7, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on em2 (16) for 
fd00:cafe:cafe:102::7#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 28, family AF_INET6, port 123, scope 6, 
addr fe80::21b:21ff:fea8:a537, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on em3 (18) for 
fe80::21b:21ff:fea8:a537#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 28, family AF_INET6, port 123, scope 0, 
addr fd00:cafe:cafe:103::7, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: unable to create socket on em3 (19) for 
fd00:cafe:cafe:103::7#123
Mar 26 11:43:17  ntpd[2783]: bind() fd 31, family AF_INET6, port 123, scope 0, 
addr fd00:cafe:cafe:104::7, mcast=0 flags=0x11 fails: Can't assign requested 
address
Mar 26 11:43:17  ntpd[2783]: bind() fd 34, family AF_INET6, port 123, scope 0, 
addr fd00:cafe:cafe:1::7, mcast=0 flags=0x11 fails: Can't assign requested 
address

 However, ntpd runs fine and is bound to the following addresses:

same here, running fine.

My ifconfig:

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
        ether 90:e2:ba:2b:3a:00
        inet 10.2.0.7 netmask 0xffffff00 broadcast 10.2.0.255 
        inet6 fe80::92e2:baff:fe2b:3a00%ix0 prefixlen 64 scopeid 0x1 
        inet6 fd00:cafe:cafe:200::7 prefixlen 64 
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
        status: active
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
        ether 90:e2:ba:2b:3a:01
        inet 10.2.1.7 netmask 0xffffff00 broadcast 10.2.1.255 
        inet6 fe80::92e2:baff:fe2b:3a01%ix1 prefixlen 64 scopeid 0x2 
        inet6 fd00:cafe:cafe:201::7 prefixlen 64 
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
        status: active
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
        ether 00:1b:21:a8:a5:34

Re: ntpd bind() failure: Can't assign requested address

2013-03-26 Thread Eggert, Lars
On Mar 26, 2013, at 12:59, kt...@acm.org
 wrote:
 How do you configure your network interfaces? Using /etc/start_if* or 
 /etc/rc.conf?

The latter.

(Actually, most of them are configured in rc.local with a bit of shell code 
that generates the IP address from the MAC address for a set of machines.)
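
(Much simplified, the idea is something along these lines; prefix, interface 
name, and mapping are illustrative only:

mac=$(ifconfig em0 | awk '/ether/ {print $2}')
ifconfig em0 inet6 "fd00:cafe:cafe:100::${mac##*:}" prefixlen 64

The real script derives the whole host part from the MAC.)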

Lars


Re: ntpd bind() failure: Can't assign requested address

2013-03-26 Thread Eggert, Lars
Hi,

On Mar 26, 2013, at 15:39, kit kt...@acm.org wrote:
 
 try setting ipv6_activate_all_interfaces to yes

I had that set all along.

 and configuring the corresponding $ifconfig_IF_ipv6 in your rc.conf if you 
 haven't done so already

Can't really do this, because dhclient needs to have finished so I can generate 
IPv6 addresses based on the IPv4 address & MAC address of the interfaces. (Yes, 
this is an ugly hack, but these are lab machines.)

Lars


Re: Problems with two interfaces on the same subnet?

2013-02-12 Thread Eggert, Lars
Hi,

On Feb 12, 2013, at 9:32, Ivan Voras ivo...@freebsd.org wrote:
 I have a machine with two interfaces, igb2 and igb3 on the same subnet
 but with different IP addresses, e.g. igb2 has 192.168.1.221, igb3 has
 192.168.1.222. Firstly, is there anything which would preclude this from
 working? As I see it, these are two different MAC addresses and to the
 outside world should look as two different hosts, right?

depends on what you mean by work. There will only be one default route out of 
the box.

 * With both NICs connected to different switches, everything appears to
 work, BUT, if igb2 cable is disconnected, pings to igb3 simply stop,
 even though its cable *is* still connected.

This sounds like your default route is going via igb2.

You can make this work with ipfw rules (and I guess also setfib, although I 
have not tried that.)

Lars



Re: Problems with two interfaces on the same subnet?

2013-02-12 Thread Eggert, Lars
Hi,

On Feb 12, 2013, at 9:50, Ivan Voras ivo...@freebsd.org wrote:
 You can make this work with ipfw rules (and I guess also setfib, although I 
 have not tried that.)
 
 The concept of FIBs looks clean and applicable but setfib works on newly
 started process, and I would need to do something like apply it to
 packets coming from an interface.

Assuming your default route is via igb2, you can do something like this:

ipfw add fwd <router upstream of igb3> ip4 from <local address of igb3> to not <subnet of igb2> out

(From memory, no guarantees.)
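
(Filled in with the numbers from your first mail, igb3 at 192.168.1.222 on 
192.168.1.0/24, and an assumed upstream router of 192.168.1.1, that would be 
roughly:

ipfw add fwd 192.168.1.1 ip4 from 192.168.1.222 to not 192.168.1.0/24 out

Substitute the real next hop of igb3 for the gateway address.)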

Lars


Re: [PATCH] Add a new TCP_IGNOREIDLE socket option

2013-02-11 Thread Eggert, Lars
On Feb 10, 2013, at 11:36, Andrey Zonov z...@freebsd.org wrote:
 Google made many many TCP tweaks.  Increased initial window, small RTO,
 enabled ignore after idle and others.  They published that, other people
 just blindly applied these tunings and the Internet still works.

MANY people are experimenting with the changes Google is proposing, in order to 
evaluate if and how well they work. Sure, some folks may blindly apply them, 
but please don't generalize.

Lars



Re: [PATCH] Add a new TCP_IGNOREIDLE socket option

2013-02-09 Thread Eggert, Lars
On Feb 10, 2013, at 6:05, Kevin Oberman kob6...@gmail.com wrote:
 One idea that popped into my head (and may be completely ridiculous,
 is to make its availability dependent on a kernel option and have
 warning in NOTES about it contravening normal and accepted practice
 and that it can cause serious problems both for yourself and for
 others using the network.

Also, if it gets merged, don't call it TCP_IGNOREIDLE. Call it 
TCP_BLAST_DANGEROUSLY_AFTER_IDLE.

Lars


Re: high cpu usage on natd / dhcpd

2013-02-07 Thread Eggert, Lars
On Jan 31, 2013, at 16:03, Matthew Luckie m...@luckie.org.nz wrote:
 
 00510 allow ip from me to not me out via em1
 00550 divert 8668 ip from any to any via em1
 
 Rule 510 fixes it.

Yep, it does. Can I ask someone to commit this to rc.firewall?

(And I wonder if the rules for the ipfw kernel firewall need a similar 
addition, because the system locks up under heavy network load if I use that 
instead of natd.)

Lars



Re: high cpu usage on natd / dhcpd

2013-02-07 Thread Eggert, Lars
Hi,

On Feb 7, 2013, at 13:40, Ian Smith smi...@nimnet.asn.au wrote:
 On Thu, 7 Feb 2013 08:08:59 +, Eggert, Lars wrote:
 On Jan 31, 2013, at 16:03, Matthew Luckie m...@luckie.org.nz wrote:
 
 00510 allow ip from me to not me out via em1
 00550 divert 8668 ip from any to any via em1
 
 Rule 510 fixes it.
 
 Yep, it does. Can I ask someone to commit this to rc.firewall?
 
 The ruleset Matthew posted bears no resemblance to rc.firewall, so I 
 don't see that (or how) it solves any generic problem.

sorry for having been imprecise. What I was asking for was this change:

--- /usr/src/etc/rc.firewall    2012-11-17 12:36:10.0 +0100
+++ rc.firewall 2013-02-06 11:35:45.0 +0100
@@ -155,6 +155,7 @@
 case ${natd_enable} in
 [Yy][Ee][Ss])
         if [ -n "${natd_interface}" ]; then
+                ${fwcmd} add 49 allow ip from me to not me out via ${natd_interface}
                 ${fwcmd} add 50 divert natd ip4 from any to any via ${natd_interface}
         fi
         ;;

 (And I wonder if the rules for the ipfw kernel firewall need a 
 similar addition, because the system locks up under heavy network 
 load if I use that instead of natd.)
 
 Which rc.firewall ruleset are you referring to?

My rc.conf has:

gateway_enable="YES"
firewall_enable="YES"
firewall_type="OPEN"
natd_enable="YES"
natd_interface="bce0"

With the patch above, that seems to work fine.

I tried to replace the natd_* lines with:

firewall_nat_enable="YES"
firewall_nat_interface="bce0"

which caused the machine to lock up under load, similar to when natd started 
eating CPU cycles. This made me wonder if a similar patch to the above for the 
firewall_nat_* case in rc.firewall might be needed.
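
(I.e., presumably the same guard in front of the in-kernel NAT rule, along the 
lines of

${fwcmd} add 49 allow ip from me to not me out via ${firewall_nat_interface}

placed just before the nat rule that rc.firewall installs for 
${firewall_nat_interface}; untested, and the rule number is arbitrary.)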

  There certainly are 
 problems with the 'simple' ruleset relating to use of $natd_enable vs 
 $firewall_nat_enable (not to mention the denial of ALL icmp traffic) 
 that I posted patches to a couple of years ago in ipfw@ to rc.firewall 
 and /etc/rc.d/{ipfw,natd} addressing about 4 PRs .. sadly to no avail.
 
 I suggest following up to ipfw@ (cc'd) rather than net@

Will subscribe, thanks.

Lars


Re: Data Center Bridging?

2013-02-06 Thread Eggert, Lars
Hi Jack,

On Jan 22, 2013, at 19:23, Jack Vogel jfvo...@gmail.com wrote:
 I have never implemented this in the FreeBSD drivers primarily because the
 motivation for it say, in Linux,
 was to handle multiple traffic classes, for instance FCOE or iSCSI, but
 FreeBSD has not had these features
 to implement this for.  Give me a reason to do it, and I can see about
 adding it :)

I'm interested in seeing if DCB can be used for lossless IP communication over 
a simple and private LAN fabric. I have some student cycles that I can direct 
at helping with the implementation, if that's useful?

Lars


Re: high cpu usage on natd / dhcpd

2013-01-31 Thread Eggert, Lars
Hi,

 I have a small system running FreeBSD 8.2 that does NAT using ipfw and 
 natd to systems attached to two interfaces: em0 and wlan0.  I have a 
 dhcpd daemon issuing leases on those interfaces.  The system has an em1 
 interface plugged into a cable modem where it obtains a DHCP lease from 
 an ISP.
 
 For some reason, when traffic from the Internet terminates on the system 
 itself (I scp a file from the computer) the natd and dhcpd processes 
 consume significant CPU, and the throughput is less than I expect. 
 Traffic that passes through to a computer behind the NAT flows without 
 causing the natd or dhcpd processes to measurably consume CPU.

I see exactly the same issue on -STABLE. Have you been able to figure out the 
cause?

Thanks,
Lars


Re: high cpu usage on natd / dhcpd

2013-01-31 Thread Eggert, Lars
Hi,

On Jan 31, 2013, at 10:42, Kevin Lo ke...@kevlo.org wrote:
 Use ipfw nat instead. It uses the libalias(3) in kernel and avoids
 gigantic natd(8) overhead.

I tried that, but it froze the system.

Lars


Re: Data Center Bridging?

2013-01-23 Thread Eggert, Lars
Hi,

On Jan 22, 2013, at 19:23, Jack Vogel jfvo...@gmail.com wrote:
 I have never implemented this in the FreeBSD drivers primarily because the
 motivation for it say, in Linux,
 was to handle multiple traffic classes, for instance FCOE or iSCSI, but
 FreeBSD has not had these features
 to implement this for.  Give me a reason to do it, and I can see about
 adding it :)

well, my motivation for asking was that I'll soon have some students 
investigate various options for low-latency reliable communication over a 
private fabric. DCB is one candidate.

On Jan 22, 2013, at 19:49, Navdeep Parhar npar...@gmail.com wrote:
 cxgbe(4) hardware supports DCB/DCBX, but I haven't looked at what it
 would take to add driver + OS support.

Driver support for either Intel or Chelsio NICs would be great!

Lars


Data Center Bridging?

2013-01-22 Thread Eggert, Lars
Hi,

on Linux, various NICs (e.g., ixgbe) support Data Center Bridging. Is this also 
available under FreeBSD? Do *any* NICs support DCB under FreeBSD?

Thanks,
Lars


static kernel with mod_cc?

2013-01-15 Thread Eggert, Lars
Hi,

mod_cc(4) says:

 Algorithm modules can be compiled into the kernel or loaded as
 kernel modules using the kld(4) facility.

Maybe I'm dense, but I can't figure out how to statically compile mod_cc 
modules into the kernel? (I'm using a PAE kernel w/o modules.)

Hints appreciated.

Thanks,
Lars


Re: static kernel with mod_cc?

2013-01-15 Thread Eggert, Lars
Hi,

On Jan 15, 2013, at 14:09, Lawrence Stewart lstew...@freebsd.org wrote:
 You're not dense - the build glue to allow an algorithm to be specified
 in a kernel config file doesn't exist.

ah, that explains it. I guess it doesn't exist for siftr either?

 The hacky way to achieve what you want would be to edit
 /path/to/src/sys/conf/files and manually add a line like so below the
 cc_newreno.c line:
 
 netinet/cc/cc_<algo>.c  optional inet | inet6
 
 That will compile the module into the kernel, assuming options INET or
 options INET6 is in your kernel config file.

Thanks, will try!
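
(For the archives: once it is compiled in, the algorithm should show up in 
net.inet.tcp.cc.available and can be selected with something like

sysctl net.inet.tcp.cc.algorithm=newreno

with the sysctl names as on 9.x/10.x; substitute the algorithm that was 
actually built in.)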

Lars


FreeBSD bufferbloat?

2012-10-11 Thread Eggert, Lars
Hi,

is anyone in BSD-land working on de-bufferbloating the kernel, similar to what 
the Linux folks are currently doing?

Lars

Re: [patch] sysctls for TCP timers

2012-09-20 Thread Eggert, Lars
Hi,

On Sep 20, 2012, at 9:25, Andrey Zonov z...@freebsd.org wrote:
 Some of them may be read in Google's article about tuning TCP parameters
 [1].  I converted most of the TCP timers to sysctls [2] and we have been using
 this patch for a few months.  We tuned net.inet.tcp.rtobase and
 net.inet.tcp.syncache.rexmttime and it gives good results (especially in
 conjunction with cc_htcp(4)).

can you share some measurements that quantify the results?

Thanks,
Lars

Re: Multiroute question

2012-09-20 Thread Eggert, Lars
On Sep 20, 2012, at 16:16, Juan José Sánchez Mesa juanjo.lis...@doblej.net
 wrote:
 Is there a way to configure the network so that outgoing packets go out via
 the card on which the incoming packets arrived?

Policy routing e.g. with ipfw. Read up on ipfw fwd.
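
(A sketch with assumed addresses: if the second NIC is em1 at 10.0.1.2 and its 
upstream router is 10.0.1.1, then

ipfw add fwd 10.0.1.1 ip from 10.0.1.2 to any out

sends replies that carry em1's source address back out via em1's router.)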

Lars