date:20160204

[Bug 203630] [Hyper-V] [nat] [tcp] 10.2 NAT bug in TCP stack or hyperv netsvc driver

2016-02-04 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203630

--- Comment #29 from Eddy  ---
Hello everybody,

The issue was fixed by patch r291156. I tested it on a clean FreeBSD install
by recompiling the kernel in a test environment, and it worked.

It was merged to the STABLE 10 branch (Fri Dec 18 14:56:49 UTC 2015). I assumed
that the latest build would include the fix; however, I'm running
10.2-RELEASE-p12 on my production server and the problem still occurs.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

[Bug 203630] [Hyper-V] [nat] [tcp] 10.2 NAT bug in TCP stack or hyperv netsvc driver

2016-02-04 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203630

--- Comment #28 from Franco Fichtner  ---
Hello,

We've run into this too over at OPNsense. This is a harsh regression from 10.1
to 10.2, and it needs an errata notice for 10.2.


Thank you,
Franco


Re: swaping ring slots between NIC ring and Host ring does not always success

2016-02-04 Thread Victor Detoni

Are both interfaces up? As in `ifconfig ... up`?

I had the same problem and solved it with the command above.

On Thursday, February 4, 2016, Xiaoye Sun wrote:

> Hi Luigi,
> [...]

[Bug 206904] tailq crash/nd inet6

2016-02-04 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206904

Mark Linimon  changed:

   What|Removed |Added

   Assignee|freebsd-b...@freebsd.org|freebsd-net@FreeBSD.org

-- 
You are receiving this mail because:
You are the assignee for the bug.

Fwd: swaping ring slots between NIC ring and Host ring does not always success

2016-02-04 Thread Xiaoye Sun

Hi Luigi,

Thanks for your explanation.

I used three machines to do this experiment. They are directly connected.

[(machine1) eth1]---[eth2 (machine2) eth3]---[eth4 (machine3)].

First, I ran bridge.c on machine2 with the command *bridge -i
netmap:eth2 -i netmap:eth3*. (sender, receiver, and XYZ were not running on
machine1 or machine3.)

As I understand it, in this setup machine2 should be transparent to
machine1 and machine3, since it forwards packets from its eth2 to eth3 and vice
versa without modifying them.
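The forwarding step described above can be modeled roughly as follows. This is a simplified, self-contained sketch, not bridge.c's or netmap's actual code: `struct sim_slot`, `struct sim_ring`, `sim_forward()` and `SIM_BUF_CHANGED` are hypothetical stand-ins for netmap's `struct netmap_slot`, `struct netmap_ring` and the `NS_BUF_CHANGED` flag, and the real rings carry more fields (`cur`, timestamps, and so on). It shows the zero-copy idea: instead of copying payload, each rx slot's `buf_idx` is exchanged with a free tx slot's, and both slots are marked as having changed buffers.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_RING_SIZE   8u      /* hypothetical ring size */
#define SIM_BUF_CHANGED 0x0001u /* hypothetical "buf_idx was replaced" flag */

/* Hypothetical stand-in for struct netmap_slot. */
struct sim_slot {
	uint32_t buf_idx; /* index of the packet buffer this slot points to */
	uint16_t len;     /* packet length in bytes */
	uint16_t flags;
};

/* Hypothetical stand-in for struct netmap_ring (no cur, no timestamps). */
struct sim_ring {
	struct sim_slot slot[SIM_RING_SIZE];
	uint32_t head; /* first slot owned by the application */
	uint32_t tail; /* first slot owned by the kernel again */
};

/* Slots the application may use, in the spirit of nm_ring_space(). */
static uint32_t
sim_ring_space(const struct sim_ring *r)
{
	return (r->tail + SIM_RING_SIZE - r->head) % SIM_RING_SIZE;
}

/*
 * Move up to 'limit' packets from rx to tx by swapping buffer indexes
 * (zero copy). Returns the number of packets moved.
 */
static uint32_t
sim_forward(struct sim_ring *rx, struct sim_ring *tx, uint32_t limit)
{
	uint32_t j = rx->head, k = tx->head, n;
	uint32_t avail = sim_ring_space(rx);

	assert(rx != tx);
	if (sim_ring_space(tx) < avail)
		avail = sim_ring_space(tx);
	if (limit > avail)
		limit = avail;
	for (n = 0; n < limit; n++) {
		struct sim_slot *rs = &rx->slot[j];
		struct sim_slot *ts = &tx->slot[k];
		uint32_t tmp = ts->buf_idx;

		ts->buf_idx = rs->buf_idx; /* tx slot now holds the received packet */
		rs->buf_idx = tmp;         /* rx slot gets the free tx buffer back */
		ts->len = rs->len;
		ts->flags |= SIM_BUF_CHANGED; /* both slots changed buffers */
		rs->flags |= SIM_BUF_CHANGED;
		j = (j + 1) % SIM_RING_SIZE;
		k = (k + 1) % SIM_RING_SIZE;
	}
	rx->head = j; /* release consumed rx slots */
	tx->head = k; /* submit filled tx slots */
	return n;
}
```

In this model, packets leave the tx ring in the same slot order in which they arrived on the rx ring, since only indexes move.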

I tried to ping machine3 from machine1 with a command like *ping
10.11.10.3*. However, it still does not succeed.
Before machine1 sends the ping to machine3, it first sends an ARP request
to get machine3's MAC address. machine3 gets that ARP request and sends the
reply back (I used tcpdump to verify that machine3 receives the ARP request and
sends out the ARP reply). However, machine1 never gets the ARP reply.

I found that the bridge only forwards packets in one direction: it sees
the ARP request but never the ARP reply (*pkt_queued* always returns 0 for
one NIC...).

This behavior looks very weird to me. Do you think there is a compatibility
issue between netmap and the OS I am using? Is there a verified Linux
distribution (and version) that works well with netmap?

The OS I use is 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24)
x86_64 GNU/Linux.
Linux kernel version is *3.16.0-4-amd64*


Thanks!
Xiaoye






On Wed, Feb 3, 2016 at 2:12 AM, Luigi Rizzo  wrote:

> On Tue, Feb 2, 2016 at 10:48 PM, Xiaoye Sun  wrote:
> >
> >
> > On Mon, Feb 1, 2016 at 11:34 PM, Luigi Rizzo  wrote:
> >>
> >> On Tue, Feb 2, 2016 at 6:23 AM, Xiaoye Sun  wrote:
> >> > Hi Luigi,
> >> >
> >> > I have to clarify the *jumping issue* with the slot indexes.
> >> > In the bridge.c program, the slot index never jumps; it increases
> >> > sequentially.
> >> > In the receiver.c program, the UDP packet seq jumps, and I showed the
> >> > slot index that each UDP packet uses. So the slot index jumps together
> >> > with the UDP seq (in the receiver program only).
> >>
> >> So let me understand, is the "slot" some information written
> >> in the packet by bridge.c (referring to the rx or tx slot,
> >> I am not sure) and then read and printed by receiver.c
> >> (which gets the packet through recvfrom so there isn't
> >> really any slot index) ?
> >>
> > It works the other way around:
> > bridge.c checks the seq numbers of the UDP packets in the netmap slots (in
> > the NIC rx ring) before the swap; then it records the seq number, the slot
> > number (both rx and tx; the tx indexes were not shown in the previous email
> > since they all look correct) and the buf_idx (rx and tx). bridge.c does not
> > change anything in the buffer, and it knows the slot and buf_idx that a
> > packet uses. Please refer to the code added in the *process_rings* function:
> > http://www.owlnet.rice.edu/~xs6/bridge.c
> > receiver.c only checks the seq numbers and prints out the seq numbers it
> > receives, in order.
> > With this information, I manually matched the seq numbers I got from
> > receiver.c with those I got from bridge.c. So we know the seq order the
> > receiver sees and which slot a packet uses when bridge.c swaps the
> > buf_idxs.
> >
> >
> >> Do you see any ordering inversion when the receiver
> >> gets packets through the NETMAP API (e.g. using bridge.c
> >> instead of receiver.c) ?
> >>
> > There is no ordering inversion seen by bridge.c. As I said in the previous
> > paragraph, bridge.c checks the seq numbers, and I did not see any order
> > inversion in THIS simple experiment. (In my multicast protocol, mentioned in
> > the first email, there is ordering inversion. But let us solve the simple
> > bridge.c problem first; I think they are two relatively independent
> > issues.)
>
> Sorry there was a misunderstanding.
> I wanted you to check the following setup:
>
> [1: send.c] ->- [2: bridge.c] ->- [3: XYZ]
>
> where in XYZ you replace your receiver.c with some
> netmap-based receiver (it could be pkt-gen in rx mode,
> or possibly even another instance of bridge.c where
> you connect the output port to a vale switch so
> traffic is dropped), and then in XYZ print the content
> of the packets.
>
> From your previous report we know that node 2: sees packets
> in order, and node 3: sees packets out of order.
> However, if the problem were due to bridge.c sending
> the old buffer and not the new one, you'd see not only
> reordering but also replication of packets.
>
> The fact that you see only the reordering in 3: makes
> me think that the problem is in that node, and it could
> be the network stack in 3: that does something strange.
> So if you can run something netmap based in 3: and make
> sure there is only one queue to

[Bug 206932] Realtek 8111 card stops responding under high load in netmap mode

2016-02-04 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206932

Mark Linimon  changed:

   What|Removed |Added

   Assignee|freebsd-b...@freebsd.org|freebsd-net@FreeBSD.org


[Differential] [Updated] D4825: tcp/lro: Add network driver configurable LRO entry depth

2016-02-04 Thread hselasky (Hans Petter Selasky)

hselasky added a comment.


  FYI
  
  https://reviews.freebsd.org/D1761 might be related to this one.
  
  Should you check that "lc->lro_hiwat" is greater than or equal to "lc->ifp->if_mtu"?
  
  --HPS
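A sketch of why such a check matters, assuming the MTU is held in an unsigned type: in a pro-active flush test of the form `p_len > (limit - if_mtu)` (the shape used in tcp_lro.c), a length limit below the MTU makes the subtraction wrap around to a huge value, so the entry would never be flushed early. `lro_lenlim_valid()` and `would_flush_early()` are hypothetical helpers; the lower bound of two MTUs mirrors `HN_LRO_LENLIM_MIN` from the later D5185 revision, and `TCP_LRO_LENGTH_MAX` is the 65535 cap defined there.

```c
#include <assert.h>
#include <stdbool.h>

#define TCP_LRO_LENGTH_MAX 65535 /* 16-bit cap, as in D5185's tcp_lro.h */

/*
 * Hypothetical validator for a driver-supplied LRO length limit:
 * at least two MTUs (like HN_LRO_LENLIM_MIN) and at most the 16-bit cap.
 */
static bool
lro_lenlim_valid(unsigned long lenlim, unsigned long mtu)
{
	assert(mtu > 0);
	return (lenlim >= 2 * mtu && lenlim <= TCP_LRO_LENGTH_MAX);
}

/*
 * Same shape as the pro-active flush test "p_len > (limit - if_mtu)".
 * With an unsigned MTU, limit < mtu makes the right-hand side wrap to a
 * huge value, so this returns false for any realistic p_len.
 */
static bool
would_flush_early(unsigned long p_len, unsigned short lenlim, unsigned long mtu)
{
	return (p_len > lenlim - mtu);
}
```

Rejecting limits below `2 * if_mtu` up front keeps the subtraction from ever going negative.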

REVISION DETAIL
  https://reviews.freebsd.org/D4825

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, network, transport, adrian, delphij, 
decui_microsoft.com, honzhan_microsoft.com, howard0su_gmail.com, glebius
Cc: hselasky, np, freebsd-net-list

[Differential] [Abandoned] D4825: tcp/lro: Add network driver configurable LRO entry depth

2016-02-04 Thread sepherosa_gmail.com (Sepherosa Ziehau)

sepherosa_gmail.com abandoned this revision.
sepherosa_gmail.com added a comment.


  Updated version at:
  https://reviews.freebsd.org/D5185

REVISION DETAIL
  https://reviews.freebsd.org/D4825


To: sepherosa_gmail.com, network, transport, adrian, delphij, 
decui_microsoft.com, honzhan_microsoft.com, howard0su_gmail.com, glebius
Cc: hselasky, np, freebsd-net-list

[Differential] [Commented On] D4825: tcp/lro: Add network driver configurable LRO entry depth

2016-02-04 Thread sepherosa_gmail.com (Sepherosa Ziehau)

sepherosa_gmail.com added a comment.


  In https://reviews.freebsd.org/D4825#110653, @hselasky wrote:
  
  > FYI
  >
  > https://reviews.freebsd.org/D1761 might be related to this one.
  >
  > Should you check that "lc->lro_hiwat" is greater or equal to "lc->ifp->if_mtu" ?
  >
  > --HPS
  
  
  I have discarded this one; please take a look at this instead:
  https://reviews.freebsd.org/D5185
  
  Thanks,
  sephe

REVISION DETAIL
  https://reviews.freebsd.org/D4825


To: sepherosa_gmail.com, network, transport, adrian, delphij, 
decui_microsoft.com, honzhan_microsoft.com, howard0su_gmail.com, glebius
Cc: hselasky, np, freebsd-net-list

Re: swaping ring slots between NIC ring and Host ring does not always success

2016-02-04 Thread Xiaoye Sun

Yes, all the interfaces are up. Are you able to get the ARP request when the
interfaces are down?

On Thursday, February 4, 2016, Victor Detoni  wrote:

> Both interfaces are up? Like ifconfig... up
>
> I had this the same problem and I solve with commands above
>
> On Thursday, February 4, 2016, Xiaoye Sun wrote:
>
>> Hi Luigi,
>> [...]

[Differential] [Updated, 114 lines] D5185: tcp/lro: Allow network drivers to set the limit for TCP ACK/data segment aggregation limit

2016-02-04 Thread sepherosa_gmail.com (Sepherosa Ziehau)

sepherosa_gmail.com updated the summary for this revision.
sepherosa_gmail.com updated this revision to Diff 13028.
sepherosa_gmail.com added a comment.


  Address gallatin's and adrian's concerns.

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5185?vs=12995=13028

REVISION DETAIL
  https://reviews.freebsd.org/D5185

AFFECTED FILES
  sys/dev/hyperv/netvsc/hv_net_vsc.h
  sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  sys/netinet/tcp_lro.c
  sys/netinet/tcp_lro.h


To: sepherosa_gmail.com, network, adrian, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, hselasky, np, gallatin, transport
Cc: freebsd-virtualization-list, freebsd-net-list
diff --git a/sys/netinet/tcp_lro.h b/sys/netinet/tcp_lro.h
--- a/sys/netinet/tcp_lro.h
+++ b/sys/netinet/tcp_lro.h
@@ -91,11 +91,16 @@
 	unsigned	lro_cnt;
 	unsigned	lro_mbuf_count;
 	unsigned	lro_mbuf_max;
+	unsigned short	lro_ackcnt_lim;		/* max # of aggregated ACKs */
+	unsigned short	lro_length_lim;		/* max len of aggregated data */
 
 	struct lro_head	lro_active;
 	struct lro_head	lro_free;
 };
 
+#define	TCP_LRO_LENGTH_MAX	65535
+#define	TCP_LRO_ACKCNT_MAX	65535		/* unlimited */
+
 int tcp_lro_init(struct lro_ctrl *);
 int tcp_lro_init_args(struct lro_ctrl *, struct ifnet *, unsigned, unsigned);
 void tcp_lro_free(struct lro_ctrl *);
diff --git a/sys/netinet/tcp_lro.c b/sys/netinet/tcp_lro.c
--- a/sys/netinet/tcp_lro.c
+++ b/sys/netinet/tcp_lro.c
@@ -87,6 +87,8 @@
 	lc->lro_mbuf_count = 0;
 	lc->lro_mbuf_max = lro_mbufs;
 	lc->lro_cnt = lro_entries;
+	lc->lro_ackcnt_lim = TCP_LRO_ACKCNT_MAX;
+	lc->lro_length_lim = TCP_LRO_LENGTH_MAX;
 	lc->ifp = ifp;
 	SLIST_INIT(&lc->lro_free);
 	SLIST_INIT(&lc->lro_active);
@@ -608,7 +610,7 @@
 		}
 
 		/* Flush now if appending will result in overflow. */
-		if (le->p_len > (65535 - tcp_data_len)) {
+		if (le->p_len > (lc->lro_length_lim - tcp_data_len)) {
 			SLIST_REMOVE(&lc->lro_active, le, lro_entry, next);
 			tcp_lro_flush(lc, le);
 			break;
@@ -646,6 +648,15 @@
 
 		if (tcp_data_len == 0) {
 			m_freem(m);
+			/*
+			 * Flush this LRO entry, if this ACK should not
+			 * be further delayed.
+			 */
+			if (le->append_cnt >= lc->lro_ackcnt_lim) {
+				SLIST_REMOVE(&lc->lro_active, le, lro_entry,
+				    next);
+				tcp_lro_flush(lc, le);
+			}
 			return (0);
 		}
 
@@ -666,7 +677,7 @@
 		 * If a possible next full length packet would cause an
 		 * overflow, pro-actively flush now.
 		 */
-		if (le->p_len > (65535 - lc->ifp->if_mtu)) {
+		if (le->p_len > (lc->lro_length_lim - lc->ifp->if_mtu)) {
 			SLIST_REMOVE(&lc->lro_active, le, lro_entry, next);
 			tcp_lro_flush(lc, le);
 		} else
diff --git a/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c b/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
--- a/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
+++ b/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
@@ -176,14 +176,11 @@
 #define HN_CSUM_ASSIST_WIN8	(CSUM_TCP)
 #define HN_CSUM_ASSIST		(CSUM_IP | CSUM_UDP | CSUM_TCP)
 
-/* XXX move to netinet/tcp_lro.h */
-#define HN_LRO_HIWAT_MAX		65535
-#define HN_LRO_HIWAT_DEF		HN_LRO_HIWAT_MAX
+#define HN_LRO_LENLIM_DEF		(25 * ETHERMTU)
 /* YYY 2*MTU is a bit rough, but should be good enough. */
-#define HN_LRO_HIWAT_MTULIM(ifp)			(2 * (ifp)->if_mtu)
-#define HN_LRO_HIWAT_ISVALID(sc, hiwat)			\
-	((hiwat) >= HN_LRO_HIWAT_MTULIM((sc)->hn_ifp) ||	\
-	 (hiwat) <= HN_LRO_HIWAT_MAX)
+#define HN_LRO_LENLIM_MIN(ifp)		(2 * (ifp)->if_mtu)
+
+#define HN_LRO_ACKCNT_DEF		1
 
 /*
  * Be aware that this sleepable mutex will exhibit WITNESS errors when
@@ -253,9 +250,8 @@
 static void hn_start_txeof(struct ifnet *ifp);
 static int hn_ifmedia_upd(struct ifnet *ifp);
 static void hn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr);
-#ifdef HN_LRO_HIWAT
-static int hn_lro_hiwat_sysctl(SYSCTL_HANDLER_ARGS);
-#endif
+static int hn_lro_lenlim_sysctl(SYSCTL_HANDLER_ARGS);
+static int hn_lro_ackcnt_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_trust_hcsum_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_tx_chimney_size_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_check_iplen(const struct mbuf *, int);
@@ -265,15 +261,6 @@
 static void hn_txeof_taskfunc(void *xsc, int pending);
 static int hn_encap(struct hn_softc *, struct hn_txdesc *, struct mbuf **);
 
-static __inline void
-hn_set_lro_hiwat(struct hn_softc *sc, int hiwat)
-{
-	sc->hn_lro_hiwat = hiwat;
-#ifdef HN_LRO_HIWAT
-	sc->hn_lro.lro_hiwat = sc->hn_lro_hiwat;
-#endif
-}
-
 static int
 hn_ifmedia_upd(struct ifnet *ifp __unused)
 {
@@ -358,7 +345,6 @@
 	bzero(sc, sizeof(hn_softc_t));
 	sc->hn_unit = unit;
 	sc->hn_dev = dev;
-	sc->hn_lro_hiwat = HN_LRO_HIWAT_DEF;
 	sc->hn_direct_tx_size = hn_direct_tx_size;
 	if (hn_trust_hosttcp)
 		sc->hn_trust_hcsum |= HN_TRUST_HCSUM_TCP;
@@ -442,9 +428,8 @@
 	/* Driver private LRO settings */
 	sc->hn_lro.ifp = ifp;
 #endif
-#ifdef HN_LRO_HIWAT
-	sc->hn_lro.lro_hiwat = sc->hn_lro_hiwat;
-#endif
+
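The two limits introduced by this diff can be modeled in isolation as below. This is a user-space sketch, not the kernel code: `struct sim_lro_ctrl`, `struct sim_lro_entry` and the helper functions are hypothetical, only the field names and the `TCP_LRO_*` constants come from the diff, and the comparisons are rearranged as additions (`p_len + len > limit` instead of `p_len > limit - len`) so they cannot wrap. It assumes, as in tcp_lro.c of that period, that `append_cnt` has already been incremented for a pure ACK when the ACK-count check runs; with the driver default `HN_LRO_ACKCNT_DEF` of 1, such an entry is then flushed immediately.

```c
#include <assert.h>
#include <stdbool.h>

#define TCP_LRO_LENGTH_MAX 65535 /* from the diff */
#define TCP_LRO_ACKCNT_MAX 65535 /* from the diff: effectively unlimited */

/* Hypothetical, trimmed-down stand-ins for struct lro_ctrl / lro_entry. */
struct sim_lro_ctrl {
	unsigned short lro_ackcnt_lim; /* max # of aggregated ACKs */
	unsigned short lro_length_lim; /* max len of aggregated data */
};

struct sim_lro_entry {
	unsigned long p_len;  /* bytes aggregated so far */
	unsigned append_cnt;  /* segments merged into this entry so far */
};

/* Defaults, as the diff sets them in tcp_lro_init_args(). */
static void
sim_lro_init(struct sim_lro_ctrl *lc)
{
	lc->lro_ackcnt_lim = TCP_LRO_ACKCNT_MAX;
	lc->lro_length_lim = TCP_LRO_LENGTH_MAX;
}

/* Data segment: flush first if appending it would exceed the length limit. */
static bool
flush_before_append(const struct sim_lro_ctrl *lc,
    const struct sim_lro_entry *le, unsigned long tcp_data_len)
{
	return (le->p_len + tcp_data_len > lc->lro_length_lim);
}

/* After appending: flush early if another full-MTU segment could not fit. */
static bool
flush_proactively(const struct sim_lro_ctrl *lc,
    const struct sim_lro_entry *le, unsigned long mtu)
{
	return (le->p_len + mtu > lc->lro_length_lim);
}

/* Pure ACK already merged (append_cnt incremented): flush at the ACK limit. */
static bool
flush_after_ack(const struct sim_lro_ctrl *lc, const struct sim_lro_entry *le)
{
	return (le->append_cnt >= lc->lro_ackcnt_lim);
}
```

For example, with a length limit of `25 * 1500` (the value of `HN_LRO_LENLIM_DEF`) and 36500 bytes already aggregated, a 1448-byte segment forces a flush while a 1000-byte one still fits exactly.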

[Differential] [Commented On] D5185: tcp/lro: Allow network drivers to set the limit for TCP ACK/data segment aggregation limit

2016-02-04 Thread sepherosa_gmail.com (Sepherosa Ziehau)

sepherosa_gmail.com added a comment.


  I will adjust the patch accordingly.

INLINE COMMENTS
  sys/netinet/tcp_lro.c:655 Sure :)
  sys/netinet/tcp_lro.c:684 Sounds fine to me.  I did the byte limit before
  (https://reviews.freebsd.org/D4825), but it turns out that ACKs need a
  separate limit (based on the append count).  To make them consistent, I
  changed the original patch to use the append count too.

REVISION DETAIL
  https://reviews.freebsd.org/D5185

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, network, adrian, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, hselasky, np, transport, gallatin
Cc: freebsd-virtualization-list, freebsd-net-list

[Bug 206932] Realtek 8111 card stops responding under high load in netmap mode

2016-02-04 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206932

Olivier - interfaSys sàrl  changed:

   What|Removed |Added

Version|10.2-STABLE |11.0-CURRENT

--- Comment #1 from Olivier - interfaSys sàrl ---
I've just tested on 11-CURRENT and got the same results.


Re: swaping ring slots between NIC ring and Host ring does not always success

2016-02-04 Thread Victor Detoni

I'm sorry, I made a mistake. To work around this, try
`ip link set $IFACE promisc on`.
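A plausible explanation for the one-way behavior, consistent with the symptoms in this thread but not confirmed in it: a NIC in netmap mode still applies its own unicast MAC filter, so the broadcast ARP request reaches bridge.c, while the unicast ARP reply (addressed to machine1's MAC, not to eth3's) is dropped by eth3 in hardware. Promiscuous mode disables that filter. The sketch below sets the same `IFF_PROMISC` interface flag programmatically with the classic Linux `SIOCGIFFLAGS`/`SIOCSIFFLAGS` ioctls; `set_promisc()` is a hypothetical helper name, and changing interface flags normally requires root or `CAP_NET_ADMIN`.

```c
#define _DEFAULT_SOURCE /* expose struct ifreq in glibc's <net/if.h> */
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: turn IFF_PROMISC on or off for 'ifname' (Linux).
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
static int
set_promisc(const char *ifname, int on)
{
	struct ifreq ifr;
	int fd, ret = -1;

	assert(ifname != NULL);
	if (strlen(ifname) >= IFNAMSIZ)
		return -1; /* name would not fit in ifr_name */
	fd = socket(AF_INET, SOCK_DGRAM, 0); /* any socket works for these ioctls */
	if (fd < 0)
		return -1;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) { /* read current flags */
		if (on)
			ifr.ifr_flags |= IFF_PROMISC;
		else
			ifr.ifr_flags &= ~IFF_PROMISC;
		ret = ioctl(fd, SIOCSIFFLAGS, &ifr); /* write them back */
	}
	close(fd);
	return ret;
}
```

In the three-machine setup above, both eth2 and eth3 on machine2 would need the flag, since each one must accept frames addressed to the host on the far side.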



On Thu, Feb 4, 2016 at 10:04 PM, Xiaoye Sun  wrote:

> Yes. all the interfaces are up. Are you able to get ARP request when the
> interfaces are down?
>
>
> On Thursday, February 4, 2016, Victor Detoni 
> wrote:
>
>> Both interfaces are up? Like ifconfig... up
>>
>> I had this the same problem and I solve with commands above
>>
>> On Thursday, February 4, 2016, Xiaoye Sun wrote:
>>
>>> Hi Luigi,
>>> [...]

[Bug 206904] tailq crash/nd inet6

2016-02-04 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206904

--- Comment #2 from Larry Rosenman  ---
Created attachment 166582
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=166582=edit
another one

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

[Bug 206904] tailq crash/nd inet6

2016-02-04 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206904

--- Comment #3 from Larry Rosenman  ---
Created attachment 166583
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=166583=edit
and a 3rd

[Bug 206904] tailq crash/nd inet6

2016-02-04 Thread bugzilla-noreply

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206904

--- Comment #4 from Larry Rosenman  ---
The vmcores are ALL available, and I can give a @FreeBSD.org developer access.

dev.netmap.buf_size and packett size from host

2016-02-04 Thread Eduardo Meyer

hello,

I have a netmap application that does host-mode bridging/forwarding. With
default settings I get the following error quite often:

884.260394 [2950] netmap_transmit   igb1 from_host, drop packet
size 2962 > 2048

The only application that relies on host mode is bird, so those packets
are probably from the bird daemon. When I get those errors, bird sessions
fail and restart.

I raised dev.netmap.buf_size to 5000 (it was adjusted to 5120). Things got
better, but I still get log entries:

netmap_transmit   igb1 from_host, drop packet size 5858 > 5120

Now the main question: when dev.netmap.buf_size is 2048 the application
uses 1.3G of RAM, but when I raise it to 5120 it uses 3G of RAM.

So I need to understand: is this packet size really related to the
application packets coming from the host to netmap? If so, can I allow
bigger sizes, like 16k (the lo0 MTU), without pre-allocating so much more RAM?

thank you

E. Meyer

Re: dev.netmap.buf_size and packett size from host

2016-02-04 Thread Luigi Rizzo

Make sure you disable TSO on the interface used in netmap
mode, and then check that you use an MTU of 1500 on that
interface.
You should not receive frames larger than MTU coming from
the host in these conditions.

cheers
luigi


-- 
-+---
 Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/. Universita` di Pisa
 TEL  +39-050-2217533   . via Diotisalvi 2
 Mobile   +39-338-6809875   . 56122 PISA (Italy)
-+---

RE: ixgbe: Network performance tuning (#TCP connections)

2016-02-04 Thread Hongjiang Zhang

Did you enable LRO on the FreeBSD side (check the 'ifconfig' output)? Linux
enables GRO by default (see the output of 'ethtool -k eth0').

Thanks
Hongjiang Zhang

-Original Message-
From: owner-freebsd-...@freebsd.org [mailto:owner-freebsd-...@freebsd.org] On 
Behalf Of Meyer, Wolfgang
Sent: Wednesday, February 3, 2016 9:37 PM
To: 'freebsd-net@FreeBSD.org' 
Cc: 'freebsd-performa...@freebsd.org' 
Subject: ixgbe: Network performance tuning (#TCP connections)

Hello,

we are evaluating network performance on a Dell server (PowerEdge R930 with 4
sockets, hw.model: Intel(R) Xeon(R) CPU E7-8891 v3 @ 2.80GHz) with 10 GbE cards.
We use programs that, on the server side, accept connections on an IP address
and port from the client side. Once a connection is established, data is sent
in turns between server and client in a predefined pattern (the server side
sends more data than the client side), with sleeps between the send phases.
The test set-up is chosen so that every client process initiates 500
connections handled in threads, and on the server side each process,
representing one IP/port pair, also handles 500 connections in threads.

The number of connections is then increased and the overall network throughput
is observed using nload. On FreeBSD (on the server side), errors begin to occur
at roughly 50,000 connections, and the overall throughput does not increase
further with more connections. With Linux on the server side it is possible to
establish more than 120,000 connections, and at 50,000 connections the overall
throughput is double that of FreeBSD with the same sending pattern.
Furthermore, the system load on FreeBSD is much higher, with 50 % system usage
on each core and 80 % interrupt usage on the 8 cores handling the NIC interrupt
queues. In comparison, Linux has <10 % system usage, <10 % user usage and about
15 % interrupt usage on the 16 cores handling the network interrupts for 50,000
connections.

Varying the number of NIC interrupt queues doesn't improve the performance (it
rather worsens the situation). Disabling Hyperthreading (utilising 40 cores)
degrades the performance. Increasing MAXCPU to utilise all 80 cores brings no
improvement over 64 cores; atkbd and uart had to be disabled to avoid kernel
panics with the increased MAXCPU (thanks to Andre Oppermann for investigating
this). Initially the tests were made on 10.2-RELEASE; later I switched to
10-STABLE (later with ixgbe driver version 3.1.0), but that didn't change the
numbers.

Some sysctl configurables were modified following the network performance
guidelines found on the net (e.g.
https://calomel.org/freebsd_network_tuning.html,
https://www.freebsd.org/doc/handbook/configtuning-kernel-limits.html,
https://pleiades.ucsc.edu/hyades/FreeBSD_Network_Tuning), but most of them
didn't have any measurable impact. The final sysctl.conf and loader.conf
settings are listed below. Actually, the only tunables that provided any
improvement were hw.ix.txd and hw.ix.rxd, which were reduced (!) to the minimum
value of 64, and hw.ix.tx_process_limit and hw.ix.rx_process_limit, which were
set to -1.

Any ideas which tunables might be changed to get a higher number of TCP
connections? (It's not a question of the overall throughput, as changing the
sending pattern allows me to fully utilise the 10Gb bandwidth.) How can I
determine where the kernel is spending the time that causes the high CPU load?
Any pointers are highly appreciated; I can't believe that there is such a
blatant difference in network performance compared to Linux.

Regards,
Wolfgang

loader.conf:
cc_htcp_load="YES"
hw.ix.txd="64"
hw.ix.rxd="64"
hw.ix.tx_process_limit="-1"
hw.ix.rx_process_limit="-1"
hw.ix.num_queues="8"
#hw.ix.enable_aim="0"
#hw.ix.max_interrupt_rate="31250"

#net.isr.maxthreads="16"

sysctl.conf:
kern.ipc.soacceptqueue=1024

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216

net.inet.tcp.tso=0
net.inet.tcp.mssdflt=1460
net.inet.tcp.minmss=1300

net.inet.tcp.nolocaltimewait=1
net.inet.tcp.syncache.rexmtlimit=0

#net.inet.tcp.syncookies=0
net.inet.tcp.drop_synfin=1
net.inet.tcp.fast_finwait2_recycle=1

net.inet.tcp.icmp_may_rst=0
net.inet.tcp.msl=5000
net.inet.tcp.path_mtu_discovery=0

Re: dev.netmap.buf_size and packett size from host

2016-02-04 Thread Eduardo Meyer

The MTU is fine; TSO was on. Thank you, I will retest right now.

Which other port features should I disable? Before, I had only disabled txcsum
and rxcsum; now TSO is on the list too. Anything else for netmap mode?

-- 
===
Eduardo Meyer
pessoal: dudu.me...@gmail.com
profissional: ddm.farmac...@saude.gov.br

RE: ixgbe: Network performance tuning (#TCP connections)

2016-02-04 Thread Hongjiang Zhang

Please check whether LRO is enabled on your FreeBSD server with "ifconfig".
Linux enables GRO by default (see the output of 'ethtool -k eth0'), which
covers the LRO optimization.

Thanks
Hongjiang Zhang

[Differential] [Closed] D5159: hyperv/hn: Recover half of the chimney sending space

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295303: hyperv/hn: Recover half of the chimney sending space 
(authored by sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5159?vs=12926=13037#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5159?vs=12926=13037

REVISION DETAIL
  https://reviews.freebsd.org/D5159

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_net_vsc.c

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, adrian, network
Cc: freebsd-net-list
diff --git a/head/sys/dev/hyperv/netvsc/hv_net_vsc.c b/head/sys/dev/hyperv/netvsc/hv_net_vsc.c
--- a/head/sys/dev/hyperv/netvsc/hv_net_vsc.c
+++ b/head/sys/dev/hyperv/netvsc/hv_net_vsc.c
@@ -136,15 +136,15 @@
 	int i;
 
 	for (i = 0; i < bitsmap_words; i++) {
-		idx = ffs(~bitsmap[i]);
+		idx = ffsl(~bitsmap[i]);
 		if (0 == idx)
 			continue;
 
 		idx--;
-		if (i * BITS_PER_LONG + idx >= net_dev->send_section_count)
-			return (ret);
+		KASSERT(i * BITS_PER_LONG + idx < net_dev->send_section_count,
+		("invalid i %d and idx %lu", i, idx));
 
-		if (synch_test_and_set_bit(idx, &bitsmap[i]))
+		if (atomic_testandset_long(&bitsmap[i], idx))
 			continue;
 
 		ret = i * BITS_PER_LONG + idx;
@@ -789,8 +789,27 @@
 		if (NULL != net_vsc_pkt) {
 			if (net_vsc_pkt->send_buf_section_idx !=
 			NVSP_1_CHIMNEY_SEND_INVALID_SECTION_INDEX) {
-				synch_change_bit(net_vsc_pkt->send_buf_section_idx,
-				    net_dev->send_section_bitsmap);
+				u_long mask;
+				int idx;
+
+				idx = net_vsc_pkt->send_buf_section_idx /
+				    BITS_PER_LONG;
+				KASSERT(idx < net_dev->bitsmap_words,
+				    ("invalid section index %u",
+				     net_vsc_pkt->send_buf_section_idx));
+				mask = 1UL <<
+				    (net_vsc_pkt->send_buf_section_idx %
+				     BITS_PER_LONG);
+
+				KASSERT(net_dev->send_section_bitsmap[idx] &
+				    mask,
+				    ("index bitmap 0x%lx, section index %u, "
+				     "bitmap idx %d, bitmask 0x%lx",
+				     net_dev->send_section_bitsmap[idx],
+				     net_vsc_pkt->send_buf_section_idx,
+				     idx, mask));
+				atomic_clear_long(
+				    &net_dev->send_section_bitsmap[idx], mask);
 			}
 			
 			/* Notify the layer above us */


[Differential] [Closed] D5166: hyperv/hn: Increase LRO entry count to 128 by default

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295304: hyperv/hn: Increase LRO entry count to 128 by 
default (authored by sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5166?vs=12947=13038#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5166?vs=12947=13038

REVISION DETAIL
  https://reviews.freebsd.org/D5166

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, adrian, network
Cc: freebsd-net-list
diff --git a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
--- a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
+++ b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
@@ -132,6 +132,8 @@
 /* YYY should get it from the underlying channel */
 #define HN_TX_DESC_CNT			512
 
+#define HN_LROENT_CNT_DEF		128
+
 #define HN_RNDIS_MSG_LEN		\
 (sizeof(rndis_msg) +		\
  RNDIS_VLAN_PPI_SIZE +		\
@@ -232,6 +234,13 @@
 static int hn_direct_tx_size = HN_DIRECT_TX_SIZE_DEF;
 TUNABLE_INT("dev.hn.direct_tx_size", &hn_direct_tx_size);
 
+#if defined(INET) || defined(INET6)
+#if __FreeBSD_version >= 1100095
+static int hn_lro_entry_count = HN_LROENT_CNT_DEF;
+TUNABLE_INT("dev.hn.lro_entry_count", &hn_lro_entry_count);
+#endif
+#endif
+
 /*
  * Forward declarations
  */
@@ -335,6 +344,11 @@
 #if __FreeBSD_version >= 1100045
 	int tso_maxlen;
 #endif
+#if defined(INET) || defined(INET6)
+#if __FreeBSD_version >= 1100095
+	int lroent_cnt;
+#endif
+#endif
 
 	sc = device_get_softc(dev);
 	if (sc == NULL) {
@@ -417,9 +431,17 @@
 	}
 
 #if defined(INET) || defined(INET6)
+#if __FreeBSD_version >= 1100095
+	lroent_cnt = hn_lro_entry_count;
+	if (lroent_cnt < TCP_LRO_ENTRIES)
+		lroent_cnt = TCP_LRO_ENTRIES;
+	tcp_lro_init_args(&sc->hn_lro, ifp, lroent_cnt, 0);
+	device_printf(dev, "LRO: entry count %d\n", lroent_cnt);
+#else
 	tcp_lro_init(&sc->hn_lro);
 	/* Driver private LRO settings */
 	sc->hn_lro.ifp = ifp;
+#endif
 #ifdef HN_LRO_HIWAT
 	sc->hn_lro.lro_hiwat = sc->hn_lro_hiwat;
 #endif
@@ -547,6 +569,12 @@
 		SYSCTL_ADD_INT(dc_ctx, dc_child, OID_AUTO, "direct_tx_size",
 		CTLFLAG_RD, &hn_direct_tx_size, 0,
 		"Size of the packet for direct transmission");
+#if defined(INET) || defined(INET6)
+#if __FreeBSD_version >= 1100095
+		SYSCTL_ADD_INT(dc_ctx, dc_child, OID_AUTO, "lro_entry_count",
+		CTLFLAG_RD, &hn_lro_entry_count, 0, "LRO entry count");
+#endif
+#endif
 	}
 
 	return (0);

[Differential] [Closed] D5085: hyperv/hn: Avoid duplicate csum features settings

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295296: hyperv/hn: Avoid duplicate csum features settings 
(authored by sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5085?vs=12744=13030#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5085?vs=12744=13030

REVISION DETAIL
  https://reviews.freebsd.org/D5085

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
howard0su_gmail.com, honzhan_microsoft.com, adrian, network
Cc: freebsd-net-list
diff --git a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
--- a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
+++ b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
@@ -176,6 +176,14 @@
 CSUM_IP_ISCSI|CSUM_IP6_UDP|CSUM_IP6_TCP|CSUM_IP6_SCTP|		\
 CSUM_IP6_TSO|CSUM_IP6_ISCSI)
 
+/*
+ * Only enable UDP checksum offloading when it is on 2012R2 or
+ * later.  UDP checksum offloading doesn't work on earlier
+ * Windows releases.
+ */
+#define HN_CSUM_ASSIST_WIN8	(CSUM_TCP)
+#define HN_CSUM_ASSIST		(CSUM_UDP | CSUM_TCP)
+
 /* XXX move to netinet/tcp_lro.h */
 #define HN_LRO_HIWAT_MAX	65535
 #define HN_LRO_HIWAT_DEF	HN_LRO_HIWAT_MAX
@@ -444,15 +452,12 @@
 	ifp->if_capenable |=
 	IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | IFCAP_TSO |
 	IFCAP_LRO;
-	/*
-	 * Only enable UDP checksum offloading when it is on 2012R2 or
-	 * later. UDP checksum offloading doesn't work on earlier
-	 * Windows releases.
-	 */
+
 	if (hv_vmbus_protocal_version >= HV_VMBUS_VERSION_WIN8_1)
-		ifp->if_hwassist = CSUM_TCP | CSUM_UDP | CSUM_TSO;
+		sc->hn_csum_assist = HN_CSUM_ASSIST;
 	else
-		ifp->if_hwassist = CSUM_TCP | CSUM_TSO;
+		sc->hn_csum_assist = HN_CSUM_ASSIST_WIN8;
+	ifp->if_hwassist = sc->hn_csum_assist | CSUM_TSO;
 
 	error = hv_rf_on_device_add(device_ctx, &device_info);
 	if (error)
@@ -1506,47 +1511,40 @@
 		error = 0;
 		break;
 	case SIOCSIFCAP:
+		NV_LOCK(sc);
+
 		mask = ifr->ifr_reqcap ^ ifp->if_capenable;
 		if (mask & IFCAP_TXCSUM) {
-			if (IFCAP_TXCSUM & ifp->if_capenable) {
-				ifp->if_capenable &= ~IFCAP_TXCSUM;
-				ifp->if_hwassist &= ~(CSUM_TCP | CSUM_UDP);
-			} else {
-				ifp->if_capenable |= IFCAP_TXCSUM;
-				/*
-				 * Only enable UDP checksum offloading on
-				 * Windows Server 2012R2 or later releases.
-				 */
-				if (hv_vmbus_protocal_version >=
-				    HV_VMBUS_VERSION_WIN8_1) {
-					ifp->if_hwassist |=
-					    (CSUM_TCP | CSUM_UDP);
-				} else {
-					ifp->if_hwassist |= CSUM_TCP;
-				}
-			}
+			ifp->if_capenable ^= IFCAP_TXCSUM;
+			if (ifp->if_capenable & IFCAP_TXCSUM)
+				ifp->if_hwassist |= sc->hn_csum_assist;
+			else
+				ifp->if_hwassist &= ~sc->hn_csum_assist;
 		}
 
-		if (mask & IFCAP_RXCSUM) {
-			if (IFCAP_RXCSUM & ifp->if_capenable) {
-				ifp->if_capenable &= ~IFCAP_RXCSUM;
-			} else {
-				ifp->if_capenable |= IFCAP_RXCSUM;
-			}
-		}
+		if (mask & IFCAP_RXCSUM)
+			ifp->if_capenable ^= IFCAP_RXCSUM;
+
 		if (mask & IFCAP_LRO)
 			ifp->if_capenable ^= IFCAP_LRO;
 
 		if (mask & IFCAP_TSO4) {
 			ifp->if_capenable ^= IFCAP_TSO4;
-			ifp->if_hwassist ^= CSUM_IP_TSO;
+			if (ifp->if_capenable & IFCAP_TSO4)
+				ifp->if_hwassist |= CSUM_IP_TSO;
+			else
+				ifp->if_hwassist &= ~CSUM_IP_TSO;
 		}
 
 		if (mask & IFCAP_TSO6) {
 			ifp->if_capenable ^= IFCAP_TSO6;
-			ifp->if_hwassist ^= CSUM_IP6_TSO;
+			if (ifp->if_capenable & IFCAP_TSO6)
+				ifp->if_hwassist |= CSUM_IP6_TSO;
+			else
+				ifp->if_hwassist &= ~CSUM_IP6_TSO;
 		}
 
+		NV_UNLOCK(sc);
 		error = 0;
 		break;
 	case SIOCADDMULTI:
diff --git a/head/sys/dev/hyperv/netvsc/hv_net_vsc.h b/head/sys/dev/hyperv/netvsc/hv_net_vsc.h
--- a/head/sys/dev/hyperv/netvsc/hv_net_vsc.h
+++ b/head/sys/dev/hyperv/netvsc/hv_net_vsc.h
@@ -1043,6 +1043,8 @@
 	u_long		hn_txdma_failed;
 	u_long		hn_tx_collapsed;
 	u_long		hn_tx_chimney;
+
+	uint64_t	hn_csum_assist;
 } hn_softc_t;
 
 


[Differential] [Closed] D5104: hyperv/hn: Obey IFCAP_RXCSUM

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295301: hyperv/hn: Obey IFCAP_RXCSUM configure (authored by 
sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5104?vs=12783=13035#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5104?vs=12783=13035

REVISION DETAIL
  https://reviews.freebsd.org/D5104

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c

CHANGE DETAILS
  diff --git a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c 
b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  --- a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  +++ b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  @@ -1142,7 +1142,7 @@
struct mbuf *m_new;
struct ifnet *ifp;
device_t dev = device_ctx->device;
  - int size, do_lro = 0;
  + int size, do_lro = 0, do_csum = 1;
   
if (sc == NULL) {
return (0); /* TODO: KYS how can this be! */
  @@ -1190,18 +1190,21 @@
}
m_new->m_pkthdr.rcvif = ifp;
   
  + if (__predict_false((ifp->if_capenable & IFCAP_RXCSUM) == 0))
  + do_csum = 0;
  +
/* receive side checksum offload */
if (csum_info != NULL) {
/* IP csum offload */
  - if (csum_info->receive.ip_csum_succeeded) {
  + if (csum_info->receive.ip_csum_succeeded && do_csum) {
m_new->m_pkthdr.csum_flags |=
(CSUM_IP_CHECKED | CSUM_IP_VALID);
sc->hn_csum_ip++;
}
   
/* TCP/UDP csum offload */
  - if (csum_info->receive.tcp_csum_succeeded ||
  - csum_info->receive.udp_csum_succeeded) {
  + if ((csum_info->receive.tcp_csum_succeeded ||
  +  csum_info->receive.udp_csum_succeeded) && do_csum) {
m_new->m_pkthdr.csum_flags |=
(CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
m_new->m_pkthdr.csum_data = 0xffff;
  @@ -1239,7 +1242,8 @@
   
pr = hn_check_iplen(m_new, hoff);
if (pr == IPPROTO_TCP) {
  - if (sc->hn_trust_hcsum & HN_TRUST_HCSUM_TCP) {
  + if (do_csum &&
  + (sc->hn_trust_hcsum & HN_TRUST_HCSUM_TCP)) {
sc->hn_csum_trusted++;
m_new->m_pkthdr.csum_flags |=
   (CSUM_IP_CHECKED | CSUM_IP_VALID |
  @@ -1249,14 +1253,15 @@
/* Rely on SW csum verification though... */
do_lro = 1;
} else if (pr == IPPROTO_UDP) {
  - if (sc->hn_trust_hcsum & HN_TRUST_HCSUM_UDP) {
  + if (do_csum &&
  + (sc->hn_trust_hcsum & HN_TRUST_HCSUM_UDP)) {
sc->hn_csum_trusted++;
m_new->m_pkthdr.csum_flags |=
   (CSUM_IP_CHECKED | CSUM_IP_VALID |
CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
m_new->m_pkthdr.csum_data = 0xffff;
}
  - } else if (pr != IPPROTO_DONE &&
  + } else if (pr != IPPROTO_DONE && do_csum &&
(sc->hn_trust_hcsum & HN_TRUST_HCSUM_IP)) {
sc->hn_csum_trusted++;
m_new->m_pkthdr.csum_flags |=

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, adrian, network
Cc: freebsd-net-list

[Differential] [Closed] D5098: hyperv/hn: Reorganize TX csum offloading

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295297: hyperv/hn: Reorganize TX csum offloading (authored 
by sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5098?vs=12774=13031#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5098?vs=12774=13031

REVISION DETAIL
  https://reviews.freebsd.org/D5098

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
howard0su_gmail.com, honzhan_microsoft.com, adrian, network
Cc: freebsd-net-list
diff --git a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
--- a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
+++ b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
@@ -167,16 +167,6 @@
 #define HN_TXD_FLAG_DMAMAP	0x2
 
 /*
- * A unified flag for all outbound check sum flags is useful,
- * and it helps avoiding unnecessary check sum calculation in
- * network forwarding scenario.
- */
-#define HV_CSUM_FOR_OUTBOUND		\
-(CSUM_IP|CSUM_IP_UDP|CSUM_IP_TCP|CSUM_IP_SCTP|CSUM_IP_TSO|		\
-CSUM_IP_ISCSI|CSUM_IP6_UDP|CSUM_IP6_TCP|CSUM_IP6_SCTP|		\
-CSUM_IP6_TSO|CSUM_IP6_ISCSI)
-
-/*
  * Only enable UDP checksum offloading when it is on 2012R2 or
  * later.  UDP checksum offloading doesn't work on earlier
  * Windows releases.
@@ -265,62 +255,6 @@
 #endif
 }
 
-/*
- * NetVsc get message transport protocol type 
- */
-static uint32_t get_transport_proto_type(struct mbuf *m_head)
-{
-	uint32_t ret_val = TRANSPORT_TYPE_NOT_IP;
-	uint16_t ether_type = 0;
-	int ether_len = 0;
-	struct ether_vlan_header *eh;
-#ifdef INET
-	struct ip *iph;
-#endif
-#ifdef INET6
-	struct ip6_hdr *ip6;
-#endif
-
-	eh = mtod(m_head, struct ether_vlan_header*);
-	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
-		ether_len = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
-		ether_type = eh->evl_proto;
-	} else {
-		ether_len = ETHER_HDR_LEN;
-		ether_type = eh->evl_encap_proto;
-	}
-
-	switch (ntohs(ether_type)) {
-#ifdef INET6
-	case ETHERTYPE_IPV6:
-		ip6 = (struct ip6_hdr *)(m_head->m_data + ether_len);
-
-		if (IPPROTO_TCP == ip6->ip6_nxt) {
-			ret_val = TRANSPORT_TYPE_IPV6_TCP;
-		} else if (IPPROTO_UDP == ip6->ip6_nxt) {
-			ret_val = TRANSPORT_TYPE_IPV6_UDP;
-		}
-		break;
-#endif
-#ifdef INET
-	case ETHERTYPE_IP:
-		iph = (struct ip *)(m_head->m_data + ether_len);
-
-		if (IPPROTO_TCP == iph->ip_p) {
-			ret_val = TRANSPORT_TYPE_IPV4_TCP;
-		} else if (IPPROTO_UDP == iph->ip_p) {
-			ret_val = TRANSPORT_TYPE_IPV4_UDP;
-		}
-		break;
-#endif
-	default:
-		ret_val = TRANSPORT_TYPE_NOT_IP;
-		break;
-	}
-
-	return (ret_val);
-}
-
 static int
 hn_ifmedia_upd(struct ifnet *ifp __unused)
 {
@@ -783,16 +717,13 @@
 	hn_softc_t *sc = ifp->if_softc;
 	struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev);
 	netvsc_dev *net_dev = sc->net_dev;
-	struct ether_vlan_header *eh;
 	rndis_msg *rndis_mesg;
 	rndis_packet *rndis_pkt;
 	rndis_per_packet_info *rppi;
 	ndis_8021q_info *rppi_vlan_info;
 	rndis_tcp_ip_csum_info *csum_info;
 	rndis_tcp_tso_info *tso_info;	
-	int ether_len;
 	uint32_t rndis_msg_size = 0;
-	uint32_t trans_proto_type;
 
 	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
 	IFF_DRV_RUNNING)
@@ -872,101 +803,78 @@
 			m_head->m_pkthdr.ether_vtag & 0xfff;
 		}
 
-		/* Only check the flags for outbound and ignore the ones for inbound */
-		if (0 == (m_head->m_pkthdr.csum_flags & HV_CSUM_FOR_OUTBOUND)) {
-			goto pre_send;
-		}
-
-		eh = mtod(m_head, struct ether_vlan_header*);
-		if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
-			ether_len = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
-		} else {
-			ether_len = ETHER_HDR_LEN;
-		}
-
-		trans_proto_type = get_transport_proto_type(m_head);
-		if (TRANSPORT_TYPE_NOT_IP == trans_proto_type) {
-			goto pre_send;
-		}
-
-		/*
-		 * TSO packet needless to setup the send side checksum
-		 * offload.
-		 */
 		if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
-			goto do_tso;
-		}
+			struct ether_vlan_header *eh;
+			int ether_len;
 
-		/* setup checksum offload */
-		rndis_msg_size += RNDIS_CSUM_PPI_SIZE;
-		rppi = hv_set_rppi_data(rndis_mesg, RNDIS_CSUM_PPI_SIZE,
-		tcpip_chksum_info);
-		csum_info = (rndis_tcp_ip_csum_info *)((char*)rppi +
-		rppi->per_packet_info_offset);
-
-		if (trans_proto_type & (TYPE_IPV4 << 16)) {
-			csum_info->xmit.is_ipv4 = 1;
-		} else {
-			csum_info->xmit.is_ipv6 = 1;
-		}
+			eh = mtod(m_head, struct ether_vlan_header*);
+			if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
+ether_len = ETHER_HDR_LEN +
+ETHER_VLAN_ENCAP_LEN;
+			} else {
+ether_len = ETHER_HDR_LEN;
+			}
 
-		if (trans_proto_type & TYPE_TCP) {
-			csum_info->xmit.tcp_csum = 1;
-			csum_info->xmit.tcp_header_offset = 0;
-		}

[Differential] [Closed] D5099: hyperv/hn: Enable IP header checksum offloading

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295298: hyperv/hn: Enable IP header checksum offloading 
(authored by sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5099?vs=12775=13032#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5099?vs=12775=13032

REVISION DETAIL
  https://reviews.freebsd.org/D5099

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c

CHANGE DETAILS
  diff --git a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c 
b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  --- a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  +++ b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  @@ -172,7 +172,7 @@
* Windows releases.
*/
   #define HN_CSUM_ASSIST_WIN8  (CSUM_TCP)
  -#define HN_CSUM_ASSIST   (CSUM_UDP | CSUM_TCP)
  +#define HN_CSUM_ASSIST   (CSUM_IP | CSUM_UDP | CSUM_TCP)
   
   /* XXX move to netinet/tcp_lro.h */
   #define HN_LRO_HIWAT_MAX 65535
  @@ -867,6 +867,9 @@
rppi->per_packet_info_offset);
   
csum_info->xmit.is_ipv4 = 1;
  + if (m_head->m_pkthdr.csum_flags & CSUM_IP)
  + csum_info->xmit.ip_header_csum = 1;
  +
if (m_head->m_pkthdr.csum_flags & CSUM_TCP) {
csum_info->xmit.tcp_csum = 1;
csum_info->xmit.tcp_header_offset = 0;

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, adrian, network
Cc: freebsd-net-list


[Differential] [Closed] D5103: hyperv/hn: Add sysctl to trust host side UDP and IP csum verification

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295300: hyperv/hn: Add sysctls to trust host side UDP and IP 
csum verification (authored by sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5103?vs=12782=13034#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5103?vs=12782=13034

REVISION DETAIL
  https://reviews.freebsd.org/D5103

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
howard0su_gmail.com, honzhan_microsoft.com, adrian, network
Cc: freebsd-net-list
diff --git a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
--- a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
+++ b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
@@ -210,6 +210,14 @@
 static int hn_trust_hosttcp = 1;
 TUNABLE_INT("dev.hn.trust_hosttcp", &hn_trust_hosttcp);
 
+/* Trust udp datagrams verification on host side. */
+static int hn_trust_hostudp = 1;
+TUNABLE_INT("dev.hn.trust_hostudp", &hn_trust_hostudp);
+
+/* Trust ip packets verification on host side. */
+static int hn_trust_hostip = 1;
+TUNABLE_INT("dev.hn.trust_hostip", &hn_trust_hostip);
+
 #if __FreeBSD_version >= 1100045
 /* Limit TSO burst size */
 static int hn_tso_maxlen = 0;
@@ -239,6 +247,7 @@
 #ifdef HN_LRO_HIWAT
 static int hn_lro_hiwat_sysctl(SYSCTL_HANDLER_ARGS);
 #endif
+static int hn_trust_hcsum_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_tx_chimney_size_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_check_iplen(const struct mbuf *, int);
 static int hn_create_tx_ring(struct hn_softc *sc);
@@ -335,8 +344,13 @@
 	sc->hn_unit = unit;
 	sc->hn_dev = dev;
 	sc->hn_lro_hiwat = HN_LRO_HIWAT_DEF;
-	sc->hn_trust_hosttcp = hn_trust_hosttcp;
 	sc->hn_direct_tx_size = hn_direct_tx_size;
+	if (hn_trust_hosttcp)
+		sc->hn_trust_hcsum |= HN_TRUST_HCSUM_TCP;
+	if (hn_trust_hostudp)
+		sc->hn_trust_hcsum |= HN_TRUST_HCSUM_UDP;
+	if (hn_trust_hostip)
+		sc->hn_trust_hcsum |= HN_TRUST_HCSUM_IP;
 
 	sc->hn_tx_taskq = taskqueue_create_fast("hn_tx", M_WAITOK,
 	taskqueue_thread_enqueue, &sc->hn_tx_taskq);
@@ -448,19 +462,30 @@
 	CTLTYPE_INT | CTLFLAG_RW, sc, 0, hn_lro_hiwat_sysctl,
 	"I", "LRO high watermark");
 #endif
-	SYSCTL_ADD_INT(ctx, child, OID_AUTO, "trust_hosttcp",
-	CTLFLAG_RW, &sc->hn_trust_hosttcp, 0,
+	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "trust_hosttcp",
+	CTLTYPE_INT | CTLFLAG_RW, sc, HN_TRUST_HCSUM_TCP,
+	hn_trust_hcsum_sysctl, "I",
 	"Trust tcp segement verification on host side, "
 	"when csum info is missing");
+	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "trust_hostudp",
+	CTLTYPE_INT | CTLFLAG_RW, sc, HN_TRUST_HCSUM_UDP,
+	hn_trust_hcsum_sysctl, "I",
+	"Trust udp datagram verification on host side, "
+	"when csum info is missing");
+	SYSCTL_ADD_PROC(ctx, child, OID_AUTO, "trust_hostip",
+	CTLTYPE_INT | CTLFLAG_RW, sc, HN_TRUST_HCSUM_IP,
+	hn_trust_hcsum_sysctl, "I",
+	"Trust ip packet verification on host side, "
+	"when csum info is missing");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_ip",
 	CTLFLAG_RW, &sc->hn_csum_ip, "RXCSUM IP");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_tcp",
 	CTLFLAG_RW, &sc->hn_csum_tcp, "RXCSUM TCP");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_udp",
 	CTLFLAG_RW, &sc->hn_csum_udp, "RXCSUM UDP");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_trusted",
 	CTLFLAG_RW, &sc->hn_csum_trusted,
-	"# of TCP segements that we trust host's csum verification");
+	"# of packets that we trust host's csum verification");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "small_pkts",
 	CTLFLAG_RW, &sc->hn_small_pkts, "# of small packets received");
 	SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "no_txdescs",
@@ -503,6 +528,14 @@
 		CTLFLAG_RD, &hn_trust_hosttcp, 0,
 		"Trust tcp segement verification on host side, "
 		"when csum info is missing (global setting)");
+		SYSCTL_ADD_INT(dc_ctx, dc_child, OID_AUTO, "trust_hostudp",
+		CTLFLAG_RD, &hn_trust_hostudp, 0,
+		"Trust udp datagram verification on host side, "
+		"when csum info is missing (global setting)");
+		SYSCTL_ADD_INT(dc_ctx, dc_child, OID_AUTO, "trust_hostip",
+		CTLFLAG_RD, &hn_trust_hostip, 0,
+		"Trust ip packet verification on host side, "
+		"when csum info is missing (global setting)");
 		SYSCTL_ADD_INT(dc_ctx, dc_child, OID_AUTO, "tx_chimney_size",
 		CTLFLAG_RD, &hn_tx_chimney_size, 0,
 		"Chimney send packet size limit");
@@ -1206,15 +1239,28 @@
 
 			pr = hn_check_iplen(m_new, hoff);
 			if (pr == IPPROTO_TCP) {
-if (sc->hn_trust_hosttcp) {
+if (sc->hn_trust_hcsum & HN_TRUST_HCSUM_TCP) {
 	sc->hn_csum_trusted++;
 	m_new->m_pkthdr.csum_flags |=
 	   (CSUM_IP_CHECKED | CSUM_IP_VALID |

[Differential] [Closed] D5175: hyperv/hn: Add an option to always do transmission scheduling

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295306: hyperv/hn: Add an option to always do transmission 
scheduling (authored by sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5175?vs=12968=13040#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5175?vs=12968=13040

REVISION DETAIL
  https://reviews.freebsd.org/D5175

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c

CHANGE DETAILS
  diff --git a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c 
b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  --- a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  +++ b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  @@ -534,6 +534,10 @@
SYSCTL_ADD_INT(ctx, child, OID_AUTO, "direct_tx_size",
CTLFLAG_RW, &sc->hn_direct_tx_size, 0,
"Size of the packet for direct transmission");
  + SYSCTL_ADD_INT(ctx, child, OID_AUTO, "sched_tx",
  + CTLFLAG_RW, &sc->hn_sched_tx, 0,
  + "Always schedule transmission "
  + "instead of doing direct transmission");
   
if (unit == 0) {
struct sysctl_ctx_list *dc_ctx;
  @@ -1602,26 +1606,31 @@
   static void
   hn_start(struct ifnet *ifp)
   {
  - hn_softc_t *sc;
  + struct hn_softc *sc = ifp->if_softc;
  +
  + if (sc->hn_sched_tx)
  + goto do_sched;
   
  - sc = ifp->if_softc;
if (NV_TRYLOCK(sc)) {
int sched;
   
sched = hn_start_locked(ifp, sc->hn_direct_tx_size);
NV_UNLOCK(sc);
if (!sched)
return;
}
  +do_sched:
taskqueue_enqueue_fast(sc->hn_tx_taskq, &sc->hn_start_task);
   }
   
   static void
   hn_start_txeof(struct ifnet *ifp)
   {
  - hn_softc_t *sc;
  + struct hn_softc *sc = ifp->if_softc;
  +
  + if (sc->hn_sched_tx)
  + goto do_sched;
   
  - sc = ifp->if_softc;
if (NV_TRYLOCK(sc)) {
int sched;
   
  @@ -1633,6 +1642,7 @@
&sc->hn_start_task);
}
} else {
  +do_sched:
/*
 * Release the OACTIVE earlier, with the hope, that
 * others could catch up.  The task will clear the
  diff --git a/head/sys/dev/hyperv/netvsc/hv_net_vsc.h 
b/head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  --- a/head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  +++ b/head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  @@ -1023,6 +1023,7 @@
int hn_txdesc_avail;
int hn_txeof;
   
  + int hn_sched_tx;
int hn_direct_tx_size;
struct taskqueue *hn_tx_taskq;
struct task hn_start_task;

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
howard0su_gmail.com, adrian, network, honzhan_microsoft.com
Cc: freebsd-virtualization-list, freebsd-net-list


[Differential] [Closed] D5102: hyperv/hn: Enable UDP RXCSUM

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295299: hyperv/hn: Enable UDP RXCSUM (authored by sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5102?vs=12780=13033#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5102?vs=12780=13033

REVISION DETAIL
  https://reviews.freebsd.org/D5102

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c

CHANGE DETAILS
  diff --git a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c 
b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  --- a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  +++ b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  @@ -456,6 +456,8 @@
CTLFLAG_RW, &sc->hn_csum_ip, "RXCSUM IP");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_tcp",
CTLFLAG_RW, &sc->hn_csum_tcp, "RXCSUM TCP");
  + SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_udp",
  + CTLFLAG_RW, &sc->hn_csum_udp, "RXCSUM UDP");
SYSCTL_ADD_ULONG(ctx, child, OID_AUTO, "csum_trusted",
CTLFLAG_RW, &sc->hn_csum_trusted,
"# of TCP segements that we trust host's csum verification");
  @@ -1156,20 +1158,24 @@
m_new->m_pkthdr.rcvif = ifp;
   
/* receive side checksum offload */
  - if (NULL != csum_info) {
  + if (csum_info != NULL) {
/* IP csum offload */
if (csum_info->receive.ip_csum_succeeded) {
m_new->m_pkthdr.csum_flags |=
(CSUM_IP_CHECKED | CSUM_IP_VALID);
sc->hn_csum_ip++;
}
   
  - /* TCP csum offload */
  - if (csum_info->receive.tcp_csum_succeeded) {
  + /* TCP/UDP csum offload */
  + if (csum_info->receive.tcp_csum_succeeded ||
  + csum_info->receive.udp_csum_succeeded) {
m_new->m_pkthdr.csum_flags |=
(CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
m_new->m_pkthdr.csum_data = 0xffff;
  - sc->hn_csum_tcp++;
  + if (csum_info->receive.tcp_csum_succeeded)
  + sc->hn_csum_tcp++;
  + else
  + sc->hn_csum_udp++;
}
   
if (csum_info->receive.ip_csum_succeeded &&
  diff --git a/head/sys/dev/hyperv/netvsc/hv_net_vsc.h 
b/head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  --- a/head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  +++ b/head/sys/dev/hyperv/netvsc/hv_net_vsc.h
  @@ -1036,6 +1036,7 @@
   
u_long  hn_csum_ip;
u_long  hn_csum_tcp;
  + u_long  hn_csum_udp;
u_long  hn_csum_trusted;
u_long  hn_lro_tried;
u_long  hn_small_pkts;

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, adrian, network
Cc: freebsd-net-list

[Differential] [Closed] D5158: hyperv/hn: Factor out hn_encap from hn_start_locked()

2016-02-04 Thread Phabricator

This revision was automatically updated to reflect the committed changes.
Closed by commit rS295302: hyperv/hn: Factor out hn_encap() from 
hn_start_locked() (authored by sephe).

CHANGED PRIOR TO COMMIT
  https://reviews.freebsd.org/D5158?vs=12925=13036#toc

REPOSITORY
  rS FreeBSD src repository

CHANGES SINCE LAST UPDATE
  https://reviews.freebsd.org/D5158?vs=12925=13036

REVISION DETAIL
  https://reviews.freebsd.org/D5158

AFFECTED FILES
  head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c

To: sepherosa_gmail.com, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, adrian, network
Cc: freebsd-net-list
diff --git a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
--- a/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
+++ b/head/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
@@ -254,6 +254,7 @@
 static void hn_destroy_tx_ring(struct hn_softc *sc);
 static void hn_start_taskfunc(void *xsc, int pending);
 static void hn_txeof_taskfunc(void *xsc, int pending);
+static int hn_encap(struct hn_softc *, struct hn_txdesc *, struct mbuf **);
 
 static __inline void
 hn_set_lro_hiwat(struct hn_softc *sc, int hiwat)
@@ -744,31 +745,235 @@
 }
 
 /*
- * Start a transmit of one or more packets
+ * NOTE:
+ * If this function fails, then both txd and m_head0 will be freed
  */
 static int
-hn_start_locked(struct ifnet *ifp, int len)
+hn_encap(struct hn_softc *sc, struct hn_txdesc *txd, struct mbuf **m_head0)
 {
-	hn_softc_t *sc = ifp->if_softc;
-	struct hv_device *device_ctx = vmbus_get_devctx(sc->hn_dev);
-	netvsc_dev *net_dev = sc->net_dev;
+	bus_dma_segment_t segs[HN_TX_DATA_SEGCNT_MAX];
+	int error, nsegs, i;
+	struct mbuf *m_head = *m_head0;
+	netvsc_packet *packet;
 	rndis_msg *rndis_mesg;
 	rndis_packet *rndis_pkt;
 	rndis_per_packet_info *rppi;
-	ndis_8021q_info *rppi_vlan_info;
-	rndis_tcp_ip_csum_info *csum_info;
-	rndis_tcp_tso_info *tso_info;	
-	uint32_t rndis_msg_size = 0;
+	uint32_t rndis_msg_size;
+
+	packet = &txd->netvsc_pkt;
+	packet->is_data_pkt = TRUE;
+	packet->tot_data_buf_len = m_head->m_pkthdr.len;
+
+	/*
+	 * extension points to the area reserved for the
+	 * rndis_filter_packet, which is placed just after
+	 * the netvsc_packet (and rppi struct, if present;
+	 * length is updated later).
+	 */
+	rndis_mesg = txd->rndis_msg;
+	/* XXX not necessary */
+	memset(rndis_mesg, 0, HN_RNDIS_MSG_LEN);
+	rndis_mesg->ndis_msg_type = REMOTE_NDIS_PACKET_MSG;
+
+	rndis_pkt = &rndis_mesg->msg.packet;
+	rndis_pkt->data_offset = sizeof(rndis_packet);
+	rndis_pkt->data_length = packet->tot_data_buf_len;
+	rndis_pkt->per_pkt_info_offset = sizeof(rndis_packet);
+
+	rndis_msg_size = RNDIS_MESSAGE_SIZE(rndis_packet);
+
+	if (m_head->m_flags & M_VLANTAG) {
+		ndis_8021q_info *rppi_vlan_info;
+
+		rndis_msg_size += RNDIS_VLAN_PPI_SIZE;
+		rppi = hv_set_rppi_data(rndis_mesg, RNDIS_VLAN_PPI_SIZE,
+		ieee_8021q_info);
+
+		rppi_vlan_info = (ndis_8021q_info *)((uint8_t *)rppi +
+		rppi->per_packet_info_offset);
+		rppi_vlan_info->u1.s1.vlan_id =
+		m_head->m_pkthdr.ether_vtag & 0xfff;
+	}
+
+	if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
+		rndis_tcp_tso_info *tso_info;	
+		struct ether_vlan_header *eh;
+		int ether_len;
+
+		/*
+		 * XXX need m_pullup and use mtodo
+		 */
+		eh = mtod(m_head, struct ether_vlan_header*);
+		if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN))
+			ether_len = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
+		else
+			ether_len = ETHER_HDR_LEN;
+
+		rndis_msg_size += RNDIS_TSO_PPI_SIZE;
+		rppi = hv_set_rppi_data(rndis_mesg, RNDIS_TSO_PPI_SIZE,
+		tcp_large_send_info);
+
+		tso_info = (rndis_tcp_tso_info *)((uint8_t *)rppi +
+		rppi->per_packet_info_offset);
+		tso_info->lso_v2_xmit.type =
+		RNDIS_TCP_LARGE_SEND_OFFLOAD_V2_TYPE;
+
+#ifdef INET
+		if (m_head->m_pkthdr.csum_flags & CSUM_IP_TSO) {
+			struct ip *ip =
+			(struct ip *)(m_head->m_data + ether_len);
+			unsigned long iph_len = ip->ip_hl << 2;
+			struct tcphdr *th =
+			(struct tcphdr *)((caddr_t)ip + iph_len);
+
+			tso_info->lso_v2_xmit.ip_version =
+			RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV4;
+			ip->ip_len = 0;
+			ip->ip_sum = 0;
+
+			th->th_sum = in_pseudo(ip->ip_src.s_addr,
+			ip->ip_dst.s_addr, htons(IPPROTO_TCP));
+		}
+#endif
+#if defined(INET6) && defined(INET)
+		else
+#endif
+#ifdef INET6
+		{
+			struct ip6_hdr *ip6 = (struct ip6_hdr *)
+			(m_head->m_data + ether_len);
+			struct tcphdr *th = (struct tcphdr *)(ip6 + 1);
+
+			tso_info->lso_v2_xmit.ip_version =
+			RNDIS_TCP_LARGE_SEND_OFFLOAD_IPV6;
+			ip6->ip6_plen = 0;
+			th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
+		}
+#endif
+		tso_info->lso_v2_xmit.tcp_header_offset = 0;
+		tso_info->lso_v2_xmit.mss = m_head->m_pkthdr.tso_segsz;
+	} else if (m_head->m_pkthdr.csum_flags & sc->hn_csum_assist) {
+		rndis_tcp_ip_csum_info *csum_info;
+
+		rndis_msg_size +=

[Differential] [Updated] D5185: tcp/lro: Allow network drivers to set the limit for TCP ACK/data segment aggregation limit

2016-02-04 Thread gallatin (Andrew Gallatin)

gallatin added a comment.


  It might be nice to make these general tunables, set centrally and applied to
all drivers, but that's probably outside the scope of this review.

INLINE COMMENTS
  sys/netinet/tcp_lro.c:655 Can you just initialize ack_append_limit to the max
value for whatever type it is and eliminate the check for a 0 ack_append_limit?
That would eliminate one clause from this conditional.
  sys/netinet/tcp_lro.c:684 Rather than adding more clauses to this condition,
how would you feel about setting an append limit in bytes and replacing the
hard-coded 65535 with this new limit? The default LRO init would initialize
the new limit to 65535, and hn(4) would initialize it as a multiple of its MTU.

REVISION DETAIL
  https://reviews.freebsd.org/D5185

To: sepherosa_gmail.com, network, adrian, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, hselasky, np, transport, gallatin
Cc: freebsd-virtualization-list, freebsd-net-list

Re: ixgbe: Network performance tuning (#TCP connections)

2016-02-04 Thread Remy Nonnenmacher

On 02/03/16 14:37, Meyer, Wolfgang wrote:

Hello,

we are evaluating network performance on a DELL server (PowerEdge R930 with 4
sockets, hw.model: Intel(R) Xeon(R) CPU E7-8891 v3 @ 2.80GHz) with 10 GbE
cards. We use programs that, on the server side, accept connections on an IP
address+port from the client side; once a connection is established, data is
sent in turns between server and client in a predefined pattern (the server
side sends more data than the client side), with sleeps in between the send
phases. The test setup is chosen such that every client process initiates 500
connections handled in threads, and on the server side each process
representing an IP/port pair also handles 500 connections in threads.

The number of connections is then increased and the overall network throughput
is observed using nload. On FreeBSD (on the server side), errors begin to occur
at roughly 50,000 connections, and the overall throughput does not increase
further with more connections. With Linux on the server side it is possible to
establish more than 120,000 connections, and at 50,000 connections the overall
throughput is double that of FreeBSD with the same sending pattern. Furthermore,
system load on FreeBSD is much higher, with 50 % system usage on each core and
80 % interrupt usage on the 8 cores handling the interrupt queues for the NIC.
In comparison, Linux has <10 % system usage, <10 % user usage and about 15 %
interrupt usage on the 16 cores handling the network interrupts for 50,000
connections.

Varying the number of NIC interrupt queues does not improve performance (if
anything, it makes things worse). Disabling Hyperthreading (utilising 40 cores)
degrades performance. Increasing MAXCPU to utilise all 80 cores brings no
improvement compared to 64 cores; atkbd and uart had to be disabled to avoid
kernel panics with the increased MAXCPU (thanks to Andre Oppermann for
investigating this). Initially the tests were made on 10.2-RELEASE; later I
switched to 10-STABLE (eventually with ixgbe driver version 3.1.0), but that
didn't change the numbers.

Some sysctl configurables were modified following the network performance 
guidelines found on the net (e.g. 
https://calomel.org/freebsd_network_tuning.html, 
https://www.freebsd.org/doc/handbook/configtuning-kernel-limits.html, 
https://pleiades.ucsc.edu/hyades/FreeBSD_Network_Tuning), but most of them 
didn't have any measurable impact. For the final sysctl.conf and loader.conf 
settings, see below. Actually, the only tunables that provided any improvement were 
hw.ix.txd and hw.ix.rxd, which were reduced (!) to the minimum 
value of 64, and hw.ix.tx_process_limit and hw.ix.rx_process_limit, which were set 
to -1.

Any ideas which tunables might be changed to get a higher number of TCP 
connections (it's not a question of the overall throughput, as changing the 
sending pattern allows me to fully utilise the 10Gb bandwidth)? How can I 
determine where the kernel is spending the time that causes the high CPU load? 
Any pointers are highly appreciated; I can't believe that there is such a 
blatant difference in network performance compared to Linux.

Regards,
Wolfgang


[SNIP]

Hi Wolfgang,

hwpmc is your friend here if you need to investigate where your processors are 
spending their time.

Either you will find them contending for the network stack (probably the pcb hash 
table), or they are fighting each other over the scheduler's lock(s) trying 
to steal jobs from the busy ones.

Also check QPI link activity, which may reveal interesting facts about PCI 
root-complex geography vs. process locations and migration.

You have two options here: either you persist in using a 4x10-core machine and 
spend a long time rearranging the stickiness of processes and interrupts to 
specific cores/packages (driver, then isr rings, then userland) and policing the 
whole thing (read: peacekeeping the riot), or you go for the much simpler 
solution, a 1-socket (yes, one) machine with the fastest available processor and a 
low core count (E5-1630v2/3 or 1650), which can handle 10G links hands down out of the box.

Also note that there are specific and interesting optimizations in L2 
generation on -head that you may want to try if the problem is stack-centered.

You may also have a (userland) threading problem. In the domain of 
counting instructions per packet (you can practice that with netmap, a 
wonderful means of really 'sensing' what 40Gbps is), threading is bad (and 
Hyperthreading is evil).

Thanks.


Re: 10.2-RELEASE-p12 pf+GRE crashing

2016-02-04 Thread Matthew Grooms


On 2/3/2016 6:47 PM, Matthew Grooms wrote:
This turned out to be another issue that was patched in head but not 
back ported to stable. I can't explain why it didn't get tripped when 
GRE tunnels were disabled. With the patch applied, I can reload my 
rule sets again without crashing ...


https://svnweb.freebsd.org/base?view=revision=264689



I wanted to clarify in case another user runs into this issue and 
searches the mailing list history for a solution: The patch I applied to 
fix this particular kernel crash wasn't 264689, it was ...


https://svnweb.freebsd.org/base?view=revision=264915

Sorry for the misinformation. I cut and pasted the wrong link.

-Matthew


(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0x807c81f2 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:451
#2  0x807c85d5 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at ../../../kern/kern_shutdown.c:758
#3  0x807c8463 in panic (fmt=0x0) at ../../../kern/kern_shutdown.c:687
#4  0x80bdc10b in trap_fatal (frame=<value optimized out>,
    eva=<value optimized out>) at ../../../amd64/amd64/trap.c:851
#5  0x80bdc40d in trap_pfault (frame=0xfe233a80,
    usermode=<value optimized out>) at ../../../amd64/amd64/trap.c:674
#6  0x80bdbaaa in trap (frame=0xfe233a80)
    at ../../../amd64/amd64/trap.c:440
#7  0x80bc1fa2 in calltrap () at ../../../amd64/amd64/exception.S:236
#8  0x809c07f4 in pfr_detach_table (kt=0x0) at ../../../netpfil/pf/pf_table.c:2047
#9  0x809a91f4 in pf_empty_pool (poola=0x813c3d68)
    at ../../../netpfil/pf/pf_ioctl.c:354
#10 0x809ab3e5 in pfioctl (dev=<value optimized out>, cmd=<value optimized out>,
    addr=0xf8005eaf6800 "", flags=<value optimized out>, td=<value optimized out>)
    at ../../../netpfil/pf/pf_ioctl.c:2189
#11 0x806b5659 in devfs_ioctl_f (fp=0xf8000a2927d0, com=3295691827,
    data=0xf8005eaf6800, cred=<value optimized out>, td=0xf8000a25f000)
    at ../../../fs/devfs/devfs_vnops.c:785
#12 0x8081b805 in kern_ioctl (td=0xf8000a25f000, fd=<value optimized out>,
    com=2) at file.h:320
#13 0x8081b500 in sys_ioctl (td=0xf8000a25f000, uap=0xfe234b40)
    at ../../../kern/sys_generic.c:718
#14 0x80bdca27 in amd64_syscall (td=0xf8000a25f000, traced=0)
    at subr_syscall.c:134
#15 0x80bc228b in Xfast_syscall () at ../../../amd64/amd64/exception.S:396
#16 0x000800dd9fda in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal

-Matthew



Re: dev.netmap.buf_size and packett size from host

2016-02-04 Thread Luigi Rizzo

disable all hardware accelerations when using netmap.

cheers
luigi

On Thu, Feb 4, 2016 at 3:34 PM, Eduardo Meyer  wrote:
> mtu is good, TSO was on, thank you, will retest right now.
>
> which other port features should I disable? I only disabled txcsum and
> rxcsum before, now tso is on the list; anything else in netmap mode?
>
> On Thu, Feb 4, 2016 at 12:29 PM, Luigi Rizzo  wrote:
>>
>> Make sure you disable TSO on the interface used in netmap
>> mode, and then check that you use an MTU of 1500 on that
>> interface.
>> You should not receive frames larger than MTU coming from
>> the host in these conditions.
>>
>> cheers
>> luigi
>>
>>
>> On Thu, Feb 4, 2016 at 3:26 PM, Eduardo Meyer 
>> wrote:
>> > hello,
>> >
>> > I have a netmap application with a host-mode bridge/fwd; with default
>> > settings I get the following error fairly often:
>> >
>> > 884.260394 [2950] netmap_transmit   igb1 from_host, drop packet size 2962 > 2048
>> >
>> > the only application which relies on host mode is bird, so those packets
>> > probably come from the bird daemon; when I get those errors, the bird
>> > sessions fail and restart
>> >
>> > I raised dev.netmap.buf_size to 5000, it adjusted to 5120, and things got
>> > better, but I still have logs:
>> >
>> > netmap_transmit   igb1 from_host, drop packet size 5858 > 5120
>> >
>> > Now the main question: when dev.netmap.buf_size is 2048 the application
>> > uses 1.3G of RAM, but when I raise it to 5120 it uses 3G of RAM.
>> >
>> > So I need to understand: is this packet size really related to the
>> > application packets coming from the host to netmap? If so, can I allow
>> > bigger sizes, like 16k (the lo0 MTU), without pre-allocating so much more RAM?
>> >
>> > thank you
>> >
>> > E. Meyer
>>
>>
>>
>> --
>> -+---
>>  Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
>>  http://www.iet.unipi.it/~luigi/. Universita` di Pisa
>>  TEL  +39-050-2217533   . via Diotisalvi 2
>>  Mobile   +39-338-6809875   . 56122 PISA (Italy)
>> -+---
>
>
>
>
> --
> ===
> Eduardo Meyer
> personal: dudu.me...@gmail.com
> professional: ddm.farmac...@saude.gov.br




[Differential] [Commented On] D5185: tcp/lro: Allow network drivers to set the limit for TCP ACK/data segment aggregation limit

2016-02-04 Thread adrian (Adrian Chadd)

adrian added inline comments.

INLINE COMMENTS
  sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c:455 this should be a separate 
commit

REVISION DETAIL
  https://reviews.freebsd.org/D5185

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, network, adrian, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, hselasky, np, transport, gallatin
Cc: freebsd-virtualization-list, freebsd-net-list

Re: dev.netmap.buf_size and packett size from host

2016-02-04 Thread Eduardo Meyer

When I disabled LRO it broke communication on the port to the network (although
traffic from the host was OK). Everything else looks good, and so far I have had no
problem with big packets coming from the host, so -tso did it, thank you!

On Thu, Feb 4, 2016 at 12:35 PM, Luigi Rizzo  wrote:

> disable all hardware accelerations when using netmap.




[Differential] [Request, 117 lines] D5185: tcp/lro: Allow network drivers to set the limit for TCP ACK/data segment aggregation limit

2016-02-04 Thread sepherosa_gmail.com (Sepherosa Ziehau)

sepherosa_gmail.com created this revision.
sepherosa_gmail.com added reviewers: network, adrian, delphij, royger, 
decui_microsoft.com, honzhan_microsoft.com, howard0su_gmail.com, gallatin, 
hselasky, np.
sepherosa_gmail.com added subscribers: freebsd-net-list, 
freebsd-virtualization-list.
Herald added a reviewer: transport.

REVISION SUMMARY
  It's append_cnt based.  Unless the network driver sets these two limits, it's 
a no-op.
  
  For hn(4):
  
  - Set the TCP ACK append limit to 1, i.e. aggregate at most 2 ACKs.  Aggregating 
more than 2 hurts TCP sending performance on Hyper-V.  This 
significantly improves TCP sending performance when the number of 
concurrent connections is low (2~8), and greatly stabilizes TCP sending 
performance in other cases.
  - Set the TCP data segment append limit to 25.  Without this limit, hn(4) 
could aggregate ~45 TCP data segments for each connection (even at 64 or more 
connections) before dispatching them to the socket code; such large aggregation slows 
down ACK sending and eventually hurts/destabilizes TCP reception performance.  
This setting significantly stabilizes and improves TCP reception performance for >4 
concurrent connections.
  
  Make them sysctls so they can be adjusted.

REVISION DETAIL
  https://reviews.freebsd.org/D5185

AFFECTED FILES
  sys/dev/hyperv/netvsc/hv_net_vsc.h
  sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
  sys/netinet/tcp_lro.c
  sys/netinet/tcp_lro.h

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, network, transport, adrian, delphij, royger, 
decui_microsoft.com, honzhan_microsoft.com, howard0su_gmail.com, gallatin, 
hselasky, np
Cc: freebsd-virtualization-list, freebsd-net-list
diff --git a/sys/netinet/tcp_lro.h b/sys/netinet/tcp_lro.h
--- a/sys/netinet/tcp_lro.h
+++ b/sys/netinet/tcp_lro.h
@@ -91,6 +91,8 @@
 	unsigned	lro_cnt;
 	unsigned	lro_mbuf_count;
 	unsigned	lro_mbuf_max;
+	unsigned short	lro_ack_append_lim;
+	unsigned short	lro_data_append_lim;
 
 	struct lro_head	lro_active;
 	struct lro_head	lro_free;
diff --git a/sys/netinet/tcp_lro.c b/sys/netinet/tcp_lro.c
--- a/sys/netinet/tcp_lro.c
+++ b/sys/netinet/tcp_lro.c
@@ -88,6 +88,8 @@
 	lc->lro_mbuf_max = lro_mbufs;
 	lc->lro_cnt = lro_entries;
 	lc->ifp = ifp;
+	lc->lro_ack_append_lim = 0;
+	lc->lro_data_append_lim = 0;
 	SLIST_INIT(&lc->lro_free);
 	SLIST_INIT(&lc->lro_active);
 
@@ -646,6 +648,16 @@
 
 		if (tcp_data_len == 0) {
 			m_freem(m);
+			/*
+			 * Flush this LRO entry, if this ACK should
+			 * not be further delayed.
+			 */
+			if (lc->lro_ack_append_lim &&
+			    le->append_cnt >= lc->lro_ack_append_lim) {
+				SLIST_REMOVE(&lc->lro_active, le, lro_entry,
+				    next);
+				tcp_lro_flush(lc, le);
+			}
 			return (0);
 		}
 
@@ -664,9 +676,12 @@
 
 		/*
 		 * If a possible next full length packet would cause an
-		 * overflow, pro-actively flush now.
+		 * overflow, pro-actively flush now.  And if we are asked
+		 * to limit the data aggregate, flush this LRO entry now.
 		 */
-		if (le->p_len > (65535 - lc->ifp->if_mtu)) {
+		if (le->p_len > (65535 - lc->ifp->if_mtu) ||
+		    (lc->lro_data_append_lim &&
+		     le->append_cnt >= lc->lro_data_append_lim)) {
 			SLIST_REMOVE(&lc->lro_active, le, lro_entry, next);
 			tcp_lro_flush(lc, le);
 		} else
diff --git a/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c b/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
--- a/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
+++ b/sys/dev/hyperv/netvsc/hv_netvsc_drv_freebsd.c
@@ -176,14 +176,8 @@
 #define HN_CSUM_ASSIST_WIN8	(CSUM_TCP)
 #define HN_CSUM_ASSIST		(CSUM_IP | CSUM_UDP | CSUM_TCP)
 
-/* XXX move to netinet/tcp_lro.h */
-#define HN_LRO_HIWAT_MAX	65535
-#define HN_LRO_HIWAT_DEF	HN_LRO_HIWAT_MAX
-/* YYY 2*MTU is a bit rough, but should be good enough. */
-#define HN_LRO_HIWAT_MTULIM(ifp)			(2 * (ifp)->if_mtu)
-#define HN_LRO_HIWAT_ISVALID(sc, hiwat)			\
-	((hiwat) >= HN_LRO_HIWAT_MTULIM((sc)->hn_ifp) ||	\
-	 (hiwat) <= HN_LRO_HIWAT_MAX)
+#define HN_LRO_ACK_APPEND_LIM	1
+#define HN_LRO_DATA_APPEND_LIM	25
 
 /*
  * Be aware that this sleepable mutex will exhibit WITNESS errors when
@@ -253,27 +247,16 @@
 static void hn_start_txeof(struct ifnet *ifp);
 static int hn_ifmedia_upd(struct ifnet *ifp);
 static void hn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr);
-#ifdef HN_LRO_HIWAT
-static int hn_lro_hiwat_sysctl(SYSCTL_HANDLER_ARGS);
-#endif
 static int hn_trust_hcsum_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_tx_chimney_size_sysctl(SYSCTL_HANDLER_ARGS);
+static int hn_lro_append_lim_sysctl(SYSCTL_HANDLER_ARGS);
 static int hn_check_iplen(const struct mbuf *, int);
 static int hn_create_tx_ring(struct hn_softc *sc);
 static void hn_destroy_tx_ring(struct hn_softc *sc);
 static void hn_start_taskfunc(void *xsc, int pending);
 static void hn_txeof_taskfunc(void *xsc, int pending);
 static int hn_encap(struct hn_softc *, struct hn_txdesc *, struct mbuf **);
 
-static __inline void

[Differential] [Updated] D5185: tcp/lro: Allow network drivers to set the limit for TCP ACK/data segment aggregation limit

2016-02-04 Thread sepherosa_gmail.com (Sepherosa Ziehau)

sepherosa_gmail.com updated the summary for this revision.

REVISION DETAIL
  https://reviews.freebsd.org/D5185

EMAIL PREFERENCES
  https://reviews.freebsd.org/settings/panel/emailpreferences/

To: sepherosa_gmail.com, network, adrian, delphij, royger, decui_microsoft.com, 
honzhan_microsoft.com, howard0su_gmail.com, gallatin, hselasky, np, transport
Cc: freebsd-virtualization-list, freebsd-net-list
