Re: Traffic "corruption" in 12-stable

2020-08-04 Thread Joe Clarke


> On Aug 4, 2020, at 11:51, Mark Johnston  wrote:
> 
> On Mon, Aug 03, 2020 at 05:22:37PM -0400, Joe Clarke wrote:
>>> On Jul 27, 2020, at 15:41, Joe Clarke  wrote:
>>>> On Jul 27, 2020, at 15:01, Mark Johnston  wrote:
>>>> There are some fixes for vmx not present in stable/12 (yet).  I did a
>>>> merge of a number of outstanding revisions.  Would you be able to test
>>>> the patch?  I haven't observed any problems with it on a host using igb,
>>>> but I have no ability to test vmx at the moment.
>>> 
>>> I’m down to test anything.  I did notice quite a few vmxnet3 changes around 
>>> performance that appealed to me.  I tried a few of them on my last kernel.  
>>> That took much longer to exhibit the problem, but eventually did.
>>> 
>>> I can tell you I don’t have all of these patches in, though.  I’ll build 
>>> with this diff and start running it now.  I’ll let you know how it goes.
>> 
>> So it’s been just over a week of runtime with this full patch set.  I have 
>> seen no further issues with ingress packet “truncation”, and performance has 
>> been what I expect.  I’m going to keep running, but I think this seems like 
>> a good set to MFC.
> 
> Done in r363844, thanks.

Thank you.  On day 8, and still no issues.

Joe


---
PGP Key : http://www.marcuscom.com/pgp.asc




___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Traffic "corruption" in 12-stable

2020-08-04 Thread Mark Johnston
On Mon, Aug 03, 2020 at 05:22:37PM -0400, Joe Clarke wrote:
> > On Jul 27, 2020, at 15:41, Joe Clarke  wrote:
> >> On Jul 27, 2020, at 15:01, Mark Johnston  wrote:
> >> There are some fixes for vmx not present in stable/12 (yet).  I did a
> >> merge of a number of outstanding revisions.  Would you be able to test
> >> the patch?  I haven't observed any problems with it on a host using igb,
> >> but I have no ability to test vmx at the moment.
> > 
> > I’m down to test anything.  I did notice quite a few vmxnet3 changes around 
> > performance that appealed to me.  I tried a few of them on my last kernel.  
> > That took much longer to exhibit the problem, but eventually did.
> > 
> > I can tell you I don’t have all of these patches in, though.  I’ll build 
> > with this diff and start running it now.  I’ll let you know how it goes.
> 
> So it’s been just over a week of runtime with this full patch set.  I have 
> seen no further issues with ingress packet “truncation”, and performance has 
> been what I expect.  I’m going to keep running, but I think this seems like a 
> good set to MFC.

Done in r363844, thanks.


Re: Traffic "corruption" in 12-stable

2020-08-03 Thread Joe Clarke


> On Jul 27, 2020, at 15:41, Joe Clarke  wrote:
> 
> 
> 
>> On Jul 27, 2020, at 15:01, Mark Johnston  wrote:
>> 
>> On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
>>> About two weeks ago, I upgraded from the latest 11-stable to the latest 
>>> 12-stable.  After that, I periodically see the network throughput come to a 
>>> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  
>>> It acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It 
>>> runs ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a 
>>> tap0 L2 VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN 
>>> side) uses the default 1500.
>>> 
>>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN 
>>> ping times), I know the problem has occurred because my lldpd reports:
>>> 
>>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
>>> bridge0
>>> 
>>> And if I turn on ipfw verbose messages, I see tons of:
>>> 
>>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>>> 
>>> This leads me to believe packets are being corrupted on ingress.  I’ve
>>> applied all the recent iflib changes, but the problem persists. What causes 
>>> it, I don’t know.
>>> 
>>> The only thing that changed (and yes, it’s a big one) is I upgraded to 
>>> 12-stable.  Meaning, the rest of the network infra and topology has 
>>> remained the same.  This did not happen at all in 11-stable.
>>> 
>>> I’m open to suggestions.
>> 
>> There are some fixes for vmx not present in stable/12 (yet).  I did a
>> merge of a number of outstanding revisions.  Would you be able to test
>> the patch?  I haven't observed any problems with it on a host using igb,
>> but I have no ability to test vmx at the moment.
> 
> I’m down to test anything.  I did notice quite a few vmxnet3 changes around 
> performance that appealed to me.  I tried a few of them on my last kernel.  
> That took much longer to exhibit the problem, but eventually did.
> 
> I can tell you I don’t have all of these patches in, though.  I’ll build with 
> this diff and start running it now.  I’ll let you know how it goes.

So it’s been just over a week of runtime with this full patch set.  I have seen 
no further issues with ingress packet “truncation”, and performance has been 
what I expect.  I’m going to keep running, but I think this seems like a good 
set to MFC.

Thanks again for your help.

Joe


Re: Traffic "corruption" in 12-stable

2020-07-27 Thread Joe Clarke


> On Jul 27, 2020, at 15:01, Mark Johnston  wrote:
> 
> On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
>> About two weeks ago, I upgraded from the latest 11-stable to the latest 
>> 12-stable.  After that, I periodically see the network throughput come to a 
>> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  
>> It acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It 
>> runs ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a 
>> tap0 L2 VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN 
>> side) uses the default 1500.
>> 
>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN 
>> ping times), I know the problem has occurred because my lldpd reports:
>> 
>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
>> bridge0
>> 
>> And if I turn on ipfw verbose messages, I see tons of:
>> 
>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>> 
>> This leads me to believe packets are being corrupted on ingress.  I’ve
>> applied all the recent iflib changes, but the problem persists. What causes 
>> it, I don’t know.
>> 
>> The only thing that changed (and yes, it’s a big one) is I upgraded to 
>> 12-stable.  Meaning, the rest of the network infra and topology has remained 
>> the same.  This did not happen at all in 11-stable.
>> 
>> I’m open to suggestions.
> 
> There are some fixes for vmx not present in stable/12 (yet).  I did a
> merge of a number of outstanding revisions.  Would you be able to test
> the patch?  I haven't observed any problems with it on a host using igb,
> but I have no ability to test vmx at the moment.

I’m down to test anything.  I did notice quite a few vmxnet3 changes around 
performance that appealed to me.  I tried a few of them on my last kernel.  
That took much longer to exhibit the problem, but eventually did.

I can tell you I don’t have all of these patches in, though.  I’ll build with 
this diff and start running it now.  I’ll let you know how it goes.

Thanks!

Joe



Re: Traffic "corruption" in 12-stable

2020-07-27 Thread Mark Johnston
On Sun, Jul 26, 2020 at 06:16:07PM -0400, Joe Clarke wrote:
> About two weeks ago, I upgraded from the latest 11-stable to the latest 
> 12-stable.  After that, I periodically see the network throughput come to a 
> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  It 
> acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It runs 
> ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a tap0 L2 
> VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses 
> the default 1500.
> 
> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN ping 
> times), I know the problem has occurred because my lldpd reports:
> 
> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
> bridge0
> 
> And if I turn on ipfw verbose messages, I see tons of:
> 
> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
> 
> This leads me to believe packets are being corrupted on ingress.  I’ve
> applied all the recent iflib changes, but the problem persists. What causes 
> it, I don’t know.
> 
> The only thing that changed (and yes, it’s a big one) is I upgraded to 
> 12-stable.  Meaning, the rest of the network infra and topology has remained 
> the same.  This did not happen at all in 11-stable.
> 
> I’m open to suggestions.

There are some fixes for vmx not present in stable/12 (yet).  I did a
merge of a number of outstanding revisions.  Would you be able to test
the patch?  I haven't observed any problems with it on a host using igb,
but I have no ability to test vmx at the moment.

https://people.freebsd.org/~markj/patches/iflib-stable12.diff


Re: Traffic "corruption" in 12-stable

2020-07-27 Thread Joe Clarke


> On Jul 27, 2020, at 01:00, Eugene Grosbein  wrote:
> 
> 27.07.2020 5:16, Joe Clarke wrote:
> 
>> About two weeks ago, I upgraded from the latest 11-stable to the latest 
>> 12-stable.  After that, I periodically see the network throughput come to a 
>> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  
>> It acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It 
>> runs ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a 
>> tap0 L2 VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN 
>> side) uses the default 1500.
>> 
>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN 
>> ping times), I know the problem has occurred because my lldpd reports:
>> 
>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
>> bridge0
>> 
>> And if I turn on ipfw verbose messages, I see tons of:
>> 
>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>> 
>> This leads me to believe packets are being corrupted on ingress.  I’ve
>> applied all the recent iflib changes, but the problem persists. What causes 
>> it, I don’t know.
>> 
>> The only thing that changed (and yes, it’s a big one) is I upgraded to 
>> 12-stable.  Meaning, the rest of the network infra and topology has remained 
>> the same.  This did not happen at all in 11-stable.
>> 
>> I’m open to suggestions.
> 
> First, try: ifconfig $ifname -rxcsum -txcsum

Thanks for the suggestion.  I should have mentioned I’ve been initializing 
these two interfaces since 11-stable with:

ifconfig_vmx0="up mtu 9000 -tso -lro -vlanhwtso -rxcsum -txcsum -rxcsum6 -txcsum6 -tso4 -tso6 -vlanhwcsum"
ifconfig_vmx1="DHCP -tso -lro -vlanhwtso -rxcsum -txcsum -rxcsum6 -txcsum6 -tso4 -tso6 -vlanhwcsum"

And I’m running:

FreeBSD namale.marcuscom.com 12.1-STABLE FreeBSD 12.1-STABLE NAMALE  amd64 1201520 1201520

I most recently built this yesterday, but the previous kernel that exhibited 
the problem was built about a week ago.  It had the fragment fixes for iflib.c.
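The ifconfig_vmx0/ifconfig_vmx1 lines above are meant to switch off checksum, TSO and LRO offloads at boot. A minimal sketch for confirming they actually took effect, assuming the capability names ifconfig(8) prints on its `options=` line, is a small POSIX sh helper. The function name `enabled_offloads` and the `OFFLOAD_FLAGS` list are illustrative assumptions, not part of any FreeBSD tool.

```sh
#!/bin/sh
# Hypothetical helper: list offload capabilities still enabled on an interface.
# OFFLOAD_FLAGS is an assumption chosen to mirror the ifconfig_vmx* lines;
# it is not an exhaustive list of ifconfig(8) capability names.
OFFLOAD_FLAGS="RXCSUM TXCSUM RXCSUM_IPV6 TXCSUM_IPV6 TSO4 TSO6 LRO VLAN_HWCSUM VLAN_HWTSO"

# enabled_offloads: read ifconfig output on stdin and print, space separated,
# every flag from OFFLOAD_FLAGS found on the first "options=" line.
enabled_offloads() {
    # Extract the comma-separated list between "<" and ">" after "options=".
    opts=$(sed -n 's/.*options=[0-9a-fA-F]*<\([^>]*\)>.*/\1/p' | head -n 1)
    found=""
    for flag in $OFFLOAD_FLAGS; do
        # Surrounding commas keep RXCSUM from matching RXCSUM_IPV6.
        case ",$opts," in
            *",$flag,"*) found="$found $flag" ;;
        esac
    done
    # Drop the leading space; prints an empty line when nothing is enabled.
    echo "${found# }"
}
```

Used as `ifconfig vmx0 | enabled_offloads`, an empty line means none of the listed offloads is on; any names printed are still enabled despite the rc.conf settings.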

Joe


Re: Traffic "corruption" in 12-stable

2020-07-27 Thread Graham Menhennitt

On 27/7/20 3:00 pm, Eugene Grosbein wrote:

> 27.07.2020 5:16, Joe Clarke wrote:
>
>> About two weeks ago, I upgraded from the latest 11-stable to the latest
>> 12-stable.  After that, I periodically see the network throughput come to a
>> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  It
>> acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It runs
>> ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a tap0 L2
>> VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses
>> the default 1500.
>>
>> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN ping
>> times), I know the problem has occurred because my lldpd reports:
>>
>> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on bridge0
>>
>> And if I turn on ipfw verbose messages, I see tons of:
>>
>> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
>>
>> This leads me to believe packets are being corrupted on ingress.  I’ve
>> applied all the recent iflib changes, but the problem persists. What causes it,
>> I don’t know.
>>
>> The only thing that changed (and yes, it’s a big one) is I upgraded to
>> 12-stable.  Meaning, the rest of the network infra and topology has remained
>> the same.  This did not happen at all in 11-stable.
>>
>> I’m open to suggestions.
>
> First, try: ifconfig $ifname -rxcsum -txcsum

And possibly " -vlanhwtso -tso4" as well.

Graham



Re: Traffic "corruption" in 12-stable

2020-07-26 Thread Eugene Grosbein
27.07.2020 5:16, Joe Clarke wrote:

> About two weeks ago, I upgraded from the latest 11-stable to the latest 
> 12-stable.  After that, I periodically see the network throughput come to a 
> near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  It 
> acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It runs 
> ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a tap0 L2 
> VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses 
> the default 1500.
> 
> Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN ping 
> times), I know the problem has occurred because my lldpd reports:
> 
> Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on 
> bridge0
> 
> And if I turn on ipfw verbose messages, I see tons of:
> 
> Jul 26 16:02:23 namale kernel: ipfw: pullup failed
> 
> This leads me to believe packets are being corrupted on ingress.  I’ve
> applied all the recent iflib changes, but the problem persists. What causes 
> it, I don’t know.
> 
> The only thing that changed (and yes, it’s a big one) is I upgraded to 
> 12-stable.  Meaning, the rest of the network infra and topology has remained 
> the same.  This did not happen at all in 11-stable.
> 
> I’m open to suggestions.

First, try: ifconfig $ifname -rxcsum -txcsum
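Applied by hand, the suggestion has to be repeated for each interface. A minimal sketch of a hypothetical helper that loops over the interfaces from the original report; `IFACES` (defaulting to vmx0 and vmx1), `FLAGS` and `RUN` are illustrative variables, not anything FreeBSD provides. Setting `RUN=echo` prints the ifconfig(8) commands instead of running them.

```sh
#!/bin/sh
# Hypothetical helper: disable checksum offload on each listed interface.
# IFACES and FLAGS are assumptions; override them from the environment.
IFACES=${IFACES:-"vmx0 vmx1"}
FLAGS=${FLAGS:-"-rxcsum -txcsum"}
# Set RUN=echo for a dry run that only prints the commands.
RUN=${RUN:-}

disable_offloads() {
    for ifname in $IFACES; do
        # $FLAGS is unquoted on purpose so each flag becomes its own argument.
        $RUN ifconfig "$ifname" $FLAGS
    done
}
```

Changes made this way do not survive a reboot; to persist them, the same flags belong on the matching `ifconfig_<ifname>` line in /etc/rc.conf. Further offload flags such as `-tso4` can be appended to `FLAGS`.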



Traffic "corruption" in 12-stable

2020-07-26 Thread Joe Clarke
About two weeks ago, I upgraded from the latest 11-stable to the latest 
12-stable.  After that, I periodically see the network throughput come to a 
near standstill.  This FreeBSD machine is an ESXi VM with two interfaces.  It 
acts as a router.  It uses vmxnet3 interfaces for both LAN and WAN.  It runs 
ipfw with in-kernel NAT.  The LAN side uses a bridge with vmx0 and a tap0 L2 
VPN interface.  My LAN side uses an MTU of 9000, and my vmx1 (WAN side) uses 
the default 1500.

Besides seeing massive packet loss and huge latency (~ 200 ms for on-LAN ping 
times), I know the problem has occurred because my lldpd reports:

Jul 26 15:47:03 namale lldpd[1126]: frame too short for tlv received on bridge0

And if I turn on ipfw verbose messages, I see tons of:

Jul 26 16:02:23 namale kernel: ipfw: pullup failed
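The ipfw verbose messages mentioned above are governed by the `net.inet.ip.fw.verbose` sysctl described in ipfw(8); it can be switched on at runtime with `sysctl net.inet.ip.fw.verbose=1`. A minimal sketch of making that persistent, assuming the stock /etc/sysctl.conf mechanism is in use:

```
# /etc/sysctl.conf: keep ipfw(8) verbose messages (such as
# "ipfw: pullup failed") enabled across reboots.
net.inet.ip.fw.verbose=1
```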

This leads me to believe packets are being corrupted on ingress.  I’ve
applied all the recent iflib changes, but the problem persists. What causes it, 
I don’t know.

The only thing that changed (and yes, it’s a big one) is I upgraded to 
12-stable.  Meaning, the rest of the network infra and topology has remained 
the same.  This did not happen at all in 11-stable.

I’m open to suggestions.

Thanks.

Joe
