Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-08-31 Thread Bruce Evans

On Thu, 15 Aug 2019 a bug that doesn't want repl...@freebsd.org wrote:


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235031

--- Comment #36 from Martin Birgmeier  ---
I just noticed that the console and syslog have about 20 messages of

em: frame error: ignored
em: frame error: ignored
em: frame error: ignored
em: frame error: ignored
em: frame error: ignored

Uptime is 2 1/2 hours.


You seem to be using my old patch which is not in -current:

XX Index: em_txrx.c
XX ===
XX --- em_txrx.c	(revision 348771)
XX +++ em_txrx.c	(working copy)
XX @@ -629,9 +629,20 @@
XX 
XX  		/* Make sure bad packets are discarded */
XX  		if (errors & E1000_RXD_ERR_FRAME_ERR_MASK) {
XX +#if 0
XX  			adapter->dropped_pkts++;
XX -			/* XXX fixup if common */
XX  			return (EBADMSG);
XX +#else
XX +			/*
XX +			 * XXX the above error handling is worse than none.
XX +			 * First it drops 'i' packets before the current
XX +			 * one and doesn't count them.  Then it returns an
XX +			 * error.  iflib can't really handle this error.
XX +			 * It just resets, and this usually drops many more
XX +			 * packets (without counting them) and much time.
XX +			 */
XX +			printf("lem: frame error: ignored\n");
XX +#endif
XX  		}
XX 
XX  		ri->iri_frags[i].irf_flid = 0;
XX @@ -692,8 +703,12 @@
XX 
XX  		/* Make sure bad packets are discarded */
XX  		if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
XX +#if 0
XX  			adapter->dropped_pkts++;
XX  			return EBADMSG;
XX +#else
XX +			printf("em: frame error: ignored\n");
XX +#endif
XX  		}
XX 
XX  		ri->iri_frags[i].irf_flid = 0;


Without this patch, no message is printed and the device takes a long
time to recover (when I wrote the patch, recovery was from something
like a watchdog timeout after many seconds).  With the patch, the recovery
is good enough for nfs over udp to not lose any data or time out, but I
don't trust this so I print the message.

Pre-iflib versions of [l]em handled this correctly by dropping a single
packet, which was easy to do.  Unpatched iflib makes a mess by returning
with subsequent packets unprocessed.  It apparently just stops receiving
until kicked by a watchdog.
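
A minimal sketch of the difference described above, in plain userland C
rather than driver code.  The struct layout and the function names
(rx_desc, rx_drop_one, rx_abort_on_error) are hypothetical and only
model the two policies: skipping one bad descriptor and carrying on,
versus returning at the first frame error and leaving the rest
unprocessed.  It does not model the reset that iflib does afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a receive descriptor; not the e1000 layout. */
struct rx_desc {
	uint32_t errors;	/* nonzero: frame-error bits are set */
	uint32_t len;
};

struct rx_stats {
	unsigned delivered;	/* packets passed up the stack */
	unsigned dropped;	/* packets counted as dropped */
};

/*
 * Policy 1 (what the pre-iflib drivers are described as doing):
 * count and skip the single bad packet, then keep going.
 */
static struct rx_stats
rx_drop_one(const struct rx_desc *ring, size_t n)
{
	struct rx_stats st = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (ring[i].errors != 0) {
			st.dropped++;
			continue;
		}
		st.delivered++;
	}
	return (st);
}

/*
 * Policy 2 (simplified model of the unpatched path): stop at the
 * first bad packet, so everything after it stays unprocessed.
 */
static struct rx_stats
rx_abort_on_error(const struct rx_desc *ring, size_t n)
{
	struct rx_stats st = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (ring[i].errors != 0) {
			st.dropped++;
			break;
		}
		st.delivered++;
	}
	return (st);
}
```

With one bad descriptor in the middle of five, the first policy loses
one packet and the second leaves two good packets behind it untouched.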

I don't know what causes this error.  Maybe just a bad cable or switch.
I don't see it for I218V with the same cable and switch.

Bruce
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-02-03 Thread Eugene Grosbein
04.02.2019 8:55, Andy Farkas wrote:
> On 02/02/2019 04:11, bugzilla-nore...@freebsd.org wrote:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235031
> ...
>> The exactly same client runs fine under Hyper-V using de0 (de(4)).
>>
> 
> de(4) is going away. Does this affect FBSD on Hyper-V into the future?

We have sys/dev/hyperv/netvsc/if_hn.c for Hyper-V (hn0) network interfaces 
since FreeBSD 10.



Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-02-03 Thread Andy Farkas

On 02/02/2019 04:11, bugzilla-nore...@freebsd.org wrote:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235031
> ...
> The exactly same client runs fine under Hyper-V using de0 (de(4)).

de(4) is going away. Does this affect FBSD on Hyper-V into the future?

-andyf



Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-20 Thread Bruce Evans

On Mon, 21 Jan 2019, Bruce Evans wrote:


> ...  For the em0
> NIC on my client, even the null change from autoselect to 1000baseT
> full-duplex often corrupts the NIC state so that even ping doesn't
> work.  I got tired of that and fixed the missing stopping:
>
> XX Index: iflib.c
> XX ===
> XX --- iflib.c  (revision 332488)
> XX +++ iflib.c  (working copy)
> XX @@ -2232,7 +2234,7 @@
> XX 
> XX  	CTX_LOCK(ctx);
> XX  	if ((err = IFDI_MEDIA_CHANGE(ctx)) == 0)
> XX -		iflib_init_locked(ctx);
> XX +		iflib_if_init_locked(ctx);
> XX  	CTX_UNLOCK(ctx);
> XX  	return (err);
> XX  }
>
> The fix works perfectly.  Now it is safe to change the media on the.  The

On the client.

> null change from autoselect to 1000baseT full-duplex on the client now
> doesn't corrupt the state or change the nfs or ping speeds.  Changing
> the media to 100baseTX full-duplex on the client gives much the same
> misbehaviour as changing the media on the server similarly (not quite
> so bad).  But changing the media to 100baseTX full-duplex on both gives
> much worse behaviour.  Sometimes it causes the frame error reported
> by my previous patch.  Clearly there is a protocol mismatch.


This patch fixes the severe corruption of the state reported by the
"desc avail = 1024, pidx = 0" message for the case where the corruption
is from missing stopping for media changes.  I see that you reported
this corruption in this PR for rxcsum toggling and in another PR for
lro toggling.  These operations work right for me, at least with the
above patch, but the patch doesn't affect flags changes and stopping
seems to be done correctly for flags changes.
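
A toy model of why the missing stop matters, as a hedged sketch only.
Everything here (struct nic_model and the model_* functions) is
hypothetical and stands in for the real iflib state; the one assumption
it encodes is that resetting the ring indexes while the device is still
running leaves the state inconsistent, which is what the patch avoids
by stopping first.

```c
#include <stdbool.h>

/* Hypothetical model of a NIC context; field names are illustrative. */
struct nic_model {
	bool running;	/* device is actively using its rings */
	bool corrupt;	/* rings were reset under a running device */
	unsigned pidx;	/* producer index into the TX ring */
};

/* Quiesce the device so that its rings may be reset safely. */
static void
model_stop(struct nic_model *n)
{
	n->running = false;
}

/* Reset the ring state and start the device. */
static void
model_init(struct nic_model *n)
{
	if (n->running)
		n->corrupt = true;	/* assumption: reset under load is unsafe */
	n->pidx = 0;
	n->running = true;
}

/* Media change as before the patch: init without stopping. */
static void
model_media_change_buggy(struct nic_model *n)
{
	model_init(n);
}

/* Media change as after the patch: stop, then init. */
static void
model_media_change_fixed(struct nic_model *n)
{
	model_stop(n);
	model_init(n);
}
```

In this model a media change on a running interface marks the state
corrupt only on the buggy path; on an interface that is already stopped
both paths behave the same.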

Bruce


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-20 Thread Martin Birgmeier
Hi Bruce,

Thank you for your support.

The machine A with the em0 issue is running at 1 Gbps and acts as NFS
server. The NFS client B has a 100 Mbps interface. B gets a throughput
of only 1 Mbyte/s when talking to A but the full 10 Mbyte/s when talking
to another third machine C. In addition, while B is talking to A, if at
the same time A runs an iperf to C, the situation for B improves (up to
5..7 Mbyte/s).

All machines are connected by a DGS-1210-24 1 Gbps switch.

In the mailing list and FreeBSD bugs I have seen that there are a
multitude of issues with the em driver in FreeBSD 12. It seems that the
switch to iflib has introduced them.

I have also discovered that there is net/intel-em-kmod. What is the
relationship between the driver in the base sources and this one? How
advisable is it to use the driver from ports?

-- Martin

On 20.01.19 13:56, Bruce Evans wrote:
> On Sun, 20 Jan 2019, Martin Birgmeier wrote:
>
>> Regarding duplex, ifconfig shows the following:
>>
>> [0]# ifconfig em0
>> em0: flags=8843 metric 0 mtu
>> 1500
>>    
>> options=81249b
>>
>>     ether f0:de:f1:98:86:a9
>>     inet 192.168.1.19 netmask 0xff00 broadcast 192.168.1.255
>>     inet6 fe80::f2de:f1ff:fe98:86a9%em0 prefixlen 64 scopeid 0x1
>>     inet6 fec0:0:0:4d42::13 prefixlen 64
>>     inet6 fec0::4d42:f2de:f1ff:fe98:86a9 prefixlen 64 autoconf
>>     inet6 2002:bc17:f381:4d42:f2de:f1ff:fe98:86a9 prefixlen 64
>> autoconf
>>     media: Ethernet autoselect (1000baseT )
>>     status: active
>>     nd6 options=23
>> [0]#
>>
>> This seems to be o.k.
>
> The media setting can't be trusted to have reached the hardware -- see my
> previous reply.
>
> But I thought that you said that you were using 100 Mbps (presumably with
> autoselect).  The above shows autoselect giving 1 Gbps.
>
> I checked that iflib_media_change() is not called for autoselect to 1 Gbps
> here.  Also that it fails to stop the NIC if called.  Also that it breaks
> the NIC's state after a few calls in the loop:
>
> while :; do
>     ./ifconfig em0 media 1000baseT mediaopt full-duplex
>     ./ifconfig em0 media autoselect
> done
>
> provided ./ifconfig is on nfs.  This gives null changes disguised as
> non-null changes so that iflib_media_change() is called.
>
> Console output for this:
>
> XX link state changed to down
> XX Link state changed to up
> XX link state changed to down
> XX em0: TX(0) desc avail = 21, pidx = 34
>
> Sometimes the queue indexes are corrupted and this message is printed.
> Sometimes, but never in this output, this message is repeated many times
> before the interface comes back up.  Actually, this doesn't always
> occur between down and up, and when it is repeated the queue state is
> avail = 1024, pidx = 0, and this state seems to be sticky unless ifconfig
> somehow runs to generate another reinitialization.
>
> XX Link state changed to up
> XX link state changed to down
> XX Link state changed to up
> XX link state changed to down
> XX Link state changed to up
> XX link state changed to down
> XX Link state changed to up
> XX link state changed to down
> XX Link state changed to up
> XX em0: TX(0) desc avail = 1, pidx = 30
> XX link state changed to down
> XX Link state changed to up
> XX link state changed to down
> XX Link state changed to up
> XX link state changed to down
> XX Link state changed to up
> XX link state changed to down
> XX Link state changed to up
> XX link state changed to down
> XX Link state changed to up
> XX link state changed to down
> XX em0: TX(0) desc avail = 14, pidx = 33
> XX Link state changed to up
>
> ipv4 ping is broken most of the time while this loop is running.  Of course
> ping should stop responding while the interface is down.  It rarely starts
> when the interface comes back up.  Sometimes it starts with low latency,
> but usually it starts with DUPs.  For about 50 iterations, the only ping
> output was:
>
> XX 64 bytes from 192.168.2.8: icmp_seq=619 ttl=64 time=0.158 ms
> XX 64 bytes from 192.168.2.8: icmp_seq=619 ttl=64 time=3523.305 ms (DUP!)
> XX 64 bytes from 192.168.2.8: icmp_seq=619 ttl=64 time=6696.247 ms (DUP!)
> XX 64 bytes from 192.168.2.8: icmp_seq=619 ttl=64 time=9857.912 ms (DUP!)
> XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=0.094 ms
> XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=4154.124 ms (DUP!)
> XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=7253.986 ms (DUP!)
> XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=10367.938 ms (DUP!)
> XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=13540.805 ms (DUP!)
>
> Bruce


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-20 Thread Bruce Evans

On Sun, 20 Jan 2019, Martin Birgmeier wrote:


Regarding duplex, ifconfig shows the following:

[0]# ifconfig em0
em0: flags=8843 metric 0 mtu 1500
    options=81249b
    ether f0:de:f1:98:86:a9
    inet 192.168.1.19 netmask 0xff00 broadcast 192.168.1.255
    inet6 fe80::f2de:f1ff:fe98:86a9%em0 prefixlen 64 scopeid 0x1
    inet6 fec0:0:0:4d42::13 prefixlen 64
    inet6 fec0::4d42:f2de:f1ff:fe98:86a9 prefixlen 64 autoconf
    inet6 2002:bc17:f381:4d42:f2de:f1ff:fe98:86a9 prefixlen 64 autoconf
    media: Ethernet autoselect (1000baseT )
    status: active
    nd6 options=23
[0]#

This seems to be o.k.


The media setting can't be trusted to have reached the hardware -- see my
previous reply.

But I thought that you said that you were using 100 Mbps (presumably with
autoselect).  The above shows autoselect giving 1 Gbps.

I checked that iflib_media_change() is not called for autoselect to 1 Gbps
here.  Also that it fails to stop the NIC if called.  Also that it breaks
the NIC's state after a few calls in the loop:

while :; do
    ./ifconfig em0 media 1000baseT mediaopt full-duplex
    ./ifconfig em0 media autoselect
done

provided ./ifconfig is on nfs.  This gives null changes disguised as
non-null changes so that iflib_media_change() is called.

Console output for this:

XX link state changed to down
XX Link state changed to up
XX link state changed to down
XX em0: TX(0) desc avail = 21, pidx = 34

Sometimes the queue indexes are corrupted and this message is printed.
Sometimes, but never in this output, this message is repeated many times
before the interface comes back up.  Actually, this doesn't always
occur between down and up, and when it is repeated the queue state is
avail = 1024, pidx = 0, and this state seems to be sticky unless ifconfig
somehow runs to generate another reinitialization.

XX Link state changed to up
XX link state changed to down
XX Link state changed to up
XX link state changed to down
XX Link state changed to up
XX link state changed to down
XX Link state changed to up
XX link state changed to down
XX Link state changed to up
XX em0: TX(0) desc avail = 1, pidx = 30
XX link state changed to down
XX Link state changed to up
XX link state changed to down
XX Link state changed to up
XX link state changed to down
XX Link state changed to up
XX link state changed to down
XX Link state changed to up
XX link state changed to down
XX Link state changed to up
XX link state changed to down
XX em0: TX(0) desc avail = 14, pidx = 33
XX Link state changed to up

ipv4 ping is broken most of the time while this loop is running.  Of course
ping should stop responding while the interface is down.  It rarely starts
when the interface comes back up.  Sometimes it starts with low latency,
but usually it starts with DUPs.  For about 50 iterations, the only ping
output was:

XX 64 bytes from 192.168.2.8: icmp_seq=619 ttl=64 time=0.158 ms
XX 64 bytes from 192.168.2.8: icmp_seq=619 ttl=64 time=3523.305 ms (DUP!)
XX 64 bytes from 192.168.2.8: icmp_seq=619 ttl=64 time=6696.247 ms (DUP!)
XX 64 bytes from 192.168.2.8: icmp_seq=619 ttl=64 time=9857.912 ms (DUP!)
XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=0.094 ms
XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=4154.124 ms (DUP!)
XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=7253.986 ms (DUP!)
XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=10367.938 ms (DUP!)
XX 64 bytes from 192.168.2.8: icmp_seq=728 ttl=64 time=13540.805 ms (DUP!)
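
For sifting through a long ping log like the one above, a small parser
can pull out the sequence number, the round-trip time and the DUP
marker from each line.  This is only a sketch: struct ping_reply and
parse_ping_line() are hypothetical names, and the line format is
assumed to be exactly the one shown in the output above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One parsed reply line; a hypothetical helper type. */
struct ping_reply {
	int seq;	/* icmp_seq value */
	double ms;	/* round-trip time in milliseconds */
	bool dup;	/* line carried the "(DUP!)" marker */
};

/*
 * Parse a line such as
 *   "64 bytes from 192.168.2.8: icmp_seq=619 ttl=64 time=3523.305 ms (DUP!)"
 * Any prefix before "icmp_seq=" (including the "XX " quoting used in
 * this thread) is ignored.  Returns false if the line does not match.
 */
static bool
parse_ping_line(const char *line, struct ping_reply *out)
{
	const char *p = strstr(line, "icmp_seq=");

	if (p == NULL)
		return (false);
	/* The ttl field is matched but not stored. */
	if (sscanf(p, "icmp_seq=%d ttl=%*d time=%lf", &out->seq, &out->ms) != 2)
		return (false);
	out->dup = strstr(line, "(DUP!)") != NULL;
	return (true);
}
```

Grouping the parsed replies by seq then shows at a glance how many
duplicates each echo request produced and how far apart they arrived.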

Bruce


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-20 Thread Bruce Evans

On Sun, 20 Jan 2019, Martin Birgmeier wrote:


I am not using resume at all... just normal startup/shutdown.


You might be using media change, which has the same bug as resume had.

Bruce


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-20 Thread Bruce Evans

On Sun, 20 Jan 2019, Martin Birgmeier wrote:


The machine A with the em0 issue is running at 1 Gbps and acts as NFS
server. The NFS client B has a 100 Mbps interface. B gets a throughput
of only 1 Mbyte/s when talking to A but the full 10 Mbyte/s when talking
to another third machine C. In addition, while B is talking to A, if at
the same time A runs an iperf to C, the situation for B improves (up to
5..7 Mbyte/s).

All machines are connected by a DGS-1210-24 1 Gbps switch.


I see.  I get worse misbehaviour (nfs write speed 24 KB/s for 512-blocks
instead of 1 MB/s) after changing the media of the bge NIC on my server
to 1000base full-duplex (where the switch is a cheap TP-Link 1 Gbps).
ping remains fast.  Concurrent ping doesn't improve nfs.  For the em0
NIC on my client, even the null change from autoselect to 1000baseT
full-duplex often corrupts the NIC state so that even ping doesn't
work.  I got tired of that and fixed the missing stopping:

XX Index: iflib.c
XX ===
XX --- iflib.c  (revision 332488)
XX +++ iflib.c  (working copy)
XX @@ -2232,7 +2234,7 @@
XX 
XX  	CTX_LOCK(ctx);
XX  	if ((err = IFDI_MEDIA_CHANGE(ctx)) == 0)
XX -		iflib_init_locked(ctx);
XX +		iflib_if_init_locked(ctx);
XX  	CTX_UNLOCK(ctx);
XX  	return (err);
XX  }

The fix works perfectly.  Now it is safe to change the media on the
client.  The null change from autoselect to 1000baseT full-duplex on the
client now doesn't corrupt the state or change the nfs or ping speeds.
Changing the media to 100baseTX full-duplex on the client gives much the
same misbehaviour as changing the media on the server similarly (not
quite so bad).  But changing the media to 100baseTX full-duplex on both
gives much worse behaviour.  Sometimes it causes the frame error reported
by my previous patch.  Clearly there is a protocol mismatch.

This problem occurs often.  I don't know how it can occur when there
is a switch.  The switch should translate to 1000 Mbps for the em0 side.

I don't really understand this, but have a lot of code in mii/e1000phy.c
related to it, and once tested this with all combinations of speeds and
duplexes.  e1000phy.c has nothing to do with Intel e1000, but is for an
old Marvell phy.  I have one on an sk NIC, and it stopped working at
1 Gbps on cold days.  The simplest fix was to set the speed manually,
but this gave problems like the above, and gives an unnecessarily low
speed on warm days.  At least my version of e1000phy.c or sk has some
link flags which give more control over this.

Half-duplex on both sides works!  The old version of bge on the server
doesn't support mediaopt half-duplex, but seems to default to that and
ifconfig prints nothing for the duplex.  -current em0 supports it.
Working means that the nfs write speed is about 9 MB/s.  Half-duplex
is of course slightly slower than full-duplex.  Similarly for 10baseT/UTP.

I found my old tables of working combinations of duplexes and autoselects
for bge <-> switch <-> sk and bge <-> sk.  The switch affects the working
combinations.  The tables are cryptic, but seem to be as follows:

switch case:
bge  sk   success
---  --   -------
A    A    n/a (handling of the sk bug gives a fuzzy auto speed)
1    A    n/a
1F   A    n/a
1    1    OK (as above)
1F   1    fail
1    1F   OK! (1F -> 1)
1F   1F   fail! (as above)
A    1    OK (A -> 1)
A    1F   OK (A -> 1F)

direct case:
bge  sk   success
---  --   -------
A    A    n/a (handling of the sk bug gives a fuzzy auto speed)
1    A    OK (A -> 1)
1F   A    partial success (giving half-duplex!?)
1    1    OK (as above)
1F   1    fail
1    1F   fail (as expected, but different from switch case!)
1F   1F   fail! (as above)
A    1    OK (A -> 1)
A    1F   OK (A -> 1F)

Here 1 means a speed of 1000 Mbps or possibly 100 Mbps, A means autoselect,
F means full duplex, and the absence of F means half-duplex or nothing.

A for both should work and is normally used, and the only really weird case
is 1F for both not working.
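
The two tables can also be kept as data, which makes it easy to ask
which combinations behave differently with and without the switch.
This is a sketch only: the enum, struct and function names are made up
for illustration, and the entries are copied from the tables above
without any further testing.

```c
#include <stddef.h>
#include <string.h>

enum outcome { OUT_UNKNOWN, OUT_NA, OUT_OK, OUT_FAIL, OUT_PARTIAL };

/* One row: media setting on each side ("A", "1" or "1F") and the result. */
struct combo {
	const char *bge;
	const char *sk;
	enum outcome result;
};

/* bge <-> switch <-> sk, as listed in the first table. */
static const struct combo switch_case[] = {
	{ "A", "A", OUT_NA },    { "1", "A", OUT_NA },   { "1F", "A", OUT_NA },
	{ "1", "1", OUT_OK },    { "1F", "1", OUT_FAIL }, { "1", "1F", OUT_OK },
	{ "1F", "1F", OUT_FAIL }, { "A", "1", OUT_OK },   { "A", "1F", OUT_OK },
};

/* bge <-> sk directly, as listed in the second table. */
static const struct combo direct_case[] = {
	{ "A", "A", OUT_NA },    { "1", "A", OUT_OK },   { "1F", "A", OUT_PARTIAL },
	{ "1", "1", OUT_OK },    { "1F", "1", OUT_FAIL }, { "1", "1F", OUT_FAIL },
	{ "1F", "1F", OUT_FAIL }, { "A", "1", OUT_OK },   { "A", "1F", OUT_OK },
};

#define NCOMBOS (sizeof(switch_case) / sizeof(switch_case[0]))

/* Find the recorded outcome for one pair of settings. */
static enum outcome
lookup(const struct combo *tbl, size_t n, const char *bge, const char *sk)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(tbl[i].bge, bge) == 0 && strcmp(tbl[i].sk, sk) == 0)
			return (tbl[i].result);
	return (OUT_UNKNOWN);
}

/* Count the setting pairs whose outcome depends on the switch. */
static size_t
count_differences(void)
{
	size_t i, diff = 0;

	for (i = 0; i < NCOMBOS; i++)
		if (lookup(direct_case, NCOMBOS, switch_case[i].bge,
		    switch_case[i].sk) != switch_case[i].result)
			diff++;
	return (diff);
}
```

Going by the tables as transcribed, the rows that differ are bge=1 with
sk=A, bge=1F with sk=A, and bge=1 with sk=1F.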


...
I have also discovered that there is net/intel-em-kmod. What is the
relationship between the driver in the base sources and this one? How
advisable is it to use the driver from ports?


I don't know about that.  I guess Intel still does some development,
especially for newer chipsets.

Bruce


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-20 Thread Martin Birgmeier
Regarding duplex, ifconfig shows the following:

[0]# ifconfig em0
em0: flags=8843 metric 0 mtu 1500
   
options=81249b
    ether f0:de:f1:98:86:a9
    inet 192.168.1.19 netmask 0xff00 broadcast 192.168.1.255
    inet6 fe80::f2de:f1ff:fe98:86a9%em0 prefixlen 64 scopeid 0x1
    inet6 fec0:0:0:4d42::13 prefixlen 64
    inet6 fec0::4d42:f2de:f1ff:fe98:86a9 prefixlen 64 autoconf
    inet6 2002:bc17:f381:4d42:f2de:f1ff:fe98:86a9 prefixlen 64 autoconf
    media: Ethernet autoselect (1000baseT )
    status: active
    nd6 options=23
[0]#

This seems to be o.k.

-- Martin

On 20.01.19 06:28, Bruce Evans wrote:
> On Sat, 19 Jan 2019, Martin Birgmeier wrote:
>
>> I just tried the patch by Bruce (from the mail sent 10 hours ago), but
>> it makes no difference.
>>
>> Also, it does not seem like bad frames or too high an interrupt rate are
>> the problem (the machine should easily handle what is coming from its
>> NFS client which only has a 100 Mbps interface).
>>
>> I believe that the simplifications introduced to sys/dev/e1000 between
>> 11.2 and 12.0 have broken something.
>
> They aren't exactly simplifications :-).
>
> Did you check for the common problem of a duplex mismatch?  ISTR that some
> versions of iflib'ed em didn't negotiate right for your speed of 100 Mbps.
>
> Here I can break nfs using "ifconfig em0 media 100baseTX mediaopt
> full-duplex" and forgetting the mediaopt part.  This gives half-duplex.
> ipv4 ping still works, but its latency increases from ~125 usec to ~76
> msec.  The latter latency destroys nfs performance.  After the media
> change, there are a lot of DUP packets with an initial latency of ~43
> seconds and the latency decreasing by the ping interval of 1 second for
> the next 42 or 43 DUPs until the backlog is cleared; the latency is
> then between 71 and 80 msec.  Changing the media and mediaopt back
> to 1000baseT[X] full-duplex restores low latency but causes 1 DUP with
> delay ~19 seconds.
>
> Suspend/resume used to give much the same misbehaviour, by not stopping
> the NIC when reinitializing it in resume.  This was fixed in r342855.
> This might be the bug!  iflib_media_change() calls iflib_init_locked()
> like resume used to, so seems to be missing stopping.  Changing this
> should fix at least the DUPs.
>
> The function names or layering are confusing.  iflib_init_locked()
> doesn't initialize the if.  iflib_if_init_locked() does that.  All
> iflib_if_init_locked() does is call iflib_stop(), then iflib_init_locked().
> Grep shows the following related iflib*init*() calls:
> - iflib_netmap_register manually inlines iflib_if_init_locked().  This
>   is a style bug
> - iflib_media_change() only calls iflib_init_locked().  This seems to be
>   a bug
> - _task_fn_admin() calls iflib_if_init_locked() for resetting.  This seems
>   to be correctly obfuscated
> - iflib_if_init_locked() calls iflib_init_locked().  This is part of
>   implementing the obfuscation
> - iflib_if_init() calls iflib_if_init_locked().  This is correct
> - iflib_if_ioctl(): SIOCSIFMTU calls iflib_stop(), then does some locking,
>   then sets the mtu in software, then calls iflib_init_locked().  This
>   seems to be correct, and shows that the iflib_if_init_locked() is not
>   even generally useful.  This gives down/up for non-null changes.  This
>   works correctly (some ping packets are lost, but there are no DUPs).
> - iflib_if_ioctl(): SIOCSIFCAP is like SIOCSIFMTU, except I didn't test
>   it and its splitting of stopping and init'ing is a bit messier because
>   both operations are under a more complicated conditional.
> - iflib_if_ioctl(): calls iflib_if_init().  This
>   is correct.
> - iflib_vlan_[un]register() call iflib_if_init_locked().  This seems to be
>   correctly obfuscated
> - iflib_device_resume() calls iflib_if_init_locked().  This is correctly
>   obfuscated
> - if_setinitfn() is called to set iflib_if_init as the init function.  This
>   is correct.
> Summary: only media change seems to be broken, but there are some
> style bugs.
>
> The bug apparently broke resume by reinitializing an active state
> (even locking doesn't help much, but I now remember that resume
> succeeded every 10-100 tries in the buggy versions -- there were always
> a lot of DUPs, but sometimes the low latency came back).  My tests
> usually used zzz and my zzz and other utilities are on nfs, so nfs was
> fairly active just before suspend.
>
> I don't know if iflib_media_change() is called at boot time, especially
> if the media is autoselect.  At boot time, the state might be less
> active or closer to the reset state, so that even a manual media change
> that surely calls iflib_media_change() has more chance of working than
> at resume time with zzz and other utilities on nfs.
>
> I don't know what the media was after the broken resume.  Its reported
> result can't be trusted anyway.  To recover from the broken resume, it
> usually worked to repeat 

Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-20 Thread Martin Birgmeier
I am not using resume at all... just normal startup/shutdown.

-- Martin

On 20.01.19 07:19, Bruce Evans wrote:
> On Sun, 20 Jan 2019, Bruce Evans wrote:
>
>> [iflib_media_change() is missing iflib_stop(), like iflib_resume() was]
>>
>> I don't know what the media was after the broken resume.  Its reported
>> result can't be trusted anyway.  To recover from the broken resume, it
>> usually worked to repeat down/up a few times.  This is consistent with the
>> bug -- eventually, previous down/up's change the state to close enough
>> to stopped.  But using the interface in any way (including pinging it
>> to see if it is still broken) makes it not so close to being stopped.
>
> Further debugging after restoring the bug in resume:
> - I use mainly zzz to suspend
> - the bug usually doesn't break the interface if I copy zzz from nfs to
>   non-nfs and use the copy.  This explains why almost no one except me
>   noticed the bug -- zzz is usually not on nfs, and other nfs activity
>   is usually lighter than mine too.  (Suspend apparently doesn't do enough
>   stopping or syncing generally.  It should fsync() all files ...)
> - the bug usually does break the interface if zzz is on nfs
> - when the bug breaks the interface:
>   - the media is reported as unchanged
>   - after DUPs starting with a delay of many seconds and reducing by the
>     ping interval of 1 second for each until the delay is less than 1
>     second, the ping latency stabilizes at quite different values after
>     each suspend/resume.  These values tend to be higher than for media
>     change (several hundred ms instead of 76 ms).
>   - my ifconfig executable is one of several under /sbin which is not on nfs,
>     but my ifconfig is actually a shell script in $HOME/bin; the script
>     selects the correct version of ifconfig for the current kernel; it is
>     on nfs, and uses utilities on nfs.  I sometimes forget this, and then
>     running plain ifconfig to attempt to recover takes too long, and if I
>     wait then the nfs activity for finding ifconfig not on nfs tends to
>     propagate the broken interface (like zzz not on nfs breaks it).
>     Manually selecting the correct version of ifconfig under /sbin and using
>     it tends to work right (like zzz not on nfs).
>   - even an mtu change is enough to recover.  This is not surprising, since
>     it does slightly more than down/up as an implementation detail.  This
>     shows that the reported media value is at least used by the reinit for
>     the mtu change.
>   - pinging the interface didn't make it active enough for the recovery to
>     not usually work.
>
> Bruce


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-20 Thread Bruce Evans

On Sun, 20 Jan 2019, Bruce Evans wrote:


[iflib_media_change() is missing iflib_stop(), like iflib_resume() was]

I don't know what the media was after the broken resume.  Its reported
result can't be trusted anyway.  To recover from the broken resume, it
usually worked to repeat down/up a few times.  This is consistent with the
bug -- eventually, previous down/up's change the state to close enough
to stopped.  But using the interface in any way (including pinging it
to see if it is still broken) makes it not so close to being stopped.


Further debugging after restoring the bug in resume:
- I use mainly zzz to suspend
- the bug usually doesn't break the interface if I copy zzz from nfs to
  non-nfs and use the copy.  This explains why almost no one except me
  noticed the bug -- zzz is usually not on nfs, and other nfs activity
  is usually lighter than mine too.  (Suspend apparently doesn't do enough
  stopping or syncing generally.  It should fsync() all files ...)
- the bug usually does break the interface if zzz is on nfs
- when the bug breaks the interface:
  - the media is reported as unchanged
  - after DUPs starting with a delay of many seconds and reducing by the
    ping interval of 1 second for each until the delay is less than 1
    second, the ping latency stabilizes at quite different values after
    each suspend/resume.  These values tend to be higher than for media
    change (several hundred ms instead of 76 ms).
  - my ifconfig executable is one of several under /sbin which is not on nfs,
    but my ifconfig is actually a shell script in $HOME/bin; the script
    selects the correct version of ifconfig for the current kernel; it is
    on nfs, and uses utilities on nfs.  I sometimes forget this, and then
    running plain ifconfig to attempt to recover takes too long, and if I
    wait then the nfs activity for finding ifconfig not on nfs tends to
    propagate the broken interface (like zzz not on nfs breaks it).
    Manually selecting the correct version of ifconfig under /sbin and using
    it tends to work right (like zzz not on nfs).
  - even an mtu change is enough to recover.  This is not surprising, since
    it does slightly more than down/up as an implementation detail.  This
    shows that the reported media value is at least used by the reinit for
    the mtu change.
  - pinging the interface didn't make it active enough for the recovery to
    not usually work.

Bruce


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-20 Thread Bruce Evans

On Sat, 19 Jan 2019, Martin Birgmeier wrote:


I just tried the patch by Bruce (from the mail sent 10 hours ago), but
it makes no difference.

Also, it does not seem like bad frames or too high an interrupt rate are
the problem (the machine should easily handle what is coming from its
NFS client which only has a 100 Mbps interface).

I believe that the simplifications introduced to sys/dev/e1000 between
11.2 and 12.0 have broken something.


They aren't exactly simplifications :-).

Did you check for the common problem of a duplex mismatch?  ISTR that some
versions of iflib'ed em didn't negotiate right for your speed of 100 Mbps.

Here I can break nfs using "ifconfig em0 media 100baseTX mediaopt
full-duplex" and forgetting the mediaopt part.  This gives half-duplex.
ipv4 ping still works, but its latency increases from ~125 usec to ~76
msec.  The latter latency destroys nfs performance.  After the media
change, there are a lot of DUP packets with an initial latency of ~43
seconds and the latency decreasing by the ping interval of 1 second for
the next 42 or 43 DUPs until the backlog is cleared; the latency is
then between 71 and 80 msec.  Changing the media and mediaopt back
to 1000baseT[X] full-duplex restores low latency but causes 1 DUP with
delay ~19 seconds.
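
The "42 or 43 DUPs" figure follows from simple arithmetic: if each
successive reply is one ping interval less late than the one before,
the backlog lasts for as many replies as whole intervals fit into the
initial delay.  A sketch, with a made-up function name and integer
milliseconds to keep the arithmetic exact:

```c
/*
 * Number of replies whose delay still shrinks by a full interval,
 * i.e. how many whole ping intervals fit into the initial backlog.
 * Written as a loop to mirror the description; it is equivalent to
 * initial_ms / interval_ms.
 */
static unsigned
dups_until_cleared(unsigned initial_ms, unsigned interval_ms)
{
	unsigned n = 0;

	if (interval_ms == 0)
		return (0);	/* no progress possible; avoid looping forever */
	while (initial_ms >= interval_ms) {
		initial_ms -= interval_ms;
		n++;
	}
	return (n);
}
```

An initial delay of exactly 43 seconds gives 43 steps, and anything
from 42 up to just under 43 seconds gives 42, which matches the
observation for a delay of about 43 seconds.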

Suspend/resume used to give much the same misbehaviour, by not stopping
the NIC when reinitializing it in resume.  This was fixed in r342855.
This might be the bug!  iflib_media_change() calls iflib_init_locked()
like resume used to, so seems to be missing stopping.  Changing this
should fix at least the DUPs.

The function names or layering are confusing.  iflib_init_locked()
doesn't initialize the if.  iflib_if_init_locked() does that.  All
iflib_init_locked() does is call iflib_stop(), then iflib_init_locked().
and iflib.  Grep shows the following related iflib*init*() calls:
- iflib_netmap_register manually inlines iflib_if_init_locked().  This
  is a style bug
- iflib_media_change() only calls iflib_init_locked().  This seems to be
  a bug
- _task_fn_admin() calls iflib_if_init_locked() for resetting.  This seems
  to be correctly obfuscated
- iflib_if_init_locked() calls iflib_init_locked().  This is part of
  implementing the obfuscation 
- iflib_if_init() calls iflib_if_init_locked().  This is correct

- iflib_if_ioctl(): SIOCSIFMTU calls iflib_stop(), then does some locking,
  then sets the mtu in software, then calls iflib_init_locked().  This
  seems to be correct, and shows that the iflib_if_init_locked() is not
  even generally useful.  This gives down/up for non-null changes.  This
  works correctly (some ping packets are lost, but there are no DUPs).
- iflib_if_ioctl(): SIOCSIFCAP is like SIOCSIFMTU, except I didn't test
  it and its splitting of stopping and init'ing is a bit messier because
  both operations are under a more complicated conditional.
- iflib_if_ioctl(): calls iflib_if_init().  This
  is correct.
- iflib_vlan_[un]register() call iflib_if_init_locked().  This seems to be
  correctly obfuscated
- iflib_device_resume() calls iflib_if_init_locked().  This is correctly
  obfuscated
- if_setinitfn() is called to set iflib_if_init as the init function.  This
  is correct.

Summary: only media change seems to be broken, but there are some style bugs.
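
As a rough illustration of the layering described above, here is a small
userland model.  It is only a sketch: struct fake_ctx and the fake_*()
functions are hypothetical stand-ins for the iflib context and for
iflib_stop(), iflib_init_locked() and iflib_if_init_locked(); none of this
is the real iflib code.  It shows why a media change that calls only the
init step reinitializes a running interface, while going through the
stop-then-init wrapper does not.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the iflib context; not the real if_ctx_t. */
struct fake_ctx {
	bool running;		/* NIC is active (rings in use) */
	int  dirty_inits;	/* inits performed while still running */
};

/* Models iflib_stop(): quiesce the NIC. */
static void
fake_stop(struct fake_ctx *ctx)
{
	ctx->running = false;
}

/* Models iflib_init_locked(): (re)initialize, assuming a stopped NIC. */
static void
fake_init_locked(struct fake_ctx *ctx)
{
	if (ctx->running)
		ctx->dirty_inits++;	/* reinitializing an active state */
	ctx->running = true;
}

/* Models iflib_if_init_locked(): stop, then init. */
static void
fake_if_init_locked(struct fake_ctx *ctx)
{
	fake_stop(ctx);
	fake_init_locked(ctx);
}

/* Media change as described above: init without stopping first. */
static void
fake_media_change_current(struct fake_ctx *ctx)
{
	fake_init_locked(ctx);
}

/* One possible fix (an assumption): use the stop-then-init wrapper. */
static void
fake_media_change_fixed(struct fake_ctx *ctx)
{
	fake_if_init_locked(ctx);
}
```

Starting from a running interface, fake_media_change_current() records one
init on an active state, and fake_media_change_fixed() records none.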

The bug apparently broke resume by reinitializing an active state
(even locking doesn't help much, but I now remember that resume
succeeded every 10-100 tries in the buggy versions -- there were always
a lot of DUPs, but sometimes the low latency came back).  My tests
usually used zzz and my zzz and other utilities are on nfs, so nfs was
fairly active just before suspend.

I don't know if iflib_media_change() is called at boot time, especially
if the media is autoselect.  At boot time, the state might be less
active or closer to the reset state, so that even a manual media change
that surely calls iflib_media_change() has more chance of working than
at resume time with zzz and other utilities on nfs.

I don't know what the media was after the broken resume.  Its reported
result can't be trusted anyway.  To recover from the broken resume, it
usually worked to repeat down/up a few times.  This is consistent with
the bug -- eventually, previous down/up's change the state to close enough
to stopped.  But using the interface in any way (including pinging it
to see if it is still broken) makes it not so close to being stopped.

Bruce
___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-19 Thread Martin Birgmeier
I just tried the patch by Bruce (from the mail sent 10 hours ago), but
it makes no difference.

Also, it does not seem like bad frames or too high an interrupt rate are
the problem (the machine should easily handle what is coming from its
NFS client which only has a 100 Mbps interface).

I believe that the simplifications introduced to sys/dev/e1000 between
11.2 and 12.0 have broken something.

-- Martin



Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-19 Thread Bruce Evans

On Sun, 20 Jan 2019, Eugene Grosbein wrote:


19.01.2019 17:21, Bruce Evans wrote:


Your problem looks more like lost interrupts.  All em NICs should interrupt
at the default interrupt moderation rate of 8 kHz under load.  Once there
are that many interrupts, there is not much else that can go wrong (nfs
would have to be working to generate that many interrupts).


I have a patch (in production since 8.x) that makes em(4) support 
hw.em.max_interrupt_rate
just like igb(4) supports hw.igb.max_interrupt_rate:

http://www.grosbein.net/freebsd/patches/em_sysctl-11.0.diff.gz

It also brings in sysctls dev.em.X.max_interrupt_rate and 
hw.em.max_interrupt_rate sets defaults for them.


This is inverted and spelled dev.em.X.itr for em.

Hmm, em already has this, but it is only a read-only tunable.

igb seems to have gone away.  In FreeBSD-11, its dev.em.X.max_interrupt_rate
is also only a tunable.

I use variants of the following fix for itr in FreeBSD-[7-13]:

XX Index: if_em.c
XX ===
XX --- if_em.c  (revision 332488)
XX +++ if_em.c  (working copy)
XX @@ -908,10 +910,10 @@
XX  E1000_REGISTER(hw, E1000_TADV),
XX  em_tx_abs_int_delay_dflt);
XX  em_add_int_delay_sysctl(adapter, "itr",
XX -"interrupt delay limit in usecs/4",
XX +"interrupt delay limit in usecs",
XX  &adapter->tx_itr,
XX  E1000_REGISTER(hw, E1000_ITR),
XX -DEFAULT_ITR);
XX +1000000 / MAX_INTS_PER_SEC);
XX 
XX  	hw->mac.autoneg = DO_AUTO_NEG;

XX  hw->phy.autoneg_wait_to_complete = FALSE;

This fixes the description and the initial value for the sysctl to match
the code.  The description almost matches the buggy initial value.  The
hardware has power of 2 units, but the code scales to microseconds.  Except
the initial value was in hardware units scaled by another power of 2
which made them nearly microseconds/4.  The code sets the initial value to
a representation of 125 usec (8 kHz), but the sysctl says that the initial
value is 488 and the description says that this is a representation of
488/4 = 122 usec.  However, writing back this value using sysctl gives
488 usec (~2 kHz).  The magic number 122 is 125 mis-scaled by 1000/1024.
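
The numbers in the paragraph above can be checked with a few lines of
integer arithmetic.  This is only a sketch under stated assumptions: the
256 ns hardware unit and the formula used for the old initial value
(10^9 / (MAX_INTS_PER_SEC * 256)) are assumptions chosen to reproduce the
488 quoted above, and the helper names are made up for illustration.

```c
#define MAX_INTS_PER_SEC	8000
#define ITR_UNIT_NS		256	/* assumed hardware unit for ITR */

/* Old initial value: the interval in hardware units (integer division). */
static long
itr_default_hw_units(void)
{
	return (1000000000L / ((long)MAX_INTS_PER_SEC * ITR_UNIT_NS));
}

/* Interval in microseconds for a given interrupt rate. */
static long
itr_usec_for_rate(long ints_per_sec)
{
	return (1000000L / ints_per_sec);
}

/* Interrupt rate in Hz for a given interval in microseconds. */
static long
itr_rate_for_usec(long usec)
{
	return (1000000L / usec);
}

/* The same interval mis-scaled by 1000/1024. */
static long
itr_misscaled_usec(long usec)
{
	return (usec * 1000 / 1024);
}
```

With these definitions, itr_default_hw_units() is 488 and 488 / 4 is 122;
itr_usec_for_rate(8000) is 125; itr_rate_for_usec(488) is 2049 (~2 kHz);
and itr_misscaled_usec(125) is 122.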

FreeBSD[7-10] have lem in a separate file with the bug duplicated, so
need the patch duplicated.  FreeBSD[7-8] don't have a sysctl for this.
They default to 125 usec and there is no way to see or change the value.
I usually want the smaller value of 0, and hard-code this when there is
no sysctl.

DEFAULT_ITR is used mainly to obfuscate this.  IGB_DEFAULT_ITR and
IGB_LINK_ITR are also defined, but are not used even in versions of FreeBSD
that have igb.


I use hw.em.max_interrupt_rate=32000 for a 1 Gbps link passing average-sized
packets (about 600 bytes per packet on average), but the driver's default of
8000 should be nearly fine for full-size packets (1500 bytes or above), and
this 8000 limit cannot be the reason for such low throughput.


0 for itr maxes out at about 100 kHz here.  This is good for low latency with
small packets.

My version of bge dynamically modifies the rate to match the rx load (no
moderation for light loads).  tx is handled specially and only needs 1
interrupt every few seconds for freeing resources.

Bruce


Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-19 Thread Eugene Grosbein
19.01.2019 17:21, Bruce Evans wrote:

> Your problem looks more like lost interrupts.  All em NICs should interrupt
> at the default interrupt moderation rate of 8 kHz under load.  Once there
> are that many interrupts, there is not much else that can go wrong (nfs
> would have to be working to generate that many interrupts).

I have a patch (in production since 8.x) that makes em(4) support 
hw.em.max_interrupt_rate
just like igb(4) supports hw.igb.max_interrupt_rate:

http://www.grosbein.net/freebsd/patches/em_sysctl-11.0.diff.gz

It also brings in sysctls dev.em.X.max_interrupt_rate and 
hw.em.max_interrupt_rate sets defaults for them.

I use hw.em.max_interrupt_rate=32000 for a 1 Gbps link passing average-sized
packets (about 600 bytes per packet on average), but the driver's default of
8000 should be nearly fine for full-size packets (1500 bytes or above), and
this 8000 limit cannot be the reason for such low throughput.
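
To put rough numbers on this, the following sketch estimates how many
packets one interrupt has to cover at a given moderation rate.  It assumes
a fully loaded 1 Gbps link and ignores Ethernet framing overhead, so the
results are upper bounds rather than measurements; the function names are
made up for illustration.

```c
/* Packets per second on a saturated link, ignoring framing overhead. */
static long
pkts_per_sec(long link_bps, long pkt_bytes)
{
	return (link_bps / (pkt_bytes * 8));
}

/* Average packets handled per interrupt at a given interrupt rate. */
static long
pkts_per_intr(long link_bps, long pkt_bytes, long ints_per_sec)
{
	return (pkts_per_sec(link_bps, pkt_bytes) / ints_per_sec);
}
```

For 600-byte packets this gives about 208333 packets/s, or about 6 packets
per interrupt at 32000 interrupts/s.  For 1500-byte packets it gives about
83333 packets/s, or about 10 packets per interrupt at 8000 interrupts/s.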

> Bugs in iflib are easy to avoid by running FreeBSD-11.  PRO-1000 is supported
> by most versions of FreeBSD and doesn't have the bug fixed by the above in
> FreeBSD[7-11].

Agreed.



Re: [Bug 235031] [em] em0: poor NFS performance, strange behavior

2019-01-19 Thread Bruce Evans

On Fri, 18 Jan 2019 a bug that doesn't want repl...@freebsd.org wrote:


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235031

Yes; I just thought it was going to help and wanted to make it permanent right
away. Bad idea.

In the meantime:

[0]# cat /var/db/ntpd.drift
-6.596
[0]#

What can you get from the ntp drift?


I doubt that anything can be got from the ntp drift.  Maybe watching
it for several hours would show that it is wild, but wildness shouldn't
affect nfs throughput much.

I use a couple of fixes for iflib and em, but only the following one is
related to nfs on PRO-1000:

XX Index: em_txrx.c
XX ===
XX --- em_txrx.c(revision 343087)
XX +++ em_txrx.c(working copy)
XX @@ -634,9 +634,20 @@
XX 
XX  		/* Make sure bad packets are discarded */

XX  if (errors & E1000_RXD_ERR_FRAME_ERR_MASK) {
XX +#if 0
XX  adapter->dropped_pkts++;
XX -/* XXX fixup if common */
XX  return (EBADMSG);
XX +#else
XX +/*
XX + * XXX the above error handling is worse than none.
XX + * First it drops 'i' packets before the current
XX + * one and doesn't count them.  Then it returns an
XX + * error.  iflib can't really handle this error.
XX + * It just resets, and this usually drops many more
XX + * packets (without counting them) and much time.
XX + */
XX +printf("lem: frame error: ignored\n");
XX +#endif
XX  }
XX 
XX  		ri->iri_frags[i].irf_flid = 0;

XX @@ -697,8 +708,12 @@
XX 
XX  		/* Make sure bad packets are discarded */

XX  if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
XX +#if 0
XX  adapter->dropped_pkts++;
XX  return EBADMSG;
XX +#else
XX +printf("em: frame error: ignored\n");
XX +#endif
XX  }
XX 
XX  		ri->iri_frags[i].irf_flid = 0;


On my system, the bug fixed by this only occurs rarely, and only on
PRO-1000 (not on I218-V going through the same low-end network switch),
and has only been observed under moderately heavy nfs use with lots
of small RPCs and not many i/o's.  When it occurs, nfs with unpatched
em takes many seconds to recover, but with the patch nfs barely notices
the error.  I use nfs over UDP since TCP is significantly slower due
to higher latency once the network latency is low enough (here it is
51 usec for old PRO-1000 and 80 usec for I218-V, with about 20 usec
in the switch and a lower latency old bge NIC on the other side).  UDP
gives worse error recovery.
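
The difference between the two error policies discussed in the patch
comment can be modelled in userland.  This is only a sketch: struct
fake_rxd, struct fake_stats and both functions are hypothetical and do not
reflect the e1000 descriptor layout or the iflib interfaces, and the reset
that follows the error return is not modelled.  Note that the patch above
does neither of these: it logs the error and carries on.

```c
#include <stddef.h>

/* Hypothetical, simplified rx descriptor; not the e1000 layout. */
struct fake_rxd {
	int frame_error;	/* nonzero if the hardware flagged an error */
};

struct fake_stats {
	size_t delivered;	/* descriptors passed up the stack */
	size_t dropped;		/* drops that were counted */
	size_t lost;		/* drops that were not counted */
};

/*
 * Abort-on-error policy: the first bad descriptor fails the whole batch,
 * so the good ones before it are lost without being counted.
 * Returns -1 to model the error return.
 */
static int
rx_abort_on_error(const struct fake_rxd *d, size_t n, struct fake_stats *st)
{
	for (size_t i = 0; i < n; i++) {
		if (d[i].frame_error) {
			st->dropped++;
			st->lost += i;
			return (-1);
		}
	}
	st->delivered += n;
	return (0);
}

/* Drop-one policy: skip and count only the bad descriptor. */
static int
rx_drop_one(const struct fake_rxd *d, size_t n, struct fake_stats *st)
{
	for (size_t i = 0; i < n; i++) {
		if (d[i].frame_error)
			st->dropped++;
		else
			st->delivered++;
	}
	return (0);
}
```

For a batch of four descriptors with an error in the third, the abort
policy delivers nothing and loses two good descriptors uncounted, while
the drop-one policy delivers three and counts one drop.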

Your problem looks more like lost interrupts.  All em NICs should interrupt
at the default interrupt moderation rate of 8 kHz under load.  Once there
are that many interrupts, there is not much else that can go wrong (nfs
would have to be working to generate that many interrupts).

Bugs in iflib are easy to avoid by running FreeBSD-11.  PRO-1000 is supported
by most versions of FreeBSD and doesn't have the bug fixed by the above in
FreeBSD[7-11].

Bruce