Hi Yaroslav,

On 01. 02. 19 at 16:31, Yaroslav Petrov wrote:
> Hi Petr,
>  
> I tested it on vgv7510kw22 (100Mbit ports):
>  
> 1. The backported vanilla eth driver patch doesn't work (the interfaces
> stay DOWN) -> no network

Since the OpenWrt version has its own phy interface handling, it is possible 
the backport is not compatible with your device; I think I had to change the 
iteration over the phy OF nodes to fit the two variants together. Anyway, that 
is not the critical section. The critical sections are the xmit, poll and rx 
parts, which are really inefficient in the OpenWrt version.

> 2. The ICU patch gives quite an interesting result: if rx/tx are balanced
> between CPUs, I get c.a. 88 Mbit/sec (max 65% sirq load) and c.a. 92 Mbit/sec
> without balancing (max 50% sirq load):

I have a theory about that. I was fiddling with /proc/irq/72/smp_affinity and 
/proc/irq/73/smp_affinity, and it seems there is a correlation between the SMP 
affinity of the IRQ and the communicating process.

With this test:

host (server, receiving):
        nc -l -p 4321 | pv > /dev/null
lantiq (client, sending):
        cat /dev/zero | nc 10.0.0.1 4321

I can get up to 9.3 MiByte/s in pv when both IRQs are on the same VPE. When 
the IRQs are on different VPEs I get about 8.3 MiByte/s. When I quickly switch 
both IRQs to the other VPE I get about 7.4 MiByte/s for a few seconds until it 
reaches 9.3 MiByte/s again. That is probably the scheduler moving the netcat 
process onto the same VPE as the interrupts.

So it seems there is some overhead between the two VPEs when sending data and 
receiving the TCP ACKs, and ethernet is therefore more efficient with both 
IRQs on the same VPE. Other peripherals could use the same trick for a similar 
speedup: if all ethernet IRQs were on one VPE and all wifi IRQs (+ 
wpa_supplicant) on the other, wpa_supplicant should get faster in a similar 
fashion as netcat does here.

My iperf3 tests (iperf3 -s on lantiq): all 5 patches, vrx200_rx on CPU0, 
vrx200_tx on CPU1, no irqbalance, TD-W9980B (2x 1G native phy, 2x 1G external 
phy; 20 m cat 5e UTP connected to port "lan 2" - on the external phy), rootfs 
over NFS, no wifi, no DSL, no USB, fixed kernel warnings about a full TX FIFO 
in the backported ethernet driver.

iperf3 -c 10.0.0.80
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   354 MBytes   297 Mbits/sec  456             sender
[  4]   0.00-10.00  sec   353 MBytes   296 Mbits/sec                  receiver

iperf3 -c 10.0.0.80 -R
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   115 MBytes  96.5 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   115 MBytes  96.5 Mbits/sec                  receiver

iperf3 -c 10.0.0.80 -u -b 150M
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total 
Datagrams
[  4]   0.00-10.00  sec   177 MBytes   149 Mbits/sec  715.205 ms  789/929 (85%) 
 
[  4] Sent 929 datagrams
... in lantiq console:
[  5]   0.00-10.00  sec  1.09 MBytes   917 Kbits/sec  715.205 ms  789/929 (85%) 
 receiver

iperf3 -c 10.0.0.80 -u -b 150M -R
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total 
Datagrams
[  4]   0.00-10.00  sec   179 MBytes   150 Mbits/sec  0.157 ms  0/22887 (0%)  
[  4] Sent 22887 datagrams
... in lantiq console
[  5]   0.00-10.00  sec   179 MBytes   150 Mbits/sec  0.000 ms  0/22887 (0%)  
sender

iperf3 -c 10.0.0.80 -u -b 1000M
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total 
Datagrams
[  4]   0.00-10.00  sec  1.01 GBytes   865 Mbits/sec  0.284 ms  131600/132024 
(1e+02%)  
[  4] Sent 132024 datagrams
... in lantiq console
[  5]   0.00-10.00  sec  3.31 MBytes  2.78 Mbits/sec  0.284 ms  131600/132024 
(1e+02%)  receiver

iperf3 -c 10.0.0.80 -u -b 1000M -R
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total 
Datagrams
[  4]   0.00-10.00  sec   248 MBytes   208 Mbits/sec  0.022 ms  0/31752 (0%)  
[  4] Sent 31752 datagrams
... in lantiq console
[  5]   0.00-10.00  sec   248 MBytes   208 Mbits/sec  0.000 ms  0/31752 (0%)  
sender

===========================

BTW, before backporting the whole vanilla ethernet driver I came up with some 
optimizations for the OpenWrt version.

static int xrx200_poll_rx(struct napi_struct *napi, int budget)
...
        if (complete || !rx) {
                napi_complete(&ch->napi);
                ltq_dma_enable_irq(&ch->dma);
        }

if changed to:

        if (complete || !rx) {
                if (napi_complete_done(&ch->napi, rx)) {
                        ltq_dma_enable_irq(&ch->dma);
                }
        }

this reduces the IRQ load (the IRQ is only re-enabled after the NAPI work is 
really complete).

Another place at:

static void xrx200_tx_housekeeping(unsigned long ptr)
...
        for (i = 0; i < XRX200_MAX_DEV && ch->devs[i]; i++)
                netif_wake_queue(ch->devs[i]);

every tasklet run tries to wake both queues (the second one wasn't even used 
on my device!). Reducing the driver to a single TX queue increased the TX 
bitrate.

(IMO there is a timeout problem in the vanilla driver: xrx200_start_xmit() 
can stop the queue when all descriptors are filled, but there is no 
corresponding queue wake-up in xrx200_tx_housekeeping().)


best regards
Petr

_______________________________________________
openwrt-devel mailing list
[email protected]
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
