Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

2017-03-23 Thread Joao Pinto
At 10:56 AM on 3/23/2017, Joao Pinto wrote:
> At 10:51 AM on 3/23/2017, Giuseppe CAVALLARO wrote:
>> On 3/23/2017 11:48 AM, Giuseppe CAVALLARO wrote:
>>> Hello
>>>
>>> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>>>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>>>>> Could you please share the iperf commands you are using so that I can
>>>>> reproduce this on my side?
>>>

HW Version: 4.21 QoS Core in HAPS DX7 (FPGA)
The connection between the FPGA and PC where stmmac is running is PCIe.
My configurations are done in stmmac_pci. Here they are:

@@ -68,10 +70,52 @@ static void stmmac_default_data(struct plat_stmmacenet_data *plat)
 {
    plat->bus_id = 1;
    plat->phy_addr = 0;
-   plat->interface = PHY_INTERFACE_MODE_GMII;
-   plat->clk_csr = 2;  /* clk_csr_i = 20-35MHz & MDC = clk_csr_i/16 */
-   plat->has_gmac = 1;
-   plat->force_sf_dma_mode = 1;
+   plat->interface = PHY_INTERFACE_MODE_SGMII;
+   plat->clk_csr = 0x5;
+   plat->has_gmac = 0;
+   plat->has_gmac4 = 1;
+   plat->force_sf_dma_mode = 0;
+
+   plat->rx_queues_to_use = 4;
+   plat->tx_queues_to_use = 4;
+
+   plat->rx_sched_algorithm = MTL_RX_ALGORITHM_SP;
+
+   plat->rx_queues_cfg[0].mode_to_use = MTL_QUEUE_AVB;
+   plat->rx_queues_cfg[1].mode_to_use = MTL_QUEUE_DCB;
+   plat->rx_queues_cfg[2].mode_to_use = MTL_QUEUE_DCB;
+   plat->rx_queues_cfg[3].mode_to_use = MTL_QUEUE_DCB;
+
+   plat->tx_queues_cfg[0].mode_to_use = MTL_QUEUE_DCB;
+   plat->tx_queues_cfg[1].mode_to_use = MTL_QUEUE_AVB;
+   plat->tx_queues_cfg[2].mode_to_use = MTL_QUEUE_DCB;
+   plat->tx_queues_cfg[3].mode_to_use = MTL_QUEUE_DCB;
+
+   plat->tx_queues_cfg[1].send_slope = 0xCCC;
+   plat->tx_queues_cfg[1].idle_slope = 0x1333;
+   plat->tx_queues_cfg[1].high_credit = 0x4B;
+   plat->tx_queues_cfg[1].low_credit = 0xFFB5;
+
+   plat->rx_queues_cfg[0].chan = 0;
+   plat->rx_queues_cfg[1].chan = 1;
+   plat->rx_queues_cfg[2].chan = 2;
+   plat->rx_queues_cfg[3].chan = 3;
+
+   plat->tx_sched_algorithm = MTL_TX_ALGORITHM_WRR;
+   plat->tx_queues_cfg[0].weight = 0x10;
+   plat->tx_queues_cfg[1].weight = 0x11;
+   plat->tx_queues_cfg[2].weight = 0x12;
+   plat->tx_queues_cfg[3].weight = 0x13;
+
+   /* Disable Priority config by default */
+   plat->tx_queues_cfg[0].use_prio = false;
+   plat->rx_queues_cfg[0].use_prio = false;
+
+   /* Disable RX queues routing by default */
+   plat->rx_queues_cfg[0].pkt_route = 0x0;
+   plat->rx_queues_cfg[1].pkt_route = 0x0;
+   plat->rx_queues_cfg[2].pkt_route = 0x0;
+   plat->rx_queues_cfg[3].pkt_route = 0x0;

    plat->mdio_bus_data->phy_reset = NULL;
    plat->mdio_bus_data->phy_mask = 0;
@@ -83,22 +127,14 @@ static void stmmac_default_data(struct plat_stmmacenet_data *plat)
    /* Set default value for multicast hash bins */
    plat->multicast_filter_bins = HASH_TABLE_SIZE;

+   plat->dma_cfg->fixed_burst = 0;
+   plat->dma_cfg->aal = 0;
+
    /* Set default value for unicast filter entries */
    plat->unicast_filter_entries = 1;

    /* Set the maxmtu to a default of JUMBO_LEN */
    plat->maxmtu = JUMBO_LEN;
-
-   /* Set default number of RX and TX queues to use */
-   plat->tx_queues_to_use = 1;
-   plat->rx_queues_to_use = 1;
-
-   /* Disable Priority config by default */
-   plat->tx_queues_cfg[0].use_prio = false;
-   plat->rx_queues_cfg[0].use_prio = false;
-
-   /* Disable RX queues routing by default */
-   plat->rx_queues_cfg[0].pkt_route = 0x0;
 }
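
As a reading aid for the queue 1 shaper values in the diff above, here is a minimal C sketch. It rests on two assumptions that are not stated in this thread: that low_credit is meant as a 16-bit two's-complement value, and that idle_slope and send_slope are expressed on the same scale, so that idle_slope divided by their sum gives the bandwidth share. The helper names are hypothetical and are not part of stmmac.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the diff (tx_queues_cfg[1]). */
#define SEND_SLOPE  0x0CCCu
#define IDLE_SLOPE  0x1333u
#define HIGH_CREDIT 0x004Bu
#define LOW_CREDIT  0xFFB5u

/* Hypothetical helper: interpret a 16-bit field as two's complement. */
static int32_t credit_as_signed16(uint32_t raw)
{
    assert(raw <= 0xFFFFu);  /* only 16-bit inputs are meaningful here */
    return (raw & 0x8000u) ? (int32_t)raw - 0x10000 : (int32_t)raw;
}

/* Hypothetical helper: idle slope as a share of (idle + send), in percent,
 * rounded down. Assumes both slopes use the same unit. */
static unsigned int idle_share_percent(uint32_t idle, uint32_t send)
{
    assert(idle + send != 0);
    return (unsigned int)((idle * 100u) / (idle + send));
}
```

Under those assumptions, credit_as_signed16(LOW_CREDIT) reads as -75, the mirror of high_credit (0x4B = 75), and idle_share_percent(IDLE_SLOPE, SEND_SLOPE) comes out at 60.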


*** TESTS ***


*TEST 1: File (linux-next tarball) transfer of ~1.4G by scp to the DUT*

scp net-next-20170323.tar.gz x@XXX:/home/synopsys/
The authenticity of host 'X' can't be established.
ECDSA key fingerprint is SHA256:/XX.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'XX' (ECDSA) to the list of known hosts.
XX@X's password:
net-next20170323.tar.gz                100% 1366MB  79.3MB/s   00:17

ifconfig after transfer:

eth1  Link encap:Ethernet  HWaddr 
  inet addr:  Bcast:  Mask:
  inet6 addr: X Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1026614 errors:0 dropped:0 overruns:0 frame:0
  TX packets:56804 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:1502856063 (1.5 GB)  TX bytes:4224767 (4.2 MB)
  Interrupt:16
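
A quick sanity check on the counters above: dividing RX bytes by RX packets gives the average received frame size, which for a bulk transfer one would expect to be close to the MTU. This is only a minimal sketch; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average bytes per packet, rounded down. */
static uint64_t avg_bytes_per_packet(uint64_t bytes, uint64_t packets)
{
    assert(packets != 0);  /* avoid division by zero on an idle interface */
    return bytes / packets;
}
```

With the RX counters shown (1502856063 bytes, 1026614 packets) this gives 1463 bytes per packet, which fits the MTU of 1500.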

stmmac log after transfer:

#:~/temp$ dmesg | grep stmmac
[0.278200] stmmac - user ID: 0x10, Synopsys ID: 0x42
[0.278207] stmmaceth :01:00.0: DMA HW capability register supported
[0.278209] stmmaceth :01:00.0: RX Checksum Offload Engine supported
[0.278211] stmmaceth :01:00.0: TX Checksum insertion supported
[0.278224] 

Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

2017-03-23 Thread Joao Pinto
At 10:51 AM on 3/23/2017, Giuseppe CAVALLARO wrote:
> On 3/23/2017 11:48 AM, Giuseppe CAVALLARO wrote:
>> Hello
>>
>> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>>>> Could you please share the iperf commands you are using so that I can
>>>> reproduce this on my side?
>>
>> Joao, you have a really powerful HW integration with multiple channels for
>> both RX and TX.
>> Often this is not the case for other setups where usually just DMA0 is
>> present or, sometimes, there is just one extra RX channel.
>>
>> My question is: what happens on this kind of configuration? Are we still
>> guaranteeing the best performance?
>>
>> Also, we have to guarantee that TSO and SG are always working. Another
>> point is the buffer sizes, which can differ among platforms.
>>
>> The problem reported by Corentin (below) pushes me to think that there is a
>> bug, so we should understand when it was introduced and whether it can be
>> fixed by some configuration we are not taking care of right now.
>>
>> "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
>
> I wonder if this could be easily triggered by getting a big file via FTP,
> so it is not strictly related to performance benchmarks.

I am going to run that test and check it out, and also run iperf a couple of
times. I am counting on doing this today and sending you the results later. If
anyone gets results sooner, please share.

> 
> peppe
> 
>>
>>
>> Best Regards
>> Peppe
>>
> 

Thanks.


Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

2017-03-23 Thread Joao Pinto

Hi Peppe,

At 10:48 AM on 3/23/2017, Giuseppe CAVALLARO wrote:
> Hello
>
> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>>> Could you please share the iperf commands you are using so that I can
>>> reproduce this on my side?
>
> Joao, you have a really powerful HW integration with multiple channels for
> both RX and TX.
> Often this is not the case for other setups where usually just DMA0 is
> present or, sometimes, there is just one extra RX channel.

My opinion is that we should not have problems, since the majority of the
features introduced are used only if you configure rx queues > 1 or tx queues > 1,
so if you use the default (=1) those configurations will not take place.

> 
> My question is: what happens on this kind of configuration? Are we still
> guaranteeing the best performance?
>
> Also, we have to guarantee that TSO and SG are always working. Another
> point is the buffer sizes, which can differ among platforms.

We have to pay attention to the RX buffer size, since I had problems with DHCP
messages not being received because the buffer size was too small.
Currently the TX buffer size is not configurable, and it would be useful to
make it configurable too in the future.

> 
> The problem reported by Corentin (below) pushes me to think that there is a
> bug, so we should understand when it was introduced and whether it can be
> fixed by some configuration we are not taking care of right now.

Of course.

> 
> "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
> 
> 
> Best Regards
> Peppe

Thanks,
Joao



Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

2017-03-23 Thread Giuseppe CAVALLARO

On 3/23/2017 11:48 AM, Giuseppe CAVALLARO wrote:
> Hello
>
> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>>> Could you please share the iperf commands you are using so that I can
>>> reproduce this on my side?
>
> Joao, you have a really powerful HW integration with multiple channels
> for both RX and TX.
> Often this is not the case for other setups where usually just DMA0 is
> present or, sometimes, there is just one extra RX channel.
>
> My question is: what happens on this kind of configuration? Are we
> still guaranteeing the best performance?
>
> Also, we have to guarantee that TSO and SG are always working.
> Another point is the buffer sizes, which can differ among platforms.
>
> The problem reported by Corentin (below) pushes me to think that there is
> a bug, so we should understand when it was introduced and whether it can
> be fixed by some configuration we are not taking care of right now.
>
> "ndesc_get_rx_status: Oversized frame spanned multiple buffers"

I wonder if this could be easily triggered by getting a big file via FTP,
so it is not strictly related to performance benchmarks.

peppe

> Best Regards
> Peppe





Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

2017-03-23 Thread Giuseppe CAVALLARO

Hello

On 3/23/2017 11:20 AM, Corentin Labbe wrote:

>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>> Could you please share the iperf commands you are using so that I can
>> reproduce this on my side?

Joao, you have a really powerful HW integration with multiple channels
for both RX and TX.
Often this is not the case for other setups where usually just DMA0 is
present or, sometimes, there is just one extra RX channel.

My question is: what happens on this kind of configuration? Are we
still guaranteeing the best performance?

Also, we have to guarantee that TSO and SG are always working.
Another point is the buffer sizes, which can differ among platforms.

The problem reported by Corentin (below) pushes me to think that there is a
bug, so we should understand when it was introduced and whether it can be
fixed by some configuration we are not taking care of right now.

"ndesc_get_rx_status: Oversized frame spanned multiple buffers"


Best Regards
Peppe


Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

2017-03-23 Thread Joao Pinto
At 10:20 AM on 3/23/2017, Corentin Labbe wrote:
> On Thu, Mar 23, 2017 at 10:12:18AM +, Joao Pinto wrote:
>>
>> Hi Corentin,
>>
>> At 10:08 AM on 3/23/2017, Corentin Labbe wrote:
>>> Hello
>>>
>>> Using next-20170323 produces a huge performance regression on my sunxi
>>> boards.
>>> On dwmac-sun8i, iperf goes from 94 Mbit/s to 37 Mbit/s when sending.
>>>
>>> On cubieboard2 (dwmac-sunxi), iperf makes the kernel flood the log with
>>> "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
>>> and the network is lost afterwards.
>>>
>>> Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fixes the issue.
>>> I am still trying to find which part of this patch lowers the performance.
>>>
>>> Regards
>>> Corentin Labbe
>>>
>>
>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>> Could you please share the iperf commands you are using so that I can
>> reproduce this on my side?
>
> A simple iperf -c serverip on both boards.
> 

OK, I am going to run my tests with a fresh net-next and get back to you soon.

Thanks,
Joao


Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

2017-03-23 Thread Corentin Labbe
On Thu, Mar 23, 2017 at 10:12:18AM +, Joao Pinto wrote:
> 
> Hi Corentin,
> 
> At 10:08 AM on 3/23/2017, Corentin Labbe wrote:
> > Hello
> >
> > Using next-20170323 produces a huge performance regression on my sunxi
> > boards.
> > On dwmac-sun8i, iperf goes from 94 Mbit/s to 37 Mbit/s when sending.
> >
> > On cubieboard2 (dwmac-sunxi), iperf makes the kernel flood the log with
> > "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
> > and the network is lost afterwards.
> >
> > Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fixes the issue.
> > I am still trying to find which part of this patch lowers the performance.
> >
> > Regards
> > Corentin Labbe
> >
>
> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
> Could you please share the iperf commands you are using so that I can
> reproduce this on my side?

A simple iperf -c serverip on both boards.



Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

2017-03-23 Thread Joao Pinto

Hi Corentin,

At 10:08 AM on 3/23/2017, Corentin Labbe wrote:
> Hello
>
> Using next-20170323 produces a huge performance regression on my sunxi boards.
> On dwmac-sun8i, iperf goes from 94 Mbit/s to 37 Mbit/s when sending.
>
> On cubieboard2 (dwmac-sunxi), iperf makes the kernel flood the log with
> "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
> and the network is lost afterwards.
>
> Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fixes the issue.
> I am still trying to find which part of this patch lowers the performance.
>
> Regards
> Corentin Labbe
> 

I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
Could you please share the iperf commands you are using so that I can
reproduce this on my side?

@stmmac users: It would be great if people who have a setup could also run
the same iperf test in order to clear this up for everyone.

Thanks,
Joao


stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"

2017-03-23 Thread Corentin Labbe
Hello

Using next-20170323 produces a huge performance regression on my sunxi boards.
On dwmac-sun8i, iperf goes from 94 Mbit/s to 37 Mbit/s when sending.

On cubieboard2 (dwmac-sunxi), iperf makes the kernel flood the log with
"ndesc_get_rx_status: Oversized frame spanned multiple buffers"
and the network is lost afterwards.

Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fixes the issue.
I am still trying to find which part of this patch lowers the performance.

Regards
Corentin Labbe
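
To put the dwmac-sun8i figures in proportion, here is a minimal C sketch of the relative drop, assuming both figures (94 and 37) are in the same unit. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: throughput drop in tenths of a percent, rounded down. */
static unsigned int drop_permille(unsigned int before, unsigned int after)
{
    assert(before != 0 && after <= before);
    return ((before - after) * 1000u) / before;
}
```

For 94 and 37 this returns 606, a drop of roughly 60% in send throughput.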