Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-04 Thread Jose Abreu



On 04-09-2018 10:16, Jerome Brunet wrote:
> On Mon, 2018-09-03 at 16:47 +0100, Jose Abreu wrote:
>> On 03-09-2018 16:38, Jerome Brunet wrote:
>>> On Mon, 2018-09-03 at 16:22 +0100, Jose Abreu wrote:
>>>> On 03-09-2018 15:10, Jerome Brunet wrote:
>>>>> On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
>>>>>> On 03-09-2018 11:16, Jerome Brunet wrote:
>>>>>>> No notable change. Rx is fine but Tx:
>>>>>>> [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
>>>>>>>
>>>>>>> I suppose the problem as something to do with the retries. When doing Tx test
>>>>>>> alone, we don't have such a things a throughput where we expect it to be.
>>>>>> Yeah, I just remembered you are not using GMAC4 so it wouldn't
>>>>>> make a difference. Is your version 3.710? If so please try adding
>>>>>> the following compatible to your DT bindings "snps,dwmac-3.710".
>>>>> According to the documentation, it is a 3.70a but I learn (the hard way) not to
>>>>> trust the documentation too much. Is there anyway to make sure which version we
>>>>> have. Like a register to read ?
>>>> It should be dumped at probe by a string like this one:
>>>>
>>>> "User ID: 0xXY, Synopsys ID: 0xXZ"
>>> User ID: 0x11, Synopsys ID: 0x37 ? What to does it map to ?
>> Its 3.7. As for the User ID this can be changed by final HW team
>> so I can't confirm what it means.
>>
> Is there anyway to know if it is a 3.70a or 3.71 ?

If the user ID wasn't changed from the default, then it's 3.71.

>
>>>>> Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
>>>>> some reason, the MDIO bus failed to register with this. Since it is not the
>>>>> documented version, I did not check why.
>>>> No you can't change. You need to add it. So it should stay like this:
>>>>
>>>> compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
>>>> "snps,dwmac-3.710";
>>> Adding "snps,dwmac-3.710" does not change anything for me.
>>> Having both Tx and Rx at the same time still wreck Tx throughput 
>>> unfortunately 
>> Okay, so you said that there are lots of retries: can you disable
>> COE at all ? (it should be something like: ethtool -K eth0 rx off
>> tx off).
> Done but no change.

Ok. Are you able to analyze the sent / received packets using
pcap so that we can understand why there are lots of retries ?

Thanks and Best Regards,
Jose Miguel Abreu

>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>>
>>>> Thanks and Best Regards,
>>>> Jose Miguel Abreu
>>>>
>>>>>>> By the way, your mailer (and its auto 80 column rule I suppose) made the patch
>>>>>>> below a bit harder to apply
>>>>>> Sorry. Next time I will send as attachment.
>>>>> No worries
>>>>>
>>>>>> Thanks and Best Regards,
>>>>>> Jose Miguel Abreu
>>
>



Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-04 Thread Jerome Brunet
On Mon, 2018-09-03 at 16:47 +0100, Jose Abreu wrote:
> On 03-09-2018 16:38, Jerome Brunet wrote:
> > On Mon, 2018-09-03 at 16:22 +0100, Jose Abreu wrote:
> > > On 03-09-2018 15:10, Jerome Brunet wrote:
> > > > On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
> > > > > On 03-09-2018 11:16, Jerome Brunet wrote:
> > > > > > No notable change. Rx is fine but Tx:
> > > > > > [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 
> > > > > > KBytes
> > > > > > 
> > > > > > I suppose the problem as something to do with the retries. When 
> > > > > > doing Tx test
> > > > > > alone, we don't have such a things a throughput where we expect it 
> > > > > > to be.
> > > > > 
> > > > > Yeah, I just remembered you are not using GMAC4 so it wouldn't
> > > > > make a difference. Is your version 3.710? If so please try adding
> > > > > the following compatible to your DT bindings "snps,dwmac-3.710".
> > > > 
> > > > According to the documentation, it is a 3.70a but I learn (the hard 
> > > > way) not to
> > > > trust the documentation too much. Is there anyway to make sure which 
> > > > version we
> > > > have. Like a register to read ?
> > > 
> > > It should be dumped at probe by a string like this one:
> > > 
> > > "User ID: 0xXY, Synopsys ID: 0xXZ"
> > 
> > User ID: 0x11, Synopsys ID: 0x37 ? What to does it map to ?
> 
> Its 3.7. As for the User ID this can be changed by final HW team
> so I can't confirm what it means.
> 

Is there any way to know whether it is a 3.70a or a 3.71?

> > 
> > > > Out of curiosity, I changed the compatible to "snps,dwmac-3.710" 
> > > > anyway. For
> > > > some reason, the MDIO bus failed to register with this. Since it is not 
> > > > the
> > > > documented version, I did not check why.
> > > 
> > > No you can't change. You need to add it. So it should stay like this:
> > > 
> > > compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
> > > "snps,dwmac-3.710";
> > 
> > Adding "snps,dwmac-3.710" does not change anything for me.
> > Having both Tx and Rx at the same time still wreck Tx throughput 
> > unfortunately 
> 
> Okay, so you said that there are lots of retries: can you disable
> COE at all ? (it should be something like: ethtool -K eth0 rx off
> tx off).

Done but no change.

> 
> Thanks and Best Regards,
> Jose Miguel Abreu
> 
> > 
> > > Thanks and Best Regards,
> > > Jose Miguel Abreu
> > > 
> > > > > > By the way, your mailer (and its auto 80 column rule I suppose) 
> > > > > > made the patch
> > > > > > below a bit harder to apply
> > > > > 
> > > > > Sorry. Next time I will send as attachment.
> > > > 
> > > > No worries
> > > > 
> > > > > Thanks and Best Regards,
> > > > > Jose Miguel Abreu
> 
> 




Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu
On 03-09-2018 16:38, Jerome Brunet wrote:
> On Mon, 2018-09-03 at 16:22 +0100, Jose Abreu wrote:
>> On 03-09-2018 15:10, Jerome Brunet wrote:
>>> On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
>>>> On 03-09-2018 11:16, Jerome Brunet wrote:
>>>>> No notable change. Rx is fine but Tx:
>>>>> [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
>>>>>
>>>>> I suppose the problem as something to do with the retries. When doing Tx test
>>>>> alone, we don't have such a things a throughput where we expect it to be.
>>>> Yeah, I just remembered you are not using GMAC4 so it wouldn't
>>>> make a difference. Is your version 3.710? If so please try adding
>>>> the following compatible to your DT bindings "snps,dwmac-3.710".
>>> According to the documentation, it is a 3.70a but I learn (the hard way) 
>>> not to
>>> trust the documentation too much. Is there anyway to make sure which 
>>> version we
>>> have. Like a register to read ?
>> It should be dumped at probe by a string like this one:
>>
>> "User ID: 0xXY, Synopsys ID: 0xXZ"
> User ID: 0x11, Synopsys ID: 0x37 ? What to does it map to ?

It's 3.7. As for the User ID, this can be changed by the final HW team,
so I can't confirm what it means.
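
For readers following along, here is a minimal sketch of how the two values
in the probe message could be pulled out of a single version register value
and how a Synopsys ID of 0x37 reads as "3.7". The struct and function names
are hypothetical, and the field layout (User ID in bits 15:8, Synopsys ID in
bits 7:0, one hex nibble each for major and minor) is an assumption that is
merely consistent with the values quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the two fields printed at probe time. */
struct dwmac_hwid {
	uint8_t user_id;      /* assumed bits 15:8, may be set by the SoC vendor */
	uint8_t synopsys_id;  /* assumed bits 7:0, core version */
};

/* Split a raw version register value into its two assumed fields. */
static struct dwmac_hwid dwmac_decode_hwid(uint32_t version_reg)
{
	struct dwmac_hwid id;

	id.user_id = (uint8_t)((version_reg >> 8) & 0xff);
	id.synopsys_id = (uint8_t)(version_reg & 0xff);
	return id;
}

/* Render the Synopsys ID as "major.minor", one hex nibble each. */
static void dwmac_format_version(uint8_t synopsys_id, char *buf, size_t len)
{
	assert(buf != NULL && len >= 4);
	snprintf(buf, len, "%x.%x",
		 (unsigned int)(synopsys_id >> 4),
		 (unsigned int)(synopsys_id & 0xf));
}
```

Note that a two-nibble ID cannot tell 3.70a from 3.71 by itself, which is
why the User ID comes up in the discussion.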

>
>>> Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
>>> some reason, the MDIO bus failed to register with this. Since it is not the
>>> documented version, I did not check why.
>> No you can't change. You need to add it. So it should stay like this:
>>
>> compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
>> "snps,dwmac-3.710";
> Adding "snps,dwmac-3.710" does not change anything for me.
> Having both Tx and Rx at the same time still wreck Tx throughput 
> unfortunately 

Okay, so you said that there are lots of retries: can you disable
COE (checksum offload) entirely? It should be something like:
ethtool -K eth0 rx off tx off

Thanks and Best Regards,
Jose Miguel Abreu

>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>>
>>>>> By the way, your mailer (and its auto 80 column rule I suppose) made the patch
>>>>> below a bit harder to apply
>>>> Sorry. Next time I will send as attachment.
>>> No worries
>>>
>>>> Thanks and Best Regards,
>>>> Jose Miguel Abreu
>>
>



Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet
On Mon, 2018-09-03 at 16:22 +0100, Jose Abreu wrote:
> On 03-09-2018 15:10, Jerome Brunet wrote:
> > On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
> > > On 03-09-2018 11:16, Jerome Brunet wrote:
> > > > No notable change. Rx is fine but Tx:
> > > > [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
> > > > 
> > > > I suppose the problem as something to do with the retries. When doing 
> > > > Tx test
> > > > alone, we don't have such a things a throughput where we expect it to 
> > > > be.
> > > 
> > > Yeah, I just remembered you are not using GMAC4 so it wouldn't
> > > make a difference. Is your version 3.710? If so please try adding
> > > the following compatible to your DT bindings "snps,dwmac-3.710".
> > 
> > According to the documentation, it is a 3.70a but I learn (the hard way) 
> > not to
> > trust the documentation too much. Is there anyway to make sure which 
> > version we
> > have. Like a register to read ?
> 
> It should be dumped at probe by a string like this one:
> 
> "User ID: 0xXY, Synopsys ID: 0xXZ"

User ID: 0x11, Synopsys ID: 0x37. What does it map to?

> 
> > 
> > Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
> > some reason, the MDIO bus failed to register with this. Since it is not the
> > documented version, I did not check why.
> 
> No you can't change. You need to add it. So it should stay like this:
> 
> compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
> "snps,dwmac-3.710";

Adding "snps,dwmac-3.710" does not change anything for me.
Having both Tx and Rx at the same time still wrecks Tx throughput, unfortunately.

> 
> Thanks and Best Regards,
> Jose Miguel Abreu
> 
> > 
> > > > By the way, your mailer (and its auto 80 column rule I suppose) made 
> > > > the patch
> > > > below a bit harder to apply
> > > 
> > > Sorry. Next time I will send as attachment.
> > 
> > No worries
> > 
> > > Thanks and Best Regards,
> > > Jose Miguel Abreu
> 
> 




Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu
On 03-09-2018 15:10, Jerome Brunet wrote:
> On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
>> On 03-09-2018 11:16, Jerome Brunet wrote:
>>> No notable change. Rx is fine but Tx:
>>> [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
>>>
>>> I suppose the problem as something to do with the retries. When doing Tx 
>>> test
>>> alone, we don't have such a things a throughput where we expect it to be.
>> Yeah, I just remembered you are not using GMAC4 so it wouldn't
>> make a difference. Is your version 3.710? If so please try adding
>> the following compatible to your DT bindings "snps,dwmac-3.710".
> According to the documentation, it is a 3.70a but I learn (the hard way) not 
> to
> trust the documentation too much. Is there anyway to make sure which version 
> we
> have. Like a register to read ?

It should be dumped at probe by a string like this one:

"User ID: 0xXY, Synopsys ID: 0xXZ"

>
> Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
> some reason, the MDIO bus failed to register with this. Since it is not the
> documented version, I did not check why.

No, you can't replace the existing entries; you need to add it. So it should look like this:

compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
"snps,dwmac-3.710";
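
To illustrate why the new string is appended rather than swapped in, here is
a small user-space sketch of matching against a compatible list. It is not
the kernel's OF matching code; compat_list_has() is a hypothetical helper,
and the point is only that a match on the platform string stops working once
that entry is no longer in the list. It does not attempt to explain the MDIO
registration failure mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if 'wanted' appears in a list of compatible strings.
 * Hypothetical helper for illustration only.
 */
static bool compat_list_has(const char *const *list, size_t n,
			    const char *wanted)
{
	assert(list != NULL && wanted != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i], wanted) == 0)
			return true;
	}
	return false;
}
```

With the three-entry list shown above, a lookup for
"amlogic,meson-gxbb-dwmac" and one for "snps,dwmac-3.710" both succeed; with
a list that contains only "snps,dwmac-3.710", the first lookup fails.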

Thanks and Best Regards,
Jose Miguel Abreu

>
>>> By the way, your mailer (and its auto 80 column rule I suppose) made the 
>>> patch
>>> below a bit harder to apply
>> Sorry. Next time I will send as attachment.
> No worries
>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>



Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet
On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
> On 03-09-2018 11:16, Jerome Brunet wrote:
> > No notable change. Rx is fine but Tx:
> > [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
> > 
> > I suppose the problem as something to do with the retries. When doing Tx 
> > test
> > alone, we don't have such a things a throughput where we expect it to be.
> 
> Yeah, I just remembered you are not using GMAC4 so it wouldn't
> make a difference. Is your version 3.710? If so please try adding
> the following compatible to your DT bindings "snps,dwmac-3.710".

According to the documentation, it is a 3.70a, but I learned (the hard way) not to
trust the documentation too much. Is there any way to make sure which version we
have, like a register to read?

Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
some reason, the MDIO bus failed to register with this. Since it is not the
documented version, I did not check why.

> 
> > 
> > By the way, your mailer (and its auto 80 column rule I suppose) made the 
> > patch
> > below a bit harder to apply
> 
> Sorry. Next time I will send as attachment.

No worries

> 
> Thanks and Best Regards,
> Jose Miguel Abreu




Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu
On 03-09-2018 11:16, Jerome Brunet wrote:
> No notable change. Rx is fine but Tx:
> [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
>
> I suppose the problem as something to do with the retries. When doing Tx test
> alone, we don't have such a things a throughput where we expect it to be.

Yeah, I just remembered you are not using GMAC4 so it wouldn't
make a difference. Is your version 3.710? If so please try adding
the following compatible to your DT bindings "snps,dwmac-3.710".

>
> By the way, your mailer (and its auto 80 column rule I suppose) made the patch
> below a bit harder to apply

Sorry. Next time I will send it as an attachment.

Thanks and Best Regards,
Jose Miguel Abreu


Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet
On Mon, 2018-09-03 at 10:36 +0100, Jose Abreu wrote:
> Hi Jerome,
> 
> On 03-09-2018 09:56, Jerome Brunet wrote:
> > On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
> > > [ As for now this is only for testing! ]
> > > 
> > > This follows David Miller advice and tries to fix coalesce timer in
> > > multi-queue scenarios.
> > > 
> > > We are now using per-queue coalesce values and per-queue TX timer. This
> > > assumes that tx_queues == rx_queues, which can not be necessarly true.
> > > Official patch will need to have this fixed.
> > > 
> > > Coalesce timer default values was changed to 1ms and the coalesce frames
> > > to 25.
> > > 
> > > Tested in B2B setup between XGMAC2 and GMAC5.
> > 
> > Tested on Amlogic meson-axg-s400. No regression seen so far.
> > (arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)
> > 
> > As far as I understand from the device tree parsing, this platform (and all
> > other amlogic platforms) use single queue.
> 
> Thanks for testing! I will send a formal patch once I get around
> the problem of rx queues != tx queues.
> 
> > 
> > ---
> > 
> > Jose,
> > 
> > On another topic doing iperf3 test on amlogic's devices we seen a strange
> > behavior.
> > 
> > Doing Tx or Rx test usually works fine (700MBps to 900MBps depending on the
> > platform). However, when doing both Rx and Tx at the same time, We see the 
> > Tx
> > throughput dropping significantly (~30MBps) and lot of TCP retries.
> > 
> > Would you any idea what might be our problem ? or how to start investigating
> > this ?
> > 
> 
> I'm not able to reproduce this here but I'm using multiple queue.
> I will try with single queue. In the meantime please try this
> patch (it shall be applied directly on top of this RFT):

No notable change. Rx is fine but Tx:
[  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes

I suppose the problem has something to do with the retries. When doing the Tx test
alone, we don't see such a thing and throughput is where we expect it to be.
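
As a sanity check on the iperf3 line above, here is a minimal sketch of the
unit conversion, assuming the transfer column uses 1024-based MBytes and the
bitrate column uses 1000-based Mbits/sec. The function name is hypothetical.
Under that assumption, 3.55 MBytes in a one-second interval works out to
roughly the 29.8 Mbits/sec that is reported.

```c
#include <assert.h>

/*
 * Convert a transfer size in MBytes (1 MByte = 1024 * 1024 bytes, assumed)
 * over an interval in seconds into Mbits/sec (1 Mbit = 10^6 bits, assumed).
 */
static double mbytes_to_mbits_per_sec(double mbytes, double seconds)
{
	assert(seconds > 0.0);

	return mbytes * 1024.0 * 1024.0 * 8.0 / seconds / 1e6;
}
```

For example, mbytes_to_mbits_per_sec(3.55, 1.0) covers the 3.00-4.00 sec
interval shown above.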

By the way, your mailer (and its auto 80 column rule I suppose) made the patch
below a bit harder to apply

> 
> 
> --->8
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index ae26a6e8608e..1407975320aa 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2210,8 +2210,7 @@ static int stmmac_init_dma_engine(struct
> stmmac_priv *priv)
> stmmac_init_tx_chan(priv, priv->ioaddr,
> priv->plat->dma_cfg,
> tx_q->dma_tx_phy, chan);
>  
> -   tx_q->tx_tail_addr = tx_q->dma_tx_phy +
> -   (DMA_TX_SIZE * sizeof(struct dma_desc));
> +   tx_q->tx_tail_addr = tx_q->dma_tx_phy;
> stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
>tx_q->tx_tail_addr, chan);
> }
> @@ -3004,6 +3003,7 @@ static netdev_tx_t stmmac_tso_xmit(struct
> sk_buff *skb, struct net_device *dev)
>  
> netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
> skb->len);
>  
> +   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
> sizeof(*desc));
> stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
> tx_q->tx_tail_addr, queue);
>  
> if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
> @@ -3223,6 +3223,8 @@ static netdev_tx_t stmmac_xmit(struct
> sk_buff *skb, struct net_device *dev)
> netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
> skb->len);
>  
> stmmac_enable_dma_transmission(priv, priv->ioaddr);
> +
> +   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
> sizeof(*desc));
> stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
> tx_q->tx_tail_addr, queue);
>  
> if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
> --->8
> 
> Thanks and Best Regards,
> Jose Miguel Abreu




Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu
Hi Jerome,

On 03-09-2018 09:56, Jerome Brunet wrote:
> On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
>> [ As for now this is only for testing! ]
>>
>> This follows David Miller advice and tries to fix coalesce timer in
>> multi-queue scenarios.
>>
>> We are now using per-queue coalesce values and per-queue TX timer. This
>> assumes that tx_queues == rx_queues, which can not be necessarly true.
>> Official patch will need to have this fixed.
>>
>> Coalesce timer default values was changed to 1ms and the coalesce frames
>> to 25.
>>
>> Tested in B2B setup between XGMAC2 and GMAC5.
> Tested on Amlogic meson-axg-s400. No regression seen so far.
> (arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)
>
> As far as I understand from the device tree parsing, this platform (and all
> other amlogic platforms) use single queue.

Thanks for testing! I will send a formal patch once I get around
the problem of rx queues != tx queues.

>
> ---
>
> Jose,
>
> On another topic doing iperf3 test on amlogic's devices we seen a strange
> behavior.
>
> Doing Tx or Rx test usually works fine (700MBps to 900MBps depending on the
> platform). However, when doing both Rx and Tx at the same time, We see the Tx
> throughput dropping significantly (~30MBps) and lot of TCP retries.
>
> Would you any idea what might be our problem ? or how to start investigating
> this ?
>

I'm not able to reproduce this here, but I'm using multiple queues.
I will try with a single queue. In the meantime, please try this
patch (it should be applied directly on top of this RFT):


--->8
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ae26a6e8608e..1407975320aa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2210,8 +2210,7 @@ static int stmmac_init_dma_engine(struct
stmmac_priv *priv)
stmmac_init_tx_chan(priv, priv->ioaddr,
priv->plat->dma_cfg,
tx_q->dma_tx_phy, chan);
 
-   tx_q->tx_tail_addr = tx_q->dma_tx_phy +
-   (DMA_TX_SIZE * sizeof(struct dma_desc));
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy;
stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
   tx_q->tx_tail_addr, chan);
}
@@ -3004,6 +3003,7 @@ static netdev_tx_t stmmac_tso_xmit(struct
sk_buff *skb, struct net_device *dev)
 
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
skb->len);
 
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
tx_q->tx_tail_addr, queue);
 
if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
@@ -3223,6 +3223,8 @@ static netdev_tx_t stmmac_xmit(struct
sk_buff *skb, struct net_device *dev)
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
skb->len);
 
stmmac_enable_dma_transmission(priv, priv->ioaddr);
+
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
tx_q->tx_tail_addr, queue);
 
if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
--->8
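
To make the intent of the test patch easier to follow, here is a stand-alone
sketch of the tail pointer arithmetic it introduces: the tail starts at the
ring base and, after each transmit, is set to the bus address of the
descriptor at index cur_tx. The demo_* types and functions are hypothetical
stand-ins, not driver code, and the four-word descriptor layout is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a DMA descriptor: four 32-bit words (assumed layout). */
struct demo_desc {
	uint32_t des0, des1, des2, des3;
};

/* Stand-in for the per-queue TX state touched by the patch. */
struct demo_tx_queue {
	uint64_t dma_tx_phy;    /* bus address of descriptor 0 */
	unsigned int cur_tx;    /* index of the next descriptor to fill */
	uint64_t tx_tail_addr;  /* value that would be written to the tail register */
};

/* At init, the tail points at the ring base instead of the ring end. */
static void demo_init_tail(struct demo_tx_queue *q)
{
	q->cur_tx = 0;
	q->tx_tail_addr = q->dma_tx_phy;
}

/* After queuing ndesc descriptors, move the tail to descriptor cur_tx. */
static void demo_queue_frame(struct demo_tx_queue *q, unsigned int ring_size,
			     unsigned int ndesc)
{
	assert(ring_size > 0);

	q->cur_tx = (q->cur_tx + ndesc) % ring_size;
	q->tx_tail_addr = q->dma_tx_phy +
			  (uint64_t)q->cur_tx * sizeof(struct demo_desc);
}
```

With a ring base of 0x1000 and 16-byte descriptors, queuing three
descriptors moves the tail to 0x1030, and wrapping around the ring brings it
back to the base.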

Thanks and Best Regards,
Jose Miguel Abreu


Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet
On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
> [ As for now this is only for testing! ]
> 
> This follows David Miller advice and tries to fix coalesce timer in
> multi-queue scenarios.
> 
> We are now using per-queue coalesce values and per-queue TX timer. This
> assumes that tx_queues == rx_queues, which can not be necessarly true.
> Official patch will need to have this fixed.
> 
> Coalesce timer default values was changed to 1ms and the coalesce frames
> to 25.
> 
> Tested in B2B setup between XGMAC2 and GMAC5.

Tested on Amlogic meson-axg-s400. No regression seen so far.
(arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)

As far as I understand from the device tree parsing, this platform (and all
other Amlogic platforms) uses a single queue.

---

Jose,

On another topic: when running iperf3 tests on Amlogic devices, we have seen
a strange behavior.

Running a Tx-only or Rx-only test usually works fine (700 Mbps to 900 Mbps
depending on the platform). However, when running Rx and Tx at the same time,
we see the Tx throughput drop significantly (~30 Mbps), along with a lot of
TCP retries.

Would you have any idea what our problem might be, or how to start
investigating it?