Re: ath10k wake_tx_queue issues

2018-05-17 Thread Niklas Cassel
On Wed, May 16, 2018 at 07:28:21PM +0200, Erik Stromdahl wrote:
> Hello Niklas
> 
> Quick question:
> Are you using my patch: "ath10k: add htt_tx num_pending window"?

Nope, but I definitely think that your patch should be merged,
since the current code can lock/unlock/lock a lot of times for
no good reason.

(I actually tried it, but I could still reproduce the bug with it.)

> 
> I assume (from your logs below) that you are not...
> 

Thanks a lot for you suggestions Erik!

Increasing max_num_pending might be a good idea (perhaps we will
get better thoughput in the SDIO case).

However, increasing either max_num_pending or
HTC_HOST_MAX_MSG_PER_TX_BUNDLE would probably just move the problem,
since it would still be possible for us to get hit by the same problem
again in the future.

I actually took Toke's suggestion and cooked up a patch:
https://marc.info/?l=linux-kernel=152659902128543=2


Kind regards,
Niklas




Re: ath10k wake_tx_queue issues

2018-05-16 Thread Erik Stromdahl

Hello Niklas

Quick question:
Are you using my patch: "ath10k: add htt_tx num_pending window"?

I assume (from your logs below) that you are not...

See more comments below.




I guess the best way to describe this is to show my ftrace buffer:

  ksoftirqd/2-21[002] .ns474.711744: ath10k_htt_tx_dec_pending: 
num_pen: 60
  ksoftirqd/2-21[002] .ns374.711761: ath10k_mac_op_wake_tx_queue: 
txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 
queue: 0
  ksoftirqd/2-21[002] .ns474.711765: ath10k_htt_tx_inc_pending: 
num_pen: 61
  ksoftirqd/2-21[002] .ns474.711781: ath10k_htt_tx_inc_pending: 
num_pen: 62
  ksoftirqd/2-21[002] .ns474.711787: ath10k_htt_tx_dec_pending: 
num_pen: 61
  ksoftirqd/2-21[002] .ns374.711803: ath10k_mac_op_wake_tx_queue: 
txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 
queue: 0
  ksoftirqd/2-21[002] .ns474.711807: ath10k_htt_tx_inc_pending: 
num_pen: 62
  ksoftirqd/2-21[002] .ns474.711823: ath10k_htt_tx_inc_pending: 
num_pen: 63
  ksoftirqd/2-21[002] .ns474.711829: ath10k_htt_tx_dec_pending: 
num_pen: 62
  ksoftirqd/2-21[002] .ns374.711845: ath10k_mac_op_wake_tx_queue: 
txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 
queue: 0
  ksoftirqd/2-21[002] .ns474.711849: ath10k_htt_tx_inc_pending: 
num_pen: 63
  ksoftirqd/2-21[002] .ns474.711865: ath10k_htt_tx_inc_pending: 
num_pen: 64

For high latency devices, when the num_pending counter reaches this point (64),
ath10k calls ieee80211_stop_queues (from ath10k_htt_tx_inc_pending -> 
ath10k_mac_tx_lock).
This will stop all per-TID queues (16 in total).


  ksoftirqd/2-21[002] dns574.711870: stop_queue: phy0 queue:0, 
reason:0
  ksoftirqd/2-21[002] dns574.711874: stop_queue: phy0 queue:1, 
reason:0
  ksoftirqd/2-21[002] dns574.711877: stop_queue: phy0 queue:2, 
reason:0
  ksoftirqd/2-21[002] dns574.711880: stop_queue: phy0 queue:3, 
reason:0
  ksoftirqd/2-21[002] dns574.711882: stop_queue: phy0 queue:4, 
reason:0
  ksoftirqd/2-21[002] dns574.711885: stop_queue: phy0 queue:5, 
reason:0
  ksoftirqd/2-21[002] dns574.711887: stop_queue: phy0 queue:6, 
reason:0
  ksoftirqd/2-21[002] dns574.711890: stop_queue: phy0 queue:7, 
reason:0
  ksoftirqd/2-21[002] dns574.711892: stop_queue: phy0 queue:8, 
reason:0
  ksoftirqd/2-21[002] dns574.711895: stop_queue: phy0 queue:9, 
reason:0
  ksoftirqd/2-21[002] dns574.711898: stop_queue: phy0 queue:10, 
reason:0
  ksoftirqd/2-21[002] dns574.711900: stop_queue: phy0 queue:11, 
reason:0
  ksoftirqd/2-21[002] dns574.711903: stop_queue: phy0 queue:12, 
reason:0
  ksoftirqd/2-21[002] dns574.711905: stop_queue: phy0 queue:13, 
reason:0
  ksoftirqd/2-21[002] dns574.711908: stop_queue: phy0 queue:14, 
reason:0
  ksoftirqd/2-21[002] dns574.711910: stop_queue: phy0 queue:15, 
reason:0
  ksoftirqd/2-21[002] .ns474.711917: ath10k_htt_tx_dec_pending: 
num_pen: 63

At this point, the driver have received a TX_COMPL_IND (since num_pending has 
been decremented to 63).
Without the patch I mentioned above, the logic is that the TX queues will be 
re-enabled immediately.
This is achieved by calling ieee80211_stop_queues.


  ksoftirqd/2-21[002] dns574.711922: wake_queue: phy0 queue:0, 
reason:0
  ksoftirqd/2-21[002] dns574.711929: wake_queue: phy0 queue:15, 
reason:0

2 queues have been enabled here. I don't know why not all queues have been 
re-enabled.
Perhaps it is due to that mac80211 immediately gets more data on one of the 
just re-enabled queues,
resulting in...

  ksoftirqd/2-21[002] .ns374.711948: ath10k_mac_op_wake_tx_queue: 
txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 
queue: 0

...the the wake_tx_queue op beeing invoked again ...

  ksoftirqd/2-21[002] .ns474.711952: ath10k_htt_tx_inc_pending: 
num_pen: 64

... and the num_pending counter beeing incremented as a result of this.
This results in all queues beeing ...

  ksoftirqd/2-21[002] dns574.711956: stop_queue: phy0 queue:0, 
reason:0
  ksoftirqd/2-21[002] dns574.711959: stop_queue: phy0 queue:1, 
reason:0
  ksoftirqd/2-21[002] dns574.711962: stop_queue: phy0 queue:2, 
reason:0
  ksoftirqd/2-21[002] dns574.711964: stop_queue: phy0 queue:3, 
reason:0
  ksoftirqd/2-21[002] dns574.711967: stop_queue: phy0 queue:4, 
reason:0
  ksoftirqd/2-21[002] dns574.711969: stop_queue: phy0 queue:5, 
reason:0
  ksoftirqd/2-21[002] dns574.711972: stop_queue: phy0 queue:6, 
reason:0
  ksoftirqd/2-21[002] dns574.711974: stop_queue: phy0 queue:7, 
reason:0
  ksoftirqd/2-21  

Re: ath10k wake_tx_queue issues

2018-05-16 Thread Toke Høiland-Jørgensen
Niklas Cassel  writes:
> [ .. snip .. ]
>> > Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
>> > ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => 
>> > ieee80211_xmit_fast
>> > => ieee80211_queue_skb => drv_wake_tx_queue.
>> >
>> > But I was expecting the call to ieee80211_wake_queue to somehow trigger a 
>> > call
>> > to ath10k_mac_op_wake_tx_queue, since there is still data in the send 
>> > buffer/
>> > in the ieee80211_txq that needs to be sent, to allow more data to be 
>> > written to
>> > the socket. But obviously the callback never comes.
>> > Or how else is this supposed to work?
>> 
>> The driver should reschedule itself before/after calling
>> ieee80211_wake_queue. mt76 does this; I'm not actually sure if ath9k
>> does the right thing either, I'm not too familiar with that part of the
>> code. There's no direct call to reschedule that I can see, but there may
>> be another reason why this is not needed for ath9k. I'm sure Felix
>> knows?
>
> Hello Toke
>
> Unfortunately, it doesn't look like mt76 uses any ieee80211_* function
> to reschedule.

It doesn't need to; it just reschedules itself.

Basically, the wake_tx_queue() callback is just a way for mac80211 to
notify the driver that new packets are available and that it should
start its scheduling function. But in this case it is the driver that is
restarting the queues, so it already knows that. And so it can just call
its internal scheduling function. This is what mt76 does in
mt76_dma_tx_cleanup() with the call to mt76_txq_schedule() before
calling ieee80211_wake_queue().

I think that what ath10k should be doing is calling
ath10k_mac_tx_push_pending() when it restarts the queues.

> I just came across a ieee80211_schedule_txq() function in e937b8da5a59
> ("mac80211: Add TXQ scheduling API"). However, this commit was
> reverted. Any plans on resubmitting this?

Yeah, I have a revised version lying around waiting for Felix to review
it. But that wouldn't help this bug; it's just an API change, it doesn't
change behaviour...

-Toke


Re: ath10k wake_tx_queue issues

2018-05-16 Thread Niklas Cassel
On Wed, May 16, 2018 at 07:54:54AM +0200, Alexander Wetzel wrote:
> Hello,
> 
> This sounds exactly like the issue I just submitted a patch for.
> Can you test https://patchwork.kernel.org/patch/10399613/ if that solves
> the issue?

Hello Alexander,


I just tried you fix, and unfortunately it doesn't solve my problem.


Regards,
Niklas

> 
> 
> 
> Am 15.05.2018 um 22:31 schrieb Niklas Cassel:
> > On Tue, May 15, 2018 at 04:13:48PM +0200, Toke Høiland-Jørgensen wrote:
> >> [ Adding Felix ]
> >>
> >>
> >> Niklas Cassel  writes:
> >>
> >>> Hello mac80211 and ath10k people
> >>>
> >>> Using ath10k, TX stops working when running iperf
> >>>
> >>> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> >>> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> >>> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> >>> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> >>> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> >>> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
> >>>
> >>> The problem can be reproduced without specifying a send buffer size,
> >>> however, specifying a small send buffer helps to reproduce the problem 
> >>> faster.
> >>>
> >>> What happens is that iperf gets -EAGAIN on write().
> >>> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> >>> The reason why we get -EAGAIN is because the send socket buffer is full
> >>> (iperf uses non-blocking I/O).
> >>>
> >>> The problem is that the mac80211 wake_tx_queue callback never comes.
> >>>
> >>> I guess the best way to describe this is to show my ftrace buffer:
> >>>
> >>>  ksoftirqd/2-21[002] .ns474.711744: 
> >>> ath10k_htt_tx_dec_pending: num_pen: 60
> >>>  ksoftirqd/2-21[002] .ns374.711761: 
> >>> ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
> >>> frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
> >>>  ksoftirqd/2-21[002] .ns474.711765: 
> >>> ath10k_htt_tx_inc_pending: num_pen: 61
> >>>  ksoftirqd/2-21[002] .ns474.711781: 
> >>> ath10k_htt_tx_inc_pending: num_pen: 62
> >>>  ksoftirqd/2-21[002] .ns474.711787: 
> >>> ath10k_htt_tx_dec_pending: num_pen: 61
> >>>  ksoftirqd/2-21[002] .ns374.711803: 
> >>> ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
> >>> frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
> >>>  ksoftirqd/2-21[002] .ns474.711807: 
> >>> ath10k_htt_tx_inc_pending: num_pen: 62
> >>>  ksoftirqd/2-21[002] .ns474.711823: 
> >>> ath10k_htt_tx_inc_pending: num_pen: 63
> >>>  ksoftirqd/2-21[002] .ns474.711829: 
> >>> ath10k_htt_tx_dec_pending: num_pen: 62
> >>>  ksoftirqd/2-21[002] .ns374.711845: 
> >>> ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
> >>> frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
> >>>  ksoftirqd/2-21[002] .ns474.711849: 
> >>> ath10k_htt_tx_inc_pending: num_pen: 63
> >>>  ksoftirqd/2-21[002] .ns474.711865: 
> >>> ath10k_htt_tx_inc_pending: num_pen: 64
> >>>  ksoftirqd/2-21[002] dns574.711870: stop_queue: phy0 queue:0, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711874: stop_queue: phy0 queue:1, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711877: stop_queue: phy0 queue:2, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711880: stop_queue: phy0 queue:3, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711882: stop_queue: phy0 queue:4, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711885: stop_queue: phy0 queue:5, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711887: stop_queue: phy0 queue:6, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711890: stop_queue: phy0 queue:7, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711892: stop_queue: phy0 queue:8, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711895: stop_queue: phy0 queue:9, 
> >>> reason:0
> >>>  ksoftirqd/2-21[002] dns574.711898: stop_queue: phy0 
> >>> queue:10, reason:0
> >>>  ksoftirqd/2-21[002] dns574.711900: stop_queue: phy0 
> >>> queue:11, reason:0
> >>>  ksoftirqd/2-21[002] dns574.711903: stop_queue: phy0 
> >>> queue:12, reason:0
> >>>  ksoftirqd/2-21[002] dns574.711905: stop_queue: phy0 
> >>> queue:13, reason:0
> >>>  ksoftirqd/2-21[002] dns574.711908: stop_queue: phy0 
> >>> queue:14, reason:0
> >>>  ksoftirqd/2-21[002] dns574.711910: stop_queue: phy0 
> >>> queue:15, reason:0
> >>>  ksoftirqd/2-21[002] .ns474.711917: 
> >>> ath10k_htt_tx_dec_pending: num_pen: 63
> >>>  ksoftirqd/2-21[002] dns5 

Re: ath10k wake_tx_queue issues

2018-05-16 Thread Alexander Wetzel
Hello,

This sounds exactly like the issue I just submitted a patch for.
Can you test https://patchwork.kernel.org/patch/10399613/ if that solves
the issue?



Am 15.05.2018 um 22:31 schrieb Niklas Cassel:
> On Tue, May 15, 2018 at 04:13:48PM +0200, Toke Høiland-Jørgensen wrote:
>> [ Adding Felix ]
>>
>>
>> Niklas Cassel  writes:
>>
>>> Hello mac80211 and ath10k people
>>>
>>> Using ath10k, TX stops working when running iperf
>>>
>>> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
>>> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
>>> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
>>> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
>>> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
>>> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
>>>
>>> The problem can be reproduced without specifying a send buffer size,
>>> however, specifying a small send buffer helps to reproduce the problem 
>>> faster.
>>>
>>> What happens is that iperf gets -EAGAIN on write().
>>> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
>>> The reason why we get -EAGAIN is because the send socket buffer is full
>>> (iperf uses non-blocking I/O).
>>>
>>> The problem is that the mac80211 wake_tx_queue callback never comes.
>>>
>>> I guess the best way to describe this is to show my ftrace buffer:
>>>
>>>  ksoftirqd/2-21[002] .ns474.711744: ath10k_htt_tx_dec_pending: 
>>> num_pen: 60
>>>  ksoftirqd/2-21[002] .ns374.711761: 
>>> ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
>>> frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
>>>  ksoftirqd/2-21[002] .ns474.711765: ath10k_htt_tx_inc_pending: 
>>> num_pen: 61
>>>  ksoftirqd/2-21[002] .ns474.711781: ath10k_htt_tx_inc_pending: 
>>> num_pen: 62
>>>  ksoftirqd/2-21[002] .ns474.711787: ath10k_htt_tx_dec_pending: 
>>> num_pen: 61
>>>  ksoftirqd/2-21[002] .ns374.711803: 
>>> ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
>>> frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
>>>  ksoftirqd/2-21[002] .ns474.711807: ath10k_htt_tx_inc_pending: 
>>> num_pen: 62
>>>  ksoftirqd/2-21[002] .ns474.711823: ath10k_htt_tx_inc_pending: 
>>> num_pen: 63
>>>  ksoftirqd/2-21[002] .ns474.711829: ath10k_htt_tx_dec_pending: 
>>> num_pen: 62
>>>  ksoftirqd/2-21[002] .ns374.711845: 
>>> ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
>>> frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
>>>  ksoftirqd/2-21[002] .ns474.711849: ath10k_htt_tx_inc_pending: 
>>> num_pen: 63
>>>  ksoftirqd/2-21[002] .ns474.711865: ath10k_htt_tx_inc_pending: 
>>> num_pen: 64
>>>  ksoftirqd/2-21[002] dns574.711870: stop_queue: phy0 queue:0, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711874: stop_queue: phy0 queue:1, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711877: stop_queue: phy0 queue:2, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711880: stop_queue: phy0 queue:3, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711882: stop_queue: phy0 queue:4, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711885: stop_queue: phy0 queue:5, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711887: stop_queue: phy0 queue:6, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711890: stop_queue: phy0 queue:7, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711892: stop_queue: phy0 queue:8, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711895: stop_queue: phy0 queue:9, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711898: stop_queue: phy0 queue:10, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711900: stop_queue: phy0 queue:11, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711903: stop_queue: phy0 queue:12, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711905: stop_queue: phy0 queue:13, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711908: stop_queue: phy0 queue:14, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711910: stop_queue: phy0 queue:15, 
>>> reason:0
>>>  ksoftirqd/2-21[002] .ns474.711917: ath10k_htt_tx_dec_pending: 
>>> num_pen: 63
>>>  ksoftirqd/2-21[002] dns574.711922: wake_queue: phy0 queue:0, 
>>> reason:0
>>>  ksoftirqd/2-21[002] dns574.711929: wake_queue: phy0 queue:15, 
>>> reason:0
>>>  ksoftirqd/2-21[002] .ns374.711948: 
>>> ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
>>> frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
>>>  ksoftirqd/2-21[002] .ns474.711952: 

Re: ath10k wake_tx_queue issues

2018-05-15 Thread Niklas Cassel
On Tue, May 15, 2018 at 04:13:48PM +0200, Toke Høiland-Jørgensen wrote:
> [ Adding Felix ]
> 
> 
> Niklas Cassel  writes:
> 
> > Hello mac80211 and ath10k people
> >
> > Using ath10k, TX stops working when running iperf
> >
> > [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> > [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> > [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> > [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> > [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> > [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
> >
> > The problem can be reproduced without specifying a send buffer size,
> > however, specifying a small send buffer helps to reproduce the problem 
> > faster.
> >
> > What happens is that iperf gets -EAGAIN on write().
> > It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> > The reason why we get -EAGAIN is because the send socket buffer is full
> > (iperf uses non-blocking I/O).
> >
> > The problem is that the mac80211 wake_tx_queue callback never comes.
> >
> > I guess the best way to describe this is to show my ftrace buffer:
> >
> >  ksoftirqd/2-21[002] .ns474.711744: ath10k_htt_tx_dec_pending: 
> > num_pen: 60
> >  ksoftirqd/2-21[002] .ns374.711761: 
> > ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
> > frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
> >  ksoftirqd/2-21[002] .ns474.711765: ath10k_htt_tx_inc_pending: 
> > num_pen: 61
> >  ksoftirqd/2-21[002] .ns474.711781: ath10k_htt_tx_inc_pending: 
> > num_pen: 62
> >  ksoftirqd/2-21[002] .ns474.711787: ath10k_htt_tx_dec_pending: 
> > num_pen: 61
> >  ksoftirqd/2-21[002] .ns374.711803: 
> > ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
> > frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
> >  ksoftirqd/2-21[002] .ns474.711807: ath10k_htt_tx_inc_pending: 
> > num_pen: 62
> >  ksoftirqd/2-21[002] .ns474.711823: ath10k_htt_tx_inc_pending: 
> > num_pen: 63
> >  ksoftirqd/2-21[002] .ns474.711829: ath10k_htt_tx_dec_pending: 
> > num_pen: 62
> >  ksoftirqd/2-21[002] .ns374.711845: 
> > ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
> > frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
> >  ksoftirqd/2-21[002] .ns474.711849: ath10k_htt_tx_inc_pending: 
> > num_pen: 63
> >  ksoftirqd/2-21[002] .ns474.711865: ath10k_htt_tx_inc_pending: 
> > num_pen: 64
> >  ksoftirqd/2-21[002] dns574.711870: stop_queue: phy0 queue:0, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711874: stop_queue: phy0 queue:1, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711877: stop_queue: phy0 queue:2, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711880: stop_queue: phy0 queue:3, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711882: stop_queue: phy0 queue:4, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711885: stop_queue: phy0 queue:5, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711887: stop_queue: phy0 queue:6, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711890: stop_queue: phy0 queue:7, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711892: stop_queue: phy0 queue:8, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711895: stop_queue: phy0 queue:9, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711898: stop_queue: phy0 queue:10, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711900: stop_queue: phy0 queue:11, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711903: stop_queue: phy0 queue:12, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711905: stop_queue: phy0 queue:13, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711908: stop_queue: phy0 queue:14, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711910: stop_queue: phy0 queue:15, 
> > reason:0
> >  ksoftirqd/2-21[002] .ns474.711917: ath10k_htt_tx_dec_pending: 
> > num_pen: 63
> >  ksoftirqd/2-21[002] dns574.711922: wake_queue: phy0 queue:0, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711929: wake_queue: phy0 queue:15, 
> > reason:0
> >  ksoftirqd/2-21[002] .ns374.711948: 
> > ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: 
> > frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
> >  ksoftirqd/2-21[002] .ns474.711952: ath10k_htt_tx_inc_pending: 
> > num_pen: 64
> >  ksoftirqd/2-21[002] dns574.711956: stop_queue: phy0 queue:0, 
> > reason:0
> >  ksoftirqd/2-21[002] dns574.711959: stop_queue: phy0 queue:1, 
> 

Re: ath10k wake_tx_queue issues

2018-05-15 Thread Toke Høiland-Jørgensen
[ Adding Felix ]


Niklas Cassel  writes:

> Hello mac80211 and ath10k people
>
> Using ath10k, TX stops working when running iperf
>
> [  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
> [  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
> [  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
> [  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
> [  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec
>
> The problem can be reproduced without specifying a send buffer size,
> however, specifying a small send buffer helps to reproduce the problem faster.
>
> What happens is that iperf gets -EAGAIN on write().
> It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
> The reason why we get -EAGAIN is because the send socket buffer is full
> (iperf uses non-blocking I/O).
>
> The problem is that the mac80211 wake_tx_queue callback never comes.
>
> I guess the best way to describe this is to show my ftrace buffer:
>
>  ksoftirqd/2-21[002] .ns474.711744: ath10k_htt_tx_dec_pending: 
> num_pen: 60
>  ksoftirqd/2-21[002] .ns374.711761: ath10k_mac_op_wake_tx_queue: 
> txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 
> 60 queue: 0
>  ksoftirqd/2-21[002] .ns474.711765: ath10k_htt_tx_inc_pending: 
> num_pen: 61
>  ksoftirqd/2-21[002] .ns474.711781: ath10k_htt_tx_inc_pending: 
> num_pen: 62
>  ksoftirqd/2-21[002] .ns474.711787: ath10k_htt_tx_dec_pending: 
> num_pen: 61
>  ksoftirqd/2-21[002] .ns374.711803: ath10k_mac_op_wake_tx_queue: 
> txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 
> 61 queue: 0
>  ksoftirqd/2-21[002] .ns474.711807: ath10k_htt_tx_inc_pending: 
> num_pen: 62
>  ksoftirqd/2-21[002] .ns474.711823: ath10k_htt_tx_inc_pending: 
> num_pen: 63
>  ksoftirqd/2-21[002] .ns474.711829: ath10k_htt_tx_dec_pending: 
> num_pen: 62
>  ksoftirqd/2-21[002] .ns374.711845: ath10k_mac_op_wake_tx_queue: 
> txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 
> 62 queue: 0
>  ksoftirqd/2-21[002] .ns474.711849: ath10k_htt_tx_inc_pending: 
> num_pen: 63
>  ksoftirqd/2-21[002] .ns474.711865: ath10k_htt_tx_inc_pending: 
> num_pen: 64
>  ksoftirqd/2-21[002] dns574.711870: stop_queue: phy0 queue:0, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711874: stop_queue: phy0 queue:1, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711877: stop_queue: phy0 queue:2, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711880: stop_queue: phy0 queue:3, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711882: stop_queue: phy0 queue:4, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711885: stop_queue: phy0 queue:5, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711887: stop_queue: phy0 queue:6, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711890: stop_queue: phy0 queue:7, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711892: stop_queue: phy0 queue:8, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711895: stop_queue: phy0 queue:9, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711898: stop_queue: phy0 queue:10, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711900: stop_queue: phy0 queue:11, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711903: stop_queue: phy0 queue:12, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711905: stop_queue: phy0 queue:13, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711908: stop_queue: phy0 queue:14, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711910: stop_queue: phy0 queue:15, 
> reason:0
>  ksoftirqd/2-21[002] .ns474.711917: ath10k_htt_tx_dec_pending: 
> num_pen: 63
>  ksoftirqd/2-21[002] dns574.711922: wake_queue: phy0 queue:0, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711929: wake_queue: phy0 queue:15, 
> reason:0
>  ksoftirqd/2-21[002] .ns374.711948: ath10k_mac_op_wake_tx_queue: 
> txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 
> 63 queue: 0
>  ksoftirqd/2-21[002] .ns474.711952: ath10k_htt_tx_inc_pending: 
> num_pen: 64
>  ksoftirqd/2-21[002] dns574.711956: stop_queue: phy0 queue:0, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711959: stop_queue: phy0 queue:1, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711962: stop_queue: phy0 queue:2, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711964: stop_queue: phy0 queue:3, 
> reason:0
>  ksoftirqd/2-21[002] dns574.711967: stop_queue: phy0 queue:4, 
> reason:0
>  

Re: ath10k wake_tx_queue issues

2018-05-15 Thread Ben Greear


On 05/15/2018 06:45 AM, Niklas Cassel wrote:

Hello mac80211 and ath10k people

Using ath10k, TX stops working when running iperf


What kernel and firmware and ath10k hardware are you using?

Does the same thing happen on older kernels?  I had some issues
with ath9k having some sort of tx-hang, but my test case was
using 200 virtual stations to transmit, and others could not reproduce
it, so I am not sure I saw the same issue as you did.

Thanks,
Ben



[  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
[  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
[  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
[  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
[  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec

The problem can be reproduced without specifying a send buffer size,
however, specifying a small send buffer helps to reproduce the problem faster.

What happens is that iperf gets -EAGAIN on write().
It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
The reason why we get -EAGAIN is because the send socket buffer is full
(iperf uses non-blocking I/O).

The problem is that the mac80211 wake_tx_queue callback never comes.

I guess the best way to describe this is to show my ftrace buffer:

 ksoftirqd/2-21[002] .ns474.711744: ath10k_htt_tx_dec_pending: 
num_pen: 60
 ksoftirqd/2-21[002] .ns374.711761: ath10k_mac_op_wake_tx_queue: 
txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 
queue: 0
 ksoftirqd/2-21[002] .ns474.711765: ath10k_htt_tx_inc_pending: 
num_pen: 61
 ksoftirqd/2-21[002] .ns474.711781: ath10k_htt_tx_inc_pending: 
num_pen: 62
 ksoftirqd/2-21[002] .ns474.711787: ath10k_htt_tx_dec_pending: 
num_pen: 61
 ksoftirqd/2-21[002] .ns374.711803: ath10k_mac_op_wake_tx_queue: 
txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 
queue: 0
 ksoftirqd/2-21[002] .ns474.711807: ath10k_htt_tx_inc_pending: 
num_pen: 62
 ksoftirqd/2-21[002] .ns474.711823: ath10k_htt_tx_inc_pending: 
num_pen: 63
 ksoftirqd/2-21[002] .ns474.711829: ath10k_htt_tx_dec_pending: 
num_pen: 62
 ksoftirqd/2-21[002] .ns374.711845: ath10k_mac_op_wake_tx_queue: 
txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 
queue: 0
 ksoftirqd/2-21[002] .ns474.711849: ath10k_htt_tx_inc_pending: 
num_pen: 63
 ksoftirqd/2-21[002] .ns474.711865: ath10k_htt_tx_inc_pending: 
num_pen: 64
 ksoftirqd/2-21[002] dns574.711870: stop_queue: phy0 queue:0, 
reason:0
 ksoftirqd/2-21[002] dns574.711874: stop_queue: phy0 queue:1, 
reason:0
 ksoftirqd/2-21[002] dns574.711877: stop_queue: phy0 queue:2, 
reason:0
 ksoftirqd/2-21[002] dns574.711880: stop_queue: phy0 queue:3, 
reason:0
 ksoftirqd/2-21[002] dns574.711882: stop_queue: phy0 queue:4, 
reason:0
 ksoftirqd/2-21[002] dns574.711885: stop_queue: phy0 queue:5, 
reason:0
 ksoftirqd/2-21[002] dns574.711887: stop_queue: phy0 queue:6, 
reason:0
 ksoftirqd/2-21[002] dns574.711890: stop_queue: phy0 queue:7, 
reason:0
 ksoftirqd/2-21[002] dns574.711892: stop_queue: phy0 queue:8, 
reason:0
 ksoftirqd/2-21[002] dns574.711895: stop_queue: phy0 queue:9, 
reason:0
 ksoftirqd/2-21[002] dns574.711898: stop_queue: phy0 queue:10, 
reason:0
 ksoftirqd/2-21[002] dns574.711900: stop_queue: phy0 queue:11, 
reason:0
 ksoftirqd/2-21[002] dns574.711903: stop_queue: phy0 queue:12, 
reason:0
 ksoftirqd/2-21[002] dns574.711905: stop_queue: phy0 queue:13, 
reason:0
 ksoftirqd/2-21[002] dns574.711908: stop_queue: phy0 queue:14, 
reason:0
 ksoftirqd/2-21[002] dns574.711910: stop_queue: phy0 queue:15, 
reason:0
 ksoftirqd/2-21[002] .ns474.711917: ath10k_htt_tx_dec_pending: 
num_pen: 63
 ksoftirqd/2-21[002] dns574.711922: wake_queue: phy0 queue:0, 
reason:0
 ksoftirqd/2-21[002] dns574.711929: wake_queue: phy0 queue:15, 
reason:0
 ksoftirqd/2-21[002] .ns374.711948: ath10k_mac_op_wake_tx_queue: 
txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 
queue: 0
 ksoftirqd/2-21[002] .ns474.711952: ath10k_htt_tx_inc_pending: 
num_pen: 64
 ksoftirqd/2-21[002] dns574.711956: stop_queue: phy0 queue:0, 
reason:0
 ksoftirqd/2-21[002] dns574.711959: stop_queue: phy0 queue:1, 
reason:0
 ksoftirqd/2-21[002] dns574.711962: stop_queue: phy0 queue:2, 
reason:0
 ksoftirqd/2-21[002] dns574.711964: stop_queue: phy0 queue:3, 
reason:0