Re: [PATCH] ath10k: enable threaded napi on ath10k driver

2021-12-22 Thread Felix Fietkau

On 2021-12-14 23:39, Abhishek Kumar wrote:

NAPI poll can be done in threaded context along with soft irq
context. Threaded context can be scheduled efficiently, thus
creating less of bottleneck during Rx processing. This patch is
to enable threaded NAPI on ath10k driver.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2-00696-QCAHLSWMTPL-1
Signed-off-by: Abhishek Kumar 

The name of the dummy device should be filled in as well. How about
something like this:

--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -3576,6 +3576,9 @@ struct ath10k *ath10k_core_create(size_t priv_size, 
struct device *dev,
  ath10k_core_set_coverage_class_work);
 
 	init_dummy_netdev(>napi_dev);

+   snprintf(ar->napi_dev.name, sizeof(ar->napi_dev.name), "%s",
+wiphy_name(ar->hw->wiphy));
+   ar->napi_dev.threaded = 1;
 
 	ret = ath10k_coredump_create(ar);

if (ret)

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 2/7] ath10k: Add support to process rx packet in thread

2021-03-25 Thread Felix Fietkau


On 2021-03-25 10:45, Rakesh Pillai wrote:
> Hi Felix / Ben,
> 
> In case of ath10k (snoc based targets), we have a lot of processing in the 
> NAPI context.
> Even moving this to threaded NAPI is not helping much due to the load.
> 
> Breaking the tasks into multiple context (with the patch series I posted) is 
> helping in improving the throughput.
> With the current rx_thread based approach, the rx processing is broken into 
> two parallel contexts
> 1) reaping the packets from the HW
> 2) processing these packets list and handing it over to mac80211 (and later 
> to the network stack)
> 
> This is the primary reason for choosing the rx thread approach.
Have you considered the possibility that maybe the problem is that the
driver doing too much work?
One example is that you could take advantage of the new 802.3 decap
offload to simplify rx processing. Worked for me on mt76 where a
dual-core 1.3 GHz A64 can easily handle >1.8 Gbps local TCP rx on a
single card, without the rx NAPI thread being the biggest consumer of
CPU cycles.

And if you can't do that and still consider all of the metric tons of
processing work necessary, you could still do this:
On interrupts, spawn a processing thread that traverses the ring and
does the preparation work (instead of NAPI).
>From that thread you schedule the threaded NAPI handler that processes
these packets further and hands them to mac80211.
To keep the load somewhat balanced, you can limit the number of
pre-processed packets in the ring.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 2/7] ath10k: Add support to process rx packet in thread

2021-03-23 Thread Felix Fietkau


On 2021-03-23 04:01, Ben Greear wrote:
> On 3/22/21 6:20 PM, Brian Norris wrote:
>> On Mon, Mar 22, 2021 at 4:58 PM Ben Greear  wrote:
>>> On 7/22/20 6:00 AM, Felix Fietkau wrote:
>>>> On 2020-07-22 14:55, Johannes Berg wrote:
>>>>> On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:
>>>>>
>>>>>> I'm considering testing a different approach (with mt76 initially):
>>>>>> - Add a mac80211 rx function that puts processed skbs into a list
>>>>>> instead of handing them to the network stack directly.
>>>>>
>>>>> Would this be *after* all the mac80211 processing, i.e. in place of the
>>>>> rx-up-to-stack?
>>>> Yes, it would run all the rx handlers normally and then put the
>>>> resulting skbs into a list instead of calling netif_receive_skb or
>>>> napi_gro_frags.
>>>
>>> Whatever came of this?  I realized I'm running Felix's patch since his mt76
>>> driver needs it.  Any chance it will go upstream?
>> 
>> If you're asking about $subject (moving NAPI/RX to a thread), this
>> landed upstream recently:
>> http://git.kernel.org/linus/adbb4fb028452b1b0488a1a7b66ab856cdf20715
>> 
>> It needs a bit of coaxing to work on a WiFi driver (including: WiFi
>> drivers tend to have a different netdev for NAPI than they expose to
>> /sys/class/net/), but it's there.
>> 
>> I'm not sure if people had something else in mind in the stuff you're
>> quoting though.
> 
> No, I got it confused with something Felix did:
> 
> https://github.com/greearb/mt76/blob/master/patches/0001-net-add-support-for-threaded-NAPI-polling.patch
> 
> Maybe the NAPI/RX to a thread thing superceded Felix's patch?
Yes, it did and it's in linux-next already.
I sent the following change to make mt76 use it:
https://github.com/nbd168/wireless/commit/1d4ff31437e5aaa999bd7a

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 0/7] Add support to process rx packets in thread

2020-07-26 Thread Felix Fietkau
On 2020-07-26 10:32, Hillf Danton wrote:
> 
> On Sun, 26 Jul 2020 10:10:15 +0200 Felix Fietkau wrote:
>> On 2020-07-26 03:22, Hillf Danton wrote:
>> > 
>> > Feel free to do that. Is it likely for me to select a Cc?
>> > 
>> Shall I use Signed-off-by: Hillf Danton ?
> 
> s/Signed-off-by/Cc/
> 
>> What Cc do you want me to add?
> 
> I prefer Cc over other tags.
Ah, okay. I was planning on adding you as the author of the patch, since
you did most of the work on it. If you want to be attributed as author
(or in Co-developed-by), I'd need your Signed-off-by.

If you don't want that, I can submit it under my name and leave you in
as Cc only.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 0/7] Add support to process rx packets in thread

2020-07-26 Thread Felix Fietkau
On 2020-07-26 03:22, Hillf Danton wrote:
>> - add a state bit for threaded NAPI
>> - make netif_threaded_napi_add inline
>> - run queue_work outside of local_irq_save/restore (it does that
>> internally already)
>>
>> If you don't mind, I'd like to propose this to netdev soon. Can I have
>> your Signed-off-by for that?
> 
> Feel free to do that. Is it likely for me to select a Cc?
Shall I use Signed-off-by: Hillf Danton ?
What Cc do you want me to add?

>> ---
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -347,6 +347,7 @@ struct napi_struct {
>>  struct list_headdev_list;
>>  struct hlist_node   napi_hash_node;
>>  unsigned intnapi_id;
>> +struct work_struct  work;
>>  };
>>  
>>  enum {
>> @@ -357,6 +358,7 @@ enum {
>>  NAPI_STATE_HASHED,  /* In NAPI hash (busy polling possible) */
>>  NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
>>  NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
>> +NAPI_STATE_THREADED,/* Use threaded NAPI */
>>  };
>>  
>>  enum {
>> @@ -367,6 +369,7 @@ enum {
>>  NAPIF_STATE_HASHED   = BIT(NAPI_STATE_HASHED),
>>  NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
>>  NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
>> +NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED),
>>  };
>>  
>>  enum gro_result {
>> @@ -2315,6 +2318,26 @@ static inline void *netdev_priv(const struct 
>> net_device *dev)
>>  void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>>  int (*poll)(struct napi_struct *, int), int weight);
>>  
>> +/**
>> + *  netif_threaded_napi_add - initialize a NAPI context
>> + *  @dev:  network device
>> + *  @napi: NAPI context
>> + *  @poll: polling function
>> + *  @weight: default weight
>> + *
>> + * This variant of netif_napi_add() should be used from drivers using NAPI
>> + * with CPU intensive poll functions.
>> + * This will schedule polling from a high priority workqueue that
>> + */
>> +static inline void netif_threaded_napi_add(struct net_device *dev,
>> +   struct napi_struct *napi,
>> +   int (*poll)(struct napi_struct *, 
>> int),
>> +   int weight)
>> +{
>> +set_bit(NAPI_STATE_THREADED, >state);
>> +netif_napi_add(dev, napi, poll, weight);
>> +}
>> +
>>  /**
>>   *  netif_tx_napi_add - initialize a NAPI context
>>   *  @dev:  network device
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -158,6 +158,7 @@ static DEFINE_SPINLOCK(offload_lock);
>>  struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
>>  struct list_head ptype_all __read_mostly;   /* Taps */
>>  static struct list_head offload_base __read_mostly;
>> +static struct workqueue_struct *napi_workq;
> 
> Is __read_mostly missing?
Yes, thanks. I will add that before sending the patch.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 0/7] Add support to process rx packets in thread

2020-07-25 Thread Felix Fietkau
On 2020-07-25 10:16, Hillf Danton wrote:
> Hi folks
> 
> Below is a minimunm poc implementation I can imagine on top of workqueue
> to make napi threaded. Thoughts are appreciated.
Hi Hillf,

For some reason I don't see your mails on linux-wireless/netdev.
I've cleaned up your implementation a bit and I ran some tests with mt76
on an mt7621 embedded board. The results look pretty nice, performance
is a lot more consistent in my tests now.

Here are the changes that I've made compared to your version:

- remove the #ifdef, I think it's unnecessary
- add a state bit for threaded NAPI
- make netif_threaded_napi_add inline
- run queue_work outside of local_irq_save/restore (it does that
internally already)

If you don't mind, I'd like to propose this to netdev soon. Can I have
your Signed-off-by for that?

Thanks,

- Felix

---
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -347,6 +347,7 @@ struct napi_struct {
struct list_headdev_list;
struct hlist_node   napi_hash_node;
unsigned intnapi_id;
+   struct work_struct  work;
 };
 
 enum {
@@ -357,6 +358,7 @@ enum {
NAPI_STATE_HASHED,  /* In NAPI hash (busy polling possible) */
NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
+   NAPI_STATE_THREADED,/* Use threaded NAPI */
 };
 
 enum {
@@ -367,6 +369,7 @@ enum {
NAPIF_STATE_HASHED   = BIT(NAPI_STATE_HASHED),
NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
+   NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED),
 };
 
 enum gro_result {
@@ -2315,6 +2318,26 @@ static inline void *netdev_priv(const struct net_device 
*dev)
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
int (*poll)(struct napi_struct *, int), int weight);
 
+/**
+ * netif_threaded_napi_add - initialize a NAPI context
+ * @dev:  network device
+ * @napi: NAPI context
+ * @poll: polling function
+ * @weight: default weight
+ *
+ * This variant of netif_napi_add() should be used from drivers using NAPI
+ * with CPU intensive poll functions.
+ * This will schedule polling from a high priority workqueue that
+ */
+static inline void netif_threaded_napi_add(struct net_device *dev,
+  struct napi_struct *napi,
+  int (*poll)(struct napi_struct *, 
int),
+  int weight)
+{
+   set_bit(NAPI_STATE_THREADED, >state);
+   netif_napi_add(dev, napi, poll, weight);
+}
+
 /**
  * netif_tx_napi_add - initialize a NAPI context
  * @dev:  network device
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -158,6 +158,7 @@ static DEFINE_SPINLOCK(offload_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;  /* Taps */
 static struct list_head offload_base __read_mostly;
+static struct workqueue_struct *napi_workq;
 
 static int netif_rx_internal(struct sk_buff *skb);
 static int call_netdevice_notifiers_info(unsigned long val,
@@ -6286,6 +6287,11 @@ void __napi_schedule(struct napi_struct *n)
 {
unsigned long flags;
 
+   if (test_bit(NAPI_STATE_THREADED, >state)) {
+   queue_work(napi_workq, >work);
+   return;
+   }
+
local_irq_save(flags);
napi_schedule(this_cpu_ptr(_data), n);
local_irq_restore(flags);
@@ -6333,6 +6339,11 @@ EXPORT_SYMBOL(napi_schedule_prep);
  */
 void __napi_schedule_irqoff(struct napi_struct *n)
 {
+   if (test_bit(NAPI_STATE_THREADED, >state)) {
+   queue_work(napi_workq, >work);
+   return;
+   }
+
napi_schedule(this_cpu_ptr(_data), n);
 }
 EXPORT_SYMBOL(__napi_schedule_irqoff);
@@ -6601,6 +6612,29 @@ static void init_gro_hash(struct napi_struct *napi)
napi->gro_bitmask = 0;
 }
 
+static void napi_workfn(struct work_struct *work)
+{
+   struct napi_struct *n = container_of(work, struct napi_struct, work);
+
+   for (;;) {
+   if (!test_bit(NAPI_STATE_SCHED, >state))
+   return;
+
+   if (n->poll(n, n->weight) < n->weight)
+   return;
+
+   if (need_resched()) {
+   /*
+* have to pay for the latency of task switch even if
+* napi is scheduled
+*/
+   if (test_bit(NAPI_STATE_SCHED, >state))
+   queue_work(napi_workq, work);
+   return;
+   }
+   }
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
int (*poll)(struct napi_struct *, int), int weight)
 {
@@ -6621,6 +6655,7 @@ void 

Re: [RFC 2/7] ath10k: Add support to process rx packet in thread

2020-07-22 Thread Felix Fietkau
On 2020-07-22 14:55, Johannes Berg wrote:
> On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:
> 
>> I'm considering testing a different approach (with mt76 initially):
>> - Add a mac80211 rx function that puts processed skbs into a list
>> instead of handing them to the network stack directly.
> 
> Would this be *after* all the mac80211 processing, i.e. in place of the
> rx-up-to-stack?
Yes, it would run all the rx handlers normally and then put the
resulting skbs into a list instead of calling netif_receive_skb or
napi_gro_frags.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC 2/7] ath10k: Add support to process rx packet in thread

2020-07-22 Thread Felix Fietkau
On 2020-07-21 23:53, Rajkumar Manoharan wrote:
> On 2020-07-21 10:14, Rakesh Pillai wrote:
>> NAPI instance gets scheduled on a CPU core on which
>> the IRQ was triggered. The processing of rx packets
>> can be CPU intensive and since NAPI cannot be moved
>> to a different CPU core, to get better performance,
>> its better to move the gist of rx packet processing
>> in a high priority thread.
>> 
>> Add the init/deinit part for a thread to process the
>> receive packets.
>> 
> IMHO this defeat the whole purpose of NAPI. Originally in ath10k
> irq processing happened in tasklet (high priority) context which in
> turn push more data to net core even though net is unable to process
> driver data as both happen in different context (fast producer - slow 
> consumer)
> issue. Why can't CPU governor schedule the interrupts in less loaded CPU 
> core?
> Otherwise you can play with different RPS and affinity settings to meet 
> your
> requirement.
> 
> IMO introducing high priority tasklets/threads is not viable solution.
I'm beginning to think that the main problem with NAPI here is that the
work done by poll functions on 802.11 drivers is significantly more CPU
intensive compared to ethernet drivers, possibly more than what NAPI was
designed for.

I'm considering testing a different approach (with mt76 initially):
- Add a mac80211 rx function that puts processed skbs into a list
instead of handing them to the network stack directly.
- Move all rx processing to a high priority thread, keep a driver
internal queue for fully processed packets.
- Schedule NAPI poll on completion.
- NAPI poll function pulls from the internal queue and passes to the
network stack.

With this approach, the network stack retains some control over the
processing rate of rx packets, while the scheduler can move the CPU
intensive processing around to where it fits best.

What do you think?

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-15 Thread Felix Fietkau
On 2018-11-14 18:40, Toke Høiland-Jørgensen wrote:
>> This part doesn't really make much sense to me, but maybe I'm
>> misunderstanding how the code works.
>> Let's assume we have a driver like ath9k or mt76, which tries to keep a
>> number of aggregates in the hardware queue, and the hardware queue is
>> currently empty.
>> If the current txq entry is kept at the head of the schedule list,
>> wouldn't the code just pull from that one over and over again, until
>> enough packets are transmitted by the hardware and their tx status
>> processed?
>> It seems to me that while fairness is still preserved in the long run,
>> this could lead to rather bursty scheduling, which may not be
>> particularly latency friendly.
> 
> Yes, it'll be a bit more bursty when the hardware queue is completely
> empty. However, when a TX completion comes back, that will adjust the
> deficit of that sta and cause it to be rotated on the next dequeue. This
> obviously relies on the fact that the lower-level hardware queue is
> sufficiently shallow to not add a lot of latency. But we want that to be
> the case anyway. In practice, it works quite well for ath9k, but not so
> well for ath10k because it has a large buffer in firmware.
> 
> If we requeue the TXQ at the end of the list, a station that is taking
> up too much airtime will fail to be throttled properly, so the
> queue-at-head is kinda needed to ensure fairness...
Thanks for the explanation, that makes sense to me. I have an idea on
how to mitigate the burstiness within the driver. I'll write it down in
pseudocode, please let me know if you think that'll work.

do {
struct ieee80211_txq *pending_txq[4];
int n_pending_txq = 0;
int i;

if (hwq->pending < 4)
break;

nframes = 0;

ieee80211_txq_schedule_start(hw, ac)
do {
bool requeue = false;

struct ieee80211_txq *txq;

txq = ieee80211_next_txq(hw, ac);
if (!txq)
break;

nframes += schedule_txq(txq, );
if (requeue)
pending_txq[n_pending_txq++] = txq;

} while (n_pending_txq < ARRAY_SIZE(pending_txq));

for (i = n_pending_txq; i > 0; i--)
ieee80211_return_txq(hw, pending_txq[i - 1]);

ieee80211_txq_schedule_end(hw, ac)
} while (nframes);

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-14 Thread Felix Fietkau
On 2018-11-12 23:51, Rajkumar Manoharan wrote:
> From: Toke Høiland-Jørgensen 
> 
> This adds airtime accounting and scheduling to the mac80211 TXQ
> scheduler. A new callback, ieee80211_sta_register_airtime(), is added
> that drivers can call to report airtime usage for stations.
> 
> When airtime information is present, mac80211 will schedule TXQs
> (through ieee80211_next_txq()) in a way that enforces airtime fairness
> between active stations. This scheduling works the same way as the ath9k
> in-driver airtime fairness scheduling. If no airtime usage is reported
> by the driver, the scheduler will default to round-robin scheduling.
> 
> For drivers that don't control TXQ scheduling in software, a new API
> function, ieee80211_txq_may_transmit(), is added which the driver can use
> to check if the TXQ is eligible for transmission, or should be throttled to
> enforce fairness. Calls to this function must also be enclosed in
> ieee80211_txq_schedule_{start,end}() calls to ensure proper locking.
> 
> The API ieee80211_txq_may_transmit() also ensures that TXQ list will be
> aligned aginst driver's own round-robin scheduler list. i.e it rotates
> the TXQ list till it makes the requested node becomes the first entry
> in TXQ list. Thus both the TXQ list and driver's list are in sync.
> 
> Co-Developed-by: Rajkumar Manoharan 
> Signed-off-by: Toke Høiland-Jørgensen 
> Signed-off-by: Rajkumar Manoharan 
> ---
>  include/net/mac80211.h | 59 ++
>  net/mac80211/cfg.c |  3 ++
>  net/mac80211/debugfs.c |  3 ++
>  net/mac80211/debugfs_sta.c | 50 --
>  net/mac80211/ieee80211_i.h |  2 ++
>  net/mac80211/main.c|  4 +++
>  net/mac80211/sta_info.c| 44 +--
>  net/mac80211/sta_info.h| 13 +++
>  net/mac80211/status.c  |  6 
>  net/mac80211/tx.c  | 90 
> +++---
>  10 files changed, 264 insertions(+), 10 deletions(-)
> 
> diff --git a/net/mac80211/status.c b/net/mac80211/status.c
> index aa4afbf0abaf..a1f1256448f5 100644
> --- a/net/mac80211/status.c
> +++ b/net/mac80211/status.c
> @@ -818,6 +818,12 @@ static void __ieee80211_tx_status(struct ieee80211_hw 
> *hw,
>   ieee80211_sta_tx_notify(sta->sdata, (void *) skb->data,
>   acked, info->status.tx_time);
>  
> + if (info->status.tx_time &&
> + wiphy_ext_feature_isset(local->hw.wiphy,
> + 
> NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
> + ieee80211_sta_register_airtime(>sta, tid,
> +info->status.tx_time, 0);
> +
>   if (ieee80211_hw_check(>hw, REPORTS_TX_ACK_STATUS)) {
>   if (info->flags & IEEE80211_TX_STAT_ACK) {
>   if (sta->status_stats.lost_packets)
I think the same is needed in ieee80211_tx_status_ext.

> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 305965283506..3f417e80e041 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -3660,12 +3680,74 @@ void ieee80211_return_txq(struct ieee80211_hw *hw,
>   lockdep_assert_held(>active_txq_lock[txq->ac]);
>  
>   if (list_empty(>schedule_order) &&
> - (!skb_queue_empty(>frags) || txqi->tin.backlog_packets))
> - list_add_tail(>schedule_order,
> -   >active_txqs[txq->ac]);
> + (!skb_queue_empty(>frags) || txqi->tin.backlog_packets)) {
> + /* If airtime accounting is active, always enqueue STAs at the
> +  * head of the list to ensure that they only get moved to the
> +  * back by the airtime DRR scheduler once they have a negative
> +  * deficit. A station that already has a negative deficit will
> +  * get immediately moved to the back of the list on the next
> +  * call to ieee80211_next_txq().
> +  */
> + if (txqi->txq.sta &&
> + wiphy_ext_feature_isset(local->hw.wiphy,
> + 
> NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
> + list_add(>schedule_order,
> +  >active_txqs[txq->ac]);
> + else
> + list_add_tail(>schedule_order,
> +   >active_txqs[txq->ac]);
> + }
>  }
This part doesn't really make much sense to me, but maybe I'm
misunderstanding how the code works.
Let's assume we have a driver like ath9k or mt76, which tries to keep a
number of aggregates in the hardware queue, and the hardware queue is
currently empty.
If the current txq entry is kept at the head of the schedule list,
wouldn't the code just pull from that one over and over again, until
enough packets are transmitted by the hardware and their tx status
processed?
It seems to me that while fairness is still 

Re: [PATCH v12] ath10k: add LED and GPIO controlling support for various chipsets

2018-03-08 Thread Felix Fietkau
On 2018-03-08 15:05, Pavel Machek wrote:
> On Thu 2018-03-08 13:33:29, Sebastian Gottschall wrote:
>> Am 08.03.2018 um 10:02 schrieb Pavel Machek:
>> >On Wed 2018-03-07 18:54:41, Sebastian Gottschall wrote:
>> >>Am 07.03.2018 um 17:22 schrieb Rafał Miłecki:
>> >>>On 2 March 2018 at 10:22, Sebastian Gottschall  
>> >>>wrote:
>> >>leds-gpio is crap and limited. you can just register one platform data 
>> >>at
>> >>kernel runtime since its identified by its object name "led-gpio" but 
>> >>the
>> >>kernel forbidds to register 2 platform datas with the same name
>> >>consider the ar71xx devices with qca988x wifi chipsets. they all have
>> >>already a led platform data registered
>> >>at boottime. a second can't be registered anymore so gpio_led is 
>> >>useless
>> >>at
>> >>all for most developers on such platforms. its mainly used for early
>> >>kernel
>> >>platform data initialisation for system leds.
>> >If leds-gpio has limitations, please fix those, rather then
>> >introducing duplicated code.
>> there is no duplicated code introduced and there is no solution for it.
>> consider that all wifi drivers with softled support
>> are going that way with registering a own led driver. see ath9k for
>> instance. gpio-led cannot be used for it and there is no way to
>> support multiple platform datas with the same name. its a kernel 
>> limitation
>> >>>I just reviewed some mips arch patch adding support for more LEDs for
>> >>>selected devices:
>> >>>[PATCH] MIPS: BCM47XX: Add Luxul XAP1500/XWR1750 WiFi LEDs
>> >>>https://patchwork.linux-mips.org/patch/18681/
>> >>>
>> >>>It seems to be simply adding another call to the
>> >>>gpio_led_register_device(). It seems to me you can call that function
>> >>>multiple times and register multiple structs with LEDs.
>> >>>
>> >>>Isn't all you need something like this?
>> >>>
>> >>>static const struct gpio_led ath10k_leds[] = {
>> >>> {
>> >>> .name = "ath10k:color:function",
>> >>> .active_low = 1,
>> >>> .default_state = LEDS_GPIO_DEFSTATE_KEEP,
>> >>> }
>> >>>};
>> >>>
>> >>>static struct gpio_led_platform_data bcm47xx_leds_pdata_extra = {
>> >>> leds = ath10k_leds;
>> >>> num_leds = ARRAY_SIZE(ath10k_leds);
>> >>>};
>> >>>
>> >>>ath10k_leds.gpio = ar->hw_params.led_pin;
>> >>>gpio_led_register_device(0, _leds);
>> >>the problem are other architectures which have already registered gpio_led
>> >>at system start like ar71xx
>> >>you cannot register a second one. so a independend led driver is a
>> >>requirement for direct control
>> >If the limitation indeed exists, please fix the limitation rather than
>> >working around it in each and every driver.
>> see ath9k. its exact the same implementation.
> 
> Ok, so one more driver to fix.
> 
>> in addition my variant does also work without gpiolib support. so it can be
>> used even if the kernel is configured
>> without gpio support.
>> and not to forget, using a own led driver is more ligthweight from the call
>> path for each led on / off event which is important for
>> low performance embedded devices
> 
> We are not going to copy code because such code works without
> libraries, and we are not going to copy code because that uses
> less cache during calls. Sorry.
> 
> NAK. Please fix your patch. Since this discussion seems to have taken a 
> rather weird turn, I've
taken a look at the specific code, and from my point of view the code
that sets up the LED (including callback) is so trivial that it's simply
not worth dealing with adding the leds-gpio driver to the mix.
It adds extra complexity and an extra dependency for no reason at all.
There's no extra functionality to be gained by using it.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: fix use-after-free in ath10k_wmi_cmd_send_nowait

2018-02-11 Thread Felix Fietkau
On 2018-02-11 03:56, Carl Huang wrote:
> The skb may be freed in tx completion context before
> trace_ath10k_wmi_cmd is called. This can be easily captured
> when KASAN(Kernel Address Sanitizer) is enabled. The fix is
> to add a reference count to the skb and release it after
> trace_ath10k_wmi_cmd is called.
> 
> Signed-off-by: Carl Huang 
I think it makes more sense to simply call the trace function before
ath10k_htc_send. Also, for a trivial change like this it probably does
not make sense to add a Copyright line either.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2] ath10k: Re-enable TXQs for all devices

2017-11-11 Thread Felix Fietkau
On 2017-11-10 01:48, Toke Høiland-Jørgensen wrote:
> Commit 4ca1807815aa6801aaced7fdefa9edacc2521767 disables the use of the
> mac80211 TXQs for some devices because of a theoretical throughput
> regression. The original regression report[1] was related to fq_codel
> qdisc drop performance, which was fixed in
> 9d18562a227874289fda8ca5d117d8f503f1dcca. Since then, we have not seen
> the TXQ-related regression, so it should be safe to re-enable TXQs.
That commit is unrelated to the fq/codel implementations in mac80211,
since mac80211 with txq does not use qdisc.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k : Fix channel survey dump

2017-04-26 Thread Felix Fietkau
On 2017-04-26 16:41, Venkateswara Rao Naralasetty wrote:
> Channel active/busy time are showing incorrect
> (less than previous or sometimes zero) for
> successive survey dump command.
> 
> example:
> Survey data from wlan0
> frequency:  5180 MHz [in use]
> channel active time:54995 ms
> channel busy time:  432 ms
> channel receive time:   0 ms
> channel transmit time:  59 ms
> Survey data from wlan0
> frequency:  5180 MHz [in use]
> channel active time:32592 ms
> channel busy time:  254 ms
> channel receive time:   0 ms
> channel transmit time:  0 ms
> 
> This patch fix this issue by assigning 'wmi_bss_survey_req_type'
> as 'WMI_BSS_SURVEY_REQ_TYPE_READ'.
> 
> Firmware ver 10.4-3.4-00082
> Hardware QCA4019
> 
> Signed-off-by: Venkateswara Rao Naralasetty 
> ---
>  drivers/net/wireless/ath/ath10k/mac.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
> b/drivers/net/wireless/ath/ath10k/mac.c
> index 9977829..87a9b55 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -6621,7 +6621,7 @@ static void ath10k_reconfig_complete(struct 
> ieee80211_hw *hw,
> struct ieee80211_channel *channel)
>  {
>   int ret;
> - enum wmi_bss_survey_req_type type = WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR;
> + enum wmi_bss_survey_req_type type = WMI_BSS_SURVEY_REQ_TYPE_READ;
Does the firmware read the registers directly, or does it accumulate the
results in a way that can't overflow?
If you don't clear the counters on reset, the overflow will be
problematic for the current-channel stats.
I think a better approach would be to use READ_CLEAR for in-use channels
and store the sum inside the driver.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-09-09 Thread Felix Fietkau
On 2016-08-19 03:26, gree...@candelatech.com wrote:
> From: Ben Greear 
> 
> I was seeing kernel crashes due to accessing freed memory
> while debugging a 9984 firmware that was crashing often.
> 
> This patch fixes the crashes.  I am not certain if there
> is a better way or not.
> 
> Signed-off-by: Ben Greear 
> ---
>  drivers/net/wireless/ath/ath10k/mac.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
> b/drivers/net/wireless/ath/ath10k/mac.c
> index 5659ef1..916119c 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq 
> *txq)
>  static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq 
> *txq)
>  {
>   struct ath10k_txq *artxq = (void *)txq->drv_priv;
> + struct ath10k_txq *tmp, *walker;
>   struct ath10k_skb_cb *cb;
>   struct sk_buff *msdu;
> + struct ieee80211_txq *txq_tmp;
>   int msdu_id;
>  
>   if (!txq)
> @@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, 
> struct ieee80211_txq *txq)
>   spin_lock_bh(>txqs_lock);
>   if (!list_empty(>list))
>   list_del_init(>list);
> +
> + /* Remove from ar->txqs in case it still exists there. */
> + list_for_each_entry_safe(walker, tmp, >txqs, list) {
> + txq_tmp = container_of((void *)walker, struct ieee80211_txq,
> +drv_priv);
> + if (txq_tmp == txq)
> + list_del(>list);
> + }
This makes no sense at all. From txq_tmp == txq we can deduce that
walker == artxq. In the context above, it already does a
list_del_init(>list).

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [RFC] ath10k: silence firmware file probing warnings

2016-08-02 Thread Felix Fietkau
On 2016-08-02 13:18, Valo, Kalle wrote:
> Michal Kazior  writes:
> 
>> Firmware files are versioned to prevent older
>> driver instances to load unsupported firmware
>> blobs. This is reflected with a fallback logic
>> which attempts to load several firmware files.
>>
>> This however produced a lot of unnecessary
>> warnings sometimes confusing users and leading
>> them to rename firmware files making things even
>> more confusing.
>>
>> Hence use request_firmware_direct() which does not
>> produce extra warnings. This shouldn't really
>> break anything because most modern systems don't
>> rely on udev/hotplug helpers to load firmware
>> files anymore.
>>
>> Signed-off-by: Michal Kazior 
> 
> Nice. These "firmware not found" messages have been confusing ath10k
> users for ages and should be properly fixed. I hope we find a solution.
> 
> But I talked with Felix about this and he made a good point about board
> and calibration files. Calibration files might be created runtime, for
> example retrieved from NAND etc, and this might break the use case when
> ath10k is statically linked to kernel. Is the combination used in real
> life and should we care, that I do not know, but I'm worried of possible
> regressions. I guess LEDE/openwrt always loads ath10k as a module and
> after the calibration file is created?
ath10k is always loaded as a module, and the calibration file is created
by a script that's triggered by the firmware uevent.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: Fix mgmt tx status for 10.4.3 firmware.

2016-03-01 Thread Felix Fietkau
On 2016-02-29 22:50, gree...@candelatech.com wrote:
> From: Ben Greear 
> 
> When testing a 10.4.3 firmware in station mode, I notice that
> when the AP is powered down, the ath10k does not notice AP is gone
> because mgt frames get tx status of 3, which is not handled.
> 
> It appears that status 3 means something similar to failed-retry.
> 
> Treating it thus lets the station disconnect properly.
> 
> Tested against a non-stock 10.4.3 firmware, but likely upstream
> firmware acts similarly in this case.
> 
> Signed-off-by: Ben Greear 
> ---
> 
> This patch is against a heavily patched tree based on 4.4, hopefully
> it can be applied to upstream code w/out too much trouble.
> 
> Someone should verify this on stock 10.4-ish firmware before applying.
> 
>  drivers/net/wireless/ath/ath10k/htt.h|  4 +++-
>  drivers/net/wireless/ath/ath10k/htt_rx.c | 11 +--
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath10k/htt.h 
> b/drivers/net/wireless/ath/ath10k/htt.h
> index c31a31f..de663a6 100644
> --- a/drivers/net/wireless/ath/ath10k/htt.h
> +++ b/drivers/net/wireless/ath/ath10k/htt.h
> @@ -309,7 +309,9 @@ struct htt_mgmt_tx_desc {
>  enum htt_mgmt_tx_status {
>   HTT_MGMT_TX_STATUS_OK= 0,
>   HTT_MGMT_TX_STATUS_RETRY = 1,
> - HTT_MGMT_TX_STATUS_DROP  = 2
> + HTT_MGMT_TX_STATUS_DROP  = 2,
> + HTT_MGMT_TX_STATUS_TXFILT = 3 /* Seems to be logically similar to
> +  RETRY failure. */
In that case the driver should probably set IEEE80211_TX_STAT_TX_FILTERED

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: Issue with QCA9980 initialization

2015-12-20 Thread Felix Fietkau
On 2015-12-17 13:10, Josh Bendavid wrote:
> I have a problem with QCA9980 initialization.
> 
> Hardware/software environment is a TP-Link Archer C2600 with Openwrt
> trunk + patches for the c2600.  (This device has two QCA9980
> controllers for simultaneous dual band operation)
> 
> Openwrt is currently populating the firmware files as follows
> 
> https://github.com/kvalo/ath10k-firmware/blob/69f955c3dd95d97c7c30960aa5c9852c80b85cc3/QCA99X0/hw2.0/boardData_AR900B_CUS239_5G_v2_001.bin
> -> /lib/firmware/ath10k/QCA99X0/hw2.0/board.bin
> 
> https://github.com/kvalo/ath10k-firmware/blob/69f955c3dd95d97c7c30960aa5c9852c80b85cc3/QCA99X0/hw2.0/firmware-5.bin_10.4.1.00030-1
> -> /lib/firmware/ath10k/QCA99X0/hw2.0/firmware-5.bin
> 
> https://www.codeaurora.org/cgit/quic/qsdk/oss/firmware/ath10k-firmware/plain/ath10k/QCA99X0/hw2.0/board-2.bin?id=ddcec9efd245da9365c474f513a855a55f3ac7fe
> -> /lib/firmware/ath10k/QCA99X0/hw2.0/board-2.bin
> 
> Then I am extracting (at hardware initialization time) the appropriate
> calibration data from the ART partition on the router flash
> 
> offset/count 4096/12064 -> /lib/firmware/ath10k/cal-pci-:01:00.0.bin
> offset/count 20480/12064 -> /lib/firmware/ath10k/cal-pci-0001:01:00.0.bin
> 
> Using ath10k as shipped with compat-wireless in openwrt trunk out of
> the box I get the output in attached nopatch.txt (and wireless is not
> functioning clearly)
> 
> Using instead the patch from
> http://lists.infradead.org/pipermail/ath10k/2015-November/006489.html
> 
> Then I get the output in attached patch.txt, and wireless is working
> correctly with both radios.
> 
> Is it a known problem?  Is there a mistake somewhere in how I am
> populating /lib/firmware, or is the applied patch the correct fix for
> the time being?
It's a known problem. ath10k has a bug where it needlessly tries to
extract the OTP board id and fetch data from board-2.bin (and fails if
it can't) even though it already has valid per-radio cal-data.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: add modparam 'hw_csum' to make HW checksum configurable

2015-12-17 Thread Felix Fietkau
On 2015-12-17 23:01, Peter Oh wrote:
> 
> On 12/16/2015 03:59 PM, Felix Fietkau wrote:
>> On 2015-12-17 00:50, Peter Oh wrote:
>>> On 12/16/2015 01:54 PM, Felix Fietkau wrote:
>>>> On 2015-12-16 22:19, Peter Oh wrote:
>>>>> On 12/16/2015 12:53 PM, Felix Fietkau wrote:
>>>>>> On 2015-12-16 21:46, Peter Oh wrote:
>>>>>>> On 12/16/2015 12:35 PM, Felix Fietkau wrote:
>>>>>>>> On 2015-12-16 21:29, Peter Oh wrote:
>>>>>>>>> On 12/16/2015 10:27 AM, Felix Fietkau wrote:
>>>>>>>>>> On 2015-12-16 19:20, Peter Oh wrote:
>>>>>>>>>>> Some hardwares such as QCA988X and QCA99X0 doesn't have
>>>>>>>>>>> capability of checksum offload when frame formats are not
>>>>>>>>>>> suitable for it such as Mesh frame.
>>>>>>>>>>> Hence add a module parameter, hw_csum, to make checksum offload
>>>>>>>>>>> configurable during module registration time.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Peter Oh <p...@qca.qualcomm.com>
>>>>>>>>>> How about instead of inventing yet another crappy module parameter, 
>>>>>>>>>> you
>>>>>>>>>> call skb_checksum_help() in the driver in cases where the hardware is
>>>>>>>>>> unable to offload the checksum calculation.
>>>>>>>>>>
>>>>>>>>>> That way the user has to worry about less driver specific hackery ;)
>>>>>>>>> That will be good option for hardware not supporting HW checksum, but 
>>>>>>>>> I
>>>>>>>>> mind that using the function will add more workload per every packet 
>>>>>>>>> on
>>>>>>>>> critical data path when HW supports checksum resulting in throughput 
>>>>>>>>> down.
>>>>>>>> I didn't mean calling it for every single frame in the data path.
>>>>>>>> What I'm suggesting is calling it selectively only for mesh frames, or
>>>>>>>> any other frames that the hardware cannot offload, and leaving the rest
>>>>>>>> for the hardware to process.
>>>>>>>>
>>>>>>>> There should be no performance difference between disabling checksum
>>>>>>>> offload and calling skb_checksum_help from the driver.
>>>>>>> To call it selectively for Mesh frame or interface, we need to add it on
>>>>>>> mac80211 layer such as ieee80211_build_hdr() since driver layer does not
>>>>>>> care the interface type in data path.
>>>>>> No need to change mac80211 - it only touches the headers, and
>>>>>> skb_checksum_help does not care about that. The skb has enough
>>>>>> information for it to find the right range to calculate the checksum and
>>>>>> the place to store it.
>>>>> If mentioned to use the function to mesh frame only without touching
>>>>> mac80211, then how do you suggest it to apply it only to mesh frame
>>>>> without interfere other data frames?
>>>>> Can you share your example?
>>>> It's trivial - in ath10k_tx you do this:
>>>>
>>>> if (vif->type == NL80211_IFTYPE_MESH_POINT &&
>>>>   skb->ip_summed == CHECKSUM_PARTIAL)
>>>>skb_checksum_help(skb);
>>> Thank you Felix for the quick response.
>>> I agree on your user experience opinion,
>>> but what do you think when ath10k has a new chip supporting HW checksum
>>> for Mesh?
>> Then you simply update the checks. What's the big deal?
> keep adding condition to such data path is not a good option.
> I also considered again about user experiences and reached to that this 
> patch won't disturb user experience since the products will ship with 
> proper module settings. for instance the parameter will be turned on if 
> product support it other wise will be turned off as they shipped, so 
> that users don't need to touch it.
I think the point you were missing is the one that there is no such
thing as a proper setting for this module parameter, since it doesn't
really depend much on the hardware or the product, but on the wifi mode
that you are using.

> In addition, for enterprise customers, they do care even a very small 
> performance drop or enhancement especially when they are running BMT 
> among vendors.
> So we need to avoid adding extra codes in data path in my opinion.
The regular data tx path already checks ar->dev_flags to decide whether
to use raw mode or not. This means that this part of the data structure
is already cached. The vif type is also cached, since it's accessed in
the same part of the function.
Because of that, the impact of adding an extra check even for a hardware
capability will be so low, that I'm pretty sure you will not be able to
measure it. And even if it were measurable, it's probably quite easy to
find a few places to optimize

I find the tradeoff you are making very odd: For users that don't know
about the module parameter (depending on the default value) it either
just randomly doesn't work in mesh or always runs with degraded
performance. All this to save adding a check that will be completely
irrelevant for performance, since it won't result in any extra cache
stalls (which are the typical bottleneck in the data path).

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: add modparam 'hw_csum' to make HW checksum configurable

2015-12-16 Thread Felix Fietkau
On 2015-12-16 21:29, Peter Oh wrote:
> 
> On 12/16/2015 10:27 AM, Felix Fietkau wrote:
>> On 2015-12-16 19:20, Peter Oh wrote:
>>> Some hardwares such as QCA988X and QCA99X0 doesn't have
>>> capability of checksum offload when frame formats are not
>>> suitable for it such as Mesh frame.
>>> Hence add a module parameter, hw_csum, to make checksum offload
>>> configurable during module registration time.
>>>
>>> Signed-off-by: Peter Oh <p...@qca.qualcomm.com>
>> How about instead of inventing yet another crappy module parameter, you
>> call skb_checksum_help() in the driver in cases where the hardware is
>> unable to offload the checksum calculation.
>>
>> That way the user has to worry about less driver specific hackery ;)
> That will be good option for hardware not supporting HW checksum, but I 
> mind that using the function will add more workload per every packet on 
> critical data path when HW supports checksum resulting in throughput down.
I didn't mean calling it for every single frame in the data path.
What I'm suggesting is calling it selectively only for mesh frames, or
any other frames that the hardware cannot offload, and leaving the rest
for the hardware to process.

There should be no performance difference between disabling checksum
offload and calling skb_checksum_help from the driver.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: add modparam 'hw_csum' to make HW checksum configurable

2015-12-16 Thread Felix Fietkau
On 2015-12-16 19:20, Peter Oh wrote:
> Some hardwares such as QCA988X and QCA99X0 doesn't have
> capability of checksum offload when frame formats are not
> suitable for it such as Mesh frame.
> Hence add a module parameter, hw_csum, to make checksum offload
> configurable during module registration time.
> 
> Signed-off-by: Peter Oh 
How about instead of inventing yet another crappy module parameter, you
call skb_checksum_help() in the driver in cases where the hardware is
unable to offload the checksum calculation.

That way the user has to worry about less driver specific hackery ;)

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: add modparam 'hw_csum' to make HW checksum configurable

2015-12-16 Thread Felix Fietkau
On 2015-12-16 22:19, Peter Oh wrote:
> 
> On 12/16/2015 12:53 PM, Felix Fietkau wrote:
>> On 2015-12-16 21:46, Peter Oh wrote:
>>> On 12/16/2015 12:35 PM, Felix Fietkau wrote:
>>>> On 2015-12-16 21:29, Peter Oh wrote:
>>>>> On 12/16/2015 10:27 AM, Felix Fietkau wrote:
>>>>>> On 2015-12-16 19:20, Peter Oh wrote:
>>>>>>> Some hardwares such as QCA988X and QCA99X0 doesn't have
>>>>>>> capability of checksum offload when frame formats are not
>>>>>>> suitable for it such as Mesh frame.
>>>>>>> Hence add a module parameter, hw_csum, to make checksum offload
>>>>>>> configurable during module registration time.
>>>>>>>
>>>>>>> Signed-off-by: Peter Oh <p...@qca.qualcomm.com>
>>>>>> How about instead of inventing yet another crappy module parameter, you
>>>>>> call skb_checksum_help() in the driver in cases where the hardware is
>>>>>> unable to offload the checksum calculation.
>>>>>>
>>>>>> That way the user has to worry about less driver specific hackery ;)
>>>>> That will be good option for hardware not supporting HW checksum, but I
>>>>> mind that using the function will add more workload per every packet on
>>>>> critical data path when HW supports checksum resulting in throughput down.
>>>> I didn't mean calling it for every single frame in the data path.
>>>> What I'm suggesting is calling it selectively only for mesh frames, or
>>>> any other frames that the hardware cannot offload, and leaving the rest
>>>> for the hardware to process.
>>>>
>>>> There should be no performance difference between disabling checksum
>>>> offload and calling skb_checksum_help from the driver.
>>> To call it selectively for Mesh frame or interface, we need to add it on
>>> mac80211 layer such as ieee80211_build_hdr() since driver layer does not
>>> care the interface type in data path.
>> No need to change mac80211 - it only touches the headers, and
>> skb_checksum_help does not care about that. The skb has enough
>> information for it to find the right range to calculate the checksum and
>> the place to store it.
> If mentioned to use the function to mesh frame only without touching 
> mac80211, then how do you suggest it to apply it only to mesh frame 
> without interfere other data frames?
> Can you share your example?
It's trivial - in ath10k_tx you do this:

if (vif->type == NL80211_IFTYPE_MESH_POINT &&
skb->ip_summed == CHECKSUM_PARTIAL)
skb_checksum_help(skb);

>>> In that case it will also introduce throughput degrade to HW that
>>> supports HW checksum for Mesh.
>> This doesn't make any sense to me. Are you saying that there's no way
>> for the driver to detect the cases where the hardware cannot do checksum
>> offloading?
> I'm saying the case that HW supports checksum except for specific frame 
> such as Mesh and to make driver support both case dynamically at code 
> level, it requires extra codes which need to check if the frame is Mesh 
> or not. Since this approach requires extra workload especially in data 
> path, it will degrade driver's performance.
The check is cheap enough that it will not have any visible impact. And
the improved user experience is certainly worth it ;)

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: add modparam 'hw_csum' to make HW checksum configurable

2015-12-16 Thread Felix Fietkau
On 2015-12-17 00:50, Peter Oh wrote:
> 
> On 12/16/2015 01:54 PM, Felix Fietkau wrote:
>> On 2015-12-16 22:19, Peter Oh wrote:
>>> On 12/16/2015 12:53 PM, Felix Fietkau wrote:
>>>> On 2015-12-16 21:46, Peter Oh wrote:
>>>>> On 12/16/2015 12:35 PM, Felix Fietkau wrote:
>>>>>> On 2015-12-16 21:29, Peter Oh wrote:
>>>>>>> On 12/16/2015 10:27 AM, Felix Fietkau wrote:
>>>>>>>> On 2015-12-16 19:20, Peter Oh wrote:
>>>>>>>>> Some hardwares such as QCA988X and QCA99X0 doesn't have
>>>>>>>>> capability of checksum offload when frame formats are not
>>>>>>>>> suitable for it such as Mesh frame.
>>>>>>>>> Hence add a module parameter, hw_csum, to make checksum offload
>>>>>>>>> configurable during module registration time.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Peter Oh <p...@qca.qualcomm.com>
>>>>>>>> How about instead of inventing yet another crappy module parameter, you
>>>>>>>> call skb_checksum_help() in the driver in cases where the hardware is
>>>>>>>> unable to offload the checksum calculation.
>>>>>>>>
>>>>>>>> That way the user has to worry about less driver specific hackery ;)
>>>>>>> That will be good option for hardware not supporting HW checksum, but I
>>>>>>> mind that using the function will add more workload per every packet on
>>>>>>> critical data path when HW supports checksum resulting in throughput 
>>>>>>> down.
>>>>>> I didn't mean calling it for every single frame in the data path.
>>>>>> What I'm suggesting is calling it selectively only for mesh frames, or
>>>>>> any other frames that the hardware cannot offload, and leaving the rest
>>>>>> for the hardware to process.
>>>>>>
>>>>>> There should be no performance difference between disabling checksum
>>>>>> offload and calling skb_checksum_help from the driver.
>>>>> To call it selectively for Mesh frame or interface, we need to add it on
>>>>> mac80211 layer such as ieee80211_build_hdr() since driver layer does not
>>>>> care the interface type in data path.
>>>> No need to change mac80211 - it only touches the headers, and
>>>> skb_checksum_help does not care about that. The skb has enough
>>>> information for it to find the right range to calculate the checksum and
>>>> the place to store it.
>>> If mentioned to use the function to mesh frame only without touching
>>> mac80211, then how do you suggest it to apply it only to mesh frame
>>> without interfere other data frames?
>>> Can you share your example?
>> It's trivial - in ath10k_tx you do this:
>>
>> if (vif->type == NL80211_IFTYPE_MESH_POINT &&
>>  skb->ip_summed == CHECKSUM_PARTIAL)
>>  skb_checksum_help(skb);
> Thank you Felix for the quick response.
> I agree on your user experience opinion,
> but what do you think when ath10k has a new chip supporting HW checksum 
> for Mesh?
Then you simply update the checks. What's the big deal?

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v2] ath10k: do not use coherent memory for allocated device memory chunks

2015-11-30 Thread Felix Fietkau
Coherent memory is more expensive to allocate (and constrained on some
architectures where it has to be pre-allocated). It is also completely
unnecessary, since the host has no reason to even access these allocated
memory spaces

Signed-off-by: Felix Fietkau <n...@openwrt.org>
---
 drivers/net/wireless/ath/ath10k/wmi.c | 61 ---
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c 
b/drivers/net/wireless/ath/ath10k/wmi.c
index 9021079..1386dd8 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -4300,34 +4300,58 @@ void ath10k_wmi_event_vdev_resume_req(struct ath10k 
*ar, struct sk_buff *skb)
ath10k_dbg(ar, ATH10K_DBG_WMI, "WMI_VDEV_RESUME_REQ_EVENTID\n");
 }
 
-static int ath10k_wmi_alloc_host_mem(struct ath10k *ar, u32 req_id,
-u32 num_units, u32 unit_len)
+static int ath10k_wmi_alloc_chunk(struct ath10k *ar, u32 req_id,
+ u32 num_units, u32 unit_len)
 {
dma_addr_t paddr;
-   u32 pool_size;
+   u32 pool_size = 0;
int idx = ar->wmi.num_mem_chunks;
+   void *vaddr = NULL;
 
-   pool_size = num_units * round_up(unit_len, 4);
+   if (ar->wmi.num_mem_chunks == ARRAY_SIZE(ar->wmi.mem_chunks))
+   return -ENOMEM;
 
-   if (!pool_size)
-   return -EINVAL;
+   while (!vaddr && num_units) {
+   pool_size = num_units * round_up(unit_len, 4);
+   if (!pool_size)
+   return -EINVAL;
 
-   ar->wmi.mem_chunks[idx].vaddr = dma_alloc_coherent(ar->dev,
-  pool_size,
-  ,
-  GFP_KERNEL);
-   if (!ar->wmi.mem_chunks[idx].vaddr) {
-   ath10k_warn(ar, "failed to allocate memory chunk\n");
-   return -ENOMEM;
+   vaddr = kzalloc(pool_size, GFP_KERNEL | __GFP_NOWARN);
+   if (!vaddr)
+   num_units /= 2;
}
 
-   memset(ar->wmi.mem_chunks[idx].vaddr, 0, pool_size);
+   if (!num_units)
+   return -ENOMEM;
+
+   paddr = dma_map_single(ar->dev, vaddr, pool_size, DMA_TO_DEVICE);
+   if (dma_mapping_error(ar->dev, paddr)) {
+   kfree(vaddr);
+   return -ENOMEM;
+   }
 
+   ar->wmi.mem_chunks[idx].vaddr = vaddr;
ar->wmi.mem_chunks[idx].paddr = paddr;
ar->wmi.mem_chunks[idx].len = pool_size;
ar->wmi.mem_chunks[idx].req_id = req_id;
ar->wmi.num_mem_chunks++;
 
+   return num_units;
+}
+
+static int ath10k_wmi_alloc_host_mem(struct ath10k *ar, u32 req_id,
+u32 num_units, u32 unit_len)
+{
+   int ret;
+
+   while (num_units) {
+   ret = ath10k_wmi_alloc_chunk(ar, req_id, num_units, unit_len);
+   if (ret < 0)
+   return ret;
+
+   num_units -= ret;
+   }
+
return 0;
 }
 
@@ -7705,10 +7729,11 @@ void ath10k_wmi_free_host_mem(struct ath10k *ar)
 
/* free the host memory chunks requested by firmware */
for (i = 0; i < ar->wmi.num_mem_chunks; i++) {
-   dma_free_coherent(ar->dev,
- ar->wmi.mem_chunks[i].len,
- ar->wmi.mem_chunks[i].vaddr,
- ar->wmi.mem_chunks[i].paddr);
+   dma_unmap_single(ar->dev,
+ar->wmi.mem_chunks[i].paddr,
+ar->wmi.mem_chunks[i].len,
+DMA_TO_DEVICE);
+   kfree(ar->wmi.mem_chunks[i].vaddr);
}
 
ar->wmi.num_mem_chunks = 0;
-- 
2.2.2


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH] ath10k: stop abusing GFP_DMA

2015-11-24 Thread Felix Fietkau
Allocations from the DMA zone were originally added for legacy ISA
stuff, or PCI devices that have specific limitations in their DMA
addressing capabilities. It has no place in ath10k, which can do
full 32-bit DMA.

Fixes memory allocation errors on some platforms.

Signed-off-by: Felix Fietkau <n...@openwrt.org>
---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 4 ++--
 drivers/net/wireless/ath/ath10k/htt_tx.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 396645b..8f240e1 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -536,7 +536,7 @@ int ath10k_htt_rx_alloc(struct ath10k_htt *htt)
 
size = htt->rx_ring.size * sizeof(htt->rx_ring.paddrs_ring);
 
-   vaddr = dma_alloc_coherent(htt->ar->dev, size, , GFP_DMA);
+   vaddr = dma_alloc_coherent(htt->ar->dev, size, , GFP_KERNEL);
if (!vaddr)
goto err_dma_ring;
 
@@ -545,7 +545,7 @@ int ath10k_htt_rx_alloc(struct ath10k_htt *htt)
 
vaddr = dma_alloc_coherent(htt->ar->dev,
   sizeof(*htt->rx_ring.alloc_idx.vaddr),
-  , GFP_DMA);
+  , GFP_KERNEL);
if (!vaddr)
goto err_dma_idx;
 
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c 
b/drivers/net/wireless/ath/ath10k/htt_tx.c
index 8f76b9d..ed62ec8 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -111,7 +111,7 @@ int ath10k_htt_tx_alloc(struct ath10k_htt *htt)
size = htt->max_num_pending_tx * sizeof(struct ath10k_htt_txbuf);
htt->txbuf.vaddr = dma_alloc_coherent(ar->dev, size,
  >txbuf.paddr,
- GFP_DMA);
+ GFP_KERNEL);
if (!htt->txbuf.vaddr) {
ath10k_err(ar, "failed to alloc tx buffer\n");
ret = -ENOMEM;
@@ -124,7 +124,7 @@ int ath10k_htt_tx_alloc(struct ath10k_htt *htt)
size = htt->max_num_pending_tx * sizeof(struct htt_msdu_ext_desc);
htt->frag_desc.vaddr = dma_alloc_coherent(ar->dev, size,
  >frag_desc.paddr,
- GFP_DMA);
+ GFP_KERNEL);
if (!htt->frag_desc.vaddr) {
ath10k_warn(ar, "failed to alloc fragment desc memory\n");
ret = -ENOMEM;
-- 
2.2.2


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: do not use coherent memory for tx buffers

2015-11-24 Thread Felix Fietkau
On 2015-11-23 19:50, Peter Oh wrote:
> 
> On 11/23/2015 10:18 AM, Felix Fietkau wrote:
>> On 2015-11-23 18:25, Peter Oh wrote:
>>> Hi,
>>>
>>> Have you measured the peak throughput?
>>> The pre-allocated coherent memory concept was introduced as once of peak
>>> throughput improvement.
>> It's all still pre-allocated and pre-mapped.
> Right. I mis-guessed with the title.
>>
>>> IIRC, dma_map_single takes about 4 us on Cortex A7 and dma_unmap_single
>>> also takes time to invalid cache.
>> That's why I didn't put a map/unmap in the hot path. There is only a
>> cache sync there. With coherent memory, every single word access blocks
>> until the transaction is complete. With cached/mapped memory, the CPU
>> can fill the cachelines first, then flush it in one go. This usually
>> ends up being faster than working with coherent memory directly.
>>
>>> Please share your tput number before and after, so I don't need to worry
>>> about performance degrade.
>> I don't have an ideal setup for tput tests at the moment, so I can't
>> give you any numbers.
> Could you share any rough number?
>>   However, on the device that I'm testing on
>> (IPQ806x based), this patch makes the difference between working and
>> non-working wifi, fixing the regression introduced by your pre-allocated
>> coherent memory patch.
> Thank you for the catch up and fix.
> Btw, the regression can be fixed by using GFP_KERNEL, instead of 
> GFP_DMA, right?
I just did some timing measurements, and it seems that the DMA coherent
variant is roughly 200 nanoseconds faster. Maybe the extra latency is
caused by the CPU filling the cacheline from RAM first.

Kalle, please only merge the first one and drop this patch.
I will send a replacement for it.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: do not use coherent memory for tx buffers

2015-11-23 Thread Felix Fietkau
On 2015-11-23 18:25, Peter Oh wrote:
> Hi,
> 
> Have you measured the peak throughput?
> The pre-allocated coherent memory concept was introduced as once of peak 
> throughput improvement.
It's all still pre-allocated and pre-mapped.

> IIRC, dma_map_single takes about 4 us on Cortex A7 and dma_unmap_single 
> also takes time to invalid cache.
That's why I didn't put a map/unmap in the hot path. There is only a
cache sync there. With coherent memory, every single word access blocks
until the transaction is complete. With cached/mapped memory, the CPU
can fill the cachelines first, then flush it in one go. This usually
ends up being faster than working with coherent memory directly.

> Please share your tput number before and after, so I don't need to worry 
> about performance degrade.
I don't have an ideal setup for tput tests at the moment, so I can't
give you any numbers. However, on the device that I'm testing on
(IPQ806x based), this patch makes the difference between working and
non-working wifi, fixing the regression introduced by your pre-allocated
coherent memory patch.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH 1/2] ath10k: do not use coherent memory for allocated device memory chunks

2015-11-23 Thread Felix Fietkau
Coherent memory is more expensive to allocate (and constrained on some
architectures where it has to be pre-allocated). It is also completely
unnecessary, since the host has no reason to even access these allocated
memory spaces

Signed-off-by: Felix Fietkau <n...@openwrt.org>
---
 drivers/net/wireless/ath/ath10k/wmi.c | 59 +--
 1 file changed, 42 insertions(+), 17 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c 
b/drivers/net/wireless/ath/ath10k/wmi.c
index 9021079..6cd097c 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -4300,34 +4300,58 @@ void ath10k_wmi_event_vdev_resume_req(struct ath10k 
*ar, struct sk_buff *skb)
ath10k_dbg(ar, ATH10K_DBG_WMI, "WMI_VDEV_RESUME_REQ_EVENTID\n");
 }
 
-static int ath10k_wmi_alloc_host_mem(struct ath10k *ar, u32 req_id,
-u32 num_units, u32 unit_len)
+static int ath10k_wmi_alloc_chunk(struct ath10k *ar, u32 req_id,
+ u32 num_units, u32 unit_len)
 {
dma_addr_t paddr;
u32 pool_size;
int idx = ar->wmi.num_mem_chunks;
+   void *vaddr = NULL;
 
-   pool_size = num_units * round_up(unit_len, 4);
+   if (ar->wmi.num_mem_chunks == ARRAY_SIZE(ar->wmi.mem_chunks))
+   return -ENOMEM;
 
-   if (!pool_size)
-   return -EINVAL;
+   while (!vaddr && num_units) {
+   pool_size = num_units * round_up(unit_len, 4);
+   if (!pool_size)
+   return -EINVAL;
 
-   ar->wmi.mem_chunks[idx].vaddr = dma_alloc_coherent(ar->dev,
-  pool_size,
-  ,
-  GFP_KERNEL);
-   if (!ar->wmi.mem_chunks[idx].vaddr) {
-   ath10k_warn(ar, "failed to allocate memory chunk\n");
-   return -ENOMEM;
+   vaddr = kzalloc(pool_size, GFP_KERNEL | __GFP_NOWARN);
+   if (!vaddr)
+   num_units /= 2;
}
 
-   memset(ar->wmi.mem_chunks[idx].vaddr, 0, pool_size);
+   if (!num_units)
+   return -ENOMEM;
 
+   paddr = dma_map_single(ar->dev, vaddr, pool_size, DMA_TO_DEVICE);
+   if (dma_mapping_error(ar->dev, paddr)) {
+   kfree(vaddr);
+   return -ENOMEM;
+   }
+
+   ar->wmi.mem_chunks[idx].vaddr = vaddr;
ar->wmi.mem_chunks[idx].paddr = paddr;
ar->wmi.mem_chunks[idx].len = pool_size;
ar->wmi.mem_chunks[idx].req_id = req_id;
ar->wmi.num_mem_chunks++;
 
+   return num_units;
+}
+
+static int ath10k_wmi_alloc_host_mem(struct ath10k *ar, u32 req_id,
+u32 num_units, u32 unit_len)
+{
+   int ret;
+
+   while (num_units) {
+   ret = ath10k_wmi_alloc_chunk(ar, req_id, num_units, unit_len);
+   if (ret < 0)
+   return ret;
+
+   num_units -= ret;
+   }
+
return 0;
 }
 
@@ -7705,10 +7729,11 @@ void ath10k_wmi_free_host_mem(struct ath10k *ar)
 
/* free the host memory chunks requested by firmware */
for (i = 0; i < ar->wmi.num_mem_chunks; i++) {
-   dma_free_coherent(ar->dev,
- ar->wmi.mem_chunks[i].len,
- ar->wmi.mem_chunks[i].vaddr,
- ar->wmi.mem_chunks[i].paddr);
+   dma_unmap_single(ar->dev,
+ar->wmi.mem_chunks[i].paddr,
+ar->wmi.mem_chunks[i].len,
+DMA_TO_DEVICE);
+   kfree(ar->wmi.mem_chunks[i].vaddr);
}
 
ar->wmi.num_mem_chunks = 0;
-- 
2.2.2


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 4.3] ath10k: fix DMA related firmware crashes on multiple devices

2015-09-14 Thread Felix Fietkau
On 2015-09-14 09:00, Kalle Valo wrote:
> Felix Fietkau <n...@openwrt.org> writes:
> 
>> Some platforms really don't like DMA bursts of 256 bytes, and this
>> causes the firmware to crash when sending beacons.
>> Also, changing this based on the firmware version does not seem to make
>> much sense, so use 128 bytes for all versions.
>>
>> Cc: sta...@vger.kernel.org
>> Signed-off-by: Felix Fietkau <n...@openwrt.org>
> 
> Nice, good catch! Just to make sure, this fixes the issues some people
> have reported that 10.1 firmware works but 10.2 firmware crashes when
> starting beaconing?
Yes. I was able to reproduce that issue myself on a Gateworks Laguna
device, which is among the affected devices reported in that email
thread. This patch fixes it for me.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH 4.3 RESEND] ath10k: fix DMA related firmware crashes on multiple devices

2015-09-14 Thread Felix Fietkau
Some platforms really don't like DMA bursts of 256 bytes, and this
causes the firmware to crash when sending beacons.
Also, changing this based on the firmware version does not seem to make
much sense, so use 128 bytes for all versions.

Cc: sta...@vger.kernel.org
Signed-off-by: Felix Fietkau <n...@openwrt.org>
---
 drivers/net/wireless/ath/ath10k/hw.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/hw.h 
b/drivers/net/wireless/ath/ath10k/hw.h
index 23afcda..678d72a 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -337,7 +337,7 @@ enum ath10k_hw_rate_cck {
 #define TARGET_10X_MAX_FRAG_ENTRIES0
 
 /* 10.2 parameters */
-#define TARGET_10_2_DMA_BURST_SIZE 1
+#define TARGET_10_2_DMA_BURST_SIZE 0
 
 /* Target specific defines for WMI-TLV firmware */
 #define TARGET_TLV_NUM_VDEVS   4
@@ -391,7 +391,7 @@ enum ath10k_hw_rate_cck {
 
 #define TARGET_10_4_TX_DBG_LOG_SIZE1024
 #define TARGET_10_4_NUM_WDS_ENTRIES32
-#define TARGET_10_4_DMA_BURST_SIZE 1
+#define TARGET_10_4_DMA_BURST_SIZE 0
 #define TARGET_10_4_MAC_AGGR_DELIM 0
 #define TARGET_10_4_RX_SKIP_DEFRAG_TIMEOUT_DUP_DETECTION_CHECK 1
 #define TARGET_10_4_VOW_CONFIG 0
-- 
2.2.2


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: Add QCA99X0 to supported device list

2015-08-06 Thread Felix Fietkau
On 2015-08-06 20:45, Peter Oh wrote:
 
 On 08/06/2015 12:59 AM, Felix Fietkau wrote:
 On 2015-08-06 09:43, Vasanthakumar Thiagarajan wrote:
 On Wednesday 05 August 2015 05:51 PM, Felix Fietkau wrote:
 On 2015-07-21 13:36, Vasanthakumar Thiagarajan wrote:
 Add vendor/device id of QCA99X0 V2.0 to pci id table and
 QCA99X0_HW_2_0_CHIP_ID_REV to ath10k_pci_supp_chips[] for
 QCA99X0 to get detected by the driver.

 Signed-off-by: Vasanthakumar Thiagarajan vthia...@qti.qualcomm.com
 I just tested a 99X0 card, and I'm getting this:

 [8.742514] ath10k_pci :01:00.0: enabling device (0140 - 0142)
 [8.747173] ath10k_pci :01:00.0: pci irq msi interrupts 1
 irq_mode 0 reset_mode 0
 [   11.793314] ath10k_pci :01:00.0: failed to receive initialized
 event from target: 
 [   11.793344] ath10k_pci :01:00.0: failed to wait for target after
 cold reset: -110
 [   11.800814] ath10k_pci :01:00.0: failed to reset chip: -110
 [   11.809228] ath10k_pci: probe of :01:00.0 failed with error -110

 This is on an AP148 board with Linux 3.18, backports generated from
 latest wireless-testing (2015-08-03) + all patches from kvalo's ath
 tree.
 Any idea about the version of AP148 that you are using?. By any chance
 did
 you check if QCA988X is working on the same board?.
 No idea about the version, but QCA988X is working in the same slot.
 QCA99X0 uses a different version of firmware from QCA988X.
 Could you verify which firmware version or file name is using?
ath10k/QCA99X0/hw2.0/firmware-5.bin_10.4.1.7-1 from git is
/lib/firmware/ath10k/QCA99X0/hw2.0/firmware-5.bin on the device.

However, this error shows up even before it gets to the point of loading
the firmware.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: Add QCA99X0 to supported device list

2015-08-06 Thread Felix Fietkau
On 2015-08-06 09:43, Vasanthakumar Thiagarajan wrote:
 On Wednesday 05 August 2015 05:51 PM, Felix Fietkau wrote:
 On 2015-07-21 13:36, Vasanthakumar Thiagarajan wrote:
 Add vendor/device id of QCA99X0 V2.0 to pci id table and
 QCA99X0_HW_2_0_CHIP_ID_REV to ath10k_pci_supp_chips[] for
 QCA99X0 to get detected by the driver.

 Signed-off-by: Vasanthakumar Thiagarajan vthia...@qti.qualcomm.com
 I just tested a 99X0 card, and I'm getting this:

 [8.742514] ath10k_pci :01:00.0: enabling device (0140 - 0142)
 [8.747173] ath10k_pci :01:00.0: pci irq msi interrupts 1 irq_mode 0 
 reset_mode 0
 [   11.793314] ath10k_pci :01:00.0: failed to receive initialized event 
 from target: 
 [   11.793344] ath10k_pci :01:00.0: failed to wait for target after cold 
 reset: -110
 [   11.800814] ath10k_pci :01:00.0: failed to reset chip: -110
 [   11.809228] ath10k_pci: probe of :01:00.0 failed with error -110

 This is on an AP148 board with Linux 3.18, backports generated from
 latest wireless-testing (2015-08-03) + all patches from kvalo's ath
 tree.
 
 Any idea about the version of AP148 that you are using?. By any chance did
 you check if QCA988X is working on the same board?.
No idea about the version, but QCA988X is working in the same slot.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/2] ath10k: re-config ht_caps when chainmask is modified.

2014-11-25 Thread Felix Fietkau
On 2014-11-24 11:53, Kalle Valo wrote:
 Kalle Valo kv...@qca.qualcomm.com writes:
 
 @@ -2537,6 +2560,17 @@ static int __ath10k_set_antenna(struct ath10k *ar, 
 u32 tx_ant, u32 rx_ant)
 ar-cfg_tx_chainmask = tx_ant;
 ar-cfg_rx_chainmask = rx_ant;
  
 +   ht_cap = ath10k_get_ht_cap(ar, true);
 +   vht_cap = ath10k_create_vht_cap(ar, true);
 +
 +   if (ar-phy_capability  WHAL_WLAN_11G_CAPABILITY)
 +   ar-mac.sbands[IEEE80211_BAND_2GHZ].ht_cap = ht_cap;
 +
 +   if (ar-phy_capability  WHAL_WLAN_11A_CAPABILITY) {
 +   ar-mac.sbands[IEEE80211_BAND_5GHZ].ht_cap = ht_cap;
 +   ar-mac.sbands[IEEE80211_BAND_5GHZ].vht_cap = vht_cap;
 +   }

 So this modifies stryct wiphy::bands after we have called
 ieee80211_register_hw(). Is this something which both cfg80211 and
 mac80211 support? I didn't find the documentation mentioning anything
 about this so I got a bit worried.
 
 Johannes mentioned me that this is not supported so I am reluctant to
 take these. Unless I'm missing something, of course.
FWIW, ath9k has been doing the same for a long time now. Antenna
settings can only be changed while the device is stopped, so it should
be safe.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: add SURVEY_INFO_IN_USE for current channel on survey

2014-10-23 Thread Felix Fietkau
On 2014-10-23 09:13, Kalle Valo wrote:
 Felix Fietkau n...@openwrt.org writes:
 
 Signed-off-by: Felix Fietkau n...@openwrt.org
 ---
  drivers/net/wireless/ath/ath10k/mac.c | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
 b/drivers/net/wireless/ath/ath10k/mac.c
 index 4670930..bc440dc 100644
 --- a/drivers/net/wireless/ath/ath10k/mac.c
 +++ b/drivers/net/wireless/ath/ath10k/mac.c
 @@ -3975,6 +3975,9 @@ static int ath10k_get_survey(struct ieee80211_hw *hw, 
 int idx,
  
  survey-channel = sband-channels[idx];
  
 +if (ar-rx_channel == survey-channel)
 +survey-filled |= SURVEY_INFO_IN_USE;
 +
 
 Why? Does this fix a visible bug? What changes from user space point of
 view? I'm asking because I want to add something to the commit log.
When user space requests survey info, it is useful to know which of the
survey data refers to the channel that is currently actively being used.
One of the use cases is getting the current channel noise for status output.
Without this flag you would have to look up the channel separately and
then compare it against the frequency in the survey output in user space.

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k