Re: [PATCH] wcn36xx: dequeue all pending indicator messages

2018-03-18 Thread Bjorn Andersson
On Thu 15 Mar 15:37 PDT 2018, Daniel Mack wrote:

> In case wcn36xx_smd_rsp_process() is called more than once before
> hal_ind_work was dispatched, the messages will end up in hal_ind_queue,
> but wcn36xx_ind_smd_work() will only look at the first message in that
> list.
> 
> Fix this by dequeing the messages from the list in a loop, and only stop
> when it's empty.
> 

Reviewed-by: Bjorn Andersson 

Thanks for fixing this, I thought I already had done that.

Regards,
Bjorn

> Signed-off-by: Daniel Mack 
> ---
>  drivers/net/wireless/ath/wcn36xx/smd.c | 95 
> +++---
>  1 file changed, 52 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
> b/drivers/net/wireless/ath/wcn36xx/smd.c
> index 7cc29285e052..a6b5352f59e9 100644
> --- a/drivers/net/wireless/ath/wcn36xx/smd.c
> +++ b/drivers/net/wireless/ath/wcn36xx/smd.c
> @@ -2409,54 +2409,63 @@ static void wcn36xx_ind_smd_work(struct work_struct 
> *work)
>  {
>   struct wcn36xx *wcn =
>   container_of(work, struct wcn36xx, hal_ind_work);
> - struct wcn36xx_hal_msg_header *msg_header;
> - struct wcn36xx_hal_ind_msg *hal_ind_msg;
> - unsigned long flags;
>  
> - spin_lock_irqsave(>hal_ind_lock, flags);
> + for (;;) {
> + struct wcn36xx_hal_msg_header *msg_header;
> + struct wcn36xx_hal_ind_msg *hal_ind_msg;
> + unsigned long flags;
>  
> - hal_ind_msg = list_first_entry(>hal_ind_queue,
> -struct wcn36xx_hal_ind_msg,
> -list);
> - list_del(wcn->hal_ind_queue.next);
> - spin_unlock_irqrestore(>hal_ind_lock, flags);
> + spin_lock_irqsave(>hal_ind_lock, flags);
>  
> - msg_header = (struct wcn36xx_hal_msg_header *)hal_ind_msg->msg;
> + if (list_empty(>hal_ind_queue)) {
> + spin_unlock_irqrestore(>hal_ind_lock, flags);
> + return;
> + }
>  
> - switch (msg_header->msg_type) {
> - case WCN36XX_HAL_COEX_IND:
> - case WCN36XX_HAL_DEL_BA_IND:
> - case WCN36XX_HAL_AVOID_FREQ_RANGE_IND:
> - break;
> - case WCN36XX_HAL_OTA_TX_COMPL_IND:
> - wcn36xx_smd_tx_compl_ind(wcn,
> -  hal_ind_msg->msg,
> -  hal_ind_msg->msg_len);
> - break;
> - case WCN36XX_HAL_MISSED_BEACON_IND:
> - wcn36xx_smd_missed_beacon_ind(wcn,
> -   hal_ind_msg->msg,
> -   hal_ind_msg->msg_len);
> - break;
> - case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
> - wcn36xx_smd_delete_sta_context_ind(wcn,
> -hal_ind_msg->msg,
> -hal_ind_msg->msg_len);
> - break;
> - case WCN36XX_HAL_PRINT_REG_INFO_IND:
> - wcn36xx_smd_print_reg_info_ind(wcn,
> -hal_ind_msg->msg,
> -hal_ind_msg->msg_len);
> - break;
> - case WCN36XX_HAL_SCAN_OFFLOAD_IND:
> - wcn36xx_smd_hw_scan_ind(wcn, hal_ind_msg->msg,
> - hal_ind_msg->msg_len);
> - break;
> - default:
> - wcn36xx_err("SMD_EVENT (%d) not supported\n",
> -   msg_header->msg_type);
> + hal_ind_msg = list_first_entry(>hal_ind_queue,
> +struct wcn36xx_hal_ind_msg,
> +list);
> + list_del(_ind_msg->list);
> + spin_unlock_irqrestore(>hal_ind_lock, flags);
> +
> + msg_header = (struct wcn36xx_hal_msg_header *)hal_ind_msg->msg;
> +
> + switch (msg_header->msg_type) {
> + case WCN36XX_HAL_COEX_IND:
> + case WCN36XX_HAL_DEL_BA_IND:
> + case WCN36XX_HAL_AVOID_FREQ_RANGE_IND:
> + break;
> + case WCN36XX_HAL_OTA_TX_COMPL_IND:
> + wcn36xx_smd_tx_compl_ind(wcn,
> +  hal_ind_msg->msg,
> +  hal_ind_msg->msg_len);
> + break;
> + case WCN36XX_HAL_MISSED_BEACON_IND:
> + wcn36xx_smd_missed_beacon_ind(wcn,
> +   hal_ind_msg->msg,
> +   hal_ind_msg->msg_len);
> + break;
> + case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
> + wcn36xx_smd_delete_sta_context_ind(wcn,
> +hal_ind_msg->msg,
> +  

Re: [PATCH] wcn36xx: dequeue all pending indicator messages

2018-03-16 Thread Daniel Mack
On Friday, March 16, 2018 11:22 AM, Kalle Valo wrote:
> Daniel Mack  writes:
> 
>> Hi,
>>
>> On Friday, March 16, 2018 10:09 AM, Ramon Fried wrote:
>>> On 3/16/2018 12:37 AM, Daniel Mack wrote:
 In case wcn36xx_smd_rsp_process() is called more than once before
 hal_ind_work was dispatched, the messages will end up in hal_ind_queue,
 but wcn36xx_ind_smd_work() will only look at the first message in that
 list.

 Fix this by dequeing the messages from the list in a loop, and only stop
 when it's empty.
>>> Interesting. does it solve a specific bug ? can you elaborate ?
>>
>> I'm poking around in the driver to hopefully find issues that cause
>> instability and failures in joining networks, which I am seeing a lot.
>> There are a number of bug reports regarding this, for instance
>>
>>   https://bugs.96boards.org/show_bug.cgi?id=538
>>   https://bugs.96boards.org/show_bug.cgi?id=319
>>
>> I'm following your patches and also started to look into the driver
>> myself, and during review, I noticed that list handling issue. I have a
>> big fat warning locally that would tell me if the list ever contains
>> more than one entry, but that never happens, as the indicator messages
>> are way too infrequent to trigger the race. So this isn't a real-world
>> issue as far as I can tell, but it is still quite obviously a bug. Hence
>> I considered posting a patch.
> 
> It's a good idea to mention in the commit log if the fix is for a
> theoretical issue and does not necessarily fix anything visible. Helps
> to understand the background, prioritise which release the fix should go
> etc.
> 

Well, it's a race which can (theoretically) happen anytime.
But sure, I will add another sentence to the commit log and resend.


Thanks,
Daniel


Re: [PATCH] wcn36xx: dequeue all pending indicator messages

2018-03-16 Thread Kalle Valo
Daniel Mack  writes:

> Hi,
>
> On Friday, March 16, 2018 10:09 AM, Ramon Fried wrote:
>> On 3/16/2018 12:37 AM, Daniel Mack wrote:
>>> In case wcn36xx_smd_rsp_process() is called more than once before
>>> hal_ind_work was dispatched, the messages will end up in hal_ind_queue,
>>> but wcn36xx_ind_smd_work() will only look at the first message in that
>>> list.
>>>
>>> Fix this by dequeing the messages from the list in a loop, and only stop
>>> when it's empty.
>> Interesting. does it solve a specific bug ? can you elaborate ?
>
> I'm poking around in the driver to hopefully find issues that cause
> instability and failures in joining networks, which I am seeing a lot.
> There are a number of bug reports regarding this, for instance
>
>   https://bugs.96boards.org/show_bug.cgi?id=538
>   https://bugs.96boards.org/show_bug.cgi?id=319
>
> I'm following your patches and also started to look into the driver
> myself, and during review, I noticed that list handling issue. I have a
> big fat warning locally that would tell me if the list ever contains
> more than one entry, but that never happens, as the indicator messages
> are way too infrequent to trigger the race. So this isn't a real-world
> issue as far as I can tell, but it is still quite obviously a bug. Hence
> I considered posting a patch.

It's a good idea to mention in the commit log if the fix is for a
theoretical issue and does not necessarily fix anything visible. Helps
to understand the background, prioritise which release the fix should go
etc.

-- 
Kalle Valo


Re: [PATCH] wcn36xx: dequeue all pending indicator messages

2018-03-16 Thread Daniel Mack
Hi,

On Friday, March 16, 2018 10:09 AM, Ramon Fried wrote:
> On 3/16/2018 12:37 AM, Daniel Mack wrote:
>> In case wcn36xx_smd_rsp_process() is called more than once before
>> hal_ind_work was dispatched, the messages will end up in hal_ind_queue,
>> but wcn36xx_ind_smd_work() will only look at the first message in that
>> list.
>>
>> Fix this by dequeing the messages from the list in a loop, and only stop
>> when it's empty.
> Interesting. does it solve a specific bug ? can you elaborate ?

I'm poking around in the driver to hopefully find issues that cause
instability and failures in joining networks, which I am seeing a lot.
There are a number of bug reports regarding this, for instance

  https://bugs.96boards.org/show_bug.cgi?id=538
  https://bugs.96boards.org/show_bug.cgi?id=319

I'm following your patches and also started to look into the driver
myself, and during review, I noticed that list handling issue. I have a
big fat warning locally that would tell me if the list ever contains
more than one entry, but that never happens, as the indicator messages
are way too infrequent to trigger the race. So this isn't a real-world
issue as far as I can tell, but it is still quite obviously a bug. Hence
I considered posting a patch.

I have another one that handles missed TX ACKs from the firmware (TODO
comment in wcn36xx_start_tx()), but that case doesn't happen either.

So I'll keep looking further, and will send things if I spot something.
In the meantime, as you seem to be familiar with the firmware internals,
I'd appreciate any pointer in how to narrow this down. It's extremely
important to us, as we want to release a product featuring the wcn36xx.


Thanks a lot for all your efforts!

Daniel


>>
>> Signed-off-by: Daniel Mack 
>> ---
>>   drivers/net/wireless/ath/wcn36xx/smd.c | 95 
>> +++---
>>   1 file changed, 52 insertions(+), 43 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
>> b/drivers/net/wireless/ath/wcn36xx/smd.c
>> index 7cc29285e052..a6b5352f59e9 100644
>> --- a/drivers/net/wireless/ath/wcn36xx/smd.c
>> +++ b/drivers/net/wireless/ath/wcn36xx/smd.c
>> @@ -2409,54 +2409,63 @@ static void wcn36xx_ind_smd_work(struct work_struct 
>> *work)
>>   {
>>  struct wcn36xx *wcn =
>>  container_of(work, struct wcn36xx, hal_ind_work);
>> -struct wcn36xx_hal_msg_header *msg_header;
>> -struct wcn36xx_hal_ind_msg *hal_ind_msg;
>> -unsigned long flags;
>>   
>> -spin_lock_irqsave(>hal_ind_lock, flags);
>> +for (;;) {
>> +struct wcn36xx_hal_msg_header *msg_header;
>> +struct wcn36xx_hal_ind_msg *hal_ind_msg;
>> +unsigned long flags;
>>   
>> -hal_ind_msg = list_first_entry(>hal_ind_queue,
>> -   struct wcn36xx_hal_ind_msg,
>> -   list);
>> -list_del(wcn->hal_ind_queue.next);
>> -spin_unlock_irqrestore(>hal_ind_lock, flags);
>> +spin_lock_irqsave(>hal_ind_lock, flags);
>>   
>> -msg_header = (struct wcn36xx_hal_msg_header *)hal_ind_msg->msg;
>> +if (list_empty(>hal_ind_queue)) {
>> +spin_unlock_irqrestore(>hal_ind_lock, flags);
>> +return;
>> +}
>>   
>> -switch (msg_header->msg_type) {
>> -case WCN36XX_HAL_COEX_IND:
>> -case WCN36XX_HAL_DEL_BA_IND:
>> -case WCN36XX_HAL_AVOID_FREQ_RANGE_IND:
>> -break;
>> -case WCN36XX_HAL_OTA_TX_COMPL_IND:
>> -wcn36xx_smd_tx_compl_ind(wcn,
>> - hal_ind_msg->msg,
>> - hal_ind_msg->msg_len);
>> -break;
>> -case WCN36XX_HAL_MISSED_BEACON_IND:
>> -wcn36xx_smd_missed_beacon_ind(wcn,
>> -  hal_ind_msg->msg,
>> -  hal_ind_msg->msg_len);
>> -break;
>> -case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
>> -wcn36xx_smd_delete_sta_context_ind(wcn,
>> -   hal_ind_msg->msg,
>> -   hal_ind_msg->msg_len);
>> -break;
>> -case WCN36XX_HAL_PRINT_REG_INFO_IND:
>> -wcn36xx_smd_print_reg_info_ind(wcn,
>> -   hal_ind_msg->msg,
>> -   hal_ind_msg->msg_len);
>> -break;
>> -case WCN36XX_HAL_SCAN_OFFLOAD_IND:
>> -wcn36xx_smd_hw_scan_ind(wcn, hal_ind_msg->msg,
>> -hal_ind_msg->msg_len);
>> -break;
>> -default:
>> -wcn36xx_err("SMD_EVENT (%d) not supported\n",
>> -  msg_header->msg_type);
>> +hal_ind_msg = list_first_entry(>hal_ind_queue,
>> +   struct wcn36xx_hal_ind_msg,
>> +

Re: [PATCH] wcn36xx: dequeue all pending indicator messages

2018-03-16 Thread Ramon Fried


On 3/16/2018 12:37 AM, Daniel Mack wrote:

In case wcn36xx_smd_rsp_process() is called more than once before
hal_ind_work was dispatched, the messages will end up in hal_ind_queue,
but wcn36xx_ind_smd_work() will only look at the first message in that
list.

Fix this by dequeing the messages from the list in a loop, and only stop
when it's empty.

Interesting. does it solve a specific bug ? can you elaborate ?


Signed-off-by: Daniel Mack 
---
  drivers/net/wireless/ath/wcn36xx/smd.c | 95 +++---
  1 file changed, 52 insertions(+), 43 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 7cc29285e052..a6b5352f59e9 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -2409,54 +2409,63 @@ static void wcn36xx_ind_smd_work(struct work_struct 
*work)
  {
struct wcn36xx *wcn =
container_of(work, struct wcn36xx, hal_ind_work);
-   struct wcn36xx_hal_msg_header *msg_header;
-   struct wcn36xx_hal_ind_msg *hal_ind_msg;
-   unsigned long flags;
  
-	spin_lock_irqsave(>hal_ind_lock, flags);

+   for (;;) {
+   struct wcn36xx_hal_msg_header *msg_header;
+   struct wcn36xx_hal_ind_msg *hal_ind_msg;
+   unsigned long flags;
  
-	hal_ind_msg = list_first_entry(>hal_ind_queue,

-  struct wcn36xx_hal_ind_msg,
-  list);
-   list_del(wcn->hal_ind_queue.next);
-   spin_unlock_irqrestore(>hal_ind_lock, flags);
+   spin_lock_irqsave(>hal_ind_lock, flags);
  
-	msg_header = (struct wcn36xx_hal_msg_header *)hal_ind_msg->msg;

+   if (list_empty(>hal_ind_queue)) {
+   spin_unlock_irqrestore(>hal_ind_lock, flags);
+   return;
+   }
  
-	switch (msg_header->msg_type) {

-   case WCN36XX_HAL_COEX_IND:
-   case WCN36XX_HAL_DEL_BA_IND:
-   case WCN36XX_HAL_AVOID_FREQ_RANGE_IND:
-   break;
-   case WCN36XX_HAL_OTA_TX_COMPL_IND:
-   wcn36xx_smd_tx_compl_ind(wcn,
-hal_ind_msg->msg,
-hal_ind_msg->msg_len);
-   break;
-   case WCN36XX_HAL_MISSED_BEACON_IND:
-   wcn36xx_smd_missed_beacon_ind(wcn,
- hal_ind_msg->msg,
- hal_ind_msg->msg_len);
-   break;
-   case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
-   wcn36xx_smd_delete_sta_context_ind(wcn,
-  hal_ind_msg->msg,
-  hal_ind_msg->msg_len);
-   break;
-   case WCN36XX_HAL_PRINT_REG_INFO_IND:
-   wcn36xx_smd_print_reg_info_ind(wcn,
-  hal_ind_msg->msg,
-  hal_ind_msg->msg_len);
-   break;
-   case WCN36XX_HAL_SCAN_OFFLOAD_IND:
-   wcn36xx_smd_hw_scan_ind(wcn, hal_ind_msg->msg,
-   hal_ind_msg->msg_len);
-   break;
-   default:
-   wcn36xx_err("SMD_EVENT (%d) not supported\n",
- msg_header->msg_type);
+   hal_ind_msg = list_first_entry(>hal_ind_queue,
+  struct wcn36xx_hal_ind_msg,
+  list);
+   list_del(_ind_msg->list);
+   spin_unlock_irqrestore(>hal_ind_lock, flags);
+
+   msg_header = (struct wcn36xx_hal_msg_header *)hal_ind_msg->msg;
+
+   switch (msg_header->msg_type) {
+   case WCN36XX_HAL_COEX_IND:
+   case WCN36XX_HAL_DEL_BA_IND:
+   case WCN36XX_HAL_AVOID_FREQ_RANGE_IND:
+   break;
+   case WCN36XX_HAL_OTA_TX_COMPL_IND:
+   wcn36xx_smd_tx_compl_ind(wcn,
+hal_ind_msg->msg,
+hal_ind_msg->msg_len);
+   break;
+   case WCN36XX_HAL_MISSED_BEACON_IND:
+   wcn36xx_smd_missed_beacon_ind(wcn,
+ hal_ind_msg->msg,
+ hal_ind_msg->msg_len);
+   break;
+   case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
+   wcn36xx_smd_delete_sta_context_ind(wcn,
+  hal_ind_msg->msg,
+  
hal_ind_msg->msg_len);
+   break;
+   case WCN36XX_HAL_PRINT_REG_INFO_IND:
+   

[PATCH] wcn36xx: dequeue all pending indicator messages

2018-03-15 Thread Daniel Mack
In case wcn36xx_smd_rsp_process() is called more than once before
hal_ind_work was dispatched, the messages will end up in hal_ind_queue,
but wcn36xx_ind_smd_work() will only look at the first message in that
list.

Fix this by dequeing the messages from the list in a loop, and only stop
when it's empty.

Signed-off-by: Daniel Mack 
---
 drivers/net/wireless/ath/wcn36xx/smd.c | 95 +++---
 1 file changed, 52 insertions(+), 43 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 7cc29285e052..a6b5352f59e9 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -2409,54 +2409,63 @@ static void wcn36xx_ind_smd_work(struct work_struct 
*work)
 {
struct wcn36xx *wcn =
container_of(work, struct wcn36xx, hal_ind_work);
-   struct wcn36xx_hal_msg_header *msg_header;
-   struct wcn36xx_hal_ind_msg *hal_ind_msg;
-   unsigned long flags;
 
-   spin_lock_irqsave(>hal_ind_lock, flags);
+   for (;;) {
+   struct wcn36xx_hal_msg_header *msg_header;
+   struct wcn36xx_hal_ind_msg *hal_ind_msg;
+   unsigned long flags;
 
-   hal_ind_msg = list_first_entry(>hal_ind_queue,
-  struct wcn36xx_hal_ind_msg,
-  list);
-   list_del(wcn->hal_ind_queue.next);
-   spin_unlock_irqrestore(>hal_ind_lock, flags);
+   spin_lock_irqsave(>hal_ind_lock, flags);
 
-   msg_header = (struct wcn36xx_hal_msg_header *)hal_ind_msg->msg;
+   if (list_empty(>hal_ind_queue)) {
+   spin_unlock_irqrestore(>hal_ind_lock, flags);
+   return;
+   }
 
-   switch (msg_header->msg_type) {
-   case WCN36XX_HAL_COEX_IND:
-   case WCN36XX_HAL_DEL_BA_IND:
-   case WCN36XX_HAL_AVOID_FREQ_RANGE_IND:
-   break;
-   case WCN36XX_HAL_OTA_TX_COMPL_IND:
-   wcn36xx_smd_tx_compl_ind(wcn,
-hal_ind_msg->msg,
-hal_ind_msg->msg_len);
-   break;
-   case WCN36XX_HAL_MISSED_BEACON_IND:
-   wcn36xx_smd_missed_beacon_ind(wcn,
- hal_ind_msg->msg,
- hal_ind_msg->msg_len);
-   break;
-   case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
-   wcn36xx_smd_delete_sta_context_ind(wcn,
-  hal_ind_msg->msg,
-  hal_ind_msg->msg_len);
-   break;
-   case WCN36XX_HAL_PRINT_REG_INFO_IND:
-   wcn36xx_smd_print_reg_info_ind(wcn,
-  hal_ind_msg->msg,
-  hal_ind_msg->msg_len);
-   break;
-   case WCN36XX_HAL_SCAN_OFFLOAD_IND:
-   wcn36xx_smd_hw_scan_ind(wcn, hal_ind_msg->msg,
-   hal_ind_msg->msg_len);
-   break;
-   default:
-   wcn36xx_err("SMD_EVENT (%d) not supported\n",
- msg_header->msg_type);
+   hal_ind_msg = list_first_entry(>hal_ind_queue,
+  struct wcn36xx_hal_ind_msg,
+  list);
+   list_del(_ind_msg->list);
+   spin_unlock_irqrestore(>hal_ind_lock, flags);
+
+   msg_header = (struct wcn36xx_hal_msg_header *)hal_ind_msg->msg;
+
+   switch (msg_header->msg_type) {
+   case WCN36XX_HAL_COEX_IND:
+   case WCN36XX_HAL_DEL_BA_IND:
+   case WCN36XX_HAL_AVOID_FREQ_RANGE_IND:
+   break;
+   case WCN36XX_HAL_OTA_TX_COMPL_IND:
+   wcn36xx_smd_tx_compl_ind(wcn,
+hal_ind_msg->msg,
+hal_ind_msg->msg_len);
+   break;
+   case WCN36XX_HAL_MISSED_BEACON_IND:
+   wcn36xx_smd_missed_beacon_ind(wcn,
+ hal_ind_msg->msg,
+ hal_ind_msg->msg_len);
+   break;
+   case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
+   wcn36xx_smd_delete_sta_context_ind(wcn,
+  hal_ind_msg->msg,
+  
hal_ind_msg->msg_len);
+   break;
+   case WCN36XX_HAL_PRINT_REG_INFO_IND:
+   wcn36xx_smd_print_reg_info_ind(wcn,
+