Re: [PATCH] wireless/marvell/mwifiex: Fix a double free in mwifiex_send_tdls_action_frame

2021-04-13 Thread Brian Norris
Hi,

On Wed, Apr 14, 2021 at 12:08:32AM +0800, lyl2...@mail.ustc.edu.cn wrote:
> 
> Hi,
>   maintianers.
> 
>   Sorry to disturb you, but this patch seems to be missed more than two weeks.
>   Could you help to review this patch? I am sure it won't take you much time.

You might take a look here:
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#checking_state_of_patches_from_patchwork
https://patchwork.kernel.org/project/linux-wireless/list/

Two weeks is not unheard of for waiting.

Anyway, you *did* succeed in catching my attention today, so I'll give
you a review, below:

> > -原始邮件-
> > 发件人: "Lv Yunlong" 
> > 发送时间: 2021-03-29 19:24:35 (星期一)
> > 收件人: amitkar...@gmail.com, ganapathi.b...@nxp.com, huxinming...@gmail.com, 
> > kv...@codeaurora.org, da...@davemloft.net, k...@kernel.org
> > 抄送: linux-wirel...@vger.kernel.org, net...@vger.kernel.org, 
> > linux-kernel@vger.kernel.org, "Lv Yunlong" 
> > 主题: [PATCH] wireless/marvell/mwifiex: Fix a double free in 
> > mwifiex_send_tdls_action_frame
> > 
> > In mwifiex_send_tdls_action_frame, it calls 
> > mwifiex_construct_tdls_action_frame
> > (..,skb). The skb will be freed in mwifiex_construct_tdls_action_frame() 
> > when
> > it is failed. But when mwifiex_construct_tdls_action_frame() returns error,
> > the skb will be freed in the second time by dev_kfree_skb_any(skb).
> > 
> > My patch removes the redundant dev_kfree_skb_any(skb) when
> > mwifiex_construct_tdls_action_frame() failed.
> > 
> > Fixes: b23bce2965680 ("mwifiex: add tdls_mgmt handler support")
> > Signed-off-by: Lv Yunlong 
> > ---
> >  drivers/net/wireless/marvell/mwifiex/tdls.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/drivers/net/wireless/marvell/mwifiex/tdls.c 
> > b/drivers/net/wireless/marvell/mwifiex/tdls.c
> > index 97bb87c3676b..8d4d0a9cf6ac 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/tdls.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/tdls.c
> > @@ -856,7 +856,6 @@ int mwifiex_send_tdls_action_frame(struct 
> > mwifiex_private *priv, const u8 *peer,
> > if (mwifiex_construct_tdls_action_frame(priv, peer, action_code,
> > dialog_token, status_code,
> > skb)) {
> > -   dev_kfree_skb_any(skb);

Good catch, and this looks correct for most cases, but I'll note that
you missed one issue: mwifiex_construct_tdls_action_frame() has a
default/error case ("Unknown TDLS action frame type") which does *not*
free this SKB, so you should patch that up at the same time. Otherwise,
you're trading a double-free for a potential memory leak.

With that fixed:

Reviewed-by: Brian Norris 

Thanks.

> > return -EINVAL;
> > }
> >  
> > -- 
> > 2.25.1
> > 


Re: [RFC 2/7] ath10k: Add support to process rx packet in thread

2021-03-22 Thread Brian Norris
On Mon, Mar 22, 2021 at 4:58 PM Ben Greear  wrote:
> On 7/22/20 6:00 AM, Felix Fietkau wrote:
> > On 2020-07-22 14:55, Johannes Berg wrote:
> >> On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:
> >>
> >>> I'm considering testing a different approach (with mt76 initially):
> >>> - Add a mac80211 rx function that puts processed skbs into a list
> >>> instead of handing them to the network stack directly.
> >>
> >> Would this be *after* all the mac80211 processing, i.e. in place of the
> >> rx-up-to-stack?
> > Yes, it would run all the rx handlers normally and then put the
> > resulting skbs into a list instead of calling netif_receive_skb or
> > napi_gro_frags.
>
> Whatever came of this?  I realized I'm running Felix's patch since his mt76
> driver needs it.  Any chance it will go upstream?

If you're asking about $subject (moving NAPI/RX to a thread), this
landed upstream recently:
http://git.kernel.org/linus/adbb4fb028452b1b0488a1a7b66ab856cdf20715

It needs a bit of coaxing to work on a WiFi driver (including: WiFi
drivers tend to have a different netdev for NAPI than they expose to
/sys/class/net/), but it's there.

I'm not sure if people had something else in mind in the stuff you're
quoting though.

Brian


[PATCH] mwifiex: don't print SSID to logs

2021-02-24 Thread Brian Norris
There are a few reasons not to dump SSIDs as-is in kernel logs:

1) they're not guaranteed to be any particular text encoding (UTF-8,
   ASCII, ...) in general
2) it's somewhat redundant; the BSSID should be enough to uniquely
   identify the AP/STA to which we're connecting
3) BSSIDs have an easily-recognized format, whereas SSIDs do not (they
   are free-form)
4) other common drivers (e.g., everything based on mac80211) get along
   just fine by only including BSSIDs when logging state transitions

Additional notes on reason #3: this is important for the
privacy-conscious, especially when providing tools that convey
kernel logs on behalf of a user -- e.g., when reporting bugs. So for
example, it's easy to automatically filter logs for MAC addresses, but
it's much harder to filter SSIDs out of unstructured text.

Signed-off-by: Brian Norris 
---
 drivers/net/wireless/marvell/mwifiex/cfg80211.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c 
b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index a2ed268ce0da..0961f4a5e415 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -2300,8 +2300,7 @@ mwifiex_cfg80211_assoc(struct mwifiex_private *priv, 
size_t ssid_len,
is_scanning_required = 1;
} else {
mwifiex_dbg(priv->adapter, MSG,
-   "info: trying to associate to '%.*s' bssid 
%pM\n",
-   req_ssid.ssid_len, (char *)req_ssid.ssid,
+   "info: trying to associate to bssid %pM\n",
bss->bssid);
memcpy(>cfg_bssid, bss->bssid, ETH_ALEN);
break;
@@ -2378,8 +2377,7 @@ mwifiex_cfg80211_connect(struct wiphy *wiphy, struct 
net_device *dev,
}
 
mwifiex_dbg(adapter, INFO,
-   "info: Trying to associate to %.*s and bssid %pM\n",
-   (int)sme->ssid_len, (char *)sme->ssid, sme->bssid);
+   "info: Trying to associate to bssid %pM\n", sme->bssid);
 
if (!mwifiex_stop_bg_scan(priv))
cfg80211_sched_scan_stopped_locked(priv->wdev.wiphy, 0);
@@ -2512,9 +2510,8 @@ mwifiex_cfg80211_join_ibss(struct wiphy *wiphy, struct 
net_device *dev,
goto done;
}
 
-   mwifiex_dbg(priv->adapter, MSG,
-   "info: trying to join to %.*s and bssid %pM\n",
-   params->ssid_len, (char *)params->ssid, params->bssid);
+   mwifiex_dbg(priv->adapter, MSG, "info: trying to join to bssid %pM\n",
+   params->bssid);
 
mwifiex_set_ibss_params(priv, params);
 
-- 
2.30.0.617.g56c4b15f3c-goog



Re: [PATCH v2] nand: brcmnand: fix OOB R/W with Hamming ECC

2021-02-24 Thread Brian Norris
On Wed, Feb 24, 2021 at 12:02 AM Álvaro Fernández Rojas
 wrote:
> Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB 
> NAND controller")

FWIW, I could believe this was broken. We weren't testing Hamming ECC
(nor JFFS2) at the time, so it could easily have obvious bugs like
this.

And since I got this far, and I'm still in MAINTAINERS, I guess:

Acked-by: Brian Norris 


Re: [PATCH] mwifiex: Report connected BSS with cfg80211_connect_bss()

2021-02-01 Thread Brian Norris
On Sun, Jan 31, 2021 at 11:07 PM Yen-lin Lai  wrote:
> When a network is moved or reconfigured on the different channel, there
> can be multiple BSSes with the same BSSID and SSID in scan result
> before the old one expires. Then, it can cause cfg80211_connect_result
> to map current_bss to a bss with the wrong channel.
>
> Let mwifiex_cfg80211_assoc return the selected BSS and then the caller
> can report it cfg80211_connect_bss.
>
> Signed-off-by: Yen-lin Lai 

This seems sane to me:

Reviewed-by: Brian Norris 


Re: [PATCH] marvell/mwifiex: replace one-element array with flexible-array member.

2021-01-19 Thread Brian Norris
One more thing, for context:

On Tue, Jan 19, 2021 at 11:11 AM Brian Norris  wrote:
> On Fri, Jan 15, 2021 at 1:39 AM Jiapeng Zhong
>  wrote:
> >
> > Fix the follow coccicheck warnings:
> >
> > ./drivers/net/wireless/marvell/mwifiex/fw.h: WARNING use flexible-array
> > member instead(https://www.kernel.org/doc/html/latest/process/
> > deprecated.html#zero-length-and-one-element-arrays)
> >
> > Reported-by: Abaci Robot 
> > Signed-off-by: Jiapeng Zhong 
>
> Past experience unfortunately requires me to ask: did you test your
> changes? I understand that's a mostly legit warning, and a good
> deprecation notice, but that doesn't mean this is the right fix. One
> probably should instead audit the usage sites to see if they *are*
> already making proper sizeof (or other) comparisons, and if not, fix
> those first. And if any sites *are* doing correct sizeof computations
> using the existing struct layouts, then you are probably breaking
> them.
>
> Or if you have audited the usage of these structs, it's nice to make a
> small explanation of why this is correct, so I (and other readers)
> don't have to ask these questions :)

FYI, there are others who I believe are making similar blind changes
to this code:

[PATCH 1/1] mwifiex: Fix possible buffer overflows in mwifiex_config_scan
https://lore.kernel.org/linux-wireless/CA+ASDXPkLg2GGFJTt25YO7wae==yahftf8jxu520pl_vzat...@mail.gmail.com/

For that particular case (the 'ssid' field in
mwifiex_ie_types_wildcard_ssid_params), the previous patch-er was
incorrect, and I believe your change is a better one. But neither of
you provided useful analysis.

Brian


Re: [PATCH] marvell/mwifiex: replace one-element array with flexible-array member.

2021-01-19 Thread Brian Norris
Hi,

On Fri, Jan 15, 2021 at 1:39 AM Jiapeng Zhong
 wrote:
>
> Fix the follow coccicheck warnings:
>
> ./drivers/net/wireless/marvell/mwifiex/fw.h: WARNING use flexible-array
> member instead(https://www.kernel.org/doc/html/latest/process/
> deprecated.html#zero-length-and-one-element-arrays)
>
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Zhong 

Past experience unfortunately requires me to ask: did you test your
changes? I understand that's a mostly legit warning, and a good
deprecation notice, but that doesn't mean this is the right fix. One
probably should instead audit the usage sites to see if they *are*
already making proper sizeof (or other) comparisons, and if not, fix
those first. And if any sites *are* doing correct sizeof computations
using the existing struct layouts, then you are probably breaking
them.

Or if you have audited the usage of these structs, it's nice to make a
small explanation of why this is correct, so I (and other readers)
don't have to ask these questions :)

Regards,
Brian


Re: [PATCH 1/1] mwifiex: Fix possible buffer overflows in mwifiex_config_scan

2021-01-11 Thread Brian Norris
(Note: this is version 1; there's a later version posted, which does
not have a v2 tag...)

https://lore.kernel.org/linux-wireless/20201208150951.35866-1-ruc_zhangxiao...@163.com/

On Sat, Jan 9, 2021 at 7:11 AM Peter Seiderer  wrote:
> On Tue,  8 Dec 2020 20:45:23 +0800, Xiaohui Zhang  
> wrote:
> > From: Zhang Xiaohui 
> > mwifiex_config_scan() calls memcpy() without checking
> > the destination size may trigger a buffer overflower,
> > which a local user could use to cause denial of service
> > or the execution of arbitrary code.
> > Fix it by putting the length check before calling memcpy().
> >
> > Signed-off-by: Zhang Xiaohui 
> > ---
> >  drivers/net/wireless/marvell/mwifiex/scan.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c 
> > b/drivers/net/wireless/marvell/mwifiex/scan.c
> > index c2a685f63..b1d90678f 100644
> > --- a/drivers/net/wireless/marvell/mwifiex/scan.c
> > +++ b/drivers/net/wireless/marvell/mwifiex/scan.c
> > @@ -930,6 +930,8 @@ mwifiex_config_scan(struct mwifiex_private *priv,
> >   "DIRECT-", 7))
> >   wildcard_ssid_tlv->max_ssid_length = 0xfe;
> >
> > + if (ssid_len > 1)
> > + ssid_len = 1;
>
> Why do your believe the available size is only '1'? A SSID is expected
> to be of size IEEE80211_MAX_SSID_LE/32 and the wildcard_ssid_tlv pointer
> is casted from tlv_pos (some lines above) which is a pointer/index into
> scan_cfg_out->tlv_buf...

I pointed out this discrepancy already, taking a slightly different approach:

https://lore.kernel.org/linux-wireless/ca+asdxpvu5s0vm0aocyqln090u3bwa_nv358ywkpxuu223u...@mail.gmail.com/

> And the define (line 44) indicates there should be enough space for a SSID:
>
>   42 /* Memory needed to store a max number/size WildCard SSID TLV for a 
> firmware
>   43 scan */
>   44 #define WILDCARD_SSID_TLV_MAX_SIZE  \
>   45 (MWIFIEX_MAX_SSID_LIST_LENGTH * \
>   46 (sizeof(struct mwifiex_ie_types_wildcard_ssid_params)   \
>   47 + IEEE80211_MAX_SSID_LEN))

Ah, good catch. So this may not be a true overflow issue at all, even
if it's confusing and bad code. The "problem" is that this piece of
the struct is variable-length, and unless we're going to dynamically
resize/reallocate the whole buffer, we have to assume the initial
allocation was large enough (and per your note, it is).

> For sure something to improve here instead of using a confusing 'u8 ssid[1]'
> in struct mwifiex_ie_types_wildcard_ssid_params...

Yep.

Brian


Re: [PATCH 0/3] mac80211: Trigger disconnect for STA during recovery

2020-12-17 Thread Brian Norris
On Thu, Dec 17, 2020 at 2:57 PM Ben Greear  wrote:
> On 12/17/20 2:24 PM, Brian Norris wrote:
> > I'd also note that we don't operate in AP mode -- only STA -- and IIRC
> > Ben, you've complained about AP mode in the past.
>
> I complain about all sorts of things, but I'm usually running
> station mode :)

Hehe, fair :) Maybe I'm mixed up.

But I do get the feeling that specifically within the ath10k family,
there are wildly different use cases (mobile, PC, AP) and chips (and
firmware) that tend to go along with them, and that those use cases
get a fairly different population of {developers, testers, reporters}.
So claiming "feature X works" pretty much always has to be couched in
which chips, firmware, and use case. And there's certainly some wisdom
in these sections:

https://wireless.wiki.kernel.org/en/users/drivers/ath10k/submittingpatches#hardware_families
https://wireless.wiki.kernel.org/en/users/drivers/ath10k/submittingpatches#tested-on_tag

> Do you actually see iwlwifi stations stay associated through
> firmware crashes?

Yes.

> Anyway, happy to hear some have seamless recovery, and in that case,
> I have no objections to the patch.

OK! I hope I'm not the only one with such results, because then I
still might question my sanity (and test coverage), but that's still
my understanding.

BTW, I haven't yet closely reviewed the patch series myself, but I ACK
the concept.

Brian


Re: [PATCH 0/3] mac80211: Trigger disconnect for STA during recovery

2020-12-17 Thread Brian Norris
On Tue, Dec 15, 2020 at 10:51:13PM +0530, Youghandhar Chintala wrote:
> From: Rakesh Pillai 

I meant to mention in my other reply: the threading on this series is
broken (as in, it doesn't exist). It looks like you're using
git-send-email (good!), but somehow it doesn't have any In-Reply-To or
References (bad!). Did you send all your mail in one invocation, or did
you send them as separate git-send-email commands? Anyway, please
investigate what when wrong so you can get this right in the future.

For one, this affects Patchwork's ability to group patch series (not to
mention everybody who uses a decent mail reader, with proper threading).
See for example the lore archive, which only is threading replies to
this cover letter:

https://lore.kernel.org/linux-wireless/20201215172113.5038-1-yough...@codeaurora.org/

Regards,
Brian


Re: [PATCH 0/3] mac80211: Trigger disconnect for STA during recovery

2020-12-17 Thread Brian Norris
On Tue, Dec 15, 2020 at 10:23:33AM -0800, Ben Greear wrote:
> On 12/15/20 9:21 AM, Youghandhar Chintala wrote:
> > From: Rakesh Pillai 
> > 
> > Currently in case of target hardware restart ,we just reconfig and
> > re-enable the security keys and enable the network queues to start
> > data traffic back from where it was interrupted.
> 
> Are there any known mac80211 radios/drivers that *can* support seamless 
> restarts?
> 
> If not, then just could always enable this feature in mac80211?

I'm quite sure that iwlwifi intentionally supports a seamless restart.
>From my experience with dealing with user reports, I don't recall any
issues where restart didn't function as expected, unless there was some
deeper underlying failure (e.g., hardware/power failure; driver bugs /
lockups).

I don't have very good stats for ath10k/QCA6174, but it survives
our testing OK and I again don't recall any user-reported complaints in
this area. I'd say this is a weaker example though, as I don't have as
clear of data. (By contrast, ath10k/WCN399x, which Rakesh, et al, are
patching here, does not pass our tests at all, and clearly fails to
recover from "seamless" restarts, as noted in patch 3.)

I'd also note that we don't operate in AP mode -- only STA -- and IIRC
Ben, you've complained about AP mode in the past.

Brian


Re: [PATCH] ath10k: Fix error handling in case of CE pipe init failure

2020-12-11 Thread Brian Norris
On Fri, Dec 11, 2020 at 11:00 AM Rakesh Pillai  wrote:
>
> Currently if the copy engine pipe init fails for snoc based
> chipsets, the rri is not freed.
>
> Fix this error handling for copy engine pipe init
> failure.
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1
>
> Fixes: 4945af5b264f ("ath10k: enable SRRI/DRRI support on ddr for WCN3990")
> Signed-off-by: Rakesh Pillai 

Reviewed-by: Brian Norris 


Re: [PATCH] ath10k: Remove voltage regulator votes during wifi disable

2020-12-10 Thread Brian Norris
On Thu, Dec 10, 2020 at 7:09 AM Rakesh Pillai  wrote:
> --- a/drivers/net/wireless/ath/ath10k/snoc.c
> +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> @@ -1045,14 +1085,18 @@ static int ath10k_snoc_hif_power_up(struct ath10k *ar,
> ret = ath10k_snoc_init_pipes(ar);
> if (ret) {
> ath10k_err(ar, "failed to initialize CE: %d\n", ret);
> -   goto err_wlan_enable;
> +   goto err_free_rri;
> }
>
> return 0;
>
> -err_wlan_enable:
> +err_free_rri:
> +   ath10k_ce_free_rri(ar);

This change in the error path seems to be an unrelated (but correct)
fix. It deserves its own patch, I think.

Brian

> ath10k_snoc_wlan_disable(ar);
>
> +err_hw_power_off:
> +   ath10k_hw_power_off(ar);
> +
> return ret;
>  }
>


Re: [PATCH 1/2] platform/chrome: cros_ec_proto: Use EC_HOST_EVENT_MASK not BIT

2020-12-09 Thread Brian Norris
On Wed, Dec 09, 2020 at 10:03:54PM +, Evan Benn wrote:
> The host_event_code enum is 1-based, use EC_HOST_EVENT_MASK not BIT to
> generate the intended mask. This patch changes the behaviour of the
> mask, a following patch will restore the intended behaviour:
> 'Add LID and BATTERY to default mask'
> 
> Fixes: c214e564acb2ad9463293ab9c109bfdae91fbeaf

Normally the short hash + commit title are used here, so:

Fixes: c214e564acb2 ("platform/chrome: cros_ec_proto: ignore unnecessary 
wakeups on old ECs")

https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-changes

With luck, maintainers can fix that up when applying, so you don't need
to resend.

Otherwise, both patches look good to me, thanks!

Reviewed-by: Brian Norris 


Re: [PATCH 1/1] mwifiex: Fix possible buffer overflows in mwifiex_uap_bss_param_prepare

2020-12-08 Thread Brian Norris
(FWIW, this author's mail has been routed to my spam mailbox. That's
partly my fault and/or my "choice" of mail provider, but that's why I
only see these once Kalle replies to them.)

On Tue, Dec 8, 2020 at 8:03 AM Xiaohui Zhang  wrote:
>
> From: Zhang Xiaohui 
>
> mwifiex_uap_bss_param_prepare() calls memcpy() without checking
> the destination size may trigger a buffer overflower,
> which a local user could use to cause denial of service or the
> execution of arbitrary code.
> Fix it by putting the length check before calling memcpy().
>
> Signed-off-by: Zhang Xiaohui 
> ---
>  drivers/net/wireless/marvell/mwifiex/uap_cmd.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/uap_cmd.c 
> b/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
> index b48a85d79..937c75e89 100644
> --- a/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
> +++ b/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
> @@ -502,7 +502,8 @@ mwifiex_uap_bss_param_prepare(u8 *tlv, void *cmd_buf, u16 
> *param_size)
> ssid = (struct host_cmd_tlv_ssid *)tlv;
> ssid->header.type = cpu_to_le16(TLV_TYPE_UAP_SSID);
> ssid->header.len = cpu_to_le16((u16)bss_cfg->ssid.ssid_len);
> -   memcpy(ssid->ssid, bss_cfg->ssid.ssid, 
> bss_cfg->ssid.ssid_len);
> +   memcpy(ssid->ssid, bss_cfg->ssid.ssid,
> +  min_t(u32, bss_cfg->ssid.ssid_len, 
> strlen(ssid->ssid)));

This strlen() check makes no sense to me. We are *writing* to
ssid->ssid, so its initial contents are either zero or garbage --
strlen() will either give a zero or unpredictable value. I'm pretty
sure that's not what you intend.

On the other hand, it's hard to determine what the proper bound here
*should* be. This 'ssid' struct is really just a pointer into
mwifiex_cmd_uap_sys_config()'s uap_sys_config (struct
host_cmd_ds_sys_config), which doesn't have any defined length -- its
length is only given by way of its surrounding buffers/structs.
Altogether, the code is hard to reason about.

Anyway, this patch is wrong, so NAK.

Brian

> cmd_size += sizeof(struct mwifiex_ie_types_header) +
> bss_cfg->ssid.ssid_len;
> tlv += sizeof(struct mwifiex_ie_types_header) +
> --
> 2.17.1
>


Re: [PATCH 1/1] mwifiex: Fix possible buffer overflows in mwifiex_config_scan

2020-12-08 Thread Brian Norris
On Tue, Dec 8, 2020 at 7:14 AM Xiaohui Zhang  wrote:
>
> From: Zhang Xiaohui 
>
> mwifiex_config_scan() calls memcpy() without checking
> the destination size may trigger a buffer overflower,
> which a local user could use to cause denial of service
> or the execution of arbitrary code.
> Fix it by putting the length check before calling memcpy().

^^ That's not really what you're doing any more, for the record. But
then, describing "what" is not really the point of a commit message
(that's what the code is for), so maybe that's not that important.

> Signed-off-by: Zhang Xiaohui 
> ---
>  drivers/net/wireless/marvell/mwifiex/scan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c 
> b/drivers/net/wireless/marvell/mwifiex/scan.c
> index c2a685f63..34293fd80 100644
> --- a/drivers/net/wireless/marvell/mwifiex/scan.c
> +++ b/drivers/net/wireless/marvell/mwifiex/scan.c
> @@ -931,7 +931,7 @@ mwifiex_config_scan(struct mwifiex_private *priv,
> wildcard_ssid_tlv->max_ssid_length = 0xfe;
>
> memcpy(wildcard_ssid_tlv->ssid,
> -  user_scan_in->ssid_list[i].ssid, ssid_len);
> +  user_scan_in->ssid_list[i].ssid, min_t(u32, 
> ssid_len, 1));

This *looks* like it should be wrong, because SSIDs are clearly longer
than 1 byte in many cases, but you *are* right that this is what the
struct is defined as:

struct mwifiex_ie_types_wildcard_ssid_params {
...
u8 ssid[1];
};

This feels like something that could use some confirmation from
NXP/ex-Marvell folks if possible, but if not that, at least some
creative testing. Did you actually test this patch, to make sure
non-wildcard scans still work?

Also, even if this is correct, it seems like it would be more correct
to use 'sizeof(wildcard_ssid_tlv->ssid)' instead of a magic number 1.

Brian

>
> tlv_pos += (sizeof(wildcard_ssid_tlv->header)
> + le16_to_cpu(wildcard_ssid_tlv->header.len));
> --
> 2.17.1
>


Re: [PATCH v2] ath10k: skip the wait for completion to recovery in shutdown path

2020-12-01 Thread Brian Norris
On Thu, Nov 26, 2020 at 9:16 AM Youghandhar Chintala
 wrote:
> --- a/drivers/net/wireless/ath/ath10k/snoc.c
> +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> @@ -1790,9 +1790,6 @@ static int ath10k_snoc_remove(struct platform_device 
> *pdev)
>
> reinit_completion(>driver_recovery);
>
> -   if (test_bit(ATH10K_SNOC_FLAG_RECOVERY, _snoc->flags))
> -   wait_for_completion_timeout(>driver_recovery, 3 * HZ);

Hmm, this is the only instance of waiting for this completion, which
means that after this patch, 'ar->driver_recovery' is doing exactly
nothing. Should you instead just remove it completely?

Also, if your patch is correct, it seems like the completion was never
needed in the first place. You should probably address such a claim in
the commit message; is there truly no need to wait here? Or was there
some purpose here, but that purpose was accomplished some other way?
Or was there a purpose, and that purpose was misguided? It feels to me
like it is indeed correct to remove this (shutdown should be performed
promptly; we don't need to delay it just to try to "finish
recovering"), but it's your job to convince the reader.

Brian

> -
> set_bit(ATH10K_SNOC_FLAG_UNREGISTERING, _snoc->flags);
>
> ath10k_core_unregister(ar);


Re: [PATCH 1/3] mwifiex: disable ps_mode explicitly by default instead

2020-11-20 Thread Brian Norris
On Fri, Oct 30, 2020 at 1:04 AM Tsuchiya Yuto  wrote:
> On Thu, 2020-10-29 at 11:25 -0700, Brian Norris wrote:
> > For the record, Chrome OS supports plenty of mwifiex systems with 8897
> > (SDIO only) and 8997 (PCIe), with PS enabled, and you're hurting
> > those. Your problem sounds to be exclusively a problem with the PCIe
> > 8897 firmware.
>
> Actually, I already know that some Chromebooks use these mwifiex cards
> (but not out PCIe-88W8897) because I personally like chromiumos. I'm
> always wondering what is the difference. If the difference is firmware,
> our PCIe-88W8897 firmware should really be fixed instead of this stupid
> series.

PCIe is a very different beast. (For one, it uses DMA and
memory-mapped registers, where SDIO has neither.) It was a very
difficult slog to get PCIe/8997 working reliably for the few
Chromebooks that shipped it, and lots of that work is in firmware. I
would not be surprised if the PCIe-related changes Marvell made for
8997 never fed back into their PCIe-8897 firmware. Or maybe they only
ever launched PCIe-8897 for Windows, and the Windows driver included
workarounds that were never published to their Linux driver. But now
I'm just speculating.

> Yes, I'm sorry that I know this series is just a stupid one but I have to
> send this anyway because this stability issue has not been fixed for a
> long time. I should have added this buglink to every commit as well:
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=109681
>
> If the firmware can't be fixed, I'm afraid I have to go this way. It makes
> no sense to keep enabling power_save for the affected devices if we know
> it's broken.

Condolences and sympathy, seriously. You likely have little chance of
getting the firmware fixed, so without new information (e.g,. other
workarounds?), this is the probably the right way to go.

Brian


Re: [PATCH] mwifiex: pcie: add enable_device_dump module parameter

2020-11-20 Thread Brian Norris
(Sorry if anything's a bit slow here. I don't really have time to
write out full proposals myself.)

On Fri, Oct 30, 2020 at 3:30 AM Tsuchiya Yuto  wrote:
> Let me know if splitting this patch like this works. 1) The first patch
> is to add this module parameter but don't change the default behavior.

That *could* be OK with me, although it's already been said that there
are many people who dislike extra module parameters. I also don't see
why this needs to be a module parameter at all. If you do #2 right,
you don't really need this, as there are already several standard ways
of doing this (e.g., via Kconfig, or via nl80211 on a per-device
basis).

> 2) The second patch is to change the parameter value depending on the
> DMI matching or something so that it doesn't break the existing users.

Point 2 sounds good, and this is the key point. Note that you can do
point 2 without making it a module parameter. Just keep a flag in the
driver-private structures.

> But what I want to say here as well is that, if the firmware can be fixed,
> we don't need a patch like this.

Sure. That's also where we don't necessarily need more ways to control
this from user space (e.g., module parameters), but just better
detection of currently broken systems (in the driver). And if firmware
ever gets fixed, we can undo the "broken device" detection.

Brian


Re: [PATCH stable] net: sch_generic: fix the missing new qdisc assignment bug

2020-11-12 Thread Brian Norris
On Tue, Nov 03, 2020 at 11:25:38AM +0800, Yunsheng Lin wrote:
> commit 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op 
> for lockless qdisc")
> 
> When the above upstream commit is backported to stable kernel,
> one assignment is missing, which causes two problems reported
> by Joakim and Vishwanath, see [1] and [2].
> 
> So add the assignment back to fix it.
> 
> 1. https://www.spinics.net/lists/netdev/msg693916.html
> 2. https://www.spinics.net/lists/netdev/msg695131.html
> 
> Fixes: 749cc0b0c7f3 ("net: sch_generic: aviod concurrent reset and enqueue op 
> for lockless qdisc")
> Signed-off-by: Yunsheng Lin 

For whatever it's worth, we've seen similar problems (particularly,
ENOBUFS on AF_PACKET sockets) and have tested this fix on 4.19.y (we're
also queueing it up on our 5.4.y branch, but haven't tested it as much):

Tested-by: Brian Norris 

Thanks!


Re: [PATCH 41/41] realtek: rtw88: pci: Add prototypes for .probe, .remove and .shutdown

2020-11-02 Thread Brian Norris
On Mon, Nov 2, 2020 at 3:25 AM Lee Jones  wrote:
> --- a/drivers/net/wireless/realtek/rtw88/pci.h
> +++ b/drivers/net/wireless/realtek/rtw88/pci.h
> @@ -212,6 +212,10 @@ struct rtw_pci {
> void __iomem *mmap;
>  };
>
> +int rtw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id);
> +void rtw_pci_remove(struct pci_dev *pdev);
> +void rtw_pci_shutdown(struct pci_dev *pdev);
> +
>

These definitions are already in 4 other header files:

drivers/net/wireless/realtek/rtw88/rtw8723de.h
drivers/net/wireless/realtek/rtw88/rtw8821ce.h
drivers/net/wireless/realtek/rtw88/rtw8822be.h
drivers/net/wireless/realtek/rtw88/rtw8822ce.h

Seems like you should be moving them, not just adding yet another duplicate.

Brian


Re: [PATCH 1/3] mwifiex: disable ps_mode explicitly by default instead

2020-10-29 Thread Brian Norris
On Thu, Oct 29, 2020 at 11:37 AM Andy Shevchenko
 wrote:
> And this feeling (that it's a FW issue) what I have. But the problem
> here, that Marvell didn't fix and probably won't fix their FW...

Sure, I wouldn't hold your breath. So some of these tactics (disabling
PS, etc.) may be valid, but you have to do them smartly, acknowledging
that there are other (more stable) firmwares and chips in use for this
same driver.

> Just wondering if Google (and MS in their turn) use different
> firmwares to what we have available in Linux.

No clue about MS. But Chrom{e,ium} OS generally publishes all this
stuff where possible. You can see what we use here:

https://chromium.googlesource.com/chromiumos/third_party/linux-firmware/+/HEAD/mrvl/
https://chromium.googlesource.com/chromiumos/third_party/marvell/+/HEAD/

We try to stay somewhat in sync / parallel with "upstream"
linux-firmware, and strongly encourage vendors to send the same
binaries upstream when they hand them to us, but there are exceptions
and oversights (e.g., old products might have used a different
firmware branch).

Notably, I'll repeat: we (Chrome OS) don't actually support the PCIe
variant of 8897, so the report in question ("PCIe-88W8897") has no
equivalent in a supported Chrome OS system (even if there are binaries
in the links above, we don't use them). I would not be surprised if
there are an enormous number of firmware bugs there, as there were
initially for PCIe-88W8997 (which we do support).

Brian


Re: [PATCH 1/3] mwifiex: disable ps_mode explicitly by default instead

2020-10-29 Thread Brian Norris
On Wed, Oct 28, 2020 at 7:04 PM Tsuchiya Yuto  wrote:
>
> On Microsoft Surface devices (PCIe-88W8897), the ps_mode causes
> connection unstable, especially with 5GHz APs. Then, it eventually causes
> fw crash.
>
> This commit disables ps_mode by default instead of enabling it.
>
> Required code is extracted from mwifiex_drv_set_power().
>
> Signed-off-by: Tsuchiya Yuto 

You should read up on WIPHY_FLAG_PS_ON_BY_DEFAULT and
CONFIG_CFG80211_DEFAULT_PS, and set/respect those appropriately (hint:
mwifiex sets WIPHY_FLAG_PS_ON_BY_DEFAULT, and your patch makes this a
lie). Also, this seems like a quirk that you haven't properly worked
out -- if you're working on a quirk framework in your other series,
you should just key into that.

For the record, Chrome OS supports plenty of mwifiex systems with 8897
(SDIO only) and 8997 (PCIe), with PS enabled, and you're hurting
those. Your problem sounds to be exclusively a problem with the PCIe
8897 firmware.

As-is, NAK.

Brian


Re: [PATCH] mwifiex: pcie: add enable_device_dump module parameter

2020-10-28 Thread Brian Norris
On Wed, Oct 28, 2020 at 3:58 PM Tsuchiya Yuto  wrote:
>
> The devicve_dump may take a little bit long time and users may want to
> disable the dump for daily usage.
>
> This commit adds a new module parameter enable_device_dump and disables
> the device_dump by default.

As with one of your other patches, please don't change the defaults
and hide them under a module parameter. If you're adding a module
parameter, leave the default behavior alone.

This also seems like something that might be nicer as a user-space
knob in generic form (similar to "/sys/class/devcoredump/disabled",
except on a per-device basis, and fed back to the driver so it doesn't
waste time generating such dumps), but I suppose I can see why a
module parameter (so you can just stick your configuration in
/etc/modprobe.d/) might be easier to deal with in some cases.

Brian


Re: [PATCH 2/3] mwifiex: add allow_ps_mode module parameter

2020-10-28 Thread Brian Norris
On Wed, Oct 28, 2020 at 2:56 PM Tsuchiya Yuto  wrote:
>
> To make the ps_mode (power_save) control easier, this commit adds a new
> module parameter allow_ps_mode and set it false (disallowed) by default.

This sounds like a bad idea, as it breaks all the existing users who
expect this feature to be allowed. Seems like you should flip the
defaults. Without some better justification, NACK.

Also, I can't find the other 2 patches in this alleged series. Maybe
they're still making it through the mailing lists and archives.

Brian

> When this parameter is set to false, changing the power_save mode will
> be disallowed like the following:
>
> $ sudo iw dev mlan0 set power_save on
> command failed: Operation not permitted (-1)
>
> Signed-off-by: Tsuchiya Yuto 


Re: [PATCH] wireless: mwifiex: fix double free

2020-10-05 Thread Brian Norris
On Sun, Oct 4, 2020 at 6:19 AM  wrote:
>
> From: Tom Rix 
>
> clang static analysis reports this problem:
>
> sdio.c:2403:3: warning: Attempt to free released memory
> kfree(card->mpa_rx.buf);
> ^~~

That's some interesting static analysis for a compiler.

> When mwifiex_init_sdio() fails in its first call to
> mwifiex_alloc_sdio_mpa_buffer, it falls back to calling it
> again.  If the second alloc of mpa_tx.buf fails, the error
> handler will try to free the old, previously freed mpa_rx.buf.
> Reviewing the code, it looks like a second double free would
> happen with mwifiex_cleanup_sdio().
>
> So set both pointers to NULL when they are freed.
>
> Fixes: 5e6e3a92b9a4 ("wireless: mwifiex: initial commit for Marvell mwifiex 
> driver")
> Signed-off-by: Tom Rix 

For whatever it's worth:

Reviewed-by: Brian Norris 


Re: [PATCH] scripts/setlocalversion: make git describe output more reliable

2020-09-10 Thread Brian Norris
On Thu, Sep 10, 2020 at 7:35 AM Masahiro Yamada  wrote:
> On Thu, Sep 10, 2020 at 8:57 PM Rasmus Villemoes
>  wrote:
> > So in order to avoid `uname -a` output relying on such random details
> > of the build environment which are rather hard to ensure are
> > consistent between developers and buildbots, use an explicit
> > --abbrev=15 option (and for consistency, also use rev-parse --short=15
> > for the unlikely case of no signed tags being usable).

For the patch:

Reviewed-by: Brian Norris 

> I agree that any randomness should be avoided.
>
> My question is, do we need 15-digits?
...
> So, I think the conflict happens
> only when we have two commits that start with the same 7-digits
> in the _same_ release. Is this correct?

For the rev-parse usage ("unlikely case where we have no signed tag"),
the total number of objects is definitely relevant, and the man-page
says:
"unique prefix with at least length characters"
i.e., it might be longer, if needed for uniqueness.

For git-describe (the case where we have a tag to base off):
"use  digits, or as many digits as needed to form a unique object name"
I'm not quite sure whether the uniqueness is including anything about
the tag-relative prefix, but if not, then we have the same problem.

At least, that's my reading of the situation.

Brian


Re: [PATCH] rtw88: pci: Power cycle device during shutdown

2020-08-25 Thread Brian Norris
On Mon, Aug 24, 2020 at 2:32 AM Kai-Heng Feng
 wrote:
>
> Sometimes system freeze on cold/warm boot when rtw88 is probing.
>
> According to [1], platform firmware may not properly power manage the
> device during shutdown. I did some expirements and putting the device to
> D3 can workaround the issue.
>
> So let's power cycle the device by putting the device to D3 at shutdown
> to prevent the issue from happening.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=206411#c9
>
> Signed-off-by: Kai-Heng Feng 

Can you at least include some more details, like which hardware
version and firmware? And which platform? It seems a bit harsh to
include a platform workaround to run for everyone, unless there's
truly no downside. But even then, it's good to log what you're working
with, in case there are ever problems with this in the future.

Brian


Re: [PATCH net] mwifiex: Increase AES key storage size to 256 bits

2020-08-25 Thread Brian Norris
Hi,

On Tue, Aug 25, 2020 at 8:38 AM Maximilian Luz  wrote:
>
> Following commit e18696786548 ("mwifiex: Prevent memory corruption
> handling keys") the mwifiex driver fails to authenticate with certain
> networks, specifically networks with 256 bit keys, and repeatedly asks
> for the password. The kernel log repeats the following lines (id and
> bssid redacted):
>
> mwifiex_pcie :01:00.0: info: trying to associate to '' bssid 
> 
> mwifiex_pcie :01:00.0: info: associated to bssid  successfully
> mwifiex_pcie :01:00.0: crypto keys added
> mwifiex_pcie :01:00.0: info: successfully disconnected from : 
> reason code 3
>
> Tracking down this problem lead to the overflow check introduced by the
> aforementioned commit into mwifiex_ret_802_11_key_material_v2(). This
> check fails on networks with 256 bit keys due to the current storage
> size for AES keys in struct mwifiex_aes_param being only 128 bit.
>
> To fix this issue, increase the storage size for AES keys to 256 bit.
>
> Signed-off-by: Maximilian Luz 
> Reported-by: Kaloyan Nikolov 
> Tested-by: Kaloyan Nikolov 

Thanks for this! I just happened to notice this breakage here, as we
just merged the relevant -stable updates. I think it would be wise to
get the Fixes tag Dan noted, when Kalle lands this.

Reviewed-by: Brian Norris 
Tested-by: Brian Norris 

Also, while technically the regressing commit (e18696786548 ("mwifiex:
Prevent memory corruption handling keys")) was fixing a potential
overflow, the encasing command structure (struct host_cmd_ds_command)
is a union of a ton of other command layouts, and likely had plenty of
padding at the end, which would at least explain why non-malicious
scenarios weren't problematic pre-commit-e18696786548. It's also not
clear to me how much the network can directly determine this length,
but I suppose that's beside the point now -- it's good to fix both of
these bugs.

Regards,
Brian


Re: [PATCH v2] mwifiex: don't call del_timer_sync() on uninitialized timer

2020-08-24 Thread Brian Norris
On Fri, Aug 21, 2020 at 1:28 AM Tetsuo Handa
 wrote:
>
> syzbot is reporting that del_timer_sync() is called from
> mwifiex_usb_cleanup_tx_aggr() from mwifiex_unregister_dev() without
> checking timer_setup() from mwifiex_usb_tx_init() was called [1].
>
> Ganapathi Bhat proposed a possibly cleaner fix, but it seems that
> that fix was forgotten [2].
>
> "grep -FrB1 'del_timer' drivers/ | grep -FA1 '.function)'" says that
> currently there are 28 locations which call del_timer[_sync]() only if
> that timer's function field was initialized (because timer_setup() sets
> that timer's function field). Therefore, let's use same approach here.
>
> [1] 
> https://syzkaller.appspot.com/bug?id=26525f643f454dd7be0078423e3cdb0d57744959
> [2] 
> https://lkml.kernel.org/r/ca+asdxmht2gq9hy+ip_bykwxssrewdp3_bafmkncuqj3k+-...@mail.gmail.com
>
> Reported-by: syzbot 
> Cc: Ganapathi Bhat 
> Cc: Brian Norris 
> Signed-off-by: Tetsuo Handa 

This seems good to me:

Reviewed-by: Brian Norris 

> ---
>  drivers/net/wireless/marvell/mwifiex/usb.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c 
> b/drivers/net/wireless/marvell/mwifiex/usb.c
> index 6f3cfde4654c..426e39d4ccf0 100644
> --- a/drivers/net/wireless/marvell/mwifiex/usb.c
> +++ b/drivers/net/wireless/marvell/mwifiex/usb.c
> @@ -1353,7 +1353,8 @@ static void mwifiex_usb_cleanup_tx_aggr(struct 
> mwifiex_adapter *adapter)
> skb_dequeue(>tx_aggr.aggr_list)))
> mwifiex_write_data_complete(adapter, skb_tmp,
> 0, -1);
> -   del_timer_sync(>tx_aggr.timer_cnxt.hold_timer);
> +   if (port->tx_aggr.timer_cnxt.hold_timer.function)
> +   del_timer_sync(>tx_aggr.timer_cnxt.hold_timer);
> port->tx_aggr.timer_cnxt.is_hold_timer_set = false;
> port->tx_aggr.timer_cnxt.hold_tmo_msecs = 0;
> }
> --
> 2.18.4
>


Re: [PATCH v4 7/7] pwm: cros-ec: Simplify EC error handling

2020-08-06 Thread Brian Norris
On Thu, Aug 6, 2020 at 8:33 AM Guenter Roeck  wrote:
>
> With enhanced error reporting from cros_ec_cmd_xfer_status() in place,
> we can fully use it and no longer rely on EC error codes.
>
> Signed-off-by: Guenter Roeck 

Reviewed-by: Brian Norris 


Re: [PATCH] drivers/net/wan/lapbether: Use needed_headroom instead of hard_header_len

2020-07-30 Thread Brian Norris
On Tue, Jul 28, 2020 at 6:42 PM Xie He  wrote:
> On Tue, Jul 28, 2020 at 12:52 PM -0700
> Brian Norris  wrote:
> > What is the intention with this X25 protocol? I guess the headers added
> > in lapbeth_data_transmit() are supposed to be "invisible", as with this
> > note in af_packet.c?
...
> This driver is not intended to be used with IPv4 or IPv6 protocols,
> but is intended to be used with a special "X.25" protocol. That's the
> reason the device type is ARPHRD_X25. I used "grep" in the X.25
> network layer code (net/x25) and I found there's nowhere
> "dev_hard_header" is called. I also used "grep" in all the X.25

That well could just be a bug in net/x25/. But from context, it does
appear that X.25 does not intend to expose its headers to higher
layers.

> drivers in the kernel (lapbether.c, x25_asy.c, hdlc_x25.c under
> drivers/net/wan) and I found no driver implemented "header_ops". So I
> think the X.25 networking code doesn't expect any header visible
> outside of the device driver, and X.25 drivers should make their
> headers invisible outside of them.
>
> So I think hard_header_len should be 0 for all X.25 drivers, so that
> they can be used correctly with af_packet.c.
>
> I don't know if this sounds plausible to you. If it does, could you
> please let me have your name in a "Reviewed_by" tag. It would be of
> great help to have your support. Thanks!

Sure, I can do that:

Reviewed-by: Brian Norris 

I guess x25 is basically an abandoned project, if you're coming to me for this?


Re: [PATCH v3 6/6] platform/chrome: cros_ec_proto: Convert EC error codes to Linux error codes

2020-07-29 Thread Brian Norris
On Wed, Jul 29, 2020 at 4:22 PM Guenter Roeck  wrote:
> On 7/29/20 3:21 PM, Brian Norris wrote:
> > On Sun, Jul 26, 2020 at 03:01:01PM -0700, Guenter Roeck wrote:
> >> --- a/drivers/platform/chrome/cros_ec_proto.c
> >> +++ b/drivers/platform/chrome/cros_ec_proto.c

> > ^^ Maybe we want to double check 'ret != 0'? Or maybe
> >
> >   ret = cros_ec_error_map[result];
> >   if (!ret) {
>
> 'ret' won't ever be 0 here. Above:
> && 
> cros_ec_error_map[result]
>
> and below:
>
> else
> ret = -EPROTO;

Ah, I'm reading too quickly. You're correct, sorry.

> >   ret = -EPROTO;
> >   dev_err(..., "Unexpected EC result code 
> > %d\n", result);
> >   }
> >
> > ? Could even be WARN_ON(), since this would be an actionable programming
> > error, not exactly an external factor. Or maybe I'm being paranoid, and
> > future programmers are perfect.
> >
> I think, if anything, we might consider adding the message below (result >=
> ARRAY_SIZE(cros_ec_error_map) is just as bad). Not sure myself. I am
> open to adding it if people think it would be useful/desirable.

No, my primary motivation was that I thought the logic left room for
error if there were holes. I was mistaken on that point. Secondarily,
it was also potentially useful to point out when we fell into those
holes. I'm not sure logging the warning is that important. Generally,
we only care about a handful of result codes, and as long as the rest
don't end up as "success", I think we're in OK shape.

Sorry for the noise. Here's my tag (which given my misreading so far,
should probably have a heavy discount on its value):

Reviewed-by: Brian Norris 


Re: [PATCH v3 0/6] platform/chrome: cros_ec_proto: Convert EC error codes to Linux error codes

2020-07-29 Thread Brian Norris
On Sun, Jul 26, 2020 at 03:00:55PM -0700, Guenter Roeck wrote:
> The EC reports a variety of error codes. Most of those, with the exception
> of EC_RES_INVALID_VERSION, are converted to -EPROTO. As result, the actual
> error code gets lost. In cros_ec_cmd_xfer_status(), convert all EC errors
> to Linux error codes to report a more meaningful error to the caller to aid
> debugging.
> 
> To prepare for this change, handle error codes other than -EPROTO for all
> callers of cros_ec_cmd_xfer_status(). Specifically, no longer assume that
> -EPROTO reflects an error from the EC and all other error codes reflect a
> transfer error.
> 
> v2: Add patches 1/4 to 3/4 to handle callers of cros_ec_cmd_xfer_status()
> v3: Add patches 4/6 and 5/6 to handle additional callers of
>   cros_ec_cmd_xfer_status()
> Use -ENOPROTOOPT for EC_RES_INVALID_VERSION
> Implement function to convert error codes

A small potential (i.e., being paranoid about future changes) note on
patch 6, but otherwise looks fine to me:

Reviewed-by: Brian Norris 


Re: [PATCH v3 6/6] platform/chrome: cros_ec_proto: Convert EC error codes to Linux error codes

2020-07-29 Thread Brian Norris
Hi Guenter,

On Sun, Jul 26, 2020 at 03:01:01PM -0700, Guenter Roeck wrote:
> v3: Use -ENOPROTOOPT for EC_RES_INVALID_VERSION
> Implement function to convert error codes
> v2: No change
> 
>  drivers/platform/chrome/cros_ec_proto.c | 52 -
>  1 file changed, 42 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/platform/chrome/cros_ec_proto.c 
> b/drivers/platform/chrome/cros_ec_proto.c
> index e5bbec979a2a..a081b8245682 100644
> --- a/drivers/platform/chrome/cros_ec_proto.c
> +++ b/drivers/platform/chrome/cros_ec_proto.c
> @@ -15,6 +15,43 @@
>  
>  #define EC_COMMAND_RETRIES   50
>  
> +static const int cros_ec_error_map[] = {
> + [EC_RES_INVALID_COMMAND] = -EOPNOTSUPP,
> + [EC_RES_ERROR] = -EIO,
> + [EC_RES_INVALID_PARAM] = -EINVAL,
> + [EC_RES_ACCESS_DENIED] = -EACCES,
> + [EC_RES_INVALID_RESPONSE] = -EPROTO,
> + [EC_RES_INVALID_VERSION] = -ENOPROTOOPT,
> + [EC_RES_INVALID_CHECKSUM] = -EBADMSG,
> + [EC_RES_IN_PROGRESS] = -EINPROGRESS,
> + [EC_RES_UNAVAILABLE] = -ENODATA,
> + [EC_RES_TIMEOUT] = -ETIMEDOUT,
> + [EC_RES_OVERFLOW] = -EOVERFLOW,
> + [EC_RES_INVALID_HEADER] = -EBADR,
> + [EC_RES_REQUEST_TRUNCATED] = -EBADR,
> + [EC_RES_RESPONSE_TOO_BIG] = -EFBIG,
> + [EC_RES_BUS_ERROR] = -EFAULT,
> + [EC_RES_BUSY] = -EBUSY,
> + [EC_RES_INVALID_HEADER_VERSION] = -EBADMSG,
> + [EC_RES_INVALID_HEADER_CRC] = -EBADMSG,
> + [EC_RES_INVALID_DATA_CRC] = -EBADMSG,
> + [EC_RES_DUP_UNAVAILABLE] = -ENODATA,
> +};

Sorry I didn't pay attention to this earlier, but is there any
programmatic way to ensure that we don't have unexpected holes here? If
we do (e.g., we add new error codes, but they aren't contiguous for
whatever reasons), then those will get treated as "success" with your
current patch.

I say "unexpected" hole, because EC_RES_SUCCESS (0) is an expected hole.

> +
> +static int cros_ec_map_error(uint32_t result)
> +{
> + int ret = 0;
> +
> + if (result != EC_RES_SUCCESS) {
> + if (result < ARRAY_SIZE(cros_ec_error_map) && 
> cros_ec_error_map[result])
> + ret = cros_ec_error_map[result];

^^ Maybe we want to double check 'ret != 0'? Or maybe

ret = cros_ec_error_map[result];
if (!ret) {
ret = -EPROTO;
dev_err(..., "Unexpected EC result code %d\n", 
result);
}

? Could even be WARN_ON(), since this would be an actionable programming
error, not exactly an external factor. Or maybe I'm being paranoid, and
future programmers are perfect.

Otherwise:

Reviewed-by: Brian Norris 

> + else
> + ret = -EPROTO;
> + }
> +
> + return ret;
> +}
> +
>  static int prepare_packet(struct cros_ec_device *ec_dev,
> struct cros_ec_command *msg)
>  {


Re: [PATCH] drivers/net/wan/lapbether: Use needed_headroom instead of hard_header_len

2020-07-28 Thread Brian Norris
(Reviewing as requested; I'm not familiar with this driver either, or
really any WAN driver. It also seems that hard_header_len vs.
needed_headroom aren't very well documented, and even I can't guarantee
I understand them completely. So take my thoughts with a grain of salt.)

Hi,

On Sun, Jul 26, 2020 at 04:05:24AM -0700, Xie He wrote:
> In net/packet/af_packet.c, the function packet_snd first reserves a
> headroom of length (dev->hard_header_len + dev->needed_headroom).
> Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
> which calls dev->header_ops->create, to create the link layer header.
> If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
> length (dev->hard_header_len), and assumes the user to provide the
> appropriate link layer header.
> 
> So according to the logic of af_packet.c, dev->hard_header_len should
> be the length of the header that would be created by
> dev->header_ops->create.

I believe I'm with you up to here, but:

> However, this driver doesn't provide dev->header_ops, so logically
> dev->hard_header_len should be 0.

I'm not clear on this part.

What's to say you shouldn't be implementing header_ops instead? Note
that with WiFi drivers, they're exposing themselves as ARPHRD_ETHER, and
only the Ethernet headers are part of the upper "protocol" headers. So
my patch deferred to the eth headers.

What is the intention with this X25 protocol? I guess the headers added
in lapbeth_data_transmit() are supposed to be "invisible", as with this
note in af_packet.c?

   - if device has no dev->hard_header routine, it adds and removes ll header
 inside itself. In this case ll header is invisible outside of device,
 but higher levels still should reserve dev->hard_header_len.

If that's the case, then yes, I believe this patch should be correct.

Brian

> So we should use dev->needed_headroom instead of dev->hard_header_len
> to request necessary headroom to be allocated.
> 
> Signed-off-by: Xie He 


Re: [PATCH] mwifiex: don't call del_timer_sync() on uninitialized timer

2020-07-28 Thread Brian Norris
Hi,

On Mon, Jul 27, 2020 at 6:45 PM Tetsuo Handa
 wrote:
>
> syzbot is reporting that del_timer_sync() is called from
> mwifiex_usb_cleanup_tx_aggr() from mwifiex_unregister_dev() without
> checking timer_setup() from mwifiex_usb_tx_init() was called [1].
> Since mwifiex_usb_prepare_tx_aggr_skb() is calling del_timer() if
> is_hold_timer_set == true, use the same condition for del_timer_sync().
>
> [1] 
> https://syzkaller.appspot.com/bug?id=fdeef9cf7348be8b8ab5b847f2ed993aba8ea7b6
>
> Reported-by: syzbot 
> Cc: Ganapathi Bhat 
> Signed-off-by: Tetsuo Handa 
> ---
> A patch from Ganapathi Bhat ( https://patchwork.kernel.org/patch/10990275/ ) 
> is stalling
> at 
> https://lore.kernel.org/linux-usb/mn2pr18mb2637d7c742bc235fe38367f0a0...@mn2pr18mb2637.namprd18.prod.outlook.com/
>  .
> syzbot by now got this report for 1 times. Do we want to go with this 
> simple patch?

Sorry, that stall is partly my fault, and partly Ganapathi's. It
doesn't help that it took him 4 months to reply to my questions, so I
completely lost even the tiny bit of context I had managed to build up
in my head at initial review time... and so it's still buried in the
dark corners of my inbox. (I think I'll go archive that now, because
it really deserves a better sell than it had initially, if Ganapathi
really wants to land it.)

>  drivers/net/wireless/marvell/mwifiex/usb.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c 
> b/drivers/net/wireless/marvell/mwifiex/usb.c
> index 6f3cfde..04a1461 100644
> --- a/drivers/net/wireless/marvell/mwifiex/usb.c
> +++ b/drivers/net/wireless/marvell/mwifiex/usb.c
> @@ -1353,7 +1353,8 @@ static void mwifiex_usb_cleanup_tx_aggr(struct 
> mwifiex_adapter *adapter)
> skb_dequeue(>tx_aggr.aggr_list)))
> mwifiex_write_data_complete(adapter, skb_tmp,
> 0, -1);
> -   del_timer_sync(>tx_aggr.timer_cnxt.hold_timer);
> +   if (port->tx_aggr.timer_cnxt.is_hold_timer_set)

I believe if we ever actually started aggregation, then the timer can
be active at this point, and thus, the access to 'is_hold_timer_set'
is racy.

This *probably* deserves a better refactor, but in absence of that
(and a better explanation than Ganapathi gave), I think you at least
need to hold port->tx_aggr_lock. So perhaps (totally untested):

  spin_lock_bh(>tx_aggr_lock);
  if (port->tx_aggr.timer_cnxt.is_hold_timer_set) {
port->tx_aggr.timer_cnxt.is_hold_timer_set = false;
spin_unlock_bh(>tx_aggr_lock);
/* Timer could still be running, but it can't be restarted at this
point, so this is safe. */
del_timer_sync(>tx_aggr.timer_cnxt.hold_timer);
  } else {
spin_unlock_bh(>tx_aggr_lock);
  }

Otherwise, I think this is fine:

Reviewed-by: Brian Norris 

I also believe mwifiex_usb_prepare_tx_aggr_skb() needs to stop using
del_timer() (without the _sync()), because otherwise we might have
deactivated the timer already but not ensured that it has completely
finished executing on other CPUs. But that is probably orthogonal to
the current patch. (Again, so much in this driver needs refactoring.)

Side note: this entire TX aggregation feature for USB has been hidden
behind the mwifiex.aggr_ctrl module param since its introduction,
which has always been disabled by default. I wonder whether anybody is
*really* testing it, or whether it's 100% broken, as with many things
in this driver...


Brian

> +   del_timer_sync(>tx_aggr.timer_cnxt.hold_timer);
> port->tx_aggr.timer_cnxt.is_hold_timer_set = false;
> port->tx_aggr.timer_cnxt.hold_tmo_msecs = 0;
> }
> --
> 1.8.3.1
>


Re: [PATCH 2/2] platform/chrome: cros_ec_proto: check for missing EC_CMD_HOST_EVENT_GET_WAKE_MASK

2020-07-24 Thread Brian Norris
On Thu, Jul 23, 2020 at 1:04 AM Enric Balletbo i Serra
 wrote:
> Another thing that we can do (although this is specific for you and doesn't
> solve the problem with people like you that are interested on this), is add 
> you
> as a reviewer in the MAINTAINERS file. The CrOS EC has a lot of subtleties, 
> and
> having more ChromeOS people involved in the reviewing upstream is more than 
> welcome.

I'll think about that. If my lore.kernel.org searches over the last 4
months are correct, there's been about 50 patches a month, which is a
bit much as a side project. (I don't regularly work on EC stuff these
days.) It still might be better to subscribe and filter on my own
("opt in"), rather than pretend that I'm going to pay any attention to
50 patches a month.

Brian


[PATCH v2 2/2] platform/chrome: cros_ec_proto: check for missing EC_CMD_HOST_EVENT_GET_WAKE_MASK

2020-07-24 Thread Brian Norris
As with cros_ec_cmd_xfer_status(), etc., it's not enough to simply check
for the return status of send_command() -- that only covers transport or
other similarly-fatal errors. One must also check the ->result field, to
see whether the command really succeeded. If not, we can't use the data
it returns.

The caller of cros_ec_get_host_event_wake_mask() ignores this, and so
for example, on EC's where the command is not implemented, we're using
junk (or in practice, all zeros) for our wake-mask. We should be using a
non-zero default (currently, it's supposed to be all-1's).

Fix this by checking the ->result field and returning -EPROTO for
errors.

I might label this as fixing commit 29d99b966d60 ("cros_ec: Don't signal
wake event for non-wake host events"), except that this fix alone
actually may make things worse, as it now allows for a lot more spurious
wakeups. The patch "platform/chrome: cros_ec_proto: ignore battery/AC
wakeups on old ECs" helps to mitigate this.

Signed-off-by: Brian Norris 
---
v2:
 * EOPNOTSUPP, not ENOTSUPP
---
 drivers/platform/chrome/cros_ec_proto.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/platform/chrome/cros_ec_proto.c 
b/drivers/platform/chrome/cros_ec_proto.c
index e93024b55ce8..31ca0af62388 100644
--- a/drivers/platform/chrome/cros_ec_proto.c
+++ b/drivers/platform/chrome/cros_ec_proto.c
@@ -208,6 +208,12 @@ static int cros_ec_get_host_event_wake_mask(struct 
cros_ec_device *ec_dev,
msg->insize = sizeof(*r);
 
ret = send_command(ec_dev, msg);
+   if (ret >= 0) {
+   if (msg->result == EC_RES_INVALID_COMMAND)
+   return -EOPNOTSUPP;
+   if (msg->result != EC_RES_SUCCESS)
+   return -EPROTO;
+   }
if (ret > 0) {
r = (struct ec_response_host_event_mask *)msg->data;
*mask = r->mask;
@@ -488,6 +494,13 @@ int cros_ec_query_all(struct cros_ec_device *ec_dev)
  BIT(EC_HOST_EVENT_BATTERY_CRITICAL) |
  BIT(EC_HOST_EVENT_PD_MCU) |
  BIT(EC_HOST_EVENT_BATTERY_STATUS));
+   /*
+* Old ECs may not support this command. Complain about all
+* other errors.
+*/
+   if (ret != -EOPNOTSUPP)
+   dev_err(ec_dev->dev,
+   "failed to retrieve wake mask: %d\n", ret);
}
 
ret = 0;
-- 
2.28.0.rc0.142.g3c755180ce-goog



[PATCH v2 1/2] platform/chrome: cros_ec_proto: ignore unnecessary wakeups on old ECs

2020-07-24 Thread Brian Norris
ECs that don't implement EC_CMD_HOST_EVENT_GET_WAKE_MASK should still
have some reasonable default mask -- otherwise, they'll treat a variety
of EC signals as spurious wakeups. Battery and AC events can be
especially common, for devices that have been sitting at full charge
plugged into AC for a long time, as they may cycle their charging off
and on, or their battery may start reporting failures as it ages.

Treating these as wakeups does not serve a useful purpose, and is
instead often counterproductive. And indeed, later ECs (that implement
the mask) don't include these events in their wake-mask.

Note that this patch doesn't do anything without the subsequent patch
("platform/chrome: cros_ec_proto: check for missing
EC_CMD_HOST_EVENT_GET_WAKE_MASK"), because
cros_ec_get_host_event_wake_mask() currently does not return an error if
EC_CMD_HOST_EVENT_GET_WAKE_MASK is not implemented.

Some additional notes:
While the EC typically knows not to wake the CPU for these unimportant
events once the CPU reaches a sleep state, it doesn't really have a way
to know that the CPU is "almost" asleep, unless it has support for
EC_CMD_HOST_SLEEP_EVENT. Alas, these older ECs do not support that
command either, so this solution is not 100% complete.

Signed-off-by: Brian Norris 
---
v2:
 * more notes in commit message
---
 drivers/platform/chrome/cros_ec_proto.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/platform/chrome/cros_ec_proto.c 
b/drivers/platform/chrome/cros_ec_proto.c
index 3e745e0fe092..e93024b55ce8 100644
--- a/drivers/platform/chrome/cros_ec_proto.c
+++ b/drivers/platform/chrome/cros_ec_proto.c
@@ -469,14 +469,26 @@ int cros_ec_query_all(struct cros_ec_device *ec_dev)
_mask);
ec_dev->host_sleep_v1 = (ret >= 0 && (ver_mask & EC_VER_MASK(1)));
 
-   /*
-* Get host event wake mask, assume all events are wake events
-* if unavailable.
-*/
+   /* Get host event wake mask. */
ret = cros_ec_get_host_event_wake_mask(ec_dev, proto_msg,
   _dev->host_event_wake_mask);
-   if (ret < 0)
-   ec_dev->host_event_wake_mask = U32_MAX;
+   if (ret < 0) {
+   /*
+* If the EC doesn't support EC_CMD_HOST_EVENT_GET_WAKE_MASK,
+* use a reasonable default. Note that we ignore various
+* battery, AC status, and power-state events, because (a)
+* those can be quite common (e.g., when sitting at full
+* charge, on AC) and (b) these are not actionable wake events;
+* if anything, we'd like to continue suspending (to save
+* power), not wake up.
+*/
+   ec_dev->host_event_wake_mask = U32_MAX &
+   ~(BIT(EC_HOST_EVENT_AC_DISCONNECTED) |
+ BIT(EC_HOST_EVENT_BATTERY_LOW) |
+ BIT(EC_HOST_EVENT_BATTERY_CRITICAL) |
+ BIT(EC_HOST_EVENT_PD_MCU) |
+ BIT(EC_HOST_EVENT_BATTERY_STATUS));
+   }
 
ret = 0;
 
-- 
2.28.0.rc0.142.g3c755180ce-goog



Re: [PATCH 2/2] platform/chrome: cros_ec_proto: check for missing EC_CMD_HOST_EVENT_GET_WAKE_MASK

2020-07-22 Thread Brian Norris
On Wed, Jul 22, 2020 at 5:43 PM Brian Norris  wrote:
> unless I got
> refactor cros_ec_get_host_event_wake_mask() to use
> cros_ec_cmd_xfer_status() instead of send_command(). I'm actually not
> sure why we don't do that, now that I think about it...

Ah, that would appear to be recursion (cros_ec_query_all() ->
cros_ec_get_host_event_wake_mask() -> cros_ec_cmd_xfer_status() -> ...
cros_ec_query_all()), although that could only happen if the first
cros_ec_query_all() doesn't initialize 'proto_version' to something
besides EC_PROTO_VERSION_UNKNOWN. That is only possible if the EC
reports '0' back to us.

I might skip out on that particular refactor for the moment.

Brian


Re: [PATCH 2/2] platform/chrome: cros_ec_proto: check for missing EC_CMD_HOST_EVENT_GET_WAKE_MASK

2020-07-22 Thread Brian Norris
On Wed, Jul 22, 2020 at 2:13 PM Guenter Roeck  wrote:
> On Wed, Jul 22, 2020 at 1:50 PM Brian Norris  wrote:
> > Other than perhaps taking a lesson not to propagate -ENOTSUPP, I don't
> > think this series should block on that, as this is a bugfix IMO.
>
> My patch will return -EOPNOTSUPP for EC_RES_INVALID_COMMAND, so maybe
> you could do the same. In my latest version (not yet submitted) I
> extracted the conversion into a separate function, so if your patch is
> accepted now I can just add another patch on top of it to start using
> that function.

Sure, I can use EOPNOTSUPP in v2.

BTW, the error code is completely internal to cros_ec_proto.c in my
patch, so it seems even less-related to your series, unless I got
refactor cros_ec_get_host_event_wake_mask() to use
cros_ec_cmd_xfer_status() instead of send_command(). I'm actually not
sure why we don't do that, now that I think about it...

So WDYT? Should I rebase on your eventual v3 and refactor to
cros_ec_cmd_xfer_status()? Or (re)submit this first, and add one more
cros_ec_cmd_xfer_status() usage for you to tweak in your series?

I don't mind a lot either way, except that I would like to port this
to older kernels soon.

Brian


Re: [PATCH v2 0/4] platform/chrome: cros_ec_proto: Convert EC error codes to Linux error codes

2020-07-22 Thread Brian Norris
Hi Guenter,

On Mon, Jul 20, 2020 at 01:22:39PM -0700, Guenter Roeck wrote:
> The EC reports a variety of error codes. Most of those, with the exception
> of EC_RES_INVALID_VERSION, are converted to -EPROTO. As result, the actual
> error code gets lost. In cros_ec_cmd_xfer_status(), convert all EC errors
> to Linux error codes to report a more meaningful error to the caller to aid
> debugging.
> 
> To prepare for this change, handle error codes other than -EPROTO for all
> callers of cros_ec_cmd_xfer_status(). Specifically, no longer assume that
> -EPROTO reflects an error from the EC and all other error codes reflect a
> transfer error.
> 
> v2: Add patches 1/4 to 3/4 to handle callers of cros_ec_cmd_xfer_status()

I did a rough grep to see what you might be missing:

  git grep -n EPROTO | grep -e cros -e '-ec'

I think cros-ec-pwm / cros_ec_num_pwms() might need fixing too? Boy, I
wrote that, but it sure ain't easy to read...(*checks watch*)...4 years
later.

Apart from the notes already made, I think the series looks good:

Reviewed-by: Brian Norris 

Feel free to CC me on v3, if you want another look

Brian


Re: [PATCH v2 4/4] platform/chrome: cros_ec_proto: Convert EC error codes to Linux error codes

2020-07-22 Thread Brian Norris
+ drinkcat, aseda

On Tue, Jul 21, 2020 at 07:23:20AM -0700, Guenter Roeck wrote:
> On Tue, Jul 21, 2020 at 01:29:01PM +0200, Enric Balletbo i Serra wrote:
> > On 20/7/20 22:22, Guenter Roeck wrote:
> > > + [EC_RES_INVALID_HEADER_VERSION] = -EBADMSG,
> 
> Any idea for EC_RES_INVALID_HEADER_VERSION ? I am not entirely happy
> with -EBADMSG: the error is distinctly different to CRC errors.
> EPROTONOSUPPORT as well, maybe, or something else ?

FWIW, these (INVALID_HEADER_VERSION, INVALID_HEADER_CRC,
INVALID_DATA_CRC) aren't actually used on any firmware yet. This has
been open forever:
https://crbug.com/787159
Added here:
https://chromium-review.googlesource.com/c/chromiumos/platform/ec/+/780452/

Unfortunately, the linked design doc (still in draft) is not public.

My understanding is that while they're not all exactly the same (CRC is
different than the others), they are all still supposed to represent
"corrupt request [from the Application Processor]". EBADMSG seems good
enough to me.

Brian

P.S. for those added late -- you can grab the whole thread from here:
https://lore.kernel.org/lkml/20200720202243.180230-1-li...@roeck-us.net/
or in mbox form:
https://lore.kernel.org/lkml/20200720202243.180230-1-li...@roeck-us.net/t.mbox.gz


Re: [PATCH 2/2] platform/chrome: cros_ec_proto: check for missing EC_CMD_HOST_EVENT_GET_WAKE_MASK

2020-07-22 Thread Brian Norris
On Wed, Jul 22, 2020 at 3:19 AM Enric Balletbo i Serra
 wrote:
>
> Hi Brian,
>
> Thank you for your patch, I'll take a look soon but I'd like to ask if you can
> join the discussion with this patchset [1], specially this one [2]. We're 
> trying
> to match EC errors with standard linux kernel errors because we think can be
> helpful.
>
> [1] https://lore.kernel.org/patchwork/cover/1276734/
> [2] https://lore.kernel.org/patchwork/patch/1276738/

Hi Enric,

Thanks, I'll do that. I do wonder sometimes how non-maintainers should
best support "community" around these things, for subsystems that
don't have a dedicated mailing list and are therefore sent only to
maintainers + LKML-fire-hose. I could probably subscribe to LKML and
filter it, but something tells me my mailbox will still manage to
explode somehow... Anyway, I digress.

Other than perhaps taking a lesson not to propagate -ENOTSUPP, I don't
think this series should block on that, as this is a bugfix IMO.

Regards,
Brian


[PATCH 2/2] platform/chrome: cros_ec_proto: check for missing EC_CMD_HOST_EVENT_GET_WAKE_MASK

2020-07-21 Thread Brian Norris
As with cros_ec_cmd_xfer_status(), etc., it's not enough to simply check
for the return status of send_command() -- that only covers transport or
other similarly-fatal errors. One must also check the ->result field, to
see whether the command really succeeded. If not, we can't use the data
it returns.

The caller of cros_ec_get_host_event_wake_mask() ignores this, and so
for example, on EC's where the command is not implemented, we're using
junk (or in practice, all zeros) for our wake-mask. We should be using a
non-zero default (currently, it's supposed to be all-1's).

Fix this by checking the ->result field and returning -EPROTO for
errors.

I might label this as fixing commit 29d99b966d60 ("cros_ec: Don't signal
wake event for non-wake host events"), except that this fix alone
actually may make things worse, as it now allows for a lot more spurious
wakeups. The patch "platform/chrome: cros_ec_proto: ignore battery/AC
wakeups on old ECs" helps to mitigate this.

Signed-off-by: Brian Norris 
---
 drivers/platform/chrome/cros_ec_proto.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/platform/chrome/cros_ec_proto.c 
b/drivers/platform/chrome/cros_ec_proto.c
index e93024b55ce8..01a74abe4191 100644
--- a/drivers/platform/chrome/cros_ec_proto.c
+++ b/drivers/platform/chrome/cros_ec_proto.c
@@ -208,6 +208,12 @@ static int cros_ec_get_host_event_wake_mask(struct 
cros_ec_device *ec_dev,
msg->insize = sizeof(*r);
 
ret = send_command(ec_dev, msg);
+   if (ret >= 0) {
+   if (msg->result == EC_RES_INVALID_COMMAND)
+   return -ENOTSUPP;
+   if (msg->result != EC_RES_SUCCESS)
+   return -EPROTO;
+   }
if (ret > 0) {
r = (struct ec_response_host_event_mask *)msg->data;
*mask = r->mask;
@@ -488,6 +494,13 @@ int cros_ec_query_all(struct cros_ec_device *ec_dev)
  BIT(EC_HOST_EVENT_BATTERY_CRITICAL) |
  BIT(EC_HOST_EVENT_PD_MCU) |
  BIT(EC_HOST_EVENT_BATTERY_STATUS));
+   /*
+* Old ECs may not support this command. Complain about all
+* other errors.
+*/
+   if (ret != -ENOTSUPP)
+   dev_err(ec_dev->dev,
+   "failed to retrieve wake mask: %d\n", ret);
}
 
ret = 0;
-- 
2.28.0.rc0.105.gf9edc3c819-goog



[PATCH 1/2] platform/chrome: cros_ec_proto: ignore unnecessary wakeups on old ECs

2020-07-21 Thread Brian Norris
ECs that don't implement EC_CMD_HOST_EVENT_GET_WAKE_MASK should still
have some reasonable default mask -- otherwise, they'll treat a variety
of EC signals as spurious wakeups. Battery and AC events can be
especially common, for devices that have been sitting at full charge
plugged into AC for a long time, as they may cycle their charging off
and on, or their battery may start reporting failures as it ages.

Treating these as wakeups does not serve a useful purpose, and is
instead often counterproductive. And indeed, later ECs (that implement
the mask) don't include these events in their wake-mask.

Note that this patch doesn't do anything without the subsequent patch
("platform/chrome: cros_ec_proto: check for missing
EC_CMD_HOST_EVENT_GET_WAKE_MASK"), because
cros_ec_get_host_event_wake_mask() currently does not return an error if
EC_CMD_HOST_EVENT_GET_WAKE_MASK is not implemented.

Signed-off-by: Brian Norris 
---
 drivers/platform/chrome/cros_ec_proto.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/platform/chrome/cros_ec_proto.c 
b/drivers/platform/chrome/cros_ec_proto.c
index 3e745e0fe092..e93024b55ce8 100644
--- a/drivers/platform/chrome/cros_ec_proto.c
+++ b/drivers/platform/chrome/cros_ec_proto.c
@@ -469,14 +469,26 @@ int cros_ec_query_all(struct cros_ec_device *ec_dev)
_mask);
ec_dev->host_sleep_v1 = (ret >= 0 && (ver_mask & EC_VER_MASK(1)));
 
-   /*
-* Get host event wake mask, assume all events are wake events
-* if unavailable.
-*/
+   /* Get host event wake mask. */
ret = cros_ec_get_host_event_wake_mask(ec_dev, proto_msg,
   _dev->host_event_wake_mask);
-   if (ret < 0)
-   ec_dev->host_event_wake_mask = U32_MAX;
+   if (ret < 0) {
+   /*
+* If the EC doesn't support EC_CMD_HOST_EVENT_GET_WAKE_MASK,
+* use a reasonable default. Note that we ignore various
+* battery, AC status, and power-state events, because (a)
+* those can be quite common (e.g., when sitting at full
+* charge, on AC) and (b) these are not actionable wake events;
+* if anything, we'd like to continue suspending (to save
+* power), not wake up.
+*/
+   ec_dev->host_event_wake_mask = U32_MAX &
+   ~(BIT(EC_HOST_EVENT_AC_DISCONNECTED) |
+ BIT(EC_HOST_EVENT_BATTERY_LOW) |
+ BIT(EC_HOST_EVENT_BATTERY_CRITICAL) |
+ BIT(EC_HOST_EVENT_PD_MCU) |
+ BIT(EC_HOST_EVENT_BATTERY_STATUS));
+   }
 
ret = 0;
 
-- 
2.28.0.rc0.105.gf9edc3c819-goog



[RFC PATCH] firmware: qcom_scm: disable SDI at boot

2020-07-21 Thread Brian Norris
*** Disclaimer: I know extremely little about Qualcomm's undocumented
SMC API. ***

I'm trying to get upstream support for an IPQ4019 device I have, and by
default, warm boot does not work properly -- it appears to trap into
TrustZone. I find that the downstream/vendor kernel makes this call at
boot, with notes about the watchdog and SDI configuration. It appears
some of this is leftover from when they had download-mode enabled, as
well as some other debug features, and they didn't get completely turned
off in the production bootloader. But I reall can't tell that well; I'm
just going off the minimal source code and git logs.

Because this is so undocumented, I can't tell what the right thing to do
is -- should this go behind a DT property? Should I apply it only for
ipq4019? Is there a better way to describe and document the bits used in
this command?

If it helps, I can try to shop around for some testing on other systems.

Signed-off-by: Brian Norris 
---
 drivers/firmware/qcom_scm.c | 20 
 drivers/firmware/qcom_scm.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index 0e7233a20f34..70c46f1668d1 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -394,6 +394,20 @@ static int __qcom_scm_set_dload_mode(struct device *dev, 
bool enable)
return qcom_scm_call(__scm->dev, , NULL);
 }
 
+static int __qcom_scm_disable_sdi(struct device *dev)
+{
+   struct qcom_scm_desc desc = {
+   .svc = QCOM_SCM_SVC_BOOT,
+   .cmd = QCOM_SCM_BOOT_CONFIG_SDI,
+   .arginfo = QCOM_SCM_ARGS(2),
+   .args[0] = 1  /* 1: disable watchdog debug */,
+   .args[1] = 0  /* 0: disable SDI */,
+   .owner = ARM_SMCCC_OWNER_SIP,
+   };
+
+   return qcom_scm_call(__scm->dev, , NULL);
+}
+
 static void qcom_scm_set_download_mode(bool enable)
 {
bool avail;
@@ -1122,6 +1136,12 @@ static int qcom_scm_probe(struct platform_device *pdev)
if (download_mode)
qcom_scm_set_download_mode(true);
 
+   /*
+* Some bootloaders leave this enabled by default, which prevents
+* normal warm reboot.
+*/
+   __qcom_scm_disable_sdi(__scm->dev);
+
return 0;
 }
 
diff --git a/drivers/firmware/qcom_scm.h b/drivers/firmware/qcom_scm.h
index d9ed670da222..39c3dc0dfc50 100644
--- a/drivers/firmware/qcom_scm.h
+++ b/drivers/firmware/qcom_scm.h
@@ -74,6 +74,7 @@ extern int scm_legacy_call(struct device *dev, const struct 
qcom_scm_desc *desc,
 #define QCOM_SCM_SVC_BOOT  0x01
 #define QCOM_SCM_BOOT_SET_ADDR 0x01
 #define QCOM_SCM_BOOT_TERMINATE_PC 0x02
+#define QCOM_SCM_BOOT_CONFIG_SDI   0x09
 #define QCOM_SCM_BOOT_SET_DLOAD_MODE   0x10
 #define QCOM_SCM_BOOT_SET_REMOTE_STATE 0x0a
 #define QCOM_SCM_FLUSH_FLAG_MASK   0x3
-- 
2.27.0



Re: [PATCH] ath10k: Keep track of which interrupts fired, don't poll them

2020-07-08 Thread Brian Norris
On Wed, Jul 8, 2020 at 4:14 PM Doug Anderson  wrote:
> On Wed, Jul 8, 2020 at 4:03 PM Brian Norris  wrote:
> > If I'm reading correctly, you're removing the only remaining use of
> > 'per_ce_irq'. Should we kill the field entirely?
>
> Ah, you are indeed correct!  I hadn't noticed that.  Unless I hear
> otherwise, I'll send a v2 tomorrow that removes the field entirely.

A healthy middle ground might put that in a patch 2, so it's easily
dropped if desired. *shrug*

> > Or perhaps we should
> > leave some kind of WARN_ON() (BUG_ON()?) if this function is called
> > erroneously with per_ce_irq==true? But I suppose this driver is full
> > of landmines if the CE API is used incorrectly.
>
> Yeah, I originally had a WARN_ON() here and then took it out because
> it seemed like extra overhead and, as you said, someone writing the
> code has to know how the API works already I think.  ...but I'll add
> it back in if people want.

I believe WARN_ON() and friends have a built-in unlikely(), so it
shouldn't have much overhead. But I don't really mind either way.

> > Do you need to clear this map if the interface goes down or if there's
> > a firmware crash? Right now, I don't think there's a guarantee that
> > we'll run through a NAPI poll in those cases, which is the only place
> > you clear the map, and if the hardware/firmware has been reset, the
> > state map is probably not valid.
>
> Seems like a good idea.  Is the right place at the start of
> ath10k_snoc_hif_start()?

Either there or in .power_down()/.power_up(). I think either would be
equally correct, but I'm not entirely sure if the semantic difference
is meaningful for this.

Brian


Re: [PATCH] ath10k: Keep track of which interrupts fired, don't poll them

2020-07-08 Thread Brian Norris
On Tue, Jul 7, 2020 at 10:18 AM Douglas Anderson  wrote:
> diff --git a/drivers/net/wireless/ath/ath10k/ce.h 
> b/drivers/net/wireless/ath/ath10k/ce.h
> index a440aaf74aa4..666ce384a1d8 100644
> --- a/drivers/net/wireless/ath/ath10k/ce.h
> +++ b/drivers/net/wireless/ath/ath10k/ce.h
...
> @@ -376,12 +377,9 @@ static inline u32 ath10k_ce_interrupt_summary(struct 
> ath10k *ar)
>  {
> struct ath10k_ce *ce = ath10k_ce_priv(ar);
>
> -   if (!ar->hw_params.per_ce_irq)

If I'm reading correctly, you're removing the only remaining use of
'per_ce_irq'. Should we kill the field entirely? Or perhaps we should
leave some kind of WARN_ON() (BUG_ON()?) if this function is called
erroneously with per_ce_irq==true? But I suppose this driver is full
of landmines if the CE API is used incorrectly.

> -   return CE_WRAPPER_INTERRUPT_SUMMARY_HOST_MSI_GET(
> -   ce->bus_ops->read32((ar), CE_WRAPPER_BASE_ADDRESS +
> -   CE_WRAPPER_INTERRUPT_SUMMARY_ADDRESS));
> -   else
> -   return ath10k_ce_gen_interrupt_summary(ar);
> +   return CE_WRAPPER_INTERRUPT_SUMMARY_HOST_MSI_GET(
> +   ce->bus_ops->read32((ar), CE_WRAPPER_BASE_ADDRESS +
> +   CE_WRAPPER_INTERRUPT_SUMMARY_ADDRESS));
>  }
>
>  /* Host software's Copy Engine configuration. */

> diff --git a/drivers/net/wireless/ath/ath10k/snoc.h 
> b/drivers/net/wireless/ath/ath10k/snoc.h
> index a3dd06f6ac62..5095d1893681 100644
> --- a/drivers/net/wireless/ath/ath10k/snoc.h
> +++ b/drivers/net/wireless/ath/ath10k/snoc.h
> @@ -78,6 +78,7 @@ struct ath10k_snoc {
> unsigned long flags;
> bool xo_cal_supported;
> u32 xo_cal_data;
> +   DECLARE_BITMAP(pending_ce_irqs, CE_COUNT_MAX);

Do you need to clear this map if the interface goes down or if there's
a firmware crash? Right now, I don't think there's a guarantee that
we'll run through a NAPI poll in those cases, which is the only place
you clear the map, and if the hardware/firmware has been reset, the
state map is probably not valid.

Otherwise, looks OK to me:

Reviewed-by: Brian Norris 


Re: [PATCH] ath10k: Add interrupt summary based CE processing

2020-06-26 Thread Brian Norris
On Fri, Jun 26, 2020 at 2:49 PM Doug Anderson  wrote:
> I should also note that, while I'm not terribly familiar with Kalle's
> workflow, I would have expected to see him in the "To:" list.  I've
> added him, but it's possible he'll need you to repost the patch with
> him in the "To:" list.

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#who_to_address
https://wireless.wiki.kernel.org/en/users/drivers/ath10k/submittingpatches

Patchwork is his patch queue, so I don't think you need to address him directly.

Brian


Re: [PATCH v3 5/8] ath10k: use new taint_firmware_crashed()

2020-06-02 Thread Brian Norris
On Tue, May 26, 2020 at 7:58 AM Luis Chamberlain  wrote:
>
> This makes use of the new taint_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.

Just for the record, the underlying problem you seem to be complaining
about does not appear to be a firmware crash at all. It does happen to
result in a firmware crash report much later on (because when the PCIe
endpoint is this hosed, sooner or later the driver thinks the firmware
is dead), but it's not likely the root cause. More below.

> Using a taint flag allows us to annotate when this happens clearly.
>
> I have run into this situation with this driver with the latest
> firmware as of today, May 21, 2020 using v5.6.0, leaving me at
> a state at which my only option is to reboot. Driver removal and
> addition does not fix the situation. This is reported on kernel.org
> bugzilla korg#207851 [0].

I took a look, and replied there:
https://bugzilla.kernel.org/show_bug.cgi?id=207851#c2

Per the above, it seems more likely you have a PCI or power management
problem, not an ath10k or ath10k-firmware problem.

> But this isn't the first firmware crash reported,
> others have been filed before and none of these bugs have yet been
> addressed [1] [2] [3].  Including my own I see these firmware crash
> reports:

Yes, firmware does crash. Sometimes repeatedly. It also happens to be
closed source, so it's nearly impossible for the average Linux dev to
debug. But FWIW, those 3 all appear to be recoverable -- and then they
crash again a few minutes later. So just as claimed on prior
iterations of this patchset, ath10k is doing fine at recovery [*] --
it's "only" the firmware that's a problem. (And, if a WiFi firmware
doesn't like something in the RF environment...it's totally
understandable that the crash will happen more than once. Of course
that sucks, but it's not unexpected.) Crucially, rebooting won't
really do anything to help these people, AIUI.

Maybe what you really want is to taint the kernel every time a
non-free firmware is loaded ;)

I'd also note that those 3 reports are 3 years old. There have been
many ath10k-firmware updates since then, so it's not necessarily fair
to dig those back up. Also, bugzilla.kernel.org is totally ignored by
many linux-wireless@ folks. But I digress...

All in all, I have no interest in this proposal, for many of the
reasons already mentioned on previous iterations. It's way too coarse
and won't be useful in understanding what's going on in a system, IMO,
at least for ath10k. But it's also easy enough to ignore, so if it
makes somebody happy to claim a taint, then so be it.

Regards,
Brian

[*] Although, at least one of those doesn't appear to be as "clean" of
a recovery attempt as typical. Maybe there are some lurking driver
bugs in there too.


>   * korg#207851 [0]
>   * korg#197013 [1]
>   * korg#201237 [2]
>   * korg#195987 [3]
>
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=207851
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=197013
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=201237
> [3] https://bugzilla.kernel.org/show_bug.cgi?id=195987


Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

2020-06-02 Thread Brian Norris
On Tue, Jun 2, 2020 at 12:40 PM John Stultz  wrote:
> On Tue, Jun 2, 2020 at 12:16 PM Brian Norris  wrote:
> > On Mon, Jun 1, 2020 at 10:25 PM John Stultz  wrote:
> > >
> > > Ever since 5.7-rc1, if we call
> > > ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> > > reboot, resulting in the device getting stuck in the usb crash
> > > debug mode and not coming back up wihthout a hard power off.
> > >
> > > This hack avoids the issue by returning early in
> > > ath10k_qmi_event_server_exit().
> > >
> > > A better solution is very much desired!
> >
> > Any chance you can bisect what caused this? There are a lot of
> > non-ath10k pieces involved in this stuff.
>
> Amit had spent some work on chasing it down to the in kernel qrtr-ns
> work, and reported it here:
>   https://lists.infradead.org/pipermail/ath10k/2020-April/014970.html
>
> But that discussion seemingly stalled out, so I came up with this hack
> to workaround it for us.

If I'm reading it right, then that means we should revert this stuff
from v5.7-rc1:

0c2204a4ad71 net: qrtr: Migrate nameservice to kernel from userspace

At least, until people can resolve the tail end of that thread. New
features (ath11k, etc.) are not a reason to break existing features
(ath10k/wcn3990).

Brian


Re: [PATCH] wireless: ath10k: Return early in ath10k_qmi_event_server_exit() to avoid hard crash on reboot

2020-06-02 Thread Brian Norris
+ Sibi

On Mon, Jun 1, 2020 at 10:25 PM John Stultz  wrote:
>
> Ever since 5.7-rc1, if we call
> ath10k_qmi_remove_msa_permission(), the db845c hard crashes on
> reboot, resulting in the device getting stuck in the usb crash
> debug mode and not coming back up wihthout a hard power off.
>
> This hack avoids the issue by returning early in
> ath10k_qmi_event_server_exit().
>
> A better solution is very much desired!

Any chance you can bisect what caused this? There are a lot of
non-ath10k pieces involved in this stuff.

Brian


Re: [PATCH] Revert "ath: add support for special 0x0 regulatory domain"

2020-06-02 Thread Brian Norris
On Thu, May 28, 2020 at 8:42 AM Adrian Chadd  wrote:
> On Thu, 28 May 2020 at 05:02, Julian Calaby  wrote:
> > On Thu, May 28, 2020 at 5:18 AM Brian Norris  
> > wrote:
> > >
> > > This reverts commit 2dc016599cfa9672a147528ca26d70c3654a5423.
> > >
> > > Users are reporting regressions in regulatory domain detection and
> > > channel availability.
> > >
> > > The problem this was trying to resolve was fixed in firmware anyway:
> >
> > Should we tell the user their firmware needs to be upgraded if it
> > reports this regulatory domain instead of completely dropping support
> > for it?

I'm not really sure how to do that properly in general, and I don't
plan to do so. I'm simply reverting a change that caused people
problems, and noting at the same time that the original problem was
resolved differently.

I don't really have a stake in this patch, because everything I care
about works correctly either way. (And AFAICT, any hardware that is
affected by this patch is somewhat broken.) I'm only posting the
revert as a community service, because Wen couldn't be bothered to do
it himself.

> Also that commit mentioned a 6174 firmware, but what about all the other 
> older chips with a regulatory domain of 0x0 ?

My understanding was that no QCA modules *should* be shipped with a
value of 0 in this field. The instance I'm aware of was more or less a
manufacturing error I think, and we got Qualcomm to patch it over in
software. I don't think people expected anybody else to have shipped
modules with a 0 value, but apparently they did. I don't know what to
do with those, other than just leave well enough alone (i.e., $subject
revert).

> As a side note, I'd /really appreciate/ if ath10k changes were tested on a 
> variety of ath10k hardware and firmware revisions, rather than just either 
> the Rome or embedded radios, rather than also including peregrine, cascade, 
> besra, etc.

Wouldn't we all love it if everybody else tested appropriately. But
Qualcomm folks can't be coordinated (trust me, I've tried), and apart
from things like KernelCI (which so far has no WiFi tests, IIUC),
there's no community testing efforts that don't involve
"${RANDOM_PERSON} boots ${PERSONAL_BOX} and see if it blows up."

This also might not be the best place to admit it, but I'll be up
front: I have no idea what peregrine, cascade, or besra are.

Brian


[PATCH] Revert "ath: add support for special 0x0 regulatory domain"

2020-05-27 Thread Brian Norris
This reverts commit 2dc016599cfa9672a147528ca26d70c3654a5423.

Users are reporting regressions in regulatory domain detection and
channel availability.

The problem this was trying to resolve was fixed in firmware anyway:

QCA6174 hw3.0: sdio-4.4.1: add firmware.bin_WLAN.RMH.4.4.1-00042

https://github.com/kvalo/ath10k-firmware/commit/4d382787f0efa77dba40394e0bc604f8eff82552

Link: https://bbs.archlinux.org/viewtopic.php?id=254535
Link: http://lists.infradead.org/pipermail/ath10k/2020-April/014871.html
Link: http://lists.infradead.org/pipermail/ath10k/2020-May/015152.html
Fixes: 2dc016599cfa ("ath: add support for special 0x0 regulatory domain")
Cc: 
Cc: Wen Gong 
Signed-off-by: Brian Norris 
---
 drivers/net/wireless/ath/regd.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/regd.c b/drivers/net/wireless/ath/regd.c
index bee9110b91f3..20f4f8ea9f89 100644
--- a/drivers/net/wireless/ath/regd.c
+++ b/drivers/net/wireless/ath/regd.c
@@ -666,14 +666,14 @@ ath_regd_init_wiphy(struct ath_regulatory *reg,
 
 /*
  * Some users have reported their EEPROM programmed with
- * 0x8000 or 0x0 set, this is not a supported regulatory
- * domain but since we have more than one user with it we
- * need a solution for them. We default to 0x64, which is
- * the default Atheros world regulatory domain.
+ * 0x8000 set, this is not a supported regulatory domain
+ * but since we have more than one user with it we need
+ * a solution for them. We default to 0x64, which is the
+ * default Atheros world regulatory domain.
  */
 static void ath_regd_sanitize(struct ath_regulatory *reg)
 {
-   if (reg->current_rd != COUNTRY_ERD_FLAG && reg->current_rd != 0)
+   if (reg->current_rd != COUNTRY_ERD_FLAG)
return;
printk(KERN_DEBUG "ath: EEPROM regdomain sanitized\n");
reg->current_rd = 0x64;
-- 
2.27.0.rc0.183.gde8f92d652-goog



Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-21 Thread Brian Norris
On Tue, May 19, 2020 at 10:37 PM Emmanuel Grumbach  wrote:
> So I believe we already have this uevent, it is the devcoredump. All
> we need is to add the unique id.

I think there are a few reasons that devcoredump doesn't satisfy what
either Luis or I want.

1) it can be disabled entirely [1], for good reasons (e.g., think of
non-${CHIP_VENDOR} folks, who can't (and don't want to) do anything
with the opaque dumps provided by closed-source firmware)
2) not all drivers necessarily have a useful dump to provide when
there's a crash; look at the rest of Luis's series to see the kinds of
drivers-with-firmware that are crashing, some of which aren't dumping
anything
3) for those that do support devcoredump, it may be used for purposes
that are not "crashes" -- e.g., some provide debugfs or other knobs to
initiate dumps, for diagnostic or debugging purposes

Brian

[1] devcd_disabled
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/devcoredump.c?h=v5.6#n22


Re: [PATCH] ath9k: release allocated buffer if timed out

2020-05-20 Thread Brian Norris
On Wed, May 13, 2020 at 12:02 PM Brian Norris  wrote:
>
> On Wed, May 13, 2020 at 12:05 AM Kalle Valo  wrote:
> > Actually it's already reverted in -next, nobody just realised that it's
> > a regression from commit 728c1e2a05e4:
> >
> > ced21a4c726b ath9k: Fix use-after-free Read in htc_connect_service
>
> Nice.
>
> > v5.8-rc1 should be the first release having the fix.
>
> So I guess we have to wait until 5.8-rc1 (when this lands in mainline)
> to send this manually to sta...@vger.kernel.org?

For the record, there are more reports of this, if I'm reading them right:

https://bugzilla.kernel.org/show_bug.cgi?id=207797


Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-19 Thread Brian Norris
Hi Luis,

On Tue, May 19, 2020 at 7:02 AM Luis Chamberlain  wrote:
> On Mon, May 18, 2020 at 06:23:33PM -0700, Brian Norris wrote:
> > On Sat, May 16, 2020 at 6:51 AM Johannes Berg  
> > wrote:
> > > In addition, look what we have in iwl_trans_pcie_removal_wk(). If we
> > > detect that the device is really wedged enough that the only way we can
> > > still try to recover is by completely unbinding the driver from it, then
> > > we give userspace a uevent for that. I don't remember exactly how and
> > > where that gets used (ChromeOS) though, but it'd be nice to have that
> > > sort of thing as part of the infrastructure, in a sort of two-level
> > > notification?
> >
> > 
> > We use this on certain devices where we know the underlying hardware
> > has design issues that may lead to device failure
>
> Ah, after reading below I see you meant for iwlwifi.

Sorry, I was replying to Johannes, who I believe had his "we"="Intel"
hat (as iwlwifi maintainer) on, and was pointing at
iwl_trans_pcie_removal_wk().

> If userspace can indeed grow to support this, that would be fantastic.

Well, Chrome OS tailors its user space a bit more to the hardware (and
kernel/drivers in use) than the average distro might. We already do
this (for some values of "this") today. Is that "fantastic" to you? :D

> > -- then when we see
> > this sort of unrecoverable "firmware-death", we remove the
> > device[*]+driver, force-reset the PCI device (SBR), and try to
> > reload/reattach the driver. This all happens by way of a udev rule.
>
> So you've sprikled your own udev event here as part of your kernel delta?

No kernel delta -- the event is there already:
iwl_trans_pcie_removal_wk()
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/intel/iwlwifi/pcie/trans.c?h=v5.6#n2027

And you can see our udev rules and scripts, in all their ugly details
here, if you really care:
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/master/net-wireless/iwlwifi_rescan/files/

> > We
> > also log this sort of stuff (and metrics around it) for bug reports
> > and health statistics, since we really hope to not see this happen
> > often.
>
> Assuming perfection is ideal but silly. So, what infrastructure do you
> use for this sort of issue?

We don't yet log firmware crashes generally, but for all our current
crash reports (including WARN()), they go through this:
https://chromium.googlesource.com/chromiumos/platform2/+/master/crash-reporter/README.md

For example, look for "cut here" in:
https://chromium.googlesource.com/chromiumos/platform2/+/master/crash-reporter/anomaly_detector.cc

For other specific metrics (like counting "EVENT=INACCESSIBLE"), we
use the Chrome UMA system:
https://chromium.googlesource.com/chromiumos/platform2/+/master/metrics/README.md

I don't imagine the "infrastructure" side of any of that would be
useful to you, but maybe the client-side gathering can at least show
you what we do.

> > [*] "We" (user space) don't actually do this...it happens via the
> > 'remove_when_gone' module parameter abomination found in iwlwifi.
>
> BTW is this likely a place on iwlwifi where the firmware likely crashed?

iwl_trans_pcie_removal_wk() is triggered because HW accesses timed out
in a way that is likely due to a dead PCIe endpoint. It's not directly
a firmware crash, although there may be firmware crashes reported
around the same time.

Intel folks can probably give a more nuanced answer, but their
firmware crashes usually land in something like iwl_mvm_nic_error() ->
iwl_mvm_dump_nic_error_log() + iwl_mvm_nic_restart(). If you can make
your proposal less punishing (e.g., removing the "taint" as Johannes
requested), maybe iwlwifi would be another good candidate for
$subject. iwlwifi generally expects to recover seamlessly, but it's
also good to know if you've seen a hundred of these in a row.

> > A uevent
> > would suit us very well I think, although it would be nice if drivers
> > could also supply some small amount of informative text along with it
>
> A follow up to this series was to add a uevent to add_taint(), however
> since a *count* is not considered I think it is correct to seek
> alternatives at this point. The leaner the solution the better though.
>
> Do you have a pointer to what guys use so I can read?

Hopefully the above pointers are useful to you. We don't yet have
firmware-crash parsing implemented, so I don't have pointers for that
piece yet.

> > (e.g., a sort of "reason code", in case we can possibly aggregate
> > certain failure types). We already do this sort of thing for WARN()
> > and fri

Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-18 Thread Brian Norris
On Sat, May 16, 2020 at 6:51 AM Johannes Berg  wrote:
> In addition, look what we have in iwl_trans_pcie_removal_wk(). If we
> detect that the device is really wedged enough that the only way we can
> still try to recover is by completely unbinding the driver from it, then
> we give userspace a uevent for that. I don't remember exactly how and
> where that gets used (ChromeOS) though, but it'd be nice to have that
> sort of thing as part of the infrastructure, in a sort of two-level
> notification?


We use this on certain devices where we know the underlying hardware
has design issues that may lead to device failure -- then when we see
this sort of unrecoverable "firmware-death", we remove the
device[*]+driver, force-reset the PCI device (SBR), and try to
reload/reattach the driver. This all happens by way of a udev rule. We
also log this sort of stuff (and metrics around it) for bug reports
and health statistics, since we really hope to not see this happen
often.

[*] "We" (user space) don't actually do this...it happens via the
'remove_when_gone' module parameter abomination found in iwlwifi. I'd
personally rather see the EVENT=INACESSIBLE stuff on its own, and let
user space deal with when and how to remove and reset the device. But
I digress too much here ;)


I really came to this thread to say that I also love the idea of a
generic mechanism (a la $subject) to report firmware crashes, but I
also have no interest in seeing a taint flag for it. For Chrome OS, I
would readily (as in, we're already looking at more-hacky /
non-generic ways to do this for drivers we care about) process these
kinds of stats as they happen, logging metrics for bug reports and/or
for automated crash statistics, when we see a firmware crash. A uevent
would suit us very well I think, although it would be nice if drivers
could also supply some small amount of informative text along with it
(e.g., a sort of "reason code", in case we can possibly aggregate
certain failure types). We already do this sort of thing for WARN()
and friends (not via uevent, but via log parsing; at least it has nice
"cut here" markers!).

Perhaps devlink (as proposed down-thread) would also fit the bill. I
don't think sysfs alone would fit our needs, as we'd like to process
these things as they happen, not only when a user submits a bug
report.

> Level 1: firmware crashed, but we're recovering, at least mostly, and
> it's more informational

Chrome OS would love to track these things too, since we'd like to see
these minimized, even if they're usually recoverable ;)

> Level 2: device is wedged, going to try to recover by some more forceful
> means (perhaps some devices can be power-cycled? etc.) but (more) state
> would be lost in these cases?

And we'd definitely want to know about these. We already get this for
the iwlwifi case described above, in a non-generic way.

In general, it's probably not that easy to tell the difference between
1 and 2, since even as you and Luis have noted, with the same driver
(and the same driver location), you find the same crashes may or may
not be recoverable. iwlwifi has extracted certain level 2 cases into
iwl_trans_pcie_removal_wk(), but even iwlwifi doesn't know all the
ways in which level 1 crashes actually lead to severe
(non-recoverable) failure.

Regards,
Brian


Re: [PATCH] ath9k: release allocated buffer if timed out

2020-05-13 Thread Brian Norris
On Tue, May 12, 2020 at 8:25 PM Navid Emamdoost
 wrote:
> I found this via static analysis and as a result, did had the inputs
> to test it with (like the way fuzzing works).

Fuzzing is dynamic analysis, so I'm not sure how that fits.

> It may be beneficial if you could point me to any testing
> infrastructure that you use or are aware of for future cases.

syzbot (a real fuzzer -- I believe it uses fake USB devices [1])
caught the error, apparently:
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=ced21a4c726bdc60b1680c050a284b08803bc64c
so you might look at using that too.

Traditionally, "testing your patches" means having hardware that runs
the driver in question when patching said driver. That likely won't
scale for researchers, but then, perhaps it just means you need to be
more clear on how you caught the issue and how you did (or didn't)
test it, so it's easier to reconcile your claims with the testing done
by real users.

If you only did static analysis, then we can be more confident in
reverting. The fuzz-tested revert is an even nicer bonus.

Brian

[1] https://github.com/google/syzkaller/blob/master/docs/syzbot.md#usb-bugs
https://github.com/google/syzkaller/blob/master/docs/linux/external_fuzzing_usb.md


Re: [PATCH] ath9k: release allocated buffer if timed out

2020-05-13 Thread Brian Norris
On Wed, May 13, 2020 at 12:05 AM Kalle Valo  wrote:
> Actually it's already reverted in -next, nobody just realised that it's
> a regression from commit 728c1e2a05e4:
>
> ced21a4c726b ath9k: Fix use-after-free Read in htc_connect_service

Nice.

> v5.8-rc1 should be the first release having the fix.

So I guess we have to wait until 5.8-rc1 (when this lands in mainline)
to send this manually to sta...@vger.kernel.org?

Brian


Re: [PATCH] ath9k: release allocated buffer if timed out

2020-05-12 Thread Brian Norris
On Fri, Sep 6, 2019 at 11:59 AM Navid Emamdoost
 wrote:
>
> In ath9k_wmi_cmd, the allocated network buffer needs to be released
> if timeout happens. Otherwise memory will be leaked.
>
> Signed-off-by: Navid Emamdoost 

I wonder, did you actually test your patches? I ask, because it seems
that all your patches are of the same mechanical variety (produced by
some sort of research project?), and if I look around a bit, I see
several mistakes and regressions noted on your other patches. And
recently, I see someone reporting a 5.4 kernel regression, which looks
a lot like it was caused by this patch:

https://bugzilla.kernel.org/show_bug.cgi?id=207703#c1

I'll propose a revert, if there's no evidence this was actually tested
or otherwise confirmed to fix a real bug.

Brian


Re: [PATCH v4 resend 2] dt-bindings: net: btusb: DT fix s/interrupt-name/interrupt-names/

2020-05-05 Thread Brian Norris
On Tue, May 5, 2020 at 6:36 AM Geert Uytterhoeven
 wrote:
>
> The standard DT property name is "interrupt-names".
>
> Fixes: fd913ef7ce619467 ("Bluetooth: btusb: Add out-of-band wakeup support")
> Signed-off-by: Geert Uytterhoeven 
> Acked-by: Rob Herring 

If it matters:

Reviewed-by: Brian Norris 

We're definitely using the plural ("interrupt-names") not the
singular, so this was just a typo.

Brian


Re: [PATCH] net: rtw88: fix an issue about leak system resources

2020-05-04 Thread Brian Norris
(Markus is clearly not taking the hint, but FYI for everyone else:)

On Mon, May 4, 2020 at 8:00 AM Markus Elfring  wrote:
> > BTW, In the past week, you asked me to change the commit comments in my
> > 6 patches like this one. Let me return to the essence of patch, point
> > out the code problems and better solutions will be more popular.
>
> I would appreciate if various update suggestions would become nicer somehow.

Markus is not really providing any value to the community. Just search
for his recent mail history -- it's all silly commit message
nitpicking of little value. He's been blacklisted by a number of
people already:

https://lkml.kernel.org/lkml/20190919112937.ga3072...@kroah.com/

Some people continue to humor him, but it's mostly just a waste of
their time, as this has been going on for years. Just look at searches
like this, and tell me whether they produce anything useful:

https://lkml.kernel.org/lkml/?q=%22markus+elfring%22=5000

Brian


Re: [PATCH] rtc: report time-retrieval errors when updating alarm

2019-10-21 Thread Brian Norris
On Mon, Oct 21, 2019 at 10:48 AM Alexandre Belloni
 wrote:
> On 21/10/2019 10:20:08-0700, Brian Norris wrote:
> > Hi Alexandre!
> >
> > On Mon, Oct 21, 2019 at 9:11 AM Alexandre Belloni
> >  wrote:
> > > On 21/05/2018 09:42:22-0700, Brian Norris wrote:
> > > > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium).
> > > > When it does, we don't report the error, but instead calculate a
> > > > 1-second alarm based on the potentially-garbage 'tm' (in practice,
> > > > __rtc_read_time() zeroes out the time first, so it's likely to still be
> > > > all 0).
> > > >
> > > > Let's propagate the error instead.
> > > >
> > >
> > > I submitted
> > > https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.bell...@bootlin.com/T/#u
> > > to solve this after looking at all the implication checking the return
> > > value of __rtc_read_time had.
> >
> > Only about a year and a half late, nice!
>
> I know, right? :) The fact is that this is not a common issue or at
> least, I didn't have any report that this was causing real problems in
> the field. So it ended up being one of the longest standing patch in
> patchwork...

I suppose I could have put specific examples in here: the Rockchip
RK3399-based Gru family of Chromebooks
(arch/arm64/boot/dts/rockchip/rk3399-gru-{kevin,bob,scarlet}*.dts) use
the Chrome OS EC-based RTC (drivers/rtc/rtc-cros-ec.c), and the EC
protocol is prone to occasional errors. So we definitely have a
concrete case where this problem can be tickled. I guess I was too
terse in summarizing that as "if the RTC uses an unreliable medium"?

As for the actual symptoms: this was part of a variety of problems
that resulted in seeing interrupt storms from our EC/RTC when running
`hwclock -r`. I believe there were other patches that were more
critical to resolving the worst symptoms, but this error was noticed
along the way. If you care to read more, you can see our downstream
kernel patches here, when we first handled this problem:

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1067442
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1066984
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1069546

Unfortunately, the bug links are private (they were dealing with
partner/factory issues), so you can only glean the implicit
information from the code. And since this was over a year ago, my
memory is a little fuzzy on what exactly the source of the interrupt
storm was...

> >Fortunately we have a decent
> > (albeit time-consuming) process for rebasing our downstream patches in
> > Chrome OS kernels...
> >
>
> Do you need that patch backported to LTS kernels?

Eh, I dunno. If anything that'll just cause us merge troubles (but not
too much) on our Chrome OS kernels, which already carry my patch. But
if there are any non-Chrome-OS users of these Chromebooks (there are)
that are seeing this problem (I'm not sure), they might appreciate it.

By the way, I wonder if your patch actually deserves a "Reported-by".
I suppose I also left off Jeffy as the reporter, but it would be:

Reported-by: Jeffy Chen 
Reported-by: Brian Norris 

Brian


Re: [PATCH] rtc: report time-retrieval errors when updating alarm

2019-10-21 Thread Brian Norris
Hi Alexandre!

On Mon, Oct 21, 2019 at 9:11 AM Alexandre Belloni
 wrote:
> On 21/05/2018 09:42:22-0700, Brian Norris wrote:
> > __rtc_read_time() can fail (e.g., if the RTC uses an unreliable medium).
> > When it does, we don't report the error, but instead calculate a
> > 1-second alarm based on the potentially-garbage 'tm' (in practice,
> > __rtc_read_time() zeroes out the time first, so it's likely to still be
> > all 0).
> >
> > Let's propagate the error instead.
> >
>
> I submitted
> https://lore.kernel.org/linux-rtc/20191021155631.3342-2-alexandre.bell...@bootlin.com/T/#u
> to solve this after looking at all the implication checking the return
> value of __rtc_read_time had.

Only about a year and a half late, nice! Fortunately we have a decent
(albeit time-consuming) process for rebasing our downstream patches in
Chrome OS kernels...

Brian


Re: [PATCH] MAINTAINERS: mtd/ubi/ubifs: Remove inactive maintainers

2019-10-18 Thread Brian Norris
On Thu, Oct 17, 2019 at 7:22 AM Miquel Raynal  wrote:
>
> Despite their substantial personal investment in the MTD/UBI/UBIFS a
> few years back, David, Brian, Artem and Adrian are not actively
> maintaining the subsystem anymore. We warmly salute them for all the
> work they have achieved and will of course still welcome their
> participation and reviews.
>
> That said, Marek retired himself a few weeks ago quoting Harald [1]:
>
> It matters who has which title and when. Should somebody not
> be an active maintainer, make sure he's not listed as such.
>
> For this same reason, let’s trim the maintainers list with the
> actually active ones over the past two years.
>
> [1] http://laforge.gnumonks.org/blog/20180307-mchardy-gpl/
>
> Cc: David Woodhouse 
> Cc: Brian Norris 
> Cc: Artem Bityutskiy 
> Cc: Adrian Hunter 
> Cc: Marek Vasut 
> Cc: Miquel Raynal 
> Cc: Richard Weinberger 
> Cc: Vignesh Raghavendra 
> Cc: Tudor Ambarus 
> Signed-off-by: Miquel Raynal 

I've been meaning to do this for a while, so thanks.

Acked-by: Brian Norris 


Re: [PATCH 3/3] HID: google: whiskers: mask out extra flags in EC event_type

2019-10-07 Thread Brian Norris
On Sat, Oct 5, 2019 at 3:16 AM Ikjoon Jang  wrote:
>
> Whiskers needs to get notifications from EC for getting current base
> attached state. EC sends extra bits in event_type field that receiver
> should mask out.

Notably, this patch was never actually landed upstream:

https://lore.kernel.org/patchwork/patch/1019477/
[PATCH] mfd: cros_ec: Add support for MKBP more event flags

and therefore, this EC_CMD_GET_NEXT_EVENT v2 handling is not yet truly
relevant. (i.e., no upstream-proper users should hit this bug yet.)
But that's also a reminder that we need a patch like this for *every*
cros_ec client driver that's using the event_type field. Other
unpatched drivers include
drivers/media/platform/cros-ec-cec/cros-ec-cec.c,
drivers/platform/chrome/cros_ec_chardev.c, and possibly others.

So I wonder: why don't we
(a) *really* try to upstream the above patch and
(b) fix it so that event_data.event_type *always* masks out
EC_MKBP_EVENT_TYPE_MASK
?

(We could still handle the EC_MKBP_HAS_MORE_EVENTS bit within
cros_ec.c, but there's no need for every other driver to have to know
anything about it.)

Of course, this is another reminder that we should *really* try to get
our cros_ec patches landed properly in upstream, because otherwise we
have a different set of bugs and features landing in various
downstream and mostly-upstream kernels.

Brian

...
> --- a/drivers/hid/hid-google-hammer.c
> +++ b/drivers/hid/hid-google-hammer.c
> @@ -96,8 +96,9 @@ static int cbas_ec_notify(struct notifier_block *nb,
> struct cros_ec_device *ec = _notify;
> unsigned long flags;
> bool base_present;
> +   const u8 event_type = ec->event_data.event_type & 
> EC_MKBP_EVENT_TYPE_MASK;
>
> -   if (ec->event_data.event_type == EC_MKBP_EVENT_SWITCH) {
> +   if (event_type == EC_MKBP_EVENT_SWITCH) {
> base_present = cbas_parse_base_state(
> >event_data.data.switches);
> dev_dbg(cbas_ec.dev,


Re: [PATCH] firmware: google: increment VPD key_len properly

2019-09-30 Thread Brian Norris
On Mon, Sep 30, 2019 at 2:45 PM Brian Norris  wrote:
> Fixes: 4b708b7b1a2c ("firmware: google: check if size is valid when decoding 
> VPD data")
> Cc: 

Perhaps I should have modified the subject to note the urgency (e.g.,
[PATCH 5.4]). The above regression was recently shipped to v4.14.146
and v4.19.75.

Brian


[PATCH] firmware: google: increment VPD key_len properly

2019-09-30 Thread Brian Norris
Commit 4b708b7b1a2c ("firmware: google: check if size is valid when
decoding VPD data") adds length checks, but the new vpd_decode_entry()
function botched the logic -- it adds the key length twice, instead of
adding the key and value lengths separately.

On my local system, this means vpd.c's vpd_section_create_attribs() hits
an error case after the first attribute it parses, since it's no longer
looking at the correct offset. With this patch, I'm back to seeing all
the correct attributes in /sys/firmware/vpd/...

Fixes: 4b708b7b1a2c ("firmware: google: check if size is valid when decoding 
VPD data")
Cc: 
Cc: Hung-Te Lin 
Cc: Guenter Roeck 
Cc: Stephen Boyd 
Signed-off-by: Brian Norris 
---
 drivers/firmware/google/vpd_decode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/google/vpd_decode.c 
b/drivers/firmware/google/vpd_decode.c
index dda525c0f968..5c6f2a74f104 100644
--- a/drivers/firmware/google/vpd_decode.c
+++ b/drivers/firmware/google/vpd_decode.c
@@ -52,7 +52,7 @@ static int vpd_decode_entry(const u32 max_len, const u8 
*input_buf,
if (max_len - consumed < *entry_len)
return VPD_FAIL;
 
-   consumed += decoded_len;
+   consumed += *entry_len;
*_consumed = consumed;
return VPD_OK;
 }
-- 
2.23.0.444.g18eeb5a265-goog



Re: [PATCH] nl80211: add NL80211_CMD_UPDATE_FT_IES to supported commands

2019-08-22 Thread Brian Norris
On Thu, Aug 22, 2019 at 10:48:06AM -0700, Matthew Wang wrote:
> Add NL80211_CMD_UPDATE_FT_IES to supported commands. In mac80211 drivers,
> this can be implemented via existing NL80211_CMD_AUTHENTICATE and
> NL80211_ATTR_IE, but non-mac80211 drivers have a separate command for
> this. A driver supports FT if it either is mac80211 or supports this
> command.
> 
> Signed-off-by: Matthew Wang 

FWIW:

Reviewed-by: Brian Norris 

> Change-Id: I93e3d09a6d949466d1aea48bff2c3ad862edccc6

Oops :)


Re: [PATCH v3 1/2] rtw88: pci: Rearrange the memory usage for skb in RX ISR

2019-08-15 Thread Brian Norris
Hi all,

I realize this already is merged, and it had some previous review
comments that led to the decisions in this patch, but I'd still like
to ask here, where I think I'm reaching the relevant parties:

On Wed, Jul 10, 2019 at 1:43 AM Jian-Hong Pan  wrote:
...
> This patch allocates a new, data-sized skb first in RX ISR. After
> copying the data in, we pass it to the upper layers. However, if skb
> allocation fails, we effectively drop the frame. In both cases, the
> original, full size ring skb is reused.
>
> In addition, by fixing the kernel crash, the RX routine should now
> generally behave better under low memory conditions.
>
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
> Signed-off-by: Jian-Hong Pan 
> Cc: 
> ---
> v2:
>  - Allocate new data-sized skb and put data into it, then pass it to
>mac80211. Reuse the original skb in RX ring by DMA sync.

Is it really wise to force an extra memcpy() for *every* delivery?
Isn't there some other strategy that could be used to properly handle
low-memory scenarios while still passing the original buffer up to
higher layers most of the time? Or is it really so bad to keep
re-allocating RTK_PCI_RX_BUF_SIZE (>8KB) of contiguous memory, to
re-fill the RX ring? And if that is so bad, can we reduce the
requirement for contiguous memory instead? (e.g., keep with smaller
buffers, and perform aggregation / scatter-gather only for frames that
are really larger?)

Anyway, that's mostly a long-term thought, as this patch is good for
fixing the important memory errors, even if it's not necessarily the
ideal solution.

Regards,
Brian


Re: Realtek r8822be wireless card fails to work with new rtw88 kernel module

2019-08-06 Thread Brian Norris
+ yhchuang

On Tue, Aug 6, 2019 at 7:32 AM 고준  wrote:
>
> Hello,
>
> I recently reported a bug to Ubuntu regarding a regression in wireless
> driver support for the Realtek r8822be wireless chipset. The issue
> link on launchpad is:
>
> https://bugs.launchpad.net/bugs/1838133
>
> After Canonical developers triaged the bug they determined that the
> problem lies upstream, and instructed me to send mails to the relevant
> kernel module maintainers at Realtek and to the general kernel.org
> mailing list.
>
> I built kernel 5.3.0-rc1+ with the latest realtek drivers from
> wireless-drivers-next but my Realtek r8822be doesn't work with
> rtw88/rtwpci kernel modules.
>
> Please let me know if there is any additional information I can
> provide that would help in debugging this issue.

Any chance this would help you?

https://patchwork.kernel.org/patch/11065631/

Somebody else was complaining about 8822be regressions that were fixed
with that.

Brian


Re: [PATCH] driver core: platform: return -ENXIO for missing GpioInt

2019-08-06 Thread Brian Norris
On Mon, Jul 29, 2019 at 1:50 PM Brian Norris  wrote:
> Side note: it might have helped alleviate some of this pain if there
> were email notifications to the mailing list when a patch gets applied.
> I didn't realize (and I'm not sure if Enrico did) that v2 was already
> merged by the time I noted its mistakes. If I had known, I would have
> suggested a follow-up patch, not a v3.

I guess I'll be the bot this time: 'twas applied by Greg on Tuesday,
July 30 UTC-07:00.

Thanks,
Brian


[PATCH] Revert "mwifiex: fix system hang problem after resume"

2019-08-05 Thread Brian Norris
This reverts commit 437322ea2a36d112e20aa7282c869bf924b3a836.

This above-mentioned "fix" does not actually do anything to prevent a
race condition. It simply papers over it so that the issue doesn't
appear.

If this is a real problem, it should be explained better than the above
commit does, and an alternative, non-racy solution should be found.

For further reason to revert this: there's no reason we can't try
resetting the card when it's *actually* stuck in host-sleep mode. So
instead, this is unnecessarily creating scenarios where we can't recover
Wifi (and in fact, I'm fielding reports of Chromebooks that can't
recover after the aforementioned commit).

Note that this was proposed in 2017 and Ack'ed then, but due to my
marking as RFC, it never went anywhere:

https://patchwork.kernel.org/patch/9657277/
[RFC] Revert "mwifiex: fix system hang problem after resume"

Cc: Amitkumar Karwar 
Signed-off-by: Brian Norris 
Reviewed-by: Dmitry Torokhov 
Acked-by: Amitkumar Karwar 
Tested-by: Matthias Kaehlcke 
---
 drivers/net/wireless/marvell/mwifiex/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/init.c 
b/drivers/net/wireless/marvell/mwifiex/init.c
index 6c0e52eb8794..1aa93e7e9835 100644
--- a/drivers/net/wireless/marvell/mwifiex/init.c
+++ b/drivers/net/wireless/marvell/mwifiex/init.c
@@ -59,7 +59,7 @@ static void wakeup_timer_fn(struct timer_list *t)
adapter->hw_status = MWIFIEX_HW_STATUS_RESET;
mwifiex_cancel_all_pending_cmd(adapter);
 
-   if (adapter->if_ops.card_reset && !adapter->hs_activated)
+   if (adapter->if_ops.card_reset)
adapter->if_ops.card_reset(adapter);
 }
 
-- 
2.22.0.770.g0f2c4a37fd-goog



Re: [RFC PATCH] Revert "mwifiex: fix system hang problem after resume"

2019-08-02 Thread Brian Norris
On Fri, Aug 2, 2019 at 6:55 PM Kalle Valo  wrote:
>
> Brian Norris  writes:
>
> > + Doug, Matthias, who are seeing problems (or, failure to try to
> > recover, as predicted below)
> > + Amit's new email
> > + new maintainers
> >
> > Perhaps it's my fault for marking this RFC. But I changed the status
> > back to "New" in Patchwork, in case that helps:
>
> But I still see it marked as RFC. So the patch in question is:
>
> https://patchwork.kernel.org/patch/9657277/

Oops, I didn't hit the "Update" button :(

I changed it now, but I'll change it back again.

> Changing the patchwork state to RFC means that it's dropped and out of
> my radar. Also, if I see "RFC" in the subject I assume that's a patch
> which I should not apply by default.

Ack. Well, there were some "RFCs" I sent recently that you *did*
apply, so I didn't really know what happens normally.

> > On Fri, Mar 31, 2017 at 01:21:36PM -0700, Brian Norris wrote:
...
> > FWIW, I got an Acked-by from Amit when he was still at Marvell. And
> > another Reviewed-by from Dmitry. This still applies. Should I resend?
> > (I'll do that if I don't hear a response within a few days.)
>
> This patch is from 2017 so better to resend, and without RFC markings.

Yep, will do.

Brian


Re: [RFC PATCH] Revert "mwifiex: fix system hang problem after resume"

2019-08-02 Thread Brian Norris
+ Doug, Matthias, who are seeing problems (or, failure to try to
recover, as predicted below)
+ Amit's new email
+ new maintainers

Perhaps it's my fault for marking this RFC. But I changed the status
back to "New" in Patchwork, in case that helps:

On Fri, Mar 31, 2017 at 01:21:36PM -0700, Brian Norris wrote:
> This reverts commit 437322ea2a36d112e20aa7282c869bf924b3a836.
> 
> This above-mentioned "fix" does not actually do anything to prevent a
> race condition. It simply papers over it so that the issue doesn't
> appear.
> 
> If this is a real problem, it should be explained better than the above
> commit does, and an alternative, non-racy solution should be found.
> 
> For further reason to revert this: there's ot reason we can't try

s/ot/no/

...oops.

> resetting the card when it's *actually* stuck in host-sleep mode. So
> instead, this is unnecessarily creating scenarios where we can't recover
> Wifi.
> 
> Cc: Amitkumar Karwar 
> Signed-off-by: Brian Norris 
> ---
> Amit, please take a look. AIUI, your "fix" is wrong, and quite racy. If you
> still think it's needed, can you please propose an alternative? Or at least
> explain more why this is needed? Thanks.

FWIW, I got an Acked-by from Amit when he was still at Marvell. And
another Reviewed-by from Dmitry. This still applies. Should I resend?
(I'll do that if I don't hear a response within a few days.)

Thanks,
Brian

>  drivers/net/wireless/marvell/mwifiex/init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/init.c 
> b/drivers/net/wireless/marvell/mwifiex/init.c
> index 756948385b60..0dab77b526de 100644
> --- a/drivers/net/wireless/marvell/mwifiex/init.c
> +++ b/drivers/net/wireless/marvell/mwifiex/init.c
> @@ -60,7 +60,7 @@ static void wakeup_timer_fn(unsigned long data)
>   adapter->hw_status = MWIFIEX_HW_STATUS_RESET;
>   mwifiex_cancel_all_pending_cmd(adapter);
>  
> - if (adapter->if_ops.card_reset && !adapter->hs_activated)
> + if (adapter->if_ops.card_reset)
>   adapter->if_ops.card_reset(adapter);
>  }
>  
> -- 
> 2.12.2.564.g063fe858b8-goog
> 


Re: [PATCH] driver core: platform: return -ENXIO for missing GpioInt

2019-07-29 Thread Brian Norris
On Mon, Jul 29, 2019 at 1:54 PM Nathan Chancellor
 wrote:
> On Mon, Jul 29, 2019 at 01:49:54PM -0700, Brian Norris wrote:
> > Side note: it might have helped alleviate some of this pain if there
> > were email notifications to the mailing list when a patch gets applied.
> > I didn't realize (and I'm not sure if Enrico did) that v2 was already
> > merged by the time I noted its mistakes. If I had known, I would have
> > suggested a follow-up patch, not a v3.
>
> I've found this to be fairly reliable for getting notified when
> something gets applied if it is a tree that shows up in -next.
>
> https://www.kernel.org/get-notifications-for-your-patches.html

I didn't send the original patch. I was only debugging/reviewing
someone else's patch, and jumped in after its authorship (as it hit
issues in our own CI system). So it was more of a "drive-by" scenario,
and it doesn't sound like that page addresses this situation.

Brian


[PATCH] driver core: platform: return -ENXIO for missing GpioInt

2019-07-29 Thread Brian Norris
Commit daaef255dc96 ("driver: platform: Support parsing GpioInt 0 in
platform_get_irq()") broke the Embedded Controller driver on most LPC
Chromebooks (i.e., most x86 Chromebooks), because cros_ec_lpc expects
platform_get_irq() to return -ENXIO for non-existent IRQs.
Unfortunately, acpi_dev_gpio_irq_get() doesn't follow this convention
and returns -ENOENT instead. So we get this error from cros_ec_lpc:

   couldn't retrieve IRQ number (-2)

I see a variety of drivers that treat -ENXIO specially, so rather than
fix all of them, let's fix up the API to restore its previous behavior.

I reported this on v2 of this patch:

https://lore.kernel.org/lkml/20190220180538.ga42...@google.com/

but apparently the patch had already been merged before v3 got sent out:

https://lore.kernel.org/lkml/20190221193429.161300-1-egran...@chromium.org/

and the result is that the bug landed and remains unfixed.

I differ from the v3 patch by:
 * allowing for ret==0, even though acpi_dev_gpio_irq_get() specifically
   documents (and enforces) that 0 is not a valid return value (noted on
   the v3 review)
 * adding a small comment

Reported-by: Brian Norris 
Reported-by: Salvatore Bellizzi 
Cc: Enrico Granata 
Cc: 
Fixes: daaef255dc96 ("driver: platform: Support parsing GpioInt 0 in 
platform_get_irq()")
Signed-off-by: Brian Norris 
---
Side note: it might have helped alleviate some of this pain if there
were email notifications to the mailing list when a patch gets applied.
I didn't realize (and I'm not sure if Enrico did) that v2 was already
merged by the time I noted its mistakes. If I had known, I would have
suggested a follow-up patch, not a v3.

I know some maintainers' "tip bots" do this, but not all apparently.

 drivers/base/platform.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 506a0175a5a7..ec974ba9c0c4 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -157,8 +157,13 @@ int platform_get_irq(struct platform_device *dev, unsigned 
int num)
 * the device will only expose one IRQ, and this fallback
 * allows a common code path across either kind of resource.
 */
-   if (num == 0 && has_acpi_companion(>dev))
-   return acpi_dev_gpio_irq_get(ACPI_COMPANION(>dev), num);
+   if (num == 0 && has_acpi_companion(>dev)) {
+   int ret = acpi_dev_gpio_irq_get(ACPI_COMPANION(>dev), num);
+
+   /* Our callers expect -ENXIO for missing IRQs. */
+   if (ret >= 0 || ret == -EPROBE_DEFER)
+   return ret;
+   }
 
return -ENXIO;
 #endif
-- 
2.22.0.709.g102302147b-goog



Re: [PATCH 5.3] mwifiex: fix 802.11n/WPA detection

2019-07-29 Thread Brian Norris
On Mon, Jul 29, 2019 at 9:01 AM Takashi Iwai  wrote:
> This isn't seen in linux-next yet.

Apparently not.

> Still pending review?

I guess? Probably mostly pending maintainer attention.

Also, Johannes already had noticed (and privately messaged me): this
patch took a while to show up on the linux-wireless Patchwork
instance. So the first review (from Guenter Roeck) and my extra reply
noting the -stable regression didn't make it to Patchwork:

https://patchwork.kernel.org/patch/11057585/

> In anyway,
>   Reviewed-by: Takashi Iwai 

Thanks. Hopefully Kalle can pick it up.

Brian


[PATCH] mac80211: don't WARN on short WMM parameters from AP

2019-07-26 Thread Brian Norris
In a very similar spirit to commit c470bdc1aaf3 ("mac80211: don't WARN
on bad WMM parameters from buggy APs"), an AP may not transmit a
fully-formed WMM IE. For example, it may miss or repeat an Access
Category. The above loop won't catch that and will instead leave one of
the four ACs zeroed out. This triggers the following warning in
drv_conf_tx()

  wlan0: invalid CW_min/CW_max: 0/0

and it may leave one of the hardware queues unconfigured. If we detect
such a case, let's just print a warning and fall back to the defaults.

Tested with a hacked version of hostapd, intentionally corrupting the
IEs in hostapd_eid_wmm().

Signed-off-by: Brian Norris 
---
 net/mac80211/mlme.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index a99ad0325309..4c888dc9bd81 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -2042,6 +2042,16 @@ ieee80211_sta_wmm_params(struct ieee80211_local *local,
ieee80211_regulatory_limit_wmm_params(sdata, [ac], ac);
}
 
+   /* WMM specification requires all 4 ACIs. */
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   if (params[ac].cw_min == 0) {
+   sdata_info(sdata,
+  "AP has invalid WMM params (missing AC %d), 
using defaults\n",
+  ac);
+   return false;
+   }
+   }
+
for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
mlme_dbg(sdata,
 "WMM AC=%d acm=%d aifs=%d cWmin=%d cWmax=%d txop=%d 
uapsd=%d, downgraded=%d\n",
-- 
2.22.0.709.g102302147b-goog



[PATCH 5.3] mwifiex: fix 802.11n/WPA detection

2019-07-24 Thread Brian Norris
Commit 63d7ef36103d ("mwifiex: Don't abort on small, spec-compliant
vendor IEs") adjusted the ieee_types_vendor_header struct, which
inadvertently messed up the offsets used in
mwifiex_is_wpa_oui_present(). Add that offset back in, mirroring
mwifiex_is_rsn_oui_present().

As it stands, commit 63d7ef36103d breaks compatibility with WPA (not
WPA2) 802.11n networks, since we hit the "info: Disable 11n if AES is
not supported by AP" case in mwifiex_is_network_compatible().

Fixes: 63d7ef36103d ("mwifiex: Don't abort on small, spec-compliant vendor IEs")
Cc: 
Signed-off-by: Brian Norris 
---
 drivers/net/wireless/marvell/mwifiex/main.h | 1 +
 drivers/net/wireless/marvell/mwifiex/scan.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.h 
b/drivers/net/wireless/marvell/mwifiex/main.h
index 3e442c7f7882..095837fba300 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -124,6 +124,7 @@ enum {
 
 #define MWIFIEX_MAX_TOTAL_SCAN_TIME(MWIFIEX_TIMER_10S - MWIFIEX_TIMER_1S)
 
+#define WPA_GTK_OUI_OFFSET 2
 #define RSN_GTK_OUI_OFFSET 2
 
 #define MWIFIEX_OUI_NOT_PRESENT0
diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c 
b/drivers/net/wireless/marvell/mwifiex/scan.c
index 0d6d41727037..21dda385f6c6 100644
--- a/drivers/net/wireless/marvell/mwifiex/scan.c
+++ b/drivers/net/wireless/marvell/mwifiex/scan.c
@@ -181,7 +181,8 @@ mwifiex_is_wpa_oui_present(struct mwifiex_bssdescriptor 
*bss_desc, u32 cipher)
u8 ret = MWIFIEX_OUI_NOT_PRESENT;
 
if (has_vendor_hdr(bss_desc->bcn_wpa_ie, WLAN_EID_VENDOR_SPECIFIC)) {
-   iebody = (struct ie_body *) bss_desc->bcn_wpa_ie->data;
+   iebody = (struct ie_body *)((u8 *)bss_desc->bcn_wpa_ie->data +
+   WPA_GTK_OUI_OFFSET);
oui = _wpa_oui[cipher][0];
ret = mwifiex_search_oui_in_ie(iebody, oui);
if (ret)
-- 
2.22.0.657.g960e92d24f-goog



Re: [PATCH 5.3] mwifiex: fix 802.11n/WPA detection

2019-07-24 Thread Brian Norris
On Wed, Jul 24, 2019 at 12:46:34PM -0700, Brian Norris wrote:
> Fixes: 63d7ef36103d ("mwifiex: Don't abort on small, spec-compliant vendor 
> IEs")
> Cc: 
> Signed-off-by: Brian Norris 

To add to this: unfortunately, the above went out to -stable earlier
this week. So a prompt merging would be appreciated, pending review of
course.

Sorry for the breakage.

/me goes to add another (embarrasingly missing) test case to our WiFi
test suite.

Brian


Re: [PATCH 2/2] mwifiex: Make use of the new sdio_trigger_replug() API to reset

2019-07-19 Thread Brian Norris
Hi Doug,

On Tue, Jul 16, 2019 at 09:42:09AM -0700, Doug Anderson wrote:
> As described in the patch ("mmc: core: Add sdio_trigger_replug()
> API"), the current mwifiex_sdio_card_reset() is broken in the cases
> where we're running Bluetooth on a second SDIO func on the same card
> as WiFi.  The problem goes away if we just use the
> sdio_trigger_replug() API call.

I'm unfortunately not a good evaluator of SDIO/MMC stuff, so I'll mostly
leave that to others and assume that the "replug" description is pretty
much all I need to know.

> NOTE: Even though with this new solution there is less of a reason to
> do our work from a workqueue (the unplug / plug mechanism we're using
> is possible for a human to perform at any time so the stack is
> supposed to handle it without it needing to be called from a special
> context), we still need a workqueue because the Marvell reset function
> could called from a context where sleeping is invalid and thus we
> can't claim the host.  One example is Marvell's wakeup_timer_fn().
> 
> Signed-off-by: Douglas Anderson 
> ---
> 
>  drivers/net/wireless/marvell/mwifiex/sdio.c | 14 +++---
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c 
> b/drivers/net/wireless/marvell/mwifiex/sdio.c
> index 24c041dad9f6..f77ad2615f08 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sdio.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
> @@ -2218,14 +2218,6 @@ static void mwifiex_sdio_card_reset_work(struct 
> mwifiex_adapter *adapter)
>  {
>   struct sdio_mmc_card *card = adapter->card;
>   struct sdio_func *func = card->func;
> - int ret;
> -
> - mwifiex_shutdown_sw(adapter);

I'm very mildly unhappy to see this driver diverge from the PCIe one
again, but the only way it makes sense to do things the same is if there
is such thing as a "function level reset" for SDIO (i.e., doesn't also
kill the Bluetooth function). But it appears we don't really have such a
thing.

> -
> - /* power cycle the adapter */
> - sdio_claim_host(func);
> - mmc_hw_reset(func->card->host);
> - sdio_release_host(func);
>  
>   /* Previous save_adapter won't be valid after this. We will cancel

^^^ FTR, the "save_adapter" note was already obsolete as of

  cc75c577806a mwifiex: get rid of global save_adapter and sdio_work

but the clear_bit() calls were (before this patch) still useful for
other reasons.

>* pending work requests.
> @@ -2233,9 +2225,9 @@ static void mwifiex_sdio_card_reset_work(struct 
> mwifiex_adapter *adapter)
>   clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, >work_flags);
>   clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, >work_flags);

But now, I don't think you need these clear_bit() calls any more --
you're totally destroying the card and its workqueue on remove(). (And
anyway, MWIFIEX_IFACE_WORK_CARD_RESET was just cleared by your caller.)

>  
> - ret = mwifiex_reinit_sw(adapter);
> - if (ret)
> - dev_err(>dev, "reinit failed: %d\n", ret);
> + sdio_claim_host(func);
> + sdio_trigger_replug(func);
> + sdio_release_host(func);

And...we're approximately back to where we were 4 years ago :)

commit b4336a282db86b298b70563f8ed51782b36b772c
Author: Andreas Fenkart 
Date:   Thu Jul 16 18:50:01 2015 +0200

mwifiex: sdio: reset adapter using mmc_hw_reset

Anyway, assuming the "function reset" thing isn't workable, and you drop
the clear_bit() stuff, I think this is fine:

Reviewed-by: Brian Norris 

>  }
>  
>  /* This function read/write firmware */
> -- 
> 2.22.0.510.g264f2c817a-goog
> 


Re: [PATCH] mac80211: don't warn about CW params when not using them

2019-07-18 Thread Brian Norris
On Thu, Jul 18, 2019 at 12:45 AM Stanislaw Gruszka  wrote:
> Fix looks fine for me. However I think rtw88 should implement
> drv_conf_tx() because parameters can be different on different
> network setups and maybe more important WMM/AC parameters become
> quite recently part of ETSI regulatory.

Ack. I just figured we should stay consistent with the WARNINGs (and
we both noticed this one on earlier patch reviews). I don't know about
you, but for me, the whole wireless stack is so full of WARNINGs that
my crash reporter system separately classifies net/wireless and
drivers/net/wireless/ from the rest of the kernel when categorizing
automated problem reports. (And...most developers then ignore them.)

But I digress ;)

Brian


[PATCH] mac80211: don't warn about CW params when not using them

2019-07-17 Thread Brian Norris
ieee80211_set_wmm_default() normally sets up the initial CW min/max for
each queue, except that it skips doing this if the driver doesn't
support ->conf_tx. We still end up calling drv_conf_tx() in some cases
(e.g., ieee80211_reconfig()), which also still won't do anything
useful...except it complains here about the invalid CW parameters.

Let's just skip the WARN if we weren't going to do anything useful with
the parameters.

Signed-off-by: Brian Norris 
---
Noticed because rtw88 does not currently implement .conf_tx()

I think there are several ways to slice this one. I picked one fix,
which may not be the best one.

 net/mac80211/driver-ops.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/mac80211/driver-ops.c b/net/mac80211/driver-ops.c
index acd4afb4944b..c9a8a2433e8a 100644
--- a/net/mac80211/driver-ops.c
+++ b/net/mac80211/driver-ops.c
@@ -187,11 +187,16 @@ int drv_conf_tx(struct ieee80211_local *local,
if (!check_sdata_in_driver(sdata))
return -EIO;
 
-   if (WARN_ONCE(params->cw_min == 0 ||
- params->cw_min > params->cw_max,
- "%s: invalid CW_min/CW_max: %d/%d\n",
- sdata->name, params->cw_min, params->cw_max))
+   if (params->cw_min == 0 || params->cw_min > params->cw_max) {
+   /*
+* If we can't configure hardware anyway, don't warn. We may
+* never have initialized the CW parameters.
+*/
+   WARN_ONCE(local->ops->conf_tx,
+ "%s: invalid CW_min/CW_max: %d/%d\n",
+ sdata->name, params->cw_min, params->cw_max);
return -EINVAL;
+   }
 
trace_drv_conf_tx(local, sdata, ac, params);
if (local->ops->conf_tx)
-- 
2.22.0.510.g264f2c817a-goog



Re: [RFC PATCH] bug: always show source-tree-relative paths in WARN()/BUG()

2019-07-12 Thread Brian Norris
On Thu, Jul 11, 2019 at 6:50 PM Masahiro Yamada
 wrote:
> GCC 8 added this flag.
> So, it will be eventually all solved in the GCC world.

Ack.

> Clang has not supported it yet...

That's what it appeared like. I've bugged our Clang-loving toolchain
folks to see if we can get parity.

> Trimming absolute path at run-time
> is no help for reducing the kernel image.

Sure, but that's not my stated goal. It would indeed be nicer though.
I guess if no one else speaks up with a favorable word toward my RFC,
I'll just see what I can do on the toolchain side.

Thanks for the help,
Brian


Re: [RFC PATCH] bug: always show source-tree-relative paths in WARN()/BUG()

2019-07-11 Thread Brian Norris
On Thu, Jul 11, 2019 at 6:14 PM Masahiro Yamada
 wrote:
> BTW, did you see this?
>
> commit a73619a845d5625079cc1b3b820f44c899618388
> Author: Masahiro Yamada 
> Date:   Fri Mar 30 13:15:26 2018 +0900
>
> kbuild: use -fmacro-prefix-map to make __FILE__ a relative path

Oh, wow, no I did not. If my reading is correct, that's GCC only? I've
been using various combinations of newer (5.2) and older (4.14.y --
didn't have that patch) kernels, older GCC (doesn't have that feature
AFAICT), and newer Clang (doesn't appear to have that feature). So I'm
not totally sure if I ever actually tried a combo that *could* make
use of that. But I may give it another shot.

In the event that this is GCC-specific...I don't suppose I could
convince anybody to expend any effort (e.g., taking a patch like mine)
to solve it for the non-GCC world?

Thanks for the tip,
Brian


[RFC PATCH] bug: always show source-tree-relative paths in WARN()/BUG()

2019-07-11 Thread Brian Norris
When building out-of-tree (e.g., 'make O=...'), __FILE__ ends up being
an absolute path, and so WARN() and BUG() end up putting path names from
the build system into the log text. For example:

  # echo BUG > /sys/kernel/debug/provoke-crash/DIRECT
  ...
  kernel BUG at /mnt/host/source/[...]/drivers/misc/lkdtm/bugs.c:71!

Not only is this excessively verbose, it also adds extra noise into
tools that might parse this output. (For example, if builder paths
change across versions, we suddenly get a "new" crash signature.)

All in all, this looks much better as:

  kernel BUG at drivers/misc/lkdtm/bugs.c:71!

It appears the Kbuild system is fairly entrenched in using
$(KBUILD_OUTPUT) for the ${CWD}, which necessarily means that the
preprocessor will get handed an absolute path. It seems the only
solution then, is to do some sort of post-processing on __FILE__.

It so happens that lib/dynamic_debug.c already solves this sort of
problem, so I steal its solution for use in panic/warn/bug code as well.

Signed-off-by: Brian Norris 
---
I'd be happy to entertain better solutions to this problem, but so far,
I haven't been creative enough to come up with one.

I'm also unsure of who best to address this to. If anyone has better
pointers, I'm all ears.

 include/linux/bug.h |  2 ++
 kernel/panic.c  | 21 +++--
 lib/bug.c   |  3 ++-
 lib/dynamic_debug.c | 18 --
 4 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/include/linux/bug.h b/include/linux/bug.h
index fe5916550da8..6ab59e53801d 100644
--- a/include/linux/bug.h
+++ b/include/linux/bug.h
@@ -76,4 +76,6 @@ static inline __must_check bool check_data_corruption(bool v) 
{ return v; }
corruption;  \
}))
 
+const char *trim_filepath_prefix(const char *path);
+
 #endif /* _LINUX_BUG_H */
diff --git a/kernel/panic.c b/kernel/panic.c
index 4d9f55bf7d38..0bed3101f049 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -546,6 +546,23 @@ struct warn_args {
va_list args;
 };
 
+/**
+ * trim_filepath_prefix - retrieve source-root relative path from a __FILE__
+ * @path: a __FILE__-like path argument.
+ * Return: path relative to source root.
+ */
+const char *trim_filepath_prefix(const char *path)
+{
+   int skip = strlen(__FILE__) - strlen("kernel/panic.c");
+
+   BUILD_BUG_ON(strlen(__FILE__) < strlen("kernel/panic.c"));
+
+   if (strncmp(path, __FILE__, skip))
+   skip = 0; /* prefix mismatch, don't skip */
+
+   return path + skip;
+}
+
 void __warn(const char *file, int line, void *caller, unsigned taint,
struct pt_regs *regs, struct warn_args *args)
 {
@@ -556,8 +573,8 @@ void __warn(const char *file, int line, void *caller, 
unsigned taint,
 
if (file)
pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS\n",
-   raw_smp_processor_id(), current->pid, file, line,
-   caller);
+   raw_smp_processor_id(), current->pid,
+   trim_filepath_prefix(file), line, caller);
else
pr_warn("WARNING: CPU: %d PID: %d at %pS\n",
raw_smp_processor_id(), current->pid, caller);
diff --git a/lib/bug.c b/lib/bug.c
index 1077366f496b..2aa91d330451 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -191,7 +191,8 @@ enum bug_trap_type report_bug(unsigned long bugaddr, struct 
pt_regs *regs)
printk(KERN_DEFAULT CUT_HERE);
 
if (file)
-   pr_crit("kernel BUG at %s:%u!\n", file, line);
+   pr_crit("kernel BUG at %s:%u!\n", trim_filepath_prefix(file),
+   line);
else
pr_crit("Kernel BUG at %pB [verbose debug info unavailable]\n",
(void *)bugaddr);
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 8a16c2d498e9..0896f067ba17 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -13,6 +13,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ":%s: " fmt, __func__
 
+#include 
 #include 
 #include 
 #include 
@@ -67,17 +68,6 @@ static LIST_HEAD(ddebug_tables);
 static int verbose;
 module_param(verbose, int, 0644);
 
-/* Return the path relative to source root */
-static inline const char *trim_prefix(const char *path)
-{
-   int skip = strlen(__FILE__) - strlen("lib/dynamic_debug.c");
-
-   if (strncmp(path, __FILE__, skip))
-   skip = 0; /* prefix mismatch, don't skip */
-
-   return path + skip;
-}
-
 static struct { unsigned flag:8; char opt_char; } opt_array[] = {
{ _DPRINTK_FLAGS_PRINT, 'p' },
{ _DPRINTK_FLAGS_INCL_MODNAME, 'm' },
@@ -164,7 +154,7 @@ static int ddebug_change(const struct ddebug_query *query,
!match_wildcard(query->filename,

Re: [PATCH AUTOSEL 4.19 14/60] mwifiex: Abort at too short BSS descriptor element

2019-07-10 Thread Brian Norris
On Wed, Jul 10, 2019 at 7:51 AM Sasha Levin  wrote:
> I see that 63d7ef36103d didn't make it into 5.2, so I'll just drop this
> for now.

Yeah, I think it's stuck at net/master. Presumably it'll get into
5.3-rc somewhere.

Brian


Re: [PATCH AUTOSEL 4.19 14/60] mwifiex: Abort at too short BSS descriptor element

2019-06-28 Thread Brian Norris
On Wed, Jun 26, 2019 at 5:49 PM Sasha Levin  wrote:
>
> From: Takashi Iwai 
>
> [ Upstream commit 685c9b7750bfacd6fc1db50d86579980593b7869 ]
>
> Currently mwifiex_update_bss_desc_with_ie() implicitly assumes that
> the source descriptor entries contain the enough size for each type
> and performs copying without checking the source size.  This may lead
> to read over boundary.
>
> Fix this by putting the source size check in appropriate places.
>
> Signed-off-by: Takashi Iwai 
> Signed-off-by: Kalle Valo 
> Signed-off-by: Sasha Levin 

For the record, this fixup is still aiming for 5.2, correcting some
potential mistakes in this patch:

63d7ef36103d mwifiex: Don't abort on small, spec-compliant vendor IEs

So you might want to hold off a bit, and grab them both.

Brian


[PATCH v2 2/2] mwifiex: don't disable hardirqs; just softirqs

2019-06-25 Thread Brian Norris
main_proc_lock and int_lock (in mwifiex_adapter) are the only spinlocks
used in hardirq contexts. The rest are only in task or softirq contexts.

Convert every other lock from *_irq{save,restore}() variants to _bh()
variants.

This is a mechanical transformation of all spinlock usage in mwifiex
using the following:

Step 1:
I ran this nasty sed script:

sed -i -E '/spin_lock_irqsave|spin_unlock_irqrestore/ {
  /main_proc_lock|int_lock/! {
s:(spin_(un|)lock)_irq(save|restore):\1_bh: ;
# Join broken lines.
:a /;$/! {
  N;
  s/\s*\n\s*//;
  ba
}
/,.*\);$/ s:,.*\):\):
  }
}' drivers/net/wireless/marvell/mwifiex/*

Step 2:
Manually delete the flags / ra_list_flags args from:

  mwifiex_send_single_packet()
  mwifiex_11n_aggregate_pkt()
  mwifiex_send_processed_packet()

which are now unused.

Step 3:
Apply this semantic patch (coccinelle) to remove the unused 'flags'
variables:

// 
@@
type T;
identifier i;
@@

(
extern T i;
|
- T i;
  ... when != i
)
// 

(Usage is something like this:

  make coccicheck COCCI=./patch.cocci MODE=patch 
M=drivers/net/wireless/marvell/mwifiex/

although this skips *.h files for some reasons, so I had to massage
stuff.)

Testing: I've played with a variety of stress tests, including download
stress tests on the same APs which caught regressions with commit
5188d5453bc9 ("mwifiex: restructure rx_reorder_tbl_lock usage"). I've
primarily tested on Marvell 8997 / PCIe, although I've given 8897 / SDIO
a quick spin as well.

Signed-off-by: Brian Norris 
---
v2:
 * add new Step 2, because I missed a few functions that were passing
   along the IRQ flags as arguments (fixes "used but not set" warnings,
   because these flags were no longer really in use, but were getting
   passed along as args still)
---
 drivers/net/wireless/marvell/mwifiex/11n.c|  53 -
 drivers/net/wireless/marvell/mwifiex/11n.h|   5 +-
 .../net/wireless/marvell/mwifiex/11n_aggr.c   |  26 ++---
 .../net/wireless/marvell/mwifiex/11n_aggr.h   |   2 +-
 .../wireless/marvell/mwifiex/11n_rxreorder.c  |  86 ++
 .../net/wireless/marvell/mwifiex/cfg80211.c   |  35 +++---
 drivers/net/wireless/marvell/mwifiex/cmdevt.c |  76 +---
 drivers/net/wireless/marvell/mwifiex/init.c   |  32 ++---
 drivers/net/wireless/marvell/mwifiex/main.c   |  29 ++---
 drivers/net/wireless/marvell/mwifiex/scan.c   |  58 --
 .../wireless/marvell/mwifiex/sta_cmdresp.c|   5 +-
 .../net/wireless/marvell/mwifiex/sta_event.c  |  10 +-
 drivers/net/wireless/marvell/mwifiex/tdls.c   |  68 ---
 drivers/net/wireless/marvell/mwifiex/txrx.c   |   5 +-
 .../net/wireless/marvell/mwifiex/uap_txrx.c   |  10 +-
 drivers/net/wireless/marvell/mwifiex/usb.c|  10 +-
 drivers/net/wireless/marvell/mwifiex/util.c   |  15 +--
 drivers/net/wireless/marvell/mwifiex/wmm.c| 109 +++---
 18 files changed, 256 insertions(+), 378 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/11n.c 
b/drivers/net/wireless/marvell/mwifiex/11n.c
index 5d75c971004b..e435f801bc91 100644
--- a/drivers/net/wireless/marvell/mwifiex/11n.c
+++ b/drivers/net/wireless/marvell/mwifiex/11n.c
@@ -84,17 +84,15 @@ mwifiex_get_ba_status(struct mwifiex_private *priv,
  enum mwifiex_ba_status ba_status)
 {
struct mwifiex_tx_ba_stream_tbl *tx_ba_tsr_tbl;
-   unsigned long flags;
 
-   spin_lock_irqsave(>tx_ba_stream_tbl_lock, flags);
+   spin_lock_bh(>tx_ba_stream_tbl_lock);
list_for_each_entry(tx_ba_tsr_tbl, >tx_ba_stream_tbl_ptr, list) {
if (tx_ba_tsr_tbl->ba_status == ba_status) {
-   spin_unlock_irqrestore(>tx_ba_stream_tbl_lock,
-  flags);
+   spin_unlock_bh(>tx_ba_stream_tbl_lock);
return tx_ba_tsr_tbl;
}
}
-   spin_unlock_irqrestore(>tx_ba_stream_tbl_lock, flags);
+   spin_unlock_bh(>tx_ba_stream_tbl_lock);
return NULL;
 }
 
@@ -516,13 +514,12 @@ void mwifiex_11n_delete_all_tx_ba_stream_tbl(struct 
mwifiex_private *priv)
 {
int i;
struct mwifiex_tx_ba_stream_tbl *del_tbl_ptr, *tmp_node;
-   unsigned long flags;
 
-   spin_lock_irqsave(>tx_ba_stream_tbl_lock, flags);
+   spin_lock_bh(>tx_ba_stream_tbl_lock);
list_for_each_entry_safe(del_tbl_ptr, tmp_node,
 >tx_ba_stream_tbl_ptr, list)
mwifiex_11n_delete_tx_ba_stream_tbl_entry(priv, del_tbl_ptr);
-   spin_unlock_irqrestore(>tx_ba_stream_tbl_lock, flags);
+   spin_unlock_bh(>tx_ba_stream_tbl_lock);
 
INIT_LIST_HEAD(>tx_ba_stream_tbl_ptr);
 
@@ -539,18 +536,16 @@ struct mwifiex_tx_ba_stream_tbl *
 mwifiex_get_ba_tbl(struct mwifiex_private *priv, int tid, u8 *ra)
 {
struct mwifiex_tx_ba_stream_tbl *tx_ba_tsr

[PATCH v2 0/2] mwifiex: spinlock usage improvements

2019-06-25 Thread Brian Norris
This series follows up on some notes from this thread:

  
http://lkml.kernel.org/linux-wireless/20181130175957.167031-1-briannor...@chromium.org
  Subject: [4.20 PATCH] Revert "mwifiex: restructure rx_reorder_tbl_lock usage"

where Ganapathi suggested I send out my work.

In particular, patch 1 is a step toward helping apply Ganapathi's
original "mwifiex: restructure rx_reorder_tbl_lock usage" solution
without regression, by logically separating the two operations (and
therefore, the locking patterns) involved in that deadlock. It doesn't
re-apply that change, nor does it 100% unblock such a solution, but at
least it's a step in the right direction, as I understand it.

Patch 2 is a change I noticed should be possible along the way. There
are a number of reasons we probably shouldn't be disabling hardirqs when
it's not necessary, but one funny side effect: bugs noticed in the above
"revert" patch would no longer happen. This is because
mwifiex_recv_packet() bases softirq decisions on in_interrupt() (see
description in include/linux/preempt.h), so it will automatically skip
softirq processing if we have BH disabled, but not if we only have hard
IRQs disabled. In other words, if we have such an incorrect nesting bug
in the future (this time with BH disabled), we will now skip softirq
processing and therefore sidestep this sort of bug. [1]

[Related note: softirq masking is weird:
https://lwn.net/Articles/779738/]

It's also possible we can improve system responsiveness and
debuggability by keeping (hard) IRQs enabled more often, although I
didn't measure any particular effect here, and most of these contexts
should be rather quick.

I've done a variety of performance and stress tests for this series, on
both 8897/SDIO and 8997/PCIe, and I haven't seen any decrease in
performance or stability. Or, any change in performance appears to be
within the range of "noise".

Regards,
Brian

[1] Side note: the usage of 'in_interrupt()' in mwifiex_recv_packet() is
probably not really a good idea. But it does have a helpful side effect
for this particular sort of bug.

Changelog:

v2:
 * fix warnings about using uninitialized "flags" variables


Brian Norris (2):
  mwifiex: dispatch/rotate from reorder table atomically
  mwifiex: don't disable hardirqs; just softirqs

 drivers/net/wireless/marvell/mwifiex/11n.c|  53 +++-
 drivers/net/wireless/marvell/mwifiex/11n.h|   5 +-
 .../net/wireless/marvell/mwifiex/11n_aggr.c   |  26 ++--
 .../net/wireless/marvell/mwifiex/11n_aggr.h   |   2 +-
 .../wireless/marvell/mwifiex/11n_rxreorder.c  | 125 --
 .../net/wireless/marvell/mwifiex/cfg80211.c   |  35 +++--
 drivers/net/wireless/marvell/mwifiex/cmdevt.c |  76 +--
 drivers/net/wireless/marvell/mwifiex/init.c   |  32 ++---
 drivers/net/wireless/marvell/mwifiex/main.c   |  29 ++--
 drivers/net/wireless/marvell/mwifiex/scan.c   |  58 
 .../wireless/marvell/mwifiex/sta_cmdresp.c|   5 +-
 .../net/wireless/marvell/mwifiex/sta_event.c  |  10 +-
 drivers/net/wireless/marvell/mwifiex/tdls.c   |  68 --
 drivers/net/wireless/marvell/mwifiex/txrx.c   |   5 +-
 .../net/wireless/marvell/mwifiex/uap_txrx.c   |  10 +-
 drivers/net/wireless/marvell/mwifiex/usb.c|  10 +-
 drivers/net/wireless/marvell/mwifiex/util.c   |  15 +--
 drivers/net/wireless/marvell/mwifiex/wmm.c| 109 ++-
 18 files changed, 278 insertions(+), 395 deletions(-)

-- 
2.22.0.410.gd8fdbe21b5-goog



[PATCH v2 1/2] mwifiex: dispatch/rotate from reorder table atomically

2019-06-25 Thread Brian Norris
mwifiex_11n_scan_and_dispatch() and
mwifiex_11n_dispatch_pkt_until_start_win() share similar patterns, where
they perform a few different actions on the same table, using the same
lock, but non-atomically. There have been other attempts to clean up
this sort of behavior, but they have had problems (incomplete;
introducing new deadlocks).

We can improve these functions' atomicity by queueing up our RX packets
in a list, to dispatch at the end of the function. This avoids problems
of another operation modifying the table in between our dispatch and
rotation operations.

This was inspired by investigations around this:

  
http://lkml.kernel.org/linux-wireless/20181130175957.167031-1-briannor...@chromium.org
  Subject: [4.20 PATCH] Revert "mwifiex: restructure rx_reorder_tbl_lock usage"

While the original (now-reverted) patch had good intentions in
restructuring some of the locking patterns in this driver, it missed an
important detail: we cannot defer to softirq contexts while already in
an atomic context. We can help avoid this sort of problem by separating
the two steps of:
(1) iterating / clearing the mwifiex reordering table
(2) dispatching received packets to upper layers

This makes it much harder to make lock recursion mistakes, as these
two steps no longer need to hold the same locks.

Testing: I've played with a variety of stress tests, including download
stress tests on the same APs which caught regressions with commit
5188d5453bc9 ("mwifiex: restructure rx_reorder_tbl_lock usage"). I've
primarily tested on Marvell 8997 / PCIe, although I've given 8897 / SDIO
a quick spin as well.

Signed-off-by: Brian Norris 
Acked-by: Ganapathi Bhat 
---
v2: no change
---
 .../wireless/marvell/mwifiex/11n_rxreorder.c  | 43 +++
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c 
b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
index 5380fba652cc..77bdf16d6573 100644
--- a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
+++ b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
@@ -76,7 +76,8 @@ static int mwifiex_11n_dispatch_amsdu_pkt(struct 
mwifiex_private *priv,
 /* This function will process the rx packet and forward it to kernel/upper
  * layer.
  */
-static int mwifiex_11n_dispatch_pkt(struct mwifiex_private *priv, void 
*payload)
+static int mwifiex_11n_dispatch_pkt(struct mwifiex_private *priv,
+   struct sk_buff *payload)
 {
 
int ret;
@@ -109,27 +110,26 @@ mwifiex_11n_dispatch_pkt_until_start_win(struct 
mwifiex_private *priv,
 struct mwifiex_rx_reorder_tbl *tbl,
 int start_win)
 {
+   struct sk_buff_head list;
+   struct sk_buff *skb;
int pkt_to_send, i;
-   void *rx_tmp_ptr;
unsigned long flags;
 
+   __skb_queue_head_init();
+   spin_lock_irqsave(>rx_reorder_tbl_lock, flags);
+
pkt_to_send = (start_win > tbl->start_win) ?
  min((start_win - tbl->start_win), tbl->win_size) :
  tbl->win_size;
 
for (i = 0; i < pkt_to_send; ++i) {
-   spin_lock_irqsave(>rx_reorder_tbl_lock, flags);
-   rx_tmp_ptr = NULL;
if (tbl->rx_reorder_ptr[i]) {
-   rx_tmp_ptr = tbl->rx_reorder_ptr[i];
+   skb = tbl->rx_reorder_ptr[i];
+   __skb_queue_tail(, skb);
tbl->rx_reorder_ptr[i] = NULL;
}
-   spin_unlock_irqrestore(>rx_reorder_tbl_lock, flags);
-   if (rx_tmp_ptr)
-   mwifiex_11n_dispatch_pkt(priv, rx_tmp_ptr);
}
 
-   spin_lock_irqsave(>rx_reorder_tbl_lock, flags);
/*
 * We don't have a circular buffer, hence use rotation to simulate
 * circular buffer
@@ -141,6 +141,9 @@ mwifiex_11n_dispatch_pkt_until_start_win(struct 
mwifiex_private *priv,
 
tbl->start_win = start_win;
spin_unlock_irqrestore(>rx_reorder_tbl_lock, flags);
+
+   while ((skb = __skb_dequeue()))
+   mwifiex_11n_dispatch_pkt(priv, skb);
 }
 
 /*
@@ -155,24 +158,22 @@ static void
 mwifiex_11n_scan_and_dispatch(struct mwifiex_private *priv,
  struct mwifiex_rx_reorder_tbl *tbl)
 {
+   struct sk_buff_head list;
+   struct sk_buff *skb;
int i, j, xchg;
-   void *rx_tmp_ptr;
unsigned long flags;
 
+   __skb_queue_head_init();
+   spin_lock_irqsave(>rx_reorder_tbl_lock, flags);
+
for (i = 0; i < tbl->win_size; ++i) {
-   spin_lock_irqsave(>rx_reorder_tbl_lock, flags);
-   if (!tbl->rx_reorder_ptr[i]) {
-   spin_unlock_irqrestore(>rx_reorder_tbl_lock,
-  

Re: [PATCH 1/2] mwifiex: dispatch/rotate from reorder table atomically

2019-06-25 Thread Brian Norris
On Mon, Jun 24, 2019 at 9:45 PM Kalle Valo  wrote:
> New warning:
>
> drivers/net/wireless/marvell/mwifiex/wmm.c: In function 
> 'mwifiex_wmm_process_tx':
> drivers/net/wireless/marvell/mwifiex/wmm.c:1438:4: warning: 'flags' may be 
> used uninitialized in this function [-Wmaybe-uninitialized]
> mwifiex_11n_aggregate_pkt(priv, ptr, ptr_index, flags);
> ^~
> drivers/net/wireless/marvell/mwifiex/wmm.c:1406:16: note: 'flags' was 
> declared here
>   unsigned long flags;
> ^

Yikes! Not sure how I missed that, as I *thought* I had -Werror
enabled. Maybe things got lost a bit in the shuffles from GCC to Clang
for building our kernels internally.

Anyway, "used" is a bit generous here, since these are just useless
function arguments (never *actually* used now). I should remove these
args entirely.

> 2 patches set to Changes Requested.

Will send a v2.

Thanks,
Brian


[PATCH 5.2 1/2] mwifiex: Don't abort on small, spec-compliant vendor IEs

2019-06-14 Thread Brian Norris
Per the 802.11 specification, vendor IEs are (at minimum) only required
to contain an OUI. A type field is also included in ieee80211.h (struct
ieee80211_vendor_ie) but doesn't appear in the specification. The
remaining fields (subtype, version) are a convention used in WMM
headers.

Thus, we should not reject vendor-specific IEs that have only the
minimum length (3 bytes) -- we should skip over them (since we only want
to match longer IEs, that match either WMM or WPA formats). We can
reject elements that don't have the minimum-required 3 byte OUI.

While we're at it, move the non-standard subtype and version fields into
the WMM structs, to avoid this confusion in the future about generic
"vendor header" attributes.

Fixes: 685c9b7750bf ("mwifiex: Abort at too short BSS descriptor element")
Cc: Takashi Iwai 
Signed-off-by: Brian Norris 
---
It appears that commit 685c9b7750bf is on its way to 5.2, so I labeled
this bugfix for 5.2 as well.

 drivers/net/wireless/marvell/mwifiex/fw.h  | 12 +---
 drivers/net/wireless/marvell/mwifiex/scan.c| 18 +++---
 .../net/wireless/marvell/mwifiex/sta_ioctl.c   |  4 ++--
 drivers/net/wireless/marvell/mwifiex/wmm.c |  2 +-
 4 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/fw.h 
b/drivers/net/wireless/marvell/mwifiex/fw.h
index b73f99dc5a72..1fb76d2f5d3f 100644
--- a/drivers/net/wireless/marvell/mwifiex/fw.h
+++ b/drivers/net/wireless/marvell/mwifiex/fw.h
@@ -1759,9 +1759,10 @@ struct mwifiex_ie_types_wmm_queue_status {
 struct ieee_types_vendor_header {
u8 element_id;
u8 len;
-   u8 oui[4];  /* 0~2: oui, 3: oui_type */
-   u8 oui_subtype;
-   u8 version;
+   struct {
+   u8 oui[3];
+   u8 oui_type;
+   } __packed oui;
 } __packed;
 
 struct ieee_types_wmm_parameter {
@@ -1775,6 +1776,9 @@ struct ieee_types_wmm_parameter {
 *   Version [1]
 */
struct ieee_types_vendor_header vend_hdr;
+   u8 oui_subtype;
+   u8 version;
+
u8 qos_info_bitmap;
u8 reserved;
struct ieee_types_wmm_ac_parameters ac_params[IEEE80211_NUM_ACS];
@@ -1792,6 +1796,8 @@ struct ieee_types_wmm_info {
 *   Version [1]
 */
struct ieee_types_vendor_header vend_hdr;
+   u8 oui_subtype;
+   u8 version;
 
u8 qos_info_bitmap;
 } __packed;
diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c 
b/drivers/net/wireless/marvell/mwifiex/scan.c
index c269a0de9413..e2786ab612ca 100644
--- a/drivers/net/wireless/marvell/mwifiex/scan.c
+++ b/drivers/net/wireless/marvell/mwifiex/scan.c
@@ -1361,21 +1361,25 @@ int mwifiex_update_bss_desc_with_ie(struct 
mwifiex_adapter *adapter,
break;
 
case WLAN_EID_VENDOR_SPECIFIC:
-   if (element_len + 2 < sizeof(vendor_ie->vend_hdr))
-   return -EINVAL;
-
vendor_ie = (struct ieee_types_vendor_specific *)
current_ptr;
 
-   if (!memcmp
-   (vendor_ie->vend_hdr.oui, wpa_oui,
-sizeof(wpa_oui))) {
+   /* 802.11 requires at least 3-byte OUI. */
+   if (element_len < sizeof(vendor_ie->vend_hdr.oui.oui))
+   return -EINVAL;
+
+   /* Not long enough for a match? Skip it. */
+   if (element_len < sizeof(wpa_oui))
+   break;
+
+   if (!memcmp(_ie->vend_hdr.oui, wpa_oui,
+   sizeof(wpa_oui))) {
bss_entry->bcn_wpa_ie =
(struct ieee_types_vendor_specific *)
current_ptr;
bss_entry->wpa_offset = (u16)
(current_ptr - bss_entry->beacon_buf);
-   } else if (!memcmp(vendor_ie->vend_hdr.oui, wmm_oui,
+   } else if (!memcmp(_ie->vend_hdr.oui, wmm_oui,
sizeof(wmm_oui))) {
if (total_ie_len ==
sizeof(struct ieee_types_wmm_parameter) ||
diff --git a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c 
b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
index ebc0e41e5d3b..74e50566db1f 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_ioctl.c
@@ -1351,7 +1351,7 @@ mwifiex_set_gen_ie_helper(struct mwifiex_private *priv, 
u8 *ie_data_ptr,
/* Test to see if it is a WPA IE, if not, then
 * it is a gen IE
 */
-   if (!memcmp(pvendor_ie->

[PATCH 2/2] mwifiex: use 'total_ie_len' in mwifiex_update_bss_desc_with_ie()

2019-06-14 Thread Brian Norris
This is clearer than copy/pasting the magic number '+ 2' around, and it
even saves the need for one existing comment.

Cc: Takashi Iwai 
Signed-off-by: Brian Norris 
---
 drivers/net/wireless/marvell/mwifiex/scan.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c 
b/drivers/net/wireless/marvell/mwifiex/scan.c
index e2786ab612ca..707e5159262f 100644
--- a/drivers/net/wireless/marvell/mwifiex/scan.c
+++ b/drivers/net/wireless/marvell/mwifiex/scan.c
@@ -1269,7 +1269,7 @@ int mwifiex_update_bss_desc_with_ie(struct 
mwifiex_adapter *adapter,
break;
 
case WLAN_EID_FH_PARAMS:
-   if (element_len + 2 < sizeof(*fh_param_set))
+   if (total_ie_len < sizeof(*fh_param_set))
return -EINVAL;
fh_param_set =
(struct ieee_types_fh_param_set *) current_ptr;
@@ -1279,7 +1279,7 @@ int mwifiex_update_bss_desc_with_ie(struct 
mwifiex_adapter *adapter,
break;
 
case WLAN_EID_DS_PARAMS:
-   if (element_len + 2 < sizeof(*ds_param_set))
+   if (total_ie_len < sizeof(*ds_param_set))
return -EINVAL;
ds_param_set =
(struct ieee_types_ds_param_set *) current_ptr;
@@ -1292,7 +1292,7 @@ int mwifiex_update_bss_desc_with_ie(struct 
mwifiex_adapter *adapter,
break;
 
case WLAN_EID_CF_PARAMS:
-   if (element_len + 2 < sizeof(*cf_param_set))
+   if (total_ie_len < sizeof(*cf_param_set))
return -EINVAL;
cf_param_set =
(struct ieee_types_cf_param_set *) current_ptr;
@@ -1302,7 +1302,7 @@ int mwifiex_update_bss_desc_with_ie(struct 
mwifiex_adapter *adapter,
break;
 
case WLAN_EID_IBSS_PARAMS:
-   if (element_len + 2 < sizeof(*ibss_param_set))
+   if (total_ie_len < sizeof(*ibss_param_set))
return -EINVAL;
ibss_param_set =
(struct ieee_types_ibss_param_set *)
@@ -1459,10 +1459,8 @@ int mwifiex_update_bss_desc_with_ie(struct 
mwifiex_adapter *adapter,
break;
}
 
-   current_ptr += element_len + 2;
-
-   /* Need to account for IE ID and IE Len */
-   bytes_left -= (element_len + 2);
+   current_ptr += total_ie_len;
+   bytes_left -= total_ie_len;
 
}   /* while (bytes_left > 2) */
return ret;
-- 
2.22.0.410.gd8fdbe21b5-goog



Re: [PATCH v3 3/4] backlight: pwm_bl: compute brightness of LED linearly to human eye.

2019-06-11 Thread Brian Norris
On Tue, Jun 11, 2019 at 3:49 AM Daniel Thompson
 wrote:
> This is a long standing flaw in the backlight interfaces. AFAIK generic
> userspaces end up with a (flawed) heuristic.

Bingo! Would be nice if we could start to fix this long-standing flaw.

> Basically devices with a narrow range of choices can be assumed to be
> logarithmic

That's (almost, see below) exactly what we have.

(And this is what Matthias is fighting against, now that we're
implementing both "large number of data points" and "pre-curved" at
the same time. We will have to either adapt the heuristic, or else
adapt our device trees to fit the heuristic.)

> Systems are coming along that allow us to animate the change of
> brightness (part of the reason for interpolated tables is to
> permit smooth animation rather than because the user explicitly wants
> to set the brightness to exactly 1117).

Chrome OS has done this for a long time. So "coming along" is a bit late ;)

Also, I believe Chrome OS will do animation/smoothing for all tables
(small or large) where it can: even for the small tables.

> These systems are often
> logarithmic but with a wide range of values.

NB: Chrome OS happens to use a polynomial formula (exponent = 2 or
0.5, depending on how you look at it), not logarithmic. You can see it
in all its (non)glory here:

https://chromium.googlesource.com/chromiumos/platform2/+/ee015853b227cf265491bd80ccf096b188490529/power_manager/powerd/policy/internal_backlight_controller.cc#451

Regards,
Brian


Re: [EXT] Re: [4.20 PATCH] Revert "mwifiex: restructure rx_reorder_tbl_lock usage"

2019-06-04 Thread Brian Norris
Hi Ganapathi,

On Mon, Jun 3, 2019 at 8:04 PM Ganapathi Bhat  wrote:
> > There are a few possible ways to handle this:
> > (a) prevent processing softirqs in that context; e.g., with
> > local_bh_disable(). This seems somewhat of a hack.
> > (Side note: I think most of the locks in this driver really could be
> > spin_lock_bh(), not spin_lock_irqsave() -- we don't really care
> > about hardirq context for 99% of these locks.)
> > (b) restructure so that packet processing (e.g., netif_rx_ni()) is done
> > outside of the spinlock.
> >
> > It's actually not that hard to do (b). You can just queue your skb's up in a
> > temporary sk_buff_head list and process them all at once after you've
> > finished processing the reorder table. I have a local patch to do this, and 
> > I
> > might send it your way if I can give it a bit more testing.
>
> OK; That will be good; We will run a complete test after the patch; (OR we 
> can work on this, share for review);

I gave my work another round of testing and submitted it here:

https://patchwork.kernel.org/cover/10976089/
[PATCH 0/2] mwifiex: spinlock usage improvements

Feel free to give it a spin. It doesn't completely resolve everything
you were trying to fix, but I believe it helps nudge things in the
right direction.

Brian


  1   2   3   4   5   6   7   8   9   10   >