Re: [PATCH 2/2] mac80211: Support LIVE_ADDRESS_CHANGE feature

2019-10-08 Thread Denis Kenzior

Hi Johannes,

On 10/8/19 3:16 PM, Johannes Berg wrote:

On Tue, 2019-10-08 at 13:50 -0500, Denis Kenzior wrote:

Hi Johannes,


But they shouldn't change due a mac address change?  I wonder if we can
further relax the requirements to allow mac change if
NL80211_SCAN_FLAG_RANDOM_ADDR was used?


No, at least with HW scan that would completely confuse the driver -
since from the driver's POV we'd remove the interface it's currently
managing the scan for.


So help me understand this better.


include/net/mac80211.h:

int (*hw_scan)(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
   struct ieee80211_scan_request *req);

How is it difficult to understand that with an API like that, it might
not be a good idea to remove the vif from the driver while it's
scanning?


Right, so you're talking in the context of this implementation which 
performs a remove/add interface.  You're right about that.


But I was asking more in general terms.  If all we're doing is scanning, 
can we just change the mac?  Anyway, not important.  Maybe I bring this 
up once this set is accepted.


Regards,
-Denis


Re: [PATCH 2/2] mac80211: Support LIVE_ADDRESS_CHANGE feature

2019-10-08 Thread Denis Kenzior

Hi Johannes,


But they shouldn't change due a mac address change?  I wonder if we can
further relax the requirements to allow mac change if
NL80211_SCAN_FLAG_RANDOM_ADDR was used?


No, at least with HW scan that would completely confuse the driver -
since from the driver's POV we'd remove the interface it's currently
managing the scan for.


So help me understand this better.  Just by virtue of copying the new 
mac into sdata->vif.addr we'd be confusing the driver such that it can't 
associate the ongoing scan request to the wdev it was started on?  Even 
when it has all this info?  I mean you have a single scan request on a 
phy, how hard can this be? :)


Note that some apps perform poor-man's scan address randomization by 
varying the mac (I assume prior to each nth scan).  So being able to 
change the mac while scanning might be a boon to them.  I personally 
don't think changing mac via rtnl to accomplish this is a great idea, 
but just tossing it out for you.


Regards,
-Denis


[PATCH] nl80211: trivial: Remove redundant loop

2019-10-08 Thread Denis Kenzior
cfg80211_assign_cookie already checks & prevents a 0 from being
returned, so the explicit loop is unnecessary.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index d21b1581a665..57bade7ea41c 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -8227,10 +8227,8 @@ static int nl80211_start_sched_scan(struct sk_buff *skb,
/* leave request id zero for legacy request
 * or if driver does not support multi-scheduled scan
 */
-   if (want_multi && rdev->wiphy.max_sched_scan_reqs > 1) {
-   while (!sched_scan_req->reqid)
-   sched_scan_req->reqid = cfg80211_assign_cookie(rdev);
-   }
+   if (want_multi && rdev->wiphy.max_sched_scan_reqs > 1)
+   sched_scan_req->reqid = cfg80211_assign_cookie(rdev);
 
err = rdev_sched_scan_start(rdev, dev, sched_scan_req);
if (err)
-- 
2.21.0



[PATCH] mac80211: More strictly validate .abort_scan

2019-10-08 Thread Denis Kenzior
nl80211 requires NL80211_CMD_ABORT_SCAN to have a wdev or netdev
attribute present and checks that if netdev is provided it is UP.
However, mac80211 does not check that an ongoing scan actually belongs
to the netdev/wdev provided by the user.  In other words, it is possible
for an application to cancel scans on an interface it doesn't manage.

Signed-off-by: Denis Kenzior 
Cc: sta...@vger.kernel.org
---
 net/mac80211/cfg.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 70739e746c13..ece344f9e9ca 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -2333,7 +2333,13 @@ static int ieee80211_scan(struct wiphy *wiphy,
 
 static void ieee80211_abort_scan(struct wiphy *wiphy, struct wireless_dev 
*wdev)
 {
-   ieee80211_scan_cancel(wiphy_priv(wiphy));
+   struct ieee80211_local *local = wiphy_priv(wiphy);
+   struct ieee80211_sub_if_data *sdata =
+   IEEE80211_WDEV_TO_SUB_IF(wdev);
+   bool cancel_scan = rcu_access_pointer(local->scan_sdata) == sdata;
+
+   if (cancel_scan)
+   ieee80211_scan_cancel(local);
 }
 
 static int
-- 
2.21.0



Re: [PATCH 2/2] mac80211: Support LIVE_ADDRESS_CHANGE feature

2019-10-08 Thread Denis Kenzior

Hi Johannes,


Indeed.  But that is not what you were suggesting earlier with just
checking local->scanning.  So if scan_req contains a wdev, then yes it
should be possible to compare the scan_req->wdev to the interface being
changed.


Well, yes, but only because I was incrementally going from James's
patch, which was checking that only.


Well, something to improve.  Sometimes it is pretty hard to figure out 
what you mean.




Similar with the other local-> things being checked here, btw, though in
some cases it might be harder to actually determine which wdev is doing
something and which isn't.


Right




No, this typically cannot be fixed, and it doesn't really make sense.
The NIC cannot possibly do two scans at a time since it has only a
single radio resource :-)


So why is the scan request not per phy then?  And should mac address
even affect the ongoing scan?  Can we simply change it with the scan
ongoing?


There are things that affect the scan from the interface, e.g.
capability overrides, (extended) capabilities, the MAC address is used
unless randomization is requested, etc.



But they shouldn't change due a mac address change?  I wonder if we can 
further relax the requirements to allow mac change if 
NL80211_SCAN_FLAG_RANDOM_ADDR was used?


Regards,
-Denis


Re: [PATCH 2/2] mac80211: Support LIVE_ADDRESS_CHANGE feature

2019-10-08 Thread Denis Kenzior

Hi Johannes,

On 10/8/19 10:52 AM, Johannes Berg wrote:

Hi,


You could have two interfaces, one which is scanning right now, right?
And then theoretically you don't care about the other one - it *should*
be OK to remove/re-add (with new MAC address) the one that *isn't*
scanning, right?


Actually, I don't think you can?  Unless I'm missing something?  All the
scan state is stored on struct ieee80211_local, so if that struct is
allocated per phy as you point out below, then what you suggest is
currently not possible?


?

The scan_req struct contains a reference to which interface is scanning,
so it should very well be possible to have

phy0:
  wlan0: IFF_UP & scanning
  wlan1: IFF_UP & change MAC address all the time

just like it's possible to change the MAC address when wlan1 *isn't*
IFF_UP even if wlan0 is scanning, right?



Indeed.  But that is not what you were suggesting earlier with just 
checking local->scanning.  So if scan_req contains a wdev, then yes it 
should be possible to compare the scan_req->wdev to the interface being 
changed.



But we don't have that granularity here for anything - you're just
checking "sdata->local->something", and by going from sdata to local
you've now checked the whole NIC, not just a single interface on that
NIC.


Right.  But that seems to be a limitation of mac80211 actually.  We
can't run two scans concurrently on different interfaces.  This is
rather unintuitive given that scan requests require an ifindex/wdev.

Can this be changed / fixed in mac80211 actually?  I would expect that
if a card supports p2p and station simultaneously, then it can scan / go
offchannel on two interfaces simultaneously? Or not?  What can iwlwifi
do for example?


No, this typically cannot be fixed, and it doesn't really make sense.
The NIC cannot possibly do two scans at a time since it has only a
single radio resource :-)


So why is the scan request not per phy then?  And should mac address 
even affect the ongoing scan?  Can we simply change it with the scan 
ongoing?





Given the above, I'm not sure I see anything wrong?  The switch/case can
probably be gotten rid of, but it actually makes things clear that only
station/p2p_device and adhoc are handled specially.


I just don't think they *should* be handled specially.



Fair enough.


Given your code now, you can have

phy0:
  wlan0: STATION, IFF_UP & something is doing remain-on-channel
  wlan1: STATION, IFF_UP

--> cannot change wlan1 MAC address

phy0:
  wlan0: STATION, IFF_UP & something is doing remain-on-channel

wlan1: AP, IFF_UP

--> *can* change wlan1 MAC address

This doesn't really make much sense?



Agreed.

Regards,
-Denis


Re: [PATCH 2/2] mac80211: Support LIVE_ADDRESS_CHANGE feature

2019-10-08 Thread Denis Kenzior

Hi Johannes,

On 10/7/19 4:16 PM, Johannes Berg wrote:

Hi,


If you do care about this being more granular then you should check
*which* interface is scanning, and then you can still switch the
MAC
address for *other* interfaces - but I'd still argue it should be
independent of interface type.


So yes these can scan, but this should be covered by the
netif_carrier_ok check which is done first.


Not sure what you mean by that.

You could have two interfaces, one which is scanning right now, right?
And then theoretically you don't care about the other one - it *should*
be OK to remove/re-add (with new MAC address) the one that *isn't*
scanning, right?


Actually, I don't think you can?  Unless I'm missing something?  All the 
scan state is stored on struct ieee80211_local, so if that struct is 
allocated per phy as you point out below, then what you suggest is 
currently not possible?




But we don't have that granularity here for anything - you're just
checking "sdata->local->something", and by going from sdata to local
you've now checked the whole NIC, not just a single interface on that
NIC.


Right.  But that seems to be a limitation of mac80211 actually.  We 
can't run two scans concurrently on different interfaces.  This is 
rather unintuitive given that scan requests require an ifindex/wdev.


Can this be changed / fixed in mac80211 actually?  I would expect that 
if a card supports p2p and station simultaneously, then it can scan / go 
offchannel on two interfaces simultaneously? Or not?  What can iwlwifi 
do for example?




Which is fine, no complaint from me, just in that case you end up
failing when really there isn't much need to fail. In fact, in a case
like this, actually clearing IFF_UP, changing address and setting IFF_UP
would work, concurrently with another interface scanning.


  We can just remove the
switch entirely, but the roc_list/scanning check only matters for
station/p2p_client so checking for the other interface types is kinda
pointless and redundant.


But it's also completely confusing to do it this way because you go from
"sdata" to "local", and at that point the data that you're working on is
no longer specific to that one interface, it's actually for the whole
NIC.


I agree its confusing, but that seems to be how mac80211 works?



Basically what I'm saying is this: it's confusing and makes no sense to
do something like

if (this_is_a_certain_netdev_type)
check_some_global_data();


Also I am not sure what you mean by *which* interface. This function is
called on a single interface, so checking what other interfaces are
doing seems strange...


My point exactly - but that's what you're doing here in the code. Now I
think perhaps without even realizing?



Given the above, I'm not sure I see anything wrong?  The switch/case can 
probably be gotten rid of, but it actually makes things clear that only 
station/p2p_device and adhoc are handled specially.


Regards,
-Denis


Re: [PATCH 2/2] mac80211: Support LIVE_ADDRESS_CHANGE feature

2019-10-08 Thread Denis Kenzior

Hi Johannes,


And, I'm confused, but isn't the polarity of the scanning check
wrong?


Ah yeah, after you pointed that out I realized 'scanning' is a bit
field. I should be doing:

test_bit(SCAN_HW_SCANNING, &sdata->local->scanning)


I think checking for all the bits is fine (and necessary, just HW scan
is unlikely to be enough, changing the MAC address would also disrupt a
software scan) - just need to invert the polarity?


I concur that scanning should be checked as
if (sdata->local->scanning).  So Johannes you're right that the polarity 
is reversed.  However, __ieee80211_start_scan seems to check for 
local->scan_req instead to take deferred scans into account.  So I 
wonder if that is a better approach.


Regards,
-Denis


Re: [RFC 0/4] Allow live MAC address change

2019-09-17 Thread Denis Kenzior

Hi Kalle,

For user experience scanning and DHCP are also important, what kind of > numbers you get when those are included? No need to have anything> 

precise, I would like just to get an understanding where we are> nowadays.

Scanning heavily depends on the RF environment and the hardware.  In our 
testing ath9k takes stupid long to scan for example.


But in a sort of best case scenario, using limited scan and no mac 
change, iwd connects in ~300ms.  People have reported that they have not 
finished opening their laptop screen and they're connected, so at that 
level of latency, every millisecond is important and totally worth 
fighting for.  Randomizing the MAC would penalize our connection times 
by 2X (300 ms at least).  And Android folks have reported the penalty to 
be as high as 3 seconds.  So this needs to be fixed.  And do note that 
this is a feature every modern OS implements by default.




As you only provided one number it's clear that you are only working
with one driver. But for us it's not that simple, we have to support a


Please don't jump to conclusions like you seem to be doing here.  James 
gave you one number that is pretty typical.  If you want us to provide 
numbers for other drivers under given conditions, just ask.  We have a 
framework for timing these.



myriad of different types of hardware and there can be complications and
additions later on, even for simple features. Like the dynamic power
save support I submitted to mac80211 over 10 years which was supposed to
be simple, and still we talk almost every year how do we get it out of
mac80211 as it makes maintenance difficult.



I'm not sure what point you're trying to make here?

Regards,
-Denis


Re: [PATCH 04/11] wil6210: fix PTK re-key race

2019-09-17 Thread Denis Kenzior

Hi Alexander,


And that the intend of it is to replace the "old" path.

Correct.



So the best way forward here would be to

1) implement the patch here to work around the problem without 
control_port or the theoretical QDISC bypass

2) start implementing control port for the future.

correct?



I don't know what the right answer is, but it seems strange to me that 
we developed a 'better way', upstreamed it several years ago, but are 
still trying to kludge around adding special flags to what is now 
considered a legacy approach.  Also disconcerting that not a single 
fullmac driver has added support for this 'better way' yet.




CONTROL_PORT was added specifically to take care of the re-keying 
races and can be extended with additional attributes over time, as 
needed (perhaps for extended key id, etc).  Also note that in our 
testing CONTROL_PORT is _way_ faster than PAE socket...




Extended Key ID is pretty robust when rekeying and the driver/card only 
has to take care to not mix KeyIDs within one A-MPDU. It's no problem 
encrypting eapol#4 with the new key. You can even encrypt it at the 
initial connect and it will work. Basically all races the "classical" 
rekey has to work around go away.


For "normal" rekeys it's working pretty well with ath9k and iwlwifi even 
without control_port and just learned some weeks ago that QDISC could 
still cause issues...


Okay, if control port doesn't need to handle extended keys then even better.

By the way, thanks for your earlier explanation (upthread).

Regards,
-Denis


Re: [PATCH 04/11] wil6210: fix PTK re-key race

2019-09-13 Thread Denis Kenzior

Hi Arend, Alexander,

Basically, we now have two bypass methods dealing with the same/similar 
issue:


1) bypass the QDISC.
2) bypass network stack entirely with CONTROL_PORT.


It also raises the question in my mind as to why we have two ways of 
doing the same thing?  From the discussion so far it also sounds like 
each requires somewhat different / special handling in the driver. 
Wouldn't it make sense to deprecate one and encourage drivers to 
implement the other?


CONTROL_PORT was added specifically to take care of the re-keying races 
and can be extended with additional attributes over time, as needed 
(perhaps for extended key id, etc).  Also note that in our testing 
CONTROL_PORT is _way_ faster than PAE socket...


Regards,
-Denis


Re: [PATCH 04/11] wil6210: fix PTK re-key race

2019-09-12 Thread Denis Kenzior

Hi Alexander,

I don't know anything about the driver here but in mac80211 the idea to 
avoid the race is to simply flush the queues prior deleting the outgoing 
key.


Maybe a silly question, but what does flushing the queue mean in this 
context?  Is it waiting for all the packets to be sent or dropping them 
on the floor?




Now wpa_supplicant is not yet bypassing qdisks, but adding the socket 
parameter PACKET_QDISC_BYPASS is basically a one-liner in wpa_supplicant 
and should allow a generic way for drivers to avoid the race with a 
simple queue flush...


Can you expand on this actually?  What would the sequence of events be?

Also, how would this be made to work with CONTROL_PORT over NL80211 ?

Regards,
-Denis


Re: [RFCv3 2/3] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

2019-09-11 Thread Denis Kenzior

Hi Johannes,

On 9/11/19 10:28 AM, Johannes Berg wrote:

On Wed, 2019-09-11 at 07:41 -0500, Denis Kenzior wrote:


For my dual band cards the channel dump all fits into a ~1k attribute if
I recall correctly.  So there may not really be a need for this.  Or at
the very least we could keep things simple(r) and only split at the band
level, not at the individual channel level.


Yeah, that does seem reasonable, especially if we're moving to bigger
messages anyway. If we do add something huge to each channel, we can
recover that code I suppose.


So do you want me to drop the channel splitting logic and only allow 
this for bands?  Or just keep this since it is already done?



The current logic uses last_channel_pos for some of the trickery in
addition to last_good_pos.  So nla_put_failure would have to be made
aware of that.  Perhaps we can store last_good_pos in the stack, but the
split mechanism only allows the splits to be done at certain points...


Hmm, not sure I understand. last_channel_pos and last_good_pos are
basically equivalent, no? In fact, I'm not sure why you need
last_channel_pos vs. just using last_good_pos instead? Maybe I'm missing
something.


Sort of.  The way I did it, last_channel_pos keeps track of whether any 
channel info was added or not.  So if NULL, we simply backtrack to 
last_good_pos in nla_put_failure. You can probably use last_good_pos for 
everything and an extra variable for the channel info tracking.




To me, conceptually, the "state->band_start" and "state->chan_start" is
basically a sub-state of "state->split", so this is underneath state-

split == 3 (I think?), you basically get 3.0.0, 3.0.1, 3.0.2, ...,

3.1.0, 3.1.1 ... for the state? Which you have to unwind in terms of
message formatting, but the last_good_pos should be sufficient?


Right.  And as I mentioned above, this could be done, but you probably 
need another state variable..




IOW, the only difference I see between the normal split states 1, 2, ...
and the band/channel split states 3.0.0, 3.0.1, ... is the fact that we
have to also fix up the nested attributes after we trim to last_good_pos
on failures. Where am I wrong?



Didn't say that you were ;)


Right now only the channel dump uses this (and I'm still not fully
convinced we should go to all the trouble), so one argument would be not
to introduce something this generic until another user of it manifests
itself?


I was thinking it'd actually be less complex, but I guess I have to try
to write it to see for myself.


To be clear, I think it is a good approach and can be made to work.  My 
main hesitation is whether doing it now is worth it given the discussion 
at the very top.  But I can see what I can come up with if you want.


Regards,
-Denis


Re: [PATCH] nl80211: Support mgmt frame unregistrations

2019-09-11 Thread Denis Kenzior

Hi Johannes,

On 9/11/19 3:53 AM, Johannes Berg wrote:

On Wed, 2019-09-04 at 11:22 -0500, Denis Kenzior wrote:

To state another way, it is
currently not possible to write a userspace application that utilizes a
single nl80211 genl socket, instead it must open multiple genl sockets
for multiple wdevs on multiple phys.


I don't see how this is too onerous for the application, every
application is basically going to have an event loop anyway.


What does having an event loop have to do with this? :)



Thus, I don't really see any reason for us to add a bunch of code just
to make an application track fewer file descriptors - we need to have
the cleanup on close already anyway, so why not actually exercise those
code paths?


I just find this super wasteful.  We have instances where we need to 
register to a single management frame temporarily.  So opening and 
closing a socket just for that is just bloat.




I do note that with the "unregister on iftype change" patch you could
switch to an unsupported type and reach this, but I don't think you'd
want to rely on that :-)



Not sure I understand?


Possibly I could imagine a reason for this if you needed a single socket
for functional reason, but you're not really giving any such reason. I
could imagine that there might be races, but I'm having a hard time
coming up with a scenario where they actually matter ... if you really
really get a race between e.g. RX-AUTH and INTERFACE-DEL you'll try to
do some operations that will just fail, but so what?


- Waste on the userspace side.  Typically userspace uses some sort of 
abstraction for tracking genl sockets.  So it has to allocate buffers, 
etc.  Can we get around that? Sure, but you're not winning any arguments 
that the nl80211 is 'nice to use' that way.


- Waste on the kernel side.  Each socket costs something for the kernel, 
makes things harder to audit, etc, etc.  And we now have people trying 
to stuff 15+ cards into a single system.  Each card might have multiple 
netdevs.  Each netdev might need multiple file descriptors open.  So 
we're ending up needing 30-60, or whatever file descriptors when we 
could just as easily use 1.  Extreme case? Sure, but I like to remove 
bloat whenever / wherever I can.


Regards,
-Denis


Re: [RFCv3 3/3] nl80211: Send large new_wiphy events

2019-09-11 Thread Denis Kenzior

Hi Johannes,

On 9/11/19 10:12 AM, Johannes Berg wrote:

On Wed, 2019-09-11 at 07:20 -0500, Denis Kenzior wrote:


I'm not sure I see how the applications could do buffers that are
"inherently" large enough, there's no practical message size limit, is
there (32-bits for the size).


The kernel caps this to 32k right now if I read the code correctly.  But
fair point.


The kernel caps this for dumps only, no? We can allocate here ourselves
for multicasting a message as large as we like I think.



Right, but it is set for only 8k at the moment.  Anyway, I will take 
care of this.



+   if (WARN_ON(nl80211_send_wiphy(rdev, cmd, msg, 0, 0, 0, &state) < 0)) {
+   nlmsg_free(msg);
+   goto legacy;
+   }
+
+   genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
+   NL80211_MCGRP_CONFIG2, GFP_KERNEL);
+
+legacy:


nit: just use "else" instead of the goto?


I'm not sure I understand?  We want to send both messages here...


It's equivalent to:

-
if (WARN_ON(nl80211_send_wiphy(...) < 0)
nlmsg_free(msg);
else
genlmsg_multicast_netns(...);

... code for legacy ...
-

no?


Ah, now I see what you want.  Sure I will take care of this in v4.

Regards,
-Denis


Re: [RFCv3 2/3] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

2019-09-11 Thread Denis Kenzior

Hi Johannes,

On 9/11/19 4:44 AM, Johannes Berg wrote:

Hi,

The first patch looks good, couple of nits/comments on this one below.

On Fri, 2019-09-06 at 10:43 -0500, Denis Kenzior wrote:

For historical reasons, NEW_WIPHY messages generated by dumps or
GET_WIPHY commands were limited to 4096 bytes.  This was due to the
fact that the kernel only allocated 4k buffers prior to commit
9063e21fb026 ("netlink: autosize skb lengthes").  Once the sizes of
NEW_WIPHY messages exceeded these sizes, split dumps were introduced.


Actually, userspace prior to around the same time *also* only used 4k
buffers (old libnl), and so even with that kernel we still could
possibly have to deal with userspace that had 4k messages only ... but
we could have solved that part trivially instead of adding code to split
it, just the kernel part was still in the way then.

Anyway, I can reword this per my understanding (but will have to reread
all my messages I guess).



Sure


- The code in case '3' is quite complex, but it does try to support a
   message running out of room in the middle of a channel dump and
   restarting from where it left off in the next split message.  Perhaps
   this can be simplified, but it seems this capability is useful.
   Please take extra care when reviewing this.


Is it useful? You say it basically all fits today, and that means the
channels will either fit into a single message or not ... Then again, if
we add a lot of channels or a lot more data to each channel. Hmm. OK, I
guess better if we do have it.



For my dual band cards the channel dump all fits into a ~1k attribute if 
I recall correctly.  So there may not really be a need for this.  Or at 
the very least we could keep things simple(r) and only split at the band 
level, not at the individual channel level.


All the cards I tried would split well after case 9 with 4096 byte 
buffers anyway.  The channel dump is quite early in the message and it 
would really need to become bloated for this code path to be triggered...





+   void *last_good_pos = 0;


Use NULL.


Will fix




+   last_good_pos = nlmsg_get_pos(msg);
state->split_start++;


Maybe we're better off having a local macro for these two lines? That
way, we don't risk updating one without the other, which would be
confusing.


Yep, will do that.




@@ -2004,81 +2004,78 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
if (!nl_bands)
goto nla_put_failure;
  
-		for (band = state->band_start;

-band < NUM_NL80211_BANDS; band++) {
+   /* Position in the buffer if we added a set of channel info */
+   last_channel_pos = 0;


NULL


Will fix




[snip]



+chan_put_failure:
+   if (!last_channel_pos)
+   goto nla_put_failure;
+
+   nlmsg_trim(msg, last_channel_pos);
+   nla_nest_end(msg, nl_freqs);
nla_nest_end(msg, nl_band);
  
-			if (state->split) {

-   /* start again here */
-   if (state->chan_start)
-   band--;
+   if (state->chan_start < sband->n_channels)
break;
-   }
+
+   state->chan_start = 0;
+   state->band_start++;
}
-   nla_nest_end(msg, nl_bands);
  
-		if (band < NUM_NL80211_BANDS)

-   state->band_start = band + 1;
-   else
-   state->band_start = 0;
+band_put_failure:
+   if (!last_channel_pos)
+   goto nla_put_failure;
+
+   nla_nest_end(msg, nl_bands);
  
-		/* if bands & channels are done, continue outside */

-   if (state->band_start == 0 && state->chan_start == 0)
-   state->split_start++;
-   if (state->split)
+   if (state->band_start < NUM_NL80211_BANDS)
break;


Thinking out loud, maybe we could simplify this by just having a small
"stack" of nested attributes to end?

I mean, essentially, you have here similar code to the nla_put_failure
label, in that it finishes and sends out the message, except here you
have to end a bunch of nested attributes.

What if we did something like

#define dump_nest_start(msg, attr) ({   \
struct nlattr r = nla_nest_start_noflag(msg, attr); \
BUG_ON(nest_stack_depth >= ARRAY_SIZE(nest_stack);   \
nest_stack[nest_stack_depth++] = r; \
r;  \
})

#define dump_nest_end(msg, r) do {   

Re: [RFCv3 3/3] nl80211: Send large new_wiphy events

2019-09-11 Thread Denis Kenzior

hi Johannes,

On 9/11/19 4:47 AM, Johannes Berg wrote:

On Fri, 2019-09-06 at 10:43 -0500, Denis Kenzior wrote:


+ * There are no limits (outside of netlink protocol limits) on
+ * message sizes that can be sent over the "config2" multicast group. It
+ * is assumed that applications utilizing "config2" multicast group
+ * utilize buffers that are inherently large enough or can utilize
+ * MSG_PEEK/MSG_TRUNC in the netlink transport in order to allocate big
+ * enough buffers.


I'm not sure I see how the applications could do buffers that are
"inherently" large enough, there's no practical message size limit, is
there (32-bits for the size).


The kernel caps this to 32k right now if I read the code correctly.  But 
fair point.




I'd argue this should just say that applications should use large
buffers and still use MSG_PEEK/handle MSG_TRUNC, but I can also edit it
later.


+   msg = nlmsg_new(alloc_size, GFP_KERNEL);
+   if (!msg)
+   goto legacy;
+
+   if (WARN_ON(nl80211_send_wiphy(rdev, cmd, msg, 0, 0, 0, &state) < 0)) {
+   nlmsg_free(msg);
+   goto legacy;
+   }
+
+   genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
+   NL80211_MCGRP_CONFIG2, GFP_KERNEL);
+
+legacy:


nit: just use "else" instead of the goto?


I'm not sure I understand?  We want to send both messages here...

Regards,
-Denis


Re: [PATCH] cfg80211: Purge frame registrations on iftype change

2019-09-11 Thread Denis Kenzior

Hi Johannes,


'typical' failure paths.  I didn't add such checking in the other patch
set since I felt you might find it overly intrusive on userspace.  But
maybe we really should do this?


As I just said on the other patch, I think we probably should do that
there, if just to be able to advertise a correct set of interface types
that you can switch between there. I don't see how it'd be more
intrusive to userspace than failing later? :-)


What I was worried about was that all the fullmac drivers would have had 
to be updated to set the feature bit, and it would have caused 
wpa_s/hostapd to no longer be able to do the whole set_iftype -> ebusy 
-> ifdown & set_iftype retry logic until all were updated.



I would concur as that is what happens today.  But should it?


Well, dunno, what should happen? If you ask drivers they might want to
remove & re-register after, for those registrations that are still
possible.



I would think you'd want to define a clear order of operations that 
cfg80211 / mac80211 would enforce :)




Let's not then.

I've applied this patch now.



Great, thanks.

Regards,
-Denis


[RFCv3 1/3] nl80211: Fix broken non-split wiphy dumps

2019-09-06 Thread Denis Kenzior
If a (legacy) client requested a wiphy dump but did not provide the
NL80211_ATTR_SPLIT_WIPHY_DUMP attribute, the dump was supposed to be
composed of purely non-split NEW_WIPHY messages, with 1 wiphy per
message.  At least this was the intent after commit:
3713b4e364ef ("nl80211: allow splitting wiphy information in dumps")

However, in reality the non-split dumps were broken very shortly after.
Perhaps around commit:
fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")

The reason for the bug is a missing setting of split_start to 0 in the
case of a non-split dump.

Here is a sample non-split dump performed on kernel 4.19, some parts
were cut for brevity:
< Request: Get Wiphy (0x01) len 0 [ack,0x300]
> Result: New Wiphy (0x03) len 3496 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)

> Result: New Wiphy (0x03) len 68 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
Extended Capabilities: len 8
Capability: bit  2: Extended channel switching
Capability: bit 62: Opmode Notification
Extended Capabilities Mask: len 8
04 00 00 00 00 00 00 40  ...@
VHT Capability Mask: len 12
f0 1f 80 33 ff ff 00 00 ff ff 00 00  ...3
> Result: New Wiphy (0x03) len 28 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
> Result: New Wiphy (0x03) len 28 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
> Result: New Wiphy (0x03) len 52 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
Max CSA Counters: len 1
02   .
Scheduled Scan Maximum Requests: len 4
01 00 00 00  
Extended Features: len 4
02 02 00 04  
> Result: New Wiphy (0x03) len 36 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
Reserved: len 4
00 00 00 00  
> Complete: Get Wiphy (0x01) len 4 [multi]
Status: 0

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 3e30e18d1d89..ff6200fcd492 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2191,6 +2191,9 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
 * but break unconditionally so unsplit data stops here.
 */
state->split_start++;
+
+   if (!state->split)
+   state->split_start = 0;
break;
case 9:
if (rdev->wiphy.extended_capabilities &&
-- 
2.19.2



[RFCv3 2/3] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

2019-09-06 Thread Denis Kenzior
For historical reasons, NEW_WIPHY messages generated by dumps or
GET_WIPHY commands were limited to 4096 bytes.  This was due to the
fact that the kernel only allocated 4k buffers prior to commit
9063e21fb026 ("netlink: autosize skb lengthes").  Once the sizes of
NEW_WIPHY messages exceeded these sizes, split dumps were introduced.

Any new, non-legacy data was added only to messages using split-dumps
(including filtered dumps).

However, split-dumping has quite a significant overhead.  On cards
tested, split dumps generated message sizes 1.7-1.8x compared to
non-split dumps, while still comfortably fitting into an 8k buffer.  The
kernel now expects userspace to provide 16k buffers by default, and 32k
buffers are possible.

The kernel netlink layer is now much smarter and utilizes certain
heuristics for figuring out what buffer sizes userspace provides, so it
can allocate optimally sized buffers for netlink messages accordingly.
This commit changes the split logic so that messages are packed more
compactly into the (potentially) larger buffers provided by userspace.

If large-enough buffers are provided, then split dumps will generate a
single netlink NEW_WIPHY message for each wiphy being dumped, removing
any overhead.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 222 +
 1 file changed, 112 insertions(+), 110 deletions(-)

Changes since last version:

- Patch completely rewritten based on Johannes' feedback.  We now try to
  pack as many attributes as can fit into the current message.  If the
  application uses large enough buffers (typically 8k is sufficient as
  of the time of this writing), then no splitting is even required.
- Rewrote the commit description based on Johannes' git history
  findings.  E.g. the kernel was at fault for the 4096 byte buffer size
  limits and not userspace.
- Patch 3 was dropped as it was no longer required

Other thoughts:

- I tested the split dump with 3k, 4k and 8k userspace buffers and
  things seem to work as expected.
- The code in case '3' is quite complex, but it does try to support a
  message running out of room in the middle of a channel dump and
  restarting from where it left off in the next split message.  Perhaps
  this can be simplified, but it seems this capability is useful.
  Please take extra care when reviewing this.
- I dropped the split and restart logic in case 13 as no current driver
  besides iwlwifi seems to support the attributes here, and the
  attributes appear to be quite small anyway.

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index ff6200fcd492..03421f66eea3 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -1854,8 +1854,8 @@ static int nl80211_send_pmsr_capa(struct 
cfg80211_registered_device *rdev,
 struct nl80211_dump_wiphy_state {
s64 filter_wiphy;
long start;
-   long split_start, band_start, chan_start, capa_start;
-   bool split;
+   long split_start, band_start, chan_start;
+   bool legacy;
 };
 
 static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
@@ -1867,8 +1867,9 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
struct nlattr *nl_bands, *nl_band;
struct nlattr *nl_freqs, *nl_freq;
struct nlattr *nl_cmds;
-   enum nl80211_band band;
struct ieee80211_channel *chan;
+   void *last_good_pos = 0;
+   void *last_channel_pos;
int i;
const struct ieee80211_txrx_stypes *mgmt_stypes =
rdev->wiphy.mgmt_stypes;
@@ -1939,9 +1940,9 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
if ((rdev->wiphy.flags & WIPHY_FLAG_TDLS_EXTERNAL_SETUP) &&
nla_put_flag(msg, NL80211_ATTR_TDLS_EXTERNAL_SETUP))
goto nla_put_failure;
+
+   last_good_pos = nlmsg_get_pos(msg);
state->split_start++;
-   if (state->split)
-   break;
/* fall through */
case 1:
if (nla_put(msg, NL80211_ATTR_CIPHER_SUITES,
@@ -1986,17 +1987,16 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
}
}
 
+   last_good_pos = nlmsg_get_pos(msg);
state->split_start++;
-   if (state->split)
-   break;
/* fall through */
case 2:
if (nl80211_put_iftypes(msg, NL80211_ATTR_SUPPORTED_IFTYPES,
rdev->wiphy.interface_modes))
goto nla_put_failure;
+
+   last_good_pos = nlmsg_get_pos(msg);
state->split_start++;
-   if (state->split)
-   break;
/* fall through */
case 3:
   

[RFCv3 3/3] nl80211: Send large new_wiphy events

2019-09-06 Thread Denis Kenzior
Send large NEW_WIPHY events on a new multicast group so that clients
that can accept larger messages do not need to round-trip to the kernel
and perform extra filtered wiphy dumps.

A new multicast group is introduced and the large message is sent before
the legacy message.  This way clients that listen on both multicast
groups can ignore duplicate legacy messages if needed.

Signed-off-by: Denis Kenzior 
---
 include/uapi/linux/nl80211.h | 31 +++
 net/wireless/nl80211.c   | 34 --
 2 files changed, 63 insertions(+), 2 deletions(-)

Changes in this version:

- Updated the docs based on Johannes' feedback
- Added WARN_ON in case the large message building fails (e.g. our
  buffer size needs to be increased)
- Minor style fixes based on Johannes' feedback
- Added a missing skb_get to take an extra reference when sending
  NEW/DEL INTERFACE events.

diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index beee59c831a7..7a125cb4d9d9 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -50,6 +50,7 @@
 #define NL80211_MULTICAST_GROUP_MLME   "mlme"
 #define NL80211_MULTICAST_GROUP_VENDOR "vendor"
 #define NL80211_MULTICAST_GROUP_NAN"nan"
+#define NL80211_MULTICAST_GROUP_CONFIG2"config2"
 #define NL80211_MULTICAST_GROUP_TESTMODE   "testmode"
 
 #define NL80211_EDMG_BW_CONFIG_MIN 4
@@ -267,8 +268,30 @@
  * @NL80211_CMD_NEW_WIPHY: Newly created wiphy, response to get request
  * or rename notification. Has attributes %NL80211_ATTR_WIPHY and
  * %NL80211_ATTR_WIPHY_NAME.
+ *
+ * Note that when %NL80211_CMD_NEW_WIPHY is being sent as an event, it
+ * will be multicast on two groups: "config" and "config2".  The messages
+ * on the two multicast groups will be different.  On "config" multicast
+ * group, %NL80211_CMD_NEW_WIPHY messages will be 'reduced' size and will
+ * only contain legacy information.  This is due to historical kernel
+ * behavior that limited such messages to 4096 bytes.  The "config2"
+ * multicast group was introduced to support applications that can
+ * allocate larger buffers and can thus accept new wiphy events with
+ * the full set of information included.  Messages on the "config2"
+ * multicast group are sent before the "config" multicast group.
+ *
+ * There are no limits (outside of netlink protocol limits) on
+ * message sizes that can be sent over the "config2" multicast group. It
+ * is assumed that applications utilizing "config2" multicast group
+ * utilize buffers that are inherently large enough or can utilize
+ * MSG_PEEK/MSG_TRUNC in the netlink transport in order to allocate big
+ * enough buffers.
  * @NL80211_CMD_DEL_WIPHY: Wiphy deleted. Has attributes
  * %NL80211_ATTR_WIPHY and %NL80211_ATTR_WIPHY_NAME.
+ * Note that when %NL80211_CMD_DEL_WIPHY is being sent as an event, it
+ * will be multicast on two groups: "config" and "config2".  Messages on
+ * the "config2" multicast group are sent before the "config" multicast
+ * group.
  *
  * @NL80211_CMD_GET_INTERFACE: Request an interface's configuration;
  * either a dump request for all interfaces or a specific get with a
@@ -281,10 +304,18 @@
  * be sent from userspace to request creation of a new virtual interface,
  * then requires attributes %NL80211_ATTR_WIPHY, %NL80211_ATTR_IFTYPE and
  * %NL80211_ATTR_IFNAME.
+ * Note that when %NL80211_CMD_NEW_INTERFACE is being sent as an event, it
+ * will be multicast on two groups: "config" and "config2".  Messages on
+ * the "config2" multicast group are sent before the "config" multicast
+ * group.
  * @NL80211_CMD_DEL_INTERFACE: Virtual interface was deleted, has attributes
  * %NL80211_ATTR_IFINDEX and %NL80211_ATTR_WIPHY. Can also be sent from
  * userspace to request deletion of a virtual interface, then requires
  * attribute %NL80211_ATTR_IFINDEX.
+ * Note that when %NL80211_CMD_DEL_INTERFACE is being sent as an event, it
+ * will be multicast on two groups: "config" and "config2".  Messages on
+ * the "config2" multicast group are sent before the "config" multicast
+ * group.
  *
  * @NL80211_CMD_GET_KEY: Get sequence counter information for a key specified
  * by %NL80211_ATTR_KEY_IDX and/or %NL80211_ATTR_MAC.
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 03421f66eea3..68f496c0c0a4 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -46,6 +46,7 @@ enum nl80211_multicast_groups {
NL80211_MCGRP_MLME,
NL80211_MCGRP_VE

[PATCH] nl80211: Support mgmt frame unregistrations

2019-09-04 Thread Denis Kenzior
Currently nl80211 supports purging management frame registrations
only under the following circumstances:
- If the underlying wireless device is destroyed
- If userspace closes the socket associated with that frame
registration

Thus userspace applications that want to gracefully support changing
interface modes (and thus the set of management frames they're
interested in seeing) must resort to opening multiple genl sockets and
manage these sockets appropriately.  To state another way, it is
currently not possible to write a userspace application that utilizes a
single nl80211 genl socket, instead it must open multiple genl sockets
for multiple wdevs on multiple phys.

This commit introduces two new NL80211 commands:
NL80211_CMD_REGISTER_FRAME2 and NL80211_CMD_UNREGISTER_FRAME.  The
former acts very much like NL80211_CMD_REGISTER_FRAME, except it returns
an NL80211_ATTR_COOKIE with a unique id identifying the management frame
registration.  This cookie can then be used with
NL80211_CMD_UNREGISTER_FRAME to delete a previous registration.

NL80211_CMD_UNREGISTER_FRAME can also be used to remove all frame
registrations currently associated with the calling socket.  This is
done by omitting the NL80211_ATTR_COOKIE attribute.  Only frame
registrations owned by the calling socket can be removed.

NL80211_CMD_REGISTER_FRAME2 was added to keep backwards compatibility
with older clients which rely on NL80211_CMD_REGISTER_FRAME and might
not be able to deal with introduction of a new attribute as a part of
the return value.

Signed-off-by: Denis Kenzior 
---
 include/uapi/linux/nl80211.h | 17 
 net/wireless/core.h  |  4 +-
 net/wireless/mlme.c  | 37 +++-
 net/wireless/nl80211.c   | 83 +++-
 4 files changed, 137 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index beee59c831a7..cef7e6920a6d 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -1101,6 +1101,20 @@
  * peer MAC address and %NL80211_ATTR_FRAME is used to specify the frame
  * content. The frame is ethernet data.
  *
+ * @NL80211_CMD_REGISTER_FRAME2: Same as %NL80211_CMD_REGISTER_FRAME,
+ * except returns an ATTR_COOKIE so that the frame can be unregistered
+ * Unlike %NL80211_CMD_REGISTER_FRAME, this command requires a frame
+ * type attribute.  Registration can be dropped
+ * using %NL80211_CMD_UNREGISTER_FRAME
+ *
+ * @NL80211_CMD_UNREGISTER_FRAME: Unregisters a previously registered frame
+ * that was registered with %NL80211_CMD_REGISTER_FRAME2.  If
+ * %NL80211_ATTR_COOKIE is provided, then a single frame registration
+ * matching that cookie is unregistered.  Otherwise, all frames
+ * associated with the current socket are unregistered.  Note that this
+ * command can only affect registrations for a single wdev.  So
+ * %NL80211_ATTR_IFINDEX or %NL80211_ATTR_WDEV must be provided.
+ *
  * @NL80211_CMD_MAX: highest used command number
  * @__NL80211_CMD_AFTER_LAST: internal use
  */
@@ -1325,6 +1339,9 @@ enum nl80211_commands {
 
NL80211_CMD_PROBE_MESH_LINK,
 
+   NL80211_CMD_REGISTER_FRAME2,
+   NL80211_CMD_UNREGISTER_FRAME,
+
/* add new commands above here */
 
/* used to define NL80211_CMD_MAX below */
diff --git a/net/wireless/core.h b/net/wireless/core.h
index 77556c58d9ac..c81a03fa8d39 100644
--- a/net/wireless/core.h
+++ b/net/wireless/core.h
@@ -385,9 +385,11 @@ void cfg80211_mlme_down(struct cfg80211_registered_device 
*rdev,
struct net_device *dev);
 int cfg80211_mlme_register_mgmt(struct wireless_dev *wdev, u32 snd_pid,
u16 frame_type, const u8 *match_data,
-   int match_len);
+   int match_len, u64 cookie);
 void cfg80211_mlme_unreg_wk(struct work_struct *wk);
 void cfg80211_mlme_unregister_socket(struct wireless_dev *wdev, u32 nlpid);
+bool cfg80211_mlme_remove_registrations(struct wireless_dev *wdev,
+   u32 nlportid, u64 cookie);
 void cfg80211_mlme_purge_registrations(struct wireless_dev *wdev);
 int cfg80211_mlme_mgmt_tx(struct cfg80211_registered_device *rdev,
  struct wireless_dev *wdev,
diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c
index f9462010575f..3cf189742dfb 100644
--- a/net/wireless/mlme.c
+++ b/net/wireless/mlme.c
@@ -425,6 +425,8 @@ struct cfg80211_mgmt_registration {
 
__le16 frame_type;
 
+   u64 cookie;
+
u8 match[];
 };
 
@@ -470,7 +472,7 @@ void cfg80211_mlme_unreg_wk(struct work_struct *wk)
 
 int cfg80211_mlme_register_mgmt(struct wireless_dev *wdev, u32 snd_portid,
u16 frame_type, const u8 *match_data,
-   int match_len)
+   int match_len, u64 cookie)
 {
s

Re: [RFCv2 2/4] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

2019-08-30 Thread Denis Kenzior

Hi Johannes,

On 8/30/19 4:36 AM, Johannes Berg wrote:

On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:

For historical reasons, NEW_WIPHY messages generated by dumps or
GET_WIPHY commands were limited to 4096 bytes due to userspace tools
using limited buffers.


I think now that I've figured out why, it'd be good to note that it
wasn't due to userspace tools, but rather due to the default netlink
dump skb allocation at the time, prior to commit  9063e21fb026
("netlink: autosize skb lengthes").



Sure, will take care of it.


Once the sizes NEW_WIPHY messages exceeded these
sizes, split dumps were introduced.  All any non-legacy data was added
only to messages using split-dumps (including filtered dumps).

However, split-dumping has quite a significant overhead.  On cards
tested, split dumps generated message sizes 1.7-1.8x compared to
non-split dumps, while still comfortably fitting into an 8k buffer.  The
kernel now expects userspace to provide 16k buffers by default, and 32k
buffers are possible.

Introduce a concept of a large message, so that if the kernel detects
that userspace has provided a buffer of sufficient size, a non-split
message could be generated.


So, there's still a wrinkle with this. Larger SKB allocation can fail,
and instead of returning an error to userspace, the kernel will allocate
a smaller SKB instead.

With genetlink, we currently don't even have a way of controlling the
minimum allocation that's always required.

Since we already have basically all of the mechanics, I'd say perhaps a
better concept would be to "split when necessary", aborting if split
isn't supported.

IOW, do something like

... nl80211_send_wiphy(...)
{
[...]

switch (state->split_start) {
[...]
case :
[...] // put stuff
state->split_start++;
state->skb_end = nlmsg_get_pos(skb);
/* fall through */
case :
[...]
}

finish:
genlmsg_end(msg, hdr);
return 0;
nla_put_failure:
if (state->split_start < 9) {
genlmsg_cancel(msg, hdr);
return -EMSGSIZE;
}
nlmsg_trim(msg, state->skb_end);
goto finish;
}


That way, we fill each SKB as much as possible, up to 32k if userspace
provided big enough buffers *and* we could allocate the SKB.


Your userspace would still set the split flag, and thus be compatible
with all kinds of options:
  * really old kernel not supporting split
  * older kernel sending many messages
  * kernel after this change packing more into one message
  * even if allocating big SKBs failed



What I was thinking was to attempt to build a large message first and if 
that fails to fail over to the old logic.  But I think what you propose 
is even better.  I'll incorporate this feedback into the next version.


Regards,
-Denis


Re: [PATCH 1/2] nl80211: Add NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE

2019-08-30 Thread Denis Kenzior

Hi Johannes,

On 8/30/19 5:19 AM, Johannes Berg wrote:

On Mon, 2019-08-26 at 11:26 -0500, Denis Kenzior wrote:


+ * Prior to Kernel 5.4, userspace applications should implement the
+ * following behavior:


I'm not sure mentioning the kernel version here does us any good? I
mean, you really need to implement that behaviour regardless of kernel
version, if NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE isn't set.



Agreed.  I guess I just view nl80211.h as a sort of combination between 
a uapi file and an actual manpage.  And manpages do mention which kernel 
version a certain feature/flag/whatever was added.  Such info can be 
useful in many ways, e.g. figuring out which kernel version might be 
required for a certain piece of hardware, etc.


Another point where this might be useful is if the kernel starts 
enforcing certain behavior that it didn't before.  E.g. I mentioned this 
in the purge thread that a lot of mode change failure cases could be 
caught if the kernel checked this flag prior to doing anything.


I really leave this up to you if this is something you think is a good 
idea or not.



+ * @NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE: This device supports switching
+ * the IFTYPE of an interface without having to bring the device DOWN
+ * first via RTNL.  Exact semantics of this feature is driver
+ * implementation dependent.


That's not really nice.


Sorry.  This came from some doc changes I have pending.  I think I wrote 
this after looking at some fullmac drivers and how they handle mode 
changes and the wording reflects the exasperation I felt at the time.


Do you want to suggest some alternate wording?  I think it is worth it 
to have some fair warning in the docs stating that prior to version so 
and so the semantics are completely driver dependent.





For mac80211, the following restrictions
+ * apply:
+ * - Only devices currently in IFTYPE AP, P2P_GO, P2P_CLIENT,
+ *   STATION, ADHOC and OCB can be switched.
+ * - The target IFTYPE must be one of: AP, P2P_GO, P2P_CLIENT,
+ *   STATION, ADHOC or OCB.
+ * Other drivers are expected to follow similar restrictions.


Maybe we should instead have a "bitmask of interface types that can be
switched while live" or something like that?



I'm fine with that, but this would only apply to newer kernels, no? 
Don't we at least want to attempt to state what the rules are for older 
ones?


An alternative might be to simply state what the restrictions are and 
just enforce those at the cfg80211 level.


Regards,
-Denis


Re: [RFCv2 1/4] nl80211: Fix broken non-split wiphy dumps

2019-08-30 Thread Denis Kenzior

Hi Johannes,

On 8/30/19 4:03 AM, Johannes Berg wrote:

On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:

If a (legacy) client requested a wiphy dump but did not provide the
NL80211_ATTR_SPLIT_WIPHY_DUMP attribute, the dump was supposed to be
composed of purely non-split NEW_WIPHY messages, with 1 wiphy per
message.  At least this was the intent after commit:
3713b4e364ef ("nl80211: allow splitting wiphy information in dumps")

However, in reality the non-split dumps were broken very shortly after.
Perhaps around commit:
fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")


Fun. I guess we updated all userspace quickly enough to not actually
have any issues there. As far as I remember, nobody ever complained, so
I guess people just updated their userspace.

Given that it's been 6+ years, maybe we're better off just removing the
whole non-split thing then, instead of fixing it. Seems even less likely
now that somebody would run a 6+yo supplicant (from before its commit
c30a4ab045ce ("nl80211: Fix mode settings with split wiphy dump")).



That would be my vote, given that we're probably one of a handful of 
people in this world that understand that code path.


But...  How would we handle non-dump versions of GET_WIPHY?  To this day 
I have dhcpcd issuing fun stuff like:


< Request: Get Wiphy (0x01) len 8 [ack] 
0.374832

Interface Index: 59 (0x003b)


OTOH, this is a simple fix, would removing the non-split mode result in
any appreciable cleanups? Perhaps not, and we'd have to insert something
instead to reject non-split and log a warning, or whatnot.



Getting rid of the legacy non-split case would simplify things.  We 
could also be a-lot smarter about how we split up the messages in order 
to utilize buffer space more efficiently.  I think you cover this in 
your other replies, but I haven't processed those yet.


Regards,
-Denis


Re: [RFCv2 4/4] nl80211: Send large new_wiphy events

2019-08-30 Thread Denis Kenzior

Hi Johannes,

On 8/30/19 5:14 AM, Johannes Berg wrote:

On Fri, 2019-08-16 at 14:27 -0500, Denis Kenzior wrote:

Send large NEW_WIPHY events on a new multicast group so that clients
that can accept larger messages do not need to round-trip to the kernel
and perform extra filtered wiphy dumps.

A new multicast group is introduced and the large message is sent before
the legacy message.  This way clients that listen on both multicast
groups can ignore duplicate legacy messages if needed.


Since I just did the digging, it seems that this would affect (old)
applications with libnl up to 3.2.22, unless they changed the default
recvmsg() buffer size.



Sorry, I'm not sure I understand.  Are you saying new clients would try 
to use old libnl and subscribe to this new multicast group for large 
messages?  Legacy clients shouldn't even see messages on this multicast 
group since they would never subscribe to it.



I think this is a pretty decent approach, but I'm slightly worried about
hitting the new limits (16k) eventually. It seems far off now, but who
knows what kind of data we'll add. HE is (and likely will be) adding
quite a bit since it has everything for each interface type - something
drivers have for the most part not implemented yet. That trend will only
continue, as complexity in the spec doesn't seem to be going down.



Right, but the kernel will go up to 32k buffers if userspace read buffer 
is that large.  So I think we have quite some room to grow.  On the 
other hand, we probably should be vigilant that any new stuff added 
tries to minimize message sizes whenever possible.



And I don't really want to see "config3" a couple of years down the
road...



Agreed.


So can we at least mandate (document) that "config2" basically has no
message limit, and you will use MSG_PEEK/handle MSG_TRUNC with it?



Yes, I will take care of that in v3.


That way, we can later bump the 8192 even beyond 16k if needed, and not
run into problems.


+   if (cmd == NL80211_CMD_NEW_WIPHY) {
+   state.large_message = true;
+   alloc_size = 8192UL;
+   } else
+   alloc_size = NLMSG_DEFAULT_SIZE;
+


nit: there should be braces on both branches



will fix


+   if (nl80211_send_wiphy(rdev, cmd, msg, 0, 0, 0, &state) < 0) {
+   nlmsg_free(msg);
+   goto legacy;
+   }


I think that'd be worth a WARN_ON(), it should never happen that you
actually run out of space, it means that the above wasn't big enough.



Yep


Now, on the previous patches I actually thought that you could set
"state->split" (and you should) and not need "state->large_message" in
order to indicate that the sub-functions are allowed to create larger
data - just keep filling the SKBs as much as possible for the dump.

Here, it seems like we do need it. It might be possible to get away
without it (by setting split here, and then having some special code to
handle the case of it not getting to the end), but that doesn't seem
worth it.


@@ -14763,6 +14787,8 @@ void nl80211_notify_iface(struct 
cfg80211_registered_device *rdev,
 return;
 }
  
+   genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,

+   NL80211_MCGRP_CONFIG2, GFP_KERNEL);


Hmm. That seems only needed if you don't want to listen on "config" at
all, but in the patch description you explicitly said that you send it
on "config2" *before* "config" for compatibility reasons (which makes
sense) - so what is it?


Well it can be both, depending on whether large messages can fail or 
not.  So one use case might be that a client detects whether the config2 
multicast group exists.  If so, then it only subscribes to it and that's it.


Another use case might be (if userspace is worried about losing large 
messages) to subscribe to both groups.  If it receives the large 
message, it can ignore the one that comes on the legacy multicast group.




I'm having a hard time seeing anyone get away with only listening on
config2 since that'd basically require very recent (as of now future)
kernel. Are you planning this for a world where you can ditch support
for kernel<5.4 (or so)?


No, but there's nothing stopping the client in making the choice at 
runtime depending on the genl family info it gets.  E.g. by peeking into 
CTRL_ATTR_MCAST_GROUPS.


Regards,
-Denis


Re: [PATCH] cfg80211: Purge frame registrations on iftype change

2019-08-30 Thread Denis Kenzior

Hi Johannes,

On 8/30/19 3:53 AM, Johannes Berg wrote:

On Wed, 2019-08-28 at 16:11 -0500, Denis Kenzior wrote:

Currently frame registrations are not purged, even when changing the
interface type.  This can lead to potentially weird / dangerous
situations where frames possibly not relevant to a given interface
type remain registered and mgmt_frame_register is not called for the
no-longer-relevant frame types.


I'd argue really just "weird and non-working", hardly dangerous. Even in
the mac80211 design where we want to not let you intercept e.g. AUTH
frames in client mode - if you did, then you'd just end up with a non-
working interface. Not sure I see any "dangerous situation". Not really
an all that important distinction though.


Fair enough, I'm happy to drop / reword this language.  It seemed fishy 
to me since the unregistration operation was not called at all, and the 
driver does go to some lengths to set up the valid frame registration 
types.




Depending on the design, it may also just be that those registrations
are *ignored*, because e.g. firmware intercepts the AUTH frame already,
which would just (maybe) confuse userspace - but that seems unlikely
since it switched interface type and has no real need for those frames
then.


There might be corner cases where userspace gets confused and doesn't 
update the frame registrations properly.  For example, wpa_s/hostap does 
not listen to SET_INTERFACE events that I can tell.  So if some external 
app sets the mode (particularly on a 'live' interface) then all kinds of 
unexpected things might happen.  This is one of the motivations for 
restricting certain NL80211 commands to interface SOCKET_OWNER.


So really this patch is intended more as a hot-fix / backport to stable 
to make sure the older kernels can deal with some of these situations.





The kernel currently relies on userspace apps to actually purge the
registrations themselves, e.g. by closing the nl80211 socket associated
with those frames.  However, this requires multiple nl80211 sockets to
be open by the userspace app, and for userspace to be aware of all state
changes.  This is not something that the kernel should rely on.


I tend to agree with that the kernel shouldn't rely on it.


This commit adds a call to cfg80211_mlme_purge_registrations() to
forcefully remove any registrations left over prior to switching the
iftype.


However, I do wonder if we should make this more transactional, and hang
on to them if the type switching fails. We're not notifying userspace
that the registrations have disappeared, so if type switching fails and
it continues to work with the old type rather than throwing its hands up
and quitting or something, it'd make a possibly bigger mess to just
silently have removed them already.


I do like that idea, not sure how to go about implementing it though? 
The failure case is a bit hard to deal with.  Something like 
NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE would help, particularly if 
nl80211/cfg80211 actually checked it prior to doing anything (e.g. 
disconnecting, etc).  That would then take care of the majority of the 
'typical' failure paths.  I didn't add such checking in the other patch 
set since I felt you might find it overly intrusive on userspace.  But 
maybe we really should do this?


So playing devil's advocate, another argument might be that by the time 
we got here, we've already tore down a bunch of state.  E.g. 
disconnected the station, stopped AP, etc.  So we've already 
side-effected state in a bunch of ways, what's one more?




I *think* it should be safe to just move this after the switching
succeeds, since the switching can pretty much only be done at a point
where nothing is happening on the interface anyway, though that might
confuse the driver when the remove happens.



I would concur as that is what happens today.  But should it?


Also, perhaps it'd be better to actually hang on to those registrations
that *are* still possible afterwards? But to not confuse the driver I
guess that might require unregister/re-register to happen, all of which
requires hanging on to the list and going through it after the type
switch completed?


Yes, I had those exact thoughts as well.

It isn't currently clear to me if there are any guarantees on the driver 
operation call sequence that cfg80211 provides.  E.g. can the driver 
expect rdev_change_virtual_intf to be called only once all the old 
registrations are purged and the new registrations are performed after 
the fact?  Or should it expect things to just happen in any order?




What do you think?



A big part of me thinks that just wiping the slate clean and having 
userspace set it up from scratch isn't that much to ask and it might 
want to do that anyway.  It might (a big maybe?) also make the driver's 
life easier if it can rely on certain guarantees from cf

[PATCH] cfg80211: Purge frame registrations on iftype change

2019-08-28 Thread Denis Kenzior
Currently frame registrations are not purged, even when changing the
interface type.  This can lead to potentially weird / dangerous
situations where frames possibly not relevant to a given interface
type remain registered and mgmt_frame_register is not called for the
no-longer-relevant frame types.

The kernel currently relies on userspace apps to actually purge the
registrations themselves, e.g. by closing the nl80211 socket associated
with those frames.  However, this requires multiple nl80211 sockets to
be open by the userspace app, and for userspace to be aware of all state
changes.  This is not something that the kernel should rely on.

This commit adds a call to cfg80211_mlme_purge_registrations() to
forcefully remove any registrations left over prior to switching the
iftype.

Cc: sta...@vger.kernel.org

Signed-off-by: Denis Kenzior 
---
 net/wireless/util.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/wireless/util.c b/net/wireless/util.c
index c99939067bb0..3fa092b78e62 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -964,6 +964,7 @@ int cfg80211_change_iface(struct cfg80211_registered_device 
*rdev,
}
 
cfg80211_process_rdev_events(rdev);
+   cfg80211_mlme_purge_registrations(dev->ieee80211_ptr);
}
 
err = rdev_change_virtual_intf(rdev, dev, ntype, params);
-- 
2.19.2



[PATCH 0/2] mac80211: Control Port over nl80211 fixes

2019-08-27 Thread Denis Kenzior
Couple of small fixes for Control Port handling in mac80211.  The
original commit was working by some crazy luck in all testing, but
manifested itself on certain hardware that managed to drop PAE frames
with uncanny consistency.

Denis Kenzior (2):
  mac80211: Don't memset RXCB prior to PAE intercept
  mac80211: Correctly set noencrypt for PAE frames

 net/mac80211/rx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
2.19.2



[PATCH 1/2] mac80211: Don't memset RXCB prior to PAE intercept

2019-08-27 Thread Denis Kenzior
In ieee80211_deliver_skb_to_local_stack intercepts EAPoL frames if
mac80211 is configured to do so and forwards the contents over nl80211.
During this process some additional data is also forwarded, including
whether the frame was received encrypted or not.  Unfortunately just
prior to the call to ieee80211_deliver_skb_to_local_stack, skb->cb is
cleared, resulting in incorrect data being exposed over nl80211.

Fixes: 018f6fbf540d ("mac80211: Send control port frames over nl80211")
Cc: sta...@vger.kernel.org
Signed-off-by: Denis Kenzior 
---
 net/mac80211/rx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 3c1ab870fefe..7c4aeac006fb 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2452,6 +2452,8 @@ static void ieee80211_deliver_skb_to_local_stack(struct 
sk_buff *skb,
cfg80211_rx_control_port(dev, skb, noencrypt);
dev_kfree_skb(skb);
} else {
+   memset(skb->cb, 0, sizeof(skb->cb));
+
/* deliver to local stack */
if (rx->napi)
napi_gro_receive(rx->napi, skb);
@@ -2546,8 +2548,6 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
 
if (skb) {
skb->protocol = eth_type_trans(skb, dev);
-   memset(skb->cb, 0, sizeof(skb->cb));
-
ieee80211_deliver_skb_to_local_stack(skb, rx);
}
 
-- 
2.19.2



[PATCH 2/2] mac80211: Correctly set noencrypt for PAE frames

2019-08-27 Thread Denis Kenzior
The noencrypt flag was intended to be set if the "frame was received
unencrypted" according to include/uapi/linux/nl80211.h.  However, the
current behavior is opposite of this.

Cc: sta...@vger.kernel.org
Fixes: 018f6fbf540d ("mac80211: Send control port frames over nl80211")
Signed-off-by: Denis Kenzior 
---
 net/mac80211/rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 7c4aeac006fb..8514c1f4ca90 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2447,7 +2447,7 @@ static void ieee80211_deliver_skb_to_local_stack(struct 
sk_buff *skb,
  skb->protocol == cpu_to_be16(ETH_P_PREAUTH)) &&
 sdata->control_port_over_nl80211)) {
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
-   bool noencrypt = status->flag & RX_FLAG_DECRYPTED;
+   bool noencrypt = (status->flag & RX_FLAG_DECRYPTED) == 0;
 
cfg80211_rx_control_port(dev, skb, noencrypt);
dev_kfree_skb(skb);
-- 
2.19.2



[PATCH 1/2] nl80211: Add NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE

2019-08-26 Thread Denis Kenzior
There is some ambiguity in how various drivers support
NL80211_CMD_SET_INTERFACE on devices where the underlying netdev is UP.
mac80211 for example supports this if the underlying driver provides a
change_interface operation.  However, most devices do not.  For FullMac
devices, the situation is even less clear.

This patch introduces a new feature flag that lets userspace know
whether it can expect a mode change (via SET_INTERFACE) to work while
the device is still UP or it should bring down the device first.

This commit also updates the documentation for SET_INTERFACE with hints
as to how it should be used.

Signed-off-by: Denis Kenzior 
---
 include/uapi/linux/nl80211.h | 36 
 1 file changed, 36 insertions(+)

diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index bf7c4222f512..a9ca2fe67f52 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -275,6 +275,29 @@
  * single %NL80211_ATTR_IFINDEX is supported.
  * @NL80211_CMD_SET_INTERFACE: Set type of a virtual interface, requires
  * %NL80211_ATTR_IFINDEX and %NL80211_ATTR_IFTYPE.
+ *
+ * Note that it is driver-dependent whether a SET_INTERFACE will be
+ * allowed if the underlying netdev is currently UP.  Userspace
+ * can obtain a hint as to whether this is allowed by looking at
+ * the %NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE, but certain restrictions
+ * will still apply.
+ *
+ * Prior to Kernel 5.4, userspace applications should implement the
+ * following behavior:
+ * 1. (Optionally) Attempt SET_INTERFACE on a wireless device
+ *with the underlying netdev in the UP state.  If -EBUSY
+ *is returned proceed to 2.  Note that a SET_INTERFACE
+ *which results in -EBUSY might still result in other
+ *side-effects, such as Deauthentication, exiting AP mode,
+ *etc.
+ * 2. Bring the netdev DOWN via RTNL
+ * 3. Attempt SET_INTERFACE on the underlying netdev in the DOWN
+ *state.  If successful, proceed to 4.
+ * 4. Bring the netdev UP via RTNL
+ *
+ * Note that bringing down the device might trigger a firmware reset /
+ * power cycling and/or other effects that are driver dependent.
+ *
  * @NL80211_CMD_NEW_INTERFACE: Newly created virtual interface or response
  * to %NL80211_CMD_GET_INTERFACE. Has %NL80211_ATTR_IFINDEX,
  * %NL80211_ATTR_WIPHY and %NL80211_ATTR_IFTYPE attributes. Can also
@@ -5481,6 +5504,18 @@ enum nl80211_feature_flags {
  * @NL80211_EXT_FEATURE_SAE_OFFLOAD: Device wants to do SAE authentication in
  * station mode (SAE password is passed as part of the connect command).
  *
+ * @NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE: This device supports switching
+ * the IFTYPE of an interface without having to bring the device DOWN
+ * first via RTNL.  Exact semantics of this feature is driver
+ * implementation dependent.  For mac80211, the following restrictions
+ * apply:
+ * - Only devices currently in IFTYPE AP, P2P_GO, P2P_CLIENT,
+ *   STATION, ADHOC and OCB can be switched.
+ * - The target IFTYPE must be one of: AP, P2P_GO, P2P_CLIENT,
+ *   STATION, ADHOC or OCB.
+ * Other drivers are expected to follow similar restrictions.
+ * This flag was introduced in Kernel v5.4
+ *
  * @NUM_NL80211_EXT_FEATURES: number of extended features.
  * @MAX_NL80211_EXT_FEATURES: highest extended feature index.
  */
@@ -5526,6 +5561,7 @@ enum nl80211_ext_feature_index {
NL80211_EXT_FEATURE_EXT_KEY_ID,
NL80211_EXT_FEATURE_STA_TX_PWR,
NL80211_EXT_FEATURE_SAE_OFFLOAD,
+   NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE,
 
/* add new features before the definition below */
NUM_NL80211_EXT_FEATURES,
-- 
2.19.2



[PATCH 2/2] mac80211: Set LIVE_IFTYPE_CHANGE if op provided

2019-08-26 Thread Denis Kenzior
Signed-off-by: Denis Kenzior 
---
 net/mac80211/main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 29b9d57df1a3..073e5d10a48e 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -597,6 +597,10 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t 
priv_data_len,
 
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_RRM);
 
+   if (ops->change_interface)
+   wiphy_ext_feature_set(wiphy,
+ NL80211_EXT_FEATURE_LIVE_IFTYPE_CHANGE);
+
wiphy->bss_priv_size = sizeof(struct ieee80211_bss);
 
local = wiphy_priv(wiphy);
-- 
2.19.2



Re: [RFC 0/1] Allow MAC change on up interface

2019-08-20 Thread Denis Kenzior

Hi Dan,

On 8/20/19 4:18 PM, Dan Williams wrote:





Code will be written, but I'd rather it be written once rather than 3+
times for STA/AP/Mesh/etc.



I'm not sure you can state that definitively just yet?  So the real 
question is whether saving the extra round-trip to the kernel is worth 
the in-kernel complexity.  Given that interleaved system calls are 
_always_ a problem, I argue that it is worth it for at least the Station 
case (and it will keep connection times even faster to boot).  Isn't 
minimizing the latency of connections the end goal here?  I get that 
there are trade offs and people have other opinions on what a good trade 
off is.


But don't misunderstand, either solution is better than what we have 
today.  My argument is: "why close the door on a particular solution 
until the costs are known?"



The rest, I'm not sure why you are worried about them now?  For
station
there's a very clear & present use case.  If such a clear and
present
use case is presented for AP or Mesh, then deal with it then.


Why would you not want to pass a random MAC for AP or Mesh modes? The
same reasons for MAC randomization apply for all those too, I'd think.


Umm, I was not arguing against doing that at all?  All I said was that 
no such use case was yet presented.  For AP it isn't typically needed to 
rapidly switch between MAC addresses while keeping the device UP.  If 
you think there's such a need, I'm happy to learn something new? Same 
goes for Mesh really?





I don't see how this will not keep proliferating, and each new
thing
will come with its own dozen lines of code, a new feature flag,
etc.


Such is life? :)


Not really. It's the job of maintainers to balance all these things, to
step back and think of the bigger picture and the future rather than
just solving one particular use-case today.
 > Your tone leaves the impression you want a particular solution pushed
through without the normal planning/architecture discussions that
accompany API changes. And that's not how the process typically works.



So who's attacking who now?  We're trying to solve a long standing issue 
that nobody has bothered to fix for years in a clean way.  Something 
that one of your projects would benefit from, btw.


I have a technical opinion about how it should look like.  Johannes 
might have a different opinion.  In the end it is up to him and I can go 
pound sand.  So yes, I know how the process works ;)


Regards,
-Denis


Re: [RFC 0/1] Allow MAC change on up interface

2019-08-20 Thread Denis Kenzior

Hi Johannes,

On 8/20/19 3:15 PM, Johannes Berg wrote:

On Tue, 2019-08-20 at 14:58 -0500, Denis Kenzior wrote:


But what actual complexity are we talking about here? If the kernel can
do this while the CONNECT is pending, why not?  It makes things simpler
and faster for userspace.  I don't see the downside unless you can
somehow objectively explain 'complexity'.


It's just extra code that we have to worry about. Right now you want it
for CMD_CONNECT and CMD_AUTH. Somebody will come up with a reason to do
it in CMD_ASSOC next, perhaps, who knows. Somebody else will say "oh,
this is how it's done, so let's add it to CMD_JOIN_IBSS", because of
course that's what they care about. OCB? Mesh? AP mode for tethering?
Etc.


I don't buy the extra code argument.  If you want to do something useful 
you need to write 'extra code'.


The rest, I'm not sure why you are worried about them now?  For station 
there's a very clear & present use case.  If such a clear and present 
use case is presented for AP or Mesh, then deal with it then.




I don't see how this will not keep proliferating, and each new thing
will come with its own dozen lines of code, a new feature flag, etc.


Such is life? :)



Relaxing and defining once and for all in which situations while the
interface is up you can actually allow changing the address, and then
having userspace do it that way is IMHO a better way to design the
system, since it forgoes entirely all those questions of when and how
and which new use cases will come up etc.



That would be great in theory, but practically never works at least in 
my experience.  So maybe keep and open mind?  There is a clear need to 
make this path as fast as possible for STA.  There is no such need (yet) 
for the other cases you mentioned.



This was an RFC.  There isn't much point for us to cross all the 't's
and dot all the 'i's if you hate the idea in the first place.


Sure, but I cannot distinguish between "we only want it on CMD_CONNECT"
and "we'll extend this once we agree" unless you actually say so. It'd
help to communicate which t's and i's you didn't cross or dot.


Okay, I'll admit the RFC description could have been better.  But in the 
end you're human last I checked (at least I recall meeting you several 
times? ;)  How about a simple "Why do you think you need this?" first?


Regards,
-Denis


Re: [RFC 0/1] Allow MAC change on up interface

2019-08-20 Thread Denis Kenzior

Hi Johannes,

On 8/20/19 2:43 PM, Johannes Berg wrote:

On Tue, 2019-08-20 at 14:22 -0500, Denis Kenzior wrote:

Hi Johannes,

So keeping things purely technical now :)


I thought so, but I had another thought later. It might be possible to
set LIVE_ADDR_CHANGE, but then block it in mac80211 when the interface
is already connected (or beaconing, or whatever, using the MAC address
in some way - even while scanning, remain-on-channel is active, etc.)



Here's what we would like to see:

- The ability for userspace to add a 'Local Mac Address to use'
attribute on CMD_CONNECT and CMD_AUTHENTICATE.


Why here though? I don't really like this as it's extra complexity
there, and dev_set_mac_address() is really easy to call from userspace.
Yes, that's sort of a round-trip more (you wouldn't even really have to
wait for it I guess), but we have to make trade-offs like that (vs.
kernel complexity) at some point.


But what actual complexity are we talking about here? If the kernel can 
do this while the CONNECT is pending, why not?  It makes things simpler 
and faster for userspace.  I don't see the downside unless you can 
somehow objectively explain 'complexity'.





- It doesn't seem useful to add this to CMD_ASSOCIATE at the moment, as
for new connections you'd always go either through CMD_AUTHENTICATE,
CMD_ASSOCIATE sequence or use CMD_CONNECT.  This should take care of
some (most) of your objections about specifying different addresses for
authenticate/associate.  Feel free to correct me here.


That wasn't really my objection, I was more wondering why James
implemented it *only* for CMD_CONNECT, when I had been under the
(apparently mistaken) impression that one would generally prefer
CMD_ASSOCIATE over CMD_CONNECT, if available.


This was an RFC.  There isn't much point for us to cross all the 't's 
and dot all the 'i's if you hate the idea in the first place.





- Optionally (and I'm thinking of tools like NM here), add the ability
to set the mac address via RTNL while the device is UP but has no
carrier, and things like scanning, offchannel, etc are not in progress.
Though really I don't see how NM could guarantee this without bringing
the device down first, so I'll let NM guys chime in to see if this is a
good idea.


I'm thinking along the lines of letting you do this *instead* of the
scheme you described above. That way, we don't even need to have a
discussion over whether it makes sense at CONNECT, AUTH, ASSOC, or
whatever because you can essentially do it before/with any of these
commands.

Why would this not be sufficient?



It would get the job done.  But it is still a waste of time and still 
slowing us down.  Look at it this way, even if we save 10ms here.  Take 
that and multiply by 3-4 billion devices and then by the number of 
connections one does each day.  This adds up to some serious time 
wasted.  So why not do this elegantly and faster in the first place?



- We definitely do not want to to mess with the device state otherwise.
E.g. no firmware downloading, powering the adapter down, etc.  That just
takes too much time.


I can understand this part.


So tell us what you would like to see.  A new
IFF_NO_CARRIER_ADDRESS_CHANGE or whether it is possible to use
IFF_LIVE_ADDR_CHANGE with some additional restrictions.


I don't know. This is not something I'm intimately familiar with. I
could imagine (as you quoted above) that we just set
IFF_LIVE_ADDR_CHANGE and manage the exclusions local to e.g. mac80211.



Okay, so lets operate under the assumption we can hi-jack 
IFF_LIVE_ADDR_CHANGE for this



I still think you'd have to bake it into the mac80211<->driver API
somehow, because we normally "add_interface()" with the MAC address, and
nothing says that the driver cannot ignore the MAC address from that
point on. The fact that iwlwifi just copies it into every new MAC_CTXT
command and the firmware actually accepts the update seems rather
accidental and therefore fragile to rely on.



Since you seem to have a clear(er?) idea here, can you elaborate or
possibly provide the driver interface changes you want to see?


I can see a few ways of doing this, for example having an optional
"change_vif_addr" method in the mac80211 ops, so that drivers can do the
necessary work. I suppose iwlwifi would actually want to send a new
MAC_CONTEXT command at this time, for example, because the firmware is
notoriously finicky when it comes to command sequences.

Alternatively, and this would work with all drivers, you could just
pretend to remove/add the interface, ie. in mac80211 call

  drv_remove_interface(sdata)
  // set new mac addr
  drv_add_interface(sdata)

This has the advantage that it'll be guaranteed to work with all
drivers, at the expense of perhaps a few more firmware commands
(depending on the driver).

You can probably come 

Re: [RFC 0/1] Allow MAC change on up interface

2019-08-20 Thread Denis Kenzior

Hi Johannes,

On 8/20/19 2:32 PM, Johannes Berg wrote:

Hi Denis,

Rather than replying to all the separate items here, just two things:

1) I'll reiterate - please keep things civil. You're taking things out
of context a *lot* here, in what seems like an attempt to draw a
parallel between my and your utterances.

2) I'll take your point that I've been somewhat dismissive in tone, and
will try to change that.



I'll apologize for the methods I used, but you were not getting to 2) 
above via our earlier discussions.  Anyway, peace.




I do want to reply to two things specifically though:


Fine.  I get that.  But how about asking what the use case is? Or say
you don't quite understand why something is needed?


Really, I should *not* have to ask that. You should consider it your
obligation to provide enough information for me to review patches
without having to go back and ask what it is you actually want to
achieve.


So let me ask you this.  What do you _want_ to see when a contributor 
submits something as an RFC, which that contributor states is not ready 
for 'true' review and is really there to generate feedback?


Do you want to have that contributor use a different prefix? Every 
maintainer is differrent.  I get that.  So if we want to start an actual 
brainstorming session with you, how do we accomplish that?




Compared to some other subsystems and maintainers I've dealt with, I
think I've actually been rather patient in trying to extract the purpose
of your changes, rather than *really* just dismissing them (which I've
felt like many times.)



I'll admit you're not the worst I've dealt with, but you can always 
improve, right? :)



a maintainer who's job (by definition)
is to encourage new contributors and improve the subsystem he
maintains...?


This is what maybe you see as the maintainer's role, but I disagree, at
least to some extent. I see the role more of a supervisor, somebody
who's ensuring that all the changes coming in will work together. Yes, I
have my own incentive to improve things, but if you have a particular
incentive to improve a particular use case, the onus really is on you to
convince me of that, not on me to try to ferret the reasoning out of you
and make those improvements my own.



So this goes back to my earlier point.  How do you want us to start a 
discussion with you such that you don't become a 'supervisor' and 
instead try to understand the pain points first?


And really, we are not expecting you to do the improvements on your own. 
 But you know the subsystem best.  So you really should consider giving 
actual guidance on how to accomplish something in the best way.


Also, look at it from the PoV of any new contributor.  So while I can 
definitely relate to what you're saying here, I think you should put 
yourself in your peer's shoes and try to be more understanding of their 
perspective.  At least make the effort to hear people out...




So please - come with some actual reasoning. This particular thread only
offered "would elminate a few potential race conditions", in the cover
letter, not even the patch itself, so it wasn't very useful. I was
perhaps less than courteous trying to extract the reasoning, but I
shouldn't have to in the first place.



Okay, so we'll work on that.  But this is a two way street too.  And 
sometimes it seems like you're not actually reading the cover letters ;)


Regards,
-Denis


Re: [RFC 0/1] Allow MAC change on up interface

2019-08-20 Thread Denis Kenzior

Hi Johannes,

So keeping things purely technical now :)


I thought so, but I had another thought later. It might be possible to
set LIVE_ADDR_CHANGE, but then block it in mac80211 when the interface
is already connected (or beaconing, or whatever, using the MAC address
in some way - even while scanning, remain-on-channel is active, etc.)



Here's what we would like to see:

- The ability for userspace to add a 'Local Mac Address to use' 
attribute on CMD_CONNECT and CMD_AUTHENTICATE.
- It doesn't seem useful to add this to CMD_ASSOCIATE at the moment, as 
for new connections you'd always go either through CMD_AUTHENTICATE, 
CMD_ASSOCIATE sequence or use CMD_CONNECT.  This should take care of 
some (most) of your objections about specifying different addresses for 
authenticate/associate.  Feel free to correct me here.
- For Fast Transitions, Re-associations, roaming, etc userspace would 
simply omit the 'Local Mac Address to use' attribute and the currently 
set address would be used.
- Optionally (and I'm thinking of tools like NM here), add the ability 
to set the mac address via RTNL while the device is UP but has no 
carrier, and things like scanning, offchannel, etc are not in progress. 
Though really I don't see how NM could guarantee this without bringing 
the device down first, so I'll let NM guys chime in to see if this is a 
good idea.
- We definitely do not want to to mess with the device state otherwise. 
E.g. no firmware downloading, powering the adapter down, etc.  That just 
takes too much time.


The goal here is to keep things as fast as possible.  The randomization 
feature is basically standard on every modern OS.  So users expect it to 
be used on every connection.  Slowing down the connection time by up to 
3 seconds just isn't acceptable.


So tell us what you would like to see.  A new 
IFF_NO_CARRIER_ADDRESS_CHANGE or whether it is possible to use 
IFF_LIVE_ADDR_CHANGE with some additional restrictions.



I still think you'd have to bake it into the mac80211<->driver API
somehow, because we normally "add_interface()" with the MAC address, and
nothing says that the driver cannot ignore the MAC address from that
point on. The fact that iwlwifi just copies it into every new MAC_CTXT
command and the firmware actually accepts the update seems rather
accidental and therefore fragile to rely on.



Since you seem to have a clear(er?) idea here, can you elaborate or 
possibly provide the driver interface changes you want to see?


Regards,
-Denis


Re: [RFC 0/1] Allow MAC change on up interface

2019-08-20 Thread Denis Kenzior

Hi Dan,

On 8/20/19 12:53 PM, Dan Williams wrote:

On Tue, 2019-08-20 at 10:40 -0500, Denis Kenzior wrote:

Hi Johannes,


Stop.

Your tone, and in particular the constant snide comments and
attacks on
me are, quite frankly, getting extremely tiring.



Look, I'm sorry I hit a nerve, but from where I am sitting, it had to
be
said...


But did it really? And in that way?  There were certainly better ways
to go about that response.


The issue is that this isn't the first such occurrence.  There is a 
pattern here and it needs to change.  So +1 on handling this better.




I don't recall seeing a NAK anywhere his email chain (which you'd get
with some other kernel maintainers) but instead (a) an explanation of
why the proposed solution had some problems, (b) some alternative
possibilities and (c) requests for more information so the discussion
could continue.


So the cover letter states:
"Set IFF_LIVE_ADDR_CHANGE on net_device. Note: I know setting this
   where I did is likely not the right way to do it, but for this
   proof-of-concept it works. With guidance I can move this around
   to a proper place."

and I'll leave it up to you to read the first response from the maintainer.



It does the requested changes no good to take that kind of tone. Let's


Neither is:
"don't do that then"

or

"I'm not really sure I see any point in this to start with"

or

"To me, the whole thing seems like more of a problem than a solution."


move on from here and keep things constructive to solve the problem at
hand, which is:

"Changing the MAC address of a WiFi interface takes longer than I'd
like and clears some state that I'd like it to keep."

That is a technical problem we can solve, so let's keep it at that
level.



I'm all for moving on and having the people that know this stuff well 
giving actual guidance, as was requested originally.


Regards,
-Denis


Re: [RFC 0/1] Allow MAC change on up interface

2019-08-20 Thread Denis Kenzior

Hi Johannes,


Stop.

Your tone, and in particular the constant snide comments and attacks on
me are, quite frankly, getting extremely tiring.



Look, I'm sorry I hit a nerve, but from where I am sitting, it had to be 
said...


Regardless.  Peace, I'm not trying to start drama here.  We just want to 
improve things and it feels like you're shutting down every idea without 
truly understanding the issues we're facing.



It almost seems like you're just trying to bully me into taking your
patches by constantly trying to make me feel that I cannot know better
anyway. This is not how you should be treating anyone.



Before lecturing me on my tone, can you go back and re-read your 
responses to many of the contributors here?  I mean really read them? 
Your tone is quite dismissive, whether intentional or not.  When one of 
my guys comes to me and says:
	"Johannes' response was completely useless, it feels like he didn't 
even read my cover letter"


I will come out and call you on it.  So if you don't mean to come off 
that way, great.  We'll just chalk it up to a mis-understanding.



Look, I did say I don't see a point in this, but you're taking that out
of context. I also stated that I didn't understand the whole thing about
"race conditions" and all, because nobody actually explained the
reasoning behind the changes here.


Fine.  I get that.  But how about asking what the use case is? Or say 
you don't quite understand why something is needed?  We'll happily 
explain.  When the first reaction to an RFC is: "I don't see the point" 
or "You're doing it wrong" from a maintainer who's job (by definition) 
is to encourage new contributors and improve the subsystem he 
maintains...?  Well that's kind of a downer, don't you agree?  You're 
the maintainer and you should be held to a higher standard...


I maintain 3 projects, I know the job isn't great, but you still should 
be (more) civil to people...




James, unlike you, managed to reply on point and explain why it was
needed. If all you can do is accuse me of not using the software and
therefore not knowing how it should be used, even implying that I'm not
smart enough to understand the use cases, then I don't know why you
bother replying at all.


Good on James.  I council all my guys to keep cool when dealing with 
upstream.  But that doesn't mean you should be dismissing things out of 
hand on the very first interaction you have with a new contributor.




I can understand your frustration to some extent, and I want to give you
the benefit of doubt and want to believe this behaviour was borne out of
it, since I've been reviewing your changes relatively critically.



Good.  I want you to do that.  The changes are in very tricky areas and 
you know the code best.



However, I also want to believe that I've been (trying to) keep the
discussion on a technical level, telling you why I believe some things
shouldn't be done the way you did them, rather than telling you that
you're not smart enough to be working on this. If you feel otherwise,
please do let me know and I'll go back to understand, in order to
improve my behaviour in the future.


If you interpreted my rants as an assault to your intelligence, then I'm 
sorry.  They really were not meant this way.  But sometimes we really 
had to wonder if you were using the same API we were?  So the question I 
asked above was purely logical consequence of what I was seeing.


You yourself admitted that you have never implemented an event driven 
stack.  So how about listening to the guys that are?


We are using your APIs in different ways.  So instead of questioning why 
or attacking those ways, how about asking whether improvements can be made?


We are facing serious pain points with the API.  So instead of 
dismissing things out of hand, can we work together to improve things?


We are trying to make things fast.  The API is frequently not setup for 
that.  The MAC randomization is just one example.  Bringing down the 
interface (and shutting down the firmware, toggling power state, 
cleaning up resources/state) prior to every connection is just not 
acceptable.  Look at the link I sent.  The Android guys state 3 seconds 
is the typical 'hit'.  This is literally wasting everyone's time.




Please help keep the discussion technical, without demeaning undertones.


Point taken.  And I apologize again.  But please consider what I said above.

Regards,
-Denis


Re: [RFC 0/1] Allow MAC change on up interface

2019-08-19 Thread Denis Kenzior

Hi Johannes,


TBH, I'm not really sure I see any point in this to start with, many
devices will give the address to the firmware when the interface is
brought up (even iwlwifi does - I'm not sure we'd want to take your
patch for iwlwifi even if that works today, nothing says the firmware
might not change its mind on that), and so it's quite likely only going
to be supported in few devices.


Hmm... I sense a pattern of you not seeing a point in doing many 
things... Do you actually use the stuff you maintain?




You've also not really explained what exactly is troubling you with
changing the MAC address, you just mentioned some sort of "race
condition"?


Well, one possible use case might be, oh something like this:

https://source.android.com/devices/tech/connect/wifi-mac-randomization



Now, one thing I can imagine would be that you'd want to optimize

  * ifdown
- remove iface from fw/hw
- stop fw/hw
  * change MAC
  * ifup
- start fw/hw
- add iface to fw/hw

to just

  * ifdown
- remove iface from fw/hw
  * change MAC
  * ifup
- add iface to fw/hw



That would be a part of it...


i.e. not restart the firmware (which takes time) for all this, but that
seems much easier to solve by e.g. having a combined operation for all
of this that gets handled in mac80211, or more generally by having a
"please keep the firmware running" token that you can hold while you do
the operation?


And also maybe a bunch of other optimizations like not flushing scan 
results?




Your changes are also a bit strange - you modified the "connect" path
and iwlwifi, but the connect path is not usually (other than with iw or
even iwconfig) taken for iwlwifi? And if you modify auth/assoc paths,
you get into even weirder problems - what if you use different addresses
for auth and assoc? What if the assoc (or even connect) really was a
*re*assoc, and thus must have the same MAC address? To me, the whole
thing seems like more of a problem than a solution.



Okay, so there are some obstacles.  But can you get over the whole 
"Don't hold it like this" part and actually offer up something constructive?


Regards,
-Denis


[RFCv2 4/4] nl80211: Send large new_wiphy events

2019-08-16 Thread Denis Kenzior
Send large NEW_WIPHY events on a new multicast group so that clients
that can accept larger messages do not need to round-trip to the kernel
and perform extra filtered wiphy dumps.

A new multicast group is introduced and the large message is sent before
the legacy message.  This way clients that listen on both multicast
groups can ignore duplicate legacy messages if needed.

Signed-off-by: Denis Kenzior 
---
 include/uapi/linux/nl80211.h |  1 +
 net/wireless/nl80211.c   | 28 +++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index 822851d369ab..b9c1cf29cf09 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -50,6 +50,7 @@
 #define NL80211_MULTICAST_GROUP_MLME   "mlme"
 #define NL80211_MULTICAST_GROUP_VENDOR "vendor"
 #define NL80211_MULTICAST_GROUP_NAN"nan"
+#define NL80211_MULTICAST_GROUP_CONFIG2"config2"
 #define NL80211_MULTICAST_GROUP_TESTMODE   "testmode"
 
 /**
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 24b67de99f3a..9ba9e1938d6b 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -46,6 +46,7 @@ enum nl80211_multicast_groups {
NL80211_MCGRP_MLME,
NL80211_MCGRP_VENDOR,
NL80211_MCGRP_NAN,
+   NL80211_MCGRP_CONFIG2,
NL80211_MCGRP_TESTMODE /* keep last - ifdef! */
 };
 
@@ -56,6 +57,7 @@ static const struct genl_multicast_group nl80211_mcgrps[] = {
[NL80211_MCGRP_MLME] = { .name = NL80211_MULTICAST_GROUP_MLME },
[NL80211_MCGRP_VENDOR] = { .name = NL80211_MULTICAST_GROUP_VENDOR },
[NL80211_MCGRP_NAN] = { .name = NL80211_MULTICAST_GROUP_NAN },
+   [NL80211_MCGRP_CONFIG2] = { .name = NL80211_MULTICAST_GROUP_CONFIG2 },
 #ifdef CONFIG_NL80211_TESTMODE
[NL80211_MCGRP_TESTMODE] = { .name = NL80211_MULTICAST_GROUP_TESTMODE }
 #endif
@@ -14730,12 +14732,34 @@ void nl80211_notify_wiphy(struct 
cfg80211_registered_device *rdev,
  enum nl80211_commands cmd)
 {
struct sk_buff *msg;
+   size_t alloc_size;
struct nl80211_dump_wiphy_state state = {};
 
WARN_ON(cmd != NL80211_CMD_NEW_WIPHY &&
cmd != NL80211_CMD_DEL_WIPHY);
 
-   msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+   if (cmd == NL80211_CMD_NEW_WIPHY) {
+   state.large_message = true;
+   alloc_size = 8192UL;
+   } else
+   alloc_size = NLMSG_DEFAULT_SIZE;
+
+   msg = nlmsg_new(alloc_size, GFP_KERNEL);
+   if (!msg)
+   goto legacy;
+
+   if (nl80211_send_wiphy(rdev, cmd, msg, 0, 0, 0, &state) < 0) {
+   nlmsg_free(msg);
+   goto legacy;
+   }
+
+   genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
+   NL80211_MCGRP_CONFIG2, GFP_KERNEL);
+
+legacy:
+   state.large_message = false;
+   alloc_size = NLMSG_DEFAULT_SIZE;
+   msg = nlmsg_new(alloc_size, GFP_KERNEL);
if (!msg)
return;
 
@@ -14763,6 +14787,8 @@ void nl80211_notify_iface(struct 
cfg80211_registered_device *rdev,
return;
}
 
+   genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
+   NL80211_MCGRP_CONFIG2, GFP_KERNEL);
genlmsg_multicast_netns(&nl80211_fam, wiphy_net(&rdev->wiphy), msg, 0,
NL80211_MCGRP_CONFIG, GFP_KERNEL);
 }
-- 
2.21.0



[RFCv2 1/4] nl80211: Fix broken non-split wiphy dumps

2019-08-16 Thread Denis Kenzior
If a (legacy) client requested a wiphy dump but did not provide the
NL80211_ATTR_SPLIT_WIPHY_DUMP attribute, the dump was supposed to be
composed of purely non-split NEW_WIPHY messages, with 1 wiphy per
message.  At least this was the intent after commit:
3713b4e364ef ("nl80211: allow splitting wiphy information in dumps")

However, in reality the non-split dumps were broken very shortly after.
Perhaps around commit:
fe1abafd942f ("nl80211: re-add channel width and extended capa advertising")

The reason for the bug is a missing setting of split_start to 0 in the
case of a non-split dump.

Here is a sample non-split dump performed on kernel 4.19, some parts
were cut for brevity:
< Request: Get Wiphy (0x01) len 0 [ack,0x300]
> Result: New Wiphy (0x03) len 3496 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)

> Result: New Wiphy (0x03) len 68 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
Extended Capabilities: len 8
Capability: bit  2: Extended channel switching
Capability: bit 62: Opmode Notification
Extended Capabilities Mask: len 8
04 00 00 00 00 00 00 40  ...@
VHT Capability Mask: len 12
f0 1f 80 33 ff ff 00 00 ff ff 00 00  ...3
> Result: New Wiphy (0x03) len 28 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
> Result: New Wiphy (0x03) len 28 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
> Result: New Wiphy (0x03) len 52 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
Max CSA Counters: len 1
02   .
Scheduled Scan Maximum Requests: len 4
01 00 00 00  
Extended Features: len 4
02 02 00 04  
> Result: New Wiphy (0x03) len 36 [multi]
Wiphy: 0 (0x)
Wiphy Name: phy0
Generation: 1 (0x0001)
Reserved: len 4
00 00 00 00  
> Complete: Get Wiphy (0x01) len 4 [multi]
Status: 0

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 1a107f29016b..b9b0199b5ec6 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2173,6 +2173,9 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
 * but break unconditionally so unsplit data stops here.
 */
state->split_start++;
+
+   if (!state->split)
+   state->split_start = 0;
break;
case 9:
if (rdev->wiphy.extended_capabilities &&
-- 
2.21.0



[RFCv2 3/4] nl80211: Don't split-dump for clients with large buffers

2019-08-16 Thread Denis Kenzior
Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 682a095415eb..24b67de99f3a 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2498,6 +2498,22 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, 
struct netlink_callback *cb)
rtnl_unlock();
return ret;
}
+
+   /*
+* auto-detect support for large buffer sizes: af_netlink
+* will allocate skbufs larger than 4096 in cases where
+* it detects that the client receive buffer (given to
+* recvmsg) is bigger.  In such cases we can assume that
+* performing split dumps is wasteful since the client
+* can likely safely consume the entire un-split wiphy
+* message in one go without the extra message header
+* overhead.
+*/
+   if (skb_tailroom(skb) > 4096) {
+   state->large_message = true;
+   state->split = false;
+   }
+
cb->args[0] = (long)state;
}
 
@@ -2531,6 +2547,7 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, struct 
netlink_callback *cb)
 * We can then retry with the larger buffer.
 */
if ((ret == -ENOBUFS || ret == -EMSGSIZE) &&
+   !state->large_message &&
!skb->len && !state->split &&
cb->min_dump_alloc < 4096) {
cb->min_dump_alloc = 4096;
-- 
2.21.0



[RFCv2 2/4] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

2019-08-16 Thread Denis Kenzior
For historical reasons, NEW_WIPHY messages generated by dumps or
GET_WIPHY commands were limited to 4096 bytes due to userspace tools
using limited buffers.  Once the sizes NEW_WIPHY messages exceeded these
sizes, split dumps were introduced.  All any non-legacy data was added
only to messages using split-dumps (including filtered dumps).

However, split-dumping has quite a significant overhead.  On cards
tested, split dumps generated message sizes 1.7-1.8x compared to
non-split dumps, while still comfortably fitting into an 8k buffer.  The
kernel now expects userspace to provide 16k buffers by default, and 32k
buffers are possible.

Introduce a concept of a large message, so that if the kernel detects
that userspace has provided a buffer of sufficient size, a non-split
message could be generated.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 51 ++
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index b9b0199b5ec6..682a095415eb 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -1839,6 +1839,7 @@ struct nl80211_dump_wiphy_state {
long start;
long split_start, band_start, chan_start, capa_start;
bool split;
+   bool large_message;
 };
 
 static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
@@ -2027,7 +2028,8 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
 
if (nl80211_msg_put_channel(
msg, &rdev->wiphy, chan,
-   state->split))
+   state->split ||
+   state->large_message))
goto nla_put_failure;
 
nla_nest_end(msg, nl_freq);
@@ -2072,7 +2074,7 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
i = nl80211_add_commands_unsplit(rdev, msg);
if (i < 0)
goto nla_put_failure;
-   if (state->split) {
+   if (state->split || state->large_message) {
CMD(crit_proto_start, CRIT_PROTOCOL_START);
CMD(crit_proto_stop, CRIT_PROTOCOL_STOP);
if (rdev->wiphy.flags & WIPHY_FLAG_HAS_CHANNEL_SWITCH)
@@ -2111,7 +2113,8 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
/* fall through */
case 6:
 #ifdef CONFIG_PM
-   if (nl80211_send_wowlan(msg, rdev, state->split))
+   if (nl80211_send_wowlan(msg, rdev,
+   state->split || state->large_message))
goto nla_put_failure;
state->split_start++;
if (state->split)
@@ -2126,7 +2129,8 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
goto nla_put_failure;
 
if (nl80211_put_iface_combinations(&rdev->wiphy, msg,
-  state->split))
+  state->split ||
+  state->large_message))
goto nla_put_failure;
 
state->split_start++;
@@ -2145,7 +2149,7 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
 * dump is split, otherwise it makes it too big. Therefore
 * only advertise it in that case.
 */
-   if (state->split)
+   if (state->split || state->large_message)
features |= NL80211_FEATURE_ADVERTISE_CHAN_LIMITS;
if (nla_put_u32(msg, NL80211_ATTR_FEATURE_FLAGS, features))
goto nla_put_failure;
@@ -2170,13 +2174,20 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
 *
 * We still increment split_start so that in the split
 * case we'll continue with more data in the next round,
-* but break unconditionally so unsplit data stops here.
+* but break unless large_messages are requested, so
+* legacy unsplit data stops here.
 */
state->split_start++;
 
-   if (!state->split)
+   if (state->split)
+   break;
+
+   if (!state->large_message) {
state->split_start = 0;
-   break;
+   break;
+   }
+
+   /* Fall through */
case 9:
  

BUG: atk9k/mac80211 CONTROL_PORT_FRAME processing + MFP

2019-08-15 Thread Denis Kenzior

Hi,

We're getting reports of weird behavior on ath9k/mac80211 devices when 
CONTROL_PORT_FRAME support is used.  iwlwifi, works fine under the same 
circumstances.  If CONTROL_PORT_FRAME is disabled and the old legacy PAE 
transport is used, everything works fine.


The short version:
 - We Connect with MFP enabled
 - Handshake packet 1 & 2 is exchanged
 - Handshake packet 3 is received & reply sent
 - Keys are set
 - AP never receives our packet 4, or the card drops it locally
 - Subsequent retransmissions are sent un-encrypted by the AP and are 
probably sent encrypted (via MFP) by mac80211.  Since the driver never 
sets NO_ENCRYPT flag, userspace has no knowledge to try and send the 
reply un-encrypted.


Here's a log (some parts are removed for brevity, but if someone wants 
the full log, I can provide this as well):


< Request: Connect (0x2e) len 160 [ack] 
1565892849.337243

Interface Index: 13 (0x000d)
Wiphy Frequency: 5805 (0x16ad)
MAC Address 10:C3:7B:54:74:D4
SSID: len 9
Auth Type: 0 (0x)
Privacy: true
Interface Socket Owner: true
Cipher Suites Pairwise:
CCMP (00:0f:ac) suite  04
Cipher Suite Group: CCMP (00:0f:ac) suite  04
Use MFP: 1 (0x0001)
AKM Suites:
PSK; RSNA PSK (00:0f:ac) suite  02
WPA Versions: 2 (0x0002)
Control Port: true
Control Port over NL80211: true
Use RRM: true
Information Elements: len 41
RSN:
Group Data Cipher Suite: len 4
CCMP (00:0f:ac) suite  04
Pairwise Cipher Suite: len 4
CCMP (00:0f:ac) suite  04
AKM Suite: len 4
PSK; RSNA PSK (00:0f:ac) suite  02
RSN capabilities: bits  2 - 3: 1 replay counter per PTKSA
RSN capabilities: bits  4 - 5: 1 replay counter per GTKSA
RSN capabilities: bit  7: Management Frame Protection Capable
01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f 

ac 02 80 00   


RM Enabled Capabilities: len 5
Operating Channel Max Measurement Duration: 0
Non-Operating Channel Max Measurement Duration: 0
Measurement Pilot Capability: 0
00 00 00 00 00   . 


Extended Capabilities: len 10
Capability: bit 62: Opmode Notification
00 00 00 00 00 00 00 40 00 00...@.. 

> Event: New Station (0x13) len 32 
1565892851.673886

Interface Index: 13 (0x000d)
MAC Address 10:C3:7B:54:74:D4
Generation: 5 (0x0005)
Station Info: len 0
> Response: Connect (0x2e) len 4 [0x100] 
1565892851.673971

Status: Success (0)
> Event: Authenticate (0x25) len 64 
1565892851.676737

Wiphy: 1 (0x0001)
Interface Index: 13 (0x000d)
Frame: len 41



Here comes the Handshake packet 1 from the AP.  Why the hell is it here 
prior to Authenticate ?  But whatever ;)


> Event: Control Port Frame (0x81) len 176 
1565892851.686055

Wiphy: 1 (0x0001)
Interface Index: 13 (0x000d)
Wireless Device: 4294967299 (0x00010003)
MAC Address 10:C3:7B:54:74:D4
Control Port Ethertype: 34958 (0x888e)
Frame: len 121
Protocol Version: 2 (802.1X-2004)
Type: 3 (Key)
Length: 117
Descriptor Type: 2
Key MIC: false
Secure: false
Error: false
Request: false
Encrypted Key Data: false
SMK Message: false
Key Descriptor Version: 2 (02)
Key Type: true
Install: false
Key ACK: true
Key Length: 16
Key Replay Counter: 0
Key NONCE
e4 5b 1d 98 fd db 54 ae 54 98 fc 39 69 f5 bc 0b 
.[T.T..9i...
f6 a3 aa c5 50 43 9a 6d 27 e5 a4 bb e9 c5 6d de 
PC.m'.m.

Key IV
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 


Key RSC
00 00 00 00 00 00 00 00   


Key MIC Data
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 


Key Data: len 22
Vendor specific: len 20
IEEE 802.11 (00:0f:ac) type: 04
PMKID KDE
00 0f ac 04 d9 81 f4 29 57 31 7e ad 33 57 b8 af 
...)W1~.3W..
c7 a7 40 8f  ..@. 


02 03 00 75 02 00 8a 00 10 00 00 00 00 00 00 00  ...u
00 e4 5b 1d 98 fd db 54 ae 54 98 fc 39 69 f5 bc  ..[T.T..9i..
0b f6 a3 aa c5 50 43 9a 6d 27 e5 a4 bb e9 c5 6d  .PC.m'.m
de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 16 dd 14 00 0f ac 04 d9 81 f4 29 57 31 7e  )W1~
ad

Re: [RFCv1 2/2] nl80211: Don't split-dump for clients with large buffers

2019-08-01 Thread Denis Kenzior

Hi Johannes,

On 8/1/19 4:13 AM, Johannes Berg wrote:

On Thu, 2019-08-01 at 02:14 -0500, Denis Kenzior wrote:


+   /*
+* auto-detect support for large buffer sizes: af_netlink
+* will allocate skbufs larger than 4096 in cases where
+* it detects that the client receive buffer (given to
+* recvmsg) is bigger.  In such cases we can assume that
+* performing split dumps is wasteful since the client
+* can likely safely consume the entire un-split wiphy
+* message in one go without the extra message header
+* overhead.
+*/
+   if (skb_tailroom(skb) > 4096) {
+   state->large_message = true;
+   state->split = false;
+   }


Hmm. That's kinda a neat idea, but I don't think it's a good idea. Have
you checked how long the message is now?


I only have hwsim and a ath9k to test with.  The hwsim comes out to be 
4448 bytes with an updated version of my patch.  ath9k is smaller.  V1 
did not address some extra gotcha code which didn't include some 
attributes in the unsplit case.  As far as I can tell all the attributes 
are now identical with v2.


Note that the overhead for split dumps is actually quite big.  The same 
info using split-dumps is 7724 bytes.  So there would definitely be an 
advantage in not using such fragmentation if not needed...




Since we *did* in fact hit the previous limit, and have added a *lot* of
things since then (this was years ago, after all), I wouldn't be
surprised if we're reasonably close to the new limit you propose even
now already.


It seems not?



Also, keep in mind that there are some devices that just have an
*enormous* amount of channels, and that's only going to increase (right
now with 6/7 GHz, etc.)


The 2.4 & 5 Ghz bands account for about 2k.  So even if we add another 
band, we're likely still within an 8k buffer.  And really the kernel 
recommends a 16k buffer to be used as a minimum...


Also, the way nl80211 encodes channel information is really quite 
wasteful.  Not sure if anything can be done about it now, but the flags 
really, really, really add up.  So there is significant savings to be 
had here...




So in general, given all the variable things we have here, all this
buffer size estimation doesn't seem very robust to me. You could have
any number of variable things in a message:
  * channel list - which we alleviated somewhat by having a separate
channel dump, so not all data is included here (which I guess you'll
complain about next :P)


Not sure I follow?  I don't see a separate channel dump?  Can you point 
me in the right direction?



  * nl80211_send_mgmt_stypes() things are also a bit variable, and we
keep adding interface types etc., and some devices may support lots
of frames (there's an upper bound, but it's not that small)
  * interface combinations - only getting more complex with more complex
devices and more concurrency use cases
  * vendor commands have no real limit
  * I'm sure measurement use cases will only increases
  * and generally of course we keep adding to everything


Also, I don't really buy the *need* for this since you're just removing
a few kernel/user roundtrips here when new devices are discovered, a
rare event. The parsing isn't really any more complicated for the
userspace side.



roundtrips to the kernel introduce races.  The less potential for a 
race, the less code we have to write and the less buggy it is.  Pretty 
simple...




Regarding the other patch, I think most of the above also applies there.
I can sort of see how you think it's *nice* to have all the data right
there, but I really don't see why you're so hung up about having to
request the full information ... And I really don't want to see this hit
the wall again in the future, in some weird scenarios with devices that
have lots of .



See above...




It should be safe to assume that any users of these new unsolicited
NEW_WIPHY events are non-legacy clients, which can use a
larger receive buffer for netlink messages.  Since older, legacy clients
did not utilize NEW_WIPHY events (they did not exist), it is assumed
that even if the client receives such a message (even if truncated), no
harm would result and backwards-compatibility would be kept.


Interesting idea, but no, in general you cannot assume that. Older
clients might have added support for NEW_WIPHY without fixing the split
dumps first ...


The two commits are over a year apart, but okay, fair enough.  Then 
again, you sort of hinted that nobody used this anyhow.


But regardless, if this mythical legacy/broken client is truly a 
concern, we can introduce a NEW_WIPHY_BIG or something.




Also, you mention in the code that messages are tr

[RFCv1 1/2] nl80211: Support >4096 byte NEW_WIPHY event nlmsg

2019-08-01 Thread Denis Kenzior
For historical reasons, NEW_WIPHY messages generated by dumps or
GET_WIPHY commands were limited to 4096 bytes due to userspace tools
using limited buffers.  Once the sizes NEW_WIPHY messages exceeded these
sizes, split dumps were introduced.  All any non-legacy data was added
only to messages using split-dumps (including filtered dumps).

When unsolicited NEW_WIPHY events were introduced they inherited the
4096 byte limitation.  These messages thus do not contain any non-legacy
wiphy dump data.  This means that userspace still needs to re-dump the
information from the kernel after receiving such NEW_WIPHY event since
some of the information is missing.  Thus it is desirable to relax such
restrictions for these messages and include the non-legacy data in these
events.

It should be safe to assume that any users of these new unsolicited
NEW_WIPHY events are non-legacy clients, which can use a
larger receive buffer for netlink messages.  Since older, legacy clients
did not utilize NEW_WIPHY events (they did not exist), it is assumed
that even if the client receives such a message (even if truncated), no
harm would result and backwards-compatibility would be kept.
---
 net/wireless/nl80211.c | 49 ++
 1 file changed, 40 insertions(+), 9 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 1a107f29016b..6774072e836f 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -1839,6 +1839,7 @@ struct nl80211_dump_wiphy_state {
long start;
long split_start, band_start, chan_start, capa_start;
bool split;
+   bool large_message;
 };
 
 static int nl80211_send_wiphy(struct cfg80211_registered_device *rdev,
@@ -2168,12 +2169,23 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
 * helps ensure that newly added capabilities don't break
 * older tools by overrunning their buffers.
 *
+* For unsolicited NEW_WIPHY notifications, it is assumed
+* that the client can handle larger messages.  Unsolicited
+* NEW_WIPHY notifications were added relatively recently
+* and it is not expected that older tools with limited
+* buffers would utilize these messages anyway.  E.g. even
+* if the message is truncated, it would not have been
+* used regardless.
+*
 * We still increment split_start so that in the split
 * case we'll continue with more data in the next round,
-* but break unconditionally so unsplit data stops here.
+* but break unless large_messages are requested, so
+* legacy unsplit data stops here.
 */
state->split_start++;
-   break;
+   if (state->split || !state->large_message)
+   break;
+   /* Fall through */
case 9:
if (rdev->wiphy.extended_capabilities &&
(nla_put(msg, NL80211_ATTR_EXT_CAPA,
@@ -2215,7 +2227,9 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
}
 
state->split_start++;
-   break;
+   if (state->split)
+   break;
+   /* Fall through */
case 10:
if (nl80211_send_coalesce(msg, rdev))
goto nla_put_failure;
@@ -2231,7 +2245,9 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
goto nla_put_failure;
 
state->split_start++;
-   break;
+   if (state->split)
+   break;
+   /* Fall through */
case 11:
if (rdev->wiphy.n_vendor_commands) {
const struct nl80211_vendor_cmd_info *info;
@@ -2267,7 +2283,9 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
nla_nest_end(msg, nested);
}
state->split_start++;
-   break;
+   if (state->split)
+   break;
+   /* Fall through */
case 12:
if (rdev->wiphy.flags & WIPHY_FLAG_HAS_CHANNEL_SWITCH &&
nla_put_u8(msg, NL80211_ATTR_MAX_CSA_COUNTERS,
@@ -2309,7 +2327,9 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
}
 
state->split_start++;
-   break;
+   if (state->split)
+   break;
+   /* Fall through */
case 13:
if (rdev->wiphy.num_iftype_ext_capab &&
rdev->wiphy.iftype_ext_capab) {
@@ -2377,13 +2397,17 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
}
 
sta

[RFCv1 2/2] nl80211: Don't split-dump for clients with large buffers

2019-08-01 Thread Denis Kenzior
---
 net/wireless/nl80211.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 6774072e836f..e1707cfd9076 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2496,6 +2496,22 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, 
struct netlink_callback *cb)
rtnl_unlock();
return ret;
}
+
+   /*
+* auto-detect support for large buffer sizes: af_netlink
+* will allocate skbufs larger than 4096 in cases where
+* it detects that the client receive buffer (given to
+* recvmsg) is bigger.  In such cases we can assume that
+* performing split dumps is wasteful since the client
+* can likely safely consume the entire un-split wiphy
+* message in one go without the extra message header
+* overhead.
+*/
+   if (skb_tailroom(skb) > 4096) {
+   state->large_message = true;
+   state->split = false;
+   }
+
cb->args[0] = (long)state;
}
 
@@ -2529,6 +2545,7 @@ static int nl80211_dump_wiphy(struct sk_buff *skb, struct 
netlink_callback *cb)
 * We can then retry with the larger buffer.
 */
if ((ret == -ENOBUFS || ret == -EMSGSIZE) &&
+   !state->large_message &&
!skb->len && !state->split &&
cb->min_dump_alloc < 4096) {
cb->min_dump_alloc = 4096;
-- 
2.21.0



Re: [PATCH v3 2/3] nl80211: Limit certain commands to interface owner

2019-07-31 Thread Denis Kenzior

Hi Johannnes,

On 7/31/19 4:51 AM, Johannes Berg wrote:

On Mon, 2019-07-01 at 10:33 -0500, Denis Kenzior wrote:

If the wdev object has been created (via NEW_INTERFACE) with
SOCKET_OWNER attribute set, then limit certain commands only to the
process that created that wdev.

This can be used to make sure no other process on the system interferes
by sending unwanted scans, action frames or any other funny business.

This patch introduces a new internal flag, and checks that flag in the
pre_doit hook.


So, looking at this ...

I can't say I'm convinced. You're tagging 35 out of about 106 commands,
and even if a handful of those are new and were added after your patch,
this doesn't really make sense.

NL80211_CMD_LEAVE_IBSS is tagged, but not NL80211_CMD_LEAVE_MESH?
NL80211_CMD_NEW_STATION is tagged, but not NL80211_CMD_NEW_MPATH?
NL80211_CMD_SET_KEY is tagged, but not NL80211_CMD_SET_PMK or
NL80211_CMD_SET_PMKSA?
NL80211_CMD_UPDATE_CONNECT_PARAMS is tagged, but not
NL80211_CMD_UPDATE_OWE_INFO (though this could be patch crossing?)

NL80211_CMD_CONTROL_PORT_FRAME isn't tagged?


So for some of these I was planning to submit a patch to check that the 
request comes from the SOCKET_OWNER for the connection itself.  E.g. it 
makes no sense to accept CONTROL_PORT_FRAME from a process that didn't 
issue the CMD_CONNECT.  Same applies for some of the offload commands. 
But I can include these in this list as well if you prefer.


For others, it was pure ignorance as to what the commands do.  E.g. we 
haven't looked into Mesh at all.  So if you want to suggest which 
commands should be included, I'm happy to add these.




NL80211_CMD_SET_QOS_MAP isn't tagged?

It almost feels like you just did a "git grep NL80211_CMD_" on your
code, and then dropped the flag on everything you were using.

And honestly, I think you need a better justification than just
"unwanted scans, action frames or any other funny business".



We have a limited resource that we are managing in userspace.  We can't 
just have any random process coming in and messing with that resource. 
So either the userspace daemon should do it or the kernel.  Right now it 
is just pure chaos...


And really, in the end, how is this different from SOCKET_OWNER for 
CMD_CONNECT?  It is optional in the end, so if you don't want to use it, 
don't?



Also, how's this not just a workaround for some very specific setup
issue you were seeing, where people trying out iwd didn't remove wpa_s
properly (*)? I'm really not convinced that this buys us anything except
in very limited development scenarios - and those are typically the
exact scenarios where you _want_ to be able to do things like that (and
honestly, I'd be pretty pissed off if I couldn't do an "iw wlan0 scan"
just because some tool decided it wanted to have control over things).


I understand where you're coming from, but you're just one user who can 
disable this behavior anyway if you really cared.  For 99.9% of the 
users this is never going to be a problem.  And I'd rather cater to the 
99%...




(*) also, that would just happen to work for you now with iwd winning
because you claim ownership and wpa_s doesn't, you'd still get the same
complaints "iwd doesn't work" if/when wpa_s *does* start to claim
ownership and you get locked out with a patch like this, so I don't feel
you'd actually win much even in this case.



This is in no way the motivation for this.  wpa_s or iwd winning is a 
distro/user configuration problem.  I don't care about that now. 
Besides, this was mostly taken care of by the SOCKET_OWNER set on 
CMD_CONNECT...




I'm trying to come up with places where we do something similar, defend
one application running as root against another ... but can't really?
Think about VPN - we don't stop anying from removing or adding IP
addresses that the VPN application didn't intend to use, yet that can
obviously break your connection. You could even run dhcp on it, even if
for (most) VPN protocols that's rather useless.


File locking would be one example.  Systemd can and does all kinds of 
fun stuff (e.g. locking a process out from twiddling rfkill).  LSM 
modules can do just about everything.  I think there are plenty of examples.


But really, a lot of this works just because various processes play 
'nice' and stay out of each other's way.  Also, as you point out, most 
things aren't done because they don't make sense.


But with nl80211 this isn't the case.  Many processes would be tempted 
to start an operation or get some info out of nl80211 directly.  So this 
patch tries to narrow down what they can use, e.g. informational-only 
commands are fine.  Anything that can affect state is not fine.




Overall, I'm not really convinced. The design is rather uncle

Re: [PATCH v4 2/3] nl80211: Limit certain commands to interface owner

2019-07-29 Thread Denis Kenzior

Hi Johannes,

On 7/22/19 6:33 AM, Denis Kenzior wrote:

If the wdev object has been created (via NEW_INTERFACE) with
SOCKET_OWNER attribute set, then limit certain commands only to the
process that created that wdev.

This can be used to make sure no other process on the system interferes
by sending unwanted scans, action frames or any other funny business.

This patch introduces a new internal flag, and checks that flag in the
pre_doit hook.

Signed-off-by: Denis Kenzior 
---
  net/wireless/nl80211.c | 78 --
  1 file changed, 60 insertions(+), 18 deletions(-)

Changes in v4:
   -  Minor restructuring suggested by Arend

Changes in v3:
   -  Fix minor locking mistake reported by kernel test robot

Changes in v2:
   -  None


I noticed that the other patches in this series got applied.  Was this 
one left out on purpose?


Regards,
-Denis



Re: [PATCH v2] cfg80211: use parallel_ops for genl

2019-07-29 Thread Denis Kenzior

On 7/29/19 9:31 AM, Johannes Berg wrote:

From: Johannes Berg 

Over time, we really need to get rid of all of our global locking.
One of the things needed is to use parallel_ops. This isn't really
the most important (RTNL is much more important) but OTOH we just
keep adding uses of genl_family_attrbuf() now. Use .parallel_ops to
disallow this.

Signed-off-by: Johannes Berg 
Link: https://lore.kernel.org/r/20190726191621.5031-1-johan...@sipsolutions.net
Signed-off-by: Johannes Berg 
---
  net/wireless/nl80211.c | 108 +
  1 file changed, 78 insertions(+), 30 deletions(-)



Reviewed-By: Denis Kenzior 

Regards,
-Denis



Re: [PATCH] cfg80211: use parallel_ops for genl

2019-07-27 Thread Denis Kenzior

Hi Johannes,

On 7/26/19 2:16 PM, Johannes Berg wrote:

From: Johannes Berg 

Over time, we really need to get rid of all of our global locking.
One of the things needed is to use parallel_ops. This isn't really
the most important (RTNL is much more important) but OTOH we just
keep adding uses of genl_family_attrbuf() now. Use .parallel_ops to
disallow this.

Signed-off-by: Johannes Berg 
---
  net/wireless/nl80211.c | 112 +
  1 file changed, 81 insertions(+), 31 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 10b57aa10227..59aefcd7ccb6 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -749,19 +749,29 @@ int nl80211_prepare_wdev_dump(struct netlink_callback *cb,
int err;
  
  	if (!cb->args[0]) {

+   struct nlattr **attrbuf;
+
+   attrbuf = kcalloc(NUM_NL80211_ATTR, sizeof(*attrbuf),
+ GFP_KERNEL);
+   if (!attrbuf)
+   return -ENOMEM;
+
err = nlmsg_parse_deprecated(cb->nlh,
 GENL_HDRLEN + nl80211_fam.hdrsize,
-genl_family_attrbuf(&nl80211_fam),
-nl80211_fam.maxattr,
+attrbuf, nl80211_fam.maxattr,
 nl80211_policy, NULL);
-   if (err)
+   if (err) {
+   kfree(attrbuf);
return err;
+   }
  
-		*wdev = __cfg80211_wdev_from_attrs(

-   sock_net(cb->skb->sk),
-   genl_family_attrbuf(&nl80211_fam));
-   if (IS_ERR(*wdev))
+   *wdev = __cfg80211_wdev_from_attrs(sock_net(cb->skb->sk),
+  attrbuf);
+   kfree(attrbuf);
+   if (IS_ERR(*wdev)) {
+   kfree(attrbuf);


Hmm, you just freed attrbuf above?


return PTR_ERR(*wdev);
+   }
*rdev = wiphy_to_rdev((*wdev)->wiphy);
/* 0 is the first index - add 1 to parse only once */
cb->args[0] = (*rdev)->wiphy_idx + 1;





@@ -12846,24 +12880,32 @@ static int nl80211_prepare_vendor_dump(struct sk_buff 
*skb,
return 0;
}
  
+	attrbuf = kcalloc(NUM_NL80211_ATTR, sizeof(*attrbuf), GFP_KERNEL);

+   if (!attrbuf)
+   return -ENOMEM;
+
err = nlmsg_parse_deprecated(cb->nlh,
 GENL_HDRLEN + nl80211_fam.hdrsize,
 attrbuf, nl80211_fam.maxattr,
 nl80211_policy, NULL);
if (err)
-   return err;
+   goto out;
  
  	if (!attrbuf[NL80211_ATTR_VENDOR_ID] ||

-   !attrbuf[NL80211_ATTR_VENDOR_SUBCMD])
-   return -EINVAL;
+   !attrbuf[NL80211_ATTR_VENDOR_SUBCMD]) {
+   err = -EINVAL;
+   goto out;
+   }


Might be nicer to just set err = -EINVAL before the if instead of using 
{} here


  
  	*wdev = __cfg80211_wdev_from_attrs(sock_net(skb->sk), attrbuf);

if (IS_ERR(*wdev))
*wdev = NULL;
  
  	*rdev = __cfg80211_rdev_from_attrs(sock_net(skb->sk), attrbuf);

-   if (IS_ERR(*rdev))
-   return PTR_ERR(*rdev);
+   if (IS_ERR(*rdev)) {
+   err = PTR_ERR(*rdev);
+   goto out;
+   }
  
  	vid = nla_get_u32(attrbuf[NL80211_ATTR_VENDOR_ID]);

subcmd = nla_get_u32(attrbuf[NL80211_ATTR_VENDOR_SUBCMD]);
@@ -12876,15 +12918,19 @@ static int nl80211_prepare_vendor_dump(struct sk_buff 
*skb,
if (vcmd->info.vendor_id != vid || vcmd->info.subcmd != subcmd)
continue;
  
-		if (!vcmd->dumpit)

-   return -EOPNOTSUPP;
+   if (!vcmd->dumpit) {
+   err = -EOPNOTSUPP;
+   goto out;
+   }


Same thing here, setting err = -EOPNOTSUPP before the for...

  
  		vcmd_idx = i;

break;
}
  
-	if (vcmd_idx < 0)

-   return -EOPNOTSUPP;
+   if (vcmd_idx < 0) {
+   err = -EOPNOTSUPP;
+   goto out;
+   }
  
  	if (attrbuf[NL80211_ATTR_VENDOR_DATA]) {

data = nla_data(attrbuf[NL80211_ATTR_VENDOR_DATA]);




Otherwise LGTM.

Feel free to add: Reviewed-by: Denis Kenzior 

Regards,
-Denis


[PATCH v4 1/3] nl80211: Update uapi for CMD_FRAME_WAIT_CANCEL

2019-07-22 Thread Denis Kenzior
Commit 1c38c7f22068 ("nl80211: send event when CMD_FRAME duration
expires") added the possibility of NL80211_CMD_FRAME_WAIT_CANCEL
being sent whenever the off-channel wait time associated with a
CMD_FRAME completes.  Document this in the uapi/linux/nl80211.h file.

Signed-off-by: Denis Kenzior 
---
 include/uapi/linux/nl80211.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

 Changes in v4:
  -  None

 Changes in v3:
  -  None

 Changes in v2:
  - update commit formatting
diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index 8fc3a43cac75..0d9aad98c983 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -657,7 +657,9 @@
  * is used during CSA period.
  * @NL80211_CMD_FRAME_WAIT_CANCEL: When an off-channel TX was requested, this
  * command may be used with the corresponding cookie to cancel the wait
- * time if it is known that it is no longer necessary.
+ * time if it is known that it is no longer necessary.  This command is
+ * also sent as an event whenever the driver has completed the off-channel
+ * wait time.
  * @NL80211_CMD_ACTION: Alias for @NL80211_CMD_FRAME for backward 
compatibility.
  * @NL80211_CMD_FRAME_TX_STATUS: Report TX status of a management frame
  * transmitted with %NL80211_CMD_FRAME. %NL80211_ATTR_COOKIE identifies
-- 
2.21.0



[PATCH v4 3/3] nl80211: Include wiphy address setup in NEW_WIPHY

2019-07-22 Thread Denis Kenzior
Include wiphy address setup in wiphy dumps and new wiphy events.  The
wiphy permanent address is exposed as ATTR_MAC.  If addr_mask is setup,
then it is included as ATTR_MAC_MASK attribute.  If multiple addresses
are available, then their are exposed in a nested ATTR_MAC_ADDRS array.

This information is already exposed via sysfs, but it makes sense to
include it in the wiphy dump as well.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 25 +
 1 file changed, 25 insertions(+)

 Changes in v4:
  -  None

 Changes in v3:
  -  None

 Changes in v2:
  -  Move from case 0 to 9
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index a075d86a52f6..3fc4a9006155 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2172,6 +2172,31 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
rdev->wiphy.vht_capa_mod_mask))
goto nla_put_failure;
 
+   if (nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN,
+   rdev->wiphy.perm_addr))
+   goto nla_put_failure;
+
+   if (!is_zero_ether_addr(rdev->wiphy.addr_mask) &&
+   nla_put(msg, NL80211_ATTR_MAC_MASK, ETH_ALEN,
+   rdev->wiphy.addr_mask))
+   goto nla_put_failure;
+
+   if (rdev->wiphy.n_addresses > 1) {
+   void *attr;
+
+   attr = nla_nest_start_noflag(msg,
+NL80211_ATTR_MAC_ADDRS);
+   if (!attr)
+   goto nla_put_failure;
+
+   for (i = 0; i < rdev->wiphy.n_addresses; i++)
+   if (nla_put(msg, i + 1, ETH_ALEN,
+   rdev->wiphy.addresses[i].addr))
+   goto nla_put_failure;
+
+   nla_nest_end(msg, attr);
+   }
+
state->split_start++;
break;
case 10:
-- 
2.21.0



[PATCH v4 2/3] nl80211: Limit certain commands to interface owner

2019-07-22 Thread Denis Kenzior
If the wdev object has been created (via NEW_INTERFACE) with
SOCKET_OWNER attribute set, then limit certain commands only to the
process that created that wdev.

This can be used to make sure no other process on the system interferes
by sending unwanted scans, action frames or any other funny business.

This patch introduces a new internal flag, and checks that flag in the
pre_doit hook.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 78 --
 1 file changed, 60 insertions(+), 18 deletions(-)

Changes in v4:
  -  Minor restructuring suggested by Arend

Changes in v3:
  -  Fix minor locking mistake reported by kernel test robot

Changes in v2:
  -  None
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index fc83dd179c1a..a075d86a52f6 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -13602,6 +13602,7 @@ static int nl80211_probe_mesh_link(struct sk_buff *skb, 
struct genl_info *info)
 #define NL80211_FLAG_NEED_WDEV_UP  (NL80211_FLAG_NEED_WDEV |\
 NL80211_FLAG_CHECK_NETDEV_UP)
 #define NL80211_FLAG_CLEAR_SKB 0x20
+#define NL80211_FLAG_OWNER_ONLY0x40
 
 static int nl80211_pre_doit(const struct genl_ops *ops, struct sk_buff *skb,
struct genl_info *info)
@@ -13610,6 +13611,7 @@ static int nl80211_pre_doit(const struct genl_ops *ops, 
struct sk_buff *skb,
struct wireless_dev *wdev;
struct net_device *dev;
bool rtnl = ops->internal_flags & NL80211_FLAG_NEED_RTNL;
+   int ret;
 
if (rtnl)
rtnl_lock();
@@ -13617,10 +13619,10 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
if (ops->internal_flags & NL80211_FLAG_NEED_WIPHY) {
rdev = cfg80211_get_dev_from_info(genl_info_net(info), info);
if (IS_ERR(rdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return PTR_ERR(rdev);
+   ret = PTR_ERR(rdev);
+   goto fail;
}
+
info->user_ptr[0] = rdev;
} else if (ops->internal_flags & NL80211_FLAG_NEED_NETDEV ||
   ops->internal_flags & NL80211_FLAG_NEED_WDEV) {
@@ -13629,32 +13631,33 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
wdev = __cfg80211_wdev_from_attrs(genl_info_net(info),
  info->attrs);
if (IS_ERR(wdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return PTR_ERR(wdev);
+   ret = PTR_ERR(wdev);
+   goto fail;
}
 
dev = wdev->netdev;
rdev = wiphy_to_rdev(wdev->wiphy);
 
+   ret = -EINVAL;
if (ops->internal_flags & NL80211_FLAG_NEED_NETDEV) {
-   if (!dev) {
-   if (rtnl)
-   rtnl_unlock();
-   return -EINVAL;
-   }
+   if (!dev)
+   goto fail;
 
info->user_ptr[1] = dev;
} else {
info->user_ptr[1] = wdev;
}
 
+   ret = -ENETDOWN;
if (ops->internal_flags & NL80211_FLAG_CHECK_NETDEV_UP &&
-   !wdev_running(wdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return -ENETDOWN;
-   }
+   !wdev_running(wdev))
+   goto fail;
+
+   ret = -EPERM;
+   if (ops->internal_flags & NL80211_FLAG_OWNER_ONLY &&
+   wdev->owner_nlportid &&
+   wdev->owner_nlportid != info->snd_portid)
+   goto fail;
 
if (dev)
dev_hold(dev);
@@ -13663,6 +13666,12 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
}
 
return 0;
+
+fail:
+   if (rtnl)
+   rtnl_unlock();
+
+   return ret;
 }
 
 static void nl80211_post_doit(const struct genl_ops *ops, struct sk_buff *skb,
@@ -13727,7 +13736,8 @@ static const struct genl_ops nl80211_ops[] = {
.doit = nl80211_set_interface,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
- NL80211_FLAG_NEED_RTNL,
+ NL80211_FLAG_NEED_RTNL |
+ NL80211_FLAG_OWNER_ONLY,
},
{
.cmd = NL80211_CMD_NEW_INT

Re: [PATCH v3 2/3] nl80211: Limit certain commands to interface owner

2019-07-22 Thread Denis Kenzior

Hi Arend,

On 7/18/19 3:24 AM, Arend Van Spriel wrote:

On 7/1/2019 5:33 PM, Denis Kenzior wrote:

If the wdev object has been created (via NEW_INTERFACE) with
SOCKET_OWNER attribute set, then limit certain commands only to the
process that created that wdev.

This can be used to make sure no other process on the system interferes
by sending unwanted scans, action frames or any other funny business.

This patch introduces a new internal flag, and checks that flag in the
pre_doit hook.

Signed-off-by: Denis Kenzior 
---
  net/wireless/nl80211.c | 80 --
  1 file changed, 61 insertions(+), 19 deletions(-)

Changes in v3:
   - Fix minor locking mistake reported by kernel test robot

Changes in v2:
   - None

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index ff760ba83449..ebf5eab1f9b2 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c


[snip]


-    return 0;
+    ret = 0;


I suggest to keep the return 0 here for success path and only do the 
below for failure case (and obviously dropping '&& ret < 0'). Maybe 
rename label 'done' to 'fail' as well.




Sure, makes sense.  I've made the suggested changes for v4.


+done:
+    if (rtnl && ret < 0)
+    rtnl_unlock();
+
+    return ret;
  }


Regards,
Arend


Regards,
-Denis


Re: [PATCH v3 1/3] nl80211: Update uapi for CMD_FRAME_WAIT_CANCEL

2019-07-09 Thread Denis Kenzior

ping?

Regards,
-Denis


[PATCH v3 2/3] nl80211: Limit certain commands to interface owner

2019-07-01 Thread Denis Kenzior
If the wdev object has been created (via NEW_INTERFACE) with
SOCKET_OWNER attribute set, then limit certain commands only to the
process that created that wdev.

This can be used to make sure no other process on the system interferes
by sending unwanted scans, action frames or any other funny business.

This patch introduces a new internal flag, and checks that flag in the
pre_doit hook.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 80 --
 1 file changed, 61 insertions(+), 19 deletions(-)

Changes in v3:
  - Fix minor locking mistake reported by kernel test robot

Changes in v2:
  - None

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index ff760ba83449..ebf5eab1f9b2 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -13587,6 +13587,7 @@ static int nl80211_probe_mesh_link(struct sk_buff *skb, 
struct genl_info *info)
 #define NL80211_FLAG_NEED_WDEV_UP  (NL80211_FLAG_NEED_WDEV |\
 NL80211_FLAG_CHECK_NETDEV_UP)
 #define NL80211_FLAG_CLEAR_SKB 0x20
+#define NL80211_FLAG_OWNER_ONLY0x40
 
 static int nl80211_pre_doit(const struct genl_ops *ops, struct sk_buff *skb,
struct genl_info *info)
@@ -13595,6 +13596,7 @@ static int nl80211_pre_doit(const struct genl_ops *ops, 
struct sk_buff *skb,
struct wireless_dev *wdev;
struct net_device *dev;
bool rtnl = ops->internal_flags & NL80211_FLAG_NEED_RTNL;
+   int ret;
 
if (rtnl)
rtnl_lock();
@@ -13602,10 +13604,10 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
if (ops->internal_flags & NL80211_FLAG_NEED_WIPHY) {
rdev = cfg80211_get_dev_from_info(genl_info_net(info), info);
if (IS_ERR(rdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return PTR_ERR(rdev);
+   ret = PTR_ERR(rdev);
+   goto done;
}
+
info->user_ptr[0] = rdev;
} else if (ops->internal_flags & NL80211_FLAG_NEED_NETDEV ||
   ops->internal_flags & NL80211_FLAG_NEED_WDEV) {
@@ -13614,32 +13616,33 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
wdev = __cfg80211_wdev_from_attrs(genl_info_net(info),
  info->attrs);
if (IS_ERR(wdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return PTR_ERR(wdev);
+   ret = PTR_ERR(wdev);
+   goto done;
}
 
dev = wdev->netdev;
rdev = wiphy_to_rdev(wdev->wiphy);
 
+   ret = -EINVAL;
if (ops->internal_flags & NL80211_FLAG_NEED_NETDEV) {
-   if (!dev) {
-   if (rtnl)
-   rtnl_unlock();
-   return -EINVAL;
-   }
+   if (!dev)
+   goto done;
 
info->user_ptr[1] = dev;
} else {
info->user_ptr[1] = wdev;
}
 
+   ret = -ENETDOWN;
if (ops->internal_flags & NL80211_FLAG_CHECK_NETDEV_UP &&
-   !wdev_running(wdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return -ENETDOWN;
-   }
+   !wdev_running(wdev))
+   goto done;
+
+   ret = -EPERM;
+   if (ops->internal_flags & NL80211_FLAG_OWNER_ONLY &&
+   wdev->owner_nlportid &&
+   wdev->owner_nlportid != info->snd_portid)
+   goto done;
 
if (dev)
dev_hold(dev);
@@ -13647,7 +13650,13 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
info->user_ptr[0] = rdev;
}
 
-   return 0;
+   ret = 0;
+
+done:
+   if (rtnl && ret < 0)
+   rtnl_unlock();
+
+   return ret;
 }
 
 static void nl80211_post_doit(const struct genl_ops *ops, struct sk_buff *skb,
@@ -13712,7 +13721,8 @@ static const struct genl_ops nl80211_ops[] = {
.doit = nl80211_set_interface,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
- NL80211_FLAG_NEED_RTNL,
+ NL80211_FLAG_NEED_RTNL |
+ NL80211_FLAG_OWNER_ONLY,
},
{
.cmd = NL802

[PATCH v3 1/3] nl80211: Update uapi for CMD_FRAME_WAIT_CANCEL

2019-07-01 Thread Denis Kenzior
Commit 1c38c7f22068 ("nl80211: send event when CMD_FRAME duration
expires") added the possibility of NL80211_CMD_FRAME_WAIT_CANCEL
being sent whenever the off-channel wait time associated with a
CMD_FRAME completes.  Document this in the uapi/linux/nl80211.h file.

Signed-off-by: Denis Kenzior 
---
 include/uapi/linux/nl80211.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

 Changes in v3:
  -  None

 Changes in v2:
  - update commit formatting

diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index 8fc3a43cac75..0d9aad98c983 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -657,7 +657,9 @@
  * is used during CSA period.
  * @NL80211_CMD_FRAME_WAIT_CANCEL: When an off-channel TX was requested, this
  * command may be used with the corresponding cookie to cancel the wait
- * time if it is known that it is no longer necessary.
+ * time if it is known that it is no longer necessary.  This command is
+ * also sent as an event whenever the driver has completed the off-channel
+ * wait time.
  * @NL80211_CMD_ACTION: Alias for @NL80211_CMD_FRAME for backward 
compatibility.
  * @NL80211_CMD_FRAME_TX_STATUS: Report TX status of a management frame
  * transmitted with %NL80211_CMD_FRAME. %NL80211_ATTR_COOKIE identifies
-- 
2.21.0



[PATCH v3 3/3] nl80211: Include wiphy address setup in NEW_WIPHY

2019-07-01 Thread Denis Kenzior
Include wiphy address setup in wiphy dumps and new wiphy events.  The
wiphy permanent address is exposed as ATTR_MAC.  If addr_mask is setup,
then it is included as ATTR_MAC_MASK attribute.  If multiple addresses
are available, then their are exposed in a nested ATTR_MAC_ADDRS array.

This information is already exposed via sysfs, but it makes sense to
include it in the wiphy dump as well.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 25 +
 1 file changed, 25 insertions(+)

 Changes in v3:
  - None

 Changes in v2:
  - Move from case 0 to 9
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index ebf5eab1f9b2..fb09c2301ed8 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2166,6 +2166,31 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
rdev->wiphy.vht_capa_mod_mask))
goto nla_put_failure;
 
+   if (nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN,
+   rdev->wiphy.perm_addr))
+   goto nla_put_failure;
+
+   if (!is_zero_ether_addr(rdev->wiphy.addr_mask) &&
+   nla_put(msg, NL80211_ATTR_MAC_MASK, ETH_ALEN,
+   rdev->wiphy.addr_mask))
+   goto nla_put_failure;
+
+   if (rdev->wiphy.n_addresses > 1) {
+   void *attr;
+
+   attr = nla_nest_start_noflag(msg,
+NL80211_ATTR_MAC_ADDRS);
+   if (!attr)
+   goto nla_put_failure;
+
+   for (i = 0; i < rdev->wiphy.n_addresses; i++)
+   if (nla_put(msg, i + 1, ETH_ALEN,
+   rdev->wiphy.addresses[i].addr))
+   goto nla_put_failure;
+
+   nla_nest_end(msg, attr);
+   }
+
state->split_start++;
break;
case 10:
-- 
2.21.0



Re: [PATCH v2 2/3] nl80211: Limit certain commands to interface owner

2019-06-24 Thread Denis Kenzior

Hi Arend,

On 06/24/2019 03:39 AM, Arend Van Spriel wrote:

On 6/22/2019 3:44 PM, Marcel Holtmann wrote:

Hi Arend,


If the wdev object has been created (via NEW_INTERFACE) with
SOCKET_OWNER attribute set, then limit certain commands only to the
process that created that wdev.

This can be used to make sure no other process on the system 
interferes

by sending unwanted scans, action frames or any other funny business.


The flag is a good addition opposed to having handlers deal with 
it. However, earlier motivation for SOCKET_OWNER use was about 
netlink multicast being unreliable, which I can agree to. However, 
avoiding
???  I can't agree to that as I have no idea what you're talking 
about :)  Explain?  SOCKET_OWNER was introduced mainly to bring down 
links / scans / whatever in case the initiating process died.  As a 
side effect it also helped in the beginning when users ran iwd + 
wpa_s simultaneously (by accident) and all sorts of fun ensued.  We 
then re-used SOCKET_OWNER for running EAPoL over NL80211.  But 
'multicast unreliability' was never an issue that I recall?


hmm. I tried searching in memory... of my email client but to no 
avail. I somehow recalled that netlink multicast was not guaranteed 
to be delivered/seen by all listeners.


"funny business" is a different thing. Our testing infrastructure 
is doing all kind of funny business. Guess we will need to refrain 
from
So you're going behind the managing daemon's back and messing with 
the kernel state...  I guess the question is why?  But really, if 
wpa_s wants to tolerate that, that is their problem :)  iwd doesn't 
want to, nor do we want to deal with the various race conditions and 
corner cases associated with that.  Life is hard as it is ;)


That's just it, right. This is what Marcel calls the real 
environment, but is it. The nl80211 is a kernel API and should that 
mean that there must be a managing daemon locking down APIs for other 
user-space tools to use. If I want a user-space app to show a radar 
screen with surrounding APs using scanning and FTM nl80211 commands 
it seems now it has to create a new interface and hope the resources 
are there for it to succeed. Where is my freedom in that? If I am 
using such an app don't you think I don't accept it could impact the 
managing daemon.


if you are operating on a shared radio resource you have to have some 
way of ensuring that nobody steals resources from you. Having an 
external application that will also use scanning and other off-channel 
operation will result in a bad experience. Especially if it involves 
scanning. Currently we still have 3 or more parties triggering 
scanning on nl80211. Essentially they are now fighting for radio time. 
You have wpa_supplicant scanning, you have NetworkManager scanning and 
you have the UI scanning. Now adding just another application that 
just scans at its decided time location / direction finding is not 
helping the situation.


My app was just a hypothetical example. I understand your conundrum, but 
my point was that you can not know how a system is configured. Now for 
the SOCKET_OWNER I should say it does not provide you any guarantees. At 
best it improves your chances. With the nl80211 API being as it is, you 
can not rule out multiple application controlling the same device. The 
virtual interfaces can be guarded with SOCKET_OWNER, but in the end 


+1, SOCKET_OWNER is a band-aid.  But in the end this is a fairly simple 
change.   So we can have the managing daemon destroy any existing wdevs 
and re-create them with SOCKET_OWNER set (can we get brcmfmac to support 
this please?).  This at least gives us a chance that nothing will 
inadvertently cause interference.  It won't stop other processes from 
creating other wdevs or other funny business, but that is currently much 
more rare anyway.  In case it really becomes a problem, we can at least 
say "Don't hold it that way."


there is still one physical device and only if you are lucky you may 
come across a device with two physical radios, but most of them just 
have one. If you really want to be in control we should allow only one 
socket or at least only one "control" socket.


+1.  I've been saying this for a while, there should only be a single 
controlling socket/process, or at least an option for something like 
that.  But the current nl80211 design makes this quite hard.  Hopefully 
the recognition that this is needed will gain traction, until then we're 
stuck with band-aids.




If our kernel cfg80211 / nl80211 would be smart enough to handle these 
concurrent tasks, I would have little objection to let all clients do 
whatever they want, but we don’t have that. I do not want an external 
application messing with my planned radio time. And frankly if I am in 
the middle of roaming, I don’t want to be delayed because some fancy 
radar looking UI decides to start a full spectrum scan or blocks us 
via an action frame that times out.


The have been some effort

Re: [PATCH v2 2/3] nl80211: Limit certain commands to interface owner

2019-06-21 Thread Denis Kenzior

Hi Arend,

"funny business" is a different thing. Our testing infrastructure is 
doing all kind of funny business. Guess we will need to refrain from 


So you're going behind the managing daemon's back and messing with the 
kernel state...  I guess the question is why?  But really, if wpa_s 
wants to tolerate that, that is their problem :)  iwd doesn't want to, 
nor do we want to deal with the various race conditions and corner 
cases associated with that.  Life is hard as it is ;)


That's just it, right. This is what Marcel calls the real environment, 
but is it. The nl80211 is a kernel API and should that mean that there 
must be a managing daemon locking down APIs for other user-space tools 
to use. If I want a user-space app to show a radar screen with 
surrounding APs using scanning and FTM nl80211 commands it seems now it 
has to create a new interface and hope the resources are there for it to 
succeed. Where is my freedom in that? If I am using such an app don't 
you think I don't accept it could impact the managing daemon.


I get it.  But on the flip side, should the managing daemon accept you 
messing with it? I mean there is a definite associated cost here, 
whether it is stuff crashing, having to account for extra corner cases 
and race conditions, giving out erroneous results, etc.


As Marcel pointed out, the proper solution is to do this via some 
diagnostic interface on the managing daemon, so it can properly manage 
such requests to not interfere with whatever else is going on.


By the way, the above would be generally useful to many people, so if 
you have some code lying around... ;)


to give iwd a spin, but this SOCKET_OWNER strategy kept me from it. 
Maybe iwd could have a developer option which disables the use of the 
SOCKET_OWNER attribute.


Okay?  Not sure what you're trying to say here?  I'd interpret this as 
"You guys suck.  I'm taking my ball and going home?" but I hope this 
isn't what you're saying?


Not saying that. Just saying that the "real environment" is in the eye 
of the beholder and it would be nice if there was a way to opt out, but 
Marcel seems strongly opposed to it. So there seems no point in 
scratching that itch and come up with a patch.




I guess the question is, what do you want this for?  If you want this 
for pure manual testing and accept the consequence of the managing 
daemon crashing, giving erroneous results or being otherwise confused? 
If you're fine with the above, I don't see a problem with such a patch.


Regards,
-Denis


Re: iwlwifi/brcmfmac public action frames crash (RESENDING)

2019-06-21 Thread Denis Kenzior

Ping, is anyone looking into these crashes?

On 06/13/2019 11:45 AM, James Prestwood wrote:

Sorry if this comes in twice, I sent it ~12 hours ago but never saw it
hit the list, nor in the archives so I am resending it.

Hi,

Both iwlwifi/brcmfmac seem to be unable to send public action frames to
an unassociated AP. I am attempting to do a GAS ANQP request with a
public action frame (via CMD_FRAME). Immediately after CMD_FRAME any of
the following happens depending on the card:

Intel 7260 (iwlwifi) - System lockup freeze (must hard reboot)
Intel 3160 (iwlwifi) - CMD_FRAME returns -EINVAL
BCM43602 (brcmfmac) - Kernel crash (below)
AR9462 (ath9k) - works
Random USB adapter (rt2800usb) - works

iwlwifi (on 7260) completely locks the system, where the only way to
recover is hard reboot. I have reproduced this on two separate systems,
both with a 7260. I *have* seen it not lock the system once although
lately it seems to happen every time. The 3160 did not cause a hang
with my limited testing, though it did not accept CMD_FRAME which is
likely why it never hung.

Not sure how I can get any more info about the iwlwifi problem as the
system is completely hung, but if there is a way I'll be happy to do
that.

Here is the brcmfmac crash:

[19735.643941] BUG: unable to handle kernel NULL pointer dereference at

[19735.643965] PGD 8001874aa067 P4D 8001874aa067 PUD 2735fe067
PMD 0
[19735.643984] Oops:  [#1] SMP PTI
[19735.643993] CPU: 7 PID: 5051 Comm: iwd Tainted: GW
I   4.19.0-rc2-custom #27
[19735.644002] Hardware name: System manufacturer System Product
Name/SABERTOOTH X58, BIOS 140208/09/2012
[19735.644027] RIP: 0010:brcmf_p2p_send_action_frame+0x23a/0x850
[brcmfmac]
[19735.644037] Code: 41 c7 86 e0 00 00 00 00 00 00 00 f0 41 80 66 20 bf
f0 41 80 66 20 7f 49 8b 46 48 b9 24 07 00 00 48 89 da 48 c7 c6 3d 00 8f
c0 <48> 8b 38 e8 3e d7 ff ff 85 c0 41 89 c5 0f 85 c4 00 00 00 8b 03 49
[19735.644051] RSP: 0018:a879c8477a00 EFLAGS: 00010246
[19735.644059] RAX:  RBX: 954a2e059000 RCX:
0724
[19735.644067] RDX: 954a2e059000 RSI: c08f003d RDI:
0002
[19735.644075] RBP: a879c8477a50 R08: 001c R09:
0999
[19735.644083] R10: 954b157a2f00 R11: c072 R12:
954c32f26021
[19735.644091] R13: 954a2e059000 R14: 954c32f26000 R15:

[19735.644099] FS:  7f8d5aa30740() GS:954c369c()
knlGS:
[19735.644108] CS:  0010 DS:  ES:  CR0: 80050033
[19735.644115] CR2:  CR3: 0001845c8000 CR4:
06e0
[19735.644123] Call Trace:
[19735.644133]  ? _cond_resched+0x19/0x40
[19735.644153]  brcmf_cfg80211_mgmt_tx+0x170/0x2f0 [brcmfmac]
[19735.644192]  cfg80211_mlme_mgmt_tx+0x115/0x2f0 [cfg80211]
[19735.644219]  nl80211_tx_mgmt+0x24d/0x3d0 [cfg80211]
[19735.644228]  genl_family_rcv_msg+0x1fe/0x3f0
[19735.644237]  ? nlmon_xmit+0x2c/0x30
[19735.644246]  ? dev_hard_start_xmit+0xa8/0x210
[19735.644254]  genl_rcv_msg+0x4c/0x90
[19735.644261]  ? genl_family_rcv_msg+0x3f0/0x3f0
[19735.644268]  netlink_rcv_skb+0x54/0x130
[19735.644275]  genl_rcv+0x28/0x40
[19735.644281]  netlink_unicast+0x1ab/0x250
[19735.644288]  netlink_sendmsg+0x2d1/0x3d0
[19735.644297]  sock_sendmsg+0x3e/0x50
[19735.644304]  __sys_sendto+0x13f/0x180
[19735.644313]  ? do_epoll_wait+0xb0/0xc0
[19735.644321]  __x64_sys_sendto+0x28/0x30
[19735.644329]  do_syscall_64+0x5a/0x120
[19735.644336]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[19735.644344] RIP: 0033:0x7f8d5a352c4d
[19735.644350] Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 c1
dc 2c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41
[19735.644365] RSP: 002b:7ffc9a618048 EFLAGS: 0246 ORIG_RAX:
002c
[19735.644374] RAX: ffda RBX: 007077d0 RCX:
7f8d5a352c4d
[19735.644382] RDX: 0068 RSI: 0072bc40 RDI:
0004
[19735.644390] RBP: 00733510 R08:  R09:

[19735.644397] R10:  R11: 0246 R12:
7ffc9a618094
[19735.644405] R13: 7ffc9a61809c R14:  R15:

[19735.644414] Modules linked in: ccm algif_aead snd_hda_codec_realtek
snd_hda_codec_generic snd_hda_codec_hdmi binfmt_misc arc4 nouveau
gpio_ich ath9k mxm_wmi ath9k_common video rt2800usb intel_powerclamp
snd_hda_intel ath9k_hw rt2x00usb iwlmvm rt2800lib snd_hda_codec
rt2x00lib ath snd_seq_midi snd_seq_midi_event coretemp ttm mac80211
snd_hda_core brcmfmac snd_hwdep snd_rawmidi iwlwifi intel_cstate
drm_kms_helper brcmutil snd_seq drm snd_pcm input_leds serio_raw
lpc_ich cfg80211 snd_seq_device i2c_algo_bit snd_timer fb_sys_fops
syscopyarea sysfillrect snd sysimgblt i5500_temp wmi asus_atk0110
soundcore mac_hid i7core_edac sch_fq_codel kvm_intel kvm vfio_pci
vfio_virqfd irqbypass vfio_iommu_

Re: [PATCH v2 2/3] nl80211: Limit certain commands to interface owner

2019-06-21 Thread Denis Kenzior

Hi Arend,

On 06/21/2019 03:09 AM, Arend Van Spriel wrote:

On 6/21/2019 12:07 AM, Denis Kenzior wrote:

If the wdev object has been created (via NEW_INTERFACE) with
SOCKET_OWNER attribute set, then limit certain commands only to the
process that created that wdev.

This can be used to make sure no other process on the system interferes
by sending unwanted scans, action frames or any other funny business.


The flag is a good addition opposed to having handlers deal with it. 
However, earlier motivation for SOCKET_OWNER use was about netlink 
multicast being unreliable, which I can agree to. However, avoiding 


???  I can't agree to that as I have no idea what you're talking about 
:)  Explain?  SOCKET_OWNER was introduced mainly to bring down links / 
scans / whatever in case the initiating process died.  As a side effect 
it also helped in the beginning when users ran iwd + wpa_s 
simultaneously (by accident) and all sorts of fun ensued.  We then 
re-used SOCKET_OWNER for running EAPoL over NL80211.  But 'multicast 
unreliability' was never an issue that I recall?


"funny business" is a different thing. Our testing infrastructure is 
doing all kind of funny business. Guess we will need to refrain from 


So you're going behind the managing daemon's back and messing with the 
kernel state...  I guess the question is why?  But really, if wpa_s 
wants to tolerate that, that is their problem :)  iwd doesn't want to, 
nor do we want to deal with the various race conditions and corner cases 
associated with that.  Life is hard as it is ;)


using any user-space wireless tools that use the SOCKET_OWNER attribute, 
but how do we know? Somehow I suspect iwd is one to avoid ;-) I have yet 


I guess you will be avoiding wpa_s since that one uses SOCKET_OWNER too ;)

to give iwd a spin, but this SOCKET_OWNER strategy kept me from it. 
Maybe iwd could have a developer option which disables the use of the 
SOCKET_OWNER attribute.


Okay?  Not sure what you're trying to say here?  I'd interpret this as 
"You guys suck.  I'm taking my ball and going home?" but I hope this 
isn't what you're saying?


Regards,
-Denis


Re: [PATCH 3/3] nl80211: Include wiphy address setup in NEW_WIPHY

2019-06-20 Thread Denis Kenzior

Hi Johannes,

On 06/20/2019 03:21 PM, Johannes Berg wrote:

On Thu, 2019-06-20 at 22:09 +0200, Johannes Berg wrote:


Sure, but you don't really need to know *everything* about the events
right there ... you can already filter which ones you care about
(perhaps you know you never want to bind hwsim ones for example) and
then request data on those that you do need.


Btw, you can send a filter down when you do request the data, so you
only get the data for the new wiphy you actually just discovered.


Yes, I know that.  I did help fix this ~3 years ago in commit 
b7fb44dacae04.  Nobody was using that prior, which really leads me to 
wonder what other userspace tools are doing for hotplug and how broken 
they are...




So realistically, vs. your suggestion of sending all of the data in
multiple events, that just adds 2 messages (the request and the data you
already had), which isn't nearly as bad as you paint it.


I never 'painted' the message overhead as 'bad'.  The performance 
overhead of this ping-pong is probably irrelevant in the grand scheme of 
things.  But I find the approach inelegant.


But really I'm more worried about race conditions that userspace has to 
deal with.  We already have the weird case of ATTR_GENERATION (which 
nobody actually uses btw).  And then we also need to dump both the 
wiphys and the interfaces separately, cross-reference them while dealing 
with the possibility of a wiphy or interface going away or being added 
at any point.   Then there's the fact that some drivers always add a 
default netdev, others that (possibly) don't and the possibility that 
the system was left in a weird state.


So from that standpoint it is far better to have the kernel generate 
atomic change events with all the info present than having userspace 
re-poll it when stuff might have changed in the meantime.


Going back to your 2 message point.  What about  sending the 'NEW_WIPHY' 
event with the info in labels 0-8.  And then another event type with the 
'rest' of the info.  And perhaps another feature bit to tell userspace 
to expect multiple events.  That would still end up being 2 messages and 
still be more efficient than the ping-pong you suggest.


Regards,
-Denis


[PATCH v2 3/3] nl80211: Include wiphy address setup in NEW_WIPHY

2019-06-20 Thread Denis Kenzior
Include wiphy address setup in wiphy dumps and new wiphy events.  The
wiphy permanent address is exposed as ATTR_MAC.  If addr_mask is setup,
then it is included as ATTR_MAC_MASK attribute.  If multiple addresses
are available, then their are exposed in a nested ATTR_MAC_ADDRS array.

This information is already exposed via sysfs, but it makes sense to
include it in the wiphy dump as well.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 25 +
 1 file changed, 25 insertions(+)

 Changes in v2:
  - Move from case 0 to 9

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 26bab9560c0f..f4b3e6f1dfbf 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -2166,6 +2166,31 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
rdev->wiphy.vht_capa_mod_mask))
goto nla_put_failure;
 
+   if (nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN,
+   rdev->wiphy.perm_addr))
+   goto nla_put_failure;
+
+   if (!is_zero_ether_addr(rdev->wiphy.addr_mask) &&
+   nla_put(msg, NL80211_ATTR_MAC_MASK, ETH_ALEN,
+   rdev->wiphy.addr_mask))
+   goto nla_put_failure;
+
+   if (rdev->wiphy.n_addresses > 1) {
+   void *attr;
+
+   attr = nla_nest_start_noflag(msg,
+NL80211_ATTR_MAC_ADDRS);
+   if (!attr)
+   goto nla_put_failure;
+
+   for (i = 0; i < rdev->wiphy.n_addresses; i++)
+   if (nla_put(msg, i + 1, ETH_ALEN,
+   rdev->wiphy.addresses[i].addr))
+   goto nla_put_failure;
+
+   nla_nest_end(msg, attr);
+   }
+
state->split_start++;
break;
case 10:
-- 
2.21.0



[PATCH v2 1/3] nl80211: Update uapi for CMD_FRAME_WAIT_CANCEL

2019-06-20 Thread Denis Kenzior
Commit 1c38c7f22068 ("nl80211: send event when CMD_FRAME duration
expires") added the possibility of NL80211_CMD_FRAME_WAIT_CANCEL
being sent whenever the off-channel wait time associated with a
CMD_FRAME completes.  Document this in the uapi/linux/nl80211.h file.

Signed-off-by: Denis Kenzior 
---
 include/uapi/linux/nl80211.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

 Changes in v2:
  - update commit formatting

diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index 8fc3a43cac75..0d9aad98c983 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -657,7 +657,9 @@
  * is used during CSA period.
  * @NL80211_CMD_FRAME_WAIT_CANCEL: When an off-channel TX was requested, this
  * command may be used with the corresponding cookie to cancel the wait
- * time if it is known that it is no longer necessary.
+ * time if it is known that it is no longer necessary.  This command is
+ * also sent as an event whenever the driver has completed the off-channel
+ * wait time.
  * @NL80211_CMD_ACTION: Alias for @NL80211_CMD_FRAME for backward 
compatibility.
  * @NL80211_CMD_FRAME_TX_STATUS: Report TX status of a management frame
  * transmitted with %NL80211_CMD_FRAME. %NL80211_ATTR_COOKIE identifies
-- 
2.21.0



[PATCH v2 2/3] nl80211: Limit certain commands to interface owner

2019-06-20 Thread Denis Kenzior
If the wdev object has been created (via NEW_INTERFACE) with
SOCKET_OWNER attribute set, then limit certain commands only to the
process that created that wdev.

This can be used to make sure no other process on the system interferes
by sending unwanted scans, action frames or any other funny business.

This patch introduces a new internal flag, and checks that flag in the
pre_doit hook.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 80 --
 1 file changed, 61 insertions(+), 19 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index ff760ba83449..26bab9560c0f 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -13587,6 +13587,7 @@ static int nl80211_probe_mesh_link(struct sk_buff *skb, 
struct genl_info *info)
 #define NL80211_FLAG_NEED_WDEV_UP  (NL80211_FLAG_NEED_WDEV |\
 NL80211_FLAG_CHECK_NETDEV_UP)
 #define NL80211_FLAG_CLEAR_SKB 0x20
+#define NL80211_FLAG_OWNER_ONLY0x40
 
 static int nl80211_pre_doit(const struct genl_ops *ops, struct sk_buff *skb,
struct genl_info *info)
@@ -13595,6 +13596,7 @@ static int nl80211_pre_doit(const struct genl_ops *ops, 
struct sk_buff *skb,
struct wireless_dev *wdev;
struct net_device *dev;
bool rtnl = ops->internal_flags & NL80211_FLAG_NEED_RTNL;
+   int ret;
 
if (rtnl)
rtnl_lock();
@@ -13602,10 +13604,10 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
if (ops->internal_flags & NL80211_FLAG_NEED_WIPHY) {
rdev = cfg80211_get_dev_from_info(genl_info_net(info), info);
if (IS_ERR(rdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return PTR_ERR(rdev);
+   ret = PTR_ERR(rdev);
+   goto done;
}
+
info->user_ptr[0] = rdev;
} else if (ops->internal_flags & NL80211_FLAG_NEED_NETDEV ||
   ops->internal_flags & NL80211_FLAG_NEED_WDEV) {
@@ -13614,32 +13616,33 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
wdev = __cfg80211_wdev_from_attrs(genl_info_net(info),
  info->attrs);
if (IS_ERR(wdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return PTR_ERR(wdev);
+   ret = PTR_ERR(wdev);
+   goto done;
}
 
dev = wdev->netdev;
rdev = wiphy_to_rdev(wdev->wiphy);
 
+   ret = -EINVAL;
if (ops->internal_flags & NL80211_FLAG_NEED_NETDEV) {
-   if (!dev) {
-   if (rtnl)
-   rtnl_unlock();
-   return -EINVAL;
-   }
+   if (!dev)
+   goto done;
 
info->user_ptr[1] = dev;
} else {
info->user_ptr[1] = wdev;
}
 
+   ret = -ENETDOWN;
if (ops->internal_flags & NL80211_FLAG_CHECK_NETDEV_UP &&
-   !wdev_running(wdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return -ENETDOWN;
-   }
+   !wdev_running(wdev))
+   goto done;
+
+   ret = -EPERM;
+   if (ops->internal_flags & NL80211_FLAG_OWNER_ONLY &&
+   wdev->owner_nlportid &&
+   wdev->owner_nlportid != info->snd_portid)
+   goto done;
 
if (dev)
dev_hold(dev);
@@ -13647,7 +13650,13 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
info->user_ptr[0] = rdev;
}
 
-   return 0;
+   ret = 0;
+
+done:
+   if (rtnl && !ret)
+   rtnl_unlock();
+
+   return ret;
 }
 
 static void nl80211_post_doit(const struct genl_ops *ops, struct sk_buff *skb,
@@ -13712,7 +13721,8 @@ static const struct genl_ops nl80211_ops[] = {
.doit = nl80211_set_interface,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
- NL80211_FLAG_NEED_RTNL,
+ NL80211_FLAG_NEED_RTNL |
+ NL80211_FLAG_OWNER_ONLY,
},
{
.cmd = NL80211_CMD_NEW_INTERFACE,
@@ -13728,7 +13738,8 @@ static const struct genl_ops nl80211_ops[] = {
 

Re: [PATCH 3/3] nl80211: Include wiphy address setup in NEW_WIPHY

2019-06-20 Thread Denis Kenzior

Hi Johannes,

On 06/20/2019 03:09 PM, Johannes Berg wrote:

On Thu, 2019-06-20 at 15:05 -0500, Denis Kenzior wrote:


Ugh.  So, if I understand this correctly, NEW_WIPHY events that are
generated when a new wiphy is plugged would only send the old 'legacy'
info and any info we add in cases 9+ would be 'lost' and the application
is forced into re-dumping the phy.


Yes.


This is pretty much counter to what we want.


Well, you want the info, shouldn't matter how you get it?



Well, it kind of does.  You're asking userspace to introduce extra 
complexity, extra round trips, extra stuff to go wrong just because the 
kernel API has painted itself into a corner.



If you want to keep your sanity in userspace, you need proper 'object
appeared' / 'object disappeared' events from the kernel.


Sure, but you don't really need to know *everything* about the events
right there ... you can already filter which ones you care about
(perhaps you know you never want to bind hwsim ones for example) and
then request data on those that you do need.


Sure, but it would be nice to have all the info available if we do not 
want to filter it...





And those
events should have all or nearly all info to not bother the kernel going
forward.


That's what you wish for, but ...


Well, it is a pretty basic requirement for any event driven API, no?




   It sounds like nl80211 API has run into the extend-ability
wall, no?


I don't really see it that way.


Any suggestions on how to resolve this?  Should NEW_WIPHY events also do
the whole split_dump semantic and generate 15+ or whatever messages?


No, that'd be awful, and anyway you'd have to send a new command because
otherwise old applications might be completely confused (not that I know
of any other than "iw event" that would event listen to this, but who
knows)


Well, given that we're the only ones that seem to care about this right 
now, I don't see sending a new command as much of a big deal.  I welcome 
other ideas, but having the kernel send us an event, then us turning 
around and requesting the *same* info is just silly.


Regards,
-Denis


Re: [PATCH 3/3] nl80211: Include wiphy address setup in NEW_WIPHY

2019-06-20 Thread Denis Kenzior

Hi Johannes,

On 06/20/2019 02:17 PM, Johannes Berg wrote:

Hi Denis,


We generally can't add anything to any of the cases before the split was
allowed, for compatibility with old userspace.


Can you educate me here? Is it because the non-split dump messages would
grow too large?


No. Those messages aren't really relevant, userspace will need to do a
larger buffer for it.

The problem is that old userspace (like really old) didn't split even
dumps. Eventually, we had so much information here that the default dump
message size is exceeded, and we simply couldn't dump that particular
wiphy anymore.

We solved this by splitting the wiphy information into multiple
messages, but that needed new userspace, so when userspace doesn't
request split dumps, we fall through all the way to "case 8" and then
stop - old userspace cannot care about new information anyway.

The reason it was split into cases 0-8 that are combined in non-split
dumps is that it was safer that way - there were certain configurations
where even the original data would go above the message size limit.


Ugh.  So, if I understand this correctly, NEW_WIPHY events that are 
generated when a new wiphy is plugged would only send the old 'legacy' 
info and any info we add in cases 9+ would be 'lost' and the application 
is forced into re-dumping the phy.  This is pretty much counter to what 
we want.


If you want to keep your sanity in userspace, you need proper 'object 
appeared' / 'object disappeared' events from the kernel.  And those 
events should have all or nearly all info to not bother the kernel going 
forward.  It sounds like nl80211 API has run into the extend-ability 
wall, no?


Any suggestions on how to resolve this?  Should NEW_WIPHY events also do 
the whole split_dump semantic and generate 15+ or whatever messages?


Regards,
-Denis


Re: [PATCH 3/3] nl80211: Include wiphy address setup in NEW_WIPHY

2019-06-20 Thread Denis Kenzior

Hi Johannes,

On 06/20/2019 01:58 AM, Johannes Berg wrote:

Didn't really review all of this yet, but

switch (state->split_start) {

case 0:
+   if (nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN,
+   rdev->wiphy.perm_addr))
+   goto nla_put_failure;


We generally can't add anything to any of the cases before the split was
allowed, for compatibility with old userspace.


Can you educate me here? Is it because the non-split dump messages would 
grow too large?  But then non-dumps aren't split, so I still don't get 
how anyone can be broken by this (that isn't already broken in the first 
place).


Anyhow, What is the cut off point?  It didn't seem worthwhile to send 
yet-another-message for ~60 bytes of data, but if you want me to add it 
as a separate message, no problem.


Regards,
-Denis


[PATCH 3/3] nl80211: Include wiphy address setup in NEW_WIPHY

2019-06-19 Thread Denis Kenzior
Include wiphy address setup in wiphy dumps and new wiphy events.  The
wiphy permanent address is exposed as ATTR_MAC.  If addr_mask is setup,
then it is included as ATTR_MAC_MASK attribute.  If multiple addresses
are available, then their are exposed in a nested ATTR_MAC_ADDRS array.

This information is already exposed via sysfs, but it makes sense to
include it in the wiphy dump as well.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 26bab9560c0f..65f3d47d9b63 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -1852,6 +1852,31 @@ static int nl80211_send_wiphy(struct 
cfg80211_registered_device *rdev,
 
switch (state->split_start) {
case 0:
+   if (nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN,
+   rdev->wiphy.perm_addr))
+   goto nla_put_failure;
+
+   if (!is_zero_ether_addr(rdev->wiphy.addr_mask) &&
+   nla_put(msg, NL80211_ATTR_MAC_MASK, ETH_ALEN,
+   rdev->wiphy.addr_mask))
+   goto nla_put_failure;
+
+   if (rdev->wiphy.n_addresses > 1) {
+   void *attr;
+
+   attr = nla_nest_start_noflag(msg,
+NL80211_ATTR_MAC_ADDRS);
+   if (!attr)
+   goto nla_put_failure;
+
+   for (i = 0; i < rdev->wiphy.n_addresses; i++)
+   if (nla_put(msg, i + 1, ETH_ALEN,
+   rdev->wiphy.addresses[i].addr))
+   goto nla_put_failure;
+
+   nla_nest_end(msg, attr);
+   }
+
if (nla_put_u8(msg, NL80211_ATTR_WIPHY_RETRY_SHORT,
   rdev->wiphy.retry_short) ||
nla_put_u8(msg, NL80211_ATTR_WIPHY_RETRY_LONG,
-- 
2.21.0



[PATCH 1/3] nl80211: Update uapi for CMD_FRAME_WAIT_CANCEL

2019-06-19 Thread Denis Kenzior
1c38c7f2 added the possibility of NL80211_CMD_FRAME_WAIT_CANCEL
being sent whenever the off-channel wait time associated with a
CMD_FRAME completes.  Document this in the uapi/linux/nl80211.h file.

Signed-off-by: Denis Kenzior 
---
 include/uapi/linux/nl80211.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/nl80211.h b/include/uapi/linux/nl80211.h
index 8fc3a43cac75..0d9aad98c983 100644
--- a/include/uapi/linux/nl80211.h
+++ b/include/uapi/linux/nl80211.h
@@ -657,7 +657,9 @@
  * is used during CSA period.
  * @NL80211_CMD_FRAME_WAIT_CANCEL: When an off-channel TX was requested, this
  * command may be used with the corresponding cookie to cancel the wait
- * time if it is known that it is no longer necessary.
+ * time if it is known that it is no longer necessary.  This command is
+ * also sent as an event whenever the driver has completed the off-channel
+ * wait time.
  * @NL80211_CMD_ACTION: Alias for @NL80211_CMD_FRAME for backward 
compatibility.
  * @NL80211_CMD_FRAME_TX_STATUS: Report TX status of a management frame
  * transmitted with %NL80211_CMD_FRAME. %NL80211_ATTR_COOKIE identifies
-- 
2.21.0



[PATCH 2/3] nl80211: Limit certain commands to interface owner

2019-06-19 Thread Denis Kenzior
If the wdev object has been created (via NEW_INTERFACE) with
SOCKET_OWNER attribute set, then limit certain commands only to the
process that created that wdev.

This can be used to make sure no other process on the system interferes
by sending unwanted scans, action frames or any other funny business.

This patch introduces a new internal flag, and checks that flag in the
pre_doit hook.

Signed-off-by: Denis Kenzior 
---
 net/wireless/nl80211.c | 80 --
 1 file changed, 61 insertions(+), 19 deletions(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index ff760ba83449..26bab9560c0f 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -13587,6 +13587,7 @@ static int nl80211_probe_mesh_link(struct sk_buff *skb, 
struct genl_info *info)
 #define NL80211_FLAG_NEED_WDEV_UP  (NL80211_FLAG_NEED_WDEV |\
 NL80211_FLAG_CHECK_NETDEV_UP)
 #define NL80211_FLAG_CLEAR_SKB 0x20
+#define NL80211_FLAG_OWNER_ONLY0x40
 
 static int nl80211_pre_doit(const struct genl_ops *ops, struct sk_buff *skb,
struct genl_info *info)
@@ -13595,6 +13596,7 @@ static int nl80211_pre_doit(const struct genl_ops *ops, 
struct sk_buff *skb,
struct wireless_dev *wdev;
struct net_device *dev;
bool rtnl = ops->internal_flags & NL80211_FLAG_NEED_RTNL;
+   int ret;
 
if (rtnl)
rtnl_lock();
@@ -13602,10 +13604,10 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
if (ops->internal_flags & NL80211_FLAG_NEED_WIPHY) {
rdev = cfg80211_get_dev_from_info(genl_info_net(info), info);
if (IS_ERR(rdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return PTR_ERR(rdev);
+   ret = PTR_ERR(rdev);
+   goto done;
}
+
info->user_ptr[0] = rdev;
} else if (ops->internal_flags & NL80211_FLAG_NEED_NETDEV ||
   ops->internal_flags & NL80211_FLAG_NEED_WDEV) {
@@ -13614,32 +13616,33 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
wdev = __cfg80211_wdev_from_attrs(genl_info_net(info),
  info->attrs);
if (IS_ERR(wdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return PTR_ERR(wdev);
+   ret = PTR_ERR(wdev);
+   goto done;
}
 
dev = wdev->netdev;
rdev = wiphy_to_rdev(wdev->wiphy);
 
+   ret = -EINVAL;
if (ops->internal_flags & NL80211_FLAG_NEED_NETDEV) {
-   if (!dev) {
-   if (rtnl)
-   rtnl_unlock();
-   return -EINVAL;
-   }
+   if (!dev)
+   goto done;
 
info->user_ptr[1] = dev;
} else {
info->user_ptr[1] = wdev;
}
 
+   ret = -ENETDOWN;
if (ops->internal_flags & NL80211_FLAG_CHECK_NETDEV_UP &&
-   !wdev_running(wdev)) {
-   if (rtnl)
-   rtnl_unlock();
-   return -ENETDOWN;
-   }
+   !wdev_running(wdev))
+   goto done;
+
+   ret = -EPERM;
+   if (ops->internal_flags & NL80211_FLAG_OWNER_ONLY &&
+   wdev->owner_nlportid &&
+   wdev->owner_nlportid != info->snd_portid)
+   goto done;
 
if (dev)
dev_hold(dev);
@@ -13647,7 +13650,13 @@ static int nl80211_pre_doit(const struct genl_ops 
*ops, struct sk_buff *skb,
info->user_ptr[0] = rdev;
}
 
-   return 0;
+   ret = 0;
+
+done:
+   if (rtnl && !ret)
+   rtnl_unlock();
+
+   return ret;
 }
 
 static void nl80211_post_doit(const struct genl_ops *ops, struct sk_buff *skb,
@@ -13712,7 +13721,8 @@ static const struct genl_ops nl80211_ops[] = {
.doit = nl80211_set_interface,
.flags = GENL_UNS_ADMIN_PERM,
.internal_flags = NL80211_FLAG_NEED_NETDEV |
- NL80211_FLAG_NEED_RTNL,
+ NL80211_FLAG_NEED_RTNL |
+ NL80211_FLAG_OWNER_ONLY,
},
{
.cmd = NL80211_CMD_NEW_INTERFACE,
@@ -13728,7 +13738,8 @@ static const struct genl_ops nl80211_ops[] = {
 

Re: cellular modem APIs - take 2

2019-05-29 Thread Denis Kenzior

Hi Johannes,



After all, I'm not really proposing that we put oFono or something like
it into the kernel - far from it! I'm only proposing that we kill the
many various ways of creating and managing the necessary netdevs (VLANs,
sysfs, rmnet, ...) from a piece of software like oFono (or libmbim or
whatever else).


I do like the concept of unifying this if possible.  The question is, is 
it actually possible :)  I think Dan covered most of the aspects of what 
userspace has to deal with already.  But the basic issue is that there's 
a heck of a lot of different ways of doing it.




Apart from CAIF and phonet, oFono doesn't even try to do this though,
afaict, so I guess it relies on the default netdev created, or some out-
of-band configuration is still needed?


Actually it can.  We can drive modems which provide only a single serial 
port and run multiplexing over that.  So we fully control the number of 
control channels created, the number of netdevs created and even 
create/destroy them on as needed basis.  And these netdevs can be PPP 
encapsulated or pure IP or whatever else.


Regards,
-Denis


Re: cellular modem APIs - take 2

2019-05-29 Thread Denis Kenzior

Hi Johannes,



It just seemed that people do want to have a netdev (if only to see that
their driver loaded and use ethtool to dump the firmware version), and
then you can reassign it to some actual context later?


I can see that this is useful for developers, but really 
counterproductive in production.  You have a bunch of services (systemd, 
NM, ConnMan, dhcpcd, etc, etc, etc) all observing the newly created 
devices and trying to 'own' them.  Which makes no freaking sense to do 
until those devices are really usable.  Just because it is a netdev, 
doesn't mean it is ethernet or behaves like it.


That also leads to expectations from users that want stuff like 
udev-consistent-naming to work, even though udev has no idea wtf a given 
device is, when it is ready or not ready, etc.  And the flip side, there 
may be systems that do not use systemd/udevd, so the daemons responsible 
for management of such devices cannot sanely assume anything.  It is 
just pure chaos.


And then there's hotplug/unplug to worry about ;)

So I would like to reiterate Marcel's point: creating unmanaged devices 
should not be the default behavior.




It doesn't really matter much. If you have a control channel and higher-
level abstraction (wwan device) then having the netdev is probably more
of a nuisance and mostly irrelevant, just might be useful for legacy
reasons.


Which we should be trying to eradicate, not encourage ;)


Should you really need to account for these specially, or would some
kind of sysfs linkage like SET_NETDEV_DEV() be more appropriate?

Really all you want to do is (a) identify which WWAN device a given
control/data channel is for and (b) perhaps tag different control/data
channels with attributes like primary/secondary/gps/sim/etc often
through USB attributes or hardcoded data on SoCs.


Userspace can also choose to do its own multiplexing, so you can't even 
really assume the above is what you 'want'.


Regards,
-Denis


Re: FYI: vendor specific nl80211 API upstream

2019-05-29 Thread Denis Kenzior

Hi Johannes,

On 05/29/2019 04:09 AM, Johannes Berg wrote:

On Tue, 2019-05-28 at 12:36 -0500, Denis Kenzior wrote:


I'm guessing that you guys considered and rejected the idea of pushing
these out to a separate, vendor specific genl family instead?


We do actually use that internally (though mostly for cases where we
don't have a cfg80211 connection like manufacturing support), but vendor
commands are there and people do like to use them :-)


And herein lies the danger.  If you make it too easy to add vendor APIs, 
there's no incentive for the vendors to do anything else.  In the end 
this all becomes a mess for userspace to deal with.


One idea off the top of my head is to introduce a concept of 
'experimental' APIs in NL80211, ones that are not guaranteed to be ABI 
stable going forward.  Specifically for dealing with such 'vendor' APIs. 
 The semantic difference might be subtle, but I think the effect will 
be drastically different.  E.g. people will approach this more seriously 
and you will get more people reviewing the API.




The idea with formalizing this is that they actually get more
visibility, and I hope that this will lead to more forming of real
nl80211 API too.


What about ABI guarantees (to tie it in with the discussion above) ?
If the vendor wants to change their API, can they?  Are NL80211 APIs 
stable unless they are vendor APIs?


Anyhow, speaking from experience with oFono, which has to deal with a 
bazillion of wwan modem vendors, I suspect that the opposite will 
actually happen.  Any time we let through a vendor API, the vendor lost 
any interest in generalizing it further.  And it becomes a huge pain to 
implement a proper generic one later.  I get that there are cases where 
something just cannot be generalized.  In that case it belongs on a 
separate genl family (or whatever) altogether.


So I would highly encourage you to reconsider this decision and 
deprecate vendor APIs altogether.  If someone really cares, they can 
implement their own genl family.  It is really not that hard.  And then 
they control the API, API stability policy, etc.


Regards,
-Denis


Re: brcmfmac & DEL_INTERFACE

2019-05-28 Thread Denis Kenzior

Hi Arend,

On 05/28/2019 03:27 PM, Arend Van Spriel wrote:

On 5/28/2019 8:16 PM, Denis Kenzior wrote:

Hi Arend,

We noticed that brcmfmac doesn't support .del_virtual_intf for 
non-p2p/ap interface types.  Any chance this can be added?


We currently remove all wifi interfaces and re-create the needed ones 
with SOCKET_OWNER set, and it would be nice if we didn't need to treat 
brcmfmac specially.


This came up recently. During probe the driver creates a network 
interface that we refer to as primary interface. We consider this 
non-virtual and ownership is with the driver. My guess is that this 
concept comes from the WEXT era, where we did not have the ieee80211 phy 
objects to interact with the driver from user-space. I suppose you don't 
mind the creation of this interface and just want to allow removing it, 
right?


Correct.  If we can at least get the DEL_INTERFACE supported, that would 
solve our immediate use case.


I do think that the drivers should not be creating a netdev by default 
and should wait until userspace asks for it.  But that is a separate 
topic, with backwards compatibility concerns, so I'll leave it for the 
future :)


Regards,
-Denis


brcmfmac & DEL_INTERFACE

2019-05-28 Thread Denis Kenzior

Hi Arend,

We noticed that brcmfmac doesn't support .del_virtual_intf for 
non-p2p/ap interface types.  Any chance this can be added?


We currently remove all wifi interfaces and re-create the needed ones 
with SOCKET_OWNER set, and it would be nice if we didn't need to treat 
brcmfmac specially.


Regards,
-Denis


Re: FYI: vendor specific nl80211 API upstream

2019-05-28 Thread Denis Kenzior

Hi Johannes,

On 05/28/2019 06:51 AM, Johannes Berg wrote:

Hi all,

FYI - at the discussions in Prague we decided to let some vendor
specific nl80211 API go upstream, and I've just documented the expected
rules here:

https://wireless.wiki.kernel.org/en/developers/documentation/nl80211#vendor-specific_api



I'm guessing that you guys considered and rejected the idea of pushing 
these out to a separate, vendor specific genl family instead?


Regards,
-Denis


Re: NL80211_SCAN_FLAG_RANDOM_ADDR ?

2019-04-12 Thread Denis Kenzior

Hi Sergey,

On 04/12/2019 04:26 AM, Sergey Matyukevich wrote:

I've been poking around at how this flag is used and I noticed this
check in net/wireless/nl80211.c:

nl80211_check_scan_flags()

 if (*flags & NL80211_SCAN_FLAG_RANDOM_ADDR) {
 int err;

 if (!(wiphy->features & randomness_flag) ||
 (wdev && wdev->current_bss))
 return -EOPNOTSUPP;


The above disallows the use of RANDOM_ADDR for scans while connected.
The nl80211.h uapi header seems to concur:

  "@NL80211_FEATURE_SCAN_RANDOM_MAC_ADDR: This device/driver supports
using a random MAC address during scan (if the device is unassociated);"

However, if I create a P2P Device (in addition to the default STA
device), the kernel happily lets me scan on the wdev while the STA
interface is connected.

sudo iw phy0 interface add p2p type __p2pdev
sudo iw wdev 0x2 p2p start
sudo iw wdev 0x2 scan randomize

So the immediate question I have is, should the RANDOM_ADDR flag indeed
be limited to unassociated STA interfaces?  It would seem the hardware
is capable randomizing even when connected? Please educate me :)


Hello Denis,

IIUC, this feature could be introduced to support Android Compatibility
Definition Document (CDD). Those documents are available at the
following page: https://source.android.com/compatibility/cdd


Thanks for the reference.  It looks like a 'At a minimum you should/must 
do this' type of document.  It doesn't look like it precludes the use of 
randomization when connected?




For instance, in the latest CDD randomized scan requirements are described
in the section 7.4.2. It looks like current high level nl80211 API follows
those recommendations. Probably it has been implemented with STA use-case
in mind, that is why you can use that flag for P2P connection. But, as
Ben pointed out, actual application of this flag may depend on
implementation in firwmare and hardware.



Sure, understood.  But this is exactly the point of my question.  Is the 
check at the global level correct?  Or should it be relaxed in case 
there is hardware out there that can randomize probe requests while 
connected?  From my test it would seem this is possible?


Or put another way, besides hardware limitations, are there reasons why 
you would not want to randomize probe request address when connected?


Regards,
-Denis


Re: NL80211_SCAN_FLAG_RANDOM_ADDR ?

2019-04-11 Thread Denis Kenzior

Hi Ben,

On 04/11/2019 06:20 PM, Ben Greear wrote:

On 4/11/19 4:19 PM, Ben Greear wrote:

On 4/11/19 3:30 PM, Denis Kenzior wrote:

Hi,

I've been poking around at how this flag is used and I noticed this 
check in net/wireless/nl80211.c:


nl80211_check_scan_flags()

 if (*flags & NL80211_SCAN_FLAG_RANDOM_ADDR) {
 int err;

 if (!(wiphy->features & randomness_flag) ||
 (wdev && wdev->current_bss))
 return -EOPNOTSUPP;


The above disallows the use of RANDOM_ADDR for scans while connected. 
The nl80211.h uapi header seems to concur:


  "@NL80211_FEATURE_SCAN_RANDOM_MAC_ADDR: This device/driver supports 
using a random MAC address during scan (if the device is unassociated);"


However, if I create a P2P Device (in addition to the default STA 
device), the kernel happily lets me scan on the wdev while the STA 
interface is connected.


sudo iw phy0 interface add p2p type __p2pdev
sudo iw wdev 0x2 p2p start
sudo iw wdev 0x2 scan randomize

So the immediate question I have is, should the RANDOM_ADDR flag 
indeed be limited to unassociated STA interfaces?  It would seem the 
hardware is capable randomizing even when connected? Please educate 
me :)


You can be sure that each driver/hardware has its own bugs and 
limitations related to this.


Ath10k wave 1 and wave 2 that I am aware of would ignore and/or not 
ACK probe responses
sent back to an MAC address that is not that of the station itself.  
And changing the mac of a station
would require complete re-association AFAIK.  That is likely just one 
of the many issues.


Yes, I understand that some hardware would not support this.  But the 
question is does this check belong at the nl80211 layer (e.g. no 
hardware can do this) vs somewhere at the driver layer + additional 
feature bit as needed.




I should add:  If you really want to scan in this manner, you could just 
create a new station vdev with
random addr and have it do the scanning, then delete it when done?  The 
original station will continue on

its way unmolested.



So you mean something like:
sudo iw phy0 interface add sta2 type station
sudo iw dev sta2 scan randomize
command failed: Network is down (-100)
sudo ifconfig sta2 up
SIOCSIFFLAGS: Device or resource busy

I guess I'm running into this:

valid interface combinations:
		 * #{ managed } <= 1, #{ AP, P2P-client, P2P-GO } <= 1, #{ P2P-device 
} <= 1,

   total <= 3, #channels <= 2

Or did you mean something else?

Regards,
-Denis


NL80211_SCAN_FLAG_RANDOM_ADDR ?

2019-04-11 Thread Denis Kenzior

Hi,

I've been poking around at how this flag is used and I noticed this 
check in net/wireless/nl80211.c:


nl80211_check_scan_flags()

if (*flags & NL80211_SCAN_FLAG_RANDOM_ADDR) {
int err;

if (!(wiphy->features & randomness_flag) ||
(wdev && wdev->current_bss))
return -EOPNOTSUPP;


The above disallows the use of RANDOM_ADDR for scans while connected. 
The nl80211.h uapi header seems to concur:


 "@NL80211_FEATURE_SCAN_RANDOM_MAC_ADDR: This device/driver supports 
using a random MAC address during scan (if the device is unassociated);"


However, if I create a P2P Device (in addition to the default STA 
device), the kernel happily lets me scan on the wdev while the STA 
interface is connected.


sudo iw phy0 interface add p2p type __p2pdev
sudo iw wdev 0x2 p2p start
sudo iw wdev 0x2 scan randomize

So the immediate question I have is, should the RANDOM_ADDR flag indeed 
be limited to unassociated STA interfaces?  It would seem the hardware 
is capable randomizing even when connected? Please educate me :)


Regards,
-Denis


Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7

2019-01-15 Thread Denis Kenzior

Hi Arend,



- Is RTNL LINK_MODE / OPER_STATE status being (supposed to be?) 
affected by the driver during a roam?  E.g. if we're in a 802.1X 
network with userspace authentication, and driver roamed requiring a 
new 802.1X auth, then in theory the RTNL mode needs to be brought 
back out of UP state...


So do you expect the driver/cfg80211 to take care of that or the 
supplicant? I assumed wpa_supplicant would be doing that.




With regular roaming where we trigger a Deassociate/Deathenticate 
(either explicitly or implicitly) first, the interface goes into 
dormant mode by virtue of the carrier going down.


With this it isn't really clear whether the same is happening and who 
(kernel/userspace) should be doing what.  I would actually assume the 
kernel is/should be turning carrier off for the duration of the roam 
operation?


On what layer do we know 802.1X re-auth is required?



Not sure what you mean by 'layer'?  If re-auth is required, then only 
the supplicant has the proper info and it will handle this via EAPoL frames.


But that is besides the point.  Regardless of whether a roam needs 
re-auth or not, network interface dormant notification is needed.  For 
example: userspace DHCP clients need to know when to renew the address. 
And yes, there are weird networks out there that expect you to 
re-negotiate your DHCP address on a roam.  Such clients are not 
integrated in any way with a supplicant and rely on rtnl.


Regards,
-Denis


Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7

2019-01-14 Thread Denis Kenzior

Hi Arend,

On 01/14/2019 02:12 PM, Arend Van Spriel wrote:

On 1/8/2019 6:44 PM, Denis Kenzior wrote:

Hi Arend,

However, there is more to it. When these offloads were introduced, we 
discussed about having a PORT_AUTHORIZED event or not. It was decided 
passing an attribute in CONNECT and ROAMED event would suffice and 
that is what was implemented in brcmfmac. However, it seems time 
passed and the need for an explicit PORT_AUTHORIZED was there 
(probably Denis knows), which wpa_supplicant now supports thus 
ignoring the attribute in the CONNECT and ROAMED events. The brcmfmac 
driver was not changed accordingly. For this there are patches 
pending in linux-wireless which are necessary to have a working 
connection.




Coming in a bit late to this discussion, but it does raise a few 
points I wouldn't mind some clarification on:


- With commit 503c1fb98ba3, the kernel effectively changed the 
userspace API.  So I take it that breaking userspace APIs are OK 
sometimes? If so, I have lots of suggestions to make ;)


I bet you do :-p I think the rule of thumb is that there are no drivers 
providing the functionality behind the user-space API and/or no 
user-space applications are using that API.


Maybe this is a question for Johannes as well, but define 'user-space 
applications'?  If that includes wpa_s, wasn't the rule of thumb broken 
with that commit?




- Is RTNL LINK_MODE / OPER_STATE status being (supposed to be?) 
affected by the driver during a roam?  E.g. if we're in a 802.1X 
network with userspace authentication, and driver roamed requiring a 
new 802.1X auth, then in theory the RTNL mode needs to be brought back 
out of UP state...


So do you expect the driver/cfg80211 to take care of that or the 
supplicant? I assumed wpa_supplicant would be doing that.




With regular roaming where we trigger a Deassociate/Deathenticate 
(either explicitly or implicitly) first, the interface goes into dormant 
mode by virtue of the carrier going down.


With this it isn't really clear whether the same is happening and who 
(kernel/userspace) should be doing what.  I would actually assume the 
kernel is/should be turning carrier off for the duration of the roam 
operation?


- The new API leaves a lot to be desired in terms of race conditions. 
For example, how long should userspace wait for EAPoL-EAP packets to 
arrive (before triggering its own EAPoL-Start for example) if a 
CMD_ROAMED event comes?


I think that question applies to CMD_CONNECT as well, right? Not sure if 
the specs provide any guidance for that. I can dive into that, but maybe 
someone like Jouni or Johannes know. If so, let me know ;-)


With CMD_CONNECT it is a bit more clear because you're most likely not 
specifying a PMKID for the first time, so you expect the authentication 
to happen in all cases.  If the AP doesn't respond after some small 
timeout, the supplicant can send its own EAPoL-Start.


With CMD_ROAMED it is less clear.



- What happens if userspace does send an EAPoL-Start in the middle of 
an offloaded 4-way handshake?


Probably those would be dropped.



I would love to have something more definitive than 'Probably', and it 
might be worth mentioning this hint in the documentation somewhere.


Regards,
-Denis


Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7

2019-01-08 Thread Denis Kenzior

Hi Arend,

However, there is more to it. When these offloads were introduced, we 
discussed about having a PORT_AUTHORIZED event or not. It was decided 
passing an attribute in CONNECT and ROAMED event would suffice and that 
is what was implemented in brcmfmac. However, it seems time passed and 
the need for an explicit PORT_AUTHORIZED was there (probably Denis 
knows), which wpa_supplicant now supports thus ignoring the attribute in 
the CONNECT and ROAMED events. The brcmfmac driver was not changed 
accordingly. For this there are patches pending in linux-wireless which 
are necessary to have a working connection.




Coming in a bit late to this discussion, but it does raise a few points 
I wouldn't mind some clarification on:


- With commit 503c1fb98ba3, the kernel effectively changed the userspace 
API.  So I take it that breaking userspace APIs are OK sometimes? If so, 
I have lots of suggestions to make ;)


- Is RTNL LINK_MODE / OPER_STATE status being (supposed to be?) affected 
by the driver during a roam?  E.g. if we're in a 802.1X network with 
userspace authentication, and driver roamed requiring a new 802.1X auth, 
then in theory the RTNL mode needs to be brought back out of UP state...


- The new API leaves a lot to be desired in terms of race conditions. 
For example, how long should userspace wait for EAPoL-EAP packets to 
arrive (before triggering its own EAPoL-Start for example) if a 
CMD_ROAMED event comes?


- What happens if userspace does send an EAPoL-Start in the middle of an 
offloaded 4-way handshake?


Regards,
-Denis


Re: [PATCH v6 2/3] mac80211: Define new driver callback replace_key

2018-08-16 Thread Denis Kenzior

Hi Alexander,

Just minor nitpicks:


+ * @replace_key: Replace an exiting in use key with a new one while 
guaranteeing
+ * to not leak clear text packets. Implementing this callback will enable
+ * mac80211 to announce NL80211_EXT_FEATURE_ATOMIC_KEY_REPLACE.
+ * Packets already queued must not be send out encrypted with the new key

send out -> sent out

+ * and packets decoded with the old key must not be handed over to mac80211
+ * when the driver is not checking IV/ICV itself once the callback has been
+ * completed.
+ * Mac80211 will log an error when asked to use replace a PTK key
+ * without replace_key but will still perform the then potentially
+ * insecure action via set_key for backward compatibility for now.
+ *


Not sure this part really belongs in the driver method description?


   * @update_tkip_key: See the section "Hardware crypto acceleration"
   *This callback will be called in the context of Rx. Called for drivers
   *which set IEEE80211_KEY_FLAG_TKIP_REQ_RX_P1_KEY.





diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 4fb2709cb527..84cc8005c19a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -572,9 +572,14 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t 
priv_data_len,
  
NL80211_EXT_FEATURE_SCAN_MIN_PREQ_CONTENT);
}
  
+	if (ops->replace_key)

+   wiphy_ext_feature_set(wiphy,
+ NL80211_EXT_FEATURE_ATOMIC_KEY_REPLACE);
+
if (!ops->set_key)
wiphy->flags |= WIPHY_FLAG_IBSS_RSN;
  
+


Stray whitespace?


if (ops->wake_tx_queue)
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_TXQS);
  


Regards,
-Denis


Re: [PATCH v6 1/3] nl80211: Add ATOMIC_KEY_REPLACE API

2018-08-16 Thread Denis Kenzior

Hi Alexander,

On 08/14/2018 05:42 AM, Alexander Wetzel wrote:

Drivers able to correctly replace a in-use key should set
NL80211_EXT_FEATURE_ATOMIC_KEY_REPLACE to allow the userspace (e.g.
hostapd or wpa_supplicant) to rekey PTK keys.

The userspace must detect a PTK rekey attempt and only go ahead with the
rekey when the driver has set this flag. If the driver is not supporting
the feature the userspace either must not replace the PTK key or perform
a full re-association.

Ignoring this flag and continuing to rekey the connection can still
work but has to be considered insecure and broken. It can leak cleartext
packets or freeze the connection and is only supported to allow the
userspace to be updated.

Signed-off-by: Alexander Wetzel 
---
  include/uapi/linux/nl80211.h | 6 ++
  1 file changed, 6 insertions(+)



This looks good to me from a userspace perspective.  I will try to 
implement support for this in iwd soon to give you a prototype to play with.


Reviewed-by: Denis Kenzior 

Regards,
-Denis


Re: [PATCH v2] mac80211: Fix wlan freezes under load at rekey

2018-07-11 Thread Denis Kenzior

Hi Alexander,

On 07/11/2018 12:08 PM, Alexander Wetzel wrote:

Hi Denis,


Hi Alexander,


hostapd or wpa_supplicant are "ordering" mac80211 to install a new key
and are implementing the state machine and are in a good position to
handle the fallout... at least theoretically.


Ideally it would even know beforehand that we don't want to handle the
PTK rekeying, and then could reconnect instead of going through the
handshake.



Don't see how we could do that in the kernel, all the relevant
information is handled in the state machine. I guess an API extension
telling hostap/supplicant if we can handle rekeys or not would tbe he
only way to avoid that.



Can the kernel / driver provide some sort of hint to user space that PTK
rekey isn’t supported?  We could then have user space deauthenticate
with a big warning about what/why this is happening and try to
re-connect to the last used BSS.



Sure. In fact the latest patch is already doing that by returning an
error when set_key is called for PTK and it's not an initial call.
Tests with wpa_supplicant shows that this is is then handled like the
initial key set is failing. Networkmanager prompts for the password and
wpa_supplicant running without seems to blacklist a reconnect for 15s.


Ideally we shouldn't even get this far.  We really need some kind of 
capability bit on the phy telling userspace whether PTK rekey is 
supported or not.  Then userspace can take proper action based on this 
information.  E.g. if PTK rekey isn't safe, then we can simply issue a 
CMD_DISCONNECT and re-connect to the last BSS.  The kernel doesn't need 
to play any 'tricks'.


The fact that current userspace implementations are broken is 
regrettable and needs to be fixed.




I kind of liked that solution, but with existing implematations out this
is indeed awkward to find a "correct" solution.

The main problem for me currently is to find a correct and still
acceptable solution. This turned from "let's fix this nasty wlan
connection freezes" to a projet spanning the complete wlan stack: From
hardware up to and including the userspace...


Right.  The problem is that this PTK rekey likely 'works' (for some 
definition thereof) in a vast majority of cases, e.g. the link isn't 
broken, so the user doesn't notice.  So, if the kernel starts to 
unilaterally issue disconnects, you will have a lot of grumbling users.


Just to clarify, I'm not arguing against this necessarily.  I can see 
why issuing a disconnect is a good idea for many reasons (e.g. security, 
etc.)  But, I would expect a lot of user backlash if this is done, and 
given that this has been an issue for many years, I wonder if its the 
right way of handling this?




It's fun to learn how that interacts (if not very fast), I'm stuggling
finding the best way forward here. Whatever we do has undesired
consequences.
Maybe I'm missing something, but here the high level options we have in
my opinion:

1) Keep it as it is and solve that in a indefinite future when we and
the world implement the ieee802.11 2012 addition, to use key 0+1 for PTK
and 2+3 for GTK
- rekey has a extrem high probability of freezing connections and
leaking a few clear text packets for years (decades?) to come
+ The issue is fixed at the core


It would seem to me that 0+1 rekey is a separate issue that needs to be 
supported in both kernel and userspace anyway.




2) Make it worse, like some (most) Windows systems/cards seem to handle
it by encrypting EAPOL #4 with the NEW key, breaking the handshake and
forcing a reconnect.
- break something more to fix a problem sounds like a insane approach
+ This seems to be quite common and therefore well "tested" (based on my
very limited data on that)


This seems awful.  And then if you're unlucky someone will come in and 
tell you that the kernel has to maintain this 'legacy' behavior forever. 
 So things can't ever be fixed.


Plus, as I already mentioned above, some users 'think' that PTK rekey 
already works just fine.




3) Fix what we can in mac80211 but keep the API stable
- Without driver actions still many drivers will be "undefined" and even
if they are not freezing leak packets
+ This will reduce the problems to a fraction of what is is today with
only a mac80211 update

4) Redesign the mac80211 rekey handling and interaction with drivers to
only rekey if it is save and decline when not.
+ We only have to touch the kernel
- any supplicant (whatever runs the wpa state machine) may get errors
where the programmes did not expect them, leading to unexpected side
effects.

5) The full-stack solution: Update the API for the userpace
+ We do not have to "trick" the wpa state machine to disconnect, the
programmers of it have to code it.
- Well, it must be suppurted from the wpa state machine. If not we still
have to handle the rekey somehow or we accept freezes/cleartext leaks...

The last two solutions will also need some "fallback" when a secure
rekey is not possible and/or the user is runing an old 

Re: [PATCH v2] mac80211: Fix wlan freezes under load at rekey

2018-07-10 Thread Denis Kenzior

Hi Alexander,


hostapd or wpa_supplicant are "ordering" mac80211 to install a new key
and are implementing the state machine and are in a good position to
handle the fallout... at least theoretically.


Ideally it would even know beforehand that we don't want to handle the
PTK rekeying, and then could reconnect instead of going through the
handshake.



Don't see how we could do that in the kernel, all the relevant
information is handled in the state machine. I guess an API extension
telling hostap/supplicant if we can handle rekeys or not would tbe he
only way to avoid that.



Can the kernel / driver provide some sort of hint to user space that PTK 
rekey isn’t supported?  We could then have user space deauthenticate 
with a big warning about what/why this is happening and try to 
re-connect to the last used BSS.



So I think we're probably better off accepting the set_key but not
actually using it, and instead disconnecting... even if that's awkward
and should come with a big comment :-)


Instead of returning an error I'll change the code to accept the rekey
but do nothing with it. (Basically delete the new key and keep the old
active).
And of course calling ieee80211_set_disassoc() prior to return "success".

Let's see how the supplicant will react on a disassoc while doing a rekey...



This sounds pretty awful actually.  Now that wpa_s is not the only game 
in town, can we stop resorting to these tactics?


Regards,
-Denis



Re: Proper SET_KEY usage?

2018-07-10 Thread Denis Kenzior

Hi Johannes,

On 07/06/2018 07:17 AM, Johannes Berg wrote:

Hi,


Yeah. This stuff grew out of WEXT mostly, and what things wpa_s did at
the time...


Right.  My main intent with this was to see if I could understand things
enough to actually start fixing some of the docs, see if we need to
deprecate some things and/or maybe fix bits inside nl80211.


:-)
That'd be very welcome!


So it sounds like SET_KEY is completely unnecessary for a unicast key in
an RSN/WPA association for an STA.  Should we update mac80211 (and
possibly nl80211, though likely it doesn't have enough info) to issue a
warning when someone tries to use set_key on such a key?


I'm not sure I see much point, since we'll have to deal with older wpa_s
basically forever?


Do we though?  I can understand not breaking userspace APIs, but it 
seems like this API had only a single user (up until recently) and it 
'should' be acceptable to break ancient versions of it.  Whether we can 
or not is a question for the wider community though...





What about AP/IBSS?  Aren't unicast keys also per-station in this case?
It sounds like SET_KEY isn't useful in this context either, right?


Yes, same situation really.


What about WEP?


WEP is a special case, you typically use the same key for all traffic,
and may switch around between them.

In theory though, and I think this API allows you to do it, you can have
multiple WEP keys and use different ones for unicast/multicast TX. Or
the same. Or switch them around as you like.


Put another way, should we deprecate the whole
NL80211_KEY_DEFAULT_TYPE_UNICAST attribute?


Perhaps. I'm not sure I see much point in it. The WEP case above is
pretty fringe, and wext didn't support it, so ... probably not needed?


Should we at least take this out of the driver API then and have it 
ignored by nl80211?  And maybe mark this deprecated in the docs?





And since you went off on a tangent (maybe this needs a separate
discussion?) but... Can you elaborate about PTK rekeying with a non-zero
key index?  Are you saying that in IBSS/AP mode we can't support PTK rekey?


We can't properly support PTK rekeying in any mode, see Alexander
Wetzel's patch and the whole discussion on the different versions I had
with him. I'll merge his patch soon I think, but it doesn't really work
properly.


I think we had this conversation before.  Up to 802.11-2012, PTK Rekey 
was not really explicitly mentioned as possible.  There were hints and 
stuff, but no explicit language.


I think in 802.11-2016 they finally explicitly say that this is possible.

However, we seem to have networks that perform PTK Rekey and even full 
802.1X re-auth every hour (eduroam for example).  How is this working? 
Or is it a case of it not always working?




Some time relatively recently (802.11-2016 I think) the spec has
introduced a method to indicate that you support what was previously not
allowed at all: using non-zero key index for a PTK. This way, you can do
transparent PTK rekeying like you do with GTK now, by switching to a new
key index when you rekey.


And keep the old one for whatever packets are in process, right?  That 
makes more sense.




We don't - currently - support this in mac80211 or even wpa_supplicant,
and I expect many devices would have issues with this with hardware
offload. I'm also not aware of any certification program for it, but I
also haven't followed the WPA3 efforts (but I don't think they're
focused here, they're focused on algorithms and higher layers IIRC.)



Does the recent PMK handshake offload support proposal take into 
consideration PTK rekeying?



(The difference between them is that in IBSS you will have per-station
GTKs, but that's also irrelevant because those are only used for RX -
your own GTK on the netdev/wdev/sdata/vif level is used for TX.)


Okay, so for GTKs we would have a per-station RX GTK and a 'default' TX
GTK.  So in this case a SET_KEY would be needed only on the 'default' TX
GTK index.

Okay, maybe a tangent, but this brings up a question: Why do we have
separate steps for this operation?  Can't we just issue a NEW_KEY and
have the kernel figure this out?


I think the only case would be WEP?


So can we expect all drivers to operate this way?  Or should we maybe 
also have nl80211 call set_key operation on a key with no associated STA 
implicitly?





Well, that's what I was pointing out earlier, the whole Multicast
attribute is ignored and should not be sent in the first place:

  >>   Key Default Types: len 4
  >>   Multicast: true

The proof :) follows:


:)


So while nl80211_new_key actually parses this information (stored in
struct key_parse) it never forwards any of it to the driver.  The group
key vs pairwise key determination seems to be predicated on presence of
NL80211_ATTR_KEY_TYPE and the following fallback:

  if (key.type == -1) {
  if (mac_addr)
  key.type = NL80211_KEYTYPE_PAIRWISE;
  else
 

[PATCH v2] nl80211/mac80211: allow non-linear skb in rx_control_port

2018-07-03 Thread Denis Kenzior
The current implementation of cfg80211_rx_control_port assumed that the
caller could provide a contiguous region of memory for the control port
frame to be sent up to userspace.  Unfortunately, many drivers produce
non-linear skbs, especially for data frames.  This resulted in userspace
getting notified of control port frames with correct metadata (from
address, port, etc) yet garbage / nonsense contents, resulting in bad
handshakes, disconnections, etc.

mac80211 linearizes skbs containing management frames.  But it didn't
seem worthwhile to do this for control port frames.  Thus the signature
of cfg80211_rx_control_port was changed to take the skb directly.
nl80211 then takes care of obtaining control port frame data directly
from the (linear | non-linear) skb.

The caller is still responsible for freeing the skb,
cfg80211_rx_control_port does not take ownership of it.

Signed-off-by: Denis Kenzior 
---

v2 changes:
- Reword commit header
- Rework tracing slightly per Arend's feedback
- Added blurb about skb->protocol in include/net/cfg80211.h

 include/net/cfg80211.h | 12 ++--
 net/mac80211/rx.c  |  5 +
 net/wireless/nl80211.c | 24 +++-
 net/wireless/trace.h   | 18 ++
 4 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 9ba1f289c439..beed5b2a3933 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -5937,10 +5937,11 @@ void cfg80211_mgmt_tx_status(struct wireless_dev *wdev, 
u64 cookie,
 /**
  * cfg80211_rx_control_port - notification about a received control port frame
  * @dev: The device the frame matched to
- * @buf: control port frame
- * @len: length of the frame data
- * @addr: The peer from which the frame was received
- * @proto: frame protocol, typically PAE or Pre-authentication
+ * @skb: The skbuf with the control port frame.  It is assumed that the skbuf
+ * is 802.3 formatted (with 802.3 header).  The skb can be non-linear.  This
+ * function does not take ownership of the skb, so the caller is responsible
+ * for any cleanup.  The caller must also ensure that skb->protocol is set
+ * appropriately.
  * @unencrypted: Whether the frame was received unencrypted
  *
  * This function is used to inform userspace about a received control port
@@ -5953,8 +5954,7 @@ void cfg80211_mgmt_tx_status(struct wireless_dev *wdev, 
u64 cookie,
  * Return: %true if the frame was passed to userspace
  */
 bool cfg80211_rx_control_port(struct net_device *dev,
- const u8 *buf, size_t len,
- const u8 *addr, u16 proto, bool unencrypted);
+ struct sk_buff *skb, bool unencrypted);
 
 /**
  * cfg80211_cqm_rssi_notify - connection quality monitoring rssi event
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index a16ba568e2a3..64742f2765c4 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2370,11 +2370,8 @@ static void ieee80211_deliver_skb_to_local_stack(struct 
sk_buff *skb,
 sdata->control_port_over_nl80211)) {
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
bool noencrypt = status->flag & RX_FLAG_DECRYPTED;
-   struct ethhdr *ehdr = eth_hdr(skb);
 
-   cfg80211_rx_control_port(dev, skb->data, skb->len,
-ehdr->h_source,
-be16_to_cpu(skb->protocol), noencrypt);
+   cfg80211_rx_control_port(dev, skb, noencrypt);
dev_kfree_skb(skb);
} else {
/* deliver to local stack */
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 8db59129c095..b75a0144c0a5 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -15036,20 +15036,24 @@ void cfg80211_mgmt_tx_status(struct wireless_dev 
*wdev, u64 cookie,
 EXPORT_SYMBOL(cfg80211_mgmt_tx_status);
 
 static int __nl80211_rx_control_port(struct net_device *dev,
-const u8 *buf, size_t len,
-const u8 *addr, u16 proto,
+struct sk_buff *skb,
 bool unencrypted, gfp_t gfp)
 {
struct wireless_dev *wdev = dev->ieee80211_ptr;
struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy);
+   struct ethhdr *ehdr = eth_hdr(skb);
+   const u8 *addr = ehdr->h_source;
+   u16 proto = be16_to_cpu(skb->protocol);
struct sk_buff *msg;
void *hdr;
+   struct nlattr *frame;
+
u32 nlportid = READ_ONCE(wdev->conn_owner_nlportid);
 
if (!nlportid)
return -ENOENT;
 
-   msg = nlmsg_new(100 + len, gfp);
+   msg = nlmsg_new(100 + skb->len, gfp);
if (!msg)
return -ENOMEM;
 
@@ -15063,13 +15067,17 @@ stat

Re: [PATCH] nl80211/mac80211: Fix cfg80211_rx_control_port

2018-07-03 Thread Denis Kenzior

Hi Arend,

On 07/03/2018 02:18 PM, Arend van Spriel wrote:

Hi Denis,

I prefer the subject summarizes the issue, eg. "allow non-linear skb 
data passed to cfg80211_rx_control_port".


Right, I'll fix this in the next version.



On 7/3/2018 8:26 PM, Denis Kenzior wrote:

The current implementation of cfg80211_rx_control_port assumed that the
caller could provide a contiguous region of memory for the control port
frame to be sent up to userspace.  Unfortunately, many drivers produce
non-linear skbs, especially for data frames.  This resulted in userspace
getting notified of control port frames with correct metadata (from
address, port, etc) yet garbage / nonsense contents, resulting in bad
handshakes, disconnections, etc.


[snip]


diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 9ba1f289c439..94842c2a2f73 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -5937,10 +5937,10 @@ void cfg80211_mgmt_tx_status(struct 
wireless_dev *wdev, u64 cookie,

  /**
   * cfg80211_rx_control_port - notification about a received control 
port frame

   * @dev: The device the frame matched to
- * @buf: control port frame
- * @len: length of the frame data
- * @addr: The peer from which the frame was received
- * @proto: frame protocol, typically PAE or Pre-authentication
+ * @skb: The skbuf with the control port frame.  It is assumed that 
the skbuf
+ * is 802.3 formatted (with 802.3 header).  The skb can be 
non-linear.  This
+ * function does not take ownership of the skb, so the caller is 
responsible

+ * for any cleanup.
   * @unencrypted: Whether the frame was received unencrypted
   *
   * This function is used to inform userspace about a received 
control port
@@ -5953,8 +5953,7 @@ void cfg80211_mgmt_tx_status(struct wireless_dev 
*wdev, u64 cookie,

   * Return: %true if the frame was passed to userspace
   */
  bool cfg80211_rx_control_port(struct net_device *dev,
-  const u8 *buf, size_t len,
-  const u8 *addr, u16 proto, bool unencrypted);
+  struct sk_buff *skb, bool unencrypted);


If we are changing the signature, could we change the return type to 
int. I don't see much value in the int-2-bool conversion.


I was mostly following the pattern in other cfg80211_rx_ functions here. 
 They all return bool or void.  I'm fine either way.





  /**
   * cfg80211_cqm_rssi_notify - connection quality monitoring rssi event


[snip]


diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 8db59129c095..b75a0144c0a5 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -15036,20 +15036,24 @@ void cfg80211_mgmt_tx_status(struct 
wireless_dev *wdev, u64 cookie,

  EXPORT_SYMBOL(cfg80211_mgmt_tx_status);

  static int __nl80211_rx_control_port(struct net_device *dev,
- const u8 *buf, size_t len,
- const u8 *addr, u16 proto,
+ struct sk_buff *skb,
   bool unencrypted, gfp_t gfp)
  {
  struct wireless_dev *wdev = dev->ieee80211_ptr;
  struct cfg80211_registered_device *rdev = 
wiphy_to_rdev(wdev->wiphy);

+    struct ethhdr *ehdr = eth_hdr(skb);
+    const u8 *addr = ehdr->h_source;
+    u16 proto = be16_to_cpu(skb->protocol);


So this means skb->protocol has to be set by driver (using 
eth_type_trans() helper). Guess mac80211 does it for sure, but better 
make it a clear requirement for cfg80211-based drivers.




Right, mac80211 uses skb->protocol to do the filtering, so this is 
already done.  We could make protocol and addr explicit arguments, but 
it seemed strange to not extract these from the skb.


I guess we could also extract the proto from the eth_hdr or call 
eth_type_trans inside nl80211.  I have no strong feelings here.  It 
would be great if another driver or two tried to implement this and gave 
us feedback as to which works better...

  struct sk_buff *msg;
  void *hdr;
+    struct nlattr *frame;
+
  u32 nlportid = READ_ONCE(wdev->conn_owner_nlportid);

  if (!nlportid)
  return -ENOENT;

-    msg = nlmsg_new(100 + len, gfp);
+    msg = nlmsg_new(100 + skb->len, gfp);
  if (!msg)
  return -ENOMEM;

@@ -15063,13 +15067,17 @@ static int __nl80211_rx_control_port(struct 
net_device *dev,

  nla_put_u32(msg, NL80211_ATTR_IFINDEX, dev->ifindex) ||
  nla_put_u64_64bit(msg, NL80211_ATTR_WDEV, wdev_id(wdev),
    NL80211_ATTR_PAD) ||
-    nla_put(msg, NL80211_ATTR_FRAME, len, buf) ||
  nla_put(msg, NL80211_ATTR_MAC, ETH_ALEN, addr) ||
  nla_put_u16(msg, NL80211_ATTR_CONTROL_PORT_ETHERTYPE, proto) ||
  (unencrypted && nla_put_flag(msg,
   NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT)))
  goto nla_put_failure;

+    frame = nla_reserve(msg, NL80211_ATTR_FRAME, skb->len);
+    if (!frame)
+    goto nla_put_failure;
+
+    skb_copy_bits

[PATCH] nl80211/mac80211: Fix cfg80211_rx_control_port

2018-07-03 Thread Denis Kenzior
The current implementation of cfg80211_rx_control_port assumed that the
caller could provide a contiguous region of memory for the control port
frame to be sent up to userspace.  Unfortunately, many drivers produce
non-linear skbs, especially for data frames.  This resulted in userspace
getting notified of control port frames with correct metadata (from
address, port, etc) yet garbage / nonsense contents, resulting in bad
handshakes, disconnections, etc.

mac80211 linearizes skbs containing management frames.  But it didn't
seem worthwhile to do this for control port frames.  Thus the signature
of cfg80211_rx_control_port was changed to take the skb directly.
nl80211 then takes care of obtaining control port frame data directly
from the (linear | non-linear) skb.

The caller is still responsible for freeing the skb,
cfg80211_rx_control_port does not take ownership of it.

Signed-off-by: Denis Kenzior 
---
 include/net/cfg80211.h | 11 +--
 net/mac80211/rx.c  |  5 +
 net/wireless/nl80211.c | 24 +++-
 net/wireless/trace.h   | 26 +-
 4 files changed, 38 insertions(+), 28 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 9ba1f289c439..94842c2a2f73 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -5937,10 +5937,10 @@ void cfg80211_mgmt_tx_status(struct wireless_dev *wdev, 
u64 cookie,
 /**
  * cfg80211_rx_control_port - notification about a received control port frame
  * @dev: The device the frame matched to
- * @buf: control port frame
- * @len: length of the frame data
- * @addr: The peer from which the frame was received
- * @proto: frame protocol, typically PAE or Pre-authentication
+ * @skb: The skbuf with the control port frame.  It is assumed that the skbuf
+ * is 802.3 formatted (with 802.3 header).  The skb can be non-linear.  This
+ * function does not take ownership of the skb, so the caller is responsible
+ * for any cleanup.
  * @unencrypted: Whether the frame was received unencrypted
  *
  * This function is used to inform userspace about a received control port
@@ -5953,8 +5953,7 @@ void cfg80211_mgmt_tx_status(struct wireless_dev *wdev, 
u64 cookie,
  * Return: %true if the frame was passed to userspace
  */
 bool cfg80211_rx_control_port(struct net_device *dev,
- const u8 *buf, size_t len,
- const u8 *addr, u16 proto, bool unencrypted);
+ struct sk_buff *skb, bool unencrypted);
 
 /**
  * cfg80211_cqm_rssi_notify - connection quality monitoring rssi event
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index a16ba568e2a3..64742f2765c4 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2370,11 +2370,8 @@ static void ieee80211_deliver_skb_to_local_stack(struct 
sk_buff *skb,
 sdata->control_port_over_nl80211)) {
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
bool noencrypt = status->flag & RX_FLAG_DECRYPTED;
-   struct ethhdr *ehdr = eth_hdr(skb);
 
-   cfg80211_rx_control_port(dev, skb->data, skb->len,
-ehdr->h_source,
-be16_to_cpu(skb->protocol), noencrypt);
+   cfg80211_rx_control_port(dev, skb, noencrypt);
dev_kfree_skb(skb);
} else {
/* deliver to local stack */
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 8db59129c095..b75a0144c0a5 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -15036,20 +15036,24 @@ void cfg80211_mgmt_tx_status(struct wireless_dev 
*wdev, u64 cookie,
 EXPORT_SYMBOL(cfg80211_mgmt_tx_status);
 
 static int __nl80211_rx_control_port(struct net_device *dev,
-const u8 *buf, size_t len,
-const u8 *addr, u16 proto,
+struct sk_buff *skb,
 bool unencrypted, gfp_t gfp)
 {
struct wireless_dev *wdev = dev->ieee80211_ptr;
struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy);
+   struct ethhdr *ehdr = eth_hdr(skb);
+   const u8 *addr = ehdr->h_source;
+   u16 proto = be16_to_cpu(skb->protocol);
struct sk_buff *msg;
void *hdr;
+   struct nlattr *frame;
+
u32 nlportid = READ_ONCE(wdev->conn_owner_nlportid);
 
if (!nlportid)
return -ENOENT;
 
-   msg = nlmsg_new(100 + len, gfp);
+   msg = nlmsg_new(100 + skb->len, gfp);
if (!msg)
return -ENOMEM;
 
@@ -15063,13 +15067,17 @@ static int __nl80211_rx_control_port(struct 
net_device *dev,
nla_put_u32(msg, NL80211_ATTR_IFINDEX, dev->ifindex) ||
nla_put_u64_64bit(msg, NL80211_ATTR_WDEV, wdev_id(wdev),
  

Re: Proper SET_KEY usage?

2018-06-29 Thread Denis Kenzior

Hi Johannes,

On 06/29/2018 05:01 AM, Johannes Berg wrote:

Hi Denis,

Hope you don't mind me adding the list to the explanations, so they're
recorded. Your questions are completely reasonable, after all :-)



Absolutely do not mind.


So I've been poking around a bit more in nl80211/mac80211 stuff and I
was curious how exactly NEW_KEY/SET_KEY are supposed to be used.  The
documentation is lets say... less than stellar.


Yeah. This stuff grew out of WEXT mostly, and what things wpa_s did at
the time...


Right.  My main intent with this was to see if I could understand things 
enough to actually start fixing some of the docs, see if we need to 
deprecate some things and/or maybe fix bits inside nl80211.





So here's a excerpts from a capture of wpa_supplicant 2.6 connecting to
a PSK (CCMP + MFP-enabled) BSS.  So first thing it does is creates the
pairwise key via CMD_NEW_KEY:

< Request: New Key (0x0b) len 68 [ack]
  Interface Index: 3 (0x0003)
  Key Data: len 16
  [...]
  Key Cipher: CCMP (00:0f:ac) suite  04
  Key Sequence: len 6
  00 00 00 00 00 00
  MAC Address [...]
  Key Index: 0 (0x00)
  > Response: New Key (0x0b) len 4 [0x100]
  Status: Success (0)

So far so good.


Yeah, that seems reasonable :-)


The next thing it does is:
< Request: Set Key (0x0a) len 28 [ack]
  Interface Index: 3 (0x0003)
  Key Index: 0 (0x00)
  Key Default: true
  Key Default Types: len 4
  Unicast: true
  > Response: Set Key (0x0a) len 4 [0x100]
  Status: Success (0)

This part is a bit bizarre.


Yeah. Maybe Jouni can chime in why this happens. I suspect the reason is
some legacy with wext or ancient drivers.


First it seems to be a complete no-op on
mac80211 because mac80211 puts the key into sta->ptk, while
ieee80211_set_default_key is fully ignorant of the sta and only operates
on struct ieee80211_sub_if_data.


Indeed. PTKs are also immediately selected for TX.


Is this really intended to be used
this way?  Is SET_KEY useful / necessary in an STA context?  What is the
intended usage here?


Even if we implement, in the future, PTK rekeying properly with non-zero
key index, I think we could still set the key for TX immediately, and
keep the other for RX, so it shouldn't be necessary in any way for a PTK
(or actually for a per-station key, which differs slightly) context.


So it sounds like SET_KEY is completely unnecessary for a unicast key in 
an RSN/WPA association for an STA.  Should we update mac80211 (and 
possibly nl80211, though likely it doesn't have enough info) to issue a 
warning when someone tries to use set_key on such a key?


What about AP/IBSS?  Aren't unicast keys also per-station in this case? 
It sounds like SET_KEY isn't useful in this context either, right?


What about WEP?

Put another way, should we deprecate the whole 
NL80211_KEY_DEFAULT_TYPE_UNICAST attribute?


And since you went off on a tangent (maybe this needs a separate 
discussion?) but... Can you elaborate about PTK rekeying with a non-zero 
key index?  Are you saying that in IBSS/AP mode we can't support PTK rekey?




(The difference between them is that in IBSS you will have per-station
GTKs, but that's also irrelevant because those are only used for RX -
your own GTK on the netdev/wdev/sdata/vif level is used for TX.)


Okay, so for GTKs we would have a per-station RX GTK and a 'default' TX 
GTK.  So in this case a SET_KEY would be needed only on the 'default' TX 
GTK index.


Okay, maybe a tangent, but this brings up a question: Why do we have 
separate steps for this operation?  Can't we just issue a NEW_KEY and 
have the kernel figure this out?





Okay, then wpa_s sets the Group key:
< Request: New Key (0x0b) len 64 [ack]
  Interface Index: 3 (0x0003)
  Key Data: len 16
  [...]
  Key Cipher: CCMP (00:0f:ac) suite  04
  Key Sequence: len 6
  c8 08 00 00 00 00
  Key Default Types: len 4
  Multicast: true
  Key Index: 1 (0x01)
  > Response: New Key (0x0b) len 4 [0x100]
  Status: Success (0)

wpa_s doesn't use CMD_SET_KEY at all for the key created above.  Notice
the completely superfluous 'Key Default Types' attribute.  This will be
ignored by nl80211 when invoking the add_key driver method.


CMD_SET_KEY is only used in contexts where you might transmit with the
key, this isn't true for the GTK in client-mode.

The key types is an artifact of the IBSS I described I think, in this
case you don't have a station MAC address for the key, but for GTK in
IBSS you will have a MAC address (because the GTK for RX is per station)
but you still need to tag it as being a GTK and not PTK.



Well, that's what I was pointing out earlier, the whole Multicast 
attribute is ignored and should not be sent in the first place:


>>   Key Default Types: len 4
>>   Multicast: true

The proof :) follows:

add_key has the following signature:
int (*add_key)(struct wiphy *wiphy, struct ne

Re: [BUG] mac80211: Using smp_processor_id() in preemptible code: iwd

2018-06-19 Thread Denis Kenzior
Hi Johannes,

> On Jun 15, 2018, at 7:30 AM, Johannes Berg  wrote:
> 
> On Fri, 2018-06-15 at 11:09 +, McGinn, Dan wrote:
>> Hi, I'm newly trying out Intel iwd daemon but I experience regular kernel 
>> errors in 4.17, although WPA2-PSK connection remains stable.  These errors 
>> don't seem to be experienced with wpa_supplicant.  The errors reliably 
>> appear around the following events:
>> netdev_unicast_notify()
>> netdev_control_port_frame_event()
>> netdev_set_rekey_offload()
>> netdev_set_gtk()
>> 
>> @Denkenz in IRC helpfully suggests Johannes could follow the finger of 
>> suspicion to this commit:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=911806491425d79107cadddbde11b42bbdfe38c8
> 
> It's his code ;-)

Right, but you’re much more aware of all the locking issues than I am.

> 
> Clearly this comes from cfg80211 without any locking other than rtnl, so
> you don't have preemption disabled. That's the minimum needed to get rid
> of the warning you found.

In my defense, I did ask you whether there are any potential locking issues in 
the RFC and you didn’t think there were any.

I posted a fix for this.  Could you please review?

Regards,
-Denis



[PATCH] mac80211: Fix oops in ieee80211_tx_control_port

2018-06-19 Thread Denis Kenzior
On pre-emption enabled kernels the following oops was being seen due to
missing local_bh_disable/local_bh_enable calls.  mac80211 assumes that
pre-emption is disabled in the data path.

[ 5365.229756] Call Trace:
[ 5365.229762]  dump_stack+0x5c/0x80
[ 5365.229766]  check_preemption_disabled.cold.0+0x46/0x51
[ 5365.229779]  __ieee80211_subif_start_xmit+0x144/0x210 [mac80211]
[ 5365.229790]  ieee80211_tx_control_port+0x116/0x140 [mac80211]
[ 5365.229806]  nl80211_tx_control_port+0x13c/0x270 [cfg80211]
[ 5365.229810]  genl_family_rcv_msg+0x1c4/0x3a0
[ 5365.229814]  ? nlmon_xmit+0x3c/0x50 [nlmon]
[ 5365.229816]  ? dev_hard_start_xmit+0xa5/0x240
[ 5365.229817]  genl_rcv_msg+0x47/0x90
[ 5365.229818]  ? genl_family_rcv_msg+0x3a0/0x3a0
[ 5365.229820]  netlink_rcv_skb+0x4c/0x120
[ 5365.229821]  genl_rcv+0x24/0x40
[ 5365.229822]  netlink_unicast+0x196/0x240
[ 5365.229824]  netlink_sendmsg+0x1fd/0x3c0
[ 5365.229826]  sock_sendmsg+0x33/0x40
[ 5365.229827]  __sys_sendto+0xee/0x160
[ 5365.229830]  ? __se_sys_epoll_ctl+0x34d/0xe80
[ 5365.229831]  ? do_epoll_wait+0xb0/0xd0
[ 5365.229832]  __x64_sys_sendto+0x24/0x30
[ 5365.229835]  do_syscall_64+0x5b/0x170
[ 5365.229836]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Denis Kenzior 
---
 net/mac80211/tx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 5b93bde248fd..6a79d564de35 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -4850,7 +4850,9 @@ int ieee80211_tx_control_port(struct wiphy *wiphy, struct 
net_device *dev,
skb_reset_network_header(skb);
skb_reset_mac_header(skb);
 
+   local_bh_disable();
__ieee80211_subif_start_xmit(skb, skb->dev, flags);
+   local_bh_enable();
 
return 0;
 }
-- 
2.13.5



Kernel Oops in ieee80211_subif_start_xmit

2018-06-18 Thread Denis Kenzior

Hi Johannes,

Some Arch Linux users are reporting a kernel oops when using iwd on 4.17 
inside the control port bits.  It seems to be pre-emption related since 
I've never seen this in all my testing.  Any pointers as to what might 
be causing this?


[ 5069.567760] Call Trace:
[ 5069.567765]  dump_stack+0x5c/0x80
[ 5069.567769]  check_preemption_disabled.cold.0+0x46/0x51
[ 5069.567778]  __ieee80211_subif_start_xmit+0x144/0x210 [mac80211]
[ 5069.567786]  ieee80211_tx_control_port+0x116/0x140 [mac80211]
[ 5069.567806]  nl80211_tx_control_port+0x13c/0x270 [cfg80211]
[ 5069.567809]  genl_family_rcv_msg+0x1c4/0x3a0
[ 5069.567811]  genl_rcv_msg+0x47/0x90
[ 5069.567814]  ? __kmalloc_node_track_caller+0x210/0x2b0
[ 5069.567815]  ? genl_family_rcv_msg+0x3a0/0x3a0
[ 5069.567816]  netlink_rcv_skb+0x4c/0x120
[ 5069.567818]  genl_rcv+0x24/0x40
[ 5069.567819]  netlink_unicast+0x196/0x240
[ 5069.567821]  netlink_sendmsg+0x1fd/0x3c0
[ 5069.567823]  sock_sendmsg+0x33/0x40
[ 5069.567825]  __sys_sendto+0xee/0x160
[ 5069.567826]  ? __sys_recvmsg+0x54/0xa0
[ 5069.567829]  ? do_epoll_wait+0xb0/0xd0
[ 5069.567830]  __x64_sys_sendto+0x24/0x30
[ 5069.567832]  do_syscall_64+0x5b/0x170
[ 5069.567834]  entry_SYSCALL_64_after_hwframe+0x44/0xa9


Re: [PATCH] cfg80211: Fix support for flushing old scan results

2018-05-22 Thread Denis Kenzior

On 05/22/2018 04:28 PM, Johannes Berg wrote:

On Tue, 2018-05-22 at 16:25 -0500, Denis Kenzior wrote:

Hi Johannes,


But in theory, I think you could've received the beacon with hidden SSID
*before* the scan, yet it might be present in the scan results if the
new scan caused the probe response to be associated with that scan.


Right, your explanation was helpful, thanks.  It still seems completely
weird and redundant that we get two separate entries though.  The second
entry with the probe response data still carries the beacon info (as you
point out).  Should the pure-beacon one be filtered?


I'm not sure. It still indicates that a hidden SSID was found, and in
general even a real SSID on the same BSSID doesn't indicate that this
was the only hidden SSID ...


Right, but you still get that info conveyed through the Beacon IEs 
elements on the second/third/etc entry.  So it still seems redundant to 
include the pure beacon one?


Also, given that you have to ask for the SSID you want specifically, 
what practical purpose does it serve to know that this wasn't the only 
hidden SSID?  I mean you can see that hidden SSIDs are out there, run an 
active scan for the ones you can use.  If none are there, you can just 
ignore that bssid...





Right, so thinking out loud here.  Would it be useful to tell GET SCAN
to only return entries with actual probe response data?  Combined with
the flush flag it seems like a much better fit for the cases you point out.


I don't really see much point in doing filtering in the kernel. It
wouldn't doesn't hurt, but just trades off more kernel code for less
transferred data - and that's mostly in this particular corner case, so
not really an efficiency problem?


Fair enough.  It was more motivated by 'make the API a bit more readable 
/ accessible / user friendly'.




And if it wasn't a hidden SSID, then you probably do want to know about
the non-hidden SSIDs that you picked up along the way. In fact, this
will become crucial with OCE, since that results in cases where you
don't even send a probe request if you've picked up certain things
during the scan passively.


Right.  In our case we only scan passively unless we detect a hidden 
ssid and have provisioned hidden ssids.  Then we issue an active scan 
for just those...


Regards,
-Denis


  1   2   3   >