Re: [PacketFence-users] Hostapd causing duplicate Sign-Into-Network prompts intermittently

Durand fabrice via PacketFence-users Tue, 06 Mar 2018 18:46:57 -0800

Hello Ian,

there is a configuration parameter that can be change to raise the timebetween the deauth order and the CoA request.


It's Config{'fencing'}{'wait_for_redirect'} , so in pf.conf:

[fencing]
#
# fencing.wait_for_redirect
#

# How many seconds should the WebAPI sleep before actually triggeringthe VLAN change.# This is meant to give the device enough time to fetch the redirectionpage before

# switching VLAN.
wait_for_redirect = 31

Regards

Fabrice



Le 2018-03-06 à 13:54, Ian MacDonald via PacketFence-users a écrit :

Well,

We easily narrowed this down to a timing issue related to theAccess-Accept(2) messages and how long an authorization is consideredvalid by hostapd.

In short, we need to make sure that the CoA from portal activation isNOT sent to the AP within 30 seconds of the initial connection to theregistration VLAN.

We don't think the redirect or network_detection delay is suitable forthis as that happens after the CoA.

*How do we bake in a delay between Access-Accept and CoA messages fromthe radius server for speedy WISPr clients? *

*
*
Is there a portal setting that can help us here?

Ideally, a "minimum_coa_delay" setting could allow packetfence to onlyissue a CoA after a minimum time period since the last Access-Requestor Access-Accept was processed.

Additionally, we are going to investigate configuration of hostapd tochange the 30s to something smaller (which would solve most our usecases) and/or, having the 30s timer reset on receipt of a CoA. Boththese seem non-trivial vs adding this intelligence to packetfencewhich may solve similar race conditions in the wide variety of radiusclients supported.


Details follow,

cheers,
Ian

Hostapd considers the last authorization valid for 30 seconds, and ifa host disconnects and reconnects within that window, it does notcheck back with the radius server for a new authorization. Thisholds true even if a CoA is received from the radius server; Thelatter which I think is a behavior that could be enhanced by hostapd -after all it is called "Change of Authorization" request.

How this is resulting the double prompt for some users, is that if auser is able to complete the portal activation within 30 seconds ofthe device connecting to the registration VLAN, hostapd does not issueanother Access-Request, but instead considers the currentauthorization to be valid, and the devices remain on the registrationVLAN evem after the CoA is sent that should switch them to the NormalVLAN.

This means you can actually re-Sign-In multiple times within that 30second window if you are quick (my email auto-populates on my phonemaking this easy). Each time the portal generates a CoA but returnsthe device to the registration VLAN. At 31 seconds after the initialdevice connnection to the registration VLAN, any subsequent CoAresults in the correct VLAN being assigned.

The 30s authorization sort of makes sense in order to deal with aclient that is authorized and disconnects and reconnects quickly abunch of times, you avoid some unnecessary radius requests back andforth, but the benefit is minimized since the Radius-Accountingpackets are still processed with each disconnect.

This 30 second delay also impacts de-registration. We were able toconfirm that if we try and de-register a client from the PF GUI within30 seconds of them being activated, they remain on the Normal VLAN. So there is some impact here as well -> And we remember seeing thislittle bug a long time ago, so we think this has always been there. Often testing the captive portals we de-register devices quickly to"try them again" and we used to have to hit "re-evaluate" after save.. but it turns out, only if we do it under 30 seconds.

On Mon, Mar 5, 2018 at 9:21 PM, Ian MacDonald <i...@netstatz.com<mailto:i...@netstatz.com>> wrote:


    We have packetfence 7.4 instances out-of-band running on Debian 8,
    and use the captive portal with hostapd for WiFi client access.

    Our clients register using an email source for activation.

    Our configuration uses all captive_portal defaults except for the
    network detection IP, shown below.

    network_detection_initial_delay = 5s
    network_detection_retry_delay = 2s
    network_redirect_delay = 20s
    secure_redirect=enabled
    detection_mecanism_bypass = disabled
    wispr_redirection = enabled

    At some point in the past I recall our mobile clients used to see
    the scrollbar for many seconds (10+?), which I believe was
    redirect_delay being interrupted by network_detection; this would
    occur immediately after submitting an email for activation.

    Now on PF 7.4.0, hostapd 17.01.4 and with Samsung S8 clients, the
    delay between submitting the email and activation onto the
    user/normal VLAN is instantaneous (about 2 seconds according to
    our logs).   I suspect this speedy interaction has something to do
    with WiSPr and CoA working together quickly, whereas previously we
    were maybe only doing captive portal.

    The problem is that intermittently, the Android clients, after
    activation, are returned to the Registration VLAN instead of the
    Normal/User VLAN, which causes those clients to pop up the "Sign
    Into Network" WiSPR prompt a second time.  Clicking on it again
    loads a quick PF Network Error page which disappears as quickly as
    it appears, and a second later the client is dumped onto the
    Normal/User VLAN as it should have been originally.

    This is confusing for end users, and most make it through. But not
    everyone is happy.

    Below is the working scenario, as seen in terms of tcpdump of
    radius packets on hostapd.

    PF server is 10.2.1.2 and the hostapd daemon is 10.2.1.11

    Client Connects to WiFi Network and Prestented with Email
    Activation in Portal
    12:03:58.157726 10.2.1.11.49281 > 10.2.1.2.1812: Access-Request (1)
    12:03:58.211386 10.2.1.2.1812 > 10.2.1.11.49281: Access-Accept (2)
    12:03:58.279846 10.2.1.11.45520 > 10.2.1.2.1813:
    Accounting-Request (4)
    12:03:58.325525 10.2.1.2.1813 > 10.2.1.11.45520:
    Accounting-Response (5)
    ~30s later Client hits Activation button after entering email
    12:04:28.735459 10.2.1.2.60410 > 10.2.1.11.3799:
    Disconnect-Request (40)
    12:04:28.736847 10.2.1.11.45520 > 10.2.1.2.1813:
    Accounting-Request (4)
    12:04:28.737166 10.2.1.11.3799 > 10.2.1.2.60410: Disconnect-ACK (41)
    12:04:28.762991 10.2.1.2.1813 > 10.2.1.11.45520:
    Accounting-Response (5)
    12:04:30.717764 10.2.1.11.49281 > 10.2.1.2.1812: Access-Request (1)
    12:04:30.782158 10.2.1.2.1812 > 10.2.1.11.49281: Access-Accept (2)
    12:04:30.906401 10.2.1.11.45520 > 10.2.1.2.1813:
    Accounting-Request (4)
    12:04:30.943903 10.2.1.2.1813 > 10.2.1.11.45520:
    Accounting-Response (5)
    ~2s later Client is on normal/user VLAN with "Connected" message.

    Corresponding log entries from the pf radius log showing
    Registration (81) and Normal (76) vlans in concert with the
    Access-Request packets:

    Mar  5 12:03:58 pf4 auth[1284]: [mac:a0:cc:2b:15:20:97] Accepted
    user:  and returned VLAN 81
    Mar  5 12:03:58 pf4 auth[1284]: (89) Login OK: [a0cc2b152097]
    (from client 10.2.1.11 port 0 cli a0:cc:2b:15:20:97)
    Mar  5 12:04:30 pf4 auth[1284]: [mac:a0:cc:2b:15:20:97] Accepted
    user:  and returned VLAN 76
    Mar  5 12:04:30 pf4 auth[1284]: (90) Login OK: [a0cc2b152097]
    (from client 10.2.1.11 port 0 cli a0:cc:2b:15:20:97)

    The hostapd radius logs align with PF and show the disconnect is
    followed by the new vlan received in the Access-Accept (2) message

    Mon Mar  5 12:03:58 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: VLAN ID 81
    Mon Mar  5 12:03:58 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: starting accounting session
    628C7DBAF341039F
    Mon Mar  5 12:04:28 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: stopped accounting session
    628C7DBAF341039F
    Mon Mar  5 12:04:30 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: VLAN ID 76
    Mon Mar  5 12:04:30 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: starting accounting session
    628C7DBAF341039F


    Here is what it looks like in the buggy scenario where the user is
    prompted twice to "Sign into Network".  It looks like after the
    Disconnect-Request (40) to change to the normal VLAN, the hostapd
    does not send the Access-Request (1) but just uses the VLAN from
    the previous Access-Accept(2).

    Client Connects to WiFi Network and Prestented with Email
    Activation in Portal
    15:11:35.269960 10.2.1.11.49281 > 10.2.1.2.1812: Access-Request (1)
    15:11:35.326104 10.2.1.2.1812 > 10.2.1.11.49281: Access-Accept (2)
    15:11:35.386984 10.2.1.11.45520 > 10.2.1.2.1813:
    Accounting-Request (4)
    15:11:35.427652 10.2.1.2.1813 > 10.2.1.11.45520:
    Accounting-Response (5)
    ~22s later Client hits Activation button after entering email
    15:11:57.417798 10.2.1.2.58842 > 10.2.1.11.3799:
    Disconnect-Request (40)
    15:11:57.420730 10.2.1.11.45520 > 10.2.1.2.1813:
    Accounting-Request (4)
    15:11:57.424759 10.2.1.11.3799 > 10.2.1.2.58842: Disconnect-ACK (41)
    15:11:57.446361 10.2.1.2.1813 > 10.2.1.11.45520:
    Accounting-Response (5)
    15:11:59.243604 10.2.1.11.45520 > 10.2.1.2.1813:
    Accounting-Request (4)
    15:11:59.283332 10.2.1.2.1813 > 10.2.1.11.45520:
    Accounting-Response (5)
    ~29s later Client clicks on the second occurrence of "Sign into
    Network"
    15:12:28.554417 10.2.1.2.41995 > 10.2.1.11.3799:
    Disconnect-Request (40)
    15:12:28.557488 10.2.1.11.45520 > 10.2.1.2.1813:
    Accounting-Request (4)
    15:12:28.561537 10.2.1.11.3799 > 10.2.1.2.41995: Disconnect-ACK (41)
    15:12:28.580379 10.2.1.2.1813 > 10.2.1.11.45520:
    Accounting-Response (5)
    15:12:30.262645 10.2.1.11.49281 > 10.2.1.2.1812: Access-Request (1)
    15:12:30.322381 10.2.1.2.1812 > 10.2.1.11.49281: Access-Accept (2)
    15:12:30.446732 10.2.1.11.45520 > 10.2.1.2.1813:
    Accounting-Request (4)
    15:12:30.483542 10.2.1.2.1813 > 10.2.1.11.45520:
    Accounting-Response (5)

    Again the pf radius logs also show no autz request after the
    initial disconnect request.

    Mar  5 15:11:35 pf4 auth[1284]: [mac:a0:cc:2b:15:20:97] Accepted
    user:  and returned VLAN 81
    Mar  5 15:11:35 pf4 auth[1284]: (93) Login OK: [a0cc2b152097]
    (from client 10.2.1.11 port 0 cli a0:cc:2b:15:20:97)
    Mar  5 15:12:30 pf4 auth[1284]: [mac:a0:cc:2b:15:20:97] Accepted
    user:  and returned VLAN 76
    Mar  5 15:12:30 pf4 auth[1284]: (94) Login OK: [a0cc2b152097]
    (from client 10.2.1.11 port 0 cli a0:cc:2b:15:20:97)

    Additionally the hostapd logs show that the registration vlan 81
    seemed to be automatically assigned a second time after the
    initial Disconnect-Request:

    Mon Mar  5 15:11:35 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: VLAN ID 81
    Mon Mar  5 15:11:35 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: starting accounting session
    27395E82D8E75E24
    Mon Mar  5 15:11:57 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: stopped accounting session
    27395E82D8E75E24
    Mon Mar  5 15:11:59 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: VLAN ID 81
    Mon Mar  5 15:11:59 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: starting accounting session
    27395E82D8E75E24
    Mon Mar  5 15:12:28 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: stopped accounting session
    27395E82D8E75E24
    Mon Mar  5 15:12:30 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: VLAN ID 76
    Mon Mar  5 15:12:30 2018 daemon.info <http://daemon.info> hostapd:
    wlan0: STA a0:cc:2b:15:20:97 RADIUS: starting accounting session
    27395E82D8E75E24

    My gut thinks that
    1) maybe hostapd has a problem, so shortly after this email we are
    going to go back and stage a legacy hostapd/15.05 AP to rule out
    recent hostapd issues and figure out how to get more debug out of
    hostapd logging.

    2)  Speedy Android 7 WiSPr may be a problem for hostapd.  Try some
    slower clients with captive portal detection only and long
    redirect delay (like we used to see)

    Posting here for some additional insights and see if anyone else
    has seen similar problems or has some workaround ideas.

    cheers,
    Ian




------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot


_______________________________________________
PacketFence-users mailing list
PacketFence-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/packetfence-users

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot

_______________________________________________
PacketFence-users mailing list
PacketFence-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/packetfence-users

Re: [PacketFence-users] Hostapd causing duplicate Sign-Into-Network prompts intermittently

Reply via email to