Hello Yan,

It's very difficult to debug with just what you describe. What is your setup? (Running PacketFence on a Raspberry Pi is not the same thing as running it on a 32-CPU server.)

rlm_perl is just used to select the correct chroot based on the realm, so it's something really fast.

First, check the RADIUS audit log, where you can see how many seconds each request takes to process (anything more than 1 is not normal and means there is an issue).

After that, check the RADIUS authentication latency (around 15 to 20 ms is OK) and also the NTLM call timing, to see whether the AD is fast enough.

Last thing: check the statsd graphs in Graphite; they will show you how long it takes to complete an LDAP search, for example.
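For the NTLM call timing mentioned above, a rough way to get numbers outside of the graphs is to time the external command by hand. The sketch below is only an illustration: `time_command` and `summarize` are hypothetical helper names, and the command to time (for example, the ntlm_auth command line your setup uses) has to be supplied by you.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return each wall-clock duration in milliseconds."""
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarize(durations_ms):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations_ms),
        "median": statistics.median(durations_ms),
        "max": max(durations_ms),
    }


# Example with a harmless placeholder command (the Python interpreter itself):
#   print(summarize(time_command([sys.executable, "-c", "pass"])))
```

The measured time includes process start-up, so it is an upper bound on the call itself. It is still enough to tell a call that takes tens of milliseconds from one that takes seconds.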

As I said, if you don't want to beat around the bush, we have support.

Regards

Fabrice


On 2018-01-31 at 20:39, Yan wrote:
Hi Fabrice,
I mean the rlm_perl module takes too much time processing requests and makes RADIUS very slow.

I see: there is no need to log in, I only need to open mgmt_ip:9000. But which graphs can point to the cause of the issue? Today we ran a stress test at 50 qps (pf + AD authentication) and found that FreeRADIUS in pf crashed every time, and the behaviour was very similar to the issue we hit recently. We tried adjusting the parameters below and the result was always the same: FreeRADIUS crashed in about 2 minutes. First it became slow, then it crashed and restarted, and then we got "No EAP session match xxxxxx" and nearly all requests were rejected. It is hard to believe that 50 qps can DDoS FreeRADIUS... So, any configuration suggestions?

We changed the parameters below, but the result was the same.
Before the change, the parameters in radiusd.conf were:
1. max_request_time = 10
2. cleanup_delay = 5
3. max_requests = 20000
4. reject_delay = 1
5. max_servers = 512
--------
We changed the parameters in radiusd.conf as below:
1. max_request_time = 20
2. cleanup_delay = 10
3. max_requests = 1280000
4. reject_delay = 1
5. max_servers = 512
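As a rough way to reason about these numbers, Little's law gives the average number of requests in flight as the arrival rate multiplied by the time each request spends in the server. The sketch below only illustrates that arithmetic: the 50 qps figure is from the test above, the per-request times are hypothetical, and `requests_in_flight` is an illustrative helper, not part of FreeRADIUS or PacketFence.

```python
def requests_in_flight(qps, seconds_per_request):
    """Little's law: average requests in the system = arrival rate x time in system."""
    return qps * seconds_per_request


# 50 qps is the load from the stress test; the per-request times are hypothetical.
for seconds in (0.02, 1.0, 10.0):
    in_flight = requests_in_flight(50, seconds)
    print(f"{seconds:g} s per request -> about {in_flight:g} requests in flight")
```

Even if every request ran for the full max_request_time of 10 s, 50 qps would put about 500 requests in flight, far below max_requests = 20000. That would be consistent with the larger max_requests value making no difference, and it suggests looking at per-request processing time instead.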


------------------ Original ------------------
*From:* Fabrice Durand <[email protected]>
*Date:* Feb 1, 2018 07:43
*To:* Yan <[email protected]>
*Subject:* Re: [PacketFence-users] All authentication failed with error "NoEAPsession matching state xxxx"

Hello Yan,

There is no username and password.

Also, what is the doperl module?

Fabrice



On 2018-01-31 at 09:20, Yan wrote:
Hi Fabrice,

I have never logged in to the graph GUI. What username and password does it use? I tried the admin GUI account, but it was rejected.

BTW, it seems there is a global lock in the doperl module, and according to our stress test this is the hard bottleneck...


------------------ Original ------------------
*From:* Fabrice Durand <[email protected]>
*Date:* Jan 31, 2018 22:04
*To:* Yan <[email protected]>, packetfence-users <[email protected]>
*Subject:* Re: [PacketFence-users] All authentication failed with error "NoEAPsession matching state xxxx"

Hello Yan,


On 2018-01-31 at 00:28, Yan wrote:

Hi dear users,

After a whole night's analysis, we found that pf takes too much time processing authentication requests when the QPS is too high, which then stalls all later RADIUS requests. The Aruba AC hits its RADIUS timeout and re-sends the same Access-Request to pf. By then pf has just sent the first Access-Accept, so when it receives the duplicate request it responds with Accept a second time and then deletes the state ID. The Aruba AC may have waited another 5 seconds and sent the request a third time; this time pf finds no state ID matching the session and responds with Reject. More and more Reject responses cause users to reconnect to the wireless network, which pushes the QPS even higher. It's a vicious circle.
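The retry behaviour described above can be sketched with a simplified model: a client that re-sends its request every timeout interval until the first reply arrives. The 5-second timeout comes from the description above; the processing times, the retry limit, and the function name `retransmissions_before_reply` are hypothetical, and the server-side duplicate and state handling is not modelled.

```python
def retransmissions_before_reply(processing_s, client_timeout_s, max_retries):
    """Count how often a client re-sends a request before the first reply arrives.

    Assumes the client re-sends every client_timeout_s seconds until it gets
    a reply or runs out of retries.
    """
    sent = 0
    next_retry_at = client_timeout_s
    while sent < max_retries and next_retry_at < processing_s:
        sent += 1
        next_retry_at += client_timeout_s
    return sent


# Hypothetical processing times against a 5 s client timeout and 3 retries.
for processing_s in (0.02, 4.0, 11.0, 30.0):
    retries = retransmissions_before_reply(processing_s, 5.0, 3)
    print(f"processing {processing_s:g} s -> {retries} duplicate request(s)")
```

In this model a request that is answered within the timeout produces no duplicates, while a slow one adds up to `max_retries` extra requests to the load. That extra load is the feedback loop described above.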


We found that pf has at least the following bottlenecks that lead to the hang:

1. MySQL queries are too slow.

Most of the time it's because you receive too many accounting packets (try disabling accounting) or because there is too much I/O.

2. "curl" keeps calling the httpd service and it's very slow.

Where do you see curl? FreeRADIUS uses the rest module to talk to the web service.

3. "doperl" is too slow.

Not really, it depends on how you configured PacketFence. Let's say you have an LDAP source and a search takes 600 ms: then the RADIUS answer will be slow.

4. The "ntlm_auth" process is too slow.

Probably because the AD is too slow to answer. Btw, you can use the NTLM cache for that.

5. A device will try to connect again if radiusd crashes, restarts, or reaches its max requests.



But we have not yet found which configuration will solve this issue. Is there any suggestion on how to change the configuration to handle this performance issue? Or any basic guidance on how to adjust the parameters to handle 200 QPS, 500 QPS, and 2000 QPS?


We have setups that handle millions of requests per day without any issues. Check the graphs, such as RADIUS latency, and also have a look at http://mgmt_ip:9000 to find where the time is spent. Btw, if you want us to check your setup, you can ask for support from Inverse, and it will be a pleasure to help you.

Regards
Fabrice

Any response is appreciated. Thank you very very much.


-- 
Fabrice Durand
[email protected] :: +1.514.447.4918 (x135) :: www.inverse.ca
Inverse inc. :: Leaders behind SOGo (http://www.sogo.nu) and PacketFence (http://packetfence.org)


_______________________________________________
PacketFence-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/packetfence-users
