Follow up to my issues with pfdhcpdlistener process stalling when using
Fortigate firewall SSO
This seems to be some sort of race condition in the API processing of DHCP.
As far as I can tell the pfdhcplistener process analyses the DHCP packet and
punts it on to the
pf:dhcp:processor.pm:process_packet via the API.
The pf:dhcp:processor.pm module registers the IP and MAC addresses extracted
from the DHCP packets
in the nodes table of the database. At this point it also performs the SSO to
the firewall. This SSO request
is punted to the pf:firewallsso module again via the API.
When I turn on the SSO everything works fine for a while. Every so often the
QUEUE length for the pfdhcplistener
process will rise to about 15 and then drop back to 0 but eventually (about 20
minutes) a spike in the the QUEUE length for
the pfdhcplistener process rises to over 30 but then does not drop back to 0
but very rapidly rises to several 1000 at which
time the whole pf:dhcp:processor.pm module stops getting any more packets from
the API.
I cannot find any ERROR messages in the log files (under TRACE mode) for any of
the systems. The only thing I have
noted is that more than 1 pfdhcplistener is picking up the same DHCP packet
from the queue:-
Dec 13 14:52:32 pfqueue(20439) INFO: [mac:unknown] APKT---DHCPACK for
88:e8:7f:cd:85:d5(10.240.240.28) for 1800 seconds
(pf::dhcp::processor::process_packet)
Dec 13 14:52:32 pfqueue(20439) INFO: [mac:unknown] The listener process is on
the same server as the DHCP server. (pf::dhcp::processor::pf_is_dhcp)
Dec 13 14:52:32 pfqueue(20439) INFO: [mac:unknown] DHCPACK from 10.30.249.118
(00:26:98:0a:86:41) to host 88:e8:7f:cd:85:d5 (10.240.240.28) for 1800 seconds
(pf::dhcp::processor::parse_dhcp_ack)
Dec 13 14:52:32 pfqueue(20439) INFO: [mac:unknown] The listener process is on
the same server as the DHCP server. (pf::dhcp::processor::pf_is_dhcp)
Dec 13 14:52:32 pfqueue(20439) INFO: [mac:unknown] Updating iplog for
88:e8:7f:cd:85:d5 -> 10.240.240.28 (pf::dhcp::processor::handle_new_ip)
Dec 13 14:52:32 pfqueue(20455) INFO: [mac:unknown] APKT---DHCPACK for
88:e8:7f:cd:85:d5(10.240.240.28) for 1800 seconds
(pf::dhcp::processor::process_packet)
Dec 13 14:52:32 pfqueue(20455) INFO: [mac:unknown] The listener process is on
the same server as the DHCP server. (pf::dhcp::processor::pf_is_dhcp)
Dec 13 14:52:32 pfqueue(20455) INFO: [mac:unknown] DHCPACK from 10.30.249.118
(00:26:98:0a:86:41) to host 88:e8:7f:cd:85:d5 (10.240.240.28) for 1800 seconds
(pf::dhcp::processor::parse_dhcp_ack)
Dec 13 14:52:32 pfqueue(20455) INFO: [mac:unknown] The listener process is on
the same server as the DHCP server. (pf::dhcp::processor::pf_is_dhcp)
Dec 13 14:52:32 pfqueue(20455) INFO: [mac:unknown] Updating iplog for
88:e8:7f:cd:85:d5 -> 10.240.240.28 (pf::dhcp::processor::handle_new_ip)
Dec 13 14:52:32 pfqueue(20439) INFO: [mac:88:e8:7f:cd:85:d5] Found locationlog
entry with role : with start date 2017-11-09 12:54:07
(pf::dhcp::processor::check_for_parking)
Dec 13 14:52:32 pfqueue(20439) WARN: [mac:88:e8:7f:cd:85:d5] 88:e8:7f:cd:85:d5
STUCK on the registration role for 2944705 seconds 10.240.240.28. Triggering
parking violation (pf::dhcp::processor::check_for_parking)
Dec 13 14:52:32 pfqueue(20455) INFO: [mac:88:e8:7f:cd:85:d5] Found locationlog
entry with role : with start date 2017-11-09 12:54:07
(pf::dhcp::processor::check_for_parking)
Dec 13 14:52:32 pfqueue(20455) WARN: [mac:88:e8:7f:cd:85:d5] 88:e8:7f:cd:85:d5
STUCK on the registration role for 2944705 seconds 10.240.240.28. Triggering
parking violation (pf::dhcp::processor::check_for_parking)
Dec 13 14:52:32 pfqueue(20439) INFO: [mac:88:e8:7f:cd:85:d5] Updating SSO for
88:e8:7f:cd:85:d5 -> 10.240.240.28 (pf::dhcp::processor::handle_new_ip)
Dec 13 14:52:32 pfqueue(20455) INFO: [mac:88:e8:7f:cd:85:d5] Updating SSO for
88:e8:7f:cd:85:d5 -> 10.240.240.28 (pf::dhcp::processor::handle_new_ip)
In this log extract you can see that two separate pfqueue processes are parsing
the same DHCP packet 20455 and 20439.
I would have thought that the API hands the call to the
pf:dhcp:processor:process_packet module and then it is removed
from the redis queue and then move on to the next one but it seems to be
handing the same request to two processes.
Something seems very wrong with this to me but I do not really understand how
the ‘redis’ stuff works exactly.
Andrew
Andrew Torry
Senior Infrastructure Engineer
Tel: 01326 370760
Email: [email protected]
[cid:[email protected]]
[Falmouth Exeter Plus]
[cid:[email protected]]
[Twitter] <https://twitter.com/falmouthexeter> [Facebook]
<https://www.facebook.com/falmouthexeter> [Instagram]
<https://www.instagram.com/falmouthexeterplus/> [YouTube]
<https://www.youtube.com/channel/UC5-Jq4vTOhWgYoJJDYrZHWw>
[cid:[email protected]]
[Falmouth University]
Falmouth Exeter Plus is an exempt charity established by Falmouth University
and the University of Exeter to deliver their shared Higher Education services
in Cornwall.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
PacketFence-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/packetfence-users