In an upgrade of a PF environment from 11.0 to 12.1, following the upgrade
instructions, we stumbled on a bug due to some package dependencies.

These redundant packages are being tracked in
https://github.com/inverse-inc/packetfence/issues/7246; in the meantime
they cause an upgrade from 11.0 to 12.1 to fail.

From an up-to-date 11.0 instance running on Debian 11.6, we ran the upgrade
script, following the prompts.

# /usr/local/pf/addons/full-upgrade/run-upgrade.sh

The process ended due to the package conflict, but we were able to resolve
the conflicts, getting apt sorted out, and then resume the upgrade script,
which completed the DB upgrades and appeared to exit cleanly.

# dpkg --force-depends -r packetfence-captive-portal-javascript
# apt --fix-broken install
# /usr/local/pf/addons/full-upgrade/run-upgrade.sh

However, it did not leave us with a working configuration. The radius-acct
binding is not working properly, and we are not sure what else might be
broken, so we will likely have to restage to 11.0 and try again once we
figure out an upgrade strategy.

Previously we had the same issue in another environment going from 11.0
-> 11.1 -> 11.2 -> 12.1, where the upgrade process was interrupted by a
problem with our firewall causing timeouts during the docker pull. In that
case there was no package issue, but we did have to restart the upgrade
script. The result was the same binding issue, and in that case we just
decided to restage to 12.1 rather than figure it out, as our goal was to
complete testing of the captive portal on 12.1.

Our hypothesis is that interruption of the upgrade wrapper script, in one
case due to a timeout on docker image pulls, and in this latest case from
package dependencies, led to a 12.1 environment with this service binding
issue.

Our next step will be to restage 11.0 -> 12.1 without interruption by
starting with

# dpkg --force-depends -r packetfence-captive-portal-javascript

to see if we can get to 12.1 with radius-acct/pfacct bindings in proper
working order.
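
A small pre-flight check could confirm whether the package is actually
present before the upgrade starts. This is only a sketch, assuming the
standard dpkg-query ${Status} field; status_is_installed is a hypothetical
helper name, not something shipped with PacketFence.

```shell
#!/bin/sh
# Sketch: report whether the conflicting package is still installed.
# status_is_installed is a hypothetical helper, not part of PacketFence.

status_is_installed() {
    # dpkg-query reports "install ok installed" for an installed package.
    case "$1" in
        *"ok installed") return 0 ;;
        *) return 1 ;;
    esac
}

PKG=packetfence-captive-portal-javascript

# An unknown package (or a missing dpkg-query) leaves the status empty.
status=$(dpkg-query -W -f='${Status}' "$PKG" 2>/dev/null || true)

if status_is_installed "$status"; then
    echo "$PKG is installed; remove it before running run-upgrade.sh"
else
    echo "$PKG is not installed; nothing to remove"
fi
```

If it reports the package as installed, the dpkg -r command above would be
run first, so the upgrade script is started only once.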

Below are some details, in case there is an obvious quick fix that might
be useful for anyone going through the upgrade process with similar
results, or that would save us the time to restage later this month.

cheers,
Ian

We did some quick comparisons to try to find the difference in the
packetfence systemd units and configuration between the working and
non-working 12.1 systems. We grepped the config and the systemd units for
"1813" and did not find any differences, but we know it is there somewhere.
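
Since the port is the same on both hosts and only the bound address
differs, searching for the addresses or diffing the whole tree may be more
telling than searching for "1813". A minimal sketch, assuming both
/usr/local/pf/conf trees have first been copied into two local directories;
differing_files and files_mentioning are hypothetical helper names, and we
do not know which file, if any, holds the pfacct bind address.

```shell
#!/bin/sh
# Sketch: compare two copies of a config tree taken from different hosts.
# Both helper names are hypothetical.

differing_files() {
    # -r recurses, -q only reports which files differ.
    # diff exits 1 when differences exist, so mask the exit status.
    diff -rq "$1" "$2" || true
}

files_mentioning() {
    # -r recurse, -l print file names only, -F match the address literally.
    grep -rlF -- "$2" "$1" || true
}

# Usage: sh compare.sh <working-conf-copy> <broken-conf-copy>
if [ "$#" -eq 2 ]; then
    differing_files "$1" "$2"
    files_mentioning "$2" "127.0.0.1"
fi
```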

On the working PF 12.1 instance, staged directly to 12.1, the binding looks
like the following, where 10.2.1.2 is the management network where switches
connect to radius-acct.

pf4:/usr/local/pf/conf# netstat -tunap | grep ":1813"
udp        0      0 10.2.1.2:1813           0.0.0.0:*                           42375/pfacct

On the non-working instance the binding appears to be to localhost:

pf3:/usr/local/pf/conf# netstat -tunap | grep ":1813"
udp        0      0 127.0.0.1:1813          0.0.0.0:*                           1942/pfacct
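
To make the comparison between hosts repeatable, the netstat output can be
reduced to just the bound address and owning process. This is a sketch
only; udp_listeners is a hypothetical helper, and it assumes the Linux
net-tools column layout shown above, where the local address is the fourth
field and the PID/program is the last.

```shell
#!/bin/sh
# Sketch: print "local-address pid/program" for each UDP socket bound to
# a given port, reading `netstat -tunap` output on stdin.
# udp_listeners is a hypothetical helper name.

udp_listeners() {
    awk -v port="$1" '
        $1 ~ /^udp/ {
            # The port is whatever follows the last ":" in the local address.
            n = split($4, parts, ":")
            if (parts[n] == port) print $4, $NF
        }'
}

# Run against the live system only when netstat is available.
if command -v netstat >/dev/null 2>&1; then
    netstat -tunap 2>/dev/null | udp_listeners 1813
fi
```

Run as root on each host, a pfacct entry on 127.0.0.1:1813 rather than the
management address would flag the broken state described here.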


We can also see radius-acct complaining in the logs and repeatedly trying
to restart what we believe is a docker container running radius, which is
where the actual binding should point.

radius-acct.log:Jan 22 15:48:16 pf3 radiusd-acct-docker-wrapper[93781]: Sun Jan 22 15:48:16 2023 : Error: Failed binding to acct address * port 1813 bound to server packetfence: Address already in use
radius-acct.log:Jan 22 15:48:16 pf3 radiusd-acct-docker-wrapper[93781]: Sun Jan 22 15:48:16 2023 : Error: /usr/local/pf/raddb/acct.conf[8]: Error binding to port for 0.0.0.0 port 1813
radius-acct.log:Jan 22 15:48:19 pf3 radiusd-acct-docker-wrapper[93890]: Error: No such container: radiusd-acct
radius-acct.log:Jan 22 15:48:19 pf3 radiusd-acct-docker-wrapper[93890]: Error: No such container: 1

If we stop only the pfacct service:

# systemctl stop packetfence-pfacct.service

this allows the docker binding to work, but now the pfacct process cannot
be restarted, and the binding then appears to be directly with freeradius
rather than pfacct, and on all interfaces rather than just the registration
VLAN.

pf3:/usr/local/pf/logs# netstat -tunap | grep ":1813"
udp        0      0 0.0.0.0:1813            0.0.0.0:*
    93957/freeradius

Without more knowledge of the difference between the PF Go (pfacct) and
FreeRADIUS processes and of how the new docker container bindings work, it
looks like restaging and retrying the upgrade is probably the next step for
us.