Zammit,

No proxy.  The management VLAN operates the portal listener behind an
OpenWRT firewall, which forwards 80/443 to the PacketFence portal.  iPerf
shows no bandwidth issues whatsoever in our lab, and we have never seen
any other connection issues during apt upgrades, inbound portal
connections for authentication/login, etc.  Qualify that with the fact that
it is a lab environment and only sees real use when upgrades and testing
are underway.

Unfortunately, we have not upgraded our Prod yet (not until we get the
other pesky mobile login / detection issue sorted from my other thread).
That upgrade will tell us whether we see the same thing in our twinned Prod
setup, which is in a geographically separate location on a separate
connection but has all the same pieces.  If the problem repeated there, it
would further rule out anything network related; if Prod was fine, it would
have us looking squarely at the specific DNS and NAT pieces involved in the
versioned differences between our lab and Prod, which amount to a Xen
hypervisor, OpenWRT firewall and PowerDNS recursor beyond physical L2
connectivity.

The IPv6 lookup had me a bit baffled, and I am wondering if there is
something unique about the DNS lookups for the image pulls.  Noting that
the timeout appears to happen on DNS frames (port 53) in most cases, that
might point at an actual issue with timeouts at the DNS resolver.  The
resolver is ours, running PowerDNS and handling about 40-50 queries per
second for other IPv4-only hosts, none of which are reporting any DNS
issues.
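
If it helps, the kind of quick check I have in mind is just comparing A and
AAAA answers from the recursor by hand.  Treat this as a rough sketch; the
recursor address is simply the one that shows up in the logs below:

pf4:~# dig @105.244.196.155 ghcr.io A +stats
pf4:~# dig @105.244.196.155 ghcr.io AAAA +stats
pf4:~# dig @105.244.196.155 pkg-containers.githubusercontent.com AAAA +stats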

Since the retry attempts eventually work, and upgrades only happen once in
a while, we have not worked very hard to figure it out.  There is probably
more debugging we could do to observe an actual DNS timeout by isolating
that traffic during the image pull phase of the upgrade.  We would likely
be able to do that going from 11.1 to 12.0 to 12.1 if we see the same thing
during our 11.0 to 11.1 upgrade of Prod, after we fix the mobile device
captive portal on this 12.1 instance.
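
When we do, something along these lines is what I have in mind; the
interface name is just a placeholder, so take it as a sketch rather than
the exact commands we will run:

pf4:~# tcpdump -ni eth0 -w upgrade-dns.pcap 'udp port 53'
(run the upgrade / image pull in another session, then stop the capture)
pf4:~# tcpdump -nr upgrade-dns.pcap | grep -i ghcr

Any query for ghcr.io with no matching response in that capture would tell
us whether the timeout is really at the resolver or somewhere in between.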

cheers,
Ian

pf4:~# iperf3 -c 105.244.196.67
Connecting to host 105.244.196.67, port 5201
[  5] local 10.2.1.2 port 46870 connected to 105.244.196.67 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   284 MBytes  2.38 Gbits/sec    0   1.56 MBytes
[  5]   1.00-2.00   sec   309 MBytes  2.59 Gbits/sec    0   1.56 MBytes
[  5]   2.00-3.00   sec   314 MBytes  2.63 Gbits/sec    0   1.56 MBytes
pf4:~# iperf3 -R -c 105.244.196.67
Connecting to host 105.244.196.67, port 5201
Reverse mode, remote host 105.244.196.67 is sending
[  5] local 10.2.1.2 port 51046 connected to 105.244.196.67 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   280 MBytes  2.35 Gbits/sec
[  5]   1.00-2.00   sec   333 MBytes  2.80 Gbits/sec
[  5]   2.00-3.00   sec   315 MBytes  2.64 Gbits/sec



On Tue, Jan 10, 2023 at 4:06 PM Zammit, Ludovic <luza...@akamai.com> wrote:

> Hello Ian,
>
> Is your PF server behind an HTTP proxy?
>
> Thanks,
>
>
> *Ludovic Zammit*
> *Product Support Engineer Principal Lead*
> *Cell:* +1.613.670.8432
> Akamai Technologies - Inverse
> 145 Broadway
> Cambridge, MA 02142
>
> On Jan 10, 2023, at 3:27 PM, Ian MacDonald via PacketFence-users <
> packetfence-users@lists.sourceforge.net> wrote:
>
> Hey PF Users,
>
> For recent versions (I believe 11.1, 12.0, and now 12.1, and possibly 11.0;
> fairly certain, since the images below were downloaded from the Inverse repos
> all at once during the installation or upgrade process), we have been having
> to restart the upgrade process due to timeout-related errors.  Often 1-5
> re-executions are required to complete the upgrade process.
>
> - proxysql
> - haproxy-portal
> - pfsso
> - radiusd-eduroam
> - httpd.aaa
> - radiusd-cli
> - pfconfig
> - fingerbank-db
> - pfcmd
> - radiusd-load-balancer
> - httpd.admin_dispatcher
> - radiusd-acct
> - pfpki
> - httpd.portal
> - httpd.dispatcher
> - pfcron
> - pfconnector
> - httpd.webservices
> - radiusd-auth
> - haproxy-admin
> - pfqueue
> - api-frontend
> - pfperl-api
>
> We do not really understand why, as there do not appear to be any
> connectivity or DNS lookup issues that would cause this type of behavior.
> Below are some of the output lines captured by our installation process
> during a recent upgrade from 11.1 to 12.0 and then again from 12.0 to 12.1.
>
> In a minor 12.0 upgrade we saw this one referencing an IPv6 GitHub
> address; the system is IPv4 only, so we have no idea why it is attempting
> IPv6:
>
> error pulling image configuration: Get "
> https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:c31d236d97e3beb137f8c2b02bfbe88d0093b5592d9f181935c9c03a0132a142?se=2023-01-10T14%3A40%3A00Z&sig=%2BHBahj6l0521Bm%2FB40v51MhZmNHztLYxzxBgJlsefEE%3D&sp=r&spr=https&sr=b&sv=2019-12-12":
> dial tcp [2606:50c0:8001::154]:443: connect: network is unreachable
>
> In another 12.0 upgrade attempt we saw this one, which looks like a
> timeout to our DNS recursor, although there are no I/O-bound or
> restrictive conditions on that recursor that we can see.
>
> error pulling image configuration: Get "
> https://ghcr.io/v2/inverse-inc/packetfence/pfcmd/blobs/sha256:5631317df2b6910aa8da1f20a382c04ecc0ffb572aeb7fd3201a18b0bee18633":
> dial tcp: lookup ghcr.io on 105.244.196.155:53: read udp
> 10.2.1.2:35975->105.244.196.155:53: i/o timeout
>
> In 12.1, using the do-upgrade script, we saw these similar messages:
>
> Error response from daemon: Get "https://ghcr.io/v2/":
> dial tcp: lookup ghcr.io on 105.244.196.155:53: read udp
> 10.2.1.2:60065->105.244.196.155:53: i/o timeout
>
> Error response from daemon: Head "
> https://ghcr.io/v2/inverse-inc/packetfence/radiusd-eduroam/manifests/maintenance-12-1":
> dial tcp: lookup ghcr.io on 104.244.196.155:53: read udp
> 10.2.1.2:55582->104.244.196.155:53: i/o timeout
>
> We just retried, which seemed like a good time to send this email, and it
> worked (third attempt just now on 12.1).
>
> Tue Jan 10 15:22:56 EST 2023 - Pull of images finished
> Tue Jan 10 15:22:58 EST 2023 - Tag of images finished
> Tue Jan 10 15:23:45 EST 2023 - Previous images cleaned
>
> It seems very odd that we get these timeouts during the image download.
> Maybe somebody else has seen this or knows why it may be occurring during
> this stage of the installation/upgrade process.
>
> cheers,
> Ian
> _______________________________________________
> PacketFence-users mailing list
> PacketFence-users@lists.sourceforge.net
>
> https://lists.sourceforge.net/lists/listinfo/packetfence-users
>
>
>
_______________________________________________
PacketFence-users mailing list
PacketFence-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/packetfence-users
