I investigated this quite a bit, and this appears to be an ntp bug and
not a charm bug.

This host is a trusty host, running ntp version
1:4.2.6.p5+dfsg-3ubuntu2.14.04.13. We have other hosts running the same
version that don't have the problem described above.

Comparing the hosts, running strace, etc., I noticed a subtle difference
in /etc/hosts: on the working host, the ::1 entry doesn't have
"localhost", but it does on the failing host. When I removed "localhost"
from the ::1 entry on the failing host, "ntpq -pn" started working.

Investigating things a bit more, I found out that on the working host,
ntpd was listening on ::1 but on the failing host, it wasn't (checked
with the "ss -anupe" output as well as the ntpd startup logs).

Comparing straces of starting ntpd, I think I was able to find what's
going on. On the working host it gives (only relevant output is posted
here) :

3973  19:41:32 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
3973  19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvobb268af4-e9", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
3973  19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qbrd5588b49-e3", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0
3973  19:41:32 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvb1693c156-5f", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
[... the same for a bunch of interfaces - this is a nova compute node so this is expected ...]
3973  19:41:32 close(5)                 = 0


But on the failing host, it checks a single interface :
56717 19:37:03 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
[...]
56717 19:37:03 ioctl(5, SIOCGIFFLAGS, {ifr_name="qvbba244f00-69", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_PROMISC|IFF_MULTICAST}) = 0
56717 19:37:03 close(5)                 = 0

So I thought this interface was a bit special :
$ ip li sh dev qvbba244f00-69
67772: qvbba244f00-69@qvoba244f00-69: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue master qbrba244f00-69 state UP mode DEFAULT group default qlen 1000
    link/ether 0e:ac:86:b1:c8:24 brd ff:ff:ff:ff:ff:ff

It appears completely normal, except that it has an unusually high
ifindex (67772). Could that be the cause of the problem? Looking at the
source code at
https://git.launchpad.net/ubuntu/+source/ntp/tree/?h=ubuntu/trusty-updates
interfaces are parsed by reading the /proc/net/if_inet6 file
(https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n54),
which strace confirms:

3973  19:41:32 open("/proc/net/if_inet6", O_RDONLY) = 6

Each line is read using fgets():

fgets(iter->entry, sizeof(iter->entry), iter->proc) != NULL

https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=ubuntu/trusty-updates#n181

What's sizeof(iter->entry)? Well, "entry" is defined like this:

        char                    entry[ISC_IF_INET6_SZ];

https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/ifiter_getifaddrs.c?h=ubuntu/trusty-updates#n48

And ISC_IF_INET6_SZ is :
#define ISC_IF_INET6_SZ \
    sizeof("00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n")

https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=ubuntu/trusty-updates#n153

And this is where the problem is. The computation of ISC_IF_INET6_SZ
assumes that the ifindex takes 2 chars (in hex), i.e. that the ifindex is
< 256. However, ifindexes higher than that are likely common, so why
don't we see this bug elsewhere? Because the computation of
ISC_IF_INET6_SZ also assumes that the interface name is 16 chars long,
so any shorter name leaves spare room that absorbs a longer ifindex.

In our example, the interface name is "only" 14 chars, so we have 2 spare
chars for the ifindex. But that's not enough: the ifindex (108bc in hex) is
5 chars instead of 2, so the line is too long by exactly 1 char!
"00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n" is 62 chars
long, so the buffer is 63 bytes and fgets() reads at most 62 chars per call.
The first line of if_inet6 on our machine is:
fe800000000000000cac86fffeb1c824 108bc 40 20 80 qvbba244f00-69
and that's 62 chars long... but without the \n!
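
To double-check the arithmetic above, here is a minimal sketch in C. The
macro is copied from the ntp source quoted earlier and the line is the one
from the failing host; the helpers bytes_needed() and line_fits() are
hypothetical names of mine and do not exist in ntp.

```c
#include <stddef.h>
#include <string.h>

/* Same definition as the one quoted from interfaceiter.c. */
#define ISC_IF_INET6_SZ \
    sizeof("00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n")

/* First line of /proc/net/if_inet6 on the failing host, newline included. */
static const char failing_line[] =
    "fe800000000000000cac86fffeb1c824 108bc 40 20 80 qvbba244f00-69\n";

/*
 * Hypothetical helper: number of bytes fgets() needs in order to return
 * the whole line in one call (the line including its '\n', plus the NUL).
 */
static size_t bytes_needed(const char *line)
{
    return strlen(line) + 1;
}

/* Hypothetical helper: non-zero if the line fits in buf_size bytes. */
static int line_fits(const char *line, size_t buf_size)
{
    return bytes_needed(line) <= buf_size;
}
```

With these definitions, ISC_IF_INET6_SZ evaluates to 63 while
bytes_needed(failing_line) is 64, so line_fits(failing_line,
ISC_IF_INET6_SZ) is false: the buffer is one byte too small.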

So what is probably happening here is that the first iteration of the loop
reads the whole line except the \n, and the next iteration resumes at that
location. Because fgets() stops at EOF or newline, that second call returns
just a newline, which is not a valid entry and makes the whole iteration
stop.
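
The fgets() behaviour described above can be reproduced outside of ntpd.
This is only a sketch: it reads from a temporary file rather than from
/proc/net/if_inet6, and read_two_chunks() is a hypothetical helper, not
ntp code. Only the macro and the buffer declaration mirror the ntp source.

```c
#include <stdio.h>
#include <string.h>

/* Same definition as the one quoted from interfaceiter.c. */
#define ISC_IF_INET6_SZ \
    sizeof("00000000000000000000000000000001 01 80 10 80 XXXXXXloXXXXXXXX\n")

/* First line of /proc/net/if_inet6 on the failing host, newline included. */
static const char failing_line[] =
    "fe800000000000000cac86fffeb1c824 108bc 40 20 80 qvbba244f00-69\n";

/*
 * Hypothetical helper: write `contents` to a temporary file, then call
 * fgets() twice with a buffer of ISC_IF_INET6_SZ bytes, as the ntp loop
 * would. The results are copied to `first` and `second`, which must each
 * hold at least ISC_IF_INET6_SZ bytes.
 * Returns the number of successful fgets() calls, or -1 on I/O error.
 */
static int read_two_chunks(const char *contents, char *first, char *second)
{
    char entry[ISC_IF_INET6_SZ];
    FILE *fp = tmpfile();
    int reads = 0;

    if (fp == NULL)
        return -1;
    if (fputs(contents, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    first[0] = '\0';
    second[0] = '\0';
    if (fgets(entry, sizeof(entry), fp) != NULL) {
        strcpy(first, entry);   /* 62 chars, newline left in the stream */
        reads++;
        if (fgets(entry, sizeof(entry), fp) != NULL) {
            strcpy(second, entry);  /* resumes at the leftover newline */
            reads++;
        }
    }
    fclose(fp);
    return reads;
}
```

Feeding it failing_line followed by any other entry, the first chunk is
the 62-char line without its newline and the second chunk is a lone "\n",
so the second entry is never seen as a line of its own.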

The fix here is pretty simple: the computation of ISC_IF_INET6_SZ
should assume an ifindex of UINT_MAX, i.e. ffffffff (or any 8-char
value). If I can trust
https://git.launchpad.net/ubuntu/+source/ntp/tree/lib/isc/unix/interfaceiter.c?h=applied/ubuntu/jammy
this is still present in Jammy.
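
As a rough sketch of what the suggested fix could look like (this is not a
tested patch against ntp): IF_INET6_SZ_WIDE, read_first_line() and
ends_with_newline() are hypothetical names I made up for illustration, and
the input again comes from a temporary file instead of /proc/net/if_inet6.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wider size: room for 8 hex digits of ifindex instead of 2. */
#define IF_INET6_SZ_WIDE \
    sizeof("00000000000000000000000000000001 ffffffff 80 10 80 XXXXXXloXXXXXXXX\n")

/* First line of /proc/net/if_inet6 on the failing host, newline included. */
static const char failing_line[] =
    "fe800000000000000cac86fffeb1c824 108bc 40 20 80 qvbba244f00-69\n";

/*
 * Hypothetical helper: write `contents` to a temporary file and read its
 * first line with a single fgets() call into buf (buf_size bytes).
 * Returns 0 on success, -1 on I/O error or empty input.
 */
static int read_first_line(const char *contents, char *buf, size_t buf_size)
{
    FILE *fp = tmpfile();
    int ok;

    if (fp == NULL)
        return -1;
    if (fputs(contents, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    ok = fgets(buf, (int)buf_size, fp) != NULL;
    fclose(fp);
    return ok ? 0 : -1;
}

/* Hypothetical helper: non-zero if s is a complete line (ends with '\n'). */
static int ends_with_newline(const char *s)
{
    size_t n = strlen(s);
    return n > 0 && s[n - 1] == '\n';
}
```

With the 69-byte buffer, a single fgets() call returns the failing host's
line complete with its trailing newline, whereas the current 63-byte buffer
returns it truncated.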

Redirecting the bug to the "ntp" package.

** Also affects: ntp (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: ntp-charm
       Status: New => Invalid

https://bugs.launchpad.net/bugs/1952264

Title:
  ntp sync checks fail when server as no IPv6 connectivity
