Indeed, and unfortunately it seems like I've been able to identify the
root cause which is still present in 2.90. What changed, we're no longer
seeing a segfault which was the tell-tale sign we were looking for, but
instead we just see it quietly exit with no trace. This led me to
getting a setup where I could reproduce the issue, which we've seen
trigger on Ironic's "standalone" jobs as they exercise a number of
different scenarios involving port/address updates quite a bit, and I
just sat with strace attached to the dnsmasq process.

alarm(77377)                            = 77377
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, 
{fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, 
events=POLLIN}, {fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, 
events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, 
events=POLLIN}, {fd=17, events=POLLIN}], 14, -1) = ? ERESTART_RESTARTBLOCK 
(Interrupted by signal)
--- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=236696, si_uid=0} ---
getpid()                                = 235993
writev(18, [{iov_base="\1\0\0\0\0\0\0\0\0\0\0\0", iov_len=12}], 1) = 12
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, 
{fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, 
events=POLLIN}, {fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, 
events=POLLIN}, {fd=14, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, 
events=POLLIN}, {fd=17, events=POLLIN}], 14, -1) = 1 ([{fd=17, revents=POLLIN}])
read(17, "\1\0\0\0\0\0\0\0\0\0\0\0", 12) = 12
newfstatat(AT_FDCWD, 
"/opt/stack/data/neutron/dhcp/77cabae0-26bf-4374-997a-781947f2e5b2/addn_hosts", 
{st_mode=S_IFREG|0644, st_size=1268, ...}, 0) = 0
openat(AT_FDCWD, 
"/opt/stack/data/neutron/dhcp/77cabae0-26bf-4374-997a-781947f2e5b2/addn_hosts", 
O_RDONLY) = 20
newfstatat(20, "", {st_mode=S_IFREG|0644, st_size=1268, ...}, AT_EMPTY_PATH) = 0
read(20, "10.1.0.1\thost-10-1-0-1.openstack"..., 4096) = 1268
read(20, "", 4096)                      = 0
close(20)                               = 0
getpid()                                = 235993
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=114, 
...}, 0) = 0
getpid()                                = 235993
write(19, "<30>Feb 29 22:35:30 dnsmasq[2359"..., 133) = 133
writev(2, [{iov_base="free(): invalid pointer", iov_len=23}, {iov_base="\n", 
iov_len=1}], 2) = 24
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fec31710000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
gettid()                                = 235993
getpid()                                = 235993
tgkill(235993, 235993, SIGABRT)         = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=235993, si_uid=65534} 
---
+++ killed by SIGABRT +++


The above was captured from a dnsmasq install with 
https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=10d8b5f001a34ff46b3a72575f3af64b065f8637
 where as running the commit before it, I don't crash out dnsmasq. The result 
with 2.90 is basically identical (https://paste.openstack.org/show/823338/). 

In terms of the actual logging which results, all we get is along the
lines of https://paste.opendev.org/show/823337/, at least with the
install on jammy, where neutron is directly launching the dnsmasq
processes. Looking at the contents of the commit where this starts
(https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=blobdiff;f=src/option.c;h=23cf058e8ad9fdcffe621ce0b92d626578d31f82;hp=e8926a49f46050eea9cba7cc90e12d67821f3a2f;hb=10d8b5f001a34ff46b3a72575f3af64b065f8637;hpb=ffd3ceb856eb3e1ff923cae95a2ddb889beab7c1),
it is clear that the logic has been changed into a different which
explains the error we are able to capture with stracce.

I think ironic is going to have to pin to dnsmasq 2.85 for the time
being, until we're able to get this fixed and fully understood in
upstream dnsmasq since this is impacting numerous CI branches and test
jobs.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2026757

Title:
  dnsmasq on Ubuntu Jammy crashes on neutron-dhcp-agent updates

To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/2026757/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to