[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
** No longer affects: isc-dhcp (Ubuntu)

** No longer affects: isc-dhcp (Ubuntu Focal)

** No longer affects: isc-dhcp (Ubuntu Groovy)

** Changed in: bind9-libs (Ubuntu Groovy)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu.
https://bugs.launchpad.net/bugs/1872118

Title:
  [SRU] DHCP Cluster crashes after a few hours

Status in DHCP:
  New
Status in bind9-libs package in Ubuntu:
  Fix Released
Status in bind9-libs source package in Focal:
  Fix Committed
Status in bind9-libs source package in Groovy:
  Fix Released

Bug description:
  [Description]

  isc-dhcp-server uses libisc-export (from the bind9-libs package) to
  handle socket events when configured in peer mode (master/secondary).
  A sequence of messages dispatched by the master that requires
  acknowledgment from its peers can hold a socket in a pending-send
  state. A timer or a subsequent write request can then be scheduled
  onto the same socket, and the !sock->pending_send assertion is raised
  when the write is attempted before the earlier data has been flushed
  entirely and the pending_send flag has been reset to 0.

  If this race condition happens, the following stack trace is hit:

  (gdb) info threads
    Id   Target Id                        Frame
  * 1    Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    2    Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
    3    Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=, processes_to_wake=1, futex_word=) at ../sysdeps/nptl/futex-internal.h:364
    4    Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183

  (gdb) frame 2
  #2  0x7fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52

  (gdb) bt
  #1  0x7fb4deaa7859 in __GI_abort () at abort.c:79
  #2  0x7fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist, cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52
  #3  0x7fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
  #4  process_fd (writeable=, readable=, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054
  #5  process_fds (writefds=, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211
  #6  watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397
  #7  0x7fb4dea68609 in start_thread (arg=) at pthread_create.c:477
  #8  0x7fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

  (gdb) frame 3
  #3  0x7fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
  4041    in ../../../../lib/isc/unix/socket.c
  (gdb) p sock->pending_send
  $2 = 1

  [TEST CASE]

  1) Install isc-dhcp-server on 2 focal machines.
  2) Configure peer/cluster mode as follows:
     Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/
     Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/
  3) Run dhcpd as follows on both machines:
     # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4
  4) Leave the cluster running for a long period (2h) until the
     crash/race condition is reproduced.

  [REGRESSION POTENTIAL]

  * The fix prevents the assertion from being raised in the
    dispatch_send path. Later upstream versions of isc-dhcp lack this
    logic and have removed the flag entirely. Therefore, removing the
    need for this assertion at process_fd shouldn't be problematic.

To manage notifications about this bug go to:
https://bugs.launchpad.net/dhcp/+bug/1872118/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
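The last step of the test case (leave the cluster running until it crashes) could be monitored with something like the sketch below. This is only an illustration, assuming the dhcpd output or the service journal is available as text. The function name scan_dhcpd_log and the labels it prints are hypothetical; the two strings it searches for are the ones quoted in this bug thread (the !sock->pending_send assertion and the status=6/ABRT exit reported by systemd).

```shell
# Hypothetical helper: classify dhcpd log text read from stdin.
# Prints "assertion" if the pending_send assertion text is present,
# "abort" if only the SIGABRT exit reported by systemd is present,
# and "clean" otherwise.
scan_dhcpd_log() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -qF '!sock->pending_send'; then
        echo "assertion"
    elif printf '%s\n' "$log" | grep -qF 'status=6/ABRT'; then
        echo "abort"
    else
        echo "clean"
    fi
}

# Possible usage when dhcpd runs under systemd:
#   journalctl -u isc-dhcp-server --no-pager | scan_dhcpd_log
```

When dhcpd is started in the foreground as in the test case (-f -d), its output would have to be redirected to a file first and then piped into the function.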
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
22 hours, no issues.

# systemctl status isc-dhcp-server
● isc-dhcp-server.service - ISC DHCP IPv4 server
     Loaded: loaded (/lib/systemd/system/isc-dhcp-server.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2020-08-16 09:43:12 BST; 22h ago
       Docs: man:dhcpd(8)
   Main PID: 730 (dhcpd)
      Tasks: 4 (limit: 9571)
     Memory: 10.9M
     CGroup: /system.slice/isc-dhcp-server.service
             └─730 dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
First I reverted isc-dhcp-server back to the original focal version,
since I had an updated version from the PPA:

$ sudo apt install isc-dhcp-server=4.4.1-2.1ubuntu5 isc-dhcp-common=4.4.1-2.1ubuntu5

Then I installed the updated packages:

$ sudo apt update
$ sudo apt install libdns-export1109/focal-proposed libirs-export161/focal-proposed libisc-export1105/focal-proposed
$ dpkg --status libdns-export1109 libirs-export161 libisc-export1105 | grep Version
Version: 1:9.11.16+dfsg-3~ubuntu1
Version: 1:9.11.16+dfsg-3~ubuntu1
Version: 1:9.11.16+dfsg-3~ubuntu1

Then I restarted dhcpd:

$ sudo systemctl restart isc-dhcp-server

It has been running for four hours on both systems.

** Tags removed: verification-needed verification-needed-focal
** Tags added: verification-done verification-done-focal
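The restart in the steps above matters because a running dhcpd keeps the old library mapped until it is restarted. A rough way to check for that is sketched below, assuming a Linux system where /proc/<pid>/maps marks replaced files with "(deleted)". The function name stale_isc_libs is hypothetical, and matching on the substring libisc-export is an assumption based on the package name, not something taken from the bug report.

```shell
# Hypothetical helper: read the text of /proc/<pid>/maps on stdin and
# print each libisc-export path that is still mapped although the file
# on disk has been replaced (the kernel appends "(deleted)" to it).
stale_isc_libs() {
    grep -F 'libisc-export' | grep -F '(deleted)' | awk '{print $6}' | sort -u
}

# Possible usage (needs enough privileges to read the maps file):
#   stale_isc_libs < "/proc/$(pidof dhcpd)/maps"
# Empty output would suggest the running dhcpd uses the library
# version that is currently installed.
```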
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
I managed to get it working last night, but the install was a bit of a
mess. It did, however, run overnight.

This morning I carried out the following:

sudo apt install \
  isc-dhcp-server \
  libdns-export1109/focal-proposed \
  libirs-export161/focal-proposed \
  libisc-export1105/focal-proposed

The versions are below, and I will leave this to run as a test.

GW1
# dpkg -l | grep libdns-export1109
ii  libdns-export1109        1:9.11.16+dfsg-3~ubuntu1  amd64  Exported DNS Shared Library
# dpkg -l | grep libirs-export161
ii  libirs-export161         1:9.11.16+dfsg-3~ubuntu1  amd64  Exported IRS Shared Library
# dpkg -l | grep libisc-export1105
ii  libisc-export1105:amd64  1:9.11.16+dfsg-3~ubuntu1  amd64  Exported ISC Shared Library

GW2
# dpkg -l | grep libdns-export1109
ii  libdns-export1109        1:9.11.16+dfsg-3~ubuntu1  amd64  Exported DNS Shared Library
# dpkg -l | grep libirs-export161
ii  libirs-export161         1:9.11.16+dfsg-3~ubuntu1  amd64  Exported IRS Shared Library
# dpkg -l | grep libisc-export1105
ii  libisc-export1105:amd64  1:9.11.16+dfsg-3~ubuntu1  amd64  Exported ISC Shared Library
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
Andrew, 1:9.11.16+dfsg-3~build1 is wrong. The correct version is
1:9.11.16+dfsg-3~ubuntu1 (~ubuntu1 instead of ~build1).
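The version mix-up above (~build1 instead of ~ubuntu1) could be caught with a small check like the following. This is a sketch only: the function name check_versions is hypothetical, and it does a plain string comparison against the expected version rather than a real Debian version comparison.

```shell
# Hypothetical helper: read "package version" pairs from stdin and
# report every package whose version differs from the expected one.
# Returns 0 if all versions match, 1 otherwise.
check_versions() {
    expected=$1
    status=0
    while read -r pkg ver; do
        [ -z "$pkg" ] && continue
        if [ "$ver" != "$expected" ]; then
            echo "$pkg: $ver (expected $expected)"
            status=1
        fi
    done
    return "$status"
}

# Possible usage on each gateway:
#   dpkg-query -W -f='${Package} ${Version}\n' \
#       libdns-export1109 libirs-export161 libisc-export1105 |
#       check_versions '1:9.11.16+dfsg-3~ubuntu1'
```

Any line printed by the check names a package that still needs to be installed from focal-proposed.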
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
Just tried this:

apt-get -y remove isc-dhcp-server
apt-get -y remove libdns-export1109/focal-proposed
apt-get -y remove libirs-export161/focal-proposed
apt-get -y remove libisc-export1105/focal-proposed

apt-get -y install isc-dhcp-server
apt-get -y install libdns1109/focal-proposed
apt-get -y install libirs161/focal-proposed
apt-get -y install libisc1105/focal-proposed

GW1
# dpkg -l | grep isc-dhcp-server
ii  isc-dhcp-server          4.4.1-2.1ubuntu5           amd64  ISC DHCP server for automatic IP address assignment
# dpkg -l | grep libdns
ii  libdns-export1100        1:9.11.3+dfsg-1ubuntu1.12  amd64  Exported DNS Shared Library
ii  libdns-export1109        1:9.11.16+dfsg-3~build1    amd64  Exported DNS Shared Library
ii  libdns1109:amd64         1:9.11.16+dfsg-3~ubuntu1   amd64  DNS Shared Library used by BIND
# dpkg -l | grep libirs
ii  libirs-export160         1:9.11.3+dfsg-1ubuntu1.12  amd64  Exported IRS Shared Library
ii  libirs-export161         1:9.11.16+dfsg-3~build1    amd64  Exported IRS Shared Library
ii  libirs161:amd64          1:9.11.16+dfsg-3~ubuntu1   amd64  DNS Shared Library used by BIND
# dpkg -l | grep libisc
ii  libisc-export1105:amd64  1:9.11.16+dfsg-3~build1    amd64  Exported ISC Shared Library
ii  libisc-export169:amd64   1:9.11.3+dfsg-1ubuntu1.12  amd64  Exported ISC Shared Library
ii  libisc1105:amd64         1:9.11.16+dfsg-3~ubuntu1   amd64  ISC Shared Library used by BIND
ii  libisccc161:amd64        1:9.11.16+dfsg-3~ubuntu1   amd64  Command Channel Library used by BIND
ii  libisccfg-export160      1:9.11.3+dfsg-1ubuntu1.12  amd64  Exported ISC CFG Shared Library
ii  libisccfg-export163      1:9.11.16+dfsg-3~build1    amd64  Exported ISC CFG Shared Library
ii  libisccfg163:amd64       1:9.11.16+dfsg-3~ubuntu1   amd64  Config File Handling Library used by BIND

GW2
# dpkg -l | grep isc-dhcp-server
ii  isc-dhcp-server          4.4.1-2.1ubuntu5           amd64  ISC DHCP server for automatic IP address assignment
# dpkg -l | grep libdns
ii  libdns-export1100        1:9.11.3+dfsg-1ubuntu1.12  amd64  Exported DNS Shared Library
ii  libdns-export1109        1:9.11.16+dfsg-3~build1    amd64  Exported DNS Shared Library
ii  libdns1109:amd64         1:9.11.16+dfsg-3~ubuntu1   amd64  DNS Shared Library used by BIND
# dpkg -l | grep libirs
ii  libirs-export160         1:9.11.3+dfsg-1ubuntu1.12  amd64  Exported IRS Shared Library
ii  libirs-export161         1:9.11.16+dfsg-3~build1    amd64  Exported IRS Shared Library
ii  libirs161:amd64          1:9.11.16+dfsg-3~ubuntu1   amd64  DNS Shared Library used by BIND
# dpkg -l | grep libisc
ii  libisc-export1105:amd64  1:9.11.16+dfsg-3~build1    amd64  Exported ISC Shared Library
ii  libisc-export169:amd64   1:9.11.3+dfsg-1ubuntu1.12  amd64  Exported ISC Shared Library
ii  libisc1105:amd64         1:9.11.16+dfsg-3~ubuntu1   amd64  ISC Shared Library used by BIND
ii  libisccc161:amd64        1:9.11.16+dfsg-3~ubuntu1   amd64  Command Channel Library used by BIND
ii  libisccfg-export160      1:9.11.3+dfsg-1ubuntu1.12  amd64  Exported ISC CFG Shared Library
ii  libisccfg-export163      1:9.11.16+dfsg-3~build1    amd64  Exported ISC CFG Shared Library
ii  libisccfg163:amd64       1:9.11.16+dfsg-3~ubuntu1   amd64  Config File Handling Library used by BIND

Still not good.
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
Not looking good:

Aug 15 20:14:55 gw2-focal sh[741]: ../../../../lib/isc/unix/socket.c:4359: fatal error: select() failed: Bad file descriptor
Aug 15 20:14:55 gw2-focal systemd[1]: isc-dhcp-server.service: Main process exited, code=killed, status=6/ABRT
Aug 15 20:14:55 gw2-focal systemd[1]: isc-dhcp-server.service: Failed with result 'signal'.

Have the correct files been installed?
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
Timo, Just installed the updates I will let you know I installed as follows apt-get install isc-dhcp-server apt-get install libdns-export1109/focal-proposed apt-get install libirs-export161/focal-proposed apt-get install libisc-export1105/focal-proposed The installed version are GW1 # dpkg -l | grep isc-dhcp-server ii isc-dhcp-server 4.4.1-2.1ubuntu5 amd64ISC DHCP server for automatic IP address assignment # dpkg -l | grep libdns-export1109 ii libdns-export11091:9.11.16+dfsg-3~ubuntu1 amd64Exported DNS Shared Library # dpkg -l | grep libirs-export161 ii libirs-export161 1:9.11.16+dfsg-3~ubuntu1 amd64Exported IRS Shared Library # dpkg -l | grep libisc-export1105 ii libisc-export1105:amd64 1:9.11.16+dfsg-3~ubuntu1 amd64Exported ISC Shared Library # GW2 # dpkg -l | grep isc-dhcp-server ii isc-dhcp-server 4.4.1-2.1ubuntu5 amd64ISC DHCP server for automatic IP address assignment # dpkg -l | grep libdns-export1109 ii libdns-export11091:9.11.16+dfsg-3~ubuntu1 amd64Exported DNS Shared Library # dpkg -l | grep libirs-export161 ii libirs-export161 1:9.11.16+dfsg-3~build1 amd64Exported IRS Shared Library # dpkg -l | grep libisc-export1105 ii libisc-export1105:amd64 1:9.11.16+dfsg-3~build1 amd64Exported ISC Shared Library Can you please confirm the correct versions are installed -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu. 
https://bugs.launchpad.net/bugs/1872118

Title: [SRU] DHCP Cluster crashes after a few hours

Status in DHCP: New
Status in bind9-libs package in Ubuntu: Fix Released
Status in isc-dhcp package in Ubuntu: Invalid
Status in bind9-libs source package in Focal: Fix Committed
Status in isc-dhcp source package in Focal: Invalid
Status in bind9-libs source package in Groovy: Fix Released
Status in isc-dhcp source package in Groovy: Invalid
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
Hello Andrew, or anyone else affected,

Accepted bind9-libs into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/bind9-libs/1:9.11.16+dfsg-3~ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: bind9-libs (Ubuntu Focal)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-focal
https://bugs.launchpad.net/bugs/1872118

Title: [SRU] DHCP Cluster crashes after a few hours

Status in DHCP: New
Status in bind9-libs package in Ubuntu: Fix Released
Status in isc-dhcp package in Ubuntu: Invalid
Status in bind9-libs source package in Focal: Fix Committed
Status in isc-dhcp source package in Focal: Invalid
Status in bind9-libs source package in Groovy: Fix Released
Status in isc-dhcp source package in Groovy: Invalid
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
** Changed in: bind9-libs (Ubuntu Focal)
   Importance: Undecided => High

https://bugs.launchpad.net/bugs/1872118

Title: [SRU] DHCP Cluster crashes after a few hours

Status in DHCP: New
Status in bind9-libs package in Ubuntu: Fix Released
Status in isc-dhcp package in Ubuntu: Invalid
Status in bind9-libs source package in Focal: In Progress
Status in isc-dhcp source package in Focal: Invalid
Status in bind9-libs source package in Groovy: Fix Released
Status in isc-dhcp source package in Groovy: Invalid
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
Excellent. I'm available to test the -proposed update for focal whenever it is ready.

https://bugs.launchpad.net/bugs/1872118

Title: [SRU] DHCP Cluster crashes after a few hours

Status in DHCP: New
Status in bind9-libs package in Ubuntu: Fix Released
Status in isc-dhcp package in Ubuntu: Invalid
Status in bind9-libs source package in Focal: In Progress
Status in isc-dhcp source package in Focal: Invalid
Status in bind9-libs source package in Groovy: Fix Released
Status in isc-dhcp source package in Groovy: Invalid
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
This bug was fixed in the package bind9-libs - 1:9.11.19+dfsg-1ubuntu1

---
bind9-libs (1:9.11.19+dfsg-1ubuntu1) groovy; urgency=medium

  [ Jorge Niedbalski ]
  * debian/patches/0010-fix-1872118.patch: Check if pending_send if set
    before calling dispatch_send. Fixes LP: #1872118.

 -- Gianfranco Costamagna Tue, 11 Aug 2020 15:25:14 +0200

** Changed in: bind9-libs (Ubuntu Groovy)
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1872118

Title: [SRU] DHCP Cluster crashes after a few hours

Status in DHCP: New
Status in bind9-libs package in Ubuntu: Fix Released
Status in isc-dhcp package in Ubuntu: Invalid
Status in bind9-libs source package in Focal: In Progress
Status in isc-dhcp source package in Focal: Invalid
Status in bind9-libs source package in Groovy: Fix Released
Status in isc-dhcp source package in Groovy: Invalid
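The changelog entry above describes the fix as checking pending_send before calling dispatch_send. The idea can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the contents of 0010-fix-1872118.patch: struct model_socket and every model_* function are hypothetical names invented for illustration, and the real libisc socket code involves locking and a task queue that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the libisc socket object. */
struct model_socket {
    bool pending_send;  /* true while a send event is queued but not completed */
    int  sends_queued;  /* number of send events handed off so far */
};

/* Mirrors the invariant seen in the backtrace: INSIST(!sock->pending_send). */
void model_dispatch_send(struct model_socket *sock)
{
    assert(!sock->pending_send);
    sock->pending_send = true;
    sock->sends_queued++;
}

/* Called once the queued send event has been processed; clears the flag. */
void model_send_done(struct model_socket *sock)
{
    sock->pending_send = false;
}

/*
 * Guarded handler for a "socket is writable" event. If a send is already
 * pending, the event is skipped instead of tripping the assertion.
 * Returns true when a send was dispatched.
 */
bool model_process_writable(struct model_socket *sock)
{
    if (sock->pending_send) {
        return false;
    }
    model_dispatch_send(sock);
    return true;
}
```

In this model, a second writable event that arrives while pending_send is still 1 is ignored, and the next event after model_send_done() dispatches normally. Calling model_dispatch_send() directly in that window corresponds to the crash described in the bug report.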
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
I'm marking isc-dhcp tasks as invalid!

https://bugs.launchpad.net/bugs/1872118

Title: [SRU] DHCP Cluster crashes after a few hours

Status in DHCP: New
Status in bind9-libs package in Ubuntu: In Progress
Status in isc-dhcp package in Ubuntu: Invalid
Status in bind9-libs source package in Focal: In Progress
Status in isc-dhcp source package in Focal: Invalid
Status in bind9-libs source package in Groovy: In Progress
Status in isc-dhcp source package in Groovy: Invalid
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
I uploaded to groovy (sorry, I didn't find the debdiff and it was partially wrong) and to focal for review.

** Changed in: isc-dhcp (Ubuntu Focal)
   Status: In Progress => Invalid

** Changed in: isc-dhcp (Ubuntu Groovy)
   Status: In Progress => Incomplete

** Changed in: isc-dhcp (Ubuntu Groovy)
   Status: Incomplete => Invalid

https://bugs.launchpad.net/bugs/1872118

Title: [SRU] DHCP Cluster crashes after a few hours

Status in DHCP: New
Status in bind9-libs package in Ubuntu: In Progress
Status in isc-dhcp package in Ubuntu: Invalid
Status in bind9-libs source package in Focal: In Progress
Status in isc-dhcp source package in Focal: Invalid
Status in bind9-libs source package in Groovy: In Progress
Status in isc-dhcp source package in Groovy: Invalid
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
The attachment "lp-1872118-groovy.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team. [This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.] ** Tags added: patch -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to isc-dhcp in Ubuntu. https://bugs.launchpad.net/bugs/1872118 Title: [SRU] DHCP Cluster crashes after a few hours Status in DHCP: New Status in bind9-libs package in Ubuntu: In Progress Status in isc-dhcp package in Ubuntu: In Progress Status in bind9-libs source package in Focal: In Progress Status in isc-dhcp source package in Focal: In Progress Status in bind9-libs source package in Groovy: In Progress Status in isc-dhcp source package in Groovy: In Progress Bug description: [Description] isc-dhcp-server uses libisc-export (coming from bind9-libs package) for handling the socket event(s) when configured in peer mode (master/secondary). It's possible that a sequence of messages dispatched by the master that requires acknowledgment from its peers holds a socket in a pending to send state, a timer or a subsequent write request can be scheduled into this socket and the !sock->pending_send assertion will be raised when trying to write again at the time data hasn't been flushed entirely and the pending_send flag hasn't been reset to 0 state. 
If this race condition happens, the following stack trace will be hit:

(gdb) info threads
  Id   Target Id                         Frame
* 1    Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
  2    Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
  3    Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=, processes_to_wake=1, futex_word=) at ../sysdeps/nptl/futex-internal.h:364
  4    Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183

(gdb) frame 2
#2 0x7fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,
    cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52
(gdb) bt
#1 0x7fb4deaa7859 in __GI_abort () at abort.c:79
#2 0x7fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,
    cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52
#3 0x7fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
#4 process_fd (writeable=, readable=, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054
#5 process_fds (writefds=, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211
#6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397
#7 0x7fb4dea68609 in start_thread (arg=) at pthread_create.c:477
#8 0x7fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) frame 3
#3 0x7fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
4041 in ../../../../lib/isc/unix/socket.c
(gdb) p sock->pending_send
$2 = 1

[TEST CASE]

1) Install isc-dhcp-server on 2 focal machines.
2) Configure peer/cluster mode as follows:
   Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/
   Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/
3) Run dhcpd as follows on both machines:

   # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4

4) Leave the cluster running for a long period (2h) until the crash/race condition is reproduced.

[REGRESSION POTENTIAL]

* The fix prevents the assertion from firing in the dispatch_send path. Later upstream versions of isc-dhcp lack this logic and have removed the flag entirely, so removing the need for this assertion at process_fd shouldn't be problematic.

To manage notifications about this bug go to:
https://bugs.launchpad.net/dhcp/+bug/1872118/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
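The failure mode in the trace above (dispatch_send reached while pending_send is still 1) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `toy_sock`, `toy_dispatch_send_strict`, `toy_dispatch_send_tolerant` and `toy_send_done` are hypothetical names invented for illustration, they are not the libisc data structures or functions, and the tolerant variant is one plausible shape for avoiding the assertion, not the actual patch from the debdiff.

```c
#include <assert.h>   /* for callers that want to check the return values */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a socket; NOT the libisc structure. */
struct toy_sock {
    bool pending_send;  /* a send event was posted and not yet completed */
    size_t queued;      /* bytes still waiting to be flushed */
};

/* Strict variant: has the shape of INSIST(!sock->pending_send).
 * Returns false at the point where the real assertion would abort. */
bool toy_dispatch_send_strict(struct toy_sock *s) {
    if (s->pending_send)
        return false;   /* second wakeup arrived before the flush finished */
    s->pending_send = true;
    return true;
}

/* Tolerant variant (assumption): a wakeup that arrives while a send is
 * already pending is treated as a no-op instead of a fatal error. */
bool toy_dispatch_send_tolerant(struct toy_sock *s) {
    if (!s->pending_send)
        s->pending_send = true;
    return true;
}

/* Called once the queued data has been written out: resets the flag. */
void toy_send_done(struct toy_sock *s) {
    s->queued = 0;
    s->pending_send = false;
}
```

In this model the crash corresponds to two dispatch calls with no `toy_send_done` in between: the strict variant rejects the second call, while the tolerant variant lets it pass and leaves the flag set until the flush completes.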
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
bind9 in focal uses the 9.16.x versions of these libraries, packaged separately. It's just 9.11.x that was packaged as bind9-libs because of legacy applications like isc-dhcp that do not work with the 9.16 version of bind9.

Status in DHCP: New
Status in bind9-libs package in Ubuntu: In Progress
Status in isc-dhcp package in Ubuntu: In Progress
Status in bind9-libs source package in Focal: In Progress
Status in isc-dhcp source package in Focal: In Progress
Status in bind9-libs source package in Groovy: In Progress
Status in isc-dhcp source package in Groovy: In Progress
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
Jorge, I agree with Gianfranco Costamagna that a rebuild of isc-dhcp is NOT required. Why do you think it is?

Presumably BIND also uses these libraries? If so, it seems like the Test Case should involve making sure BIND still seems to work, and that BIND should be mentioned in the Regression Potential. My DHCP servers also run BIND for recursive DNS and that has been fine with the patch applied.
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
** Patch added: "lp-1872118-focal.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1872118/+attachment/5400761/+files/lp-1872118-focal.debdiff
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
** Patch added: "lp-1872118-groovy.debdiff"
   https://bugs.launchpad.net/ubuntu/+source/isc-dhcp/+bug/1872118/+attachment/5400760/+files/lp-1872118-groovy.debdiff
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
Uploaded debdiff(s) for groovy and focal. This will require a follow up rebuild change for isc-dhcp, once the library change lands.
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
** Description changed:

+ [Description]
- I have a pair of DHCP serevrs running in a cluster on ubuntu 20.04, All worked perfectly until recently, when they started stopping with code=killed, status=6/ABRT.
- This is being fixed by
+ isc-dhcp-server uses libisc-export (coming from bin9-libs package) for handling the socket event(s) when configured in peer mode (master/secondary). It's possible that a sequence of messages dispatched by the master that requires acknowledgment from its peers holds a socket
+ in a pending to send state, a timer or a subsequent write request can be scheduled into this socket and the !sock->pending_send assertion
+ will be raised when trying to write again at the time data hasn't been flushed entirely and the pending_send flag hasn't been reset to 0 state.
- https://bugs.launchpad.net/bugs/1870729
+ If this race condition happens, the following stacktrace will be
+ hit:
- However now one stops after a few hours with the following errors. One
- can stay on line but not both.
+ (gdb) info threads
+ Id Target Id Frame
+ * 1 Thread 0x7fb4ddecb700 (LWP 3170) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
+ 2 Thread 0x7fb4dd6ca700 (LWP 3171) __lll_lock_wait (futex=futex@entry=0x7fb4de6d2028, private=0) at lowlevellock.c:52
+ 3 Thread 0x7fb4de6cc700 (LWP 3169) futex_wake (private=, processes_to_wake=1, futex_word=) at ../sysdeps/nptl/futex-internal.h:364
+ 4 Thread 0x7fb4de74f740 (LWP 3148) futex_wait_cancelable (private=, expected=0, futex_word=0x7fb4de6cd0d0) at ../sysdeps/nptl/futex-internal.h:183
+
+ (gdb) frame 2
+ #2 0x7fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,
+ cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52
+ (gdb) bt
+ #1 0x7fb4deaa7859 in __GI_abort () at abort.c:79
+ #2 0x7fb4dec85985 in isc_assertion_failed (file=file@entry=0x7fb4decd8878 "../../../../lib/isc/unix/socket.c", line=line@entry=3361, type=type@entry=isc_assertiontype_insist,
+ cond=cond@entry=0x7fb4decda033 "!sock->pending_send") at ../../../lib/isc/assertions.c:52
+ #3 0x7fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
+ #4 process_fd (writeable=, readable=, fd=11, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4054
+ #5 process_fds (writefds=, readfds=0x7fb4de6d1090, maxfd=13, manager=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4211
+ #6 watcher (uap=0x7fb4de6d0010) at ../../../../lib/isc/unix/socket.c:4397
+ #7 0x7fb4dea68609 in start_thread (arg=) at pthread_create.c:477
+ #8 0x7fb4deba4103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
+
+ (gdb) frame 3
+ #3 0x7fb4decc17e1 in dispatch_send (sock=0x7fb4de6d4990) at ../../../../lib/isc/unix/socket.c:4041
+ 4041 in ../../../../lib/isc/unix/socket.c
+ (gdb) p sock->pending_send
+ $2 = 1
+
+ [TEST CASE]
+
+ 1) Install isc-dhcp-server in 2 focal machine(s).
+ 2) Configure peer/cluster mode as follows:
+Primary configuration: https://pastebin.ubuntu.com/p/XYj648MghK/
+Secondary configuration: https://pastebin.ubuntu.com/p/PYkcshZCWK/
+ 2) Run dhcpd as follows in both machine(s)
+
+ # dhcpd -f -d -4 -cf /etc/dhcp/dhcpd.conf --no-pid ens4
+
+ 3) Leave the cluster running for a long (2h) period until the crash/race
+ condition is reproduced.
+ [REGRESSION POTENTIAL]
- Syslog shows
- Apr 10 17:20:15 dhcp-primary sh[6828]: ../../../../lib/isc/unix/socket.c:3361: INSIST(!sock->pending_send) failed, back trace
- Apr 10 17:20:15 dhcp-primary sh[6828]: #0 0x7fbe78702a4a in ??
- Apr 10 17:20:15 dhcp-primary sh[6828]: #1 0x7fbe78702980 in ??
- Apr 10 17:20:15 dhcp-primary sh[6828]: #2 0x7fbe7873e7e1 in ??
- Apr 10 17:20:15 dhcp-primary sh[6828]: #3 0x7fbe784e5609 in ??
- Apr 10 17:20:15 dhcp-primary sh[6828]: #4 0x7fbe78621103 in ??
-
-
- nothing in kern.log
-
-
- apport.log shows
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: called for pid 6828, signal 6, core limit 0, dump mode 2
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: not creating core for pid with dump mode of 2
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: executable: /usr/sbin/dhcpd (command line "dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf")
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
- ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: wrote report /var/crash/_usr_sbin_dhcpd.0.crash
-
-
- /var/crash/_usr_sbin_dhcpd.0.crash shows
-
- ProblemType: Crash
- Architecture: amd64
- CrashCounter: 1
- Date: Fri Apr 10 17:20:15 2020
- DistroRelease: Ubuntu 20.04
- ExecutablePath: /usr/sbin/dhcpd
- ExecutableTimestamp: 1586210315
- ProcCmdline: dhcpd -user dhcpd -group dhcpd -f -4 -pf
[Touch-packages] [Bug 1872118] Re: [SRU] DHCP Cluster crashes after a few hours
** Summary changed:

- DHCP Cluster crashes after a few hours
+ [SRU] DHCP Cluster crashes after a few hours

Bug description:
  I have a pair of DHCP servers running in a cluster on Ubuntu 20.04. All worked perfectly until recently, when they started stopping with code=killed, status=6/ABRT.

  This is being fixed by https://bugs.launchpad.net/bugs/1870729

  However, one now stops after a few hours with the following errors. One can stay online but not both.

  Syslog shows
  Apr 10 17:20:15 dhcp-primary sh[6828]: ../../../../lib/isc/unix/socket.c:3361: INSIST(!sock->pending_send) failed, back trace
  Apr 10 17:20:15 dhcp-primary sh[6828]: #0 0x7fbe78702a4a in ??
  Apr 10 17:20:15 dhcp-primary sh[6828]: #1 0x7fbe78702980 in ??
  Apr 10 17:20:15 dhcp-primary sh[6828]: #2 0x7fbe7873e7e1 in ??
  Apr 10 17:20:15 dhcp-primary sh[6828]: #3 0x7fbe784e5609 in ??
  Apr 10 17:20:15 dhcp-primary sh[6828]: #4 0x7fbe78621103 in ??

  nothing in kern.log

  apport.log shows
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: called for pid 6828, signal 6, core limit 0, dump mode 2
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: not creating core for pid with dump mode of 2
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: executable: /usr/sbin/dhcpd (command line "dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf")
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
  ERROR: apport (pid 6850) Fri Apr 10 17:20:15 2020: wrote report /var/crash/_usr_sbin_dhcpd.0.crash

  /var/crash/_usr_sbin_dhcpd.0.crash shows

  ProblemType: Crash
  Architecture: amd64
  CrashCounter: 1
  Date: Fri Apr 10 17:20:15 2020
  DistroRelease: Ubuntu 20.04
  ExecutablePath: /usr/sbin/dhcpd
  ExecutableTimestamp: 1586210315
  ProcCmdline: dhcpd -user dhcpd -group dhcpd -f -4 -pf /run/dhcp-server/dhcpd.pid -cf /etc/dhcp/dhcpd.conf
  ProcEnviron: Error: [Errno 13] Permission denied: 'environ'
  ProcMaps: Error: [Errno 13] Permission denied: 'maps'
  ProcStatus:
   Name: dhcpd
   Umask: 0022
   State: D (disk sleep)
   Tgid: 6828
   Ngid: 0
   Pid: 6828
   PPid: 1
   TracerPid: 0
   Uid: 113 113 113 113
   Gid: 118 118 118 118
   FDSize:128
   Groups:
   NStgid:6828
   NSpid: 6828
   NSpgid:6828
   NSsid: 6828
   VmPeak: 236244 kB
   VmSize: 170764 kB
   VmLck:0 kB
   VmPin:0 kB
   VmHWM:12064 kB
   VmRSS:12064 kB
   RssAnon: 5940 kB
   RssFile: 6124 kB
   RssShmem: 0 kB
   VmData: 30792 kB
   VmStk: 132 kB
   VmExe: 592 kB
   VmLib: 5424 kB
   VmPTE: 76 kB
   VmSwap: 0 kB
   HugetlbPages: 0 kB
   CoreDumping: 1
   THP_enabled: 1
   Threads: 4
   SigQ: 0/7609
   SigPnd:
   ShdPnd:
   SigBlk:
   SigIgn:1000
   SigCgt:00018000
   CapInh:
   CapPrm:
   CapEff:
   CapBnd:003f
   CapAmb:
   NoNewPrivs:0
   Seccomp: 0
   Speculation_Store_Bypass: thread vulnerable
   Cpus_allowed: 3
   Cpus_allowed_list: 0-1
   Mems_allowed: ,,,,,,,,,,,,,,,,,,,,,,,,,,,, ,,,0001
   Mems_allowed_list: 0
   voluntary_ctxt_switches: 111
   nonvoluntary_ctxt_switches:144
  Signal: 6
  Uname: Linux 5.4.0-21-generic x86_64
  UserGroups:
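The "signal 6" and "status=6/ABRT" entries in the report are what a failed assertion looks like from the outside: the backtrace earlier in the thread shows isc_assertion_failed calling abort(), which raises SIGABRT. The following is a minimal sketch, assuming a POSIX system, that reproduces just that mechanism in a child process; `toy_abort_signal` is a hypothetical helper written for illustration and has nothing to do with dhcpd itself.

```c
#include <assert.h>    /* for callers that want to check the result */
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that calls abort(), then returns the number of the signal
 * that terminated it. Returns -1 if fork/waitpid failed or the child was
 * not killed by a signal. */
int toy_abort_signal(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        abort();                  /* child: raises SIGABRT, never returns */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On Linux the returned value is SIGABRT, which is numbered 6, matching what systemd and apport recorded for dhcpd.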