Processed: Re: Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs
Processing control commands:

> forwarded -1 https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886
Bug #1093243 [src:linux] linux-image-6.1.0-29-amd64 causes mariadb hangs
Set Bug forwarded-to-address to 'https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886'.

-- 
1093243: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093243
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs
Control: forwarded -1 https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886

Hi,

On Tue, Jan 21, 2025 at 08:06:18PM +0100, Bernhard Schmidt wrote:
> Control: affects -1 src:mariadb
> Control: tags -1 + confirmed
> Control: severity -1 critical
>
> Seeing this too. We have two standalone systems running the stock
> bookworm MariaDB and the open-source network management system
> LibreNMS, which is quite write-heavy. After some time (sometimes a
> couple of hours, sometimes 1-2 days) all connection slots to the
> database are full.
>
> When you kill one client process you can connect and issue "show
> processlist"; you see all slots busy with simple update/select
> queries that have been running for hours. You need to SIGKILL
> mariadbd to recover.
>
> In the last two days our colleagues running a Galera cluster (unsure
> about the version, inquiring) have been affected by this as well.
> They found a MariaDB bug report about this:
>
> https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886?filter=allopenissues
>
> Since there have been reports about data loss, I think it warrants
> increasing the severity to critical.
>
> I'm not 100% sure about -30, though: we have downgraded the
> production system to -28 and upgraded the test system to -30, and
> both are working fine. The test system has less load, though, and I
> trust the reports here that -30 is still broken.

I would be interested to know whether someone is able to reproduce the
issue under lab conditions, which would enable us to bisect it.

As a start I have set the above issue as forwarded, to have the issues
linked (we can later update it to the linux upstream report).

Regards,
Salvatore
Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs
Dear all,

I have found that, at least in my case, sending a SIGSTOP followed by a
SIGCONT to the MariaDB process, i.e.

  kill -STOP $(pgrep -f mariadb) ; kill -CONT $(pgrep -f mariadb)

is sufficient to bring it back to life. I believe this approach has
fewer side effects than using SIGKILL. Attaching gdb and detaching it
works too; this is how I found out about this.

Best regards,
Max

On Tue, 21 Jan 2025 20:06:18 +0100 Bernhard Schmidt wrote:
> When you kill one client process you can connect and issue "show
> processlist", you see all slots busy with easy update/select queries
> that have been running for hours. You need to SIGKILL mariadbd to
> recover.
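If the workaround needs to be scripted, note that a pgrep -f pattern
match can also catch unrelated processes whose command line merely
contains "mariadb" (e.g. a client or a shell). A minimal sketch of a
safer variant, assuming a single mariadbd process; the helper below is
hypothetical, not part of any report:

  #!/bin/sh
  # Stop-and-continue a hung mariadbd, per the workaround above.
  # pidof matches the exact process name; bail out if none is running.
  pid=$(pidof mariadbd) || { echo "mariadbd not running" >&2; exit 1; }
  kill -STOP $pid
  kill -CONT $pid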
Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs
Control: affects -1 src:mariadb
Control: tags -1 + confirmed
Control: severity -1 critical

Seeing this too. We have two standalone systems running the stock
bookworm MariaDB and the open-source network management system
LibreNMS, which is quite write-heavy. After some time (sometimes a
couple of hours, sometimes 1-2 days) all connection slots to the
database are full.

When you kill one client process you can connect and issue "show
processlist"; you see all slots busy with simple update/select queries
that have been running for hours. You need to SIGKILL mariadbd to
recover.

In the last two days our colleagues running a Galera cluster (unsure
about the version, inquiring) have been affected by this as well. They
found a MariaDB bug report about this:

https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886?filter=allopenissues

Since there have been reports about data loss, I think it warrants
increasing the severity to critical.

I'm not 100% sure about -30, though: we have downgraded the production
system to -28 and upgraded the test system to -30, and both are working
fine. The test system has less load, though, and I trust the reports
here that -30 is still broken.
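A quick way to spot the symptom described above (slots occupied by
simple statements stuck for hours) is to filter the process list by
query time; the one-hour threshold and the column position are
illustrative assumptions, not from the report:

  # List connections whose current statement has run for over an hour.
  # In tab-separated SHOW FULL PROCESSLIST output the 6th column is
  # Time (seconds); the regex skips the header row.
  mariadb -e 'SHOW FULL PROCESSLIST' | awk '$6 ~ /^[0-9]+$/ && $6 > 3600'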
Processed: Re: Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs
Processing control commands:

> affects -1 src:mariadb
Bug #1093243 [src:linux] linux-image-6.1.0-29-amd64 causes mariadb hangs
Added indication that 1093243 affects src:mariadb
> tags -1 + confirmed
Bug #1093243 [src:linux] linux-image-6.1.0-29-amd64 causes mariadb hangs
Added tag(s) confirmed.
> severity -1 critical
Bug #1093243 [src:linux] linux-image-6.1.0-29-amd64 causes mariadb hangs
Severity set to 'critical' from 'important'

-- 
1093243: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093243
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs
Hi,

we can confirm this issue on our systems (Debian 12) as well. It
appeared after updating the kernel from 6.1.0-28-amd64 to
6.1.0-30-amd64. We experienced issues with database requests that were
not being processed: connecting to MariaDB was still possible, but some
updates on tables hung, resulting in locked tables.

The issue is fully reproducible on multiple identical systems when
updating to kernel 6.1.0-30, and it vanishes after booting with the old
kernel 6.1.0-28.

-- 
E-Mail: maib...@dfn.de
DFN - Deutsches Forschungsnetz | German National Research and Education Network
Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs
Max,

Good to know it's not just me. Am I correct that your 10.11.6 MariaDB
binaries are the Debian-provided ones? I'm running the Debian binaries
provided by MariaDB, version 10.11.9. Hopefully it's a useful data
point to know that it's happening on both. I might file a MariaDB bug
report too.

-Xan
Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs
Dear all,

I experience exactly the same with 6.1.0-29-amd64 and 6.1.0-30-amd64.
MariaDB is just a standalone instance (mariadb Ver 15.1 Distrib
10.11.6-MariaDB). It happens on two different VMs.

When I use gdb to attach to the stuck MariaDB, the stack trace shows a
lot of threads waiting in __futex_abstimed_wait_common64. As soon as I
detach the debugger, MariaDB continues to work.

Thank you!

On Thu, 16 Jan 2025 14:54:52 -0600 Xan Charbonnet wrote:
> Package: src:linux
> Version: 6.1.123-1
> Severity: important
> X-Debbugs-Cc: x...@charbonnet.com
>
> Dear Maintainer,
>
> One of my machines runs mariadb as an asynchronous slave of a
> production database. Every day it makes an LVM snapshot of its
> database partition, then starts a secondary instance of mariadb with
> the snapshot as its data drive. It then outputs everything, piped
> through bzip3, to an archive file.
>
> This has worked for a long time but recently started failing: the
> secondary mariadb instance will claim to still be executing a SELECT,
> but no data will ever be put into the pipe.
>
> This appears to have started after the upgrade to bookworm 12.9 on
> the evening of Sunday the 12th. So I rolled back to kernel 6.1.119-1
> from linux-image-6.1.0-28-amd64. It worked perfectly. My other backup
> server, which I have not yet upgraded to Debian 12.9, is also working
> perfectly.
>
> There are other weird things, too, which I have associated with the
> new kernel: the mariadb replication on this machine will sometimes
> just stop. It appears to be writing a commit, but I have to kill -9
> it.
>
> And over in production, two of the servers in a Galera cluster were
> upgraded. The cluster has been unstable since then: every once in a
> while, one of the members will just stop being able to do any writing
> and I have to kill -9 the mariadb process.
>
> But the issue which I can readily repeat any time is the one I first
> brought up. Here is some data from experiments I have run:
>
> 6.1.0-28-amd64 -- replication able to catch up
> Process started   2025-01-15 19:46
> Canceled by user  2025-01-15 20:52   47179030576 bytes, still being written
> (That is, this was working: still running after an hour, file still
> growing.)
>
> 6.1.0-29-amd64 -- replication unable to catch up
> Process started   2025-01-16 11:31
> Canceled by user  2025-01-16 11:46   1162798112 bytes, last write 2025-01-16 11:36
> (That is, mariadb stopped outputting any data after 5 minutes and had
> to be kill -9'd)
>
> Attempt 2 started 2025-01-16 13:02
> Canceled by user  2025-01-16 13:19   2149409328 bytes, last write 2025-01-16 13:10
> (This one lasted 8 minutes)
>
> Attempt 3 started 2025-01-16 13:52
> Canceled by user  2025-01-16 14:14   3352755856 bytes, last write 2025-01-16 14:07
> (Got 15 minutes this time)
>
> 6.1.0-30-amd64 -- unsure about replication performance
> Process started   2025-01-16 11:51
> Canceled by user  2025-01-16 12:14   2956803424 bytes, last write 2025-01-16 12:04
> (The issue persists in 6.1.124-1: hung after 13 minutes)

-- 
Max Ried
Heinrich-Heine-Universität Düsseldorf
Institut für Informatik
Systemadministration
Gebäude 25.12, Raum 01.28
Universitätsstr. 1
40225 Düsseldorf
Tel: +49 211 81 10715

Thread 39 (Thread 0x7ff2705596c0 (LWP 98731) "mariadbd"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7ff270558c80, op=393, expected=0, futex_word=0x55cb6ac9428c) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55cb6ac9428c, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x7ff270558c80, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2  0x7ff29b4a4f7b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55cb6ac9428c, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x7ff270558c80, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x7ff29b4a78bc in __pthread_cond_wait_common (abstime=0x7ff270558c80, clockid=0, mutex=0x55cb6ac942d0, cond=0x55cb6ac94260) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_timedwait64 (cond=0x55cb6ac94260, mutex=0x55cb6ac942d0, abstime=0x7ff270558c80) at ./nptl/pthread_cond_wait.c:643
#5  0x55cb69d0accc in do_handle_one_connection(CONNECT*, bool) ()
#6  0x55cb69d0af8d in handle_one_connection ()
#7  0x55cb6a024440 in ?? ()
#8  0x7ff29b4a81c4 in start_thread (arg=) at ./nptl/pthread_create.c:442
#9  0x7ff29b52885c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 38 (Thread 0x7ff27827c6c0 (LWP 98722) "mariadbd"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55cb6abe62a4) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55cb6abe62a4, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true)
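The gdb attach/detach cycle described above can also be done
non-interactively. A minimal sketch, assuming gdb is installed and a
single mariadbd process; the batch-mode invocation is my own, any
attach/detach should behave the same:

  # Attach gdb to the hung mariadbd, then detach immediately; per the
  # reports in this bug, the attach/detach alone is enough to get the
  # stuck threads moving again.
  gdb -p "$(pidof mariadbd)" -batch -ex detach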
Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs
Package: src:linux
Version: 6.1.123-1
Severity: important
X-Debbugs-Cc: x...@charbonnet.com

Dear Maintainer,

One of my machines runs mariadb as an asynchronous slave of a
production database. Every day it makes an LVM snapshot of its database
partition, then starts a secondary instance of mariadb with the
snapshot as its data drive. It then outputs everything, piped through
bzip3, to an archive file.

This has worked for a long time but recently started failing: the
secondary mariadb instance will claim to still be executing a SELECT,
but no data will ever be put into the pipe.

This appears to have started after the upgrade to bookworm 12.9 on the
evening of Sunday the 12th. So I rolled back to kernel 6.1.119-1 from
linux-image-6.1.0-28-amd64. It worked perfectly. My other backup
server, which I have not yet upgraded to Debian 12.9, is also working
perfectly.

There are other weird things, too, which I have associated with the new
kernel: the mariadb replication on this machine will sometimes just
stop. It appears to be writing a commit, but I have to kill -9 it.

And over in production, two of the servers in a Galera cluster were
upgraded. The cluster has been unstable since then: every once in a
while, one of the members will just stop being able to do any writing
and I have to kill -9 the mariadb process.

But the issue which I can readily repeat any time is the one I first
brought up. Here is some data from experiments I have run:

6.1.0-28-amd64 -- replication able to catch up
Process started   2025-01-15 19:46
Canceled by user  2025-01-15 20:52   47179030576 bytes, still being written
(That is, this was working: still running after an hour, file still
growing.)

6.1.0-29-amd64 -- replication unable to catch up
Process started   2025-01-16 11:31
Canceled by user  2025-01-16 11:46   1162798112 bytes, last write 2025-01-16 11:36
(That is, mariadb stopped outputting any data after 5 minutes and had
to be kill -9'd)

Attempt 2 started 2025-01-16 13:02
Canceled by user  2025-01-16 13:19   2149409328 bytes, last write 2025-01-16 13:10
(This one lasted 8 minutes)

Attempt 3 started 2025-01-16 13:52
Canceled by user  2025-01-16 14:14   3352755856 bytes, last write 2025-01-16 14:07
(Got 15 minutes this time)

6.1.0-30-amd64 -- unsure about replication performance
Process started   2025-01-16 11:51
Canceled by user  2025-01-16 12:14   2956803424 bytes, last write 2025-01-16 12:04
(The issue persists in 6.1.124-1: hung after 13 minutes)

Please let me know what other information I can provide. As indicated,
I can make this problem happen at will, which might be helpful for
troubleshooting. Thank you!

-- Package-specific info:
** Version:
Linux version 6.1.0-29-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.123-1 (2025-01-02)

** Command line:
BOOT_IMAGE=/vmlinuz-6.1.0-29-amd64 root=UUID=ac0f357b-d33c-418b-9451-b61a7b60d592 ro quiet

** Not tainted

** Kernel log:
[   24.398913] systemd[1]: Starting systemd-journald.service - Journal Service...
[   24.409135] systemd[1]: Starting systemd-modules-load.service - Load Kernel Modules...
[   24.409745] systemd[1]: Starting systemd-remount-fs.service - Remount Root and Kernel File Systems...
[   24.410343] systemd[1]: Starting systemd-udev-trigger.service - Coldplug All udev Devices...
[   24.411245] systemd[1]: Finished kmod-static-nodes.service - Create List of Static Device Nodes.
[   24.416991] systemd[1]: modprobe@efi_pstore.service: Deactivated successfully.
[   24.417110] systemd[1]: Finished modprobe@efi_pstore.service - Load Kernel Module efi_pstore.
[   24.431690] systemd[1]: modprobe@configfs.service: Deactivated successfully.
[   24.431806] systemd[1]: Finished modprobe@configfs.service - Load Kernel Module configfs.
[   24.432453] systemd[1]: Mounting sys-kernel-config.mount - Kernel Configuration File System...
[   24.485775] loop: module loaded
[   24.489145] ACPI: bus type drm_connector registered
[   24.489198] systemd[1]: modprobe@loop.service: Deactivated successfully.
[   24.489327] systemd[1]: Finished modprobe@loop.service - Load Kernel Module loop.
[   24.489939] systemd[1]: modprobe@drm.service: Deactivated successfully.
[   24.490048] systemd[1]: Finished modprobe@drm.service - Load Kernel Module drm.
[   24.491592] systemd[1]: Finished systemd-modules-load.service - Load Kernel Modules.
[   24.492253] systemd[1]: Starting systemd-sysctl.service - Apply Kernel Variables...
[   24.503090] fuse: init (API version 7.38)
[   24.503584] systemd[1]: modprobe@fuse.service: Deactivated successfully.
[   24.503712] systemd[1]: Finished modprobe@fuse.service - Load Kernel Module fuse.
[   24.504491] systemd[1]: Mounting sys-fs-fuse-connections.mount - FUSE Control File System...
[   24.506343] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[   24.506364] device-mapper: uevent: version
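For anyone trying to reproduce this under lab conditions, the reporter's
daily backup job is roughly the following pipeline. This is a
hypothetical reconstruction from the description in the report; the
volume names, mount point, socket path, port, and output path are all
invented for illustration:

  #!/bin/sh
  # Snapshot the database LV, start a secondary mariadbd on the
  # snapshot, and dump everything through bzip3 (as described in the
  # report). All names and paths below are illustrative.
  lvcreate --snapshot --size 10G --name dbsnap /dev/vg0/mariadb
  mount /dev/vg0/dbsnap /mnt/dbsnap
  mariadbd --datadir=/mnt/dbsnap --socket=/run/mysqld/snap.sock --port=3307 &
  sleep 10   # crude wait for the secondary instance to accept connections
  mariadb-dump --socket=/run/mysqld/snap.sock --all-databases \
      | bzip3 > /backup/db-$(date +%F).sql.bz3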