Processed: Re: Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs

2025-01-22 Thread Debian Bug Tracking System
Processing control commands:

> forwarded -1 https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886
Bug #1093243 [src:linux] linux-image-6.1.0-29-amd64 causes mariadb hangs
Set Bug forwarded-to-address to 
'https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886'.

-- 
1093243: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093243
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs

2025-01-22 Thread Salvatore Bonaccorso
Control: forwarded -1 https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886
Hi,

On Tue, Jan 21, 2025 at 08:06:18PM +0100, Bernhard Schmidt wrote:
> Control: affects -1 src:mariadb
> Control: tags -1 + confirmed
> Control: severity -1 critical
> 
> Seeing this too. We have two standalone systems running the stock
> bookworm MariaDB and the open-source network management system LibreNMS,
> which is quite write-heavy. After some time (sometimes a couple of
> hours, sometimes 1-2 days) all connection slots to the database are
> full.
> 
> When you kill one client process you can connect and issue "SHOW
> PROCESSLIST"; all slots are busy with simple UPDATE/SELECT queries
> that have been running for hours. You need to SIGKILL mariadbd to
> recover.
> 
> Over the last two days our colleagues running a Galera cluster (version
> unknown, we are inquiring) have been affected by this as well. They found
> a MariaDB bug report about this:
> 
> https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886?filter=allopenissues
> 
> Since there have been reports about data loss I think it warrants
> increasing the severity to critical.
> 
> I'm not 100% sure about -30 though: we downgraded the production
> system to -28 and upgraded the test system to -30, and both are
> working fine. The test system has less load though, and I trust the
> reports here that -30 is still broken.

I would be interested to know if someone is able to reproduce the
issue more reliably under lab conditions, which would enable us to
bisect it.

As a start I have set the above issue as the forwarded-to address, so
the two reports are linked (we can later update it to point to the
Linux upstream report).

Regards,
Salvatore



Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs

2025-01-21 Thread Max Jakub Ried
Dear all,

I have found that, at least in my case, sending a SIGSTOP followed by a
SIGCONT to the MariaDB process, i.e.,

kill -STOP $(pgrep -f mariadb) ; kill -CONT $(pgrep -f mariadb)

is sufficient to bring it back to life. I believe this approach has fewer
side effects than using SIGKILL. Attaching gdb and then detaching it also
works; that is how I discovered this.
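
In case it helps, below is a rough sketch of how one could automate this
as a watchdog. The threshold, the information_schema check, and the pgrep
selection are illustrative placeholders (and it assumes at least one free
connection slot is still available for the check query), so treat it as a
sketch rather than something I actually run:

#!/bin/sh
# Hypothetical watchdog sketch: if any query has been running for longer
# than THRESHOLD seconds, apply the STOP/CONT workaround to the server.
# Assumes a free connection slot so the check query can still run.
THRESHOLD=600
STUCK=$(mariadb -N -B -e "SELECT COUNT(*) FROM information_schema.PROCESSLIST
                          WHERE COMMAND NOT IN ('Sleep', 'Daemon')
                            AND TIME > $THRESHOLD")
if [ "${STUCK:-0}" -gt 0 ]; then
    PID=$(pgrep -o -x mariadbd)   # oldest process named exactly 'mariadbd'
    kill -STOP "$PID"
    kill -CONT "$PID"
fi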


Best regards,
Max




On Tue, 21 Jan 2025 20:06:18 +0100 Bernhard Schmidt  wrote:

> When you kill one client process you can connect and issue "SHOW
> PROCESSLIST"; all slots are busy with simple UPDATE/SELECT queries
> that have been running for hours. You need to SIGKILL mariadbd to
> recover.


Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs

2025-01-21 Thread Bernhard Schmidt
Control: affects -1 src:mariadb
Control: tags -1 + confirmed
Control: severity -1 critical

Seeing this too. We have two standalone systems running the stock
bookworm MariaDB and the open-source network management system LibreNMS,
which is quite write-heavy. After some time (sometimes a couple of
hours, sometimes 1-2 days) all connection slots to the database are
full.

When you kill one client process you can connect and issue "SHOW
PROCESSLIST"; all slots are busy with simple UPDATE/SELECT queries
that have been running for hours. You need to SIGKILL mariadbd to
recover.
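
To spell that step out as commands (purely illustrative: it assumes the
clients connect over TCP, and the port and PID are placeholders for
whichever stuck client you pick):

# List local processes with established connections to the database port.
ss -tnp '( dport = :3306 )'
CLIENT_PID=12345                     # placeholder: PID of one stuck client
kill "$CLIENT_PID"                   # frees a single connection slot
mariadb -e "SHOW FULL PROCESSLIST"   # slots busy with long-running queries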

Over the last two days our colleagues running a Galera cluster (version
unknown, we are inquiring) have been affected by this as well. They found
a MariaDB bug report about this:

https://jira.mariadb.org/projects/MDEV/issues/MDEV-35886?filter=allopenissues

Since there have been reports about data loss I think it warrants
increasing the severity to critical.

I'm not 100% sure about -30 though: we downgraded the production
system to -28 and upgraded the test system to -30, and both are
working fine. The test system has less load though, and I trust the
reports here that -30 is still broken.



Processed: Re: Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs

2025-01-21 Thread Debian Bug Tracking System
Processing control commands:

> affects -1 src:mariadb
Bug #1093243 [src:linux] linux-image-6.1.0-29-amd64 causes mariadb hangs
Added indication that 1093243 affects src:mariadb
> tags -1 + confirmed
Bug #1093243 [src:linux] linux-image-6.1.0-29-amd64 causes mariadb hangs
Added tag(s) confirmed.
> severity -1 critical
Bug #1093243 [src:linux] linux-image-6.1.0-29-amd64 causes mariadb hangs
Severity set to 'critical' from 'important'

-- 
1093243: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093243
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs

2025-01-20 Thread Volker Maibaum

Hi,

we can confirm this issue on our systems (Debian 12) as well.

It appeared after updating the kernel from 6.1.0-28-amd64 to 6.1.0-30-amd64.

We saw database requests that were never processed. Connecting to MariaDB
was still possible, but some table updates hung, resulting in locked tables.

The issue is fully reproducible on multiple identical systems when updating
to kernel 6.1.0-30-amd64, and it vanishes after booting the old kernel
6.1.0-28-amd64.


--
E-Mail: maib...@dfn.de | Fon: +49 711 63314-219 | Fax: +49 30884299-370
__

DFN - Deutsches Forschungsnetz | German National Research and Education Network
Verein zur Förderung eines Deutschen Forschungsnetzes e.V.
Alexanderplatz 1 | 10178 Berlin
https://www.dfn.de

Board: Prof. Dr.-Ing. Stefan Wesner | Prof. Dr. Helmut Reiser | Christian Zens
Management: Dr. Christian Grimm | Jochem Pattloch
VR AG Charlottenburg 7729B | USt.-ID. DE 136623822





Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs

2025-01-19 Thread Xan Charbonnet

Max,

Good to know it's not just me.  Am I correct that your 10.11.6 MariaDB 
binaries are the Debian-provided ones?  I'm running the Debian packages 
provided by MariaDB itself, version 10.11.9.  Hopefully it's a useful data 
point that it's happening with both.  I might file a MariaDB bug 
report too.


-Xan



Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs

2025-01-17 Thread Max Jakub Ried

Dear all,

I'm experiencing exactly the same issue with 6.1.0-29-amd64 and 
6.1.0-30-amd64. MariaDB is just a standalone instance (mariadb Ver 15.1 
Distrib 10.11.6-MariaDB), and it happens on two different VMs. When I use 
gdb to attach to the stuck MariaDB, the stack trace shows a lot of threads 
waiting in __futex_abstimed_wait_common64. As soon as I detach the 
debugger, MariaDB continues to work.
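
For reference, a minimal sketch of the attach/detach step that also captures
the backtraces (the PID selection and output file are illustrative, not my
exact invocation):

# Attach non-interactively, dump all thread backtraces, then detach.
# pgrep -o -x mariadbd picks the oldest process named exactly 'mariadbd',
# assumed here to be the main server process.
gdb -p "$(pgrep -o -x mariadbd)" -batch \
    -ex "set pagination off" \
    -ex "thread apply all bt" \
    -ex detach > mariadbd-backtraces.txt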


Thank you!

On Thu, 16 Jan 2025 14:54:52 -0600 Xan Charbonnet wrote:


> Package: src:linux
> Version: 6.1.123-1
> Severity: important
> X-Debbugs-Cc: x...@charbonnet.com
>
> Dear Maintainer,
>
> One of my machines runs mariadb as an asynchronous slave of a production
> database.  Every day it makes an LVM snapshot of its database partition, then
> starts a secondary instance of mariadb with the snapshot as its data drive.
> It then outputs everything, piped through bzip3, to an archive file.
>
> This has worked for a long time but recently started failing: the secondary
> mariadb instance will claim to still be executing a SELECT, but no data will
> ever be put into the pipe.
>
> This appears to have started after upgrading to bookworm 12.9 on the evening
> of Sunday the 12th.  So I rolled back to kernel 6.1.119-1 from
> linux-image-6.1.0-28-amd64.  It worked perfectly.  My other backup server,
> which I have not yet upgraded to Debian 12.9, is also working perfectly.
>
> There are other weird things too which I have associated with the new kernel:
> the mariadb replication on this machine will sometimes just stop.  It appears
> to be writing a commit, but I have to kill -9 it.
>
> And over in production, two of the servers in a Galera cluster were upgraded.
> The cluster has been unstable since then: every once in a while, one of the
> members will just stop being able to do any writing and I have to kill -9 the
> mariadb process.
>
> But the issue which I can readily repeat any time is the one I first brought
> up.  Here is some data from experiments I have run:
>
> 6.1.0-28-amd64 -- replication able to catch up
> Process started 2025-01-15 19:46
> Canceled by user 2025-01-15 20:52 47179030576 bytes, still being written
> (That is, this was working: still running after an hour, file still growing.)
>
> 6.1.0-29-amd64 -- replication unable to catch up
> Process started 2025-01-16 11:31
> Canceled by user 2025-01-16 11:46
> 1162798112 bytes, last write 2025-01-16 11:36
> (That is, mariadb stopped outputting any data after 5 minutes and had to be
> kill -9'd)
>
> Attempt 2 started 2025-01-16 13:02
> Canceled by user 2025-01-16 13:19
> 2149409328 bytes, last write 2025-01-16 13:10
> (This one lasted 8 minutes)
>
> Attempt 3 started 2025-01-16 13:52
> Canceled by user 2025-01-16 14:14
> 3352755856 bytes, last write 2025-01-16 14:07
> (Got 15 minutes this time)
>
> 6.1.0-30-amd64 -- unsure about replication performance
> Process started 2025-01-16 11:51
> Canceled by user 2025-01-16 12:14
> 2956803424 bytes, last write 2025-01-16 12:04
> (The issue persists in 6.1.124-1: hung after 13 minutes)

--
Max Ried
Heinrich-Heine-Universität Düsseldorf
Institut für Informatik
Systemadministration
Gebäude 25.12, Raum 01.28
Universitätsstr. 1
40225 Düsseldorf
Tel: +49 211 81 10715

Thread 39 (Thread 0x7ff2705596c0 (LWP 98731) "mariadbd"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, 
abstime=0x7ff270558c80, op=393, expected=0, futex_word=0x55cb6ac9428c 
) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55cb6ac9428c 
, expected=expected@entry=0, clockid=clockid@entry=0, 
abstime=abstime@entry=0x7ff270558c80, private=private@entry=0, 
cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2  0x7ff29b4a4f7b in __GI___futex_abstimed_wait_cancelable64 
(futex_word=futex_word@entry=0x55cb6ac9428c , 
expected=expected@entry=0, clockid=clockid@entry=0, 
abstime=abstime@entry=0x7ff270558c80, private=private@entry=0) at 
./nptl/futex-internal.c:139
#3  0x7ff29b4a78bc in __pthread_cond_wait_common (abstime=0x7ff270558c80, 
clockid=0, mutex=0x55cb6ac942d0 , cond=0x55cb6ac94260 
) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_timedwait64 (cond=0x55cb6ac94260 , 
mutex=0x55cb6ac942d0 , abstime=0x7ff270558c80) at 
./nptl/pthread_cond_wait.c:643
#5  0x55cb69d0accc in do_handle_one_connection(CONNECT*, bool) ()
#6  0x55cb69d0af8d in handle_one_connection ()
#7  0x55cb6a024440 in ?? ()
#8  0x7ff29b4a81c4 in start_thread (arg=) at 
./nptl/pthread_create.c:442
#9  0x7ff29b52885c in clone3 () at 
../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 38 (Thread 0x7ff27827c6c0 (LWP 98722) "mariadbd"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, 
op=393, expected=0, futex_word=0x55cb6abe62a4) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55cb6abe62a4, 
expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, 
private=private@entry=0, cancel=cancel@entry=true) 

Bug#1093243: linux-image-6.1.0-29-amd64 causes mariadb hangs

2025-01-16 Thread Xan Charbonnet
Package: src:linux
Version: 6.1.123-1
Severity: important
X-Debbugs-Cc: x...@charbonnet.com

Dear Maintainer,

One of my machines runs mariadb as an asynchronous slave of a production
database.  Every day it makes an LVM snapshot of its database partition, then
starts a secondary instance of mariadb with the snapshot as its data drive.
It then outputs everything, piped through bzip3, to an archive file.
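
For clarity, here is a rough sketch of that nightly procedure; the volume
group, snapshot size, mount point, socket, port, and output path below are
placeholders rather than my actual configuration:

#!/bin/sh
# Illustrative sketch only; all names, sizes, and paths are placeholders.
lvcreate --snapshot --name mariadb-snap --size 20G /dev/vg0/mariadb
mount /dev/vg0/mariadb-snap /mnt/mariadb-snap

# Start a secondary server instance on the snapshot (a real script would
# wait until the socket appears before dumping).
mariadbd --datadir=/mnt/mariadb-snap \
         --socket=/run/mariadb-snap.sock --port=3307 &

# Dump everything, piped through bzip3, into the archive file.
mariadb-dump --socket=/run/mariadb-snap.sock --all-databases \
    | bzip3 > /backup/db-$(date +%F).sql.bz3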

This has worked for a long time but recently started failing: the secondary
mariadb instance will claim to still be executing a SELECT, but no data will
ever be put into the pipe.

This appears to have started after upgrading to bookworm 12.9 on the evening of
Sunday the 12th.  So I rolled back to kernel 6.1.119-1 from
linux-image-6.1.0-28-amd64.  It worked perfectly.  My other backup server,
which I have not yet upgraded to Debian 12.9, is also working perfectly.

There are other weird things too which I have associated with the new kernel:
the mariadb replication on this machine will sometimes just stop.  It appears
to be writing a commit, but I have to kill -9 it.

And over in production, two of the servers in a Galera cluster were upgraded.
The cluster has been unstable since then: every once in a while, one of the
members will just stop being able to do any writing and I have to kill -9 the
mariadb process.

But the issue which I can readily repeat any time is the one I first brought
up.  Here is some data from experiments I have run:

6.1.0-28-amd64 -- replication able to catch up
Process started 2025-01-15 19:46
Canceled by user 2025-01-15 20:52 47179030576 bytes, still being written
(That is, this was working: still running after an hour, file still growing.)

6.1.0-29-amd64 -- replication unable to catch up
Process started 2025-01-16 11:31
Canceled by user 2025-01-16 11:46
1162798112 bytes, last write 2025-01-16 11:36
(That is, mariadb stopped outputting any data after 5 minutes and had to be
kill -9'd)

Attempt 2 started 2025-01-16 13:02
Canceled by user 2025-01-16 13:19
2149409328 bytes, last write 2025-01-16 13:10
(This one lasted 8 minutes)

Attempt 3 started 2025-01-16 13:52
Canceled by user 2025-01-16 14:14
3352755856 bytes, last write 2025-01-16 14:07
(Got 15 minutes this time)

6.1.0-30-amd64 -- unsure about replication performance
Process started 2025-01-16 11:51
Canceled by user 2025-01-16 12:14
2956803424 bytes, last write 2025-01-16 12:04
(The issue persists in 6.1.124-1: hung after 13 minutes)


Please let me know what other information I can provide.  As indicated, I can
make this problem happen at will, which might be helpful for troubleshooting.

Thank you!





-- Package-specific info:
** Version:
Linux version 6.1.0-29-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 
12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP 
PREEMPT_DYNAMIC Debian 6.1.123-1 (2025-01-02)

** Command line:
BOOT_IMAGE=/vmlinuz-6.1.0-29-amd64 
root=UUID=ac0f357b-d33c-418b-9451-b61a7b60d592 ro quiet

** Not tainted

** Kernel log:
[   24.398913] systemd[1]: Starting systemd-journald.service - Journal 
Service...
[   24.409135] systemd[1]: Starting systemd-modules-load.service - Load Kernel 
Modules...
[   24.409745] systemd[1]: Starting systemd-remount-fs.service - Remount Root 
and Kernel File Systems...
[   24.410343] systemd[1]: Starting systemd-udev-trigger.service - Coldplug All 
udev Devices...
[   24.411245] systemd[1]: Finished kmod-static-nodes.service - Create List of 
Static Device Nodes.
[   24.416991] systemd[1]: modprobe@efi_pstore.service: Deactivated 
successfully.
[   24.417110] systemd[1]: Finished modprobe@efi_pstore.service - Load Kernel 
Module efi_pstore.
[   24.431690] systemd[1]: modprobe@configfs.service: Deactivated successfully.
[   24.431806] systemd[1]: Finished modprobe@configfs.service - Load Kernel 
Module configfs.
[   24.432453] systemd[1]: Mounting sys-kernel-config.mount - Kernel 
Configuration File System...
[   24.485775] loop: module loaded
[   24.489145] ACPI: bus type drm_connector registered
[   24.489198] systemd[1]: modprobe@loop.service: Deactivated successfully.
[   24.489327] systemd[1]: Finished modprobe@loop.service - Load Kernel Module 
loop.
[   24.489939] systemd[1]: modprobe@drm.service: Deactivated successfully.
[   24.490048] systemd[1]: Finished modprobe@drm.service - Load Kernel Module 
drm.
[   24.491592] systemd[1]: Finished systemd-modules-load.service - Load Kernel 
Modules.
[   24.492253] systemd[1]: Starting systemd-sysctl.service - Apply Kernel 
Variables...
[   24.503090] fuse: init (API version 7.38)
[   24.503584] systemd[1]: modprobe@fuse.service: Deactivated successfully.
[   24.503712] systemd[1]: Finished modprobe@fuse.service - Load Kernel Module 
fuse.
[   24.504491] systemd[1]: Mounting sys-fs-fuse-connections.mount - FUSE 
Control File System...
[   24.506343] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. 
Duplicate IMA measurements will not be recorded in the IMA log.
[   24.506364] device-mapper: uevent: version