Bug#1037223: Possible bug causing I/O hangs
Hi Salvatore,

Thanks for your response!

> From the screenshot I guess you mean 5.10.179-1, or possibly already in 5.10.178-3?

Ah yes, I took the version from https://packages.debian.org/bullseye/linux-image-amd64 which lists 5.10.178-3 as the current version. However, we are indeed running 5.10.179-1:

apt-cache policy linux-image-amd64
linux-image-amd64:
  Installed: 5.10.179-1
  Candidate: 5.10.179-1
  Version table:
     6.1.20-2~bpo11+1 100
        100 http://ftp.nl.debian.org/debian bullseye-backports/main amd64 Packages
 *** 5.10.179-1 500
        500 http://security.debian.org/debian-security bullseye-security/updates/main amd64 Packages
        100 /var/lib/dpkg/status
     5.10.178-3 500
        500 http://ftp.nl.debian.org/debian bullseye/main amd64 Packages

I can see that we upgraded from 5.10.158-2, where the issue did not occur. Interestingly this is *older* than the version you mentioned as having received the fix for the issue I thought was the cause (5.10.163).

Thank you for confirming that the possible fix I mentioned is already in the kernel we are running. This is unfortunate for me, as it means the cause remains unknown.

We have a VM where we can sort of reproduce the issue, but not reliably: it takes ~18-24 hours of stress-testing the VM before we see the issue occur. This is why bisecting will be difficult, but I understand it is very helpful if we are able to do so.

I will report back if I have any additional information.

Best regards,
Niels Hendriks

From: Salvatore Bonaccorso
To: Niels Hendriks, <1037...@bugs.debian.org>
Sent: 08/06/2023 9:23 PM
Subject: Re: Bug#1037223: Possible bug causing I/O hangs

Control: tags -1 + moreinfo

Hi Niels,

On Thu, Jun 08, 2023 at 11:33:13AM +0200, Niels Hendriks wrote:
> Package: linux-image-amd64
> Version: 5.10.178-3

From the screenshot I guess you mean 5.10.179-1, or possibly already in 5.10.178-3?

> Hi all,
>
> I do not usually report kernel bugs so hopefully this is the right
> place!
>
> We recently updated the kernel of our Debian 11 servers and since
> then we have encountered a bunch of servers (both VMs and bare
> metal) that suffer I/O hanging issues.
> We can access the server through a console where I cannot copy text,
> but I have attached a screenshot showing the message we see in
> dmesg.
>
> We initially thought this was related to the ext4 fast_commit
> feature flag we have enabled, and we do feel the issue occurs less
> often with fast_commit disabled, but it does not appear to be solved
> completely when we disable this feature.
>
> With this error, we've been googling a bit and I ended up on this
> thread: https://www.spinics.net/lists/linux-ext4/msg86261.html
> through initially https://github.com/flatcar/Flatcar/issues/847 It
> mentions this
> fix:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/ext4?h=linux-5.15.y=5bc0b2fda4b47c86278f7c6d30c211f425bf51cf
>
> I believe this fix is currently not present in the 5.10 kernel
> available for Debian 11.

That commit is upstream commit a44e84a9b7764c72896f7241a0ec9ac7e7ef38dd, which was backported to various stable series, in particular 5.10.163 with 1be16a0c2f10186df505e28b0cc92d7f3366e2a8.

> However, the linked fix also mentions:
> > This bug has been around for many years, but it became *much* easier
> > to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr
> > blocks").
>
> Looking at the
> changelog:
> https://metadata.ftp-master.debian.org/changelogs//main/l/linux-signed-amd64/linux-signed-amd64_5.10.178+3_changelog
>
> We do see the "ext4: fix race when reusing xattr blocks" change
> being added in 5.10.178-1. This is why we believe we are now
> hitting this bug.
>
> My question is whether this seems plausible, and if so, whether the
> fix I linked can also be released for Debian 11?

Right now I do not see that to be the cause, as the above-mentioned commit *is* in the version, unless I'm misunderstanding.

> We could also upgrade to the bullseye-backports kernel, but given
> that this issue makes the system essentially unusable and we hit it
> every few days on one of our servers it may be more widespread and
> worth it to fix it in the regular bullseye kernel as well.

Did you have a 5.10.y kernel which was fine, and can you bisect the changes between that version and 5.10.179 to pinpoint the first bad commit causing the issue?

If your infrastructure is prepared to do so, next steps might involve trying the most recent 5.10.y kernel to see if it still exhibits the problem, then going up to newer stable series and/or mainline.

Please in particular test the current 5.10.182 upstream, as it has interesting ext4-related changes between 5.10.179 and 5.10.182.

Regards,
Salvatore
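To give a feel for why bisecting is expensive here, the following is a rough sketch of the arithmetic, not a measurement. It assumes bisection at the granularity of upstream 5.10.y point releases (5.10.158 known good, 5.10.179 known bad) and uses the ~18-24 hours per test run mentioned above. The helper names `point_release` and `bisect_steps` are made up for illustration. Bisecting individual commits with `git bisect` would take more steps, and because the hang is intermittent, a run without a hang does not strictly prove a kernel is good.

```python
def point_release(version: str) -> int:
    """Extract the upstream point release from a Debian kernel version,
    e.g. '5.10.179-1' -> 179 (hypothetical helper for this sketch)."""
    upstream = version.split("-", 1)[0]   # drop the Debian revision
    return int(upstream.split(".")[2])    # third component of 5.10.N


def bisect_steps(candidates: int) -> int:
    """Worst-case number of kernels to test when binary-searching
    `candidates` possible first-bad versions (ceil(log2(candidates)))."""
    if candidates < 1:
        raise ValueError("need at least one candidate")
    return (candidates - 1).bit_length()


good = point_release("5.10.158-2")   # last version reported as fine
bad = point_release("5.10.179-1")    # version showing the hangs

# Possible first-bad point releases: good+1 .. bad inclusive.
candidates = bad - good
steps = bisect_steps(candidates)

# Each test needs roughly 18-24 hours of stress-testing, per the report.
low_hours, high_hours = steps * 18, steps * 24

print(f"{candidates} candidate releases, {steps} test runs, "
      f"about {low_hours}-{high_hours} hours of stress-testing")
```

Under these assumptions the search narrows to a single point release in a handful of test runs, but each run occupies the test VM for most of a day.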
Bug#1037223: Possible bug causing I/O hangs
Package: linux-image-amd64
Version: 5.10.178-3

Hi all,

I do not usually report kernel bugs so hopefully this is the right place!

We recently updated the kernel of our Debian 11 servers and since then we have encountered a bunch of servers (both VMs and bare metal) that suffer I/O hanging issues.
We can access the server through a console where I cannot copy text, but I have attached a screenshot showing the message we see in dmesg.

We initially thought this was related to the ext4 fast_commit feature flag we have enabled, and we do feel the issue occurs less often with fast_commit disabled, but it does not appear to be solved completely when we disable this feature.

With this error, we've been googling a bit and I ended up on this thread: https://www.spinics.net/lists/linux-ext4/msg86261.html through initially https://github.com/flatcar/Flatcar/issues/847
It mentions this fix:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/ext4?h=linux-5.15.y=5bc0b2fda4b47c86278f7c6d30c211f425bf51cf

I believe this fix is currently not present in the 5.10 kernel available for Debian 11.

However, the linked fix also mentions:
> This bug has been around for many years, but it became *much* easier to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr blocks").

Looking at the changelog:
https://metadata.ftp-master.debian.org/changelogs//main/l/linux-signed-amd64/linux-signed-amd64_5.10.178+3_changelog

We do see the "ext4: fix race when reusing xattr blocks" change being added in 5.10.178-1. This is why we believe we are now hitting this bug.

My question is whether this seems plausible, and if so, whether the fix I linked can also be released for Debian 11?

We could also upgrade to the bullseye-backports kernel, but given that this issue makes the system essentially unusable and we hit it every few days on one of our servers it may be more widespread and worth it to fix it in the regular bullseye kernel as well.

Thank you!

Best regards,
Niels Hendriks
Bug#1016710: Update for Buster
Hi,

Hopefully this is the right place to ask this.

We noticed that CVE-2022-37434 shows no fixed version for Debian buster ( https://security-tracker.debian.org/tracker/CVE-2022-37434 ).

Since Bullseye received the fix more than 7 days ago, we were wondering when Buster would get an updated package. The CVSS score is 9.8, which is why we expected it to be fixed for Buster as well.

Thanks!
Niels

__
RootNet B.V.
Helpdesk: 024 3500112 (9:00 - 17:30)
Service meldingen: rootnet.network
Meldingen via Twitter: twitter.com/RootnetNL
Bug#887106: Bug#886630: linux-image-3.2.0-5-amd64 Kernel panic after upgrading when use hidepid Debian wheezy
Hi Ben,

Thanks for your response.

Is there any ETA for when the new version will be released? We'd like to patch the Meltdown vulnerability and also keep hidepid enabled, but currently the system is unusable with this kernel and hidepid enabled.

We tried running with wheezy-backports but it seems that kernel doesn't have the Meltdown patch yet. We'd prefer not to compile the kernel manually from source.

Thank you,
Niels Hendriks

On 15 January 2018 at 00:02, Ben Hutchings <b...@decadent.org.uk> wrote:
> Control: tag -1 patch
>
> On Mon, 2018-01-08 at 10:29 +0100, Camilo Echevarne wrote:
> [...]
> > After updating the linux-image-amd64 system package, when we try to
> > mount proc with the hidepid option the server throws a kernel panic.
> [...]
>
> This is a warning, not a panic (which would stop the kernel
> completely). Still, I assume that the permission denial makes it
> impractical to use the system with hidepid enabled.
>
> This problem was not caused by any of the fixes in the latest update,
> but by a fix in 3.2.93 that meant I should have updated the backport of
> the hidepid feature. However, I added a binary compatibility patch to
> avoid problems like this with any out-of-tree users of the API, and
> that hid the problem until I bumped the ABI number and removed all the
> binary compatibility patches.
>
> I'll fix this in the next upload. As a temporary measure, you can
> rebuild the kernel package with the attached patch, by following the
> instructions here:
> https://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official
>
> Ben.
>
> --
> Ben Hutchings
> The generation of random numbers is too important to be left to chance.
> - Robert Coveyou
Bug#868568: adduser - deluser command says user has running processes when user has a custom UID assigned
Package: adduser
Version: 3.115
OS: Debian 9 amd64

Linux debian 4.9.15-x86_64-linode81 #1 SMP Fri Mar 17 09:47:36 EDT 2017 x86_64 GNU/Linux

dpkg -s libc6 | grep ^Version
Version: 2.24-11

Hello,

On a clean Debian 9 amd64 install, the deluser command detects running processes belonging to the user I am trying to delete, even though no processes are running under that user. This only happens when another user has a custom UID assigned.

Steps to reproduce on a clean Debian 9 install. I did this on a Linode upon the first login with SSH as root:

    adduser foo
    adduser foo1234 --uid 101234
    adduser foo1234 sudo
    su - foo1234
    sudo su - # enter sudo password
    deluser foo

This gives the following output:

    root@debian:~# adduser foo
    Adding user `foo' ...
    Adding new group `foo' (1000) ...
    Adding new user `foo' (1000) with group `foo' ...
    Creating home directory `/home/foo' ...
    Copying files from `/etc/skel' ...
    Enter new UNIX password:
    Retype new UNIX password:
    passwd: password updated successfully
    Changing the user information for foo
    Enter the new value, or press ENTER for the default
            Full Name []:
            Room Number []:
            Work Phone []:
            Home Phone []:
            Other []:
    Is the information correct? [Y/n] y
    root@debian:~# adduser foo1234 --uid 101234
    Adding user `foo1234' ...
    Adding new group `foo1234' (101234) ...
    Adding new user `foo1234' (101234) with group `foo1234' ...
    Creating home directory `/home/foo1234' ...
    Copying files from `/etc/skel' ...
    Enter new UNIX password:
    Retype new UNIX password:
    passwd: password updated successfully
    Changing the user information for foo1234
    Enter the new value, or press ENTER for the default
            Full Name []:
            Room Number []:
            Work Phone []:
            Home Phone []:
            Other []:
    Is the information correct? [Y/n] y
    root@debian:~# adduser foo1234 sudo
    Adding user `foo1234' to group `sudo' ...
    Adding user foo1234 to group sudo
    Done.
    root@debian:~# su - foo1234
    foo1234@debian:~$ sudo su -
    [sudo] password for foo1234:
    root@debian:~# deluser foo
    Removing user `foo' ...
    Warning: group `foo' has no more members.
    userdel: user foo is currently used by process 4892
    /usr/sbin/deluser: `/usr/sbin/userdel foo' returned error code 8. Exiting.
    root@debian:~#
    root@debian:~# ps aux | grep 4892
    foo1234   4892  0.0  0.4  20936  4740 pts/0    S    17:54   0:00 -su
    root      4917  0.0  0.0  12788   940 pts/0    S+   17:55   0:00 grep 4892

As you can see, I am unable to delete the user foo because of the running process with PID 4892, which actually belongs to user foo1234.

When I do not assign a specific UID to the user foo1234, deluser works as expected. I also used specific UIDs on Debian 8, and this issue did not occur there. It seems to be new in Debian 9.

This is my first bug report to Debian, so I hope all required information is present and that you are able to reproduce it with the above steps.

Thanks,
Niels Hendriks
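The ownership check done above with `ps aux | grep 4892` can also be made against procfs directly. This is a minimal sketch, assuming a Linux system where `/proc/<pid>/status` contains a `Uid:` line with the real, effective, saved and filesystem UIDs. The function names and the sample status text are made up for illustration. Only PID 4892 and UID 101234 come from the report.

```python
from typing import Tuple


def uids_from_status(status_text: str) -> Tuple[int, int, int, int]:
    """Return (real, effective, saved, filesystem) UIDs parsed from the
    contents of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            fields = line.split()[1:5]
            real, effective, saved, fs = (int(f) for f in fields)
            return real, effective, saved, fs
    raise ValueError("no Uid: line found")


def read_status(pid: int) -> str:
    """Read /proc/<pid>/status (Linux only; not called in this sketch)."""
    with open(f"/proc/{pid}/status", encoding="ascii") as handle:
        return handle.read()


# Illustrative stand-in for what /proc/4892/status might contain; the
# process name and layout are assumptions, the UID is from the report.
SAMPLE_STATUS = (
    "Name:\tbash\n"
    "Pid:\t4892\n"
    "Uid:\t101234\t101234\t101234\t101234\n"
    "Gid:\t101234\t101234\t101234\t101234\n"
)

real_uid = uids_from_status(SAMPLE_STATUS)[0]
print(f"process is owned by UID {real_uid}")
```

On the affected machine, `uids_from_status(read_status(4892))` would be expected to show UID 101234 (foo1234) rather than 1000 (foo), matching the `ps` output.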