Bug#1037223: Possible bug causing I/O hangs

2023-06-08 Thread Niels Hendriks
Hi Salvatore,


Thanks for your response!


> From the screenshot I guess you mean 5.10.179-1, or possibly already
> in 5.10.178-3?


Ah yes, I took the version from 
https://packages.debian.org/bullseye/linux-image-amd64 which mentions 
5.10.178-3 as the current version, however indeed we are running 5.10.179-1:



apt-cache policy linux-image-amd64
linux-image-amd64:
  Installed: 5.10.179-1
  Candidate: 5.10.179-1
  Version table:
     6.1.20-2~bpo11+1 100
        100 http://ftp.nl.debian.org/debian bullseye-backports/main amd64 
Packages
 *** 5.10.179-1 500
        500 http://security.debian.org/debian-security 
bullseye-security/updates/main amd64 Packages
        100 /var/lib/dpkg/status
     5.10.178-3 500
        500 http://ftp.nl.debian.org/debian bullseye/main amd64 Packages


I can see that we upgraded from 5.10.158-2, where the issue did not occur. 
Interestingly this is *older* than the version you mentioned as having received 
the fix for the issue I thought was the cause (5.10.163).
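
The version reasoning above can be sketched in Python. This is only an
illustration: the helper names are hypothetical, the parsing handles plain
numeric versions only (no epochs or letters), and it assumes a Debian package
contains every upstream stable commit up to its point release.

```python
def upstream_tuple(debian_version):
    """Turn '5.10.179-1' into (5, 10, 179) by dropping the Debian revision."""
    upstream, sep, _revision = debian_version.rpartition("-")
    if not sep:
        # No Debian revision present, the whole string is the upstream version.
        upstream = debian_version
    return tuple(int(part) for part in upstream.split("."))


# Upstream stable release that received the backported fix (per Salvatore).
FIX_RELEASE = (5, 10, 163)


def contains_fix(debian_version):
    """True if the package's upstream release is at or after the fix release."""
    return upstream_tuple(debian_version) >= FIX_RELEASE


for version in ("5.10.158-2", "5.10.178-3", "5.10.179-1"):
    print(version, contains_fix(version))
```

Under these assumptions, 5.10.158-2 predates the fix while both 5.10.178-3 and
5.10.179-1 include it.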


Thank you for confirming that the possible fix I mentioned is already in the 
kernel we are running. Unfortunately, that means the cause remains 
unknown.


We have a VM where we can sort of reproduce the issue, but not reliably: it 
takes ~18-24 hours of stress-testing the VM before we see the issue occur. 
This is why bisecting will be difficult, but I understand it would be very 
helpful if we are able to do so.
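
To give a rough idea of the effort, here is a small Python sketch. It assumes
we bisect over upstream point releases only (5.10.158 known good, 5.10.179
known bad), that every test run takes the full 18-24 hours, and that a run
without a hang counts as "good", which is not guaranteed for an intermittent
issue. The function name is hypothetical.

```python
def worst_case_bisect_steps(candidates):
    """Worst-case number of test runs to find the first bad item among
    `candidates` ordered items, where the last one is already known bad."""
    if candidates < 1:
        raise ValueError("need at least one candidate")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (candidates - 1).bit_length()


GOOD = 158  # last upstream 5.10.y point release known to be fine
BAD = 179   # upstream 5.10.y point release where we see the hangs

candidates = BAD - GOOD  # releases 5.10.159 .. 5.10.179
steps = worst_case_bisect_steps(candidates)

print(f"{candidates} candidate releases, up to {steps} test runs")
print(f"roughly {steps * 18}-{steps * 24} hours of stress-testing")
```

Bisecting individual commits with git bisect would take more runs, since there
are far more commits than point releases.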


I will report back if I have any additional information.


Best regards,
Niels Hendriks



 From:   Salvatore Bonaccorso  
 To:   Niels Hendriks , <1037...@bugs.debian.org> 
 Sent:   08/06/2023 9:23 PM 
 Subject:   Re: Bug#1037223: Possible bug causing I/O hangs 

Control: tags -1 + moreinfo 
 
Hi Niels, 
 
On Thu, Jun 08, 2023 at 11:33:13AM +0200, Niels Hendriks wrote: 
> Package: linux-image-amd64 
> Version: 5.10.178-3 
 
From the screenshot I guess you mean 5.10.179-1, or possibly already 
in 5.10.178-3? 
>  
>  
> Hi all, 
>  
> I do not usually report kernel bugs so hopefully this is the right 
> place! 
>  
> We recently updated the kernel of our Debian 11 servers and since 
> then we have encountered a bunch of servers (both VMs and bare 
> metal) that suffer I/O hanging issues. 
> We can access the server through a console where I cannot copy text, 
> but I have attached a screenshot showing the message we see in 
> dmesg. 
>  
> We initially thought this was related to the ext4 fast_commit 
> feature flag we have enabled, and we do feel the issue occurs less 
> often with fast_commit disabled, but it does not appear to be solved 
> completely when we disable this feature. 
>  
> With this error, we've been googling a bit and I ended up on this 
> thread: https://www.spinics.net/lists/linux-ext4/msg86261.html 
> through initially https://github.com/flatcar/Flatcar/issues/847 It 
> mentions this 
> fix: 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/ext4?h=linux-5.15.y&id=5bc0b2fda4b47c86278f7c6d30c211f425bf51cf
>  
> I believe this fix is currently not present in the 5.10 kernel 
> available for Debian 11. 
 
That commit is upstream commit 
a44e84a9b7764c72896f7241a0ec9ac7e7ef38dd, which was backported to 
various stable series, in particular 5.10.163 with 
1be16a0c2f10186df505e28b0cc92d7f3366e2a8 . 
>  
> However, the linked fix also mentions: 
> > This bug has been around for many years, but it became *much* easier 
> > to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr 
> > blocks"). 
>  
> Looking at the 
> changelog: 
> https://metadata.ftp-master.debian.org/changelogs//main/l/linux-signed-amd64/linux-signed-amd64_5.10.178+3_changelog
>  
> We do see the "ext4: fix race when reusing xattr blocks" change 
> being added in 5.10.178-1.  This is why we believe we are now 
> hitting this bug. 
>  
> My question is whether this seems plausible, and if so, whether the 
> fix I linked can also be released for Debian 11? 
 
Right now I do not see that to be the cause, as the above mentioned 
commit *is* in the version, unless I'm misunderstanding. 
>  
> We could also upgrade to the bullseye-backports kernel, but given 
> that this issue makes the system essentially unusable and we hit it 
> every few days on one of our servers it may be more widespread and 
> worth it to fix it in the regular bullseye kernel as well. 
 
Did you have a 5.10.y kernel which was fine, and can you bisect the 
changes between that version and 5.10.179 to pinpoint the first bad 
commit causing the issue? 
 
If your infrastructure is prepared to do so, next steps might involve 
trying the most recent 5.10.y kernel to see if it still exhibits the 
problem, then going up to newer stable series and/or mainline. 
 
Please in particular test the current 5.10.182 upstream as it has 
interesting ext4 related changes between 5.10.179 and 5.10.182. 
 
Regards, 
Salvatore 


Bug#1037223: Possible bug causing I/O hangs

2023-06-08 Thread Niels Hendriks
Package: linux-image-amd64
Version: 5.10.178-3


Hi all,

I do not usually report kernel bugs so hopefully this is the right place!

We recently updated the kernel of our Debian 11 servers and since then we have 
encountered a bunch of servers (both VMs and bare metal) that suffer I/O 
hanging issues.
We can access the server through a console where I cannot copy text, but I have 
attached a screenshot showing the message we see in dmesg.

We initially thought this was related to the ext4 fast_commit feature flag we 
have enabled, and we do feel the issue occurs less often with fast_commit 
disabled, but it does not appear to be solved completely when we disable this 
feature.

With this error, we've been googling a bit and I ended up on this thread: 
https://www.spinics.net/lists/linux-ext4/msg86261.html through initially 
https://github.com/flatcar/Flatcar/issues/847
It mentions this fix: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/ext4?h=linux-5.15.y&id=5bc0b2fda4b47c86278f7c6d30c211f425bf51cf
I believe this fix is currently not present in the 5.10 kernel available for 
Debian 11.

However, the linked fix also mentions:
> This bug has been around for many years, but it became *much* easier
> to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr
> blocks").

Looking at the changelog: 
https://metadata.ftp-master.debian.org/changelogs//main/l/linux-signed-amd64/linux-signed-amd64_5.10.178+3_changelog
We do see the "ext4: fix race when reusing xattr blocks" change being added in 
5.10.178-1.
This is why we believe we are now hitting this bug.

My question is whether this seems plausible, and if so, whether the fix I 
linked can also be released for Debian 11?

We could also upgrade to the bullseye-backports kernel, but given that this 
issue makes the system essentially unusable and we hit it every few days on one 
of our servers it may be more widespread and worth it to fix it in the regular 
bullseye kernel as well.

Thank you!
Best regards,


Niels Hendriks

Bug#1016710: Update for Buster

2022-09-04 Thread Niels Hendriks
Hi,


Hopefully this is the right place to ask this.
We noticed that CVE-2022-37434 shows no fixed version for Debian buster ( 
https://security-tracker.debian.org/tracker/CVE-2022-37434 )


Since Bullseye received the fix more than 7 days ago, we were wondering when 
Buster would get an updated package.
The CVSS score is 9.8, that's why we thought it would also be fixed for Buster.


Thanks!
Niels

__
RootNet B.V.

Helpdesk: 024 3500112 (9:00 - 17:30)
Service notifications: rootnet.network
Notifications via Twitter: twitter.com/RootnetNL

Bug#887106: Bug#886630: linux-image-3.2.0-5-amd64 Kernel panic after upgrading when use hidepid Debian wheezy

2018-01-16 Thread Niels Hendriks
Hi Ben,

Thanks for your response. Is there any ETA for when the new version will be
released? We'd like to patch the Meltdown vulnerability and also keep
hidepid enabled, but currently the system is unusable with this kernel and
hidepid enabled. We tried running with wheezy-backports but it seems that
kernel doesn't have the Meltdown patch yet. We'd prefer not to compile the
kernel manually from source.

Thank you,
Niels Hendriks


On 15 January 2018 at 00:02, Ben Hutchings <b...@decadent.org.uk> wrote:

> Control: tag -1 patch
>
> On Mon, 2018-01-08 at 10:29 +0100, Camilo Echevarne wrote:
> [...]
> > After updating the linux-image-amd64 system package, when we try to
> > mount proc with the hidepid option the server  throws a kernel panic.
> [...]
>
> This is a warning, not a panic (which would stop the kernel
> completely).  Still, I assume that the permission denial makes it
> impractical to use the system with hidepid enabled.
>
> This problem was not caused by any of the fixes in the latest update,
> but by a fix in 3.2.93 that meant I should have updated the backport of
> the hidepid feature.  However, I added a binary compatibility patch to
> avoid problems like this with any out-of-tree users of the API, and
> that hid the problem until I bumped the ABI number and removed all the
> binary compatibility patches.
>
> I'll fix this in the next upload.  As a temporary measure, you can
> rebuild the kernel package with the attached patch, by following the
> instructions here:
> https://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official
>
> Ben.
>
> --
> Ben Hutchings
> The generation of random numbers is too important to be left to chance.
> - Robert Coveyou
>


Bug#868568: adduser - deluser command says user has running processes when user has a custom UID assigned

2017-07-16 Thread Niels Hendriks
Package: adduser
Version: 3.115
OS: Debian 9 amd64

Linux debian 4.9.15-x86_64-linode81 #1 SMP Fri Mar 17 09:47:36 EDT 2017
x86_64 GNU/Linux

dpkg -s libc6 | grep ^Version
Version: 2.24-11

Hello,

On a clean Debian 9 amd64 install the deluser command detects running
processes from the user I am trying to delete, while no processes are
running under that user. This is only the case when the user has a special
UID assigned.

Steps to reproduce on a clean Debian 9 install. I did this on a Linode upon
the first login with SSH as root:

adduser foo
adduser foo1234 --uid 101234
adduser foo1234 sudo
su - foo1234
sudo su -
# enter sudo password
deluser foo

This gives the following output:

root@debian:~# adduser foo
Adding user `foo' ...
Adding new group `foo' (1000) ...
Adding new user `foo' (1000) with group `foo' ...
Creating home directory `/home/foo' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for foo
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
root@debian:~# adduser foo1234 --uid 101234
Adding user `foo1234' ...
Adding new group `foo1234' (101234) ...
Adding new user `foo1234' (101234) with group `foo1234' ...
Creating home directory `/home/foo1234' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for foo1234
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
root@debian:~# adduser foo1234 sudo
Adding user `foo1234' to group `sudo' ...
Adding user foo1234 to group sudo
Done.
root@debian:~# su - foo1234
foo1234@debian:~$ sudo su -
[sudo] password for foo1234:
root@debian:~# deluser foo
Removing user `foo' ...
Warning: group `foo' has no more members.
userdel: user foo is currently used by process 4892
/usr/sbin/deluser: `/usr/sbin/userdel foo' returned error code 8. Exiting.
root@debian:~#
root@debian:~# ps aux | grep 4892
foo1234   4892  0.0  0.4  20936  4740 pts/0    S    17:54   0:00 -su
root      4917  0.0  0.0  12788   940 pts/0    S+   17:55   0:00 grep 4892


As you can see, I am unable to delete the user foo because of the running
process with PID 4892, which actually belongs to user foo1234.

When I do not assign a specific UID to the user foo1234 it works correctly
and as expected.
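
One hypothesis that could be checked, and which this report does not confirm:
if foo was allocated a subordinate UID range in /etc/subuid that happens to
contain 101234, and userdel treats processes with UIDs in that range as
belonging to foo, the message above would follow. The Python sketch below only
tests range membership. The sample file contents are hypothetical, chosen to
match the common login.defs defaults (SUB_UID_MIN 100000, SUB_UID_COUNT 65536),
and the function names are made up for illustration.

```python
def parse_subuid(text):
    """Parse /etc/subuid-style lines 'name:start:count' into tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, start, count = line.split(":")
        entries.append((name, int(start), int(count)))
    return entries


def owners_of_subordinate_uid(uid, entries):
    """Return the users whose subordinate UID range contains `uid`."""
    return [name for name, start, count in entries
            if start <= uid < start + count]


# Hypothetical contents; the actual /etc/subuid was not captured in this report.
SAMPLE_SUBUID = "foo:100000:65536\nfoo1234:165536:65536\n"

entries = parse_subuid(SAMPLE_SUBUID)
print(owners_of_subordinate_uid(101234, entries))
```

With these sample values, UID 101234 falls inside the range assigned to foo.
Comparing against the real /etc/subuid on the affected system would show
whether this applies here.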

I also used specific UIDs in Debian 8, and this issue did not occur there.
It seems to be new in Debian 9.

This is my first bug report to Debian so I hope all required information is
present and that you are able to reproduce it with the above steps.

Thanks,
Niels Hendriks