[Bug 1973167] Re: linux-image-4.15.0-177-generic freezes on the welcome screen

2022-06-02 Thread Thimo E
Thank you for your analysis and test kernel.
A lot of our machines (Supermicro X11 / Xeon W-2133 based) also suffer from the 
problem introduced by 4.15.0-177.

I can confirm that the kernel 4.15.0-182-generic #191+lp1973167 provided
by Kai-Heng Feng fixes the issue on my HW.

Could you please proceed with the roll-out of this patch?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1973167

Title:
  linux-image-4.15.0-177-generic freezes on the welcome screen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1973167/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1915130] Re: libnfsidmap-regex package broken in current LTS release (focal)

2022-02-22 Thread Thimo E
Unfortunately, my glibc on Ubuntu 20.04 is too old:
rpc.idmapd: libnfsidmap: Unable to load plugin: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /lib/x86_64-linux-gnu/libnfsidmap/regex.so)

mememe@deatcsXXXfcYYY:/# dpkg-query --showformat=\${Version} --show libc6
2.31-0ubuntu9.4
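A rough way to spot this mismatch before starting rpc.idmapd is to compare the newest GLIBC_ symbol version a plugin references with the installed libc6 version. This is a minimal sketch, assuming binutils (objdump), GNU sort -V and dpkg-query are available; the function names max_glibc_ref, version_ge and glibc_ok are hypothetical.

```shell
#!/bin/sh
# Sketch: does the installed libc6 satisfy the GLIBC_ symbol versions a
# shared object needs? Assumes objdump (binutils), GNU sort -V, dpkg-query.
# All function names are hypothetical.

# Print the highest GLIBC_x.y version referenced by the object in $1
max_glibc_ref() {
    objdump -T "$1" | grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' |
        sort -Vu | tail -n 1
}

# Succeed if version $1 >= version $2 (version sort, so 2.4 < 2.33)
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# usage: glibc_ok /lib/x86_64-linux-gnu/libnfsidmap/regex.so
glibc_ok() {
    need=$(max_glibc_ref "$1")
    have=$(dpkg-query --showformat='${Version}' --show libc6)
    have=${have%%-*}    # "2.31-0ubuntu9.4" -> "2.31"
    if version_ge "$have" "$need"; then
        echo "ok: needs GLIBC_$need, libc6 is $have"
    else
        echo "too old: needs GLIBC_$need, libc6 is $have"
        return 1
    fi
}
```

On the system above the plugin needs at least GLIBC_2.33 while libc6 is 2.31, so glibc_ok should fail there.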

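A dry-run analysis looks good to me: the test build of libnfsidmap exports nfsidmap_config_get, and regex.so imports it (UND):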

mememe@deatcsXXXfcYYY:/# readelf -s /tmp/nfsidmap-test/libnfsidmap.so.1.0.0 | egrep 'nfsidmap_config_get|conf_get_str'
78: 41f010 FUNC    GLOBAL DEFAULT   12 nfsidmap_config_get
mememe@deatcsXXXfcYYY:/# readelf -s /tmp/nfsidmap-test/libnfsidmap/regex.so | egrep 'nfsidmap_config_get|conf_get_str'
12:  0 NOTYPE  GLOBAL DEFAULT  UND nfsidmap_config_get

-- 
https://bugs.launchpad.net/bugs/1915130

Title:
   libnfsidmap-regex package broken in current LTS release (focal)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libnfsidmap-regex/+bug/1915130/+subscriptions



[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-06-23 Thread Thimo E
Hi Matthew,

sorry for the late reply.
Today I triggered another fstrim with the linux-image-5.4.0-75-generic kernel
and ran a final check on the RAID; no problems have occurred for me so far.
Thank you for pursuing this topic so persistently and for finally getting the
patches into the Ubuntu kernel.

Best regards,
 Thimo

-- 
https://bugs.launchpad.net/bugs/1907262

Title:
  raid10: discard leads to corrupted file system

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/+subscriptions


[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-06-10 Thread Thimo E
Hi Matthew,

Thanks for your effort to add this feature to the Ubuntu kernels.

I installed linux-image-5.4.0-75-generic on 2021-06-08.
No problems so far, neither during normal work nor during a manual fstrim.

Best regards,
 Thimo


[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-10 Thread Thimo E
Hi Matthew,

thank you for your continuous effort. So far I have tested your
5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu kernel without trouble.
I also ran fstrim manually on a machine that had not trimmed for some time
because its fstrim service was disabled.

Regards,
 Thimo


[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-05 Thread Thimo E
Hi Matthew,

thank you for providing the test kernel and the instructions. I will give it
a try.

Regards,
 Thimo


[Bug 1907262] Re: raid10: discard leads to corrupted file system

2021-05-03 Thread Thimo E
Hi Matthew,

are these tests still relevant for you?

BR,
 Thimo


[Bug 1915130] [NEW] libnfsidmap-regex package broken in current LTS release (focal)

2021-02-09 Thread Thimo E
Public bug reported:

Dear professionals,

When using the regex translation method for idmapping, idmapd complains
about an unresolved symbol (nfsidmap_config_get) in the regex.so shared
object:

systemctl status -l nfs-idmapd.service
● nfs-idmapd.service - NFSv4 ID-name mapping service
 Loaded: loaded (/lib/systemd/system/nfs-idmapd.service; static; vendor preset: enabled)
 Active: failed (Result: exit-code) since Thu 2021-02-04 13:51:52 CET; 22min ago
Process: 43954 ExecStart=/usr/sbin/rpc.idmapd $RPCIDMAPDARGS (code=exited, status=1/FAILURE)

Feb 04 13:51:52 defil37 rpc.idmapd[43954]: sss_nfs_init: use memcache: 1
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/sss.so for method sss
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: libnfsidmap: Unable to load plugin: /lib/x86_64-linux-gnu/libnfsidmap/regex.so: undefined symbol: nfsidmap_config_get
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: libnfsidmap: requested translation method, 'regex', is not available
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: rpc.idmapd: libnfsidmap: Unable to load plugin: /lib/x86_64-linux-gnu/libnfsidmap/regex.so: undefined symbol: nfsidmap_config_get
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: rpc.idmapd: libnfsidmap: requested translation method, 'regex', is not available
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: rpc.idmapd: Unable to create name to user id mappings.
Feb 04 13:51:52 defil37 systemd[1]: nfs-idmapd.service: Control process exited, code=exited, status=1/FAILURE
Feb 04 13:51:52 defil37 systemd[1]: nfs-idmapd.service: Failed with result 'exit-code'.
Feb 04 13:51:52 defil37 systemd[1]: Failed to start NFSv4 ID-name mapping service.

This symbol does not exist in the libnfsidmap.so.0.3.0 shared object; the following command prints nothing:
readelf -s /usr/lib/x86_64-linux-gnu/libnfsidmap.so.0.3.0 | grep nfsidmap_config_get


When checking the source at
https://github.com/isginf/libnfsidmap-regex/blob/1179b2ec3392c91a40da228afada46fd210113a2/regex.c#L57
it seems that this library accidentally uses the wrong interface: it calls
nfsidmap_config_get, whereas libnfsidmap.so.0.3.0 provides the symbol
"conf_get_str".

Since the groovy version of the "libnfsidmap-regex" package does not require newer
dependencies, I also checked it and could verify that it:
 a) uses the "conf_get_str" function
 b) loads cleanly

I would like to ask you to either rebuild this package correctly or
also publish the groovy version of the package for focal.

** Affects: libnfsidmap-regex (Ubuntu)
 Importance: Undecided
 Status: New


** Tags: idmapd nfsd regex


[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Thimo E
This is just the least damaging procedure I found.
Data loss may still happen (and actually did happen on some of our systems).

Re-adding the second component to the RAID first (after zeroing its
superblock) and then running fsck probably leads to exactly the same
result, but I wanted to keep the second component as a fallback until I
could see the fsck results.


[Bug 1907262] Re: raid10: discard leads to corrupted file system

2020-12-09 Thread Thimo E
Hi Matthew and all,

thank you for taking action immediately. I really appreciate your
effort.

After investigating the issue further, I have to add that the "discard"
mount option seems to trigger the issue, too.

@Trent
The general problem here is that RAID10 can spread a single read stream across
all disks (probably its major advantage over RAID1, effectively giving you
RAID0 read speed; RAID1 needs parallel reads to achieve this).

So it is no big surprise that several machines at our site went into
read-only mode after *some time* (probably after reading some
filesystem-relevant data from the "bad disk"). Unfortunately, the first disk
is only "clean" if you act immediately; otherwise you may already have some
data corruption.
I verified this with the debsums tool (just run debsums -xa) on one system
whose root partition was affected, after fixing the file system errors.

My recovery procedure was:
Assemble the RAID from the first component only:
mdadm --assemble /dev/md127 /dev/nvme0n1p2
mdadm --run /dev/md127

Check the file systems on all logical volumes (note the -f parameter; some
file systems "think" they are clean):
fsck.ext4 -f /dev/VolGroup/...

Re-add the second component:
mdadm --zero-superblock /dev/nvme1n1p2
mdadm --add /dev/md127 /dev/nvme1n1p2
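The same order of steps as a small script sketch. The names md_recover and run are hypothetical, the device and VG names are the placeholders from above, and DRY_RUN=1 only prints the commands. Two assumptions not in the steps above: an explicit vgchange -a y in case the volume group is not auto-activated, and treating fsck.ext4 exit codes 1 and 2 (errors corrected) as acceptable while 4 or higher aborts.

```shell
#!/bin/sh
# Sketch of the recovery order above; md_recover and run are hypothetical.
# DRY_RUN=1 prints the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# usage: md_recover MD GOOD_MEMBER STALE_MEMBER VG LV...
md_recover() {
    md=$1 good=$2 stale=$3 vg=$4
    shift 4
    # 1. Start the array degraded from the member believed to be clean
    run mdadm --assemble "$md" "$good" || return 1
    run mdadm --run "$md" || return 1
    # Assumption: activate the VG in case it was not auto-activated
    run vgchange -a y "$vg" || return 1
    # 2. Forced (-f) check of every LV; fsck.ext4 asks before each fix
    for lv in "$@"; do
        run fsck.ext4 -f "/dev/$vg/$lv"
        rc=$?
        # 0 = clean, 1/2 = errors corrected, >= 4 = errors left uncorrected
        if [ "$rc" -ge 4 ]; then
            echo "fsck.ext4 failed on /dev/$vg/$lv (exit $rc)" >&2
            return 1
        fi
    done
    # 3. Wipe the stale member's md metadata and add it back (full rebuild)
    run mdadm --zero-superblock "$stale" || return 1
    run mdadm --add "$md" "$stale"
}
```

For example: DRY_RUN=1 md_recover /dev/md127 /dev/nvme0n1p2 /dev/nvme1n1p2 VolGroup root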

Best regards


[Bug 1907262] [NEW] raid10: discard leads to corrupted file system

2020-12-08 Thread Thimo E
Public bug reported:

Seems to be closely related to
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578

After updating the Ubuntu 18.04 kernel from 4.15.0-124 to 4.15.0-126, the
fstrim command triggered by fstrim.timer causes a large number of
mismatches between the two RAID10 component devices.

This bug affects several machines in our company with different hardware
configurations (all using ECC RAM). Both NVMe and SATA SSDs are
affected.

How to reproduce:
 - Create a RAID10 with LVM and a filesystem on two SSDs
mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme0n1p2 /dev/nvme1n1p2
pvcreate -ff -y /dev/md0
vgcreate -f -y VolGroup /dev/md0
lvcreate -n root -L 100G -ay -y VolGroup
mkfs.ext4 /dev/VolGroup/root
mount /dev/VolGroup/root /mnt
 - Write some data, sync and delete it
dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M
sync
rm /mnt/data.raw
 - Check the RAID device
echo check >/sys/block/md0/md/sync_action
 - After finishing (see /proc/mdstat), check the mismatch_cnt (should be 0):
cat /sys/block/md0/md/mismatch_cnt
 - Trigger the bug
fstrim /mnt
 - Re-Check the RAID device
echo check >/sys/block/md0/md/sync_action
 - After finishing (see /proc/mdstat), check the mismatch_cnt (probably in the 
range of N*1):
cat /sys/block/md0/md/mismatch_cnt
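The check/mismatch_cnt steps above can be wrapped in a small helper. This is a sketch assuming the md sysfs layout used above (sync_action reads "idle" once the check has finished); the function names md_check and md_wait_mismatch, the directory argument and the polling interval are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helpers around the md sysfs files used in the steps above.
# The directory argument defaults to /sys/block/md0/md.

# Wait until no check/resync is running, then print mismatch_cnt.
md_wait_mismatch() {
    dir=${1:-/sys/block/md0/md}
    # sync_action reads "idle" once the scrub has finished
    while [ "$(cat "$dir/sync_action")" != "idle" ]; do
        sleep 10
    done
    cat "$dir/mismatch_cnt"
}

# Start a "check" scrub (needs root), then report the mismatch count.
md_check() {
    dir=${1:-/sys/block/md0/md}
    echo check > "$dir/sync_action"
    sleep 1   # give md a moment to switch sync_action away from "idle"
    md_wait_mismatch "$dir"
}
```

Running md_check once before and once after fstrim /mnt mirrors the two checks in the steps above.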

After investigating this issue on several machines, it *seems* that the
first drive performs the trim correctly while the second one goes wild. At
least, the number and severity of the errors found by fsck.ext4 in a live
USB session suggest this.

For the single-drive evaluation, the RAID10 was started with only one drive
at a time:
  mdadm --assemble /dev/md127 /dev/nvme0n1p2
  mdadm --run /dev/md127
  fsck.ext4 -n -f /dev/VolGroup/root

  vgchange -a n /dev/VolGroup
  mdadm --stop /dev/md127

  mdadm --assemble /dev/md127 /dev/nvme1n1p2
  mdadm --run /dev/md127
  fsck.ext4 -n -f /dev/VolGroup/root

When these fscks are run without -n, the directory structure on the first
device seems to be OK, while on the second device only the lost+found
folder is left.
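The single-drive evaluation can be looped over all components. Sketch only: the names md_eval_members and run are hypothetical, DRY_RUN=1 prints the commands instead of running them, and the explicit vgchange -a y is an assumption for systems where the volume group is not auto-activated.

```shell
#!/bin/sh
# Sketch: evaluate each RAID10 member on its own with a read-only fsck.
# md_eval_members and run are hypothetical names; DRY_RUN=1 only prints.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# usage: md_eval_members MD VG LV MEMBER...
md_eval_members() {
    md=$1 vg=$2 lv=$3
    shift 3
    for member in "$@"; do
        echo "== $member" >&2
        run mdadm --assemble "$md" "$member"
        run mdadm --run "$md"
        # Assumption: harmless if udev already auto-activated the VG
        run vgchange -a y "$vg"
        # -n opens the file system read-only and answers "no" to all fixes
        run fsck.ext4 -n -f "/dev/$vg/$lv"
        # Release the LVs and stop the array before the next member
        run vgchange -a n "$vg"
        run mdadm --stop "$md"
    done
}
```

For example: md_eval_members /dev/md127 VolGroup root /dev/nvme0n1p2 /dev/nvme1n1p2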

Side note: another machine running HWE kernel 5.4.0-56 (previously -53)
seems to have a very similar issue.

Unfortunately, the risk/regression assessment in the aforementioned bug
is incomplete: the workaround only mitigates the issue during file system
creation. This bug, on the other hand, is triggered by a weekly service
(fstrim) and causes severe file system corruption.

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: Confirmed


[Bug 1752251] Re: Missing mcelog userspace package in bionic - or maybe linux kernel config should disable mcelog_legacy

2020-02-05 Thread Thimo E
The removal of the package is quite unfortunate since rasdaemon is still
missing the email notification feature.

-- 
https://bugs.launchpad.net/bugs/1752251

Title:
  Missing mcelog userspace package in bionic - or maybe linux kernel
  config should disable mcelog_legacy

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1752251/+subscriptions
