Re: SOLVED Re: Disk corruption and performance issue.
On Mon, 26 Feb 2024, Andy Smith wrote:
> Hi,
>
> On Mon, Feb 26, 2024 at 06:25:53PM +, Tim Woodall wrote:
>> Feb 17 17:01:49 xen17 vmunix: [    3.802581] ata1.00: disabling queued TRIM support
>> Feb 17 17:01:49 xen17 vmunix: [    3.805074] ata1.00: disabling queued TRIM support
>>
>> from libata-core.c
>>
>>   { "Samsung SSD 870*",  NULL,  ATA_HORKAGE_NO_NCQ_TRIM |
>>                                 ATA_HORKAGE_ZERO_AFTER_TRIM |
>>                                 ATA_HORKAGE_NO_NCQ_ON_ATI },
>>
>> This fixed the disk corruption errors at the cost of dramatically reducing performance. (I'm not sure why, because manual fstrim didn't improve things.)
>
> That's interesting. I have quite a few of these drives and haven't noticed any problems. What kernel version introduced the above workarounds?
>
> $ sudo lsblk -do NAME,MODEL
> NAME MODEL
> sda  SAMSUNG_MZ7KM1T9HAJM-5
> sdb  SAMSUNG_MZ7KM1T9HAJM-5
> sdc  Samsung_SSD_870_EVO_4TB
> sdd  Samsung_SSD_870_EVO_4TB
> sde  ST4000LM016-1N2170
> sdf  ST4000LM016-1N2170
> sdg  SuperMicro_SSD
> sdh  SuperMicro_SSD
>
> Thanks,
> Andy

Looks like the fix was brand new around September 2021:
https://www.neowin.net/news/linux-patch-disables-trim-and-ncq-on-samsung-860870-ssds-in-intel-and-amd-systems/

I was still seeing corruption in August 2022, but it's possible the fix wasn't backported to whatever release I was running.

Tim.
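PS: for anyone who wants to check their own setup, a couple of things that might help. The device name and paths below are only examples, so adjust to taste. First, whether the running kernel applied the workaround, and what the drive itself advertises:

  # did libata blacklist queued TRIM for any drive at boot?
  dmesg | grep -i 'queued trim'

  # what the drive reports about its TRIM support (needs hdparm installed)
  sudo hdparm -I /dev/sda | grep -i trim

And, if you have a kernel git tree handy, a rough way to see when the blacklist entry appeared (this assumes the flag name hasn't changed since):

  # search the blacklist history for the flag added for these drives
  git log --oneline -S 'NO_NCQ_ON_ATI' -- drivers/ata/libata-core.c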
Re: SOLVED Re: Disk corruption and performance issue.
Hi,

On Mon, Feb 26, 2024 at 06:25:53PM +, Tim Woodall wrote:
> Feb 17 17:01:49 xen17 vmunix: [    3.802581] ata1.00: disabling queued TRIM support
> Feb 17 17:01:49 xen17 vmunix: [    3.805074] ata1.00: disabling queued TRIM support
>
> from libata-core.c
>
>   { "Samsung SSD 870*",  NULL,  ATA_HORKAGE_NO_NCQ_TRIM |
>                                 ATA_HORKAGE_ZERO_AFTER_TRIM |
>                                 ATA_HORKAGE_NO_NCQ_ON_ATI },
>
> This fixed the disk corruption errors at the cost of dramatically reducing performance. (I'm not sure why because manual fstrim didn't improve things)

That's interesting. I have quite a few of these drives and haven't noticed any problems. What kernel version introduced the above workarounds?

$ sudo lsblk -do NAME,MODEL
NAME MODEL
sda  SAMSUNG_MZ7KM1T9HAJM-5
sdb  SAMSUNG_MZ7KM1T9HAJM-5
sdc  Samsung_SSD_870_EVO_4TB
sdd  Samsung_SSD_870_EVO_4TB
sde  ST4000LM016-1N2170
sdf  ST4000LM016-1N2170
sdg  SuperMicro_SSD
sdh  SuperMicro_SSD

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting
Re: SOLVED Re: Disk corruption and performance issue.
>>> You should not be running trim in a container/virtual machine
>>
>> Why not? That's, in my case, basically saying "you should not be running trim on a drive exported via iscsi" Perhaps I shouldn't be but I'd like to understand why. Enabling thin_provisioning and fstrim works and gets mapped to the underlying layers all the way down to the SSD.
>
> I guess you didn't understand the systemd timer that runs fstrim on the host.

How can the host properly run `fstrim` if it only sees a disk image and may not know how that image is divided into partitions/filesystems?

Stefan
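To make that concrete, purely as an illustration and with example device names: the host can only trim filesystems it has mounted itself, so the discards for a guest's filesystems have to come from inside the guest, and the virtual disk has to pass them through.

  # inside the VM: non-zero DISC-GRAN/DISC-MAX means the virtual disk accepts discards
  lsblk --discard /dev/xvda

  # trim every mounted filesystem the guest itself knows about
  sudo fstrim -av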
Re: SOLVED Re: Disk corruption and performance issue.
On 2/26/24 16:31, Tim Woodall wrote:
> On Mon, 26 Feb 2024, Gremlin wrote:
>> re running fstrim in a vm. The Host system takes care of it
>
> I guess you've no idea what iscsi is. Because this makes no sense at all. systemd or no systemd. The physical disk doesn't have to be something the host system knows anything about.
>
> Here's a thread of someone wanting to do fstrim from a vm with iscsi mounted disks.
> https://serverfault.com/questions/1031580/trim-unmap-zvol-over-iscsi
>
> And another page suggesting you should.
> https://gist.github.com/hostberg/86bfaa81e50cc0666f1745e1897c0a56
>
>   8.10.2. Trim/Discard
>   It is good practice to run fstrim (discard) regularly on VMs and containers. This releases data blocks that the filesystem isn't using anymore. It reduces data usage and resource load. Most modern operating systems issue such discard commands to their disks regularly. You only need to ensure that the Virtual Machines enable the disk discard option.
>
> I would guess that if you use sparse file backed storage to a vm you'd want the vm to run fstrim too but this isn't a setup I've ever used so perhaps it's nonsense.

Never mind, arguing with me will not solve your issue.
Re: SOLVED Re: Disk corruption and performance issue.
On Mon, 26 Feb 2024, Gremlin wrote:
> re running fstrim in a vm. The Host system takes care of it

I guess you've no idea what iscsi is. Because this makes no sense at all. systemd or no systemd. The physical disk doesn't have to be something the host system knows anything about.

Here's a thread of someone wanting to do fstrim from a vm with iscsi mounted disks.
https://serverfault.com/questions/1031580/trim-unmap-zvol-over-iscsi

And another page suggesting you should.
https://gist.github.com/hostberg/86bfaa81e50cc0666f1745e1897c0a56

  8.10.2. Trim/Discard
  It is good practice to run fstrim (discard) regularly on VMs and containers. This releases data blocks that the filesystem isn't using anymore. It reduces data usage and resource load. Most modern operating systems issue such discard commands to their disks regularly. You only need to ensure that the Virtual Machines enable the disk discard option.

I would guess that if you use sparse file backed storage to a vm you'd want the vm to run fstrim too but this isn't a setup I've ever used so perhaps it's nonsense.
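As a concrete sketch of what "enable the disk discard option" means on a Xen setup like mine: the LV path here is invented, and the discard keyword is how I remember xl-disk-configuration(5) documenting it, so check the man page for the release you run.

  # in the guest's xl config: advertise discard support to the guest and
  # pass its discards through to the backing device
  disk = [ 'format=raw, vdev=xvda, access=rw, target=/dev/vg_guests/vm1-disk, discard' ]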
Re: SOLVED Re: Disk corruption and performance issue.
On 2/26/24 14:40, Tim Woodall wrote:
> On Mon, 26 Feb 2024, Gremlin wrote:
>> Are you using systemd ?
>
> No, I'm not
>
>> You should not be running trim in a container/virtual machine
>
> Why not? That's, in my case, basically saying "you should not be running trim on a drive exported via iscsi" Perhaps I shouldn't be but I'd like to understand why. Enabling thin_provisioning and fstrim works and gets mapped to the underlying layers all the way down to the SSD.

I guess you didn't understand the systemd timer that runs fstrim on the host.

> My underlying VG is less than 50% occupied, so I can trim the free space by creating a LV and then removing it again (I have issue_discards set)
>
> FWIW, I did issue fstrim in the VMs with no visible issues at all. Perhaps I got lucky?
>
>> Here is some info:
>> https://wiki.archlinux.org/title/Solid_state_drive
>
> I don't see VM or virtual machine anywhere on that page.

Exactly, and you should not be running it in a VM/container. Which, BTW, systemd will not run fstrim in a container. The Host system takes care of it.

Well, you can keep shooting yourself in the butt as long as you wish. I, on the other hand, tend not to do that as much as I possibly can, as I need to be able to sit down at times.
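For what it's worth, that claim is easy to check on a machine that does run systemd: util-linux ships fstrim.timer and fstrim.service, and as far as I recall the service carries a condition that stops it running inside containers. Something like:

  # is the periodic trim timer enabled on this host?
  systemctl status fstrim.timer

  # show the unit's conditions (e.g. whether it is skipped in containers)
  systemctl cat fstrim.service | grep -i condition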
Re: SOLVED Re: Disk corruption and performance issue.
On Mon, 26 Feb 2024, Gremlin wrote:
> Are you using systemd ?

No, I'm not

> You should not be running trim in a container/virtual machine

Why not? That's, in my case, basically saying "you should not be running trim on a drive exported via iscsi" Perhaps I shouldn't be but I'd like to understand why. Enabling thin_provisioning and fstrim works and gets mapped to the underlying layers all the way down to the SSD.

My underlying VG is less than 50% occupied, so I can trim the free space by creating a LV and then removing it again (I have issue_discards set)

FWIW, I did issue fstrim in the VMs with no visible issues at all. Perhaps I got lucky?

> Here is some info:
> https://wiki.archlinux.org/title/Solid_state_drive

I don't see VM or virtual machine anywhere on that page.
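Spelling out that LV trick for anyone following along (the VG/LV names are invented and this is only a sketch; it assumes issue_discards = 1 is set in the devices section of /etc/lvm/lvm.conf):

  # temporarily allocate all free extents in the VG to a scratch LV...
  lvcreate -l 100%FREE -n trimscratch vg_xen17

  # ...then remove it; with issue_discards set, lvremove discards the
  # extents it frees, trimming the unused space on the SSD underneath
  lvremove -y vg_xen17/trimscratch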
Re: SOLVED Re: Disk corruption and performance issue.
On 2/26/24 13:25, Tim Woodall wrote:
> TLDR; there was a firmware bug in a disk in the raid array resulting in data corruption. A subsequent kernel workaround resulted in dramatically reducing the disk performance. (probably just writes but I didn't confirm)
>
> Initially, under heavy disk load I got errors like:
>
>   Preparing to unpack .../03-libperl5.34_5.34.0-5_arm64.deb ...
>   Unpacking libperl5.34:arm64 (5.34.0-5) ...
>   dpkg-deb (subprocess): decompressing archive '/tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb' (size=4015516) member 'data.tar': lzma error: compressed data is corrupt
>   dpkg-deb: error: subprocess returned error exit status 2
>   dpkg: error processing archive /tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb (--unpack):
>    cannot copy extracted data for './usr/lib/aarch64-linux-gnu/libperl.so.5.34.0' to '/usr/lib/aarch64-linux-gnu/libperl.so.5.34.0.dpkg-new': unexpected end of file or stream
>
> The checksum will have been verified by apt during the download, but when it comes to read the downloaded deb to unpack and install, it doesn't get the same data. The corruption could happen on the write side (the file on disk is corrupted) or on the read side (the file on disk has the correct checksum but reads back wrong).
>
> A second problem I got was 503 errors from apt-cacher-ng (which ran on the same machine as the above error).
>
> I initially assumed this was due to faulty memory, or possibly a faulty CPU. But I assumed memory because the disk errors were happening in a VM and no other VMs were affected. Because I always start the same VMs in the same order, I assumed they'd be using the same physical memory each time.
>
> However, nothing I could do would help track down where the memory problem was. Everything worked perfectly except when using the disk under load.
>
> At this time I spent a significant amount of time migrating everything important, including the big job that triggered this problem, off this machine onto its pair. After that the corruption problems went away, but I continued to get periodic 503 errors from apt-cacher-ng.
>
> I continued to worry at this on and off but failed to make any progress in finding what was wrong. The version of the motherboard is no longer available, otherwise I'd probably have bought another one.
>
> During this time I also spent quite a lot of time ensuring that it was much easier to move VMs between my two machines. I'd underestimated how tricky this would be if the dodgy machine failed totally, which I became aware of when I did migrate the VM having problems.
>
> Late last year or early this year someone (possibly Andy Smith?) posted a question about logical/physical sector sizes on SSDs. That set me off investigating again, as that's not something I'd thought of. That didn't prove fruitful either, but I did notice this in the kernel logs:
>
>   Feb 17 17:01:49 xen17 vmunix: [    3.802581] ata1.00: disabling queued TRIM support
>   Feb 17 17:01:49 xen17 vmunix: [    3.805074] ata1.00: disabling queued TRIM support
>
> from libata-core.c
>
>   { "Samsung SSD 870*",  NULL,  ATA_HORKAGE_NO_NCQ_TRIM |
>                                 ATA_HORKAGE_ZERO_AFTER_TRIM |
>                                 ATA_HORKAGE_NO_NCQ_ON_ATI },
>
> This fixed the disk corruption errors at the cost of dramatically reducing performance. (I'm not sure why, because manual fstrim didn't improve things.)
>
> At this point I'd discovered that the big job that had been regularly hitting corruption issues now completed. However, it was taking 19 hours instead of 11 hours.
>
> I ordered some new disks - I'd assumed both disks were affected, but while writing this I notice that the "disabling queued TRIM support" message prints twice for the same disk, not once per disk. I thought one of these entries matched my disk, but looking again now I see I had a 1000MX500, which doesn't actually match:
>
>   { "Crucial_CT*M500*",  NULL,   ATA_HORKAGE_NO_NCQ_TRIM |
>                                  ATA_HORKAGE_ZERO_AFTER_TRIM },
>   { "Crucial_CT*MX100*", "MU01", ATA_HORKAGE_NO_NCQ_TRIM |
>                                  ATA_HORKAGE_ZERO_AFTER_TRIM },
>
> While waiting for my disks I started looking at the apt-cacher-ng 503 problem - which has continued to bug me. I got lucky and discovered a way I could almost always trigger it. I managed to track that down to a race condition when updating the Release files if multiple machines request the same file at the same moment. After finding a fix I found this bug reporting the same problem:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1022043
>
> There is now a patch attached to that bug that I've been running for a few weeks without a single 503 error.
>
> And on Sunday I replaced the two disks with new ones. Today that big job completed in 10h15m.
>
> Another thing I notice, although I'm not sure I understand what is going on, is that my iscsi disks all have
>
>   Thin-provisioning: No
>
> This means that fstrim on the vm doesn't work. Switching them to Yes and it does. So I'm not exactly sure where the queued trim was coming from in the first place.

Are you using systemd ?
SOLVED Re: Disk corruption and performance issue.
TLDR; there was a firmware bug in a disk in the raid array resulting in data corruption. A subsequent kernel workaround resulted in dramatically reducing the disk performance. (probably just writes but I didn't confirm)

Initially, under heavy disk load I got errors like:

  Preparing to unpack .../03-libperl5.34_5.34.0-5_arm64.deb ...
  Unpacking libperl5.34:arm64 (5.34.0-5) ...
  dpkg-deb (subprocess): decompressing archive '/tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb' (size=4015516) member 'data.tar': lzma error: compressed data is corrupt
  dpkg-deb: error: subprocess returned error exit status 2
  dpkg: error processing archive /tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb (--unpack):
   cannot copy extracted data for './usr/lib/aarch64-linux-gnu/libperl.so.5.34.0' to '/usr/lib/aarch64-linux-gnu/libperl.so.5.34.0.dpkg-new': unexpected end of file or stream

The checksum will have been verified by apt during the download, but when it comes to read the downloaded deb to unpack and install, it doesn't get the same data. The corruption could happen on the write side (the file on disk is corrupted) or on the read side (the file on disk has the correct checksum but reads back wrong).

A second problem I got was 503 errors from apt-cacher-ng (which ran on the same machine as the above error).

I initially assumed this was due to faulty memory, or possibly a faulty CPU. But I assumed memory because the disk errors were happening in a VM and no other VMs were affected. Because I always start the same VMs in the same order, I assumed they'd be using the same physical memory each time.

However, nothing I could do would help track down where the memory problem was. Everything worked perfectly except when using the disk under load.

At this time I spent a significant amount of time migrating everything important, including the big job that triggered this problem, off this machine onto its pair. After that the corruption problems went away, but I continued to get periodic 503 errors from apt-cacher-ng.

I continued to worry at this on and off but failed to make any progress in finding what was wrong. The version of the motherboard is no longer available, otherwise I'd probably have bought another one.

During this time I also spent quite a lot of time ensuring that it was much easier to move VMs between my two machines. I'd underestimated how tricky this would be if the dodgy machine failed totally, which I became aware of when I did migrate the VM having problems.

Late last year or early this year someone (possibly Andy Smith?) posted a question about logical/physical sector sizes on SSDs. That set me off investigating again, as that's not something I'd thought of. That didn't prove fruitful either, but I did notice this in the kernel logs:

  Feb 17 17:01:49 xen17 vmunix: [    3.802581] ata1.00: disabling queued TRIM support
  Feb 17 17:01:49 xen17 vmunix: [    3.805074] ata1.00: disabling queued TRIM support

from libata-core.c

  { "Samsung SSD 870*",  NULL,  ATA_HORKAGE_NO_NCQ_TRIM |
                                ATA_HORKAGE_ZERO_AFTER_TRIM |
                                ATA_HORKAGE_NO_NCQ_ON_ATI },

This fixed the disk corruption errors at the cost of dramatically reducing performance. (I'm not sure why, because manual fstrim didn't improve things.)

At this point I'd discovered that the big job that had been regularly hitting corruption issues now completed. However, it was taking 19 hours instead of 11 hours.

I ordered some new disks - I'd assumed both disks were affected, but while writing this I notice that the "disabling queued TRIM support" message prints twice for the same disk, not once per disk. I thought one of these entries matched my disk, but looking again now I see I had a 1000MX500, which doesn't actually match:

  { "Crucial_CT*M500*",  NULL,   ATA_HORKAGE_NO_NCQ_TRIM |
                                 ATA_HORKAGE_ZERO_AFTER_TRIM },
  { "Crucial_CT*MX100*", "MU01", ATA_HORKAGE_NO_NCQ_TRIM |
                                 ATA_HORKAGE_ZERO_AFTER_TRIM },

While waiting for my disks I started looking at the apt-cacher-ng 503 problem - which has continued to bug me. I got lucky and discovered a way I could almost always trigger it. I managed to track that down to a race condition when updating the Release files if multiple machines request the same file at the same moment. After finding a fix I found this bug reporting the same problem:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1022043

There is now a patch attached to that bug that I've been running for a few weeks without a single 503 error.

And on Sunday I replaced the two disks with new ones. Today that big job completed in 10h15m.

Another thing I notice, although I'm not sure I understand what is going on, is that my iscsi disks all have

  Thin-provisioning: No

This means that fstrim on the vm doesn't work. Switching them to Yes and it does. So I'm not exactly sure where the queued trim was coming from in the first place.

I also need to check the version of tgt in sid because there doesn't seem to be an option to switch this in the config.
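In case it helps anyone else hitting the same thing: from memory, tgt lets you flip this per LUN at runtime with tgtadm even when targets.conf has no obvious knob for it. A sketch along those lines (the tid/lun numbers are made up, and I haven't double-checked the parameter name against the version in sid):

  # enable thin provisioning (and hence UNMAP/fstrim passthrough) on an existing LUN
  tgtadm --lld iscsi --mode logicalunit --op update --tid 1 --lun 1 --params thin_provisioning=1

  # confirm the LUN now reports Thin-provisioning: Yes
  tgtadm --lld iscsi --mode target --op show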