Re: SOLVED Re: Disk corruption and performance issue.

2024-02-26 Thread Tim Woodall

On Mon, 26 Feb 2024, Andy Smith wrote:


Hi,

On Mon, Feb 26, 2024 at 06:25:53PM +, Tim Woodall wrote:

Feb 17 17:01:49 xen17 vmunix: [    3.802581] ata1.00: disabling queued TRIM support
Feb 17 17:01:49 xen17 vmunix: [    3.805074] ata1.00: disabling queued TRIM support


from libata-core.c

 { "Samsung SSD 870*",  NULL, ATA_HORKAGE_NO_NCQ_TRIM |
  ATA_HORKAGE_ZERO_AFTER_TRIM |
  ATA_HORKAGE_NO_NCQ_ON_ATI },

This fixed the disk corruption errors at the cost of dramatically
reducing performance. (I'm not sure why because manual fstrim didn't
improve things)


That's interesting. I have quite a few of these drives and haven't
noticed any problems. What kernel version introduced the above
workarounds?

$ sudo lsblk -do NAME,MODEL
NAME MODEL
sda  SAMSUNG_MZ7KM1T9HAJM-5
sdb  SAMSUNG_MZ7KM1T9HAJM-5
sdc  Samsung_SSD_870_EVO_4TB
sdd  Samsung_SSD_870_EVO_4TB
sde  ST4000LM016-1N2170
sdf  ST4000LM016-1N2170
sdg  SuperMicro_SSD
sdh  SuperMicro_SSD

Thanks,
Andy



Looks like the fix was brand new around September 2021:
https://www.neowin.net/news/linux-patch-disables-trim-and-ncq-on-samsung-860870-ssds-in-intel-and-amd-systems/

I was still seeing corruption in August 2022 but it's possible the fix
wasn't backported to whatever release I was running.
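
For anyone wanting to check exactly when it landed and which releases
contain it, grepping a kernel git tree for the quirk flag should do it
(a sketch; the grep pattern assumes the commit message mentions
NO_NCQ_ON_ATI, and the placeholder sha needs filling in from the first
command's output):

$ git log --oneline --grep=NO_NCQ_ON_ATI -- drivers/ata/
$ git tag --contains <sha-from-above> | head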

Tim.



Re: SOLVED Re: Disk corruption and performance issue.

2024-02-26 Thread Andy Smith
Hi,

On Mon, Feb 26, 2024 at 06:25:53PM +, Tim Woodall wrote:
> Feb 17 17:01:49 xen17 vmunix: [    3.802581] ata1.00: disabling queued TRIM support
> Feb 17 17:01:49 xen17 vmunix: [    3.805074] ata1.00: disabling queued TRIM support
> 
> 
> from libata-core.c
> 
>  { "Samsung SSD 870*",  NULL, ATA_HORKAGE_NO_NCQ_TRIM |
>   ATA_HORKAGE_ZERO_AFTER_TRIM |
>   ATA_HORKAGE_NO_NCQ_ON_ATI },
> 
> This fixed the disk corruption errors at the cost of dramatically
> reducing performance. (I'm not sure why because manual fstrim didn't
> improve things)

That's interesting. I have quite a few of these drives and haven't
noticed any problems. What kernel version introduced the above
workarounds?

$ sudo lsblk -do NAME,MODEL
NAME MODEL
sda  SAMSUNG_MZ7KM1T9HAJM-5
sdb  SAMSUNG_MZ7KM1T9HAJM-5
sdc  Samsung_SSD_870_EVO_4TB
sdd  Samsung_SSD_870_EVO_4TB
sde  ST4000LM016-1N2170
sdf  ST4000LM016-1N2170
sdg  SuperMicro_SSD
sdh  SuperMicro_SSD

Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting



Re: SOLVED Re: Disk corruption and performance issue.

2024-02-26 Thread Stefan Monnier
>>> You should not be running trim in a container/virtual machine
>> Why not? That's, in my case, basically saying "you should not be running
>> trim on a drive exported via iscsi" Perhaps I shouldn't be but I'd like
>> to understand why. Enabling thin_provisioning and fstrim works and gets
>> mapped to the underlying layers all the way down to the SSD.
>
> I guess you didn't understand the systemd timer that runs fstrim on
> the host.

How can the host properly run `fstrim` if it only sees a disk image and
may not know how that image is divided into partitions/filesystems?


Stefan



Re: SOLVED Re: Disk corruption and performance issue.

2024-02-26 Thread Gremlin

On 2/26/24 16:31, Tim Woodall wrote:

On Mon, 26 Feb 2024, Gremlin wrote:

re running fstrim in a vm.


The Host system takes care of it


I guess you've no idea what iscsi is. Because this makes no sense at
all. systemd or no systemd. The physical disk doesn't have to be
something the host system knows anything about.

Here's a thread of someone wanting to do fstrim from a vm with iscsi
mounted disks.

https://serverfault.com/questions/1031580/trim-unmap-zvol-over-iscsi


And another page suggesting you should.

https://gist.github.com/hostberg/86bfaa81e50cc0666f1745e1897c0a56

8.10.2. Trim/Discard

It is good practice to run fstrim (discard) regularly on VMs and
containers. This releases data blocks that the filesystem isn't using
anymore. It reduces data usage and resource load. Most modern operating
systems issue such discard commands to their disks regularly. You only
need to ensure that the Virtual Machines enable the disk discard option.


I would guess that if you use sparse-file-backed storage for a VM you'd
want the VM to run fstrim too, but this isn't a setup I've ever used so
perhaps it's nonsense.





Never mind, arguing with me will not solve your issue.






Re: SOLVED Re: Disk corruption and performance issue.

2024-02-26 Thread Tim Woodall

On Mon, 26 Feb 2024, Gremlin wrote:

re running fstrim in a vm.


The Host system takes care of it


I guess you've no idea what iscsi is. Because this makes no sense at
all. systemd or no systemd. The physical disk doesn't have to be
something the host system knows anything about.

Here's a thread of someone wanting to do fstrim from a vm with iscsi
mounted disks.

https://serverfault.com/questions/1031580/trim-unmap-zvol-over-iscsi


And another page suggesting you should.

https://gist.github.com/hostberg/86bfaa81e50cc0666f1745e1897c0a56

8.10.2. Trim/Discard

It is good practice to run fstrim (discard) regularly on VMs and
containers. This releases data blocks that the filesystem isn't using
anymore. It reduces data usage and resource load. Most modern operating
systems issue such discard commands to their disks regularly. You only
need to ensure that the Virtual Machines enable the disk discard option.


I would guess that if you use sparse-file-backed storage for a VM you'd
want the VM to run fstrim too, but this isn't a setup I've ever used so
perhaps it's nonsense.
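
One way to sanity-check that kind of setup, assuming a raw sparse image
file at a path like the (made-up) one below, is to compare allocated vs
apparent size before and after an fstrim inside the guest:

$ du -h --apparent-size /var/lib/libvirt/images/guest.img
$ du -h /var/lib/libvirt/images/guest.img
# run 'fstrim -av' inside the guest, then:
$ du -h /var/lib/libvirt/images/guest.img

If the allocated size drops, discards are reaching the backing file; if
not, the virtual disk probably isn't passing discard through.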



Re: SOLVED Re: Disk corruption and performance issue.

2024-02-26 Thread Gremlin

On 2/26/24 14:40, Tim Woodall wrote:

On Mon, 26 Feb 2024, Gremlin wrote:


Are you using systemd ?

No, I'm not


You should not be running trim in a container/virtual machine


Why not? That's, in my case, basically saying "you should not be running
trim on a drive exported via iscsi" Perhaps I shouldn't be but I'd like
to understand why. Enabling thin_provisioning and fstrim works and gets
mapped to the underlying layers all the way down to the SSD.


I guess you didn't understand the systemd timer that runs fstrim on the
host.




My underlying VG is less than 50% occupied, so I can trim the free space
by creating a LV and then removing it again (I have issue_discards set)

FWIW, I did issue fstrim in the VMs with no visible issues at all.
Perhaps I got lucky?


Here is some info: https://wiki.archlinux.org/title/Solid_state_drive


I don't see VM or virtual machine anywhere on that page.




Exactly, and you should not be running it in a VM/container. BTW,
systemd will not run fstrim in a container.
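
For reference, that comes from the stock fstrim.service shipped by
util-linux, which carries a condition excluding containers. Assuming the
unit is installed, something like this should show it:

$ systemctl cat fstrim.service | grep -i condition
ConditionVirtualization=!container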


The Host system takes care of it

Well, you can keep shooting yourself in the butt as long as you wish. I,
on the other hand, try to avoid that as much as I possibly can, as I need
to be able to sit down at times.




Re: SOLVED Re: Disk corruption and performance issue.

2024-02-26 Thread Tim Woodall

On Mon, 26 Feb 2024, Gremlin wrote:


Are you using systemd ?

No, I'm not


You should not be running trim in a container/virtual machine


Why not? That's, in my case, basically saying "you should not be running
trim on a drive exported via iscsi" Perhaps I shouldn't be but I'd like
to understand why. Enabling thin_provisioning and fstrim works and gets
mapped to the underlying layers all the way down to the SSD.

My underlying VG is less than 50% occupied, so I can trim the free space
by creating a LV and then removing it again (I have issue_discards set)
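
Roughly, that looks like this, assuming a VG called vg0 with
issue_discards=1 set in /etc/lvm/lvm.conf (names here are examples):

$ grep issue_discards /etc/lvm/lvm.conf
issue_discards = 1
$ sudo lvcreate -l 100%FREE -n trimtmp vg0
$ sudo lvremove -y vg0/trimtmp
# with issue_discards set, removing the LV discards the extents it occupied

Running blkdiscard on the temporary LV before removing it would do much
the same without relying on issue_discards.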

FWIW, I did issue fstrim in the VMs with no visible issues at all.
Perhaps I got lucky?


Here is some info: https://wiki.archlinux.org/title/Solid_state_drive


I don't see VM or virtual machine anywhere on that page.



Re: SOLVED Re: Disk corruption and performance issue.

2024-02-26 Thread Gremlin

On 2/26/24 13:25, Tim Woodall wrote:

TLDR; there was a firmware bug in a disk in the raid array resulting in
data corruption. A subsequent kernel workaround dramatically reduced the
disk performance (probably just writes, but I didn't confirm).


Initially, under heavy disk load I got errors like:


Preparing to unpack .../03-libperl5.34_5.34.0-5_arm64.deb ...
Unpacking libperl5.34:arm64 (5.34.0-5) ...
dpkg-deb (subprocess): decompressing archive 
'/tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb' 
(size=4015516) member 'data.tar': lzma error: compressed data is corrupt

dpkg-deb: error:  subprocess returned error exit status 2
dpkg: error processing archive 
/tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb 
(--unpack):
cannot copy extracted data for 
'./usr/lib/aarch64-linux-gnu/libperl.so.5.34.0' to 
'/usr/lib/aarch64-linux-gnu/libperl.so.5.34.0.dpkg-new': unexpected 
end of file or stream


The checksum will have been verified by apt during the download, but when
it comes to reading the downloaded deb back to unpack and install it, it
doesn't get the same data. The corruption can happen either at write time
(the file on disk is corrupted) or at read time (the file on disk has the
correct checksum but the read returns different data).



A second problem I got was 503 errors from apt-cacher-ng (which ran on
the same machine as the above error)



I initially assumed this was due to faulty memory, or possibly a faulty
CPU. But I assumed memory because the disk errors were happening in a VM
and no other VMs were affected. Because I always start the same VMs in
the same order I assumed they'd be using the same physical memory each
time.

However, nothing I could do would help track down where the memory
problem was. Everything worked perfectly except when using the disk
under load.

At this time I spent a significant amount of time migrating everything
important, including the big job that triggered this problem, off this
machine onto the pair. After that the corruption problems went away but
I continued to get periodic 503 errors from apt-cacher-ng.


I continued to worry at this on and off but failed to make any progress
in finding what was wrong. The version of the motherboard is no longer
available, otherwise I'd probably have bought another one. During this
time I also spent quite a lot of time ensuring that it was much easier
to move VMs between my two machines. I'd underestimated how tricky this
would be if the dodgy machine failed totally, something I became aware of
when I did migrate the problematic VM.


Late last year or early this year someone (possibly Andy Smith?) posted
a question about logical/physical sector sizes on SSDs. That set me off
investigating again as that's not something I'd thought of. That didn't
prove fruitful either but I did notice this in the kernel logs:

Feb 17 17:01:49 xen17 vmunix: [    3.802581] ata1.00: disabling queued TRIM support
Feb 17 17:01:49 xen17 vmunix: [    3.805074] ata1.00: disabling queued TRIM support



from libata-core.c

  { "Samsung SSD 870*",  NULL, ATA_HORKAGE_NO_NCQ_TRIM |
   ATA_HORKAGE_ZERO_AFTER_TRIM |
   ATA_HORKAGE_NO_NCQ_ON_ATI },

This fixed the disk corruption errors at the cost of dramatically
reducing performance. (I'm not sure why because manual fstrim didn't
improve things)


At this point I'd discovered that the big job that had been regularly
hitting corruption issues now completed. However, it was taking 19 hours
instead of 11 hours.

I ordered some new disks - I'd assumed both disks were affected, but
while writing this I notice that the "disabling queued TRIM support"
message prints twice for the same disk, not once per disk.

I thought one of these entries was my disk, but looking again now I see
I had a 1000MX500, which doesn't actually match.

  { "Crucial_CT*M500*",  NULL, ATA_HORKAGE_NO_NCQ_TRIM |
   ATA_HORKAGE_ZERO_AFTER_TRIM },
  { "Crucial_CT*MX100*",  "MU01", ATA_HORKAGE_NO_NCQ_TRIM |
   ATA_HORKAGE_ZERO_AFTER_TRIM },

While waiting for my disks I started looking at the apt-cacher-ng
503 problem - which has continued to bug me. I got lucky and discovered
a way I could almost always trigger it.

I managed to track that down to a race condition when updating the
Release files if multiple machines request the same file at the same
moment.

After finding a fix I found this bug reporting the same problem:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1022043

There is now a patch attached to that bug that I've been running for a
few weeks without a single 503 error.

And Sunday I replaced the two disks with new ones. Today that big job
completed in 10h15m.

Another thing I notice, although I'm not sure I understand what is going
on, is that my iscsi disks all have
    Thin-provisioning: No

This means that fstrim on the vm doesn't work. Switching them to Yes and
it does. So I'm not exactly sure where the queued trim was coming from
in the first place.


Are you using systemd ?


SOLVED Re: Disk corruption and performance issue.

2024-02-26 Thread Tim Woodall

TLDR; there was a firmware bug in a disk in the raid array resulting in
data corruption. A subsequent kernel workaround dramatically reduced the
disk performance (probably just writes, but I didn't confirm).


Initially, under heavy disk load I got errors like:


Preparing to unpack .../03-libperl5.34_5.34.0-5_arm64.deb ...
Unpacking libperl5.34:arm64 (5.34.0-5) ...
dpkg-deb (subprocess): decompressing archive 
'/tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb' 
(size=4015516) member 'data.tar': lzma error: compressed data is corrupt

dpkg-deb: error:  subprocess returned error exit status 2
dpkg: error processing archive 
/tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb (--unpack):
cannot copy extracted data for 
'./usr/lib/aarch64-linux-gnu/libperl.so.5.34.0' to 
'/usr/lib/aarch64-linux-gnu/libperl.so.5.34.0.dpkg-new': unexpected end of 
file or stream


The checksum will have been verified by apt during the download, but when
it comes to reading the downloaded deb back to unpack and install it, it
doesn't get the same data. The corruption can happen either at write time
(the file on disk is corrupted) or at read time (the file on disk has the
correct checksum but the read returns different data).
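
A way to double-check that by hand, assuming the .deb is still in apt's
cache (package name and path here are just taken from the error above),
is to compare the cached file against the hash in apt's metadata:

$ apt-cache show libperl5.34 | grep ^SHA256
$ sha256sum /var/cache/apt/archives/libperl5.34_5.34.0-5_arm64.deb
# repeating the sha256sum under load and getting different answers would
# point at the read path rather than the write path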



A second problem I got was 503 errors from apt-cacher-ng (which ran on
the same machine as the above error)



I initially assumed this was due to faulty memory, or possibly a faulty
CPU. But I assumed memory because the disk errors were happening in a VM
and no other VMs were affected. Because I always start the same VMs in
the same order I assumed they'd be using the same physical memory each
time.

However, nothing I could do would help track down where the memory
problem was. Everything worked perfectly except when using the disk
under load.

At this time I spent a significant amount of time migrating everything
important, including the big job that triggered this problem, off this
machine onto the pair. After that the corruption problems went away but
I continued to get periodic 503 errors from apt-cacher-ng.


I continued to worry at this on and off but failed to make any progress
in finding what was wrong. The version of the motherboard is no longer
available, otherwise I'd probably have bought another one. During this
time I also spent quite a lot of time ensuring that it was much easier
to move VMs between my two machines. I'd underestimated how tricky this
would be if the dodgy machine failed totally, something I became aware of
when I did migrate the problematic VM.


Late last year or early this year someone (possibly Andy Smith?) posted
a question about logical/physical sector sizes on SSDs. That set me off
investigating again as that's not something I'd thought of. That didn't
prove fruitful either but I did notice this in the kernel logs:

Feb 17 17:01:49 xen17 vmunix: [    3.802581] ata1.00: disabling queued TRIM support
Feb 17 17:01:49 xen17 vmunix: [    3.805074] ata1.00: disabling queued TRIM support


from libata-core.c

 { "Samsung SSD 870*",  NULL, ATA_HORKAGE_NO_NCQ_TRIM |
  ATA_HORKAGE_ZERO_AFTER_TRIM |
  ATA_HORKAGE_NO_NCQ_ON_ATI },

This fixed the disk corruption errors at the cost of dramatically
reducing performance. (I'm not sure why because manual fstrim didn't
improve things)
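
A quick way to check whether that entry is being applied to a particular
drive (sda here is just an example device):

$ dmesg | grep -i 'queued trim'
$ cat /sys/block/sda/device/model
$ sudo smartctl -i /dev/sda | grep -E 'Device Model|Firmware Version'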


At this point I'd discovered that the big job that had been regularly
hitting corruption issues now completed. However, it was taking 19 hours
instead of 11 hours.

I ordered some new disks - I'd assumed both disks were affected, but
while writing this I notice that the "disabling queued TRIM support"
message prints twice for the same disk, not once per disk.

I thought one of these entries was my disk, but looking again now I see
I had a 1000MX500, which doesn't actually match.

 { "Crucial_CT*M500*",  NULL, ATA_HORKAGE_NO_NCQ_TRIM |
  ATA_HORKAGE_ZERO_AFTER_TRIM },
 { "Crucial_CT*MX100*",  "MU01", ATA_HORKAGE_NO_NCQ_TRIM |
  ATA_HORKAGE_ZERO_AFTER_TRIM },

While waiting for my disks I started looking at the apt-cacher-ng
503 problem - which has continued to bug me. I got lucky and discovered
a way I could almost always trigger it.

I managed to track that down to a race condition when updating the
Release files if multiple machines request the same file at the same
moment.

After finding a fix I found this bug reporting the same problem:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1022043

There is now a patch attached to that bug that I've been running for a
few weeks without a single 503 error.

And Sunday I replaced the two disks with new ones. Today that big job
completed in 10h15m.

Another thing I notice, although I'm not sure I understand what is going
on, is that my iscsi disks all have
   Thin-provisioning: No

This means that fstrim on the vm doesn't work. Switching them to Yes and
it does. So I'm not exactly sure where the queued trim was coming from
in the first place.
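
From inside the VM, whether the exported disk advertises discard at all
can be checked before trying fstrim (xvda is just an example device name):

$ lsblk -D /dev/xvda
# non-zero DISC-GRAN/DISC-MAX means the device accepts discards
$ sudo fstrim -v /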

I also need to check the version of tgt in sid because there doesn't
seem to be an option to switch this in the config.
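
If the packaged tgt does support it, the setting can at least be flipped
at runtime with tgtadm; a sketch, assuming target id 1 and LUN 1, and
assuming the build recognises the thin_provisioning parameter:

$ sudo tgtadm --lld iscsi --mode logicalunit --op update --tid 1 --lun 1 \
  --params thin_provisioning=1
$ sudo tgtadm --lld iscsi --mode target --op show | grep -i thin

Whether that survives a restart is another matter - it would need
tgt-admin/targets.conf to pick it up, which is the bit that seems to be
missing.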