On 5/1/24 14:38, mike tancsa wrote:
Kind of struggling to check whether TRIM is actually working with my
SSDs on RELENG_14 under ZFS.
On a pool that has almost no files on it (capacity at 0% of 3TB),
shouldn't
zpool trim -w <pool> be almost instant after a couple of runs?
Instead it always takes about 10 minutes to complete.
Looking at the stats,
kstat.zfs.tortank1.misc.iostats.trim_bytes_failed: 0
kstat.zfs.tortank1.misc.iostats.trim_extents_failed: 0
kstat.zfs.tortank1.misc.iostats.trim_bytes_skipped: 2743435264
kstat.zfs.tortank1.misc.iostats.trim_extents_skipped: 253898
kstat.zfs.tortank1.misc.iostats.trim_bytes_written: 14835526799360
kstat.zfs.tortank1.misc.iostats.trim_extents_written: 1169158
What are these skipped bytes, and why are they being skipped?
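(If it helps: as far as I know, OpenZFS deliberately skips free extents smaller than the zfs_trim_extent_bytes_min tunable, 32 KiB by default, so small ranges land in the skipped counters rather than failing. A quick back-of-the-envelope check against the counters above:)

```shell
# Rough check: average size of a skipped extent, from the kstat
# counters above (trim_bytes_skipped / trim_extents_skipped).
awk 'BEGIN {
    bytes = 2743435264
    extents = 253898
    printf "avg skipped extent: %.0f bytes\n", bytes / extents
}'
# Prints roughly 10805 bytes -- well under the 32 KiB default for
# zfs_trim_extent_bytes_min, so these look like sub-minimum extents
# being skipped on purpose, not errors.
```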
One of the drives for example
sysctl -a kern.cam.ada.0
kern.cam.ada.0.trim_ticks: 0
kern.cam.ada.0.trim_goal: 0
kern.cam.ada.0.sort_io_queue: 0
kern.cam.ada.0.rotating: 0
kern.cam.ada.0.unmapped_io: 1
kern.cam.ada.0.flags:
0x1be3bde<CAN_48BIT,CAN_FLUSHCACHE,CAN_NCQ,CAN_DMA,WAS_OTAG,CAN_TRIM,OPEN,SCTX_INIT,CAN_POWERMGT,CAN_DMA48,CAN_LOG,CAN_WCACHE,CAN_RAHEAD,PROBED,ANNOUNCED,DIRTY,PIM_ATA_EXT,UNMAPPEDIO>
kern.cam.ada.0.max_seq_zones: 0
kern.cam.ada.0.optimal_nonseq_zones: 0
kern.cam.ada.0.optimal_seq_zones: 0
kern.cam.ada.0.zone_support: None
kern.cam.ada.0.zone_mode: Not Zoned
kern.cam.ada.0.write_cache: -1
kern.cam.ada.0.read_ahead: -1
kern.cam.ada.0.trim_lbas: 7771432624
kern.cam.ada.0.trim_ranges: 371381
kern.cam.ada.0.trim_count: 310842
kern.cam.ada.0.delete_method: DSM_TRIM
If I take one of the disks out of the pool, replace it with a spare,
and run a manual trim, it seems to work.
I had a hard time seeing evidence of this at the disk level while
fiddling with TRIM recently. It appeared that at least some counters are
driver- and operation-specific. For example, the da driver appears to
update counters in some paths but not others; I assume ada behaves
differently. There is a bug report for da, but I haven't seen any
feedback on it yet:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277673
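Given the spotty accounting, one workaround is to snapshot the per-disk CAM trim counters around a single run and diff them. A sketch — the sysctl names come from your output above, but the snapshot values here are made-up placeholders, and only the awk delta step is portable:

```shell
# On the FreeBSD box you would capture real snapshots with e.g.:
#   sysctl kern.cam.ada.0.trim_count kern.cam.ada.0.trim_lbas > before
#   zpool trim -w tortank1
#   sysctl kern.cam.ada.0.trim_count kern.cam.ada.0.trim_lbas > after
# Placeholder snapshots so the delta step can be demonstrated:
cat > before <<'EOF'
kern.cam.ada.0.trim_count: 310842
kern.cam.ada.0.trim_lbas: 7771432624
EOF
cat > after <<'EOF'
kern.cam.ada.0.trim_count: 310980
kern.cam.ada.0.trim_lbas: 7791432624
EOF
# First pass stores the "before" values, second pass prints deltas.
# A zero delta after a trim run would mean the driver never saw
# the deletes.
awk 'NR==FNR { a[$1] = $2; next }
     { printf "%s +%d\n", $1, $2 - a[$1] }' before after
```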
You could run gstat with the -d flag during the window when the delete
operations are expected to occur. That should give you a real-time view
of what's happening at the disk level, though it may not tell you more
than you're already seeing.
For example, here is one disk in the pool that was taking a long time
on each zpool trim:
# time trim -f /dev/ada1
trim /dev/ada1 offset 0 length 1000204886016
0.000u 0.057s 1:29.33 0.0% 5+184k 0+0io 0pf+0w
and then if I re-run it
# time trim -f /dev/ada1
trim /dev/ada1 offset 0 length 1000204886016
0.000u 0.052s 0:04.15 1.2% 1+52k 0+0io 0pf+0w
90 seconds the first time, then 4 seconds on the re-run.
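For scale, quick arithmetic on those two runs (assuming trim(8) really covered the whole 1000204886016-byte range both times):

```shell
awk 'BEGIN {
    bytes = 1000204886016      # length reported by trim(8) above
    printf "first pass : %.1f GB/s\n", bytes / 89.33 / 1e9
    printf "second pass: %.1f GB/s\n", bytes / 4.15 / 1e9
}'
# -> first pass : 11.2 GB/s
# -> second pass: 241.0 GB/s
```

Both figures are far beyond what the drive could physically write, which fits TRIM being a metadata operation inside the firmware; the much faster second pass suggests the firmware short-circuits ranges that are already trimmed.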
-Matthew