Re: ATA TRIM?
On Sun, Dec 25, 2022 at 10:27:49AM -0500, Mouse wrote:
> >> According to that PDF, dholland is wrong.
> > I fail to see a behaviour that would be allowed due to dholland@'s
> > definition, but not according to the one you cited, nor the other way
> > round.
>
> A read returning the pre-TRIM contents. Two of the options
> specifically state "independent of the previously written value"; the
> third is simply zero, which is also independent of the previously
> written value. dholland wrote
>
> > The state of the data after TRIM is unspecified; you might read the
> > old data, you might read zeros or ones, you might (I think) even read
> > something else.
>
> and, as I read that PDF, "you might read the old data" is specifically
> disallowed.

I believe the drive is allowed to discard the request, in which case
you'll read the old data. I expect at that point the block is not
"trimmed" but ... that's a detail.

I know at least some drives will ignore single-sector trims.

-- 
David A. Holland
dholl...@netbsd.org
Re: ATA TRIM?
On Sun, Dec 25, 2022 at 11:10:44AM -0500, Mouse wrote:
> >> I find it far more plausible that I'm doing something wrong.
> > Or maybe the drive just doesn't obey the spec?
>
> That's possible, I suppose. But it's a brand new Kingston SSD

I've used quite a few Kingston SSDs in BSD systems over the last ten
years or so, and whilst they work, they have all shown some odd
behaviours compared to other brands.

Specifically, after a couple of years of use they suddenly become very
slow to _read_; I had one that was literally reading at about 2 or 3
MB/s compared to ~140 MB/s previously in the same system.

A secerase has fixed this issue every time I've encountered it. BUT,
the secerase leaves the device bricked for about 60 minutes, sometimes
longer, i.e. it blocks the SATA bus and most machines just hang at the
BIOS for about 45 seconds until they time out. Power cycling the
bricked device seemingly does nothing to help; you just have to wait
until it eventually reverts to normal behaviour.

We have at least five Kingston SSDs here that have all exhibited this
behaviour. I've never seen it with any other SSD, and have used plenty
of Crucial, Corsair, and others.

> The packaging promises free technical support. I suppose I should try
> to chase down a contact (the packaging gives no hint whom to contact
> for that promised support) and ask. At worst I'll be told nothing
> useful.

That would be interesting.
Re: ATA TRIM?
>> I find it far more plausible that I'm doing something wrong.
> Or maybe the drive just doesn't obey the spec?

That's possible, I suppose. But it's a brand new Kingston SSD, which I
would expect to support TRIM. And it self-identifies as supporting
TRIM.

The packaging promises free technical support. I suppose I should try
to chase down a contact (the packaging gives no hint whom to contact
for that promised support) and ask. At worst I'll be told nothing
useful.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: ATA TRIM?
On Sun, Dec 25, 2022 at 10:27:49AM -0500, Mouse wrote:
> I find it far more plausible that I'm doing something wrong.

Or maybe the drive just doesn't obey the spec?

I've got a disk here that, when sent a SECERASE, writes 0x00 to the
first 1 GB of the media and leaves the rest unchanged. That clearly
violates the spec.
Re: ATA TRIM?
>> According to that PDF, dholland is wrong.
> I fail to see a behaviour that would be allowed due to dholland@'s
> definition, but not according to the one you cited, nor the other way
> round.

A read returning the pre-TRIM contents. Two of the options
specifically state "independent of the previously written value"; the
third is simply zero, which is also independent of the previously
written value. dholland wrote

> The state of the data after TRIM is unspecified; you might read the
> old data, you might read zeros or ones, you might (I think) even read
> something else.

and, as I read that PDF, "you might read the old data" is specifically
disallowed. You may read zeros or ones, or something else, but the
only way you'll read the old data is if the old data matches what the
drive's algorithm happens to return for those sectors (for example, if
the drive returns zeros and zeros were what you had written).

It is theoretically possible that the data I wrote happens to match
what the drive returns for trimmed sectors. Given the data, I find
that extremely unlikely. (I may try again with different data, just in
case, but I still don't like the way the command is timing out. I find
it far more plausible that I'm doing something wrong.)

Mouse
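Mouse's test amounts to writing a known pattern, issuing the TRIM, and reading the sectors back. A minimal sketch of classifying what comes back; classify_readback and the RB_* names are hypothetical. Per dholland's point elsewhere in the thread, an "unchanged" result by itself doesn't prove the drive did nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical categories for a sector read back after TRIM. */
enum readback {
    RB_UNCHANGED, /* still matches the pattern written before the TRIM */
    RB_ZEROS,     /* every byte is 0x00 */
    RB_ONES,      /* every byte is 0xFF */
    RB_OTHER      /* anything else */
};

/* Returns nonzero if all len bytes of buf equal v. */
static int all_bytes(const unsigned char *buf, size_t len, unsigned char v)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != v)
            return 0;
    return 1;
}

/*
 * Classify a sector read back after TRIM.  The "unchanged" test runs
 * first: if the written pattern was itself all zeros, a zero read is
 * indistinguishable from an untouched sector (the coincidence above).
 */
enum readback classify_readback(const unsigned char *got,
                                const unsigned char *written, size_t len)
{
    assert(got != NULL && written != NULL);
    if (memcmp(got, written, len) == 0)
        return RB_UNCHANGED;
    if (all_bytes(got, len, 0x00))
        return RB_ZEROS;
    if (all_bytes(got, len, 0xFF))
        return RB_ONES;
    return RB_OTHER;
}
```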
Re: ATA TRIM?
> According to that PDF, dholland is wrong.

I fail to see a behaviour that would be allowed due to dholland@'s
definition, but not according to the one you cited, nor the other way
round.
Re: ATA TRIM?
[dholland]
> The state of the data after TRIM is unspecified; you might read the
> old data, you might read zeros or ones, you might (I think) even read
> something else.

[RVP]
> OK, I've now actually looked at what the spec[1] says instead of
> relying on my faulty recall of stuff I read on lwn.net years ago.
> [1] [...]
> https://web.archive.org/web/20200616054353if_/http://t13.org/Documents/UploadedDocuments/docs2017/di529r18-ATAATAPI_Command_Set_-_4.pdf

According to that PDF, dholland is wrong. PDF page 150, printed page
113, includes examples of "properties associated with trimmed logical
sectors", including

a) no storage resources; and
b) read commands return:
   A) a nondeterministic value that is independent of the previously
      written value;
   B) a deterministic value that is independent of the previously
      written value; or
   C) zero.

though it seems to me (b)(C) is actually a special case of (b)(B). See
table 33, later on that page, for more.

Mouse
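A drive advertises which of the read-after-TRIM behaviours (b)(A)-(C) it implements in its IDENTIFY DEVICE data. A minimal sketch, assuming my reading of ACS: word 169 bit 0 means TRIM is supported, word 69 bit 14 means deterministic read after TRIM, and word 69 bit 5 means zeroes are read after TRIM (Linux's libata tests the same 0x4020 mask in word 69). The function and enum names are hypothetical; check the bit positions against the spec before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a drive claims about reads of trimmed sectors. */
enum trim_read {
    TRIM_UNSUPPORTED,      /* word 169 bit 0 clear: no DSM TRIM */
    TRIM_NONDETERMINISTIC, /* option (b)(A) */
    TRIM_DETERMINISTIC,    /* option (b)(B) */
    TRIM_ZERO              /* option (b)(C) */
};

/* Word and bit positions as I read ACS; verify against the spec. */
#define ID_WORD_ADDL_SUPPORT 69
#define ID_WORD_DSM          169
#define ID_DSM_TRIM          (1u << 0)  /* word 169: TRIM supported */
#define ID_DRAT              (1u << 14) /* word 69: deterministic read after TRIM */
#define ID_RZAT              (1u << 5)  /* word 69: zeroes read after TRIM */

/* id[] is the 256-word IDENTIFY DEVICE data, already in host byte order. */
enum trim_read classify_trim_reads(const uint16_t id[256])
{
    assert(id != NULL);
    if (!(id[ID_WORD_DSM] & ID_DSM_TRIM))
        return TRIM_UNSUPPORTED;
    if (!(id[ID_WORD_ADDL_SUPPORT] & ID_DRAT))
        return TRIM_NONDETERMINISTIC;
    if (id[ID_WORD_ADDL_SUPPORT] & ID_RZAT)
        return TRIM_ZERO;
    return TRIM_DETERMINISTIC;
}
```

Whichever category a drive reports, it describes the drive's claim about trimmed sectors; it can't tell you whether a particular TRIM was actually honoured.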
Re: ATA TRIM?
On Mon, Dec 12, 2022 at 11:53:56PM +1100, matthew green wrote:
> maybe port that tool back, it's also supposed to match the
> linux command of the same name. it's not in netbsd-9, but
> last i tried, the interfaces the -current tool uses are
> available in -9 kernels.

The trim/discard plumbing appeared in -7.

-- 
David A. Holland
dholl...@netbsd.org
Re: ATA TRIM?
> are you trying to trim a really large section at once? i think
> that's what i see:
>> [ - root] 3> date; ./trim /dev/rwd1d 4 2; date

That means "first six bytes contain 4, LE; last two bytes contain 2,
LE". I thought that in turn meant "2 sectors at offset 4". Apparently
it actually means "2 * max_dsm_blocks at offset 4", but max_dsm_blocks
is 8 for this device, so that's still only 8K.

> at least in my experience, the problem is that most devices take a
> while to handle a TRIM request, longer than the 30s timeout typically
> used.

That's...odd. How can it be useful if it takes that long? Is the
intent that it be useful only for very occasional "erase this whole
filesystem" use, or what? I thought it was intended for routine
filesystem use upon deleting files.

> this is why blkdiscard(8) defaults to 32MiB chunks.

I once did what I thought was trying to trim 16M, but my current
understanding says that attempt would have been 128M. That didn't work
any better.

I just tried increasing the timeout to 30 (ie, five minutes) and
trimming offset 0 size 8, which I now think for this device (with
max_dsm_blocks 8) should mean 64 (interface) sectors, ie, 32K. It
still timed out, with the same followup timeouts. Note the date output
here; it took five minutes for the TRIM to time out, then thirty
seconds for wd_flushcache.
[ - root] 4> date; trim /dev/rwd1d 0 8; date
Mon Dec 12 08:22:29 EST 2022
TRIM wd1: arg 00 00 00 00 00 00 08 00
Version 2040.283, max DSM blocks 8
TRIM wd1: calling exec
piixide1:0:1: lost interrupt
	type: ata tc_bcount: 512 tc_skip: 0
TRIM wd1: returned 1
ATAIOCTRIM workd
wd1: wd_flushcache: status=128
Mon Dec 12 08:27:59 EST 2022
[ - root] 5> dd if=/dev/rwd1d of=/dev/null count=8
piixide1:0:1: wait timed out
wd1d: device timeout reading fsbn 0 (wd1 bn 0; cn 0 tn 0 sn 0), retrying
wd1: soft error (corrected)
8+0 records in
8+0 records out
4096 bytes transferred in 0.005 secs (819200 bytes/sec)
[ - root] 6>

> maybe port that tool back,

I'll try to have a look at it. I haven't been trying to match the -9
userland API, though, so I'm not sure how useful it will actually be.
It may point me in a useful direction, though.

> it's also supposed to match the linux command of the same name. it's
> not in netbsd-9, but last i tried, the interfaces the -current tool
> uses are available in -9 kernels.

I did bring over the 9.2 syssrc set, so I should be able to figure
_something_ out.

Mouse
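The `arg` bytes printed in the transcript are a single 8-byte range entry. A minimal sketch of that encoding, assuming the layout RVP describes elsewhere in the thread (bytes 0-5: starting LBA, bytes 6-7: count, both little-endian, with the count in logical sectors rather than multiples of max_dsm_blocks); dsm_encode_range is a hypothetical helper, not the NetBSD code.

```c
#include <assert.h>
#include <stdint.h>

/* Largest LBA that fits in the 48-bit field of a range entry. */
#define DSM_LBA_MAX ((UINT64_C(1) << 48) - 1)

/*
 * Encode one TRIM range entry: bytes 0-5 hold the starting LBA and
 * bytes 6-7 hold the sector count, both little-endian.
 */
void dsm_encode_range(uint8_t out[8], uint64_t lba, uint16_t count)
{
    assert(lba <= DSM_LBA_MAX);
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t)(lba >> (8 * i));
    out[6] = (uint8_t)(count & 0xFF);
    out[7] = (uint8_t)(count >> 8);
}
```

Encoding (0, 8) yields 00 00 00 00 00 00 08 00, the same bytes as the transcript, and (0, 32768) yields 00 00 00 00 00 00 00 80, matching the earlier `trim /dev/rwd1d 0 32768` run.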
Re: ATA TRIM?
On Sat, 10 Dec 2022, Mouse wrote:

>>> OK, so any requests >4K will have to be packaged into further range
>>> requests [...]
>> This isn't right. Bytes 7 & 8 of a TRIM range request form a
>> counter. So, a counter of 1 = (1 x max_dsm_blocks); 2 = (2 x
>> max_dsm_blocks) up to 0xFFFF counts.
>
> So is max_dsm_blocks misnamed, or is it just being abused as a
> dsm_granularity value by TRIM, whereas other DSM commands do use it as
> a maximum? If the former, I'd like to rename it in my tree.

OK, I've now actually looked at what the spec[1] says instead of
relying on my faulty recall of stuff I read on lwn.net years ago.

So:

A single range is 8 bytes: 6 bytes for the starting LBA + 2 for the
count of sectors (logical, so 512 bytes). This makes 512 * 65535 bytes
that can be trimmed by a single range.

You can have 64 ranges in a 512-byte DSM packet. This makes
64 * 512 * 65535 bytes.

max_dsm_blocks is the maximum no. of 512-byte DSM packets which the
drive will accept. In your case, 8 blocks = 4K.

Therefore your drive can trim, in a single DSM request, a maximum of
8 * 64 * 512 * 65535 bytes = ~16 GiB.

> You clearly know a lot more about the relevant commands than I do,

Clearly not :-(

-RVP

[1]: https://ata.wiki.kernel.org/index.php/Developer_Resources
     and
     https://web.archive.org/web/20200616054353if_/http://t13.org/Documents/UploadedDocuments/docs2017/di529r18-ATAATAPI_Command_Set_-_4.pdf
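RVP's arithmetic, written out as code. A sketch assuming 512-byte logical sectors and that max_dsm_blocks (IDENTIFY word 105) counts 512-byte payload blocks; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DSM_BLOCK_BYTES     512u                               /* one payload block */
#define DSM_ENTRY_BYTES     8u                                 /* one range entry */
#define DSM_ENTRIES_PER_BLK (DSM_BLOCK_BYTES / DSM_ENTRY_BYTES) /* 64 */
#define DSM_MAX_RANGE_SECTS 0xFFFFu                            /* 16-bit count */

/* Largest payload the drive accepts: 8 blocks -> 4 KiB. */
uint32_t dsm_payload_bytes(uint16_t max_dsm_blocks)
{
    return (uint32_t)max_dsm_blocks * DSM_BLOCK_BYTES;
}

/* Upper bound on bytes a single DSM TRIM command can describe. */
uint64_t dsm_max_trim_bytes(uint16_t max_dsm_blocks, uint32_t sector_bytes)
{
    assert(sector_bytes > 0);
    return (uint64_t)max_dsm_blocks * DSM_ENTRIES_PER_BLK *
           DSM_MAX_RANGE_SECTS * sector_bytes;
}
```

For max_dsm_blocks = 8 this gives a 4096-byte payload and 17,179,607,040 bytes (just under 16 GiB) per command.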
re: ATA TRIM?
are you trying to trim a really large section at once? i think
that's what i see:

> [ - root] 3> date; ./trim /dev/rwd1d 4 2; date

at least in my experience, the problem is that most devices take a
while to handle a TRIM request, longer than the 30s timeout typically
used. this is why blkdiscard(8) defaults to 32MiB chunks.

maybe port that tool back, it's also supposed to match the linux
command of the same name. it's not in netbsd-9, but last i tried, the
interfaces the -current tool uses are available in -9 kernels.

.mrg.
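The point of chunking is to keep each request short enough to finish inside the command timeout. A minimal sketch of such a loop; for_each_chunk, chunk_fn, and the tally callback are hypothetical, and this is not blkdiscard(8)'s actual source. With 512-byte sectors a 32 MiB chunk is 65536 sectors, one more than a single range entry can hold, so the layer below still has to split each chunk into at least two entries.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_BYTES (UINT64_C(32) << 20) /* 32 MiB, the default mentioned above */

/* Hypothetical per-chunk action (e.g. one discard request); 0 on success. */
typedef int (*chunk_fn)(uint64_t off, uint64_t len, void *arg);

/*
 * Walk [off, off + len) in pieces of at most chunk bytes.  Stops at the
 * first failure and returns its error; returns 0 if every chunk succeeded.
 */
int for_each_chunk(uint64_t off, uint64_t len, uint64_t chunk,
                   chunk_fn fn, void *arg)
{
    assert(chunk > 0);
    while (len > 0) {
        uint64_t n = len < chunk ? len : chunk;
        int err = fn(off, n, arg);
        if (err != 0)
            return err;
        off += n;
        len -= n;
    }
    return 0;
}

/* Example callback: counts chunks and sums their lengths. */
struct chunk_tally { unsigned n; uint64_t bytes; };

int tally_chunk(uint64_t off, uint64_t len, void *arg)
{
    struct chunk_tally *t = arg;
    (void)off;
    t->n++;
    t->bytes += len;
    return 0;
}
```

A 100 MiB request becomes four calls: three of 32 MiB and one of 4 MiB.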
Re: ATA TRIM?
>> OK, so any requests >4K will have to be packaged into further range
>> requests [...]
> This isn't right. Bytes 7 & 8 of a TRIM range request form a
> counter. So, a counter of 1 = (1 x max_dsm_blocks); 2 = (2 x
> max_dsm_blocks) up to 0xFFFF counts.

So is max_dsm_blocks misnamed, or is it just being abused as a
dsm_granularity value by TRIM, whereas other DSM commands do use it as
a maximum? If the former, I'd like to rename it in my tree.

> And you can have 64 range requests (contiguous or disjoint) in a 512
> byte DSM payload.

You clearly know a lot more about the relevant commands than I do,
though admittedly at the moment that's a very very low bar.

> Start with a `count' of 1 after you set the LBA48 flag.

Once I figure out how to get some analog to LBA48, at least. :)

Yes, my code sets r_count to 1 because the code I started with does
the analogous thing. Until I saw your email, I had no idea there was
even any way to _represent_ multiple ranges in a single request.

Mouse
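Representing multiple ranges is just a matter of filling more 8-byte entries. Since each entry holds at most 65535 sectors and a 512-byte block holds 64 entries, a large contiguous range gets split across several entries of one payload. A minimal sketch, assuming 512-byte logical sectors and the little-endian entry layout described in the thread; dsm_pack is hypothetical. Unused entries are left zero, which, as far as I know, drives treat as empty.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DSM_BLOCK_BYTES 512u
#define DSM_ENTRY_BYTES 8u
#define DSM_RANGE_MAX   0xFFFFu

/*
 * Pack the sector range [lba, lba + nsect) into a DSM payload of nblocks
 * 512-byte blocks, using entries of at most 65535 sectors each.  Returns
 * the number of sectors covered; if that is less than nsect, the payload
 * filled up and the remainder needs another command.
 */
uint64_t dsm_pack(uint8_t *buf, unsigned nblocks, uint64_t lba, uint64_t nsect)
{
    size_t nentries = (size_t)nblocks * (DSM_BLOCK_BYTES / DSM_ENTRY_BYTES);
    uint64_t done = 0;

    assert(buf != NULL);
    memset(buf, 0, (size_t)nblocks * DSM_BLOCK_BYTES);
    for (size_t i = 0; i < nentries && done < nsect; i++) {
        uint64_t left = nsect - done;
        uint16_t n = (uint16_t)(left < DSM_RANGE_MAX ? left : DSM_RANGE_MAX);
        uint64_t start = lba + done;
        uint8_t *e = buf + i * DSM_ENTRY_BYTES;

        for (int b = 0; b < 6; b++)          /* 48-bit LBA, little-endian */
            e[b] = (uint8_t)(start >> (8 * b));
        e[6] = (uint8_t)(n & 0xFF);          /* 16-bit count, little-endian */
        e[7] = (uint8_t)(n >> 8);
        done += n;
    }
    return done;
}
```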
Re: ATA TRIM?
>> I tried trimming 8 at 0. Still the same syndrome: TRIM timeout,
>> flush timeout, device timeout reading [...]
> You may have to set the AT_LBA48 flag (not sure if this is present on
> 5.2)

It is not. 5.2 has an ATA_LBA48 flag, going with the flags field of
struct ata_bio, but no LBA48 flag for ata_command.flags. My evolution
of 5.2 has AT_READREG48, which I added as part of my attempt to support
HPAs. But that's the closest thing I see, and that's not really close
enough to be useful here.

> so that `wdccommandext' gets called rather than `wdccommand' for the
> ATA_DATA_SET_MANAGEMENT command. All this from [FreeBSD]

And presumably the NetBSD wdccommand/wdccommandext difference matches
the FreeBSD one closely enough for that to be relevant? I shall have
to read wdccommand* over in more detail.

Mouse
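The LBA48 question matters because, as I understand ACS, DATA SET MANAGEMENT (opcode 06h) is defined as a 48-bit command, so it has to go out through the extended register sequence (wdccommandext rather than wdccommand). A sketch of the taskfile values as I read ACS and FreeBSD's ata_da.c; struct tf48 and dsm_trim_taskfile are hypothetical and do not match 5.2's struct ata_command. With the TRIM bit set in Features, Count gives the number of 512-byte payload blocks, which should not exceed max_dsm_blocks.

```c
#include <assert.h>
#include <stdint.h>

#define ATA_CMD_DSM           0x06 /* DATA SET MANAGEMENT */
#define ATA_DSM_TRIM          0x01 /* Features bit 0: TRIM */
#define DSM_ENTRIES_PER_BLOCK 64u  /* 512-byte block / 8-byte entry */

/* Hypothetical 48-bit taskfile image; field names are illustrative only. */
struct tf48 {
    uint8_t  command;
    uint16_t features;
    uint16_t count;    /* 512-byte payload blocks, not sectors or ranges */
    uint64_t lba;      /* only the low 48 bits are meaningful */
    uint8_t  device;
};

/* Fill in a DSM/TRIM taskfile for a payload holding nranges entries. */
void dsm_trim_taskfile(struct tf48 *tf, unsigned nranges)
{
    assert(tf != 0 && nranges > 0);
    tf->command  = ATA_CMD_DSM;
    tf->features = ATA_DSM_TRIM;
    tf->count    = (uint16_t)((nranges + DSM_ENTRIES_PER_BLOCK - 1) /
                              DSM_ENTRIES_PER_BLOCK);
    tf->lba      = 0;    /* not used for TRIM; the ranges are in the payload */
    tf->device   = 0x40; /* LBA addressing */
}
```

A single range therefore goes out with a Count of 1, which lines up with RVP's advice to start with a count of 1.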
Re: ATA TRIM?
On Fri, 9 Dec 2022, RVP wrote:

> OK, so any requests >4K will have to be packaged into further range
> requests [...]

This isn't right. Bytes 7 & 8 of a TRIM range request form a counter.
So, a counter of 1 = (1 x max_dsm_blocks); 2 = (2 x max_dsm_blocks) up
to 0xFFFF counts.

And you can have 64 range requests (contiguous or disjoint) in a 512
byte DSM payload.

But these are the maximum possible limits; and I don't know how many
`counts' would be valid in a synchronous request like yours. Start with
a `count' of 1 after you set the LBA48 flag.

-RVP
Re: ATA TRIM?
On Fri, 9 Dec 2022, Mouse wrote:

>> What is the value of `max_dsm_blocks' that your drive reports?
>> Unfortunately, atactl(8) doesn't show this currently.
>
> I added that - and the version numbers - to my printf.
> atap_ata_major is 2040, 0x7f8. atap_ata_minor is 283, 0x11b.
> max_dsm_blocks is 8.

OK, so any requests >4K will have to be packaged into further range
requests--subject to offset and length alignment constraints (for which
you'll have to carry a quirks table--though I can't find your device in
the FreeBSD driver).

> I tried trimming 8 at 0. Still the same syndrome: TRIM timeout,
> flush timeout, device timeout reading [...]

You may have to set the AT_LBA48 flag (not sure if this is present on
5.2) so that `wdccommandext' gets called rather than `wdccommand' for
the ATA_DATA_SET_MANAGEMENT command. All this from:

https://github.com/freebsd/freebsd-src/blob/main/sys/cam/ata/ata_da.c

Hope this helps,
-RVP
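One plausible way to honour an offset/length alignment constraint is to clip each request inward to the granularity, leaving the ragged edges untrimmed. A sketch under that assumption; trim_align and the granularity value are hypothetical and not taken from FreeBSD's quirks table or any other driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Shrink [start, start + len) (in sectors) to the largest sub-range whose
 * start and end are both multiples of gran.  Stores the aligned start in
 * *astart and returns the aligned length, or 0 if nothing aligned fits.
 */
uint64_t trim_align(uint64_t start, uint64_t len, uint64_t gran,
                    uint64_t *astart)
{
    assert(gran > 0 && astart != NULL);
    uint64_t end = start + len;
    uint64_t s = (start + gran - 1) / gran * gran; /* round start up */
    uint64_t e = end / gran * gran;                /* round end down */

    *astart = s;
    return e > s ? e - s : 0;
}
```

With a granularity of 8, sectors 1-20 clip to sectors 8-15; a 4-sector request at sector 3 clips to nothing.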
Re: ATA TRIM?
>> Okay, that now seems unlikely. I tried to TRIM 32M at zero.

(Actually, 16M: 32K blocks is 16M.)

> What is the value of `max_dsm_blocks' that your drive reports?
> Unfortunately, atactl(8) doesn't show this currently.

I added that - and the version numbers - to my printf. atap_ata_major
is 2040, 0x7f8. atap_ata_minor is 283, 0x11b. max_dsm_blocks is 8.

I tried trimming 8 at 0. Still the same syndrome: TRIM timeout, cache
flush timeout, device timeout reading - and I just now noticed that the
last timeout is a timeout reading wd*0*. This leads me to suspect that
it's the host hardware, not the drive, that's falling over here
(presumably trying to load dd to read wd1 with). Is that plausible?

I did another test. I tried to trim 8 at 0, but, first, I started a
loop that reads successive blocks of wd0, the OS's disk, one per second,
printing timestamps as it goes. wd0 access locks up during the TRIM
attempt. One read got through between that and the cache flush; it
locked up again during that. It then came back. But when I tried to
read wd1 it locked up again during that.

Dunno what all this means.

Mouse
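The wd0 probe loop described above is easy to reproduce. A minimal POSIX sketch; probe_reads is hypothetical, and you'd point it at the raw device for the disk you want to watch (for example /dev/rwd0d, by analogy with the /dev/rwd1d used in the transcripts). A stalled read shows up as a large "took" value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Read nblocks successive 512-byte blocks from path, sleeping delay
 * seconds between reads, and print how long each read took so that a
 * stall stands out.  Returns the number of full blocks read, or -1 if
 * path could not be opened.
 */
int probe_reads(const char *path, int nblocks, unsigned delay)
{
    char buf[512];
    int fd, i;

    assert(path != NULL && nblocks >= 0);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    for (i = 0; i < nblocks; i++) {
        time_t t0 = time(NULL);
        ssize_t n = pread(fd, buf, sizeof buf, (off_t)i * (off_t)sizeof buf);
        time_t t1 = time(NULL);

        printf("block %d: started %ld, took %lds, read %ld bytes\n",
               i, (long)t0, (long)difftime(t1, t0), (long)n);
        fflush(stdout);
        if (n != (ssize_t)sizeof buf)
            break;
        if (delay > 0)
            sleep(delay);
    }
    close(fd);
    return i;
}
```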
Re: ATA TRIM?
On Thu, 8 Dec 2022, Mouse wrote:

> Okay, that now seems unlikely. I tried to TRIM 32M at zero. (Much
> more than that seems implausible, since the request has only 16 bits
> of size, so the maximum representable size is 65535 blocks, or a
> smidgen under 64M. And zero certainly ought to be aligned.)

What is the value of `max_dsm_blocks' that your drive reports?
Unfortunately, atactl(8) doesn't show this currently.

-RVP
Re: ATA TRIM?
On Thu, 8 Dec 2022, Mouse wrote:

> I will have to dig into that more. It does seem to be waiting, in
> that the call does not return until the thirty seconds specified in
> the timeout field have elapsed. (It then takes about another 30s
> before printing the cache-flush timeout message and returning to
> userland.)
> [...]
> Why? cmd.flags specifies AT_WAIT, and as I remarked above it is
> indeed waiting, so cmd, on the kernel stack, should outlive the I/O
> attempt.

OK, I now see that the *_exec_command()s in 5.2 do wait if AT_WAIT is
set. 9.X does an ata_wait_cmd() for this.

-RVP
Re: ATA TRIM?
On Thu, Dec 08, 2022 at 11:44:59PM -0500, Mouse wrote:
> Since the data on the device is still there afterwards, I don't think
> [...]

The state of the data after TRIM is unspecified; you might read the
old data, you might read zeros or ones, you might (I think) even read
something else.

What you read, though, also doesn't indicate whether the device thinks
the blocks are available; that is, as far as I know it's legal for the
device to leave the mapping of those logical blocks to physical blocks
alone but mark the physical blocks for later reuse. So it's very
difficult to tell if it actually did anything.

The code in -9 does work, though, as far as we know.

Oh, I also remember having to switch some machine's BIOS to make the
controller appear as ahcisata and not something older. Don't remember
what the symptoms of not doing that were, though, and that was like 10
years ago.

-- 
David A. Holland
dholl...@netbsd.org
Re: ATA TRIM?
[Replying to two messages at once here, both from the same person]

[First message]

>> printf("TRIM %s: calling exec\n",device_xname(wd->sc_dev));
>> rv = wd->atabus->ata_exec_command(wd->drvp,);
>> printf("TRIM %s: returned %d\n",device_xname(wd->sc_dev),rv);
>> return(0);
> ata_exec_command() will start the command, but the completion of it
> is usually signalled by an interrupt. Presumably, the 9.2
> ATA-related code takes care of this, as ata_exec_command() takes an
> `xfer' parameter rather than a bare command struct. How does 5.2
> wait for ATA command completion?

I will have to dig into that more. It does seem to be waiting, in that
the call does not return until the thirty seconds specified in the
timeout field have elapsed. (It then takes about another 30s before
printing the cache-flush timeout message and returning to userland.)

Since the data on the device is still there afterwards, I don't think
it's just a question of not correctly handling completion. If it were,
I'd expect the operation to work in the sense of dropping the blocks
described by the argument values.

[Other message]

>>case ATAIOCTRIM:
>> { unsigned char rq[512];
>>   struct ata_command cmd;
...
>>   rv = wd->atabus->ata_exec_command(wd->drvp,);
>>   printf("TRIM %s: returned %d\n",device_xname(wd->sc_dev),rv);
>>   return(0);
>> }
> Ah, shouldn't `cmd' be allocated memory rather than being a
> locally-scoped variable?

Why? cmd.flags specifies AT_WAIT, and as I remarked above it is indeed
waiting, so cmd, on the kernel stack, should outlive the I/O attempt.

Mouse
Re: ATA TRIM?
On Wed, 7 Dec 2022, Mouse wrote:

> case ATAIOCTRIM:
>  { unsigned char rq[512];
>    struct ata_command cmd;
> [...]
>    printf("TRIM %s: calling exec\n",device_xname(wd->sc_dev));
>    rv = wd->atabus->ata_exec_command(wd->drvp,);
>    printf("TRIM %s: returned %d\n",device_xname(wd->sc_dev),rv);
>    return(0);
>  }
>  break;

Ah, shouldn't `cmd' be allocated memory rather than being a
locally-scoped variable?

-RVP
Re: ATA TRIM?
On Wed, 7 Dec 2022, Mouse wrote:

> printf("TRIM %s: calling exec\n",device_xname(wd->sc_dev));
> rv = wd->atabus->ata_exec_command(wd->drvp,);
> printf("TRIM %s: returned %d\n",device_xname(wd->sc_dev),rv);
> return(0);

ata_exec_command() will start the command, but the completion of it is
usually signalled by an interrupt. Presumably, the 9.2 ATA-related
code takes care of this, as ata_exec_command() takes an `xfer'
parameter rather than a bare command struct. How does 5.2 wait for ATA
command completion?

-RVP
Re: ATA TRIM?
>> [...TRIM...]
> It could perhaps be that the area you're trying to trim is too small,
> or badly aligned?

Okay, that now seems unlikely. I tried to TRIM 32M at zero. (Much
more than that seems implausible, since the request has only 16 bits of
size, so the maximum representable size is 65535 blocks, or a smidgen
under 64M. And zero certainly ought to be aligned.)

The behaviour is basically unchanged; apart from details like the
argument area, it looks like this:

[ - root] 3> trim /dev/rwd1d 0 32768
TRIM wd1: arg 00 00 00 00 00 00 00 80
TRIM wd1: calling exec
piixide1:0:1: lost interrupt
	type: ata tc_bcount: 512 tc_skip: 0
TRIM wd1: returned 1
ATAIOCTRIM workd
wd1: wd_flushcache: status=128
[ - root] 4>
[ - root] 4> dd if=/dev/rwd1d of=/dev/null count=64
piixide1:0:1: wait timed out
wd1d: device timeout reading fsbn 0 (wd1 bn 0; cn 0 tn 0 sn 0), retrying
wd1: soft error (corrected)
64+0 records in
64+0 records out
32768 bytes transferred in 0.008 secs (4096000 bytes/sec)
[ - root] 5>

That is, the request starts and nothing happens until the 30-second
timeout expires, at which point it reports "lost interrupt" and says it
worked. It then reports another timeout on cache flush. Attempting to
read gives _another_ timeout, from which it recovers and then works.
And, as before, reading the beginning of the drive indicates that the
first hundred sectors, at least, still retain the test data I wrote to
them before I started all this.

Hm, the device packaging promises free technical support. As cynical as
I may be about vendor support, I suppose I really ought to call them up
and see if they can put me in touch with someone who actually knows how
TRIM works. I don't really have anything to lose except some time.

Mouse
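For reference, the sizes involved, assuming 512-byte logical sectors:

```latex
\begin{aligned}
65535 \times 512 &= 33\,553\,920 \text{ bytes} \approx 32\ \text{MiB} && \text{(largest single range)}\\
32768 \times 512 &= 16\,777\,216 \text{ bytes} = 16\ \text{MiB} && \text{(the } \texttt{0 32768} \text{ request)}
\end{aligned}
```

So a 16-bit count tops out just under 32 MiB per range rather than 64M, consistent with Mouse's later remark that 32K blocks is 16M.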
Re: ATA TRIM?
>>> I'm trying to understand TRIM, such as is used on SSDs. [...]
>> [...]
> It could perhaps be that the area you're trying to trim is too small,
> or badly aligned?

Entirely possible. What are the restrictions? Are they
device-specific, or generic? (While wedging seems like a rather broken
response to such issues, I've seen brokener.)

Mouse
Re: ATA TRIM?
On Thu 08 Dec 2022 at 10:23:19 -0500, Mouse wrote:
> I wrote
>
> > I'm trying to understand TRIM, such as is used on SSDs. [...]
>
> I forgot to ask: does anyone know whether TRIM is known to work? It

It could perhaps be that the area you're trying to trim is too small,
or badly aligned?

-Olaf.
-- 
___ "Buying carbon credits is a bit like a serial killer paying someone else to
\X/  have kids to make his activity cost neutral." -The BOFH    falu.nl@rhialto
Re: ATA TRIM?
I wrote

> I'm trying to understand TRIM, such as is used on SSDs. [...]

I forgot to ask: does anyone know whether TRIM is known to work? It
occurs to me that I don't actually know whether the code I'm trying to
backport works. The code looks more or less identical in -current,
according to cvsweb, but that still doesn't tell me whether anyone is
_using_ it.

Mouse