Re: Suggestion about error control
To be honest, I don't think I ever used any T10 documentation for the SCSI passthrough. It is needed for ATA passthrough, but there is plenty of other documentation and open source code for the SCSI passthrough, and I know for sure everything I found was free. And from what I can tell, the SCSI passthrough is still processed by the kernel, and the kernel deals with the inconsistencies of devices, so your concern about the "zillion existing exceptions" is still well handled by the kernel.

You only need five SCSI commands:

1) INQUIRY
2) READ CAPACITY (10)
3) READ CAPACITY (16)
4) READ (10)
5) READ (16)

Originally I was using the host_status as one way to tell if a drive was offline, but some devices cause this status to be bad for no reason. So after every read error, perform an inquiry; if it fails, the device is no longer responding. Also perform a read capacity command and verify that the capacity is still reported as the same size; if not, the drive is no longer responding properly. It is really that simple, once you get past the somewhat complicated part of actually performing and processing the SCSI passthrough.

Other than the host_status issue, the only other issue I have seen is that normally if a device is large enough to require READ CAPACITY (16) it is supposed to report a block capacity of 0x with the READ CAPACITY (10) command, so you would know to use size 16 commands. I don't remember exactly why or what the conditions were, but I found it better to try a READ CAPACITY (16) command first, and if it fails for invalid command then stick to size 10 commands.

One other rule that must be followed: there is a buffer limit for every connected device when using passthrough mode. The limit is stored at /sys/block/DEVICE/queue/max_sectors_kb, where "DEVICE" is the device you are reading (example "/sys/block/sda/queue/max_sectors_kb"). The number stored here is in KB, and the default for a hard drive is usually 512 (meaning 512KB). This number is usually smaller for a USB connected drive (120KB). This size limit must not be exceeded when reading, or bad things will happen.

You may find those issues to be a reason to say something like "See, there are things that are inconsistent and that is not safe". But I can say that following those basic rules has been rock solid for me with the SCSI passthrough.

As for the "zillion existing exceptions", I have stepped into the realm of direct packet communication with USB devices, and at that level it does get very messy. It makes one aware of how much the kernel does deal with the inconsistencies of the devices so that we don't see the chaos.

Regards,
Scott

On 6/3/2020 5:18 PM, Antonio Diaz Diaz wrote:

Scott Dwyer wrote:

No, you have spent much time on an excellent program, the only one of its kind in the open source world, and I bet with little financial return.

Thanks. You are right about the "little financial return". I have received about 20 euros in donations in the last three months (6.67 eur/month).

My intention was to reply to the suggestion of error control that ddrescue doesn't do like other programs. You must go deeper to accomplish this, at a minimum SCSI passthrough. I do it in Linux, and the other program can also do it in Windows I believe. Both are specific and non-portable, due to the nature of what needs to be done at a lower level. It is obviously more complicated, but when done correctly it is no more dangerous than what the kernel does.

How can one be sure that it is done correctly given the zillion existing exceptions? You know. Some drive does not implement some SCSI command. Some other implements it in a funny way. Some other has a bug in the implementation... I mean, the kernel already does it badly enough (especially for USB drives). See for example this note from http://sg.danny.cz/sg/

"The term SCSI has several meaning depending on the context. This leads to confusion. One practical way of defining it today is everything that the T10 INCITS committee controls, see www.t10.org . Probably the most succinct overview is this standards architecture page . For practical purposes a "SCSI device" in Linux is any device that uses the Linux SCSI subsystem and this often includes SATA disks."

Moreover, SCSI standards are not freely accessible[1]. If I can't find a free copy, I'll need someone to donate one for the development of ddrescue.

[1] http://www.t10.org/t10_access.htm

And FYI the kernel does NOT know best when it comes to a failing drive, it will thrash it more than needed in Linux, and Windows is even worse.

I believe you. But at least if linux gets any bug related to a failing drive, say returning wrong data for good sectors near a bad sector, I expect it to be discovered faster than if I make the same mistake in (the much less used) ddrescue, for example.

Then maybe someone can come up with the SCSI
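For readers who want to see what the rules in Scott's message amount to, here is a minimal sketch in Python. It only builds the five command blocks (CDBs), parses the two READ CAPACITY responses, reads the sysfs limit, and runs the post-error check. The byte layouts follow the usual SPC/SBC definitions as far as I know them. Everything else is an assumption made for illustration: the function names, and above all `send`, a hypothetical transport callable that would wrap the actual passthrough (not shown) and raise OSError when a command fails.

```python
import struct
from pathlib import Path

def inquiry_cdb(alloc_len=36):
    # INQUIRY, opcode 0x12, 6-byte CDB, 16-bit allocation length
    return struct.pack(">BBBHB", 0x12, 0, 0, alloc_len, 0)

def read_capacity10_cdb():
    # READ CAPACITY (10), opcode 0x25, remaining bytes zero
    return bytes([0x25]) + bytes(9)

def read_capacity16_cdb(alloc_len=32):
    # READ CAPACITY (16) is SERVICE ACTION IN (16), opcode 0x9E, action 0x10
    return struct.pack(">BBQIBB", 0x9E, 0x10, 0, alloc_len, 0, 0)

def read10_cdb(lba, blocks):
    # READ (10), opcode 0x28: 32-bit LBA, 16-bit transfer length
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)

def read16_cdb(lba, blocks):
    # READ (16), opcode 0x88: 64-bit LBA, 32-bit transfer length
    return struct.pack(">BBQIBB", 0x88, 0, lba, blocks, 0, 0)

def parse_capacity10(data):
    # Response: last LBA (32 bits), block size (32 bits), big-endian
    return struct.unpack(">II", data[:8])

def parse_capacity16(data):
    # Response: last LBA (64 bits), block size (32 bits), big-endian
    return struct.unpack(">QI", data[:12])

def max_transfer_blocks(device, block_size, sysfs_root="/sys/block"):
    # max_sectors_kb holds the per-device limit in KB
    path = Path(sysfs_root, device, "queue", "max_sectors_kb")
    return int(path.read_text()) * 1024 // block_size

def probe_capacity(send):
    # Try the 16-byte command first; fall back to the 10-byte one on failure.
    try:
        return True, parse_capacity16(send(read_capacity16_cdb(), 32))
    except OSError:
        return False, parse_capacity10(send(read_capacity10_cdb(), 8))

def device_still_ok(send, use16, expected):
    # After a read error: INQUIRY must succeed and capacity must be unchanged.
    try:
        send(inquiry_cdb(), 36)
        if use16:
            capacity = parse_capacity16(send(read_capacity16_cdb(), 32))
        else:
            capacity = parse_capacity10(send(read_capacity10_cdb(), 8))
    except OSError:
        return False
    return capacity == expected
```

A caller would run `probe_capacity` once at startup, keep the result, clamp every request with something like `min(wanted, max_transfer_blocks("sda", block_size))`, and call `device_still_ok` after each failed read.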
Re: Suggestion about error control
Scott Dwyer wrote:

No, you have spent much time on an excellent program, the only one of its kind in the open source world, and I bet with little financial return.

Thanks. You are right about the "little financial return". I have received about 20 euros in donations in the last three months (6.67 eur/month).

My intention was to reply to the suggestion of error control that ddrescue doesn't do like other programs. You must go deeper to accomplish this, at a minimum SCSI passthrough. I do it in Linux, and the other program can also do it in Windows I believe. Both are specific and non-portable, due to the nature of what needs to be done at a lower level. It is obviously more complicated, but when done correctly it is no more dangerous than what the kernel does.

How can one be sure that it is done correctly given the zillion existing exceptions? You know. Some drive does not implement some SCSI command. Some other implements it in a funny way. Some other has a bug in the implementation... I mean, the kernel already does it badly enough (especially for USB drives). See for example this note from http://sg.danny.cz/sg/

"The term SCSI has several meaning depending on the context. This leads to confusion. One practical way of defining it today is everything that the T10 INCITS committee controls, see www.t10.org . Probably the most succinct overview is this standards architecture page . For practical purposes a "SCSI device" in Linux is any device that uses the Linux SCSI subsystem and this often includes SATA disks."

Moreover, SCSI standards are not freely accessible[1]. If I can't find a free copy, I'll need someone to donate one for the development of ddrescue.

[1] http://www.t10.org/t10_access.htm

And FYI the kernel does NOT know best when it comes to a failing drive, it will thrash it more than needed in Linux, and Windows is even worse.

I believe you. But at least if linux gets any bug related to a failing drive, say returning wrong data for good sectors near a bad sector, I expect it to be discovered faster than if I make the same mistake in (the much less used) ddrescue, for example.

Then maybe someone can come up with the SCSI passthrough code for ddrescue (hint to programmers out there that want to, I have produced open source Linux patches for this in the past that would be a good starting point, look into the old ddrutility stuff).

Thank you for the patches. I keep them and I plan to use them at least to compare them with my own code as a way to find possible errors in my code.

IIRC, the main reason why I have never used your SCSI passthrough patch is that its main feature is increasing the read performance, which I think should be done by the kernel when --idirect is used. I do not consider that reading data through the SCSI passthrough interface is safe enough for ddrescue. The readme file for your patch tends to confirm this[2].

[2] http://sourceforge.net/projects/ddrutility/files/ddrescue%20patches/passthrough%20patch/

You have done good work, but I plan to keep the risks low and limit ddrescue's use of the SCSI passthrough interface to the improvement of the detection of error conditions in the input device.

IMO every piece of software should either publish the full source code (so that users can decide if they trust it) or offer an unlimited warranty in case of misbehavior of the code.

If this were the case, then all software would be open source or open to incredible liability. Without the hope of financial gain (or having the fear of great loss), there would be much less effort, and many good programs would not exist.

It does not need to be "open source" in the sense of "free software", only in the sense of "the users may verify it, even if they aren't allowed to redistribute it". This surely would increase the safety of the software by removing lots of crappy non-free software from the market.

Maybe I should remove myself from the list so I don't see the emails, and therefore not tempted to reply. I might just do that...

Please, don't. Your contributions are valuable and appreciated. It is just that writing about non-free software (especially to promote it) is off-topic in GNU lists.

Best regards,
Antonio.
Re: Suggestion about error control
On 6/2/2020 4:34 PM, Antonio Diaz Diaz wrote:

Scott Dwyer wrote:

The tools mentioned are not open source and have paid versions for a reason. The authors have spent much time and effort working on them to make them special.

Do you mean that I have not spent much time and effort working on ddrescue to make it special?

No, you have spent much time on an excellent program, the only one of its kind in the open source world, and I bet with little financial return.

Realizing the different error conditions of a device cannot be done with normal commands.

IMO, non-standard commands should be used only when standard commands don't suffice, because non-standard commands have a much higher probability of causing a catastrophic data loss because of an incompatibility. Ddrescue is a low-risk project. Maybe it won't maximize the probability of recovering the data in difficult cases, but it won't risk destroying your data by trying to be more clever than the kernel. It is good that there exist low- and high-risk projects. This allows you to try the low-risk ddrescue first, and try the high-risk non-standard software if ddrescue can't recover your data.

My intention was to reply to the suggestion of error control that ddrescue doesn't do like other programs. You must go deeper to accomplish this, at a minimum SCSI passthrough. I do it in Linux, and the other program can also do it in Windows I believe. Both are specific and non-portable, due to the nature of what needs to be done at a lower level. It is obviously more complicated, but when done correctly it is no more dangerous than what the kernel does. And FYI the kernel does NOT know best when it comes to a failing drive, it will thrash it more than needed in Linux, and Windows is even worse.

Comparing those tools to ddrescue is like comparing apples to oranges.

Certainly. Ddrescue exposes its code so that everybody can see (and improve) it. OTOH, we only have your word that the secret, non-standard commands you use in your program won't eat the user's data. :-)

Then maybe someone can come up with the SCSI passthrough code for ddrescue (hint to programmers out there that want to, I have produced open source Linux patches for this in the past that would be a good starting point, look into the old ddrutility stuff). I would actually like to see this implemented in ddrescue, and while I would not give away any of my secrets, I would be willing to at least point someone in the right direction if they were trying and had technical questions, although it would obviously not be my top priority to respond to those questions in a super timely manner. And yes, users of my software have only my word and no warranty ;)

IMO every piece of software should either publish the full source code (so that users can decide if they trust it) or offer an unlimited warranty in case of misbehavior of the code.

If this were the case, then all software would be open source or open to incredible liability. Without the hope of financial gain (or having the fear of great loss), there would be much less effort, and many good programs would not exist. If you told me that my software had to be either open source or unlimited warranty, it would not exist.

By the way, would you mind toning down the ads in your emails? This mailing list is for improving GNU ddrescue, not for advertising other software. Thanks.

Sorry, the only reason I mentioned the names is that it was in the original message. I try not to respond to the list, but sometimes I see something that I feel the need to respond to (sometimes against my better judgment). Maybe I should remove myself from the list so I don't see the emails, and therefore not tempted to reply. I might just do that...

Scott
Re: Suggestion about error control
Scott Dwyer wrote:

The tools mentioned (hddsuperclone and DMDE) are not open source and have paid versions for a reason. The authors have spent much time and effort working on them to make them special.

Do you mean that I have not spent much time and effort working on ddrescue to make it special?

Realizing the different error conditions of a device cannot be done with normal commands.

IMO, non-standard commands should be used only when standard commands don't suffice, because non-standard commands have a much higher probability of causing a catastrophic data loss because of an incompatibility. Ddrescue is a low-risk project. Maybe it won't maximize the probability of recovering the data in difficult cases, but it won't risk destroying your data by trying to be more clever than the kernel. It is good that there exist low- and high-risk projects. This allows you to try the low-risk ddrescue first, and try the high-risk non-standard software if ddrescue can't recover your data.

Comparing those tools to ddrescue is like comparing apples to oranges.

Certainly. Ddrescue exposes its code so that everybody can see (and improve) it. OTOH, we only have your word that the secret, non-standard commands you use in your program won't eat the user's data. :-)

IMO every piece of software should either publish the full source code (so that users can decide if they trust it) or offer an unlimited warranty in case of misbehavior of the code.

By the way, would you mind toning down the ads in your emails? This mailing list is for improving GNU ddrescue, not for advertising other software. Thanks.

Antonio.
Re: Suggestion about error control
The tools mentioned (hddsuperclone and DMDE) are not open source and have paid versions for a reason. The authors have spent much time and effort working on them to make them special. Realizing the different error conditions of a device cannot be done with normal commands. Comparing those tools to ddrescue is like comparing apples to oranges.

Regards,
Scott

On 6/1/2020 7:06 PM, Antonio Diaz Diaz wrote:

Hello kickman. Thanks for your message and sorry for the late answer. I have been very busy.

anonymous wrote:

The last two can recognize and distinguish the error codes got from the kernel after every read/write request. For example, when hddsuperclone detects the disk is offline (not ready) - it just waits for it to become ready again. So does DMDE. It shows different error codes allowing the operator to make the right decision about what to do. But my favourite tool ddrescue never made such a difference to the error codes! Either the error code was "CRC error" or "Device not ready!"

I suppose those tools (hddsuperclone and DMDE) use non-portable ways to get those codes, because neither "CRC error" nor "Device not ready!" appear among the error codes returned by 'read'[1].

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

Bypassing the kernel should not be done lightly on POSIX systems. As a portable program, ddrescue must limit itself to the error codes returned by 'read' as documented by POSIX. As a first approximation, 'EIO' may correspond to 'CRC error' and 'ENXIO' seems the best match for 'Device not ready'.

I'll implement some form of differential response to error codes in the next version of ddrescue. Maybe ddrescue could stop or ask the user when readblockp sets errno to EBADF, ESPIPE, or ENXIO, for example.

It may also be possible to implement a non-portable way to retrieve the drive status, and compile it conditionally with '--enable-non-posix'.

Feedback is welcome.

Best regards,
Antonio.
Re: Suggestion about error control
I'd be happy with the non-portable way for Linux for sure. Those kinds of errors and wait conditions would be good as well.

Thanks.

Kind Regards,
Cameron Andrews
North Brisbane Data Recovery

On 2/6/20 9:06 am, Antonio Diaz Diaz wrote:

Hello kickman. Thanks for your message and sorry for the late answer. I have been very busy.

anonymous wrote:

The last two can recognize and distinguish the error codes got from the kernel after every read/write request. For example, when hddsuperclone detects the disk is offline (not ready) - it just waits for it to become ready again. So does DMDE. It shows different error codes allowing the operator to make the right decision about what to do. But my favourite tool ddrescue never made such a difference to the error codes! Either the error code was "CRC error" or "Device not ready!"

I suppose those tools (hddsuperclone and DMDE) use non-portable ways to get those codes, because neither "CRC error" nor "Device not ready!" appear among the error codes returned by 'read'[1].

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

Bypassing the kernel should not be done lightly on POSIX systems. As a portable program, ddrescue must limit itself to the error codes returned by 'read' as documented by POSIX. As a first approximation, 'EIO' may correspond to 'CRC error' and 'ENXIO' seems the best match for 'Device not ready'.

I'll implement some form of differential response to error codes in the next version of ddrescue. Maybe ddrescue could stop or ask the user when readblockp sets errno to EBADF, ESPIPE, or ENXIO, for example.

It may also be possible to implement a non-portable way to retrieve the drive status, and compile it conditionally with '--enable-non-posix'.

Feedback is welcome.

Best regards,
Antonio.
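The "wait condition" asked for here (pause while the drive is not ready, resume when it comes back) could be as small as the loop below. This is only a sketch under assumptions: `probe` is a hypothetical callable that returns True once the device answers again (how it checks is deliberately left open), and the names and default timings are invented for illustration, not taken from ddrescue or any other tool.

```python
import time

def wait_until_ready(probe, max_wait=60.0, interval=1.0, sleep=time.sleep):
    """Poll probe() until it returns True or max_wait seconds were slept.

    Returns True if the device became ready, False if we gave up.
    """
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= max_wait:
            return False
        sleep(interval)       # injected so the loop can be tested without delay
        waited += interval
```

On a False result the rescue tool would stop or ask the operator instead of marking the rest of the disk as bad.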
Re: Suggestion about error control
Hello kickman. Thanks for your message and sorry for the late answer. I have been very busy.

anonymous wrote:

The last two can recognize and distinguish the error codes got from the kernel after every read/write request. For example, when hddsuperclone detects the disk is offline (not ready) - it just waits for it to become ready again. So does DMDE. It shows different error codes allowing the operator to make the right decision about what to do. But my favourite tool ddrescue never made such a difference to the error codes! Either the error code was "CRC error" or "Device not ready!"

I suppose those tools (hddsuperclone and DMDE) use non-portable ways to get those codes, because neither "CRC error" nor "Device not ready!" appear among the error codes returned by 'read'[1].

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

Bypassing the kernel should not be done lightly on POSIX systems. As a portable program, ddrescue must limit itself to the error codes returned by 'read' as documented by POSIX. As a first approximation, 'EIO' may correspond to 'CRC error' and 'ENXIO' seems the best match for 'Device not ready'.

I'll implement some form of differential response to error codes in the next version of ddrescue. Maybe ddrescue could stop or ask the user when readblockp sets errno to EBADF, ESPIPE, or ENXIO, for example.

It may also be possible to implement a non-portable way to retrieve the drive status, and compile it conditionally with '--enable-non-posix'.

Feedback is welcome.

Best regards,
Antonio.
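To make the proposed "differential response" concrete, here is a small portable sketch in Python rather than ddrescue's own code. It uses only the errno values named in the message: EIO is treated as a bad sector to skip, while EBADF, ESPIPE and ENXIO mean "stop or ask the user". The action labels and function names are hypothetical, chosen for this example only.

```python
import errno
import os

# errno values after which continuing to read makes no sense
STOP_ERRNOS = {errno.EBADF, errno.ESPIPE, errno.ENXIO}

def classify_read_error(err):
    """Map an errno from a failed read to a (hypothetical) action label."""
    if err in STOP_ERRNOS:
        return "stop"          # device gone or descriptor unusable
    if err == errno.EIO:
        return "bad-sector"    # closest portable match for a CRC error
    return "other"

def read_block(fd, offset, size):
    """Read size bytes at offset; return (data, action), action None on success."""
    try:
        return os.pread(fd, size, offset), None
    except OSError as e:
        return b"", classify_read_error(e.errno)
```

With this split, a rescue loop would mark the block and move on for "bad-sector", and pause for "stop" instead of filling the mapfile with errors from a drive that has dropped off the bus.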