Re: Suggestion about error control

2020-06-13 Thread Bug reports for ddrescue, data recovery tool.
To be honest I don't think I ever used any T10 documentation for the 
SCSI passthrough. It is needed for ATA passthrough, but there is plenty 
of other documentation and open source code for the SCSI passthrough, 
and I know for sure everything I found was free. And from what I can 
tell, the SCSI passthrough is still processed by the kernel, and the 
kernel deals with the inconsistencies of devices, so your concerns about 
the "zillion existing exceptions" are still well handled by the kernel.


You only need five SCSI commands:

1) INQUIRY

2) READ CAPACITY (10)

3) READ CAPACITY (16)

4) READ (10)

5) READ (16)
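
As a minimal sketch (not taken from any of the patches mentioned in this thread), the command descriptor blocks for these five commands could be built like this in Python. The function names and the default allocation lengths are assumptions made for illustration; the opcodes and field layouts follow the SPC/SBC command definitions. Actually sending the bytes to a device (for example through the Linux SG_IO ioctl) is deliberately left out.

```python
import struct

# Operation codes from the SCSI command sets (SPC for INQUIRY, SBC for the rest).
OP_INQUIRY = 0x12
OP_READ_CAPACITY_10 = 0x25
OP_READ_10 = 0x28
OP_READ_16 = 0x88
OP_SERVICE_ACTION_IN_16 = 0x9E   # READ CAPACITY (16) is a service action of this opcode
SA_READ_CAPACITY_16 = 0x10


def cdb_inquiry(alloc_len=36):
    """6-byte CDB: opcode, EVPD=0, page code=0, allocation length, control."""
    return struct.pack(">BBBHB", OP_INQUIRY, 0, 0, alloc_len, 0)


def cdb_read_capacity_10():
    """10-byte CDB: only the opcode is set, all other bytes are zero."""
    return bytes([OP_READ_CAPACITY_10]) + bytes(9)


def cdb_read_capacity_16(alloc_len=32):
    """16-byte CDB: opcode, service action, 8 zero bytes, allocation length."""
    return struct.pack(">BBQIBB", OP_SERVICE_ACTION_IN_16,
                       SA_READ_CAPACITY_16, 0, alloc_len, 0, 0)


def cdb_read_10(lba, blocks):
    """10-byte CDB: 32-bit LBA and 16-bit transfer length (in blocks)."""
    return struct.pack(">BBIBHB", OP_READ_10, 0, lba, 0, blocks, 0)


def cdb_read_16(lba, blocks):
    """16-byte CDB: 64-bit LBA and 32-bit transfer length (in blocks)."""
    return struct.pack(">BBQIBB", OP_READ_16, 0, lba, blocks, 0, 0)
```

struct.pack raises struct.error if an LBA or block count does not fit its field, which is a convenient guard against using READ (10) beyond its 32-bit LBA range.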

Originally I was using the host_status as one way to tell if a drive was 
offline, but some devices cause this status to be bad for no reason. So 
after every read error perform an inquiry, if it fails then the device 
is no longer responding. Also perform a read capacity command and verify 
the capacity is still reported as the same size, if not then the drive 
is no longer responding properly. It is really that simple, once you get 
past the somewhat complicated part of actually performing and processing 
the SCSI passthrough. Other than the host_status issue, the only other 
issue I have seen is that normally if a device is large enough to 
require READ CAPACITY (16) it is supposed to report a block capacity of 
0x with the READ CAPACITY (10) command, so you would know to use 
size 16 commands. I don't remember exactly why or what the conditions 
were, but I found it better to try a READ CAPACITY (16) command first, 
and if it fails for invalid command then stick to size 10 commands.
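
The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the callables passed in (read_cap16, read_cap10, inquiry, probe) are hypothetical stand-ins for code that issues the real passthrough commands, and for simplicity any failure of READ CAPACITY (16) triggers the fallback, whereas the text above falls back only on an invalid-command failure.

```python
import struct


def parse_read_capacity_10(data):
    """Response is 8 bytes: last LBA (32 bit) and block size (32 bit)."""
    last_lba, block_size = struct.unpack(">II", data[:8])
    return last_lba, block_size


def parse_read_capacity_16(data):
    """Response starts with: last LBA (64 bit) and block size (32 bit)."""
    last_lba, block_size = struct.unpack(">QI", data[:12])
    return last_lba, block_size


def probe_capacity(read_cap16, read_cap10):
    """Try READ CAPACITY (16) first, then fall back to READ CAPACITY (10).

    Both arguments are hypothetical callables returning the raw response
    bytes, or None if the command failed.
    Returns (command_size, last_lba, block_size), or None if both fail.
    """
    data = read_cap16()
    if data is not None:
        return (16,) + parse_read_capacity_16(data)
    data = read_cap10()
    if data is None:
        return None
    return (10,) + parse_read_capacity_10(data)


def device_still_ok(inquiry, probe, expected):
    """Check to run after every read error.

    inquiry: hypothetical callable, True if INQUIRY succeeded.
    probe:   hypothetical callable returning what probe_capacity returns.
    expected: the capacity tuple recorded when the device was opened.
    """
    if not inquiry():
        return False              # device no longer responds at all
    return probe() == expected    # capacity must be reported unchanged
```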


One other thing that must be followed is there is a buffer limit for 
every connected device when using passthrough mode. The limit is stored 
at /sys/block/DEVICE/queue/max_sectors_kb, where "DEVICE" is the device 
you are reading (example "/sys/block/sda/queue/max_sectors_kb"). The 
number stored here is referenced in KB, and the default for a hard drive 
is usually 512 (meaning 512KB). This number is usually smaller for a USB 
connected drive (120KB). This size limit must not be exceeded when 
reading, or bad things will happen.
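
A small sketch of how a program might honor this limit, assuming a 512-byte logical block size in the usage shown and using hypothetical helper names. The sysfs_root parameter exists only so the function can be pointed at a test directory; on a real system it would stay at /sys/block.

```python
from pathlib import Path


def max_transfer_bytes(device, sysfs_root="/sys/block"):
    """Read the per-device limit (stored in KB) and return it in bytes."""
    path = Path(sysfs_root) / device / "queue" / "max_sectors_kb"
    return int(path.read_text().strip()) * 1024


def max_blocks_per_read(limit_bytes, block_size):
    """Largest number of whole blocks that fits within the limit."""
    blocks = limit_bytes // block_size
    if blocks == 0:
        raise ValueError("limit is smaller than one block")
    return blocks


def split_read(start_lba, count, max_blocks):
    """Yield (lba, blocks) pairs so that no single read exceeds max_blocks."""
    while count > 0:
        n = min(count, max_blocks)
        yield start_lba, n
        start_lba += n
        count -= n
```

For example, with the 512 KB default and 512-byte blocks, a request for 2500 blocks would be issued as reads of 1024, 1024 and 452 blocks.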


You may find those issues to be a reason to say something like "See, 
there are things that are inconsistent and that is not safe". But I can 
say that following those basic rules has been rock solid for me with the 
SCSI passthrough. As for the "zillion existing exceptions", I have 
stepped into the realm of direct packet communication with USB devices, 
and at that level it does get very messy. It makes one aware of how much 
the kernel does deal with the inconsistencies of the devices so that we 
don't see the chaos.


Regards,
Scott


On 6/3/2020 5:18 PM, Antonio Diaz Diaz wrote:

Scott Dwyer wrote:

No, you have spent much time on an excellent program, the only one of
its kind in the open source world, and I bet with little financial 
return.


Thanks. You are right about the "little financial return". I have 
received about 20 euros in donations in the last three months. (6.67 
eur/month).



My intention was to reply to the suggestion of error control that
ddrescue doesn't do like other programs. You must go deeper to
accomplish this, at a minimum SCSI passthrough. I do it in Linux, and
the other program can also do it in Windows I believe. Both are specific
and non-portable, due to the nature of what needs to be done at a lower
level. It is obviously more complicated, but when done correctly it is
no more dangerous than what the kernel does.


How can one be sure that it is done correctly given the zillion 
existing exceptions? You know. Some drive does not implement some SCSI 
command. Some other implements it in a funny way. Some other has a bug 
in the implementation... I mean, the kernel already does it badly 
enough (especially for USB drives).


See for example this note from http://sg.danny.cz/sg/
"The term SCSI has several meaning depending on the context. This 
leads to confusion. One practical way of defining it today is 
everything that the T10 INCITS committee controls, see www.t10.org . 
Probably the most succinct overview is this standards architecture 
page . For practical purposes a "SCSI device" in Linux is any device 
that uses the Linux SCSI subsystem and this often includes SATA disks."


Moreover, SCSI standards are not freely accessible[1]. If I can't find 
a free copy, I'll need someone to donate one for the development of 
ddrescue.


[1] http://www.t10.org/t10_access.htm


And FYI the kernel does NOT know best when it comes to a failing drive,
it will thrash it more than needed in Linux, and Windows is even worse.


I believe you. But at least if linux gets any bug related to a failing 
drive, say returning wrong data for good sectors near a bad sector, I 
expect it to be discovered faster than if I make the same mistake in 
(the much less used) ddrescue, for example.



Then maybe someone can come up with the SCSI 

Re: Suggestion about error control

2020-06-03 Thread Antonio Diaz Diaz

Scott Dwyer wrote:

No, you have spent much time on an excellent program, the only one of
its kind in the open source world, and I bet with little financial return.


Thanks. You are right about the "little financial return". I have received 
about 20 euros in donations in the last three months. (6.67 eur/month).



My intention was to reply to the suggestion of error control that
ddrescue doesn't do like other programs. You must go deeper to
accomplish this, at a minimum SCSI passthrough. I do it in Linux, and
the other program can also do it in Windows I believe. Both are specific
and non-portable, due to the nature of what needs to be done at a lower
level. It is obviously more complicated, but when done correctly it is
no more dangerous than what the kernel does.


How can one be sure that it is done correctly given the zillion existing 
exceptions? You know. Some drive does not implement some SCSI command. Some 
other implements it in a funny way. Some other has a bug in the 
implementation... I mean, the kernel already does it badly enough (especially 
for USB drives).


See for example this note from http://sg.danny.cz/sg/
"The term SCSI has several meaning depending on the context. This leads to 
confusion. One practical way of defining it today is everything that the T10 
INCITS committee controls, see www.t10.org . Probably the most succinct 
overview is this standards architecture page . For practical purposes a 
"SCSI device" in Linux is any device that uses the Linux SCSI subsystem and 
this often includes SATA disks."


Moreover, SCSI standards are not freely accessible[1]. If I can't find a free 
copy, I'll need someone to donate one for the development of ddrescue.


[1] http://www.t10.org/t10_access.htm


And FYI the kernel does NOT know best when it comes to a failing drive,
it will thrash it more than needed in Linux, and Windows is even worse.


I believe you. But at least if linux gets any bug related to a failing 
drive, say returning wrong data for good sectors near a bad sector, I expect 
it to be discovered faster than if I make the same mistake in (the much less 
used) ddrescue, for example.



Then maybe someone can come up with the SCSI passthrough code for
ddrescue (hint to programmers out there that want to, I have produced
open source Linux patches for this in the past that would be a good
starting point, look into the old ddrutility stuff).


Thank you for the patches. I keep them and I plan to use them at least to 
compare them with my own code as a way to find possible errors in my code.


IIRC, the main reason why I have never used your SCSI passthrough patch is 
that its main feature is increasing the read performance, which I think 
should be done by the kernel when --idirect is used. I do not consider that 
reading data through the SCSI passthrough interface is safe enough for 
ddrescue. The readme file for your patch tends to confirm this[2].


[2] 
http://sourceforge.net/projects/ddrutility/files/ddrescue%20patches/passthrough%20patch/


You have done good work, but I plan to keep the risks low and limit 
ddrescue's use of the SCSI passthrough interface to the improvement of the 
detection of error conditions in the input device.



IMO every piece of software should either publish the full source code
(so that users can decide if they trust it) or offer an unlimited
warranty in case of misbehavior of the code.


If this were the case, then all software would be open source or open to
incredible liability. Without the hope of financial gain (or having the
fear of great loss), there would be much less effort, and many good
programs would not exist.


It does not need to be "open source" in the sense of "free software", only 
in the sense of "the users may verify it, even if they aren't allowed to 
redistribute it". This surely would increase the safety of the software by 
removing lots of crappy non-free software from the market.



Maybe I should remove myself from the list so I don't see the emails, and
therefore not tempted to reply. I might just do that...


Please, don't. Your contributions are valuable and appreciated. It is just 
that writing about non-free software (especially to promote it) is off-topic 
in GNU lists.


Best regards,
Antonio.



Re: Suggestion about error control

2020-06-02 Thread Scott Dwyer

On 6/2/2020 4:34 PM, Antonio Diaz Diaz wrote:

Scott Dwyer wrote:

The tools mentioned are not open source and
have paid versions for a reason. The authors have spent much time and
effort working on them to make them special.


Do you mean that I have not spent much time and effort working on 
ddrescue to make it special?


No, you have spent much time on an excellent program, the only one of 
its kind in the open source world, and I bet with little financial return.



Realizing the different error conditions of a device cannot be done with
normal commands.


IMO, non-standard commands should be used only when standard commands 
don't suffice, because non-standard commands have a much higher 
probability of causing a catastrophic data loss because of an 
incompatibility.


Ddrescue is a low-risk project. Maybe it won't maximize the 
probability of recovering the data in difficult cases, but it won't 
risk destroying your data by trying to be more clever than the kernel.


It is good that there exist low- and high-risk projects. This allows 
you to try the low-risk ddrescue first, and try the high-risk 
non-standard software if ddrescue can't recover your data.


My intention was to reply to the suggestion of error control that 
ddrescue doesn't do like other programs. You must go deeper to 
accomplish this, at a minimum SCSI passthrough. I do it in Linux, and 
the other program can also do it in Windows I believe. Both are specific 
and non-portable, due to the nature of what needs to be done at a lower 
level. It is obviously more complicated, but when done correctly it is 
no more dangerous than what the kernel does. And FYI the kernel does NOT 
know best when it comes to a failing drive, it will thrash it more than 
needed in Linux, and Windows is even worse.



Comparing those tools to ddrescue is like comparing apples to oranges.


Certainly. Ddrescue exposes its code so that everybody can see (and 
improve) it. OTOH, we only have your word that the secret, 
non-standard commands you use in your program won't eat the user's 
data. :-)


Then maybe someone can come up with the SCSI passthrough code for 
ddrescue (hint to programmers out there that want to, I have produced 
open source Linux patches for this in the past that would be a good 
starting point, look into the old ddrutility stuff). I would actually 
like to see this implemented in ddrescue, and while I would not give 
away any of my secrets, I would be willing to at least point someone in 
the right direction if they were trying and had technical questions, 
although it would obviously not be my top priority to respond to those 
questions in a super timely manner. And yes, users of my software have 
only my word and no warranty ;)


IMO every piece of software should either publish the full source code 
(so that users can decide if they trust it) or offer an unlimited 
warranty in case of misbehavior of the code.


If this were the case, then all software would be open source or open to 
incredible liability. Without the hope of financial gain (or having the 
fear of great loss), there would be much less effort, and many good 
programs would not exist. If you told me that my software had to be 
either open source or unlimited warranty, it would not exist.


By the way, would you mind toning down the ads in your emails? This 
mailing list is for improving GNU ddrescue, not for advertising other 
software. Thanks.


Sorry, the only reason I mentioned the names is that it was in the 
original message. I try not to respond to the list, but sometimes I see 
something that I feel the need to respond to (sometimes against my 
better judgment). Maybe I should remove myself from the list so I don't 
see the emails, and therefore not tempted to reply. I might just do that...


Scott



Re: Suggestion about error control

2020-06-02 Thread Antonio Diaz Diaz

Scott Dwyer wrote:

The tools mentioned (hddsuperclone and DMDE) are not open source and
have paid versions for a reason. The authors have spent much time and
effort working on them to make them special.


Do you mean that I have not spent much time and effort working on ddrescue 
to make it special?



Realizing the different error conditions of a device cannot be done with
normal commands.


IMO, non-standard commands should be used only when standard commands don't 
suffice, because non-standard commands have a much higher probability of 
causing a catastrophic data loss because of an incompatibility.


Ddrescue is a low-risk project. Maybe it won't maximize the probability of 
recovering the data in difficult cases, but it won't risk destroying your 
data by trying to be more clever than the kernel.


It is good that there exist low- and high-risk projects. This allows you to 
try the low-risk ddrescue first, and try the high-risk non-standard software 
if ddrescue can't recover your data.



Comparing those tools to ddrescue is like comparing apples to oranges.


Certainly. Ddrescue exposes its code so that everybody can see (and improve) 
it. OTOH, we only have your word that the secret, non-standard commands you 
use in your program won't eat the user's data. :-)


IMO every piece of software should either publish the full source code (so 
that users can decide if they trust it) or offer an unlimited warranty in 
case of misbehavior of the code.


By the way, would you mind toning down the ads in your emails? This mailing 
list is for improving GNU ddrescue, not for advertising other software. Thanks.


Antonio.



Re: Suggestion about error control

2020-06-02 Thread Bug reports for ddrescue, data recovery tool.
The tools mentioned (hddsuperclone and DMDE) are not open source and 
have paid versions for a reason. The authors have spent much time and 
effort working on them to make them special. Realizing the different 
error conditions of a device cannot be done with normal commands. 
Comparing those tools to ddrescue is like comparing apples to oranges.


Regards,
Scott


On 6/1/2020 7:06 PM, Antonio Diaz Diaz wrote:

Hello kickman.

Thanks for your message and sorry for the late answer. I have been 
very busy.


anonymous wrote:

The two last can recognize and distinguish the error codes got from the
kernel after every read /write request. For example, when hddsuperclone
detects the disk is offline (not ready) - it just waits for it to become
ready again. So DMDE does. It shows different error codes allowing the
operator to make the right decision about what to do. But my favourite
tool ddrescue never made such a difference to the error codes! Either the
error code was "CRC error" or "Device not ready!"


I suppose those tools (hddsuperclone and DMDE) use non-portable ways 
to get those codes, because neither "CRC error" nor "Device not 
ready!" appear among the error codes returned by 'read'[1].


[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

Bypassing the kernel should not be done lightly on POSIX systems. As a 
portable program, ddrescue must limit itself to the error codes 
returned by 'read' as documented by POSIX. As a first approximation, 
'EIO' may correspond to 'CRC error' and 'ENXIO' seems the best match 
for 'Device not ready'.


I'll implement some form of differential response to error codes in 
the next version of ddrescue. Maybe ddrescue could stop or ask the 
user when readblockp sets errno to EBADF, ESPIPE, or ENXIO, for example.


It may be also possible to implement a non-portable way to retrieve 
the drive status, and compile it conditionally with '--enable-non-posix'.


Feedback is welcome.

Best regards,
Antonio.





Re: Suggestion about error control

2020-06-01 Thread Cameron Andrews
I'd be happy with the non-portable for Linux for sure.  Those kinds of 
errors, and wait conditions would be good as well. Thanks.


Kind Regards,
Cameron Andrews
North Brisbane Data Recovery

On 2/6/20 9:06 am, Antonio Diaz Diaz wrote:

Hello kickman.

Thanks for your message and sorry for the late answer. I have been 
very busy.


anonymous wrote:

The two last can recognize and distinguish the error codes got from the
kernel after every read /write request. For example, when hddsuperclone
detects the disk is offline (not ready) - it just waits for it to become
ready again. So DMDE does. It shows different error codes allowing the
operator to make the right decision about what to do. But my favourite
tool ddrescue never made such a difference to the error codes! Either the
error code was "CRC error" or "Device not ready!"


I suppose those tools (hddsuperclone and DMDE) use non-portable ways 
to get those codes, because neither "CRC error" nor "Device not 
ready!" appear among the error codes returned by 'read'[1].


[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

Bypassing the kernel should not be done lightly on POSIX systems. As a 
portable program, ddrescue must limit itself to the error codes 
returned by 'read' as documented by POSIX. As a first approximation, 
'EIO' may correspond to 'CRC error' and 'ENXIO' seems the best match 
for 'Device not ready'.


I'll implement some form of differential response to error codes in 
the next version of ddrescue. Maybe ddrescue could stop or ask the 
user when readblockp sets errno to EBADF, ESPIPE, or ENXIO, for example.


It may be also possible to implement a non-portable way to retrieve 
the drive status, and compile it conditionally with '--enable-non-posix'.


Feedback is welcome.

Best regards,
Antonio.





Re: Suggestion about error control

2020-06-01 Thread Antonio Diaz Diaz

Hello kickman.

Thanks for your message and sorry for the late answer. I have been very busy.

anonymous wrote:

The two last can recognize and distinguish the error codes got from the
kernel after every read /write request. For example, when hddsuperclone
detects the disk is offline (not ready) - it just waits for it to become
ready again. So DMDE does. It shows different error codes allowing the
operator to make the right decision about what to do. But my favourite
tool ddrescue never made such a difference to the error codes! Either the
error code was "CRC error" or "Device not ready!"


I suppose those tools (hddsuperclone and DMDE) use non-portable ways to get 
those codes, because neither "CRC error" nor "Device not ready!" appear 
among the error codes returned by 'read'[1].


[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

Bypassing the kernel should not be done lightly on POSIX systems. As a 
portable program, ddrescue must limit itself to the error codes returned by 
'read' as documented by POSIX. As a first approximation, 'EIO' may 
correspond to 'CRC error' and 'ENXIO' seems the best match for 'Device not 
ready'.


I'll implement some form of differential response to error codes in the next 
version of ddrescue. Maybe ddrescue could stop or ask the user when 
readblockp sets errno to EBADF, ESPIPE, or ENXIO, for example.
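
To illustrate the idea only (this is not ddrescue code, and ddrescue itself is written in C++), a differential response to the errno values named above might look like the following Python sketch. The category names and the helper functions are hypothetical.

```python
import errno
import os

# errno values named above as candidates for "stop or ask the user".
STOP_ERRNOS = {errno.EBADF, errno.ESPIPE, errno.ENXIO}


def classify_read_error(err):
    """Map an errno from a failed read to a coarse response category."""
    if err == errno.EIO:
        return "bad-sector"      # first approximation of "CRC error"
    if err in STOP_ERRNOS:
        return "stop-or-ask"     # ENXIO approximates "Device not ready"
    return "other"


def read_block(fd, offset, size):
    """Positioned read that returns (data, None) or (None, category)."""
    try:
        return os.pread(fd, size, offset), None
    except OSError as e:
        return None, classify_read_error(e.errno)
```

A caller would keep treating "bad-sector" as it treats read errors today, and would pause or ask the user on "stop-or-ask" instead of marking the rest of the device as bad.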


It may be also possible to implement a non-portable way to retrieve the 
drive status, and compile it conditionally with '--enable-non-posix'.


Feedback is welcome.

Best regards,
Antonio.