bug#17505: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
On 07/27/14 19:11, Linda Walsh wrote:

> It is more common to specify transfer sizes in SI and mean IEC if you are in the US where the digital computer was created. People in the US have not adopted SI units and many wouldn't know a meter from a molehill, so SI units aren't the first thing that they are likely to mean. Computer scientists and the industry here grew up using IEC prefixes where multiples of 8 are already in use. I.e. if you are talking *bytes*, you are using base 2.

I didn't grow up in the US, and grew up with the metric system, but when I'm talking about memory sizes I always mean IEC (2^10) and never SI (10^3). The only pitfall here is hard disk sizes, where I have to remember that they mean SI.

> It is inconsistent to switch to decimal prefixes when talking about binary numbers.

Agreed.

>> BTW I was playing devil's advocate with my mention of the SIGUSR1 inconsistency. I'm still of the opinion that the dynamic switch of human units based on current transferred amount is the lesser of two evils, since this output is destined for human consumption.

I don't get the reason for the dynamic switch at all. Can somebody enlighten me?

regards,
chris
bug#17505: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
Christian Groessler wrote:

> On 07/27/14 19:11, Linda Walsh wrote:
>> It is more common to specify transfer sizes in SI and mean IEC if you are in the US where the digital computer was created. People in the US have not adopted SI units and many wouldn't know a meter from a molehill, so SI units aren't the first thing that they are likely to mean. Computer scientists and the industry here grew up using IEC prefixes where multiples of 8 are already in use. I.e. if you are talking *bytes*, you are using base 2.
>
> I didn't grow up in the US, and grew up with the metric system, but when I'm talking about memory sizes I always mean IEC (2^10) and never SI (10^3). The only pitfall here is hard disk sizes, where I have to remember that they mean SI.

I was trying to come up with some reason for Pádraig's belief that people usually meant SI when using IEC prefixes for computer sizes in units like bytes (2^3 bits) or sectors (2^12 bits)... now what power of 10 is that?

I've never heard of anyone supporting Pádraig's position -- so I assumed it must be some foreign country where the metric system and metric prefixes are meant to apply to non-unary and non-base-10 quantities.

Pádraig: where did you get your impression?

When it comes to disk space -- computers always give it in IEC -- except where they've bought the line that mixing base-2 and power-of-10 prefixes is a good thing; then they try to get others to buy into such. But reality is that one can't express disk space as a power of 10 as there is no multiple of 10 that lines up with a 512-byte multiple. I.e. the system is designed to be inaccurate and confuse the issue to make it harder for consumers to do comparisons.

> I don't get the reason for the dynamic switch at all. Can somebody enlighten me?

I think it was thrown in as a red herring, as I can't think of any useful case for it. Having the output vary units randomly, not at the behest of the user, doesn't seem especially useful.
bug#17505: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
Pádraig Brady wrote:

> That was the original approach but is a bit worse than the dynamic approach since it's common to specify transfer sizes in IEC units for SI sized data.

It is more common to specify transfer sizes in SI and mean IEC if you are in the US where the digital computer was created. People in the US have not adopted SI units and many wouldn't know a meter from a molehill, so SI units aren't the first thing that they are likely to mean. Computer scientists and the industry here grew up using IEC prefixes where multiples of 8 are already in use. I.e. if you are talking *bytes*, you are using base 2. It is inconsistent to switch to decimal prefixes when talking about binary numbers.

OTOH, if you are talking *bits*, I would say usage meaning SI units is more common. Bytes = 2^3 bits, not 10 bits.

Now I was willing to go so far as to not force incompatible or bad nomenclature upon others, but to use their own nomenclature when replying to them.

If someone came up to you and spoke a question in French, would you answer them in English and make some comment about people using French by accident and they really mean to use English? If your goal was clear communication, you'd try to answer in the language they were querying in (presuming you knew it). Only giving responses in English, when you accept input in French, would likely be thought insulting.

If people are that concerned to get the output they want in SI, they might be bothered to use it on input (or read the manpage and find out how to make it happen). For those that are concerned to get the output they want in computer compatible binary, you seem to be saying they are S-O-L, which seems a poor and selfish attitude to be taking.

> BTW I was playing devil's advocate with my mention of the SIGUSR1 inconsistency. I'm still of the opinion that the dynamic switch of human units based on current transferred amount is the lesser of two evils, since this output is destined for human consumption.

If it is for human consumption, humans like consistency -- if they speak to you in one language, they likely appreciate being replied to in the same. The same goes for terminology and units. If someone asks you how many kilometers it is to XXX and you come back with 38 miles, do you think that's a user-friendly design?

> cheers,
> Pádraig.
bug#17505: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
On 07/26/2014 02:35 AM, Linda Walsh wrote:

> Pádraig: you may have missed this as it was a reply to an old thread, but changing the subj and composing as new should prevent that (I hope).
>
> You were concerned that the user would get different outputs based on the previously suggested algorithm -- as well as possibly different output when SIGUSR1 came in. This idea seems to solve both of those -- so if the patch that was proposed for this was modified in line with this suggestion, would there be any further problems?
>
> Linda Walsh wrote:
>> Found old bug, still open...
>>
>> Pádraig Brady wrote:
>>> On 07/16/2014 10:38 AM, Pádraig Brady wrote:
>>>> http://bugs.gnu.org/17505#37 was proposed to do the following automatically (depending on the amount output):
>>>>
>>>>     268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
>>>>
>>>> However that wasn't applied due to inconsistency concerns. I'm still of the opinion that the change above would be a net gain, as the number in brackets is for human interpretation, and in the vast majority of cases would be the best representation for that.
>>
>> One patch that would not be inconsistent: if the user uses units of a single system (i.e. doesn't use 'si' and b2 units in the same statement), then display the summary units using the same notation the user used:
>>
>>     dd if=xx bs=256M ...(256M copied)
>>   vs.
>>     dd if=xx bs=256MB ...(256MB copied)...
>>
>>> Note another reason to _not_ apply the patch is that requests to print the statistics can come async through SIGUSR1, and thus increase the chances of inconsistent output.
>>
>> Solves this too, since the units are decided when the command is parsed, so SIGUSR1 would use the same units as would come out on a final summary. Or is using units consistent with what the user uses not ok?
>>
>> Note, for statements w/o units (or mixed system), there would be no reason to change current behavior.

That was the original approach but is a bit worse than the dynamic approach since it's common to specify transfer sizes in IEC units for SI sized data.

BTW I was playing devil's advocate with my mention of the SIGUSR1 inconsistency. I'm still of the opinion that the dynamic switch of human units based on current transferred amount is the lesser of two evils, since this output is destined for human consumption.

cheers,
Pádraig.
bug#17505: Pádraig: does this solve your consistency concern? (was bug#17505: dd statistics output)
Pádraig: you may have missed this as it was a reply to an old thread, but changing the subj and composing as new should prevent that (I hope).

You were concerned that the user would get different outputs based on the previously suggested algorithm -- as well as possibly different output when SIGUSR1 came in. This idea seems to solve both of those -- so if the patch that was proposed for this was modified in line with this suggestion, would there be any further problems?

Linda Walsh wrote:

> Found old bug, still open...
>
> Pádraig Brady wrote:
>> On 07/16/2014 10:38 AM, Pádraig Brady wrote:
>>> http://bugs.gnu.org/17505#37 was proposed to do the following automatically (depending on the amount output):
>>>
>>>     268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
>>>
>>> However that wasn't applied due to inconsistency concerns. I'm still of the opinion that the change above would be a net gain, as the number in brackets is for human interpretation, and in the vast majority of cases would be the best representation for that.
>
> One patch that would not be inconsistent: if the user uses units of a single system (i.e. doesn't use 'si' and b2 units in the same statement), then display the summary units using the same notation the user used:
>
>     dd if=xx bs=256M ...(256M copied)
>   vs.
>     dd if=xx bs=256MB ...(256MB copied)...
>
>> Note another reason to _not_ apply the patch is that requests to print the statistics can come async through SIGUSR1, and thus increase the chances of inconsistent output.
>
> Solves this too, since the units are decided when the command is parsed, so SIGUSR1 would use the same units as would come out on a final summary. Or is using units consistent with what the user uses not ok?
>
> Note, for statements w/o units (or mixed system), there would be no reason to change current behavior.
bug#17505: dd statistics output
Found old bug, still open...

Pádraig Brady wrote:

> On 07/16/2014 10:38 AM, Pádraig Brady wrote:
>> http://bugs.gnu.org/17505#37 was proposed to do the following automatically (depending on the amount output):
>>
>>     268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
>>
>> However that wasn't applied due to inconsistency concerns. I'm still of the opinion that the change above would be a net gain, as the number in brackets is for human interpretation, and in the vast majority of cases would be the best representation for that.

One patch that would not be inconsistent: if the user uses units of a single system (i.e. doesn't use 'si' and b2 units in the same statement), then display the summary units using the same notation the user used:

    dd if=xx bs=256M ...(256M copied)
  vs.
    dd if=xx bs=256MB ...(256MB copied)...

> Note another reason to _not_ apply the patch is that requests to print the statistics can come async through SIGUSR1, and thus increase the chances of inconsistent output.

Solves this too, since the units are decided when the command is parsed, so SIGUSR1 would use the same units as would come out on a final summary. Or is using units consistent with what the user uses not ok?

Note, for statements w/o units (or mixed system), there would be no reason to change current behavior.
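The suggestion in this message (pick the summary notation once, from the suffixes the user typed) could be sketched roughly as follows. This is only an illustration in Python, not code from dd or from the proposed patch: the function names `unit_system` and `choose_summary_units` are hypothetical, the suffix tables cover only a few of the suffixes dd accepts (`M` = 1024*1024, `MB` = 1000*1000, and so on), and ignoring operands without a suffix is an assumption about what "units of a single system" means.

```python
import re

# Partial, illustrative suffix tables (dd accepts more than these).
SI_SUFFIXES = {"kB", "MB", "GB", "TB"}
BINARY_SUFFIXES = {"K", "M", "G", "T", "KiB", "MiB", "GiB", "TiB"}


def unit_system(size_arg):
    """Classify one dd-style size operand such as '256M' or '256MB'.

    Returns 'si', 'binary', or None when there is no recognised suffix.
    """
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", size_arg)
    if match is None:
        raise ValueError(f"not a size operand: {size_arg!r}")
    suffix = match.group(2)
    if suffix in SI_SUFFIXES:
        return "si"
    if suffix in BINARY_SUFFIXES:
        return "binary"
    return None


def choose_summary_units(size_args):
    """Decide the summary notation once, at command-parse time.

    Returns 'si' or 'binary' when every suffixed operand uses the same
    system, and None (meaning: keep current behavior) when no operand
    has a suffix or the systems are mixed.
    """
    systems = {unit_system(arg) for arg in size_args}
    systems.discard(None)  # assumption: operands without units are ignored
    if len(systems) == 1:
        return systems.pop()
    return None


print(choose_summary_units(["256M"]))      # binary
print(choose_summary_units(["256MB"]))     # si
print(choose_summary_units(["65536"]))     # None
print(choose_summary_units(["1M", "1MB"])) # None
```

Because the choice depends only on the command line, a report triggered by SIGUSR1 and the final summary would use the same notation under this scheme.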
bug#17505: dd statistics output
On 07/16/2014 10:38 AM, Pádraig Brady wrote:

> On 07/16/2014 03:45 AM, Christian Groessler wrote:
>> Hi,
>>
>> the final output of 'dd' is in SI mode (or how to call it). It uses 10^6 instead of 2^20 for megabyte.
>>
>> Example:
>>
>>     $ dd if=/dev/zero of=/dev/null bs=65536 count=4096
>>     4096+0 records in
>>     4096+0 records out
>>     268435456 bytes (268 MB) copied, 0.0248346 s, 10.8 GB/s
>>     $
>>
>> Is there a switch to display in traditional units? I'd like to have
>>
>>     268435456 bytes (256 MB) copied, ...
>
> http://bugs.gnu.org/17505#37 was proposed to do the following automatically (depending on the amount output):
>
>     268435456 bytes (256 MiB) copied, 0.0248346 s, 10.8 GB/s
>
> However that wasn't applied due to inconsistency concerns. I'm still of the opinion that the change above would be a net gain, as the number in brackets is for human interpretation, and in the vast majority of cases would be the best representation for that.

Note another reason to _not_ apply the patch is that requests to print the statistics can come async through SIGUSR1, and thus increase the chances of inconsistent output.

thanks,
Pádraig.
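For reference, the two figures discussed in this message (268 MB versus 256 MiB) come from dividing the same byte count by powers of 1000 or of 1024. A small Python sketch of that arithmetic follows; the helper `humanize` is hypothetical, it rounds to the nearest whole unit for simplicity, and it is not meant to reproduce dd's own rounding rules.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB"]
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def humanize(n_bytes, base, units):
    """Scale n_bytes by repeated division by base and attach the unit."""
    value = float(n_bytes)
    index = 0
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.0f} {units[index]}"


n = 65536 * 4096  # bs=65536 count=4096, i.e. 268435456 bytes
print(humanize(n, 1000, SI_UNITS))   # 268 MB  (268435456 / 10^6 = 268.435456)
print(humanize(n, 1024, IEC_UNITS))  # 256 MiB (268435456 / 2^20 = 256 exactly)
```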
bug#17505: dd statistics output
On 07/16/14 15:42, Pádraig Brady wrote:

> Note another reason to _not_ apply the patch is that requests to print the statistics can come async through SIGUSR1, and thus increase the chances of inconsistent output.

Sorry, I cannot follow. Which inconsistent output are you referring to?

regards,
chris