Alain Bench <[EMAIL PROTECTED]> writes:

> Removing separators will break existing apps parsing wget's output.
> Such apps exist?
They do exist, but *any* change in Wget's output will break them.
Since they probably do the equivalent of `sed s/,//g' anyway, the
removal of separators is likely to be the least of their problems.

Maybe I was not clear enough about the "pasting" requirement in my
first bullet point: by that I didn't mean programmatic processing of
Wget's whole output, but hand-picking parts of it (such as a file size
or file name) and manually copy+pasting them into the shell or into
bc. In that case sed is not trivially involved, and yet the thousand
separators *always* have to be removed.

>> omitting the thousand separators merely removes redundancy, not
>> useful information.
>
> That's true only if you assume the user analyses the /unit-size/ and
> /kmt-size/ as a whole, as a unique info. But that's not always the
> case. One may well look only at /unit-size/. Without seps, this user
> is forced to count digits, or to look additionally to /kmt-size/,
> and do some brainwork to find corresponding order of magnitude. For
> this user, sep removal removes readability.

Here you seem to assume that the typical user cares about, and first
looks at, exact to-the-byte figures. In my experience that is rarely
the case -- usually the user cares about the order of magnitude, such
as "640K" or "42M", rather than the byte count. In fact, when I do
need the exact size, it is precisely in order to paste it into another
program, such as emacs or bc, which Wget makes harder by inserting
those separators! With the order-of-magnitude information readily
available in the form of the unit, Wget (at least for some uses) does
me a disservice by repeating that same information in the form of
separators.

> Unless a bigger unavoidable danger interferes. That's my humble
> opinion, but I believe it's also some more general ergonomic
> principle.

If so, I have yet to see this principle in writing, or to use an
application that abides by it by default, the single exception
being -- Wget.
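To make the pasting annoyance concrete, here is a minimal sketch of
the manual dance described above. The figure "1,234,567" is invented
for illustration; the point is that bc and shell arithmetic reject
grouped digits, so the commas must be stripped first:

```shell
# A size hand-copied from grouped program output (made-up value):
size="1,234,567"

# bc won't accept "1,234,567", so strip the grouping commas first:
plain=$(printf '%s' "$size" | sed 's/,//g')

echo "$plain"               # 1234567
echo "$plain / 1024" | bc   # integer kilobytes: 1205
```

Without the separators in the first place, the `sed' step disappears
and the copied value can go straight into bc or `$((...))'.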
(And Wget doesn't accept grouped digits in numeric input, so it's
inconsistent to boot.) Even number-oriented applications touted as
user-friendly, such as oocalc (and presumably Excel, but I don't have
it around to verify), don't group digits by default.

>> As for localization, I'm not against it. The argument was that,
>> where possible, I prefer the output of applications to remain
>> parsable.
>
> So we disagree only on the balance. I'd say output to humans should
> be localized as much as possible, unless this creates a really
> serious problem for the machine parsing secondary usage.

You're right, my choice of balance leans more to the parsing side,
although actual parsing is only part of the picture. For example, ISO
8601 dates have the nice property that a simple textual sort orders
them chronologically. This is useful for file names (e.g. log files),
but also for easy sorting of textual date columns in spreadsheets and
databases! In this case the computer didn't even try to make sense of
the data, yet its regularity helped make it more useful. (Of course,
ISO 8601 dates also have the property of being easily parsable with
either straightforward regexps or trivial C code, neither of which is
the case for localized dates -- see GNU getdate.y.)

> Where incompatible, human and machine output may be separated.

An important point of the Unix philosophy is that, with some care, the
same output can serve both humans and machines. (Piping the output of
`du' or `wc' to sort is an example of doing both.) While that
principle may be misguided, and doesn't directly apply to Wget's more
human-oriented output, it can be applied with measure. I find it
self-evident that it is better to at least be able to paste parts of
the output into other programs than not to be able to do so.
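The ISO 8601 property mentioned above is easy to check in the shell:
because the fields run from most to least significant (year, month,
day) with fixed widths, a plain lexicographic `sort' is also a
chronological sort. The dates below are invented for illustration:

```shell
# Textual sort of ISO 8601 dates == chronological sort (sample dates):
printf '%s\n' 2004-11-30 2004-02-07 2003-12-25 | sort
# 2003-12-25
# 2004-02-07
# 2004-11-30
```

A localized form such as "Nov 30, 2004" breaks this: `sort' would
order the entries alphabetically by month name, and parsing them back
requires locale-aware machinery rather than a fixed-format regexp.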
