Alain Bench <[EMAIL PROTECTED]> writes:

> Removing separators will break existing apps parsing wget's output.
> Such apps exist?

They do exist, but *any* change in Wget's output will break them.
Since they probably do the equivalent of sed s/,//g anyway, the
removal of separators is likely to be the least of their problems.
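(For what it's worth, the strip really is a one-liner; a minimal sketch, with a made-up log line standing in for real wget output:)

```shell
# Hypothetical example: strip the comma group separators from a line
# resembling wget's size report, so the number is machine-usable again.
line='Length: 1,234,567 (1.2M) [text/html]'
printf '%s\n' "$line" | sed 's/,//g'
# prints: Length: 1234567 (1.2M) [text/html]
```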

Maybe I was not clear enough about the "pasting" requirement in my
first bullet point: by that I didn't mean programmatic processing of
Wget's whole output, but hand-picking parts of it (such as a file
size or a file name) and manually copy+pasting them into the shell or
into bc.  In that case sed cannot be trivially interposed, and yet
the thousand separators *always* have to be removed.

>> omitting the thousand separators merely removes redundancy, not useful
>> information.
>
> That's true only if you assume the user analyses the /unit-size/ and
> /kmt-size/ as a whole, as a unique info. But that's not always the case.
> One may well look only at /unit-size/. Without seps, this user is forced
> to count digits, or to look additionally to /kmt-size/, and do some
> brainwork to find corresponding order of magnitude. For this user, sep
> removal removes readability.

Here you seem to assume that the typical user cares about and first
looks at exact, to-the-byte figures.  In my experience that is rarely
the case -- in most cases, the user cares about the order of
magnitude, such as "640K" or "42M", rather than the byte size.  In
fact, when I do need the exact size, it is exactly in order to be able
to paste it to another program, such as emacs or bc, which Wget makes
harder by inserting those separators!

With the order of magnitude information being readily available in the
form of the unit, Wget (at least for some uses) does me a disservice
by adding that same information in the form of separators.

> Unless a bigger unavoidable danger interferes. That's my humble
> opinion, but I believe it's also some more general ergonomic
> principle.

If so, I have yet to see this principle in writing, or use an
application that abides by it by default, the single exception being
-- Wget.  (And Wget doesn't accept grouped digits in numeric input, so
it's inconsistent to boot.)

Even number-oriented applications touted as user-friendly, such as
oocalc (and presumably Excel, but I don't have it around to verify),
don't group digits by default.

>> As for localization, I'm not against it. The argument was that, where
>> possible, I prefer the output of applications to remain parsable.
>
>     So we disagree only on the balance. I'd say output to humans should
> be localized as much as possible, unless this creates a really serious
> problem for the machine parsing secondary usage.

You're right, my choice of balance leans more to the parsing side,
although actual parsing is only part of the picture.  For example,
ISO 8601 dates have the nice property that a simple textual sort
orders them chronologically.  This is useful not only for file names
(e.g. log files), but also for easy sorting of textual date columns
in spreadsheets and databases!  In this case the computer didn't even
try to make sense of the data, but its regularity helped make it more
useful.
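(A minimal illustration, with made-up dates:)

```shell
# ISO 8601 dates come out chronological under a plain lexical sort;
# no date parsing is involved at any point.
printf '%s\n' 2024-02-29 2023-12-31 2024-01-01 | sort
# prints:
# 2023-12-31
# 2024-01-01
# 2024-02-29
```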

(Of course, ISO 8601 dates also have the property of being easily
parsable with either straightforward regexps or trivial C code,
neither being the case for localized dates -- see GNU getdate.y.)
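(For instance, assuming a grep that supports -E and -o, such as GNU
grep, picking an ISO date out of surrounding text takes one regexp:)

```shell
# Extract an ISO 8601 date from arbitrary text with a single regexp;
# a localized date like "May 17th, 2006" would need getdate-style logic.
echo 'log rotated on 2006-05-17 at midnight' |
    grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}'
# prints: 2006-05-17
```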

>     Where incompatible, human and machine output may be separated.

An important point of the Unix philosophy is that, with some care,
the same output can serve both humans and machines.  (Piping the
output of `du' or `wc' to sort is an example of doing both.)  While
that principle may be misguided and doesn't directly apply to Wget's
more human-oriented output, it can be applied with measure.  I find
it self-evident that it is better to at least be able to paste parts
of the output into other programs than not to be able to do so.
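(The `wc' case sketched out, with throwaway files made up for the
demonstration:)

```shell
# The same wc output reads fine by eye and pipes straight into sort:
# the line count leads each line, so `sort -n' orders the files by size.
printf 'a\nb\nc\n' > three.txt
printf 'a\n' > one.txt
wc -l one.txt three.txt | sort -n
```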
