On 2015-09-04 18:24, Rob Landley wrote:
Why is the _caller_ not appending B when they printf() the result? The
space is before the units but the B isn't, and this is a string that
gets put into a buffer and then used by something else. Further editing
is kinda _normal_...
Because the caller would then have to worry about the M/MB/MiB problem.
The convention (at least in GNU and util-linux) is that M and MiB both
refer to 2^20 bytes, and MB refers to 10^6 bytes. If the caller appends
the B afterward, it might change the meaning of the number:
10Mi -> 10MiB is fine
10M -> 10MB is wrong
The purpose of the flag is to append B if the number is less than
1000/1024, so (among other reasons) you can have a fixed-width string of
output: 42G, 42M, 42K, 42B, even if there would not normally be a letter
there. In that case, at least, you definitely don't want to "just append
a B", because you only want the B in certain cases.
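The append-B rule described above can be sketched in a few lines of C. This is a minimal illustration of the convention being discussed, not the real toybox or libbsd code; `format_size` and the `HR_B` flag name are made up for the example:

```c
#include <stdio.h>

// Hypothetical flag mirroring the discussion, not a real API.
#define HR_B 1  // show "B" when the value is below one unit (1024)

// Bare binary prefixes (K = 2^10, M = 2^20, ...): a caller appending
// "B" afterward would wrongly turn 10M (2^20 bytes) into 10MB (10^6).
// With HR_B, values below one unit still get a letter, so columns of
// 42G, 42M, 42K, 42B stay fixed width.
static void format_size(char *buf, unsigned long long n, int flags)
{
    const char *prefix = "KMGTPE";
    int i = -1;

    while (n >= 1024 && prefix[i + 1]) {
        n /= 1024;
        i++;
    }
    if (i < 0) sprintf(buf, "%llu%s", n, (flags & HR_B) ? "B" : "");
    else sprintf(buf, "%llu%c", n, prefix[i]);
}
```

Note the "B" is only emitted when no prefix was chosen; appending it unconditionally would manufacture nonsense like "42MB" meaning 2^20-based megabytes.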
HN_DIVISOR_1000 Divide number with 1000 instead of 1024.
Yep, I think network speeds are measured in SI units for example
I could live with 1024 units everywhere esp. if we also used the IEC prefixes
I object to the word "kibibyte" on general principles, and disks are
also sold in decimal sizes (for historical marketing reasons).
But RAM is sold in binary sizes. "16 gigs" of RAM is 16384MiB, not
16000MB. (Think `free -h`.) And on a more fundamental level, it will
always be measured in binary sizes: pages are 4096 bytes, not 4000.
And so is flash, manufactured in binary. Even though you can buy a
"500GB" SSD, it's really 512GiB on the inside, with the additional space
used as spare flash pages.
(Of course "512 gigs" is mixing decimal and binary when you _do_ use
binary gigs, since the 512 is decimal and all. But let's be honest,
"kibibytes" is a stupid name, all else is details for me.)
HN_IEC_PREFIXES Use the IEE/IEC notion of prefixes (Ki, Mi,
Mebibytes. *shudder*
Huh, I thought the i was the second character in "binary", but this
implies it's "IEC"? Or possibly IEE? Or maybe the i from "mebi" which is
back to "binary" again...
Mi -> Mebi -> million binary -> 2^20
Gi...). This flag has no effect when
HN_DIVISOR_1000 is also specified.
Err yes, but it's not that the flag has no effect; it's that if you are using
powers of 1000 there should not be an 'i'.
The B is already a separate flag from the 1024. If the caller wants to
append the unicode character for "clown nose" to the returned string,
that's not really human_readable()'s business.
See above. You have to have the "i" if you want to append the "B". But
you can't just append both if you want the "B" in the case of <1000,
because then you'll have 1KiB = 1024BiB, or 1KB = 1000BB, and there's no
such thing as a BiB.
For my two cents, I would suggest we go for IEC prefixes by default. Yes, they
are so-so, but there is a standard, and it does make things noticeably clearer;
might as well do it right instead of the usual customary CompSci notation,
which is notoriously ambiguous.
The function is called human_readable().
You want to default to binary units.
What exactly is our goal here again?
Using binary powers is quite important for some human-readable cases.
Take, for example, SSDs. For performance and longevity, you have to
align data access to flash erase block sizes, which get up to 128KiB or
256KiB. It's important then to align partitions on MiB (not MB)
boundaries. cfdisk and Debian's partitioner get this horribly wrong.
(Especially because you specify MB when creating partitions, which it will
then show you in MiB sizes.)
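The alignment arithmetic here is easy to get wrong when tools mix MB and MiB. A quick sketch of rounding a partition start up to a 1 MiB boundary (illustrative only; `align_1mib` is a hypothetical helper):

```c
#include <stdint.h>

// Round a byte offset up to the next 1 MiB (2^20) boundary. Erase
// blocks of 128 KiB or 256 KiB divide 1 MiB evenly, so MiB-aligned
// partitions stay erase-block aligned; MB (10^6) boundaries do not.
static uint64_t align_1mib(uint64_t offset)
{
    const uint64_t mib = 1ULL << 20;
    return (offset + mib - 1) & ~(mib - 1);
}
```

A partition started at "1 MB" (1,000,000 bytes) is misaligned and rounds up to 1,048,576; one started at a MiB multiple is left alone.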
(Keeping the thundering hordes of android users happy. Right. Trying not
to get emotionally invested in an aesthetic decision which hasn't _got_
a right answer and just needs to be consistent. That said, if I can help
kill the term "mebibytes" it is worth MUCH EFFORT on my part...)
In the entire tree, there's only one use of HN_GETSCALE
(/usr/bin/procstat), and it doesn't look like that's actually
necessary.
HN_DECIMAL and HN_NOSPACE are used a lot: ls, df, du, and so on. HN_B
I did not have an HN_DECIMAL flag, since I expect 0-9 to get a decimal point
and a second digit of precision; the range only goes to 999 anyway, so it
will not use more characters.
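The width claim works out: with one fractional digit below 10, "9.9" and "999" both take three characters. A rough sketch of that formatting rule (a hypothetical helper, not the real HN_DECIMAL code; the value is assumed to be already scaled into its unit):

```c
#include <stdio.h>

// One fractional digit when the integer part is a single digit, so
// 9.9M and 999M line up in the same column width.
static void format_scaled(char *buf, double n)
{
    if (n < 10) sprintf(buf, "%.1f", n);
    else sprintf(buf, "%.0f", n);
}
```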
is used less, but in df, du, and vmstat. HN_DIVISOR_1000 is only
really used in df (it's also used once each in "edquota" and
"camcontrol").
I would have no problem with df using 1024 units instead and displaying IEC
units.
Disks are sold in decimal measurements. People are going to ask why your
horribly inefficient file format is eating so much of their disk space.
Even Windows shows disk free space in binary units.
(What, did they stop doing that with flash? I'd be surprised if they did...)
No, SSDs are still sold in decimal sizes. But you have to _use_ them in
binary sizes.
HN_IEC_PREFIXES isn't used at all. Not even a test.
Yeah, I have noticed that myself. Following the standard, and even making it
the default so that you know what everything is in, would be good; alas, it's
somewhat incompatible with custom. But are scripts using -h and then parsing
the output? Something is probably that dumb.
But it would be nice to actually do the right thing.
Nothing extending the usage of the word "gibibytes" is the right thing.
Then just do like util-linux and use "G" instead of "GiB"
so until we find a place where we want to turn off HN_DECIMAL, we're
good. (That's a harder thing to grep for, but I couldn't find an
instance in FreeBSD.)
I would hope not; I would regard it as a useless loss of precision.
9.9 will fit in the same space as 999 just fine.
human_readable() _IS_ a useless loss of precision. That's what it's _for_.
And the units advance by kilobytes so 9.9 and 999 are not rephrasings of
each other. 999k and 1.0M can be from a rounding perspective, but "loss
of precision" is the reason rounding _exists_...
You can also set a flag to drop the space between the number and prefix, or
use the Ubuntu 0..1023 style. You can also request the limited range 0..999
(1.0 k-999 k style) in either SI or IEC.
Yes, but why would we want to?
Strict conformance to the standard? Avoiding the 9999 -> 9.8Ki transition.
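To make that transition concrete: in the limited 0..999 style, the number scales as soon as it reaches 1000, so 9999 bytes never prints with four digits. A sketch of that behavior (names made up; not any real implementation):

```c
#include <stdio.h>

// Limited-range style: keep the printed number in 0..999 by scaling
// once it reaches 1000, even with a 1024 divisor, so 9999 bytes
// prints as "9.8Ki" rather than a four-digit "9999".
static void format_limited(char *buf, unsigned long long n)
{
    const char *prefix[] = {"", "Ki", "Mi", "Gi", "Ti"};
    double v = (double)n;
    int i = 0;

    while (v >= 1000 && i < 4) {
        v /= 1024;
        i++;
    }
    if (i && v < 10) sprintf(buf, "%.1f%s", v, prefix[i]);
    else sprintf(buf, "%.0f%s", v, prefix[i]);
}
```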
The first I heard of this standard was when you mentioned it. Ubuntu
clearly wasn't doing it.
(If you git add a file, git diff shows no differences, mercurial diff
shows it diffed against /dev/null. I'm STILL getting used to the weird
little behavioral divergences.)
git diff --cached
That will show your staged changes (including added/removed file diffs).
I hope this is interesting.
It's very interesting and I'm keeping it around in case it's needed. I'm
just trying to figure out if the extra flags are something any command
is actually going to use. (And that's an Elliott question more than a me
question, I never use -h and it's not in posix or LSB.)
Odd, it has been in common usage for years, but I guess it was just whatever
people felt a human would like to see rather than one of the standards.
It's got a dozen flags because everybody who implemented this did it
differently because the machine readable scriptable version is just to
print out the actual NUMBER, thus the aesthetic cleanup is (or at least
should be) just that.
And because different quantities are measured with different units.
Network speeds use decimal; memory sizes use binary; and disk sizes use
both.
Bringing an international standards body into a purely aesthetic
decision is weird. ANSI vs ISO tea was a _joke_.
(Ok, maybe the aesthetic output has mutated into functional due to
screen scrapers, which is what Elliott was implying by scripts depending
on -h output. In which case either rigorously copying the historical
mistakes or breaking them really loudly is called for. Adding a
standards body to that sort of mess gives me a headache long before we
get into any sort of details.)
Rob
--
Regards,
Samuel Holland <[email protected]>
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net