This is sort of moot now, but I composed it and hadn't hit send and it did
explain what I was thinking at the time, so...

On 9/8/20 1:55 PM, enh wrote:
> 
> 
> On Mon, Sep 7, 2020 at 1:29 PM Rob Landley <r...@landley.net
> <mailto:r...@landley.net>> wrote:
> 
>     On 9/6/20 6:45 AM, Jarno Mäkipää wrote:
>     > On Sun, Sep 6, 2020 at 12:34 PM Rob Landley <r...@landley.net
>     <mailto:r...@landley.net>> wrote:
>     >> Elliott says there's a maximum limit on the number of digits users are willing
>     >> to parse, and you're saying it's better to just have large blank gaps between
>     >> the numbers than to use that space for anything, AND that the cap on the maximum
>     >> number of digits is insurmountable rather than using separators like people have
>     >> been doing for hundreds of years to cope with long numbers in "human readable"
>     >> output?
>     >>
>     >> It's certainly a point of view.
>     >
>     > Groups of 3 are indeed easier for eye. I would suggest using something
>     > more sensible like spaces.
> 
>     Which is not what any country uses by default and thus makes everyone equally do
>     a double take? Egalitarian badness?
> 
>     Hardwiring it to the esperanto of formats is certainly a suggestion.
> 
>     > SI system uses spaces as thousands separator, comma and period both
>     > being valid decimal separator.
>     > 123 456.789 or 123 456,789
> 
>     Ok, I'll bite: which countries teach SI to their kids in primary school?
> 
>     >> Bravo. And bionic's libc/bionic/locale.gratuitouslycppbutactuallyc says:
>     >>
>     >>   // We only support two locales, the "C" locale (also known as "POSIX"),
>     >>   // and the "C.UTF-8" locale (also known as "en_US.UTF-8").
>     >>
>     >> So they don't support it either.
>     >
>     > C.UTF-8 and en_US.UTF-8 are not same.
> 
>     I cut and pasted that out of the bionic source.
> 
>     >> However, if the commas go, why doesn't the period in human_readable() go? I
>     >> don't see how they're conceptually different?
> 
>     I'm waiting for an opinion from Elliott, which might be a "meh?" because it's
>     not exactly his area either.
> 
> 
> i actually felt that 5 digits was small enough to not need separators.
> 
> a couple of things do stand out though. here's toybox and procps-ng 3.3.16 on
> Debian on my middling "real computer":
> 
>   Mem:   63,978M total,   53,696M used,   10,282M free,     1870M buffers
>  Swap:   56,095M total,      0.0M used,   56,088M free,   35,929M cached
> 
> MiB Mem :  63978.8 total,  10287.4 free,  13083.5 used,  40607.9 buff/cache
> MiB Swap:  56096.0 total,  56088.2 free,      7.8 used.  49419.2 avail Mem 
> 
> on the whole i prefer the toybox output, apart from the bug that gets us "1870M"
> rather than "1,870M" to match the others,

It's not a bug, I did that intentionally (both because the comma's always been
optional with only 4 digits and because what do you display if you have exactly
4 digits of output space?), but I can remove it again. I suppose passing
HR_COMMAS with dgt 4 is caller error...

> and the weird "0.0M". i'd always use
> decimals or never use decimals.

I can have the commas flag suppress the tenths?

> toybox doesn't do as good a job on the smallest system available to me right now
> (a phone from a couple of years ago):
> 
>   Mem:     3931M total,     2149M used,     1782M free,     6336K buffers
>  Swap:     2948M total,      0.0K used,     2948M free,     1485M cached
> 
> there's the same 0.0 issue (though the ',' bug cancels itself out here because
> all the fields are consistent and -- imho -- 4 digits is definitely readable
> without separators anyway).

I can make it so the comma is only suppressed when the output size is "4". If
you CAN show 5 digits, add the comma.
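
A minimal sketch of that rule, for illustration only. The fmt_grouped() name
and its signature are hypothetical and this is not the real human_readable()
code; the separator is hardwired to "," here:

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper, NOT toybox's human_readable(): write val into buf
// with a "," before every group of three digits, except when the caller
// only has 4 columns of output space, in which case the separator is
// dropped so "1870" still fits. Returns the length written, or -1 if buf
// is too small.
int fmt_grouped(char *buf, size_t len, unsigned long long val, int width)
{
  char digits[32];
  int n = snprintf(digits, sizeof(digits), "%llu", val);
  int commas = (width > 4) ? (n - 1) / 3 : 0;
  int i, out = 0;

  if ((size_t)(n + commas) >= len) return -1;
  for (i = 0; i < n; i++) {
    // Count digit groups from the right, never emit a leading separator.
    if (commas && i && (n - i) % 3 == 0) buf[out++] = ',';
    buf[out++] = digits[i];
  }
  buf[out] = 0;

  return out;
}
```

With this, fmt_grouped(buf, sizeof(buf), 1870, 4) leaves "1870" in buf, while
the same value with a width of 5 or more gives "1,870".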

> i still dislike that "buffers" is using K where the
> others are using M.

Hmmm, good point. But the real question is why didn't the earlier fields use all
the space? Ah, I see: I gave it 8 digits and 3,931,000 is 9.

I have a "force megabytes" threshold based on testing the total memory size, and
right now that test is 10 gigs. So I should either make the display size 9
digits (I have 7 spaces free at the end of the buffers line on an 80 column
screen so eating 4 is fine), or I could make the megabyte threshold be 1 gig
instead, so 999,999 would use 7 digits and those would stay in kilobytes.

I lean towards going to 9 digits, personally. Either way, the units should stay
consistent.
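
Roughly what "pick the unit once from the total, then use it for every field"
could look like. This is a sketch under assumptions: pick_unit() and
fmt_field() are made-up names, the inputs are taken to be kilobytes, and
"10 gigs" is taken to mean 10*1024*1024 kilobytes:

```c
#include <stdio.h>

// Hypothetical: decide the unit for a whole row from the total alone.
// Assumes total_kb is in kilobytes and a binary (1024-based) threshold.
char pick_unit(unsigned long long total_kb)
{
  return total_kb >= 10ULL * 1024 * 1024 ? 'M' : 'K';
}

// Hypothetical: format one field in the unit chosen above, so no field
// in the row can switch units on its own. Returns snprintf()'s result.
int fmt_field(char *buf, size_t len, unsigned long long kb, char unit)
{
  if (unit == 'M') kb /= 1024;

  return snprintf(buf, len, "%llu%c", kb, unit);
}
```

The point of splitting it this way is that the small "buffers" value gets the
same unit as the large "total" value, even when it would fit in fewer digits.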

> that to me still seems like the worst issue: i think we
> should always use the same units (which procps at least seems to do, even if
> they are sometimes KiB and other times MiB). which is basically a stronger
> version of the decimals complaint --- it's a table, and it's really weird when
> different fields in the table are in different units.

The design is sort of shifting out from under human_readable(). The above
probably fixes it, but if this happens again I should step back and rethink the
objective here. (Specifying units instead of autodetecting them, possibly a
version that operates on an array of values instead of one at a time. There's a
lot of "measure all the output so it matches up" ala ls, and no generic plumbing
to handle that, but a design for efficient plumbing to handle that is not
immediately obvious to me. Hmmm...)
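
For reference, the brute force two-pass version of "measure all the output so
it matches up" looks something like the sketch below. fmt_column() is a
hypothetical name, it only handles plain integers, and it is not a proposal
for the generic plumbing, just the baseline any such plumbing would replace:

```c
#include <stdio.h>

// Hypothetical two-pass helper: pass one measures the widest value, pass
// two right-justifies every value to that width. Each out[i] must hold 24
// bytes, which is enough for any unsigned long long. Returns the width.
int fmt_column(char out[][24], const unsigned long long *vals, int count)
{
  int i, len, width = 0;

  // snprintf() with a NULL buffer and size 0 just returns the length needed.
  for (i = 0; i < count; i++) {
    len = snprintf(NULL, 0, "%llu", vals[i]);
    if (len > width) width = len;
  }
  for (i = 0; i < count; i++)
    snprintf(out[i], sizeof(out[i]), "%*llu", width, vals[i]);

  return width;
}
```

The cost is formatting everything twice, which is why it is not obviously the
efficient design.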

> i don't think on a 3GiB system that i actually want to know whether i have
> 6.2MiB or 6.3MiB of buffers --- "6" is fine. but even if i did know, i'd want to
> see 6.2 vs 3931.0 rather than switch units mid-row.

Consistency across the group is good, I agree.

>     A proper fix would be a localeconv() in libc that DOESN'T return constant stub
>     info, which is out of scope for toybox. (And is as much an ADB thing as a bionic
>     thing since android seems to be using adb instead of ssh, so that would have to
>     marshall the locale environment variables from the host into the target. But I
>     often "wait for somebody to complain", you complained, and therefore I want to
>     fix it PROPERLY.)
> 
>     In the meantime, I can add a call to localeconv() that would use "," if that
>     returns "" which means right now it would be a NOP but then it's not my fault
>     it's getting it wrong. And I can test against glibc which does have an
>     overengineered version of this in it. Way back when uClibc had a much compressed
>     format for the localeconv data, but didn't have a database of countries and thus
>     copied its data from glibc, which it couldn't distribute for licensing reasons:

Of course the sad part is that these are _strings_ not bytes (utf8!) which means
the buffer size passed into human_readable_long is no longer fixed and easily
calculable unless I cheat and say I only care about the first byte of the return
and if they use utf8 we're outputting gibberish. (Which for right now, "wait for
somebody to complain"...)

The proper fix to that would be to malloc() the result, which is an imposition
on all the callers and I'm gonna punt on that for now.

Sigh, the locale init in main.c is sort of wrong and doesn't address LC_NUMERIC
anyway. Right, do it in the function...
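
A sketch of the "first byte only" cheat, under stated assumptions:
thousands_sep_byte() is a hypothetical name, and the extra check that falls
back to "," when the separator starts with a non-ASCII byte is an addition
for illustration (it avoids emitting half a utf8 sequence), not something
the existing code does:

```c
#include <locale.h>

// Hypothetical helper: return a single byte to use as the thousands
// separator for the current LC_NUMERIC locale. Falls back to ',' when
// localeconv() reports "" (as the "C" locale does) or when the separator
// starts with a multibyte utf8 sequence that one byte can't represent.
char thousands_sep_byte(void)
{
  struct lconv *lc = localeconv();
  const char *sep = lc ? lc->thousands_sep : "";

  if (!sep || !*sep || (*sep & 0x80)) return ',';

  return *sep;
}
```

This keeps the caller's buffer size fixed and easy to calculate. Handling
multibyte separators properly still means a variable length result.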

>       https://lists.uclibc.org/pipermail/uclibc/2015-June/049000.html
> 
>     Rob
> 
>     P.S. I ranted about this sort of aesthetic issue being something the open source
>     development model can't deal with 10 years ago, almost to the day:
> 
>       https://landley.net/notes-2010.html#13-08-2010
> 
>     And included it in my 2013 talk:
> 
>       https://www.youtube.com/watch?v=SGmtP5Lg_t0#t=11m30s
> 

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
