Re: [Toybox] [PATCH] lib/lib human_readable_long fix utf-8 LC_NUMERIC

2020-09-09 Thread Rob Landley
On 9/9/20 12:36 PM, Jarno Mäkipää wrote:
> Apparently LC_NUMERIC thousands_sep can be NARROW NO-BREAK SPACE
> 
> There might be a cleaner fix than this, but copying just one char out of
> thousands_sep spit out

I've fallen a bit behind on posting blog entries again (they need editing and
formatting) but I already wrote there about the reason I DIDN'T do that, which
is that the buffer it writes into is a fixed length and if you don't know how
long the output is you have to malloc it all instead which requires changing
every caller to free it again. (In theory you could calculate a maximum size but
who says there aren't COMBINING characters in your utf8 sequence?)

So no, this is not currently supported. I made it work with "," and with ".",
and if this bothers you I can update it so the test for falling back to ',' is
(!x || x>127).

At the moment, musl and bionic are both returning hardwired "" so this code only
triggers with glibc anyway.

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] lib/lib human_readable_long fix utf-8 LC_NUMERIC

2020-09-09 Thread Jarno Mäkipää
On Thu, Sep 10, 2020 at 3:20 AM enh  wrote:
>
> if you've ever wondered why the same person (me) worked so hard to ensure 
> that OEMs couldn't remove locale data from icu4c but also personally removed 
> all the localization from the core Java libraries and libc...
>
> i'd always been a strong proponent of localization, but one of the first 
> things i did on Android was to remove this sort of "low-level localization" 
> where i found it. i was finding that bugs were getting less attention than 
> they should because developers didn't know what to do with (say) a Turkish 
> error message. automated bug report clustering was failing to realize that 
> (say) `Datei oder Verzeichnis nicht gefunden` and `그런 파일이나 디렉터리가 없습니다` and 
> `No such file or directory` are the same. or scripts failing to parse output 
> because they've been trained on en_US.

Yes, googling problems based on error messages is a lot easier when
errors are in English.

>
> for *apps* -- anything that real people interact with directly -- 
> localization is massively important. but, at least after working on Android, 
> i came to believe that it's a _mistake_ and actively harmful for development 
> tools. the fact that i've had to (say) help a native Russian speaker fix a 
> bug where `x = 70,2` was valid but very much not what they meant only 
> _strengthens_ this belief for me --- if you're going to work on this stuff, 
> you're going to have to learn the C/POSIX locale sooner or later.

I'm ok with the C/Posix locale. It does not have thousands separators
so there is no confusion. But I think forcing en_US on the other hand
is not ok.

>
> see also: why ISO-8601 is the one true date format.
>
> don't apps need libc localization? not really. the POSIX localization 
> functionality is so anaemic that it's really not useful even for "major 
> minority" languages. if you're serious about localization, you're going to 
> need icu4c anyway, which isn't scared to embrace all the diversity that's 
> actually out there (rather than the tiny subset that the POSIX folks could 
> imagine, which doesn't even stretch to the need for the genitive case in 
> dates, to pick one random fairly mainstream example).
>
> Luckily, i've also been able to neuter Android's libc so none of this will 
> affect Android whichever way toybox goes[1]. but i still think it's a bad 
> idea. no "real people" should ever need to look at this, but machines and 
> developers will, and every bit of localization hurts the real audience.
>
> at least 15'936.2 would be a valid C++14 literal (and i'm assuming will
> make it into C2x) :-)

And Rust has underscores (15_936.2) to add confusion.

>
> ___
> 1. strictly, the fact that you're doing your own insertion of ',' separators 
> might hurt me (in the `top -b` case), but i'll worry about that if i notice 
> it actually break any parsing. i know that's included in Android's standard 
> bugreports, but i _don't_ know that anyone's parsing it.
>
> On Wed, Sep 9, 2020 at 10:37 AM Jarno Mäkipää  wrote:
>>
>> Apparently LC_NUMERIC thousands_sep can be NARROW NO-BREAK SPACE
>>
>> There might be a cleaner fix than this, but copying just one char out of
>> thousands_sep spit out
>>
>>
>>   Mem:   15�36M total,4�92M used,   11�44M free,  674M buffers
>>  Swap:2�47M total,0M used,2�47M free,1�97M cached
>>
>>
>> after patch
>>   Mem: 15 936M total,  4 658M used, 11 277M free,  677M buffers
>>  Swap:  2 047M total,0M used,  2 047M free,  1 675M cached
>>
>>
>> -Jarno


Re: [Toybox] [PATCH] lib/lib human_readable_long fix utf-8 LC_NUMERIC

2020-09-09 Thread enh via Toybox
if you've ever wondered why the same person (me) worked so hard to ensure
that OEMs couldn't remove locale data from icu4c but also personally
removed all the localization from the core Java libraries and libc...

i'd always been a strong proponent of localization, but one of the first
things i did on Android was to remove this sort of "low-level localization"
where i found it. i was finding that bugs were getting less attention than
they should because developers didn't know what to do with (say) a Turkish
error message. automated bug report clustering was failing to realize that
(say) `Datei oder Verzeichnis nicht gefunden` and `그런 파일이나 디렉터리가 없습니다` and
`No such file or directory` are the same. or scripts failing to parse
output because they've been trained on en_US.

for *apps* -- anything that real people interact with directly --
localization is massively important. but, at least after working on
Android, i came to believe that it's a _mistake_ and actively harmful for
development tools. the fact that i've had to (say) help a native Russian
speaker fix a bug where `x = 70,2` was valid but very much not what they
meant only _strengthens_ this belief for me --- if you're going to work on
this stuff, you're going to have to learn the C/POSIX locale sooner or
later.

see also: why ISO-8601 is the one true date format.

don't apps need libc localization? not really. the POSIX localization
functionality is so anaemic that it's really not useful even for "major
minority" languages. if you're serious about localization, you're going to
need icu4c anyway, which isn't scared to embrace all the diversity that's
actually out there (rather than the tiny subset that the POSIX folks could
imagine, which doesn't even stretch to the need for the genitive case in
dates, to pick one random fairly mainstream example).

luckily, i've also been able to neuter Android's libc so none of this will
affect Android whichever way toybox goes[1]. but i still think it's a
bad idea. no "real people" should ever need to look at this, but machines
and developers will, and every bit of localization hurts the real audience.

at least 15'936.2 would be a valid C++14 literal (and i'm assuming will
make it into C2x) :-)

___
1. strictly, the fact that you're doing your own insertion of ','
separators might hurt me (in the `top -b` case), but i'll worry about that
if i notice it actually break any parsing. i know that's included in
Android's standard bugreports, but i _don't_ know that anyone's parsing it.

On Wed, Sep 9, 2020 at 10:37 AM Jarno Mäkipää  wrote:

> Apparently LC_NUMERIC thousands_sep can be NARROW NO-BREAK SPACE
>
> There might be a cleaner fix than this, but copying just one char out of
> thousands_sep spit out
>
>
>   Mem:   15�36M total,4�92M used,   11�44M free,  674M buffers
>  Swap:2�47M total,0M used,2�47M free,1�97M cached
>
>
> after patch
>   Mem: 15 936M total,  4 658M used, 11 277M free,  677M buffers
>  Swap:  2 047M total,0M used,  2 047M free,  1 675M cached
>
>
> -Jarno


[Toybox] [PATCH] Add ipv6 support to wget.c

2020-09-09 Thread Chris Sarra via Toybox
---
 toys/pending/wget.c | 48 +
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/toys/pending/wget.c b/toys/pending/wget.c
index 75fad3f4..5c85889a 100644
--- a/toys/pending/wget.c
+++ b/toys/pending/wget.c
@@ -25,11 +25,11 @@ GLOBALS(
   char *filename;
 )
 
-// extract hostname from url
+// extract hostname and port from url
 static unsigned get_hn(const char *url, char *hostname) {
   unsigned i;
 
-  for (i = 0; url[i] != '\0' && url[i] != ':' && url[i] != '/'; i++) {
+  for (i = 0; url[i] != '\0' && url[i] != '/'; i++) {
     if(i >= 1024) error_exit("too long hostname in URL");
     hostname[i] = url[i];
   }
@@ -41,7 +41,6 @@ static unsigned get_hn(const char *url, char *hostname) {
 // extract port number
 static unsigned get_port(const char *url, char *port, unsigned url_i) {
   unsigned i;
-
   for (i = 0; url[i] != '\0' && url[i] != '/'; i++, url_i++) {
     if('0' <= url[i] && url[i] <= '9') port[i] = url[i];
     else error_exit("wrong decimal port number");
@@ -52,6 +51,20 @@ static unsigned get_port(const char *url, char *port, unsigned url_i) {
   return url_i;
 }
 
+static void strip_v6_brackets(char* hostname) {
+  size_t len = strlen(hostname);
+  if (len > 1023) {
+    error_exit("hostname too long, %zu bytes", len);
+  }
+  char * closing_bracket = strchr(hostname, ']');
+  if (closing_bracket && closing_bracket == hostname + len - 1) {
+    if (strchr(hostname, '[') == hostname) {
+      hostname[len-1] = 0;
+      memmove(hostname, hostname + 1, len - 1);
+    }
+  }
+}
+
 // get http infos in URL
 static void get_info(const char *url, char* hostname, char *port, char *path) {
   unsigned i = 7, len;
@@ -62,11 +75,30 @@ static void get_info(const char *url, char* hostname, char *port, char *path) {
   len = get_hn(url+i, hostname);
   i += len;
 
-  // get port if exists
-  if (url[i] == ':') {
-    i++;
-    i = get_port(url+i, port, i);
-  } else strcpy(port, "80");
+  // `hostname` now contains `host:port`, where host can be any of: a raw IPv4
+  // address; a bracketed, raw IPv6 address, or a hostname. Extract port, if it exists,
+  // by searching for the last ':' in the hostname string.
+  char *port_delim = strrchr(hostname, ':');
+  char use_default_port = 1;
+  if (port_delim) {
+    // Found a colon; is there a closing bracket after it? If so,
+    // then this colon was in the middle of a bracketed IPv6 address
+    if (!strchr(port_delim, ']')) {
+      // No closing bracket; this is a real port
+      use_default_port = 0;
+      get_port(port_delim + 1, port, 0);
+
+      // Mark the new end of the hostname string
+      *port_delim = 0;
+    }
+  }
+
+  if (use_default_port) {
+    strcpy(port, "80");
+  }
+
+  // This is a NOP if hostname is not a bracketed IPv6 address
+  strip_v6_brackets(hostname);
 
   // get uri in URL
   if (url[i] == '\0') strcpy(path, "/");
-- 
2.28.0.526.ge36021eeef-goog



Re: [Toybox] [PATCH] logger.c was failing to properly log local0-local6 facilities, due to a string parsing error. This patch enables proper local facility handling.

2020-09-09 Thread Chris Sarra via Toybox
Just ran it through a short suite of tests on my end and all looks good
here.
Thanks for the cleanup!
+Chris


On Wed, Sep 9, 2020 at 12:57 AM Rob Landley  wrote:

>
>
> On 9/9/20 12:30 AM, Rob Landley wrote:
> > On 9/8/20 2:16 PM, Chris Sarra via Toybox wrote:
> >> ---
> >>  toys/posix/logger.c | 7 ---
> >>  1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/toys/posix/logger.c b/toys/posix/logger.c
> >> index 3bcfb174..d1cc7291 100644
> >> --- a/toys/posix/logger.c
> >> +++ b/toys/posix/logger.c
> >> @@ -64,9 +64,10 @@ void logger_main(void)
> >>      else {
> >>        *s1++ = len = 0;
> >>        facility = arrayfind(TT.p, facilities, ARRAY_LEN(facilities));
> >> -      if (facility == -1 && strncasecmp(TT.p, "local", 5)) {
> >> -        facility = s1[5]-'0';
> >> -        if (facility>7 || s1[6]) facility = -1;
> >> +      if (facility == -1 && strncasecmp(TT.p, "local", 5) == 0) {
> >> +        s2 = TT.p;
> >> +        facility = s2[5]-'0';
> >> +        if (facility>7 || s2[6]) facility = -1;
> >
> > Sigh, why did I promote this out of pending? arrayfind() initializes matchlen to
> > 0 and then never sets it to anything ELSE, so it ONLY returns exact matches not
> > longest unambiguous match (which is the point of the function I think?)
> >
> > Applied your patch, but I have some cleanup to do to this command...
>
> I did the cleanup but I have no tests/logger.test, so I dunno if I broke it. (It
> survived obvious smoketesting, but...?)
>
> Could you try the attached and see if it works for you?
>
> Thanks,
>
> Rob
>


[Toybox] [PATCH] blkid: don't show empty tags.

2020-09-09 Thread enh via Toybox
The util-linux blkid (even if explicitly asked with -s) won't show you a
tag with no value.
---
 toys/other/blkid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
From 5072ada1d7feae02161400650dfaf120f12d56f8 Mon Sep 17 00:00:00 2001
From: Elliott Hughes 
Date: Wed, 9 Sep 2020 12:31:50 -0700
Subject: [PATCH] blkid: don't show empty tags.

The util-linux blkid (even if explicitly asked with -s) won't show you a
tag with no value.
---
 toys/other/blkid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/toys/other/blkid.c b/toys/other/blkid.c
index 40391de4..e3badca7 100644
--- a/toys/other/blkid.c
+++ b/toys/other/blkid.c
@@ -73,7 +73,7 @@ static void show_tag(char *key, char *value)
     for (al = TT.s; al; al = al->next) if (!strcmp(key, al->arg)) show = 1;
   } else show = 1;
 
-  if (show) printf(" %s=\"%s\"", key, value);
+  if (show && *value) printf(" %s=\"%s\"", key, value);
 }
 
 static void flagshow(char *s, char *name)
-- 
2.28.0.526.ge36021eeef-goog



[Toybox] [PATCH] lib/lib human_readable_long fix utf-8 LC_NUMERIC

2020-09-09 Thread Jarno Mäkipää
Apparently LC_NUMERIC thousands_sep can be NARROW NO-BREAK SPACE

There might be a cleaner fix than this, but copying just one char out of
thousands_sep spit out


  Mem:   15�36M total,4�92M used,   11�44M free,  674M buffers
 Swap:2�47M total,0M used,2�47M free,1�97M cached


after patch
  Mem: 15 936M total,  4 658M used, 11 277M free,  677M buffers
 Swap:  2 047M total,0M used,  2 047M free,  1 675M cached


-Jarno
From 04ae21f85a606038710ccfe1118a3dc6a7f33632 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jarno=20M=C3=A4kip=C3=A4=C3=A4?= 
Date: Wed, 9 Sep 2020 20:20:40 +0300
Subject: [PATCH] lib/lib human_readable_long fix utf-8 LC_NUMERIC

Apparently LC_NUMERIC thousands_sep can be NARROW NO-BREAK SPACE
---
 lib/lib.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/lib.c b/lib/lib.c
index 319b6af4..a7ca8077 100644
--- a/lib/lib.c
+++ b/lib/lib.c
@@ -1151,7 +1151,7 @@ match:
 int human_readable_long(char *buf, unsigned long long num, int dgt, int unit,
   int style)
 {
-  static char cc, dot;
+  static char *cc, *dot;
   unsigned long long snap = 0;
   int len, commas = 0, off, ii, divisor = (style_1000) ? 1000 : 1024;
 
@@ -1173,26 +1173,27 @@ int human_readable_long(char *buf, unsigned long long num, int dgt, int unit,
 
       setlocale(LC_NUMERIC, "");
       ll = localeconv();
-      dot = *ll->decimal_point ? : '.';
-      cc = *ll->thousands_sep ? : ',';
-    } else cc = ',', dot = '.';
+      dot = ll->decimal_point ? : ".";
+      cc = ll->thousands_sep ? : ",";
+    } else cc = ",", dot = ".";
   }
 
   len = sprintf(buf, "%llu", num);
   if (style_COMMAS) {
+    int clen = strlen(cc);
     for (ii = 0; ii


Re: [Toybox] [PATCH] logger.c was failing to properly log local0-local6 facilities, due to a string parsing error. This patch enables proper local facility handling.

2020-09-09 Thread Rob Landley


On 9/9/20 12:30 AM, Rob Landley wrote:
> On 9/8/20 2:16 PM, Chris Sarra via Toybox wrote:
>> ---
>>  toys/posix/logger.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/toys/posix/logger.c b/toys/posix/logger.c
>> index 3bcfb174..d1cc7291 100644
>> --- a/toys/posix/logger.c
>> +++ b/toys/posix/logger.c
>> @@ -64,9 +64,10 @@ void logger_main(void)
>>      else {
>>        *s1++ = len = 0;
>>        facility = arrayfind(TT.p, facilities, ARRAY_LEN(facilities));
>> -      if (facility == -1 && strncasecmp(TT.p, "local", 5)) {
>> -        facility = s1[5]-'0';
>> -        if (facility>7 || s1[6]) facility = -1;
>> +      if (facility == -1 && strncasecmp(TT.p, "local", 5) == 0) {
>> +        s2 = TT.p;
>> +        facility = s2[5]-'0';
>> +        if (facility>7 || s2[6]) facility = -1;
> 
> Sigh, why did I promote this out of pending? arrayfind() initializes matchlen to
> 0 and then never sets it to anything ELSE, so it ONLY returns exact matches not
> longest unambiguous match (which is the point of the function I think?)
> 
> Applied your patch, but I have some cleanup to do to this command...

I did the cleanup but I have no tests/logger.test, so I dunno if I broke it. (It
survived obvious smoketesting, but...?)

Could you try the attached and see if it works for you?

Thanks,

Rob
/* logger.c - Log messages.
 *
 * Copyright 2013 Ilya Kuzmich 
 *
 * See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/logger.html
 *
 * Deviations from posix: specified manner and format, defined implementation.

USE_LOGGER(NEWTOY(logger, "t:p:s", TOYFLAG_USR|TOYFLAG_BIN))

config LOGGER
  bool "logger"
  default y
  help
    usage: logger [-s] [-t TAG] [-p [FACILITY.]PRIORITY] [MESSAGE...]

    Log message (or stdin) to syslog.

    -s	Also write message to stderr
    -t	Use TAG instead of username to identify message source
    -p	Specify PRIORITY with optional FACILITY. Default is "user.notice"
*/

#define FOR_logger
#include "toys.h"

GLOBALS(
  char *p, *t;
)

// find str in names[], accepting unambiguous short matches
// returns offset into array of match, or -1 if no match
int arrayfind(char *str, char *names[], int len)
{
  int j, i, ll = 0, maybe = -1;

  for (j = 0; j<len; j++) for (i=0; ; i++) {
    if (!str[i]) {
      if (!names[j][i]) return j;
      if (i>ll) ll = i, maybe = j;
      else if (i==ll) maybe = -1;
      break;
    }
    if (!names[j][i] || toupper(str[i])!=toupper(names[j][i])) break;
  }

  return maybe;
}

void logger_main(void)
{
  int facility = LOG_USER, priority = LOG_NOTICE, len = 0;
  char *s1, *s2, **arg,
    *priorities[] = {"emerg", "alert", "crit", "error", "warning", "notice",
      "info", "debug"},
    *facilities[] = {"kern", "user", "mail", "daemon", "auth", "syslog",
      "lpr", "news", "uucp", "cron", "authpriv", "ftp"};

  if (!TT.t) TT.t = xgetpwuid(geteuid())->pw_name;
  if (TT.p) {
    if (!(s1 = strchr(TT.p, '.'))) s1 = TT.p;
    else {
      *s1++ = 0;
      facility = arrayfind(TT.p, facilities, ARRAY_LEN(facilities));
      if (facility<0) {
        if (sscanf(TT.p, "local%d", &facility)>0 && !(facility&~7))
          facility += 16;
        else error_exit("bad facility: %s", TT.p);
      }
      facility *= 8;
    }

    priority = arrayfind(s1, priorities, ARRAY_LEN(priorities));
    if (priority<0) error_exit("bad priority: %s", s1);
  }

  if (toys.optc) {
    for (arg = toys.optargs; *arg; arg++) len += strlen(*arg)+1;
    s1 = s2 = xmalloc(len);
    for (arg = toys.optargs; *arg; arg++) {
      if (arg != toys.optargs) *s2++ = ' ';
      s2 = stpcpy(s2, *arg);
    }
  } else toybuf[readall(0, s1 = toybuf, sizeof(toybuf)-1)] = 0;

  openlog(TT.t, LOG_PERROR*FLAG(s), facility);
  syslog(priority, "%s", s1);
  closelog();
}