Izabera on freenode sent in a bug report, which boils down to sort.c predating the FLAG_x macros: when I converted it, I apparently screwed some stuff up. (I do some strange things with the macros, but if (flags*FLAG_f) blah(); CAN'T work...)
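For anyone wondering why the multiply can't work, here is a minimal sketch. The flag values and the has_f_*() helper names are made up for illustration (toybox generates its real FLAG_x macros at build time), but the arithmetic is the same: multiplying by a nonzero mask is nonzero whenever _any_ flag is set, while a bitwise AND only looks at the one bit.

```c
/* Hypothetical flag values, for illustration only. */
#define FLAG_f (1u << 0)
#define FLAG_r (1u << 1)

/* The broken test: nonzero whenever flags is nonzero, whether or not
 * the -f bit is actually set. */
static int has_f_buggy(unsigned flags)
{
  return (flags * FLAG_f) != 0;
}

/* The intended test: nonzero only when the -f bit is set. */
static int has_f_fixed(unsigned flags)
{
  return (flags & FLAG_f) != 0;
}
```

So with only -r given, has_f_buggy(FLAG_r) claims -f is on and has_f_fixed(FLAG_r) correctly says it isn't.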
The other issue this raised is making -f work with utf8, which Izabera suggested _could_ work if, instead of a for() loop calling toupper(), sort called strcmp or strcasecmp based on FLAG_f. Good idea, so let's do that.

But updating tests/sort.test to show this bug got fixed is tricky. (Adding a test is always a good idea: if you submit a bug, submit a test that fails before the fix and passes after it.) The problem is that the GNU/dammit sort changes its behavior based on LC_BLAH internationalization, and we're matching the LC_ALL=C behavior, which is what source package builds need. (Sort in ASCII order, so all uppercase letters come before all lowercase letters. This is why "toybox ls" on the toybox source puts Config.in and LICENSE and README at the top, while Ubuntu's ls mixes them in with the rest.)

I want UTF8 awareness when comparing case insensitively, but ASCII sort order otherwise. (Yeah, judgement call, but toybox has always drawn the line at "UTF8 support yes, full internationalization of currencies and dates and help text no".) In theory, this means I set LC_ALL=C in scripts/test.sh the same way I do in scripts/make.sh, but I don't want to accidentally disable UTF8 support in the host version I'm testing against.

Internationalization is a can of worms generally requiring external files to look things up in a per-country database, which makes it out of scope for toybox. But the majority of the planet doesn't speak english, and now that there's one format for international text, supporting it is obviously the right thing to do. (Yes, several countries still use historical encodings that got there first, but they're out of scope.)

Anyway, that's why I set LC_COLLATE=C in scripts/test.sh instead of LC_ALL=C. I _think_ I want LC_CTYPE=UTF8 and the rest (LC_COLLATE, LC_MONETARY, LC_NUMERIC, LC_TIME, LC_MESSAGES, at least according to http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html) set to C.
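To make the strcmp/strcasecmp idea and the ASCII ordering concrete, here is a rough sketch. This is not the actual sort.c code: FLAG_f's value, the sort_flags global, and the compare_lines()/sort_lines() names are all invented for the example. Also note that strcasecmp() folds case according to the C library and current locale, so whether it does anything useful for non-ASCII UTF8 text is library dependent; the sketch only shows the switch itself.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

#define FLAG_f 1u            /* hypothetical value for the -f bit */

static unsigned sort_flags;  /* hypothetical global read by the comparator */

/* qsort() callback: each argument points at a char * line. */
static int compare_lines(const void *a, const void *b)
{
  const char *x = *(const char *const *)a;
  const char *y = *(const char *const *)b;

  /* -f picks the case-insensitive comparison; otherwise plain byte
   * order, which puts all ASCII uppercase before lowercase. */
  return (sort_flags & FLAG_f) ? strcasecmp(x, y) : strcmp(x, y);
}

/* Sort an array of lines in place using the flags given. */
void sort_lines(char **lines, size_t count, unsigned flags)
{
  sort_flags = flags;
  qsort(lines, count, sizeof(*lines), compare_lines);
}
```

Given "apple", "Banana", "cherry", sorting with no flags puts Banana first (the byte order the tests expect with LC_COLLATE=C), and sorting with FLAG_f puts apple first.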
(I sort of want LC_ALL_EXCEPT_THIS_BIT, but the posix page doesn't actually define LC_ALL as an environment variable, so I dunno whether more specific ones override it if set?)

Oddly, my ubuntu system doesn't define LC_ anything, just:

  LANG=en_US.UTF-8
  LANGUAGE=en_US:en

Which is another tangent entirely. (See "full internationalization: can of worms.")

Anyway, lemme know if I broke sort,

Rob
