On Wed, Oct 20, 2021 at 01:56:21AM -0500, Rob Landley wrote:
> Those tables are just under 2k each, and could SERIOUSLY be compressed. For
> example, the first 512 bytes of the first table are all "16" except for a
> sparse set of sequential values running from 18 to 59, so it could be
> initialized from a bitmask. Let's see, quick and dirty bash to generate said
> bitmask from the table:
If you know a compression scheme that admits accessing them efficiently
in-place, that would be great. Expanding them into memory is a big net
loss. You'd be trading shareable rodata for per-process dirty memory plus
decompression code (text) that's comparable in size to, if not larger
than, the uncompressed data. ALWAYS prefer static const [] tables over
runtime generation, except possibly in a microcontroller context where the
usual ROM vs RAM cost analysis may not apply.

> Not sure if I _should_, but I _can_. (It was nice to leave this to libc.
> Then it wasn't my problem to update it every time Microsoft wrote another
> check to the unicode committee. Both glibc and musl can do this when
> statically linked. Sigh.)

I think it's better not to duplicate this information in more places, where
the copies can drift out of sync, if you don't have to. In theory glibc
users might even have locales where the ambiguous-width characters are
treated as wide -- I'm not sure if anyone does this anymore, but it was
legacy CJK practice in some locales and honoring that is the polite thing
to do.

> P.S. Rich's other table has some 17s mixed in the 16s which... I think it
> moves in runs of 8? Very small bitmap if so? It would be so much easier to
> work out the alignment if he'd wordwrapped his tables to a consistent
> number of entries per line, but no. Eh, runs of 4, 54 bits total. Plus two
> isolated weirdos.) And

The tables are generated, and the generator aims to format the output such
that diffs are small and readable when the output is checked in. It wraps
both at column overflow and at fixed power-of-two indices. If you'd like to
see it, the code (horribly ugly; this does not matter because it's not
something to be deployed anywhere) is at:

https://github.com/richfelker/musl-chartable-tools

> most of the nonzero values in the latter part are 255, so traverse a
> bitmap of THOSE and there's not much left to initialize afterwards.
> Yeah, a dozen lines per table of initialization is looking doable, within
> range of sticking it in portability.c...

I'm pretty sure you'll find this larger than the tables.

Rich
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net