On 09/26/2018 10:28 AM, Rob Landley wrote:
> The crunch_str() logic is designed to escape nonprintable stuff and for watch.c
> I need to write something that measures output but lets utf8 combining stuff
> happen. (And measures tabs. And also parses at least the color change part of
> ansi escapes, but we'll burn that bridge when we come to it...)
>
> Using hexdump and echo -e's hex escapes to try to print minimal bits of the
> combining character examples (which cut and paste appears to have horked
> somewhat, but you get the idea):
>
> $ cat tests/files/utf8/test1.txt
> l̴̗̞̠ȩ̸̩̥ṱ̴͍̻ ̴̲͜ͅt̷͇̗̮h̵̥͉̝e̴̡̺̼ ̸̤̜͜ŗ̴͓͉i̶͉͓͎t̷̞̝̻u̶̻̫̗a̴̺͎̯l̴͍͜ͅ
> ̵̩̲̱c̷̩̟̖o̴̠͍̻m̸͚̬̘ṃ̷̢͜e̵̗͎̫n̸̨̦̖c̷̰̩͎e̴̱̞̗
> $ echo -e '\xcc\xb4\xcc\x97\xcc\xa0e'
> e
> $ echo -e 'l\xcc\xb4\xcc\x97\xcc\xa0e'
> l̴̗̠e
> $ echo -e '\xcc\xb4\xcc\x97\xcc\xa0ee'
> ee
> $ echo -e 'l\xcc\xb4\xcc\x97\xcc\xa0'
> l̴̗̠
> $ echo -e '\xcc\xb4\xcc\x97\xcc\xa0'
>
> So there needs to be a character _before_ the combining characters for them to
> take effect, but they apply to the character _after_? Even when it's a newline?
> (Which still works as a newline, but leaves trailing weirdness?)

But if I have just enough characters to fill a line, the trailing weirdness does
_not_ go to the next line (it appears to get discarded), at least on my 80 char
xfce Terminal:

  echo -e 'xxxxxxxxxxxxxxxxxx0123456789091234567890123456789012345678901234567890123456789a\xcc\xb4\xcc\x97\xcc\xa0'

I should look up what these escape sequences _do_. Hmmm... I could slowly and
painfully do that by hand, but really I want a sort of unicode version of
"hexdump -C" telling me what the codepoints are. (Ideally combined with a variant
of the "ascii" program to then tell me what each one does.) Somebody has to have
written this already, but I dunno what to Google for.

Hmm... Hey Rich, I'm fiddling with unicode and lost/confused. Know any good tools
for this?

Rob
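
For illustration, the wished-for "unicode hexdump -C" might look roughly like the sketch below. It is in Python rather than toybox's C, purely because Python's standard unicodedata module already carries the character names and combining classes. The function names (dump_codepoints, format_dump) and the column layout are made up for this example, not an existing tool, and the input is assumed to be valid UTF-8 (anything else raises UnicodeDecodeError).

```python
import unicodedata


def dump_codepoints(data):
    """Return one row per codepoint in UTF-8 encoded bytes.

    Each row is (byte offset, hex bytes, codepoint, combining class, name).
    A nonzero combining class means the character attaches to a base
    character instead of taking up a column of its own.
    """
    rows = []
    offset = 0
    for ch in data.decode("utf-8"):  # strict: invalid UTF-8 raises an error
        raw = ch.encode("utf-8")
        rows.append((
            offset,
            " ".join("%02x" % b for b in raw),
            "U+%04X" % ord(ch),
            unicodedata.combining(ch),
            unicodedata.name(ch, "<no name>"),  # control chars have no name
        ))
        offset += len(raw)
    return rows


def format_dump(data):
    """Render the rows in a hexdump-like layout, one codepoint per line."""
    return "\n".join(
        "%08x  %-11s  %-8s  ccc=%-3d  %s" % row for row in dump_codepoints(data)
    )


if __name__ == "__main__":
    # Same bytes as the second echo -e example above, plus echo's newline.
    print(format_dump(b"l\xcc\xb4\xcc\x97\xcc\xa0e\n"))
```

Fed the bytes from the echo examples, this reports \xcc\xb4, \xcc\x97 and \xcc\xa0 as U+0334, U+0317 and U+0320 (COMBINING TILDE OVERLAY, COMBINING ACUTE ACCENT BELOW and COMBINING MINUS SIGN BELOW). It only says what the codepoints are, not how a given terminal chooses to render them.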
