The crunch_str() logic is designed to escape nonprintable stuff, but for watch.c I need to write something that measures output while letting utf8 combining stuff happen. (And measures tabs. And also parses at least the color change part of ansi escapes, but we'll burn that bridge when we come to it...)
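A rough sketch of what that kind of measuring could look like, to pin down the requirements rather than to propose the actual watch.c code. Everything here is an assumption: measure_columns() and decode_utf8() are hypothetical names, tab stops are assumed to be every 8 columns, only the U+0300 to U+036F combining diacritical block is treated as zero width, and only CSI-style escapes (ESC [ ... final byte) are skipped. Double-width characters are not handled at all.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: decode one UTF-8 sequence at s into *cp and return
// the number of bytes consumed. Malformed input yields U+FFFD and eats 1 byte.
static int decode_utf8(const unsigned char *s, unsigned *cp)
{
  int len, i;

  if (*s < 0x80) { *cp = *s; return 1; }
  if ((*s & 0xe0) == 0xc0) { len = 2; *cp = *s & 0x1f; }
  else if ((*s & 0xf0) == 0xe0) { len = 3; *cp = *s & 0x0f; }
  else if ((*s & 0xf8) == 0xf0) { len = 4; *cp = *s & 0x07; }
  else { *cp = 0xfffd; return 1; }

  for (i = 1; i < len; i++) {
    // A NUL terminator fails this check, so we never read past the string.
    if ((s[i] & 0xc0) != 0x80) { *cp = 0xfffd; return 1; }
    *cp = (*cp << 6) | (s[i] & 0x3f);
  }

  return len;
}

// Hypothetical helper: count the terminal columns str would occupy.
int measure_columns(const char *str)
{
  const unsigned char *s = (const unsigned char *)str;
  int cols = 0;
  unsigned cp;

  assert(str != NULL);
  while (*s) {
    if (*s == 27 && s[1] == '[') {
      // CSI escape: skip parameter/intermediate bytes, then the final byte.
      s += 2;
      while (*s >= 0x20 && *s <= 0x3f) s++;
      if (*s >= 0x40 && *s <= 0x7e) s++;
      continue;
    }
    s += decode_utf8(s, &cp);
    if (cp == '\t') cols = (cols + 8) & ~7;       // advance to next tab stop
    else if (cp >= 0x300 && cp <= 0x36f) continue; // combining mark: 0 columns
    else if (cp >= 0x20 && cp != 0x7f) cols++;     // other controls: 0 columns
  }

  return cols;
}
```

With these assumptions, the string "l\xcc\xb4\xcc\x97\xcc\xa0" followed by "e" measures as 2 columns, since the three combining marks decode to U+0334, U+0317 and U+0320 and add no width. (In C source the trailing "e" has to be a separate string literal, otherwise it gets swallowed by the \xa0 hex escape.)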
Using hexdump and echo -e's hex escapes to try to print minimal bits of the combining character examples (which cut and paste appears to have horked somewhat, but you get the idea):

$ cat tests/files/utf8/test1.txt
l̴̗̞̠ȩ̸̩̥ṱ̴͍̻ ̴̲͜ͅt̷͇̗̮h̵̥͉̝e̴̡̺̼ ̸̤̜͜ŗ̴͓͉i̶͉͓͎t̷̞̝̻u̶̻̫̗a̴̺͎̯l̴͍͜ͅ ̵̩̲̱c̷̩̟̖o̴̠͍̻m̸͚̬̘ṃ̷̢͜e̵̗͎̫n̸̨̦̖c̷̰̩͎e̴̱̞̗
$ echo -e '\xcc\xb4\xcc\x97\xcc\xa0e'
e
$ echo -e 'l\xcc\xb4\xcc\x97\xcc\xa0e'
l̴̗̠e
$ echo -e '\xcc\xb4\xcc\x97\xcc\xa0ee'
ee
$ echo -e 'l\xcc\xb4\xcc\x97\xcc\xa0'
l̴̗̠
$ echo -e '\xcc\xb4\xcc\x97\xcc\xa0'

So there needs to be a character _before_ the combining characters for them to take effect, but they apply to the character _after_? Even when it's a newline? (Which still works as a newline, but leaves trailing weirdness?)

I googled a bit and found out about "zero width joiners" and "zero width non-joiners" and am now even more confused. (I know about the sequence that reverses direction, and should test that my reset.c is resetting that, but I'm willing to call that one pilot error for the moment...)

Rob
_______________________________________________
Toybox mailing list
http://lists.landley.net/listinfo.cgi/toybox-landley.net
