On 10/20/23 23:22, Oliver Webb via Toybox wrote:
> Heya, I noticed that tr was in pending, taking a look at the source code.

Yeah, it's one of the big remaining todo items to get Linux From Scratch
building, I was looking at it briefly last week...

> It doesn't look very unclean, nor does it fail any test cases.

I have a redesign to make it handle utf-8 encoded unicode, both in the
input and in the patterns. Took me forever to work out how, but I _think_
I understand it now? Just haven't done it yet.

Well, I think I've figured out how to handle unicode (with combining
characters) and the [:class:] specifiers. Still don't understand what
[=CHAR=] equivalency classes mean, exactly, other than "strip combining
characters"? Except there's a lot of À Á Â Ã in the base set that... the
man page says that equivalence classes are defined by LC_COLLATE but
everybody seems to punt on the specifics. (Or maybe this is just a symptom
of Google having a harder time finding stuff these days? Section 3.1.3.6 of
http://unicode.org/L2/L2001/01487-14652w25.pdf is not very illuminating.)

Anyway, hadn't dug into that part yet. Vaguely planning to punt and wait
for a complaint, because the OTHER thing that comes up a lot when you
search for this is "it doesn't work". Although I am highly amused by the
database error at:

https://www.unix.com/shell-programming-and-scripting/283373-equivalence-classes-dont-work.html

Which is saying that the page talking about how equivalence classes don't
work itself does not work.

This guy went into detail, but I have not opened that particular can of
worms yet:

http://databasearchitects.blogspot.com/2016/08/equivalence-of-unicode-strings-is.html

> The only 2 things
> in the TODO are -t and -a. Neither POSIX or GNU tr specify a -a[scii] option.
> The name gives a general idea of what it's supposed to do
> (Stop acting utf-8 safe and treat everything as extended ASCII?)

It's a note-to-self that there should probably be a way to disable the
unicode support I haven't added yet, and that -a isn't currently used
anywhere I could find.

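To make the distinction between utf-8 handling and the -a note concrete, here is a minimal sketch in C of how the same input splits into different units depending on whether it is decoded as utf-8 or treated as raw bytes. This is not toybox code: decode_utf8() and count_units() are hypothetical names, and the decoder is deliberately simplified (it does not reject overlong encodings or surrogates, and it says nothing about combining characters or equivalence classes).

```c
#include <stddef.h>

// Hypothetical helper, not from toybox: decode one utf-8 sequence at s.
// Stores the code point in *cp and returns the number of bytes consumed.
// A malformed sequence falls back to consuming the first byte as-is.
static size_t decode_utf8(const unsigned char *s, unsigned *cp)
{
  size_t len, i;
  unsigned c = *s;

  if (c < 0x80) {
    *cp = c;
    return 1;
  } else if ((c & 0xe0) == 0xc0) {
    len = 2;
    c &= 0x1f;
  } else if ((c & 0xf0) == 0xe0) {
    len = 3;
    c &= 0x0f;
  } else if ((c & 0xf8) == 0xf0) {
    len = 4;
    c &= 0x07;
  } else {
    *cp = *s;
    return 1;
  }

  // Each continuation byte must look like 10xxxxxx. A NUL terminator fails
  // this check, so the loop never reads past the end of the string.
  for (i = 1; i < len; i++) {
    if ((s[i] & 0xc0) != 0x80) {
      *cp = *s;
      return 1;
    }
    c = (c << 6) | (s[i] & 0x3f);
  }
  *cp = c;

  return len;
}

// Count the units a translation would see in a NUL-terminated string:
// bytes when ascii is nonzero (the -a idea), code points otherwise.
static size_t count_units(const char *str, int ascii)
{
  const unsigned char *s = (const unsigned char *)str;
  size_t count = 0;
  unsigned cp;

  while (*s) {
    s += ascii ? 1 : decode_utf8(s, &cp);
    count++;
  }

  return count;
}
```

With the input "a\xc3\x80" (an "a" followed by À encoded as two bytes), count_units() sees two units when decoding utf-8 and three when treating the input as bytes, which is why a byte-oriented tr can split a multibyte character in half.
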
> I added in a -t[runcate] option and a corresponding test case.
>
> I also cleaned up some of code (foobar[0] to *foobar, removing sizeof(char),
> etc)

Applied, and I did a little more cleanup while I was there.

Rob
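
For readers unfamiliar with -t, the sketch below shows the behaviour as GNU tr documents it: without -t a short SET2 is extended by repeating its last character, and with -t SET1 is first truncated to the length of SET2. This is an illustration in C, not the patch that was applied: build_map() and translate() are hypothetical names, the sets are literal bytes only (no ranges, classes, or escapes), and treating an empty SET2 as "translate nothing" is an assumption of the sketch.

```c
#include <string.h>

// Hypothetical helper, not the toybox implementation: fill map[] so that
// map[c] is the translation of byte c.
static void build_map(unsigned char map[256], const char *set1,
  const char *set2, int truncate)
{
  size_t len1 = strlen(set1), len2 = strlen(set2), i;

  // Start from the identity mapping
  for (i = 0; i < 256; i++) map[i] = i;

  // Assumption of this sketch: an empty SET2 leaves everything unchanged
  if (!len2) return;

  // -t: ignore the tail of SET1 that has no counterpart in SET2
  if (truncate && len1 > len2) len1 = len2;

  // Without -t, a short SET2 is padded with its last character
  for (i = 0; i < len1; i++)
    map[(unsigned char)set1[i]] = set2[i < len2 ? i : len2 - 1];
}

// Translate a NUL-terminated string in place, one byte at a time
static void translate(char *str, const unsigned char map[256])
{
  for (; *str; str++) *str = map[(unsigned char)*str];
}
```

Translating "abcd" with SET1 "abcd" and SET2 "xy" gives "xyyy" without truncation and "xycd" with it, since "c" and "d" fall outside the truncated SET1 and pass through unchanged.
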
