On 9/22/22 10:48, enh wrote:
> On Thu, Sep 22, 2022 at 2:43 AM Rob Landley <[email protected]> wrote:
>
>     If you have the same set of combining characters in a different
>     order, is the result still considered the same character for string
>     matching purposes?
>
> "depends". there are multiple normalization forms.

Oh joy.

> "For most full-featured regular expression engines, it is quite difficult to
> match under canonical equivalence, which may involve reordering, splitting, or
> merging of characters."

I've gone back to just punting unicode to regcomp() and friends: you stick a
character above 127 in your pattern and it's not taking the fast path I'm
implementing. (Not that I expect the regex engine to do better, but then it's
not _my_ fault quite so much.)

But I'm trying to understand regex escapes, and...

  $ echo 'a[c' | grep 'a\[c'
  a[c
  $ echo 'a\bc' | grep 'a\bc'
  $ echo abc | grep 'a\bc'
  $ echo ac | grep 'a\bc'
  $ echo 'a^c' | grep 'a\^c'
  a\c
  $ echo 'a^c' | grep 'a^c'
  a^c
  $ echo 'a\b' | grep 'a\b'
  a\b
  $ echo 'a\b' | grep 'a\b.'
  a\b
  $ echo 'a\b' | grep 'a\b..'
  a\b
  $ echo 'a\b' | grep 'a\b...'
  $

I do not understand regex escapes. (This is all with the debian grep.)

Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
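
One possible reading of the \b results in the transcript, offered as a sketch
rather than a definitive answer: GNU grep (which is what Debian ships) treats
\b as a word-boundary assertion that matches the empty string at the edge of a
word. Under that reading it consumes no input, so it never matches a literal
backslash or a literal "b". The helper name try_re below is hypothetical, and
the expected results in the comments assume GNU grep; other grep
implementations may treat \b differently.

```shell
# Hypothetical helper: report whether a BRE pattern ($2) matches an input line ($1).
# printf is used instead of echo so backslashes in the input are never interpreted.
try_re() {
    if printf '%s\n' "$1" | grep -q -- "$2"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Assumption: GNU grep, where \b is a zero-width word-boundary assertion.

# "a", then a boundary (between "a" and "\"), then two arbitrary characters.
try_re 'a\b' 'a\b..'     # match

# Same, but three characters are required after "a" and only two exist.
try_re 'a\b' 'a\b...'    # no match

# "a", boundary, then "c" is required, but the next input character is "\".
try_re 'a\bc' 'a\bc'     # no match

# A literal backslash in a BRE is written as \\ .
try_re 'a\bc' 'a\\bc'    # match
```

If this reading is right, the pattern 'a\b' in the transcript matched because
of the boundary after "a", not because of the backslash in the input.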
