On Mon, Dec 09, 2013 at 08:16:03AM -0600, msk...@ansuz.sooke.bc.ca wrote:
> On Mon, 9 Dec 2013, Philip Taylor wrote:
> > Keith -- could you possibly supply an example of
> > "a properly encoded utf-8 string" from which it
> > can be unambiguously determined whether the string
> > "sang" is an English word (the past tense of "sing")
>
> I'll probably regret pointing this out, and the characters involved have
> been deprecated since Unicode 5, but:
>
> U+E0001 U+E0065 U+E006E U+0073 U+0061 U+006E U+0067

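For readers who want to see what that sequence encodes, here is a minimal sketch in Python. It assumes the Unicode "tag character" convention, in which U+E0001 (LANGUAGE TAG) is followed by tag characters in U+E0020..U+E007E that mirror ASCII at an offset of 0xE0000. The function name split_language_tag is hypothetical, chosen only for this illustration.

```python
# Sketch only: split a leading Unicode language tag off a string.
# Assumption: U+E0001 opens the tag and each following tag character
# in U+E0020..U+E007E maps to the ASCII character 0xE0000 below it.
LANGUAGE_TAG = 0xE0001
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E


def split_language_tag(text):
    """Return (tag, remainder); tag is None if text has no leading U+E0001."""
    if not text or ord(text[0]) != LANGUAGE_TAG:
        return None, text
    i = 1
    tag_chars = []
    # Collect tag characters until the first ordinary character.
    while i < len(text) and TAG_FIRST <= ord(text[i]) <= TAG_LAST:
        tag_chars.append(chr(ord(text[i]) - 0xE0000))
        i += 1
    return "".join(tag_chars), text[i:]


# The code points quoted in the message above.
code_points = [0xE0001, 0xE0065, 0xE006E, 0x0073, 0x0061, 0x006E, 0x0067]
s = "".join(chr(cp) for cp in code_points)

tag, word = split_language_tag(s)
print(tag, word)  # en sang
```

The string round-trips through s.encode("utf-8") like any other text, so it is a properly encoded UTF-8 string whose language is carried by the in-band tag rather than inferred from the letters themselves.
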
And it is a kind of tagging, so beyond the scope of identifying the
language of *untagged* text (which is the claim that spurred all this
discussion).

Regards,
Khaled

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex