Richard,

Below is a function that was translated from a PHP script. It is intended to determine whether the passed-in string "could be" UTF-8. I have tested it in a limited way and it seems to work, but maybe someone else can see the flaws.
If it returns false, then the string is not UTF-8. If it returns true, it fits the pattern of UTF-8, but it could be something else, like some random binary. If it doesn't work, you could perhaps use it to scare children.

function couldBeUtf8 pString
   -- Build one regex that accepts only well-formed UTF-8 byte sequences.
   -- Each alternative below covers one class of encoded character.
   put "(?is)^([\x09\x0A\x0D\x20-\x7E]" into tRE         -- ASCII: tab, LF, CR, printable
   put "|[\xC2-\xDF][\x80-\xBF]" after tRE               -- 2-byte sequences
   put "|\xE0[\xA0-\xBF][\x80-\xBF]" after tRE           -- 3-byte, excluding overlongs
   put "|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}" after tRE    -- 3-byte, straight
   put "|\xED[\x80-\x9F][\x80-\xBF]" after tRE           -- 3-byte, excluding surrogates
   put "|\xF0[\x90-\xBF][\x80-\xBF]{2}" after tRE        -- 4-byte, planes 1-3
   put "|[\xF1-\xF3][\x80-\xBF]{3}" after tRE            -- 4-byte, planes 4-15
   put "|\xF4[\x80-\x8F][\x80-\xBF]{2})*$" after tRE     -- 4-byte, plane 16
   return matchText(pString, tRE)
end couldBeUtf8

Cheers
Dave

On 6 Oct 2010, at 21:23, Richard Gaskin wrote:

> I have an app that needs to auto-detect Unicode and plain text, and render
> them correctly based on that auto-detection.
>
> I have the UTF16 stuff working, but with UTF8 I have a problem: there is no
> BOM to let me know if it's Unicode, and some plain text files will
> occasionally have high-ASCII values in them (like the dagger symbol).
>
> What patterns should I be looking for in the binary data of a file to
> distinguish UTF8 from plain text?
>
> --
> Richard Gaskin
> Fourth World
> LiveCode training and consulting: http://www.fourthworld.com
> Webzine for LiveCode developers: http://www.LiveCodeJournal.com
> LiveCode Journal blog: http://LiveCodejournal.com/blog.irv
> _______________________________________________
> use-revolution mailing list
> use-revolution@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-revolution
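
P.S. For anyone who wants to cross-check the pattern outside LiveCode, here is a rough Python port of the same alternation. It is only a sketch: the names could_be_utf8 and decodes_as_utf8 are made up for this example, and it uses fullmatch in place of the ^...$ anchors and leaves out the (?is) flags, which I am assuming make no difference at the byte level. The second helper simply asks Python's strict UTF-8 decoder for a second opinion.

```python
import re

# Byte-level pattern mirroring the alternation in couldBeUtf8.
# The ^ and $ anchors are replaced by fullmatch() below.
_UTF8_RE = re.compile(
    rb"(?:[\x09\x0A\x0D\x20-\x7E]"          # ASCII: tab, LF, CR, printable
    rb"|[\xC2-\xDF][\x80-\xBF]"             # 2-byte sequences
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"         # 3-byte, excluding overlongs
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"  # 3-byte, straight
    rb"|\xED[\x80-\x9F][\x80-\xBF]"         # 3-byte, excluding surrogates
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"      # 4-byte, planes 1-3
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"          # 4-byte, planes 4-15
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2})*"    # 4-byte, plane 16
)


def could_be_utf8(data: bytes) -> bool:
    """Return True if every byte of data fits the pattern above."""
    return _UTF8_RE.fullmatch(data) is not None


def decodes_as_utf8(data: bytes) -> bool:
    """Second opinion: does Python's strict UTF-8 decoder accept data?"""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    samples = {
        "UTF-8 dagger (E2 80 A0)": "dagger \u2020".encode("utf-8"),
        "Windows-1252 dagger (86)": b"dagger \x86",
        "MacRoman dagger (A0)": b"dagger \xa0",
        "bytes C3 A9": b"\xc3\xa9",
        "form feed (0C)": b"\x0c",
    }
    for label, raw in samples.items():
        print(label, could_be_utf8(raw), decodes_as_utf8(raw))
```

Two things this shows. A lone high byte such as the single-byte dagger fails the pattern, which is the case Richard asked about. On the other hand, the bytes C3 A9 pass even though they could equally be two Latin-1 characters, so true really does only mean "could be". The pattern also rejects control characters other than tab, LF and CR (form feed, NUL and so on) that a strict decoder accepts, which may or may not be what you want for text files.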