Re: best/fastest way to tell if a field contains unicode text?

Ben Rubinstein Thu, 20 Mar 2014 11:47:21 -0700

On 20/03/2014 15:37, Geoff Canyon wrote:

I have a field that has been populated by setting the unicodetext. Some
lines actually need unicode -- umlauts, enye, etc. -- and others are plain
ascii.


What's the most efficient way to count how many lines are plain and how
many actually need unicode?

Could you (when all the uni-7 stuff has settled down and we have proper conversions etc) convert text from unicode to UTF8, and also to an 8- or 7-bit representation, and compare the number of bytes in these two representations?

If the lengths are the same in both the UTF8 and ISO-8859-1 versions, then all the characters could be represented in a single byte in UTF8.

That probably means in fact that all the characters are in ISO-8859-1 (I think that the one-byte characters in UTF8 approximately correspond to ISO-8859-1, but I'm prepared to be corrected).
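The length comparison can be sketched per line as below. This is an illustration in Python rather than LiveCode, and the function name `count_plain_and_unicode` is made up for the example. One point worth knowing when reading the result: the one-byte sequences in UTF8 cover exactly U+0000 to U+007F, which is 7-bit ASCII rather than the whole of ISO-8859-1, so a line with an umlaut or enye already comes out longer in UTF8 and is counted as needing unicode.

```python
def count_plain_and_unicode(text):
    """Return (plain, needs_unicode) line counts for text."""
    plain = 0
    needs_unicode = 0
    for line in text.splitlines():
        utf8_len = len(line.encode("utf-8"))
        # errors="replace" substitutes "?" for anything outside ISO-8859-1,
        # so this encoding is always exactly one byte per character.
        latin1_len = len(line.encode("iso-8859-1", errors="replace"))
        if utf8_len == latin1_len:
            # Every character fitted in one UTF8 byte.
            plain += 1
        else:
            needs_unicode += 1
    return plain, needs_unicode


sample = "plain ascii\nM\u00fcnchen\nEspa\u00f1a\nanother plain line"
print(count_plain_and_unicode(sample))
```

Whether LiveCode's own conversion functions replace unrepresentable characters the same way is an assumption that would need checking once they are available.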

Depending on your definition of 'plain', that may suffice. If your API actually needs plain ASCII, then you can convert one more time, to ASCII, and compare the actual text of the ISO-8859-1 and ASCII versions - if they differ that should be because some characters that aren't in ASCII have been replaced with "?", so it ain't ASCII. (Unless the textDecode system is cute and eg tries to replace 'smart' quotes with plain ones...)
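A minimal sketch of the "convert to ASCII and see what got replaced" check, again in Python as an illustration only (the helper name `is_plain_ascii` is hypothetical). It deliberately compares the ASCII version against the original text rather than against the ISO-8859-1 version, since a character outside ISO-8859-1 would turn into "?" in both conversions and slip through the comparison.

```python
def is_plain_ascii(line):
    """True if converting line to ASCII changes nothing."""
    # errors="replace" turns every non-ASCII character into "?".
    ascii_text = line.encode("ascii", errors="replace").decode("ascii")
    # A genuine "?" in the input survives unchanged, so it does not
    # cause a false alarm; only substituted characters make the texts differ.
    return ascii_text == line


for candidate in ["hello?", "Espa\u00f1a", "\u20ac5"]:
    print(repr(candidate), is_plain_ascii(candidate))
```

Python's codec never maps 'smart' quotes to plain ones, so that caveat does not arise here; a conversion routine that did so would need the same comparison against the original text to notice the change.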


Ben

