On 20/03/2014 15:37, Geoff Canyon wrote:
I have a field that has been populated by setting the unicodetext. Some
lines actually need unicode -- umlauts, enye, etc. -- and others are plain
ascii.
What's the most efficient way to count how many lines are plain and how
many actually need unicode?
Could you (once all the uni-7 stuff has settled down and we have proper
conversions etc.) convert the text from Unicode to UTF-8, and also to an 8- or
7-bit representation, and compare the number of bytes in the two representations?
If the UTF-8 and ISO-8859-1 versions have the same length, then every character
was encoded as a single byte in UTF-8.
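A minimal sketch of the per-line length comparison, written in Python rather than LiveCode purely for illustration (the function name count_plain_lines is hypothetical). It assumes characters outside ISO-8859-1 are replaced by a single "?" byte, which is what Python's errors="replace" does, so the ISO-8859-1 length is always one byte per character:

```python
def count_plain_lines(text: str) -> tuple[int, int]:
    """Return (plain, needs_unicode) by comparing byte lengths per line."""
    plain = needs_unicode = 0
    for line in text.splitlines():
        utf8_len = len(line.encode("utf-8"))
        # errors="replace" substitutes b"?" for anything outside ISO-8859-1,
        # so this length always equals the number of characters in the line.
        latin1_len = len(line.encode("latin-1", errors="replace"))
        if utf8_len == latin1_len:
            plain += 1
        else:
            needs_unicode += 1
    return plain, needs_unicode


sample = "plain ascii\nÜber\nmañana\nhello"
print(count_plain_lines(sample))  # (2, 2)
```

Only lengths are compared, so no text comparison is needed; the cost is one pass of two encodings per line.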
In fact that means all the characters are plain 7-bit ASCII: the only characters
UTF-8 stores in a single byte are U+0000 to U+007F, which is exactly ASCII. The
upper half of ISO-8859-1 (ü, ñ and the like) takes two bytes each in UTF-8.
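Which characters UTF-8 stores in one byte is easy to check directly. This short Python sketch encodes every ISO-8859-1 code point (U+0000 to U+00FF) and counts the bytes; it shows that the one-byte set is exactly ASCII (0 to 127) and that the rest of ISO-8859-1 takes two bytes in UTF-8:

```python
# Encode every ISO-8859-1 code point as UTF-8 and group by byte count.
one_byte = [cp for cp in range(256) if len(chr(cp).encode("utf-8")) == 1]
two_byte = [cp for cp in range(256) if len(chr(cp).encode("utf-8")) == 2]

print(min(one_byte), max(one_byte), len(one_byte))  # 0 127 128
print(min(two_byte), max(two_byte), len(two_byte))  # 128 255 128

# An umlaut is one byte in ISO-8859-1 but two bytes in UTF-8.
print("ü".encode("utf-8"), "ü".encode("latin-1"))  # b'\xc3\xbc' b'\xfc'
```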
Depending on your definition of 'plain', that may suffice. If your API actually
needs plain ASCII, then you can convert one more time, to ASCII, and compare
the actual text of the ISO-8859-1 and ASCII versions: if they differ, that
should be because some characters that aren't in ASCII have been replaced with
"?", so it ain't ASCII. (Unless the textEncode/textDecode system is cute and
e.g. tries to replace 'smart' quotes with plain ones...)
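A Python sketch of the ISO-8859-1 versus ASCII comparison (the function name differs_from_ascii is hypothetical, and Python codecs stand in for LiveCode's conversions). Python's errors="replace" simply substitutes "?" and does not transliterate smart quotes, so that caveat does not arise here. One limitation worth knowing: a character outside ISO-8859-1, such as €, becomes "?" in both versions, so this comparison on its own does not flag it; the byte-length test catches that case.

```python
def differs_from_ascii(line: str) -> bool:
    """True if the ISO-8859-1 and ASCII encodings of `line` differ.

    Both encodings use errors="replace", so unencodable characters become b"?".
    """
    latin1 = line.encode("latin-1", errors="replace")
    ascii_ = line.encode("ascii", errors="replace")
    return latin1 != ascii_


print(differs_from_ascii("hello"))     # False
print(differs_from_ascii("mañana"))    # True: ñ is in ISO-8859-1 but not ASCII
print(differs_from_ascii("cost: €5"))  # False: € is in neither, so both give "?"
```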
Ben
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode