Re: Decode UTF-8 in variable ?

Dar Scott Thu, 20 Jun 2013 08:58:34 -0700

You might be able to work with it as it is.

In UTF-8 the ASCII subset looks just like ... ASCII.  Every character 
that is not ASCII is represented by a sequence of two to four bytes, each 
with the high bit set, that is, the byte value of each byte in the sequence is 
over 127.  All ASCII characters are represented by a single byte with the high 
bit zero, that is, the byte value is less than 128.
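
To make the byte ranges concrete, here is a small sketch. It is in Python rather than LiveCode, only because it is easy to run anywhere, and the helper name utf8_bytes is made up for this example.

```python
def utf8_bytes(ch):
    """Return the UTF-8 byte values of a single character as a list of ints."""
    return list(ch.encode("utf-8"))

print(utf8_bytes("A"))  # ASCII: a single byte below 128 -> [65]
print(utf8_bytes("é"))  # non-ASCII: every byte is above 127 -> [195, 169]
print(utf8_bytes("€"))  # -> [226, 130, 172]
```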

So, if all the characters of the UTF-8 string are in the ASCII subset, it is 
already "converted" to ASCII.  
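
A quick way to check whether you are in that easy case is to look for any byte with the high bit set. Again a Python sketch rather than LiveCode, and is_pure_ascii is a hypothetical helper name, not something from a library.

```python
def is_pure_ascii(data: bytes) -> bool:
    """True when every byte has the high bit clear (value below 128)."""
    return all(b < 128 for b in data)

# Sample values are invented for the illustration.
print(is_pure_ascii("id,name\n1,Scott".encode("utf-8")))  # True
print(is_pure_ascii("Thébault".encode("utf-8")))          # False
```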

You are not going to find any interesting LiveCode characters (I think) among 
the non-ASCII characters of UTF-8.  Tab, new-line (LF), space, comma, quote, 
digits, decimal point, and so on are all ASCII.  This means that your scripts 
to work with the db values might still work with UTF-8.  The important thing is 
to watch out for cases where you assume a character is one char (in the 
LiveCode script sense); a non-ASCII character takes up several chars.  However, 
if you are not writing back to the db and you consider the non-ASCII characters 
unimportant, then you can remove them.  
Conversion might remove or try to translate them, I'm not sure.  
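
Here is a sketch of both points: the length mismatch to watch out for, and dropping the non-ASCII bytes when they do not matter. It is Python, not LiveCode, and strip_non_ascii is a made-up helper name; the sample value is invented too.

```python
def strip_non_ascii(data: bytes) -> bytes:
    """Drop every byte with the high bit set, keeping only the ASCII bytes."""
    return bytes(b for b in data if b < 128)

text = "Thébault,42"
raw = text.encode("utf-8")

print(len(text), len(raw))   # 11 characters but 12 bytes
print(raw.split(b","))       # splitting on an ASCII comma is still safe
print(strip_non_ascii(raw))  # b'Thbault,42'
```

Splitting on the comma is safe because the bytes of a multi-byte sequence are all over 127, so an ASCII delimiter can never show up inside one.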

If you think the encoding is not UTF-8, then it might be UTF-16.  If the text 
is mostly ASCII characters, then it will show up in UTF-16BE (big endian) as a 
NUL-char, NUL-char... pattern.  If it is UTF-16LE (little endian) then you will 
see char-NUL patterns.  So, if you see that the code (that is, charToNum() of a 
char) is zero a lot, then suspect you have some form of UTF-16.  A db might use 
UTF-16LE, UTF-16BE, or follow the byte order the machine uses for unsigned 
16-bit integers.  If you know the order, you can decide whether to swap bytes 
to match the endianness of your machine, and then you can convert to UTF-8.  To 
get the endianness of your machine, convert an ASCII char to UTF-16 and then 
look at whether the first byte is NUL (big endian) or not (little endian).  
This paragraph has a lot of info, and I might have skipped some parts, so 
keep at me until I explain it well.
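
A rough sketch of the NUL-pattern check and the machine byte-order test, in Python rather than LiveCode. The names guess_utf16_order and machine_is_big_endian are made up for this example, and the "more than half" threshold is only an assumption that suits mostly-ASCII text.

```python
import struct

def guess_utf16_order(data: bytes):
    """Guess 'be', 'le', or None from where NUL bytes fall in mostly-ASCII text."""
    pairs = len(data) // 2
    if pairs == 0:
        return None
    even_nuls = data[0::2].count(0)  # NULs at offsets 0, 2, 4, ... (NUL-char)
    odd_nuls = data[1::2].count(0)   # NULs at offsets 1, 3, 5, ... (char-NUL)
    if even_nuls > pairs // 2 and even_nuls > odd_nuls:
        return "be"
    if odd_nuls > pairs // 2 and odd_nuls > even_nuls:
        return "le"
    return None

def machine_is_big_endian() -> bool:
    # Write one 16-bit value (65, the code for "A") in the machine's native
    # order and check whether the first byte is NUL.
    return struct.pack("=H", 65)[0] == 0

sample = "name,value".encode("utf-16-le")  # invented sample data
order = guess_utf16_order(sample)
print(order)  # le
if order:
    # Decoding with the guessed order replaces a manual byte swap.
    print(sample.decode("utf-16-" + order).encode("utf-8"))  # b'name,value'
```

If the guess comes back as None, the data is probably not mostly-ASCII UTF-16, and it is worth going back to the UTF-8 assumption.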

Dar

On Jun 20, 2013, at 12:07 AM, Ludovic Thébault wrote:

> Hello,
> 
> I need to get data from SQLite (in UTF-8) and convert it to ASCII for 
> processing, but I don't need to put it in a field.
> I tried unidecode(uniencode(myTXT, "utf8")) and many other solutions with no 
> result.
> 
> Do we need to go through a field, 
> with this command: set the unicodetext of field "xxx" to .. ?
> 
> Thanks
> _______________________________________________
> use-livecode mailing list
> [email protected]
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
