Text strings in LiveCode are native encoded (MacRoman or ISO 8859) where 
possible, and where you don’t explicitly tell the engine the text is Unicode 
(via textDecode), so that they can follow the faster single-byte code paths. 
If you use textDecode, the engine first checks whether the text can be native 
encoded and uses native encoding if so; otherwise it uses UTF-16.
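
As a rough sketch of where textDecode and textEncode sit in practice, assuming 
a UTF-8 file on disk (the handler names and the pPath parameter are made up 
for illustration, they are not engine API):

```livecode
-- Hypothetical helper: read a UTF-8 file into a LiveCode text string.
function readUTF8File pPath
   local tBytes
   -- binfile: hands back the raw bytes with no text conversion
   put url ("binfile:" & pPath) into tBytes
   -- textDecode tells the engine how to interpret those bytes as text
   return textDecode(tBytes, "UTF-8")
end readUTF8File

-- Hypothetical helper: write a text string out as UTF-8 bytes.
command writeUTF8File pPath, pText
   -- textEncode turns the string into binary data in a known encoding
   put textEncode(pText, "UTF-8") into url ("binfile:" & pPath)
end writeUTF8File
```

The script only ever names the on-disk encoding; whether the decoded string 
ends up native or UTF-16 internally is left to the engine.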

For what it’s worth, using `offset` is the wrong thing to do if you have 
textEncoded your strings into binary data. You want `byteOffset`; otherwise 
the engine converts your data to a string and assumes native encoding. This 
is probably why you are seeing some case insensitivity.
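
A minimal sketch of the `byteOffset` approach, assuming both strings have been 
textEncoded as UTF-32 as in the earlier experiments (the sample strings and 
variable names are only for illustration):

```livecode
on mouseUp
   local tHaystack, tNeedle, tBytePos
   -- Hypothetical sample strings, encoded to binary data
   put textEncode("Hello World", "UTF-32") into tHaystack
   put textEncode("World", "UTF-32") into tNeedle

   -- byteOffset searches the binary data as bytes, with no conversion
   -- to a text string
   put byteOffset(tNeedle, tHaystack) into tBytePos

   -- UTF-32 uses 4 bytes per codepoint, so turn a byte position back
   -- into a codepoint position (0 still means "not found")
   if tBytePos > 0 then
      put (tBytePos - 1) div 4 + 1 into tBytePos
   end if
   put tBytePos
end mouseUp
```

Because the search is byte-wise, a match could in principle start at a byte 
position that is not on a 4-byte boundary; a more careful version would check 
for that and continue searching using the bytesToSkip parameter.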

I haven’t been following the offset discussion. I’ll have to take a look to 
see if there were some speed comparisons between offset and codepointOffset.

Cheers

Monte

> On 13 Nov 2018, at 9:35 am, Ben Rubinstein via use-livecode 
> <use-livecode@lists.runrev.com> wrote:
> 
> This is something that I've been wondering about for a while.
> 
> My unexamined assumption had been that in the 'new' fully unicode LC, text 
> was held in UTF-8. However when I saved some text strings in binary I got 
> something like UTF-8 - but not quite. And the recent experiments with offset 
> suggested that LC at the least is able to distinguish between a string which 
> is fully represented as single-byte (or perhaps ASCII?). And the reports of 
> the ingenious investigators using UTF-32 to speed up offsets, and discovering 
> that offset somehow managed to be case-insensitive in this case, made me 
> wonder whether after using textEncode(xt, "UTF-32") LC marks the string in 
> some way to give a clue about how to interpret it as text?
> 
> So could someone who is familiar with this bit of the engine enlighten us? In 
> particular:
> - What is the internal format?
> - Is it different on different platforms?
> - Given that it appears to include a flag to indicate whether it is 
> single-byte text or not, are there any other attributes?
> - Does saving a string in 'binary' file faithfully report the internal format?
> 
> TIA,
> 
> Ben
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode

