Thanks Fraser, I had just started to understand this myself by browsing the 
actual Unicode Standard (6.2). The obvious objection that a bit-twiddler like 
me would make - "how do you know how many bytes are being used to represent a 
particular character in UTF-8?" - is answered by the fact (as I understand it, 
looking at it for the first time) that the top bits of the first byte of any 
sequence act as a flag saying how many bytes belong in the sequence: a leading 
0 means "this byte is the whole sequence", while leading bits of 110, 1110 or 
11110 mean "this is the first of 2, 3 or 4 bytes". Every subsequent byte (if 
any) starts with 10, which marks it as a continuation byte rather than the 
start of a new sequence. This is a bit tricky to interpret at first, but 
because first bytes and continuation bytes never look alike, a reader that 
loses a byte from the sequence can resynchronise at the next first byte. At 
least it explains how you can get a variable number of bytes in the encoding.
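
To make the bit-twiddling concrete, here is a rough sketch in Python (not 
LiveCode, and not how any particular engine does it) of reading the sequence 
length from the first byte. The function names are made up for illustration, 
and the sketch deliberately skips the stricter validity checks a real decoder 
needs (overlong forms, surrogates, values above U+10FFFF).

```python
def utf8_sequence_length(lead_byte):
    """Return how many bytes the UTF-8 sequence starting with lead_byte has.

    Hypothetical helper for illustration only.
    """
    if lead_byte & 0b10000000 == 0b00000000:   # 0xxxxxxx: single byte
        return 1
    if lead_byte & 0b11100000 == 0b11000000:   # 110xxxxx: first of 2 bytes
        return 2
    if lead_byte & 0b11110000 == 0b11100000:   # 1110xxxx: first of 3 bytes
        return 3
    if lead_byte & 0b11111000 == 0b11110000:   # 11110xxx: first of 4 bytes
        return 4
    # 10xxxxxx is a continuation byte; anything else is not valid here.
    raise ValueError("not a valid first byte: 0x%02X" % lead_byte)


def split_utf8(data):
    """Split a bytes object into one bytes chunk per encoded character."""
    sequences = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunk = data[i:i + n]
        # Every byte after the first must look like 10xxxxxx.
        if len(chunk) != n or any(b & 0b11000000 != 0b10000000 for b in chunk[1:]):
            raise ValueError("truncated or malformed sequence at offset %d" % i)
        sequences.append(chunk)
        i += n
    return sequences


if __name__ == "__main__":
    for chunk in split_utf8("a\u00e9\u20ac".encode("utf-8")):
        print(len(chunk), chunk.hex())
```

The point of the sketch is only that the first byte alone tells you the 
length, and that a missing byte shows up as an error at a known offset.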

Light dawns very slowly. I am glad I am not writing a Unicode word processor. I 
am still very far from understanding how LC goes about handling Unicode, and 
how 7.x will differ from the 6.5.x we have now, and how, if I put something in 
UTF-8 onto the clipboard, LC will be able to transform it into UTF-16.
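
For what it's worth, the UTF-8 to UTF-16 step on its own is mechanical: decode 
the bytes to code points, then re-encode them. A minimal sketch in Python 
follows; it says nothing about how LC actually does it, the function name is 
made up, and little-endian byte order without a byte-order mark is my 
assumption.

```python
def utf8_to_utf16(data):
    """Re-encode UTF-8 bytes as UTF-16 (little-endian, no BOM).

    Hypothetical helper for illustration; byte order is an assumption.
    """
    text = data.decode("utf-8")        # bytes -> sequence of code points
    return text.encode("utf-16-le")    # code points -> 2- or 4-byte units


if __name__ == "__main__":
    utf8_bytes = "\u00e9".encode("utf-8")     # 2 bytes in UTF-8
    print(utf8_to_utf16(utf8_bytes).hex())    # 2 bytes in UTF-16
```

Characters outside the first 65,536 code points come out as 4 bytes (a 
surrogate pair) in UTF-16, which matches Fraser's "2 bytes or 4 bytes".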

There's a lot to learn.

Graham


On 26 Jan 2014, at 19:02, Fraser Gordon <[email protected]> wrote:

> On 26/01/2014 17:31, Richmond wrote:
>> I'm not sure that ALL Unicode chars are double-byte ones; possibly the
>> first 255 are not.
> It depends on the encoding. In UTF-16 encoding, all characters are
> either 2 bytes or 4 bytes. In UTF-8, they can be 1 (for the first 128
> characters), 2, 3 or 4 bytes long (depending on the character). LiveCode
> 6.x uses UTF-16 and should consequently have 2 byte unicode characters.
> 
> Regards,
> Fraser
> 


_______________________________________________
use-livecode mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
