> On 23 Jun 2017, at 11:19 am, Richard Gaskin via use-livecode 
> <use-livecode@lists.runrev.com> wrote:
> 
> Monte Goulding wrote:
> 
> >> On 23 Jun 2017, at 10:06 am, Richard Gaskin wrote:
> >>
> >> How can we know which is in use for a given string?
> >
> > You shouldn’t need to know. The engine will use native encoding where
> > possible for efficiency. A lot of the performance improvements between
> > LC 7 and 8 were using the right code paths based on whether the string
> > is native or unicode.
> 
> Seems murky.  I'd much rather at least have something like a byteLen 
> function, which returns the number of bytes for a given string.  With that I 
> can maintain byte offsets into a file with good performance and no ambiguity.

In theory, and in my opinion, `the number of bytes of <string>` should return 
whatever a byteLength function would, given that the codeunit docs state:

> The hierarchy of the new and altered chunk types is as follows: byte w of 
> codeunit x of codepoint y of char z of word …. 

However, this report was resolved as not a bug, so I guess that theory is wrong 
and maybe there’s a docs bug in there (I have asked internally on our language 
channel): http://quality.livecode.com/show_bug.cgi?id=13248

put the number of codeunits of "😀️" -> 3 

As UTF-16 that string is actually 6 bytes (3 codeunits of 16 bits each), but as 
documented you can’t rely on a codeunit being 16 bits, so I guess that means 
there is currently no way to get what you want reliably. Whether you need it is 
a separate discussion.
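
For what it's worth, the arithmetic above can be checked outside the engine. 
Here is a minimal sketch in Python, assuming the string is U+1F600 followed by 
the variation selector U+FE0F (which is what the emoji above appears to 
contain). The helper names are made up for illustration, and none of this says 
anything about how LiveCode stores the string internally:

```python
# Hypothetical helpers, for illustration only.

def utf16_codeunits(text: str) -> int:
    """Number of 16-bit code units when text is encoded as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def byte_len(text: str, encoding: str) -> int:
    """Number of bytes text occupies in the given encoding."""
    return len(text.encode(encoding))


# Assumption: grinning face (U+1F600) plus variation selector-16 (U+FE0F).
s = "\U0001F600\uFE0F"

codeunits = utf16_codeunits(s)          # surrogate pair + selector
utf16_bytes = byte_len(s, "utf-16-le")  # two bytes per code unit
utf8_bytes = byte_len(s, "utf-8")       # a different count again
```

The byte count depends on the encoding you pick, so a byte offset into a file 
only means something once the file's encoding is fixed.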
> 
> 
> >> Suppose I wanted to process a lot of text, so performance is
> >> critical. Using bytes would be optimal, since any chunk type or even
> >> Unicode characters may vary in length.
> >>
> >> So if I wanted to create an index of byte offsets into a large chunk
> >> of text, how would I know how long a character is?
> >
> > If it’s text encoded then you probably want to use character offsets
> > and let the engine worry about optimising it. If you know it’s binary
> > data then use bytes.
> 
> How do I find a substring in binary data in a way that will tell me the 
> number of bytes of the offset?


If you are dealing with bytes of binary data then use byteOffset. Is that what 
you mean here? Probably better to talk about ranges rather than substrings if 
you are discussing binary data.
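
To make the byte-range idea concrete, here is a rough sketch in Python rather 
than LiveCode, so it can be checked independently. It assumes the text has 
already been encoded to UTF-8 bytes. `byte_offset` and `line_start_offsets` are 
hypothetical helpers. The 1-based result, with 0 for "not found", mirrors how I 
would expect byteOffset to behave, so treat that as an assumption and check the 
dictionary entry:

```python
def byte_offset(needle: bytes, haystack: bytes) -> int:
    """1-based byte position of needle in haystack, or 0 if it is absent."""
    pos = haystack.find(needle)
    return 0 if pos < 0 else pos + 1


def line_start_offsets(data: bytes) -> list:
    """0-based byte offset at which each line of data starts."""
    offsets = [0]
    pos = data.find(b"\n")
    while pos != -1:
        offsets.append(pos + 1)          # next line starts after the newline
        pos = data.find(b"\n", pos + 1)
    return offsets


# "naïve", a grinning face, and "end" on three lines, encoded as UTF-8.
data = "na\u00efve\n\U0001F600\nend".encode("utf-8")

index = line_start_offsets(data)
end_pos = byte_offset(b"end", data)
missing = byte_offset(b"zzz", data)
```

Here "end" starts at character 9 of the text but at byte 13 of the encoded 
data. That gap is why an index meant for seeking in a file should be built 
from the bytes, not from the characters.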

Cheers

Monte
