Hi Eliot,

> On Dec 15, 2015, at 13:46, Eliot Miranda <eliot.mira...@gmail.com> wrote:
> 
> Just so you know, I will dig my heels in as deeply as I am able to prevent 
> the use of C++ libraries in the VM.  It destroys the simulator, which is the 
> most important thing we have for VM development productivity.  As far as I'm 
> concerned any use of external libraries to implement core functionality kills 
> the VM-in-Smalltalk concept that Squeak (and Pharo) are built upon.

OK, I defer to you because you certainly know more about the VM internals and 
what does and doesn't work well than anyone else.  

So I guess I would like to know your recommendation for how best to store 
strings: 1) byte arrays (UTF-8), or 2) 2-byte word arrays (UTF-16, where we 
also get to worry about endianness).

Bear in mind that both representations are variable length, so while 
accessing the n'th byte/word is O(1), accessing the n'th character is 
necessarily O(n) unless you know the string has no multi-byte sequences 
(UTF-8) or surrogates (UTF-16).
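
To make the O(n) point concrete, here is a rough Python sketch of finding the 
n'th character in each representation.  The function names are made up for 
illustration and this is not how any particular VM does it; it only shows 
that you have to scan from the start because you cannot tell in advance how 
many bytes/words each character takes.

```python
from typing import Sequence


def utf8_offset_of_char(data: bytes, n: int) -> int:
    """Return the byte offset of the n'th code point (0-based) in UTF-8 data."""
    count = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if (byte & 0xC0) != 0x80:
            if count == n:
                return offset
            count += 1
    raise IndexError("code point index out of range")


def utf16_offset_of_char(units: Sequence[int], n: int) -> int:
    """Return the word offset of the n'th code point (0-based) in UTF-16 code units."""
    offset = 0
    count = 0
    while offset < len(units):
        if count == n:
            return offset
        is_high = 0xD800 <= units[offset] <= 0xDBFF
        has_low = offset + 1 < len(units) and 0xDC00 <= units[offset + 1] <= 0xDFFF
        # A surrogate pair is one character stored in two 16-bit words.
        offset += 2 if (is_high and has_low) else 1
        count += 1
    raise IndexError("code point index out of range")


def utf16_units(text: str) -> list:
    """Helper for the example: split a Python string into UTF-16 code units."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
```

With a string like "a", U+00E9, U+1F600, "z", the last character sits at byte 
offset 7 in UTF-8 and word offset 4 in UTF-16, even though it is only 
character number 3.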

Also, since NSString has been mentioned, it is worth noting that NSString is 
built atop CFString (source code here: 
https://www.opensource.apple.com/source/CF/CF-855.11/CFString.c), which does a 
fair job of optimizing memory by using bytes where it can and shorts where it 
cannot.  It is also worth noting that characterAtIndex: actually does the wrong 
thing, since it returns 16-bit values and so assumes characters are no bigger 
than FFFF rather than 10FFFF.  
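
As a rough illustration of both points (this is my own simplified Python 
sketch with made-up names, not CFString's actual logic): store bytes when 
every character is ASCII, otherwise fall back to 16-bit words, and notice 
that indexing by word hands back half of a surrogate pair for anything above 
FFFF.

```python
from array import array


def pack_string(text):
    """Use one byte per character when the text is pure ASCII, else 16-bit words."""
    if all(ord(ch) < 0x80 for ch in text):
        return text.encode("ascii")
    raw = text.encode("utf-16-le")
    # "H" is an unsigned short; assumed to be 16 bits wide here.
    return array("H", (int.from_bytes(raw[i:i + 2], "little")
                       for i in range(0, len(raw), 2)))


def code_unit_at(storage, index):
    """O(1) lookup of a byte or word; this is not necessarily a whole character."""
    return storage[index]


packed = pack_string("a\U0001F600")   # "a" followed by U+1F600
half = code_unit_at(packed, 1)        # a lone high surrogate, not U+1F600
```

Here `half` is D83D, the first word of the surrogate pair, which is not a 
character on its own.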

Also, I'll just toss in this very nice article on Unicode and how NSString 
deals with it:
https://www.objc.io/issues/9-strings/unicode/

-Todd Blanchard

