Hi Eliot,

> On Dec 15, 2015, at 13:46, Eliot Miranda <eliot.mira...@gmail.com> wrote:
>
> Just so you know, I will dig my heels in as deeply as I am able to prevent
> the use of C++ libraries in the VM. It destroys the simulator, which is the
> most important thing we have for VM development productivity. As far as I'm
> concerned any use of external libraries to implement core functionality kills
> the VM-in-Smalltalk concept that Squeak (and Pharo) are built upon.

OK, I defer to you because you certainly know more about the VM internals, and about what does and doesn't work well, than anyone else.

So I guess I would like to know your recommendation for

1) how best to store strings:
   - byte arrays (UTF-8), or
   - 2-byte word arrays (UTF-16; now we get to worry about endianness).

Bear in mind that both representations are variable length, so while accessing the n'th byte/word is O(1), accessing the n'th character is necessarily O(n) unless you know the string has no surrogates (UTF-16) or no multi-byte sequences (UTF-8).

Also, since NSString has been mentioned, it is worth noting that NSString is built atop CFString (source code here: https://www.opensource.apple.com/source/CF/CF-855.11/CFString.c), which does a fair job of optimizing memory by using bytes where it can and shorts where it cannot.

It is also worth noting that characterAt: actually does the wrong thing, since it assumes characters are no bigger than U+FFFF rather than U+10FFFF.

Also, I'll just toss in this very nice article on Unicode and how NSString deals with it:
https://www.objc.io/issues/9-strings/unicode/

-Todd Blanchard
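
P.S. To make the O(1) versus O(n) point concrete, here is a rough sketch in Python (chosen only for illustration; it is not VM code). It treats a UTF-16 string as a plain list of 16-bit code units and finds the n'th character by scanning from the start. The helper names `utf16_units` and `code_point_at` are hypothetical, and the little-endian choice is an assumption made only to fix a byte order for the example.

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of 16-bit integers.

    Little-endian is assumed here purely to fix a byte order.
    """
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def code_point_at(units, n):
    """Return the n'th code point (0-based) of a UTF-16 code unit list.

    Indexing a code unit is O(1), but a code point above U+FFFF takes
    two units (a surrogate pair), so this has to scan from the start.
    """
    i = 0
    count = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the surrogate pair into one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            # BMP character (or an unpaired surrogate, passed through as-is).
            cp = u
            step = 1
        if count == n:
            return cp
        count += 1
        i += step
    raise IndexError("code point index out of range")


units = utf16_units("a\U0001F600b")   # 'a', U+1F600, 'b'
second_unit = units[1]                 # only half of a character
second_char = code_point_at(units, 1)  # the whole character
```

Here `units[1]` is the high surrogate of the pair, not a character, which is the mistake an accessor makes when it assumes nothing lies above U+FFFF.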