Hi Todd,

On Tue, Dec 15, 2015 at 3:46 PM, Todd Blanchard <[email protected]> wrote:

> Hi Eliot,
>
> On Dec 15, 2015, at 13:46, Eliot Miranda <[email protected]> wrote:
>
> > Just so you know, I will dig my heels in as deeply as I am able to prevent
> > the use of C++ libraries in the VM.  It destroys the simulator, which is
> > the most important thing we have for VM development productivity.  As far
> > as I'm concerned any use of external libraries to implement core
> > functionality kills the VM-in-Smalltalk concept that Squeak (and Pharo) are
> > built upon.
>
>
> OK, I defer to you because you certainly know more about the VM internals
> and what does and doesn't work well than anyone else.
>
> So I guess I would like to know your recommendation for 1) how best to
> store strings - byte arrays (UTF8), - 2-byte word arrays (UTF16 - now we
> get to worry about endian).
>

Raw Unicode, either as 8-bit, 16-bit or 32-bit.  When creating a String it should start as an 8-bit-per-Unicode-character string.  Attempts to store Character values that won't fit cause the String to become a String whose element size is large enough to accommodate the character.  In Spur, become: is cheap, so this growth pays only for the reallocation and copying of the data, not for the expensive heap scan that become: would otherwise require.

> Bearing in mind that both representations are variable length and so while
> accessing the n'th byte/word is O(1), accessing the n'th character is
> necessarily O(n) unless you know you have no surrogates in your string.
>

Right, so UTF-8 and UTF-16 are not convenient representations and should be provided only for interchange.

>
> Also...since NSString has been mentioned...it is worth noting that
> NSString is built atop CFString (source code here:
> https://www.opensource.apple.com/source/CF/CF-855.11/CFString.c) which
> does a fair job of optimizing memory by using bytes where it can and shorts
> where it cannot.  It is also worth noting that characterAt: actually does
> the wrong thing, since it assumes characters are no bigger than FFFF rather
> than 10FFFF.
>

Yes, and Squeak (and AFAIA, Pharo) has been doing this for ages.  If one has become: it is very easy to manage.  Now with Spur not only do we have become:, we have a fairly fast become:.  Does this make sense?

> Also...I'll just toss in this very nice article on unicode and how
> NSString deals with it.
> https://www.objc.io/issues/9-strings/unicode/
>
> -Todd Blanchard
>

_,,,^..^,,,_
best, Eliot
