> On 6 Feb 2017, at 19:29, Ted F.A. van Gaalen via swift-evolution
> <[email protected]> wrote:
>
>> On 6 Feb 2017, at 19:10, David Waite <[email protected]> wrote:
>>
>>> On Feb 6, 2017, at 10:26 AM, Ted F.A. van Gaalen via swift-evolution
>>> <[email protected]> wrote:
>>>
>>> Hi Dave,
>>> Oops! yes, you’re right!
>>> I did read again more thoroughly about Unicode
>>> and how Unicode is handled within Swift...
>>> -should have done that before I write something- sorry.
>>>
>>> Nevertheless:
>>>
>>> How about this solution: (if I am not making other omissions in my
>>> thinking again)
>>> -Store the string as a collection of fixed-width 32 bit UTF-32 characters
>>> anyway.
>>> -however, if the Unicode character is a grapheme cluster (2..n Unicode
>>> characters),then
>>> store a pointer to a hidden child string containing the actual grapheme
>>> cluster, like so:
>>>
>>> 1: [UTF32, UTF32, UTF32, 1pointer, UTF32, UTF32, 1pointer, UTF32, UTF32]
>>>                             |                       |
>>> 2:                   [UTF32, UTF32]     [UTF32, UTF32, UTF32, ...]
>>>
>>> whereby (1) is aString as seen by the programmer.
>>> and (2) are hidden child strings, each containing a grapheme cluster.
>>
>> The random access would require a uniform layout, so a pointer and scalar
>> would need to be the same size. The above would work with a 32 bit platform
>> with a tagged pointer, but would require a 64-bit slot for pointers on
>> 64-bit systems like macOS and iOS.
>>
> Yeah, I know that, but the “grapheme cluster pool” I am imagining
> could be allocated at a certain predefined base address,
> whereby the pointer I am referring to is just an offset from this base
> address.
> If so, an address space of 2^30 (1,073,741,824) 1 GB, will be available,
> which is more than sufficient for just storing unique grapheme clusters..
> (of course, not taking in account other allocations and app limitations)

When it comes to fast access what’s most important is cache locality. DRAM is like 200x slower than L2 cache. Looping through some contiguous 16-bit integers is always going to beat the pants off dereferencing pointers.

>
>> Today when I need to do random access into a string, I convert it to an
>> Array<Character>. Hardly efficient memory-wise, but efficient enough for
>> random access.
>>
> As a programmer. I just want to use String as-is but with direct
> subscripting like str[12..<34]
> and, if possible also with open range like so: str[12…]
> implemented natively in Swift.
>
> Kind Regards
> TedvG
> www.tedvg.com <http://www.tedvg.com/>
> www.ravelnotes.com <http://www.ravelnotes.com/>
>
>> -DW
>
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution

It’s quite rare that you need to grab arbitrary parts of a String without knowing what is inside it. If you’re saying str[12..<34] - why 12, and why 34? Is 12 the length of some substring you know from earlier? In that case, you could find out how many CodeUnits it had, and use that information instead.

The new model will give you some form of efficient “random” access; the catch is that it’s not totally random. Looking for the next character boundary is necessarily linear, so the trick for large strings (>16K) is to make sure you remember the CodeUnit offsets of important character boundaries.

- Karl
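
To make the two approaches from this thread concrete, here is a minimal sketch, assuming Swift 4 or later (where String is a Collection of Character). The sample string and the variable names are made up for illustration. It is not the proposed new String model, only what the standard library already allows: copying into an Array<Character> for integer subscripting, and walking to a character boundary once and then remembering the resulting String.Index so the slice itself needs no second walk.

```swift
// Hypothetical sample text: "naïve café", a flag emoji (two Unicode scalars,
// one Character), then "done". Escapes are used so the scalars are unambiguous.
let text = "na\u{EF}ve caf\u{E9} \u{1F1F3}\u{1F1F1} done"

// Approach 1 (from the thread): copy into Array<Character>.
// Costs extra memory, but integer subscripting is then O(1).
let characters = Array(text)
let fifth = characters[4]
print(fifth)                       // prints "e"

// Approach 2: find the character boundaries once (a linear walk),
// keep the String.Index values, and reuse them later.
let rememberedStart = text.index(text.startIndex, offsetBy: 6)
let rememberedEnd = text.index(rememberedStart, offsetBy: 4)

// Slicing with remembered indices does not re-scan from the start.
let slice = text[rememberedStart..<rememberedEnd]
print(slice)                       // prints "café"

// One-sided range from a remembered boundary to the end of the string.
let tail = text[rememberedEnd...]
print(tail.count)                  // prints 7
```

The design point is the one made above: the linear cost is paid when a boundary is located, so anything you expect to revisit should be stored as an index rather than recomputed from an integer offset each time.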
