> On 7 Feb 2017, at 05:42, Karl Wagner <[email protected]> wrote:
> 
>> 
>> On 6 Feb 2017, at 19:29, Ted F.A. van Gaalen via swift-evolution 
>> <[email protected]> wrote:
>> 
>>> 
>>> On 6 Feb 2017, at 19:10, David Waite <[email protected]> wrote:
>>> 
>>>> 
>>>> On Feb 6, 2017, at 10:26 AM, Ted F.A. van Gaalen via swift-evolution 
>>>> <[email protected]> wrote:
>>>> 
>>>> Hi Dave,
>>>> Oops! Yes, you’re right!
>>>> I have now read more thoroughly about Unicode 
>>>> and how Unicode is handled within Swift.
>>>> I should have done that before writing something, sorry.
>>>> 
>>>> Nevertheless: 
>>>> 
>>>> How about this solution (if I am not making other omissions in my 
>>>> thinking again):
>>>> - Store the string as a collection of fixed-width 32-bit UTF-32 code 
>>>> units anyway.
>>>> - However, if a character is a grapheme cluster (2..n Unicode 
>>>> scalars), then store a pointer to a hidden child string containing 
>>>> the actual grapheme cluster, like so:
>>>> 
>>>> 1: [UTF32, UTF32, UTF32, 1pointer, UTF32, UTF32, 1pointer, UTF32, UTF32]
>>>>                             |                       |
>>>> 2:                   [UTF32, UTF32]          [UTF32, UTF32, UTF32, ...]
>>>> 
>>>> where (1) is the string as seen by the programmer,
>>>> and (2) are hidden child strings, each containing one grapheme cluster. 
>>> 
>>> Random access would require a uniform layout, so a pointer and a scalar 
>>> would need to be the same size. The above would work on a 32-bit platform 
>>> with a tagged pointer, but would require a 64-bit slot for pointers on 
>>> 64-bit systems like macOS and iOS.
>>> 
>> Yeah, I know that, but the “grapheme cluster pool” I am imagining 
>> could be allocated at a certain predefined base address, 
>> so that the pointer I am referring to is just an offset from this base 
>> address. 
>> If so, an address space of 2^30 bytes (1,073,741,824, i.e. 1 GB) would be 
>> available, which is more than sufficient for storing just the unique 
>> grapheme clusters (not taking into account other allocations and app 
>> limitations, of course).
> 
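To make the layout discussed above concrete, here is a minimal sketch in current Swift. It assumes the high bit of each 32-bit slot can serve as the "this is a pool offset" tag, since Unicode scalars need only 21 bits. The type `PooledString` and all of its members are hypothetical, and the pool is modelled as a plain array of `Character` rather than raw memory at a base address. Note that the dictionary lookup treats canonically equivalent clusters as the same entry.

```swift
// Hypothetical sketch of the "32-bit slots plus grapheme cluster pool" layout.
// None of these names exist in the Swift standard library.
struct PooledString {
    // High bit set means "this slot holds an offset into the pool".
    // Unicode scalars only need 21 bits, so the bit is otherwise unused.
    private static let poolTag: UInt32 = 0x8000_0000

    private var slots: [UInt32] = []                    // (1) uniform 32-bit slots
    private var pool: [Character] = []                  // (2) unique multi-scalar clusters
    private var poolOffsets: [Character: UInt32] = [:]  // cluster -> offset in pool

    init(_ text: String) {
        for character in text {
            let scalars = character.unicodeScalars
            if scalars.count == 1, let scalar = scalars.first {
                // Single-scalar character: store its UTF-32 value directly.
                slots.append(scalar.value)
            } else {
                // Multi-scalar cluster: store a tagged offset, reusing known clusters.
                let offset: UInt32
                if let known = poolOffsets[character] {
                    offset = known
                } else {
                    offset = UInt32(pool.count)
                    pool.append(character)
                    poolOffsets[character] = offset
                }
                slots.append(PooledString.poolTag | offset)
            }
        }
    }

    var count: Int { return slots.count }
    var pooledClusterCount: Int { return pool.count }

    // Constant-time access: one array read, plus one pool read for clusters.
    subscript(position: Int) -> Character {
        let slot = slots[position]
        if slot & PooledString.poolTag != 0 {
            return pool[Int(slot & ~PooledString.poolTag)]
        }
        return Character(Unicode.Scalar(slot)!)
    }
}

let pooled = PooledString("e\u{301}a🇳🇱b🇳🇱")
print(pooled.count, pooled[2], pooled.pooledClusterCount)
```

The sketch only covers construction and reading; mutation, comparison and memory reclamation for the pool are left out.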
> When it comes to fast access, what’s most important is cache locality. A 
> trip to DRAM is typically tens to hundreds of times slower than a cache hit. 
> Looping through contiguous 16-bit integers is going to beat the pants off 
> dereferencing pointers.
Hi Karl,
That is of course hardware/processor dependent, and Swift runs on different 
target systems, doesn’t it?
> 
>>   
>>> Today when I need to do random access into a string, I convert it to an 
>>> Array<Character>. Hardly efficient memory-wise, but efficient enough for 
>>> random access.
>>> 
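For reference, the Array<Character> workaround mentioned above can be sketched as follows in current Swift. The sample text and variable names are only illustrative.

```swift
// "é" is written here as "e" plus a combining acute accent (two scalars).
let sample = "Cafe\u{301} 🇳🇱 ok"

// One linear pass builds an array with one element per grapheme cluster.
let characters = Array(sample)

// From here on, integer indexing is constant time.
let accented = characters[3]
let flag = String(characters[5..<6])
print(characters.count, accented, flag)
```

The cost is paid up front: the conversion walks the whole string once and each `Character` element takes more room than the encoded text it came from.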
>> As a programmer, I just want to use String as-is, but with direct 
>> subscripting like str[12..<34]
>> and, if possible, also with an open range like so: str[12...]
>> implemented natively in Swift. 
>> 
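The subscripting asked for above can be approximated today with a small extension. This is a sketch in current Swift, not something the standard library provides. Both subscripts are hypothetical and both count Characters from the start of the string, so every call is linear in the offset rather than constant time.

```swift
extension String {
    // Hypothetical convenience: str[12..<34], counting Characters.
    // Walks from startIndex on every call, so this is O(n), not O(1).
    subscript(offsets: Range<Int>) -> Substring {
        let lower = index(startIndex, offsetBy: offsets.lowerBound)
        let upper = index(lower, offsetBy: offsets.count)
        return self[lower..<upper]
    }

    // Hypothetical convenience: str[12...], counting Characters.
    subscript(offsets: PartialRangeFrom<Int>) -> Substring {
        let lower = index(startIndex, offsetBy: offsets.lowerBound)
        return self[lower...]
    }
}

let greeting = "Hello, wörld 🇳🇱!"
print(greeting[7..<12])
print(greeting[13...])
```

As with the built-in index methods, an offset past the end of the string traps at run time.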
>> Kind Regards
>> TedvG
>> www.tedvg.com <http://www.tedvg.com/>
>> www.ravelnotes.com <http://www.ravelnotes.com/>
>>  
>>> -DW
>> 
>> _______________________________________________
>> swift-evolution mailing list
>> [email protected]
>> https://lists.swift.org/mailman/listinfo/swift-evolution
> 
> 
> It’s quite rare that you need to grab arbitrary parts of a String without 
> knowing what is inside it. If you’re saying str[12..<34] - why 12, and why 
> 34? Is 12 the length of some substring you know from earlier? In that case, 
> you could find out how many CodeUnits it had, and use that information 
> instead.
For this example I have used constants, but normally these would be 
variables.
> 

I’d say it is not so rare. These things are often used for all kinds of string 
parsing, and there are many examples to be found on the Internet.
TedvG
> The new model will give you some form of efficient “random” access; the catch 
> is that it’s not totally random. Looking for the next character boundary is 
> necessarily linear, so the trick for large strings (>16K) is to make sure you 
> remember the CodeUnit offsets of important character boundaries.
> 
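The advice above about remembering code unit offsets can be sketched like this in current Swift. It assumes UTF-8 code units and an ASCII separator, so that every remembered offset falls on a scalar boundary. The type `FieldOffsets` and its members are hypothetical.

```swift
// Hypothetical helper: scan once, remember UTF-8 offsets of field boundaries.
struct FieldOffsets {
    let text: String
    private(set) var starts: [Int] = [0]   // UTF-8 offset where each field begins

    // `separator` is assumed to be an ASCII byte.
    init(_ text: String, separator: UInt8) {
        self.text = text
        // Single linear pass over the code units.
        for (offset, byte) in text.utf8.enumerated() where byte == separator {
            starts.append(offset + 1)
        }
    }

    // Jump to a remembered boundary instead of counting characters again.
    func field(_ n: Int) -> Substring {
        let utf8 = text.utf8
        let lower = utf8.index(utf8.startIndex, offsetBy: starts[n])
        let upper = n + 1 < starts.count
            ? utf8.index(utf8.startIndex, offsetBy: starts[n + 1] - 1)
            : utf8.endIndex
        return text[lower..<upper]
    }
}

let record = FieldOffsets("na\u{EF}ve=1;日本語=2;🇳🇱=3", separator: UInt8(ascii: ";"))
print(record.starts)
print(record.field(1))
```

Whether the jump itself is cheap depends on how the string is stored; that is an implementation detail this sketch does not rely on for correctness.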
> - Karl

