Hi Dave
Thanks for your time to go in to this an explain.
This optimising goes much further then I thought.

> That's fine. Please don't be offended that I don't wish to argue it further. 
> It's been an interesting exercise while I'm on vacation and I hoped it would 
> lay out some general principles that would be useful to others in future even 
> if you are not convinced, but when I get back to work next week I'll have to 
> focus on other things.

yes, I understand, it would become too iterative and time consuming I guess. 
( how can you become work-detached if you keep doing things like this during 
vacation? )  

Enjoy your vacation!  
TedvG




> On 24 Feb 2017, at 22:40, Dave Abrahams <[email protected]> wrote:
> 
> 
> 
> Sent from my moss-covered three-handled family gradunza
> 
> On Feb 23, 2017, at 2:04 PM, Ted F.A. van Gaalen <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>> 
>>> On 23 Feb 2017, at 02:24, Dave Abrahams <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Equally a non-starter. All known threadsafe schemes that require caches to 
>>> be updated upon non-mutating operations have horrible performance issues, 
>>> and further this would penalize all string code by reserving space for the 
>>> cache and filling it even for the vast majority of operations that don't 
>>> require random access.
>> Well, maybe “caching” is not the right description for what I've suggested.
>> It is more like:
>>   let all strings be stored as they are now, but as soon as you want to work 
>> with 
>> random accessing parts of a string just “lift the string out of normal 
>> optimised string storage” 
>>  and then add (temporarily)  a Character array so one can work with this 
>> array directly ” 
> 
> That's a cache.
> 
>> which implies that all other strings remain as they are.  ergo: efficiency 
>> is only reduced for the “elevated” strings,
> 
> You have to add that temporary array somewhere.  The performance of every 
> string is penalized for that storage, and also for the cost of throwing it 
> out upon mutation. Every branch counts. 
> 
>> Using e.g. str.freeSpace(), if necessary, would then place the String back 
>> in its normal storage domain, thereby disposing the Character array
>> associated with it. 
> 
> Avoiding hidden dynamic storage overhead that needs to be freed is an 
> explicit goal of the design (see the section on String and Substring).
> 
>>> Trust me, we've gotten lots of such suggestions and thought through the 
>>> implications of each one very carefully.  
>> That’s good, because it means, that a lot of people are interested in this 
>> subject and wish to help.  
>> Of course you’ll get many of suggestions that might not be very useful, 
>> perhaps like this one... but sometimes suddenly someone 
>> comes along with things that might never have occurred to you. 
>> That is the beautiful nature of ideas…
> 
> But at some point, I hope you'll understand, I also have to say that I think 
> all the simple schemes have been adequately explored and the complex ones all 
> seem to have this basic property of relying on caches, which has unacceptable 
> performance, complexity, and, yes, usability costs. Analyzing and refuting 
> each one in detail begins to be a waste of time after that.  I'm not really 
> willing to go further down this road unless someone has an implementation and 
> experimental evidence that demonstrates it as non-problematic. 
> 
> 
>>> I'm afraid you will have to accept being disappointed about this. 
>> Well, like most developers, I am a stubborn kind of guy.. 
>> Luckily Swift is very flexible like Lego, so I rolled my own convenience 
>> struct.
>> If I need direct access on a string I simply copy the string to it.
>> it permits things like this:  (and growing) 
>> 
>> let strabc = "abcdefghjiklmnopqrstuvwxyz"
>> let strABC = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>> var abc = TGString(strabc)
>> var ABC = TGString(strABC)
>> 
>> func test()
>> {
>>     // as in Basic: left$, mid$, right$
>>     print(abc.left(5))
>>     print(abc.mid(5,10))
>>     print(ABC.mid(5))
>>     print(ABC.right(5))
>> 
>>     // ranges and concatenation:
>>     print(abc[12..<23])
>>     print(abc.left(5) + ABC.mid(6,6) + abc[10...25])
>>     
>>     // eat anything:
>>     let d:Double = -3.14159
>>     print(TGString(d))
>>  
>>     let n:Int = 1234
>>     print(TGString(n))
>>     
>>     print(TGString(1234.56789))
>>     
>>     let str = abc[15..<17].asString      // Copy to to normal Swift String
>>     print(str)
>>     
>>     let s = "\(abc[12..<20])" // interpolate to normal Swift String.
>>     print(s)
>>     
>>     abc[3..<5] = TGString("34") // if lengths don't match:
>>     abc[8...9] = ABC[24...25]   //  length of dest. string is altered.
>>     abc[12] = TGString("$$$$")  //  if src l > 1 will insert remainder after 
>> dest.12 here
>>     abc[14] = TGString("")      //  empty removes character at pos.
>>     print(abc)
>>     abc.insert(at: 3, string: ABC[0..<3])
>>     print(abc)
>> }
>>  
>> test()
>> .
>> outputs: 
>> abcde
>> fghjiklmno
>> FGHIJKLMNOPQRSTUVWXYZ
>> VWXYZ
>> mnopqrstuvw
>> abcdeGHIJKLklmnopqrstuvwxyz
>> -3.14159
>> 1234
>> 1234.56789
>> abcdefghjiklmnopqrstuvwxyz
>> mnopqrst
>> abc34fghYZkl$$$$nopqrstuvwxyz
>> abcABC34fghYZkl$$$$nopqrstuvwxyz
>> 
>> kinda hoped that this could be builtin in Swift strings 
>> Anyway, I’ve made myself what I wanted, which happily co-exists
>> alongside normal Swift strings.  Performance and storage
>> aspects of my struct TGString are not very important, because
>> I am not using this on thousands of strings.
>> Simply want to use a string as a plain array, that’s all, 
>> which is implemented in almost every PL on this planet. 
>> 
>> 
>>> More generally, there's a reason that the collection model has 
>>> bidirectional and random access distinctions: important data structures are 
>>> inherently not random access. 
>> I don’t understand the above line: definition of “important data structures” 
>> <> “inherently” 
> 
> Important data structures are those from the classical CS literature upon 
> which every practical programming language (and even modern CPU hardware) is 
> based, e.g. hash tables. Based on the properties of modern string processing, 
> strings fall into the same category. "Inherent" means that performance 
> characteristics are tied to the structure of the data or problem being 
> solved. You can't sort in better than O(N log N) worst case (mythical quantum 
> computers don't count here), and that's been proven mathematically. Similarly 
> it's easy to prove that the constraints of our problem mean that counting the 
> characters in a string will always be O(N) worst case where N is the length 
> of the representation. That means strings are inherently not random access.
> 
>>> Heroic attempts to present the illusion that they are randomly-accessible 
>>> are not going to fly. 
>>   ?? Accessing discrete elements directly 
> 
> All collections have direct access via indices. You mean randomly, via 
> arbitrary integers. 
> 
>> in an array is not an illusion to me. 
>> (e.g. I took the 4th and 7th eggs from the container) 
> 
> It's not an illusion when they're stored in an array. 
> 
> If you have to walk down an aisle of differently sized cereal boxes to pick 
> the 5th box of SuperBoomCrisp Flakes off the shelf in the grocery store, 
> that's not random access (even if you're willing to drop the boxes into an 
> array for later lookups as you go, as you're proposing). That's what Strings 
> are like. 
> 
>>> These abstractions always break down,  leaking the true non-random-access 
>>> nature in often unpredictable ways, penalizing lots of code for the sake of 
>>> a very few use-cases, and introducing complexity that is hard for the 
>>> optimizer to digest and makes it painful (sometimes impossible) to grow and 
>>> evolve the library.  
>>> 
>> Is an Array an abstraction? of what? 
> 
> A randomly-accessible homogeneous tail growable collection. But the 
> abstraction in question here isn't Array; it's RandomAccessCollection. 
> 
>> I don’t get this either. most components in the real world can be accessed 
>> randomly. 
> 
> Actually that's far from being true in the real world.  See the grocery store 
> above. Computer memory is very unlike most things in the real world. Ask any 
> roboticist.
> 
>> 
>>> This should be seen as a general design philosophy: Swift presents 
>>> abstractions that harmonize with, rather than hide, the true nature of 
>>> things.
>> The true nature of things is a very vague and subjective criterium, 
> 
> Not at all; see my explanation above.
> 
>> how can you harmonise with that, let alone with abstractions? 
>> e.g. for me: “the true nature of things” for an array is that it has direct 
>> accessible discrete elements…
> 
> Arrays definitely support random access. Strings are not arrays.
> 
>> Sorry, with respect, we have a difference of opinion here.
> 
> That's fine. Please don't be offended that I don't wish to argue it further. 
> It's been an interesting exercise while I'm on vacation and I hoped it would 
> lay out some general principles that would be useful to others in future even 
> if you are not convinced, but when I get back to work next week I'll have to 
> focus on other things. 
> 
>> Thanks btw for the link to this article about tagged pointers, very 
>> interesting.
>> it inspired me  to (have) read other things in this domain as well.  
>> 
>> TedvG

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to