> On Aug 5, 2016, at 5:17 PM, Joe Groff <[email protected]> wrote:
> 
>> 
>> On Aug 4, 2016, at 11:31 AM, Johannes Neubauer <[email protected]> wrote:
>> 
>>> 
>>> On Aug 4, 2016, at 8:21 PM, Joe Groff <[email protected]> wrote:
>>> 
>>>> 
>>>> On Aug 4, 2016, at 11:20 AM, Johannes Neubauer <[email protected]> 
>>>> wrote:
>>>> 
>>>> 
>>>>> On Aug 4, 2016, at 5:26 PM, Matthew Johnson via swift-evolution 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>>> 
>>>>>> On Aug 4, 2016, at 9:39 AM, Joe Groff <[email protected]> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On Aug 3, 2016, at 8:46 PM, Chris Lattner <[email protected]> wrote:
>>>>>>> 
>>>>>>> On Aug 3, 2016, at 7:57 PM, Joe Groff <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> a. We indirect automatically based on some heuristic, as an
>>>>>>>>>>> optimization.
>>>>>>>>> 
>>>>>>>>> I weakly disagree with this, because it is important that we provide 
>>>>>>>>> a predictable model.  I’d rather the user get what they write, and 
>>>>>>>>> tell people to write ‘indirect’ as a performance tuning option.  “Too 
>>>>>>>>> magic” is bad.
>>>>>>>> 
>>>>>>>> I think 'indirect' structs with a heuristic default are important to 
>>>>>>>> the way people are writing Swift in practice. We've seen many users 
>>>>>>>> fully invest in value semantics types, because they want the benefits 
>>>>>>>> of isolated state, without appreciating the code size and performance 
>>>>>>>> impacts. Furthermore, implementing 'indirect' by hand is a lot of 
>>>>>>>> boilerplate. Putting indirectness entirely in users' hands feels to me 
>>>>>>>> a lot like the "value if word sized, const& if struct" heuristics C++ 
>>>>>>>> makes you internalize, since there are similar heuristics where 
>>>>>>>> 'indirect' is almost always a win in Swift too.
>>>>>>> 
>>>>>>> I sympathize with much of your motivation, but I still disagree with 
>>>>>>> your conclusion.  I see this as exactly analogous to the situation and 
>>>>>>> discussion when we added indirect to enums.  At the time, some argued 
>>>>>>> for a magic model where the compiler figured out what to do in the most 
>>>>>>> common “obvious” cases.
>>>>>>> 
>>>>>>> We agreed to use our current model though because:
>>>>>>> 1) Better to be explicit about allocations & indirection than implicit.
>>>>>>> 2) The compiler can guide the user in the “obvious” case to add the 
>>>>>>> keyword with a fixit, preserving the discoverability / ease of use.
>>>>>>> 3) When indirection is necessary, there are choices to make about where 
>>>>>>> the best place to do it is.
>>>>>>> 4) In the most common case, the “boilerplate” is a single “indirect” 
>>>>>>> keyword added to the enum decl itself.  In the less common case, you 
>>>>>>> want the “boilerplate” so that you know where the indirections are 
>>>>>>> happening.
>>>>>>> 
>>>>>>> Overall, I think this model has worked well for enums and I’m still 
>>>>>>> very happy with it.  If you generalize it to structs, you also have to 
>>>>>>> consider that this should be part of a larger model that includes 
>>>>>>> better support for COW.  I think it would be really unfortunate to 
>>>>>>> “magically indirect” structs, when the right answer may actually be to 
>>>>>>> COW them instead.  I’d rather have a model where someone can use:
>>>>>>> 
>>>>>>> // simple, predictable, always inline, slow in some cases.
>>>>>>> struct S1 { … }
>>>>>>> 
>>>>>>> And then upgrade to one of:
>>>>>>> 
>>>>>>> indirect struct S2 {…}
>>>>>>> cow struct S3 { … }
>>>>>>> 
>>>>>>> Depending on the structure of their data.  In any case, to reiterate, 
>>>>>>> this really isn’t the time to have this debate, since it is clearly 
>>>>>>> outside of stage 1.
>>>>>> 
>>>>>> In my mind, indirect *is* cow. An indirect struct without value 
>>>>>> semantics is a class, so there would be no reason to implement 
>>>>>> 'indirect' for structs without providing copy-on-write behavior.
>>>>> 
>>>>> This is my view as well.  Chris, what is the distinction in your mind?
>>>>> 
>>>>>> I believe that the situation with structs and enums is also different. 
>>>>>> Indirecting enums has a bigger impact on interface because they enable 
>>>>>> recursive data structures, and while there are places where indirecting 
>>>>>> a struct may make new recursion possible, that's a much rarer reason 
>>>>>> to introduce indirectness for structs. Performance and code size are the 
>>>>>> more common reasons, and we've described how to build COW boxes manually 
>>>>>> to work around performance problems at the last two years' WWDC. There 
>>>>>> are pretty good heuristics for when indirection almost always beats 
>>>>>> inline storage: once you have more than one refcounted field, passing 
>>>>>> around a box and retaining once becomes cheaper than retaining the 
>>>>>> fields individually. Once you exceed the fixed-sized buffer threshold of 
>>>>>> three words, indirecting some or all of your fields becomes necessary to 
>>>>>> avoid falling off a cliff in unspecialized generic or 
>>>>>> protocol-type-based code.  Considering that we hope to explore other 
>>>>>> layout optimizations, such as automatically reordering fields to 
>>>>>> minimize padding, and that, as with padding, there are simple rules for 
>>>>>> indirecting that can be mechanically followed to get good results in the 
>>>>>> 99% case, it seems perfectly reasonable to me to automate this.
>>>>>> 
>>>>>> -Joe
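The manual COW boxes referred to here are typically built on `isKnownUniquelyReferenced`; a minimal sketch (the `Ref` and `Box` names are illustrative, not standard library API):

```swift
// Heap storage for the boxed value.
final class Ref<T> {
    var value: T
    init(_ value: T) { self.value = value }
}

// A value type that stores its payload indirectly and copies the
// storage only on write, and only when the storage is shared.
struct Box<T> {
    private var ref: Ref<T>
    init(_ value: T) { ref = Ref(value) }

    var value: T {
        get { return ref.value }
        set {
            if isKnownUniquelyReferenced(&ref) {
                ref.value = newValue        // sole owner: mutate in place
            } else {
                ref = Ref(newValue)         // shared: copy before writing
            }
        }
    }
}
```

Writing this by hand for every field of every struct is the boilerplate the text above complains about.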
>>>>> 
>>>>> I think everyone is making good points in this discussion.  
>>>>> Predictability is an important value, but so is default performance.  To 
>>>>> some degree there is a natural tension between them, but I think it can 
>>>>> be mitigated.
>>>>> 
>>>>> Swift relies so heavily on the optimizer for performance that I don’t 
>>>>> think the default performance is ever going to be perfectly predictable.  
>>>>> But that’s actually a good thing, because it allows the compiler to 
>>>>> provide *better* performance for unannotated code than it would otherwise 
>>>>> be able to do.  We should strive to make the default characteristics, 
>>>>> behaviors, heuristics, etc as predictable as possible without 
>>>>> compromising the goal of good performance by default.  We’re already 
>>>>> pretty far down this path.  It’s not clear to me why indirect value 
>>>>> types would be treated any differently.  I don’t think anyone will 
>>>>> complain as long as it is very rare for performance to be *worse* than 
>>>>> the 100% predictable choice (always inline in this case).
>>>>> 
>>>>> It seems reasonable to me to expect developers who are reasoning about 
>>>>> relatively low level performance details (i.e. not Big-O performance) to 
>>>>> understand some lower level details of the language defaults.  It is also 
>>>>> important to offer tools for developers to take direct, manual control 
>>>>> when desired to make performance and behavior as predictable as possible.
>>>>> 
>>>>> For example, if we commit to and document the size of the inline 
>>>>> existential buffer, it is possible to reason about whether or not a value 
>>>>> type is small enough to fit. If the indirection heuristic is relatively 
>>>>> simple (such as exceeding the inline buffer size, or having more than one 
>>>>> ref-counted field, including types implemented with CoW), the default 
>>>>> behavior will still be reasonably predictable.  These commitments don’t 
>>>>> necessarily need to cover *every* case and don’t necessarily need to 
>>>>> happen immediately, but hopefully the language will reach a stage of 
>>>>> maturity where the core team feels confident in committing to some of the 
>>>>> details that are relevant to common use cases.
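Assuming the inline buffer of an existential container is three words (true of the current implementation, though not yet a documented commitment), that kind of reasoning can be checked with `MemoryLayout`; the types below are invented for illustration:

```swift
protocol P {}

struct Small: P {   // two words: fits the inline buffer
    var a: Int
    var b: Int
}

struct Large: P {   // four words: would be stored indirectly
    var a: Int
    var b: Int
    var c: Int
    var d: Int
}

// Three machine words, the assumed inline buffer size.
let bufferSize = 3 * MemoryLayout<Int>.size
print(MemoryLayout<Small>.size <= bufferSize)  // true
print(MemoryLayout<Large>.size <= bufferSize)  // false
```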
>>>>> 
>>>>> We just need to also support users that want / need complete 
>>>>> predictability and optimal performance for their specific use case by 
>>>>> allowing opt-in annotations that offer more precise control.
>>>> 
>>>> I agree with this. First: IMHO indirect *should be* CoW, but currently it 
>>>> is not. If a value does not fit into the value buffer of an existential 
>>>> container, the value will be put onto the heap. If you store the same 
>>>> value into a second existential container (via an assignment to a variable 
>>>> of protocol type), it will be copied and put *as a second indirectly 
>>>> stored value* onto the heap, although no write has happened at all. Arnold 
>>>> Schwaighofer explained this very well in his talk at WWDC 2016 (if you need 
>>>> a link, just ask me).
>>>> 
>>>> If there is to be an automatic mechanism for indirect storage *and* CoW 
>>>> (which I would love), there will of course have to be "tradeoff 
>>>> heuristics" for when to store a value directly and when to use indirect 
>>>> storage. Furthermore, there should be a *unique value pool* for each 
>>>> value type in which all (currently used) values of that type are stored 
>>>> (uniquely). I would even prefer that the "tradeoff heuristics" be applied 
>>>> up front by the compiler per type, not per variable. That means Swift 
>>>> would always use a container for value types, but there would be two 
>>>> kinds of containers: the value container and the existential container. 
>>>> The existential container stays as it is. For small values (at most as 
>>>> big as the value buffer), the value container is exactly as big as it 
>>>> needs to be to store a value of the given type. If the value is bigger 
>>>> than the value buffer (or has more than one reference-type field), the 
>>>> value container for this type is only as big as a reference, because 
>>>> such types will then **always** be stored on the heap with CoW. This way 
>>>> I can always assign a value to a variable typed with a protocol, since 
>>>> the value (or reference) will fit into the value buffer of the 
>>>> existential container. Additionally, CoW becomes available automatically 
>>>> for all types for which it "makes sense" (of course, annotations should 
>>>> be available to revert to the current behavior if someone does not like 
>>>> this automatism). Last but not least, using the *unique value pool* for 
>>>> all value types that fall into the CoW-abunga category will be very 
>>>> space-efficient.
>>>> 
>>>> Of course, if you create a new value of such a CoW type, you first need an 
>>>> *atomic lookup-and-set operation* on the value pool that checks whether the 
>>>> value is already there (so a good (default) implementation of equality and 
>>>> hashability is a prerequisite), and then either uses the available value or 
>>>> adds the new value to the pool.
>>>> 
>>>> Such a value pool could even be used system-wide (some languages do this 
>>>> for Strings, Ints, and other value types). Values would be evicted when 
>>>> their reference count drops to `0`. For some values, permanent storage, or 
>>>> cache-like retention for some time even while they are not currently 
>>>> referenced, could be implemented in order to reduce heap allocations (e.g. 
>>>> Java does this for primitive-type wrapper instances used in boxing and 
>>>> unboxing).
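A much-simplified sketch of such a pool (all names invented for illustration; atomicity, eviction, and the system-wide aspect are omitted):

```swift
// Shared, interned storage for one value.
final class PoolBox<T: Hashable> {
    let value: T
    init(_ value: T) { self.value = value }
}

// A per-type "unique value pool": equal values share one box.
struct ValuePool<T: Hashable> {
    private var storage: [T: PoolBox<T>] = [:]

    // Look up an existing box for `value`, or insert a new one.
    // A real implementation would need this to be atomic, and would
    // evict boxes whose reference count drops to zero.
    mutating func intern(_ value: T) -> PoolBox<T> {
        if let existing = storage[value] { return existing }
        let box = PoolBox(value)
        storage[value] = box
        return box
    }
}

var pool = ValuePool<String>()
let a = pool.intern("swift")
let b = pool.intern("swift")
// a === b: both variables share the same boxed storage
```

Note the dependence on `Hashable`, which is exactly the equality/hashability prerequisite mentioned below.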
>>>> 
>>>> I would really love this. It would affect the ABI, so it is a (potential) 
>>>> candidate for Swift 4 Phase 1, right?
>>> 
>>> I know some Java VM implementations have attempted global uniquing of 
>>> strings, but from what I've heard, nobody has done it in a way that's worth 
>>> the performance and complexity tradeoffs.
>> 
>> To my knowledge, no language other than Swift offers this form of custom 
>> value type that can implement protocols, and the need for CoW for "big 
>> values" is apparent. So why do you think that, just because the tradeoff 
>> did not pay off in other languages (like Java) that have only a limited set 
>> of fixed value types (and a completely different memory model, with a 
>> virtual machine instead of LLVM), it would not be worth evaluating in Swift?
> 
> Strings are immutable in Java, so they are effectively value types (if not 
> for their identity, which you can't rely on anyway).

Yes, I agree. But the argument still holds.



_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution
