> On Oct 20, 2016, at 8:51 AM, Dave Abrahams <[email protected]> wrote: > > We might want to leave some room in the design for a shared atomic cache > reference to live in the buffer, FWIW. It would have to be mutable even when > the buffer was multiply-referenced
Should be no problem with an attribute on that field. Like ‘mutable' in C++. > > Sent from my moss-covered three-handled family gradunza > > On Oct 20, 2016, at 8:41 AM, Erik Eckstein <[email protected] > <mailto:[email protected]>> wrote: > >> >>> On Oct 19, 2016, at 6:36 PM, Andrew Trick via swift-dev >>> <[email protected] <mailto:[email protected]>> wrote: >>> >>>> >>>> On Oct 19, 2016, at 10:13 AM, Dave Abrahams via swift-dev >>>> <[email protected] <mailto:[email protected]>> wrote: >>>> >>>> >>>> on Tue Oct 18 2016, Erik Eckstein <swift-dev-AT-swift.org >>>> <http://swift-dev-at-swift.org/>> wrote: >>>> >>>>>> On Oct 17, 2016, at 10:21 AM, Dave Abrahams <[email protected] >>>>>> <mailto:[email protected]>> wrote: >>>>>> >>>>>> >>>>>> on Mon Oct 17 2016, Erik Eckstein <eeckstein-AT-apple.com >>>>>> <http://eeckstein-at-apple.com/>> wrote: >>>>>> >>>>> >>>>>>> On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev >>>>>>> <[email protected] <mailto:[email protected]>> wrote: >>>>>>> >>>>>>>> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org >>>>>>>> <http://swift-dev-at-swift.org/> <http://swift-dev-at-swift.org/ >>>>>>>> <http://swift-dev-at-swift.org/>>> wrote: >>>>>>>> >>>>>>>>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev >>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote: >>>>>>>>>> >>>>>>>>>> This is a proposal for representing copy-on-write buffers in >>>>>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>>>>>>>> depends on how we move forward with SIL ownership. >>>>>>>>>> <CopyOnWrite.rst> >>>>>>>>>> If you have any comments, please let me know. >>>>>>>>> >>>>>>>>> The SIL-level design seems sensible to me at a glance. At the language >>>>>>>>> level, I think it would make more sense to treat this as an attribute >>>>>>>>> on class types rather than on properties in structs using the class. I >>>>>>>>> don't think many people reuse class definitions as both shared >>>>>>>>> reference types and as value type payloads, >>>>>>>> >>>>>>>> Foundation does, or would if they could. >>>>>>>> >>>>>>>>> but beyond that, I think that making it an attribute of classes would >>>>>>>>> put us into a better position to leverage the borrow model to enforce >>>>>>>>> the "mutable-only-when-unique" aspect of COW implementations. John >>>>>>>>> alluded to this in the "SIL address types and borrowing" thread: >>>>>>>>> >>>>>>>>>> I wonder if it would make more sense to make copy-on-write buffer >>>>>>>>>> references a move-only type, so that as long as you were just >>>>>>>>>> working with the raw reference (as opposed to the CoW aggregate, >>>>>>>>>> which would remain copyable) it wouldn't get implicitly copied >>>>>>>>>> anymore. You could have mutable and immutable buffer reference >>>>>>>>>> types, both move-only, and there could be a consuming checkUnique >>>>>>>>>> operation on the immutable one that, I dunno, returned an Either of >>>>>>>>>> the mutable and immutable versions. >>>>>>>>>> >>>>>>>>>> For CoW aggregates, you'd need some @copied attribute on the field >>>>>>>>>> to make sure that the CoW attribute was still copyable. Within the >>>>>>>>>> implementation of the type, though, you would be projecting out the >>>>>>>>>> reference immediately, and thereafter you'd be certain that you were >>>>>>>>>> borrowing / moving it around as appropriate. >>>>>>>>> >>>>>>>>> If 'copy-on-write' were a trait on classes, then we could distinguish >>>>>>>>> unique and nonunique references to the class. A unique reference would >>>>>>>>> act like a move-only type to prevent accidental loss of uniqueness. >>>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>>> We can also allow a copy-on-write class to have "mutating" methods, >>>>>>>>> and only allow mutation on unique references. It seems to me like, >>>>>>>>> exploring this direction, we could also come up with a way for the >>>>>>>>> high-level value-semantics operations on the struct to statically >>>>>>>>> indicate which methods are known to leave the value's buffers in a >>>>>>>>> unique state, or which return values that are uniquely owned, which >>>>>>>>> would give the optimizer more ability to avoid uniqueness checks >>>>>>>>> across calls without relying on inlining and IPO. >>>>>>>> >>>>>>>> That's pretty cool. However, I think there's nothing to prevent any >>>>>>>> mutating method from storing a copy of self in a global, so I think >>>>>>>> we'd >>>>>>>> need some participation from the programmer (either an agreement not to >>>>>>>> do that, or an explicit claim of uniqueness on exit) in order to >>>>>>>> identify operations that create/preserve uniqueness. >>>>>>> >>>>>>> If a mutating reference (like self in a mutating method) is move-only >>>>>>> then you would not be able to “copy” it to a global. >>>>>> >>>>>> Yes, a reference to a move-only type would work for this purpose. >>>>>> >>>>>> >>>>>>>> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev >>>>>>>> <[email protected] <mailto:[email protected]>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org >>>>>>>> <http://swift-dev-at-swift.org/>> wrote: >>>>>>>> >>>>>>>>> This is a proposal for representing copy-on-write buffers in >>>>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>>>>>>> depends on how we move forward with SIL ownership. >>>>>>>>> >>>>>>>>> :orphan: >>>>>>>>> >>>>>>>>> .. highlight:: sil >>>>>>>>> >>>>>>>>> =================================== >>>>>>>>> Copy-On-Write Representation in SIL >>>>>>>>> =================================== >>>>>>>>> >>>>>>>>> .. contents:: >>>>>>>>> >>>>>>>>> Overview >>>>>>>>> ======== >>>>>>>>> >>>>>>>>> This document proposes: >>>>>>>>> >>>>>>>>> - An ownership attribute to define copy-on-write (COW) buffers in >>>>>>>>> Swift data >>>>>>>>> types. >>>>>>>>> >>>>>>>>> - A representation of COW buffers in SIL so that optimizations can >>>>>>>>> take benefit >>>>>>>>> of it. >>>>>>>>> >>>>>>>>> The basic idea is to enable the SIL optimizer to reason about COW >>>>>>>>> data types >>>>>>>>> in the same way as a programmer can do. >>>>>>>>> This means: a COW buffer can only be modified by its owning SIL >>>>>>>>> value, because >>>>>>>>> either it's uniquely referenced or the buffer is copied before >>>>>>>>> modified. >>>>>>>>> >>>>>>>>> .. note:: >>>>>>>>> In the following the term "buffer" refers to a Swift heap object. >>>>>>>>> It can be any heap object, not necessarily a “buffer” with e.g. >>>>>>>>> tail-allocated elements. >>>>>>>>> >>>>>>>>> COW Types >>>>>>>>> ========= >>>>>>>>> >>>>>>>>> The basic structure of COW data types can be simplified as follows:: >>>>>>>>> >>>>>>>>> class COWBuffer { >>>>>>>>> var someData: Int >>>>>>>>> ... >>>>>>>>> } >>>>>>>>> >>>>>>>>> struct COWType { >>>>>>>>> var b : COWBuffer >>>>>>>>> >>>>>>>>> mutating func change_it() { >>>>>>>>> if (!isUniquelyReferenced(b)) { >>>>>>>>> b = copy_buffer(b) >>>>>>>>> } >>>>>>>>> b.someData = ... >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> Currently the COW behavior of such types is just defined by their >>>>>>>>> implementation. >>>>>>>>> But there is no representation of this special behavior in the SIL. >>>>>>>>> So the SIL optimizer has no clue about it and cannot take advantage >>>>>>>>> of it. >>>>>>>>> >>>>>>>>> For example:: >>>>>>>>> >>>>>>>>> func foo(arr : [Int]) { >>>>>>>>> x = arr[0] >>>>>>>>> opaque_function() >>>>>>>>> y = arr[0] // can RLE replace this with y = x? >>>>>>>>> } >>>>>>>>> >>>>>>>>> If opaque_function() wants to change the contents of the array buffer >>>>>>>>> it first >>>>>>>>> has to copy it. >>>>>>>> >>>>>>>> ...or determine that it's uniquely-referenced. >>>>>>> >>>>>>> In this specific example, if opqaue_function holds a reference to arr’s >>>>>>> buffer, the buffer is not >>>>>>> uniquely-referenced. >>>>>> >>>>>> Right. >>>>>> >>>>>>>> >>>>>>>>> But the optimizer does not know it so it has to conservatively assume >>>>>>>>> that opaque_function() will write to the location of arr[0]. >>>>>>>>> >>>>>>>>> Copy-on-write Ownership Attribute >>>>>>>>> ================================= >>>>>>>>> >>>>>>>>> This section proposes an ownership attribute to define a >>>>>>>>> copy-on-write buffer. >>>>>>>>> >>>>>>>>> Swift Syntax >>>>>>>>> ------------ >>>>>>>>> >>>>>>>>> A COW buffer reference can be defined with a new ownership attribute >>>>>>>>> for the >>>>>>>>> buffer variable declaration (similar to “weak” and “unowned”):: >>>>>>>>> >>>>>>>>> struct COWType { >>>>>>>>> copy_on_write var b : COWBuffer >>>>>>>>> >>>>>>>>> // ... >>>>>>>>> } >>>>>>>>> >>>>>>>>> The ``copy_on_write`` attribute is purely used for optimization >>>>>>>>> purposes. >>>>>>>>> It does not change the semantics of the program. >>>>>>>> >>>>>>>> Presumably, it changes what code you can execute on `b` without >>>>>>>> invoking >>>>>>>> traps or undefined behavior. Otherwise, the optimizer wouldn't be able >>>>>>>> to do anything differently to take advantage of the annotation. >>>>>>> >>>>>>> That’s true. >>>>>>> >>>>>>>> What are the rules for writing code that uses `copy_on_write`? >>>>>>> >>>>>>> See below ("The rules for using ``copy_on_write`` and the built-ins >>>>>>> are:”) >>>>>> >>>>>> Yeah, I got there, eventually. But just saying “doesn't change >>>>>> semantics” at this point in the proposal leaves a gap, because it does >>>>>> change semantic *requirements*. You should mention that. >>>>>> >>>>>>>>> .. note:: >>>>>>>>> >>>>>>>>> “copy_on_write” is a working title. TODO: decide on the name. >>>>>>>>> Maybe it should be a @-attribute, like @copy_on_write? >>>>>>>>> Another question is if we should open this attribute for the public >>>>>>>>> or just >>>>>>>>> use it internally in the library, because violating the implied rules >>>>>>>>> (see below) could break memory safety. >>>>>>>>> >>>>>>>>> Implementation >>>>>>>>> -------------- >>>>>>>>> >>>>>>>>> The ``copy_on_write`` references can be represented in the AST as a >>>>>>>>> special >>>>>>>>> ``StorageType``, just like how ``unowned`` and ``weak`` is >>>>>>>>> represented. >>>>>>>>> The canonical type of a ``CopyOnWriteStorageType`` would be the >>>>>>>>> referenced >>>>>>>>> buffer class type. >>>>>>>>> >>>>>>>>> In SIL the buffer reference will have type:: >>>>>>>>> >>>>>>>>> $@sil_cow COWBuffer >>>>>>>>> >>>>>>>>> where ``COWBuffer`` is the type of the referenced heap object. >>>>>>>>> >>>>>>>>> Two conversion instructions are needed to convert from a ``@sil_cow`` >>>>>>>>> reference >>>>>>>>> type to a regular reference type:: >>>>>>>>> >>>>>>>>> cow_to_ref >>>>>>>>> ref_to_cow >>>>>>>>> >>>>>>>>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``. >>>>>>>>> >>>>>>>>> For example the SIL code for:: >>>>>>>>> >>>>>>>>> var c: COWType >>>>>>>>> let x = c.b.someData >>>>>>>>> >>>>>>>>> would be:: >>>>>>>>> >>>>>>>>> %1 = struct_extract %1 : COWType, #COWType.b >>>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData >>>>>>>>> %4 = load %3 : $*Int >>>>>>>>> >>>>>>>>> The ``ref_to_cow`` instruction is needed to store a new buffer >>>>>>>>> reference into a >>>>>>>>> COW type. >>>>>>>>> >>>>>>>>> COW Buffers and the Optimizer >>>>>>>>> ============================= >>>>>>>>> >>>>>>>>> A reference to a COW buffer gives the optimizer additional >>>>>>>>> information: >>>>>>>>> >>>>>>>>> *A buffer, referenced by a @sil_cow reference is considered to be >>>>>>>>> immutable >>>>>>>>> during the lifetime of the reference.* >>>>>>>> >>>>>>>> This seems like much too broad a rule to allow inplace mutations of >>>>>>>> uniquely referenced buffers. >>>>>>> >>>>>>> The point is that all mutations must be guarded by an is_unique, which >>>>>>> takes the _address_ of the buffer reference as argument. >>>>>>> And the optimizer considers this instruction as a potential write to >>>>>>> the buffer reference. >>>>>>> The effect is that the lifetime of a buffer reference (as a SIL value) >>>>>>> will not outlive a is_unique - regardless if this is inside a called >>>>>>> function or inlined. >>>>>> >>>>>> I don't see how that allows me to mutate a uniquely referenced buffer >>>>>> held >>>>>> in a @sil_cow reference, given what you wrote above. >>>>> >>>>> You would not be able to get a reference to a mutable buffer by >>>>> reading the COW type’s @sil_cow field. Instead you would only get >>>>> such a reference as a result of the is_unique instruction/builtin. Or, >>>>> of course, by creating a new buffer. >>>>> >>>>> I’m not sure if this was the question, though. >>>> >>>> I think it just comes down to precise phrasing. AFAICT, what you really >>>> mean to say is something like >>>> >>>> A buffer cannot be directly mutated through a @sil_cow reference; >>>> instead one must mutate it indirectly via the result of is_unique or >>>> start_unique. >> >> Exactly, that’s what I wanted to say. >> >>>> >>>> Saying that the buffer is “considered to be immmutable during the >>>> lifetime of the reference” could be taken to mean that the compiler will >>>> assume no mutations of the buffer can occur while the reference exists. >>>> IIUC you are not planning to formally end the reference's lifetime at >>>> the moment is_unique/start_unique returns. >>> >>> To clarify: I proposed an alternate approach in which the @sil_cow >>> reference is only mutable during the Array’s @inout scope—to be >>> automatically enforced by the compiler once @inout scopes are enforced. But >>> the text in question is not referring to that approach, so your comments >>> are on target. >> >> After thinking about Joe’s suggestion (having the cow attribute on the class >> type and make a reference to that type move-only), I’m more inclined to go >> with the isUnique builtin. If such a reference can only be returned by >> isUnique, it is really guaranteed that only a uniquely referenced buffer can >> be mutated. With the inout approach, the programmer is not forced to make >> the uniqueness check before modifying the buffer. >> >>> -Andy >>> >>>>> >>>>> Plus: we will have an explicit conversion instruction (start_unique) to >>>>> convert an immutable >>>>> reference to a mutable referece. >>>>> A SIL optimization can replace an is_unique with this instruction if it >>>>> can prove that the reference >>>>> is already unique at that point. >>>>> >>>>>> >>>>>> >>>>>>>> Unless you mean the reference is >>>>>>>> immutable, rather than the storage being referred to by it. >>>>>>>> >>>>>>>>> This means any address derived from a ``cow_to_ref`` instruction can >>>>>>>>> be >>>>>>>>> considered to point to immutable memory. >>>>>>>>> >>>>>>>>> Some examples of optimizations which will benefit from copy-on-write >>>>>>>>> representation in SIL: >>>>>>>>> >>>>>>>>> - Redundant load elimination >>>>>>>>> >>>>>>>>> RLE can assume that opaque code does not modify a COW buffer. >>>>>>>> >>>>>>>> How do you distinguish “opaque code” from “code that is meant to >>>>>>>> modify the buffer and might do so in place if it's >>>>>>>> uniquely-referenced?” >>>>>>> >>>>>>> Again, the is_unique which takes the address of the reference, will >>>>>>> guarantee that during the lifetime of a buffer there are no >>>>>>> modifications of the buffer. >>>>>> >>>>>> Again, that sounds like it rules out inplace modification of uniquely >>>>>> referenced buffers. >>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> Example:: >>>>>>>>> >>>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>>>>>>> %4 = load %3 : $*Int >>>>>>>>> %5 = apply %foo() // Cannot overwrite >>>>>>>>> memory location %3 >>>>>>>>> %6 = load %3 : $*Int // Can be replaced by %4 >>>>>>>>> >>>>>>>>> Currently we do some ad-hoc optimizations for array, based on >>>>>>>>> semantics, >>>>>>>>> like array count propagation. These hacks would not be needed >>>>>>>>> anymore. >>>>>>>> >>>>>>>> W0000000000000000000000t. >>>>>>>> >>>>>>>>> Note that it’s not required to check if a ``cow_to_ref`` reference >>>>>>>>> (or a >>>>>>>>> projected address) escapes. Even if it escapes, it will reference >>>>>>>>> immutable >>>>>>>>> memory. >>>>>>>>> >>>>>>>>> - CSE, loop hoisting >>>>>>>>> >>>>>>>>> Similar to RLE: the optimizer can assume that opaque code cannot >>>>>>>>> modify a >>>>>>>>> COW buffer >>>>>>>> >>>>>>>> Same question here as above, then. >>>>>>>>> >>>>>>>>> - ARC optimization >>>>>>>>> >>>>>>>>> Knowing that some opaque code cannot overwrite a reference in the COW >>>>>>>>> buffer >>>>>>>>> can remove retain/release pairs across such code:: >>>>>>>>> >>>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someRef >>>>>>>>> %4 = load_strong %3 : $*MyClass // Can do a load_strong >>>>>>>>> [guarantee] >>>>>>>>> %5 = apply %foo() // Cannot overwrite >>>>>>>>> someRef and dealloc the object >>>>>>>>> // Use %4 >>>>>>>>> destroy_value %4 : $MyClass >>>>>>>>> >>>>>>>>> Scoping instructions >>>>>>>>> -------------------- >>>>>>>>> >>>>>>>>> To let the optimizer reason about the immutability of the COW buffer, >>>>>>>>> it is >>>>>>>>> important to *bind* the lifetime of the buffer content to the >>>>>>>>> lifetime of the >>>>>>>>> buffer reference. For example:: >>>>>>>>> >>>>>>>>> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>>>>>>>> // load something from %b1 >>>>>>>>> %a = apply %foo(%baddr : $@sil_cow COWBuffer) >>>>>>>>> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer >>>>>>>>> reference again >>>>>>>>> // load something from %b2 >>>>>>>>> >>>>>>>>> The question is: can RLE forward the load of the buffer reference and >>>>>>>>> replace >>>>>>>>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` >>>>>>>>> modifies the >>>>>>>>> buffer. >>>>>>>>> >>>>>>>>> To enforce this restriction, the scope of any buffer modification >>>>>>>>> must be >>>>>>>>> enclosed in a pair of SIL instructions. Those instructions define the >>>>>>>>> scope >>>>>>>>> of the mutation. Both instructions take the *address* of the buffer >>>>>>>>> reference as operand and act as a potential write to the buffer >>>>>>>>> reference. >>>>>>>>> >>>>>>>>> The purpose of the scoping instructions is to strictly separate the >>>>>>>>> liferanges >>>>>>>>> of references to an immutable buffer and references to the mutable >>>>>>>>> buffer. >>>>>>>> >>>>>>>> Looks reasonable. >>>>>>>> >>>>>>>>> The following example shows why the scoping instructions >>>>>>>>> (specifically the >>>>>>>>> end-of-scope instruction) are required to prevent loop-hoisting from >>>>>>>>> interleaving mutable and immutable liferanges:: >>>>>>>>> >>>>>>>>> // there should be a begin-of-scope %baddr >>>>>>>>> %mut_b = load %baddr >>>>>>>>> store %x to %mut_b // modification of the buffer >>>>>>>>> // there should be a end-of-scope %baddr >>>>>>>>> >>>>>>>>> loop { >>>>>>>>> %b = load %baddr >>>>>>>>> %y = load %b // load from the buffer >>>>>>>>> ... >>>>>>>>> } >>>>>>>>> >>>>>>>>> If there is no end-of-scope instruction, loop hoisting could do:: >>>>>>>>> >>>>>>>>> %mut_b = load %baddr >>>>>>>>> %b = load %baddr // moved out of the loop >>>>>>>>> store %x to %mut_b >>>>>>>>> >>>>>>>>> loop { >>>>>>>>> %y = load %b >>>>>>>>> ... >>>>>>>>> } >>>>>>>>> >>>>>>>>> Now the optimizer assumes that ``%b`` references an immutable buffer, >>>>>>>>> so it could >>>>>>>>> also hoist the load:: >>>>>>>>> >>>>>>>>> %mut_b = load %baddr >>>>>>>>> %b = load %baddr >>>>>>>>> %y = load %b // Wrong! Will be overwritten by the following >>>>>>>>> store >>>>>>>>> store %x to %mut_b >>>>>>>>> >>>>>>>>> loop { >>>>>>>>> ... >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> The following sections describe two alternatives to implement the >>>>>>>>> scoping. >>>>>>>>> >>>>>>>>> Scoping Alternative 1: Explicit Built-ins >>>>>>>>> ----------------------------------------- >>>>>>>>> >>>>>>>>> SIL instructions >>>>>>>>> ^^^^^^^^^^^^^^^^ >>>>>>>>> >>>>>>>>> The existing ``is_unique`` instruction is changed to a terminator >>>>>>>>> instruction:: >>>>>>>>> >>>>>>>>> bb0: >>>>>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is >>>>>>>>> the address of the COWBuffer reference >>>>>>>>> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the >>>>>>>>> unique reference. Physically identical to "load %0” >>>>>>>>> // usually empty >>>>>>>>> br bb3(%1 : $COWBuffer) >>>>>>>>> bb2: // the false-block >>>>>>>>> // usually contains: >>>>>>>>> %2 = apply %copy_buffer >>>>>>>>> %3 = cow_to_ref %2 >>>>>>>>> store_strong %3 to %0 : $*@sil_cow COWBuffer >>>>>>>>> br bb3(%2 : $COWBuffer) >>>>>>>>> bb3(%4 : $COWBuffer): >>>>>>>>> // Modify the buffer referenced by %4 >>>>>>>>> // ... >>>>>>>>> >>>>>>>>> The end-of-scope instruction is:: >>>>>>>>> >>>>>>>>> end_unique_addr %0 : $*COWBuffer >>>>>>>>> >>>>>>>>> It is important that the references to the unique buffers (``%1``, >>>>>>>>> ``%2``) must >>>>>>>>> not outlive ``end_unique_addr``. In most cases this can be check by >>>>>>>>> the SIL >>>>>>>>> verifier. >>>>>>>>> >>>>>>>>> The two instructions must be paired properly but not necessarily in >>>>>>>>> the >>>>>>>>> same function. >>>>>>>>> >>>>>>>>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is >>>>>>>>> to >>>>>>>>> separate the lifetimes of mutable and immutable accesses to the COW >>>>>>>>> buffer. >>>>>>>>> Both instructions take an address to the COW buffer reference and are >>>>>>>>> considered as potential stores to the reference. >>>>>>>>> This makes sure that the SIL optimizer cannot mix-up buffer reference >>>>>>>>> lifetimes >>>>>>>>> across these instructions. >>>>>>>>> For example, RLE cannot combine two buffer loads which are >>>>>>>>> interleaved with >>>>>>>>> a ``is_unique_addr_br``:: >>>>>>>>> >>>>>>>>> %1 = load_strong %0 : $*@sil_cow COWBuffer >>>>>>>>> // do something with %1 >>>>>>>>> … >>>>>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer >>>>>>>>> … >>>>>>>>> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace >>>>>>>>> this with %1 >>>>>>>>> >>>>>>>>> Another important thing is that the COW buffer can only be mutated by >>>>>>>>> using the >>>>>>>>> reference of the ``is_unique_addr_br`` true-block argument. >>>>>>>>> The COW buffer cannot be modified by simply loading/extracting the >>>>>>>>> reference >>>>>>>>> from the COWType. >>>>>>>>> Example:: >>>>>>>>> >>>>>>>>> %1 = load_strong %0 : $*COWBuffer >>>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>>>>>>> store %7 : $Int to %3 : $*Int // Violation! >>>>>>>>> >>>>>>>>> Most obvious violations to this constraint can be catched by the >>>>>>>>> SILVerifier. >>>>>>>>> >>>>>>>>> The ``_addr`` variants of the instructions also have a non-addr >>>>>>>>> counterpart:: >>>>>>>>> >>>>>>>>> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces >>>>>>>>> the true-block arg as owned >>>>>>>>> >>>>>>>>> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as >>>>>>>>> owned >>>>>>>>> >>>>>>>>> These instructions are generated by Mem2reg (or a similar >>>>>>>>> optimization) >>>>>>>>> in case the COW value is stored (in a temporary alloc_stack location) >>>>>>>>> just for the sake of passing an address to ``is_unique_addr_br`` and >>>>>>>>> ``end_unique_addr``. >>>>>>>>> For example in the following code, where the COW data is passed >>>>>>>>> as-value and >>>>>>>>> all the mutating functions are inlined:: >>>>>>>>> >>>>>>>>> func foo(arr : [Int], x: Int) { >>>>>>>>> arr[0] = 27 >>>>>>>>> … >>>>>>>>> y = arr[x] >>>>>>>>> … >>>>>>>>> } >>>>>>>>> >>>>>>>>> Finally it’s probably a good idea to add an instruction for >>>>>>>>> converting an >>>>>>>>> immutable reference to a mutable reference:: >>>>>>>>> >>>>>>>>> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : >>>>>>>>> $COWBuffer as owned >>>>>>>>> >>>>>>>>> which is basically just a simpler representation of the following >>>>>>>>> pattern:: >>>>>>>>> >>>>>>>>> bb0: >>>>>>>>> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2 >>>>>>>>> bb1(%1 : $COWBuffer): >>>>>>>>> … // main control flow continues here >>>>>>>>> bb2: >>>>>>>>> unreachable >>>>>>>>> >>>>>>>>> An optimizations, which eliminate uniqueness checks, would replace a >>>>>>>>> ``is_unique_br`` by a ``start_unique``. >>>>>>>>> >>>>>>>>> Built-ins >>>>>>>>> ^^^^^^^^^ >>>>>>>>> >>>>>>>>> A COW type implementor can generate the new instructions by using a >>>>>>>>> set of built-ins:: >>>>>>>>> >>>>>>>>> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType? >>>>>>>>> func endUnique<BufferType>(_ buffer: inout BufferType) >>>>>>>>> >>>>>>>>> For example:: >>>>>>>>> >>>>>>>>> struct COWType { >>>>>>>>> copy_on_write var b : COWBuffer >>>>>>>>> >>>>>>>>> mutating func makeMutable() -> COWBuffer { >>>>>>>>> if let uniqueBuffer = isUnique(&self.b) { >>>>>>>>> return uniqueBuffer >>>>>>>>> } >>>>>>>>> let copiedBuffer = copyBuffer(self.b) >>>>>>>>> self.b = copiedBuffer >>>>>>>>> return copiedBuffer >>>>>>>>> } >>>>>>>>> >>>>>>>>> mutating func setSomeData(x: Int) { >>>>>>>>> let uniqueBuffer = makeMutable() >>>>>>>>> uniqueBuffer.someData = x >>>>>>>>> endUnique(&self.b) >>>>>>>>> } >>>>>>>>> } >>>>>>>> >>>>>>>> This seems reasonable, but it also looks like the compiler could do the >>>>>>>> `endUnique` dance for us based, e.g., on the mutability of methods. >>>>>>> >>>>>>> I agree, that would be ideal, e.g. the compiler could insert the >>>>>>> endUnique at the end of an inout >>>>>>> scope. >>>>>>> >>>>>>>> >>>>>>>>> The ``isUnique`` built-in returns an optional unique buffer reference. >>>>>>>>> Physically this is the COW buffer which is passed as the inout >>>>>>>>> argument. >>>>>>>>> The result is nil if the buffer is not uniquely referenced. >>>>>>>>> In this case usually the original buffer is copied and the reference >>>>>>>>> to the >>>>>>>>> copy is written back to the original buffer reference location >>>>>>>>> (``self.b = copiedBuffer``). >>>>>>>>> Starting at the point of the write-back, the reference to the copy >>>>>>>>> also becomes >>>>>>>>> a unique buffer reference. >>>>>>>>> >>>>>>>>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` >>>>>>>>> pattern which >>>>>>>>> constructs the Optional in the successor blocks. Using ``isUnique`` >>>>>>>>> in an >>>>>>>>> if-let (as shown above) will end up in two consecutive CFG "diamonds". >>>>>>>>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` >>>>>>>>> diamond. >>>>>>>>> >>>>>>>>> .. note:: >>>>>>>>> This makes the definition of the unique buffer location lifetime a >>>>>>>>> little bit >>>>>>>>> problematic, because the false-branch of ``isUnique`` is not >>>>>>>>> equivalent to >>>>>>>>> the false-branch of the ``is_unique_addr_br`` instruction (before >>>>>>>>> SimplifyCFG >>>>>>>>> can do its job). >>>>>>>> >>>>>>>> I don't know what the implications of these diamonds and the problem >>>>>>>> described above might be, FWIW. >>>>>>>> >>>>>>>>> The rules for using ``copy_on_write`` and the built-ins are: >>>>>>>>> >>>>>>>>> 1. ``isUnique`` must be paired with ``endUnique``, but not >>>>>>>>> necessarily in the >>>>>>>>> same function. >>>>>>>>> >>>>>>>>> 2. The COW buffer may only be mutated by using the unique buffer >>>>>>>>> reference. >>>>>>>>> >>>>>>>>> 3. The COW buffer must not be mutated outside the ``isUnique`` - >>>>>>>>> ``endUnique`` >>>>>>>>> pair. >>>>>>>>> >>>>>>>>> 4. During the lifetime of the unique buffer reference, the original >>>>>>>>> COW buffer >>>>>>>>> reference must not be used in any way, e.g. for reading from the >>>>>>>>> buffer. >>>>>>>>> >>>>>>>>> Note that the lifetime of the unique buffer reference does not >>>>>>>>> include the >>>>>>>>> part between the begin of the ``isUnique`` false-branch and the >>>>>>>>> write-back >>>>>>>>> of the copy. This means is okay to read from the buffer (using >>>>>>>>> ``self.b``) >>>>>>>>> for the purpose of copying. >>>>>>>>> >>>>>>>>> Examples:: >>>>>>>>> >>>>>>>>> mutating func setSomeData(x: Int) { >>>>>>>>> let uniqueBuffer = makeMutable() >>>>>>>>> uniqueBuffer.someData = x >>>>>>>>> // violates rule 1 >>>>>>>>> } >>>>>>>>> >>>>>>>>> mutating func setSomeData(x: Int) { >>>>>>>>> makeMutable() >>>>>>>>> self.b.someData = x // violates rule 2 >>>>>>>>> endUnique(&self.b) >>>>>>>>> } >>>>>>>>> >>>>>>>>> mutating func setSomeData(x: Int) { >>>>>>>>> let uniqueBuffer = makeMutable() >>>>>>>>> uniqueBuffer.someData = x >>>>>>>>> endUnique(&self.b) >>>>>>>>> uniqueBuffer.someData = 27 // violates rule 3 >>>>>>>>> } >>>>>>>>> >>>>>>>>> mutating func incrementSomeData() { >>>>>>>>> let uniqueBuffer = makeMutable() >>>>>>>>> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4 >>>>>>>>> endUnique(&self.b) >>>>>>>>> } >>>>>>>> >>>>>>>> It would be instructive to write down the *correct* code for these >>>>>>>> operations. >>>>>>> >>>>>>> added to my todo list. >>>>>>> >>>>>>>> >>>>>>>>> The intention of the rules is to ensure that there is no overlap of a >>>>>>>>> "read-only" life-range with a "mutable" life-range of the buffer >>>>>>>>> reference. >>>>>>>>> It’s the responsibility of the implementor to follow the rules. >>>>>>>>> But the compiler (a mandatory diagnostics pass and the SIL verifier) >>>>>>>>> can >>>>>>>>> statically detect rule violations in obvious cases (with >>>>>>>>> inter-procedural >>>>>>>>> analysis maybe even in most cases). >>>>>>>>> >>>>>>>>> This approach would require to change some of the internals of our >>>>>>>>> current COW data structures in the stdlib (Array, Dictionary, etc.). >>>>>>>>> For example, the Array make_mutable semantic functions currently do >>>>>>>>> not return >>>>>>>>> the unique buffer. >>>>>>>> >>>>>>>> No big deal. >>>>>>>> >>>>>>>>> Scoping Alternative 2: Implicit Inout Scopes >>>>>>>>> -------------------------------------------- >>>>>>>>> >>>>>>>>> There is an idea (proposal?) to change the representation of inout >>>>>>>>> variables >>>>>>>>> in SIL. This is independent of this proposal, but can be helpful for >>>>>>>>> the >>>>>>>>> purpose of defining the scope of a COW mutation. >>>>>>>>> >>>>>>>>> The basic idea is that SILGen inserts scoping instructions for *all* >>>>>>>>> inout >>>>>>>>> variables. And those scoping instructions can be used to define the >>>>>>>>> mutating >>>>>>>>> scope of a COW buffer. >>>>>>>>> >>>>>>>>> The scoping instructions which are inserted by SILGen for an inout >>>>>>>>> scope are:: >>>>>>>>> >>>>>>>>> begin_exclusive >>>>>>>>> end_exclusive >>>>>>>>> >>>>>>>>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those >>>>>>>>> instructions take the >>>>>>>>> address of the inout variable as argument. For the optimizer those >>>>>>>>> instructions >>>>>>>>> look like potential writes to the inout variable. >>>>>>>>> >>>>>>>>> The implementor of a COW type has to follow the rule that the COW >>>>>>>>> buffer may >>>>>>>>> only be modified in mutating functions of the COW type. But this is >>>>>>>>> the case >>>>>>>>> anyway because any modification needs a uniqueness check and this can >>>>>>>>> only be >>>>>>>>> done in mutating functions. >>>>>>>>> >>>>>>>>> Example:: >>>>>>>>> >>>>>>>>> // > mutating func setSomeData(x: Int) { >>>>>>>>> // Accepts a unique reference to the array value (avoiding refcount >>>>>>>>> operations) >>>>>>>>> sil @setSomeData : $(Int, @inout Array) -> () { >>>>>>>>> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0 >>>>>>>>> >>>>>>>>> // > makeMutable() (inlined) >>>>>>>>> // Forward the unique reference to the `self` array value, still >>>>>>>>> avoiding refcount operations. >>>>>>>>> // Begin the inlined exclusive scope (could be trivially removed). >>>>>>>>> begin_exclusive %arrayref : $*Array<T> // Begin scope #1 >>>>>>>>> >>>>>>>>> // > if !isUnique(&self._storage) { >>>>>>>>> // Extract a unique inout reference to the class reference to the >>>>>>>>> array storage. >>>>>>>>> // This begins the isUnique() argument's exclusive scope. The memory >>>>>>>>> is already exclusive >>>>>>>>> // but the scope helps ensure this is the only alias to _storage. >>>>>>>>> %arrayref._storageref = struct_element_addr [exclusive] %arrayref, >>>>>>>>> #Array._storage >>>>>>>>> >>>>>>>>> // Uniqueness checking requires an inout reference to the class >>>>>>>>> reference. >>>>>>>>> // The is_unique instruction does not need to create a new storage >>>>>>>>> reference. >>>>>>>>> // It's only purpose is to check the RC count, ensure that the >>>>>>>>> checked reference >>>>>>>>> // is inout, and prevent the inout scope from being optimized away. >>>>>>>>> %isuniq = is_unique %arrayref._storageref : $*@sil_cow >>>>>>>>> ArrayStorage<T> >>>>>>>>> >>>>>>>>> // End the isUnique argument's exclusive scope (can also be >>>>>>>>> trivially removed). >>>>>>>>> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>>>>>>>> >>>>>>>>> br %isuniq, bb_continue, bb_slow >>>>>>>>> >>>>>>>>> bb_slow: >>>>>>>>> // > self._storage = copyBuffer(self._storage) >>>>>>>>> // Produce a new class reference to storage with verifiably unique >>>>>>>>> RC semantics. >>>>>>>>> %copied_storage_class = alloc_ref ... >>>>>>>>> // A begin/end exclusive scope is implicit in store [assign]. >>>>>>>>> store [assign] %copied_storage_class to %arrayref._storageref >>>>>>>>> br bb_continue >>>>>>>>> >>>>>>>>> bb_continue: >>>>>>>>> >>>>>>>>> // This marks the end of makeMutable's inout `self` scope. Because >>>>>>>>> Array >>>>>>>>> // contains a "copy_on_write" property, the SIL verifier needs to >>>>>>>>> // prove that %arrayref.#_storage has not escaped at this point. This >>>>>>>>> // is equivalent to checking that %arrayref itself is not copied, and >>>>>>>>> // checking each projection of the "copy_on_write" storage property >>>>>>>>> // (%arrayref._storageref) is not copied. Or, if any copies are >>>>>>>>> present, >>>>>>>>> // they must be consumed within this scope. >>>>>>>>> end_exclusive %arrayref : $*Array<T> // End scope #1 >>>>>>>>> >>>>>>>>> // > self._storage.someData = x >>>>>>>>> // An _addr instruction with one load/store use doesn't really need >>>>>>>>> its own scope. >>>>>>>>> %arrayref._storageref = struct_element_addr %arrayref, >>>>>>>>> #Array._storage >>>>>>>>> >>>>>>>>> // ARC optimization can promote this to a borrow, replacing >>>>>>>>> strong_release with end_borrow. >>>>>>>>> %arrayref.cow_storage = load [copy] %arrayref._storageref : >>>>>>>>> $*@sil_cow ArrayStorage >>>>>>>>> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow >>>>>>>>> ArrayStorage >>>>>>>>> >>>>>>>>> // Write some data into the CoW buffer. >>>>>>>>> // (For simplicity, pretend ArrayStorage has a "someData" field). >>>>>>>>> // A single-use _addr instruction, so no scope. >>>>>>>>> %somedata_addr = ref_element_addr %arrayref._storage, #someData >>>>>>>>> // A store with an implicit [exclusive] scope. >>>>>>>>> store [assign] %x to %somedata_addr >>>>>>>>> >>>>>>>>> strong_release %arrayref._storage : $*ArrayStorage<T> >>>>>>>>> >>>>>>>>> // End the isUnique argument's exclusive scope. >>>>>>>>> // The same verification is needed here, but the inner scope would >>>>>>>>> be eliminated. >>>>>>>>> end_exclusive %arrayref : $*Array<T> // End scope #0 >>>>>>>>> >>>>>>>>> In general this approach looks more "user-friendly" than the first >>>>>>>>> alternative. >>>>>>>> >>>>>>>> Well, I can't really tell, because you haven't shown the Swift code >>>>>>>> that >>>>>>>> generates this SIL. >>>>>>>> >>>>>>>>> But it depends on implementing the general feature to insert the inout >>>>>>>>> scoping instructions. Also, we still have to think through all the >>>>>>>>> details of this approach. >>>>>>>> >>>>>>>> FWIW, I am convinced we will need (and get) a stricter inout model that >>>>>>>> would be conducive to inserting the scoping instructions. >>>>>>>> >>>>>>>> >>>>>>>>> Dependency between a buffer reference to the scope-begin >>>>>>>>> -------------------------------------------------------- >>>>>>>> >>>>>>>> You can only have a dependency between two things, but as phrased “a >>>>>>>> buffer reference to the scope-begin” sounds like one thing. s/to/and/ >>>>>>>> would fix it. >>>>>>>> >>>>>>>>> With both alternatives there is no explicit dependency from a buffer >>>>>>>>> reference >>>>>>>>> to a scope-begin instruction:: >>>>>>>>> >>>>>>>>> %b_cow = load %baddr >>>>>>>>> %b = cow_to_ref %b_cow >>>>>>>>> %x = load %b // No dependency between this... >>>>>>>>> ... >>>>>>>>> begin_exclusive %baddr // ... and this instruction. >>>>>>>>> ... >>>>>>>>> >>>>>>>>> So in theory the optimizer is free to reschedule the instructions:: >>>>>>>>> >>>>>>>>> %b_cow = load %baddr >>>>>>>>> %b = cow_to_ref %b_cow >>>>>>>>> ... >>>>>>>>> begin_exclusive %baddr >>>>>>>>> %x = load %b // Wrong! Buffer could be modified here >>>>>>>>> ... >>>>>>>>> >>>>>>>>> We still have to figure out how to cope with this. >>>>>>>>> >>>>>>>>> - We could add an end-of-lifetime instruction for a COW buffer >>>>>>>>> reference, which >>>>>>>>> the optimizer may not move over a begin-of-scope instruction. >>>>>>>>> >>>>>>>>> - Or we just define the implicit rule for the optimizer that any use >>>>>>>>> of a COW >>>>>>>>> reference may not be moved over a begin-of-scope instruction. >>>>>>>>> >>>>>>>>> Preconditions >>>>>>>>> ============= >>>>>>>>> >>>>>>>>> To benefit from COW optimizations in the stdlib Array, Set and >>>>>>>>> Dictionary data >>>>>>>>> structures we first need eager bridging, meaning getting rid of the >>>>>>>>> bridged >>>>>>>>> buffer. >>>>>>>> >>>>>>>> As you know I'm very much in favor of eager bridging, but I don't see >>>>>>>> why this would be dependent on it. >>>>>>> >>>>>>> We could use copy_on_write with eager bridging, but I don’t think it >>>>>>> will give any benefits to the >>>>>>> optimizer. >>>>>>> For example, the SIL code to get from an Array to a native >>>>>>> ContiguousArrayStorage reference is pretty hard to understand for the >>>>>>> optimizer (involves low level bit operations, etc.). >>>>>> >>>>>> It wouldn't need to do low-level bit operations if our enums were >>>>>> capable/controllable enough. I'm just saying, there's no reason we >>>>>> couldn't give the optimizer something to work with that has higher level >>>>>> semantics than what we currently do. >>>>>> >>>>>>>>> At least for Array this is implemented as low-level bit operations and >>>>>>>>> optimizations cannot reason about it (e.g. finding a reasonable >>>>>>>>> RC-root for the buffer reference). >>>>>>>>> >>>>>>>>> Another thing is that we currently cannot use ``copy_on_write`` for >>>>>>>>> Array >>>>>>>>> because of pinning. Array pins it’s buffer when passing an element >>>>>>>>> address to >>>>>>>>> an inout parameter. This allows the array buffer to be modified even >>>>>>>>> if its >>>>>>>>> reference count is > 1. With ``copy_on_write``, a programmer could >>>>>>>>> break memory >>>>>>>>> safety when violating the inout rule. Example:: >>>>>>>>> >>>>>>>>> var arr = [MyClass()] // a global array >>>>>>>>> >>>>>>>>> foo(&arr[0]) // Pins the buffer of arr during the call >>>>>>>>> >>>>>>>>> func foo(_ x: inout MyClass) -> Int { >>>>>>>>> let b = arr // The ref-count of the buffer is not >>>>>>>>> incremented, because it is pinned! >>>>>>>>> let r = b[0] // optimizer removes the retain of r because it >>>>>>>>> thinks the following code cannot modify b >>>>>>>>> arr.removeAll() // does not copy the array buffer and thus >>>>>>>>> de-allocates r >>>>>>>>> return r.i // use-after-free! >>>>>>>>> } >>>>>>>> >>>>>>>> I only know of one way to resolve inout and pinning: >>>>>>>> >>>>>>>> * Semantically, references are replaced with a trap value when entering >>>>>>>> an inout context so that all inout values are provably unique >>>>>>>> references in the absence of unsafe code. We drop pinning and provide >>>>>>>> explicit operations that provide simultaneous lvalue accesses to >>>>>>>> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices. >>>>>>>> >>>>>>>> If there are other ideas out there, I'd like to hear them. If not, we >>>>>>>> should probably decide that this is what we're doing so that we can >>>>>>>> move >>>>>>>> forward without this looming uncertainty. >>>>>>>> >>>>>>>> -- >>>>>>>> -Dave >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> swift-dev mailing list >>>>>>>> [email protected] <mailto:[email protected]> >>>>>>>> https://lists.swift.org/mailman/listinfo/swift-dev >>>>>>>> <https://lists.swift.org/mailman/listinfo/swift-dev> >>>>>>> >>>>>> >>>>>> -- >>>>>> -Dave >>>>> >>>>> _______________________________________________ >>>>> swift-dev mailing list >>>>> [email protected] <mailto:[email protected]> >>>>> https://lists.swift.org/mailman/listinfo/swift-dev >>>>> <https://lists.swift.org/mailman/listinfo/swift-dev> >>>> >>>> -- >>>> -Dave >>>> >>>> _______________________________________________ >>>> swift-dev mailing list >>>> [email protected] <mailto:[email protected]> >>>> https://lists.swift.org/mailman/listinfo/swift-dev >>>> <https://lists.swift.org/mailman/listinfo/swift-dev> >>> _______________________________________________ >>> swift-dev mailing list >>> [email protected] <mailto:[email protected]> >>> https://lists.swift.org/mailman/listinfo/swift-dev >>> <https://lists.swift.org/mailman/listinfo/swift-dev>
_______________________________________________ swift-dev mailing list [email protected] https://lists.swift.org/mailman/listinfo/swift-dev
