Re: [swift-dev] copy-on-write proposal

Erik Eckstein via swift-dev Wed, 12 Oct 2016 09:48:07 -0700

Thanks for the feedback! Here is an updated version of the proposal: 
https://github.com/eeckstein/swift/blob/cow-proposal/docs/proposals/CopyOnWrite.rst
 
<https://github.com/eeckstein/swift/blob/cow-proposal/docs/proposals/CopyOnWrite.rst>
(you can look at the history to see the changes compared to the previous 
version)


> On Oct 11, 2016, at 7:57 PM, Michael Gottesman <[email protected]> wrote:
> 
>> ----
>> 
>> :orphan:
>> 
>> .. highlight:: sil
>> 
>> ===================================
>> Copy-On-Write Representation in SIL
>> ===================================
>> 
>> .. contents::
>> 
>> Overview
>> ========
>> 
>> This document proposes:
>> 
>> - An ownership attribute to define copy-on-write (COW) buffers in Swift data
>>  types.
>> 
>> - A representation of COW buffers in SIL so that optimizations can take 
>> benefit
>>  of it.
>> 
>> The basic idea is to enable the SIL optimizer to reason about COW data types
>> in the same way as a programmer can do.
> 
> in the same way as a programmer can do => just as a programmer can.
> 
>> 
>> 
>> This means: a COW buffer can only be modified by its owning SIL value, 
>> because
>> either it's uniquely referenced or the buffer is copied before modified.
> 
> modified => modification.
> 
>> 
>> .. note::
>>    In the following the term "buffer" refers to a Swift heap object.
>>    It can be any heap object, not necessarily a “buffer” with e.g. 
>> tail-allocated elements.
>> 
>> COW Types
>> =========
>> 
>> The basic structure of COW data types can be simplified as follows::
>> 
>>    class COWBuffer {
>>      var someData: Int
>>      ...
>>    }
>> 
>>    struct COWType {
>>      var b : COWBuffer
>> 
>>      mutating func change_it() {
>>        if (!isUniquelyReferenced(b)) {
>>          b = copy_buffer(b)
>>        }
>>        b.someData = ...
>>      }
>>    }
>> 
>> Currently the COW behavior of such types is just defined by their 
>> implementation.
>> But there is no representation of this special behavior in the SIL.
>> So the SIL optimizer has no clue about it and cannot take advantage of it.
>> 
>> For example::
>> 
>>    func foo(arr : [Int]) {
>>      x = arr[0]
>>      opaque_function()
>>      y = arr[0] // can RLE replace this with y = x?
>>    }
>> 
>> If opaque_function() wants to change the contents of the array buffer it 
>> first
>> has to copy it. But the optimizer does not know it so it has to 
>> conservatively
>> assume that opaque_function() will write to the location of arr[0].
>> 
>> Copy-on-write Ownership Attribute
>> =================================
>> 
>> This section proposes an ownership attribute to define a copy-on-write 
>> buffer.
>> 
>> Swift Syntax
>> ------------
>> 
>> A COW buffer reference can be defined with a new ownership attribute for the
>> buffer variable declaration (similar to “weak” and “unowned”)::
>> 
>>    struct COWType {
>>      copy_on_write var b : COWBuffer
>> 
>>      // ...
>>    }
>> 
>> The ``copy_on_write`` attribute is purely used for optimization purposes.
>> It does not change the semantics of the program.
>> 
>> .. note::
>> 
>>  “copy_on_write” is a  working title. TODO: decide on the name.
>>  Maybe it should be a @-attribute, like @copy_on_write?
>>  Another question is if we should open this attribute for the public or just
>>  use it internally in the library, because violating the implied rules
>>  (see below) could break memory safety.
>> 
>> Implementation
>> --------------
>> 
>> The ``copy_on_write`` references can be represented in the AST as a special
> 
> "The" is not needed here I think.
> 
>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented.
> 
> is represented => are represented.
> 
>> The canonical type of a ``CopyOnWriteStorageType`` would be the referenced
>> buffer class type.
>> 
>> In SIL the buffer reference will have type::
>> 
>>    $@sil_cow COWBuffer
>> 
>> where ``COWBuffer`` is the type of the referenced heap object.
>> 
>> Two conversion instructions are needed to convert from a ``@sil_cow`` 
>> reference
>> type to a regular reference type::
>> 
>>    cow_to_ref
>>    ref_to_cow
>> 
>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``.
>> 
>> For example the SIL code for::
>> 
>>    var c: COWType
>>    let x = c.b.someData
>> 
>> would be::
>> 
>>    %1 = struct_extract %1 : COWType, #COWType.b
>>    %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>    %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData
>>    %4 = load %3 : $*Int
>> 
>> The ``ref_to_cow`` instruction is needed to store a new buffer reference 
>> into a
>> COW type.
>> 
>> COW Buffers and the Optimizer
>> =============================
>> 
>> A reference to a COW buffer gives the optimizer additional information:
>> 
>>  *A buffer, referenced by a @sil_cow reference is considered to be immutable
>>  during the lifetime of the reference.*
>> 
>> This means any address derived from a ``cow_to_ref`` instruction can be
>> considered to point to immutable memory.
>> 
>> Some examples of optimizations which will benefit from copy-on-write
>> representation in SIL:
>> 
>> - Redundant load elimination
>> 
>>  RLE can assume that opaque code does not modify a COW buffer.
>>  Example::
>> 
>>      %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>      %3 = ref_element_addr %2 : $COWBuffer, #someData
>>      %4 = load %3 : $*Int
>>      %5 = apply %foo()                        // Cannot overwrite memory 
>> location %3
>>      %6 = load %3 : $*Int                     // Can be replaced by %4
>> 
>>  Currently we do some ad-hoc optimizations for array, based on semantics,
>>  like array count propagation. These hacks would not be needed anymore.
>> 
>>  Note that it’s not required to check if a ``cow_to_ref`` reference (or a
>>  projected address) escapes. Even if it escapes, it will reference immutable
>>  memory.
>> 
>> - CSE, loop hoisting
>> 
>>  Similar to RLE: the optimizer can assume that opaque code cannot modify a
>>  COW buffer
>> 
>> - ARC optimization
>> 
>>  Knowing that some opaque code cannot overwrite a reference in the COW buffer
>>  can remove retain/release pairs across such code::
>> 
>>      %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>      %3 = ref_element_addr %2 : $COWBuffer, #someRef
>>      %4 = load_strong %3 : $*MyClass          // Can do a load_strong 
>> [guarantee]
>>      %5 = apply %foo()                        // Cannot overwrite someRef 
>> and dealloc the object
>>      // Use %4
>>      destroy_value %4 : $MyClass
>> 
>> Scoping instructions
>> --------------------
>> 
>> To let the optimizer reason about the immutability of the COW buffer, it is
>> important to *bind* the lifetime of the buffer content to the lifetime of the
>> buffer reference. For example::
>> 
>>    %b1 = load %baddr : $@sil_cow COWBuffer  // load the buffer reference
>>    // load something from %b1
>>    %a = apply %foo(%baddr : $@sil_cow COWBuffer)
>>    %b2 = load %baddr : $@sil_cow COWBuffer  // load the buffer reference 
>> again
>>    // load something from %b2
>> 
>> The question is: can RLE forward the load of the buffer reference and replace
>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` modifies the
>> buffer.
>> 
>> To enforce this restriction, the scope of any buffer modification must be
>> enclosed in a pair of SIL instructions. Those instructions define the scope
>> of the mutation. Both instructions take the *address* of the buffer
>> reference as operand and act as a potential write to the buffer reference. 
>> 
>> The purpose of the scoping instructions is to strictly separate the 
>> liferanges
>> of references to an immutable buffer and references to the mutable buffer.
>> 
>> The following example shows why the scoping instructions (specifically the
>> end-of-scope instruction) are required to prevent loop-hoisting from
>> interleaving mutable and immutable liferanges::
>> 
>>    // there should be a begin-of-scope %baddr
>>    %mut_b = load %baddr
>>    store %x to %mut_b    // modification of the buffer
>>    // there should be a end-of-scope %baddr
>> 
>>    loop {
>>      %b = load %baddr
>>      %y = load %b        // load from the buffer
>>      ...
>>    }
>> 
>> If there is no end-of-scope instruction, loop hoisting could do::
>> 
>>    %mut_b = load %baddr
>>    %b = load %baddr        // moved out of the loop
>>    store %x to %mut_b
>> 
>>    loop {
>>      %y = load %b
>>      ...
>>    }
>> 
>> Now the optimizer assumes that ``%b`` references an immutable buffer, so it 
>> could
>> also hoist the load::
>> 
>>    %mut_b = load %baddr
>>    %b = load %baddr
>>    %y = load %b          // Wrong! Will be overwritten by the following store
>>    store %x to %mut_b
>> 
>>    loop {
>>      ...
>>    }
>> 
>> 
>> The following sections describe two alternatives to implement the scoping.
>> 
>> Scoping Alternative 1: Explicit Built-ins
>> -----------------------------------------
>> 
>> SIL instructions
>> ^^^^^^^^^^^^^^^^
>> 
>> The existing ``is_unique`` instruction is changed to a terminator 
>> instruction::
>> 
>>    bb0:
>>      is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2  // %0 is the 
>> address of the COWBuffer reference
>>    bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique 
>> reference. Physically identical to "load %0”
>>      // usually empty
>>      br bb3(%1 : $COWBuffer)
>>    bb2:                  // the false-block
>>      // usually contains:
>>      %2 = apply %copy_buffer
>>      %3 = cow_to_ref %2
>>      store_strong %3 to %0 : $*@sil_cow COWBuffer
>>      br bb3(%2 : $COWBuffer)
>>    bb3(%4 : $COWBuffer):
>>      // Modify the buffer referenced by %4
>>      // ...
>> 
>> The end-of-scope instruction is::
>> 
>>    end_unique_addr %0 : $*COWBuffer
>> 
>> It is important that the references to the unique buffers (``%1``, ``%2``) 
>> must
>> not outlive ``end_unique_addr``. In most cases this can be check by the SIL
>> verifier.
>> 
>> The two instructions must be paired properly but not necessarily in the
>> same function.
>> 
> 
> Would this require a new form of function signature?

No

> 
>> 
>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to
>> separate the lifetimes of mutable and immutable accesses to the COW buffer.
>> Both instructions take an address to the COW buffer reference and are
>> considered as potential stores to the reference.
>> This makes sure that the SIL optimizer cannot mix-up buffer reference 
>> lifetimes
>> across these instructions.
>> For example, RLE cannot combine two buffer loads which are interleaved with
>> a ``is_unique_addr_br``::
>> 
>>    %1 = load_strong %0 : $*@sil_cow COWBuffer
>>    // do something with %1
>>    …
>>    is_unique_addr_br %0 : $*@sil_cow COWBuffer
>>    …
>>    %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this 
>> with %1
> 
> Can you make this a complete example.

I think in this short version it’s easier to understand. But I added a comment 
beside the is_unique_addr_br for clarification.

> 
>> 
>> Another important thing is that the COW buffer can only be mutated by using 
>> the
>> reference of the ``is_unique_addr_br`` true-block argument.
> 
> On the face this claim is incorrect. I think it is important to be clear here 
> that you mean without any further copying done. It seems to contradict your 
> example above of doing mutable things in bb3.
> 

You are right. I rewrote this sentence.

>> The COW buffer cannot be modified by simply loading/extracting the reference
>> from the COWType.
>> Example::
>> 
>>  %1 = load_strong %0 : $*COWBuffer
>>  %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>  %3 = ref_element_addr %2 : $COWBuffer, #someData
>>  store %7 : $Int to %3 : $*Int            // Violation!
>> 
>> Most obvious violations to this constraint can be catched by the SILVerifier.
>> 
>> The ``_addr`` variants of the instructions also have a non-addr counterpart::
>> 
>>    is_unique_br %0 : $COWBuffer, bb1, bb2.  // consumes %0 and produces the 
>> true-block arg as owned
>> 
>>    %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned
>> 
>> These instructions are generated by Mem2reg (or a similar optimization)
>> in case the COW value is stored (in a temporary alloc_stack location)
>> just for the sake of passing an address to ``is_unique_addr_br`` and
>> ``end_unique_addr``.
>> For example in the following code, where the COW data is passed as-value and
>> all the mutating functions are inlined::
>> 
>>    func foo(arr : [Int], x: Int) {
>>      arr[0] = 27
>>      …
>>      y = arr[x]
>>      …
>>    }
>> 
>> Finally it’s probably a good idea to add an instruction for converting an
>> immutable reference to a mutable reference::
>> 
>>    %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : 
>> $COWBuffer as owned
>> 
>> which is basically just a simpler representation of the following pattern::
>> 
>>    bb0:
>>      is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2
>>    bb1(%1 : $COWBuffer):
>>      … // main control flow continues here
>>    bb2:
>>      unreachable
>> 
>> An optimizations, which eliminate uniqueness checks, would replace a
>> ``is_unique_br`` by a ``start_unique``.
> 
> Question, I imagine that make_mutable is a very common case. Why not just 
> provide such an instruction?

What should a make_mutable instruction do? Or are you just suggesting to rename 
start_unique to make_mutable?

> 
>> 
>> Built-ins
>> ^^^^^^^^^
>> 
>> A COW type implementor can generate the new instructions by using a set of 
>> built-ins::
>> 
>>    func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType?
>>    func endUnique<BufferType>(_ buffer: inout BufferType)  
>> 
>> For example::
>> 
>>    struct COWType {
>>      copy_on_write var b : COWBuffer
>> 
>>      mutating func makeMutable() -> COWBuffer {
>>        if let uniqueBuffer = isUnique(&self.b) {
>>          return uniqueBuffer
>>        }
>>        let copiedBuffer = copyBuffer(self.b)
>>        self.b = copiedBuffer
>>        return copiedBuffer
>>      }
>> 
>>      mutating func setSomeData(x: Int) {
>>        let uniqueBuffer = makeMutable()
>>        uniqueBuffer.someData = x
>>        endUnique(&self.b)
>>      }
>>    }
>> 
>> The ``isUnique`` built-in returns an optional unique buffer reference.
>> Physically this is the COW buffer which is passed as the inout argument.
>> The result is nil if the buffer is not uniquely referenced.
>> In this case usually the original buffer is copied and the reference to the
>> copy is written back to the original buffer reference location
>> (``self.b = copiedBuffer``).
>> Starting at the point of the write-back, the reference to the copy also 
>> becomes
>> a unique buffer reference.
>> 
>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` pattern 
>> which
>> constructs the Optional in the successor blocks. Using ``isUnique`` in an
>> if-let (as shown above) will end up in two consecutive CFG "diamonds".
>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` diamond.
>> 
>> .. note::
>>  This makes the definition of the unique buffer location lifetime a little 
>> bit
>>  problematic, because the false-branch of ``isUnique`` is not equivalent to
>>  the false-branch of the ``is_unique_addr_br`` instruction (before 
>> SimplifyCFG
>>  can do its job).
>> 
>> The rules for using ``copy_on_write`` and the built-ins are:
>> 
>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily in the
>>   same function.
>> 
>> 2. The COW buffer may only be mutated by using the unique buffer reference.
>> 
>> 3. The COW buffer must not be mutated outside the ``isUnique`` - 
>> ``endUnique``
>>   pair.
>> 
>> 4. During the lifetime of the unique buffer reference, the original COW 
>> buffer
>>   reference must not be used in any way, e.g. for reading from the buffer.
>> 
>>   Note that the lifetime of the unique buffer reference does not include the
>>   part between the begin of the ``isUnique`` false-branch and the write-back
>>   of the copy. This means is okay to read from the buffer (using ``self.b``)
>>   for the purpose of copying.
>> 
>> Examples::
>> 
>>    mutating func setSomeData(x: Int) {
>>      let uniqueBuffer = makeMutable()
>>      uniqueBuffer.someData = x
>>      // violates rule 1
>>    }
>> 
>>    mutating func setSomeData(x: Int) {
>>      makeMutable()
>>      self.b.someData = x // violates rule 2
>>      endUnique(&self.b)
>>    }
>> 
>>    mutating func setSomeData(x: Int) {
>>      let uniqueBuffer = makeMutable()
>>      uniqueBuffer.someData = x
>>      endUnique(&self.b)
>>      uniqueBuffer.someData = 27 // violates rule 3
>>    }
>> 
>>    mutating func incrementSomeData() {
>>      let uniqueBuffer = makeMutable()
>>      uniqueBuffer.someData = self.b.someData + 1 // violates rule 4
>>      endUnique(&self.b)
>>    }
>> 
>> 
>> The intention of the rules is to ensure that there is no overlap of a
>> "read-only" life-range with a "mutable" life-range of the buffer reference.
>> It’s the responsibility of the implementor to follow the rules.
>> But the compiler (a mandatory diagnostics pass and the SIL verifier) can
>> statically detect rule violations in obvious cases (with inter-procedural
>> analysis maybe even in most cases).
>> 
>> This approach would require to change some of the internals of our
>> current COW data structures in the stdlib (Array, Dictionary, etc.).
>> For example, the Array make_mutable semantic functions currently do not 
>> return
>> the unique buffer.
>> 
>> Scoping Alternative 2: Implicit Inout Scopes
>> --------------------------------------------
>> 
>> There is an idea (proposal?) to change the representation of inout variables
>> in SIL. This is independent of this proposal, but can be helpful for the
>> purpose of defining the scope of a COW mutation.
>> 
>> The basic idea is that SILGen inserts scoping instructions for *all* inout
>> variables. And those scoping instructions can be used to define the mutating
>> scope of a COW buffer.
>> 
>> The scoping instructions which are inserted by SILGen for an inout scope 
>> are::
>> 
>>    begin_exclusive
>>    end_exclusive
>> 
>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those instructions 
>> take the
>> address of the inout variable as argument. For the optimizer those 
>> instructions
>> look like potential writes to the inout variable.
>> 
>> The implementor of a COW type has to follow the rule that the COW buffer may
>> only be modified in mutating functions of the COW type. But this is the case
>> anyway because any modification needs a uniqueness check and this can only be
>> done in mutating functions.
>> 
>> Example::
>> 
>>    // > mutating func setSomeData(x: Int) {
>>    // Accepts a unique reference to the array value (avoiding refcount 
>> operations)
>>    sil @setSomeData : $(Int, @inout Array) -> () {
>>    bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0
>> 
>>    // >   makeMutable() (inlined)
>>    // Forward the unique reference to the `self` array value, still avoiding 
>> refcount operations.
>>    // Begin the inlined exclusive scope (could be trivially removed).
>>    begin_exclusive %arrayref : $*Array<T> // Begin scope #1
>> 
>>    // >    if !isUnique(&self._storage) {
>>    // Extract a unique inout reference to the class reference to the array 
>> storage.
>>    // This begins the isUnique() argument's exclusive scope. The memory is 
>> already exclusive
>>    // but the scope helps ensure this is the only alias to _storage.
>>    %arrayref._storageref = struct_element_addr [exclusive] %arrayref, 
>> #Array._storage
>> 
>>    // Uniqueness checking requires an inout reference to the class reference.
>>    // The is_unique instruction does not need to create a new storage 
>> reference.
>>    // It's only purpose is to check the RC count, ensure that the checked 
>> reference
>>    // is inout, and prevent the inout scope from being optimized away.
>>    %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>> 
>>    // End the isUnique argument's exclusive scope (can also be trivially 
>> removed).
>>    end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>> 
>>    br %isuniq, bb_continue, bb_slow
>> 
>>    bb_slow:
>>    // >      self._storage = copyBuffer(self._storage)
>>    // Produce a new class reference to storage with verifiably unique RC 
>> semantics.
>>    %copied_storage_class = alloc_ref ...
>>    // A begin/end exclusive scope is implicit in store [assign].
>>    store [assign] %copied_storage_class to %arrayref._storageref
>>    br bb_continue
>> 
>>    bb_continue:
>> 
>>    // This marks the end of makeMutable's inout `self` scope. Because Array
>>    // contains a "copy_on_write" property, the SIL verifier needs to
>>    // prove that %arrayref.#_storage has not escaped at this point. This
>>    // is equivalent to checking that %arrayref itself is not copied, and
>>    // checking each projection of the "copy_on_write" storage property
>>    // (%arrayref._storageref) is not copied. Or, if any copies are present,
>>    // they must be consumed within this scope.
>>    end_exclusive %arrayref : $*Array<T> // End scope #1
>> 
>>    // >    self._storage.someData = x
>>    // An _addr instruction with one load/store use doesn't really need its 
>> own scope.
>>    %arrayref._storageref = struct_element_addr %arrayref, #Array._storage
>> 
>>    // ARC optimization can promote this to a borrow, replacing 
>> strong_release with end_borrow.
>>    %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow 
>> ArrayStorage
>>    %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow 
>> ArrayStorage
>> 
>>    // Write some data into the CoW buffer.
>>    // (For simplicity, pretend ArrayStorage has a "someData" field).
>>    // A single-use _addr instruction, so no scope.
>>    %somedata_addr = ref_element_addr %arrayref._storage, #someData
>>    // A store with an implicit [exclusive] scope.
>>    store [assign] %x to %somedata_addr
>> 
>>    strong_release %arrayref._storage : $*ArrayStorage<T>
>> 
>>    // End the isUnique argument's exclusive scope.
>>    // The same verification is needed here, but the inner scope would be 
>> eliminated.
>>    end_exclusive %arrayref : $*Array<T> // End scope #0
>> 
>> In general this approach looks more "user-friendly" than the first 
>> alternative.
>> But it depends on implementing the general feature to insert the inout 
>> scoping
>> instructions.
>> Also, we still have to think through all the details of this approach.
>> 
>> Dependency between a buffer reference to the scope-begin
>> --------------------------------------------------------
>> 
>> With both alternatives there is no explicit dependency from a buffer 
>> reference
>> to a scope-begin instruction::
>> 
>>    %b_cow = load %baddr
>>    %b = cow_to_ref %b_cow
>>    %x = load %b             // No dependency between this...
>>    ...
>>    begin_exclusive %baddr   // ... and this instruction.
>>    ...
>> 
>> So in theory the optimizer is free to reschedule the instructions::
>> 
>>    %b_cow = load %baddr
>>    %b = cow_to_ref %b_cow
>>    ...
>>    begin_exclusive %baddr
>>    %x = load %b             // Wrong! Buffer could be modified here
>>    ...
>> 
>> We still have to figure out how to cope with this.
>> 
>> - We could add an end-of-lifetime instruction for a COW buffer reference, 
>> which
>>  the optimizer may not move over a begin-of-scope instruction.
>> 
>> - Or we just define the implicit rule for the optimizer that any use of a COW
>>  reference may not be moved over a begin-of-scope instruction.
>> 
>> Preconditions
>> =============
>> 
>> To benefit from COW optimizations in the stdlib Array, Set and Dictionary 
>> data
>> structures we first need eager bridging, meaning getting rid of the bridged
>> buffer. At least for Array this is implemented as low-level bit operations 
>> and
>> optimizations cannot reason about it (e.g. finding a reasonable RC-root for 
>> the
>> buffer reference).
>> 
>> Another thing is that we currently cannot use ``copy_on_write`` for Array
>> because of pinning. Array pins it’s buffer when passing an element address to
>> an inout parameter. This allows the array buffer to be modified even if its
>> reference count is > 1. With ``copy_on_write``, a programmer could break 
>> memory
>> safety when violating the inout rule. Example::
>> 
>>    var arr = [MyClass()]  // a global array
>> 
>>    foo(&arr[0])        // Pins the buffer of arr during the call
>> 
>>    func foo(_ x: inout MyClass) -> Int {
>>      let b = arr       // The ref-count of the buffer is not incremented, 
>> because it is pinned!
>>      let r = b[0]      // optimizer removes the retain of r because it 
>> thinks the following code cannot modify b
>>      arr.removeAll()   // does not copy the array buffer and thus 
>> de-allocates r
>>      return r.i        // use-after-free!
>>    }
> 
> From a quick (re-)reading, this proposal still makes sense to me. My largest 
> comment is that I would really like to have some way that the program will 
> assert if the constraints are violated. Otherwise it will be difficult to 
> debug when your invariants are broken. The state machine seems simple enough 
> that I think you could reuse the pinned buffer and (if necessary) 1 
> additional bit in the object header (or perhaps tagged pointer bits).

Yeah, we could do this. I think it would just be sufficient to write a null 
pointer into the cow reference on at the begin-scope and restore it at the 
end-scope. It’s the same thing as having runtime checks for not violating 
borrowing rules.

> 
> Michael
> 
>> 
>> 
>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev 
>>> <[email protected]> wrote:
>>> 
>>> This is a proposal for representing copy-on-write buffers in SIL. Actually 
>>> it’s still a draft for a proposal. It also heavily depends on how we move 
>>> forward with SIL ownership.
>>> If you have any comments, please let me know.
>>> 
>>> Erik
>>> 
>>> 
>>> _______________________________________________
>>> swift-dev mailing list
>>> [email protected]
>>> https://lists.swift.org/mailman/listinfo/swift-dev
>>

_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev

Re: [swift-dev] copy-on-write proposal

Reply via email to