Hi swift-evolution,

Below is a draft proposal for a change to facilitate fixing some memory safety 
issues in the standard library.

If you’re interested in the implementation, PRs can be found here: [stdlib] 
[WIP] Eliminate version of Array.append(contentsOf:) that takes a Collection 
<https://github.com/apple/swift/pull/5521> and here: [stdlib] [WIP] Add 
UnsafeRawBufferPointer.initialize(as:from:) 
<https://github.com/apple/swift/pull/5718>
(they need a bit of performance work before they’re ready to land)

Introduction

The version of UnsafeMutablePointer.initialize(from:) that takes a Collection 
should be deprecated in favour of a new method on UnsafeMutableBufferPointer 
that takes a Sequence, with a goal of improving memory safety and enabling 
faster initialization of memory from sequences. Similarly, 
UnsafeMutableRawPointer.initializeMemory(as:from:) should be deprecated in 
favour of a new UnsafeMutableRawBufferPointer.initialize(as:from:).

 
<https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#motivation>Motivation

UnsafeMutablePointer.initialize(from:) underpins implementations of 
collections, such as Array, which are backed by a buffer of contiguous memory. 
When operations such as Array.append are implemented, they first ensure their 
backing store can accommodate the number of elements in a source collection, 
then pass the that collection into the initialize method of their backing store.

Unfortunately there is a major flaw in this design: a collection's count might 
not accurately reflect the number of elements returned by its iterator. For 
example, some collections can be misused to return different results on each 
pass. Or a collection could just be implemented incorrectly.

If the collection's count ends up being lower than the actual number of 
elements yielded by its iterator, the caller may not allocate enough memory for 
them. Since UnsafeMutablePointer.initialize(from:) does not receive a limiting 
capacity, this method would then scribble past the end of the buffer, resulting 
in undefined behaviour.

Normally when using Unsafe... constructs in Swift the burden of ensuring safety 
falls on the caller. When using this method with something known to have 
correct behavior, like an Array, you can do that. But when used in a generic 
context like Array.append(contentsOf:), where the caller of initialize does not 
know exactly what kind of collection they are passing in, it is impossible to 
use this method safely. You can see the impact of this by running the following 
code. which exhibits memory-unsafe behavior despite only using “safe” 
constructs from the standard library, something that shouldn’t be possible:

var i = 0
let c = repeatElement(42, count: 10_000).lazy.filter { _ in 
    // capture i and use it to exhibit inconsistent
    // behavior across iteration of c
    i += 1; return i > 10_000 
}
var a: [Int] = []
// a will allocate insufficient memory before
// calling self._buffer.initialize(from: c)
a.append(contentsOf: c) // memory access violation
While a collection returning an inconsistent count is a programming error (in 
this case, use of the lazy filter in combination with an logically impure 
function, breaking value semantics), and it would be reasonable for the 
standard library to trap under these circumstances, undefined behavior like 
this is not OK.

In addition, the requirement to pre-allocate enough memory to accommodate 
from.count elements rules out using this method to initialize memory from a 
sequence, since sequences don't have a count property (they have an 
underestimatedCount but this isn't enough since underestimated counts are 
exactly the problem described above). The proposed solution would allow for 
this, enabling some internal performance optimizations.

 
<https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#proposed-solution>Proposed
 solution

The existing initialize method should be altered to receive a count of 
allocated memory to avoid running beyond what the caller has allocated. Given 
UnsafeMutableBufferPointer already exists to encapsulate "pointer plus a 
count", the method should be moved to that type and the old method deprecated.

This new method should take a Sequence as its from argument, and handle 
possible element overflow, returning an Iterator of any elements not written 
due to a lack of space. It should also return an index into the buffer to 
indicate where the elements were written up to in cases of underflow.

Once this has been done, the version of Array.append(contentsOf:) that takes a 
collection can be eliminated, since the performance benefits of the collection 
version could be incorporated into the implementation of the one that takes a 
sequence.

The intention of this change is to add memory safety, not to allow the 
flexibility of collections giving inconsistent counts. Therefore the 
precondition should remain that the caller should ensure enough memory is 
allocated to accommodate source.underestedCount elements. The only difference 
is if they don’t, the behaviour should be well-defined (ideally by trapping, if 
this can be done efficiently). 

Therefore:

Under-allocating the destination buffer relative to underestimatedCount may 
trap at runtime. May rather than will because this is an O(n) operation on some 
collections, so probably should only be enforced in debug builds. But when not 
enforced, the behavior would still be memory-safe when the precondition is 
broken.
Over-allocating the destination buffer relative to underestimatedCount is valid 
and simply results in sequence underflow with potentially uninitialized buffer 
memory (a likely case with arrays that reserve more than they need).
The source sequence's actual count may exceed both underEstimatedCount and the 
destination buffer size, resulting in sequence overflow. This is also valid and 
handled by returning an iterator to the uncopied elements as an overflow 
sequence.
A matching change should also be made to 
UnsafeRawBufferPointer.initializeMemory(from:). The one difference is that for 
convenience this should return an UnsafeMutableBufferPointer of the (typed) 
intialized elements instead of an index into the raw buffer.

 
<https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#detailed-design>Detailed
 design

The following API changes would be made:

extension UnsafeMutablePointer {
  @available(*, deprecated, message: "it will be removed in Swift 4.0.  Please 
use 'UnsafeMutableBufferPointer.initialize(from:)' instead")
  public func initialize<C : Collection>(from source: C)
      where C.Iterator.Element == Pointee 
}

extension UnsafeMutableBufferPointer {
  /// Initializes memory in the buffer with the elements of `source`.
  /// Returns an iterator to any elements of `source` that didn't fit in the 
  /// buffer, and an index to the point in the buffer one past the last element
  /// written (so `startIndex` if no elements written, `endIndex` if the buffer 
  /// was completely filled).
  ///
  /// - Precondition: The memory in `self` is uninitialized. The buffer must 
contain
  ///   sufficient uninitialized memory to accommodate 
`source.underestimatedCount`.
  ///
  /// - Postcondition: The returned iterator
  /// - Postcondition: The `Pointee`s at `self[startIndex..<initializedUpTo]` 
  ///   are initialized.
  @discardableResult
  public func initialize<S: Sequence>(
    from source: S
  ) -> (unwritten: S.Iterator, initializedUpTo: Index)
    where S.Iterator.Element == Iterator.Element
}

extension UnsafeMutableRawPointer {
  @available(*, deprecated, message: "it will be removed in Swift 4.0.  Please 
use 'UnsafeMutableRawBufferPointer.initialize(from:)' instead")
  @discardableResult
  public func initializeMemory<C : Collection>(
   as: C.Iterator.Element.Type, from source: C
  ) -> UnsafeMutablePointer<C.Iterator.Element>
}

extension UnsafeMutableRawBufferPointer {
  /// Initializes memory in the buffer with the elements of
  /// `source` and binds the initialized memory to type `T`.
  ///
  /// Returns an iterator to any elements of `source` that didn't fit in the 
  /// buffer, and an index into the buffer one past the last byte written.
  ///
  /// - Precondition: The memory in `self` is uninitialized or initialized to a
  ///   trivial type.
  ///
  /// - Precondition: The buffer must contain sufficient memory to
  ///   accommodate at least `source.underestimatedCount` elements.
  ///
  /// - Postcondition: The memory at `self[startIndex..<initialized.count *
  ///   MemoryLayout<S.Iterator.Element>.stride] is bound to type `S.Iterator`.
  ///
  /// - Postcondition: The memory at `self[startIndex..<initialized.count *
  ///   MemoryLayout<S.Iterator.Element>.stride] are initialized..
  @discardableResult
  public func initializeMemory<S: Sequence>(
     as: S.Iterator.Element.Type, from source: S
  ) -> (unwritten: S.Iterator, initialized: 
UnsafeRawBufferPointer<S.Iterator.Element>)
}
The += operators and append<C : Collection>(contentsOf newElements: C) methods 
on Array, ArraySlice and ContiguousArray will be removed as no-longer needed, 
since the implementation that takes a sequence can be made to be as efficient. 
They can be replaced by a generic one that calls 
RangeReplaceableCollection.append(contenstsOf:):

(note, because it forwards on to a protocol requirement, it itself does not 
need to be a static operator protocol requirement)

/// Appends the elements of a sequence to a range-replaceable collection.
///
/// Use this operator to append the elements of a sequence to the end of
/// range-replaceable collection with same `Element` type. This example
/// appends the elements of a `Range<Int>` instance to an array of
/// integers.
///
///     var numbers = [1, 2, 3, 4, 5]
///     numbers += 10...15
///     print(numbers)
///     // Prints "[1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]"
///
/// - Parameters:
///   - lhs: The array to append to.
///   - rhs: A collection or finite sequence.
///
/// - Complexity: O(*n*), where *n* is the length of the resulting array.
public func += <
  R : RangeReplaceableCollection, S : Sequence
>(lhs: inout R, rhs: S) 
 where R.Iterator.Element == S.Iterator.Element
 
<https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#source-compatibility>Source
 compatibility

The addition of the new method does not affect source compatibility. The 
deprecation of the old method does, but since this is a fundamentally unsound 
operation that cannot be fixed except via a source-breaking change, it should 
be aggressively deprecated and then removed.

The knock-on ability to remove the version of Array.append(contentsOf:) that 
takes a collection does not affect source compatability since the version for 
sequences will be called for collections instead.

 
<https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#effect-on-abi-stability>Effect
 on ABI stability

This change must be made prior to declaring ABI stability, since it is 
currently called from the Array.append method, which is inlineable.

 
<https://github.com/airspeedswift/swift-evolution/blob/b1d05032567e5c7b469f3ccca1bc1a42175a2400/proposals/NNNN-move-unsafe-initialize-from.md#alternatives-considered>Alternatives
 considered

Overflow (elements remain on the returned iterator) and underflow 
(initializedUpTo != endIndex) are almost but not quite mutually exclusive – if 
the buffer is exactly used, the caller must call .next() to check for any 
unwritten elements, which means the returned value must be declared var, and 
the check can't be chained. This is a little ugly, but is the unavoidable 
consequence of how iterators work: since iterating is consuming, the initialize 
method cannot easily test for this and indicate it back to the caller in some 
other way (such as returning an optional iterator).
_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to