After having had this code laying around on my Mac since 6/26/15 (waiting for 
Swift to go open source)
I figured it’s about damn time I submit the actual RFC for it. So without 
further ado…

One of the areas where Swift’s stdlib is still quite lacking is searching.
Which is a shame as searching (along with indexing) might be among
the most important tasks of computer science/software development.
One might not need fast collection searches for writing a banal fart or 
flashlight app,
but almost every other app is likely to benefit from having a proper searching 
API.

So I’d like to fix that.

Attached you find a full RFC along with the full and functional source code to 
make it happen.

I’d love to hear your opinion on this and will be more than happy to answer 
questions.

Rendered Version + Full Implementation: 
https://gist.github.com/regexident/2b7531bd748f57679671 
<https://gist.github.com/regexident/2b7531bd748f57679671>
(The code is tested using Quick/Nimble. Tests would thus have to be ported to 
Swift’s XCTest.)

- Vincent

> Markdown-dump for
> 
> # Improved Collection Search
> 
> * Proposal: 
> [SE-NNNN](https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md)
> * Author(s): [Vincent Esche](https://github.com/regexident)
> * Status: **Review**
> * Review manager: TBD
> 
> ## Introduction
> 
> This RFC proposes an extension to the currently rather limited and linear 
> searching API of `CollectionType`, that is `indexOf(element:)` and 
> `indexOf(predicate:)`.
> It proposes the backwards-compatible refactoring of those existing methods as 
> well as the introduction of a view-based ensemble of methods for efficient 
> binary-search.
> 
> Swift-evolution thread: [link to the discussion thread for that 
> proposal](https://lists.swift.org/pipermail/swift-evolution)
> 
> ## Motivation
> 
> Indexing of and searching in data is a big deal in software development. As 
> such it is crucial to have capable means of doing so in a language that 
> aspires to be a highly performant programming language.
> 
> Swift's current API for searching is basically limited to these two methods 
> on `CollectionType`:
> 
> > ```swift
> > func indexOf(element: Self.Generator.Element) -> Self.Index?
> > ```
> > Returns the first index where value appears in self or nil if value is not 
> > found.
> >
> > **Complexity**: `O(self.count)`.
> 
> and:
> 
> > ```swift
> > func indexOf(@noescape predicate: (Self.Generator.Element) throws -> Bool) 
> > rethrows -> Self.Index?
> > ```
> > Returns the first index where predicate returns true for the corresponding 
> > value, or nil if such value is not found.
> > 
> > **Complexity**: `O(self.count)`.
> 
> The author sees a couple of issue with these two methods:
> 
> 1. They do not provide an option for restricting the "haystack" to a certain 
> `range` of `self`.
> 2. They do not provide a fast (`O(log2(self.count))`) path for sorted 
> collections.
> 3. They should not be limited to `CollectionType` but instead be moved to 
> `Indexable`.
> 
> In many situations it is desirable to perform fast searches on sorted 
> collections instead of having to fall back to naïve linear searches.
> 
> Further more while a single `index` might be the most common subject of 
> interest, there are many scenarios where a `range` or `count` of matches are 
> of more use.
> 
> And anyone who is writing a custom ordered collection type will eventually 
> want to find the insertion index for a given element and will thus end up 
> writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which 
> all the other methods can be implemented, actually).
> 
> ## Proposed solution
> 
> The author proposes:
> 
> 1. A backwards-compatible refactoring of `CollectionType.indices`, moving it 
> to `Indexable`.
> 
> 2. A backwards-compatible refactoring of `indexOf(…)` (adding optional 
> `range:` and moving it to `Indexable`).
> 
> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to 
> `Indexable` with a TIME complexity of `O(self.count)`.
> 
> 4. The introduction of a `BinarySearchView` on `Indexable`, allowing for fast 
> (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, 
> `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without 
> cluttering `Indexable`'s interface.
> 
> ## Detailed design
> 
> ### `CollectionType.indices`:
> 
> The author proposes the relocation of `.indices` from `CollectionType` to 
> `Indexable`:
> 
> ```swift
> extension Indexable {
>     /// Return the range of valid index values.
>     ///
>     /// The result's `endIndex` is the same as that of `self`.  Because
>     /// `Range` is half-open, iterating the values of the result produces
>     /// all valid subscript arguments for `self`, omitting its `endIndex`.
>     public var indices: Range<Self.Index> { get }
> }
> ```
> 
> After all all it does is provide a convenience method for generating a Range 
> from its `startIndex` and `endIndex`. There is no need for `self` to be a 
> `CollectionType` for this.
> 
> This change should not break any existing code as it generalizes the property.
> 
> ### `Indexable` of `Comparable` elements:
> 
> The author proposes the addition of the following method on `Indexable where 
> Element : Comparable`:
> 
> ```swift
> extension Indexable where Index: RandomAccessIndexType, _Element: Comparable {
>       /// Return true iff all elements in `self` within `range` are sorted 
> according to `isOrderedBefore`.
>       ///
>       /// - Complexity: O(`self.count`)
>       public func isSorted(range: Range<Index>? = nil) -> Bool
> }
> ```
> 
> This method would perform a linear scan of the `Indexable` with a TIME 
> complexity of `O(self.count)`.
> 
> An actual implementation of these methods can be found in `Indexable.swift`.
> 
> ### `Indexable` of `Equatable ` elements:
> 
> The author proposes the addition of the following method on `Indexable where 
> Element : Equatable`:
> 
> ```swift
> extension Indexable where Index : RandomAccessIndexType, _Element : Equatable 
> {
> 
>     /// Returns the first index where `element` appears in `self`
>     /// within range `range` or `nil` if `element` is not found.
>     ///
>     /// - Complexity: O(`self.count`)
>     public func indexOf(element: _Element, range: Range<Index>? = default) -> 
> Self.Index?
> 
>     /// Returns the first range where `element` appears in `self`
>     /// within range `range` or `nil` if `element` is not found.
>     ///
>     /// - Complexity: O(`self.count`)
>     public func rangeOf(element: _Element, range: Range<Index>? = default) -> 
> Range<Self.Index>?
> 
>     /// Returns the number of `element`s that appear in `self`
>     /// within range `range` or `nil` if such an element is not found.
>     ///
>     /// - Complexity: O(`self.count`)
>     public func countOf(element: _Element, range: Range<Index>? = default) -> 
> Int
> }
> ```
> 
> These methods would perform a linear scan of the `Indexable` with a TIME 
> complexity of `O(self.count)`.
> 
> An actual implementation of these methods can be found in `Indexable.swift`.
> 
> ### `Indexable` of any elements:
> 
> The author proposes the addition of the following methods on `Indexable`:
> 
> ```swift
> extension Indexable where Index : RandomAccessIndexType {
> 
>     /// Returns the first index where an element appears in `self`
>     /// within range `range` for which `lensView(element) == value` or `nil` 
> if such an element is not found.
>     ///
>     /// - Complexity: O(`self.count`)
>     public func indexOf(range: Range<Index>? = default, @noescape predicate: 
> (_Element) throws -> Bool) rethrows -> Self.Index?
> 
>     /// Returns the first range where an element appears in `self`
>     /// within range `range` for which `lensView(element) == value` or `nil` 
> if such an element is not found.
>     ///
>     /// - Complexity: O(`self.count`)
>     public func rangeOf(range: Range<Index>? = default, @noescape predicate: 
> (_Element) throws -> Bool) rethrows -> Range<Self.Index>?
> 
>     /// Returns the number of `element`s that appear in `self`
>     /// within range `range` or `nil` if `element` is not found.
>     ///
>     /// - Complexity: O(`self.count`)
>     public func countOf(range: Range<Index>? = default, @noescape predicate: 
> (_Element) throws -> Bool) rethrows -> Int
> 
>     /// Return true iff all elements in `self` within `range` are sorted 
> according to `isOrderedBefore`.
>     /// 
>     /// - Complexity: O(`self.count`)
>     public func isSorted(range: Range<Index>? = default, @noescape 
> isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool
> }
> ```
> 
> These methods would perform a linear scan of the `Indexable` with a TIME 
> complexity of `O(self.count)`.
> 
> An actual implementation of these methods can be found in `Indexable.swift`.
> 
> **Note**: For explanation of and reasoning behind the `lensView:` argument as 
> well as the distinction between `T` and `Base._Element`, see below.
> 
> ### `Indexable.binarySearchView(validateAgainst:?)`:
> 
> The author proposes the addition of a `binarySearchView(validateAgainst:?)` 
> method on `Indexable` for obtaining a specialized search view for performing 
> searches with O(`log2(self.count)`) complexity:
> 
> ```swift
> extension Indexable where Index: RandomAccessIndexType {
>       /// Create a view into a sorted indexable that allows access within 
> `bounds`.
>       ///
>       /// - Complexity: O(`self.count`) if a validation closure is provided, 
> otherwise O(`1`).
>       public func binarySearchView(validateAgainst isOrderedBefore: 
> ((_Element, _Element) -> Bool)? = nil) -> BinarySearchView<Self>
> }
> ```
> 
> ### `BinarySearchView<T: Indexable>`:
> 
> The author proposes the introduction of a `BinarySearchView` struct with the 
> following interface:
> 
> ```swift
> public struct BinarySearchView<Base : Indexable where Base.Index : 
> RandomAccessIndexType> : Indexable {
>     public typealias Index = Base.Index
> 
>     public var startIndex: Base.Index { get }
> 
>     public var endIndex: Base.Index { get }
> 
>     public subscript (position: Index) -> Base._Element { get }
> 
>     /// Create a view into a sorted collection `base` that allows access 
> within `bounds`.
>     ///
>     /// - Complexity: O(1).
>     public init(base: Base, validateAgainst isOrderedBefore: ((Base._Element, 
> Base._Element) -> Bool)? = default)
> 
>     /// Returns first index of the first `element` in `self` within range 
> `range` for which
>     /// `lensView(element) < value` evaluates to false or `nil` if `element` 
> is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func lowerBoundOf<T : Comparable>(value: T, range: Range<Index>? = 
> default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape 
> lensView: Base._Element -> T) -> Base.Index
> 
>     /// Returns first index of the first `element` in `self` within range 
> `range` for which
>     /// `value < lensView(element)` evaluates to true or `nil` if `element` 
> is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func upperBoundOf<T : Comparable>(value: T, range: Range<Index>? = 
> default, @noescape isOrderedBefore: ((T, T) -> Bool) = default, @noescape 
> lensView: Base._Element -> T) -> Base.Index
> 
>     /// Returns the first index where an element appears in `self`
>     /// within range `range` for which `lensView(element) == value` or `nil` 
> if such an element is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func indexOf<T : Comparable>(value: T, range: Range<Index>? = 
> default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape 
> lensView: Base._Element -> T) -> Base.Index?
> 
>     /// Returns the range where an element appears in `self`
>     /// within range `range` for which `lensView(element) == value` or `nil` 
> if such an element is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = 
> default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape 
> lensView: Base._Element -> T) -> Range<Base.Index>?
> 
>     /// Returns the number of `element`s that appear in `self`
>     /// within range `range` for which `lensView(element) == value` or `nil` 
> if such an element is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func countOf<T : Comparable>(value: T, range: Range<Index>? = 
> default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape 
> lensView: Base._Element -> T) -> Int
> 
>     internal let _base: Base
> }
> ```
> 
> As well as the following extension for `Indexable where Element: Comparable`:
> 
> ```swift
> extension BinarySearchView where Base._Element : Comparable {
>     /// Returns first index of the first `element` in `self` within range 
> `range` for which
>     /// `… < element` evaluates to false or `nil` if `element` is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func lowerBoundOf(element: Base._Element, range: Range<Index>? = 
> default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = 
> default) -> Base.Index
> 
>     /// Returns first index of the first `element` in `self` within range 
> `range` for which
>     /// `element < …` evaluates to true or `nil` if `element` is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func upperBoundOf(element: Base._Element, range: Range<Index>? = 
> default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = 
> default) -> Base.Index
> 
>     /// Returns the first index where `element` appears in `self`
>     /// within range `range` or `nil` if `element` is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func indexOf(element: Base._Element, range: Range<Index>? = 
> default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = 
> default) -> Base.Index?
> 
>     /// Returns the range where `element` appears in `self`
>     /// within range `range` or `nil` if `element` is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func rangeOf(element: Base._Element, range: Range<Index>? = 
> default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = 
> default) -> Range<Base.Index>?
> 
>     /// Returns the number of `element`s that appear in `self`
>     /// within range `range` or `nil` if `element` is not found.
>     ///
>     /// - Complexity: O(`log2(self.count)`).
>     public func countOf(element: Base._Element, range: Range<Index>? = 
> default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool = 
> default) -> Int
> }
> ```
> 
> These methods would perform a linear scan of the `Indexable` with a TIME 
> complexity of `O(log2(self.count))`.
> 
> An actual implementation of these methods can be found in 
> `BinarySearchView.swift`.
> 
> ### Why `lensView`?
> 
> Let's assume one had a collection of Persons like this:
> 
> ```swift
> struct Person {
>     let name: String
>     let age: Int
> }
> 
> let persons: [Person] = [ … ]
> ```
> 
> Now let's assume one wanted to find the first person that's called `John 
> Doe`. One could either change `Person` to conform to `Equatable`, allowing 
> `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it?
> 
> Fine, but what if one wanted to search `Person`s by both their `name` or 
> their `age` depending on the given use case? You're out of luck now.
> 
> Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` that 
> is available for any type of `Element` regardless of its conformance to 
> `Equatable`:
> 
> ```swift
> let searchView = persons.binarySearchView()
> let index = persons.indexOf { $0.name == "John Doe" }
> ```
> 
> Unfortunately however such a `predicate` cannot be used when searching for an 
> element's property in a sorted(!) collection or a collection of 
> non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` 
> as binary search uses `<` instead of `==` internally and as the "needle" gets 
> passed as both the first (in `indexOf(…)` itself) and second argument (in 
> `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, one 
> cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, E) -> 
> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`).
> 
> The use of `lensView` however allows one to do just that without making a 
> mess.
> 
> ```swift
> let searchView = persons.binarySearchView() // sorted!
> let index = searchView.indexOf("John Doe") { $0.name }
> ```
> 
> The `lensView` works similarly to how keypaths work in Objective-C, just 
> type-safe.
> 
> ## Impact on existing code
> 
> The author proposes:
> 
> 1. Moving `CollectionType.indices` to `Indexable` should not cause any issues 
> with existing code.
> 
> 2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not 
> cause any issues with existing code as the `range` argument is optional and 
> the default behaves like it used to.
> 
> 2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should not 
> cause any issues with existing code either, as the `range` argument is 
> optional and the default behaves like it used to. Swift's trailing-closure 
> magic allows `indexOf(predicate: …)` to still properly map to `indexOf(range: 
> nil, predicate: …)`.
> 
> 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to 
> `Indexable` do not conflict with any existing API.
> 
> 4. The introduction of a `BinarySearchView` on `Indexable` does not conflict 
> with any existing API.
> 
> ## Alternatives considered
> 
> The author's initial approach was to implement all methods on Indexable. This 
> however required all methods to either be provided in two variants (one for 
> unsorted, one for sorted `Indexable`s):
> 
> ```swift
> final public func indexOf(element: _Element, range: Range<Index>? = nil) -> 
> Index? {
>       return …
> }
> 
> final public func sortedIndexOf(element: _Element, range: Range<Index>? = 
> nil) -> Index? {
>       return …
> }
> ```
> 
> Or require the introduction of an additional argument:
> 
> ```swift
> final public func indexOf(element: _Element, range: Range<Index>? = nil, 
> isSorted: Bool = true) -> Index? {
>     if isSorted {
>         return …
>     } else {
>         return …
>     }
> }
> ```
> 
> It also would have cluttered `Indexable` with methods such as `lowerBoundOf` 
> and `upperBoundOf`, which are only ever applicable when `self` is sorted 
> accordingly. The introduction of a dedicated BinarySearchView fixes these 
> issues.
> 
> One also would have had to switch from `predicate: (T) -> Bool` to 
> `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified API 
> alternative.

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to