Hey Vincent, good proposal, love to see this in production. Better collection search would become quite handy :-)
regards, Flori On Mon, Jan 4, 2016 at 4:13 PM, Vincent Esche via swift-evolution < [email protected]> wrote: > After having had this code laying around on my Mac since 6/26/15 (waiting > for Swift to go open source) > I figured it’s about damn time I submit the actual RFC for it. So without > further ado… > > One of the areas where Swift’s stdlib is still quite lacking is searching. > Which is a shame as searching (along with indexing) might be among > the most important tasks of computer science/software development. > One might not need fast collection searches for writing a banal fart or > flashlight app, > but almost every other app is likely to benefit from having a proper > searching API. > > So I’d like to fix that. > > Attached you find a full RFC along with the full and functional source > code to make it happen. > > I’d love to hear your opinion on this and will be more than happy to > answer questions. > > Rendered Version + Full Implementation: > https://gist.github.com/regexident/2b7531bd748f57679671 > (The code is tested using Quick/Nimble. Tests would thus have to be ported > to Swift’s XCTest.) > > - Vincent > > Markdown-dump for > > # Improved Collection Search > > * Proposal: [SE-NNNN]( > https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md > ) > * Author(s): [Vincent Esche](https://github.com/regexident) > * Status: **Review** > * Review manager: TBD > > ## Introduction > > This RFC proposes an extension to the currently rather limited and linear > searching API of `CollectionType`, that is `indexOf(element:)` and > `indexOf(predicate:)`. > It proposes the backwards-compatible refactoring of those existing methods > as well as the introduction of a view-based ensemble of methods for > efficient binary-search. > > Swift-evolution thread: [link to the discussion thread for that proposal]( > https://lists.swift.org/pipermail/swift-evolution) > > ## Motivation > > Indexing of and searching in data is a big deal in software development. > As such it is crucial to have capable means of doing so in a language that > aspires to be a highly performant programming language. > > Swift's current API for searching is basically limited to these two > methods on `CollectionType`: > > > ```swift > > func indexOf(element: Self.Generator.Element) -> Self.Index? > > ``` > > Returns the first index where value appears in self or nil if value is > not found. > > > > **Complexity**: `O(self.count)`. > > and: > > > ```swift > > func indexOf(@noescape predicate: (Self.Generator.Element) throws -> > Bool) rethrows -> Self.Index? > > ``` > > Returns the first index where predicate returns true for the > corresponding value, or nil if such value is not found. > > > > **Complexity**: `O(self.count)`. > > The author sees a couple of issue with these two methods: > > 1. They do not provide an option for restricting the "haystack" to a > certain `range` of `self`. > 2. They do not provide a fast (`O(log2(self.count))`) path for sorted > collections. > 3. They should not be limited to `CollectionType` but instead be moved to > `Indexable`. > > In many situations it is desirable to perform fast searches on sorted > collections instead of having to fall back to naïve linear searches. > > Further more while a single `index` might be the most common subject of > interest, there are many scenarios where a `range` or `count` of matches > are of more use. > > And anyone who is writing a custom ordered collection type will eventually > want to find the insertion index for a given element and will thus end up > writing some variant of `lowerBoundOf(…)` or `upperBoundOf(…)` (upon which > all the other methods can be implemented, actually). > > ## Proposed solution > > The author proposes: > > 1. A backwards-compatible refactoring of `CollectionType.indices`, moving > it to `Indexable`. > > 2. A backwards-compatible refactoring of `indexOf(…)` (adding optional > `range:` and moving it to `Indexable`). > > 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to > `Indexable` with a TIME complexity of `O(self.count)`. > > 4. The introduction of a `BinarySearchView` on `Indexable`, allowing for > fast (`O(log2(self.count))`) searches on `Indexable` via `indexOf(…)`, > `rangeOf(…)`, `countOf(…)`, `lowerBoundOf(…)`, `upperBoundOf(…)` without > cluttering `Indexable`'s interface. > > ## Detailed design > > ### `CollectionType.indices`: > > The author proposes the relocation of `.indices` from `CollectionType` to > `Indexable`: > > ```swift > extension Indexable { > /// Return the range of valid index values. > /// > /// The result's `endIndex` is the same as that of `self`. Because > /// `Range` is half-open, iterating the values of the result produces > /// all valid subscript arguments for `self`, omitting its `endIndex`. > public var indices: Range<Self.Index> { get } > } > ``` > > After all all it does is provide a convenience method for generating a > Range from its `startIndex` and `endIndex`. There is no need for `self` to > be a `CollectionType` for this. > > This change should not break any existing code as it generalizes the > property. > > ### `Indexable` of `Comparable` elements: > > The author proposes the addition of the following method on `Indexable > where Element : Comparable`: > > ```swift > extension Indexable where Index: RandomAccessIndexType, _Element: > Comparable { > /// Return true iff all elements in `self` within `range` are sorted > according to `isOrderedBefore`. > /// > /// - Complexity: O(`self.count`) > public func isSorted(range: Range<Index>? = nil) -> Bool > } > ``` > > This method would perform a linear scan of the `Indexable` with a TIME > complexity of `O(self.count)`. > > An actual implementation of these methods can be found in > `Indexable.swift`. > > ### `Indexable` of `Equatable ` elements: > > The author proposes the addition of the following method on `Indexable > where Element : Equatable`: > > ```swift > extension Indexable where Index : RandomAccessIndexType, _Element : > Equatable { > > /// Returns the first index where `element` appears in `self` > /// within range `range` or `nil` if `element` is not found. > /// > /// - Complexity: O(`self.count`) > public func indexOf(element: _Element, range: Range<Index>? = default) > -> Self.Index? > > /// Returns the first range where `element` appears in `self` > /// within range `range` or `nil` if `element` is not found. > /// > /// - Complexity: O(`self.count`) > public func rangeOf(element: _Element, range: Range<Index>? = default) > -> Range<Self.Index>? > > /// Returns the number of `element`s that appear in `self` > /// within range `range` or `nil` if such an element is not found. > /// > /// - Complexity: O(`self.count`) > public func countOf(element: _Element, range: Range<Index>? = default) > -> Int > } > ``` > > These methods would perform a linear scan of the `Indexable` with a TIME > complexity of `O(self.count)`. > > An actual implementation of these methods can be found in > `Indexable.swift`. > > ### `Indexable` of any elements: > > The author proposes the addition of the following methods on `Indexable`: > > ```swift > extension Indexable where Index : RandomAccessIndexType { > > /// Returns the first index where an element appears in `self` > /// within range `range` for which `lensView(element) == value` or > `nil` if such an element is not found. > /// > /// - Complexity: O(`self.count`) > public func indexOf(range: Range<Index>? = default, @noescape > predicate: (_Element) throws -> Bool) rethrows -> Self.Index? > > /// Returns the first range where an element appears in `self` > /// within range `range` for which `lensView(element) == value` or > `nil` if such an element is not found. > /// > /// - Complexity: O(`self.count`) > public func rangeOf(range: Range<Index>? = default, @noescape > predicate: (_Element) throws -> Bool) rethrows -> Range<Self.Index>? > > /// Returns the number of `element`s that appear in `self` > /// within range `range` or `nil` if `element` is not found. > /// > /// - Complexity: O(`self.count`) > public func countOf(range: Range<Index>? = default, @noescape > predicate: (_Element) throws -> Bool) rethrows -> Int > > /// Return true iff all elements in `self` within `range` are sorted > according to `isOrderedBefore`. > /// > /// - Complexity: O(`self.count`) > public func isSorted(range: Range<Index>? = default, @noescape > isOrderedBefore: (_Element, _Element) throws -> Bool) rethrows -> Bool > } > ``` > > These methods would perform a linear scan of the `Indexable` with a TIME > complexity of `O(self.count)`. > > An actual implementation of these methods can be found in > `Indexable.swift`. > > **Note**: For explanation of and reasoning behind the `lensView:` argument > as well as the distinction between `T` and `Base._Element`, see below. > > ### `Indexable.binarySearchView(validateAgainst:?)`: > > The author proposes the addition of a > `binarySearchView(validateAgainst:?)` method on `Indexable` for obtaining a > specialized search view for performing searches with O(`log2(self.count)`) > complexity: > > ```swift > extension Indexable where Index: RandomAccessIndexType { > /// Create a view into a sorted indexable that allows access within > `bounds`. > /// > /// - Complexity: O(`self.count`) if a validation closure is provided, > otherwise O(`1`). > public func binarySearchView(validateAgainst isOrderedBefore: ((_Element, > _Element) -> Bool)? = nil) -> BinarySearchView<Self> > } > ``` > > ### `BinarySearchView<T: Indexable>`: > > The author proposes the introduction of a `BinarySearchView` struct with > the following interface: > > ```swift > public struct BinarySearchView<Base : Indexable where Base.Index : > RandomAccessIndexType> : Indexable { > public typealias Index = Base.Index > > public var startIndex: Base.Index { get } > > public var endIndex: Base.Index { get } > > public subscript (position: Index) -> Base._Element { get } > > /// Create a view into a sorted collection `base` that allows access > within `bounds`. > /// > /// - Complexity: O(1). > public init(base: Base, validateAgainst isOrderedBefore: > ((Base._Element, Base._Element) -> Bool)? = default) > > /// Returns first index of the first `element` in `self` within range > `range` for which > /// `lensView(element) < value` evaluates to false or `nil` if > `element` is not found. > /// > /// - Complexity: O(`log2(self.count)`). > public func lowerBoundOf<T : Comparable>(value: T, range: > Range<Index>? = default, @noescape isOrderedBefore: (T, T) -> Bool = > default, @noescape lensView: Base._Element -> T) -> Base.Index > > /// Returns first index of the first `element` in `self` within range > `range` for which > /// `value < lensView(element)` evaluates to true or `nil` if > `element` is not found. > /// > /// - Complexity: O(`log2(self.count)`). > public func upperBoundOf<T : Comparable>(value: T, range: > Range<Index>? = default, @noescape isOrderedBefore: ((T, T) -> Bool) = > default, @noescape lensView: Base._Element -> T) -> Base.Index > > /// Returns the first index where an element appears in `self` > /// within range `range` for which `lensView(element) == value` or > `nil` if such an element is not found. > /// > /// - Complexity: O(`log2(self.count)`). > public func indexOf<T : Comparable>(value: T, range: Range<Index>? = > default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape > lensView: Base._Element -> T) -> Base.Index? > > /// Returns the range where an element appears in `self` > /// within range `range` for which `lensView(element) == value` or > `nil` if such an element is not found. > /// > /// - Complexity: O(`log2(self.count)`). > public func rangeOf<T : Comparable>(value: T, range: Range<Index>? = > default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape > lensView: Base._Element -> T) -> Range<Base.Index>? > > /// Returns the number of `element`s that appear in `self` > /// within range `range` for which `lensView(element) == value` or > `nil` if such an element is not found. > /// > /// - Complexity: O(`log2(self.count)`). > public func countOf<T : Comparable>(value: T, range: Range<Index>? = > default, @noescape isOrderedBefore: (T, T) -> Bool = default, @noescape > lensView: Base._Element -> T) -> Int > > internal let _base: Base > } > ``` > > As well as the following extension for `Indexable where Element: > Comparable`: > > ```swift > extension BinarySearchView where Base._Element : Comparable { > /// Returns first index of the first `element` in `self` within range > `range` for which > /// `… < element` evaluates to false or `nil` if `element` is not > found. > /// > /// - Complexity: O(`log2(self.count)`). > public func lowerBoundOf(element: Base._Element, range: Range<Index>? > = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> > Bool = default) -> Base.Index > > /// Returns first index of the first `element` in `self` within range > `range` for which > /// `element < …` evaluates to true or `nil` if `element` is not found. > /// > /// - Complexity: O(`log2(self.count)`). > public func upperBoundOf(element: Base._Element, range: Range<Index>? > = default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> > Bool = default) -> Base.Index > > /// Returns the first index where `element` appears in `self` > /// within range `range` or `nil` if `element` is not found. > /// > /// - Complexity: O(`log2(self.count)`). > public func indexOf(element: Base._Element, range: Range<Index>? = > default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool > = default) -> Base.Index? > > /// Returns the range where `element` appears in `self` > /// within range `range` or `nil` if `element` is not found. > /// > /// - Complexity: O(`log2(self.count)`). > public func rangeOf(element: Base._Element, range: Range<Index>? = > default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool > = default) -> Range<Base.Index>? > > /// Returns the number of `element`s that appear in `self` > /// within range `range` or `nil` if `element` is not found. > /// > /// - Complexity: O(`log2(self.count)`). > public func countOf(element: Base._Element, range: Range<Index>? = > default, @noescape isOrderedBefore: (Base._Element, Base._Element) -> Bool > = default) -> Int > } > ``` > > These methods would perform a linear scan of the `Indexable` with a TIME > complexity of `O(log2(self.count))`. > > An actual implementation of these methods can be found in > `BinarySearchView.swift`. > > ### Why `lensView`? > > Let's assume one had a collection of Persons like this: > > ```swift > struct Person { > let name: String > let age: Int > } > > let persons: [Person] = [ … ] > ``` > > Now let's assume one wanted to find the first person that's called `John > Doe`. One could either change `Person` to conform to `Equatable`, allowing > `indexOf(element:)` to be called on `persons`. Now that was easy, wasn't it? > > Fine, but what if one wanted to search `Person`s by both their `name` or > their `age` depending on the given use case? You're out of luck now. > > Luckily there is a variant of `indexOf(…)` called `indexOf(predicate:)` > that is available for any type of `Element` regardless of its conformance > to `Equatable`: > > ```swift > let searchView = persons.binarySearchView() > let index = persons.indexOf { $0.name == "John Doe" } > ``` > > Unfortunately however such a `predicate` cannot be used when searching for > an element's property in a sorted(!) collection or a collection of > non-`Comparable` elements within a TIME complexity of `O(log2(self.count))` > as binary search uses `<` instead of `==` internally and as the "needle" > gets passed as both the first (in `indexOf(…)` itself) and second argument > (in `lowerBoundOf(…)` which gets called by `indexOf(…)`) to the operator, > one cannot simply change `predicate: (E) -> Bool` to `isOrderedBefore: (V, > E) -> Bool` as that would result in a type mismatch (`(V, E)` vs. `(E, V)`). > > The use of `lensView` however allows one to do just that without making a > mess. > > ```swift > let searchView = persons.binarySearchView() // sorted! > let index = searchView.indexOf("John Doe") { $0.name } > ``` > > The `lensView` works similarly to how keypaths work in Objective-C, just > type-safe. > > ## Impact on existing code > > The author proposes: > > 1. Moving `CollectionType.indices` to `Indexable` should not cause any > issues with existing code. > > 2. Changing `indexOf(element:)` to `indexOf(element:range:?)` should not > cause any issues with existing code as the `range` argument is optional and > the default behaves like it used to. > > 2. Changing `indexOf(predicate:)` to `indexOf(range:?predicate:)` should > not cause any issues with existing code either, as the `range` argument is > optional and the default behaves like it used to. Swift's trailing-closure > magic allows `indexOf(predicate: …)` to still properly map to > `indexOf(range: nil, predicate: …)`. > > 3. The addition of `rangeOf(…)`, `countOf(…)` and `isSorted(…)` to > `Indexable` do not conflict with any existing API. > > 4. The introduction of a `BinarySearchView` on `Indexable` does not > conflict with any existing API. > > ## Alternatives considered > > The author's initial approach was to implement all methods on Indexable. > This however required all methods to either be provided in two variants > (one for unsorted, one for sorted `Indexable`s): > > ```swift > final public func indexOf(element: _Element, range: Range<Index>? = nil) > -> Index? { > return … > } > > final public func sortedIndexOf(element: _Element, range: Range<Index>? = > nil) -> Index? { > return … > } > ``` > > Or require the introduction of an additional argument: > > ```swift > final public func indexOf(element: _Element, range: Range<Index>? = nil, > isSorted: Bool = true) -> Index? { > if isSorted { > return … > } else { > return … > } > } > ``` > > It also would have cluttered `Indexable` with methods such as > `lowerBoundOf` and `upperBoundOf`, which are only ever applicable when > `self` is sorted accordingly. The introduction of a dedicated > BinarySearchView fixes these issues. > > One also would have had to switch from `predicate: (T) -> Bool` to > `isOrderedBefore: (T, T) -> Bool` for all methods for the second unified > API alternative. > > > > _______________________________________________ > swift-evolution mailing list > [email protected] > https://lists.swift.org/mailman/listinfo/swift-evolution > >
_______________________________________________ swift-evolution mailing list [email protected] https://lists.swift.org/mailman/listinfo/swift-evolution
