Oh yeah. Huge +1 from me as I’ve had to implement most of these things myself more than once.
Some notes: grouped(by:) seems to be missing from the detailed design section and I think it would apply to Set as well, but it’s not mentioned there either. The changes for remove(at:) seem odd since the motivation seems to be to remove all elements in the dictionary quickly, but Dictionary already has a removeAll(keepingCapacity:) function. I may be missing something. l8r Sean > On Mar 31, 2017, at 8:28 AM, Nate Cook via swift-evolution > <[email protected]> wrote: > > Hello everyone! > > Here is a draft proposal to fill in some usability gaps in the standard > library Dictionary and Set types. I'd love to hear any feedback from the list > before submitting this as a PR. > > Thanks! > Nate > > > > Dictionary & Set Enhancements > > • Proposal: SE-0000 > • Author: Nate Cook > • Review Manager: TBD > • Status: Awaiting review > > Introduction > > This proposal comprises a variety of commonly (and less commonly) suggested > improvements to the standard library's Dictionary type, from merging > initializers to dictionary-specific filter and mapValues methods. The > proposed additions to Dictionary, and the corresponding changes to Set, are > detailed in the sections below. > > • Suggested Improvements: > • Merging initializers and methods > • Key-based subscript with default value > • Dictionary-specific map and filter > • Visible Dictionary capacity > • Dictionary.remove(at:) returns next index > • A grouped(by:) method for sequences > • Relevant changes to Set > • Detailed Design > • Sandbox with Prototype > • Source Compatibility > > 1. Merging initializers and methods > > The Dictionary type should allow initialization from a sequence of (Key, > Value) tuples and offer methods that merge a sequence of (Key, Value) tuples > into a new or existing dictionary, using a closure to combine values for > duplicate keys. > > • First message of discussion thread > • Initial proposal draft > • Prior standalone proposal (SE-100) > Array and Set both have initializers that create a new instance from a > sequence of elements. The Array initializer is useful for converting other > sequences and collections to the "standard" collection type, while the Set > initializer is essential for recovering set operations after performing any > functional operations on a set. For example, filtering a set produces a > collection without any set operations available. > > let numberSet = Set(1 ... 100 > ) > > let fivesOnly = numberSet.lazy.filter { $0 % 5 == 0 } > fivesOnly is a LazyFilterCollection<Set<Int>> instead of a Set—sending that > back through the Set sequence initializer restores the expected methods. > > let fivesOnlySet = Set(numberSet.lazy.filter { $0 % 5 == 0 > }) > fivesOnlySet. > isSubsetOf(numberSet) // true > Dictionary, on the other hand, has no such initializer, so a similar > operation leaves no room to recover dictionary functionality without building > a mutable Dictionary via iteration or functional methods. These techniques > also don't support type inference from the source sequence, increasing > verbosity. > > let numberDictionary = ["one": 1, "two": 2, "three": 3, "four": 4 > ] > > let evenOnly = numberDictionary.lazy.filter { (_, value) in > > value > % 2 == 0 > > } > > > var viaIteration: [String: Int] = [: > ] > > for (key, value) in > evenOnly { > viaIteration[key] > = > value > } > > > let viaReduce: [String: Int] = evenOnly.reduce([:]) { (cumulative, kv) in > > > var dict = > cumulative > dict[kv. > key] = kv.value > > > return > dict > } > > Beyond initialization, Array and Set both also provide a method to add a new > block of elements to an existing collection. Array provides this via > append(contentsOf:) for the common appending case or replaceSubrange(_:with:) > for general inserting or replacing, while the unordered Set type lets you > pass any sequence to unionInPlace(_:) to add elements to an existing set. > > Once again, Dictionary has no corresponding API -- looping and adding > elements one at a time as shown above is the only way to merge new elements > into an existing dictionary. > > > Proposed solution > > This proposal puts forward two new ways to convert (Key, Value) sequences to > dictionary form: a full-width, failable initializer and a set of merging APIs > that handle input data with duplicate keys. > > > Sequence-based initializer > > The proposed solution would add a new, failable initializer to Dictionary > that accepts any sequence of (Key, Value)tuple pairs. > > init?<S: Sequence where S.Iterator.Element == (key: Key, value: Value > )>( > > _ keysAndValues: S) > Instead of the techniques for recovering a Dictionary instance shown above, > the proposed initializer would allow a much cleaner syntax. > > let viaProposed = Dictionary(evenOnly)! > Like Array.init(_:) and Set.init(_:), this is a full-width initializer. To > ensure this, the initializer requires that each key in the supplied sequence > is unique, and returns nil whenever that condition isn't met. This model > prevents accidentally dropping values for keys that might be duplicated, but > allows easier recovery than the trap that results from duplicate keys in a > dictionary literal. > > The new initializer allows for some convenient uses that aren't currently > possible. > > • Initializing from a DictionaryLiteral (the type, not an actual > literal) > > let literal: DictionaryLiteral = ["a": 1, "b": 2, "c": 3, "d": 4 > ] > > let dictFromDL = Dictionary(literal)! > • Swapping keys and values of an existing dictionary > > guard let reversedDict = Dictionary(dictFromDL.map { ($1, $0 > ) }) > > else { throw Errors.reversalFailed > } > > // [2: "b", 4: "d", 1: "a", 3: "c"] > • Converting an array to an indexed dictionary (popular on the thread) > > let names = ["Cagney", "Lacey", "Bensen" > ] > > let dict = Dictionary(names.enumerated().map { (i, val) in (i + 1, val) })! > // [2: "Lacey", 3: "Bensen", 1: "Cagney"] > • Initializing from a pair of zipped sequences (examples abound) > > let letters = "abcdef".characters.lazy.map(String.init > ) > > let dictFromZip = Dictionary(zip(letters, 1...10))! > // ["b": 2, "e": 5, "a": 1, "f": 6, "d": 4, "c": 3] > This particular use is currently blocked by SR-922. As a workaround, add .map > {(key: $0, value: $1)}. > > Merging initializer and methods > > Creating a Dictionary from a dictional literal currently checks the keys for > uniqueness, trapping on a duplicate. The sequence-based initializer shown > above has the same requirements, failing and returning nil when encountering > duplicate keys. > > let duplicates: DictionaryLiteral = ["a": 1, "b": 2, "a": 3, "b": 4 > ] > > let letterDict = Dictionary > (duplicates) > > // nil > However, some use cases can be forgiving of duplicate keys, so this proposal > includes a second new initializer. This initializer allows the caller to > supply, along with the sequence, a combining closure that's called with the > old and new values for any duplicate keys. > > init<S: Sequence > >( > > merging keysAndValues > : S, > > resolvingCollisionsWith combine: (Value, Value) throws -> Value > > ) > rethrows where S.Iterator.Element == (key: Key, value: Value) > This example shows how one could keep the first value of all those supplied > for a duplicate key. > > let letterDict2 = Dictionary(merging: duplicates, resolvingCollisionsWith: { > (first, _) in > first }) > > // ["b": 2, "a": 1] > Or the largest value for any duplicate keys. > > let letterDict3 = Dictionary(merging: duplicates, resolvingCollisionsWith > : max) > > // ["b": 4, "a": 3] > At other times the merging initializer could be used to combine values for > duplicate keys. Donnacha Oisín Kidney wrote a neat frequencies() method for > sequences as an example of such a use in the thread. > > extension Sequence where Iterator.Element: Hashable > { > > func frequencies() -> [Iterator.Element: Int > ] { > > return Dictionary(merging: self.lazy.map { v in (v, 1) }, > resolvingCollisionsWith: + > ) > } > } > [ > 1, 2, 2, 3, 1, 2, 4, 5, 3, 2, 3, 1].frequencies > () > > // [2: 4, 4: 1, 5: 1, 3: 3, 1: 3] > This proposal also includes new mutating and non-mutating methods for > Dictionary that merge the contents of a sequence of (Key, Value) tuples into > an existing dictionary, merge(contentsOf:) and merged(with:). > > mutating func merge<S: Sequence > >( > > contentsOf other > : S, > > resolvingCollisionsWith combine: (Value, Value) throws -> Value > > ) > rethrows where S.Iterator.Element == (key: Key, value: Value > ) > > func merged<S: Sequence > >( > > with other > : S, > > resolvingCollisionsWith combine: (Value, Value) throws -> Value > > ) > rethrows -> [Key: Value] where S.Iterator.Element == (key: Key, value: Value) > As above, there are a wide variety of uses for the merge. > > // Adding default values > let defaults: [String: Bool] = ["foo": false, "bar": false, "baz": false > ] > > var options: [String: Bool] = ["foo": true, "bar": false > ] > options. > merge(contentsOf: defaults) { (old, _) in > old } > > // options is now ["foo": true, "bar": false, "baz": false] > > > // Summing counts repeatedly > var bugCounts: [String: Int] = ["bees": 9, "ants": 112, ... > ] > > while bugCountingSource.hasMoreData > () { > bugCounts. > merge(contentsOf: bugCountingSource.countMoreBugs(), resolvingCollisionsWith: > + > ) > } > > To simplify two common uses of the merging initializer, this proposal > includes the addition of two new top-level functions to the standard library: > first(_:_:) and last(_:_:), which return their first and last arguments, > respectively. These new functions can be passed instead of a custom closure: > > let firstWins = Dictionary(merging: duplicates, resolvingCollisionsWith > : first) > > // ["b": 2, "a": 1] > > let lastWins = Dictionary(merging: duplicates, resolvingCollisionsWith > : last) > > // ["b": 4, "a": 3] > > Alternatives Considered > > The first(_:_:) and last(_:_:) functions may not pull their weight as > top-level symbols. Instead, at the cost of additional overloads for the > merging initializer and methods, we could allow users to specify the useFirst > and useLast cases of a MergeCollisionStrategy enumeration. > > extension Dictionary > { > > /// The strategy to use when merging a sequence of key-value pairs into a > dictionary. > enum MergeCollisionStrategy > { > > /// If there is more than one instance of a key in the sequence to merge, use > /// only the first value for the dictionary. > case useFirst > > > /// If there is more than one instance of a key in the sequence to merge, use > /// only the last value for the dictionary. > case useLast > > } > > > init<S: Sequence > >( > > merging keysAndValues > : S, > > resolvingCollisionsWith strategy > : MergeCollisionStrategy > ) > > > // other merging overloads > } > In use, this overload would look similar to the functional version, but may > aid in discoverability: > > let firstWins = Dictionary(merging: duplicates, resolvingCollisionsWith: > .useFirst > ) > > // ["b": 2, "a": 1] > > let lastWins = Dictionary(merging: duplicates, resolvingCollisionsWith: > .useLast > ) > > // ["b": 4, "a": 3] > > 2. Key-based subscript with default value > > Another common challenge with dictionaries is iteratively making changes to > key/value pairs that may or may not already be present. For example, to > iteratively add count the frequencies of letters in a string, one might write > something like the following: > > let source = "how now brown cow" > var frequencies: [Character: Int] = [: > ] > > for c in source.characters > { > > if frequencies[c] == nil > { > frequencies[c] > = 1 > > } > else > { > frequencies[c] > ! += 1 > > } > } > > Testing for nil and assigning through the force unwrapping operator are > awkward at best for such a common operation. Furthermore, the Optional<Value> > return type of the current keyed subscript complicates efficiencies that > could be gained for this type of modify action under a future ownership model. > > > Proposed solution > > A keyed subscript with a default value neatly simplifies this usage. Instead > of checking for nil, one can pass the default value along with the key as a > default subscript parameter. > > let source = "how now brown cow" > var frequencies: [Character: Int] = [: > ] > > for c in source.characters > { > frequencies[c, > default: 0] += 1 > > } > > The return type of this subscript is a non-optional Value. Note that > accessing the subscript as a getter never stores the default value in the > dictionary—the following two lines are equivalent: > > let x = frequencies["a", default: 0 > ] > > let y = frequencies["a"] ?? 0 > > 3. Dictionary-specific map and filter > > The standard map and filter methods, while always useful and beloved, aren't > ideal when applied to dictionaries. In both cases, the desired end-result is > frequently another dictionary instead of an array of key-value pairs—even > with the sequence-based initializer proposed above this is an inefficient way > of doing things. > > Additionally, the standard map method doesn't gracefully support passing a > function when transforming only the values of a dictionary. The transform > function must accept and return key/value pairs, necessitating a custom > closure in nearly every case. > > Assuming the addition of a sequence-based initializer, the current filter and > map look like the following: > > let numbers = ["one": 1, "two": 2, "three": 3, "four": 4 > ] > > let evens = Dictionary(numbers.lazy.filter { $0.value % 2 == 0 })! > // ["four": 4, "two": 2] > let strings = Dictionary(numbers.lazy.map { (k, v) in (k, String(v)) })! > // ["three": "3", "four": "4", "one": "1", "two": "2"] > > Proposed solution > > This proposal adds two new methods for Dictionary: > > • A mapValues method that keeps a dictionary's keys intact while > transforming the values. Mapping a dictionary's key/value pairs can't always > produce a new dictionary, due to the possibility of key collisions, but > mapping only the values can produce a new dictionary with the same underlying > layout as the original. > > let strings = numbers.mapValues(String.init > ) > > // ["three": "3", "four": "4", "one": "1", "two": "2"] > • A Dictionary-returning filter method. While transforming the keys and > values of a dictionary can result in key collisions, filtering the elements > of a dictionary can at worst replicate the entire dictionary. > > let evens = numbers.filter { $0.value % 2 == 0 > } > > // ["four": 4, "two": 2] > Both of these can be made significantly more efficient than their > Sequence-sourced counterparts. For example, the mapValues method can simply > copy the portion of the storage that holds the keys to the new dictionary > before transforming the values. > > > 4. Visible dictionary capacity > > As you add elements to a dictionary, it automatically grows its backing > storage as necessary. This reallocation is a significant operation—unlike > arrays, where the existing elements can be copied to a new block of storage > en masse, every key/value pair must be moved over individually, recalculating > the hash value for the key to find its position in the larger backing buffer. > > While dictionaries uses an exponential growth strategy to make this as > efficient as possible, beyond the init(minimumCapacity:) initializer they do > not expose a way to reserve a specific capacity. In addition, adding a > key/value pair to a dictionary is guaranteed not to invalidate existing > indices as long as the capacity doesn't change, yet we don't provide any way > of seeing a dictionary's current or post-addition capacity. > > > Proposed solution > > Dictionary should add a capacity property and a reserveCapacity(_:) method, > like those used in range-replaceable collections. The capacity of a > dictionary is the number of elements it can hold without reallocating a > larger backing storage, while calling reserveCapacity(_:) allocates a large > enough buffer to hold the requested number of elements without reallocating. > > var numbers = ["one": 1, "two": 2, "three": 3, "four": 4 > ] > numbers. > capacity // 6 > numbers.reserveCapacity(20 > ) > numbers. > capacity // 24 > Because hashed collections use extra storage capacity to reduce the > likelihood and cost of collisions, the value of the capacity property won't > be equal to the actual size of the backing storage. Likewise, the capacity > after calling reserveCapacity(_:) will be at least as large as the argument, > but usually larger. (In its current implementation, Dictionary always has a > power of 2-sized backing storage.) > > > 5. Dictionary.remove(at:) returns next index > > The method dictionaries use to store their key/value pairs can make it > challenging to sequentially remove elements in an efficient way. To > demonstrate, considering the following hypothetical dictionary: > > var dict = Dictionary<Int, Bool>(minimumCapacity: 5 > ) > dict[ > 3] = true > > dict[ > 7] = true > > dict[ > 1] = false > To add those three elements, dict performs the following steps: > > • Uses 3.hashValue to select a bucket, choosing bucket 5. > • Stores 3 and true at position 5 in the key and value storage, > respectively. > • Uses 7.hashValue to select a bucket, choosing bucket 2. > • Stores 7 and true at position 2. > • Uses 1.hashValue to select a bucket, choosing bucket 5. Collision! > Advances to the next open space, bucket 6. > • Stores 1 and false at position 6. > With these three elements, we have a storage layout depicted by the table > below—7 and 3 are in their ideal buckets, but 1 is not: > > bucket 0 1 2 3 4 5 6 7 > key 7 3 1 > value T T F > To write an algorithm that removes each element from the dictionary, we would > want to do something like this: > > var i = dict.startIndex > while i != dict.endIndex > { > > let next = dict.index(after > : i) > dict. > remove(at > : i) > i > = > next > } > > This will remove (7, true) without incident, but when it removes (3, true), > the dictionary will need to shift (1, false)back one slot so that it is found > in bucket 5. This shift will invalidate the next index that has already been > calculated. With the current index invalidation rules, there's no way to do > this efficiently. > > > Proposed solution > > If the remove(at:) method returns the next valid index, this kind of > algorithm is possible. remove(at:) already returns the key/value pair that > was removed, so this would change the return type to a tuple: > > @discardableResult > mutating func remove(at index: Index) -> > > ( > removed: (key: Key, value: Value), nextIndex: Index) > The code above can be rewritten as the following. When (1, false) is shifted > back into bucket 5, there is no problem, since the method can return that > same index. > > var i = dict.startIndex > while i != dict.endIndex > { > ( > _, i) = dict.remove(at > : i) > } > > This new capability could be used to implement an efficient in-place filter. > > > 6. A grouped(by:) method for sequences > > As a final Dictionary-related issue, grouping elements in a sequence by some > computed key is a commonly requested addition that we can add as part of this > omnibus proposal. Pass a closure that converts each value in a sequence to a > hashable type T; the result of the method is a dictionary with keys of type T > and values of type [Iterator.Element]. > > let names = ["Patti", "Aretha", "Anita", "Gladys" > ] > > > // By first letter > names.grouped(by: { $0.characters.first! > }) > > // ["P": ["Patti"], "G": ["Gladys"], "A": ["Aretha", "Anita"]] > > // By name length > names.grouped { $0.utf16.count > } > > // [5: ["Patti", "Anita"], 6: ["Aretha", "Gladys"]] > > 7. Apply relevant changes to Set > > As the Set and Dictionary types are similar enough to share large chunks of > their implementations, it makes sense to look at which of these Dictionary > enhancements can also be applied to Set. Of the symbols proposed above, the > following additions would also be useful and appropriate for the Set type: > > • Add a Set-specific filter method that returns a new set—this would > function essentially as a predicate-based intersection method. > • Add a capacity property and reserveCapacity() method, for the reasons > listed above. > • Modify remove(at:) to return the index of the next entry, likewise. > > Detailed design > > With the exception of the proposed capacity property and method, the proposed > additions to Dictionary, Set, and Sequence are available in this Swift > Sandbox. Note that this prototype is not a proposed implementation; rather a > way to try out the behavior of the proposed changes. > > Collected in one place, these are the new APIs for Dictionary, Set, and > Sequence: > > struct Dictionary<Key: Hashable, Value > > { > > typealias Element = (key: Key, value: Value > ) > > > // existing declarations > > > /// Creates a new dictionary using the key/value pairs in the given sequence. > /// If the given sequence has any duplicate keys, the result is `nil`. > init?<S: Sequence>(_ keysAndValues: S) where S.Iterator.Element == Element > > > > /// Creates a new dictionary using the key/value pairs in the given sequence, > /// using a combining closure to determine the value for any duplicate > keys. > init<S: Sequence > >( > > merging keysAndValues > : S, > > resolvingCollisionsWith combine: (Value, Value) throws -> Value > > ) > rethrows where S.Iterator.Element == Element > > > > /// Merges the key/value pairs in the given sequence into the dictionary, > /// using a combining closure to determine the value for any duplicate > keys. > mutating func merge<S: Sequence > >( > > contentsOf other > : S, > > resolvingCollisionsWith combine: (Value, Value) throws -> Value > > ) > rethrows where S.Iterator.Element == Element > > > > /// Returns a new dictionary created by merging the key/value pairs in the > /// given sequence into the dictionary, using a combining closure to > determine > /// the value for any duplicate keys. > func merged<S: Sequence > >( > > with other > : S, > > resolvingCollisionsWith combine: (Value, Value) throws -> Value > > ) > rethrows -> [Key: Value] where S.Iterator.Element == Element > > > > /// Accesses the element with the given key, or the specified default value, > /// if the dictionary doesn't contain the given key. > subscript(key: Key, default defaultValue: Value) -> Value { get set > } > > > /// Returns a new dictionary containing the key/value pairs that satisfy > /// the given predicate. > func filter(_ isIncluded: (Key, Value) throws -> Bool) rethrows -> [Key: > Value > ] > > > /// Returns a new dictionary containing the existing keys and the results of > /// mapping the given closure over the dictionary's values. > func mapValues<T>(_ transform: (Value) throws -> T) rethrows -> [Key > : T] > > > /// The number of key/value pairs that can be stored by the dictionary without > /// reallocating storage. > var capacity: Int { get > } > > > /// Ensures that the dictionary has enough storage for `capacity` key/value > /// pairs. > var reserveCapacity(_ capacity: Int > ) > > > /// Removes the key/value pair at the specified index. > /// > /// If you use `remove(at:)` while iterating through the contents of a > /// dictionary, continue iterating using the index returned as > `nextIndex`. > /// Calling this method invalidates all other previously existing indices. > /// > /// - Returns: A tuple containing the removed key/value pair and the index > /// of the next pair in the dictionary. > @discardableResult > > > mutating func remove(at index: Index) -> > > ( > removed: (key: Key, value: Value), nextIndex: Index > ) > } > > > extension Sequence > { > > /// Returns a dictionary where the keys are the groupings returned by > /// the given closure and the values are arrays of the elements that > /// returned each specific key. > func grouped<Key: Hashable > >( > > by grouping: (Iterator.Element) throws -> Key > > ) > rethrows -> [Key: [Iterator.Element > ]] > } > > > struct Set<Element: Hashable > > { > > // existing declarations > > > /// Returns a new set containing the elements that satisfy the given > predicate. > func filter(_ isIncluded: (Element) throws -> Bool) rethrows -> Set > > > > /// The number of elements that can be stored by the set without > /// reallocating storage. > var capacity: Int { get > } > > > /// Ensures that the set has enough storage for `capacity` elements. > var reserveCapacity(_ capacity: Int > ) > > > /// Removes the element at the specified index. > /// > /// If you use `remove(at:)` while iterating through the contents of a > /// set, continue iterating using the index returned as `nextIndex`. > /// Calling this method invalidates all other previously existing indices. > /// > /// - Returns: A tuple containing the removed element and the index > /// of the next element in the set. > @discardableResult > > > mutating func remove(at index: Index) -> (removed: Element, nextIndex: Index > ) > } > > > /// Returns its first argument. > func first<T>(_ a: T, _ b: T) -> > T > > > /// Returns its last argument. > func last<T>(_ a: T, _ b: T) -> T > > Source compatibility > > A significant majority of the proposed additions are purely additive and > should impose no source compatibility burden. The modified return return type > for the two remove(at:) methods, while additive in nature, will break source > compatibility in the cases where the removed value is captured. In theory, a > fix-it should be possible that would help in these cases. > > _______________________________________________ > swift-evolution mailing list > [email protected] > https://lists.swift.org/mailman/listinfo/swift-evolution _______________________________________________ swift-evolution mailing list [email protected] https://lists.swift.org/mailman/listinfo/swift-evolution
