Introduction

This proposal addresses significant unexpected performance gaps when using 
dictionaries. It introduces type-specific collections for a Dictionary 
instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key 
lookup and mutable access to dictionary values, enabling updates to be 
performed in-place and allowing copy-on-write optimization of stored values.

 
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because 
LazyMapCollection doesn't know how to forward lookups to the underlying 
dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript 
wraps values in an Optional. This prevents types with copy-on-write 
optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate 
these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
 
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient
 dict.keys Search

Swift coders normally test key membership using nil checks or underscored 
optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but 
they read neither well nor "Swifty". Checking keys reads much better but 
introduces a serious performance penalty: this approach requires a linear 
search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and 
dict.keys.index(of:).

 
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient
 Value Mutation

Dictionary values can be modified through the keyed subscript by direct 
reassignment or by using optional chaining. Both of these statements append 1 
to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The 
second ignores the case where "one" is not a key in the dictionary. It forces 
its check into a higher branch and encourages forced unwrapping. Furthermore, 
neither approach allows the array to grow in place. They introduce an 
unnecessary copy of the array's contents even though dict is the sole holder of 
its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. 
Changing a key stored at a particular index would almost certainly modify its 
hash value, rendering the index incorrect. This violates the requirements of 
the MutableCollection protocol.

 
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed
 Solution

This proposal adds a custom collection for the keys and values dictionary 
properties. This follows the example set by String, which presents multiple 
views of its contents. A new DictionaryKeys collection introduces efficient key 
lookup, while a new DictionaryValues collection provides a mutable collection 
interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a 
dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy 
code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1)  // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. 
This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
 
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed
 design

The standard library introduces two new collection types: DictionaryKeys and 
DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to 
these new types. 
The new collection types are not directly constructable. They are presented 
only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch 
<https://github.com/natecook1000/swift/tree/nc-dictionary>.

 
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact
 on existing code

The performance improvements of using the new DictionaryKeys type and the 
mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal 
should not produce a major impact on existing code. The only impact on existing 
code exists where a program explicitly specifies the type of a dictionary's 
keysor values property. The fix is to change the specified type. 

 
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives
 considered

The Generics Manifesto 
<https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists 
nested generics as a goal. This could impact the naming and structure of these 
new collection types. 

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these 
types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. 
However, because many types in the standard library may be revisited once such 
a feature is available (indices, iterators, etc.), the current lack of nesting 
shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation 
through existing key-based subscripting without the copy-on-write problems of 
the current implementation. I don't know enough about how that would be 
implemented to speak to its feasibility or level of effort. Such a feature 
would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to