I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now. Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.
Perfect is the enemy of good. l8r Sean > On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution > <[email protected]> wrote: > > Hi there, > > I've written up a proposal for an overhaul/replacement of the Hashable > protocol and would love to hear your feedback on it! > > Rendered | Blog Post > > Cheers, > Vincent > > Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback > on this proposal. 👍 > > HashVisitable > > • Proposal: SE-NNNN > • Authors: Vincent Esche > • Review Manager: TBD > • Status: Awaiting review > Introduction > > Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) > to improve safety, versatility and learnability. > > Motivation > > Implementing Hashable is difficult and the consequences if not done well have > dire performance and safety repercussions. > > The documentation of Hashable lists a sample implementation of var hashValue: > > /// A point in an x-y coordinate system. > struct GridPoint { > var x: Int > var y: Int > } > > extension GridPoint: Hashable { > var hashValue: Int { > return x.hashValue ^ y.hashValue > } > > static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { > return lhs.x == rhs.x && lhs.y == rhs.y > } > } > > Calculating the hashes of all GridPoints (given the above implementation) on > a 1000 × 1000 grid … > > let (width, height) = (1000, 1000) > let total = width * height > var hashes = Set<Int>() > for x in 0..<width { > for y in 0..<height { > hashes.insert(GridPoint(x: x, y: y).hashValue) > } > } > print("\(hashes.count) unique hashes out of a total of \(total).") > > … results in just 1024 unique hash values for 1_000_000 unique values. > > In other words: The recommended implementation causes 99.9% of values to > trigger a hash collision. > > Out of those 1_000_000 values the median collision count was 976 with min and > max being 976 and 1000respectively. > > The collision rate will have negative impact in algorithms which heavily use > hashValue like the ones in Dictionaryand Set. Furthermore, it increases the > vulnerability to DDOS attacks when exposed to the web. > > If even the official Swift documentation gets the implementation of hashValue > wrong, then who is to expect the average Swift programmer to do any better? > > In contrast, running the same snippet using HashVisitable and the semi-secure > Fnv1aHash (see below) results in zero collisions! > > Finally, the design of the Hashable protocol forces the use of one > implementation without the possibility of switching between multiple hashing > algorithms. > > Proposed solution > > Instead of coupling the hashing algorithm with each and every Swift type, we > should provide a hashing API based on the visitor-pattern. By freeing > application developers from the burden of having to implement hashing > algorithms, the Standard Library can provide default ones which fulfill > collision, performance and security goals. Furthermore, it would allow > developers to swap to different algorithms based on the use case. > > Detailed design > > The proposal deprecates the Hashable protocol and introduces the following > two: > > protocol Hasher > { > > mutating func finish() -> Int > > > mutating func write(bytes > : UnsafeRawBufferPointer) > } > > > protocol HashVisitable > { > > func hash<H: Hasher>(_ hasher: inout > H) > } > > Hasher is the protocol which represents a hashing algorithm, and > HashVisitable replaces Hashable. For types entirely represented by their > memory layout, the following protocol would provide a default implementation: > > protocol ContiguouslyHashable: HashVisitable {} > > extension ContiguouslyHashable { > func hash<H: Hasher>(_ hasher: inout H) { > var mutableSelf = self > try! Swift.withUnsafeBytes(of: &mutableSelf) { > hasher.write(bytes: $0) > } > } > } > > extension Bool : ContiguouslyHashable {} > extension UInt8 : ContiguouslyHashable {} > extension UInt16 : ContiguouslyHashable {} > extension UInt32 : ContiguouslyHashable {} > extension UInt64 : ContiguouslyHashable {} > extension UInt : ContiguouslyHashable {} > extension Int8 : ContiguouslyHashable {} > extension Int16 : ContiguouslyHashable {} > extension Int32 : ContiguouslyHashable {} > extension Int64 : ContiguouslyHashable {} > extension Int : ContiguouslyHashable {} > > The Standard-Library would then provide a set of hashing implementations > specific to each purpose. A possible choice for hashing algorithms would be > the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8. > > FNV-1A is another popular semi-secure but blazingly fast hash algorithm, > which – for the sake of demonstration – could be implemented as follows: > > struct Fnv1aHash > { > > fileprivate var state: UInt > > > init(seed: UInt > ) { > > self.state = seed &+ 14695981039346656037 > > } > } > > > extension Fnv1aHash: Hasher > { > > mutating func write(bytes > : UnsafeRawBufferPointer) { > > for byte in > bytes { > > self.state = (self.state ^ UInt(byte)) &* 1099511628211 > > } > } > > mutating func finish() -> Int > { > > return unsafeBitCast(self.state, to: Int.self > ) > } > } > > Coming back to the sample code present in the Hashable documentation, the new > implementation would look like: > > extension GridPoint: HashVisitable > { > > func hash<H: Hasher>(_ hasher: inout > H) { > > self.x.hash(& > hasher) > > self.y.hash(& > hasher) > } > } > > Source compatibility > > Making use of "extending protocols to conform to protocols": > > extension Hashable: HashVisitable > { > > func hash<H: Hasher>(_ hasher: inout > H) { > > self.hashValue.hash(& > hasher) > } > } > > Effect on ABI stability > > n/a > > Effect on API resilience > > This feature should be possible to add/remove without breaking ABI. > > Alternatives considered > > n/a > _______________________________________________ > swift-evolution mailing list > [email protected] > https://lists.swift.org/mailman/listinfo/swift-evolution _______________________________________________ swift-evolution mailing list [email protected] https://lists.swift.org/mailman/listinfo/swift-evolution
