> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution > <[email protected]> wrote: > > I’m dumb when it comes to proper hashing, but it’s such a tediously common > thing in my experience to need to add Hashable to some kind of a struct so I > can stash it in a set or use it as a dictionary key. Is there really no way > to make this all more automatic? I have to be honest - this proposal seems > *harder* to understand than the way it works now.
It's not really harder: just call hash on each of your type's significant values: x.hash(&hasher) y.hash(&hasher) How would you implement hashValue in a simpler way, remembering that 'x ^ y' is an incorrect implementation? > Of course the easiest would be if the language could just do this “good > enough" for me using reflection or whatever and if I really did run into a > problem where I wanted to do this myself, I could override something. > > Perfect is the enemy of good. > > l8r > Sean > > >> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution >> <[email protected]> wrote: >> >> Hi there, >> >> I've written up a proposal for an overhaul/replacement of the Hashable >> protocol and would love to hear your feedback on it! >> >> Rendered | Blog Post >> >> Cheers, >> Vincent >> >> Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback >> on this proposal. 👍 >> >> HashVisitable >> >> • Proposal: SE-NNNN >> • Authors: Vincent Esche >> • Review Manager: TBD >> • Status: Awaiting review >> Introduction >> >> Replace the Hashable protocol by two new procotols (Hasher and >> HashVisitable) to improve safety, versatility and learnability. >> >> Motivation >> >> Implementing Hashable is difficult and the consequences if not done well >> have dire performance and safety repercussions. >> >> The documentation of Hashable lists a sample implementation of var hashValue: >> >> /// A point in an x-y coordinate system. >> struct GridPoint { >> var x: Int >> var y: Int >> } >> >> extension GridPoint: Hashable { >> var hashValue: Int { >> return x.hashValue ^ y.hashValue >> } >> >> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { >> return lhs.x == rhs.x && lhs.y == rhs.y >> } >> } >> >> Calculating the hashes of all GridPoints (given the above implementation) on >> a 1000 × 1000 grid … >> >> let (width, height) = (1000, 1000) >> let total = width * height >> var hashes = Set<Int>() >> for x in 0..<width { >> for y in 0..<height { >> hashes.insert(GridPoint(x: x, y: y).hashValue) >> } >> } >> print("\(hashes.count) unique hashes out of a total of \(total).") >> >> … results in just 1024 unique hash values for 1_000_000 unique values. >> >> In other words: The recommended implementation causes 99.9% of values to >> trigger a hash collision. >> >> Out of those 1_000_000 values the median collision count was 976 with min >> and max being 976 and 1000respectively. >> >> The collision rate will have negative impact in algorithms which heavily use >> hashValue like the ones in Dictionaryand Set. Furthermore, it increases the >> vulnerability to DDOS attacks when exposed to the web. >> >> If even the official Swift documentation gets the implementation of >> hashValue wrong, then who is to expect the average Swift programmer to do >> any better? >> >> In contrast, running the same snippet using HashVisitable and the >> semi-secure Fnv1aHash (see below) results in zero collisions! >> >> Finally, the design of the Hashable protocol forces the use of one >> implementation without the possibility of switching between multiple hashing >> algorithms. >> >> Proposed solution >> >> Instead of coupling the hashing algorithm with each and every Swift type, we >> should provide a hashing API based on the visitor-pattern. By freeing >> application developers from the burden of having to implement hashing >> algorithms, the Standard Library can provide default ones which fulfill >> collision, performance and security goals. Furthermore, it would allow >> developers to swap to different algorithms based on the use case. >> >> Detailed design >> >> The proposal deprecates the Hashable protocol and introduces the following >> two: >> >> protocol Hasher >> { >> >> mutating func finish() -> Int >> >> >> mutating func write(bytes >> : UnsafeRawBufferPointer) >> } >> >> >> protocol HashVisitable >> { >> >> func hash<H: Hasher>(_ hasher: inout >> H) >> } >> >> Hasher is the protocol which represents a hashing algorithm, and >> HashVisitable replaces Hashable. For types entirely represented by their >> memory layout, the following protocol would provide a default implementation: >> >> protocol ContiguouslyHashable: HashVisitable {} >> >> extension ContiguouslyHashable { >> func hash<H: Hasher>(_ hasher: inout H) { >> var mutableSelf = self >> try! Swift.withUnsafeBytes(of: &mutableSelf) { >> hasher.write(bytes: $0) >> } >> } >> } >> >> extension Bool : ContiguouslyHashable {} >> extension UInt8 : ContiguouslyHashable {} >> extension UInt16 : ContiguouslyHashable {} >> extension UInt32 : ContiguouslyHashable {} >> extension UInt64 : ContiguouslyHashable {} >> extension UInt : ContiguouslyHashable {} >> extension Int8 : ContiguouslyHashable {} >> extension Int16 : ContiguouslyHashable {} >> extension Int32 : ContiguouslyHashable {} >> extension Int64 : ContiguouslyHashable {} >> extension Int : ContiguouslyHashable {} >> >> The Standard-Library would then provide a set of hashing implementations >> specific to each purpose. A possible choice for hashing algorithms would be >> the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8. >> >> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, >> which – for the sake of demonstration – could be implemented as follows: >> >> struct Fnv1aHash >> { >> >> fileprivate var state: UInt >> >> >> init(seed: UInt >> ) { >> >> self.state = seed &+ 14695981039346656037 >> >> } >> } >> >> >> extension Fnv1aHash: Hasher >> { >> >> mutating func write(bytes >> : UnsafeRawBufferPointer) { >> >> for byte in >> bytes { >> >> self.state = (self.state ^ UInt(byte)) &* 1099511628211 >> >> } >> } >> >> mutating func finish() -> Int >> { >> >> return unsafeBitCast(self.state, to: Int.self >> ) >> } >> } >> >> Coming back to the sample code present in the Hashable documentation, the >> new implementation would look like: >> >> extension GridPoint: HashVisitable >> { >> >> func hash<H: Hasher>(_ hasher: inout >> H) { >> >> self.x.hash(& >> hasher) >> >> self.y.hash(& >> hasher) >> } >> } >> >> Source compatibility >> >> Making use of "extending protocols to conform to protocols": >> >> extension Hashable: HashVisitable >> { >> >> func hash<H: Hasher>(_ hasher: inout >> H) { >> >> self.hashValue.hash(& >> hasher) >> } >> } >> >> Effect on ABI stability >> >> n/a >> >> Effect on API resilience >> >> This feature should be possible to add/remove without breaking ABI. >> >> Alternatives considered >> >> n/a >> _______________________________________________ >> swift-evolution mailing list >> [email protected] >> https://lists.swift.org/mailman/listinfo/swift-evolution > > _______________________________________________ > swift-evolution mailing list > [email protected] > https://lists.swift.org/mailman/listinfo/swift-evolution _______________________________________________ swift-evolution mailing list [email protected] https://lists.swift.org/mailman/listinfo/swift-evolution
