I’m dumb when it comes to proper hashing, but it’s such a tediously common 
thing in my experience to need to add Hashable to some kind of a struct so I 
can stash it in a set or use it as a dictionary key. Is there really no way to 
make this all more automatic? I have to be honest - this proposal seems 
*harder* to understand than the way it works now. Of course the easiest would 
be if the language could just do this “good enough" for me using reflection or 
whatever and if I really did run into a problem where I wanted to do this 
myself, I could override something.

Perfect is the enemy of good.

l8r
Sean


> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution 
> <[email protected]> wrote:
> 
> Hi there,
> 
> I've written up a proposal for an overhaul/replacement of the Hashable 
> protocol and would love to hear your feedback on it!
> 
> Rendered | Blog Post
> 
> Cheers,
> Vincent
> 
> Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback 
> on this proposal. 👍
> 
> HashVisitable
> 
>       • Proposal: SE-NNNN
>       • Authors: Vincent Esche
>       • Review Manager: TBD
>       • Status: Awaiting review
> Introduction
> 
> Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) 
> to improve safety, versatility and learnability.
> 
> Motivation
> 
> Implementing Hashable is difficult and the consequences if not done well have 
> dire performance and safety repercussions.
> 
> The documentation of Hashable lists a sample implementation of var hashValue:
> 
> /// A point in an x-y coordinate system.
> struct GridPoint {
>     var x: Int
>     var y: Int
> }
> 
> extension GridPoint: Hashable {
>     var hashValue: Int {
>         return x.hashValue ^ y.hashValue
>     }
> 
>     static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>         return lhs.x == rhs.x && lhs.y == rhs.y
>     }
> }
> 
> Calculating the hashes of all GridPoints (given the above implementation) on 
> a 1000 × 1000 grid …
> 
> let (width, height) = (1000, 1000)
> let total = width * height
> var hashes = Set<Int>()
> for x in 0..<width {
>     for y in 0..<height {
>         hashes.insert(GridPoint(x: x, y: y).hashValue)
>     }
> }
> print("\(hashes.count) unique hashes out of a total of \(total).")
> 
> … results in just 1024 unique hash values for 1_000_000 unique values.
> 
> In other words: The recommended implementation causes 99.9% of values to 
> trigger a hash collision.
> 
> Out of those 1_000_000 values the median collision count was 976 with min and 
> max being 976 and 1000respectively.
> 
> The collision rate will have negative impact in algorithms which heavily use 
> hashValue like the ones in Dictionaryand Set. Furthermore, it increases the 
> vulnerability to DDOS attacks when exposed to the web.
> 
> If even the official Swift documentation gets the implementation of hashValue 
> wrong, then who is to expect the average Swift programmer to do any better?
> 
> In contrast, running the same snippet using HashVisitable and the semi-secure 
> Fnv1aHash (see below) results in zero collisions!
> 
> Finally, the design of the Hashable protocol forces the use of one 
> implementation without the possibility of switching between multiple hashing 
> algorithms.
> 
> Proposed solution
> 
> Instead of coupling the hashing algorithm with each and every Swift type, we 
> should provide a hashing API based on the visitor-pattern. By freeing 
> application developers from the burden of having to implement hashing 
> algorithms, the Standard Library can provide default ones which fulfill 
> collision, performance and security goals. Furthermore, it would allow 
> developers to swap to different algorithms based on the use case.
> 
> Detailed design
> 
> The proposal deprecates the Hashable protocol and introduces the following 
> two:
> 
> protocol Hasher
>  {
>     
> mutating func finish() -> Int
> 
>     
> mutating func write(bytes
> : UnsafeRawBufferPointer)
> }
> 
> 
> protocol HashVisitable
>  {
>     
> func hash<H: Hasher>(_ hasher: inout
>  H)
> }
> 
> Hasher is the protocol which represents a hashing algorithm, and 
> HashVisitable replaces Hashable. For types entirely represented by their 
> memory layout, the following protocol would provide a default implementation:
> 
> protocol ContiguouslyHashable: HashVisitable {}
> 
> extension ContiguouslyHashable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         var mutableSelf = self
>         try! Swift.withUnsafeBytes(of: &mutableSelf) {
>             hasher.write(bytes: $0)
>         }
>     }
> }
> 
> extension Bool : ContiguouslyHashable {}
> extension UInt8 : ContiguouslyHashable {}
> extension UInt16 : ContiguouslyHashable {}
> extension UInt32 : ContiguouslyHashable {}
> extension UInt64 : ContiguouslyHashable {}
> extension UInt : ContiguouslyHashable {}
> extension Int8 : ContiguouslyHashable {}
> extension Int16 : ContiguouslyHashable {}
> extension Int32 : ContiguouslyHashable {}
> extension Int64 : ContiguouslyHashable {}
> extension Int : ContiguouslyHashable {}
> 
> The Standard-Library would then provide a set of hashing implementations 
> specific to each purpose. A possible choice for hashing algorithms would be 
> the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.
> 
> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, 
> which – for the sake of demonstration – could be implemented as follows:
> 
> struct Fnv1aHash
>  {
>     
> fileprivate var state: UInt
> 
>     
> init(seed: UInt
> ) {
>         
> self.state = seed &+ 14695981039346656037
> 
>     }
> }
> 
> 
> extension Fnv1aHash: Hasher 
> {
>     
> mutating func write(bytes
> : UnsafeRawBufferPointer) {
>         
> for byte in
>  bytes {
>             
> self.state = (self.state ^ UInt(byte)) &* 1099511628211
> 
>         }
>     }
>     
> mutating func finish() -> Int
>  {
>         
> return unsafeBitCast(self.state, to: Int.self
> )
>     }
> }
> 
> Coming back to the sample code present in the Hashable documentation, the new 
> implementation would look like:
> 
> extension GridPoint: HashVisitable 
> {
>     
> func hash<H: Hasher>(_ hasher: inout
>  H) {
>         
> self.x.hash(&
> hasher)
>         
> self.y.hash(&
> hasher)
>     }
> }
> 
> Source compatibility
> 
> Making use of "extending protocols to conform to protocols":
> 
> extension Hashable: HashVisitable 
> {
>     
> func hash<H: Hasher>(_ hasher: inout
>  H) {
>         
> self.hashValue.hash(&
> hasher)
>     }
> }
> 
> Effect on ABI stability
> 
> n/a
> 
> Effect on API resilience
> 
> This feature should be possible to add/remove without breaking ABI.
> 
> Alternatives considered
> 
> n/a
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to