Is there any reason the API couldn’t be expressed as something along these 
lines?

func hash() -> [Hashable] {
  return [x, y]
}

l8r
Sean


> On Mar 13, 2017, at 2:15 PM, David Hart <[email protected]> wrote:
> 
>> 
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution 
>> <[email protected]> wrote:
>> 
>> I’m dumb when it comes to proper hashing, but it’s such a tediously common 
>> thing in my experience to need to add Hashable to some kind of a struct so I 
>> can stash it in a set or use it as a dictionary key. Is there really no way 
>> to make this all more automatic? I have to be honest - this proposal seems 
>> *harder* to understand than the way it works now.
> 
> It's not really harder: just call hash on each of your type's significant 
> values:
> 
> x.hash(&hasher)
> y.hash(&hasher)
> 
> How would you implement hashValue in a simpler way, remembering that 'x ^ y' 
> is an incorrect implementation?
> 
>> Of course the easiest would be if the language could just do this “good 
>> enough" for me using reflection or whatever and if I really did run into a 
>> problem where I wanted to do this myself, I could override something.
>> 
>> Perfect is the enemy of good.
>> 
>> l8r
>> Sean
>> 
>> 
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution 
>>> <[email protected]> wrote:
>>> 
>>> Hi there,
>>> 
>>> I've written up a proposal for an overhaul/replacement of the Hashable 
>>> protocol and would love to hear your feedback on it!
>>> 
>>> Rendered | Blog Post
>>> 
>>> Cheers,
>>> Vincent
>>> 
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial 
>>> feedback on this proposal. 👍
>>> 
>>> HashVisitable
>>> 
>>>   • Proposal: SE-NNNN
>>>   • Authors: Vincent Esche
>>>   • Review Manager: TBD
>>>   • Status: Awaiting review
>>> Introduction
>>> 
>>> Replace the Hashable protocol by two new procotols (Hasher and 
>>> HashVisitable) to improve safety, versatility and learnability.
>>> 
>>> Motivation
>>> 
>>> Implementing Hashable is difficult and the consequences if not done well 
>>> have dire performance and safety repercussions.
>>> 
>>> The documentation of Hashable lists a sample implementation of var 
>>> hashValue:
>>> 
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>>   var x: Int
>>>   var y: Int
>>> }
>>> 
>>> extension GridPoint: Hashable {
>>>   var hashValue: Int {
>>>       return x.hashValue ^ y.hashValue
>>>   }
>>> 
>>>   static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>>       return lhs.x == rhs.x && lhs.y == rhs.y
>>>   }
>>> }
>>> 
>>> Calculating the hashes of all GridPoints (given the above implementation) 
>>> on a 1000 × 1000 grid …
>>> 
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>>   for y in 0..<height {
>>>       hashes.insert(GridPoint(x: x, y: y).hashValue)
>>>   }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>> 
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>> 
>>> In other words: The recommended implementation causes 99.9% of values to 
>>> trigger a hash collision.
>>> 
>>> Out of those 1_000_000 values the median collision count was 976 with min 
>>> and max being 976 and 1000respectively.
>>> 
>>> The collision rate will have negative impact in algorithms which heavily 
>>> use hashValue like the ones in Dictionaryand Set. Furthermore, it increases 
>>> the vulnerability to DDOS attacks when exposed to the web.
>>> 
>>> If even the official Swift documentation gets the implementation of 
>>> hashValue wrong, then who is to expect the average Swift programmer to do 
>>> any better?
>>> 
>>> In contrast, running the same snippet using HashVisitable and the 
>>> semi-secure Fnv1aHash (see below) results in zero collisions!
>>> 
>>> Finally, the design of the Hashable protocol forces the use of one 
>>> implementation without the possibility of switching between multiple 
>>> hashing algorithms.
>>> 
>>> Proposed solution
>>> 
>>> Instead of coupling the hashing algorithm with each and every Swift type, 
>>> we should provide a hashing API based on the visitor-pattern. By freeing 
>>> application developers from the burden of having to implement hashing 
>>> algorithms, the Standard Library can provide default ones which fulfill 
>>> collision, performance and security goals. Furthermore, it would allow 
>>> developers to swap to different algorithms based on the use case.
>>> 
>>> Detailed design
>>> 
>>> The proposal deprecates the Hashable protocol and introduces the following 
>>> two:
>>> 
>>> protocol Hasher
>>> {
>>> 
>>> mutating func finish() -> Int
>>> 
>>> 
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>> 
>>> 
>>> protocol HashVisitable
>>> {
>>> 
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>> 
>>> Hasher is the protocol which represents a hashing algorithm, and 
>>> HashVisitable replaces Hashable. For types entirely represented by their 
>>> memory layout, the following protocol would provide a default 
>>> implementation:
>>> 
>>> protocol ContiguouslyHashable: HashVisitable {}
>>> 
>>> extension ContiguouslyHashable {
>>>   func hash<H: Hasher>(_ hasher: inout H) {
>>>       var mutableSelf = self
>>>       try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>>           hasher.write(bytes: $0)
>>>       }
>>>   }
>>> }
>>> 
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>> 
>>> The Standard-Library would then provide a set of hashing implementations 
>>> specific to each purpose. A possible choice for hashing algorithms would be 
>>> the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.
>>> 
>>> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, 
>>> which – for the sake of demonstration – could be implemented as follows:
>>> 
>>> struct Fnv1aHash
>>> {
>>> 
>>> fileprivate var state: UInt
>>> 
>>> 
>>> init(seed: UInt
>>> ) {
>>> 
>>> self.state = seed &+ 14695981039346656037
>>> 
>>>   }
>>> }
>>> 
>>> 
>>> extension Fnv1aHash: Hasher 
>>> {
>>> 
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>> 
>>> for byte in
>>> bytes {
>>> 
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>> 
>>>       }
>>>   }
>>> 
>>> mutating func finish() -> Int
>>> {
>>> 
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>>   }
>>> }
>>> 
>>> Coming back to the sample code present in the Hashable documentation, the 
>>> new implementation would look like:
>>> 
>>> extension GridPoint: HashVisitable 
>>> {
>>> 
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>> 
>>> self.x.hash(&
>>> hasher)
>>> 
>>> self.y.hash(&
>>> hasher)
>>>   }
>>> }
>>> 
>>> Source compatibility
>>> 
>>> Making use of "extending protocols to conform to protocols":
>>> 
>>> extension Hashable: HashVisitable 
>>> {
>>> 
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>> 
>>> self.hashValue.hash(&
>>> hasher)
>>>   }
>>> }
>>> 
>>> Effect on ABI stability
>>> 
>>> n/a
>>> 
>>> Effect on API resilience
>>> 
>>> This feature should be possible to add/remove without breaking ABI.
>>> 
>>> Alternatives considered
>>> 
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> [email protected]
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>> 
>> _______________________________________________
>> swift-evolution mailing list
>> [email protected]
>> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to