> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution 
> <[email protected]> wrote:
> 
> I’m dumb when it comes to proper hashing, but it’s such a tediously common 
> thing in my experience to need to add Hashable to some kind of a struct so I 
> can stash it in a set or use it as a dictionary key. Is there really no way 
> to make this all more automatic? I have to be honest - this proposal seems 
> *harder* to understand than the way it works now.

It's not really harder: just call hash on each of your type's significant 
values:

x.hash(&hasher)
y.hash(&hasher)

How would you implement hashValue in a simpler way, remembering that 'x ^ y' is 
an incorrect implementation?

> Of course the easiest would be if the language could just do this “good 
> enough" for me using reflection or whatever and if I really did run into a 
> problem where I wanted to do this myself, I could override something.
> 
> Perfect is the enemy of good.
> 
> l8r
> Sean
> 
> 
>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution 
>> <[email protected]> wrote:
>> 
>> Hi there,
>> 
>> I've written up a proposal for an overhaul/replacement of the Hashable 
>> protocol and would love to hear your feedback on it!
>> 
>> Rendered | Blog Post
>> 
>> Cheers,
>> Vincent
>> 
>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback 
>> on this proposal. 👍
>> 
>> HashVisitable
>> 
>>    • Proposal: SE-NNNN
>>    • Authors: Vincent Esche
>>    • Review Manager: TBD
>>    • Status: Awaiting review
>> Introduction
>> 
>> Replace the Hashable protocol by two new procotols (Hasher and 
>> HashVisitable) to improve safety, versatility and learnability.
>> 
>> Motivation
>> 
>> Implementing Hashable is difficult and the consequences if not done well 
>> have dire performance and safety repercussions.
>> 
>> The documentation of Hashable lists a sample implementation of var hashValue:
>> 
>> /// A point in an x-y coordinate system.
>> struct GridPoint {
>>    var x: Int
>>    var y: Int
>> }
>> 
>> extension GridPoint: Hashable {
>>    var hashValue: Int {
>>        return x.hashValue ^ y.hashValue
>>    }
>> 
>>    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>        return lhs.x == rhs.x && lhs.y == rhs.y
>>    }
>> }
>> 
>> Calculating the hashes of all GridPoints (given the above implementation) on 
>> a 1000 × 1000 grid …
>> 
>> let (width, height) = (1000, 1000)
>> let total = width * height
>> var hashes = Set<Int>()
>> for x in 0..<width {
>>    for y in 0..<height {
>>        hashes.insert(GridPoint(x: x, y: y).hashValue)
>>    }
>> }
>> print("\(hashes.count) unique hashes out of a total of \(total).")
>> 
>> … results in just 1024 unique hash values for 1_000_000 unique values.
>> 
>> In other words: The recommended implementation causes 99.9% of values to 
>> trigger a hash collision.
>> 
>> Out of those 1_000_000 values the median collision count was 976 with min 
>> and max being 976 and 1000respectively.
>> 
>> The collision rate will have negative impact in algorithms which heavily use 
>> hashValue like the ones in Dictionaryand Set. Furthermore, it increases the 
>> vulnerability to DDOS attacks when exposed to the web.
>> 
>> If even the official Swift documentation gets the implementation of 
>> hashValue wrong, then who is to expect the average Swift programmer to do 
>> any better?
>> 
>> In contrast, running the same snippet using HashVisitable and the 
>> semi-secure Fnv1aHash (see below) results in zero collisions!
>> 
>> Finally, the design of the Hashable protocol forces the use of one 
>> implementation without the possibility of switching between multiple hashing 
>> algorithms.
>> 
>> Proposed solution
>> 
>> Instead of coupling the hashing algorithm with each and every Swift type, we 
>> should provide a hashing API based on the visitor-pattern. By freeing 
>> application developers from the burden of having to implement hashing 
>> algorithms, the Standard Library can provide default ones which fulfill 
>> collision, performance and security goals. Furthermore, it would allow 
>> developers to swap to different algorithms based on the use case.
>> 
>> Detailed design
>> 
>> The proposal deprecates the Hashable protocol and introduces the following 
>> two:
>> 
>> protocol Hasher
>> {
>> 
>> mutating func finish() -> Int
>> 
>> 
>> mutating func write(bytes
>> : UnsafeRawBufferPointer)
>> }
>> 
>> 
>> protocol HashVisitable
>> {
>> 
>> func hash<H: Hasher>(_ hasher: inout
>> H)
>> }
>> 
>> Hasher is the protocol which represents a hashing algorithm, and 
>> HashVisitable replaces Hashable. For types entirely represented by their 
>> memory layout, the following protocol would provide a default implementation:
>> 
>> protocol ContiguouslyHashable: HashVisitable {}
>> 
>> extension ContiguouslyHashable {
>>    func hash<H: Hasher>(_ hasher: inout H) {
>>        var mutableSelf = self
>>        try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>            hasher.write(bytes: $0)
>>        }
>>    }
>> }
>> 
>> extension Bool : ContiguouslyHashable {}
>> extension UInt8 : ContiguouslyHashable {}
>> extension UInt16 : ContiguouslyHashable {}
>> extension UInt32 : ContiguouslyHashable {}
>> extension UInt64 : ContiguouslyHashable {}
>> extension UInt : ContiguouslyHashable {}
>> extension Int8 : ContiguouslyHashable {}
>> extension Int16 : ContiguouslyHashable {}
>> extension Int32 : ContiguouslyHashable {}
>> extension Int64 : ContiguouslyHashable {}
>> extension Int : ContiguouslyHashable {}
>> 
>> The Standard-Library would then provide a set of hashing implementations 
>> specific to each purpose. A possible choice for hashing algorithms would be 
>> the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.
>> 
>> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, 
>> which – for the sake of demonstration – could be implemented as follows:
>> 
>> struct Fnv1aHash
>> {
>> 
>> fileprivate var state: UInt
>> 
>> 
>> init(seed: UInt
>> ) {
>> 
>> self.state = seed &+ 14695981039346656037
>> 
>>    }
>> }
>> 
>> 
>> extension Fnv1aHash: Hasher 
>> {
>> 
>> mutating func write(bytes
>> : UnsafeRawBufferPointer) {
>> 
>> for byte in
>> bytes {
>> 
>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>> 
>>        }
>>    }
>> 
>> mutating func finish() -> Int
>> {
>> 
>> return unsafeBitCast(self.state, to: Int.self
>> )
>>    }
>> }
>> 
>> Coming back to the sample code present in the Hashable documentation, the 
>> new implementation would look like:
>> 
>> extension GridPoint: HashVisitable 
>> {
>> 
>> func hash<H: Hasher>(_ hasher: inout
>> H) {
>> 
>> self.x.hash(&
>> hasher)
>> 
>> self.y.hash(&
>> hasher)
>>    }
>> }
>> 
>> Source compatibility
>> 
>> Making use of "extending protocols to conform to protocols":
>> 
>> extension Hashable: HashVisitable 
>> {
>> 
>> func hash<H: Hasher>(_ hasher: inout
>> H) {
>> 
>> self.hashValue.hash(&
>> hasher)
>>    }
>> }
>> 
>> Effect on ABI stability
>> 
>> n/a
>> 
>> Effect on API resilience
>> 
>> This feature should be possible to add/remove without breaking ABI.
>> 
>> Alternatives considered
>> 
>> n/a
>> _______________________________________________
>> swift-evolution mailing list
>> [email protected]
>> https://lists.swift.org/mailman/listinfo/swift-evolution
> 
> _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to