Yes please! I’ve a package on GitHub to implement roughly the same thing. I’ve 
been happily using it for months now, and I wouldn’t ever write a hashValue 
implementation by hand again.

https://github.com/lorentey/SipHash <https://github.com/lorentey/SipHash>

I think the fact that we’ve both come up with essentially the same API is an 
interesting data point; it definitely underlines the need for a better Hashable 
protocol.

My comments:

* In an ideal world, this would be a replacement for Hashable, not a protocol 
with a different name. Once visitor-based hashing becomes available, nobody 
should implement a hashValue property.

* All standard Hashable types in standard library should implement the new hash 
function directly, rather than relying on the default implementation.

* Why is the HashVisitable.hash a generic function? Hasher could just as well 
be a concrete type defined in stdlib. Making it generic may have some 
performance implications.

* I find that I prefer to hash components by calling a mutating method on the 
hasher, rather than directly calling the components' hash implementations. 
Putting the hasher first is much more readable to me, primarily because it gets 
rid of all the distracting &s. It also makes it possible to find slightly 
better names, eliminating the repetitiveness of "foo.hash(&hasher)":

extension GridPoint: SipHashable {
    func appendHashes(to hasher: inout
 SipHasher) {
         hasher.append(x)
         hasher.append(y)
    }
}

* I suggest using SipHash instead of FNV-1a. The standard library already 
contains an implementation for SipHash, as undocumented internal API, complete 
with a note that it should be made public. AFAICR, it is currently only used 
for String hashing, but it would be very much worth making it universal. 
(Accomodating SipHash's random key is one reason why hashValue’s documentation 
explicitly notes that its value is "not guaranteed to be equal across different 
executions of your program.”)

* ContiguouslyHashable seems like an unsafe construct to me. It is quite 
dangerous to base hashing on the raw byte sequence underlying a value: for 
example, struct values may include uninitialized gaps between some of their 
stored properties due to alignment constraints. So two otherwise identical 
values may very well have different in-memory representations. Therefore, I 
suggest ContiguouslyHashable should be removed from the proposal. Swift 
programmers who know what they’re doing would still be able to call 
withUnsafeBytes(of:) when they want to, but regular schmoes like myself will 
not be tempted to shoot themselves in the foot by using it inappropriately.

Cheers,
Karoly


> On 2017-03-13, at 16:38, Vincent Esche via swift-evolution 
> <swift-evolution@swift.org> wrote:
> 
> Hi there,
> 
> I've written up a proposal for an overhaul/replacement of the Hashable 
> protocol and would love to hear your feedback on it!
> 
> Rendered 
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog 
> Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>
> 
> Cheers,
> Vincent
> 
> Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback 
> on this proposal. 👍
> 
> HashVisitable
> 
> Proposal: SE-NNNN <https://gist.github.com/regexident/NNNN-filename.md>
> Authors: Vincent Esche <https://github.com/regexident>
> Review Manager: TBD
> Status: Awaiting review
>  
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#introduction>Introduction
> 
> Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) 
> to improve safety, versatility and learnability.
> 
>  
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#motivation>Motivation
> 
> Implementing Hashable is difficult and the consequences if not done well have 
> dire performance and safety repercussions.
> 
> The documentation of Hashable lists a sample implementation 
> <https://developer.apple.com/reference/swift/hashable> of var hashValue:
> 
> /// A point in an x-y coordinate system.
> struct GridPoint {
>     var x: Int
>     var y: Int
> }
> 
> extension GridPoint: Hashable {
>     var hashValue: Int {
>         return x.hashValue ^ y.hashValue
>     }
> 
>     static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>         return lhs.x == rhs.x && lhs.y == rhs.y
>     }
> }
> Calculating the hashes of all GridPoints (given the above implementation) on 
> a 1000 × 1000 grid …
> 
> let (width, height) = (1000, 1000)
> let total = width * height
> var hashes = Set<Int>()
> for x in 0..<width {
>     for y in 0..<height {
>         hashes.insert(GridPoint(x: x, y: y).hashValue)
>     }
> }
> print("\(hashes.count) unique hashes out of a total of \(total).")
> … results in just 1024 unique hash values for 1_000_000 unique values.
> 
> In other words: The recommended implementation causes 99.9% of values to 
> trigger a hash collision.
> 
> Out of those 1_000_000 values the median collision count was 976 with min and 
> max being 976 and 1000respectively.
> 
> The collision rate will have negative impact in algorithms which heavily use 
> hashValue like the ones in Dictionaryand Set. Furthermore, it increases the 
> vulnerability to DDOS attacks when exposed to the web 
> <https://arstechnica.com/business/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/>.
> 
> If even the official Swift documentation gets the implementation of hashValue 
> wrong, then who is to expect the average Swift programmer to do any better?
> 
> In contrast, running the same snippet using HashVisitable and the semi-secure 
> Fnv1aHash (see below) results in zero collisions!
> 
> Finally, the design of the Hashable protocol forces the use of one 
> implementation without the possibility of switching between multiple hashing 
> algorithms.
> 
>  
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#proposed-solution>Proposed
>  solution
> 
> Instead of coupling the hashing algorithm with each and every Swift type, we 
> should provide a hashing API based on the visitor-pattern. By freeing 
> application developers from the burden of having to implement hashing 
> algorithms, the Standard Library can provide default ones which fulfill 
> collision, performance and security goals. Furthermore, it would allow 
> developers to swap to different algorithms based on the use case.
> 
>  
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#detailed-design>Detailed
>  design
> 
> The proposal deprecates the Hashable protocol and introduces the following 
> two:
> 
> protocol Hasher {
>     mutating func finish() -> Int
>     mutating func write(bytes: UnsafeRawBufferPointer)
> }
> 
> protocol HashVisitable {
>     func hash<H: Hasher>(_ hasher: inout H)
> }
> Hasher is the protocol which represents a hashing algorithm, and 
> HashVisitable replaces Hashable. For types entirely represented by their 
> memory layout, the following protocol would provide a default implementation:
> 
> protocol ContiguouslyHashable: HashVisitable {}
> 
> extension ContiguouslyHashable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         var mutableSelf = self
>         try! Swift.withUnsafeBytes(of: &mutableSelf) {
>             hasher.write(bytes: $0)
>         }
>     }
> }
> 
> extension Bool : ContiguouslyHashable {}
> extension UInt8 : ContiguouslyHashable {}
> extension UInt16 : ContiguouslyHashable {}
> extension UInt32 : ContiguouslyHashable {}
> extension UInt64 : ContiguouslyHashable {}
> extension UInt : ContiguouslyHashable {}
> extension Int8 : ContiguouslyHashable {}
> extension Int16 : ContiguouslyHashable {}
> extension Int32 : ContiguouslyHashable {}
> extension Int64 : ContiguouslyHashable {}
> extension Int : ContiguouslyHashable {}
> The Standard-Library would then provide a set of hashing implementations 
> specific to each purpose. A possible choice for hashing algorithms would be 
> the reasonably fast SipHash-2-4 <https://en.wikipedia.org/wiki/SipHash>, and 
> the reasonably secure SipHash-4-8 <https://en.wikipedia.org/wiki/SipHash>.
> 
> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, 
> which – for the sake of demonstration – could be implemented as follows:
> 
> struct Fnv1aHash {
>     fileprivate var state: UInt
>     init(seed: UInt) {
>         self.state = seed &+ 14695981039346656037
>     }
> }
> 
> extension Fnv1aHash: Hasher {
>     mutating func write(bytes: UnsafeRawBufferPointer) {
>         for byte in bytes {
>             self.state = (self.state ^ UInt(byte)) &* 1099511628211
>         }
>     }
>     mutating func finish() -> Int {
>         return unsafeBitCast(self.state, to: Int.self)
>     }
> }
> Coming back to the sample code present in the Hashable documentation, the new 
> implementation would look like:
> 
> extension GridPoint: HashVisitable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         self.x.hash(&hasher)
>         self.y.hash(&hasher)
>     }
> }
>  
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#source-compatibility>Source
>  compatibility
> 
> Making use of "extending protocols to conform to protocols 
> <https://github.com/apple/swift-evolution/blob/d33c129f0920af0404f42219db56981411b20e76/proposals/0143-conditional-conformances.md#extending-protocols-to-conform-to-protocols>":
> 
> extension Hashable: HashVisitable {
>     func hash<H: Hasher>(_ hasher: inout H) {
>         self.hashValue.hash(&hasher)
>     }
> }
>  
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-abi-stability>Effect
>  on ABI stability
> 
> n/a
> 
>  
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-api-resilience>Effect
>  on API resilience
> 
> This feature should be possible to add/remove without breaking ABI.
> 
>  
> <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#alternatives-considered>Alternatives
>  considered
> 
> n/a
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to