> On 13 Mar 2017, at 20:51, Tony Allevato <[email protected]> wrote:
> 
> I'm not convinced this is the right approach: it couples the fact that a type 
> is hashable with a specific strategy for implementing the hash value 
> computation. 

I’m confused. This proposal specifically decouples the implementation of 
hashable from hashing strategies: that’s its main goal. Could you explain how 
you see coupling here?

> Instead, the language should make efforts to avoid requiring the user to 
> think about hashing algorithms *at all*; to answer Sean's question a couple 
> messages up, I've proposed in the past 
> <https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough 
> working draft) adding derivation of Equatable and Hashable for structs and 
> enums where it's possible to automatically do so (for example, if all of the 
> enum's associated values are Equatable/Hashable).

This proposal does so with the ContinuouslyHashable protocol, which provides a 
default implementation of hashable that works for enums and structs that have 
“inclusive” memory layouts. I think this is good thing, because it means you 
have to make a conscious choice to use ContinuouslyHashable instead of relying 
on an automatic implementation which will sometimes be wrong.

> I've been experimenting with an implementation in the compiler and I have 
> something that works for enums, except for recursive ones. I haven't started 
> structs yet because I think there are more open questions there, and I hope 
> to propose enums and structs independently so that the enum one doesn't get 
> bogged down by the struct one.
> 
> The core team seems to be aware of the need for this; the logic that derives 
> Equatable/Hashable for enums without associated values also has TODO comments 
> to handle those with associated values, and to handle structs as well. Slava 
> Pestov mentioned a while back that they encouraged PRs on it, which is why I 
> started, but my free time has been limited lately.
> 
> That being said, I *do* think there would be value in having some kind of 
> hash "mixer" as part of the standard library and strongly encouraging through 
> documentation that hashValue implementors use it. Getting the function right 
> is the hard part, and if Swift both (1) took care of it for you by default in 
> many cases and (2) made the harder cases easier by providing a high quality 
> canned implementation, I think we'd have a much cleaner solution than what is 
> being proposed here.
> 
> 
> On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution 
> <[email protected] <mailto:[email protected]>> wrote:
> Is there any reason the API couldn’t be expressed as something along these 
> lines?
> 
> func hash() -> [Hashable] {
>   return [x, y]
> }
> 
> l8r
> Sean
> 
> 
> > On Mar 13, 2017, at 2:15 PM, David Hart <[email protected] 
> > <mailto:[email protected]>> wrote:
> >
> >>
> >> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution 
> >> <[email protected] <mailto:[email protected]>> wrote:
> >>
> >> I’m dumb when it comes to proper hashing, but it’s such a tediously common 
> >> thing in my experience to need to add Hashable to some kind of a struct so 
> >> I can stash it in a set or use it as a dictionary key. Is there really no 
> >> way to make this all more automatic? I have to be honest - this proposal 
> >> seems *harder* to understand than the way it works now.
> >
> > It's not really harder: just call hash on each of your type's significant 
> > values:
> >
> > x.hash(&hasher)
> > y.hash(&hasher)
> >
> > How would you implement hashValue in a simpler way, remembering that 'x ^ 
> > y' is an incorrect implementation?
> >
> >> Of course the easiest would be if the language could just do this “good 
> >> enough" for me using reflection or whatever and if I really did run into a 
> >> problem where I wanted to do this myself, I could override something.
> >>
> >> Perfect is the enemy of good.
> >>
> >> l8r
> >> Sean
> >>
> >>
> >>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution 
> >>> <[email protected] <mailto:[email protected]>> wrote:
> >>>
> >>> Hi there,
> >>>
> >>> I've written up a proposal for an overhaul/replacement of the Hashable 
> >>> protocol and would love to hear your feedback on it!
> >>>
> >>> Rendered | Blog Post
> >>>
> >>> Cheers,
> >>> Vincent
> >>>
> >>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial 
> >>> feedback on this proposal. 👍
> >>>
> >>> HashVisitable
> >>>
> >>>   • Proposal: SE-NNNN
> >>>   • Authors: Vincent Esche
> >>>   • Review Manager: TBD
> >>>   • Status: Awaiting review
> >>> Introduction
> >>>
> >>> Replace the Hashable protocol by two new procotols (Hasher and 
> >>> HashVisitable) to improve safety, versatility and learnability.
> >>>
> >>> Motivation
> >>>
> >>> Implementing Hashable is difficult and the consequences if not done well 
> >>> have dire performance and safety repercussions.
> >>>
> >>> The documentation of Hashable lists a sample implementation of var 
> >>> hashValue:
> >>>
> >>> /// A point in an x-y coordinate system.
> >>> struct GridPoint {
> >>>   var x: Int
> >>>   var y: Int
> >>> }
> >>>
> >>> extension GridPoint: Hashable {
> >>>   var hashValue: Int {
> >>>       return x.hashValue ^ y.hashValue
> >>>   }
> >>>
> >>>   static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
> >>>       return lhs.x == rhs.x && lhs.y == rhs.y
> >>>   }
> >>> }
> >>>
> >>> Calculating the hashes of all GridPoints (given the above implementation) 
> >>> on a 1000 × 1000 grid …
> >>>
> >>> let (width, height) = (1000, 1000)
> >>> let total = width * height
> >>> var hashes = Set<Int>()
> >>> for x in 0..<width {
> >>>   for y in 0..<height {
> >>>       hashes.insert(GridPoint(x: x, y: y).hashValue)
> >>>   }
> >>> }
> >>> print("\(hashes.count) unique hashes out of a total of \(total).")
> >>>
> >>> … results in just 1024 unique hash values for 1_000_000 unique values.
> >>>
> >>> In other words: The recommended implementation causes 99.9% of values to 
> >>> trigger a hash collision.
> >>>
> >>> Out of those 1_000_000 values the median collision count was 976 with min 
> >>> and max being 976 and 1000respectively.
> >>>
> >>> The collision rate will have negative impact in algorithms which heavily 
> >>> use hashValue like the ones in Dictionaryand Set. Furthermore, it 
> >>> increases the vulnerability to DDOS attacks when exposed to the web.
> >>>
> >>> If even the official Swift documentation gets the implementation of 
> >>> hashValue wrong, then who is to expect the average Swift programmer to do 
> >>> any better?
> >>>
> >>> In contrast, running the same snippet using HashVisitable and the 
> >>> semi-secure Fnv1aHash (see below) results in zero collisions!
> >>>
> >>> Finally, the design of the Hashable protocol forces the use of one 
> >>> implementation without the possibility of switching between multiple 
> >>> hashing algorithms.
> >>>
> >>> Proposed solution
> >>>
> >>> Instead of coupling the hashing algorithm with each and every Swift type, 
> >>> we should provide a hashing API based on the visitor-pattern. By freeing 
> >>> application developers from the burden of having to implement hashing 
> >>> algorithms, the Standard Library can provide default ones which fulfill 
> >>> collision, performance and security goals. Furthermore, it would allow 
> >>> developers to swap to different algorithms based on the use case.
> >>>
> >>> Detailed design
> >>>
> >>> The proposal deprecates the Hashable protocol and introduces the 
> >>> following two:
> >>>
> >>> protocol Hasher
> >>> {
> >>>
> >>> mutating func finish() -> Int
> >>>
> >>>
> >>> mutating func write(bytes
> >>> : UnsafeRawBufferPointer)
> >>> }
> >>>
> >>>
> >>> protocol HashVisitable
> >>> {
> >>>
> >>> func hash<H: Hasher>(_ hasher: inout
> >>> H)
> >>> }
> >>>
> >>> Hasher is the protocol which represents a hashing algorithm, and 
> >>> HashVisitable replaces Hashable. For types entirely represented by their 
> >>> memory layout, the following protocol would provide a default 
> >>> implementation:
> >>>
> >>> protocol ContiguouslyHashable: HashVisitable {}
> >>>
> >>> extension ContiguouslyHashable {
> >>>   func hash<H: Hasher>(_ hasher: inout H) {
> >>>       var mutableSelf = self
> >>>       try! Swift.withUnsafeBytes(of: &mutableSelf) {
> >>>           hasher.write(bytes: $0)
> >>>       }
> >>>   }
> >>> }
> >>>
> >>> extension Bool : ContiguouslyHashable {}
> >>> extension UInt8 : ContiguouslyHashable {}
> >>> extension UInt16 : ContiguouslyHashable {}
> >>> extension UInt32 : ContiguouslyHashable {}
> >>> extension UInt64 : ContiguouslyHashable {}
> >>> extension UInt : ContiguouslyHashable {}
> >>> extension Int8 : ContiguouslyHashable {}
> >>> extension Int16 : ContiguouslyHashable {}
> >>> extension Int32 : ContiguouslyHashable {}
> >>> extension Int64 : ContiguouslyHashable {}
> >>> extension Int : ContiguouslyHashable {}
> >>>
> >>> The Standard-Library would then provide a set of hashing implementations 
> >>> specific to each purpose. A possible choice for hashing algorithms would 
> >>> be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.
> >>>
> >>> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, 
> >>> which – for the sake of demonstration – could be implemented as follows:
> >>>
> >>> struct Fnv1aHash
> >>> {
> >>>
> >>> fileprivate var state: UInt
> >>>
> >>>
> >>> init(seed: UInt
> >>> ) {
> >>>
> >>> self.state = seed &+ 14695981039346656037
> >>>
> >>>   }
> >>> }
> >>>
> >>>
> >>> extension Fnv1aHash: Hasher
> >>> {
> >>>
> >>> mutating func write(bytes
> >>> : UnsafeRawBufferPointer) {
> >>>
> >>> for byte in
> >>> bytes {
> >>>
> >>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
> >>>
> >>>       }
> >>>   }
> >>>
> >>> mutating func finish() -> Int
> >>> {
> >>>
> >>> return unsafeBitCast(self.state, to: Int.self
> >>> )
> >>>   }
> >>> }
> >>>
> >>> Coming back to the sample code present in the Hashable documentation, the 
> >>> new implementation would look like:
> >>>
> >>> extension GridPoint: HashVisitable
> >>> {
> >>>
> >>> func hash<H: Hasher>(_ hasher: inout
> >>> H) {
> >>>
> >>> self.x.hash(&
> >>> hasher)
> >>>
> >>> self.y.hash(&
> >>> hasher)
> >>>   }
> >>> }
> >>>
> >>> Source compatibility
> >>>
> >>> Making use of "extending protocols to conform to protocols":
> >>>
> >>> extension Hashable: HashVisitable
> >>> {
> >>>
> >>> func hash<H: Hasher>(_ hasher: inout
> >>> H) {
> >>>
> >>> self.hashValue.hash(&
> >>> hasher)
> >>>   }
> >>> }
> >>>
> >>> Effect on ABI stability
> >>>
> >>> n/a
> >>>
> >>> Effect on API resilience
> >>>
> >>> This feature should be possible to add/remove without breaking ABI.
> >>>
> >>> Alternatives considered
> >>>
> >>> n/a
> >>> _______________________________________________
> >>> swift-evolution mailing list
> >>> [email protected] <mailto:[email protected]>
> >>> https://lists.swift.org/mailman/listinfo/swift-evolution 
> >>> <https://lists.swift.org/mailman/listinfo/swift-evolution>
> >>
> >> _______________________________________________
> >> swift-evolution mailing list
> >> [email protected] <mailto:[email protected]>
> >> https://lists.swift.org/mailman/listinfo/swift-evolution 
> >> <https://lists.swift.org/mailman/listinfo/swift-evolution>
> 
> _______________________________________________
> swift-evolution mailing list
> [email protected] <mailto:[email protected]>
> https://lists.swift.org/mailman/listinfo/swift-evolution 
> <https://lists.swift.org/mailman/listinfo/swift-evolution>

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to