> On 13 Mar 2017, at 20:51, Tony Allevato <[email protected]> wrote: > > I'm not convinced this is the right approach: it couples the fact that a type > is hashable with a specific strategy for implementing the hash value > computation.
I’m confused. This proposal specifically decouples the implementation of hashable from hashing strategies: that’s its main goal. Could you explain how you see coupling here? > Instead, the language should make efforts to avoid requiring the user to > think about hashing algorithms *at all*; to answer Sean's question a couple > messages up, I've proposed in the past > <https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough > working draft) adding derivation of Equatable and Hashable for structs and > enums where it's possible to automatically do so (for example, if all of the > enum's associated values are Equatable/Hashable). This proposal does so with the ContinuouslyHashable protocol, which provides a default implementation of hashable that works for enums and structs that have “inclusive” memory layouts. I think this is good thing, because it means you have to make a conscious choice to use ContinuouslyHashable instead of relying on an automatic implementation which will sometimes be wrong. > I've been experimenting with an implementation in the compiler and I have > something that works for enums, except for recursive ones. I haven't started > structs yet because I think there are more open questions there, and I hope > to propose enums and structs independently so that the enum one doesn't get > bogged down by the struct one. > > The core team seems to be aware of the need for this; the logic that derives > Equatable/Hashable for enums without associated values also has TODO comments > to handle those with associated values, and to handle structs as well. Slava > Pestov mentioned a while back that they encouraged PRs on it, which is why I > started, but my free time has been limited lately. > > That being said, I *do* think there would be value in having some kind of > hash "mixer" as part of the standard library and strongly encouraging through > documentation that hashValue implementors use it. Getting the function right > is the hard part, and if Swift both (1) took care of it for you by default in > many cases and (2) made the harder cases easier by providing a high quality > canned implementation, I think we'd have a much cleaner solution than what is > being proposed here. > > > On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution > <[email protected] <mailto:[email protected]>> wrote: > Is there any reason the API couldn’t be expressed as something along these > lines? > > func hash() -> [Hashable] { > return [x, y] > } > > l8r > Sean > > > > On Mar 13, 2017, at 2:15 PM, David Hart <[email protected] > > <mailto:[email protected]>> wrote: > > > >> > >> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution > >> <[email protected] <mailto:[email protected]>> wrote: > >> > >> I’m dumb when it comes to proper hashing, but it’s such a tediously common > >> thing in my experience to need to add Hashable to some kind of a struct so > >> I can stash it in a set or use it as a dictionary key. Is there really no > >> way to make this all more automatic? I have to be honest - this proposal > >> seems *harder* to understand than the way it works now. > > > > It's not really harder: just call hash on each of your type's significant > > values: > > > > x.hash(&hasher) > > y.hash(&hasher) > > > > How would you implement hashValue in a simpler way, remembering that 'x ^ > > y' is an incorrect implementation? > > > >> Of course the easiest would be if the language could just do this “good > >> enough" for me using reflection or whatever and if I really did run into a > >> problem where I wanted to do this myself, I could override something. > >> > >> Perfect is the enemy of good. > >> > >> l8r > >> Sean > >> > >> > >>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution > >>> <[email protected] <mailto:[email protected]>> wrote: > >>> > >>> Hi there, > >>> > >>> I've written up a proposal for an overhaul/replacement of the Hashable > >>> protocol and would love to hear your feedback on it! > >>> > >>> Rendered | Blog Post > >>> > >>> Cheers, > >>> Vincent > >>> > >>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial > >>> feedback on this proposal. 👍 > >>> > >>> HashVisitable > >>> > >>> • Proposal: SE-NNNN > >>> • Authors: Vincent Esche > >>> • Review Manager: TBD > >>> • Status: Awaiting review > >>> Introduction > >>> > >>> Replace the Hashable protocol by two new procotols (Hasher and > >>> HashVisitable) to improve safety, versatility and learnability. > >>> > >>> Motivation > >>> > >>> Implementing Hashable is difficult and the consequences if not done well > >>> have dire performance and safety repercussions. > >>> > >>> The documentation of Hashable lists a sample implementation of var > >>> hashValue: > >>> > >>> /// A point in an x-y coordinate system. > >>> struct GridPoint { > >>> var x: Int > >>> var y: Int > >>> } > >>> > >>> extension GridPoint: Hashable { > >>> var hashValue: Int { > >>> return x.hashValue ^ y.hashValue > >>> } > >>> > >>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { > >>> return lhs.x == rhs.x && lhs.y == rhs.y > >>> } > >>> } > >>> > >>> Calculating the hashes of all GridPoints (given the above implementation) > >>> on a 1000 × 1000 grid … > >>> > >>> let (width, height) = (1000, 1000) > >>> let total = width * height > >>> var hashes = Set<Int>() > >>> for x in 0..<width { > >>> for y in 0..<height { > >>> hashes.insert(GridPoint(x: x, y: y).hashValue) > >>> } > >>> } > >>> print("\(hashes.count) unique hashes out of a total of \(total).") > >>> > >>> … results in just 1024 unique hash values for 1_000_000 unique values. > >>> > >>> In other words: The recommended implementation causes 99.9% of values to > >>> trigger a hash collision. > >>> > >>> Out of those 1_000_000 values the median collision count was 976 with min > >>> and max being 976 and 1000respectively. > >>> > >>> The collision rate will have negative impact in algorithms which heavily > >>> use hashValue like the ones in Dictionaryand Set. Furthermore, it > >>> increases the vulnerability to DDOS attacks when exposed to the web. > >>> > >>> If even the official Swift documentation gets the implementation of > >>> hashValue wrong, then who is to expect the average Swift programmer to do > >>> any better? > >>> > >>> In contrast, running the same snippet using HashVisitable and the > >>> semi-secure Fnv1aHash (see below) results in zero collisions! > >>> > >>> Finally, the design of the Hashable protocol forces the use of one > >>> implementation without the possibility of switching between multiple > >>> hashing algorithms. > >>> > >>> Proposed solution > >>> > >>> Instead of coupling the hashing algorithm with each and every Swift type, > >>> we should provide a hashing API based on the visitor-pattern. By freeing > >>> application developers from the burden of having to implement hashing > >>> algorithms, the Standard Library can provide default ones which fulfill > >>> collision, performance and security goals. Furthermore, it would allow > >>> developers to swap to different algorithms based on the use case. > >>> > >>> Detailed design > >>> > >>> The proposal deprecates the Hashable protocol and introduces the > >>> following two: > >>> > >>> protocol Hasher > >>> { > >>> > >>> mutating func finish() -> Int > >>> > >>> > >>> mutating func write(bytes > >>> : UnsafeRawBufferPointer) > >>> } > >>> > >>> > >>> protocol HashVisitable > >>> { > >>> > >>> func hash<H: Hasher>(_ hasher: inout > >>> H) > >>> } > >>> > >>> Hasher is the protocol which represents a hashing algorithm, and > >>> HashVisitable replaces Hashable. For types entirely represented by their > >>> memory layout, the following protocol would provide a default > >>> implementation: > >>> > >>> protocol ContiguouslyHashable: HashVisitable {} > >>> > >>> extension ContiguouslyHashable { > >>> func hash<H: Hasher>(_ hasher: inout H) { > >>> var mutableSelf = self > >>> try! Swift.withUnsafeBytes(of: &mutableSelf) { > >>> hasher.write(bytes: $0) > >>> } > >>> } > >>> } > >>> > >>> extension Bool : ContiguouslyHashable {} > >>> extension UInt8 : ContiguouslyHashable {} > >>> extension UInt16 : ContiguouslyHashable {} > >>> extension UInt32 : ContiguouslyHashable {} > >>> extension UInt64 : ContiguouslyHashable {} > >>> extension UInt : ContiguouslyHashable {} > >>> extension Int8 : ContiguouslyHashable {} > >>> extension Int16 : ContiguouslyHashable {} > >>> extension Int32 : ContiguouslyHashable {} > >>> extension Int64 : ContiguouslyHashable {} > >>> extension Int : ContiguouslyHashable {} > >>> > >>> The Standard-Library would then provide a set of hashing implementations > >>> specific to each purpose. A possible choice for hashing algorithms would > >>> be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8. > >>> > >>> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, > >>> which – for the sake of demonstration – could be implemented as follows: > >>> > >>> struct Fnv1aHash > >>> { > >>> > >>> fileprivate var state: UInt > >>> > >>> > >>> init(seed: UInt > >>> ) { > >>> > >>> self.state = seed &+ 14695981039346656037 > >>> > >>> } > >>> } > >>> > >>> > >>> extension Fnv1aHash: Hasher > >>> { > >>> > >>> mutating func write(bytes > >>> : UnsafeRawBufferPointer) { > >>> > >>> for byte in > >>> bytes { > >>> > >>> self.state = (self.state ^ UInt(byte)) &* 1099511628211 > >>> > >>> } > >>> } > >>> > >>> mutating func finish() -> Int > >>> { > >>> > >>> return unsafeBitCast(self.state, to: Int.self > >>> ) > >>> } > >>> } > >>> > >>> Coming back to the sample code present in the Hashable documentation, the > >>> new implementation would look like: > >>> > >>> extension GridPoint: HashVisitable > >>> { > >>> > >>> func hash<H: Hasher>(_ hasher: inout > >>> H) { > >>> > >>> self.x.hash(& > >>> hasher) > >>> > >>> self.y.hash(& > >>> hasher) > >>> } > >>> } > >>> > >>> Source compatibility > >>> > >>> Making use of "extending protocols to conform to protocols": > >>> > >>> extension Hashable: HashVisitable > >>> { > >>> > >>> func hash<H: Hasher>(_ hasher: inout > >>> H) { > >>> > >>> self.hashValue.hash(& > >>> hasher) > >>> } > >>> } > >>> > >>> Effect on ABI stability > >>> > >>> n/a > >>> > >>> Effect on API resilience > >>> > >>> This feature should be possible to add/remove without breaking ABI. > >>> > >>> Alternatives considered > >>> > >>> n/a > >>> _______________________________________________ > >>> swift-evolution mailing list > >>> [email protected] <mailto:[email protected]> > >>> https://lists.swift.org/mailman/listinfo/swift-evolution > >>> <https://lists.swift.org/mailman/listinfo/swift-evolution> > >> > >> _______________________________________________ > >> swift-evolution mailing list > >> [email protected] <mailto:[email protected]> > >> https://lists.swift.org/mailman/listinfo/swift-evolution > >> <https://lists.swift.org/mailman/listinfo/swift-evolution> > > _______________________________________________ > swift-evolution mailing list > [email protected] <mailto:[email protected]> > https://lists.swift.org/mailman/listinfo/swift-evolution > <https://lists.swift.org/mailman/listinfo/swift-evolution>
_______________________________________________ swift-evolution mailing list [email protected] https://lists.swift.org/mailman/listinfo/swift-evolution
