The documentation bug is not the sole motivation for the proposal. I did in fact only find it while writing the accompanying blog post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>, in which I present a couple of additional motivations besides hash collisions.
Namely: - decoupling - independence from a one-size-fits-all hasher impl - multiple hashers per instance (for implementing Bloom Filters, etc.) - different hashers for different use-cases (fast&weak for safe bottlenecks, slow&secure for web-exposed APIs) - … I’m actually much more interested in the independence gained, than pure performance. On Tue, Mar 14, 2017 at 2:44 AM, Xiaodi Wu <[email protected]> wrote: > On Mon, Mar 13, 2017 at 8:30 PM, Tony Allevato via swift-evolution < > [email protected]> wrote: > >> Adding the list back to this reply because I don't believe you meant to >> reply only to me, and I think it's an important discussion to have :) >> >> >> On Mon, Mar 13, 2017 at 4:32 PM Vincent Esche < >> [email protected]> wrote: >> >> Automatic generation of Hashable implementation code only "solves" the >> problem of having to implement `var hashValue`. >> It however only works for some types. By no means for all. >> >> >> Certainly, and I never suggested that it would work for every case. I was >> responding to Sean's point that the compiler should be able to do "good >> enough" in the common cases and I offered a way that that can be achieved. >> >> >> Take this hypothetical implementation of a HSB color: >> >> struct Color { >> let hue: UInt8 >> let saturation: UInt8 >> let brightness: UInt8 >> } >> >> Let the semantic of this type be this: >> Any two colors with a `brightness` of `0` are to be considered equal >> regardless of their respective `hue` or `saturation`. At night all cats are >> gray. >> >> >> An auto-generated conformance impl for `Color` would most likely be wrong >> afaict. >> And those kind of semantics are everywhere. >> >> >> Of course, and in those cases, you wouldn't want to use an auto-derived >> Equatable or Hashable implementation. Those kinds of semantics are >> "everywhere" for certain definitions of "everywhere", but similarly, >> semantics where the hash value of a thing can be determined simply via >> combination of the hash values of its components are also "everywhere". >> >> I wouldn't suggest that auto-deriving Hashable solves *all* problems, and >> similarly, your proposal doesn't help in the situation you described >> either. Your API provides a different way to mix the hue, saturation, and >> brightness components in an implementation of hashValue, but it doesn't >> force the user to mix them *correctly* (by excluding hue/saturation if >> brightness is zero). >> >> So in both cases, users still have to provide custom implementations of >> == and hashValue. But without auto-deriving, users have to provide them >> even for the cases where the auto-derived implementation *would have been >> correct.* Auto-deriving is about reducing the number of types that need to >> have custom implementations, thereby reducing the chance that a user will >> do it wrong. >> >> When considering a type with auto-derived Hashable and Equatable, there >> are two ways it can be wrong: >> >> 1) The auto-generated implementations of both == and hashValue don't >> honor the semantic contract of the type (for example, don't ignore >> brightness when it's zero). >> 2) The user overrides the auto-generated implementation of one of >> ==/hashValue but not both, and violates the contracts between them. >> >> For #1, yes, the compiler generated an incorrect implementation in that >> case. However, I would argue it's the developer's responsibility to fix it >> by overriding it if they need different semantics. As I mentioned above, >> without derivation, the developer could still implement it incorrectly, >> just as they could with your API. >> >> For #2, this could be solved by requiring users to override both if they >> override one of them. Now they're back in the same situation as #1—they >> either did it right, or they did it wrong. >> >> >> >> >> Auto-generation clearly is no panacea when it comes to hashing. >> Especially if it leads to invalid and unsafe default implementations. >> >> >> Auto-deriving cannot produce "unsafe" implementations because it's >> defined in terms of a function operating over the hash values of its >> components. It can produce an implementation that does not match the >> intended semantics of the class that are defined outside of its component >> values, but that's not the compiler's job to decide; it's up to the user to >> provide those. >> >> Regarding unsafety, it's worth noting that your ContiguouslyHashable >> protocol is inherently unsafe and fragile. Imagine that a user implements a >> struct that they make conform to ContiguouslyHashable because at the time, >> it's a simple fixed layout type with primitive fields. If they take that >> type and add a new field to it that isn't contiguously hashable (say, a >> class reference) and they forget to replace the ContiguouslyHashable >> conformance, they now have a very broken type, and that behavior isn't >> caught until *runtime* when things go wrong. >> >> At least with derived conformances, in that situation the *compiler* >> would see that the type was no longer hashable and emit an error when it's >> used anywhere that Hashable/hashValue was expected. >> >> So if safety is your motivation, ContiguouslyHashable kind of fights back >> against that because it gives the user a door they can open—in some cases, >> accidentally—that produces invalid results. >> >> >> >> >> It would however be nice to be able to annotate types for which you want >> HashVisitable implementations to be generated. >> >> >> That's one of the still-open questions from the last discussion thread on >> the topic; whether auto-deriving should be automatic for any type where it >> is possible (as it is now for enums without associated values) or whether >> the user should have to opt-in. >> >> >> >> >> I actually don't see auto-generation as an alternative to a proper >> hashing API. But rather as an addition. >> HashVisitable and auto-deriving would actually work great together! >> >> >> I completely agree that these ideas complement each other very well! And >> I also agree that the language would do well to make it easier for users to >> easily do the right thing. I just feel that *the API proposed here >> specifically* adds too much complexity for the problem it's trying to solve. >> >> I'd find it more compelling if it was simplified a bit: >> >> * The standard library doesn't need to provide a number of hash >> implementations; it should just provide one that works "well enough" in >> most cases >> * Hashable doesn't have tie itself to a visitor pattern. It's not >> necessarily safer because users can still mix the wrong values; in that >> case, I'd rather the stdlib implementation of the "good enough" hash just >> be a standalone mixer that the language can encourage users to use. >> > > I agree with this assessment. If SipHash is deemed "good enough," then > Swift should offer these as standalone facilities and encourage users to > call them in `Hashable`. I think the API design proposed here is much too > complex, but offering tried-and-true hash functions is certainly reasonable > and would be an improvement over xor. > > Certainly also, the documentation for `Hashable` can be improved. In > general, however, it's not very convincing to motivate an entire proposal > for a new feature based on a documentation bug. Recognize that there are > (or at least have been, in past shipped versions of Swift) code snippets in > the documentation that _don't even compile_! If we accepted the logic put > forward here, we should give up on Swift entirely; after all, if even the > official Swift documentation can't provide code that compiles, what hope do > the rest of us have?! > > > >> >> In fact they already do in other languages <https://is.gd/iy5770>. >> >> >> On Mon, Mar 13, 2017 at 8:51 PM, Tony Allevato via swift-evolution < >> [email protected]> wrote: >> >> I'm not convinced this is the right approach: it couples the fact that a >> type is hashable with a specific strategy for implementing the hash value >> computation. >> >> Instead, the language should make efforts to avoid requiring the user to >> think about hashing algorithms *at all*; to answer Sean's question a couple >> messages up, I've proposed in the past >> <https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough >> working draft) adding derivation of Equatable and Hashable for structs and >> enums where it's possible to automatically do so (for example, if all of >> the enum's associated values are Equatable/Hashable). I've been >> experimenting with an implementation in the compiler and I have something >> that works for enums, except for recursive ones. I haven't started structs >> yet because I think there are more open questions there, and I hope to >> propose enums and structs independently so that the enum one doesn't get >> bogged down by the struct one. >> >> The core team seems to be aware of the need for this; the logic that >> derives Equatable/Hashable for enums without associated values also has >> TODO comments to handle those with associated values, and to handle structs >> as well. Slava Pestov mentioned a while back that they encouraged PRs on >> it, which is why I started, but my free time has been limited lately. >> >> That being said, I *do* think there would be value in having some kind of >> hash "mixer" as part of the standard library and strongly encouraging >> through documentation that hashValue implementors use it. Getting the >> function right is the hard part, and if Swift both (1) took care of it for >> you by default in many cases and (2) made the harder cases easier by >> providing a high quality canned implementation, I think we'd have a much >> cleaner solution than what is being proposed here. >> >> >> On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution < >> [email protected]> wrote: >> >> Is there any reason the API couldn’t be expressed as something along >> these lines? >> >> func hash() -> [Hashable] { >> return [x, y] >> } >> >> l8r >> Sean >> >> >> > On Mar 13, 2017, at 2:15 PM, David Hart <[email protected]> wrote: >> > >> >> >> >> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution < >> [email protected]> wrote: >> >> >> >> I’m dumb when it comes to proper hashing, but it’s such a tediously >> common thing in my experience to need to add Hashable to some kind of a >> struct so I can stash it in a set or use it as a dictionary key. Is there >> really no way to make this all more automatic? I have to be honest - this >> proposal seems *harder* to understand than the way it works now. >> > >> > It's not really harder: just call hash on each of your type's >> significant values: >> > >> > x.hash(&hasher) >> > y.hash(&hasher) >> > >> > How would you implement hashValue in a simpler way, remembering that 'x >> ^ y' is an incorrect implementation? >> > >> >> Of course the easiest would be if the language could just do this >> “good enough" for me using reflection or whatever and if I really did run >> into a problem where I wanted to do this myself, I could override something. >> >> >> >> Perfect is the enemy of good. >> >> >> >> l8r >> >> Sean >> >> >> >> >> >>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution < >> [email protected]> wrote: >> >>> >> >>> Hi there, >> >>> >> >>> I've written up a proposal for an overhaul/replacement of the >> Hashable protocol and would love to hear your feedback on it! >> >>> >> >>> Rendered | Blog Post >> >>> >> >>> Cheers, >> >>> Vincent >> >>> >> >>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial >> feedback on this proposal. 👍 >> >>> >> >>> HashVisitable >> >>> >> >>> • Proposal: SE-NNNN >> >>> • Authors: Vincent Esche >> >>> • Review Manager: TBD >> >>> • Status: Awaiting review >> >>> Introduction >> >>> >> >>> Replace the Hashable protocol by two new procotols (Hasher and >> HashVisitable) to improve safety, versatility and learnability. >> >>> >> >>> Motivation >> >>> >> >>> Implementing Hashable is difficult and the consequences if not done >> well have dire performance and safety repercussions. >> >>> >> >>> The documentation of Hashable lists a sample implementation of var >> hashValue: >> >>> >> >>> /// A point in an x-y coordinate system. >> >>> struct GridPoint { >> >>> var x: Int >> >>> var y: Int >> >>> } >> >>> >> >>> extension GridPoint: Hashable { >> >>> var hashValue: Int { >> >>> return x.hashValue ^ y.hashValue >> >>> } >> >>> >> >>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { >> >>> return lhs.x == rhs.x && lhs.y == rhs.y >> >>> } >> >>> } >> >>> >> >>> Calculating the hashes of all GridPoints (given the above >> implementation) on a 1000 × 1000 grid … >> >>> >> >>> let (width, height) = (1000, 1000) >> >>> let total = width * height >> >>> var hashes = Set<Int>() >> >>> for x in 0..<width { >> >>> for y in 0..<height { >> >>> hashes.insert(GridPoint(x: x, y: y).hashValue) >> >>> } >> >>> } >> >>> print("\(hashes.count) unique hashes out of a total of \(total).") >> >>> >> >>> … results in just 1024 unique hash values for 1_000_000 unique values. >> >>> >> >>> In other words: The recommended implementation causes 99.9% of values >> to trigger a hash collision. >> >>> >> >>> Out of those 1_000_000 values the median collision count was 976 with >> min and max being 976 and 1000respectively. >> >>> >> >>> The collision rate will have negative impact in algorithms which >> heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it >> increases the vulnerability to DDOS attacks when exposed to the web. >> >>> >> >>> If even the official Swift documentation gets the implementation of >> hashValue wrong, then who is to expect the average Swift programmer to do >> any better? >> >>> >> >>> In contrast, running the same snippet using HashVisitable and the >> semi-secure Fnv1aHash (see below) results in zero collisions! >> >>> >> >>> Finally, the design of the Hashable protocol forces the use of one >> implementation without the possibility of switching between multiple >> hashing algorithms. >> >>> >> >>> Proposed solution >> >>> >> >>> Instead of coupling the hashing algorithm with each and every Swift >> type, we should provide a hashing API based on the visitor-pattern. By >> freeing application developers from the burden of having to implement >> hashing algorithms, the Standard Library can provide default ones which >> fulfill collision, performance and security goals. Furthermore, it would >> allow developers to swap to different algorithms based on the use case. >> >>> >> >>> Detailed design >> >>> >> >>> The proposal deprecates the Hashable protocol and introduces the >> following two: >> >>> >> >>> protocol Hasher >> >>> { >> >>> >> >>> mutating func finish() -> Int >> >>> >> >>> >> >>> mutating func write(bytes >> >>> : UnsafeRawBufferPointer) >> >>> } >> >>> >> >>> >> >>> protocol HashVisitable >> >>> { >> >>> >> >>> func hash<H: Hasher>(_ hasher: inout >> >>> H) >> >>> } >> >>> >> >>> Hasher is the protocol which represents a hashing algorithm, and >> HashVisitable replaces Hashable. For types entirely represented by their >> memory layout, the following protocol would provide a default >> implementation: >> >>> >> >>> protocol ContiguouslyHashable: HashVisitable {} >> >>> >> >>> extension ContiguouslyHashable { >> >>> func hash<H: Hasher>(_ hasher: inout H) { >> >>> var mutableSelf = self >> >>> try! Swift.withUnsafeBytes(of: &mutableSelf) { >> >>> hasher.write(bytes: $0) >> >>> } >> >>> } >> >>> } >> >>> >> >>> extension Bool : ContiguouslyHashable {} >> >>> extension UInt8 : ContiguouslyHashable {} >> >>> extension UInt16 : ContiguouslyHashable {} >> >>> extension UInt32 : ContiguouslyHashable {} >> >>> extension UInt64 : ContiguouslyHashable {} >> >>> extension UInt : ContiguouslyHashable {} >> >>> extension Int8 : ContiguouslyHashable {} >> >>> extension Int16 : ContiguouslyHashable {} >> >>> extension Int32 : ContiguouslyHashable {} >> >>> extension Int64 : ContiguouslyHashable {} >> >>> extension Int : ContiguouslyHashable {} >> >>> >> >>> The Standard-Library would then provide a set of hashing >> implementations specific to each purpose. A possible choice for hashing >> algorithms would be the reasonably fast SipHash-2-4, and the reasonably >> secure SipHash-4-8. >> >>> >> >>> FNV-1A is another popular semi-secure but blazingly fast hash >> algorithm, which – for the sake of demonstration – could be implemented as >> follows: >> >>> >> >>> struct Fnv1aHash >> >>> { >> >>> >> >>> fileprivate var state: UInt >> >>> >> >>> >> >>> init(seed: UInt >> >>> ) { >> >>> >> >>> self.state = seed &+ 14695981039346656037 >> >>> >> >>> } >> >>> } >> >>> >> >>> >> >>> extension Fnv1aHash: Hasher >> >>> { >> >>> >> >>> mutating func write(bytes >> >>> : UnsafeRawBufferPointer) { >> >>> >> >>> for byte in >> >>> bytes { >> >>> >> >>> self.state = (self.state ^ UInt(byte)) &* 1099511628211 >> >>> >> >>> } >> >>> } >> >>> >> >>> mutating func finish() -> Int >> >>> { >> >>> >> >>> return unsafeBitCast(self.state, to: Int.self >> >>> ) >> >>> } >> >>> } >> >>> >> >>> Coming back to the sample code present in the Hashable documentation, >> the new implementation would look like: >> >>> >> >>> extension GridPoint: HashVisitable >> >>> { >> >>> >> >>> func hash<H: Hasher>(_ hasher: inout >> >>> H) { >> >>> >> >>> self.x.hash(& >> >>> hasher) >> >>> >> >>> self.y.hash(& >> >>> hasher) >> >>> } >> >>> } >> >>> >> >>> Source compatibility >> >>> >> >>> Making use of "extending protocols to conform to protocols": >> >>> >> >>> extension Hashable: HashVisitable >> >>> { >> >>> >> >>> func hash<H: Hasher>(_ hasher: inout >> >>> H) { >> >>> >> >>> self.hashValue.hash(& >> >>> hasher) >> >>> } >> >>> } >> >>> >> >>> Effect on ABI stability >> >>> >> >>> n/a >> >>> >> >>> Effect on API resilience >> >>> >> >>> This feature should be possible to add/remove without breaking ABI. >> >>> >> >>> Alternatives considered >> >>> >> >>> n/a >> >>> _______________________________________________ >> >>> swift-evolution mailing list >> >>> [email protected] >> >>> https://lists.swift.org/mailman/listinfo/swift-evolution >> >> >> >> _______________________________________________ >> >> swift-evolution mailing list >> >> [email protected] >> >> https://lists.swift.org/mailman/listinfo/swift-evolution >> >> _______________________________________________ >> swift-evolution mailing list >> [email protected] >> https://lists.swift.org/mailman/listinfo/swift-evolution >> >> >> _______________________________________________ >> swift-evolution mailing list >> [email protected] >> https://lists.swift.org/mailman/listinfo/swift-evolution >> >> >> >> _______________________________________________ >> swift-evolution mailing list >> [email protected] >> https://lists.swift.org/mailman/listinfo/swift-evolution >> >> >
_______________________________________________ swift-evolution mailing list [email protected] https://lists.swift.org/mailman/listinfo/swift-evolution
