On Mon, Mar 13, 2017 at 8:30 PM, Tony Allevato via swift-evolution < swift-evolution@swift.org> wrote:
> Adding the list back to this reply because I don't believe you meant to > reply only to me, and I think it's an important discussion to have :) > > > On Mon, Mar 13, 2017 at 4:32 PM Vincent Esche <regexident.mailinglists@ > gmail.com> wrote: > > Automatic generation of Hashable implementation code only "solves" the > problem of having to implement `var hashValue`. > It however only works for some types. By no means for all. > > > Certainly, and I never suggested that it would work for every case. I was > responding to Sean's point that the compiler should be able to do "good > enough" in the common cases and I offered a way that that can be achieved. > > > Take this hypothetical implementation of a HSB color: > > struct Color { > let hue: UInt8 > let saturation: UInt8 > let brightness: UInt8 > } > > Let the semantic of this type be this: > Any two colors with a `brightness` of `0` are to be considered equal > regardless of their respective `hue` or `saturation`. At night all cats are > gray. > > > An auto-generated conformance impl for `Color` would most likely be wrong > afaict. > And those kind of semantics are everywhere. > > > Of course, and in those cases, you wouldn't want to use an auto-derived > Equatable or Hashable implementation. Those kinds of semantics are > "everywhere" for certain definitions of "everywhere", but similarly, > semantics where the hash value of a thing can be determined simply via > combination of the hash values of its components are also "everywhere". > > I wouldn't suggest that auto-deriving Hashable solves *all* problems, and > similarly, your proposal doesn't help in the situation you described > either. Your API provides a different way to mix the hue, saturation, and > brightness components in an implementation of hashValue, but it doesn't > force the user to mix them *correctly* (by excluding hue/saturation if > brightness is zero). > > So in both cases, users still have to provide custom implementations of == > and hashValue. But without auto-deriving, users have to provide them even > for the cases where the auto-derived implementation *would have been > correct.* Auto-deriving is about reducing the number of types that need to > have custom implementations, thereby reducing the chance that a user will > do it wrong. > > When considering a type with auto-derived Hashable and Equatable, there > are two ways it can be wrong: > > 1) The auto-generated implementations of both == and hashValue don't honor > the semantic contract of the type (for example, don't ignore brightness > when it's zero). > 2) The user overrides the auto-generated implementation of one of > ==/hashValue but not both, and violates the contracts between them. > > For #1, yes, the compiler generated an incorrect implementation in that > case. However, I would argue it's the developer's responsibility to fix it > by overriding it if they need different semantics. As I mentioned above, > without derivation, the developer could still implement it incorrectly, > just as they could with your API. > > For #2, this could be solved by requiring users to override both if they > override one of them. Now they're back in the same situation as #1—they > either did it right, or they did it wrong. > > > > > Auto-generation clearly is no panacea when it comes to hashing. Especially > if it leads to invalid and unsafe default implementations. > > > Auto-deriving cannot produce "unsafe" implementations because it's defined > in terms of a function operating over the hash values of its components. It > can produce an implementation that does not match the intended semantics of > the class that are defined outside of its component values, but that's not > the compiler's job to decide; it's up to the user to provide those. > > Regarding unsafety, it's worth noting that your ContiguouslyHashable > protocol is inherently unsafe and fragile. Imagine that a user implements a > struct that they make conform to ContiguouslyHashable because at the time, > it's a simple fixed layout type with primitive fields. If they take that > type and add a new field to it that isn't contiguously hashable (say, a > class reference) and they forget to replace the ContiguouslyHashable > conformance, they now have a very broken type, and that behavior isn't > caught until *runtime* when things go wrong. > > At least with derived conformances, in that situation the *compiler* would > see that the type was no longer hashable and emit an error when it's used > anywhere that Hashable/hashValue was expected. > > So if safety is your motivation, ContiguouslyHashable kind of fights back > against that because it gives the user a door they can open—in some cases, > accidentally—that produces invalid results. > > > > > It would however be nice to be able to annotate types for which you want > HashVisitable implementations to be generated. > > > That's one of the still-open questions from the last discussion thread on > the topic; whether auto-deriving should be automatic for any type where it > is possible (as it is now for enums without associated values) or whether > the user should have to opt-in. > > > > > I actually don't see auto-generation as an alternative to a proper hashing > API. But rather as an addition. > HashVisitable and auto-deriving would actually work great together! > > > I completely agree that these ideas complement each other very well! And I > also agree that the language would do well to make it easier for users to > easily do the right thing. I just feel that *the API proposed here > specifically* adds too much complexity for the problem it's trying to solve. > > I'd find it more compelling if it was simplified a bit: > > * The standard library doesn't need to provide a number of hash > implementations; it should just provide one that works "well enough" in > most cases > * Hashable doesn't have tie itself to a visitor pattern. It's not > necessarily safer because users can still mix the wrong values; in that > case, I'd rather the stdlib implementation of the "good enough" hash just > be a standalone mixer that the language can encourage users to use. > I agree with this assessment. If SipHash is deemed "good enough," then Swift should offer these as standalone facilities and encourage users to call them in `Hashable`. I think the API design proposed here is much too complex, but offering tried-and-true hash functions is certainly reasonable and would be an improvement over xor. Certainly also, the documentation for `Hashable` can be improved. In general, however, it's not very convincing to motivate an entire proposal for a new feature based on a documentation bug. Recognize that there are (or at least have been, in past shipped versions of Swift) code snippets in the documentation that _don't even compile_! If we accepted the logic put forward here, we should give up on Swift entirely; after all, if even the official Swift documentation can't provide code that compiles, what hope do the rest of us have?! > > In fact they already do in other languages <https://is.gd/iy5770>. > > > On Mon, Mar 13, 2017 at 8:51 PM, Tony Allevato via swift-evolution < > swift-evolution@swift.org> wrote: > > I'm not convinced this is the right approach: it couples the fact that a > type is hashable with a specific strategy for implementing the hash value > computation. > > Instead, the language should make efforts to avoid requiring the user to > think about hashing algorithms *at all*; to answer Sean's question a couple > messages up, I've proposed in the past > <https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough > working draft) adding derivation of Equatable and Hashable for structs and > enums where it's possible to automatically do so (for example, if all of > the enum's associated values are Equatable/Hashable). I've been > experimenting with an implementation in the compiler and I have something > that works for enums, except for recursive ones. I haven't started structs > yet because I think there are more open questions there, and I hope to > propose enums and structs independently so that the enum one doesn't get > bogged down by the struct one. > > The core team seems to be aware of the need for this; the logic that > derives Equatable/Hashable for enums without associated values also has > TODO comments to handle those with associated values, and to handle structs > as well. Slava Pestov mentioned a while back that they encouraged PRs on > it, which is why I started, but my free time has been limited lately. > > That being said, I *do* think there would be value in having some kind of > hash "mixer" as part of the standard library and strongly encouraging > through documentation that hashValue implementors use it. Getting the > function right is the hard part, and if Swift both (1) took care of it for > you by default in many cases and (2) made the harder cases easier by > providing a high quality canned implementation, I think we'd have a much > cleaner solution than what is being proposed here. > > > On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution < > swift-evolution@swift.org> wrote: > > Is there any reason the API couldn’t be expressed as something along these > lines? > > func hash() -> [Hashable] { > return [x, y] > } > > l8r > Sean > > > > On Mar 13, 2017, at 2:15 PM, David Hart <da...@hartbit.com> wrote: > > > >> > >> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution < > swift-evolution@swift.org> wrote: > >> > >> I’m dumb when it comes to proper hashing, but it’s such a tediously > common thing in my experience to need to add Hashable to some kind of a > struct so I can stash it in a set or use it as a dictionary key. Is there > really no way to make this all more automatic? I have to be honest - this > proposal seems *harder* to understand than the way it works now. > > > > It's not really harder: just call hash on each of your type's > significant values: > > > > x.hash(&hasher) > > y.hash(&hasher) > > > > How would you implement hashValue in a simpler way, remembering that 'x > ^ y' is an incorrect implementation? > > > >> Of course the easiest would be if the language could just do this “good > enough" for me using reflection or whatever and if I really did run into a > problem where I wanted to do this myself, I could override something. > >> > >> Perfect is the enemy of good. > >> > >> l8r > >> Sean > >> > >> > >>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution < > swift-evolution@swift.org> wrote: > >>> > >>> Hi there, > >>> > >>> I've written up a proposal for an overhaul/replacement of the Hashable > protocol and would love to hear your feedback on it! > >>> > >>> Rendered | Blog Post > >>> > >>> Cheers, > >>> Vincent > >>> > >>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial > feedback on this proposal. 👍 > >>> > >>> HashVisitable > >>> > >>> • Proposal: SE-NNNN > >>> • Authors: Vincent Esche > >>> • Review Manager: TBD > >>> • Status: Awaiting review > >>> Introduction > >>> > >>> Replace the Hashable protocol by two new procotols (Hasher and > HashVisitable) to improve safety, versatility and learnability. > >>> > >>> Motivation > >>> > >>> Implementing Hashable is difficult and the consequences if not done > well have dire performance and safety repercussions. > >>> > >>> The documentation of Hashable lists a sample implementation of var > hashValue: > >>> > >>> /// A point in an x-y coordinate system. > >>> struct GridPoint { > >>> var x: Int > >>> var y: Int > >>> } > >>> > >>> extension GridPoint: Hashable { > >>> var hashValue: Int { > >>> return x.hashValue ^ y.hashValue > >>> } > >>> > >>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { > >>> return lhs.x == rhs.x && lhs.y == rhs.y > >>> } > >>> } > >>> > >>> Calculating the hashes of all GridPoints (given the above > implementation) on a 1000 × 1000 grid … > >>> > >>> let (width, height) = (1000, 1000) > >>> let total = width * height > >>> var hashes = Set<Int>() > >>> for x in 0..<width { > >>> for y in 0..<height { > >>> hashes.insert(GridPoint(x: x, y: y).hashValue) > >>> } > >>> } > >>> print("\(hashes.count) unique hashes out of a total of \(total).") > >>> > >>> … results in just 1024 unique hash values for 1_000_000 unique values. > >>> > >>> In other words: The recommended implementation causes 99.9% of values > to trigger a hash collision. > >>> > >>> Out of those 1_000_000 values the median collision count was 976 with > min and max being 976 and 1000respectively. > >>> > >>> The collision rate will have negative impact in algorithms which > heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it > increases the vulnerability to DDOS attacks when exposed to the web. > >>> > >>> If even the official Swift documentation gets the implementation of > hashValue wrong, then who is to expect the average Swift programmer to do > any better? > >>> > >>> In contrast, running the same snippet using HashVisitable and the > semi-secure Fnv1aHash (see below) results in zero collisions! > >>> > >>> Finally, the design of the Hashable protocol forces the use of one > implementation without the possibility of switching between multiple > hashing algorithms. > >>> > >>> Proposed solution > >>> > >>> Instead of coupling the hashing algorithm with each and every Swift > type, we should provide a hashing API based on the visitor-pattern. By > freeing application developers from the burden of having to implement > hashing algorithms, the Standard Library can provide default ones which > fulfill collision, performance and security goals. Furthermore, it would > allow developers to swap to different algorithms based on the use case. > >>> > >>> Detailed design > >>> > >>> The proposal deprecates the Hashable protocol and introduces the > following two: > >>> > >>> protocol Hasher > >>> { > >>> > >>> mutating func finish() -> Int > >>> > >>> > >>> mutating func write(bytes > >>> : UnsafeRawBufferPointer) > >>> } > >>> > >>> > >>> protocol HashVisitable > >>> { > >>> > >>> func hash<H: Hasher>(_ hasher: inout > >>> H) > >>> } > >>> > >>> Hasher is the protocol which represents a hashing algorithm, and > HashVisitable replaces Hashable. For types entirely represented by their > memory layout, the following protocol would provide a default > implementation: > >>> > >>> protocol ContiguouslyHashable: HashVisitable {} > >>> > >>> extension ContiguouslyHashable { > >>> func hash<H: Hasher>(_ hasher: inout H) { > >>> var mutableSelf = self > >>> try! Swift.withUnsafeBytes(of: &mutableSelf) { > >>> hasher.write(bytes: $0) > >>> } > >>> } > >>> } > >>> > >>> extension Bool : ContiguouslyHashable {} > >>> extension UInt8 : ContiguouslyHashable {} > >>> extension UInt16 : ContiguouslyHashable {} > >>> extension UInt32 : ContiguouslyHashable {} > >>> extension UInt64 : ContiguouslyHashable {} > >>> extension UInt : ContiguouslyHashable {} > >>> extension Int8 : ContiguouslyHashable {} > >>> extension Int16 : ContiguouslyHashable {} > >>> extension Int32 : ContiguouslyHashable {} > >>> extension Int64 : ContiguouslyHashable {} > >>> extension Int : ContiguouslyHashable {} > >>> > >>> The Standard-Library would then provide a set of hashing > implementations specific to each purpose. A possible choice for hashing > algorithms would be the reasonably fast SipHash-2-4, and the reasonably > secure SipHash-4-8. > >>> > >>> FNV-1A is another popular semi-secure but blazingly fast hash > algorithm, which – for the sake of demonstration – could be implemented as > follows: > >>> > >>> struct Fnv1aHash > >>> { > >>> > >>> fileprivate var state: UInt > >>> > >>> > >>> init(seed: UInt > >>> ) { > >>> > >>> self.state = seed &+ 14695981039346656037 > >>> > >>> } > >>> } > >>> > >>> > >>> extension Fnv1aHash: Hasher > >>> { > >>> > >>> mutating func write(bytes > >>> : UnsafeRawBufferPointer) { > >>> > >>> for byte in > >>> bytes { > >>> > >>> self.state = (self.state ^ UInt(byte)) &* 1099511628211 > >>> > >>> } > >>> } > >>> > >>> mutating func finish() -> Int > >>> { > >>> > >>> return unsafeBitCast(self.state, to: Int.self > >>> ) > >>> } > >>> } > >>> > >>> Coming back to the sample code present in the Hashable documentation, > the new implementation would look like: > >>> > >>> extension GridPoint: HashVisitable > >>> { > >>> > >>> func hash<H: Hasher>(_ hasher: inout > >>> H) { > >>> > >>> self.x.hash(& > >>> hasher) > >>> > >>> self.y.hash(& > >>> hasher) > >>> } > >>> } > >>> > >>> Source compatibility > >>> > >>> Making use of "extending protocols to conform to protocols": > >>> > >>> extension Hashable: HashVisitable > >>> { > >>> > >>> func hash<H: Hasher>(_ hasher: inout > >>> H) { > >>> > >>> self.hashValue.hash(& > >>> hasher) > >>> } > >>> } > >>> > >>> Effect on ABI stability > >>> > >>> n/a > >>> > >>> Effect on API resilience > >>> > >>> This feature should be possible to add/remove without breaking ABI. > >>> > >>> Alternatives considered > >>> > >>> n/a > >>> _______________________________________________ > >>> swift-evolution mailing list > >>> swift-evolution@swift.org > >>> https://lists.swift.org/mailman/listinfo/swift-evolution > >> > >> _______________________________________________ > >> swift-evolution mailing list > >> swift-evolution@swift.org > >> https://lists.swift.org/mailman/listinfo/swift-evolution > > _______________________________________________ > swift-evolution mailing list > swift-evolution@swift.org > https://lists.swift.org/mailman/listinfo/swift-evolution > > > _______________________________________________ > swift-evolution mailing list > swift-evolution@swift.org > https://lists.swift.org/mailman/listinfo/swift-evolution > > > > _______________________________________________ > swift-evolution mailing list > swift-evolution@swift.org > https://lists.swift.org/mailman/listinfo/swift-evolution > >
_______________________________________________ swift-evolution mailing list swift-evolution@swift.org https://lists.swift.org/mailman/listinfo/swift-evolution