While I understand and agree with the goal of the proposal, I too feel like 
it’s more difficult to understand than the current approach. HashVisitable 
looks odd, and one would have to understand the role and API of the Hasher 
protocol.

I would argue that if Swift provided a good default hashing API, then we could 
simply call it when computing the hashValue.

struct GridPoint: Hashable {
  var x: Int
  var y: Int
  var hashValue: Int {
    let hasher = Swift.DefaultHasher()
    hasher.hash(self.x)
    hasher.hash(self.y)
    return hasher.value
  }
  static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { /* … */ }
}

It might seem wordy, but it would allow one to leverage good hashing 
algorithms, while not forcing people who do have a good hashing function for 
their particular type to define their own hasher.

On an unrelated note, maybe I missed something but I think HashVisitable would 
still need to conform to Equatable as well, if it were to replace Hashable. 
This is unclear in your proposal, as the equality operator is present in the 
motivational example, but absent from the proposed solution.

On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution 
<[email protected]<mailto:[email protected]>> wrote:

I’m dumb when it comes to proper hashing, but it’s such a tediously common 
thing in my experience to need to add Hashable to some kind of a struct so I 
can stash it in a set or use it as a dictionary key. Is there really no way to 
make this all more automatic? I have to be honest - this proposal seems 
*harder* to understand than the way it works now. Of course the easiest would 
be if the language could just do this “good enough" for me using reflection or 
whatever and if I really did run into a problem where I wanted to do this 
myself, I could override something.

Perfect is the enemy of good.

l8r
Sean


On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution 
<[email protected]<mailto:[email protected]>> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol 
and would love to hear your feedback on it!

Rendered | Blog Post

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on 
this proposal. 👍

HashVisitable

• Proposal: SE-NNNN
• Authors: Vincent Esche
• Review Manager: TBD
• Status: Awaiting review
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) 
to improve safety, versatility and learnability.

Motivation

Implementing Hashable is difficult and the consequences if not done well have 
dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
   var x: Int
   var y: Int
}

extension GridPoint: Hashable {
   var hashValue: Int {
       return x.hashValue ^ y.hashValue
   }

   static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
       return lhs.x == rhs.x && lhs.y == rhs.y
   }
}

Calculating the hashes of all GridPoints (given the above implementation) on a 
1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
   for y in 0..<height {
       hashes.insert(GridPoint(x: x, y: y).hashValue)
   }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to 
trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and 
max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use 
hashValue like the ones in Dictionaryand Set. Furthermore, it increases the 
vulnerability to DDOS attacks when exposed to the web.

If even the official Swift documentation gets the implementation of hashValue 
wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure 
Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one 
implementation without the possibility of switching between multiple hashing 
algorithms.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we 
should provide a hashing API based on the visitor-pattern. By freeing 
application developers from the burden of having to implement hashing 
algorithms, the Standard Library can provide default ones which fulfill 
collision, performance and security goals. Furthermore, it would allow 
developers to swap to different algorithms based on the use case.

Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher
{

mutating func finish() -> Int


mutating func write(bytes
: UnsafeRawBufferPointer)
}


protocol HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H)
}

Hasher is the protocol which represents a hashing algorithm, and HashVisitable 
replaces Hashable. For types entirely represented by their memory layout, the 
following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
   func hash<H: Hasher>(_ hasher: inout H) {
       var mutableSelf = self
       try! Swift.withUnsafeBytes(of: &mutableSelf) {
           hasher.write(bytes: $0)
       }
   }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations 
specific to each purpose. A possible choice for hashing algorithms would be the 
reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which 
– for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash
{

fileprivate var state: UInt


init(seed: UInt
) {

self.state = seed &+ 14695981039346656037

   }
}


extension Fnv1aHash: Hasher
{

mutating func write(bytes
: UnsafeRawBufferPointer) {

for byte in
bytes {

self.state = (self.state ^ UInt(byte)) &* 1099511628211

       }
   }

mutating func finish() -> Int
{

return unsafeBitCast(self.state, to: Int.self
)
   }
}

Coming back to the sample code present in the Hashable documentation, the new 
implementation would look like:

extension GridPoint: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.x.hash(&
hasher)

self.y.hash(&
hasher)
   }
}

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
   }
}

Effect on ABI stability

n/a

Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
[email protected]<mailto:[email protected]>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
[email protected]<mailto:[email protected]>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution

Reply via email to