Re: [swift-dev] Dictionary Collision Resolution Guarantees

Alexis via swift-dev Thu, 13 Oct 2016 17:17:51 -0700

> On Oct 13, 2016, at 6:09 PM, Dave Abrahams via swift-dev 
> <[email protected]> wrote:
> 
> 
> on Thu Oct 13 2016, Alexis <swift-dev-AT-swift.org 
> <http://swift-dev-at-swift.org/>> wrote:
> 
>> I’m currently cleaning up the docs on Dictionary to reflect the new indexing 
>> model, and I stumbled
>> across the note that the following code should work assuming no 
>> reallocations occur:
>> 
>> //
>> //   var (i, found) = d.find(k) // i is associated with d's buffer
>> //   if found {
>> //      var e = d            // now d is sharing its data with e
>> //      e[newKey] = newValue // e now has a unique copy of the data
>> //      return e[i]          // use i to access e
>> //   }
>> //
>> 
>> This is effectively assuming that the open-addressing scheme being
>> used is first-come-first-serve (FCFS). That is, any element being
>> inserted can *only* be inserted into vacant buckets, rather than
>> displacing existing elements. This is currently only internal docs,
>> but do we actually want to guarantee this? 
> 
> No, not to users.  But "//" comments are not user-level comments and
> don't imply any guarantees.


OK cool, is there any reason it’s even written down? I don’t see any code
that’s obviously relying on it. (seems fine to delete it?)

> 
>> The popular alternative of robin hood (RH) doesn’t follow this.
>> 
>> Some background: One interesting optimization avenue worth exploring
>> is to tweak Dictionary to store hashes, rather than bits, to identify
>> occupied slots (with some careful masking so that 0 still means
>> “unoccupied”). Then searching for elements can be implemented as
>> follows:
>> 
>> let hash = hashFromKey(key)
>> var i = indexFromHash(hash)
>> while true {
>>  if hashes[i] == 0 {
>>    // vacant, not contained
>>  } else if hashes[i] == hash {
>>    // maybe already exists?
>>    if keys[i] == key {
>>      // actually exists
>>    }
>>  }
>>  i = nextIndex(i)
>> }
>> 
>> The interesting property of this is that it almost all of the search
>> time is spent linearly walking through an array and doing simple
>> comparisons on integers, which is of course really easy to optimize
>> (potentially even SIMD). 
> 
> Yep, that's cool.  It does cost a lot more storage, though.  Tradeoffs.
> 
>> 99.9999% of the time, if the element isn’t stored in the Dictionary,
>> then you’ll just hit a 0 hash, and never look at the keys. Similarly,
>> 99.9999% of the time, if the element *is* stored in the Dictionary,
>> you’ll only do a single key comparison (on the actually equal
>> element). So this is also really great for cache — pretty much all of
>> your access are just linear scans of the hashes.
>> 
>> The main downside is you’re now “wasting” more memory to store hashes,
>> but you can potentially compensate for this by truncating the hashes
>> to 8/16 bits for small Dictionaries (which will be better for SIMD
>> anyway).
> 
> Hey, cool.
> 
>> So what does this have to do with the RH scheme? Compared to FCFS, RH
>> generally leads to lower variance on search distances, and provides a
>> mechanism to short-circuit long runs if you have the hashes handy. If
>> you find hashes which want to live later in the array than where you
>> started, then the current element definitely isn’t contained. This
>> means you can increase the load factor on your dictionary and further
>> improve cache/memory usage (compensating for the memory-usage loss
>> from storing hashes). However it will be prohibitively expensive if
>> you require hashes to be recomputed on-the-fly.
> 
> Sounds like a worthy optimization!
> 
> -- 
> -Dave
> 
> _______________________________________________
> swift-dev mailing list
> [email protected] <mailto:[email protected]>
> https://lists.swift.org/mailman/listinfo/swift-dev 
> <https://lists.swift.org/mailman/listinfo/swift-dev>

_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev

Re: [swift-dev] Dictionary Collision Resolution Guarantees

Reply via email to