on Thu Oct 13 2016, Alexis <swift-dev-AT-swift.org> wrote:

> I’m currently cleaning up the docs on Dictionary to reflect the new
> indexing model, and I stumbled across the note that the following code
> should work assuming no reallocations occur:
>
>   //
>   //     var (i, found) = d.find(k) // i is associated with d's buffer
>   //     if found {
>   //       var e = d            // now d is sharing its data with e
>   //       e[newKey] = newValue // e now has a unique copy of the data
>   //       return e[i]          // use i to access e
>   //     }
>   //
>
> This is effectively assuming that the open-addressing scheme being
> used is first-come-first-served (FCFS). That is, any element being
> inserted can *only* be inserted into vacant buckets, rather than
> displacing existing elements. This is currently only internal docs,
> but do we actually want to guarantee this?

No, not to users. But "//" comments are not user-level comments and
don't imply any guarantees.

> The popular alternative of robin hood (RH) doesn’t follow this.
>
> Some background: One interesting optimization avenue worth exploring
> is to tweak Dictionary to store hashes, rather than bits, to identify
> occupied slots (with some careful masking so that 0 still means
> “unoccupied”). Then searching for elements can be implemented as
> follows:
>
>   let hash = hashFromKey(key)
>   var i = indexFromHash(hash)
>   while true {
>     if hashes[i] == 0 {
>       // vacant, not contained
>     } else if hashes[i] == hash {
>       // maybe already exists?
>       if keys[i] == key {
>         // actually exists
>       }
>     }
>     i = nextIndex(i)
>   }
>
> The interesting property of this is that almost all of the search
> time is spent linearly walking through an array and doing simple
> comparisons on integers, which is of course really easy to optimize
> (potentially even SIMD).

Yep, that's cool. It does cost a lot more storage, though. Tradeoffs.
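To make that concrete, here is a rough, self-contained Swift sketch of the
same idea. Everything in it is made up for illustration (the HashedBuckets
type, hashFromKey, the parallel hashes/keys/values arrays) and says nothing
about the stdlib's actual layout; resizing and load-factor handling are left
out. The update follows the FCFS rule from the first quote above (it only
ever fills a vacant bucket or overwrites an equal key), and the lookup only
touches the hashes array until it sees a matching hash:

    struct HashedBuckets<Key: Hashable, Value> {
      // Parallel arrays; a stored hash of 0 means "vacant".
      var hashes: [Int]
      var keys: [Key?]
      var values: [Value?]

      init(capacity: Int) {
        precondition(capacity > 0)
        hashes = Array(repeating: 0, count: capacity)
        keys = Array(repeating: nil, count: capacity)
        values = Array(repeating: nil, count: capacity)
      }

      // Mask the hash so that 0 stays reserved for "vacant".
      func hashFromKey(_ key: Key) -> Int {
        let h = key.hashValue & Int.max   // clear the sign bit
        return h == 0 ? 1 : h
      }

      func indexFromHash(_ hash: Int) -> Int { return hash % hashes.count }
      func nextIndex(_ i: Int) -> Int { return (i + 1) % hashes.count }

      func lookup(_ key: Key) -> Value? {
        let hash = hashFromKey(key)
        var i = indexFromHash(hash)
        while true {
          if hashes[i] == 0 {
            return nil                   // vacant: definitely not contained
          } else if hashes[i] == hash,   // cheap integer comparison first...
                    keys[i] == key {     // ...only then touch the key
            return values[i]
          }
          i = nextIndex(i)
        }
      }

      // FCFS insert: never displaces an existing element, only fills the
      // first vacant bucket (or overwrites an equal key).  No resizing,
      // so this sketch assumes the table never fills up.
      mutating func update(_ value: Value, forKey key: Key) {
        let hash = hashFromKey(key)
        var i = indexFromHash(hash)
        while hashes[i] != 0 {
          if hashes[i] == hash, keys[i] == key { break }
          i = nextIndex(i)
        }
        hashes[i] = hash
        keys[i] = key
        values[i] = value
      }
    }

    var table = HashedBuckets<String, Int>(capacity: 16)
    table.update(1, forKey: "one")
    print(table.lookup("one") ?? 0)      // prints 1
    print(table.lookup("two") == nil)    // prints true

Truncating the stored hashes to 8/16 bits, as suggested below, would only
change the element type of `hashes` and the masking in hashFromKey.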
> 99.9999% of the time, if the element isn’t stored in the Dictionary,
> then you’ll just hit a 0 hash, and never look at the keys. Similarly,
> 99.9999% of the time, if the element *is* stored in the Dictionary,
> you’ll only do a single key comparison (on the actually equal
> element). So this is also really great for cache — pretty much all of
> your accesses are just linear scans of the hashes.
>
> The main downside is you’re now “wasting” more memory to store hashes,
> but you can potentially compensate for this by truncating the hashes
> to 8/16 bits for small Dictionaries (which will be better for SIMD
> anyway).

Hey, cool.

> So what does this have to do with the RH scheme? Compared to FCFS, RH
> generally leads to lower variance on search distances, and provides a
> mechanism to short-circuit long runs if you have the hashes handy. If
> you find hashes which want to live later in the array than where you
> started, then the element you’re looking for definitely isn’t
> contained. This means you can increase the load factor on your
> dictionary and further improve cache/memory usage (compensating for
> the memory-usage loss from storing hashes). However, it will be
> prohibitively expensive if hashes have to be recomputed on the fly.

Sounds like a worthy optimization!

-- 
-Dave
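For illustration, the robin-hood short-circuit described above comes down to
comparing probe distances: if the element sitting in the current bucket is
closer to its ideal slot than we currently are to ours, a robin-hood insert
would already have displaced it with our key, so the key cannot be present.
A rough sketch of just the lookup side, reusing the made-up parallel
hashes/keys layout from the earlier sketch (the displacing insert is not
shown):

    // Distance of bucket i from the ideal bucket of `hash`
    // (assumes non-negative hashes, as in the sketch above).
    func probeDistance(ofHash hash: Int, atBucket i: Int, capacity: Int) -> Int {
      return (i + capacity - hash % capacity) % capacity
    }

    func robinHoodContains<Key: Hashable>(_ key: Key, hash: Int,
                                          hashes: [Int], keys: [Key?]) -> Bool {
      let capacity = hashes.count
      var i = hash % capacity
      var distance = 0
      while true {
        if hashes[i] == 0 {
          return false                   // vacant bucket: not contained
        }
        if probeDistance(ofHash: hashes[i], atBucket: i, capacity: capacity) < distance {
          // The resident is "richer" than we are (shorter probe run).
          // Robin-hood insertion would have displaced it with our key,
          // so we can stop without scanning the rest of the run.
          return false
        }
        if hashes[i] == hash, keys[i] == key {
          return true
        }
        i = (i + 1) % capacity
        distance += 1
      }
    }

This early exit is what lets RH run at a higher load factor: an unsuccessful
lookup stops as soon as it meets a richer resident instead of scanning all
the way to the next vacancy.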