> On Oct 13, 2016, at 6:09 PM, Dave Abrahams via swift-dev > <swift-dev@swift.org> wrote: > > > on Thu Oct 13 2016, Alexis <swift-dev-AT-swift.org > <http://swift-dev-at-swift.org/>> wrote: > >> I’m currently cleaning up the docs on Dictionary to reflect the new indexing >> model, and I stumbled >> across the note that the following code should work assuming no >> reallocations occur: >> >> // >> // var (i, found) = d.find(k) // i is associated with d's buffer >> // if found { >> // var e = d // now d is sharing its data with e >> // e[newKey] = newValue // e now has a unique copy of the data >> // return e[i] // use i to access e >> // } >> // >> >> This is effectively assuming that the open-addressing scheme being >> used is first-come-first-serve (FCFS). That is, any element being >> inserted can *only* be inserted into vacant buckets, rather than >> displacing existing elements. This is currently only internal docs, >> but do we actually want to guarantee this? > > No, not to users. But "//" comments are not user-level comments and > don't imply any guarantees.
OK cool, is there any reason it’s even written down? I don’t see any code that’s obviously relying on it. (seems fine to delete it?) > >> The popular alternative of robin hood (RH) doesn’t follow this. >> >> Some background: One interesting optimization avenue worth exploring >> is to tweak Dictionary to store hashes, rather than bits, to identify >> occupied slots (with some careful masking so that 0 still means >> “unoccupied”). Then searching for elements can be implemented as >> follows: >> >> let hash = hashFromKey(key) >> var i = indexFromHash(hash) >> while true { >> if hashes[i] == 0 { >> // vacant, not contained >> } else if hashes[i] == hash { >> // maybe already exists? >> if keys[i] == key { >> // actually exists >> } >> } >> i = nextIndex(i) >> } >> >> The interesting property of this is that it almost all of the search >> time is spent linearly walking through an array and doing simple >> comparisons on integers, which is of course really easy to optimize >> (potentially even SIMD). > > Yep, that's cool. It does cost a lot more storage, though. Tradeoffs. > >> 99.9999% of the time, if the element isn’t stored in the Dictionary, >> then you’ll just hit a 0 hash, and never look at the keys. Similarly, >> 99.9999% of the time, if the element *is* stored in the Dictionary, >> you’ll only do a single key comparison (on the actually equal >> element). So this is also really great for cache — pretty much all of >> your access are just linear scans of the hashes. >> >> The main downside is you’re now “wasting” more memory to store hashes, >> but you can potentially compensate for this by truncating the hashes >> to 8/16 bits for small Dictionaries (which will be better for SIMD >> anyway). > > Hey, cool. > >> So what does this have to do with the RH scheme? Compared to FCFS, RH >> generally leads to lower variance on search distances, and provides a >> mechanism to short-circuit long runs if you have the hashes handy. If >> you find hashes which want to live later in the array than where you >> started, then the current element definitely isn’t contained. This >> means you can increase the load factor on your dictionary and further >> improve cache/memory usage (compensating for the memory-usage loss >> from storing hashes). However it will be prohibitively expensive if >> you require hashes to be recomputed on-the-fly. > > Sounds like a worthy optimization! > > -- > -Dave > > _______________________________________________ > swift-dev mailing list > swift-dev@swift.org <mailto:swift-dev@swift.org> > https://lists.swift.org/mailman/listinfo/swift-dev > <https://lists.swift.org/mailman/listinfo/swift-dev>
_______________________________________________ swift-dev mailing list swift-dev@swift.org https://lists.swift.org/mailman/listinfo/swift-dev