Re: [swift-dev] Dictionary Collision Resolution Guarantees

2016-10-14 Thread Dave Abrahams via swift-dev

on Fri Oct 14 2016, Ole Begemann  wrote:

> On 14/10/2016 02:46, Dave Abrahams wrote:
>
>>> OK cool, is there any reason it’s even written down? I don’t see any code
>>> that’s obviously relying on it. (seems fine to delete it?)
>>
>> It's written down because we've never formalized our index invalidation
>> rules.  I didn't realize that this comment was related to index
>> invalidation.  The guarantee mentioned is one we might want to give, at
>> least under some circumstances.  We'll have to think about that more
>> carefully.  In any case, it's not time to delete it yet.
>
> For what it's worth, this rule is also explicitly mentioned in
> docs/IndexInvalidation.rst
> (https://github.com/apple/swift/blob/master/docs/IndexInvalidation.rst):
>
> "Insertion into a Dictionary invalidates indexes only on a rehash. If
> a Dictionary has enough free buckets (guaranteed by calling an
> initializer or reserving space), then inserting elements does not
> invalidate indexes.
>
> Note: unlike C++'s std::unordered_map, removing elements from a
> Dictionary invalidates indexes."
>
> (I realize the stuff in /docs is not necessarily official
> documentation (right?), I just wanted to mention it.)

Thanks for pointing that out; that's important.

-- 
-Dave
___
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev


Re: [swift-dev] Dictionary Collision Resolution Guarantees

2016-10-14 Thread Ole Begemann via swift-dev

On 14/10/2016 02:46, Dave Abrahams wrote:


OK cool, is there any reason it’s even written down? I don’t see any code
that’s obviously relying on it. (seems fine to delete it?)


It's written down because we've never formalized our index invalidation
rules.  I didn't realize that this comment was related to index
invalidation.  The guarantee mentioned is one we might want to give, at
least under some circumstances.  We'll have to think about that more
carefully.  In any case, it's not time to delete it yet.


For what it's worth, this rule is also explicitly mentioned in 
docs/IndexInvalidation.rst 
(https://github.com/apple/swift/blob/master/docs/IndexInvalidation.rst):


"Insertion into a Dictionary invalidates indexes only on a rehash. If a 
Dictionary has enough free buckets (guaranteed by calling an initializer 
or reserving space), then inserting elements does not invalidate indexes.


Note: unlike C++'s std::unordered_map, removing elements from a 
Dictionary invalidates indexes."


(I realize the stuff in /docs is not necessarily official documentation 
(right?), I just wanted to mention it.)

___
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev


Re: [swift-dev] Dictionary Collision Resolution Guarantees

2016-10-13 Thread Dave Abrahams via swift-dev

on Thu Oct 13 2016, Alexis  wrote:

>>> This is effectively assuming that the open-addressing scheme being
>>> used is first-come-first-serve (FCFS). That is, any element being
>>> inserted can *only* be inserted into vacant buckets, rather than
>>> displacing existing elements. This is currently only internal docs,
>>> but do we actually want to guarantee this? 
>> 
>> No, not to users.  But "//" comments are not user-level comments and
>> don't imply any guarantees.
>
> OK cool, is there any reason it’s even written down? I don’t see any code
> that’s obviously relying on it. (seems fine to delete it?)

It's written down because we've never formalized our index invalidation
rules.  I didn't realize that this comment was related to index
invalidation.  The guarantee mentioned is one we might want to give, at
least under some circumstances.  We'll have to think about that more
carefully.  In any case, it's not time to delete it yet.

-- 
-Dave
___
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev


Re: [swift-dev] Dictionary Collision Resolution Guarantees

2016-10-13 Thread Alexis via swift-dev

> On Oct 13, 2016, at 6:09 PM, Dave Abrahams via swift-dev 
>  wrote:
> 
> 
> on Thu Oct 13 2016, Alexis  > wrote:
> 
>> I’m currently cleaning up the docs on Dictionary to reflect the new indexing 
>> model, and I stumbled
>> across the note that the following code should work assuming no 
>> reallocations occur:
>> 
>> //
>> //   var (i, found) = d.find(k) // i is associated with d's buffer
>> //   if found {
>> //  var e = d// now d is sharing its data with e
>> //  e[newKey] = newValue // e now has a unique copy of the data
>> //  return e[i]  // use i to access e
>> //   }
>> //
>> 
>> This is effectively assuming that the open-addressing scheme being
>> used is first-come-first-serve (FCFS). That is, any element being
>> inserted can *only* be inserted into vacant buckets, rather than
>> displacing existing elements. This is currently only internal docs,
>> but do we actually want to guarantee this? 
> 
> No, not to users.  But "//" comments are not user-level comments and
> don't imply any guarantees.

OK cool, is there any reason it’s even written down? I don’t see any code
that’s obviously relying on it. (seems fine to delete it?)

> 
>> The popular alternative of robin hood (RH) doesn’t follow this.
>> 
>> Some background: One interesting optimization avenue worth exploring
>> is to tweak Dictionary to store hashes, rather than bits, to identify
>> occupied slots (with some careful masking so that 0 still means
>> “unoccupied”). Then searching for elements can be implemented as
>> follows:
>> 
>> let hash = hashFromKey(key)
>> var i = indexFromHash(hash)
>> while true {
>>  if hashes[i] == 0 {
>>// vacant, not contained
>>  } else if hashes[i] == hash {
>>// maybe already exists?
>>if keys[i] == key {
>>  // actually exists
>>}
>>  }
>>  i = nextIndex(i)
>> }
>> 
>> The interesting property of this is that it almost all of the search
>> time is spent linearly walking through an array and doing simple
>> comparisons on integers, which is of course really easy to optimize
>> (potentially even SIMD). 
> 
> Yep, that's cool.  It does cost a lot more storage, though.  Tradeoffs.
> 
>> 99.% of the time, if the element isn’t stored in the Dictionary,
>> then you’ll just hit a 0 hash, and never look at the keys. Similarly,
>> 99.% of the time, if the element *is* stored in the Dictionary,
>> you’ll only do a single key comparison (on the actually equal
>> element). So this is also really great for cache — pretty much all of
>> your access are just linear scans of the hashes.
>> 
>> The main downside is you’re now “wasting” more memory to store hashes,
>> but you can potentially compensate for this by truncating the hashes
>> to 8/16 bits for small Dictionaries (which will be better for SIMD
>> anyway).
> 
> Hey, cool.
> 
>> So what does this have to do with the RH scheme? Compared to FCFS, RH
>> generally leads to lower variance on search distances, and provides a
>> mechanism to short-circuit long runs if you have the hashes handy. If
>> you find hashes which want to live later in the array than where you
>> started, then the current element definitely isn’t contained. This
>> means you can increase the load factor on your dictionary and further
>> improve cache/memory usage (compensating for the memory-usage loss
>> from storing hashes). However it will be prohibitively expensive if
>> you require hashes to be recomputed on-the-fly.
> 
> Sounds like a worthy optimization!
> 
> -- 
> -Dave
> 
> ___
> swift-dev mailing list
> swift-dev@swift.org 
> https://lists.swift.org/mailman/listinfo/swift-dev 
> 
___
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev


Re: [swift-dev] Dictionary Collision Resolution Guarantees

2016-10-13 Thread Dave Abrahams via swift-dev

on Thu Oct 13 2016, Alexis  wrote:

> I’m currently cleaning up the docs on Dictionary to reflect the new indexing 
> model, and I stumbled
> across the note that the following code should work assuming no reallocations 
> occur:
>
> //
> //   var (i, found) = d.find(k) // i is associated with d's buffer
> //   if found {
> //  var e = d// now d is sharing its data with e
> //  e[newKey] = newValue // e now has a unique copy of the data
> //  return e[i]  // use i to access e
> //   }
> //
>
> This is effectively assuming that the open-addressing scheme being
> used is first-come-first-serve (FCFS). That is, any element being
> inserted can *only* be inserted into vacant buckets, rather than
> displacing existing elements. This is currently only internal docs,
> but do we actually want to guarantee this? 

No, not to users.  But "//" comments are not user-level comments and
don't imply any guarantees.

> The popular alternative of robin hood (RH) doesn’t follow this.
>
> Some background: One interesting optimization avenue worth exploring
> is to tweak Dictionary to store hashes, rather than bits, to identify
> occupied slots (with some careful masking so that 0 still means
> “unoccupied”). Then searching for elements can be implemented as
> follows:
>
> let hash = hashFromKey(key)
> var i = indexFromHash(hash)
> while true {
>   if hashes[i] == 0 {
> // vacant, not contained
>   } else if hashes[i] == hash {
> // maybe already exists?
> if keys[i] == key {
>   // actually exists
> }
>   }
>   i = nextIndex(i)
> }
>
> The interesting property of this is that it almost all of the search
> time is spent linearly walking through an array and doing simple
> comparisons on integers, which is of course really easy to optimize
> (potentially even SIMD). 

Yep, that's cool.  It does cost a lot more storage, though.  Tradeoffs.

> 99.% of the time, if the element isn’t stored in the Dictionary,
> then you’ll just hit a 0 hash, and never look at the keys. Similarly,
> 99.% of the time, if the element *is* stored in the Dictionary,
> you’ll only do a single key comparison (on the actually equal
> element). So this is also really great for cache — pretty much all of
> your access are just linear scans of the hashes.
>
> The main downside is you’re now “wasting” more memory to store hashes,
> but you can potentially compensate for this by truncating the hashes
> to 8/16 bits for small Dictionaries (which will be better for SIMD
> anyway).

Hey, cool.

> So what does this have to do with the RH scheme? Compared to FCFS, RH
> generally leads to lower variance on search distances, and provides a
> mechanism to short-circuit long runs if you have the hashes handy. If
> you find hashes which want to live later in the array than where you
> started, then the current element definitely isn’t contained. This
> means you can increase the load factor on your dictionary and further
> improve cache/memory usage (compensating for the memory-usage loss
> from storing hashes). However it will be prohibitively expensive if
> you require hashes to be recomputed on-the-fly.

Sounds like a worthy optimization!

-- 
-Dave

___
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev


[swift-dev] Dictionary Collision Resolution Guarantees

2016-10-13 Thread Alexis via swift-dev
I’m currently cleaning up the docs on Dictionary to reflect the new indexing 
model, and I stumbled across the note that the following code should work 
assuming no reallocations occur:

//
//   var (i, found) = d.find(k) // i is associated with d's buffer
//   if found {
//  var e = d// now d is sharing its data with e
//  e[newKey] = newValue // e now has a unique copy of the data
//  return e[i]  // use i to access e
//   }
//

This is effectively assuming that the open-addressing scheme being used is 
first-come-first-serve (FCFS). That is, any element being inserted can *only* 
be inserted into vacant buckets, rather than displacing existing elements. This 
is currently only internal docs, but do we actually want to guarantee this? The 
popular alternative of robin hood (RH) doesn’t follow this.

Some background: One interesting optimization avenue worth exploring is to 
tweak Dictionary to store hashes, rather than bits, to identify occupied slots 
(with some careful masking so that 0 still means “unoccupied”). Then searching 
for elements can be implemented as follows:

let hash = hashFromKey(key)
var i = indexFromHash(hash)
while true {
  if hashes[i] == 0 {
// vacant, not contained
  } else if hashes[i] == hash {
// maybe already exists?
if keys[i] == key {
  // actually exists
}
  }
  i = nextIndex(i)
}

The interesting property of this is that it almost all of the search time is 
spent linearly walking through an array and doing simple comparisons on 
integers, which is of course really easy to optimize (potentially even SIMD). 
99.% of the time, if the element isn’t stored in the Dictionary, then 
you’ll just hit a 0 hash, and never look at the keys. Similarly, 99.% of 
the time, if the element *is* stored in the Dictionary, you’ll only do a single 
key comparison (on the actually equal element). So this is also really great 
for cache — pretty much all of your access are just linear scans of the hashes. 

The main downside is you’re now “wasting” more memory to store hashes, but you 
can potentially compensate for this by truncating the hashes to 8/16 bits for 
small Dictionaries (which will be better for SIMD anyway).

So what does this have to do with the RH scheme? Compared to FCFS, RH generally 
leads to lower variance on search distances, and provides a mechanism to 
short-circuit long runs if you have the hashes handy. If you find hashes which 
want to live later in the array than where you started, then the current 
element definitely isn’t contained. This means you can increase the load factor 
on your dictionary and further improve cache/memory usage (compensating for the 
memory-usage loss from storing hashes). However it will be prohibitively 
expensive if you require hashes to be recomputed on-the-fly.
___
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev