Re: hashtables

2022-04-06 Thread Dr. Arne Babenhauserheide

Stefan Israelsson Tampe  writes:

> After optimising the hashtable implementation I conclude the following,
> 1. + 0.25s  10M lookups in small hashtable  (0.36s for guiles current)
> 2. + 6X faster for a large table scanning numbers

What do you mean by "large taable scanning numbers"? (I don’t understand)

> 3. + The general hash interface is faster also because it does not call SCM 
> from C
> 4. + fast (2X) for relative large random working sets although the hashtable 
> is large

That would likely directly speed up code I have by factor 2. 

> 5. - we do not suport the assoc generalisation (sort off though)

What do you mean by assoc generalization?

> 6. Compact (low memory overhead)

All in all these improvements sound great! And low memory overhead is
the icing on the cake.

How big is the remaining overhead? (compared to storing the values in a vector?)

> Let me know if you want to have this supported in guile as I have defined 
> some vm operations that are needed else this will be about 1.5 2x slower on 
> small hashtables.

Since I do not know the implications of this on the VM, I cannot give an
answer.

If Andy Wingo doesn’t object, then I think these improvements would be
awesome to have!

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein,
ohne es zu merken.
draketo.de


signature.asc
Description: PGP signature


hashtables

2022-04-06 Thread Stefan Israelsson Tampe
Hi all,

After optimising the hashtable implementation I conclude the following,
1. + 0.25s  10M lookups in small hashtable  (0.36s for guiles current)
2. + 6X faster for a large table scanning numbers
3. + The general hash interface is faster also because it does not call SCM
from C
4. + fast (2X) for relative large random working sets although the
hashtable is large
5. - we do not suport the assoc generalisation (sort off though)
6. Compact (low memory overhead)

To note the design allows high performance for non uniform hashes e.g. we
can allow it to not spread out the hashes as optimal e.g. we can grow the
bins quite well and still maintain speed. This is especially well suited
for python hash strategies where integers just translate to integers.

This uses a datastructures that take advantage of SIMD instructions to
quickly find matches and vacant slots. Also an lru strategy is included for
speedup when the hash is large and the working set is not as large.

This is with the low level primitives. I will continue now with the higher
level primitives.

Let me know if you want to have this supported in guile as I have defined
some vm operations that are needed else this will be about 1.5 2x slower on
small hashtables.

Code:
;; This is the slow path where the bin is larger than 15
;; elements
(define (stis-d-ref v k default h eq)
  (let ((l (stis-b-alist-ref v h)))
(let lp ((ll l))
  (if (pair? ll)
  (let ((it (car ll)))
 (if (eq k (car it))
  it
  (lp (cdr ll
  default


;; stis-b-search v is hashtable, k is key, h is hashvalue eq
;; the eqality predicate, stis-e-ref is the underlining lru
;; VM instruction used to lookup most (key . value)
(define-inlinable (stis-c-ref v k default h eq)
  (define (slow-ref) (stis-d-ref v k default h eq))
(aif it (stis-e-search v h)
 (if (pair? it)
 (if (eq (car it) k)
 it
 (slow-ref))
 (slow-ref))
  default))

;; This is the slow set where one needs to cons the pair
;; on the last slot that is the assoc
(define (stis-d-set! v k val h eq)
  (let ((l (stis-b-alist-ref v h)))
(let lp ((ll l))
  (if (pair? ll)
  (let ((it (car ll)))
 (if (eq k (car it))
 (begin
(set-cdr! it val)
it)
 (lp (cdr ll
   (let* ((it (cons k val))
 (l  (cons it l)))
 (stis-b-alist-set! v h l)
 it)

;; This is the storing of a (k . val) pair in the
;; stis hash data structure
;; if we find a handle, set it else if the key hash is
;; missmatching, store it on the assoc, else if not a match
;; add it to the list via the c-primitive stis-b-add!
(define-inlinable (stis-c-set! v k val h eq)
  (define (slow-set!) (stis-d-set! v k val h eq))
(aif it (stis-e-search v h)
 (if (pair? it)
 (if (eq (car it) k)
 (begin
   (set-cdr! it val)
   it)
 (slow-set!))
 (slow-set!))
 (let ((it (cons k val)))
(stis-b-add! v h it)
it)))

Code:
see my previous post, but it's in guile-persist (gitlab)