Re: [racket-users] Re: Time of hash-ref when the key is (not) found

2016-08-12 Thread Gustavo Massaccesi
Ok. To reproduce the results I have to use:
  (define char (integer->char 37932))

Thank to both.

Gustavo


On Fri, Aug 12, 2016 at 7:08 PM, Matthew Flatt  wrote:
> That still seems consistent with Alex's explanation. To test Alex's
> explanation, I think you need a hash table with a key whose hash code
> matches the one for the value you check, but where the key is not equal
> that value.
>
> Here's a variant of your code with a `bignum` and `char` that have the
> same hash code, and `other` with a different hash code. The fastest
> case is the last one, where there's no match on hash code. The slowest
> case in the second one, where there's a match on the hash code but the
> values are not `equal?`. The matching case is in between.
>
>
> #lang racket/base
>
> ;; These two have the same `equal?` hash code internally:
> (define bignum (expt 2 64))
> (define char (integer->char 4164))
>
> ;; Has a different hash code:
> (define other 0)
>
> (define hash-with-bignum (hash bignum #t))
> (define hash-with-char (hash char #t))
> (define hash-with-other (hash other #t))
>
> (for ([rep (in-range 5)])
>   (display "found: ")
>   (time
>(for ([i (in-range 1000)])
>  (hash-ref hash-with-bignum bignum #f))
>(void))
>
>   (display "not found, but hash code found: ")
>   (time
>(for ([i (in-range 1000)])
>  (hash-ref hash-with-char bignum #f))
>(void))
>
>   (display "hash code not found: ")
>   (time
>(for ([i (in-range 1000)])
>  (hash-ref hash-with-other bignum #f))
>(void))
>   )
>
> At Fri, 12 Aug 2016 17:27:29 -0300, Gustavo Massaccesi wrote:
>> I think it's something more subtle. With this definitions:
>>
>>  (define long-assoc (for/list ([i (in-range 64 (+ 64 1024))])
>>   (cons i #t)))
>>  (define hash0 (make-immutable-hash (cons '(0 . #t) long-assoc)))
>>  (define hash1 (make-immutable-hash (cons '(1 . #t) long-assoc)))
>>
>> With 5 repetitions I get:
>>
>> h0: average 3872 +- 30
>> h1: average 2511 +- 22
>>
>> I get similar results replacing 1024 with other numbers. For example
>> with 1048576 both times increase, but the difference is still there.
>>
>> Gustavo
>>
>>
>> On Fri, Aug 12, 2016 at 12:17 AM, Alex Harsanyi  
>> wrote:
>> > On Friday, August 12, 2016 at 10:09:21 AM UTC+8, gustavo wrote:
>> >> I have these strange times in a microbenchmark that compares the time
>> >> to run hash-ref when the key is in the hash and when it is not there:
>> >>
>> >> ;---
>> >> #lang racket/base
>> >> (define hash0 #hash((0 . #t)))
>> >> (define hash1 #hash((1 . #t)))
>> >>
>> >> (for ([rep (in-range 5)])
>> >>   (display "h0: ")
>> >>   (time
>> >>(for ([i (in-range 1000)])
>> >>  (hash-ref hash0 0 #f))
>> >>(void))
>> >>
>> >>   (display "h1: ")
>> >>   (time
>> >>(for ([i (in-range 1000)])
>> >>  (hash-ref hash1 0 #f))
>> >>(void))
>> >> )
>> >> ;---
>> >>
>> >> Typical run times, sorted by cpu time
>> >>
>> >> found:
>> >> h0: cpu time: 3775 real time: 3798 gc time: 0
>> >> h0: cpu time: 3900 real time: 3936 gc time: 0
>> >> h0: cpu time: 3916 real time: 4009 gc time: 0
>> >> h0: cpu time: 3947 real time: 4006 gc time: 0
>> >> h0: cpu time: 3978 real time: 4096 gc time: 0
>> >>
>> >> not found:
>> >> h1: cpu time: 2278 real time: 2270 gc time: 0
>> >> h1: cpu time: 2293 real time: 2360 gc time: 0
>> >> h1: cpu time: 2434 real time: 2462 gc time: 0
>> >> h1: cpu time: 2449 real time: 2506 gc time: 0
>> >> h1: cpu time: 2589 real time: 2682 gc time: 0
>> >>
>> >> The "not found" version is 40% faster. I tried a few variation with
>> >> small hashes and I got similar results.
>> >>
>> >> I expected that both have roughly the same time (or that the "found"
>> >> version were slightly faster).
>> >>
>> >> Gustavo
>> >
>> > The "found case" probably has the following 3 steps:
>> >
>> > 1. compute hash value of the lookup key (0 in your case)
>> > 2. locate the bucket for that hash value
>> > 3. compare the value 0 against the first value in the bucket => *found*
>> >
>> > Well, the "not found" case probably has the following 2 steps:
>> >
>> > 1. compute hash value of the lookup key (0 in your case)
>> > 2. locate the bucket for that hash value => *not found*
>> >
>> > So there is less work to do in the "not found" case.
>> >
>> > I suspect if you looked up a value that hashes to the same bucket as the
>> value 0, you would get the same time for the "not found" case.
>> >
>> > A better test is to use a hash with lots of keys, and I'm not sure what 
>> > lots
>> mean here.
>> >
>> > Best Regards,
>> > Alex.
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups
>> "Racket Users" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an
>> email to racket-users+unsubscr...@googlegroups.com.
>> > For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> You received this message because you are 

Re: [racket-users] Re: Time of hash-ref when the key is (not) found

2016-08-12 Thread Matthew Flatt
That still seems consistent with Alex's explanation. To test Alex's
explanation, I think you need a hash table with a key whose hash code
matches the one for the value you check, but where the key is not equal
that value.

Here's a variant of your code with a `bignum` and `char` that have the
same hash code, and `other` with a different hash code. The fastest
case is the last one, where there's no match on hash code. The slowest
case in the second one, where there's a match on the hash code but the
values are not `equal?`. The matching case is in between.


#lang racket/base

;; These two have the same `equal?` hash code internally:
(define bignum (expt 2 64))
(define char (integer->char 4164))

;; Has a different hash code:
(define other 0)

(define hash-with-bignum (hash bignum #t))
(define hash-with-char (hash char #t))
(define hash-with-other (hash other #t))

(for ([rep (in-range 5)])
  (display "found: ")
  (time
   (for ([i (in-range 1000)])
 (hash-ref hash-with-bignum bignum #f))
   (void))
  
  (display "not found, but hash code found: ")
  (time
   (for ([i (in-range 1000)])
 (hash-ref hash-with-char bignum #f))
   (void))
  
  (display "hash code not found: ")
  (time
   (for ([i (in-range 1000)])
 (hash-ref hash-with-other bignum #f))
   (void))
  )

At Fri, 12 Aug 2016 17:27:29 -0300, Gustavo Massaccesi wrote:
> I think it's something more subtle. With this definitions:
> 
>  (define long-assoc (for/list ([i (in-range 64 (+ 64 1024))])
>   (cons i #t)))
>  (define hash0 (make-immutable-hash (cons '(0 . #t) long-assoc)))
>  (define hash1 (make-immutable-hash (cons '(1 . #t) long-assoc)))
> 
> With 5 repetitions I get:
> 
> h0: average 3872 +- 30
> h1: average 2511 +- 22
> 
> I get similar results replacing 1024 with other numbers. For example
> with 1048576 both times increase, but the difference is still there.
> 
> Gustavo
> 
> 
> On Fri, Aug 12, 2016 at 12:17 AM, Alex Harsanyi  
> wrote:
> > On Friday, August 12, 2016 at 10:09:21 AM UTC+8, gustavo wrote:
> >> I have these strange times in a microbenchmark that compares the time
> >> to run hash-ref when the key is in the hash and when it is not there:
> >>
> >> ;---
> >> #lang racket/base
> >> (define hash0 #hash((0 . #t)))
> >> (define hash1 #hash((1 . #t)))
> >>
> >> (for ([rep (in-range 5)])
> >>   (display "h0: ")
> >>   (time
> >>(for ([i (in-range 1000)])
> >>  (hash-ref hash0 0 #f))
> >>(void))
> >>
> >>   (display "h1: ")
> >>   (time
> >>(for ([i (in-range 1000)])
> >>  (hash-ref hash1 0 #f))
> >>(void))
> >> )
> >> ;---
> >>
> >> Typical run times, sorted by cpu time
> >>
> >> found:
> >> h0: cpu time: 3775 real time: 3798 gc time: 0
> >> h0: cpu time: 3900 real time: 3936 gc time: 0
> >> h0: cpu time: 3916 real time: 4009 gc time: 0
> >> h0: cpu time: 3947 real time: 4006 gc time: 0
> >> h0: cpu time: 3978 real time: 4096 gc time: 0
> >>
> >> not found:
> >> h1: cpu time: 2278 real time: 2270 gc time: 0
> >> h1: cpu time: 2293 real time: 2360 gc time: 0
> >> h1: cpu time: 2434 real time: 2462 gc time: 0
> >> h1: cpu time: 2449 real time: 2506 gc time: 0
> >> h1: cpu time: 2589 real time: 2682 gc time: 0
> >>
> >> The "not found" version is 40% faster. I tried a few variation with
> >> small hashes and I got similar results.
> >>
> >> I expected that both have roughly the same time (or that the "found"
> >> version were slightly faster).
> >>
> >> Gustavo
> >
> > The "found case" probably has the following 3 steps:
> >
> > 1. compute hash value of the lookup key (0 in your case)
> > 2. locate the bucket for that hash value
> > 3. compare the value 0 against the first value in the bucket => *found*
> >
> > Well, the "not found" case probably has the following 2 steps:
> >
> > 1. compute hash value of the lookup key (0 in your case)
> > 2. locate the bucket for that hash value => *not found*
> >
> > So there is less work to do in the "not found" case.
> >
> > I suspect if you looked up a value that hashes to the same bucket as the 
> value 0, you would get the same time for the "not found" case.
> >
> > A better test is to use a hash with lots of keys, and I'm not sure what 
> > lots 
> mean here.
> >
> > Best Regards,
> > Alex.
> >
> > --
> > You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> > To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To 

Re: [racket-users] Re: Time of hash-ref when the key is (not) found

2016-08-12 Thread Gustavo Massaccesi
I think it's something more subtle. With this definitions:

 (define long-assoc (for/list ([i (in-range 64 (+ 64 1024))])
  (cons i #t)))
 (define hash0 (make-immutable-hash (cons '(0 . #t) long-assoc)))
 (define hash1 (make-immutable-hash (cons '(1 . #t) long-assoc)))

With 5 repetitions I get:

h0: average 3872 +- 30
h1: average 2511 +- 22

I get similar results replacing 1024 with other numbers. For example
with 1048576 both times increase, but the difference is still there.

Gustavo


On Fri, Aug 12, 2016 at 12:17 AM, Alex Harsanyi  wrote:
> On Friday, August 12, 2016 at 10:09:21 AM UTC+8, gustavo wrote:
>> I have these strange times in a microbenchmark that compares the time
>> to run hash-ref when the key is in the hash and when it is not there:
>>
>> ;---
>> #lang racket/base
>> (define hash0 #hash((0 . #t)))
>> (define hash1 #hash((1 . #t)))
>>
>> (for ([rep (in-range 5)])
>>   (display "h0: ")
>>   (time
>>(for ([i (in-range 1000)])
>>  (hash-ref hash0 0 #f))
>>(void))
>>
>>   (display "h1: ")
>>   (time
>>(for ([i (in-range 1000)])
>>  (hash-ref hash1 0 #f))
>>(void))
>> )
>> ;---
>>
>> Typical run times, sorted by cpu time
>>
>> found:
>> h0: cpu time: 3775 real time: 3798 gc time: 0
>> h0: cpu time: 3900 real time: 3936 gc time: 0
>> h0: cpu time: 3916 real time: 4009 gc time: 0
>> h0: cpu time: 3947 real time: 4006 gc time: 0
>> h0: cpu time: 3978 real time: 4096 gc time: 0
>>
>> not found:
>> h1: cpu time: 2278 real time: 2270 gc time: 0
>> h1: cpu time: 2293 real time: 2360 gc time: 0
>> h1: cpu time: 2434 real time: 2462 gc time: 0
>> h1: cpu time: 2449 real time: 2506 gc time: 0
>> h1: cpu time: 2589 real time: 2682 gc time: 0
>>
>> The "not found" version is 40% faster. I tried a few variation with
>> small hashes and I got similar results.
>>
>> I expected that both have roughly the same time (or that the "found"
>> version were slightly faster).
>>
>> Gustavo
>
> The "found case" probably has the following 3 steps:
>
> 1. compute hash value of the lookup key (0 in your case)
> 2. locate the bucket for that hash value
> 3. compare the value 0 against the first value in the bucket => *found*
>
> Well, the "not found" case probably has the following 2 steps:
>
> 1. compute hash value of the lookup key (0 in your case)
> 2. locate the bucket for that hash value => *not found*
>
> So there is less work to do in the "not found" case.
>
> I suspect if you looked up a value that hashes to the same bucket as the 
> value 0, you would get the same time for the "not found" case.
>
> A better test is to use a hash with lots of keys, and I'm not sure what lots 
> mean here.
>
> Best Regards,
> Alex.
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.