Re: [racket-users] Word Count program/benchmark performance

2021-03-20 Thread Bogdan Popa
On top of these changes, replacing the hash function with Chez's
`equal-hash` saves another 50ms.  That gets the runtime down to around
900ms on my machine, including Racket startup time.  Ignoring startup
time, this version beats the `simple.c` implementation in the original
repo by about 50ms (800ms vs 850ms).

https://gist.github.com/Bogdanp/b9256e1a91de9083830cb616b3659ff8

Gustavo Massaccesi writes:

> With two additional tricks I saved like 100ms.
>
> * Saving the output port instead of reading the parameter implicitly each
> time.
>
> * Replacing (write (cdr p)) with (write-fx cdr p)) where
>
> (define (write-fx n [o (current-output-port)])
>   ; TODO: Add negatives :)
>   (if (fx> n 0)
>   (let loop ([n n])
> (when (fx> n 10)
>   (loop (fxquotient n 10)))
> (write-byte (fx+ 48 (fxremainder n 10)) o))
>   (write-byte 48 o)))
>
> and at the end of a program something like
>
> (define o (current-output-port))
>   (time (for ([p (in-vector items)]
> #:break (not (pair? p)))
> (write-bytes (car p) o)
> (write-byte 32 o)
> (write-fx (cdr p) o)
> (write-byte 10 o)))  ; and a closing )
>
> Gustavo
>
> On Fri, Mar 19, 2021 at 1:22 PM Sam Tobin-Hochstadt 
> wrote:
>
>> I went from numbers around 1000 ms to 950 ms to 900 ms. There was
>> variance around those numbers, but it was pretty consistent.
>>
>> For more precise answers, there are a few things you can try. One is
>> to measure instructions instead of time (ie, with perf). Another is to
>> run it a bunch of times and take an average. The `hyperfine` tool is
>> good for that. But probably the best advice is to make the program
>> take longer so differences are more apparent -- variation usually
>> increases sub-linearly.
>>
>> Sam
>>
>> On Fri, Mar 19, 2021 at 12:17 PM Laurent  wrote:
>> >
>> > Sam: How do you accurately measure such small speed-ups? On my machines,
>> if I run the same program twice, I can sometimes see more than 10% time
>> difference.
>> >
>> > On Fri, Mar 19, 2021 at 4:10 PM Sam Tobin-Hochstadt <
>> sa...@cs.indiana.edu> wrote:
>> >>
>> >> Use `#:authentic`, and `unsafe-vector*-{ref,set!}` saved about 50 more
>> >> ms on my machine.
>> >>
>> >> Then getting rid of `set!` and just re-binding the relevant variables
>> >> produced another 50 ms speedup.
>> >>
>> >> https://gist.github.com/7fc52e7bdc327fb59c8858a42258c26a
>> >>
>> >> Sam
>> >>
>> >> On Fri, Mar 19, 2021 at 7:21 AM Sam Tobin-Hochstadt
>> >>  wrote:
>> >> >
>> >> > One minor additional suggestion: if you use #:authentic for the
>> struct, it will generate slightly better code for the accessors.
>> >> >
>> >> > Sam
>> >> >
>> >> > On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa  wrote:
>> >> >>
>> >> >> I updated the gist with some cleanups and additional improvements
>> that
>> >> >> get the runtime down to a little over 1s (vs ~350ms for the
>> optimized C
>> >> >> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
>> >> >>
>> >> >> Pawel Mosakowski writes:
>> >> >>
>> >> >> > Hi Bogdan,
>> >> >> >
>> >> >> > This is a brilliant solution and also completely over my head. It
>> finishes
>> >> >> > in ~3.75s on my PC and is faster than the Python version which
>> basically
>> >> >> > delegates all the work to C. I will need to spend some time on
>> >> >> > understanding it but I am looking forward to learning something
>> new.
>> >> >> >
>> >> >> > Many thanks,
>> >> >> > Pawel
>> >> >> >
>> >> >> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
>> >> >> >
>> >> >> >> I managed to get it about as fast as Python by making it really
>> >> >> >> imperative and rolling my own hash:
>> >> >> >>
>> >> >> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
>> >> >> >>
>> >> >> >> Sam Tobin-Hochstadt writes:
>> >> >> >>
>> >> >> >> > Here are several variants of the code:
>> >> >> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>> >> >> >> >
>> >> >> >> > The enabled version is about the fastest I can get without using
>> >> >> >> > `unsafe` (which the rules said not to do). It's possible to
>> optimize a
>> >> >> >> > tiny bit more by avoiding sorting, but only a few milliseconds
>> -- it
>> >> >> >> > would be more significant if there were more different words.
>> >> >> >> >
>> >> >> >> > Switching to bytes works correctly for the given task, but
>> wouldn't
>> >> >> >> > always work in the case of general UTF8 input. But those
>> versions
>> >> >> >> > appeared not be faster for me. Also, writing my own
>> string-downcase
>> >> >> >> > didn't help. And using a big buffer and doing my own newline
>> splitting
>> >> >> >> > didn't help either.
>> >> >> >> >
>> >> >> >> > The version using just a regexp matching on a port (suggested by
>> >> >> >> > Robby) turned out not to be faster either, so my suspicion is
>> that the
>> >> >> >> > original slowness is just using regexps for splitting words.
>> >> >> >> >
>> >> >> >> > Sam
>> >> >> >> >
>> >> >> >> > On Thu, Mar

Re: [racket-users] Word Count program/benchmark performance

2021-03-20 Thread Gustavo Massaccesi
With two additional tricks I saved like 100ms.

* Saving the output port instead of reading the parameter implicitly each
time.

* Replacing (write (cdr p)) with (write-fx cdr p)) where

(define (write-fx n [o (current-output-port)])
  ; TODO: Add negatives :)
  (if (fx> n 0)
  (let loop ([n n])
(when (fx> n 10)
  (loop (fxquotient n 10)))
(write-byte (fx+ 48 (fxremainder n 10)) o))
  (write-byte 48 o)))

and at the end of a program something like

(define o (current-output-port))
  (time (for ([p (in-vector items)]
#:break (not (pair? p)))
(write-bytes (car p) o)
(write-byte 32 o)
(write-fx (cdr p) o)
(write-byte 10 o)))  ; and a closing )

Gustavo

On Fri, Mar 19, 2021 at 1:22 PM Sam Tobin-Hochstadt 
wrote:

> I went from numbers around 1000 ms to 950 ms to 900 ms. There was
> variance around those numbers, but it was pretty consistent.
>
> For more precise answers, there are a few things you can try. One is
> to measure instructions instead of time (ie, with perf). Another is to
> run it a bunch of times and take an average. The `hyperfine` tool is
> good for that. But probably the best advice is to make the program
> take longer so differences are more apparent -- variation usually
> increases sub-linearly.
>
> Sam
>
> On Fri, Mar 19, 2021 at 12:17 PM Laurent  wrote:
> >
> > Sam: How do you accurately measure such small speed-ups? On my machines,
> if I run the same program twice, I can sometimes see more than 10% time
> difference.
> >
> > On Fri, Mar 19, 2021 at 4:10 PM Sam Tobin-Hochstadt <
> sa...@cs.indiana.edu> wrote:
> >>
> >> Use `#:authentic`, and `unsafe-vector*-{ref,set!}` saved about 50 more
> >> ms on my machine.
> >>
> >> Then getting rid of `set!` and just re-binding the relevant variables
> >> produced another 50 ms speedup.
> >>
> >> https://gist.github.com/7fc52e7bdc327fb59c8858a42258c26a
> >>
> >> Sam
> >>
> >> On Fri, Mar 19, 2021 at 7:21 AM Sam Tobin-Hochstadt
> >>  wrote:
> >> >
> >> > One minor additional suggestion: if you use #:authentic for the
> struct, it will generate slightly better code for the accessors.
> >> >
> >> > Sam
> >> >
> >> > On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa  wrote:
> >> >>
> >> >> I updated the gist with some cleanups and additional improvements
> that
> >> >> get the runtime down to a little over 1s (vs ~350ms for the
> optimized C
> >> >> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
> >> >>
> >> >> Pawel Mosakowski writes:
> >> >>
> >> >> > Hi Bogdan,
> >> >> >
> >> >> > This is a brilliant solution and also completely over my head. It
> finishes
> >> >> > in ~3.75s on my PC and is faster than the Python version which
> basically
> >> >> > delegates all the work to C. I will need to spend some time on
> >> >> > understanding it but I am looking forward to learning something
> new.
> >> >> >
> >> >> > Many thanks,
> >> >> > Pawel
> >> >> >
> >> >> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
> >> >> >
> >> >> >> I managed to get it about as fast as Python by making it really
> >> >> >> imperative and rolling my own hash:
> >> >> >>
> >> >> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
> >> >> >>
> >> >> >> Sam Tobin-Hochstadt writes:
> >> >> >>
> >> >> >> > Here are several variants of the code:
> >> >> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
> >> >> >> >
> >> >> >> > The enabled version is about the fastest I can get without using
> >> >> >> > `unsafe` (which the rules said not to do). It's possible to
> optimize a
> >> >> >> > tiny bit more by avoiding sorting, but only a few milliseconds
> -- it
> >> >> >> > would be more significant if there were more different words.
> >> >> >> >
> >> >> >> > Switching to bytes works correctly for the given task, but
> wouldn't
> >> >> >> > always work in the case of general UTF8 input. But those
> versions
> >> >> >> > appeared not be faster for me. Also, writing my own
> string-downcase
> >> >> >> > didn't help. And using a big buffer and doing my own newline
> splitting
> >> >> >> > didn't help either.
> >> >> >> >
> >> >> >> > The version using just a regexp matching on a port (suggested by
> >> >> >> > Robby) turned out not to be faster either, so my suspicion is
> that the
> >> >> >> > original slowness is just using regexps for splitting words.
> >> >> >> >
> >> >> >> > Sam
> >> >> >> >
> >> >> >> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
> >> >> >> >  wrote:
> >> >> >> >>
> >> >> >> >> Here's a somewhat-optimized version of the code:
> >> >> >> >>
> >> >> >> >> #lang racket/base
> >> >> >> >> (require racket/string racket/vector racket/port)
> >> >> >> >>
> >> >> >> >> (define h (make-hash))
> >> >> >> >>
> >> >> >> >> (time
> >> >> >> >> (for* ([l (in-lines)]
> >> >> >> >> [w (in-list (string-split l))]
> >> >> >> >> [w* (in-value (string-downcase w))])
> >> >> >> >> (hash-update! h w* add1 0)))
> >> >> >> >>
> >> >> >> >> (define v
> >> >> >> >> 

Re: [racket-users] Word Count program/benchmark performance

2021-03-19 Thread Sam Tobin-Hochstadt
I went from numbers around 1000 ms to 950 ms to 900 ms. There was
variance around those numbers, but it was pretty consistent.

For more precise answers, there are a few things you can try. One is
to measure instructions instead of time (ie, with perf). Another is to
run it a bunch of times and take an average. The `hyperfine` tool is
good for that. But probably the best advice is to make the program
take longer so differences are more apparent -- variation usually
increases sub-linearly.

Sam

On Fri, Mar 19, 2021 at 12:17 PM Laurent  wrote:
>
> Sam: How do you accurately measure such small speed-ups? On my machines, if I 
> run the same program twice, I can sometimes see more than 10% time difference.
>
> On Fri, Mar 19, 2021 at 4:10 PM Sam Tobin-Hochstadt  
> wrote:
>>
>> Use `#:authentic`, and `unsafe-vector*-{ref,set!}` saved about 50 more
>> ms on my machine.
>>
>> Then getting rid of `set!` and just re-binding the relevant variables
>> produced another 50 ms speedup.
>>
>> https://gist.github.com/7fc52e7bdc327fb59c8858a42258c26a
>>
>> Sam
>>
>> On Fri, Mar 19, 2021 at 7:21 AM Sam Tobin-Hochstadt
>>  wrote:
>> >
>> > One minor additional suggestion: if you use #:authentic for the struct, it 
>> > will generate slightly better code for the accessors.
>> >
>> > Sam
>> >
>> > On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa  wrote:
>> >>
>> >> I updated the gist with some cleanups and additional improvements that
>> >> get the runtime down to a little over 1s (vs ~350ms for the optimized C
>> >> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
>> >>
>> >> Pawel Mosakowski writes:
>> >>
>> >> > Hi Bogdan,
>> >> >
>> >> > This is a brilliant solution and also completely over my head. It 
>> >> > finishes
>> >> > in ~3.75s on my PC and is faster than the Python version which basically
>> >> > delegates all the work to C. I will need to spend some time on
>> >> > understanding it but I am looking forward to learning something new.
>> >> >
>> >> > Many thanks,
>> >> > Pawel
>> >> >
>> >> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
>> >> >
>> >> >> I managed to get it about as fast as Python by making it really
>> >> >> imperative and rolling my own hash:
>> >> >>
>> >> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
>> >> >>
>> >> >> Sam Tobin-Hochstadt writes:
>> >> >>
>> >> >> > Here are several variants of the code:
>> >> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>> >> >> >
>> >> >> > The enabled version is about the fastest I can get without using
>> >> >> > `unsafe` (which the rules said not to do). It's possible to optimize 
>> >> >> > a
>> >> >> > tiny bit more by avoiding sorting, but only a few milliseconds -- it
>> >> >> > would be more significant if there were more different words.
>> >> >> >
>> >> >> > Switching to bytes works correctly for the given task, but wouldn't
>> >> >> > always work in the case of general UTF8 input. But those versions
>> >> >> > appeared not be faster for me. Also, writing my own string-downcase
>> >> >> > didn't help. And using a big buffer and doing my own newline 
>> >> >> > splitting
>> >> >> > didn't help either.
>> >> >> >
>> >> >> > The version using just a regexp matching on a port (suggested by
>> >> >> > Robby) turned out not to be faster either, so my suspicion is that 
>> >> >> > the
>> >> >> > original slowness is just using regexps for splitting words.
>> >> >> >
>> >> >> > Sam
>> >> >> >
>> >> >> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
>> >> >> >  wrote:
>> >> >> >>
>> >> >> >> Here's a somewhat-optimized version of the code:
>> >> >> >>
>> >> >> >> #lang racket/base
>> >> >> >> (require racket/string racket/vector racket/port)
>> >> >> >>
>> >> >> >> (define h (make-hash))
>> >> >> >>
>> >> >> >> (time
>> >> >> >> (for* ([l (in-lines)]
>> >> >> >> [w (in-list (string-split l))]
>> >> >> >> [w* (in-value (string-downcase w))])
>> >> >> >> (hash-update! h w* add1 0)))
>> >> >> >>
>> >> >> >> (define v
>> >> >> >> (time
>> >> >> >> (for/vector #:length (hash-count h)
>> >> >> >> ([(k v) (in-hash h)])
>> >> >> >> (cons k v
>> >> >> >> (time (vector-sort! v > #:key cdr))
>> >> >> >> (define p (current-output-port) #;(open-output-nowhere))
>> >> >> >> (time
>> >> >> >> (for ([pair (in-vector v)])
>> >> >> >> (write-string (car pair) p)
>> >> >> >> (write-string (number->string (cdr pair)) p)
>> >> >> >> (newline p)))
>> >> >> >>
>> >> >> >> It's much more imperative, but also pretty nice and compact. The
>> >> >> >> `printf` optimization is significant for that portion of the 
>> >> >> >> program,
>> >> >> >> but that isn't much of the running time. The overall running time 
>> >> >> >> for
>> >> >> >> 10 copies of the KJV is about 9 seconds on my laptop.
>> >> >> >>
>> >> >> >> I think the remaining difference between Racket and other languages 
>> >> >> >> is
>> >> >> >> likely the `string-split` and `string-downcase` functions, plus the
>> >> >> >> relatively-ineffic

Re: [racket-users] Word Count program/benchmark performance

2021-03-19 Thread Laurent
Sam: How do you accurately measure such small speed-ups? On my machines, if
I run the same program twice, I can sometimes see more than 10% time
difference.

On Fri, Mar 19, 2021 at 4:10 PM Sam Tobin-Hochstadt 
wrote:

> Use `#:authentic`, and `unsafe-vector*-{ref,set!}` saved about 50 more
> ms on my machine.
>
> Then getting rid of `set!` and just re-binding the relevant variables
> produced another 50 ms speedup.
>
> https://gist.github.com/7fc52e7bdc327fb59c8858a42258c26a
>
> Sam
>
> On Fri, Mar 19, 2021 at 7:21 AM Sam Tobin-Hochstadt
>  wrote:
> >
> > One minor additional suggestion: if you use #:authentic for the struct,
> it will generate slightly better code for the accessors.
> >
> > Sam
> >
> > On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa  wrote:
> >>
> >> I updated the gist with some cleanups and additional improvements that
> >> get the runtime down to a little over 1s (vs ~350ms for the optimized C
> >> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
> >>
> >> Pawel Mosakowski writes:
> >>
> >> > Hi Bogdan,
> >> >
> >> > This is a brilliant solution and also completely over my head. It
> finishes
> >> > in ~3.75s on my PC and is faster than the Python version which
> basically
> >> > delegates all the work to C. I will need to spend some time on
> >> > understanding it but I am looking forward to learning something new.
> >> >
> >> > Many thanks,
> >> > Pawel
> >> >
> >> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
> >> >
> >> >> I managed to get it about as fast as Python by making it really
> >> >> imperative and rolling my own hash:
> >> >>
> >> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
> >> >>
> >> >> Sam Tobin-Hochstadt writes:
> >> >>
> >> >> > Here are several variants of the code:
> >> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
> >> >> >
> >> >> > The enabled version is about the fastest I can get without using
> >> >> > `unsafe` (which the rules said not to do). It's possible to
> optimize a
> >> >> > tiny bit more by avoiding sorting, but only a few milliseconds --
> it
> >> >> > would be more significant if there were more different words.
> >> >> >
> >> >> > Switching to bytes works correctly for the given task, but wouldn't
> >> >> > always work in the case of general UTF8 input. But those versions
> >> >> > appeared not be faster for me. Also, writing my own string-downcase
> >> >> > didn't help. And using a big buffer and doing my own newline
> splitting
> >> >> > didn't help either.
> >> >> >
> >> >> > The version using just a regexp matching on a port (suggested by
> >> >> > Robby) turned out not to be faster either, so my suspicion is that
> the
> >> >> > original slowness is just using regexps for splitting words.
> >> >> >
> >> >> > Sam
> >> >> >
> >> >> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
> >> >> >  wrote:
> >> >> >>
> >> >> >> Here's a somewhat-optimized version of the code:
> >> >> >>
> >> >> >> #lang racket/base
> >> >> >> (require racket/string racket/vector racket/port)
> >> >> >>
> >> >> >> (define h (make-hash))
> >> >> >>
> >> >> >> (time
> >> >> >> (for* ([l (in-lines)]
> >> >> >> [w (in-list (string-split l))]
> >> >> >> [w* (in-value (string-downcase w))])
> >> >> >> (hash-update! h w* add1 0)))
> >> >> >>
> >> >> >> (define v
> >> >> >> (time
> >> >> >> (for/vector #:length (hash-count h)
> >> >> >> ([(k v) (in-hash h)])
> >> >> >> (cons k v
> >> >> >> (time (vector-sort! v > #:key cdr))
> >> >> >> (define p (current-output-port) #;(open-output-nowhere))
> >> >> >> (time
> >> >> >> (for ([pair (in-vector v)])
> >> >> >> (write-string (car pair) p)
> >> >> >> (write-string (number->string (cdr pair)) p)
> >> >> >> (newline p)))
> >> >> >>
> >> >> >> It's much more imperative, but also pretty nice and compact. The
> >> >> >> `printf` optimization is significant for that portion of the
> program,
> >> >> >> but that isn't much of the running time. The overall running time
> for
> >> >> >> 10 copies of the KJV is about 9 seconds on my laptop.
> >> >> >>
> >> >> >> I think the remaining difference between Racket and other
> languages is
> >> >> >> likely the `string-split` and `string-downcase` functions, plus
> the
> >> >> >> relatively-inefficient string representation that Racket uses.
> >> >> >>
> >> >> >> Sam
> >> >> >>
> >> >> >>
> >> >> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski <
> pa...@mosakowski.net>
> >> >> wrote:
> >> >> >> >
> >> >> >> > Hi David,
> >> >> >> >
> >> >> >> > Yes, the 21 seconds includes the interpreter startup time. I
> have
> >> >> done a simple test to see how long it takes:
> >> >> >> >
> >> >> >> > $ time racket -e '(displayln "Hello, world")'
> >> >> >> > Hello, world
> >> >> >> >
> >> >> >> > real 0m0.479s
> >> >> >> > user 0m0.449s
> >> >> >> > sys 0m0.030s
> >> >> >> >
> >> >> >> > I have also put my code inside a main function and profiled it:
> >> >> >> >
> >> >> >> > Profiling results
> >> >> >> > -

Re: [racket-users] Word Count program/benchmark performance

2021-03-19 Thread Sam Tobin-Hochstadt
Use `#:authentic`, and `unsafe-vector*-{ref,set!}` saved about 50 more
ms on my machine.

Then getting rid of `set!` and just re-binding the relevant variables
produced another 50 ms speedup.

https://gist.github.com/7fc52e7bdc327fb59c8858a42258c26a

Sam

On Fri, Mar 19, 2021 at 7:21 AM Sam Tobin-Hochstadt
 wrote:
>
> One minor additional suggestion: if you use #:authentic for the struct, it 
> will generate slightly better code for the accessors.
>
> Sam
>
> On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa  wrote:
>>
>> I updated the gist with some cleanups and additional improvements that
>> get the runtime down to a little over 1s (vs ~350ms for the optimized C
>> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
>>
>> Pawel Mosakowski writes:
>>
>> > Hi Bogdan,
>> >
>> > This is a brilliant solution and also completely over my head. It finishes
>> > in ~3.75s on my PC and is faster than the Python version which basically
>> > delegates all the work to C. I will need to spend some time on
>> > understanding it but I am looking forward to learning something new.
>> >
>> > Many thanks,
>> > Pawel
>> >
>> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
>> >
>> >> I managed to get it about as fast as Python by making it really
>> >> imperative and rolling my own hash:
>> >>
>> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
>> >>
>> >> Sam Tobin-Hochstadt writes:
>> >>
>> >> > Here are several variants of the code:
>> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>> >> >
>> >> > The enabled version is about the fastest I can get without using
>> >> > `unsafe` (which the rules said not to do). It's possible to optimize a
>> >> > tiny bit more by avoiding sorting, but only a few milliseconds -- it
>> >> > would be more significant if there were more different words.
>> >> >
>> >> > Switching to bytes works correctly for the given task, but wouldn't
>> >> > always work in the case of general UTF8 input. But those versions
>> >> > appeared not be faster for me. Also, writing my own string-downcase
>> >> > didn't help. And using a big buffer and doing my own newline splitting
>> >> > didn't help either.
>> >> >
>> >> > The version using just a regexp matching on a port (suggested by
>> >> > Robby) turned out not to be faster either, so my suspicion is that the
>> >> > original slowness is just using regexps for splitting words.
>> >> >
>> >> > Sam
>> >> >
>> >> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
>> >> >  wrote:
>> >> >>
>> >> >> Here's a somewhat-optimized version of the code:
>> >> >>
>> >> >> #lang racket/base
>> >> >> (require racket/string racket/vector racket/port)
>> >> >>
>> >> >> (define h (make-hash))
>> >> >>
>> >> >> (time
>> >> >> (for* ([l (in-lines)]
>> >> >> [w (in-list (string-split l))]
>> >> >> [w* (in-value (string-downcase w))])
>> >> >> (hash-update! h w* add1 0)))
>> >> >>
>> >> >> (define v
>> >> >> (time
>> >> >> (for/vector #:length (hash-count h)
>> >> >> ([(k v) (in-hash h)])
>> >> >> (cons k v
>> >> >> (time (vector-sort! v > #:key cdr))
>> >> >> (define p (current-output-port) #;(open-output-nowhere))
>> >> >> (time
>> >> >> (for ([pair (in-vector v)])
>> >> >> (write-string (car pair) p)
>> >> >> (write-string (number->string (cdr pair)) p)
>> >> >> (newline p)))
>> >> >>
>> >> >> It's much more imperative, but also pretty nice and compact. The
>> >> >> `printf` optimization is significant for that portion of the program,
>> >> >> but that isn't much of the running time. The overall running time for
>> >> >> 10 copies of the KJV is about 9 seconds on my laptop.
>> >> >>
>> >> >> I think the remaining difference between Racket and other languages is
>> >> >> likely the `string-split` and `string-downcase` functions, plus the
>> >> >> relatively-inefficient string representation that Racket uses.
>> >> >>
>> >> >> Sam
>> >> >>
>> >> >>
>> >> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski 
>> >> >> 
>> >> wrote:
>> >> >> >
>> >> >> > Hi David,
>> >> >> >
>> >> >> > Yes, the 21 seconds includes the interpreter startup time. I have
>> >> done a simple test to see how long it takes:
>> >> >> >
>> >> >> > $ time racket -e '(displayln "Hello, world")'
>> >> >> > Hello, world
>> >> >> >
>> >> >> > real 0m0.479s
>> >> >> > user 0m0.449s
>> >> >> > sys 0m0.030s
>> >> >> >
>> >> >> > I have also put my code inside a main function and profiled it:
>> >> >> >
>> >> >> > Profiling results
>> >> >> > -
>> >> >> > Total cpu time observed: 20910ms (out of 20970ms)
>> >> >> > Number of samples taken: 382 (once every 55ms)
>> >> >> > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
>> >> >> >
>> >> >> > ==
>> >> >> > Caller
>> >> >> > Idx Total Self Name+src Local%
>> >> >> > ms(pct) ms(pct) Callee
>> >> >> > ==
>> >> >> > [1] 20910(100.0%) 0(0.0%) [running body

Re: [racket-users] Word Count program/benchmark performance

2021-03-19 Thread Laurent
(Welcome to Racket v8.0.0.1 [cs]. )
All results are measured on my laptop on the 10x file with `$ time racket
`, thus including the Racket VM.

* Bogdan's version with #lang racket/base: 1s.
* Dominik's version with vectors of length 256 (instead of 26) and
splitting on spaces/return/newline only and #lang racket/base: 1.6s

But Bogdan's printing takes only ~100ms while Dominik's takes ~400ms.

* I also implemented a variant of Dominik's 'discrimination tree' to use a
hasheqv instead of a vector at each node: 6.2s (but the memory footprint is
likely nicer :-p )
  This also uses `in-lines` instead of Bogdan's buffers so there may still
be something to gain here.
* Replacing a hasheqv with an assoc: 4.2s.
* Starting with an assoc and switching to a hasheqv when there are too many
elements (n=20, couldn't do better): 3.9s

Code is here:
https://gist.github.com/Metaxal/ae0a6937d8f388f3f40ec7396041be55

I also noticed that `dict-ref` is *really* slow (35s) compared to `assv`.

@Bogdan: You can use `#:key` in `sort`.


On Fri, Mar 19, 2021 at 12:08 PM Bogdan Popa  wrote:

> Nice!  It's worth pointing out, though, that by limiting yourself to
> alpha chars, you're processing about 8% less data and the results don't
> pass the tests. :P
>
> $ wc kjvbible_x10.txt
>   998170 8211330 43325060
>
> $ sed 's/[a-zA-Z ]//g' < kjvbible_x10.txt | wc
>   998170  739310  3600800
>
> I think your version would still be faster, but I'm curious what the
> numbers would look like if only whitespace chars were considered word
> separators.
>
> Dominik Pantůček writes:
>
> > Another attack of [1]. But yeah, why not do some [2].
> >
> > Trees to the rescue [3].
> >
> > $ racket --version
> > Welcome to Racket v8.0 [cs].
> >
> > $ racket countwords-bogdan2.rkt  > cpu time: 135 real time: 135 gc time: 8
> >
> > $ racket countwords-dzoe2.rkt  > cpu time: 69 real time: 69 gc time: 3
> >
> > I just changed (countwords) to (time (countwords)) in Bogdan's code to
> > measure the running time.
> >
> > The difference is that I am positively defining which letters form words
> > (a-z, A-Z) and that all others are treated as word separators. The
> > buffer size is the same - and honestly, the speedup between 1024 and
> > 1024^2 bytes buffer is barely measurable.
> >
> > The only option for further speedup I can immediately think of is to
> > allocate a huge vector of wtnodes and change chld field to be a starting
> > index into this big vector (should reduce allocations).
> >
> > Btw, making it unsafe does not speed it up at all (probably CS
> > recognizes the vectors and all those refs are inlined anyway).
> >
> >
> > Cheers,
> > Dominik
> >
> > [1] https://xkcd.com/386/
> > [2] http://phdcomics.com/comics/archive.php?comicid=1735
> > [3] https://gist.github.com/dzoep/0e081d0544afac539a4829179c601e0e
> >
> > On 19. 03. 21 11:18, Bogdan Popa wrote:
> >> I updated the gist with some cleanups and additional improvements that
> >> get the runtime down to a little over 1s (vs ~350ms for the optimized C
> >> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
> >>
> >> Pawel Mosakowski writes:
> >>
> >>> Hi Bogdan,
> >>>
> >>> This is a brilliant solution and also completely over my head. It
> finishes
> >>> in ~3.75s on my PC and is faster than the Python version which
> basically
> >>> delegates all the work to C. I will need to spend some time on
> >>> understanding it but I am looking forward to learning something new.
> >>>
> >>> Many thanks,
> >>> Pawel
> >>>
> >>> On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
> >>>
>  I managed to get it about as fast as Python by making it really
>  imperative and rolling my own hash:
> 
>  https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
> 
>  Sam Tobin-Hochstadt writes:
> 
> > Here are several variants of the code:
> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
> >
> > The enabled version is about the fastest I can get without using
> > `unsafe` (which the rules said not to do). It's possible to optimize
> a
> > tiny bit more by avoiding sorting, but only a few milliseconds -- it
> > would be more significant if there were more different words.
> >
> > Switching to bytes works correctly for the given task, but wouldn't
> > always work in the case of general UTF8 input. But those versions
> > appeared not be faster for me. Also, writing my own string-downcase
> > didn't help. And using a big buffer and doing my own newline
> splitting
> > didn't help either.
> >
> > The version using just a regexp matching on a port (suggested by
> > Robby) turned out not to be faster either, so my suspicion is that
> the
> > original slowness is just using regexps for splitting words.
> >
> > Sam
> >
> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
> >  wrote:
> >>
> >> Here's a somewhat-optimized version of the cod

Re: [racket-users] Word Count program/benchmark performance

2021-03-19 Thread Bogdan Popa
Nice!  It's worth pointing out, though, that by limiting yourself to
alpha chars, you're processing about 8% less data and the results don't
pass the tests. :P

$ wc kjvbible_x10.txt
  998170 8211330 43325060

$ sed 's/[a-zA-Z ]//g' < kjvbible_x10.txt | wc
  998170  739310  3600800

I think your version would still be faster, but I'm curious what the
numbers would look like if only whitespace chars were considered word
separators.

Dominik Pantůček writes:

> Another attack of [1]. But yeah, why not do some [2].
>
> Trees to the rescue [3].
>
> $ racket --version
> Welcome to Racket v8.0 [cs].
>
> $ racket countwords-bogdan2.rkt  cpu time: 135 real time: 135 gc time: 8
>
> $ racket countwords-dzoe2.rkt  cpu time: 69 real time: 69 gc time: 3
>
> I just changed (countwords) to (time (countwords)) in Bogdan's code to
> measure the running time.
>
> The difference is that I am positively defining which letters form words
> (a-z, A-Z) and that all others are treated as word separators. The
> buffer size is the same - and honestly, the speedup between 1024 and
> 1024^2 bytes buffer is barely measurable.
>
> The only option for further speedup I can immediately think of is to
> allocate a huge vector of wtnodes and change chld field to be a starting
> index into this big vector (should reduce allocations).
>
> Btw, making it unsafe does not speed it up at all (probably CS
> recognizes the vectors and all those refs are inlined anyway).
>
>
> Cheers,
> Dominik
>
> [1] https://xkcd.com/386/
> [2] http://phdcomics.com/comics/archive.php?comicid=1735
> [3] https://gist.github.com/dzoep/0e081d0544afac539a4829179c601e0e
>
> On 19. 03. 21 11:18, Bogdan Popa wrote:
>> I updated the gist with some cleanups and additional improvements that
>> get the runtime down to a little over 1s (vs ~350ms for the optimized C
>> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
>>
>> Pawel Mosakowski writes:
>>
>>> Hi Bogdan,
>>>
>>> This is a brilliant solution and also completely over my head. It finishes
>>> in ~3.75s on my PC and is faster than the Python version which basically
>>> delegates all the work to C. I will need to spend some time on
>>> understanding it but I am looking forward to learning something new.
>>>
>>> Many thanks,
>>> Pawel
>>>
>>> On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
>>>
 I managed to get it about as fast as Python by making it really
 imperative and rolling my own hash:

 https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571

 Sam Tobin-Hochstadt writes:

> Here are several variants of the code:
> https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>
> The enabled version is about the fastest I can get without using
> `unsafe` (which the rules said not to do). It's possible to optimize a
> tiny bit more by avoiding sorting, but only a few milliseconds -- it
> would be more significant if there were more different words.
>
> Switching to bytes works correctly for the given task, but wouldn't
> always work in the case of general UTF8 input. But those versions
> appeared not be faster for me. Also, writing my own string-downcase
> didn't help. And using a big buffer and doing my own newline splitting
> didn't help either.
>
> The version using just a regexp matching on a port (suggested by
> Robby) turned out not to be faster either, so my suspicion is that the
> original slowness is just using regexps for splitting words.
>
> Sam
>
> On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
>  wrote:
>>
>> Here's a somewhat-optimized version of the code:
>>
>> #lang racket/base
>> (require racket/string racket/vector racket/port)
>>
>> (define h (make-hash))
>>
>> (time
>> (for* ([l (in-lines)]
>> [w (in-list (string-split l))]
>> [w* (in-value (string-downcase w))])
>> (hash-update! h w* add1 0)))
>>
>> (define v
>> (time
>> (for/vector #:length (hash-count h)
>> ([(k v) (in-hash h)])
>> (cons k v
>> (time (vector-sort! v > #:key cdr))
>> (define p (current-output-port) #;(open-output-nowhere))
>> (time
>> (for ([pair (in-vector v)])
>> (write-string (car pair) p)
>> (write-string (number->string (cdr pair)) p)
>> (newline p)))
>>
>> It's much more imperative, but also pretty nice and compact. The
>> `printf` optimization is significant for that portion of the program,
>> but that isn't much of the running time. The overall running time for
>> 10 copies of the KJV is about 9 seconds on my laptop.
>>
>> I think the remaining difference between Racket and other languages is
>> likely the `string-split` and `string-downcase` functions, plus the
>> relatively-inefficient string representation that Racket uses.
>>
>> Sam
>>
>>
>> On Thu, Mar 18, 2021 at 10:28 AM Pawel M

Re: [racket-users] Word Count program/benchmark performance

2021-03-19 Thread Dominik Pantůček
Another attack of [1]. But yeah, why not do some [2].

Trees to the rescue [3].

$ racket --version
Welcome to Racket v8.0 [cs].

$ racket countwords-bogdan2.rkt https://xkcd.com/386/
[2] http://phdcomics.com/comics/archive.php?comicid=1735
[3] https://gist.github.com/dzoep/0e081d0544afac539a4829179c601e0e

On 19. 03. 21 11:18, Bogdan Popa wrote:
> I updated the gist with some cleanups and additional improvements that
> get the runtime down to a little over 1s (vs ~350ms for the optimized C
> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
> 
> Pawel Mosakowski writes:
> 
>> Hi Bogdan,
>>
>> This is a brilliant solution and also completely over my head. It finishes
>> in ~3.75s on my PC and is faster than the Python version which basically
>> delegates all the work to C. I will need to spend some time on
>> understanding it but I am looking forward to learning something new.
>>
>> Many thanks,
>> Pawel
>>
>> On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
>>
>>> I managed to get it about as fast as Python by making it really
>>> imperative and rolling my own hash:
>>>
>>> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
>>>
>>> Sam Tobin-Hochstadt writes:
>>>
 Here are several variants of the code:
 https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9

 The enabled version is about the fastest I can get without using
 `unsafe` (which the rules said not to do). It's possible to optimize a
 tiny bit more by avoiding sorting, but only a few milliseconds -- it
 would be more significant if there were more different words.

 Switching to bytes works correctly for the given task, but wouldn't
 always work in the case of general UTF8 input. But those versions
 appeared not be faster for me. Also, writing my own string-downcase
 didn't help. And using a big buffer and doing my own newline splitting
 didn't help either.

 The version using just a regexp matching on a port (suggested by
 Robby) turned out not to be faster either, so my suspicion is that the
 original slowness is just using regexps for splitting words.

 Sam

 On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
  wrote:
>
> Here's a somewhat-optimized version of the code:
>
> #lang racket/base
> (require racket/string racket/vector racket/port)
>
> (define h (make-hash))
>
> (time
> (for* ([l (in-lines)]
> [w (in-list (string-split l))]
> [w* (in-value (string-downcase w))])
> (hash-update! h w* add1 0)))
>
> (define v
> (time
> (for/vector #:length (hash-count h)
> ([(k v) (in-hash h)])
> (cons k v
> (time (vector-sort! v > #:key cdr))
> (define p (current-output-port) #;(open-output-nowhere))
> (time
> (for ([pair (in-vector v)])
> (write-string (car pair) p)
> (write-string (number->string (cdr pair)) p)
> (newline p)))
>
> It's much more imperative, but also pretty nice and compact. The
> `printf` optimization is significant for that portion of the program,
> but that isn't much of the running time. The overall running time for
> 10 copies of the KJV is about 9 seconds on my laptop.
>
> I think the remaining difference between Racket and other languages is
> likely the `string-split` and `string-downcase` functions, plus the
> relatively-inefficient string representation that Racket uses.
>
> Sam
>
>
> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski 
>>> wrote:
>>
>> Hi David,
>>
>> Yes, the 21 seconds includes the interpreter startup time. I have
>>> done a simple test to see how long it takes:
>>
>> $ time racket -e '(displayln "Hello, world")'
>> Hello, world
>>
>> real 0m0.479s
>> user 0m0.449s
>> sys 0m0.030s
>>
>> I have also put my code inside a main function and profiled it:
>>
>> Profiling results
>> -
>> Total cpu time observed: 20910ms (out of 20970ms)
>> Number of samples taken: 382 (once every 55ms)
>> (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
>>
>> ==
>> Caller
>> Idx Total Self Name+src Local%
>> ms(pct) ms(pct) Callee
>> ==
>> [1] 20910(100.0%) 0(0.0%) [running body]
>>> ...word-occurences-profile.rkt":##f
>> profile-thunk [2] 100.0%
>> --
>> [running body] [1] 100.0%
>> [2] 20910(100.0%) 0(0.0%) profile-thunk
>>> ...ket/pkgs/profile-lib/main.rkt:9:0
>> run [3] 100.0%
>> --
>> profile-thunk [2] 100.0%
>> [3] 20910(100.0%) 0(0.0%) run
>>> ...share/racket/pkgs/profile-lib/main.rkt:39:2
>> main [4] 100.0%
>> ---

Re: [racket-users] Word Count program/benchmark performance

2021-03-19 Thread Sam Tobin-Hochstadt
One minor additional suggestion: if you use #:authentic for the struct, it
will generate slightly better code for the accessors.

Sam

On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa  wrote:

> I updated the gist with some cleanups and additional improvements that
> get the runtime down to a little over 1s (vs ~350ms for the optimized C
> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
>
> Pawel Mosakowski writes:
>
> > Hi Bogdan,
> >
> > This is a brilliant solution and also completely over my head. It
> finishes
> > in ~3.75s on my PC and is faster than the Python version which basically
> > delegates all the work to C. I will need to spend some time on
> > understanding it but I am looking forward to learning something new.
> >
> > Many thanks,
> > Pawel
> >
> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
> >
> >> I managed to get it about as fast as Python by making it really
> >> imperative and rolling my own hash:
> >>
> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
> >>
> >> Sam Tobin-Hochstadt writes:
> >>
> >> > Here are several variants of the code:
> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
> >> >
> >> > The enabled version is about the fastest I can get without using
> >> > `unsafe` (which the rules said not to do). It's possible to optimize a
> >> > tiny bit more by avoiding sorting, but only a few milliseconds -- it
> >> > would be more significant if there were more different words.
> >> >
> >> > Switching to bytes works correctly for the given task, but wouldn't
> >> > always work in the case of general UTF8 input. But those versions
> >> > appeared not be faster for me. Also, writing my own string-downcase
> >> > didn't help. And using a big buffer and doing my own newline splitting
> >> > didn't help either.
> >> >
> >> > The version using just a regexp matching on a port (suggested by
> >> > Robby) turned out not to be faster either, so my suspicion is that the
> >> > original slowness is just using regexps for splitting words.
> >> >
> >> > Sam
> >> >
> >> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
> >> >  wrote:
> >> >>
> >> >> Here's a somewhat-optimized version of the code:
> >> >>
> >> >> #lang racket/base
> >> >> (require racket/string racket/vector racket/port)
> >> >>
> >> >> (define h (make-hash))
> >> >>
> >> >> (time
> >> >> (for* ([l (in-lines)]
> >> >> [w (in-list (string-split l))]
> >> >> [w* (in-value (string-downcase w))])
> >> >> (hash-update! h w* add1 0)))
> >> >>
> >> >> (define v
> >> >> (time
> >> >> (for/vector #:length (hash-count h)
> >> >> ([(k v) (in-hash h)])
> >> >> (cons k v
> >> >> (time (vector-sort! v > #:key cdr))
> >> >> (define p (current-output-port) #;(open-output-nowhere))
> >> >> (time
> >> >> (for ([pair (in-vector v)])
> >> >> (write-string (car pair) p)
> >> >> (write-string (number->string (cdr pair)) p)
> >> >> (newline p)))
> >> >>
> >> >> It's much more imperative, but also pretty nice and compact. The
> >> >> `printf` optimization is significant for that portion of the program,
> >> >> but that isn't much of the running time. The overall running time for
> >> >> 10 copies of the KJV is about 9 seconds on my laptop.
> >> >>
> >> >> I think the remaining difference between Racket and other languages
> is
> >> >> likely the `string-split` and `string-downcase` functions, plus the
> >> >> relatively-inefficient string representation that Racket uses.
> >> >>
> >> >> Sam
> >> >>
> >> >>
> >> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski <
> pa...@mosakowski.net>
> >> wrote:
> >> >> >
> >> >> > Hi David,
> >> >> >
> >> >> > Yes, the 21 seconds includes the interpreter startup time. I have
> >> done a simple test to see how long it takes:
> >> >> >
> >> >> > $ time racket -e '(displayln "Hello, world")'
> >> >> > Hello, world
> >> >> >
> >> >> > real 0m0.479s
> >> >> > user 0m0.449s
> >> >> > sys 0m0.030s
> >> >> >
> >> >> > I have also put my code inside a main function and profiled it:
> >> >> >
> >> >> > Profiling results
> >> >> > -
> >> >> > Total cpu time observed: 20910ms (out of 20970ms)
> >> >> > Number of samples taken: 382 (once every 55ms)
> >> >> > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
> >> >> >
> >> >> > ==
> >> >> > Caller
> >> >> > Idx Total Self Name+src Local%
> >> >> > ms(pct) ms(pct) Callee
> >> >> > ==
> >> >> > [1] 20910(100.0%) 0(0.0%) [running body]
> >> ...word-occurences-profile.rkt":##f
> >> >> > profile-thunk [2] 100.0%
> >> >> > --
> >> >> > [running body] [1] 100.0%
> >> >> > [2] 20910(100.0%) 0(0.0%) profile-thunk
> >> ...ket/pkgs/profile-lib/main.rkt:9:0
> >> >> > run [3] 100.0%
> >> >> > --
> >> >> > profile-thunk [2] 100.0%
> >> >> > [3] 2

Re: [racket-users] Word Count program/benchmark performance

2021-03-19 Thread Bogdan Popa
I updated the gist with some cleanups and additional improvements that
get the runtime down to a little over 1s (vs ~350ms for the optimized C
and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.

Pawel Mosakowski writes:

> Hi Bogdan,
>
> This is a brilliant solution and also completely over my head. It finishes
> in ~3.75s on my PC and is faster than the Python version which basically
> delegates all the work to C. I will need to spend some time on
> understanding it but I am looking forward to learning something new.
>
> Many thanks,
> Pawel
>
> On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
>
>> I managed to get it about as fast as Python by making it really
>> imperative and rolling my own hash:
>>
>> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
>>
>> Sam Tobin-Hochstadt writes:
>>
>> > Here are several variants of the code:
>> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>> >
>> > The enabled version is about the fastest I can get without using
>> > `unsafe` (which the rules said not to do). It's possible to optimize a
>> > tiny bit more by avoiding sorting, but only a few milliseconds -- it
>> > would be more significant if there were more different words.
>> >
>> > Switching to bytes works correctly for the given task, but wouldn't
>> > always work in the case of general UTF8 input. But those versions
>> > appeared not be faster for me. Also, writing my own string-downcase
>> > didn't help. And using a big buffer and doing my own newline splitting
>> > didn't help either.
>> >
>> > The version using just a regexp matching on a port (suggested by
>> > Robby) turned out not to be faster either, so my suspicion is that the
>> > original slowness is just using regexps for splitting words.
>> >
>> > Sam
>> >
>> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
>> >  wrote:
>> >>
>> >> Here's a somewhat-optimized version of the code:
>> >>
>> >> #lang racket/base
>> >> (require racket/string racket/vector racket/port)
>> >>
>> >> (define h (make-hash))
>> >>
>> >> (time
>> >> (for* ([l (in-lines)]
>> >> [w (in-list (string-split l))]
>> >> [w* (in-value (string-downcase w))])
>> >> (hash-update! h w* add1 0)))
>> >>
>> >> (define v
>> >> (time
>> >> (for/vector #:length (hash-count h)
>> >> ([(k v) (in-hash h)])
>> >> (cons k v
>> >> (time (vector-sort! v > #:key cdr))
>> >> (define p (current-output-port) #;(open-output-nowhere))
>> >> (time
>> >> (for ([pair (in-vector v)])
>> >> (write-string (car pair) p)
>> >> (write-string (number->string (cdr pair)) p)
>> >> (newline p)))
>> >>
>> >> It's much more imperative, but also pretty nice and compact. The
>> >> `printf` optimization is significant for that portion of the program,
>> >> but that isn't much of the running time. The overall running time for
>> >> 10 copies of the KJV is about 9 seconds on my laptop.
>> >>
>> >> I think the remaining difference between Racket and other languages is
>> >> likely the `string-split` and `string-downcase` functions, plus the
>> >> relatively-inefficient string representation that Racket uses.
>> >>
>> >> Sam
>> >>
>> >>
>> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski 
>> wrote:
>> >> >
>> >> > Hi David,
>> >> >
>> >> > Yes, the 21 seconds includes the interpreter startup time. I have
>> done a simple test to see how long it takes:
>> >> >
>> >> > $ time racket -e '(displayln "Hello, world")'
>> >> > Hello, world
>> >> >
>> >> > real 0m0.479s
>> >> > user 0m0.449s
>> >> > sys 0m0.030s
>> >> >
>> >> > I have also put my code inside a main function and profiled it:
>> >> >
>> >> > Profiling results
>> >> > -
>> >> > Total cpu time observed: 20910ms (out of 20970ms)
>> >> > Number of samples taken: 382 (once every 55ms)
>> >> > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
>> >> >
>> >> > ==
>> >> > Caller
>> >> > Idx Total Self Name+src Local%
>> >> > ms(pct) ms(pct) Callee
>> >> > ==
>> >> > [1] 20910(100.0%) 0(0.0%) [running body]
>> ...word-occurences-profile.rkt":##f
>> >> > profile-thunk [2] 100.0%
>> >> > --
>> >> > [running body] [1] 100.0%
>> >> > [2] 20910(100.0%) 0(0.0%) profile-thunk
>> ...ket/pkgs/profile-lib/main.rkt:9:0
>> >> > run [3] 100.0%
>> >> > --
>> >> > profile-thunk [2] 100.0%
>> >> > [3] 20910(100.0%) 0(0.0%) run
>> ...share/racket/pkgs/profile-lib/main.rkt:39:2
>> >> > main [4] 100.0%
>> >> > --
>> >> > run [3] 100.0%
>> >> > [4] 20910(100.0%) 50(0.2%) main
>> ...cket/count-word-occurences-profile.rkt:5:0
>> >> > read-from-stdin-it [5] 98.5%
>> >> > ??? [6] 0.2%
>> >> > --
>> >> > main [4] 100.0%
>> >> > [5] 20606(98.5%) 117

Re: [racket-users] Word Count program/benchmark performance

2021-03-18 Thread Sorawee Porncharoenwase
string-split always uses regex. I wonder if a fast path when the splitter
is a regular string will be worth it.

On Fri, Mar 19, 2021 at 4:19 AM Pawel Mosakowski 
wrote:

> Hi Bogdan,
>
> This is a brilliant solution and also completely over my head. It finishes
> in ~3.75s on my PC and is faster than the Python version which basically
> delegates all the work to C. I will need to spend some time on
> understanding it but I am looking forward to learning something new.
>
> Many thanks,
> Pawel
>
> On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
>
>> I managed to get it about as fast as Python by making it really
>> imperative and rolling my own hash:
>>
>> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
>>
>> Sam Tobin-Hochstadt writes:
>>
>> > Here are several variants of the code:
>> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>> >
>> > The enabled version is about the fastest I can get without using
>> > `unsafe` (which the rules said not to do). It's possible to optimize a
>> > tiny bit more by avoiding sorting, but only a few milliseconds -- it
>> > would be more significant if there were more different words.
>> >
>> > Switching to bytes works correctly for the given task, but wouldn't
>> > always work in the case of general UTF8 input. But those versions
>> > appeared not be faster for me. Also, writing my own string-downcase
>> > didn't help. And using a big buffer and doing my own newline splitting
>> > didn't help either.
>> >
>> > The version using just a regexp matching on a port (suggested by
>> > Robby) turned out not to be faster either, so my suspicion is that the
>> > original slowness is just using regexps for splitting words.
>> >
>> > Sam
>> >
>> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
>> >  wrote:
>> >>
>> >> Here's a somewhat-optimized version of the code:
>> >>
>> >> #lang racket/base
>> >> (require racket/string racket/vector racket/port)
>> >>
>> >> (define h (make-hash))
>> >>
>> >> (time
>> >> (for* ([l (in-lines)]
>> >> [w (in-list (string-split l))]
>> >> [w* (in-value (string-downcase w))])
>> >> (hash-update! h w* add1 0)))
>> >>
>> >> (define v
>> >> (time
>> >> (for/vector #:length (hash-count h)
>> >> ([(k v) (in-hash h)])
>> >> (cons k v
>> >> (time (vector-sort! v > #:key cdr))
>> >> (define p (current-output-port) #;(open-output-nowhere))
>> >> (time
>> >> (for ([pair (in-vector v)])
>> >> (write-string (car pair) p)
>> >> (write-string (number->string (cdr pair)) p)
>> >> (newline p)))
>> >>
>> >> It's much more imperative, but also pretty nice and compact. The
>> >> `printf` optimization is significant for that portion of the program,
>> >> but that isn't much of the running time. The overall running time for
>> >> 10 copies of the KJV is about 9 seconds on my laptop.
>> >>
>> >> I think the remaining difference between Racket and other languages is
>> >> likely the `string-split` and `string-downcase` functions, plus the
>> >> relatively-inefficient string representation that Racket uses.
>> >>
>> >> Sam
>> >>
>> >>
>> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski <
>> pa...@mosakowski.net> wrote:
>> >> >
>> >> > Hi David,
>> >> >
>> >> > Yes, the 21 seconds includes the interpreter startup time. I have
>> done a simple test to see how long it takes:
>> >> >
>> >> > $ time racket -e '(displayln "Hello, world")'
>> >> > Hello, world
>> >> >
>> >> > real 0m0.479s
>> >> > user 0m0.449s
>> >> > sys 0m0.030s
>> >> >
>> >> > I have also put my code inside a main function and profiled it:
>> >> >
>> >> > Profiling results
>> >> > -
>> >> > Total cpu time observed: 20910ms (out of 20970ms)
>> >> > Number of samples taken: 382 (once every 55ms)
>> >> > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
>> >> >
>> >> > ==
>> >> > Caller
>> >> > Idx Total Self Name+src Local%
>> >> > ms(pct) ms(pct) Callee
>> >> > ==
>> >> > [1] 20910(100.0%) 0(0.0%) [running body]
>> ...word-occurences-profile.rkt":##f
>> >> > profile-thunk [2] 100.0%
>> >> > --
>> >> > [running body] [1] 100.0%
>> >> > [2] 20910(100.0%) 0(0.0%) profile-thunk
>> ...ket/pkgs/profile-lib/main.rkt:9:0
>> >> > run [3] 100.0%
>> >> > --
>> >> > profile-thunk [2] 100.0%
>> >> > [3] 20910(100.0%) 0(0.0%) run
>> ...share/racket/pkgs/profile-lib/main.rkt:39:2
>> >> > main [4] 100.0%
>> >> > --
>> >> > run [3] 100.0%
>> >> > [4] 20910(100.0%) 50(0.2%) main
>> ...cket/count-word-occurences-profile.rkt:5:0
>> >> > read-from-stdin-it [5] 98.5%
>> >> > ??? [6] 0.2%
>> >> > --
>> >> > main [4] 100.0%
>> >> > [5] 20606(98.5%) 11796(56.4%) read-from-stdin-it
>> ...-occurences

Re: [racket-users] Word Count program/benchmark performance

2021-03-18 Thread Pawel Mosakowski
Hi Bogdan,

This is a brilliant solution and also completely over my head. It finishes 
in ~3.75s on my PC and is faster than the Python version which basically 
delegates all the work to C. I will need to spend some time on 
understanding it but I am looking forward to learning something new.

Many thanks,
Pawel

On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:

> I managed to get it about as fast as Python by making it really
> imperative and rolling my own hash:
>
> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
>
> Sam Tobin-Hochstadt writes:
>
> > Here are several variants of the code:
> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
> >
> > The enabled version is about the fastest I can get without using
> > `unsafe` (which the rules said not to do). It's possible to optimize a
> > tiny bit more by avoiding sorting, but only a few milliseconds -- it
> > would be more significant if there were more different words.
> >
> > Switching to bytes works correctly for the given task, but wouldn't
> > always work in the case of general UTF8 input. But those versions
> > appeared not be faster for me. Also, writing my own string-downcase
> > didn't help. And using a big buffer and doing my own newline splitting
> > didn't help either.
> >
> > The version using just a regexp matching on a port (suggested by
> > Robby) turned out not to be faster either, so my suspicion is that the
> > original slowness is just using regexps for splitting words.
> >
> > Sam
> >
> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
> >  wrote:
> >>
> >> Here's a somewhat-optimized version of the code:
> >>
> >> #lang racket/base
> >> (require racket/string racket/vector racket/port)
> >>
> >> (define h (make-hash))
> >>
> >> (time
> >> (for* ([l (in-lines)]
> >> [w (in-list (string-split l))]
> >> [w* (in-value (string-downcase w))])
> >> (hash-update! h w* add1 0)))
> >>
> >> (define v
> >> (time
> >> (for/vector #:length (hash-count h)
> >> ([(k v) (in-hash h)])
> >> (cons k v
> >> (time (vector-sort! v > #:key cdr))
> >> (define p (current-output-port) #;(open-output-nowhere))
> >> (time
> >> (for ([pair (in-vector v)])
> >> (write-string (car pair) p)
> >> (write-string (number->string (cdr pair)) p)
> >> (newline p)))
> >>
> >> It's much more imperative, but also pretty nice and compact. The
> >> `printf` optimization is significant for that portion of the program,
> >> but that isn't much of the running time. The overall running time for
> >> 10 copies of the KJV is about 9 seconds on my laptop.
> >>
> >> I think the remaining difference between Racket and other languages is
> >> likely the `string-split` and `string-downcase` functions, plus the
> >> relatively-inefficient string representation that Racket uses.
> >>
> >> Sam
> >>
> >>
> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski  
> wrote:
> >> >
> >> > Hi David,
> >> >
> >> > Yes, the 21 seconds includes the interpreter startup time. I have 
> done a simple test to see how long it takes:
> >> >
> >> > $ time racket -e '(displayln "Hello, world")'
> >> > Hello, world
> >> >
> >> > real 0m0.479s
> >> > user 0m0.449s
> >> > sys 0m0.030s
> >> >
> >> > I have also put my code inside a main function and profiled it:
> >> >
> >> > Profiling results
> >> > -
> >> > Total cpu time observed: 20910ms (out of 20970ms)
> >> > Number of samples taken: 382 (once every 55ms)
> >> > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
> >> >
> >> > ==
> >> > Caller
> >> > Idx Total Self Name+src Local%
> >> > ms(pct) ms(pct) Callee
> >> > ==
> >> > [1] 20910(100.0%) 0(0.0%) [running body] 
> ...word-occurences-profile.rkt":##f
> >> > profile-thunk [2] 100.0%
> >> > --
> >> > [running body] [1] 100.0%
> >> > [2] 20910(100.0%) 0(0.0%) profile-thunk 
> ...ket/pkgs/profile-lib/main.rkt:9:0
> >> > run [3] 100.0%
> >> > --
> >> > profile-thunk [2] 100.0%
> >> > [3] 20910(100.0%) 0(0.0%) run 
> ...share/racket/pkgs/profile-lib/main.rkt:39:2
> >> > main [4] 100.0%
> >> > --
> >> > run [3] 100.0%
> >> > [4] 20910(100.0%) 50(0.2%) main 
> ...cket/count-word-occurences-profile.rkt:5:0
> >> > read-from-stdin-it [5] 98.5%
> >> > ??? [6] 0.2%
> >> > --
> >> > main [4] 100.0%
> >> > [5] 20606(98.5%) 11796(56.4%) read-from-stdin-it 
> ...-occurences-profile.rkt:19:6
> >> > internal-split [7] 42.8%
> >> > --
> >> > main [4] 100.0%
> >> > [6] 51(0.2%) 0(0.0%) ??? 
> ...cket/collects/racket/private/sort.rkt:369:3
> >> > generic-sort/key [8] 100.0%
> >> > 

Re: [racket-users] Word Count program/benchmark performance

2021-03-18 Thread Pawel Mosakowski
Hi Sam,

Thank you for your responses. Your fastest approach runs in quarter of the 
time (~5 seconds) of my naive implementation which is pretty amazing. I 
already had a closer look and can see all the improvements like avoiding 
string-split, using a mutable hash and modifying it in place, using a 
vector etc. I have learned a lot and I will incorporate this in my future 
programs.

Many thanks,
Pawel

On Thursday, March 18, 2021 at 6:01:47 PM UTC Sam Tobin-Hochstadt wrote:

> Here are several variants of the code:
> https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>
> The enabled version is about the fastest I can get without using
> `unsafe` (which the rules said not to do). It's possible to optimize a
> tiny bit more by avoiding sorting, but only a few milliseconds -- it
> would be more significant if there were more different words.
>
> Switching to bytes works correctly for the given task, but wouldn't
> always work in the case of general UTF8 input. But those versions
> appeared not be faster for me. Also, writing my own string-downcase
> didn't help. And using a big buffer and doing my own newline splitting
> didn't help either.
>
> The version using just a regexp matching on a port (suggested by
> Robby) turned out not to be faster either, so my suspicion is that the
> original slowness is just using regexps for splitting words.
>
> Sam
>
> On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
>  wrote:
> >
> > Here's a somewhat-optimized version of the code:
> >
> > #lang racket/base
> > (require racket/string racket/vector racket/port)
> >
> > (define h (make-hash))
> >
> > (time
> > (for* ([l (in-lines)]
> > [w (in-list (string-split l))]
> > [w* (in-value (string-downcase w))])
> > (hash-update! h w* add1 0)))
> >
> > (define v
> > (time
> > (for/vector #:length (hash-count h)
> > ([(k v) (in-hash h)])
> > (cons k v
> > (time (vector-sort! v > #:key cdr))
> > (define p (current-output-port) #;(open-output-nowhere))
> > (time
> > (for ([pair (in-vector v)])
> > (write-string (car pair) p)
> > (write-string (number->string (cdr pair)) p)
> > (newline p)))
> >
> > It's much more imperative, but also pretty nice and compact. The
> > `printf` optimization is significant for that portion of the program,
> > but that isn't much of the running time. The overall running time for
> > 10 copies of the KJV is about 9 seconds on my laptop.
> >
> > I think the remaining difference between Racket and other languages is
> > likely the `string-split` and `string-downcase` functions, plus the
> > relatively-inefficient string representation that Racket uses.
> >
> > Sam
> >
> >
> > On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski  
> wrote:
> > >
> > > Hi David,
> > >
> > > Yes, the 21 seconds includes the interpreter startup time. I have done 
> a simple test to see how long it takes:
> > >
> > > $ time racket -e '(displayln "Hello, world")'
> > > Hello, world
> > >
> > > real 0m0.479s
> > > user 0m0.449s
> > > sys 0m0.030s
> > >
> > > I have also put my code inside a main function and profiled it:
> > >
> > > Profiling results
> > > -
> > > Total cpu time observed: 20910ms (out of 20970ms)
> > > Number of samples taken: 382 (once every 55ms)
> > > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
> > >
> > > ==
> > > Caller
> > > Idx Total Self Name+src Local%
> > > ms(pct) ms(pct) Callee
> > > ==
> > > [1] 20910(100.0%) 0(0.0%) [running body] 
> ...word-occurences-profile.rkt":##f
> > > profile-thunk [2] 100.0%
> > > --
> > > [running body] [1] 100.0%
> > > [2] 20910(100.0%) 0(0.0%) profile-thunk 
> ...ket/pkgs/profile-lib/main.rkt:9:0
> > > run [3] 100.0%
> > > --
> > > profile-thunk [2] 100.0%
> > > [3] 20910(100.0%) 0(0.0%) run 
> ...share/racket/pkgs/profile-lib/main.rkt:39:2
> > > main [4] 100.0%
> > > --
> > > run [3] 100.0%
> > > [4] 20910(100.0%) 50(0.2%) main 
> ...cket/count-word-occurences-profile.rkt:5:0
> > > read-from-stdin-it [5] 98.5%
> > > ??? [6] 0.2%
> > > --
> > > main [4] 100.0%
> > > [5] 20606(98.5%) 11796(56.4%) read-from-stdin-it 
> ...-occurences-profile.rkt:19:6
> > > internal-split [7] 42.8%
> > > --
> > > main [4] 100.0%
> > > [6] 51(0.2%) 0(0.0%) ??? ...cket/collects/racket/private/sort.rkt:369:3
> > > generic-sort/key [8] 100.0%
> > > --
> > > read-from-stdin-it [5]100.0%
> > > [7] 8810(42.1%) 3528(16.9%) internal-split 
> ...collects/racket/string.rkt:117:0
> > > regexp-split [9] 59.9%
> > > --
> > > ???

Re: [racket-users] Word Count program/benchmark performance

2021-03-18 Thread Bogdan Popa
I managed to get it about as fast as Python by making it really
imperative and rolling my own hash:

https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571

Sam Tobin-Hochstadt writes:

> Here are several variants of the code:
> https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>
> The enabled version is about the fastest I can get without using
> `unsafe` (which the rules said not to do). It's possible to optimize a
> tiny bit more by avoiding sorting, but only a few milliseconds -- it
> would be more significant if there were more different words.
>
> Switching to bytes works correctly for the given task, but wouldn't
> always work in the case of general UTF8 input. But those versions
> appeared not be faster for me. Also, writing my own string-downcase
> didn't help. And using a big buffer and doing my own newline splitting
> didn't help either.
>
> The version using just a regexp matching on a port (suggested by
> Robby) turned out not to be faster either, so my suspicion is that the
> original slowness is just using regexps for splitting words.
>
> Sam
>
> On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
>  wrote:
>>
>> Here's a somewhat-optimized version of the code:
>>
>> #lang racket/base
>> (require racket/string racket/vector racket/port)
>>
>> (define h (make-hash))
>>
>> (time
>>  (for* ([l (in-lines)]
>> [w (in-list (string-split l))]
>> [w* (in-value (string-downcase w))])
>>(hash-update! h w* add1 0)))
>>
>> (define v
>>   (time
>>(for/vector #:length (hash-count h)
>>([(k v) (in-hash h)])
>>  (cons k v
>> (time (vector-sort! v > #:key cdr))
>> (define p (current-output-port) #;(open-output-nowhere))
>> (time
>>  (for ([pair (in-vector v)])
>>(write-string (car pair) p)
>>(write-string (number->string (cdr pair)) p)
>>(newline p)))
>>
>> It's much more imperative, but also pretty nice and compact. The
>> `printf` optimization is significant for that portion of the program,
>> but that isn't much of the running time. The overall running time for
>> 10 copies of the KJV is about 9 seconds on my laptop.
>>
>> I think the remaining difference between Racket and other languages is
>> likely the `string-split` and `string-downcase` functions, plus the
>> relatively-inefficient string representation that Racket uses.
>>
>> Sam
>>
>>
>> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski  
>> wrote:
>> >
>> > Hi David,
>> >
>> > Yes, the 21 seconds includes the interpreter startup time. I have done a 
>> > simple test to see how long it takes:
>> >
>> > $ time racket -e '(displayln "Hello, world")'
>> > Hello, world
>> >
>> > real0m0.479s
>> > user0m0.449s
>> > sys0m0.030s
>> >
>> > I have also put my code inside a main function and profiled it:
>> >
>> > Profiling results
>> > -
>> >   Total cpu time observed: 20910ms (out of 20970ms)
>> >   Number of samples taken: 382 (once every 55ms)
>> >   (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
>> >
>> > ==
>> >   Caller
>> >  IdxTotal Self  Name+srcLocal%
>> > ms(pct)   ms(pct) Callee
>> > ==
>> >  [1] 20910(100.0%) 0(0.0%)  [running body] 
>> > ...word-occurences-profile.rkt":##f
>> >   profile-thunk [2] 100.0%
>> > --
>> >   [running body] [1]100.0%
>> >  [2] 20910(100.0%) 0(0.0%)  profile-thunk 
>> > ...ket/pkgs/profile-lib/main.rkt:9:0
>> >   run [3]   100.0%
>> > --
>> >   profile-thunk [2] 100.0%
>> >  [3] 20910(100.0%) 0(0.0%)  run 
>> > ...share/racket/pkgs/profile-lib/main.rkt:39:2
>> >   main [4]  100.0%
>> > --
>> >   run [3]   100.0%
>> >  [4] 20910(100.0%)50(0.2%)  main 
>> > ...cket/count-word-occurences-profile.rkt:5:0
>> >   read-from-stdin-it [5] 98.5%
>> >   ??? [6] 0.2%
>> > --
>> >   main [4]  100.0%
>> >  [5] 20606(98.5%)  11796(56.4%) read-from-stdin-it 
>> > ...-occurences-profile.rkt:19:6
>> >   internal-split [7] 42.8%
>> > --
>> >   main [4]  100.0%
>> >  [6]51(0.2%)   0(0.0%)  ??? 
>> > ...cket/collects/racket/private/sort.rkt:369:3
>> >

Re: [racket-users] Word Count program/benchmark performance

2021-03-18 Thread Sam Tobin-Hochstadt
Here are several variants of the code:
https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9

The enabled version is about the fastest I can get without using
`unsafe` (which the rules said not to do). It's possible to optimize a
tiny bit more by avoiding sorting, but only a few milliseconds -- it
would be more significant if there were more different words.

Switching to bytes works correctly for the given task, but wouldn't
always work in the case of general UTF8 input. But those versions
appeared not be faster for me. Also, writing my own string-downcase
didn't help. And using a big buffer and doing my own newline splitting
didn't help either.

The version using just a regexp matching on a port (suggested by
Robby) turned out not to be faster either, so my suspicion is that the
original slowness is just using regexps for splitting words.

Sam

On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
 wrote:
>
> Here's a somewhat-optimized version of the code:
>
> #lang racket/base
> (require racket/string racket/vector racket/port)
>
> (define h (make-hash))
>
> (time
>  (for* ([l (in-lines)]
> [w (in-list (string-split l))]
> [w* (in-value (string-downcase w))])
>(hash-update! h w* add1 0)))
>
> (define v
>   (time
>(for/vector #:length (hash-count h)
>([(k v) (in-hash h)])
>  (cons k v
> (time (vector-sort! v > #:key cdr))
> (define p (current-output-port) #;(open-output-nowhere))
> (time
>  (for ([pair (in-vector v)])
>(write-string (car pair) p)
>(write-string (number->string (cdr pair)) p)
>(newline p)))
>
> It's much more imperative, but also pretty nice and compact. The
> `printf` optimization is significant for that portion of the program,
> but that isn't much of the running time. The overall running time for
> 10 copies of the KJV is about 9 seconds on my laptop.
>
> I think the remaining difference between Racket and other languages is
> likely the `string-split` and `string-downcase` functions, plus the
> relatively-inefficient string representation that Racket uses.
>
> Sam
>
>
> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski  
> wrote:
> >
> > Hi David,
> >
> > Yes, the 21 seconds includes the interpreter startup time. I have done a 
> > simple test to see how long it takes:
> >
> > $ time racket -e '(displayln "Hello, world")'
> > Hello, world
> >
> > real0m0.479s
> > user0m0.449s
> > sys0m0.030s
> >
> > I have also put my code inside a main function and profiled it:
> >
> > Profiling results
> > -
> >   Total cpu time observed: 20910ms (out of 20970ms)
> >   Number of samples taken: 382 (once every 55ms)
> >   (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
> >
> > ==
> >   Caller
> >  IdxTotal Self  Name+srcLocal%
> > ms(pct)   ms(pct) Callee
> > ==
> >  [1] 20910(100.0%) 0(0.0%)  [running body] 
> > ...word-occurences-profile.rkt":##f
> >   profile-thunk [2] 100.0%
> > --
> >   [running body] [1]100.0%
> >  [2] 20910(100.0%) 0(0.0%)  profile-thunk 
> > ...ket/pkgs/profile-lib/main.rkt:9:0
> >   run [3]   100.0%
> > --
> >   profile-thunk [2] 100.0%
> >  [3] 20910(100.0%) 0(0.0%)  run 
> > ...share/racket/pkgs/profile-lib/main.rkt:39:2
> >   main [4]  100.0%
> > --
> >   run [3]   100.0%
> >  [4] 20910(100.0%)50(0.2%)  main 
> > ...cket/count-word-occurences-profile.rkt:5:0
> >   read-from-stdin-it [5] 98.5%
> >   ??? [6] 0.2%
> > --
> >   main [4]  100.0%
> >  [5] 20606(98.5%)  11796(56.4%) read-from-stdin-it 
> > ...-occurences-profile.rkt:19:6
> >   internal-split [7] 42.8%
> > --
> >   main [4]  100.0%
> >  [6]51(0.2%)   0(0.0%)  ??? 
> > ...cket/collects/racket/private/sort.rkt:369:3
> >   generic-sort/key [8]  100.0%
> > --
> >   read-from-stdin-it [5]100.0%
> >  [7]  8810(42.1%)   3528(16.9%) internal-split 
> > ...collects/racket/string.rkt:117:0
> >   regexp-split [9]   59.9

Re: [racket-users] Word Count program/benchmark performance

2021-03-18 Thread Sam Tobin-Hochstadt
Here's a somewhat-optimized version of the code:

#lang racket/base
(require racket/string racket/vector racket/port)

(define h (make-hash))

(time
 (for* ([l (in-lines)]
[w (in-list (string-split l))]
[w* (in-value (string-downcase w))])
   (hash-update! h w* add1 0)))

(define v
  (time
   (for/vector #:length (hash-count h)
   ([(k v) (in-hash h)])
 (cons k v
(time (vector-sort! v > #:key cdr))
(define p (current-output-port) #;(open-output-nowhere))
(time
 (for ([pair (in-vector v)])
   (write-string (car pair) p)
   (write-string (number->string (cdr pair)) p)
   (newline p)))

It's much more imperative, but also pretty nice and compact. The
`printf` optimization is significant for that portion of the program,
but that isn't much of the running time. The overall running time for
10 copies of the KJV is about 9 seconds on my laptop.

I think the remaining difference between Racket and other languages is
likely the `string-split` and `string-downcase` functions, plus the
relatively-inefficient string representation that Racket uses.

Sam


On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski  wrote:
>
> Hi David,
>
> Yes, the 21 seconds includes the interpreter startup time. I have done a 
> simple test to see how long it takes:
>
> $ time racket -e '(displayln "Hello, world")'
> Hello, world
>
> real0m0.479s
> user0m0.449s
> sys0m0.030s
>
> I have also put my code inside a main function and profiled it:
>
> Profiling results
> -
>   Total cpu time observed: 20910ms (out of 20970ms)
>   Number of samples taken: 382 (once every 55ms)
>   (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
>
> ==
>   Caller
>  IdxTotal Self  Name+srcLocal%
> ms(pct)   ms(pct) Callee
> ==
>  [1] 20910(100.0%) 0(0.0%)  [running body] 
> ...word-occurences-profile.rkt":##f
>   profile-thunk [2] 100.0%
> --
>   [running body] [1]100.0%
>  [2] 20910(100.0%) 0(0.0%)  profile-thunk 
> ...ket/pkgs/profile-lib/main.rkt:9:0
>   run [3]   100.0%
> --
>   profile-thunk [2] 100.0%
>  [3] 20910(100.0%) 0(0.0%)  run 
> ...share/racket/pkgs/profile-lib/main.rkt:39:2
>   main [4]  100.0%
> --
>   run [3]   100.0%
>  [4] 20910(100.0%)50(0.2%)  main 
> ...cket/count-word-occurences-profile.rkt:5:0
>   read-from-stdin-it [5] 98.5%
>   ??? [6] 0.2%
> --
>   main [4]  100.0%
>  [5] 20606(98.5%)  11796(56.4%) read-from-stdin-it 
> ...-occurences-profile.rkt:19:6
>   internal-split [7] 42.8%
> --
>   main [4]  100.0%
>  [6]51(0.2%)   0(0.0%)  ??? 
> ...cket/collects/racket/private/sort.rkt:369:3
>   generic-sort/key [8]  100.0%
> --
>   read-from-stdin-it [5]100.0%
>  [7]  8810(42.1%)   3528(16.9%) internal-split 
> ...collects/racket/string.rkt:117:0
>   regexp-split [9]   59.9%
> --
>   ??? [6]   100.0%
>  [8]51(0.2%)   0(0.0%)  generic-sort/key 
> .../racket/private/sort.rkt:156:2
>   copying-mergesort [10]100.0%
> --
>   internal-split [7]100.0%
>  [9]  5282(25.3%)   2810(13.4%) regexp-split 
> ...ts/racket/private/string.rkt:338:2
>   loop [11]  46.8%
> --
>   generic-sort/key [8]   10.0%
>   copying-mergesort [10] 90.0%
> [10]51(0.2%)  51(0.2%)  copying-mergesort 
> ...racket/private/sort.rkt:129:8
>   copying-mergesort [10] 90.0%
> --
>   regexp-split [9]  100.0%
> [11]  2471(11.8%)   2471(11.8%) loop 
> ...t/collects/racket/private/

Re: [racket-users] Word Count program/benchmark performance

2021-03-18 Thread Pawel Mosakowski
Hi David,

Yes, the 21 seconds includes the interpreter startup time. I have done a 
simple test to see how long it takes:

$ time racket -e '(displayln "Hello, world")'
Hello, world

real0m0.479s
user0m0.449s
sys0m0.030s

I have also put my code inside a main function and profiled it:

Profiling results
-
  Total cpu time observed: 20910ms (out of 20970ms)
  Number of samples taken: 382 (once every 55ms)
  (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)

==
  Caller
 IdxTotal Self  Name+srcLocal%
ms(pct)   ms(pct) Callee
==
 [1] 20910(100.0%) 0(0.0%)  [running body] ...word-occurences-profile.
rkt":##f
  profile-thunk [2] 100.0%
--
  [running body] [1]100.0%
 [2] 20910(100.0%) 0(0.0%)  profile-thunk ...ket/pkgs/profile-lib/main.
rkt:9:0
  run [3]   100.0%
--
  profile-thunk [2] 100.0%
 [3] 20910(100.0%) 0(0.0%)  run ...share/racket/pkgs/profile-
lib/main.rkt:39:2
  main [4]  100.0%
--
  run [3]   100.0%
 [4] 20910(100.0%)50(0.2%)  main ...cket/count-word-occurences-
profile.rkt:5:0
  read-from-stdin-it [5] 98.5%
  ??? [6] 0.2%
--
  main [4]  100.0%
 [5] 20606(98.5%)  11796(56.4%) read-from-stdin-it 
...-occurences-profile.rkt:19:6
  internal-split [7] 42.8%
--
  main [4]  100.0%
 [6]51(0.2%)   0(0.0%)  ??? ...cket/collects/racket/
private/sort.rkt:369:3
  generic-sort/key [8]  100.0%
--
  read-from-stdin-it [5]100.0%
 [7]  8810(42.1%)   3528(16.9%) internal-split 
...collects/racket/string.rkt:117:0
  regexp-split [9]   59.9%
--
  ??? [6]   100.0%
 [8]51(0.2%)   0(0.0%)  generic-sort/key 
.../racket/private/sort.rkt:156:2
  copying-mergesort [10]100.0%
--
  internal-split [7]100.0%
 [9]  5282(25.3%)   2810(13.4%) regexp-split ...ts/racket/private/string.
rkt:338:2
  loop [11]  46.8%
--
  generic-sort/key [8]   10.0%
  copying-mergesort [10] 90.0%
[10]51(0.2%)  51(0.2%)  copying-mergesort 
...racket/private/sort.rkt:129:8
  copying-mergesort [10] 90.0%
--
  regexp-split [9]  100.0%
[11]  2471(11.8%)   2471(11.8%) loop ...t/collects/racket/private/
string.rkt:169:7
--

Kind regards,
Pawel


On Thursday, March 18, 2021 at 2:09:35 PM UTC david@gmail.com wrote:

> Hi Pawel,
>
> I'll take a look at the code later, but did that 21 seconds include 
> startup time for the interpreter?
>
> On Thu, Mar 18, 2021, 9:24 AM Pawel Mosakowski  
> wrote:
>
>> Hello,
>>
>> I am a Racket beginner and I have come across this article:
>>
>> 
>> https://benhoyt.com/writings/count-words/
>>
>> This is my attempt at solving the challenge:
>>
>> 
>> https://pastebin.com/kL16w5Hc
>>
>> However when I have benchmarked it, it takes ~21 seconds to run compared 
>> to the Python and Ruby versions which take around 3-4 seconds.
>>
>> I understand that both Ruby and Python probably have the string 
>> operations and hash tables implemented in optimized C but is there anything 
>> I can do to improve performance of my program?
>>
>> Many thanks for all help and suggestions.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Racket Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to racket-users...@googlegroups.com.
>> To view this discussion on the web vis

Re: [racket-users] Word Count program/benchmark performance

2021-03-18 Thread David Storrs
Hi Pawel,

I'll take a look at the code later, but did that 21 seconds include startup
time for the interpreter?

On Thu, Mar 18, 2021, 9:24 AM Pawel Mosakowski  wrote:

> Hello,
>
> I am a Racket beginner and I have come across this article:
>
> 
> https://benhoyt.com/writings/count-words/
>
> This is my attempt at solving the challenge:
>
> 
> https://pastebin.com/kL16w5Hc
>
> However when I have benchmarked it, it takes ~21 seconds to run compared
> to the Python and Ruby versions which take around 3-4 seconds.
>
> I understand that both Ruby and Python probably have the string operations
> and hash tables implemented in optimized C but is there anything I can do
> to improve performance of my program?
>
> Many thanks for all help and suggestions.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-users/118c1340-66d1-421d-92a4-6b66c56cb88fn%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/CAE8gKoego85qz3O%2B4-PMq-YpX2ygmXEW3a62asYO5e_hxANZ0A%40mail.gmail.com.


[racket-users] Word Count program/benchmark performance

2021-03-18 Thread Pawel Mosakowski
Hello,

I am a Racket beginner and I have come across this article:


https://benhoyt.com/writings/count-words/

This is my attempt at solving the challenge:


https://pastebin.com/kL16w5Hc

However when I have benchmarked it, it takes ~21 seconds to run compared to 
the Python and Ruby versions which take around 3-4 seconds.

I understand that both Ruby and Python probably have the string operations 
and hash tables implemented in optimized C but is there anything I can do 
to improve performance of my program?

Many thanks for all help and suggestions.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/118c1340-66d1-421d-92a4-6b66c56cb88fn%40googlegroups.com.