Hi Sam,

Thank you for your responses. Your fastest approach runs in quarter of the 
time (~5 seconds) of my naive implementation which is pretty amazing. I 
already had a closer look and can see all the improvements like avoiding 
string-split, using a mutable hash and modifying it in place, using a 
vector etc. I have learned a lot and I will incorporate this in my future 
programs.

Many thanks,
Pawel

On Thursday, March 18, 2021 at 6:01:47 PM UTC Sam Tobin-Hochstadt wrote:

> Here are several variants of the code:
> https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>
> The enabled version is about the fastest I can get without using
> `unsafe` (which the rules said not to do). It's possible to optimize a
> tiny bit more by avoiding sorting, but only a few milliseconds -- it
> would be more significant if there were more different words.
>
> Switching to bytes works correctly for the given task, but wouldn't
> always work in the case of general UTF8 input. But those versions
> appeared not be faster for me. Also, writing my own string-downcase
> didn't help. And using a big buffer and doing my own newline splitting
> didn't help either.
>
> The version using just a regexp matching on a port (suggested by
> Robby) turned out not to be faster either, so my suspicion is that the
> original slowness is just using regexps for splitting words.
>
> Sam
>
> On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt
> <[email protected]> wrote:
> >
> > Here's a somewhat-optimized version of the code:
> >
> > #lang racket/base
> > (require racket/string racket/vector racket/port)
> >
> > (define h (make-hash))
> >
> > (time
> > (for* ([l (in-lines)]
> > [w (in-list (string-split l))]
> > [w* (in-value (string-downcase w))])
> > (hash-update! h w* add1 0)))
> >
> > (define v
> > (time
> > (for/vector #:length (hash-count h)
> > ([(k v) (in-hash h)])
> > (cons k v))))
> > (time (vector-sort! v > #:key cdr))
> > (define p (current-output-port) #;(open-output-nowhere))
> > (time
> > (for ([pair (in-vector v)])
> > (write-string (car pair) p)
> > (write-string (number->string (cdr pair)) p)
> > (newline p)))
> >
> > It's much more imperative, but also pretty nice and compact. The
> > `printf` optimization is significant for that portion of the program,
> > but that isn't much of the running time. The overall running time for
> > 10 copies of the KJV is about 9 seconds on my laptop.
> >
> > I think the remaining difference between Racket and other languages is
> > likely the `string-split` and `string-downcase` functions, plus the
> > relatively-inefficient string representation that Racket uses.
> >
> > Sam
> >
> >
> > On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski <[email protected]> 
> wrote:
> > >
> > > Hi David,
> > >
> > > Yes, the 21 seconds includes the interpreter startup time. I have done 
> a simple test to see how long it takes:
> > >
> > > $ time racket -e '(displayln "Hello, world")'
> > > Hello, world
> > >
> > > real 0m0.479s
> > > user 0m0.449s
> > > sys 0m0.030s
> > >
> > > I have also put my code inside a main function and profiled it:
> > >
> > > Profiling results
> > > -----------------
> > > Total cpu time observed: 20910ms (out of 20970ms)
> > > Number of samples taken: 382 (once every 55ms)
> > > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden)
> > >
> > > ==============================================================
> > > Caller
> > > Idx Total Self Name+src Local%
> > > ms(pct) ms(pct) Callee
> > > ==============================================================
> > > [1] 20910(100.0%) 0(0.0%) [running body] 
> ...word-occurences-profile.rkt":##f
> > > profile-thunk [2] 100.0%
> > > --------------------------------------------------------------
> > > [running body] [1] 100.0%
> > > [2] 20910(100.0%) 0(0.0%) profile-thunk 
> ...ket/pkgs/profile-lib/main.rkt:9:0
> > > run [3] 100.0%
> > > --------------------------------------------------------------
> > > profile-thunk [2] 100.0%
> > > [3] 20910(100.0%) 0(0.0%) run 
> ...share/racket/pkgs/profile-lib/main.rkt:39:2
> > > main [4] 100.0%
> > > --------------------------------------------------------------
> > > run [3] 100.0%
> > > [4] 20910(100.0%) 50(0.2%) main 
> ...cket/count-word-occurences-profile.rkt:5:0
> > > read-from-stdin-it [5] 98.5%
> > > ??? [6] 0.2%
> > > --------------------------------------------------------------
> > > main [4] 100.0%
> > > [5] 20606(98.5%) 11796(56.4%) read-from-stdin-it 
> ...-occurences-profile.rkt:19:6
> > > internal-split [7] 42.8%
> > > --------------------------------------------------------------
> > > main [4] 100.0%
> > > [6] 51(0.2%) 0(0.0%) ??? ...cket/collects/racket/private/sort.rkt:369:3
> > > generic-sort/key [8] 100.0%
> > > --------------------------------------------------------------
> > > read-from-stdin-it [5]100.0%
> > > [7] 8810(42.1%) 3528(16.9%) internal-split 
> ...collects/racket/string.rkt:117:0
> > > regexp-split [9] 59.9%
> > > --------------------------------------------------------------
> > > ??? [6] 100.0%
> > > [8] 51(0.2%) 0(0.0%) generic-sort/key .../racket/private/sort.rkt:156:2
> > > copying-mergesort [10]100.0%
> > > --------------------------------------------------------------
> > > internal-split [7] 100.0%
> > > [9] 5282(25.3%) 2810(13.4%) regexp-split 
> ...ts/racket/private/string.rkt:338:2
> > > loop [11] 46.8%
> > > --------------------------------------------------------------
> > > generic-sort/key [8] 10.0%
> > > copying-mergesort [10] 90.0%
> > > [10] 51(0.2%) 51(0.2%) copying-mergesort 
> ...racket/private/sort.rkt:129:8
> > > copying-mergesort [10] 90.0%
> > > --------------------------------------------------------------
> > > regexp-split [9] 100.0%
> > > [11] 2471(11.8%) 2471(11.8%) loop 
> ...t/collects/racket/private/string.rkt:169:7
> > > --------------------------------------------------------------
> > >
> > > Kind regards,
> > > Pawel
> > >
> > >
> > > On Thursday, March 18, 2021 at 2:09:35 PM UTC [email protected] 
> wrote:
> > >>
> > >> Hi Pawel,
> > >>
> > >> I'll take a look at the code later, but did that 21 seconds include 
> startup time for the interpreter?
> > >>
> > >> On Thu, Mar 18, 2021, 9:24 AM Pawel Mosakowski <[email protected]> 
> wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> I am a Racket beginner and I have come across this article:
> > >>>
> > >>> https://benhoyt.com/writings/count-words/
> > >>>
> > >>> This is my attempt at solving the challenge:
> > >>>
> > >>> https://pastebin.com/kL16w5Hc
> > >>>
> > >>> However when I have benchmarked it, it takes ~21 seconds to run 
> compared to the Python and Ruby versions which take around 3-4 seconds.
> > >>>
> > >>> I understand that both Ruby and Python probably have the string 
> operations and hash tables implemented in optimized C but is there anything 
> I can do to improve performance of my program?
> > >>>
> > >>> Many thanks for all help and suggestions.
> > >>>
> > >>> --
> > >>> You received this message because you are subscribed to the Google 
> Groups "Racket Users" group.
> > >>> To unsubscribe from this group and stop receiving emails from it, 
> send an email to [email protected].
> > >>> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/racket-users/118c1340-66d1-421d-92a4-6b66c56cb88fn%40googlegroups.com
> .
> > >
> > > --
> > > You received this message because you are subscribed to the Google 
> Groups "Racket Users" group.
> > > To unsubscribe from this group and stop receiving emails from it, send 
> an email to [email protected].
> > > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/racket-users/09c58a34-bd2d-49e7-bfbd-d3253c1e6dd1n%40googlegroups.com
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/5623235c-6576-42b9-8ddf-dd36442285ebn%40googlegroups.com.

Reply via email to