[racket-users] [ANNOUNCE] Xiden is now in beta
Hi folks,

About a year, 1384 commits, 489 tests, ~10k LOC, and 2" on my waistline later, Xiden is in beta. An update is pending on the default catalog.

https://github.com/zyrolasting/xiden

Xiden is a dependency manager I wrote to support use cases that I could not get working with `raco pkg`. Dependency management is hard, so Xiden was something I originally didn't want to make. However, it became one of my most aspirational projects, and I'm proud of how it turned out. If you could take the time to read a longer email, I'd like to share a bit about how it might be helpful to you.

***

Like Guix, Xiden supports deterministic and atomic installations. Unlike Guix, Xiden is cross-platform.

The Racket programs I write no longer have to assume that code comes in collections (outside of the built-in ones). You can force dependencies of different versions to resolve to the same data to avoid issues with non-eq? bindings [multiver].

Dependencies are accessed by symbolic links with names defined by the dependent. So if two packages are called "uri", you can still install them both under names that are meaningful to you. Dependencies are fulfilled the same way, regardless of whether the dependent is a human or more software.

Explicit, affirmative consent is fundamental to Xiden's workings. The default configuration is zero-trust (a.k.a. "Deny All"). Trust in cryptographic hash functions and public keys (or any bytes lacking either) must be declared to authenticate bytes from any source (even hard coded!). If you do not declare that trust, Xiden rejects the data, but prints an error that instructs you how to consent to the scenario. For those wanting convenience, there are "blanket" configuration options to consent to every instance of those scenarios. This makes Xiden a way to educate users on the exact shape and nature of the risks they accept with something from the Internet. In this sense, Xiden does not invent anything new with security. It only aims to get ahead of the "Allow Some" arms-race in other dependency managers like NPM.

Customization comes from a plugin module. You can use a plugin to integrate GPG, use a different archive format, or otherwise fill in gaps in Xiden's functionality. Xiden keeps authentication and integrity checking decoupled in this way so that users can transition on their own in the event a smart person finds a collision in a CHF, or cracks a cipher. Similarly, Xiden's data sources are any data type declared with a path to an input port, including queries to a catalog. A neat effect of this is that you can configure your own syntax for data sources in your command lines.

Even though I call Xiden a dependency manager, it is generalized enough to be useful as a component for a CI system, as a self-hosted OS development environment, or even as a back-end for a more specialized dependency manager.

If this is something that interests you, please consider trying the examples with the guide [ex][guide]. Like all software, Xiden is not perfect, so I depend on your feedback to make Xiden better for you, and to decide what interfaces should be declared stable.

[ex]: https://github.com/zyrolasting/xiden/tree/master/examples
[guide]: https://docs.racket-lang.org/xiden-guide@xiden/index.html
[ethos]: https://groups.google.com/g/racket-users/c/4iI-SanIbzk/m/sGHYijLPAAAJ
[multiver]: https://github.com/zyrolasting/xiden/tree/master/examples/01-differing-versions

-- ~slg

-- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/0fcfc4de-6742-e729-73e3-a7e71326991f%40sagegerard.com.
Re: [racket-users] Best way to say 'block until true'?
Ah. Thanks. On Fri, Mar 19, 2021 at 1:33 PM Jay McCarthy wrote: > It is not a built-in thing. I am talking about the use-pattern of a > condition variable: > https://en.wikipedia.org/wiki/Monitor_(synchronization)#Condition_variables > > -- > Jay McCarthy > Associate Professor @ CS @ UMass Lowell > http://jeapostrophe.github.io > Vincit qui se vincit. > > On Fri, Mar 19, 2021 at 1:02 PM David Storrs > wrote: > > > > > > > > On Fri, Mar 19, 2021 at 12:02 PM Jay McCarthy > wrote: > >> > >> The best thing is to use a semaphore instead of a mutable reference. > >> If you can't do that, then I think that you should combine the mutable > >> reference with a signaling semaphore. If you can't do that, then I > >> can't think of anything but a poll. > > > > > > Is this the kind of thing you meant by 'signalling semaphore'? > > > > #lang racket > > > > (define x #f) > > (define sema (make-semaphore 0)) > > > > (define (wait-on-x) (sync (semaphore-peek-evt sema)) always-evt) > > (define (set-x! val) > > (void (thread ; don't print the thread object when running in the repl > > (thunk > > (sleep 3) > > (set! x val) > > (semaphore-post sema) > > > > (define (check-x) > > (match (sync/timeout 10 (wait-on-x)) > > [#f (displayln "timeout")] > > [_ (displayln "success")])) > > > > (set-x! 7) > > (check-x) ; pauses for 3 seconds, then outputs "success" > > (check-x) ; outputs "success" immediately > > (check-x) ; ibid > >> > >> > >> -- > >> Jay McCarthy > >> Associate Professor @ CS @ UMass Lowell > >> http://jeapostrophe.github.io > >> Vincit qui se vincit. > >> > >> On Fri, Mar 19, 2021 at 11:59 AM David Storrs > wrote: > >> > > >> > Suppose I have a function that tests for some condition, e.g. > >> > > >> > (define current-user (make-parameter #f)) > >> > (define (current-user-set?) (not (false? (current-user))) > >> > > >> > What is the best way to say "wait until 'current-user-set?' returns > true"? 
I've been through the Events chapter in the Reference and nothing > seems like a great fit. I could do polling via sleep or alarm-evt but that > seems inefficient. Is there a better way? > >> > > >> > -- > >> > You received this message because you are subscribed to the Google > Groups "Racket Users" group. > >> > To unsubscribe from this group and stop receiving emails from it, > send an email to racket-users+unsubscr...@googlegroups.com. > >> > To view this discussion on the web visit > https://groups.google.com/d/msgid/racket-users/CAE8gKocbPgjcFAF_o2g6mhZBEH8PpeGyJ4CwznKc3DZkMjY%3DGw%40mail.gmail.com > . > -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/CAE8gKofsHVJfBZCX12Sm%3DeNSeEj4AmehPYhsS2wJf8G3q%3DRXXA%40mail.gmail.com.
Re: [racket-users] Best way to say 'block until true'?
It is not a built-in thing. I am talking about the use-pattern of a condition variable: https://en.wikipedia.org/wiki/Monitor_(synchronization)#Condition_variables -- Jay McCarthy Associate Professor @ CS @ UMass Lowell http://jeapostrophe.github.io Vincit qui se vincit. On Fri, Mar 19, 2021 at 1:02 PM David Storrs wrote: > > > > On Fri, Mar 19, 2021 at 12:02 PM Jay McCarthy wrote: >> >> The best thing is to use a semaphore instead of a mutable reference. >> If you can't do that, then I think that you should combine the mutable >> reference with a signaling semaphore. If you can't do that, then I >> can't think of anything but a poll. > > > Is this the kind of thing you meant by 'signalling semaphore'? > > #lang racket > > (define x #f) > (define sema (make-semaphore 0)) > > (define (wait-on-x) (sync (semaphore-peek-evt sema)) always-evt) > (define (set-x! val) > (void (thread ; don't print the thread object when running in the repl > (thunk > (sleep 3) > (set! x val) > (semaphore-post sema) > > (define (check-x) > (match (sync/timeout 10 (wait-on-x)) > [#f (displayln "timeout")] > [_ (displayln "success")])) > > (set-x! 7) > (check-x) ; pauses for 3 seconds, then outputs "success" > (check-x) ; outputs "success" immediately > (check-x) ; ibid >> >> >> -- >> Jay McCarthy >> Associate Professor @ CS @ UMass Lowell >> http://jeapostrophe.github.io >> Vincit qui se vincit. >> >> On Fri, Mar 19, 2021 at 11:59 AM David Storrs wrote: >> > >> > Suppose I have a function that tests for some condition, e.g. >> > >> > (define current-user (make-parameter #f)) >> > (define (current-user-set?) (not (false? (current-user))) >> > >> > What is the best way to say "wait until 'current-user-set?' returns true"? >> > I've been through the Events chapter in the Reference and nothing seems >> > like a great fit. I could do polling via sleep or alarm-evt but that >> > seems inefficient. Is there a better way? 
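The condition-variable use-pattern Jay refers to in this message can be sketched in Racket roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the names `changed` and `wait-until` are hypothetical, a counting semaphore stands in for the condition variable, and a single waiting thread is assumed. The key idea is that the waiter re-checks its predicate after every wake-up instead of trusting the signal alone.

```racket
#lang racket/base

;; Hypothetical names for illustration: x, changed, set-x!, wait-until.
(define x #f)                        ; the shared mutable state
(define changed (make-semaphore 0))  ; posted on every change to x

;; Every writer signals after mutating the state.
(define (set-x! v)
  (set! x v)
  (semaphore-post changed))

;; Condition-variable style wait: test the predicate, sleep until the
;; next signal, then test again. Assumes a single waiting thread.
(define (wait-until pred)
  (let loop ()
    (unless (pred)
      (semaphore-wait changed)
      (loop))))

;; Usage: the first update does not satisfy the predicate, the second does.
(void (thread (lambda ()
                (set-x! 1)
                (sleep 0.05)
                (set-x! 7))))
(wait-until (lambda () (and x (> x 5))))
```

With several waiters, each signal would wake only one of them, so a real monitor would need a broadcast mechanism; that is outside what this sketch covers.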
Re: [racket-users] Best way to say 'block until true'?
On Fri, Mar 19, 2021 at 12:02 PM Jay McCarthy wrote:

> The best thing is to use a semaphore instead of a mutable reference.
> If you can't do that, then I think that you should combine the mutable
> reference with a signaling semaphore. If you can't do that, then I
> can't think of anything but a poll.

Is this the kind of thing you meant by 'signalling semaphore'?

#lang racket

(define x #f)
(define sema (make-semaphore 0))

(define (wait-on-x) (sync (semaphore-peek-evt sema)) always-evt)
(define (set-x! val)
  (void (thread ; don't print the thread object when running in the repl
         (thunk
          (sleep 3)
          (set! x val)
          (semaphore-post sema)))))

(define (check-x)
  (match (sync/timeout 10 (wait-on-x))
    [#f (displayln "timeout")]
    [_ (displayln "success")]))

(set-x! 7)
(check-x) ; pauses for 3 seconds, then outputs "success"
(check-x) ; outputs "success" immediately
(check-x) ; ibid

> --
> Jay McCarthy
> Associate Professor @ CS @ UMass Lowell
> http://jeapostrophe.github.io
> Vincit qui se vincit.
>
> On Fri, Mar 19, 2021 at 11:59 AM David Storrs wrote:
> >
> > Suppose I have a function that tests for some condition, e.g.
> >
> > (define current-user (make-parameter #f))
> > (define (current-user-set?) (not (false? (current-user)))
> >
> > What is the best way to say "wait until 'current-user-set?' returns true"?
> > I've been through the Events chapter in the Reference and nothing seems
> > like a great fit. I could do polling via sleep or alarm-evt but that
> > seems inefficient. Is there a better way?
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Racket Users" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/CAE8gKocbPgjcFAF_o2g6mhZBEH8PpeGyJ4CwznKc3DZkMjY%3DGw%40mail.gmail.com.
Re: [racket-users] Word Count program/benchmark performance
I went from numbers around 1000 ms to 950 ms to 900 ms. There was variance around those numbers, but it was pretty consistent. For more precise answers, there are a few things you can try. One is to measure instructions instead of time (ie, with perf). Another is to run it a bunch of times and take an average. The `hyperfine` tool is good for that. But probably the best advice is to make the program take longer so differences are more apparent -- variation usually increases sub-linearly. Sam On Fri, Mar 19, 2021 at 12:17 PM Laurent wrote: > > Sam: How do you accurately measure such small speed-ups? On my machines, if I > run the same program twice, I can sometimes see more than 10% time difference. > > On Fri, Mar 19, 2021 at 4:10 PM Sam Tobin-Hochstadt > wrote: >> >> Use `#:authentic`, and `unsafe-vector*-{ref,set!}` saved about 50 more >> ms on my machine. >> >> Then getting rid of `set!` and just re-binding the relevant variables >> produced another 50 ms speedup. >> >> https://gist.github.com/7fc52e7bdc327fb59c8858a42258c26a >> >> Sam >> >> On Fri, Mar 19, 2021 at 7:21 AM Sam Tobin-Hochstadt >> wrote: >> > >> > One minor additional suggestion: if you use #:authentic for the struct, it >> > will generate slightly better code for the accessors. >> > >> > Sam >> > >> > On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa wrote: >> >> >> >> I updated the gist with some cleanups and additional improvements that >> >> get the runtime down to a little over 1s (vs ~350ms for the optimized C >> >> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini. >> >> >> >> Pawel Mosakowski writes: >> >> >> >> > Hi Bogdan, >> >> > >> >> > This is a brilliant solution and also completely over my head. It >> >> > finishes >> >> > in ~3.75s on my PC and is faster than the Python version which basically >> >> > delegates all the work to C. I will need to spend some time on >> >> > understanding it but I am looking forward to learning something new. 
>> >> > >> >> > Many thanks, >> >> > Pawel >> >> > >> >> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote: >> >> > >> >> >> I managed to get it about as fast as Python by making it really >> >> >> imperative and rolling my own hash: >> >> >> >> >> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571 >> >> >> >> >> >> Sam Tobin-Hochstadt writes: >> >> >> >> >> >> > Here are several variants of the code: >> >> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9 >> >> >> > >> >> >> > The enabled version is about the fastest I can get without using >> >> >> > `unsafe` (which the rules said not to do). It's possible to optimize >> >> >> > a >> >> >> > tiny bit more by avoiding sorting, but only a few milliseconds -- it >> >> >> > would be more significant if there were more different words. >> >> >> > >> >> >> > Switching to bytes works correctly for the given task, but wouldn't >> >> >> > always work in the case of general UTF8 input. But those versions >> >> >> > appeared not be faster for me. Also, writing my own string-downcase >> >> >> > didn't help. And using a big buffer and doing my own newline >> >> >> > splitting >> >> >> > didn't help either. >> >> >> > >> >> >> > The version using just a regexp matching on a port (suggested by >> >> >> > Robby) turned out not to be faster either, so my suspicion is that >> >> >> > the >> >> >> > original slowness is just using regexps for splitting words. >> >> >> > >> >> >> > Sam >> >> >> > >> >> >> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt >> >> >> > wrote: >> >> >> >> >> >> >> >> Here's a somewhat-optimized version of the code: >> >> >> >> >> >> >> >> #lang racket/base >> >> >> >> (require racket/string racket/vector racket/port) >> >> >> >> >> >> >> >> (define h (make-hash)) >> >> >> >> >> >> >> >> (time >> >> >> >> (for* ([l (in-lines)] >> >> >> >> [w (in-list (string-split l))] >> >> >> >> [w* (in-value (string-downcase w))]) >> >> >> >> (hash-update! 
h w* add1 0)))
>> >> >> >>
>> >> >> >> (define v
>> >> >> >>   (time
>> >> >> >>    (for/vector #:length (hash-count h)
>> >> >> >>      ([(k v) (in-hash h)])
>> >> >> >>      (cons k v))))
>> >> >> >> (time (vector-sort! v > #:key cdr))
>> >> >> >> (define p (current-output-port) #;(open-output-nowhere))
>> >> >> >> (time
>> >> >> >>  (for ([pair (in-vector v)])
>> >> >> >>    (write-string (car pair) p)
>> >> >> >>    (write-string (number->string (cdr pair)) p)
>> >> >> >>    (newline p)))
>> >> >> >>
>> >> >> >> It's much more imperative, but also pretty nice and compact. The
>> >> >> >> `printf` optimization is significant for that portion of the program,
>> >> >> >> but that isn't much of the running time. The overall running time for
>> >> >> >> 10 copies of the KJV is about 9 seconds on my laptop.
>> >> >> >>
>> >> >> >> I think the remaining difference between Racket and other languages is
>> >> >> >> likely the `string-split` and `string-downcase` functions, plus the
>> >> >> >> relatively-ineffic
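Sam's advice at the top of this message, to run the program a bunch of times and take an average, is what the `hyperfine` tool he recommends does from the shell. As a rough in-process alternative, the same idea can be sketched in Racket. The helper name `average-ms` and its `#:runs` keyword are hypothetical, and forcing a major collection before each run is an assumption made here to reduce noise, not something the thread prescribes.

```racket
#lang racket/base

;; Hypothetical helper: runs `thunk` `runs` times and returns the mean
;; wall-clock time in milliseconds.
(define (average-ms thunk #:runs [runs 10])
  (define total
    (for/sum ([_ (in-range runs)])
      (collect-garbage)                            ; start each run from a similar heap state
      (define start (current-inexact-milliseconds))
      (thunk)
      (- (current-inexact-milliseconds) start)))
  (/ total runs))

;; Usage: average five runs of a small workload.
(average-ms (lambda () (for/sum ([i (in-range 1000000)]) i)) #:runs 5)
```

This measures only the body of the thunk, so it excludes Racket's startup time, unlike timing the whole `racket` process from the shell.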
Re: [racket-users] Best way to say 'block until true'?
Cool. Thank you both. On Fri, Mar 19, 2021 at 12:15 PM Sam Tobin-Hochstadt wrote: > Another possibility is to send a message on a channel when the user is > set, and then just wait with `sync` for a message to appear on the > channel. > > Sam > > On Fri, Mar 19, 2021 at 12:02 PM Jay McCarthy > wrote: > > > > The best thing is to use a semaphore instead of a mutable reference. > > If you can't do that, then I think that you should combine the mutable > > reference with a signaling semaphore. If you can't do that, then I > > can't think of anything but a poll. > > > > -- > > Jay McCarthy > > Associate Professor @ CS @ UMass Lowell > > http://jeapostrophe.github.io > > Vincit qui se vincit. > > > > On Fri, Mar 19, 2021 at 11:59 AM David Storrs > wrote: > > > > > > Suppose I have a function that tests for some condition, e.g. > > > > > > (define current-user (make-parameter #f)) > > > (define (current-user-set?) (not (false? (current-user))) > > > > > > What is the best way to say "wait until 'current-user-set?' returns > true"? I've been through the Events chapter in the Reference and nothing > seems like a great fit. I could do polling via sleep or alarm-evt but that > seems inefficient. Is there a better way? > > > > > > -- > > > You received this message because you are subscribed to the Google > Groups "Racket Users" group. > > > To unsubscribe from this group and stop receiving emails from it, send > an email to racket-users+unsubscr...@googlegroups.com. > > > To view this discussion on the web visit > https://groups.google.com/d/msgid/racket-users/CAE8gKocbPgjcFAF_o2g6mhZBEH8PpeGyJ4CwznKc3DZkMjY%3DGw%40mail.gmail.com > . > > > > -- > > You received this message because you are subscribed to the Google > Groups "Racket Users" group. > > To unsubscribe from this group and stop receiving emails from it, send > an email to racket-users+unsubscr...@googlegroups.com. 
> > To view this discussion on the web visit > https://groups.google.com/d/msgid/racket-users/CAJYbDanE4zqgFRAFSYs4kdLzjKf9xg3xi0JMNU7VmFREstNBgQ%40mail.gmail.com > . > -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/CAE8gKoeCDA653EC78oM-ZLZ3Ok%3Ds0%3DTczO96ggVh%2B6EBA-%2BVjQ%40mail.gmail.com.
Re: [racket-users] Word Count program/benchmark performance
Sam: How do you accurately measure such small speed-ups? On my machines, if I run the same program twice, I can sometimes see more than 10% time difference. On Fri, Mar 19, 2021 at 4:10 PM Sam Tobin-Hochstadt wrote: > Use `#:authentic`, and `unsafe-vector*-{ref,set!}` saved about 50 more > ms on my machine. > > Then getting rid of `set!` and just re-binding the relevant variables > produced another 50 ms speedup. > > https://gist.github.com/7fc52e7bdc327fb59c8858a42258c26a > > Sam > > On Fri, Mar 19, 2021 at 7:21 AM Sam Tobin-Hochstadt > wrote: > > > > One minor additional suggestion: if you use #:authentic for the struct, > it will generate slightly better code for the accessors. > > > > Sam > > > > On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa wrote: > >> > >> I updated the gist with some cleanups and additional improvements that > >> get the runtime down to a little over 1s (vs ~350ms for the optimized C > >> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini. > >> > >> Pawel Mosakowski writes: > >> > >> > Hi Bogdan, > >> > > >> > This is a brilliant solution and also completely over my head. It > finishes > >> > in ~3.75s on my PC and is faster than the Python version which > basically > >> > delegates all the work to C. I will need to spend some time on > >> > understanding it but I am looking forward to learning something new. > >> > > >> > Many thanks, > >> > Pawel > >> > > >> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote: > >> > > >> >> I managed to get it about as fast as Python by making it really > >> >> imperative and rolling my own hash: > >> >> > >> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571 > >> >> > >> >> Sam Tobin-Hochstadt writes: > >> >> > >> >> > Here are several variants of the code: > >> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9 > >> >> > > >> >> > The enabled version is about the fastest I can get without using > >> >> > `unsafe` (which the rules said not to do). 
It's possible to > optimize a > >> >> > tiny bit more by avoiding sorting, but only a few milliseconds -- > it > >> >> > would be more significant if there were more different words. > >> >> > > >> >> > Switching to bytes works correctly for the given task, but wouldn't > >> >> > always work in the case of general UTF8 input. But those versions > >> >> > appeared not be faster for me. Also, writing my own string-downcase > >> >> > didn't help. And using a big buffer and doing my own newline > splitting > >> >> > didn't help either. > >> >> > > >> >> > The version using just a regexp matching on a port (suggested by > >> >> > Robby) turned out not to be faster either, so my suspicion is that > the > >> >> > original slowness is just using regexps for splitting words. > >> >> > > >> >> > Sam > >> >> > > >> >> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt > >> >> > wrote: > >> >> >> > >> >> >> Here's a somewhat-optimized version of the code: > >> >> >> > >> >> >> #lang racket/base > >> >> >> (require racket/string racket/vector racket/port) > >> >> >> > >> >> >> (define h (make-hash)) > >> >> >> > >> >> >> (time > >> >> >> (for* ([l (in-lines)] > >> >> >> [w (in-list (string-split l))] > >> >> >> [w* (in-value (string-downcase w))]) > >> >> >> (hash-update! h w* add1 0))) > >> >> >> > >> >> >> (define v > >> >> >> (time > >> >> >> (for/vector #:length (hash-count h) > >> >> >> ([(k v) (in-hash h)]) > >> >> >> (cons k v > >> >> >> (time (vector-sort! v > #:key cdr)) > >> >> >> (define p (current-output-port) #;(open-output-nowhere)) > >> >> >> (time > >> >> >> (for ([pair (in-vector v)]) > >> >> >> (write-string (car pair) p) > >> >> >> (write-string (number->string (cdr pair)) p) > >> >> >> (newline p))) > >> >> >> > >> >> >> It's much more imperative, but also pretty nice and compact. The > >> >> >> `printf` optimization is significant for that portion of the > program, > >> >> >> but that isn't much of the running time. 
The overall running time > for > >> >> >> 10 copies of the KJV is about 9 seconds on my laptop. > >> >> >> > >> >> >> I think the remaining difference between Racket and other > languages is > >> >> >> likely the `string-split` and `string-downcase` functions, plus > the > >> >> >> relatively-inefficient string representation that Racket uses. > >> >> >> > >> >> >> Sam > >> >> >> > >> >> >> > >> >> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski < > pa...@mosakowski.net> > >> >> wrote: > >> >> >> > > >> >> >> > Hi David, > >> >> >> > > >> >> >> > Yes, the 21 seconds includes the interpreter startup time. I > have > >> >> done a simple test to see how long it takes: > >> >> >> > > >> >> >> > $ time racket -e '(displayln "Hello, world")' > >> >> >> > Hello, world > >> >> >> > > >> >> >> > real 0m0.479s > >> >> >> > user 0m0.449s > >> >> >> > sys 0m0.030s > >> >> >> > > >> >> >> > I have also put my code inside a main function and profiled it: > >> >> >> > > >> >> >> > Profiling results > >> >> >> > -
Re: [racket-users] Best way to say 'block until true'?
Another possibility is to send a message on a channel when the user is set, and then just wait with `sync` for a message to appear on the channel. Sam On Fri, Mar 19, 2021 at 12:02 PM Jay McCarthy wrote: > > The best thing is to use a semaphore instead of a mutable reference. > If you can't do that, then I think that you should combine the mutable > reference with a signaling semaphore. If you can't do that, then I > can't think of anything but a poll. > > -- > Jay McCarthy > Associate Professor @ CS @ UMass Lowell > http://jeapostrophe.github.io > Vincit qui se vincit. > > On Fri, Mar 19, 2021 at 11:59 AM David Storrs wrote: > > > > Suppose I have a function that tests for some condition, e.g. > > > > (define current-user (make-parameter #f)) > > (define (current-user-set?) (not (false? (current-user))) > > > > What is the best way to say "wait until 'current-user-set?' returns true"? > > I've been through the Events chapter in the Reference and nothing seems > > like a great fit. I could do polling via sleep or alarm-evt but that seems > > inefficient. Is there a better way? > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Racket Users" group. > > To unsubscribe from this group and stop receiving emails from it, send an > > email to racket-users+unsubscr...@googlegroups.com. > > To view this discussion on the web visit > > https://groups.google.com/d/msgid/racket-users/CAE8gKocbPgjcFAF_o2g6mhZBEH8PpeGyJ4CwznKc3DZkMjY%3DGw%40mail.gmail.com. > > -- > You received this message because you are subscribed to the Google Groups > "Racket Users" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to racket-users+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/racket-users/CAJYbDanE4zqgFRAFSYs4kdLzjKf9xg3xi0JMNU7VmFREstNBgQ%40mail.gmail.com. 
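The channel approach Sam describes in this message can be sketched as below. The names `user-ch`, `announce-user!` and `wait-for-user` are hypothetical. Racket channels are synchronous, so `channel-put` blocks until a receiver is ready; the sketch does the put in a helper thread so the sender is not held up, which is a design choice made here rather than something stated in the thread.

```racket
#lang racket/base

;; Hypothetical names for illustration.
(define user-ch (make-channel))

;; channel-put blocks until someone receives, so send from a helper thread.
(define (announce-user! u)
  (void (thread (lambda () (channel-put user-ch u)))))

;; A channel is itself a synchronizable event: syncing on it returns the
;; next message, or #f if `timeout` seconds pass first.
(define (wait-for-user [timeout #f])
  (sync/timeout timeout user-ch))

;; Usage
(announce-user! "alice")
(wait-for-user 5)
```

Each message is delivered to exactly one receiver, so a later `wait-for-user` call blocks again until another message is sent. If every caller after the first should return immediately, the semaphore-peek pattern discussed elsewhere in this thread is a closer fit.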
Re: [racket-users] Word Count program/benchmark performance
Use `#:authentic`, and `unsafe-vector*-{ref,set!}` saved about 50 more ms on my machine. Then getting rid of `set!` and just re-binding the relevant variables produced another 50 ms speedup. https://gist.github.com/7fc52e7bdc327fb59c8858a42258c26a Sam On Fri, Mar 19, 2021 at 7:21 AM Sam Tobin-Hochstadt wrote: > > One minor additional suggestion: if you use #:authentic for the struct, it > will generate slightly better code for the accessors. > > Sam > > On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa wrote: >> >> I updated the gist with some cleanups and additional improvements that >> get the runtime down to a little over 1s (vs ~350ms for the optimized C >> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini. >> >> Pawel Mosakowski writes: >> >> > Hi Bogdan, >> > >> > This is a brilliant solution and also completely over my head. It finishes >> > in ~3.75s on my PC and is faster than the Python version which basically >> > delegates all the work to C. I will need to spend some time on >> > understanding it but I am looking forward to learning something new. >> > >> > Many thanks, >> > Pawel >> > >> > On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote: >> > >> >> I managed to get it about as fast as Python by making it really >> >> imperative and rolling my own hash: >> >> >> >> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571 >> >> >> >> Sam Tobin-Hochstadt writes: >> >> >> >> > Here are several variants of the code: >> >> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9 >> >> > >> >> > The enabled version is about the fastest I can get without using >> >> > `unsafe` (which the rules said not to do). It's possible to optimize a >> >> > tiny bit more by avoiding sorting, but only a few milliseconds -- it >> >> > would be more significant if there were more different words. >> >> > >> >> > Switching to bytes works correctly for the given task, but wouldn't >> >> > always work in the case of general UTF8 input. 
But those versions >> >> > appeared not be faster for me. Also, writing my own string-downcase >> >> > didn't help. And using a big buffer and doing my own newline splitting >> >> > didn't help either. >> >> > >> >> > The version using just a regexp matching on a port (suggested by >> >> > Robby) turned out not to be faster either, so my suspicion is that the >> >> > original slowness is just using regexps for splitting words. >> >> > >> >> > Sam >> >> > >> >> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt >> >> > wrote: >> >> >> >> >> >> Here's a somewhat-optimized version of the code: >> >> >> >> >> >> #lang racket/base >> >> >> (require racket/string racket/vector racket/port) >> >> >> >> >> >> (define h (make-hash)) >> >> >> >> >> >> (time >> >> >> (for* ([l (in-lines)] >> >> >> [w (in-list (string-split l))] >> >> >> [w* (in-value (string-downcase w))]) >> >> >> (hash-update! h w* add1 0))) >> >> >> >> >> >> (define v >> >> >> (time >> >> >> (for/vector #:length (hash-count h) >> >> >> ([(k v) (in-hash h)]) >> >> >> (cons k v >> >> >> (time (vector-sort! v > #:key cdr)) >> >> >> (define p (current-output-port) #;(open-output-nowhere)) >> >> >> (time >> >> >> (for ([pair (in-vector v)]) >> >> >> (write-string (car pair) p) >> >> >> (write-string (number->string (cdr pair)) p) >> >> >> (newline p))) >> >> >> >> >> >> It's much more imperative, but also pretty nice and compact. The >> >> >> `printf` optimization is significant for that portion of the program, >> >> >> but that isn't much of the running time. The overall running time for >> >> >> 10 copies of the KJV is about 9 seconds on my laptop. >> >> >> >> >> >> I think the remaining difference between Racket and other languages is >> >> >> likely the `string-split` and `string-downcase` functions, plus the >> >> >> relatively-inefficient string representation that Racket uses. 
>> >> >> >> >> >> Sam >> >> >> >> >> >> >> >> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski >> >> >> >> >> wrote: >> >> >> > >> >> >> > Hi David, >> >> >> > >> >> >> > Yes, the 21 seconds includes the interpreter startup time. I have >> >> done a simple test to see how long it takes: >> >> >> > >> >> >> > $ time racket -e '(displayln "Hello, world")' >> >> >> > Hello, world >> >> >> > >> >> >> > real 0m0.479s >> >> >> > user 0m0.449s >> >> >> > sys 0m0.030s >> >> >> > >> >> >> > I have also put my code inside a main function and profiled it: >> >> >> > >> >> >> > Profiling results >> >> >> > - >> >> >> > Total cpu time observed: 20910ms (out of 20970ms) >> >> >> > Number of samples taken: 382 (once every 55ms) >> >> >> > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden) >> >> >> > >> >> >> > == >> >> >> > Caller >> >> >> > Idx Total Self Name+src Local% >> >> >> > ms(pct) ms(pct) Callee >> >> >> > == >> >> >> > [1] 20910(100.0%) 0(0.0%) [running body
Re: [racket-users] Best way to say 'block until true'?
The best thing is to use a semaphore instead of a mutable reference. If you can't do that, then I think that you should combine the mutable reference with a signaling semaphore. If you can't do that, then I can't think of anything but a poll. -- Jay McCarthy Associate Professor @ CS @ UMass Lowell http://jeapostrophe.github.io Vincit qui se vincit. On Fri, Mar 19, 2021 at 11:59 AM David Storrs wrote: > > Suppose I have a function that tests for some condition, e.g. > > (define current-user (make-parameter #f)) > (define (current-user-set?) (not (false? (current-user))) > > What is the best way to say "wait until 'current-user-set?' returns true"? > I've been through the Events chapter in the Reference and nothing seems like > a great fit. I could do polling via sleep or alarm-evt but that seems > inefficient. Is there a better way? > > -- > You received this message because you are subscribed to the Google Groups > "Racket Users" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to racket-users+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/racket-users/CAE8gKocbPgjcFAF_o2g6mhZBEH8PpeGyJ4CwznKc3DZkMjY%3DGw%40mail.gmail.com. -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/CAJYbDanE4zqgFRAFSYs4kdLzjKf9xg3xi0JMNU7VmFREstNBgQ%40mail.gmail.com.
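Jay's second option in this message, combining the mutable reference with a signaling semaphore, might look roughly like the sketch below. The names `user`, `user-ready`, `set-user!` and `wait-for-user` are hypothetical, and a plain module-level variable is used instead of the parameter from the original question. The wait function hands the event to `sync/timeout` directly, so the timeout covers the whole wait.

```racket
#lang racket/base

;; Hypothetical names for illustration.
(define user #f)                        ; the mutable reference
(define user-ready (make-semaphore 0))  ; posted once the reference is set

(define (set-user! u)
  (set! user u)
  (semaphore-post user-ready))

;; Blocks until set-user! has run, or returns #f after `timeout` seconds.
;; semaphore-peek-evt does not decrement the semaphore, so once it has
;; been posted every later call returns immediately.
(define (wait-for-user [timeout #f])
  (and (sync/timeout timeout (semaphore-peek-evt user-ready))
       user))

;; Usage: another thread sets the user shortly after startup.
(void (thread (lambda () (sleep 0.1) (set-user! "alice"))))
(wait-for-user 5)
```

No polling is involved: the waiting thread is descheduled until the semaphore is posted or the timeout expires.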
[racket-users] Best way to say 'block until true'?
Suppose I have a function that tests for some condition, e.g.

(define current-user (make-parameter #f))
(define (current-user-set?) (not (false? (current-user))))

What is the best way to say "wait until 'current-user-set?' returns true"? I've been through the Events chapter in the Reference and nothing seems like a great fit. I could do polling via sleep or alarm-evt but that seems inefficient. Is there a better way?
Re: [racket-users] Word Count program/benchmark performance
(Welcome to Racket v8.0.0.1 [cs].)

All results are measured on my laptop on the 10x file with `$ time racket `, thus including the Racket VM.

* Bogdan's version with #lang racket/base: 1s.
* Dominik's version with vectors of length 256 (instead of 26) and splitting on spaces/return/newline only and #lang racket/base: 1.6s
  But Bogdan's printing takes only ~100ms while Dominik's takes ~400ms.
* I also implemented a variant of Dominik's 'discrimination tree' to use a hasheqv instead of a vector at each node: 6.2s (but the memory footprint is likely nicer :-p )
  This also uses `in-lines` instead of Bogdan's buffers so there may still be something to gain here.
* Replacing a hasheqv with an assoc: 4.2s.
* Starting with an assoc and switching to a hasheqv when there are too many elements (n=20, couldn't do better): 3.9s

Code is here: https://gist.github.com/Metaxal/ae0a6937d8f388f3f40ec7396041be55

I also noticed that `dict-ref` is *really* slow (35s) compared to `assv`.

@Bogdan: You can use `#:key` in `sort`.

On Fri, Mar 19, 2021 at 12:08 PM Bogdan Popa wrote:
> Nice! It's worth pointing out, though, that by limiting yourself to
> alpha chars, you're processing about 8% less data and the results don't
> pass the tests. :P
>
> $ wc kjvbible_x10.txt
> 998170 8211330 43325060
>
> $ sed 's/[a-zA-Z ]//g' < kjvbible_x10.txt | wc
> 998170 739310 3600800
>
> I think your version would still be faster, but I'm curious what the
> numbers would look like if only whitespace chars were considered word
> separators.
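The last variant in the list above (start with an assoc, switch to a hasheqv once there are too many elements) can be sketched roughly as follows. This is not the code from the gist; the struct `hybrid` and its helper functions are hypothetical names, and only the threshold of 20 is taken from the message.

```racket
#lang racket/base

(define threshold 20) ; switch-over size reported in the message

;; `store` is an association list while small and a mutable hasheqv afterwards.
;; The struct and helper names are hypothetical.
(struct hybrid ([store #:mutable]))

(define (make-hybrid) (hybrid '()))

(define (hybrid-ref m k default)
  (define s (hybrid-store m))
  (cond
    [(hash? s) (hash-ref s k default)]
    [(assv k s) => cdr]
    [else default]))

(define (hybrid-set! m k v)
  (define s (hybrid-store m))
  (cond
    [(hash? s) (hash-set! s k v)]
    ;; Key already present: rebuild the short list with the new value.
    [(assv k s)
     (set-hybrid-store! m (for/list ([p (in-list s)])
                            (if (eqv? (car p) k) (cons k v) p)))]
    ;; Still small: prepend a new pair.
    [(< (length s) threshold)
     (set-hybrid-store! m (cons (cons k v) s))]
    ;; Too many entries: migrate everything to a mutable hasheqv.
    [else
     (define h (make-hasheqv s))
     (hash-set! h k v)
     (set-hybrid-store! m h)]))

;; Usage: count characters, as a tree node might count its children.
(define m (make-hybrid))
(for ([c (in-string "abracadabra")])
  (hybrid-set! m c (add1 (hybrid-ref m c 0))))
(hybrid-ref m #\a 0) ; 5
```

Keys are compared with `eqv?` in both representations, so characters and small integers behave the same before and after the switch.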
Re: [racket-users] Word Count program/benchmark performance
Nice! It's worth pointing out, though, that by limiting yourself to alpha chars, you're processing about 8% less data and the results don't pass the tests. :P

$ wc kjvbible_x10.txt
998170 8211330 43325060

$ sed 's/[a-zA-Z ]//g' < kjvbible_x10.txt | wc
998170 739310 3600800

I think your version would still be faster, but I'm curious what the numbers would look like if only whitespace chars were considered word separators.

Dominik Pantůček writes:

> Another attack of [1]. But yeah, why not do some [2].
>
> Trees to the rescue [3].
>
> $ racket --version
> Welcome to Racket v8.0 [cs].
>
> $ racket countwords-bogdan2.rkt
> cpu time: 135 real time: 135 gc time: 8
>
> $ racket countwords-dzoe2.rkt
> cpu time: 69 real time: 69 gc time: 3
>
> I just changed (countwords) to (time (countwords)) in Bogdan's code to
> measure the running time.
>
> The difference is that I am positively defining which letters form words
> (a-z, A-Z) and that all others are treated as word separators. The
> buffer size is the same - and honestly, the speedup between 1024 and
> 1024^2 bytes buffer is barely measurable.
>
> The only option for further speedup I can immediately think of is to
> allocate a huge vector of wtnodes and change chld field to be a starting
> index into this big vector (should reduce allocations).
>
> Btw, making it unsafe does not speed it up at all (probably CS
> recognizes the vectors and all those refs are inlined anyway).
>
> Cheers,
> Dominik
>
> [1] https://xkcd.com/386/
> [2] http://phdcomics.com/comics/archive.php?comicid=1735
> [3] https://gist.github.com/dzoep/0e081d0544afac539a4829179c601e0e
Re: [racket-users] Word Count program/benchmark performance
Another attack of [1]. But yeah, why not do some [2].

Trees to the rescue [3].

$ racket --version
Welcome to Racket v8.0 [cs].

$ racket countwords-bogdan2.rkt
cpu time: 135 real time: 135 gc time: 8

$ racket countwords-dzoe2.rkt
cpu time: 69 real time: 69 gc time: 3

I just changed (countwords) to (time (countwords)) in Bogdan's code to measure the running time.

The difference is that I am positively defining which letters form words (a-z, A-Z) and that all others are treated as word separators. The buffer size is the same - and honestly, the speedup between 1024 and 1024^2 bytes buffer is barely measurable.

The only option for further speedup I can immediately think of is to allocate a huge vector of wtnodes and change chld field to be a starting index into this big vector (should reduce allocations).

Btw, making it unsafe does not speed it up at all (probably CS recognizes the vectors and all those refs are inlined anyway).

Cheers,
Dominik

[1] https://xkcd.com/386/
[2] http://phdcomics.com/comics/archive.php?comicid=1735
[3] https://gist.github.com/dzoep/0e081d0544afac539a4829179c601e0e

On 19. 03. 21 11:18, Bogdan Popa wrote:
> I updated the gist with some cleanups and additional improvements that
> get the runtime down to a little over 1s (vs ~350ms for the optimized C
> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
Re: [racket-users] Word Count program/benchmark performance
One minor additional suggestion: if you use #:authentic for the struct, it will generate slightly better code for the accessors.

Sam

On Fri, Mar 19, 2021, 6:18 AM Bogdan Popa wrote:
> I updated the gist with some cleanups and additional improvements that
> get the runtime down to a little over 1s (vs ~350ms for the optimized C
> and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.
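For readers unfamiliar with the option Sam mentions, a minimal sketch of a struct declared with `#:authentic` follows. The struct `entry` and its fields are hypothetical and are not taken from Bogdan's gist. As I understand the Racket documentation, the option promises that instances are never impersonated or chaperoned, which is what lets the accessors skip that check.

```racket
#lang racket/base

;; Hypothetical struct for illustration; not the one in the gist.
;; #:authentic promises that instances are never chaperoned or impersonated.
(struct entry (word [count #:mutable]) #:authentic)

(define e (entry "the" 0))
(set-entry-count! e (add1 (entry-count e)))
(printf "~a: ~a\n" (entry-word e) (entry-count e))
```

The trade-off is that such a struct can no longer be wrapped by struct contracts that rely on chaperones, so the option fits internal, performance-sensitive data best.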
Re: [racket-users] Word Count program/benchmark performance
I updated the gist with some cleanups and additional improvements that get the runtime down to a little over 1s (vs ~350ms for the optimized C and Rust code) on my maxed-out 2019 MBP and ~600ms on my M1 Mac Mini.

Pawel Mosakowski writes:

> Hi Bogdan,
>
> This is a brilliant solution and also completely over my head. It finishes
> in ~3.75s on my PC and is faster than the Python version which basically
> delegates all the work to C. I will need to spend some time on
> understanding it but I am looking forward to learning something new.
>
> Many thanks,
> Pawel
>
> On Thursday, March 18, 2021 at 7:22:10 PM UTC bogdan wrote:
>
>> I managed to get it about as fast as Python by making it really
>> imperative and rolling my own hash:
>>
>> https://gist.github.com/Bogdanp/fb39d202037cdaadd55dae3d45737571
>>
>> Sam Tobin-Hochstadt writes:
>>
>> > Here are several variants of the code:
>> > https://gist.github.com/d6fbe3757c462d5b4d1d9393b72f9ab9
>> >
>> > The enabled version is about the fastest I can get without using
>> > `unsafe` (which the rules said not to do). It's possible to optimize a
>> > tiny bit more by avoiding sorting, but only a few milliseconds -- it
>> > would be more significant if there were more different words.
>> >
>> > Switching to bytes works correctly for the given task, but wouldn't
>> > always work in the case of general UTF8 input. But those versions
>> > appeared not to be faster for me. Also, writing my own string-downcase
>> > didn't help. And using a big buffer and doing my own newline splitting
>> > didn't help either.
>> >
>> > The version using just a regexp matching on a port (suggested by
>> > Robby) turned out not to be faster either, so my suspicion is that the
>> > original slowness is just using regexps for splitting words.
>> >
>> > Sam
>> >
>> > On Thu, Mar 18, 2021 at 11:28 AM Sam Tobin-Hochstadt wrote:
>> >>
>> >> Here's a somewhat-optimized version of the code:
>> >>
>> >> #lang racket/base
>> >> (require racket/string racket/vector racket/port)
>> >>
>> >> (define h (make-hash))
>> >>
>> >> (time
>> >>  (for* ([l (in-lines)]
>> >>         [w (in-list (string-split l))]
>> >>         [w* (in-value (string-downcase w))])
>> >>    (hash-update! h w* add1 0)))
>> >>
>> >> (define v
>> >>   (time
>> >>    (for/vector #:length (hash-count h)
>> >>        ([(k v) (in-hash h)])
>> >>      (cons k v))))
>> >> (time (vector-sort! v > #:key cdr))
>> >> (define p (current-output-port) #;(open-output-nowhere))
>> >> (time
>> >>  (for ([pair (in-vector v)])
>> >>    (write-string (car pair) p)
>> >>    (write-string (number->string (cdr pair)) p)
>> >>    (newline p)))
>> >>
>> >> It's much more imperative, but also pretty nice and compact. The
>> >> `printf` optimization is significant for that portion of the program,
>> >> but that isn't much of the running time. The overall running time for
>> >> 10 copies of the KJV is about 9 seconds on my laptop.
>> >>
>> >> I think the remaining difference between Racket and other languages is
>> >> likely the `string-split` and `string-downcase` functions, plus the
>> >> relatively-inefficient string representation that Racket uses.
>> >>
>> >> Sam
>> >>
>> >>
>> >> On Thu, Mar 18, 2021 at 10:28 AM Pawel Mosakowski wrote:
>> >> >
>> >> > Hi David,
>> >> >
>> >> > Yes, the 21 seconds includes the interpreter startup time.
I have >> done a simple test to see how long it takes: >> >> > >> >> > $ time racket -e '(displayln "Hello, world")' >> >> > Hello, world >> >> > >> >> > real 0m0.479s >> >> > user 0m0.449s >> >> > sys 0m0.030s >> >> > >> >> > I have also put my code inside a main function and profiled it: >> >> > >> >> > Profiling results >> >> > - >> >> > Total cpu time observed: 20910ms (out of 20970ms) >> >> > Number of samples taken: 382 (once every 55ms) >> >> > (Hiding functions with self<1.0% and local<2.0%: 1 of 12 hidden) >> >> > >> >> > == >> >> > Caller >> >> > Idx Total Self Name+src Local% >> >> > ms(pct) ms(pct) Callee >> >> > == >> >> > [1] 20910(100.0%) 0(0.0%) [running body] >> ...word-occurences-profile.rkt":##f >> >> > profile-thunk [2] 100.0% >> >> > -- >> >> > [running body] [1] 100.0% >> >> > [2] 20910(100.0%) 0(0.0%) profile-thunk >> ...ket/pkgs/profile-lib/main.rkt:9:0 >> >> > run [3] 100.0% >> >> > -- >> >> > profile-thunk [2] 100.0% >> >> > [3] 20910(100.0%) 0(0.0%) run >> ...share/racket/pkgs/profile-lib/main.rkt:39:2 >> >> > main [4] 100.0% >> >> > -- >> >> > run [3] 100.0% >> >> > [4] 20910(100.0%) 50(0.2%) main >> ...cket/count-word-occurences-profile.rkt:5:0 >> >> > read-from-stdin-it [5] 98.5% >> >> > ??? [6] 0.2% >> >> > -- >> >> > main [4] 100.0% >> >> > [5] 20606(98.5%) 117