FYI, I just updated Optimization Coach to report when you pass non-regexp-valued patterns to regexp functions.
Thanks for the report! Vincent At Thu, 26 Jun 2014 20:03:37 +0200, Laurent wrote: > > Oh woah! Thank you very much Matthew! What a relief :) > > I thought I had tried that, but looks like I didn't... (my hash table was > probably misplaced and your solution is neater anyway) > > Laurent > > > On Thu, Jun 26, 2014 at 6:35 PM, Matthew Flatt <mfl...@cs.utah.edu> wrote: > > > Instead of passing a string directly as the first argument to > > `regexp-match`, use `regexp` to convert the string to a regexp value if > > you want to use it multiple times. > > > > Changing > > > > (define (matches rx lstr) > > (filter (λ(w)(regexp-match? rx w)) lstr)) > > > > to > > > > (define (matches rx-str lstr) > > (define rx (regexp rx-str)) > > (filter (λ(w)(regexp-match? rx w)) lstr)) > > > > makes the "regexp-golf-racket.rkt" example take 3 seconds on my machine > > instead of 50 seconds. Similarly, mapping `regexp` over the right-hand > > side of `lrx` in "regex-timing.rkt" makes it take less than 1 second > > instead of 25 seconds. > > > > I think Python similarly converts strings to regexps, but it internally > > caches the conversion. > > > > At Thu, 26 Jun 2014 18:19:51 +0200, Laurent wrote: > > > Hi, > > > > > > While translating (almost literally) Norvig's awesome regex golf from > > > Python [1] to Racket [2], I'm facing a 25x slowdown (about 2s vs 50s). > > I've > > > run the optimization coach and made the obvious changes (mainly adding > > > `in-list` in `for` loops), tried to optimize `filter` and cache the > > regexps > > > in a hash (to avoid their recompilation, since I need to keep the raw > > > string too), but with no significant gain. > > > > > > Reading the profile report, it seems that most of the time (99%) is spent > > > in `regexp-match`, so I wanted to write a stripped down version showing > > > that Python indeed is much faster than Racket on this particular function > > > but this test case shows almost the contrary! (25s vs 23s) > > > > > > So I'm stumped and I have no idea how to cut this 25x slowdown. Does > > anyone > > > have an idea? > > > > > > In [2], you can find: > > > - the regex golf file in Python (used for timing), which is almost > > > identical to Norvig's > > > - the translation in Racket > > > - the stress test of `re.search` in Python > > > - the stress test of `regexp-match?` in Racket > > > > > > Timings can be found at the bottom of each file. > > > > > > Note that the outputs of the programs are not exactly the same, because > > > sets are used in Python and lists in Racket, but unless there are bugs in > > > my code (which is not improbable) I don't think it matters a lot. > > > > > > Thanks, > > > Laurent > > > > > > [1] http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb > > > [2] https://gist.github.com/Metaxal/777be153b50a35a2618c > > > ____________________ > > > Racket Users list: > > > http://lists.racket-lang.org/users > > > [1.2 <text/html; UTF-8 (quoted-printable)>] > > [2 <text/plain; us-ascii (7bit)>] > ____________________ > Racket Users list: > http://lists.racket-lang.org/users ____________________ Racket Users list: http://lists.racket-lang.org/users