> From looking at the profile, we can trace that about 60% of the time
> is being spent in... regexp-try-match!  That sounds really unusual:
> lexing should not be the expensive part of this process...
>
> So perhaps it might be helpful to see if an alternative lexing
> strategy (perhaps using parser-tools/lex) will perform better.

Ok, I looked at the problem a little more. It appears that there are a few stupid-simple optimizations to the JSON library that we can do. On my machine, I've been able to cut the time to parse your file from 52 seconds for an unoptimized run down to about 36 seconds.

Here's the patch:

    https://github.com/dyoo/racket/commit/e8dc403217574754c57fa4bd95439abfb9b521ec

I haven't pushed to master because I'd like someone else to review the changes first. Also, I have not been able to find the unit tests for the json library. Does anyone know where they are?

Here's a summary of the changes.

1. First, pull all the content of the input port into a string port. This cut down the runtime from 52 seconds to 45 seconds. (15% improvement)

2. Modified read-list so it avoids using regular expressions when simpler peek-char/read-char operations suffice. Reduced the runtime from 45 seconds to 40 seconds. (12% improvement)

3. Looked at the profiler, which pointed out that read-string was very expensive. Looked and found the regular expression:

       #rx"^(.*?)(\"|\\\\(.))"

   which is performance-hungry. Replaced it with a char-complement version to avoid the lazy "*?" part of the pattern:

       #rx"^([^\"\\]*)(\"|\\\\(.))"

   which cut down the runtime from 40 seconds to 36 seconds. (11% improvement)

There still seems to be a lot of low-hanging fruit with regard to the use of regexp-try-match, which is still taking 52% of the runtime, according to the profile here:

    https://gist.github.com/4533369
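
For anyone who wants to poke at the three changes without reading the whole commit, here is a minimal sketch of the ideas. It is not the code from the patch: the names slurp, try, skip-whitespace, next-char-is?, lazy-rx, complement-rx, and sample are all made up for illustration, and only the two regular expressions are taken from the summary above.

```racket
#lang racket/base
(require racket/port) ; for port->string

;; Change 1 (sketch): read the whole input up front and parse from a
;; string port. `slurp` is a hypothetical helper name.
(define (slurp in)
  (open-input-string (port->string in)))

;; Change 2 (sketch): inspect the next character with peek-char and
;; read-char instead of running a regexp against the port.
(define (skip-whitespace in)
  (let loop ()
    (define c (peek-char in))
    (when (and (char? c) (char-whitespace? c))
      (read-char in)
      (loop))))

(define (next-char-is? in ch)
  (skip-whitespace in)
  (eqv? (peek-char in) ch))

;; Change 3: the two patterns quoted in the summary.
;; The first uses a lazy `.*?`; the second uses a complemented
;; character class, so the run of ordinary characters is consumed
;; greedily up to the first quote or backslash.
(define lazy-rx       #rx"^(.*?)(\"|\\\\(.))")
(define complement-rx #rx"^([^\"\\]*)(\"|\\\\(.))")

;; Apply a pattern to a fresh string port holding `str`.
(define (try rx str)
  (regexp-try-match rx (slurp (open-input-string str))))

;; Sample text: ordinary characters, a backslash escape, then a quote.
(define sample "hello \\n world\" trailing")

;; Both patterns should stop at the same place and return the same groups.
(equal? (try lazy-rx sample) (try complement-rx sample))

;; peek-char based delimiter check, skipping leading whitespace.
(next-char-is? (open-input-string "  , 1") #\,)
```

Timing the two patterns on a large input with (time ...) should give a rough feel for the difference, though the numbers will depend on the machine and the data.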