Symbols are stored internally in utf-8, I believe.
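
For what it's worth, here is a rough way to see the 4x factor discussed in the quoted message below: compare the UTF-8 encoded size of some text with an estimate of its size as a Racket string. This is only a sketch. The sample string `s` is made up, and the 4-bytes-per-character figure is an assumption about the in-memory string representation rather than something measured here.

```racket
#lang racket/base

;; Hypothetical sample text, standing in for the string literals in a JSON file.
(define s (make-string 1000 #\a))

;; Size of the same text when encoded as UTF-8 (1 byte per ASCII character).
(define utf-8-size (bytes-length (string->bytes/utf-8 s)))

;; Estimated payload as a string, ASSUMING 4 bytes per character.
;; This ignores object headers and any allocator overhead.
(define string-size-estimate (* 4 (string-length s)))

(printf "utf-8: ~a bytes, string estimate: ~a bytes, ratio: ~a\n"
        utf-8-size
        string-size-estimate
        (/ string-size-estimate utf-8-size))
```

For mostly-ASCII input the ratio comes out at 4, which would go some way toward explaining a parse result several times larger than the file on disk. Text with many non-ASCII characters would show a smaller ratio, since those take more than one byte in UTF-8.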

On Tue, Jan 15, 2013 at 5:14 PM, Danny Yoo <d...@hashcollision.org> wrote:
> >> >> 1. First, pull all the content of the input port into a string
> >> >> port. This cut down the runtime from 52 seconds to 45
> >> >> seconds. (15% improvement)
> >> >
> >> > I don't think that this is a good idea -- it looks like a dangerous
> >> > assumption for a generic library to do, instead of letting users
> >> > decide for themselves if they want to do so and hand the port to
> >> > the library.
>
> Wow. Ok, I see what you mean now, and yeah, my optimization here is
> unsound. I did not know the JSON library behaved in a streaming
> manner. Thanks!
>
>
> >> When I watch `top` and see how much memory's being used in the
> >> original code, I think this is a red herring, for the unoptimized
> >> json parser is already consuming around 500MB of ram on J G Cho's
> >> 92MB file during the parse.
> >
> > Is the *result* 500mb or the memory used while parsing? If it's the
> > former, then that's not the consumption that is increased. (BTW, if
> > most of it is made of strings, then we get the 4x UCS32 factor.) If
> > it's the latter then I'm surprised.
>
> Yeah, the input JSON file is full of string literals from casual
> inspection, so I think you're right about the UCS32 explanation. It's
> too bad; I had assumed that Racket used utf-8, since I've seen so many
> instances of bytes->string/utf-8 in Racket code.
>
>
> >> >> 2. Modified read-list so it avoids using regular expressions when
> >> >> simpler peek-char/read-char operations suffice. Reduced the runtime
> >> >> from 45 seconds to 40 seconds. (12% improvement)
> >> >
> >> > This is a questionable change, IMO. The thing is that keeping
> >> > things with regexps makes it easy to revise and modify in the
> >> > future, but switching to a single character thing makes it hard
> >> > and in addition requires the code to know when to use regexps and
> >> > when to use a character. I prefer in this case the code
> >> > readability over performance.
>
> Ok, I'll abandon this specific patch for now.
>
> It sounds though that Ray Racine mentioned that his TR-ed version of
> the code performs faster than the non-TRed version? Ray, do you have
> that version available somewhere to play with?
>
> ---
>
> I did push master with one change to the JSON library: the replacement
> of the non-greedy regexp with the char-complement version. I also
> added several test cases to make sure I got it right.
>
> Thanks again for the review!
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users
>