> Question about the following code.
>
> +INTVAL
> +string_compare(STRING* s1, STRING* s2) {
> +    if (s1->encoding != s2->encoding) {
> +        if (s1->encoding->which != enc_utf32) {
> +            s1 = Parrot_transcode_table[s1->encoding->which][enc_utf32](s1, NULL);
> +        }
> +        if (s2->encoding->which != enc_utf32) {
> +            s2 = Parrot_transcode_table[s2->encoding->which][enc_utf32](s2, NULL);
> +        }
> +    }
> +
> +    return (ENC_VTABLE(s1)->compare)(s1, s2);
> +}

Am I missing something here, or does this code never free the transcoded copies of s1 and s2 once it's done comparing them?

The second thing I wanted to ask about... I'm not sure if this is premature optimization, but Perl did it, so it may be worth looking into. (Or may be worth not looking into, for the same reason ;) What about adding a store_transcode( string, encodingtype ) function which takes a string and stores the encodingtype version inside it? Strings would then be able to hold multiple versions of themselves: utf32, utf8, etc. The original format would still be remembered as the 'main' format, of course, for all printing and so on. But for internal parrot string ops, one could use these alternate representations to avoid re-transcoding strings.

If we compare a native string against many unicode strings, there should be a performance increase. (Tests would be required to validate this, of course.) This shouldn't blow our cache either, since we'd only be following the pointer to the one representation we are using. Of course, comparing a native string against many different native AND utf32 strings would result in both representations being loaded into memory. But then we'd be using as much cache memory as if we loaded the native one and converted it, if I understand things correctly.

Doing this with large strings could make memory/cache usage go to hell if we store two copies: the megabyte native string and the multi-megabyte utf32 string. But then again, I can't imagine converting it to utf32 at every comparison will be very fast either. :)

Is it too early to be adding this kind of complexity to the string representations, and should this be delayed until later?

Also, this kind of 'optimization' brings up the classic speed vs. memory tradeoff. I'm curious whether there is some guideline that parrot development should follow in a dilemma like this. Or will it always be about choosing the 'reasonable' tradeoff?

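To make the store_transcode idea a bit more concrete, here is a rough sketch in C. Everything in it is made up for illustration: cached_string, enc_buf, cs_new_native, cs_compare, cs_free and the transcode_calls counter are hypothetical names, not Parrot's actual STRING layout or API, and it assumes the 'native' encoding is one byte per character so that transcoding to utf32 is just a widening copy. It only shows the shape of the thing: each string owns one buffer per encoding, transcodes at most once, and frees every cached form when the string itself is freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types: NOT Parrot's real STRING struct. */
typedef enum { ENC_NATIVE, ENC_UTF32, ENC_MAX } enc_t;

typedef struct {
    void  *buf;  /* encoded data, or NULL if this form is not cached yet */
    size_t len;  /* length in bytes */
} enc_buf;

typedef struct {
    enc_t   main_enc;        /* the 'main' format, used for printing etc. */
    enc_buf forms[ENC_MAX];  /* forms[main_enc] is always populated */
} cached_string;

int transcode_calls = 0;     /* counts real transcodes, for illustration */

/* Build a string whose main format is the (assumed single-byte) native one. */
cached_string *cs_new_native(const char *text) {
    cached_string *s = calloc(1, sizeof *s);
    if (!s) return NULL;
    size_t n = strlen(text);
    void *buf = malloc(n ? n : 1);
    if (!buf) { free(s); return NULL; }
    memcpy(buf, text, n);
    s->main_enc = ENC_NATIVE;
    s->forms[ENC_NATIVE].buf = buf;
    s->forms[ENC_NATIVE].len = n;
    return s;
}

/* Return the requested form, transcoding and caching it on first use.
 * Only native -> utf32 is implemented in this sketch. */
const enc_buf *store_transcode(cached_string *s, enc_t enc) {
    assert(enc == ENC_NATIVE || enc == ENC_UTF32);
    if (s->forms[enc].buf) return &s->forms[enc];      /* already cached */
    if (s->main_enc != ENC_NATIVE || enc != ENC_UTF32) return NULL;

    const unsigned char *src = s->forms[ENC_NATIVE].buf;
    size_t n = s->forms[ENC_NATIVE].len;
    uint32_t *dst = malloc((n ? n : 1) * sizeof *dst);
    if (!dst) return NULL;
    for (size_t i = 0; i < n; i++) dst[i] = src[i];    /* widen each byte */

    s->forms[enc].buf = dst;
    s->forms[enc].len = n * sizeof *dst;
    transcode_calls++;
    return &s->forms[enc];
}

/* Compare via the cached utf32 forms; returns <0, 0 or >0. */
int cs_compare(cached_string *a, cached_string *b) {
    const enc_buf *x = store_transcode(a, ENC_UTF32);
    const enc_buf *y = store_transcode(b, ENC_UTF32);
    assert(x && y);
    const uint32_t *px = x->buf, *py = y->buf;
    size_t nx = x->len / sizeof *px, ny = y->len / sizeof *py;
    size_t n = nx < ny ? nx : ny;
    for (size_t i = 0; i < n; i++)
        if (px[i] != py[i]) return px[i] < py[i] ? -1 : 1;
    return (nx > ny) - (nx < ny);
}

/* The string owns every cached form, so nothing leaks after a compare. */
void cs_free(cached_string *s) {
    if (!s) return;
    for (int e = 0; e < ENC_MAX; e++) free(s->forms[e].buf);
    free(s);
}
```

With something like this, comparing the same native string against many utf32 strings transcodes it once instead of once per comparison, at the cost of keeping both copies around for the lifetime of the string. That is exactly the speed vs. memory tradeoff I'm asking about.
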
Thanks,
Mike Lambert

...hoping his first post to the list doesn't make him look like too much of an idiot ;)