> Question about the following code.
>
> +INTVAL
> +string_compare(STRING* s1, STRING* s2) {
> +    if (s1->encoding != s2->encoding) {
> +        if (s1->encoding->which != enc_utf32) {
> +            s1 = Parrot_transcode_table[s1->encoding->which][enc_utf32](s1, NULL);
> +        }
> +        if (s2->encoding->which != enc_utf32) {
> +            s2 = Parrot_transcode_table[s2->encoding->which][enc_utf32](s2, NULL);
> +        }
> +    }
> +
> +    return (ENC_VTABLE(s1)->compare)(s1, s2);
> +}
>

Am I missing something here, or does this code fail to free the transcoded
copies of s1 and s2 after it's done comparing them?

The second thing I wanted to ask about: I'm not sure if this is premature
optimization, but Perl did it, so it may be worth looking into. (Or may
be worth not looking into, for the same reason ;)

What about making a store_transcode( string, encodingtype ) function which
takes a string and stores the encodingtype version inside it? Strings would
then be able to store multiple versions of themselves, in utf32, utf8, etc.
The original format would still be remembered as the 'main' format,
of course, for all printing and so on. But for internal parrot string ops,
one could use these alternate representations to avoid re-transcoding
strings.

If we compare a native string against many unicode strings, there should be
a performance increase. (Tests would be required to validate this, of
course.) This shouldn't blow our cache either, since we'd only be following
the pointer to the one string type we are using. Of course, comparing a
native string against many different native AND utf32 strings would result
in both representations being loaded into memory. But then we'd be using
about as much cache memory as if we had loaded the native one and converted
it, if I understand things correctly. Doing this with large strings could
make memory/cache usage go to hell if we store two copies: the megabyte
native string and the multimegabyte utf32 string. But then again, I can't
imagine converting it to utf32 at every comparison will be very fast
either. :)

Is it too early to be adding this kind of complexity to the string
representations, and should this be delayed until later?

Also, this kind of 'optimization' brings up the classic speed vs.
memory tradeoff. I'm curious whether there is some guideline parrot
development should be heading towards in a dilemma like this, or will it
always be about choosing the 'reasonable' tradeoff?

Thanks,
Mike Lambert
...hoping his first post to the list doesn't make him look like too much of
an idiot ;)
