> We have the following situation: we receive strings as arguments; their
> encoding is not explicit with every string, but rather is implicit by the
> entry point.

Please note that there are some cases in which the encoding is
determined by the user-specified OS version.

I'm not saying that they are a problem, but please note it.

> Now, we can do two things:
>   1. [eager] convert at the entry point to one common format,
>      and carry on internally with that format

Note that many APIs require that we store the string for later
retrieval, so we need a common format to store it in anyway.

Of course we could store the encoding format as well...

>   2. [lazy] remember the encoding that the strings are in, and pass
>      that around until we actually need a specific encoding
> 
> > But ... I think we all would appreciate it if the code would stay
> > readable and not get lots of if()s and the like.
>
> None of the previously suggested strategies adds if()s or much clutter.
> Both clutter the code in their own way:
>  1. we have the ugly HEAP_strdup/HeapFree things all over the place
>  2. we (may) need to pass an extra parameter around.
> 
> Note that with 2, we may not need to carry an extra parameter around --
> we can save the current encoding for the thread when we enter/exit wine,
> but that would be tricky -- not sure that it's worth it.

I don't think so either. Think about reentrancy problems.

> Carrying an extra parameter is not much clutter -- it is akin to a this
> pointer.

Indeed.
 
> Anyway, I like 2 better than 1. Not committing to an encoding early in
> the game is good -- sometimes we need UTF8 (filesystems, X), in other
> cases we need UTF16 (pure Win stuff).

True.

> Moreover, the thing is scalable -- if another encoding comes along, we
> could easily support it.

If we design a general enough solution, yes.
However, I think that such a solution is too
inefficient. I think the only way to make it
fast enough is to limit it so that it knows how
the different formats relate.

> And, on top of it all, it should be more efficient.

More efficient for the _theoretical_ average case perhaps,
but definitely not for the common case, which is ASCII and
will likely remain so for the foreseeable future.

Being lazy penalizes all cases equally, but all cases are
not equally likely in the real world.

I'm not saying I am against the lazy solution;
I like it from a theoretical point of view.

The eager solution has the problem that either

1. We choose ASCII as the common format, and then
UNICODE becomes largely useless, even though
many UNICODE applications will work.
2. We choose UNICODE as the common format, and then
the real-world common case, ASCII, is penalized.

My preferred solution is to have a common
C file and compile it several times with different
defines, once for each format that needs to be supported.

You still have to choose which format you store
the internal strings in, but that can be made
either a compile-time option, or handled by compiling
everything one more time for each variant
you wish to support.

Of course, for speed reasons, you need some
sort of pseudo allocation/conversion macros that are
defined to the identity for the matching case, but
that is just a technical detail.

Does it sound horrible? With, say, 3 different formats,
some parts could be compiled 9 (3*3) times.

Well, it is not that bad, since only the currently
needed variants are loaded in memory at the same time.
As for disk space, it is cheap; in the cases where it is
not, like embedded systems, we can easily have a
compile option for supporting only ASCII with ASCII
internals.

Of course, Alexandre has already expressed his dislike for
this approach, but that doesn't change the fact that I
still consider it the least bad alternative.
