From: "Patrik Stridvall" <[EMAIL PROTECTED]>

> Please note that there are some cases in which the encoding is
> determined by the user specified OS version.

Yeah, these are trivial to deal with, once we can mark the encoding.

> > Now, we can do two things:
> >   1. [eager] convert at the entry point into one common format,
> >      and carry on internally with that format
>
> Note that many APIs require that we store the string for later
> retrieval, so we need a common format to store it in anyway.
>
> Of course we could store the encoding format as well...

That's right, if they are internal. If the user can access them directly, we
are bound by what the user expects. In any case, these are a small part, so
I don't care one way or the other.
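
For whatever we do store internally, "store the encoding format as well"
could be as simple as keeping a tag next to the bytes (just a sketch, the
names are made up):

typedef struct
{
    int   enc;   /* encoding tag, e.g. ENC_ASCII, ENC_UTF8, ... */
    char *str;   /* the bytes, in that encoding */
} tagged_str;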

> >   2. [lazy] remember the encoding that the strings are in,
> > and pass that
> >       around until we actually need a specific encoding

[...]

> > Moreover, the thing is scalable
> > -- if another
> > encoding comes along, we could easily support it.

[...]


> If we design a general enough solution, yes.
> However, I think that such a solution is too
> inefficient. I think the only way to get it
> fast enough is to limit it so it knows how
> the different formats relate.

I beg to differ. What I mean is that if we mark the encoding
with an int, 32 bits is big enough to hold any foreseeable
number of encodings. Now, this should not affect (significantly)
our performance. What will happen is that we will carry around
the encoding until we have to transform it to a specific one
(say, if it is a filename, or we need to print it, it will be UTF-8),
so we will have functions of the form:


LPSTR  HEAP_strdupXtoUTF8 (int enc, LPSTR str);
LPWSTR HEAP_strdupXtoUTF16(int enc, LPSTR str);

The speed of such functions is (almost) unaffected by the number of
encodings that we support, as internally all they have is probably a
switch statement.
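
Just to make the idea concrete, here is a rough sketch of what such a
function could look like inside (the encoding tags and the Latin-1 helper
are made-up names, and I use plain char pointers instead of LPSTR to keep
the sketch standalone):

#include <stdlib.h>
#include <string.h>

enum { ENC_ASCII, ENC_UTF8, ENC_LATIN1 /* , ... */ };

/* Latin-1 -> UTF-8: each byte becomes one or two UTF-8 bytes */
static char *latin1_to_utf8(const char *str)
{
    size_t i, j, len = strlen(str);
    char *out = malloc(2 * len + 1);   /* worst case: every byte doubles */
    if (!out) return NULL;
    for (i = j = 0; i < len; i++)
    {
        unsigned char c = (unsigned char)str[i];
        if (c < 0x80) out[j++] = c;
        else
        {
            out[j++] = 0xC0 | (c >> 6);
            out[j++] = 0x80 | (c & 0x3F);
        }
    }
    out[j] = 0;
    return out;
}

char *HEAP_strdupXtoUTF8(int enc, const char *str)
{
    switch (enc)
    {
    case ENC_ASCII:    /* ASCII is a subset of UTF-8 ...     */
    case ENC_UTF8:     /* ... so a plain copy is good enough */
        return strdup(str);
    case ENC_LATIN1:
        return latin1_to_utf8(str);
    /* supporting a new encoding means adding one more case here */
    default:
        return NULL;
    }
}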


> > And, on top of it all, it should be more efficient.
>
> More efficient for the _theoretical_ average case perhaps,
> but definitely not for the common case, which is ASCII and
> will likely remain so for the foreseeable future.
>
> Being lazy penalizes all cases equally, but all cases are
> not equally likely in the real world.

No, it doesn't. In fact, if the input is straight ASCII, we need not
worry, because in most cases we deal with UTF-8, which is
compatible with ASCII. So for the common case, we are as
fast as we can be (ignoring the very small overhead of carrying
the encoding around).
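
On the caller side the laziness looks like this (continuing the sketch
from above; here puts() stands in for whatever boundary actually needs a
fixed encoding):

#include <stdio.h>
#include <stdlib.h>

char *HEAP_strdupXtoUTF8(int enc, const char *str);  /* from the sketch above */

/* carry (enc, str) around unchanged; convert only at the point where a
 * specific encoding is actually required, e.g. when printing */
void print_string(int enc, const char *str)
{
    char *utf8 = HEAP_strdupXtoUTF8(enc, str);
    if (!utf8) return;
    puts(utf8);               /* the terminal wants UTF-8 */
    free(utf8);
}

For an ASCII caller the conversion reduces to a strdup, so the common
case pays nothing beyond passing one extra int around.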

> The eager solution has the problem that either
>
> 1. We choose ASCII as the common format, and then
> UNICODE becomes largely useless, even though
> many UNICODE applications will work.

Well, as I said, it is a lot of work & clutter for close to no gain.
It is a waste of time, IMO.

> 2. We choose UNICODE as the common format, and then
> ASCII, the common case in the real world, is penalized.

Well, it is severely penalized, which is why I don't like it.

> My preferred solution is to have a common
> C file and compile it several times with
> different defines for each format that needs to be supported.

Hmm, I don't like this either -- I agree with Alexandre on this one.
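
For reference, I take it you mean roughly something like the following
(a made-up sketch, CHAR_T/FUNC/my_strlen are only for illustration): one
generic source file compiled once per format with a different define.

#include <stddef.h>

#ifdef BUILD_UNICODE
typedef unsigned short CHAR_T;   /* standing in for WCHAR */
# define FUNC(name) name##W
#else
typedef char CHAR_T;
# define FUNC(name) name##A
#endif

/* one generic implementation, built twice by the makefile:
 * once plain and once with -DBUILD_UNICODE */
size_t FUNC(my_strlen)(const CHAR_T *s)
{
    size_t n = 0;
    while (s[n]) n++;
    return n;
}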

--
Dimi.

