Re: ASCII/Unicode

Marcus Meissner Wed, 26 Apr 2000 11:44:48 -0700
> Yes, it is easy, but there is a lot of effort for little gain -- in the end,
> we only really support ASCII, and on top of it, we do it in a slow and
> inefficient manner (let alone ugly).

No, we don't. What we do is slow. And inefficient. Yes.

But it is easy to handle and to debug.

And since we are still in ALPHA state and those W<->A conversions are one
of the smaller bottlenecks, I think "easy to handle and to debug" counts more
than a bit more speed and efficiency.

> > But there is nothing stopping a slow migration from W->A conversions into
> > A->W conversions everywhere.
> 
> Somehow, I don't think working with W is the right thing to do in Unix.

glibc-2.2 will have a nearly complete set of wide character functions, and
the rest of the Linux world is going the Unicode way too. Qt2 has it, KDE2
will have it.

> We have the following situation: we receive strings as arguments; their
> encoding is not explicit with every string, but rather is implicit by the
> entry point. Now, we can do two things:
>   1. [eager] convert at the entry point into one common format, and carry on
>       internally with that one format

Yes.

>   2. [lazy] remember the encoding that the strings are in, and pass that
>       around until we actually need a specific encoding

This is bad from a programming point of view. It adds bloat, and you end up
with two possible code paths and duplicated code. You can't test both cases
easily.
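
To make that concrete, here is a rough sketch of what carrying the encoding
around could look like. All names (str_encoding, tagged_str, tagged_len) are
made up for illustration and are not taken from the Wine sources:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical tagged string: the encoding travels with the pointer. */
enum str_encoding { ENC_A, ENC_W };

struct tagged_str {
    enum str_encoding enc;
    union {
        const char *a;      /* valid when enc == ENC_A */
        const wchar_t *w;   /* valid when enc == ENC_W */
    } u;
};

/* Every consumer has to branch on the tag, so every consumer has
 * two paths that need to be written and tested. */
size_t tagged_len(const struct tagged_str *s)
{
    assert(s != NULL);
    if (s->enc == ENC_W)
        return wcslen(s->u.w);
    return strlen(s->u.a);
}
```

Even this trivial helper needs an if(), and each further function that
touches the string would need its own.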

With the first approach we have one code path (plus a bit of additional
framework for the conversion). We can test that path with either a W or an
A call.
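
Roughly like the following sketch. The names (path_has_extension_a/_w,
dup_a_to_w) are hypothetical, the widening helper assumes plain ASCII input,
and I picked the A->W direction only for illustration; the same shape works
the other way round:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* The single real implementation: works on wide strings only.
 * Returns 1 if the last path component has a non-empty extension. */
int path_has_extension_w(const wchar_t *path)
{
    assert(path != NULL);
    const wchar_t *dot = wcsrchr(path, L'.');
    const wchar_t *sep = wcsrchr(path, L'\\');
    return dot != NULL && (sep == NULL || dot > sep) && dot[1] != L'\0';
}

/* Widen an 8-bit string (assumed to be plain ASCII). Caller frees. */
wchar_t *dup_a_to_w(const char *src)
{
    size_t len = strlen(src);
    wchar_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;
    for (size_t i = 0; i <= len; i++)   /* copies the terminator too */
        dst[i] = (unsigned char)src[i];
    return dst;
}

/* The A entry point is always the same wrapper: convert, call, free.
 * Returns -1 if the conversion buffer could not be allocated. */
int path_has_extension_a(const char *path)
{
    wchar_t *wide = dup_a_to_w(path);
    if (wide == NULL)
        return -1;
    int ret = path_has_extension_w(wide);
    free(wide);
    return ret;
}
```

Calling either entry point exercises the same implementation, so one test
covers the logic for both.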
 
> > But ... I think we all would appreciate it, if the code would stay readable
> > and not get lots of if()s and the like.
> 
> None of the previously suggested strategies adds if() or much clutter.

I don't see how you want to avoid them. Just take "GetFileAttributes" as an
example, including the whole DOS->UNIX path conversion layer below it. How
do you want to avoid if()s there?

> Both clutter the code in their own way:
>  1. we have the ugly HEAP_strdup/HeapFree things all over the place

Yup, but otherwise the code is not cluttered. Just a wrapper which is always
the same and has no bugs.

And most of the low-level API is still 8-bit.

Ciao, Marcus