Re: ASCII/Unicode

Dimitrie O. Paun Wed, 26 Apr 2000 12:30:22 -0700
> > Yes, it is easy, but there is a lot of effort for little gain -- in the
> > end, we only really support ASCII, and on top of it, we do it in a slow
> > and inefficient manner (let alone ugly).
>
> No, we don't. What we do is slow. And inefficient. Yes.
>
> But it is easy to handle and to debug.

You have a point there. But mind you, we can still keep the W->A conversions
for now to help in debugging, so it is not a huge problem.

> And since we are still in ALPHA state and those W<->A conversions are one
> of the smaller bottlenecks I think "easy to handle and to debug" counts
> more than a bit more speed and efficiency.

Point taken.

> > > But there is nothing stopping a slow migration from W->A conversions
> > > into A->W conversions everywhere.
> >
> > Somehow, I don't think working with W is the right thing to do in Unix.
>
> glibc-2.2 will have a nearly complete set of wide character functions,
> the whole other Linux stuff is going the UNICODE way too. QT2 has it, KDE2
> will have it.

Hey, Unicode is great! I am not arguing against it. The question is, which
_encoding_ do we use? I think that in the Unix/Linux world, we are better
off with UTF8 rather than UTF16. That was my argument.

> > We have the following situation: we receive strings as arguments; their
> > encoding is not explicit with every string, but rather is implicit by
> > the entry point. Now, we can do two things:
> >   1. [eager] convert at the entry point into one common format, and
> >       carry on internally with that format
>
> Yes.
>
> >   2. [lazy] remember the encoding that the strings are in, and pass that
> >       around until we actually need a specific encoding
>
> This is bad from a programming point of view. This adds bloat and you
> have two possible code paths and duplication. You can't test both cases
> easily.

I don't understand. OK, passing one more integer to every function that
takes a string is a bit of a bloat, but come on -- it will fade in the
noise! This would be the last problem to worry about. As for the two code
paths, I think you misunderstood my idea.

We generally do:

fooW
  |
  V
fooA
  |
  V
barA
  |
  V
bazA
  |
  V
someUnixFunc

I say we do:

fooW      fooA
    \    /
     \  /
     fooX
       |
       V
     barX
       |
       V
     bazX
       |
       V
 someUnixFunc

The only difference is that when we call the Unix function, we do a
HEAP_strdupXtoUTF8 or whatever. Even if we are to go W all the way
(as we propose), we still need to do a HEAP_strdupWtoUTF8, since
most Unix functions (X, kernel) take UTF8 encoded strings.
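
To make the idea concrete, here is a rough sketch in C of what I mean. None
of these names exist in the tree: str_enc, strdup_x_to_utf8, foo_x, bar_x,
baz_x and some_unix_func are made up for illustration, and I am assuming
that an "A" string is plain 7-bit ASCII (so it is already valid UTF8) and
that a "W" string is NUL-terminated UTF16. A real HEAP_strdupXtoUTF8 would
have to honour the current code page and allocate from a heap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding tag passed as the first parameter of X functions. */
typedef enum { ENC_A, ENC_W } str_enc;

/* Hypothetical helper: duplicate a string of either encoding as UTF-8.
 * Assumption: ENC_A input is 7-bit ASCII; ENC_W input is UTF-16.
 * Unpaired surrogates are not rejected in this sketch. */
static char *strdup_x_to_utf8(str_enc enc, const void *str)
{
    assert(str != NULL);
    if (enc == ENC_A) {
        const char *s = str;
        char *copy = malloc(strlen(s) + 1);
        if (copy) strcpy(copy, s);
        return copy;
    }

    const uint16_t *w = str;
    size_t n = 0;
    while (w[n]) n++;

    /* Worst case is 3 bytes per UTF-16 code unit, plus the terminator. */
    char *out = malloc(3 * n + 1);
    if (!out) return NULL;

    char *p = out;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = w[i];
        /* Combine a valid surrogate pair into one code point. */
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < n &&
            w[i + 1] >= 0xDC00 && w[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (w[i + 1] - 0xDC00);
            i++;
        }
        if (c < 0x80) {
            *p++ = (char)c;
        } else if (c < 0x800) {
            *p++ = (char)(0xC0 | (c >> 6));
            *p++ = (char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            *p++ = (char)(0xE0 | (c >> 12));
            *p++ = (char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (char)(0x80 | (c & 0x3F));
        } else {
            *p++ = (char)(0xF0 | (c >> 18));
            *p++ = (char)(0x80 | ((c >> 12) & 0x3F));
            *p++ = (char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}

/* Stand-in for a Unix function that expects a UTF-8 string. */
static size_t some_unix_func(const char *utf8) { return strlen(utf8); }

/* The only place where the string is actually converted. */
static size_t baz_x(str_enc enc, const void *str)
{
    char *utf8 = strdup_x_to_utf8(enc, str);
    size_t ret = utf8 ? some_unix_func(utf8) : 0;
    free(utf8);
    return ret;
}

/* Intermediate layers only pass the tag and the pointer along. */
static size_t bar_x(str_enc enc, const void *str) { return baz_x(enc, str); }
static size_t foo_x(str_enc enc, const void *str) { return bar_x(enc, str); }

/* Both entry points are one-liners; the encoding is implied by the name. */
size_t foo_a(const char *str)     { return foo_x(ENC_A, str); }
size_t foo_w(const uint16_t *str) { return foo_x(ENC_W, str); }
```

Note that there is still only one code path below the entry points: foo_a
and foo_w both end up in the same foo_x/bar_x/baz_x chain, and the tag is
looked at in exactly one place.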

> With the first approach we have one codepath (and a bit additional
> framework for the conversion). We can just test the path with either W or
> A call.

Same here.

> > Both clutter the code in their own way:
> >  1. we have the ugly HEAP_strdup/HeapFree things all over the place
>
> Yup, but otherwise the code is not cluttered. Just a wrapper which is
> always the same and has no bugs.

The same here: all functions with the X suffix take an integer as the first
parameter. This can even be verified by winapi_check! :)

--
Dimi.