> > Yes, it is easy, but there is a lot of effort for little gain -- in the
> > end, we only really support ASCII, and on top of it, we do it in a slow
> > and inefficient manner (let alone ugly).
>
> No, we don't. What we do is slow. And inefficient. Yes.
>
> But it is easy to handle and to debug.
You have a point there. But mind you, we can still keep the W->A conversions
for now to help in debugging, so it is not a huge problem.
> And since we are still in ALPHA state and those W<->A conversions are one
> of the smaller bottlenecks, I think "easy to handle and to debug" counts
> more than a bit more speed and efficiency.
Point taken.
> > > But there is nothing stopping a slow migration from W->A conversions
> > > into A->W conversions everywhere.
> >
> > Somehow, I don't think working with W is the right thing to do in Unix.
>
> glibc-2.2 will have a nearly complete set of wide character functions,
> the whole other Linux stuff is going the UNICODE way too. QT2 has it, KDE2
> will have it.
Hey, Unicode is great! I am not arguing against it. The question is, which
_encoding_ do we use? I think that in the Unix/Linux world, we are better
off with UTF8 rather than UTF16. That was my argument.
> > We have the following situation: we receive strings as arguments; their
> > encoding is not explicit with every string, but rather is implicit by
> > the entry point. Now, we can do two things:
> > 1. [eager] convert at the entry point into one common format, and carry
> > on internally with that format
>
> Yes.
>
> > 2. [lazy] remember the encoding that the strings are in, and pass that
> > around until we actually need a specific encoding
>
> This is bad from a programming point of view. This adds bloat and you
> have two possible code paths and duplication. You can't test both cases
> easily.
I don't understand. OK, passing one more integer to every function that
takes a string is a bit of bloat, but come on -- it will fade in the noise!
This would be the last problem to worry about. As for the two code paths,
I think you misunderstood my idea.
We generally do:

    fooW
     |
     V
    fooA
     |
     V
    barA
     |
     V
    bazA
     |
     V
    someUnixFunc

I say we do:

    fooW    fooA
       \    /
        \  /
        fooX
         |
         V
        barX
         |
         V
        bazX
         |
         V
    someUnixFunc

The only difference is that when we call the Unix function, we do a
HEAP_strdupXtoUTF8 or whatever. Even if we are to go W all the way
(as we propose), we still need to do a HEAP_strdupWtoUTF8, since
most Unix functions (X, kernel) take UTF8 encoded strings.
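
To make the conversion step concrete, here is a rough sketch of what such a
helper could look like. Everything in it is made up for illustration:
strdup_x_to_utf8, ENC_A and ENC_W are hypothetical names (not existing Wine
helpers), it uses plain malloc instead of the process heap, and it assumes
the A side is Latin-1, which is a simplification.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding tags; not part of Wine. */
enum { ENC_A = 0, ENC_W = 1 };

/* Write one code point as UTF-8 and return the number of bytes written. */
static size_t put_utf8(unsigned char *out, uint32_t cp)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical stand-in for HEAP_strdupXtoUTF8.
 * enc == ENC_A: str is a NUL-terminated byte string, assumed Latin-1.
 * enc == ENC_W: str is a NUL-terminated UTF-16 string.
 * Returns a malloc'ed UTF-8 copy (caller frees), or NULL on allocation
 * failure. Unpaired surrogates are not validated. */
char *strdup_x_to_utf8(int enc, const void *str)
{
    unsigned char *out;
    size_t len = 0, i, o = 0;

    if (enc == ENC_A) {
        const unsigned char *a = str;
        len = strlen((const char *)a);
        out = malloc(len * 2 + 1);      /* a Latin-1 byte needs at most 2 */
        if (!out) return NULL;
        for (i = 0; i < len; i++)
            o += put_utf8(out + o, a[i]);
    } else {
        const uint16_t *w = str;
        while (w[len]) len++;
        out = malloc(len * 3 + 1);      /* a UTF-16 unit needs at most 3 */
        if (!out) return NULL;
        for (i = 0; i < len; i++) {
            uint32_t cp = w[i];
            /* Combine a surrogate pair; w[i + 1] is at worst the NUL. */
            if (cp >= 0xD800 && cp < 0xDC00 &&
                w[i + 1] >= 0xDC00 && w[i + 1] < 0xE000) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (w[i + 1] - 0xDC00);
                i++;
            }
            o += put_utf8(out + o, cp);
        }
    }
    out[o] = '\0';
    return (char *)out;
}
```

The point is only that the tag is looked at once, right before the Unix
call, and nowhere else on the way down.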
> With the first approach we have one codepath (and a bit additional
> framework for the conversion). We can just test the path with either W
> or A call.
Same here.
> > Both clutter the code in their own way:
> > 1. we have the ugly HEAP_strdup/HeapFree things all over the place
>
> Yup, but otherwise the code is not cluttered. Just a wrapper which is
> always the same and has no bugs.
The same here: all functions with the X suffix take an integer as the first
parameter. This can even be verified by winapi_check! :)
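
Roughly, the wrappers would look like the sketch below. The names are the
placeholder ones from the diagram, the ENC_A/ENC_W tags are hypothetical,
and the body of bazX (counting code units) is only there so that the
example does something; it is not what any real function would do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding tags; not part of Wine. */
enum { ENC_A = 0, ENC_W = 1 };

/* Innermost layer: the only place that actually inspects the tag.
 * Here it just counts code units (bytes for A, 16-bit units for W). */
static size_t bazX(int enc, const void *str)
{
    size_t n = 0;

    if (enc == ENC_A)
        return strlen((const char *)str);

    const uint16_t *w = str;
    while (w[n]) n++;
    return n;
}

/* Middle layers pass the tag along untouched: one code path for both. */
static size_t barX(int enc, const void *str) { return bazX(enc, str); }
static size_t fooX(int enc, const void *str) { return barX(enc, str); }

/* Entry points: always the same one-line wrapper, differing only in tag. */
size_t fooA(const char *str)     { return fooX(ENC_A, str); }
size_t fooW(const uint16_t *str) { return fooX(ENC_W, str); }
```

So the wrapper is just as mechanical as the HEAP_strdup one, only without
the allocation.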
--
Dimi.