Hello everybody,
Looking over the code (while trying to replace as many of our internal
APIs as possible with standard Win32 APIs), I realized that our handling of
ASCII/Unicode functions is less than satisfactory. Thinking about it, I
realized we are way too eager to pin down the encoding that is being used.
What I mean is that we usually do:
Case A
---------
fooA(LPSTR aa)
{
    LPWSTR wa;
    wa = HEAP_strdupAtoW( GetProcessHeap(), 0, aa );
    fooW(wa);
    HeapFree( GetProcessHeap(), 0, wa );
}
Case B
---------
fooW(LPWSTR wa)
{
    LPSTR aa;
    aa = HEAP_strdupWtoA( GetProcessHeap(), 0, wa );
    fooA(aa);
    HeapFree( GetProcessHeap(), 0, aa );
}
Both of these are slow, inefficient, and ugly as heck. Moreover, Case B is
not even correct for arbitrary Unicode input: the W->A conversion throws
away every character that the ANSI code page cannot represent.
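To make the Case B problem concrete, here is a minimal sketch (assuming an
ANSI code page such as CP1252) showing the conversion silently substituting
a default character:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* U+0100 (LATIN CAPITAL LETTER A WITH MACRON) does not exist in CP1252 */
    WCHAR wide[] = { 0x0100, 0 };
    char  narrow[8];
    BOOL  lost = FALSE;

    WideCharToMultiByte( CP_ACP, 0, wide, -1, narrow, sizeof(narrow),
                         NULL, &lost );
    printf( "used default char: %d\n", lost );  /* prints 1: data was lost */
    return 0;
}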
I think we should be a lot more agnostic about the string encoding. What I
mean is that we should have only one function that works with all encodings,
taking as its first argument the encoding used by the string arguments
passed along with it:
fooX(int enc, LPSTR xa)
{
    /* do the work independent of the encoding used by xa */
}
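Fleshed out a tiny bit (just a sketch, of course; ENCODING_ASCII,
ENCODING_UNICODE, and fooX are names I made up), the point is that the
encoding matters only at the few places where the string is actually
inspected:

#include <windows.h>

#define ENCODING_ASCII   0
#define ENCODING_UNICODE 1

void fooX( int enc, LPVOID xa )
{
    /* the one encoding-dependent step: measure the string */
    int len = (enc == ENCODING_UNICODE) ? lstrlenW( xa ) : lstrlenA( xa );

    /* ... everything else is shared, and no conversion is needed ... */
}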
and so, the A and W versions would just be (conceptually):
fooA(LPSTR a)
{ fooX(ENCODING_ASCII, a); }
fooW(LPWSTR w)
{ fooX(ENCODING_UNICODE, (LPSTR)w); }
these, of course, can be generated automatically, and can be optimized as:
fooA:
    push #ENCODING_ASCII
    jmp  fooX
fooW:
    push #ENCODING_UNICODE
    jmp  fooX
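For the "generated automatically" part, a preprocessor macro would be
enough in C (again, just a sketch, not existing infrastructure; it assumes
the fooX and ENCODING_* definitions from the sketch above):

#include <windows.h>

void fooX( int enc, LPVOID xa );  /* the shared worker sketched above */

#define DEFINE_AW_PAIR(name) \
    void name##A( LPSTR s )  { name##X( ENCODING_ASCII, s ); } \
    void name##W( LPWSTR s ) { name##X( ENCODING_UNICODE, s ); }

DEFINE_AW_PAIR(foo)  /* expands to fooA and fooW, both forwarding to fooX */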
This would considerably clean up the source, and provide correct support for
Unicode with decent performance.
What I am saying is: why be eager about converting things to some encoding
when we don't really know what encoding we actually need? Sometimes we may
need UTF-8, sometimes UTF-16, or whatever.
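For instance, when a piece of code really does need UTF-8, it can convert
right at the point of use with the standard two-call pattern (a sketch;
dup_utf8 is a hypothetical helper, not an existing function):

#include <windows.h>

char *dup_utf8( LPCWSTR w )
{
    /* first call computes the required buffer size, second call converts */
    int   len = WideCharToMultiByte( CP_UTF8, 0, w, -1, NULL, 0, NULL, NULL );
    char *buf = HeapAlloc( GetProcessHeap(), 0, len );
    if (buf) WideCharToMultiByte( CP_UTF8, 0, w, -1, buf, len, NULL, NULL );
    return buf;
}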
What do you think?
--
Dimi.