Re: ASCII/Unicode

Dimitrie O. Paun Thu, 27 Apr 2000 11:32:43 -0700
From: "Patrik Stridvall" <[EMAIL PROTECTED]>

> [I, forwarded it to the list, since I don't believe you meant
>  it as a private mail]

You are correct, I resent the original message to the list.

> Talking about overstatement isn't hate that as well.

I hate the inconsistency. [Look, hate = don't like :) ]

> The currently solution is slow, but at least it works
> and is obviously correct.

For A->W, I (almost) agree. But for W->A (the majority of cases!),
it is both slow and incorrect.
Actually no, even for A->W it is incorrect! What do we do when we
run out of memory? We KILL the app. No questions asked. Bad.
Proper handling of OOM situations is almost
impossible in the current scheme.
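
To make the OOM point concrete, here is a minimal sketch of an A->W conversion that reports failure to its caller instead of killing the app. The helper name dup_a_to_w is hypothetical (it is not an existing Wine function), and it uses the standard C mbstowcs rather than any Wine-internal routine:

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not a Wine API): convert a multibyte ("A") string
 * to a newly allocated wide ("W") string.
 * Returns NULL on bad input or allocation failure, so the caller can
 * return an error code instead of the process being terminated. */
wchar_t *dup_a_to_w(const char *src)
{
    size_t n;
    wchar_t *dst;

    if (src == NULL)
        return NULL;

    /* Each wide character consumes at least one byte of input, so
     * strlen(src) + 1 wide characters is always enough room. */
    n = strlen(src) + 1;
    dst = malloc(n * sizeof *dst);
    if (dst == NULL)
        return NULL;            /* out of memory: report it, do not abort */

    if (mbstowcs(dst, src, n) == (size_t)-1) {
        free(dst);              /* invalid multibyte sequence */
        return NULL;
    }
    return dst;
}
```

A caller would check the result for NULL and propagate an error of its own; the caller owns the returned buffer and releases it with free().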

> > That being said,
> > I think it is an overstatement to say that it sucks on
> > readability. Really.
>
> OK, sucks was a bit strong, sorry.
> What I mean is that the the extra int makes the code less readable.

I will address this shortly. In another message.

> > Debugging can be a bit more problematic,
> > I agree, but support for this sort of thing can easily be built
> > into the debugger, so you will not see much difference.
>
> Which debugger? Ultimately we want to run gdb as often as possible,
> not the internal debugger.

True. It is hard to teach all debuggers, but for some things, you need to.
This is very similar to the C++ case: you want proper C++ support, you
need to teach your tools about C++. Tough.

[...]

> In any case it still avoid the A->W->U transition
> that you don't like, in all cases except when
> one API does A->W and stores it internally,
> and a second API does W->U  on the internally
> stored string but that is the problem with your
> solution as well, unless you store the encoding
> format internally.

How many times do we store strings internally? There are two major cases,
the way I see it:

1. We do not store references to the string internally.
I think these are the majority of functions, and we can separate them into
two subcategories:

a. We are given a string to do something with it
(open a file, draw it on the screen, etc.). The vast majority of these
functions just end up passing the string on to a Unix function that does
the real work. In this case (the most common code paths), we can save a
lot on useless conversions.

b. We are given a string to process and return the result
(such as strlen, strcmp, etc., which are purely functional).
All of these functions must be aware of the encoding used, and we
must have them in any case. This part is largely unaffected by either
scheme of dealing with Unicode.


2. We do store references to the string internally.
How many functions fall into this category? Anyway, we can
differentiate two further subcases here:

a. The app does not have access to the strings. In this case, we can
trivially remember the encoding. No problem.

b. The app will have access to the string. We have again two strategies:

   i. [eager] store the string in an internal format (say, W)
  ii. [lazy] remember the encoding and do the conversion on first access
      (I like this one)
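
The lazy strategy in 2.b.ii could look roughly like the sketch below. All names here (struct stored_str, stored_str_init_a, stored_str_get_w, stored_str_free) are hypothetical and only illustrate the idea; only the A-side constructor is shown, and the conversion uses the standard C mbstowcs as a stand-in for whatever conversion routine would really be used:

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical names throughout; this is an illustration, not Wine code. */
enum str_encoding { ENC_A, ENC_W };

struct stored_str {
    enum str_encoding enc;  /* encoding the caller originally supplied */
    char *a;                /* narrow copy, or NULL if not available */
    wchar_t *w;             /* wide copy, or NULL if not yet converted */
};

/* Store an A string as-is and remember its encoding; no conversion yet.
 * Returns 0 on success, -1 on allocation failure. */
int stored_str_init_a(struct stored_str *s, const char *src)
{
    size_t n = strlen(src) + 1;

    s->enc = ENC_A;
    s->w = NULL;
    s->a = malloc(n);
    if (s->a == NULL)
        return -1;
    memcpy(s->a, src, n);
    return 0;
}

/* Return the W form, converting on first access and caching the result.
 * Returns NULL if the conversion or the allocation fails. */
const wchar_t *stored_str_get_w(struct stored_str *s)
{
    if (s->w == NULL && s->enc == ENC_A) {
        size_t n = strlen(s->a) + 1;    /* upper bound on wide chars needed */
        wchar_t *w = malloc(n * sizeof *w);

        if (w == NULL)
            return NULL;
        if (mbstowcs(w, s->a, n) == (size_t)-1) {
            free(w);
            return NULL;
        }
        s->w = w;                       /* cache for later accesses */
    }
    return s->w;
}

/* Release both copies. */
void stored_str_free(struct stored_str *s)
{
    free(s->a);
    free(s->w);
    s->a = NULL;
    s->w = NULL;
}
```

If the app never asks for the W form, the conversion never happens; if it asks repeatedly, the conversion happens once and the cached pointer is returned afterwards.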


But, tell me, how many functions of type 2.b do we have?

[...]
> All applications running under the same Wine server will
> share the same internal format, so that is not a problem.
> Windows also does it that way.

So what? What I am saying is that in your solution, if we
run two apps, one A and one W, we load the lib twice in memory.
Bad.

> > Actually, I think it will be slower.
> AFAIU my solution is theoretically optimal, speedwise.

From one perspective. But in terms of memory usage, disk usage, and
cache usage, it is not.

[...]

> Personally I don't really like your solution,
> I have a gut feeling that we might regret it
> in the future, and no I can't really can't
> say exactly why.

As I hinted at the beginning of the message, I will detail in a
different message why I now think my solution is "The Right Way".

This one is too long already.

--
Dimi.