Re: Codepages and locales

Alexandre Julliard Mon, 29 May 2000 12:00:03 -0700
Bertho Stultiens <[EMAIL PROTECTED]> writes:

> I am currently implementing the Wine Message Compiler (wmc: as an
> alternative to mc.exe). I need to implement quite a bit of Unicode support
> for it to function correctly. Unicode requires a lot of tables for
> conversion and I did just that for nearly all codepages (from
> ftp.unicode.org).

I have been working on that with Dmitry Timoshkov, so it sounds like
we may be doing duplicate work here...

> The complex DBCS codepages (cp932..cp950; include leadbytes for
> extension) generate tables which are approx 128kB in size (total about
> 500kB). More than 80% of the program is occupied by the tables. The
> tables are, strictly seen, also required for wrc and wine{lib}. Wouldn't
> it be better to make the codepages "loadable"? This would save quite a
> bit of static (r/o) data in the executable and make it possible to
> share. If so, how should the file/data format and API interface for the
> tables be (I do not directly mean multibytetowidechar and friends, but
> they should be considered too)? I would prefer to encapsulate the data
> into ELF shared libraries so that we can take advantage of the .rodata
> mapping being OS maintained. Otherwise, you either need to write into
> the tables for indirection, or create extra tables at runtime on which
> you can build indirection tables later.

I'd prefer to have all of that inside libwine.so, and use the standard
APIs to access them; now of course if we need the message compiler to
build libwine we have a problem. It may be possible to link a
temporary message compiler with only the unicode objects, and then
build the final version once libwine.so is built.
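
To make the indirection question above concrete, here is a minimal
sketch of a DBCS-to-Unicode lookup that uses offsets instead of
pointers, so the whole table can stay in read-only data with nothing
written into it at load time. The layout and the names (demo_table,
demo_is_lead, dbcs_to_wide) are assumptions made for illustration, not
the format wmc or libwine actually uses, and the table holds only a
handful of cp932 entries rather than a full codepage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: entries 0..255 are indexed by the first byte.
 * For a single-byte character the entry is the Unicode value; for a
 * lead byte it is the offset of a 256-entry trail block in the same
 * array. Unmapped entries are 0 in this sketch. */
static const uint16_t demo_table[256 * 3] = {
    [0x41]         = 0x0041, /* 'A' */
    [0xB1]         = 0xFF71, /* halfwidth katakana A (single byte) */
    [0x82]         = 256,    /* lead byte: trail block at offset 256 */
    [0x88]         = 512,    /* lead byte: trail block at offset 512 */
    [256 + 0xA0]   = 0x3042, /* cp932 0x82 0xA0: hiragana A */
    [512 + 0x9F]   = 0x4E9C, /* cp932 0x88 0x9F: first JIS kanji */
};

/* Marks which first bytes are lead bytes in this sample. */
static const uint8_t demo_is_lead[256] = {
    [0x82] = 1,
    [0x88] = 1,
};

/* Converts srclen bytes into at most dstlen UTF-16 code units and
 * returns the number written. A lead byte with no trail byte at the
 * end of the input stops the conversion. */
size_t dbcs_to_wide(const unsigned char *src, size_t srclen,
                    uint16_t *dst, size_t dstlen)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (srclen > 0 && n < dstlen) {
        unsigned char b = *src++;
        srclen--;
        if (demo_is_lead[b]) {
            if (srclen == 0)
                break; /* truncated double-byte sequence */
            dst[n++] = demo_table[demo_table[b] + *src++];
            srclen--;
        } else {
            dst[n++] = demo_table[b];
        }
    }
    return n;
}
```

Because the array contains no addresses, the same bytes could be
linked into a shared library or read from a file without any fixups;
whether that is the right trade-off here is a separate question.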

> Then there is the issue of ToUpper/ToLower, strcoll, etc... Should we
> rely on the collate-info from libc and system language setting, or build
> that into wine as well (also with the glibc bug in mind)?

I don't think we want to rely on libc at this point; maybe in a few
years most systems will have proper Unicode support in libc, but this
is not the case yet.
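
As a rough illustration of case mapping that does not depend on the
libc locale, a built-in table of ranges is one possible approach. This
is only a sketch: the structure, the function name sketch_towupper and
the choice of ranges are assumptions for the example, it covers just a
few Latin and Cyrillic ranges, and it says nothing about collation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range entry: code points first..last map to upper case
 * by adding delta. */
struct case_range {
    uint16_t first;
    uint16_t last;
    int32_t  delta;
};

/* A small sample of lower-to-upper mappings, not a complete table. */
static const struct case_range upper_ranges[] = {
    { 0x0061, 0x007A, -0x20 }, /* Basic Latin a..z */
    { 0x00E0, 0x00F6, -0x20 }, /* Latin-1 letters before U+00F7 */
    { 0x00F8, 0x00FE, -0x20 }, /* Latin-1 letters after U+00F7 */
    { 0x00FF, 0x00FF,  0x79 }, /* y with diaeresis -> U+0178 */
    { 0x0430, 0x044F, -0x20 }, /* Cyrillic U+0430..U+044F */
    { 0x0450, 0x045F, -0x50 }, /* Cyrillic U+0450..U+045F */
};

/* Returns the upper-case form of ch if one of the sample ranges
 * covers it, otherwise returns ch unchanged. */
uint16_t sketch_towupper(uint16_t ch)
{
    size_t i;

    for (i = 0; i < sizeof upper_ranges / sizeof upper_ranges[0]; i++) {
        if (ch >= upper_ranges[i].first && ch <= upper_ranges[i].last)
            return (uint16_t)(ch + upper_ranges[i].delta);
    }
    return ch;
}
```

The result is the same whatever LANG or LC_ALL is set to, which is the
point of keeping the data inside Wine; a real table would need far
more ranges and a faster lookup than a linear scan.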

> Another thing that I noticed was that a lot of the NLS data (ole/nls/*)
> is plain wrong. I need the language/codepage identification for wmc. A
> lot of different countries have exactly the same language-id, which is
> not possible in real life... Are these files used? Is anybody
> maintaining them? Are there plans for changing the
> location/content/interpretation? And, shouldn't they also be runtime
> loadable, instead of occupying memory?

They could do with some amount of bug fixing, yes, and maybe with a
more efficient storage structure. I don't think they need to be
runtime loadable; demand-loading does that for us.

-- 
Alexandre Julliard
[EMAIL PROTECTED]