Re: Glibc binary locale generation for glibc does not generate UTF-8 locales

Jan Janssens Tue, 04 Jul 2006 00:04:47 -0700

Sorry to bump my own email, but even though it is a long read, I think
the subject is interesting. This could easily be classified as a bug.


Regards, Jan

On 6/27/06, Jan Janssens <[EMAIL PROTECTED]> wrote:

Hi,

I've been trying to get the glibc(-2.3.2-cvs20040726) binary locale
generation working with my ARM target and I'm very close. In fact, the
build process finishes and produces the binary locale data, using
Qemu.

My problem is this: the binary locale tree that is generated by
localedef (using qemu) does not differentiate between UTF-8 and other
encodings. Because the ".UTF-8" part of a locale is stripped[1] from
the locale name, if there is more than one encoding for a given
locale, the output is overwritten each time. This results in only the last one
going into the packages and ending up on the target. Because the UTF-8
version of each locale is generated first[2], the UTF-8 locale will
never be in any package, contrary to the message in
glibc-package.bbclass, which states "Reshuffle names so that UTF-8 is
preferred over other encodings".
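
A minimal sketch of the overwrite described above, reusing the pattern
quoted in [1]. The helper name, the locale list and the dictionary that
stands in for the output tree are made up for illustration and are not
taken from glibc-package.bbclass; the build order is assumed from [2].

```python
import re

# Same pattern as quoted from glibc-package.bbclass in [1].
dot_re = re.compile(r"(.*)\.(.*)")

def strip_encoding(locale):
    """Hypothetical helper: drop the part after the dot, as in [1]."""
    m = dot_re.match(locale)
    if m:
        locale = m.group(1)
    return locale

# Illustrative build order (assumption): UTF-8 variant first, then the
# legacy encoding of the same locale.
build_order = [
    ("zh_CN.UTF-8", "UTF-8"),
    ("zh_CN", "GB2312"),
    ("en_US.UTF-8", "UTF-8"),
    ("en_US", "ISO-8859-1"),
]

# A dict keyed by the stripped name stands in for the output directory tree.
output_tree = {}
for name, charset in build_order:
    # Both variants map to the same key, so the later one replaces the earlier.
    output_tree[strip_encoding(name)] = charset

print(output_tree)
```

With this order only the legacy encodings survive, which matches what
ends up in the packages.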

As a result, the locales for en_US and nl_NL are ISO-8859-1 and
zh_CN is GB2312. I would have expected them to be UTF-8 by default,
or at least that UTF-8 could be selected somehow.

For ISO-8859-1 locales, like en_US and nl_NL, this does not constitute
a real problem, because the gconv module will transparently translate
any UTF-8 strings into ISO-8859-1. For zh_CN (Chinese) however, this
is a real problem. I am using UTF-8 input for gettext generated
mo-files, but because the zh_CN locale-data is in GB2312, pango will
refuse to render the string (complaining it is not UTF-8).
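
To illustrate why GB2312 bytes would be rejected as UTF-8, here is a
small sketch. It assumes the translated string reaches pango already
converted to the locale's charset, which is not confirmed here; the
sample text and the helper name are arbitrary and not taken from the
actual mo-files.

```python
# Arbitrary sample string (assumption), standing in for a translated message.
text = "中文"

gb_bytes = text.encode("gb2312")   # what a GB2312 locale would produce
utf8_bytes = text.encode("utf-8")  # what a UTF-8 consumer expects

def is_valid_utf8(data):
    """Hypothetical check: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(gb_bytes))
```

The same characters give different byte sequences in the two
encodings, and the GB2312 sequence is not well-formed UTF-8.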

I can fix this problem in a number of different ways; the simplest
is to build only the UTF-8 versions of all locales. However, I think
the better solution would be to generate locales the way Ubuntu, for
instance, does: /usr/lib/locale/en_US.utf8/* or possibly just "en_US.UTF-8", so
there is a difference between en_US and en_US.UTF-8 (different dirs,
different packages).
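
A rough sketch of the naming this would give, assuming the encoding is
kept and written in the lowercase "utf8" form from the Ubuntu example.
The helper name and the locale list are hypothetical; this is not how
glibc-package.bbclass currently works.

```python
def locale_dir(locale):
    """Hypothetical helper: keep the encoding in the directory name."""
    base, dot, rest = locale.partition(".")
    if not dot:
        # No encoding given: the name is used unchanged.
        return base
    # Keep an optional "@modifier" suffix untouched.
    codeset, at, modifier = rest.partition("@")
    # "UTF-8" -> "utf8", as in the Ubuntu-style example path.
    normalized = "".join(ch for ch in codeset if ch.isalnum()).lower()
    return base + "." + normalized + at + modifier

names = ["en_US", "en_US.UTF-8", "zh_CN", "zh_CN.UTF-8"]
dirs = [locale_dir(n) for n in names]
print(dirs)
```

Each variant now gets its own directory, so nothing is overwritten and
each could go into its own package.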

As I'm just learning to work with Linux distributions in general and
OE in particular, I would like to understand what the implications of
such a change would be and why the UTF-8 part was truncated from the
locale name in the first place.

I hope one of you OE gurus can comment on this, so we can fix
this the nice way for all distributions, instead of me doing a dirty
local fix.

Regards,

Jan

[1] glibc-package.bbclass:
dot_re = re.compile("(.*)\.(.*)")
...
m = dot_re.match(locale)
if m:
    locale = m.group(1)

[2] As per glibc-package.bbclass: "Reshuffle names so that UTF-8 is
preferred over other encodings", the code below this comment shows the
UTF-8 locale is built first (if I understand the Python code
correctly).

_______________________________________________
Oe mailing list
[email protected]
https://www.handhelds.org/mailman/listinfo/oe
