Re: Glibc binary locale generation for glibc does not generate UTF-8 locales

Jan Janssens Sat, 08 Jul 2006 05:45:09 -0700

I'm not exactly sure what the impact would be of adding ".UTF-8" or
".utf8" to the locale names, as most Linux distributions seem to do.
I registered this in the bug tracker a few days ago (you might have missed it):
http://bugs.openembedded.org/show_bug.cgi?id=1140


Jan

P.S. I've solved it locally by only generating UTF-8 locales, but that
is obviously not a real solution...
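A minimal sketch of that local workaround. It assumes the supported-locale list uses glibc's SUPPORTED-file style "name/charset" entries. The sample entries and the helper name `utf8_only` are hypothetical and are not taken from glibc-package.bbclass:

```python
def utf8_only(entries):
    """Keep only the UTF-8 entries from a list of "name/charset" strings.

    Hypothetical helper illustrating the "only generate UTF-8 locales"
    workaround; it is not part of glibc-package.bbclass.
    """
    kept = []
    for entry in entries:
        name, charset = entry.split("/", 1)
        if charset.upper() == "UTF-8":
            kept.append(name)
    return kept


# Hypothetical sample in SUPPORTED-file style.
supported = ["en_US.UTF-8/UTF-8", "en_US/ISO-8859-1",
             "zh_CN.UTF-8/UTF-8", "zh_CN/GB2312"]
print(utf8_only(supported))  # ['en_US.UTF-8', 'zh_CN.UTF-8']
```

With only one encoding left per locale, stripping the suffix can no longer let one encoding overwrite another. The cost is that the legacy encodings disappear entirely, which is why this is not a real fix.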

On 7/8/06, Phil Blundell <[EMAIL PROTECTED]> wrote:

Yes, this sounds like a bad bug.

The intended behaviour is that "zh_CN" would be a UTF-8 locale, but you
could select "zh_CN.gb2312" if you wanted the legacy GB encoding.  From
what you say, this obviously isn't turning out quite right at the
moment.  I'll take a look at it later today.
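A minimal sketch of the naming scheme described above. It assumes each generated locale is identified by a (name, charset) pair. `locale_dir_name` is a hypothetical helper, not code from glibc-package.bbclass, and it ignores "@modifier" locale names:

```python
def locale_dir_name(name, charset):
    """Map a (name, charset) pair to the name the locale is installed under.

    Sketch of the intended behaviour: the UTF-8 variant owns the bare name,
    and legacy encodings keep a lowercase charset suffix. Names carrying an
    "@modifier" are not handled (assumption for brevity).
    """
    base = name.split(".", 1)[0]  # drop any ".charset" already in the name
    if charset.upper() == "UTF-8":
        return base
    return base + "." + charset.lower()


print(locale_dir_name("zh_CN.UTF-8", "UTF-8"))  # zh_CN
print(locale_dir_name("zh_CN", "GB2312"))       # zh_CN.gb2312
```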

p.

On Tue, 2006-06-27 at 15:34 +0200, Jan Janssens wrote:
> Hi,
>
> I've been trying to get the glibc(-2.3.2-cvs20040726) binary locale
> generation working with my ARM target and I'm very close. In fact, the
> build process finishes and produces the binary locale data, using
> QEMU.
>
> My problem is this: the binary locale tree that is generated by
> localedef (using qemu) does not differentiate between UTF-8 and other
> encodings. Because the ".UTF-8" part of a locale is stripped[1] from
> the locale name, if there is more than one encoding for a given
> locale, the output is overwritten each time. This results in only the last one
> going into the packages and ending up on the target. Because the UTF-8
> version of each locale is generated first[2], the UTF-8 locale will
> never be in any package, contrary to the comment in
> glibc-package.bbclass, which states "Reshuffle names so that UTF-8 is
> preferred over other encodings".
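A minimal sketch of the collision, reusing the regular expression quoted in footnote [1]. The `generated` list and the dictionary standing in for the output directory tree are assumptions for illustration, not the real build logic:

```python
import re

# The pattern quoted from glibc-package.bbclass in footnote [1],
# written as a raw string here to avoid an invalid-escape warning.
dot_re = re.compile(r"(.*)\.(.*)")


def strip_encoding(locale):
    """Drop the ".charset" suffix the way the quoted bbclass snippet does."""
    m = dot_re.match(locale)
    if m:
        locale = m.group(1)
    return locale


# Hypothetical build order: UTF-8 variant first, as described in [2].
generated = [("zh_CN.UTF-8", "UTF-8"), ("zh_CN", "GB2312")]

# Dictionary standing in for the generated locale directory tree.
output_dirs = {}
for locale, charset in generated:
    # Both variants map to the same key, so the later one overwrites the earlier.
    output_dirs[strip_encoding(locale)] = charset

print(output_dirs)  # {'zh_CN': 'GB2312'}
```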
>
> As a result, the en_US and nl_NL locales end up as ISO-8859-1 and
> zh_CN as GB2312. I would have expected them to be UTF-8 by default,
> or at least that UTF-8 could be selected somehow.
>
> For ISO-8859-1 locales, like en_US and nl_NL, this does not constitute
> a real problem, because the gconv module will transparently translate
> UTF-8 strings into ISO-8859-1 (as long as their characters exist in
> ISO-8859-1). For zh_CN (Chinese), however, this is a real problem. I am
> using UTF-8 input for gettext-generated .mo files, but because the zh_CN
> locale data is in GB2312, pango refuses to render the strings
> (complaining they are not UTF-8).
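A small illustration of why the en_US and zh_CN cases differ. It uses Python's codecs as a stand-in for gconv/iconv, and the sample string is an assumption. GNU gettext converts catalog strings to the current locale's codeset, which is how UTF-8 .mo input ends up as GB2312 bytes here:

```python
text = "中文"  # "Chinese", as it might appear in a UTF-8 .mo file

# ISO-8859-1 has no code points for Chinese characters, so a conversion
# to it cannot represent this string at all.
try:
    text.encode("iso-8859-1")
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")

# Converting to GB2312 succeeds...
gb_bytes = text.encode("gb2312")
print(gb_bytes)  # b'\xd6\xd0\xce\xc4'

# ...but the resulting bytes are not valid UTF-8, which is what a
# UTF-8-only consumer such as pango rejects.
try:
    gb_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("GB2312 bytes are not valid UTF-8")
```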
>
> I can fix this problem in a number of different ways, the simplest of all
> being to build only the UTF-8 versions of all locales. However, I think
> the better solution would be to generate locales the way, for instance,
> Ubuntu does: /usr/lib/locale/en_US.utf8/* or possibly just "en_US.UTF-8",
> so there is a difference between en_US and en_US.UTF-8 (different dirs,
> different packages).
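The "en_US.utf8" spelling appears to follow glibc's codeset normalization, which localedef applies when it builds output paths. That normalization keeps only letters and digits, lowercases the letters, and prefixes "iso" when only digits remain. The helper names below are hypothetical, and the sketch ignores "@modifier" names:

```python
def normalize_codeset(codeset):
    """Normalize a charset name the way glibc does for locale file names:
    keep only ASCII letters and digits, lowercase the letters, and prefix
    "iso" when the result consists of digits only."""
    kept = [c.lower() for c in codeset if c.isascii() and c.isalnum()]
    result = "".join(kept)
    if result.isdigit():
        result = "iso" + result
    return result


def versioned_locale_dir(name, charset):
    """Hypothetical helper: keep the encoding in the directory name so
    en_US and en_US.utf8 no longer collide."""
    base = name.split(".", 1)[0]
    return base + "." + normalize_codeset(charset)


print(versioned_locale_dir("en_US.UTF-8", "UTF-8"))  # en_US.utf8
print(versioned_locale_dir("en_US", "ISO-8859-1"))   # en_US.iso88591
print(versioned_locale_dir("zh_CN", "GB2312"))       # zh_CN.gb2312
```

glibc should apply the same normalization when looking up a locale name such as LANG=en_US.UTF-8, so a directory named en_US.utf8 should be found for either spelling.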
>
> As I'm just learning to work with Linux distributions in general and
> OE in particular, I would like to understand what the implications of
> such a change would be and why the UTF-8 part was truncated from the
> locale name in the first place.
>
> I hope one of you OE gurus can comment on this, so we can fix
> this the nice way for all distributions, instead of me doing a dirty
> local fix.
>
> Regards,
>
> Jan
>
> [1] glibc-package.bbclass:
> dot_re = re.compile("(.*)\.(.*)")
> ...
> m = dot_re.match(locale)
> if m:
>     locale = m.group(1)
>
> [2] As per glibc-package.bbclass: "Reshuffle names so that UTF-8 is
> preferred over other encodings", the code below this comment shows that
> the UTF-8 locale is built first (if I understand the Python code
> correctly).
> _______________________________________________
> Oe mailing list
> [email protected]
> https://www.handhelds.org/mailman/listinfo/oe
