Re: [arch-general] Locale packages

2020-06-23 Thread Daan De Meyer via arch-general
Alright, thanks for all the info. I'll leave this be for now until the
C.UTF-8 support in upstream glibc is released. If they manage to reduce the
size sufficiently to have it built-in, there might not even be anything to
change on Arch's side.

Daan


Re: [arch-general] Locale packages

2020-06-23 Thread Eli Schwartz via arch-general
On 6/23/20 3:02 PM, Daan De Meyer via arch-general wrote:
>> This is not about locale-gen. locale-gen (and /etc/locale.gen) are
>> Arch-specific custom scripts which IIRC were copied from Debian once
>> upon a time, which just run localedef. I actually use a much simpler
>> locale-gen program which uses flag files e.g. /etc/locales/en_US (file
>> contents can contain a charset but are otherwise assumed to be UTF-8).
>> It's not hard to hack your own.
> 
> Running localedef directly doesn't really solve any of the issues I
> mentioned either though.

It would:
- avoid the *additional* issue "what to do if locale-gen doesn't exist",
- solve the issue "locale-gen does not have a --root option"

It wouldn't:
- solve the issue "host/guest glibc version mismatches"

> What if we make do with a single locale package? I just found out there's
> some progress on the C.UTF-8 locale upstream support in glibc (
> https://sourceware.org/pipermail/libc-alpha/2020-June/115224.html). It
> doesn't look like it will be built-in though unless they manage to get the
> size down significantly. If it isn't built-in, maybe we could add a single
> package just for the C.UTF-8 locale? That should be sufficient for 95% of
> the "I'm building an Arch container/vm image for development/server/any
> other development stuff" use cases which generally will be using an English
> locale and avoids all the problems I mentioned earlier without requiring
> the addition of 300+ packages. It'll have to wait until we have C.UTF-8 in
> glibc though. I guess we could add a package for en_US.UTF-8 as a stopgap
> but that doesn't seem worth the effort assuming C.UTF-8 gets merged in a
> reasonable timeframe.

The ultimate goal is to ensure C.UTF-8 always exists no matter what. If
it gets merged upstream in glibc as a non-builtin, localedef-generated
locale, then the probable best solution is to make locale-gen always
include C.UTF-8 regardless of which other locales are requested by the
user's system.

Or include its compiled form in the glibc package directly, if it isn't
too bloated.
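
For the first option, the rule "always include C.UTF-8 regardless of what was requested" could look roughly like the sketch below. This is not how the current locale-gen script works; the function name with_c_utf8 and the idea of passing the requested locales as a plain list are assumptions made for illustration.

```python
def with_c_utf8(requested):
    """Return the requested locale names with C.UTF-8 guaranteed present.

    C.UTF-8 is placed first and duplicates are dropped, so the result is
    the same whether or not the user already listed it.
    """
    result = ["C.UTF-8"]
    for name in requested:
        if name not in result:
            result.append(name)
    return result


if __name__ == "__main__":
    print(with_c_utf8(["en_US.UTF-8", "de_DE.UTF-8"]))
```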

> As an example of why one would need a UTF-8 locale specifically in a
> container/vm image, meson (actually python) does not like running under a
> non-UTF-8 locale at all.

You're preaching to the choir, here. ;) I thoroughly agree there must be
a UTF-8 locale.

The question is at what stage this should be selected and generated.

> (I don't use mailing lists very often, I hope I didn't mess up the reply
> etiquette)

Generally people tend to delete the sections they are not replying to,
but reply inline, rather than including everything at the bottom as a
second copy of the sections you quoted and replied to inline. Still,
replying inline is the main thing, and you did that. :)


-- 
Eli Schwartz
Bug Wrangler and Trusted User





Re: [arch-general] Locale packages

2020-06-23 Thread Daan De Meyer via arch-general
> Very firm -1 to any approach that involves creating hundreds of new
> packages which each provide a tiny file.

You're right, this would be overkill. Even when limiting to only UTF-8 we'd
still have 313 packages.

> This is not about locale-gen. locale-gen (and /etc/locale.gen) are
> Arch-specific custom scripts which IIRC were copied from Debian once
> upon a time, which just run localedef. I actually use a much simpler
> locale-gen program which uses flag files e.g. /etc/locales/en_US (file
> contents can contain a charset but are otherwise assumed to be UTF-8).
> It's not hard to hack your own.

Running localedef directly doesn't really solve any of the issues I
mentioned either though.

What if we make do with a single locale package? I just found out there's
some progress on the C.UTF-8 locale upstream support in glibc (
https://sourceware.org/pipermail/libc-alpha/2020-June/115224.html). It
doesn't look like it will be built-in though unless they manage to get the
size down significantly. If it isn't built-in, maybe we could add a single
package just for the C.UTF-8 locale? That should be sufficient for 95% of
the "I'm building an Arch container/vm image for development/server/any
other development stuff" use cases which generally will be using an English
locale and avoids all the problems I mentioned earlier without requiring
the addition of 300+ packages. It'll have to wait until we have C.UTF-8 in
glibc though. I guess we could add a package for en_US.UTF-8 as a stopgap
but that doesn't seem worth the effort assuming C.UTF-8 gets merged in a
reasonable timeframe.

As an example of why one would need a UTF-8 locale specifically in a
container/vm image, meson (actually python) does not like running under a
non-UTF-8 locale at all.
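
A quick way to see what Python makes of the locale inside an image is to ask it for the preferred encoding. The sketch below is only a diagnostic under the assumption that this encoding is what matters to tools like meson; the helper name is_utf8 is made up.

```python
import codecs
import locale


def is_utf8(encoding_name):
    """Return True if encoding_name is some spelling of UTF-8."""
    try:
        # codecs.lookup normalises aliases such as "utf8" or "UTF-8".
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


if __name__ == "__main__":
    # Encoding Python derives from the environment's locale settings.
    encoding = locale.getpreferredencoding(False)
    status = "UTF-8" if is_utf8(encoding) else "not UTF-8"
    print(f"preferred encoding: {encoding} ({status})")
```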

(I don't use mailing lists very often, I hope I didn't mess up the reply
etiquette)

Daan


Re: [arch-general] Locale packages

2020-06-22 Thread Eli Schwartz via arch-general
On 6/22/20 3:11 PM, Daan De Meyer via arch-general wrote:
> Hi,
> 
> While working on locale-gen support for systemd-firstboot (
> https://github.com/systemd/systemd/pull/15994), I started wondering if it
> wouldn't be simpler to delegate the installation of locales to pacman
> instead. I haven't been following the mailing lists for very long so I
> don't know if this has ever been discussed. I'd imagine Arch could provide
> a package for each locale supported by glibc and users would install the
> ones they need.

Very firm -1 to any approach that involves creating hundreds of new
packages which each provide a tiny file.

> The PKGBUILD would use localedef to generate separate folders of compiled
> locale files for each locale that would be stored in /usr/lib/locale. This
> approach is already implemented by distros such as Fedora (and co) and
> Ubuntu.
> 
> The main advantage of this approach is that there's no need to set up an
> entire chroot to run locale-gen when pacstrapping a new Arch system image.
> This might seem easy but becomes trickier when the image uses a different
> architecture than the host system since emulation of that architecture has
> to be set up first. Even if locale-gen had a --root option so using the
> host's locale-gen would be an option, I'm not sure if there's any guarantee
> that compiled locale definitions generated by the host system's locale-gen
> would work with the glibc version used by the image (less of a problem with
> Arch but the glibc on the host could still potentially be out-of-date
> compared to the one installed in the image). Being able to install locales
> with pacman would solve all these problems.
> 
> Any interest in something like this from the Arch developers? I'd be
> willing to try my hand at a PKGBUILD for this but I'm not a TU so I'd need
> some support to get this implemented (if there is any interest at all).
> 
> (This also doesn't imply that locale-gen wouldn't work anymore, locale-gen
> stores everything in /usr/lib/locale/locale-archive which would be
> independent from the files installed by the locale packages, so both
> approaches should work side-by-side)

This is not about locale-gen. locale-gen (and /etc/locale.gen) are
Arch-specific custom scripts which IIRC were copied from Debian once
upon a time, which just run localedef. I actually use a much simpler
locale-gen program which uses flag files e.g. /etc/locales/en_US (file
contents can contain a charset but are otherwise assumed to be UTF-8).
It's not hard to hack your own.
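
As a rough illustration of the flag-file idea (not the actual program mentioned above, whose details are not given here), the sketch below reads a directory where each file name is a locale and the optional file content is a charset. The function name read_flag_files and the exact fallback behaviour are assumptions; turning each pair into a localedef call is left out.

```python
from pathlib import Path


def read_flag_files(flag_dir):
    """Return (locale, charset) pairs described by flag files in flag_dir.

    Each regular file is named after a locale (e.g. en_US). If the file
    is empty the charset is assumed to be UTF-8; otherwise the stripped
    content is taken as the charset.
    """
    pairs = []
    for path in sorted(Path(flag_dir).iterdir()):
        if not path.is_file():
            continue
        charset = path.read_text().strip() or "UTF-8"
        pairs.append((path.name, charset))
    return pairs


if __name__ == "__main__":
    flag_dir = Path("/etc/locales")
    if flag_dir.is_dir():
        for locale_name, charset in read_flag_files(flag_dir):
            print(f"{locale_name} {charset}")
```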

IIRC Fedora follows the "hundreds of packages which each provide a small
file" approach, that being the localedef --no-archive intersection of a
locale and a charmap. The combination of all possibilities will result
in significant size bloat, so it is not feasible to provide them all in
the glibc package itself. (e.g. try uncommenting all 487 locales in
/etc/locale.gen and it is a 500MB locale-archive, "only" 100MB if you
stick to UTF-8 locales)
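
For anyone who wants to try that experiment, a sketch of the uncommenting step follows. It assumes entries in /etc/locale.gen look like "#en_US.UTF-8 UTF-8" with no space after the "#", while explanatory comments start with "# "; that matches the stock file as far as I know, but check yours first. The function name uncomment_locales is made up, and the sketch only transforms text; it does not write the file or run locale-gen.

```python
import re

# A commented-out entry: "#<locale> <charmap>" with no space after "#".
ENTRY = re.compile(r"^#(\S+)[ \t]+(\S+)[ \t]*$")


def uncomment_locales(text, utf8_only=False):
    """Uncomment locale entries in locale.gen-style text.

    With utf8_only=True, only entries whose charmap is UTF-8 are
    uncommented; everything else is left as it was.
    """
    out = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match and (not utf8_only or match.group(2) == "UTF-8"):
            out.append(line[1:])  # drop the leading "#"
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# Example header\n#en_US.UTF-8 UTF-8\n#en_US ISO-8859-1\n"
    print(uncomment_locales(sample, utf8_only=True), end="")
```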

-- 
Eli Schwartz
Bug Wrangler and Trusted User





[arch-general] Locale packages

2020-06-22 Thread Daan De Meyer via arch-general
Hi,

While working on locale-gen support for systemd-firstboot (
https://github.com/systemd/systemd/pull/15994), I started wondering if it
wouldn't be simpler to delegate the installation of locales to pacman
instead. I haven't been following the mailing lists for very long so I
don't know if this has ever been discussed. I'd imagine Arch could provide
a package for each locale supported by glibc and users would install the
ones they need.

The PKGBUILD would use localedef to generate separate folders of compiled
locale files for each locale that would be stored in /usr/lib/locale. This
approach is already implemented by distros such as Fedora (and co) and
Ubuntu.

The main advantage of this approach is that there's no need to set up an
entire chroot to run locale-gen when pacstrapping a new Arch system image.
This might seem easy but becomes trickier when the image uses a different
architecture than the host system since emulation of that architecture has
to be set up first. Even if locale-gen had a --root option so using the
host's locale-gen would be an option, I'm not sure if there's any guarantee
that compiled locale definitions generated by the host system's locale-gen
would work with the glibc version used by the image (less of a problem with
Arch but the glibc on the host could still potentially be out-of-date
compared to the one installed in the image). Being able to install locales
with pacman would solve all these problems.

Any interest in something like this from the Arch developers? I'd be
willing to try my hand at a PKGBUILD for this but I'm not a TU so I'd need
some support to get this implemented (if there is any interest at all).

(This also doesn't imply that locale-gen wouldn't work anymore, locale-gen
stores everything in /usr/lib/locale/locale-archive which would be
independent from the files installed by the locale packages, so both
approaches should work side-by-side)

Cheers,

Daan De Meyer