[gentoo-dev] [RFC] Should unicode be allowed in ebuild metadata variables?

2008-12-29 Thread Zac Medico
Hi,

In response to bug 252748 I've implemented a new
'variable.invalidchar' repoman check that will trigger if an ebuild
metadata variable contains any characters that aren't in the ASCII
character set (0-127). Is this okay, or does anybody think that we
should allow UTF-8 unicode?
--
Thanks,
Zac
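For readers unfamiliar with repoman, the kind of check described above can be sketched roughly as follows. This is a minimal illustration, not repoman's actual 'variable.invalidchar' code. The helper name `find_invalid_chars`, the variable list in `METADATA_VARS`, and the assumption that each assignment fits on a single line are all hypothetical simplifications.

```python
import re

# Hypothetical subset of ebuild metadata variables to inspect; the list
# repoman actually uses may differ.
METADATA_VARS = ("DESCRIPTION", "HOMEPAGE", "LICENSE", "SLOT", "KEYWORDS",
                 "IUSE", "SRC_URI", "DEPEND", "RDEPEND", "PDEPEND")

# Matches the start of a simple assignment such as DESCRIPTION="..."
ASSIGN_RE = re.compile(r"^\s*(%s)=" % "|".join(METADATA_VARS))


def find_invalid_chars(ebuild_text):
    """Return (line_number, variable, char) for every non-ASCII character
    on a line that assigns one of METADATA_VARS.

    Assumes each assignment sits on a single line; continuation lines of a
    multi-line value are not inspected.
    """
    problems = []
    for lineno, line in enumerate(ebuild_text.splitlines(), start=1):
        match = ASSIGN_RE.match(line)
        if match is None:
            continue  # not one of the listed metadata assignments
        for ch in line:
            if ord(ch) > 127:  # outside the ASCII range 0..127
                problems.append((lineno, match.group(1), ch))
    return problems


if __name__ == "__main__":
    sample = 'EAPI=2\nDESCRIPTION="Caf\u00e9 player"\nHOMEPAGE="http://example.org/"\n'
    for lineno, var, ch in find_invalid_chars(sample):
        print("line %d: %s contains %r (U+%04X)" % (lineno, var, ch, ord(ch)))
```

A value spread over several lines (a long DEPEND string, for example) would need a bash-aware parser; this sketch only inspects the line on which the assignment starts.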



Re: [gentoo-dev] [RFC] Should unicode be allowed in ebuild metadata variables?

2008-12-29 Thread Zac Medico
Zac Medico wrote:
 Hi,
 
 In response to bug 252748 I've implemented a new
 'variable.invalidchar' repoman check that will trigger if an ebuild
 metadata variable contains any characters that aren't in the ASCII
 character set (0-127). Is this okay, or does anybody think that we
 should allow UTF-8 unicode?

Nevermind, apparently GLEP 31 already requires ASCII anyway:

  http://www.gentoo.org/proj/en/glep/glep-0031.html

--
Thanks,
Zac



Re: [gentoo-dev] [RFC] Should unicode be allowed in ebuild metadata variables?

2008-12-29 Thread Ben de Groot
Zac Medico wrote:
 In response to bug 252748 I've implemented a new
 'variable.invalidchar' repoman check that will trigger if an ebuild
 metadata variable contains any characters that aren't in the ASCII
 character set (0-127). Is this okay, or does anybody think that we
 should allow UTF-8 unicode?
 
 Nevermind, apparently GLEP 31 already requires ASCII anyway:
 
   http://www.gentoo.org/proj/en/glep/glep-0031.html
 
The way I read that GLEP is that in ChangeLog and metadata.xml
we should accept the full range of UTF-8.

-- 
Ben de Groot
Gentoo Linux developer (lxde, media, qt, desktop-misc)
Gentoo Linux Release Engineering PR liaison
__

yng...@gentoo.org
http://ben.liveforge.org/
irc://chat.freenode.net/#gentoo-media
irc://irc.oftc.net/#lxde
__




Re: [gentoo-dev] [RFC] Should unicode be allowed in ebuild metadata variables?

2008-12-29 Thread Nirbheek Chauhan
On Tue, Dec 30, 2008 at 8:27 AM, Ben de Groot yng...@gentoo.org wrote:
 Zac Medico wrote:
 Nevermind, apparently GLEP 31 already requires ASCII anyway:

   http://www.gentoo.org/proj/en/glep/glep-0031.html

 The way I read that GLEP is that in ChangeLog and metadata.xml
 we should accept the full range of UTF-8.

I read that as saying the contents of the portage tree should be in
UTF-8, while file paths should be in ASCII:

It is proposed that UTF-8 ([1]) is used for encoding ChangeLog and
metadata.xml files inside the portage tree.

[...]it is proposed that UTF-8 is used as the official encoding for
ebuild and eclass files

Patches must clearly be in the same character set as the file they
are patching.

Characters outside the ASCII 0..127 range cannot safely be used for
file or directory names

It is also worth mentioning that Python 3K uses UTF-8 as the default
encoding for its source files rather than ASCII as Python 2.X did. Why
should *we* go backwards? :p

-- 
~Nirbheek Chauhan



Re: [gentoo-dev] [RFC] Should unicode be allowed in ebuild metadata variables?

2008-12-29 Thread Marius Mauch
On Tue, 30 Dec 2008 09:37:24 +0530
Nirbheek Chauhan nirbheek.chau...@gmail.com wrote:

 On Tue, Dec 30, 2008 at 8:27 AM, Ben de Groot yng...@gentoo.org
 wrote:
  Zac Medico wrote:
  Nevermind, apparently GLEP 31 already requires ASCII anyway:
 
http://www.gentoo.org/proj/en/glep/glep-0031.html
 
  The way I read that GLEP is that in ChangeLog and metadata.xml
  we should accept the full range of UTF-8.
 
 I read that as contents of portage tree should be in UTF-8, file
 paths should be in ASCII
 
 It is proposed that UTF-8 ([1]) is used for encoding ChangeLog and
 metadata.xml files inside the portage tree.
 
 [...]it is proposed that UTF-8 is used as the official encoding for
 ebuild and eclass files
 
 Patches must clearly be in the same character set as the file they
 are patching.
 
 Characters outside the ASCII 0..127 range cannot safely be used for
 file or directory names
 
 It is also worth mentioning that Python 3K uses UTF-8 as the default
 encoding for its source files rather than ASCII as Python 2.X did. Why
 should *we* go backwards? :p

And none of that is relevant to Zac's original question, which is
covered by the following section of the GLEP:

"However, developers should be warned that any code which is parsed by
bash (in other words, non-comments), and any output which is echoed to
the screen (for example, einfo messages) or given to portage (for
example any of the standard global variables) must not use anything
outside the regular ASCII 0..127 range for compatibility purposes."

Marius
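The GLEP 31 rule quoted above can be illustrated with a small Python sketch: the file as a whole must decode as UTF-8, whole-line comments may contain any UTF-8 text, and every other line must stay within ASCII 0..127. The helper `check_glep31` is hypothetical and deliberately conservative. It treats only lines beginning with `#` as comments, so a trailing `# ...` after code is still held to the ASCII rule, because telling it apart from a quoted `#` would require a real bash parser.

```python
def check_glep31(raw):
    """Check ebuild bytes `raw` against a simplified reading of GLEP 31.

    Returns a list of (line_number, message) tuples; line_number is None
    when the whole file fails to decode.
    """
    try:
        text = raw.decode("utf-8")  # GLEP 31: ebuilds are UTF-8 encoded
    except UnicodeDecodeError as err:
        return [(None, "file is not valid UTF-8: %s" % err)]

    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # whole-line comment: any UTF-8 text is allowed
        # Anything else is treated as code parsed by bash, so it must be ASCII.
        bad = sorted({ch for ch in line if ord(ch) > 127})
        if bad:
            problems.append((lineno, "non-ASCII in code: %s" % "".join(bad)))
    return problems


if __name__ == "__main__":
    ebuild = '# Maintainer: J\u00f6rg\neinfo "Caf\u00e9"\n'.encode("utf-8")
    for lineno, message in check_glep31(ebuild):
        print(lineno, message)
```

Here the comment on line 1 passes, while the einfo message on line 2 is flagged, matching the GLEP's distinction between comments and text that bash parses or echoes to the screen.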



Re: [gentoo-dev] [RFC] Should unicode be allowed in ebuild metadata variables?

2008-12-29 Thread Nirbheek Chauhan
On Tue, Dec 30, 2008 at 10:03 AM, Marius Mauch gen...@gentoo.org wrote:
 And none of that is relevant to Zac's original question, which is
 covered by the following section of the GLEP:

Oops, sorry, misread the question :)


-- 
~Nirbheek Chauhan