Re: [gentoo-dev] The tree is now utf-8 clean
On Sat, 17 Sep 2005 02:42:09 +0100 Ciaran McCreesh <[EMAIL PROTECTED]> wrote:
| The tree is now utf-8 clean.

...and now it isn't.

app-benchmarks/ltp/ChangeLog
    Bad character 0x0a inside UTF-8 sequence (2/4) at line 8 offset 9

--
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm
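A report like the one above says that a checker reading the file byte by byte met a newline (0x0a) where the second byte of a four-byte UTF-8 sequence was expected. The following is a minimal Python sketch of that kind of structural check. It is not glep31check's actual implementation: the function names, the 1-based line and 0-based offset convention, and the "Bad lead byte" and "Truncated" messages are assumptions made for illustration, and overlong forms and surrogates are not rejected.

```python
def expected_length(lead: int) -> int:
    """Return the sequence length announced by a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # stray continuation byte, or a byte that never appears in UTF-8


def find_utf8_errors(data: bytes) -> list:
    """Describe each structural UTF-8 error together with its line and byte offset."""
    errors = []
    line, offset = 1, 0   # assumed convention: 1-based line, 0-based offset
    need, seen = 0, 0     # length of the current sequence, bytes consumed so far
    for byte in data:
        if need:
            if 0x80 <= byte <= 0xBF:
                # Valid continuation byte: consume it and move on.
                seen += 1
                if seen == need:
                    need = seen = 0
                offset += 1
                continue
            errors.append(
                f"Bad character 0x{byte:02x} inside UTF-8 sequence "
                f"({seen + 1}/{need}) at line {line} offset {offset}"
            )
            need = seen = 0
            # Fall through: the offending byte may itself start a new character.
        length = expected_length(byte)
        if length == 0:
            errors.append(f"Bad lead byte 0x{byte:02x} at line {line} offset {offset}")
        elif length > 1:
            need, seen = length, 1
        if byte == 0x0A:
            line, offset = line + 1, 0
        else:
            offset += 1
    if need:
        errors.append(f"Truncated UTF-8 sequence ({seen}/{need}) at end of input")
    return errors


# A lone Latin-1 byte 0xf1 announces a four-byte sequence, then a newline follows.
for problem in find_utf8_errors(b"ok\ncaf\xf1\n"):
    print(problem)
```

In this sketch, a single Latin-1 byte in the range 0xf0 to 0xf4 at the end of a line is enough to produce a "(2/4)" report at the newline. Whether that is what happened in the ChangeLog above is not stated in the thread.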
Re: [gentoo-dev] The tree is now utf-8 clean
maillog: 19/09/2005-11:52:26(+0200): Paul de Vrieze types
> On Saturday 17 September 2005 22:06, Mike Frysinger wrote:
> > On Saturday 17 September 2005 01:15 pm, Ciaran McCreesh wrote:
> > > On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda" <[EMAIL PROTECTED]> wrote:
> > > | On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> > > | | Something strange I noticed... Some people are using funny quotes
> > > | | and non breaking spaces in ebuilds. Some people are using weird
> > > | | characters as substitution delimiters for sed. Don't! It will
> > > | | break on many systems. I'm going to go and purge all of those,
> > > | | UTF-8 or not, whenever my brain recovers.
> > > |
> > > | I hope ~ is not considered a weird character... if it is, tell me
> > > | and I'll fix all my ebuilds.
> > >
> > > No, ~ is fine. Anything with a value below 127 (don't use 127, it's
> > > weird) that sed accepts is ok.
> >
> > in other words, ASCII characters are OK. if in doubt, just run `man
> > ascii` and see if your character is in the table
>
> You probably don't want to use the ascii control characters either
> (anything below 32), although they should not give issues with people
> they could cause havoc for terminals or annoy people (using the BELL
> character as sed separator).

Um, I guess everybody got the point. In fact, you probably shouldn't use
alphanumerics either -- they work, but are as ugly as...

    echo herr | sed -e sorolog

--
Georgi Georgiev
They can always run stderr through uniq. :-)
    -- Larry Wall in <[EMAIL PROTECTED]>
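The joke works because sed takes whatever character follows `s` as the delimiter, apart from backslash and newline. `sorolog` therefore reads as delimiter `o`, pattern `r`, replacement `l`, flag `g`, and turns `herr` into `hell`. Below is a rough Python sketch that splits such a command and applies the rules of thumb collected in this thread. Both function names and the verdict strings are hypothetical, and the splitter is deliberately simplified: it ignores backslash-escaped delimiters.

```python
import string


def split_s_command(cmd: str) -> tuple:
    """Split a sed-style s command into (delimiter, pattern, replacement, flags).

    Simplified: escaped delimiters inside the pattern or replacement are not handled.
    """
    if len(cmd) < 2 or cmd[0] != "s":
        raise ValueError("not an s command")
    delim = cmd[1]
    parts = cmd[2:].split(delim)
    if len(parts) != 3:
        raise ValueError("expected exactly three delimiters")
    pattern, replacement, flags = parts
    return delim, pattern, replacement, flags


def delimiter_verdict(ch: str) -> str:
    """Classify a single delimiter character using the advice given in the thread."""
    code = ord(ch)
    if ch in "\\\n":
        return "rejected by sed itself"
    if code >= 128:
        return "non-ASCII: avoid"
    if code < 32 or code == 127:
        return "control character: avoid"
    if ch in string.ascii_letters + string.digits:
        return "alphanumeric: works but unreadable"
    return "ok"


for command in ("sorolog", "s~foo~bar~g"):
    delim, pattern, replacement, flags = split_s_command(command)
    print(command, (delim, pattern, replacement, flags), delimiter_verdict(delim))
```

Under these rules `~` passes, while the BELL character, DEL (127), letters, digits and anything at or above 128 are flagged.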
Re: [gentoo-dev] The tree is now utf-8 clean
On Saturday 17 September 2005 22:06, Mike Frysinger wrote:
> On Saturday 17 September 2005 01:15 pm, Ciaran McCreesh wrote:
> > On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda" <[EMAIL PROTECTED]> wrote:
> > | On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> > | | Something strange I noticed... Some people are using funny quotes
> > | | and non breaking spaces in ebuilds. Some people are using weird
> > | | characters as substitution delimiters for sed. Don't! It will
> > | | break on many systems. I'm going to go and purge all of those,
> > | | UTF-8 or not, whenever my brain recovers.
> > |
> > | I hope ~ is not considered a weird character... if it is, tell me
> > | and I'll fix all my ebuilds.
> >
> > No, ~ is fine. Anything with a value below 127 (don't use 127, it's
> > weird) that sed accepts is ok.
>
> in other words, ASCII characters are OK. if in doubt, just run `man
> ascii` and see if your character is in the table

You probably don't want to use the ascii control characters either
(anything below 32), although they should not give issues with people
they could cause havoc for terminals or annoy people (using the BELL
character as sed separator).

Paul

--
Paul de Vrieze
Gentoo Developer
Mail: [EMAIL PROTECTED]
Homepage: http://www.devrieze.net
Re: [gentoo-dev] The tree is now utf-8 clean
On Saturday 17 September 2005 01:15 pm, Ciaran McCreesh wrote:
> On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda" <[EMAIL PROTECTED]> wrote:
> | On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> | | Something strange I noticed... Some people are using funny quotes
> | | and non breaking spaces in ebuilds. Some people are using weird
> | | characters as substitution delimiters for sed. Don't! It will break
> | | on many systems. I'm going to go and purge all of those, UTF-8 or
> | | not, whenever my brain recovers.
> |
> | I hope ~ is not considered a weird character... if it is, tell me and
> | I'll fix all my ebuilds.
>
> No, ~ is fine. Anything with a value below 127 (don't use 127, it's
> weird) that sed accepts is ok.

in other words, ASCII characters are OK. if in doubt, just run `man
ascii` and see if your character is in the table
-mike

--
gentoo-dev@gentoo.org mailing list
Re: [gentoo-dev] The tree is now utf-8 clean
On Sat, 17 Sep 2005 18:15:31 +0100 Ciaran McCreesh <[EMAIL PROTECTED]> wrote:
| No, ~ is fine. Anything with a value below 127 (don't use 127, it's
| weird) that sed accepts is ok. There are some ebuilds that use that
| curly paragraph marker character (§) and weird curly quotes. Those're
| the ones that cause problems.

Uhm, where by 127 I of course mean 128...

--
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm
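With that correction the rule is "below 128", which is exactly the ASCII range; 127 (DEL) is still ASCII, just not something to type into an ebuild. As an illustration only, here is a small Python sketch that lists every character at or above 128 in a piece of text, with its position and Unicode name. The function name and the sample string are made up for this example, and the source keeps itself ASCII-clean by using escapes.

```python
import unicodedata


def find_non_ascii(text: str) -> list:
    """Return (line, column, character, Unicode name) for every code point >= 128."""
    hits = []
    # Split on "\n" only, so that exotic line separators are reported, not swallowed.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) >= 128:
                hits.append((lineno, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical ebuild fragment: a section sign as sed delimiter, then curly quotes.
sample = "sed -e 's\u00a7foo\u00a7bar\u00a7' file\neinfo \u201cdone\u201d\n"
for lineno, col, ch, name in find_non_ascii(sample):
    print(f"line {lineno} col {col}: U+{ord(ch):04X} {name}")
```

The character written as § in the quoted text is U+00A7, which Unicode calls SECTION SIGN; the curly quotes are U+201C and U+201D.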
Re: [gentoo-dev] The tree is now utf-8 clean
On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda" <[EMAIL PROTECTED]> wrote:
| On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
| | Something strange I noticed... Some people are using funny quotes
| | and non breaking spaces in ebuilds. Some people are using weird
| | characters as substitution delimiters for sed. Don't! It will break
| | on many systems. I'm going to go and purge all of those, UTF-8 or
| | not, whenever my brain recovers.
|
| I hope ~ is not considered a weird character... if it is, tell me and
| I'll fix all my ebuilds.

No, ~ is fine. Anything with a value below 127 (don't use 127, it's
weird) that sed accepts is ok. There are some ebuilds that use that
curly paragraph marker character (§) and weird curly quotes. Those're
the ones that cause problems.

--
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm
Re: [gentoo-dev] The tree is now utf-8 clean
On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
| Something strange I noticed... Some people are using funny quotes and
| non breaking spaces in ebuilds. Some people are using weird characters
| as substitution delimiters for sed. Don't! It will break on many
| systems. I'm going to go and purge all of those, UTF-8 or not, whenever
| my brain recovers.

I hope ~ is not considered a weird character... if it is, tell me and
I'll fix all my ebuilds.

Cheers,
Ferdy

--
Fernando J. Pereda Garcimartín
Gentoo Developer (Alpha,net-mail)
20BB BDC3 761A 4781 E6ED ED0B 0A48 5B0C 60BD 28D4
[gentoo-dev] The tree is now utf-8 clean
The tree is now utf-8 clean. Or it is to the extent that a computer can
reasonably determine...

If the relevant people are prepared to smack anyone who refuses to play
nice then now would be a good time to unwithdraw GLEP 31, make
compliance mandatory and add glep31check [1] to repoman or server-side.

There are still a few instances of munged character sequences that
happen to also be valid UTF-8. If you come across one, feel free to fix
it.

If you have weird characters in your name, please make especially sure
that you're getting your ChangeLog name right. These are far more common
than occasional user credit ChangeLog entries. Also, if your name on the
devlist [2] isn't accented, pester someone to update it.

Something strange I noticed... Some people are using funny quotes and
non breaking spaces in ebuilds. Some people are using weird characters
as substitution delimiters for sed. Don't! It will break on many
systems. I'm going to go and purge all of those, UTF-8 or not, whenever
my brain recovers.

As far as editor support... On those really rare occasions when you need
to enter UTF-8 text in ebuilds, vim, emacs and nano should all more or
less work. For ChangeLogs, echangelog is utf-8 transparent, meaning if
you run it from a UTF-8 terminal it should be ok. We have a guide [3] if
you want to know more...

[1]: http://dev.gentoo.org/~ciaranm/toys/glep31check-0.3.3.tar.bz2
[2]: http://www.gentoo.org/proj/en/devrel/roll-call/userinfo.xml
[3]: http://www.gentoo.org/doc/en/utf-8.xml

--
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail            : ciaranm at gentoo.org
Web             : http://dev.gentoo.org/~ciaranm
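One common way to get "munged character sequences that happen to also be valid UTF-8" is double encoding: UTF-8 bytes are read as Latin-1 and then saved as UTF-8 again, so an accented letter becomes two odd-looking characters. The thread does not say which kind of munging was found in the tree, so the Python sketch below is only a heuristic for that one case. The function name is hypothetical, and it assumes Latin-1 as the intermediate charset; other charsets such as Windows-1252 would need their own handling.

```python
def undo_double_encoding(text: str):
    """Return the repaired string if text looks like UTF-8 misread as Latin-1, else None."""
    if text.isascii():
        return None  # nothing to repair in pure ASCII
    try:
        # Reverse the mistake: recover the original bytes, then decode them properly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # not representable as Latin-1, or the bytes are not UTF-8


# Simulate the damage, then try to undo it.
munged = "caf\u00e9".encode("utf-8").decode("latin-1")
print(repr(munged), "->", repr(undo_double_encoding(munged)))
print(repr("caf\u00e9"), "->", repr(undo_double_encoding("caf\u00e9")))
```

A correctly encoded name is left alone because its Latin-1 bytes do not form valid UTF-8. A result other than None is a hint worth checking by eye, not proof, since a string can in principle contain such a character pair on purpose.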