Re: [gentoo-dev] The tree is now utf-8 clean

2005-09-28 Thread Ciaran McCreesh
On Sat, 17 Sep 2005 02:42:09 +0100 Ciaran McCreesh <[EMAIL PROTECTED]>
wrote:
| The tree is now utf-8 clean.

...and now it isn't.

app-benchmarks/ltp/ChangeLog
  Bad character 0x0a inside UTF-8 sequence (2/4) at line 8 offset 9
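
For anyone without glep31check to hand, the kind of failure reported
above (a newline byte, 0x0a, turning up where a continuation byte of a
four-byte sequence was expected) can be reproduced with a rough check
built on iconv. This is only a sketch and not how glep31check works
internally; the function name is_utf8 and the temporary file are made up
for illustration.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# Hypothetical helper; it relies on iconv exiting non-zero when it meets
# an invalid or truncated sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

tmp=$(mktemp)
# 0xF0 (octal 360) opens a four-byte sequence, but a newline follows it.
printf 'broken: \360\n' > "$tmp"
if is_utf8 "$tmp"; then echo "clean"; else echo "not clean"; fi
rm -f "$tmp"
```

Unlike the report above, this only says whether a file passes; it does
not give a line or offset.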

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail: ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm





Re: [gentoo-dev] The tree is now utf-8 clean

2005-09-19 Thread Georgi Georgiev
maillog: 19/09/2005-11:52:26(+0200): Paul de Vrieze types
> On Saturday 17 September 2005 22:06, Mike Frysinger wrote:
> > On Saturday 17 September 2005 01:15 pm, Ciaran McCreesh wrote:
> > > On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda"
> > >
> > > <[EMAIL PROTECTED]> wrote:
> > > | On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> > > | | Something strange I noticed... Some people are using funny quotes
> > > | | and non breaking spaces in ebuilds. Some people are using weird
> > > | | characters as substitution delimiters for sed. Don't! It will
> > > | | break on many systems. I'm going to go and purge all of those,
> > > | | UTF-8 or not, whenever my brain recovers.
> > > |
> > > | I hope ~ is not considered a weird character... if it is, tell me
> > > | and I'll fix all my ebuilds.
> > >
> > > No, ~ is fine. Anything with a value below 127 (don't use 127, it's
> > > weird) that sed accepts is ok.
> >
> > in other words, ASCII characters are OK.  if in doubt, just run `man
> > ascii` and see if your character is in the table
> 
> You probably don't want to use the ASCII control characters either
> (anything below 32). Although they should not break anything, they
> could cause havoc for terminals or annoy people (using the BELL
> character as a sed separator, for example).

Um, I guess everybody got the point. In fact, you probably shouldn't use
alphanumerics either -- they work, but are as ugly as...
echo herr | sed -e sorolog
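
To spell out why that works: sed takes whatever character follows the s
command as the delimiter, so in "sorolog" the delimiter is o, the
pattern is r, the replacement is l, and g is the flag. A small sketch
comparing it with more readable ASCII delimiters (the sample strings are
arbitrary):

```shell
#!/bin/sh
# The same substitution three ways; only the delimiter differs.
echo herr | sed -e 'sorolog'   # delimiter "o": accepted, but unreadable
echo herr | sed -e 's/r/l/g'   # the conventional "/"
echo herr | sed -e 's~r~l~g'   # "~" is plain ASCII, so it is fine too

# A non-slash delimiter earns its keep when the pattern contains slashes.
echo /usr/local/bin | sed -e 's|/usr/local|/usr|'
```

By contrast, a character such as § is two bytes (0xC2 0xA7) in UTF-8 but
a single byte (0xA7) in ISO-8859-1, so whether sed even sees one
delimiter character depends on the file's encoding and the locale it
runs in. That is presumably why such delimiters break on some systems.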

-- 
(*   Georgi Georgiev   (* They can always run stderr through uniq. :-) (*
*)[EMAIL PROTECTED]*) -- Larry Wall in *)
(*  +81(90)2877-8845   (* <[EMAIL PROTECTED]> (*




Re: [gentoo-dev] The tree is now utf-8 clean

2005-09-19 Thread Paul de Vrieze
On Saturday 17 September 2005 22:06, Mike Frysinger wrote:
> On Saturday 17 September 2005 01:15 pm, Ciaran McCreesh wrote:
> > On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda"
> >
> > <[EMAIL PROTECTED]> wrote:
> > | On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> > | | Something strange I noticed... Some people are using funny quotes
> > | | and non breaking spaces in ebuilds. Some people are using weird
> > | | characters as substitution delimiters for sed. Don't! It will
> > | | break on many systems. I'm going to go and purge all of those,
> > | | UTF-8 or not, whenever my brain recovers.
> > |
> > | I hope ~ is not considered a weird character... if it is, tell me
> > | and I'll fix all my ebuilds.
> >
> > No, ~ is fine. Anything with a value below 127 (don't use 127, it's
> > weird) that sed accepts is ok.
>
> in other words, ASCII characters are OK.  if in doubt, just run `man
> ascii` and see if your character is in the table

You probably don't want to use the ASCII control characters either
(anything below 32). Although they should not break anything, they
could cause havoc for terminals or annoy people (using the BELL
character as a sed separator, for example).

Paul

-- 
Paul de Vrieze
Gentoo Developer
Mail: [EMAIL PROTECTED]
Homepage: http://www.devrieze.net




Re: [gentoo-dev] The tree is now utf-8 clean

2005-09-17 Thread Mike Frysinger
On Saturday 17 September 2005 01:15 pm, Ciaran McCreesh wrote:
> On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda"
>
> <[EMAIL PROTECTED]> wrote:
> | On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
> | | Something strange I noticed... Some people are using funny quotes
> | | and non breaking spaces in ebuilds. Some people are using weird
> | | characters as substitution delimiters for sed. Don't! It will break
> | | on many systems. I'm going to go and purge all of those, UTF-8 or
> | | not, whenever my brain recovers.
> |
> | I hope ~ is not considered a weird character... if it is, tell me and
> | I'll fix all my ebuilds.
>
> No, ~ is fine. Anything with a value below 127 (don't use 127, it's
> weird) that sed accepts is ok.

in other words, ASCII characters are OK.  if in doubt, just run `man ascii` 
and see if your character is in the table
-mike
-- 
gentoo-dev@gentoo.org mailing list



Re: [gentoo-dev] The tree is now utf-8 clean

2005-09-17 Thread Ciaran McCreesh
On Sat, 17 Sep 2005 18:15:31 +0100 Ciaran McCreesh <[EMAIL PROTECTED]>
wrote:
| No, ~ is fine. Anything with a value below 127 (don't use 127, it's
| weird) that sed accepts is ok. There are some ebuilds that use that
| curly paragraph marker character (§) and weird curly quotes. Those're
| the ones that cause problems.

Uhm, where by 127 I of course mean 128...

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail: ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm





Re: [gentoo-dev] The tree is now utf-8 clean

2005-09-17 Thread Ciaran McCreesh
On Sat, 17 Sep 2005 12:56:37 +0200 "Fernando J. Pereda"
<[EMAIL PROTECTED]> wrote:
| On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
| | Something strange I noticed... Some people are using funny quotes
| | and non breaking spaces in ebuilds. Some people are using weird
| | characters as substitution delimiters for sed. Don't! It will break
| | on many systems. I'm going to go and purge all of those, UTF-8 or
| | not, whenever my brain recovers.
| 
| I hope ~ is not considered a weird character... if it is, tell me and
| I'll fix all my ebuilds.

No, ~ is fine. Anything with a value below 127 (don't use 127, it's
weird) that sed accepts is ok. There are some ebuilds that use that
curly paragraph marker character (§) and weird curly quotes. Those're
the ones that cause problems.

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail: ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm





Re: [gentoo-dev] The tree is now utf-8 clean

2005-09-17 Thread Fernando J. Pereda
On Sat, Sep 17, 2005 at 02:42:09AM +0100, Ciaran McCreesh wrote:
| Something strange I noticed... Some people are using funny quotes and
| non breaking spaces in ebuilds. Some people are using weird characters
| as substitution delimiters for sed. Don't! It will break on many
| systems. I'm going to go and purge all of those, UTF-8 or not, whenever
| my brain recovers.

I hope ~ is not considered a weird character... if it is, tell me and
I'll fix all my ebuilds.

Cheers,
Ferdy

-- 
Fernando J. Pereda Garcimartín
Gentoo Developer (Alpha,net-mail)
20BB BDC3 761A 4781 E6ED  ED0B 0A48 5B0C 60BD 28D4




[gentoo-dev] The tree is now utf-8 clean

2005-09-16 Thread Ciaran McCreesh
The tree is now utf-8 clean. Or it is to the extent that a computer can
reasonably determine... If the relevant people are prepared to smack
anyone who refuses to play nice then now would be a good time to
unwithdraw GLEP 31, make compliance mandatory and add glep31check [1] to
repoman or server-side.

There are still a few instances of munged character sequences that
happen to also be valid UTF-8. If you come across one, feel free to fix
it.

If you have weird characters in your name, please make especially sure
that you're getting your ChangeLog name right. Mangled developer names
are far more common than mangled names in the occasional user credit
ChangeLog entry. Also, if your name on the devlist [2] is missing its
accents, pester someone to update it.

Something strange I noticed... Some people are using funny quotes and
non breaking spaces in ebuilds. Some people are using weird characters
as substitution delimiters for sed. Don't! It will break on many
systems. I'm going to go and purge all of those, UTF-8 or not, whenever
my brain recovers.
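
One rough way to look for such characters in a set of ebuilds is
sketched below. It is not glep31check and not necessarily how the purge
will be done: the function name has_odd_bytes and the sample files are
invented for illustration, and the check simply flags any byte that is
neither printable ASCII nor ordinary whitespace.

```shell
#!/bin/sh
# has_odd_bytes FILE: succeed if FILE contains a byte outside printable
# ASCII and whitespace (hypothetical helper). LC_ALL=C makes grep work on
# single bytes, so a UTF-8 non-breaking space or curly quote is caught,
# as are control characters and DEL (127).
has_odd_bytes() {
    LC_ALL=C grep -q '[^[:print:][:space:]]' "$1"
}

dir=$(mktemp -d)
printf "sed -e 's~foo~bar~' file\n" > "$dir/good.ebuild"
# \302\247 is the UTF-8 encoding of the section sign used as a delimiter.
printf "sed -e 's\302\247foo\302\247bar\302\247' file\n" > "$dir/bad.ebuild"

for f in "$dir"/*.ebuild; do
    if has_odd_bytes "$f"; then echo "check: ${f##*/}"; fi
done
rm -rf "$dir"
```

This will also flag legitimate UTF-8, such as an accented name in a
comment, so treat a hit as something to inspect rather than an error.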

As for editor support... On those really rare occasions when you need
to enter UTF-8 text in ebuilds, vim, emacs and nano should all more or
less work. For ChangeLogs, echangelog is utf-8 transparent, meaning if
you run it from a UTF-8 terminal it should be ok. We have a guide [3] if
you want to know more...

[1]: http://dev.gentoo.org/~ciaranm/toys/glep31check-0.3.3.tar.bz2
[2]: http://www.gentoo.org/proj/en/devrel/roll-call/userinfo.xml
[3]: http://www.gentoo.org/doc/en/utf-8.xml

-- 
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail: ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm


