Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-15 Thread Vincent Lefevre
On 2011-02-14 16:43:11 +0000, Ian Jackson wrote:
> When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> characters to stdout should use UTF-8.  That's what LC_CTYPE means.

So, cat, grep, etc. are all broken. :)

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20110216000107.gl15...@prunille.vinc17.org



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-15 Thread Adam Borowski
On Wed, Feb 16, 2011 at 01:01:07AM +0100, Vincent Lefevre wrote:
> On 2011-02-14 16:43:11 +0000, Ian Jackson wrote:
> > When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> > characters to stdout should use UTF-8.  That's what LC_CTYPE means.
>
> So, cat, grep, etc. are all broken. :)

How come?

cat will, for any valid UTF-8 character on input, print a valid UTF-8
character on output.  For any valid ISO-8859-1 character on input, it will
print a valid ISO-8859-1 character on output.  

grep on the other hand has to actually understand the encoding -- and it
does.  Try this:
$ echo ą|LC_CTYPE=C grep --color=always .
Will be mangled.
$ echo ą|LC_CTYPE=en_US.utf-8 grep --color=always .
Will be handled correctly.
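
A minimal Python 3 sketch of the distinction drawn above (not of how cat or grep are actually implemented): copying bytes needs no knowledge of the encoding, whereas matching "one character" does. The helper names are made up for illustration.

```python
import re

def passthrough(data: bytes) -> bytes:
    """Copy bytes unchanged, the way a byte-oriented filter would."""
    return data

def count_chars(data: bytes, encoding: str) -> int:
    """Count matches of '.' after decoding the input with the given encoding."""
    text = data.decode(encoding)
    return len(re.findall(".", text))

# U+0105 (a with ogonek) is the two bytes 0xC4 0x85 in UTF-8.
sample = "\u0105".encode("utf-8")

print(passthrough(sample) == sample)     # bytes survive untouched
print(count_chars(sample, "utf-8"))      # decoded as UTF-8
print(count_chars(sample, "latin-1"))    # decoded one byte per character
```

Decoding the same two bytes with a single-byte encoding yields two characters, which is why a pattern like `.` can split a multi-byte sequence when the locale does not match the data.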

-- 
1KB // Microsoft corollary to Hanlon's razor:
//  Never attribute to stupidity what can be
//  adequately explained by malice.


Archive: http://lists.debian.org/20110216003451.ga14...@angband.pl



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-15 Thread Vincent Lefevre
On 2011-02-16 01:34:51 +0100, Adam Borowski wrote:
> On Wed, Feb 16, 2011 at 01:01:07AM +0100, Vincent Lefevre wrote:
> > On 2011-02-14 16:43:11 +0000, Ian Jackson wrote:
> > > When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> > > characters to stdout should use UTF-8.  That's what LC_CTYPE means.
> >
> > So, cat, grep, etc. are all broken. :)
>
> How come?
>
> cat will, for any valid UTF-8 character on input, print a valid UTF-8
> character on output.  For any valid ISO-8859-1 character on input, it will
> print a valid ISO-8859-1 character on output.

I was just commenting on what Ian said. If there is a valid reason why
cat may not produce UTF-8 in UTF-8 locales, the same reason also
applies to perl or any other software.

-- 
Vincent Lefèvre vinc...@vinc17.net - Web: http://www.vinc17.net/
100% accessible validated (X)HTML - Blog: http://www.vinc17.net/blog/
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


Archive: http://lists.debian.org/20110216004529.gn15...@prunille.vinc17.org



OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Klaus Ethgen

Hi,

let's start a Python rant. I love to hate this language. :-)

On Mon, 14 Feb 2011 at 14:14, Jakub Wilk wrote:
> $ LC_CTYPE=en_GB.utf-8 python -c 'print u"\u00a3"'
> <unicode pound sign>
[...]
> $ LC_CTYPE=en_GB.utf-8 python -c 'print u"\u00a3"' | cat
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
> position 0: ordinal not in range(128)
>
> This is the expected behaviour. Incidentally, it has nothing to do
> with UTF-8. You'll get the same result if you use a locale with a
> legacy encoding.

I see. It is funny to see Python lovers blame others for the bugs in
the language.

~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";'
~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | cat

Both give the same result, a '£' sign, as expected.
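
The traceback quoted above comes down to which codec is used when text is turned into bytes. A small Python 3 sketch (the thread itself concerns Python 2, and `encode_pound` is a name made up here) encodes the same pound sign with three codecs:

```python
def encode_pound(codec: str) -> bytes:
    """Encode U+00A3 (pound sign) with the given codec."""
    return "\u00a3".encode(codec)

for codec in ("utf-8", "latin-1", "ascii"):
    try:
        print(codec, encode_pound(codec).hex())
    except UnicodeEncodeError as exc:
        # ASCII has no representation for code points above 127.
        print(codec, "failed:", exc.reason)
```

UTF-8 and Latin-1 both succeed but produce different bytes; only the ASCII codec raises, which matches the error message in the quoted traceback.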

> * Ian Jackson ijack...@chiark.greenend.org.uk, 2011-02-14, 12:42:
> > Excellent, I look forward to the removal of python.  I always
> > hated that language anyway.

I hate them more. :-)

Regards
   Klaus
-- 
Klaus Ethgen                            http://www.ethgen.ch/
pub  2048R/D1A4EDE5 2000-02-26 Klaus Ethgen kl...@ethgen.de
Fingerprint: D7 67 71 C4 99 A6 D4 FE  EA 40 30 57 3C 88 26 2B


Archive: http://lists.debian.org/20110214133736.gb6...@ikki.ethgen.ch



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Philipp Kern
On 2011-02-14, Klaus Ethgen kl...@ethgen.de wrote:
> ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";'
> ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | cat
> Both give the same result, a '£' sign, as expected.

And what's the value in that demonstration?  Yes, you can treat UTF-8 like a
bytestream.  And the thread was about the problems that can arise from this.

Kind regards
Philipp Kern


Archive: http://lists.debian.org/slrnilidf3.11r.tr...@kelgar.0x539.de



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Lars Wirzenius
On Mon, 2011-02-14 at 14:37 +0100, Klaus Ethgen wrote:
> let's start a python rant. I love to hate this language. :-)

Let's not.

Let's not rant about any languages, or tools, or desktop environments.
Let's be constructive on Debian mailing lists, shall we?

We have plenty of side-channels for rants, sarcasm, snide remarks,
passive-aggressiveness, and other forms of anti-social behavior; let's
use those instead.



Archive: http://lists.debian.org/1297692931.31960.13.camel@tacticus



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Jakub Wilk

* Klaus Ethgen kl...@ethgen.de, 2011-02-14, 14:37:
> ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";'
> ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | cat

Let me try...

$ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | isutf8
stdin: line 1, char 1, byte offset 1: invalid UTF-8 code


But I don't blame Perl for that. It's documented behavior, so I can 
either live with that or use another language.
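
What a checker such as isutf8 reports can be approximated in Python 3 by attempting a strict decode. This is only a sketch; `is_valid_utf8` is a hypothetical helper, not the isutf8 tool itself.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")   # decoding is strict by default
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(b"\xa3\n"))       # a lone 0xA3 byte (Latin-1 pound sign)
print(is_valid_utf8(b"\xc2\xa3\n"))   # the two-byte UTF-8 form of U+00A3
```

A lone 0xA3 is a continuation byte with no lead byte in front of it, so a strict UTF-8 decoder rejects it at the first byte.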


--
Jakub Wilk


Archive: http://lists.debian.org/20110214143302.ga6...@jwilk.net



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Klaus Ethgen

On Mon, 14 Feb 2011 at 15:15, Lars Wirzenius wrote:
> On Mon, 2011-02-14 at 14:37 +0100, Klaus Ethgen wrote:
> > let's start a python rant. I love to hate this language. :-)
>
> Let's not.

Up to here, that is personal preference.

> Let's not rant about any languages, or tools, or desktop environments.
> Let's be constructive on Debian mailing lists, shall we?

You are right. I just couldn't resist when someone was trying to blame
everything other than the one that has the bug.

Regards
   Klaus
-- 
Klaus Ethgen                            http://www.ethgen.ch/
pub  2048R/D1A4EDE5 2000-02-26 Klaus Ethgen kl...@ethgen.de
Fingerprint: D7 67 71 C4 99 A6 D4 FE  EA 40 30 57 3C 88 26 2B


Archive: http://lists.debian.org/20110214143445.gd6...@ikki.ethgen.ch



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Adam Borowski
On Mon, Feb 14, 2011 at 02:02:11PM +0000, Philipp Kern wrote:
> On 2011-02-14, Klaus Ethgen kl...@ethgen.de wrote:
> > ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";'
> > ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | cat
> > Both give the same result, a '£' sign, as expected.
>
> And what's the value in that demonstration?  Yes, you can treat UTF-8 like a
> bytestream.  And the thread was about the problems that can arise from this.

Er, and tell me where exactly it makes sense to allow one encoding but not
another for a bytestream?

It appears that Python has a nasty bug where it ignores the locale's encoding
if isatty() on stdout returns 0.  So let's fix or report that rather than
arguing about it.
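
One way to picture the behaviour being asked for: if the output stream reports no encoding (as appears to be the case for a pipe here), fall back to the locale's encoding rather than to ASCII. A hedged Python 3 sketch; `pick_output_encoding` is a hypothetical helper, not how any Python release implements this.

```python
import locale

def pick_output_encoding(stream_encoding, locale_encoding=None):
    """Prefer the stream's own encoding, else the locale's, else ASCII."""
    if locale_encoding is None:
        # Encoding reported for the current locale of this process.
        locale_encoding = locale.getpreferredencoding(False)
    return stream_encoding or locale_encoding or "ascii"

# A stream without a known encoding is modelled here as None.
chosen = pick_output_encoding(None, "UTF-8")
print(chosen)
print("\u00a3".encode(chosen).hex())
```

With this rule the bytes written to a pipe and to a terminal would be the same whenever the locale names an encoding.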

-- 
1KB // Microsoft corollary to Hanlon's razor:
//  Never attribute to stupidity what can be
//  adequately explained by malice.


Archive: http://lists.debian.org/20110214143608.ga8...@angband.pl



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Ian Jackson
Jakub Wilk writes ("Re: OT: Python (was: Make Unicode bugs release critical?)"):
> * Klaus Ethgen kl...@ethgen.de, 2011-02-14, 14:37:
> > ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";'
> > ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | cat
>
> Let me try...
>
> $ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | isutf8
> stdin: line 1, char 1, byte offset 1: invalid UTF-8 code

WTF.  OK, Perl's out too.

We'll have to write everything in dash :-).

Ian.


Archive: http://lists.debian.org/19801.18743.486394.290...@chiark.greenend.org.uk



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Klaus Ethgen

On Mon, 14 Feb 2011 at 16:24, Ian Jackson wrote:
> Jakub Wilk writes ("Re: OT: Python (was: Make Unicode bugs release
> critical?)"):
> > * Klaus Ethgen kl...@ethgen.de, 2011-02-14, 14:37:
> > > ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";'
> > > ~ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | cat
> >
> > Let me try...
> >
> > $ LC_CTYPE=en_GB.utf-8 perl -e 'print "\x{00a3}\n";' | isutf8
> > stdin: line 1, char 1, byte offset 1: invalid UTF-8 code
>
> WTF.  OK, Perl's out too.

No, it is not. 00a3 is just not a UTF-8 character, it is Unicode. To get
a correct UTF-8 character you need to print \x{c2a3}, and then isutf8 is
happy.
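
The distinction at stake here is between a code point and its encoded bytes. A short Python 3 sketch (an illustration added for clarity, not a statement about Perl's behaviour): U+00A3 is the pound sign, its UTF-8 encoding is the two bytes C2 A3, and the code point U+C2A3 is a different character altogether.

```python
pound = "\u00a3"

print(hex(ord(pound)))                        # the code point
print(pound.encode("utf-8").hex())            # its UTF-8 byte sequence
print(b"\xc2\xa3".decode("utf-8") == pound)   # the bytes C2 A3 decode back to it

# U+C2A3 is a separate code point whose UTF-8 form is three bytes long.
other = chr(0xC2A3)
print(other == pound, other.encode("utf-8").hex())
```

So "c2a3" names the pound sign only when read as a pair of bytes; read as a single code point it refers to something else, even though its encoding is also valid UTF-8.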

> We'll have to write everything in dash :-).

lisp. :-)

But now we are getting completely off topic.

Regards
   Klaus
-- 
Klaus Ethgen                            http://www.ethgen.ch/
pub  2048R/D1A4EDE5 2000-02-26 Klaus Ethgen kl...@ethgen.de
Fingerprint: D7 67 71 C4 99 A6 D4 FE  EA 40 30 57 3C 88 26 2B


Archive: http://lists.debian.org/20110214162139.gf6...@ikki.ethgen.ch



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Ian Jackson
Klaus Ethgen writes ("Re: OT: Python (was: Make Unicode bugs release
critical?)"):
> No, it is not. 00a3 is just not a UTF-8 character, it is Unicode. To get
> a correct UTF-8 character you need to print \x{c2a3}, and then isutf8 is
> happy.

When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
characters to stdout should use UTF-8.  That's what LC_CTYPE means.

Ian.


Archive: http://lists.debian.org/19801.23455.536473.211...@chiark.greenend.org.uk



Re: OT: Python (was: Make Unicode bugs release critical?)

2011-02-14 Thread Konstantin Khomoutov
On Mon, 14 Feb 2011 16:43:11 +0000
Ian Jackson ijack...@chiark.greenend.org.uk wrote:

> Klaus Ethgen writes ("Re: OT: Python (was: Make Unicode bugs release
> critical?)"):
> > No, it is not. 00a3 is just not a UTF-8 character, it is Unicode.
> > To get a correct UTF-8 character you need to print \x{c2a3}, and
> > then isutf8 is happy.
>
> When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> characters to stdout should use UTF-8.  That's what LC_CTYPE means.

By the way,

$ echo 'puts \x00a3\n' | LC_CTYPE=en_GB.utf-8 tclsh | isutf8
$
$ echo 'puts \x00a3\n' | LC_CTYPE=en_GB.utf-8 tclsh | xxd -p
c2a30a0a
$
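
The xxd output above can be read back as follows; a Python 3 sketch, assuming the hex string is exactly what was printed:

```python
raw = bytes.fromhex("c2a30a0a")
text = raw.decode("utf-8")

print(repr(text))
print([hex(ord(ch)) for ch in text])
```

The bytes C2 A3 decode to U+00A3, followed by two newlines: one from the `\n` in the script and one that `puts` appends on its own.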

But RMS told the world not to use Tcl.


Archive: http://lists.debian.org/20110214203601.715df57c.kos...@domain007.com