Re: [Haskell-cafe] getting crazy with character encoding

2007-09-13 Thread John Meacham
On Wed, Sep 12, 2007 at 05:19:22PM +0200, Andrea Rossato wrote: And so it's my job to convert it into what I need. Luckily I've just discovered (and now I'm reading) some of John Meacham's code on locale. This is going to be very helpful (unfortunately I don't see Licenses coming with HsLocale,

[Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Andrea Rossato
Hi, suppose that, on a Linux system in a UTF-8 locale, you create a file with non-ASCII characters. For instance: touch abèèè Now, I would expect that the output of a shell command such as ls ab* would be a string/list of 5 chars. Instead I find it to be a list of 8 chars...;-) That is to

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Brandon S. Allbery KF8NH
On Sep 12, 2007, at 10:18 , Andrea Rossato wrote: suppose that, on a Linux system in a UTF-8 locale, you create a file with non-ASCII characters. For instance: touch abèèè Now, I would expect that the output of a shell command such as ls ab* would be a string/list of 5 chars. Instead I

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Seth Gordon
Andrea Rossato wrote: Hi, suppose that, on a Linux system in a UTF-8 locale, you create a file with non-ASCII characters. For instance: touch abèèè Now, I would expect that the output of a shell command such as ls ab* would be a string/list of 5 chars. Instead I find it to be a list of 8

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Andrea Rossato
On Wed, Sep 12, 2007 at 10:53:29AM -0400, Brandon S. Allbery KF8NH wrote: That is expected. The low level filesystem storage doesn't know about character sets, so non-ASCII filenames must be encoded in e.g. UTF-8. 8 characters is therefore correct, and you must do UTF-8 decoding on input
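As a rough illustration of the decoding step described above, here is a minimal sketch of a UTF-8 decoder that uses only the base library. The names decodeUtf8 and continue are made up for this example and are not from any package mentioned in the thread. It does not reject overlong encodings or surrogates, so it is a teaching aid under those assumptions, not a conforming decoder.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- | Decode a list of UTF-8 bytes into Unicode characters.
-- Hypothetical helper for illustration: malformed or truncated
-- sequences are replaced by U+FFFD instead of raising an error.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80           = chr (fromIntegral b) : decodeUtf8 bs      -- ASCII byte
  | b .&. 0xE0 == 0xC0 = continue 1 (fromIntegral b .&. 0x1F) bs   -- 2-byte sequence
  | b .&. 0xF0 == 0xE0 = continue 2 (fromIntegral b .&. 0x0F) bs   -- 3-byte sequence
  | b .&. 0xF8 == 0xF0 = continue 3 (fromIntegral b .&. 0x07) bs   -- 4-byte sequence
  | otherwise          = '\xFFFD' : decodeUtf8 bs                  -- stray byte

-- | Consume n continuation bytes (10xxxxxx), folding six payload
-- bits from each into the accumulator started by the lead byte.
continue :: Int -> Int -> [Word8] -> String
continue n acc bs
  | length conts == n && all isCont conts && code <= 0x10FFFF
      = chr code : decodeUtf8 rest
  | otherwise
      = '\xFFFD' : decodeUtf8 bs   -- drop the lead byte and resynchronise
  where
    (conts, rest) = splitAt n bs
    isCont c      = c .&. 0xC0 == 0x80
    code          = foldl (\a c -> (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) acc conts
```

Loaded into GHCi, length (decodeUtf8 [97,98,195,168,195,168,195,168]) evaluates to 5: each (195, 168) pair collapses into the single character U+00E8 (è), which is the count Andrea originally expected for the file name.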

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Dougal Stanton
On 12/09/2007, Seth Gordon wrote: I � Unicode. Was it intentional that the central character appears as a little '?', even though the aleph on the line above worked? Either way it would be very amusing, but for different reasons... D

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Andrea Rossato
On Wed, Sep 12, 2007 at 11:16:25AM -0400, Seth Gordon wrote: It appears that in spite of the locale definition, hGetContents is treating each byte as a separate character without translating the multi-byte sequences *from* UTF-8, and then putStrLn sends each of those bytes to standard

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Seth Gordon
Andrea Rossato wrote: What puzzles me is the behavior of putStrLn. putStrLn is sending the following bytes to standard output: 97, 98, 195, 168, 195, 168, 195, 168, 10 Since the code that renders characters in your terminal emulator is expecting UTF-8[*], each (195, 168) pair of bytes is
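The round trip discussed here can be modelled in a few lines. This is only a sketch of the behaviour the thread attributes to hGetContents and putStrLn at the time (one Char per byte in, one byte per Char out); the names rawBytes, bytesAsChars and charsAsBytes are invented for the example and are not library functions.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- The byte sequence quoted in the message above: "abèèè" in UTF-8
-- followed by a newline.
rawBytes :: [Word8]
rawBytes = [97, 98, 195, 168, 195, 168, 195, 168, 10]

-- Assumed model of byte-oriented input: every byte becomes its own
-- Char, with no UTF-8 decoding.
bytesAsChars :: [Word8] -> String
bytesAsChars = map (chr . fromIntegral)

-- Assumed model of byte-oriented output: every Char is written as one
-- byte (fromIntegral keeps only the low 8 bits of the code point).
charsAsBytes :: String -> [Word8]
charsAsBytes = map (fromIntegral . ord)
```

Under this model, length (bytesAsChars (init rawBytes)) is 8 rather than 5, yet charsAsBytes (bytesAsChars rawBytes) gives back rawBytes unchanged. That would explain why the terminal still shows the right glyphs: the program never interprets the bytes, and the terminal does the UTF-8 decoding on its own.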

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Seth Gordon
Dougal Stanton wrote: On 12/09/2007, Seth Gordon wrote: I � Unicode. Was it intentional that the central character appears as a little '?', even though the aleph on the line above worked? It was intentional. If I ♡ed Unicode, I would have said so.

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Andrea Rossato
On Wed, Sep 12, 2007 at 11:40:11AM -0400, Seth Gordon wrote: The Unix utility od can be very helpful in figuring out problems like this. Thanks for pointing me to od, I didn't know it. [*]At least on my computer, I get the same result *even if* I change LANG from en_US.utf8 to C. As
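For readers who would rather inspect the bytes from Haskell than from od, the following sketch prints the decimal value of every byte on standard input. dumpBytes and dumpStdin are hypothetical names chosen for this example; hSetBinaryMode and getContents are the standard System.IO and Prelude functions, and binary mode is set explicitly so that each Char read corresponds to exactly one byte.

```haskell
import Data.Char (ord)
import System.IO (hSetBinaryMode, stdin)

-- Render a string as space-separated decimal code points. When the
-- string was read from a binary-mode handle, these are byte values.
dumpBytes :: String -> String
dumpBytes = unwords . map (show . ord)

-- Read all of standard input as raw bytes and print their values.
dumpStdin :: IO ()
dumpStdin = do
  hSetBinaryMode stdin True   -- no newline translation, no decoding
  s <- getContents
  putStrLn (dumpBytes s)
```

Adding main = dumpStdin turns this into a small filter, so the output of ls ab* can be piped into it to see how many bytes each è occupies in the file name.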

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread David Benbennick
On 9/12/07, Andrea Rossato wrote: If I run it in a console I get abAAA (sort of) no matter what my LANG is - 8 single 8-bit characters. It's possible to set your Linux console to grok UTF8. I don't remember the details, but I'm sure you can Google for it. By the way, does

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Jules Bean
David Benbennick wrote: On 9/12/07, Andrea Rossato wrote: If I run it in a console I get abAAA (sort of) no matter what my LANG is - 8 single 8-bit characters. It's possible to set your Linux console to grok UTF8. I don't remember the details, but I'm sure you can Google

Re: [Haskell-cafe] getting crazy with character encoding

2007-09-12 Thread Don Stewart
mailing_list: On Wed, Sep 12, 2007 at 11:16:25AM -0400, Seth Gordon wrote: It appears that in spite of the locale definition, hGetContents is treating each byte as a separate character without translating the multi-byte sequences *from* UTF-8, and then putStrLn sends each of those