On Wed, Sep 12, 2007 at 05:19:22PM +0200, Andrea Rossato wrote:
And so it's my job to convert it into what I need. Luckily I've just
discovered (and now I'm reading) some of John Meacham's code on
locale. This is going to be very helpful (unfortunately I don't see
Licenses coming with HsLocale,
Hi,
suppose that, on a Linux system, in a UTF-8 locale, you create a file
with non-ASCII characters. For instance:
touch abèèè
Now, I would expect that the output of a shell command such as
ls ab*
would be a string/list of 5 chars. Instead I find it to be a list of 8
chars...;-)
That is to
On Sep 12, 2007, at 10:18 , Andrea Rossato wrote:
suppose that, on a Linux system, in a UTF-8 locale, you create a file
with non-ASCII characters. For instance:
touch abèèè
Now, I would expect that the output of a shell command such as
ls ab*
would be a string/list of 5 chars. Instead I
Andrea Rossato wrote:
Hi,
suppose that, on a Linux system, in a UTF-8 locale, you create a file
with non-ASCII characters. For instance:
touch abèèè
Now, I would expect that the output of a shell command such as
ls ab*
would be a string/list of 5 chars. Instead I find it to be a list of 8
On Wed, Sep 12, 2007 at 10:53:29AM -0400, Brandon S. Allbery KF8NH wrote:
That is expected. The low level filesystem storage doesn't know about
character sets, so non-ASCII filenames must be encoded in e.g. UTF-8. 8
characters is therefore correct, and you must do UTF-8 decoding on input
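
A minimal sketch of the decoding step described above, assuming the file name arrives as raw bytes. The function name decodeUtf8 is hypothetical and uses only the base library; it is not the list's recommended API. It replaces malformed sequences with U+FFFD and does not reject overlong encodings.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode a list of UTF-8 bytes into a String.
-- Malformed or truncated sequences become U+FFFD. Overlong encodings
-- are NOT rejected, so this is a sketch rather than a strict decoder.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80              = chr (fromIntegral b) : decodeUtf8 bs
  | b >= 0xC0 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F)
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F)
  | b >= 0xF0 && b < 0xF8 = multi 3 (fromIntegral b .&. 0x07)
  | otherwise             = '\xFFFD' : decodeUtf8 bs
  where
    -- Consume n continuation bytes and combine them with the lead bits.
    multi :: Int -> Int -> String
    multi n lead =
      case splitAt n bs of
        (cont, rest)
          | length cont == n && all isCont cont ->
              let v = foldl step lead cont
              in (if v <= 0x10FFFF then chr v else '\xFFFD') : decodeUtf8 rest
        _ -> '\xFFFD' : decodeUtf8 bs
    -- Continuation bytes have the bit pattern 10xxxxxx.
    isCont c = c .&. 0xC0 == 0x80
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)

main :: IO ()
main = do
  -- The bytes of the file name from the thread, without the newline.
  let bytes = [97, 98, 195, 168, 195, 168, 195, 168] :: [Word8]
      name  = decodeUtf8 bytes
  print (length bytes)  -- 8 bytes on disk
  print (length name)   -- 5 characters after decoding
```

The two lengths correspond to the 8 versus 5 discrepancy raised in the original question.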
On 12/09/2007, Seth Gordon wrote:
I � Unicode.
Was it intentional that the central character appears as a little '?',
even though the aleph on the line above worked? Either way it would be
very amusing, but for different reasons...
D
On Wed, Sep 12, 2007 at 11:16:25AM -0400, Seth Gordon wrote:
It appears that in spite of the locale definition, hGetContents is treating
each byte as a separate character without translating the multi-byte
sequences *from* UTF-8, and then putStrLn sends each of those bytes to
standard
Andrea Rossato wrote:
What puzzles me is the behavior of putStrLn.
putStrLn is sending the following bytes to standard output:
97, 98, 195, 168, 195, 168, 195, 168, 10
Since the code that renders characters in your terminal emulator is
expecting UTF-8[*], each (195, 168) pair of bytes is
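
To see where the byte sequence quoted above comes from, here is a small sketch of the encoding direction. The names encodeChar and encodeUtf8 are hypothetical helpers written for illustration with the base library only; they are not what putStrLn does internally.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits that go into the first byte.
    hi s = fromIntegral (n `shiftR` s)
    -- Six payload bits wrapped in the 10xxxxxx continuation pattern.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar

main :: IO ()
main =
  -- "\232" is U+00E8 (e with grave accent); the trailing "\n" is the
  -- newline that putStrLn appends.
  print (encodeUtf8 "ab\232\232\232\n")
  -- prints [97,98,195,168,195,168,195,168,10]
```

Each U+00E8 becomes the pair (195, 168), which is why a terminal that expects UTF-8 can render the pair as a single character.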
Dougal Stanton wrote:
On 12/09/2007, Seth Gordon wrote:
I � Unicode.
Was it intentional that the central character appears as a little '?',
even though the aleph on the line above worked?
It was intentional. If I ♡ed Unicode, I would have said so.
On Wed, Sep 12, 2007 at 11:40:11AM -0400, Seth Gordon wrote:
The Unix utility od can be very helpful in figuring out problems like
this.
Thanks for pointing me to od, I didn't know it.
[*]At least on my computer, I get the same result *even if* I change LANG
from en_US.utf8 to C.
As
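
For readers who want to inspect bytes from Haskell rather than with od, the following is a rough sketch, assuming input on standard input. The name byteValues is hypothetical; hSetBinaryMode is from System.IO and makes each Char read from the handle correspond to exactly one byte.

```haskell
import Data.Char (ord)
import System.IO (hSetBinaryMode, stdin)

-- Hypothetical helper: the numeric value of each Char. With stdin in
-- binary mode, every Char read from it holds one byte (0 to 255).
byteValues :: String -> [Int]
byteValues = map ord

main :: IO ()
main = do
  -- Turn off any text decoding so that bytes pass through unchanged.
  hSetBinaryMode stdin True
  input <- getContents
  putStrLn (unwords (map show (byteValues input)))
```

Saved under a hypothetical file name such as Bytes.hs, it could be run as `ls ab* | runghc Bytes.hs` to list the decimal value of every byte that ls emits, much like a decimal od dump without the offsets.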
On 9/12/07, Andrea Rossato wrote:
If I run it in a console I get
abAAA (sort of) no matter what my LANG is: 8 single 8-bit
characters.
It's possible to set your Linux console to grok UTF8. I don't
remember the details, but I'm sure you can Google for it.
By the way, does
David Benbennick wrote:
On 9/12/07, Andrea Rossato wrote:
If I run it in a console I get
abAAA (sort of) no matter what my LANG is: 8 single 8-bit
characters.
It's possible to set your Linux console to grok UTF8. I don't
remember the details, but I'm sure you can Google
On Wed, Sep 12, 2007 at 11:16:25AM -0400, Seth Gordon wrote:
It appears that in spite of the locale definition, hGetContents is treating
each byte as a separate character without translating the multi-byte
sequences *from* UTF-8, and then putStrLn sends each of those