RE: Text in Haskell: a second proposal

2002-08-10 Thread Ashley Yakeley
At 2002-08-09 03:26, Simon Marlow wrote: >Why combine I/O and {en,de}coding? Firstly, efficiency. Hmm... surely the encoding functions can be defined efficiently? decodeISO88591 :: [Word8] -> [Char]; encodeISO88591 :: [Char] -> [Word8]; -- uses low octet of codepoint You could surely
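
A minimal sketch of the two signatures quoted in the message above, assuming plain lists and the "low octet of codepoint" behaviour its comment describes. The function bodies are illustrative and are not taken from the thread.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- ISO 8859-1 maps each octet to the Unicode code point with the same number,
-- so decoding is a direct conversion.
decodeISO88591 :: [Word8] -> [Char]
decodeISO88591 = map (chr . fromIntegral)

-- Keeps only the low octet of each code point, as the quoted comment says.
-- Code points above U+00FF are silently truncated (an assumption of this
-- sketch; a stricter encoder could reject or substitute them instead).
encodeISO88591 :: [Char] -> [Word8]
encodeISO88591 = map (fromIntegral . ord)
```

Both functions are lazy list maps, so they can be composed with octet-level I/O without forcing the whole input.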

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
Thu, 08 Aug 2002 19:28:18 +1000 (EST), Manuel M T Chakravarty <[EMAIL PROTECTED]> wrote: > ANSI C guarantees that char is 1 byte (more precisely that > "sizeof (char)" == 1). It says that sizeof (char) == 1 but doesn't say that it means 8 bits. sizeof is measured in chars, whatever it is. But l

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
Thu, 8 Aug 2002 09:59:12 -0700 (PDT), anatoli <[EMAIL PROTECTED]> wrote: > I'd still rather associate locale with a handle. I agree. http://www.sf.net/projects/qforeign/ contains an experimental character recoding library with an IO module wrapper which associates encodings with Handles. But I do

Re: UTF-8 library

2002-08-10 Thread Ashley Yakeley
At 2002-08-10 01:21, Marcin 'Qrczak' Kowalczyk wrote: >Perhaps we can assume some widely true facts even if ANSI C doesn't >guarantee that if it makes life easier. For example that a C type >corresponding to Int32 exists at all, and that different pointer >types have the same representation - we

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
09 Aug 2002 10:17:21 +0200, Sven Moritz Hallberg <[EMAIL PROTECTED]> wrote: > I argue _strongly_ against associating some sort of locale state with > handles. > > 1) In agreement with Ashley's statements, file IO should use octets, > because that's what's in a file. So it would imply two types

Re: UTF-8 library

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
Sat, 10 Aug 2002 01:31:51 -0700, Ashley Yakeley <[EMAIL PROTECTED]> wrote: >>that different pointer >>types have the same representation - we already rely on that, don't we? > > No, we have separate Ptrs and FunctionPtrs IIRC... Yes, but I mean the possibility that Ptr Word8 looks differently t

Re: Text in Haskell: A PROPOSAL

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
Thu, 8 Aug 2002 03:16:09 -0700, Ashley Yakeley <[EMAIL PROTECTED]> wrote: >> With, perhaps, UTF-8 as a reasonable default? > > Perhaps it should _always_ be UTF-8? My files are not in UTF-8, so reading them as UTF-8 is wrong. Files are in the locale encoding unless the programmer explicitly sp

Re: Text in Haskell: a second proposal

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
Thu, 8 Aug 2002 23:40:42 -0700, Ashley Yakeley <[EMAIL PROTECTED]> wrote: >> 1. Octets. >> 2. C "char". >> 3. Unicode code points. >> 4. Unicode code values, useful only for UTF-16, which is seldom used. >> 5. "What handles handle". > I disagree, they should be: > > 1. Word8 > 2. CChar > 3. Cha

Re: UTF-8 library

2002-08-10 Thread anatoli
--- Sven Moritz Hallberg <[EMAIL PROTECTED]> wrote: > I argue _strongly_ against associating some sort of locale state with > handles. > > 1) In agreement with Ashley's statements, file IO should use octets, > because that's what's in a file. By the same token, we should handle CR/LF/CR-LF/LF-CR

Re: UTF-8 library

2002-08-10 Thread Ashley Yakeley
At 2002-08-10 03:03, anatoli wrote: >--- Sven Moritz Hallberg <[EMAIL PROTECTED]> wrote: >> I argue _strongly_ against associating some sort of locale state with >> handles. >> >> 1) In agreement with Ashley's statements, file IO should use octets, >> because that's what's in a file. > >By the s

Re: Text in Haskell: a second proposal

2002-08-10 Thread Wolfgang Jeltsch
On Friday, 2002-08-09, 08:40, CEST, Ashley Yakeley wrote: > At 2002-08-08 23:10, Ken Shan wrote: > > > 1. Octets. > > 2. C "char". > > 3. Unicode code points. > > 4. Unicode code values, useful only for UTF-16, which is seldom used. > > 5. "What handles handle". > ... > >I suggest that the follow

Re: UTF-8 library

2002-08-10 Thread anatoli
--- Ashley Yakeley <[EMAIL PROTECTED]> wrote: > >By the same token, we should handle CR/LF/CR-LF/LF-CR mess by hand. > >(Files don't have lines in them, they are just sequences of octets.) > > Correct. Exactly what kind of newline do you want in your file? The correct answer depends on the leve
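
To make the "by hand" option concrete, here is a hedged sketch of normalising the four newline conventions named above (CR, LF, CR-LF, LF-CR) on a raw octet stream. The name `normalizeNewlines` and the greedy left-to-right pairing rule are assumptions of this sketch, not something proposed in the thread.

```haskell
import Data.Word (Word8)

-- Rewrite CR-LF, LF-CR, lone CR and lone LF all to a single LF (octet 10).
-- Two-octet sequences are matched greedily from the left, so an ambiguous
-- run such as LF CR LF is read as LF-CR followed by LF.
normalizeNewlines :: [Word8] -> [Word8]
normalizeNewlines (13 : 10 : rest) = 10 : normalizeNewlines rest  -- CR-LF
normalizeNewlines (10 : 13 : rest) = 10 : normalizeNewlines rest  -- LF-CR
normalizeNewlines (13 : rest)      = 10 : normalizeNewlines rest  -- lone CR
normalizeNewlines (b : rest)       = b  : normalizeNewlines rest  -- LF and everything else
normalizeNewlines []               = []
```

The ambiguity noted in the comment is one reason the choice of layer matters: at the octet level there is no way to tell which convention the writer of the file intended.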

Re: UTF-8 library

2002-08-10 Thread Sven Moritz Hallberg
On Sat, 2002-08-10 at 12:03, anatoli wrote: > --- Sven Moritz Hallberg <[EMAIL PROTECTED]> wrote: > > I argue _strongly_ against associating some sort of locale state with > > handles. > > > > 1) In agreement with Ashley's statements, file IO should use octets, > > because that's what's in a file

Re: Yet more text pedantry

2002-08-10 Thread Marcin 'Qrczak' Kowalczyk
Fri, 09 Aug 2002 15:24:55 +0200, George Russell <[EMAIL PROTECTED]> wrote: > but the fact is that the standard access functions return > characters*, and on Solaris the default representation of > a characters is as a signed quantity. Only because of a messy history. No need to transfer the sill

Re: UTF-8 library

2002-08-10 Thread David Feuer
On Sat, 10 Aug 2002, Ashley Yakeley wrote: > One of the things that really bothers me about C is the way its > unspecifiedness about types can "infect" other languages. For instance, > what exactly is a Haskell Int? I think it's the idea that's infectious, because it is a good idea. The C stand

Re: UTF-8 library

2002-08-10 Thread anatoli
[apologies if you see multiple copies; I forgot to Cc: the list the first time around.] --- Sven Moritz Hallberg <[EMAIL PROTECTED]> wrote: > [...] I think that it's > ugly, though, to do it somewhere outside, pretending the issue's not > there. I value about Haskell it's clean representation of

Re: UTF-8 library

2002-08-10 Thread Joe English
Ashley Yakeley wrote: > > One of the things that really bothers me about C is the way its > unspecifiedness about types can "infect" other languages. For instance, > what exactly is a Haskell Int? > > Java, at least, stands firm, but then platform-independence was one of > Java's explicit design

Re: UTF-8 library

2002-08-10 Thread Lennart Augustsson
Joe English wrote: >Java attempts platform independence by declaring that all >the world *is*, in fact, a VAX [*]. > > >[*] More precisely, a 32-bit platform with IEEE 754 floating point. > And the original VAX did in fact not have IEEE floating point. :-) -- Lennart

Re: UTF-8 library

2002-08-10 Thread Manuel M T Chakravarty
"Marcin 'Qrczak' Kowalczyk" <[EMAIL PROTECTED]> wrote, > Thu, 08 Aug 2002 19:28:18 +1000 (EST), Manuel M T Chakravarty <[EMAIL PROTECTED]> >pisze: > > > ANSI C guarantees that char is 1 byte (more precisely that > > "sizeof (char)" == 1). > > It says that sizeof (char) == 1 but doesn't say tha