Re: Why are strings linked lists?

2003-11-29 Thread Ashley Yakeley
In article <[EMAIL PROTECTED]>,
 Glynn Clements <[EMAIL PROTECTED]> wrote:

> OK; by "Char is 4 bytes" I basically meant that it's "large enough".

Char is exactly the correct size. The Eq, Ord and Enum instances all 
work correctly. The fact that you cannot represent values outside the 
range is important IMO.

> 1. Where would you get a Char from?
> 2. Where would you put it?

You can convert to and from the codepoint number using toEnum and 
fromEnum. What is missing is UTF-8 and Latin-1 charset conversions, and 
character properties. You can find draft standard library code for these 
here:


> BTW, I agree that the IO functions *should* use Word8.

Right.

> And I really
> wouldn't be that bothered if the standard was changed to just use
> "type Char = Word8". Actually, I would prefer that to the current
> fiction.

No! In GHC, a Char represents a Unicode codepoint: nothing more, and 
nothing less. This is something that probably ought to become part of 
some later Haskell standard. Frankly I find the idea that the character 
'A' is somehow identical to the number 65, or octet value 65, to be 
completely bizarre, and Haskell does well to give them separate types.
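
As a rough illustration of the conversion and of the type separation described above (a sketch only; the helper names codepoint and fromCodepoint are invented here and are not part of any library):

```haskell
module Main (main) where

import Data.Char (chr, ord)

-- Hypothetical helper names, chosen for this sketch.
codepoint :: Char -> Int
codepoint = fromEnum            -- same result as ord

fromCodepoint :: Int -> Char
fromCodepoint = toEnum          -- same result as chr; fails outside the Char range

main :: IO ()
main = do
  print (codepoint 'A')                      -- 65
  print (fromCodepoint 65)                   -- 'A'
  print (ord 'A' == codepoint 'A')           -- True
  print (chr 65 == fromCodepoint 65)         -- True
  -- print ('A' == (65 :: Int)) is rejected by the type checker:
  -- Char and Int are different types, so the conversion must be explicit.
```

The point of the sketch is only that the number 65 and the character 'A' meet through an explicit function call, never implicitly.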

The problem is that certain IO functions do implicit Latin-1 conversion.

> The IO problems are design bugs, and can't truly be fixed without
> breaking a lot of existing code.

Well that's what deprecation is for. New Word8-based functions would 
have new names. Every so often there's a burst of activity on the 
Libraries or the Internationalisation lists concerning this, but it 
never quite comes together somehow.

-- 
Ashley Yakeley, Seattle WA

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Mark Carroll
On Sat, 29 Nov 2003 [EMAIL PROTECTED] wrote:
(snip)
> Interesting that you mention this.  I've also been thinking about this
> lately in the context of the discussion on collections and the left-fold
> combinator both here and on LtU.  When people say "I want String to be
> [Char]", what I'm actually hearing is "I want String to be a collection
> of Char".  I may be mishearing.

It did strike me that it would be interesting if you could make various
things instances of a List sort of class and then take, reverse, etc.
would work on them. How this relates to your comment, I'm not sure.
Things like map, of course, could work on unordered bags of things too,
but I suppose that's what Functors are for.

-- Mark


Re: Why are strings linked lists?

2003-11-29 Thread Glynn Clements

Ashley Yakeley wrote:

> > Simply claiming that values of type Char are Unicode characters
> > doesn't make it so.
> 
> Actually, that's exactly what makes it so.

Hmm. I suppose that there's some validity to that perspective. OTOH,
it's one thing to state that it's true, but that's rather hollow if
nothing actually behaves as if it is.

It's a bit like saying "values of type Int are complex numbers; oh,
BTW, the implementation is currently broken".

IOW, if it walks like a duck, ...

> > Unless I'm missing something, the only "support" that GHC provides is
> > that Char is 4 bytes.
> 
> No, on GHC a Char is a Unicode codepoint, which means it has only 
> 17*2^16 possible values. This by itself is the most important aspect of 
> Unicode support.

OK; by "Char is 4 bytes" I basically meant that it's "large enough".

> But most of the rest is missing.

AFAICT, *all*[1] of the rest is missing.

[1] With one rather useless exception: (maxBound :: Char) == 0x10FFFF.
I can't think of any other aspect of GHC's behaviour which would
indicate that Char is meant to be Unicode.

> > If you use Char to store anything other than ISO
> > Latin-1 characters, none of the Haskell functions with Char in their
> > signature will be of any use.
> 
> Actually, many of those functions ought to use Word8 instead.

But then:

1. Where would you get a Char from?
2. Where would you put it?

BTW, I agree that the IO functions *should* use Word8. And I really
wouldn't be that bothered if the standard was changed to just use
"type Char = Word8". Actually, I would prefer that to the current
fiction.

At least the problems with the Char functions are just implementation
bugs; those functions *could* be made to work correctly.

The IO problems are design bugs, and can't truly be fixed without
breaking a lot of existing code. A workaround which preserves backward
compatibility could result in a rather ugly interface: either all of
the relevant functions use a default encoding (which will probably be
the wrong one as often as not), or the "right" functions have to have
their names bastardised because the "wrong" functions have already
stolen the obvious names.

-- 
Glynn Clements <[EMAIL PROTECTED]>


Re: Why are strings linked lists?

2003-11-29 Thread ajb
G'day all.

Quoting Wolfgang Jeltsch <[EMAIL PROTECTED]>:

> I think I have already said the following on this list. I would also like to
> have different character types for different subsets of Char (e.g., ASCII)
> and a class Character which the different character types are instances of.

As a matter of interest, what might some of the methods of this
class be?  ord and chr are two obvious choices.  What else?

Cheers,
Andrew Bromage


Re: Incomplete output when hPutStr is lifted

2003-11-29 Thread andrew cooke

Tomasz Zielonka said:
> On Thu, Nov 27, 2003 at 04:09:00PM -0300, andrew cooke wrote:
[...]
>> If I compile and run the code below, the file "foo" contains 10 lines
>> of output (as I would expect), but the file "bar" contains just 9 -
>> the final line is missing.  If I add a "join", as in the comment, then
>> all 10 lines appear.
>>
>> I don't understand why, completely - my best guess is that it's
>> related to laziness and the final result of the fold not being used.
>
> No, it's not about laziness. The last result (action of type IO Int) is
> not threaded through the IO monad (forgive me the clumsy wording).

Ah.  Thanks.  I understand what you mean in general terms - I'll have to
go back and look at the code to understand in detail.

[...]
> Join will work, but why are you using such a strange code anyway ?

It was a simplified version of more complex code that drives a progress
meter - as the list is consumed IO is performed.  It wasn't designed from
the bottom up to support IO in such a low-level part of the code. The
simplest solution was to have the list be of "IO a" rather than "a" (the
IO coming from the changing state of the progress meter).  Then the fold
over the list values gives code similar to what I posted.
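
The effect Tomasz describes can be reproduced in miniature. The following is only a sketch, not the code originally posted: logLine is a hypothetical stand-in for hPutStr that appends to an in-memory log, so the difference is visible without files.

```haskell
module Main (main) where

import Control.Monad (join)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical stand-in for hPutStr: appends a line to an in-memory log.
logLine :: IORef [String] -> String -> IO ()
logLine ref s = modifyIORef ref (++ [s])

demo :: IO ([String], [String])
demo = do
  ref1 <- newIORef []
  -- fmap lifts logLine over an IO String, giving a value of type IO (IO ()).
  -- Running the outer action only builds the inner one, which is then dropped.
  _ <- fmap (logLine ref1) (return "last line")
  without <- readIORef ref1
  ref2 <- newIORef []
  -- join threads the inner action through the IO monad as well.
  join (fmap (logLine ref2) (return "last line"))
  with <- readIORef ref2
  return (without, with)

main :: IO ()
main = demo >>= print   -- ([],["last line"])
```

Nothing here depends on laziness: the inner action is a perfectly ordinary value that simply never gets run unless something like join sequences it.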

Hope that makes some sense.  I'm still getting to grips with IO - it seems
a lot less complicated than I first thought, but I still don't have much
practice at structuring programs that use it.  So it's possible it's a
rather odd solution.

Cheers,
Andrew

-- 
personal web site: http://www.acooke.org/andrew
personal mail list: http://www.acooke.org/andrew/compute.html


Re: Why are strings linked lists?

2003-11-29 Thread ajb
G'day all.

Quoting John Meacham <[EMAIL PROTECTED]>:

> Something I'd like to see (perhaps a bit less
> drastic) would be a String class, similar to Num so string constants
> would have type
> String a => a

Interesting that you mention this.  I've also been thinking about this
lately in the context of the discussion on collections and the left-fold
combinator both here and on LtU.  When people say "I want String to be
[Char]", what I'm actually hearing is "I want String to be a collection
of Char".  I may be mishearing.

Cheers,
Andrew Bromage


Re: Why are strings linked lists?

2003-11-29 Thread Ashley Yakeley
In article <[EMAIL PROTECTED]>,
 Glynn Clements <[EMAIL PROTECTED]> wrote:

> Simply claiming that values of type Char are Unicode characters
> doesn't make it so.

Actually, that's exactly what makes it so.

And in article <[EMAIL PROTECTED]>,
 Glynn Clements <[EMAIL PROTECTED]> wrote:

> Unless I'm missing something, the only "support" that GHC provides is
> that Char is 4 bytes.

No, on GHC a Char is a Unicode codepoint, which means it has only 
17*2^16 possible values. This by itself is the most important aspect of 
Unicode support. But most of the rest is missing.
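
The figure is easy to check. A small sketch (charCount is a name made up for this example):

```haskell
module Main (main) where

-- Hypothetical name: the number of distinct Char values, counting from '\0'.
charCount :: Int
charCount = fromEnum (maxBound :: Char) + 1

main :: IO ()
main = do
  print (fromEnum (maxBound :: Char))          -- 1114111, i.e. 0x10FFFF
  print (charCount == 17 * 2 ^ (16 :: Int))    -- True
```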

> If you use Char to store anything other than ISO
> Latin-1 characters, none of the Haskell functions with Char in their
> signature will be of any use.

Actually, many of those functions ought to use Word8 instead.

-- 
Ashley Yakeley, Seattle WA



Re: System.Posix (symbolic links)

2003-11-29 Thread sebc

You need to use getSymbolicLinkStatus instead of getFileStatus, which
always follows symbolic links (I guess getFileStatus uses the
stat system call, while getSymbolicLinkStatus uses lstat).
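
A minimal sketch of the changed test program, assuming the same "link" file as in the original post (isLink is a helper name invented for this example):

```haskell
module Main (main) where

import System.Posix (getSymbolicLinkStatus, isSymbolicLink)

-- Hypothetical helper: inspects the path itself rather than its target.
isLink :: FilePath -> IO Bool
isLink path = fmap isSymbolicLink (getSymbolicLinkStatus path)

main :: IO ()
main = isLink "link" >>= print
```

With the symbolic link created as in the quoted test, this should print True, since the status now describes the link and not the file it points to.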

-- 
Sebastien

On Sat, Nov 29, 2003 at 08:24:08PM +0100, Johannes Goetz wrote:
> Hi! Sorry for posting this message twice. Last message had wrong subject.
> 
> Calling isSymbolicLink always returns False... (ghc-6.0.1linux binary 
> tarball)
> It doesn't make a difference whether the symbolic link points
> to a regular file or a directory.
> Test code:
> 
> #ln -s test link
> #ghc Test.hs -o test
> #./test
> False
> #
> 
> Test.hs:
> 
> module Main(main) where
> import System.Posix
> main = do
>     status <- getFileStatus "link"
>     print (isSymbolicLink status)
> 
> 
> Johannes


System.Posix (symbolic links)

2003-11-29 Thread Johannes Goetz
Hi! Sorry for posting this message twice. Last message had wrong subject.

Calling isSymbolicLink always returns False... (ghc-6.0.1linux binary 
tarball)
It doesn't make a difference whether the symbolic link points
to a regular file or a directory.
Test code:

#ln -s test link
#ghc Test.hs -o test
#./test
False
#

Test.hs:

module Main(main) where
import System.Posix
main = do
    status <- getFileStatus "link"
    print (isSymbolicLink status)

Johannes



Re: Why are strings linked lists?

2003-11-29 Thread Tomasz Zielonka
On Sat, Nov 29, 2003 at 11:10:57AM -0500, Wojtek Moczydlowski wrote:
> 
> (though it still bothers me that I don't have an answer yet to the
> memory leak I posted some time ago)

If you are talking about StateT space leak, then I think I have given
you an answer. My guess was that it is a CAF leak.

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links


RE: Why are strings linked lists?

2003-11-29 Thread Wojtek Moczydlowski
> As a matter of pure speculation, how big an impact would it have if, in
> the next "version" of Haskell, Strings were represented as opaque types
> with appropriate functions to convert to and from [Char]?  Would there be
> rioting in the streets?
>
> Andrew Bromage

I would complain. I don't care much about efficiency (though it still
bothers me that I don't have an answer yet to the
memory leak I posted some time ago), and the ease of dealing with
strings as lists is quite important to me. Explicit packing and unpacking
would be an inconvenience.

Wojtek



Re: Why are strings linked lists?

2003-11-29 Thread Glynn Clements

Wolfgang Jeltsch wrote:

> > Right now, values of type Char are, in reality, ISO Latin-1 codepoints
> > padded out to 4 bytes per char.
> 
> No, because this would mean that you wouldn't have chars with codes greater 
> than 255 which is not the case with GHC.

However, the behaviour of codes greater than 255 is undefined. Well,
effectively undefined; I can't imagine anyone wanting to explicitly
define the current behaviour, particularly the fact that:

    putChar c

and:

    putChar (chr (ord c + n * 256))

are equivalent for all integral n.
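
The arithmetic behind that equivalence can be modelled without doing any I/O. This is only a model of the truncating behaviour described here, not a statement about what any particular GHC release does; lowOctet and sameOctet are names invented for the sketch, and n has to be small enough that the shifted value is still a valid Char.

```haskell
module Main (main) where

import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical model of truncating output: keep only the low 8 bits.
lowOctet :: Char -> Word8
lowOctet = fromIntegral . ord

-- True when c and the character n * 256 code points above it
-- collapse to the same octet under the model.
sameOctet :: Char -> Int -> Bool
sameOctet c n = lowOctet c == lowOctet (chr (ord c + n * 256))

main :: IO ()
main = print [sameOctet 'A' n | n <- [0 .. 3]]   -- [True,True,True,True]
```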

> But, of course, I agree with you that currently the main part of Unicode 
> support is missing.

I think that it goes much deeper than that.

Fixing the Char functions (to{Upper,Lower}, is*) is the easy part.

The hard part is dealing with the legacy of the I/O "fiction", i.e. 
the notion that the gap (or, rather, gulf) between characters and
octets can just be waved away, or at least made simple enough that it
can be effectively hidden.

For practical purposes, you need binary I/O, and you need I/O of text
in arbitrary encodings. The correct encoding may be different for
different parts of a program, and for different parts of data obtained
from a single source. The correct encoding may not be known at the
point that I/O occurs (at least, not for input), so you need to be
able to read octets then translate them to Chars once you actually
know the encoding. You also need to be able to handle data where the
encoding is unknown, or which isn't correctly encoded.
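
To make the "read octets first, decode later" idea concrete, here is a small sketch for the simplest encoding, ISO Latin-1. The function names are invented for this example; they are not taken from any existing library.

```haskell
module Main (main) where

import Data.Char (chr, ord)
import Data.Word (Word8)

-- Latin-1 decoding is total: every octet is a code point below 256.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Encoding is partial: characters above 255 have no Latin-1 octet,
-- so the failure is reported instead of being silently truncated.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM enc
  where
    enc c
      | ord c < 256 = Just (fromIntegral (ord c))
      | otherwise   = Nothing

main :: IO ()
main = do
  print (decodeLatin1 [99, 97, 102, 233] == "caf\233")   -- True
  print (encodeLatin1 "caf\233")                         -- Just [99,97,102,233]
  print (encodeLatin1 "\955")                            -- Nothing
```

Keeping the octets as [Word8] until the encoding is known means the choice of decoder can be made late, and differently for different parts of the same input.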

This isn't something which can be hidden; at least, not without
reducing Haskell to a toy language (e.g. only handles UTF-8, or only
handles the encoding specified by the locale etc).

-- 
Glynn Clements <[EMAIL PROTECTED]>


Re: Why are strings linked lists?

2003-11-29 Thread Glynn Clements

John Meacham wrote:

> > What Unicode support?
> > 
> > Simply claiming that values of type Char are Unicode characters
> > doesn't make it so.
> > 
> > Actually supporting Unicode would require re-implementing toUpper,
> > toLower and the is* functions, as well as at least re-implementing the
> > I/O library (and, realistically, re-designing it; while you *could*
> > just force the use of a specific encoding, the result of doing so
> > would be an I/O system which was almost worthless for real use).
> > 
> > Right now, values of type Char are, in reality, ISO Latin-1 codepoints
> > padded out to 4 bytes per char.
> > 
> > It isn't possible to "drop" support which isn't there.
> 
> I use unicode support with ghc all the time. using my CWString library
> and an alternate set of h* routines. Works quite well. A standard UTF8
> packed string type might be handy though.

IOW, you've written your own Unicode support to get around the fact
that GHC doesn't provide any.

Unless I'm missing something, the only "support" that GHC provides is
that Char is 4 bytes. If you use Char to store anything other than ISO
Latin-1 characters, none of the Haskell functions with Char in their
signature will be of any use. You could just as easily have added
"type WChar = Word32", and made your library use that instead of Char.

-- 
Glynn Clements <[EMAIL PROTECTED]>


Re: Why are strings linked lists?

2003-11-29 Thread Glynn Clements

[EMAIL PROTECTED] wrote:

> >> What Unicode support?
> 
> >> Simply claiming that values of type Char are Unicode characters
> >> doesn't make it so.
> 
> > Just because some implementations lack toUpper etc. doesn't mean
> > they all do.  
> 
> I think the point is that for toUpper etc to be properly Unicoded,
> they can't simply look at a single character.  IIRC, there are some
> characters that expand to two characters when the case is changed, and
> then there's titlecase and so on.

If that was the extent of the problems, I wouldn't be describing
Unicode support as "non-existent".

Note that ANSI C9X doesn't handle the first problem either:

   7.25.3.1.1  The towlower function

   #include <wctype.h>
   wint_t towlower(wint_t wc);

   7.25.3.1.2  The towupper function

   #include <wctype.h>
   wint_t towupper(wint_t wc);

And it only handles the second problem (title case) insofar as it
provides a generic transformation mechanism:

   7.25.3.2  Extensible wide-character case mapping functions

   [#1] The functions wctrans and towctrans provide  extensible
   wide-character mapping as well as case mapping equivalent to
   that performed by the functions described  in  the  previous
   subclause (7.25.3.1).

   7.25.3.2.1  The towctrans function

   #include <wctype.h>
   wint_t towctrans(wint_t wc, wctrans_t desc);

   7.25.3.2.2  The wctrans function

   #include <wctype.h>
   wctrans_t wctrans(const char *property);

Whilst a title-case transformer is the most obvious application of
this, nothing in the standard specifies this.

> toUpper etc. are AFAIK only implemented correctly for a small (but
> IMHO probably the useful) subset of characters.

Yes; so it may as well have just defined Char as an 8-bit ISO Latin-1
character.

Actually, US-ASCII (i.e. the same behaviour as ANSI C with the C/POSIX
locale) would arguably have been a better choice. At least that won't
fail quite so badly if you use e.g. toUpper on a string which is
actually in e.g. ISO Latin-2; the case may be wrong, but at least it
will be the correct letter.

-- 
Glynn Clements <[EMAIL PROTECTED]>


Re: Why are strings linked lists?

2003-11-29 Thread Wolfgang Jeltsch
On Friday, 28 November 2003 08:49, John Meacham wrote:
> [...]

> I also have wondered how much the string representation hurts haskell
> program performance.. Something I'd like to see (perhaps a bit less drastic)
> would be a String class, similar to Num so string constants would have type
> String a => a
> then we can make [Char], PackedString, and whatnot instances. It should at
> least make working with alternate string representations easier.

I think I have already said the following on this list. I would also like to 
have different character types for different subsets of Char (e.g., ASCII) 
and a class Character which the different character types are instances of.

You could combine this idea with the string class idea in the following way:
    class Character c => String s c | s -> c where
        [...]

    instance Character c => String [c] c where
        [...]

    instance String PackedString Char where
        [...]

    instance String PackedASCIIString ASCIIChar where
        [...]
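
One way the elided parts might be filled in is sketched below. Everything beyond the class and instance heads is an assumption made for illustration: the methods toCode, fromCode, pack and unpack are invented, the class is called StringLike because a class named String would clash with the Prelude's String type, and the packed string types are left out.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
module Main (main) where

import Data.Char (chr, ord)

-- Hypothetical methods; the original leaves the class body elided.
class Character c where
  toCode   :: c -> Int
  fromCode :: Int -> Maybe c

instance Character Char where
  toCode = ord
  fromCode n
    | n >= 0 && n <= 0x10FFFF = Just (chr n)
    | otherwise               = Nothing

-- A character type restricted to the ASCII subset.
newtype ASCIIChar = ASCIIChar Char deriving (Eq, Show)

instance Character ASCIIChar where
  toCode (ASCIIChar c) = ord c
  fromCode n
    | n >= 0 && n < 128 = Just (ASCIIChar (chr n))
    | otherwise         = Nothing

-- The functional dependency says the string type determines its character type.
class Character c => StringLike s c | s -> c where
  pack   :: [c] -> s
  unpack :: s -> [c]

instance Character c => StringLike [c] c where
  pack   = id
  unpack = id

-- Works for any instance; c is fixed as soon as s is known.
codes :: StringLike s c => s -> [Int]
codes = map toCode . unpack

main :: IO ()
main = do
  print (codes "Az")                                      -- [65,122]
  print (fmap toCode (fromCode 200 :: Maybe ASCIIChar))   -- Nothing
```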

> John

Wolfgang



Re: Why are strings linked lists?

2003-11-29 Thread Wolfgang Jeltsch
On Friday, 28 November 2003 22:21, Glynn Clements wrote:
> [...]

> > What do you mean with this? Hopefully, not dropping Unicode support
> > because this would be a very bad idea, IMHO.
>
> What Unicode support?
>
> Simply claiming that values of type Char are Unicode characters doesn't make
> it so.

You have the possibility to store Unicode codepoints as values of type Char in 
GHC. This is a little Unicode support which you don't have with 8-bit 
chars.

> [...]

> Right now, values of type Char are, in reality, ISO Latin-1 codepoints
> padded out to 4 bytes per char.

No, because this would mean that you wouldn't have chars with codes greater 
than 255 which is not the case with GHC.

> [...]

But, of course, I agree with you that currently the main part of Unicode 
support is missing.

Wolfgang



Re: Why are strings linked lists?

2003-11-29 Thread ketil+haskell
Lennart Augustsson <[EMAIL PROTECTED]> writes:

> Glynn Clements wrote:
>> What Unicode support?

>> Simply claiming that values of type Char are Unicode characters
>> doesn't make it so.

> Just because some implementations lack toUpper etc. doesn't mean
> they all do.  

I think the point is that for toUpper etc to be properly Unicoded,
they can't simply look at a single character.  IIRC, there are some
characters that expand to two characters when the case is changed, and
then there's titlecase and so on.
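
The best-known case of a one-to-two expansion is the German sharp s, which uppercases to "SS". A sketch of what a string-returning mapping could look like (upperFull and upperString are invented names, and the one-entry table is only an illustration; a real implementation would need the full Unicode special-casing data):

```haskell
module Main (main) where

import Data.Char (toUpper)

-- Hypothetical one-entry table of expanding case mappings.
upperFull :: Char -> String
upperFull '\223' = "SS"          -- U+00DF LATIN SMALL LETTER SHARP S
upperFull c      = [toUpper c]   -- everything else: per-character mapping

upperString :: String -> String
upperString = concatMap upperFull

main :: IO ()
main = putStrLn (upperString "stra\223e")   -- STRASSE
```

The return type is the important part: a function of type Char -> Char cannot express this mapping at all.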

toUpper etc. are AFAIK only implemented correctly for a small (but
IMHO probably the useful) subset of characters.

> Hbc has had those implemented for maybe 10 years.

I must admit I haven't looked at HBC -- are these functions
implemented properly for codepoints >127?  Outside the ISO-8859-x
ranges? 

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants