Re: Why are strings linked lists?

2003-12-10 Thread Alastair Reid

> Wouldn't it be good to implement an LVM import/export feature in GHC, Hugs
> and nhc98?

To read and write lvm bytecodes?

That could be quite a lot of work because the bytecodes are specific to the 
abstract machine and the abstract machines are different.  Unless the LVM is 
essentially just a G-machine (I suspect it isn't), generating LVM code in 
Hugs would require a new code generator.  Reading LVM code is potentially 
even harder.  IMO, the right way to do this is to read/write 'core' lambda 
calculus like the code GHC uses as its intermediate form.

Having found a way to import/export code, there is a second problem to be 
overcome.  Different compilers have different sets of primitive operations.  
I'm sure they all have the same arithmetic operations and, of course, they 
differ in whether or not they have concurrency operations but there are big 
differences in their implementation of the IO monad and exceptions, their 
array operations, etc.

--
Alastair Reid   www.haskell-consulting.com

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-10 Thread Wolfgang Jeltsch
Am Dienstag, 9. Dezember 2003 18:34 schrieb Malcolm Wallace:
> [...]

> > The most obvious is the LVM.  See Helium though the LVM is not tied to it.
>
> Both Hugs and nhc98 are also based on bytecode interpreters.

Wouldn't it be good to implement an LVM import/export feature in GHC, Hugs and 
nhc98?

> Regards,
> Malcolm

Wolfgang

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-10 Thread Sebastian Sylvan


Jerzy Karczmarczuk wrote:

Robert Will wrote:

Why is 'last' so much slower than 'head'?  Why is 'head' not called
'first'?  Why does 'but_last' (aka init) copy the list, but 'but_first'
(aka tail) does not?


Are those rhetoric questions, asked just to inspire some discussion, or
you *really* don't know why?
Giving the (shamelessly plugged =)) project he's working on I'd guess he 
knows the answer =)

/Sebastian Sylvan
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-10 Thread Jerzy Karczmarczuk
Robert Will wrote:

Why is 'last' so much slower than 'head'?  Why is 'head' not called
'first'?  Why does 'but_last' (aka init) copy the list, but 'but_first'
(aka tail) does not?
Are those rhetoric questions, asked just to inspire some discussion, or
you *really* don't know why?


Jerzy Karczmarczuk



___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Why are strings linked lists?

2003-12-10 Thread Robert Will
hi,

In relation to my other message, I can contribute to this question.  I
think it is very important and useful to consider Strings as Sequences of
Characters.  And independently of the issue discussed elsewhere -- whether
Characters should be a type or a class -- I am asking whether Sequences
should be "linked list", that is functional stacks in Haskell.  (Stacks
because cons, head, tail are primitive.)

> 1. Today I spend a few hours trying to track down a memory leak.  It
>turns out I just didn't realize how much space a string takes up.
>On my machine "replicate 500 'a'" will use 90MB of space!

Of course, Packed Arrays will not really change this, but any
implementation of Sequences, where ++ doesn't copy its first argument
will.

In Dessy's standard implementation the above will only use a few bytes...
http://www.stud.tu-ilmenau.de/~robertw/dessy/fun/

> 2. They are extremely slow for most operations like writing to disk,
>adding something to the end, concatenation, etc.

Again, while PackedArrays can't do all of that better, there are a purely
functional data structures, that can.  I estimate that 99% of all uses of
Sequence will work fine with the standard implementation or with Deques,
found on the same page above.

> 3. They make learning Haskell harder.

Why is 'last' so much slower than 'head'?  Why is 'head' not called
'first'?  Why does 'but_last' (aka init) copy the list, but 'but_first'
(aka tail) does not?

The question is whether we want to teach abstraction or stack
programming...


Robert
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-09 Thread Malcolm Wallace
Derek Elkins <[EMAIL PROTECTED]> writes:

> > What Haskell byte code projects are out there?
> 
> The most obvious is the LVM.  See Helium though the LVM is not tied to
> it.

Both Hugs and nhc98 are also based on bytecode interpreters.
Regards,
Malcolm
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-09 Thread Kevin Everets
On Tue, Dec 09, 2003 at 08:40:12AM -0500, Derek Elkins wrote:

> > I tried to track that down (for a colleague), but the link at:
> >http://www.cs.uu.nl/helium/documentation.html
> > to:
> >http://www.cs.uu.nl/~daan/papers/lvm.pdf
> > is broken, and I can't find any other references.
> 
> Yeah, it seems to have disappeared off the face of the Internet (quite a
> feat).  Emailing Daan Leijen may produce something.

Archive.org to the rescue!

http://web.archive.org/web/20030314040306/http://www.cs.uu.nl/~daan/papers/lvm.pdf

Cheers,

Kevin.
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Why are strings linked lists?

2003-12-09 Thread Daan Leijen
Hi Graham,
 
> I tried to track that down (for a colleague), but the link at:
>http://www.cs.uu.nl/helium/documentation.html
> to:
>http://www.cs.uu.nl/~daan/papers/lvm.pdf
> is broken, and I can't find any other references.

Sorry for the broken link. The LVM is described in
the last chapter of my thesis which can be found on
my home page (http://www.cs.uu.nl/~daan)

-- Daan.
 
> #g
> 
> 
> 
> Graham Klyne
> For email:
> http://www.ninebynine.org/#Contact
> 
> ___
> Haskell mailing list
> [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/haskell
> 
> 


___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-09 Thread Derek Elkins
On Tue, 09 Dec 2003 10:51:08 +
Graham Klyne <[EMAIL PROTECTED]> wrote:

> At 15:52 08/12/03 -0500, Derek Elkins wrote:
> > > What Haskell byte code projects are out there?
> >
> >The most obvious is the LVM.  See Helium though the LVM is not tied
> >to it.
> 
> I tried to track that down (for a colleague), but the link at:
>http://www.cs.uu.nl/helium/documentation.html
> to:
>http://www.cs.uu.nl/~daan/papers/lvm.pdf
> is broken, and I can't find any other references.

Yeah, it seems to have disappeared off the face of the Internet (quite a
feat).  Emailing Daan Leijen may produce something.

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-09 Thread Graham Klyne
At 15:52 08/12/03 -0500, Derek Elkins wrote:
> What Haskell byte code projects are out there?

The most obvious is the LVM.  See Helium though the LVM is not tied to
it.
I tried to track that down (for a colleague), but the link at:
  http://www.cs.uu.nl/helium/documentation.html
to:
  http://www.cs.uu.nl/~daan/papers/lvm.pdf
is broken, and I can't find any other references.
#g


Graham Klyne
For email:
http://www.ninebynine.org/#Contact
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Why are strings linked lists?

2003-12-09 Thread Simon Marlow
 
> > GHC 6.2 (shortly to be released) also supports toUpper, toLower, and
> the
> > character predicates isUpper, isLower etc. on the full Unicode
> character
> > set.
> > 
> > There is one caveat: the implementation is based on the C library's
> > towupper() and so on, so the support is only as good as the 
> C library
> > provides, and it relies on wchar_t being equivalent to Unicode (the
> > sensible choice, but not all libcs do this).
> 
> Now, why would one want to base this on C's wchar_t and its
> "w" routines?

Because it's easy, and it's still an improvement over what we had
before.

> wchar_t is sometimes (isolated) UTF-32 code units,
> including in Linux, sometimes it is (isolated) UTF-16 code units,
> including in Windows, and sometimes something utterly useless.

Yes, I mentioned this above.

> Please instead use ICU's UChar32,

Thanks, I didn't know about ICU - it looks nice.  I did briefly
investigate libunicode, which turned out to be worse than the C library.

Cheers,
Simon

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Why are strings linked lists?

2003-12-08 Thread Kent Karlsson

> GHC 6.2 (shortly to be released) also supports toUpper, toLower, and
the
> character predicates isUpper, isLower etc. on the full Unicode
character
> set.
> 
> There is one caveat: the implementation is based on the C library's
> towupper() and so on, so the support is only as good as the C library
> provides, and it relies on wchar_t being equivalent to Unicode (the
> sensible choice, but not all libcs do this).

Now, why would one want to base this on C's wchar_t and its
"w" routines? wchar_t is sometimes (isolated) UTF-32 code units,
including in Linux, sometimes it is (isolated) UTF-16 code units,
including in Windows, and sometimes something utterly useless.
The casing data is not reliable (it could be entirely wrong, and even
locale dependent in an erroneous way), nor kept up to date with the
Unicode character database in all implementations (even where
wchar_t is some form of Unicode/10646). wchar_t is best forgotten,
especially for portable programs.

Please instead use ICU's UChar32, which is (isolated) UTF-32, and
and Unicode::isUpperCase(cp), Unicode::toUpperCase(cp) (C++ here),
etc. The ICU data is kept up-to-date with Unicode versions. The
case mappings are the simplistic ones, not taking SpecialCasing.txt
into account, just the UnicodeData.txt case mapping data. It is thus
not locale dependent, nor context dependent, nor doesn't cae-map
a character to more than one character (so it is not fully appropriate
for strings, but still much, much better than C's wchar_t and its
w-functions).

> Proper support for character set conversions in the I/O library has
been
> talked about for some time, and there are a couple of implementations

One can base this on the ICU character encoding conversions. I would
very much recommend that over the C locale dependent "mb"
conversion routines, for the same reasons as above.

/kent k

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-08 Thread Derek Elkins
On Mon, 8 Dec 2003 20:59:53 +0100
Wolfgang Jeltsch <[EMAIL PROTECTED]> wrote:

> Am Montag, 8. Dezember 2003 15:13 schrieb Tomasz Zielonka:
> > [...]
> 
> > Even in unoptimized, byte-code compiled code?
> 
> Does GHCi use byte code internally? 

Yes.

> What Haskell byte code projects are out there?

The most obvious is the LVM.  See Helium though the LVM is not tied to
it.

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-08 Thread Wolfgang Jeltsch
Am Montag, 8. Dezember 2003 15:13 schrieb Tomasz Zielonka:
> [...]

> Even in unoptimized, byte-code compiled code?

Does GHCi use byte code internally? Would it be possible to export this code 
and load it into GHCi again or compile it to machine code? What Haskell byte 
code projects are out there?

> [...]

Wolfgang

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Why are strings linked lists?

2003-12-08 Thread Simon Marlow
 
> But then... what trick does GHC use to prevent CAF leaks?

No tricks really, the garbage collector traces references from code as
well as references from the heap to find all the live data.  Of course,
it doesn't actually examine the code; each code segment comes with a
list of top-level closures that it references, called an SRT (static
reference table).

In a standalone program, we know that only CAFs which are referenced by
code which is reachable from the current machine state (i.e. the stack)
will ever be used again.  In an interpreted system it is more difficult:
we don't know what expressions will be typed at the prompt in the
future.

Cheers,
Simon

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-08 Thread Tomasz Zielonka
On Mon, Dec 08, 2003 at 02:24:51PM -, Simon Marlow wrote:
> 
> Just compiling the code (instead of running it in GHCi) is enough to
> prevent CAFs from leaking.

Ah, I see the story is a bit different that I thought.

Please correct me I am wrong:

When I compiled the code with -O2 there was no leak because t was
optimised into a tight loop with no allocs. In other words: the CAF was
not garbage-collected, but there was not much to collect.

We see two things interacting here:
 
 1. GHCi not reclaiming CAFs

 2. Deforestation (and probably some other optimisations)

This interaction made me believe that CAF leaks were eliminated by some
optimisation turned on by -O2.

But then... what trick does GHC use to prevent CAF leaks?

Best regards,
Tomek

-- 
.signature: Too many levels of symbolic links
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Why are strings linked lists?

2003-12-08 Thread Simon Marlow
 
> Even in unoptimized, byte-code compiled code?
> 
> Take this module:
> 
> module A where
> 
> t :: IO ()
> t = sequence_ (repeat (return ()))
> 
> If I :load it into ghci as interpreted, or if I compile it without
> optimisation options, and I run t, then the process grows, and grows,
> and grows, ...

True, GHC does not garbage-collect CAFs when interpreting.  This is
partly intentional; we provide an option to turn off the behaviour (:set
+r) although GHC is currently not capable of garbage collecting CAFs
*during* evaluation when interpreting.

> Of course, when I first compile it with -O2 option then it runs in
> constant space.
> 
> When I compiled Wojtek's code with -O2, the problem disappeared, so I
> guess he was loading the module without compiling it with -O2 first.
> 
> Shouldn't you rather say: GHC doesn't have CAF leaks in code compiled
> with [insert the relevand optimisation option here] option?

Just compiling the code (instead of running it in GHCi) is enough to
prevent CAFs from leaking.  You might get better space behaviour by
optimising, however (or you might get worse space behaviour, but that's
unlikely).

Cheers,
Simon


___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-12-08 Thread Tomasz Zielonka
On Mon, Dec 08, 2003 at 01:45:30PM -, Simon Marlow wrote:
>  
> > On Sat, Nov 29, 2003 at 11:10:57AM -0500, Wojtek Moczydlowski wrote:
> > > 
> > > (though it still bothers me that I don't have an answer yet to the
> > > memory leak I posted some time ago)
> > 
> > If you are talking about StateT space leak, then I think I have given
> > you an answer. My guess was that it is a CAF leak.
> 
> GHC doesn't have CAF leaks.  Next guess :-)

Even in unoptimized, byte-code compiled code?

Take this module:

module A where

t :: IO ()
t = sequence_ (repeat (return ()))

If I :load it into ghci as interpreted, or if I compile it without
optimisation options, and I run t, then the process grows, and grows,
and grows, ...

Of course, when I first compile it with -O2 option then it runs in
constant space.

When I compiled Wojtek's code with -O2, the problem disappeared, so I
guess he was loading the module without compiling it with -O2 first.

Shouldn't you rather say: GHC doesn't have CAF leaks in code compiled
with [insert the relevand optimisation option here] option?

Case closed!

Or I am still wrong?

> Cheers,
>   Simon

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Why are strings linked lists?

2003-12-08 Thread Simon Marlow
 
> On Sat, Nov 29, 2003 at 11:10:57AM -0500, Wojtek Moczydlowski wrote:
> > 
> > (though it still bothers me that I don't have an answer yet to the
> > memory leak I posted some time ago)
> 
> If you are talking about StateT space leak, then I think I have given
> you an answer. My guess was that it is a CAF leak.

GHC doesn't have CAF leaks.  Next guess :-)

Cheers,
Simon
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Why are strings linked lists?

2003-12-08 Thread Simon Marlow
> Unless I'm missing something, the only "support" that GHC provides is
> that Char is 4 bytes.

GHC 6.2 (shortly to be released) also supports toUpper, toLower, and the
character predicates isUpper, isLower etc. on the full Unicode character
set.

There is one caveat: the implementation is based on the C library's
towupper() and so on, so the support is only as good as the C library
provides, and it relies on wchar_t being equivalent to Unicode (the
sensible choice, but not all libcs do this).

Proper support for character set conversions in the I/O library has been
talked about for some time, and there are a couple of implementations
out there.  I think ultimately this should be part of a complete
redesign of the I/O library based around streams (see recent discussion
on the libraries list).

Cheers,
Simon

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-30 Thread Wolfgang Jeltsch
Am Samstag, 29. November 2003 23:58 schrieb [EMAIL PROTECTED]:
> G'day all.
>
> Quoting Wolfgang Jeltsch <[EMAIL PROTECTED]>:
> > I think, I have already said the following on this list. I would also like
> > to have different character types for different subsets of Char (e.g.,
> > ASCII) and a class Character which the different character types are
> > instances of.
>
> As a matter of interest, what might some of the methods of this class be?
> ord and chr are two obvious choices.  What else?

Hello,

I have such a Character class in my Seaweed library. You may have a look at
http://cvs.sf.net/viewcvs.py/seaweed/code/Seaweed/Core/Characters.hs
and
http://cvs.sf.net/viewcvs.py/seaweed/code/Seaweed/Core/Characters/ASCII.hs

The class has two methods, toCharacterMonad and fromCharacter, converting 
between ordinary Chars and instances of the Character class. The second 
function uses monads for error handling; it yields return  if 
conversion was successful and fail  if not. Several functions are 
implemented on top of these two methods.

I created the Character class since I'm working with text-based network 
protocols and text-based file formats (like XML) where characters are often 
restricted to certain sets. I wanted to have such restrictions forced by the 
compiler.

IMHO, it would be nice to also have some support for the Character class by 
the syntax of Haskell. I could imagine declarations of the form
chartype ASCIIPrintable = ' ' .. '~'
chartype ASCIICtrl = '\000' .. '\037' | '\177'
A char literal like 'A' could denote not only a Char value but a value of any 
type which is an instance of Character. A default mechanism similar to Num 
could be introduced also with characters.

> Cheers,
> Andrew Bromage

Wolfgang

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Why are strings linked lists?

2003-11-30 Thread Kent Karlsson
> > Glynn Clements wrote:
> >> What Unicode support?
> 
> >> Simply claiming that values of type Char are Unicode characters
> >> doesn't make it so.

Well, *claiming* so doesn't make it so. But actually representing
characters in such a way that the Unicode conformance rules are
followed, makes it so. There is no requirement for a particular API,
for instance.

> > Just because some implementations lack toUpper etc. doesn't mean
> > they all do.  

toUpper etc. are over-rated. They are very rarely used in real life,
or at least should be very rarely used, with very few exceptions:
auto-titlecasing of the first word of a sentence (which I find rather
handy for natural language texts), and for making "small caps"
(some fonts do that internally, but that's a mistake, since it is then
not language dependent).

Some things that are much more interesting and of practical use are:
Unicode normalisation, transformation between encoding forms
(mainly for I/O), finding formal character (or rather, code point)
properties, line breaking, combining character handling, language
dependent collation (UCA based), decimal number parsing and
formatting (for several scripts), regular expressions generalised
to Unicode (including support for "default ignorable"), ...

Case mapping falls rather low on the priority list. Except perhaps
for the special form of "case folding" (almost lowercasing but not
quite) used for IDNs, but almost only there; but could be used also
for Ada, SQL, etc. that "ignore" case.

B.t.w., for line breaking Thai, Lao, or Khmer, you need a dictionary.
ZERO WIDTH NO BREAK SPACE can be used between words, but isn't
normally.

> I think the point is that for toUpper etc to be properly Unicoded,
> they can't simply look at a single character.  IIRC, there are some
> characters that expand to two characters when the case is changed,

Yes, for instance for ß (sharp s). The uppercase of ß is SS. For
proper lowercasing you need a dictionary. It is also language
dependent.  Case mapping for Lithuanian and Turkish/Azerbaijani
have exceptions to what is done elsewhere. See
http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt

/kent k
<>___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Ashley Yakeley
In article <[EMAIL PROTECTED]>,
 Glynn Clements <[EMAIL PROTECTED]> wrote:

> OK; by "Char is 4 bytes" I basically meant that it's "large enough".

Char is exactly the correct size. The Eq, Ord and Enum instances all 
work correctly. The fact that you cannot represent values outside the 
range is important IMO.

> 1. Where would you get a Char from?
> 2. Where would you put it?

You can convert to and from the codepoint number using toEnum and 
fromEnum. What is missing is UTF-8 and Latin-1 charset conversions, and 
character properties. You can find draft standard library code for these 
here:


> BTW, I agree that the IO functions *should* use Word8.

Right.

> And I really
> wouldn't be that bothered if the standard was changed to just use
> "type Char = Word8". Actually, I would prefer that to the current
> fiction.

No! In GHC, a Char represents a Unicode codepoint: nothing more, and 
nothing less. This is something that probably ought to become part of 
some later Haskell standard. Frankly I find the idea that the character 
'A' is somehow identical to the number 65, or octet value 65, to be 
completely bizarre, and Haskell does well to give them separate types.

The problem is that certain IO functions do implicit Latin-1 conversion.

> The IO problems are design bugs, and can't truly be fixed without
> breaking a lot of existing code.

Well that's what deprecation is for. New Word8-based functions would 
have new names. Every so often there's a burst of activity on the 
Libraries or the Internationalisation lists concerning this, but it 
never quite comes together somehow.

-- 
Ashley Yakeley, Seattle WA

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Mark Carroll
On Sat, 29 Nov 2003 [EMAIL PROTECTED] wrote:
(snip)
> Interesting that you mention this.  I've also been thinking about this
> lately in the context of the discussion on collections and the left-fold
> combinator both here and on LtU.  When people say "I want String to be
> [Char]", what I'm actually hearing is "I want String to be a collection
> of Char".  I may be mishearing.

It did strike me that it would be interesting if you could make various
things instances of a List sort of class and then take, reverse, etc.
would work on them. How this relates to your comment, I'm not sure.
Things like map, of course, could work on unordered bags of things too,
but I suppose that's what Functors are for.

-- Mark
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Glynn Clements

Ashley Yakeley wrote:

> > Simply claiming that values of type Char are Unicode characters
> > doesn't make it so.
> 
> Actually, that's exactly what makes it so.

Hmm. I suppose that there's some validity to that perspective. OTOH,
it's one thing to state that it's true, but that's rather hollow if
nothing actually behaves as if it is.

It's a bit like saying "values of type Int are complex numbers; oh,
BTW, the implementation is currently broken".

IOW, if it walks like a duck, ...

> > Unless I'm missing something, the only "support" that GHC provides is
> > that Char is 4 bytes.
> 
> No, on GHC a Char is a Unicode codepoint, which means it has only 
> 17*2^16 possible values. This by itself is the most important aspect of 
> Unicode support.

OK; by "Char is 4 bytes" I basically meant that it's "large enough".

> But most of the rest is missing.

AFAICT, *all*[1] of the rest is missing.

[1] With one rather useless exception: (maxBound :: Char) == 0x10. 
I can't think of any other aspect of GHC's behaviour which would
indicate that Char is meant to be Unicode.

> > If you use Char to store anything other than ISO
> > Latin-1 characters, none of the Haskell functions with Char in their
> > signature will be of any use.
> 
> Actually, many of those functions ought to use Word8 instead.

But then:

1. Where would you get a Char from?
2. Where would you put it?

BTW, I agree that the IO functions *should* use Word8. And I really
wouldn't be that bothered if the standard was changed to just use
"type Char = Word8". Actually, I would prefer that to the current
fiction.

At least the problems with the Char functions are just implementation
bugs; those functions *could* be made to work correctly.

The IO problems are design bugs, and can't truly be fixed without
breaking a lot of existing code. A workaround which preserves backward
compatibility could result in a rather ugly interface: either all of
the relevant functions use a default encoding (which will probably be
the wrong one as often as not), or the "right" functions have to have
their names bastardised because the "wrong" functions have already
stolen the obvious names.

-- 
Glynn Clements <[EMAIL PROTECTED]>
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread ajb
G'day all.

Quoting Wolfgang Jeltsch <[EMAIL PROTECTED]>:

> I think, I have already said the following on this list. I would also like to
> have different character types for different subsets of Char (e.g., ASCII)
> and a class Character which the different character types are instances of.

As a matter of interest, what might some of the methods of this
class be?  ord and chr are two obvious choices.  What else?

Cheers,
Andrew Bromage
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread ajb
G'day all.

Quoting John Meacham <[EMAIL PROTECTED]>:

> Something I'd like to see (perhaps a bit less
> drastic) would be a String class, similar to Num so string constants
> would have type
> String a => a

Interesting that you mention this.  I've also been thinking about this
lately in the context of the discussion on collections and the left-fold
combinator both here and on LtU.  When people say "I want String to be
[Char]", what I'm actually hearing is "I want String to be a collection
of Char".  I may be mishearing.

Cheers,
Andrew Bromage
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Ashley Yakeley
In article <[EMAIL PROTECTED]>,
 Glynn Clements <[EMAIL PROTECTED]> wrote:

> Simply claiming that values of type Char are Unicode characters
> doesn't make it so.

Actually, that's exactly what makes it so.

And in article <[EMAIL PROTECTED]>,
 Glynn Clements <[EMAIL PROTECTED]> wrote:

> Unless I'm missing something, the only "support" that GHC provides is
> that Char is 4 bytes.

No, on GHC a Char is a Unicode codepoint, which means it has only 
17*2^16 possible values. This by itself is the most important aspect of 
Unicode support. But most of the rest is missing.

> If you use Char to store anything other than ISO
> Latin-1 characters, none of the Haskell functions with Char in their
> signature will be of any use.

Actually, many of those functions ought to use Word8 instead.

-- 
Ashley Yakeley, Seattle WA

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Johannes Goetz
Hi!

Calling isSymbolicLink always returns False... (ghc-6.0.1linux binary 
tarball)
It doesn't make a difference whether the symbolic link points
to a regular file or a directory.
Test code:

#ln -s test link
#ghc Test.hs -o test
#./test
False
#
Test.hs:

module Main(main) where
import System.Posix
main = do
status <- getFileStatus "link"
print (isSymbolicLink status)
Johannes

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Tomasz Zielonka
On Sat, Nov 29, 2003 at 11:10:57AM -0500, Wojtek Moczydlowski wrote:
> 
> (though it still bothers me that I don't have an answer yet to the
> memory leak I posted some time ago)

If you are talking about StateT space leak, then I think I have given
you an answer. My guess was that it is a CAF leak.

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


RE: Why are strings linked lists?

2003-11-29 Thread Wojtek Moczydlowski
> As a matter of pure speculation, how big an impact would it have if, in
> the next "version" of Haskell, Strings were represented as opaque types
> with appropriate functions to convert to and from [Char]?  Would there be
> rioting in the streets?
>
> Andrew Bromage

I would complain. I don't care much about efficiency (though it still
bothers me that I don't have an answer yet to the
memory leak I posted some time ago), and the easiness of dealing with
strings as lists is quite important to me. Expliciting packing and unpacking
would be an incovenience.

Wojtek

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Glynn Clements

Wolfgang Jeltsch wrote:

> > Right now, values of type Char are, in reality, ISO Latin-1 codepoints
> > padded out to 4 bytes per char.
> 
> No, because this would mean that you wouldn't have chars with codes greater 
> than 255 which is not the case with GHC.

However, the behaviour of codes greater than 255 is undefined. Well,
effectively undefined; I can't imagine anyone wanting to explicitly
define the current behaviour, particularly the fact that:

putChar c
and:
putChar (chr (ord c + n * 256))

are equivalent for all integral n.

> But, of course, I agree with you that currently the main part of Unicode 
> support is missing.

I think that it goes much deeper than that.

Fixing the Char functions (to{Upper,Lower}, is*) is the easy part.

The hard part is dealing with the legacy of the I/O "fiction", i.e. 
the notion that the gap (or, rather, gulf) between characters and
octets can just be waved away, or at least made simple enough that it
can be effectively hidden.

For practical purposes, you need binary I/O, and you need I/O of text
in arbitrary encodings. The correct encoding may be different for
different parts of a program, and for different parts of data obtained
from a single source. The correct encoding may not be known at the
point that I/O occurs (at least, not for input), so you need to be
able to read octets then translate them to Chars once you actually
know the encoding. You also need to be able to handle data where the
encoding is unknown, or which isn't correctly encoded.

This isn't something which can be hidden; at least, not without
reducing Haskell to a toy language (e.g. only handles UTF-8, or only
handles the encoding specified by the locale etc).

-- 
Glynn Clements <[EMAIL PROTECTED]>
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Glynn Clements

John Meacham wrote:

> > What Unicode support?
> > 
> > Simply claiming that values of type Char are Unicode characters
> > doesn't make it so.
> > 
> > Actually supporting Unicode would require re-implementing toUpper,
> > toLower and the is* functions, as well as at least re-implementing the
> > I/O library (and, realistically, re-designing it; while you *could*
> > just force the use of a specific encoding, the result of doing so
> > would be an I/O system which was almost worthless for real use).
> > 
> > Right now, values of type Char are, in reality, ISO Latin-1 codepoints
> > padded out to 4 bytes per char.
> > 
> > It isn't possible to "drop" support which isn't there.
> 
> I use unicode support with ghc all the time. using my CWString library
> and an alternate set of h* routines. Works quite well. A standard UTF8
> packed string type might be handy though.

IOW, you've written your own Unicode support to get around the fact
that GHC doesn't provide any.

Unless I'm missing something, the only "support" that GHC provides is
that Char is 4 bytes. If you use Char to store anything other than ISO
Latin-1 characters, none of the Haskell functions with Char in their
signature will be of any use. You could just as easily have added
"type WChar = Word32", and made your library use that instead of Char.

-- 
Glynn Clements <[EMAIL PROTECTED]>
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Glynn Clements

[EMAIL PROTECTED] wrote:

> >> What Unicode support?
> 
> >> Simply claiming that values of type Char are Unicode characters
> >> doesn't make it so.
> 
> > Just because some implementations lack toUpper etc. doesn't mean
> > they all do.  
> 
> I think the point is that for toUpper etc to be properly Unicoded,
> they can't simply look at a single character.  IIRC, there are some
> characters that expand to two characters when the case is changed, and
> then there's titlecase and so on.

If that was the extent of the problems, I wouldn't be describing
Unicode support as "non-existent".

Note that ANSI C9X doesn't handle the first problem either:

   7.25.3.1.1  The towlower function

   #include 
   wint_t towlower(wint_t wc);

   7.25.3.1.2  The towupper function

   #include 
   wint_t towupper(wint_t wc);

And it only handles the second problemm (title case) insofar that it
provides a generic transformation mechanism:

   7.25.3.2  Extensible wide-character case mapping functions

   [#1] The functions wctrans and towctrans provide  extensible
   wide-character mapping as well as case mapping equivalent to
   that performed by the functions described  in  the  previous
   subclause (7.25.3.1).

   7.25.3.2.1  The towctrans function

   #include 
   wint_t towctrans(wint_t wc, wctrans_t desc);

   7.25.3.2.2  The wctrans function

   #include 
   wctrans_t wctrans(const char *property);

Whilst a title-case transformer is the most obvious application of
this, nothing in the standard specifies this.

> toUpper etc. are AFAIK only implemented correctly for a small (but
> IMHO probably the useful) subset of characters.

Yes; so it may as well have just defined Char as an 8-bit ISO Latin-1
character.

Actually, US-ASCII (i.e. the same behaviour as ANSI C with the C/POSIX
locale) would arguably have been a better choice. At least that won't
fail quite so badly if you use e.g. toUpper on a string which is
actually in e.g. ISO Latin-2; the case may be wrong, but at least it
will be the correct letter.

-- 
Glynn Clements <[EMAIL PROTECTED]>
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Wolfgang Jeltsch
Am Freitag, 28. November 2003 08:49 schrieb John Meacham:
> [...]

> I also have wondered how much the string representation hurts haskell
> program performance.. Something I'd like to see (perhaps a bit less drastic)
> would be a String class, similar to Num so string constants would have type
> String a => a
> then we can make [Char], PackedString, and whatnot instances. It should at
> least make working with alternate string representations easier.

I think, I have already said the following on this list. I would also like to 
have different character types for different subsets of Char (e.g., ASCII) 
and a class Character which the different character types are instances of.

You could combine this idea with the string class idea in the following way:
class Character c => String s c | s -> c where
[...]

instance Character c => String [c] c where
[...]

instance String PackedString Char where
[...]

instance String PackedASCIIString ASCIIChar where
[...]

> John

Wolfgang

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread Wolfgang Jeltsch
Am Freitag, 28. November 2003 22:21 schrieb Glynn Clements:
> [...]

> > What do you mean with this? Hopefully, not dropping Unicode support
> > because this would be a very bad idea, IMHO.
>
> What Unicode support?
>
> Simply claiming that values of type Char are Unicode characters doesn't make
> it so.

You have the possibility to store Unicode codepoints as values of type Char in 
GHC. This is a a little Unicode support which you don't have with 8-bit 
chars.

> [...]

> Right now, values of type Char are, in reality, ISO Latin-1 codepoints
> padded out to 4 bytes per char.

No, because this would mean that you wouldn't have chars with codes greater 
than 255 which is not the case with GHC.

> [...]

But, of course, I agree with you that currently the main part of Unicode 
support is missing.

Wolfgang

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-29 Thread ketil+haskell
Lennart Augustsson <[EMAIL PROTECTED]> writes:

> Glynn Clements wrote:
>> What Unicode support?

>> Simply claiming that values of type Char are Unicode characters
>> doesn't make it so.

> Just because some implementations lack toUpper etc. doesn't mean
> they all do.  

I think the point is that for toUpper etc to be properly Unicoded,
they can't simply look at a single character.  IIRC, there are some
characters that expand to two characters when the case is changed, and
then there's titlecase and so on.

toUpper etc. are AFAIK only implemented correctly for a small (but
IMHO probably the useful) subset of characters.

> Hbc has had those implemented for maybe 10 years.

I must admit I haven't looked at HBC -- are these functions
implemented properly for codepoints >127?  Outside the ISO-8859-x
ranges? 

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread John Meacham
On Fri, Nov 28, 2003 at 09:21:50PM +, Glynn Clements wrote:
> What Unicode support?
> 
> Simply claiming that values of type Char are Unicode characters
> doesn't make it so.
> 
> Actually supporting Unicode would require re-implementing toUpper,
> toLower and the is* functions, as well as at least re-implementing the
> I/O library (and, realistically, re-designing it; while you *could*
> just force the use of a specific encoding, the result of doing so
> would be an I/O system which was almost worthless for real use).
> 
> Right now, values of type Char are, in reality, ISO Latin-1 codepoints
> padded out to 4 bytes per char.
> 
> It isn't possible to "drop" support which isn't there.

I use unicode support with ghc all the time. using my CWString library
and an alternate set of h* routines. Works quite well. A standard UTF8
packed string type might be handy though.
John

-- 
---
John Meacham - California Institute of Technology, Alum. - [EMAIL PROTECTED]
---
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Lennart Augustsson
Glynn Clements wrote:


What Unicode support?

Simply claiming that values of type Char are Unicode characters
doesn't make it so.
Just because some implementations lack toUpper etc. doesn't mean
they all do.  Hbc has had those implemented for maybe 10 years.
	-- Lennart

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Glynn Clements

Wolfgang Jeltsch wrote:

> > > As a matter of pure speculation, how big an impact would it have if, in
> > > the next "version" of Haskell, Strings were represented as opaque types
> > > with appropriate functions to convert to and from [Char]?  Would there be
> > > rioting in the streets?
> >
> > For me, there would rather be celebration :), especially if these could be
> > tuned to only use 8 bits.
> 
> What do you mean with this? Hopefully, not dropping Unicode support because 
> this would be a very bad idea, IMHO.

What Unicode support?

Simply claiming that values of type Char are Unicode characters
doesn't make it so.

Actually supporting Unicode would require re-implementing toUpper,
toLower and the is* functions, as well as at least re-implementing the
I/O library (and, realistically, re-designing it; while you *could*
just force the use of a specific encoding, the result of doing so
would be an I/O system which was almost worthless for real use).

Right now, values of type Char are, in reality, ISO Latin-1 codepoints
padded out to 4 bytes per char.

It isn't possible to "drop" support which isn't there.

-- 
Glynn Clements <[EMAIL PROTECTED]>
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Hal Daume III
> > For me, there would rather be celebration :), especially if these could be
> > tuned to only use 8 bits.
> 
> What do you mean with this? Hopefully, not dropping Unicode support because 
> this would be a very bad idea, IMHO.

I mean to have the option of using Unicode or plain 8bit ascii as you see 
fit.

-- 
 Hal Daume III   | [EMAIL PROTECTED]
 "Arrest this man, he talks in maths."   | www.isi.edu/~hdaume

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Wolfgang Jeltsch
Am Freitag, 28. November 2003 19:21 schrieb Hal Daume III:
> > As a matter of pure speculation, how big an impact would it have if, in
> > the next "version" of Haskell, Strings were represented as opaque types
> > with appropriate functions to convert to and from [Char]?  Would there be
> > rioting in the streets?
>
> For me, there would rather be celebration :), especially if these could be
> tuned to only use 8 bits.

What do you mean with this? Hopefully, not dropping Unicode support because 
this would be a very bad idea, IMHO.

> [...]

>  - Hal

Wolfgang

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Hal Daume III
> As a matter of pure speculation, how big an impact would it have if, in
> the next "version" of Haskell, Strings were represented as opaque types
> with appropriate functions to convert to and from [Char]?  Would there be
> rioting in the streets?

For me, there would rather be celebration :), especially if these could be 
tuned to only use 8 bits.  I tend to try to use packed strings for most of 
the stuff I do, but the one major problem I have with them is that there 
is not correspondent to hGetLine.  You need to know the length of the line 
apriori or write your own using getting characters and concatenating them 
(AFAIK).  Either way, you're going to get performance hits for going 
through [Char]s.

As a minor quibble, I don't like the naming scheme with packString and 
unpackPS...seems very unbalanced to me :).

my 2 cents

 - Hal

-- 
 Hal Daume III   | [EMAIL PROTECTED]
 "Arrest this man, he talks in maths."   | www.isi.edu/~hdaume

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread sebc
On Fri, Nov 28, 2003 at 12:37:30PM +0100, Wolfgang Jeltsch wrote:
> >
> > So, what is happening that there is 1 cell in the heap
> > containing the representation of 'a', and then a linked list
> > of length 500, where each element points to that cell.
> 
> Yes, you're right. But if you choose the array alternative, you cannot use 
> sharing and would, therefore, still need 20 MB.

You can use sharing if you don't use unboxed arrays.  Not that it
matters if a character takes as much space as a pointer, but for
64-bits floating point numbers on a platform with 32-bits pointers, it
would decrease memory consumption by almost half.
Anyway, I'm just nitpicking. :-)

-- 
Sebastien

signature.asc
Description: Digital signature
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Matthias Kilian
On Fri, Nov 28, 2003 at 11:31:51AM +0100, Wolfgang Jeltsch wrote:
> > On my machine "replicate 500 'a'" will use 90MB of space!
> 
> You have to take into account that Chars (in GHC) take 4 bytes of memory 
> because they denote Unicode codepoints. 5,000,000 times 4 bytes is already 20 
> MB. (The rest is only a constant factor. ;-))

No, in the above example there's only one single 'a' but 500
(:) and one [] heap-allocated cells.

For example,

let f = replicate 500 in f (f 'a')

will use only about twice (and not 500 times) the memory of the
initial example. Exactly: 1000 * size of (:) + size of 'a' + size of [].

So the only relevant thing is the list constructor (:) which needs two
pointers to its arguments, and typically another pointer for housekeeping
(depending on the runtime implementation). That typically makes 12 bytes
per constructor (on a 32 bit architecture).

Regards,
Kili

-- 
Denken ist schwer, darum urteilen die meisten.
[Carl Gustav Jung]
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Wolfgang Jeltsch
Am Freitag, 28. November 2003 12:10 schrieb Koen Claessen:
> Wolfgang Jeltsch wrote:
>  | > 1.  Today I spend a few hours trying to track down a memory leak.  It
>  | > turns out I just didn't realize how much space a string takes up.
>  | > On my machine "replicate 500 'a'" will use 90MB of space!
>  |
>  | You have to take into account that Chars (in GHC) take 4
>  | bytes of memory because they denote Unicode codepoints.
>  | 5,000,000 times 4 bytes is already 20 MB. (The rest is
>  | only a constant factor. ;-))
>
> You have to realize that the space usage does not
> (necessarily) come from duplicating the character 'a'.
> In a lazy implementation, the representation of that
> character will be shared by all elements in the list.
>
> So, what is happening that there is 1 cell in the heap
> containing the representation of 'a', and then a linked list
> of length 500, where each element points to that cell.

Yes, you're right. But if you choose the array alternative, you cannot use 
sharing and would, therefore, still need 20 MB.

> Just my 2 öre.
>
> /Koen

Wolfgang

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Koen Claessen
Wolfgang Jeltsch wrote:

 | > 1.  Today I spend a few hours trying to track down a memory leak.  It
 | > turns out I just didn't realize how much space a string takes up.
 | > On my machine "replicate 500 'a'" will use 90MB of space!
 |
 | You have to take into account that Chars (in GHC) take 4
 | bytes of memory because they denote Unicode codepoints.
 | 5,000,000 times 4 bytes is already 20 MB. (The rest is
 | only a constant factor. ;-))

You have to realize that the space usage does not
(necessarily) come from duplicating the character 'a'.
In a lazy implementation, the representation of that
character will be shared by all elements in the list.

So, what is happening that there is 1 cell in the heap
containing the representation of 'a', and then a linked list
of length 500, where each element points to that cell.

Just my 2 öre.

/Koen
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Wolfgang Jeltsch
Am Freitag, 28. November 2003 04:32 schrieb Ben Escoto:
>Hi, can someone tell me why Haskell strings are linked lists?

I think they are lists because there is already good support for lists in 
Haskell. You can just take the many list functions and apply them directly to 
strings.

You could then ask why lists are defined via an algebraic data type and not 
via an array type. I think, this is because algebraic data types are 
fundamental in Haskell and arrays are just there because of efficiency. In 
addition, with an algebraic data type you are able to use such nice things 
like pattern matching.

> [...]

> 1.  Today I spend a few hours trying to track down a memory leak.  It
> turns out I just didn't realize how much space a string takes up.
> On my machine "replicate 500 'a'" will use 90MB of space!

You have to take into account that Chars (in GHC) take 4 bytes of memory 
because they denote Unicode codepoints. 5,000,000 times 4 bytes is already 20 
MB. (The rest is only a constant factor. ;-))

> 2.  They are extremely slow for most operations like writing to disk,
> adding something to the end, concatenation, etc.

Well, there are solutions for avoiding the quadratic time problem with 
concatenation like the ShowS thing in the prelude or my "EC list" 
implementation found under
http://cvs.sourceforge.net/viewcvs.py/seaweed/code/Seaweed/Core/Lists.hs
(Well, I don't know how big the overhead for building the internal data 
structures of EC lists is.)

> 3.  They make learning Haskell harder.  Lazy IO functions like
> hGetContents are kind of slick in a shallow way, but I was
> confused about the Haskell IO model because I thought of this as
> the normal way IO was done, not something you could only set up
> through unsafeInterleaveIO or similar.

These are problems concerning lazy I/O and not the fact that strings are 
implemented as linked lists.

> [...]

Wolfgang

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread Jerzy Karczmarczuk
John Meacham wrote:
[EMAIL PROTECTED] wrote:
...
As a matter of pure speculation, how big an impact would it have if, in
the next "version" of Haskell, Strings were represented as opaque types
with appropriate functions to convert to and from [Char]?  Would there be
rioting in the streets?
I also have wondered how much the string representation hurts haskell
program performance.. Something I'd like to see (perhaps a bit less
drastic) would be a String class, similar to Num so string constants
would have type 
String a => a 

then we can make [Char], PackedString, and whatnot instances. It should
at least make working with alternate string representations easier.
One - among many - reasons why I use Clean [[and for some years I cannot
decide whether Haskell is the legitimate wife, and Clean a responsive
mistress, or vice-versa...]] is that strings being unboxed arrays permit
easily to communicate with lower-level binary file processing, which may
be then processed by higher-level code. Thus, I can easily read and write
image files (at least uncompressed, say .bmp), binary sound files, etc.
There is nothing fundamental there, simply a string *is* almost directly
the file buffer. Haskell introduces some overhead.
I believe that the only advantage of keeping string as lists is to
facilitate their lazy processing. Writing parsers and other lng string
consumers. But, as ajb said above, it can pass through a lazy conversion
stage, comprehensions, etc.
Jerzy Karczmarczuk
Caen, France


___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-28 Thread John Meacham
On Thu, Nov 27, 2003 at 10:54:11PM -0500, [EMAIL PROTECTED] wrote:
> OK, to be fair, it does make string-to-string operations a bit more
> convenient.  Apart from undergraduate homework exercises and some
> specific domains, though, this isn't exactly the "common case" of
> all situations where people want strings.
> 
> As a matter of pure speculation, how big an impact would it have if, in
> the next "version" of Haskell, Strings were represented as opaque types
> with appropriate functions to convert to and from [Char]?  Would there be
> rioting in the streets?

I also have wondered how much the string representation hurts haskell
program performance.. Something I'd like to see (perhaps a bit less
drastic) would be a String class, similar to Num so string constants
would have type 
String a => a 

then we can make [Char], PackedString, and whatnot instances. It should
at least make working with alternate string representations easier.
John

-- 
---
John Meacham - California Institute of Technology, Alum. - [EMAIL PROTECTED]
---
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-27 Thread Bernard James POPE
Ben writes:
> Hi, can someone tell me why Haskell strings are linked lists?  I have
> had some problems with Haskell strings:

You may want to try Data.PackedString which comes with GHC (if you are
using GHC that is).

Cheers,
Bernie.
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-27 Thread Donald Bruce Stewart
ajb:
> G'day all.
> 
> Quoting Ben Escoto <[EMAIL PROTECTED]>:
> 
> > Hi, can someone tell me why Haskell strings are linked lists?
> 
> Because that's the way it was done in Miranda, almost 20 years ago.
> 
> OK, to be fair, it does make string-to-string operations a bit more
> convenient.  Apart from undergraduate homework exercises and some
> specific domains, though, this isn't exactly the "common case" of
> all situations where people want strings.
> 
> As a matter of pure speculation, how big an impact would it have if, in
> the next "version" of Haskell, Strings were represented as opaque types
> with appropriate functions to convert to and from [Char]?  Would there be
> rioting in the streets?

You could look at GHC's FastString representation (used internally).
It is in $fptools/ghc/compiler/utils/FastString.lhs

data FastString
  = FastString   -- packed repr. on the heap.
  Int#   -- unique id
  Int#   -- length
  ByteArray# -- stuff

Now that is pretty compact.

-- Don
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Why are strings linked lists?

2003-11-27 Thread ajb
G'day all.

Quoting Ben Escoto <[EMAIL PROTECTED]>:

> Hi, can someone tell me why Haskell strings are linked lists?

Because that's the way it was done in Miranda, almost 20 years ago.

OK, to be fair, it does make string-to-string operations a bit more
convenient.  Apart from undergraduate homework exercises and some
specific domains, though, this isn't exactly the "common case" of
all situations where people want strings.

As a matter of pure speculation, how big an impact would it have if, in
the next "version" of Haskell, Strings were represented as opaque types
with appropriate functions to convert to and from [Char]?  Would there be
rioting in the streets?

Cheers,
Andrew Bromage
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Why are strings linked lists?

2003-11-27 Thread Ben Escoto
Hi, can someone tell me why Haskell strings are linked lists?  I have
had some problems with Haskell strings:

1.  Today I spend a few hours trying to track down a memory leak.  It
turns out I just didn't realize how much space a string takes up.
On my machine "replicate 500 'a'" will use 90MB of space!

2.  They are extremely slow for most operations like writing to disk,
adding something to the end, concatenation, etc.

3.  They make learning Haskell harder.  Lazy IO functions like
hGetContents are kind of slick in a shallow way, but I was
confused about the Haskell IO model because I thought of this as
the normal way IO was done, not something you could only set up
through unsafeInterleaveIO or similar.

Python's strings are also immutable, but they work great and are
really convenient.  They are apparently implemented as variable length
arrays.


-- 
Ben Escoto
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell