Re: [Haskell-cafe] The question of ByteString

2007-11-03 Thread Andrew Coppin

Duncan Coutts wrote:

On Fri, 2007-11-02 at 21:35 +, Andrew Coppin wrote:

  

Well OK, maybe I was a little vague. Let me be a bit more specific...

If you do text processing using ByteString rather than String, you get 
dramatically better performance in time and space. For me, this raises a 
number of questions:


1. Why do I have to type ByteString in my code? Why isn't the compiler 
automatically performing this optimisation for me? (I.e., is there some 
observable property that is changed? 



Yes, the semantics are different. ByteString is stricter. In some
circumstances you could discover that some list is being used
sufficiently strictly (spine and element strict) that you could do a
representation change to use strict arrays. It is something I have
pondered occasionally and I think that is an interesting avenue for
research.
  


OK. So I'm not the first person to wonder about this then... ;-)


One approach might be to do a more sophisticated strictness analysis

However this is likely to be quite fragile. I usually think that it's
better to declare the strictness you want up front in one place


Yeah, I can imagine. And I guess then you'd be forever wondering hey, 
did the compiler do that optimisation or not? like with certain current 
optimisations now. (E.g., did that type get unboxed?)



Currently the answer is yes: the ByteString interface only provides
trancated Unicode characters. But, in principle, that could be
changed.)



Indeed it could, we could provide a proper Unicode string type.
  


Likely to happen in the near term? (Not that I imagine this is a huge 
issue for anybody...)


2. ByteString makes text strings faster. But what about other kinds of 
collections? Can't we do something similar to them that makes them go 
faster?



There is much less benefit for other collections since the overheads of
generic structures are smaller for other types.

Note that the NDP parallel arrays stuff uses type functions to calculate
optimised data representations for arrays of types.
  


Type... functions...? That sounds pretty scary. :-}

I was thinking more immediately about lists in general, but also perhaps 
binary trees and things. (Although it's probably hard to optimise a 
general binary tree; you would probably optimise a tree designed for a 
specific *purpose* instead...)


As I understand it, ByteString is faster due to several factors. First 
of all, it's stricter.



So that's the semantic difference.

  
Secondly, it's an unboxed structure. 



Which is the representation optimisation allowed by the semantic change
of making it stricter.
  


Indeed.

Third, it's implemented as an array that looks like a linked list.
Given how ubiquitous lists are in Haskell, array that looks like a
linked list sounds like one seriously useful data type! Yet
ByteString seems to be the only implementation of this concept - and
only for lists on unboxed bytes. (Not even unboxed Word16 or anything
else, *only* Word8.) If I understand this correctly, a ByteString is
actually a linked list of large array chunks. (This presumably yields
fastER random access than a plain linked list?) Also, it seems to be
possible to create a new array which is merely a subrange of an
existing one, without any copying; the standard array API doesn't seem
to provide this, yet it sounds damn useful.



I think the NDP project should get us most of this stuff actually.
  


Cool.

So in summary, these kinds of things are on the way, they're just not 
here quite yet?


(BTW, anybody have any clue what's happening with stream fusion? I 
remember reading the paper saying hey, this is how it works and it's 
cool and we're going to try to replace the whole list library with new 
stream implementations, but that's the last I heard...)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Why does GHC limit stack size?

2007-11-03 Thread Abhay Parvate
Hello all,

Why is there a limitation on the stack size in GHC? Like heap where we can
limit the size by -M RTS option but the default is unlimited, why not let
the program use as big a stack as required? If not by default, then by a
separate option?

Some of the functions that we write in recursive fashion will usually cause
a stack overflow, but will work fine if there is more stack (suppose we are
not worried about efficiency). And these functions generally look nicer and
compact than their tail recursive versions.

Is this is a technical hurdle, or just a checkpoint for runaway programs?
Can we have an RTS flag allowing unlimited stack size?

Thanks,
Abhay
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Why does GHC limit stack size?

2007-11-03 Thread Adrian Hey

Hello,

 Why is there a limitation on the stack size in GHC? Like heap where we
 can limit the size by -M RTS option but the default is unlimited, why
 not let the program use as big a stack as required? If not by default,
 then by a separate option?

 Some of the functions that we write in recursive fashion will usually
 cause a stack overflow, but will work fine if there is more stack
 (suppose we are not worried about efficiency). And these functions
 generally look nicer and compact than their tail recursive versions.

 Is this is a technical hurdle, or just a checkpoint for runaway
 programs?

This was discussed a while ago on the ghc users mailing list.
I think there was general agreement that this was bad, but
that doing something better meant a lot of work for someone
(who could be trusted to get it right :-)

http://www.haskell.org/pipermail/glasgow-haskell-users/2007-May/012467.html

Regards
--
Adrian Hey

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Why does GHC limit stack size?

2007-11-03 Thread Adrian Hey

Bulat Ziganshin wrote:

because program that require 8mb stack, will probably require 8gb when
processing more data :)


So.. what? You could say the same about heap, which was rather the point
of the earlier thread.

Regards
--
Adrian Hey


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] The question of ByteString

2007-11-03 Thread Paul Johnson

Andrew Coppin wrote:

Duncan Coutts wrote:

Yes, the semantics are different. ByteString is stricter. In some
circumstances you could discover that some list is being used
sufficiently strictly (spine and element strict) that you could do a
representation change to use strict arrays. It is something I have
pondered occasionally and I think that is an interesting avenue for
research.
Also important is the fact that String = [Char], and a Char can hold any 
Unicode character, whereas a ByteString is a sequence of Word8 elements 
(i.e. integers from 0-255).  If you want to store a Char in a ByteString 
then you have to convert it to UTF-8 or something similar.


The Encoding package 
(http://hackage.haskell.org/cgi-bin/hackage-scripts/package/encoding-0.2) 
has the functions for this kind of thing.


Paul.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Why can't Haskell be faster?

2007-11-03 Thread Hugh Perkins
On Nov 3, 2007 5:00 AM, Ryan Dickie [EMAIL PROTECTED] wrote:
 Lossless File compression, AKA entropy coding, attempts to maximize the
 amount of information per bit (or byte) to be as close to the entropy as
 possible. Basically, gzip is measuring (approximating) the amount of
 information contained in the code.

Hmmm, interesting idea.


  I think it would be interesting to compare the ratios between raw file size
 its entropy (we can come up with a precise metric later). This would show us
 how concise the language and code actually is.

Yeah, let's all write in bytecode using a hex editor :-D
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Compile-time evaluation

2007-11-03 Thread Alfonso Acosta
Hi Nicholas,

 compileTimeEval :: Data a = a - ExpQ
 compileTimeEval = return . toExp


 You're telling me all that horrendous pain in implementing toExp and
it already exists?!?

Yes unfortunately, compileTimeEval already exists in TH, it's called lift

compileTimeEval :: Lift a = a - ExpQ
compileTimeEval = lift

But don't be so hard on yourself. Your approach has one advantage.
GHC supports automatic derivation of Data whereas Lift instances have
to be created manually.

Note, however, that Lift instances can also be generated using Igloo's
th-lift package[1].

Cheers,

Fons

[1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/th-lift-0.2
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] The question of ByteString

2007-11-03 Thread Brandon S. Allbery KF8NH


On Nov 3, 2007, at 5:34 , Andrew Coppin wrote:

(BTW, anybody have any clue what's happening with stream fusion? I  
remember reading the paper saying hey, this is how it works and  
it's cool and we're going to try to replace the whole list library  
with new stream implementations, but that's the last I heard...)


dons and I have both mentioned Data.Stream recently; I think it's on  
hackage.  It's still a work in progress though, I believe.


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon universityKF8NH


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Rose Tree

2007-11-03 Thread Ryan Bloor
Hello,
 
 
I need help... I am having trouble with rose trees.
 
 1
 
  2 3   4
   5 6 7   8910 11
 
That is the rose tree that I seek. 
 
Data Tree a = Empty | Leaf a | Node a [(Tree a)]
 
example :: Tree (String, String) 
example = Node (a,b) -- root node
[ ] -- end of tree
 
What I want to do is create two functions that return either the children or 
parents of a given input, here a String. 
 String - Tree - [String] I think 
_
The next generation of MSN Hotmail has arrived - Windows Live Hotmail
http://www.newhotmail.co.uk___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] torrent for 6.8.1?

2007-11-03 Thread brad clawsie
do torrents exist for 6.8.1? my experience is that people will use
torrents if they are offered and they really do lift the pressure from
the origin domain (haskell.org)



pgpTQvjxj07Kq.pgp
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe