Re: [Haskell-cafe] Re: Suspected stupid Haskell Question

2007-10-19 Thread Justin Bailey
On 10/17/07, Thomas Hartman [EMAIL PROTECTED] wrote:


 Is there a more scientific way of figuring out if one version is better
 than the other by using, say profiling tools?


Profiling Haskell programs is black magic, but of the sort you learn by
having a problem to solve. I don't think it requires special genius, just
enough motivation. Profiling the interpreter my team created during the ICFP
contest did that to me.

The GHC heap profiler looked weird to me at first because you are required
to convert its output to a PostScript file. However, it's well worth doing.
There are several types of profiles, but you would probably be most
interested in the biographical profile. The GHC documentation is pretty
good and the article Heap Profiling for Space Efficiency[1] may also help.
The article was written for the nhc compiler but the tools look the same.

Performance profiling is easier - it's just dumped as text output when your
program runs. GHC's documentation is really good here. One thing I've
learned is to look for two things:

  1) Functions that do a lot of allocation
  2) Functions that execute many times

#2 is pretty much universal for profiling, but #1 is unique to Haskell (and
probably any pure functional language).

Sadly, none of these techniques works for stack overflows. Or, more likely, I
haven't learned how to spot them. Luckily the culprit is usually a fold that
isn't strict enough. Albert's post about the Bird book is a good pointer. I
just read that chapter myself last night, and he gives a very clear
explanation of how lazy evaluation works (he calls it 'outermost reduction')
and how strictness interacts with laziness.
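
(For what it's worth, here is a minimal sketch of the fold issue, not taken
from any code in this thread: a plain foldl builds a chain of unevaluated
thunks and typically overflows the stack on a long list when compiled without
optimisation, while foldl' forces the accumulator at every step.)

import Data.List (foldl')

-- Lazy left fold: accumulates (((0+1)+2)+3)+... as thunks; forcing the
-- result at the end can overflow the stack for large inputs.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is evaluated at each step, so the
-- fold runs in constant stack space.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

main :: IO ()
main = print (sumStrict [1 .. 1000000])   -- swap in sumLazy to see the overflow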

Hope that helps!

Justin

[1] http://citeseer.ist.psu.edu/runciman96heap.html


Re: [Haskell-cafe] Re: Suspected stupid Haskell Question

2007-10-18 Thread Bertram Felgenhauer
Thomas Hartman wrote:
 Since I'm interested in the stack overflow issue, and getting acquainted 
 with quickcheck, I thought I would take this opportunity to compare your 
 ordTable with some code Yitzchak Gale posted earlier, against Ham's 
 original problem.
 
 As far as I can tell, they're the same. They work on lists up to 10 
 element lists of strings, but on 10^6 size lists I lose patience waiting 
 for them to finish. 
 
 Is there a more scientific way of figuring out if one version is better 
 than the other by using, say profiling tools?
 
 Or by reasoning about the code?

No, measuring actual performance is the only way.

 t.
 
 
 
 import Data.List
 import qualified Data.Map as M
 import Control.Arrow
 import Test.QuickCheck
 import Test.GenTestData

I couldn't find this one, so I was unable to test anything.
[snip]

Some ideas:

- Your code doesn't contain a  main  function. Did you compile it?
- Strings are lists; storing a string of n characters needs 12*n
  bytes on 32 bit architectures, and 24*n bytes with 64 bits.

  1 million strings with 10 characters each will consume 120MB (or 240MB),
  without accounting for any overhead for the copying garbage collector.

  I expect that you can save some memory (and thus garbage collection
  time) by using Data.ByteString.ByteString (which uses about 32 + n
  (32 bits) or 64 + n bytes (64 bits), if I remember correctly).
- On using Data.Map vs. sort and group: The main advantage of the first
  approach is that duplicates are eliminated as they are found. So in
  fact, if you have a list of only 100 different strings, your code
  will run in constant memory, assuming the list is built lazily.

  Your test case looks like there are only few duplicates, so I'd expect
  the Data.Map code to perform a bit worse than the Data.List one. But
  as I wrote above, profiling is the only way to find out for sure.
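
  (A small sketch of those last two points; the ByteString part assumes the
  Data.ByteString.Char8 API and is illustrative only, not a drop-in for the
  test harness discussed in this thread:)

  import Data.List (foldl')
  import qualified Data.Map as M
  import qualified Data.ByteString.Char8 as B

  -- Count occurrences with a strict insert: duplicates are merged as soon
  -- as they are seen, so a lazily built list with few distinct keys needs
  -- only a small map, not the whole list, in memory.
  countMap :: Ord a => [a] -> M.Map a Int
  countMap = foldl' (\m x -> M.insertWith' (+) x 1 m) M.empty

  main :: IO ()
  main = do
    -- 10^6 packed strings drawn from only 100 distinct values
    let xs = [ B.pack (show (i `mod` 100)) | i <- [1 .. 1000000 :: Int] ]
    print (M.size (countMap xs))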

Bertram


Re: [Haskell-cafe] Re: Suspected stupid Haskell Question

2007-10-18 Thread Yitzchak Gale
Hi Chad,

Chad Scherrer wrote:
 I think the stack overflows
 were happening because Map.insertWith isn't strict enough.
 Otherwise I think the code is the same.

They are visibly almost identical - except that you
do an extra lookup to get your strictness, while insertWith'
has internal access and can do it in the same pass.
So using insertWith' should be faster.

 But I would expect intTable to be faster, since it
 uses IntMap,

I'm not sure if that's strictly true. I have never done
any testing, but I get the feeling that the performance
of IntMap may depend on the distribution of the keys.
After some not so good experiences, I abandoned
IntMap and just use Map. Anyone know?

Also - I don't have insertWith' for IntMap.
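
(For reference, a sketch of one way to get the missing strictness for IntMap,
using the same lookup-and-seq trick as the ordTable/intTable code elsewhere in
this thread; the helper name is made up:)

import qualified Data.IntMap as IM

-- Behave like a strict insertWith: insert, then look the key back up and
-- force the stored value, so no chain of (+) thunks builds up in the map.
insertWithStrict :: (a -> a -> a) -> IM.Key -> a -> IM.IntMap a -> IM.IntMap a
insertWithStrict f k v m = v' `seq` m'
  where m'      = IM.insertWith f k v m
        Just v' = IM.lookup k m'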

 and there's no IntMap.insertWith' as of 6.6.1

The mtl package is now independent of GHC.

Regards,
Yitz


Re: [Haskell-cafe] Re: Suspected stupid Haskell Question

2007-10-18 Thread Thomas Hartman
  But I would expect intTable to be faster,

But if I understand correctly, intTable can only deal with integer keys, 
whereas BH's original question would have wanted string keys, and I can't 
see a way to convert string to int and back.

t.




Chad Scherrer [EMAIL PROTECTED] 
10/17/2007 11:38 PM

To
Thomas Hartman/ext/[EMAIL PROTECTED]
cc
haskell-cafe@haskell.org, [EMAIL PROTECTED]
Subject
Re: [Haskell-cafe] Re: Suspected stupid Haskell Question






Hmm, is insertWith' new? If I remember right, I think the stack overflows 
were happening because Map.insertWith isn't strict enough. Otherwise I 
think the code is the same. But I would expect intTable to be faster, 
since it uses IntMap, and there's no IntMap.insertWith' as of 6.6.1 
(though it may be easy enough to add one).

Chad

On 10/17/07, Thomas Hartman  [EMAIL PROTECTED] wrote:

Since I'm interested in the stack overflow issue, and getting acquainted 
with quickcheck, I thought I would take this opportunity to compare your 
ordTable with some code Yitzchak Gale posted earlier, against Ham's 
original problem. 

As far as I can tell, they're the same. They work on lists up to 10 
element lists of strings, but on 10^6 size lists I lose patience waiting 
for them to finish. 

Is there a more scientific way of figuring out if one version is better 
than the other by using, say profiling tools? 

Or by reasoning about the code? 

t. 

 

import Data.List 
import qualified Data.Map as M 
import Control.Arrow 
import Test.QuickCheck 
import Test.GenTestData 
import System.Random 

{- 
Is there a library function to take a list of Strings and return a list of 

ints showing how many times each String occurs in the list. 

So for example: 

["egg", "egg", "cheese"] would return [2,1]
-} 

testYitzGale n = do 
  l <- rgenBndStrRow (10,10) (10^n,10^n)
  -- 10 strings, strings are 10 chars long, works. craps out on 10^6.
  m <- return $ freqFold l 
  putStrLn $ "map items: " ++ ( show $ M.size m ) 

testCScherer n = do 
  l <- rgenBndStrRow (10,10) (10^n,10^n)  -- same limitations as yitz gale code.
  m <- return $ ordTable l 
  putStrLn $ "items: " ++ ( show $ length m ) 


-- slow for big lists 
--freqArr = Prelude.map ( last &&& length ) . group . sort 

-- yitz gale code. same as chad scherer code? it's simpler to understand, but is it as fast?
freqFold :: [[Char]] -> M.Map [Char] Int 
freqFold = foldl' g M.empty 
  where g accum x = M.insertWith' (+) x 1 accum 

-- c scherer code. insists on ord. far as I can tell, same speed as yitz. 
ordTable :: (Ord a) => [a] -> [(a,Int)] 
ordTable xs = M.assocs $! foldl' f M.empty xs 
    where f m x = let m'     = M.insertWith (+) x 1 m 
                      Just v = M.lookup x m' 
                  in v `seq` m' 


l = ["egg","egg","cheese"] 

-- other quickcheck stuff 
--prop_unchanged_by_reverse = \l -> ( freqArr (l :: [[Char]]) ) == ( freqArr $ reverse l ) 
--prop_freqArr_eq_freqFold = \l -> ( freqArr (l :: [[Char]]) == (freqFold l)) 
--test1 = quickCheck prop_unchanged_by_reverse 
--test2 = quickCheck prop_freqArr_eq_freqFold 

--- generate test data: 
genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) =
  rgen ( genBndLoL (minStrLen, maxStrLen) (minCols,maxCols) ) 

gen gen = do 
  sg <- newStdGen 
  return $ generate 1 sg gen 

-- generator for a list with length between min and max 
genBndList :: Arbitrary a => (Int, Int) -> Gen [a] 
genBndList (min,max) = do 
  len <- choose (min,max) 
  vector len 


-- lists of lists 
--genBndLoL :: (Int, Int) -> (Int, Int) -> Gen [[a]] 
genBndLoL (min1,max1) (min2,max2) = do 
  len1 <- choose (min1,max1) 
  len2 <- choose (min2,max2) 
  vec2 len1 len2 

--vec2 :: Arbitrary a => Int -> Int -> Gen [[a]] 
vec2 n m = sequence [ vector m | i <- [1..n] ] 







Re: [Haskell-cafe] Re: Suspected stupid Haskell Question

2007-10-18 Thread Albert Y. C. Lai

Thomas Hartman wrote:
Since I'm interested in the stack overflow issue, and getting acquainted 
with quickcheck, I thought I would take this opportunity to compare your 
ordTable with some code Yitzchak Gale posted earlier, against Ham's 
original problem.


As far as I can tell, they're the same. They work on lists up to 10 
element lists of strings, but on 10^6 size lists I lose patience waiting 
for them to finish.


Is there a more scientific way of figuring out if one version is better 
than the other by using, say profiling tools?


Or by reasoning about the code?


It can be reasoned. Some people know how to do it. No one has written up 
the method and theory properly. It is currently rather ad hoc. I want to 
write one in the future.


Some of the knowledge is in:

http://www.haskell.org/haskellwiki/Stack_overflow

http://en.wikibooks.org/wiki/Haskell (Advanced Track, Haskell Performance)

Richard Bird's Introduction to Functional Programming using Haskell, 
second edition (chapter 7 Efficiency, but also other chapters contain 
efficiency discussions)


anything that adequately defines lazy evaluation (or whatever evaluation 
your favourite executor seems to use)




[Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Big_Ham

Is there a library function to take a list of Strings and return a list of
ints showing how many times each String occurs in the list.

So for example:

["egg", "egg", "cheese"] would return [2,1]

I couldn't find anything on a search, or anything in the libraries.

Thanks BH.



Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Stefan Holdermans

BH,

Is there a library function to take a list of Strings and return a  
list of

ints showing how many times each String occurs in the list.

So for example:

["egg", "egg", "cheese"] would return [2,1]


freq xs = map length (group xs)

HTH,

  Stefan


Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Dougal Stanton
On 17/10/2007, Big_Ham [EMAIL PROTECTED] wrote:

 Is there a library function to take a list of Strings and return a list of
 ints showing how many times each String occurs in the list.

 So for example:

 ["egg", "egg", "cheese"] would return [2,1]

 I couldn't find anything on a search, or anything in the libraries.

 Thanks BH.

No, but it is also trivial to create, with the 'group' function in
Data.List. I'll stop there though, cos this could be a homework
question.

Cheers,

D.


Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Dougal Stanton
On 17/10/2007, Dougal Stanton [EMAIL PROTECTED] wrote:

 No, but it is also trivial to create, with the 'group' function in
 Data.List. I'll stop there though, cos this could be a homework
 question.

It's just occurred to me that answering questions like these is a bit
like the prisoner's dilemma.

 - If I give the full answer and no-one else does, then maybe I'm
doing someone's homework for them?
 - If I just give clues and someone else gives the answer, it makes me
look mean. :-(
 - If we all give the answer, everybody's happy and the blame (if it
was a set question) is spread around a bit.
 - If we all answer with vague hints then it makes the list as a whole
less useful and seem a bit arrogant.

There's no way to win! :-)

D.


Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Big_Ham

You are indeed right Peter, that's what I was after: the frequency regardless
of the order of the elements.  It also doesn't matter if it outputs them as
tuples, or as a separate list on their own, because each value would belong to
the first occurrence of that element, if you see what I mean, so you could
still tell what came from what.


Peter Verswyvelen wrote:
 
 I'm a newbie here, so I'm not sure about my reply, but I think this is 
 not the answer to his question.
 
 freq ["egg", "egg", "cheese"] indeed returns [2,1]
 
 but
 
 freq ["egg", "cheese", "egg"] returns [1,1,1]
 
 BH just mentioned he needed the frequency of elements in the list, 
 independent of their order.
 
 So in that case, the result should be a list of ordered pairs like: 
 [("egg", 2), ("cheese", 1)]. Or a pair of two lists, like (["egg", 
 "cheese"], [2,1]). Otherwise you would not know which frequency belongs 
 to which element?
 
 I can't write this concisely nor efficiently yet, but the following does 
 the job:
 
 import Data.List
 
 freq xs = zip e f
   where
 s = sort xs
 e = nub s
 f = map length (group s)
 
 However, I suspect the experts here will be able to make that much 
 shorter and more efficient (maybe using Data.Map?)
 
 Peter
 
 
 Stefan Holdermans wrote:
 BH,

 Is there a library function to take a list of Strings and return a 
 list of
 ints showing how many times each String occurs in the list.

 So for example:

 ["egg", "egg", "cheese"] would return [2,1]

 freq xs = map length (group xs)

 HTH,

   Stefan


 
 
 




Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Yitzchak Gale
Dougal Stanton wrote:
 It's just occurred to me that answering questions like these is a bit
 like the prisoner's dilemma...
 There's no way to win! :-)

Yes there is. Just mention the following wiki page
as part of your answer:

http://haskell.org/haskellwiki/Homework_help

-Yitz


Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Stuart Cook
On 10/17/07, Peter Verswyvelen [EMAIL PROTECTED] wrote:
 So in that case, the result should be a list of ordered pairs like:
 [("egg", 2), ("cheese", 1)]. Or a pair of two lists, like (["egg",
 "cheese"], [2,1]). Otherwise you would not know which frequency belongs
 to which element?

 However, I suspect the experts here will be able to make that much
 shorter and more efficient (maybe using Data.Map?)


  import Control.Arrow
  import Data.List

  freqs = map (head &&& length) . group . sort

I have used this function quite a few times already.


Stuart


Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Peter Verswyvelen
I'm a newbie here, so I'm not sure about my reply, but I think this is 
not the answer to his question.


freq ["egg", "egg", "cheese"] indeed returns [2,1]

but

freq ["egg", "cheese", "egg"] returns [1,1,1]

BH just mentioned he needed the frequency of elements in the list, 
independent of their order.


So in that case, the result should be a list of ordered pairs like: 
[("egg", 2), ("cheese", 1)]. Or a pair of two lists, like (["egg", 
"cheese"], [2,1]). Otherwise you would not know which frequency belongs 
to which element?


I can't write this concisely nor efficiently yet, but the following does 
the job:


import Data.List

freq xs = zip e f
 where
   s = sort xs
   e = nub s
   f = map length (group s)

However, I suspect the experts here will be able to make that much 
shorter and more efficient (maybe using Data.Map?)


Peter


Stefan Holdermans wrote:

BH,

Is there a library function to take a list of Strings and return a 
list of

ints showing how many times each String occurs in the list.

So for example:

["egg", "egg", "cheese"] would return [2,1]


freq xs = map length (group xs)

HTH,

  Stefan






Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Yitzchak Gale
Peter Verswyvelen wrote:
 However, I suspect the experts here will be able to make that much
 shorter and more efficient (maybe using Data.Map?)

That makes it difficult to respond. I am definitely not
claiming to be an expert. For one thing, my name
is not Simon. But I'll say something anyway, fwiw.

The problem there is that nub is O(n^2).
You're stuck with that if your type is not an
instance of Ord (but then you can't use sort,
either).

When you can assume Ord, the standard solution
is, as you suggest, something like:

import qualified Data.Map as M
import Data.List

histogram = M.toList . foldl' (\m x -> M.insertWith' (+) x 1 m)
M.empty . M.fromList

This should work efficiently, with the right amount of laziness, even
for very large lists.

Regards,
Yitz


Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Peter Verswyvelen

Nice!!! As I'm learning Arrows now, this is really useful :-)

Stuart Cook wrote:

  import Control.Arrow
  import Data.List

  freqs = map (head &&& length) . group . sort

I have used this function quite a few times already.


Stuart


  




Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Yitzchak Gale
I wrote:
 When you can assume Ord, the standard solution
 is, as you suggest, something like...

Oops, sorry, doesn't typecheck. Here it is corrected:

 import qualified Data.Map as M
 import Data.List

 histogram = M.toList . foldl' (\m x -> M.insertWith' (+) x 1 m) M.empty

 This should work efficiently, with the right amount of laziness, even
 for very large lists.

Stuart's Arrows thing is much nicer when your list is small
enough to be held in memory all at once.

-Yitz


Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread manu

 The problem there is that nub is O(n^2).

Is there a place where one can look up the complexity of Standard
Libraries functions?



E.D
 


Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Yitzchak Gale
 Is there a place where one can look up the complexity of Standard
 Libraries functions  ?

No. Some modules have it in their Haddock docs.
Most don't.

But the source code is available. :)

-Yitz


[Haskell-cafe] Re: Suspected stupid Haskell Question

2007-10-17 Thread Chad Scherrer
Big_Ham joymachine2001 at hotmail.com writes:

 
 
 Is there a library function to take a list of Strings and return a list of
 ints showing how many times each String occurs in the list.
 
 So for example:
 
 ["egg", "egg", "cheese"] would return [2,1]
 
 I couldn't find anything on a search, or anything in the libraries.
 
 Thanks BH.

Hi BH,

This might be overkill, but it works well for me. And it avoids stack overflows I
was originally getting for very large lists. Dean Herrington and I came up with
this:

ordTable :: (Ord a) => [a] -> [(a,Int)]
ordTable xs = Map.assocs $! foldl' f Map.empty xs
    where f m x = let m'     = Map.insertWith (+) x 1 m
                      Just v = Map.lookup x m'
                  in v `seq` m'

intTable :: [Int] -> [(Int,Int)]
intTable xs = IntMap.assocs $! foldl' f IntMap.empty xs
    where f m x = let m'     = IntMap.insertWith (+) x 1 m
                      Just v = IntMap.lookup x m'
                  in v `seq` m'

enumTable :: (Enum a) => [a] -> [(a,Int)]
enumTable = map fstToEnum . intTable . map fromEnum
    where fstToEnum (x,y) = (toEnum x, y)

If you like, it's easily wrapped in a Table class.
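
(A sketch of what such a Table class might look like; the class name and the
imports are made up for illustration, and the Int instance just repeats the
strict-fold definition above so the snippet stands on its own:)

import Data.List (foldl')
import qualified Data.IntMap as IntMap

class Table a where
  table :: [a] -> [(a, Int)]

-- Int keys go through IntMap, using the same lookup-and-seq trick as above.
instance Table Int where
  table xs = IntMap.assocs $! foldl' f IntMap.empty xs
    where f m x = let m'     = IntMap.insertWith (+) x 1 m
                      Just v = IntMap.lookup x m'
                  in v `seq` m'

-- Any Enum key (Char, for instance) can be routed through the Int instance.
instance Table Char where
  table = map (\(x, n) -> (toEnum x, n)) . table . map fromEnum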

Chad






Re: [Haskell-cafe] Suspected stupid Haskell Question

2007-10-17 Thread Dan Weston

Oh, why didn't you say you were learning Arrows? Then why not

freqs = sort >>> group >>> map (head &&& length)

So much more readable, don't you think? ;)

Either way, if you run into the dreaded monomorphism restriction:

Ambiguous type variable `a' in the constraint:
  `Ord a' arising from use of `sort' at A.hs:6:40-43
Possible cause: the monomorphism restriction applied to the following:
  freqs :: [a] -> [(a, Int)] (bound at A.hs:6:0)
Probable fix: give these definition(s) an explicit type signature
  or use -fno-monomorphism-restriction

you'll have to either add an explicit type annotation:

freqs :: (Ord a) => [a] -> [(a, Int)]

or else throw an arg onto it:

freqs x = map (head &&& length) . group . sort $ x

The latter hurts too much to write, so I always add the type.

Peter Verswyvelen wrote:

Nice!!! As I'm learning Arrows now, this is really useful :-)

Stuart Cook wrote:

  import Control.Arrow
  import Data.List

  freqs = map (head &&& length) . group . sort

I have used this function quite a few times already.


Stuart







Re: [Haskell-cafe] Re: Suspected stupid Haskell Question

2007-10-17 Thread Thomas Hartman
Since I'm interested in the stack overflow issue, and getting acquainted 
with quickcheck, I thought I would take this opportunity to compare your 
ordTable with some code Yitzchak Gale posted earlier, against Ham's 
original problem.

As far as I can tell, they're the same. They work on lists up to 10 
element lists of strings, but on 10^6 size lists I lose patience waiting 
for them to finish. 

Is there a more scientific way of figuring out if one version is better 
than the other by using, say profiling tools?

Or by reasoning about the code?

t.



import Data.List
import qualified Data.Map as M
import Control.Arrow
import Test.QuickCheck
import Test.GenTestData
import System.Random

{-
Is there a library function to take a list of Strings and return a list of
ints showing how many times each String occurs in the list.

So for example:

["egg", "egg", "cheese"] would return [2,1]
-}

testYitzGale n = do
  l <- rgenBndStrRow (10,10) (10^n,10^n)
  -- 10 strings, strings are 10 chars long, works. craps out on 10^6.
  m <- return $ freqFold l 
  putStrLn $ "map items: " ++ ( show $ M.size m )

testCScherer n = do
  l <- rgenBndStrRow (10,10) (10^n,10^n)  -- same limitations as yitz gale code.
  m <- return $ ordTable l 
  putStrLn $ "items: " ++ ( show $ length m )


-- slow for big lists
--freqArr = Prelude.map ( last &&& length ) . group . sort

-- yitz gale code. same as chad scherer code? it's simpler to understand, but is it as fast?
freqFold :: [[Char]] -> M.Map [Char] Int
freqFold = foldl' g M.empty
  where g accum x = M.insertWith' (+) x 1 accum

-- c scherer code. insists on ord. far as I can tell, same speed as yitz.
ordTable :: (Ord a) => [a] -> [(a,Int)]
ordTable xs = M.assocs $! foldl' f M.empty xs
    where f m x = let m'     = M.insertWith (+) x 1 m
                      Just v = M.lookup x m'
                  in v `seq` m'


l = ["egg","egg","cheese"]

-- other quickcheck stuff
--prop_unchanged_by_reverse = \l -> ( freqArr (l :: [[Char]]) ) == ( freqArr $ reverse l )
--prop_freqArr_eq_freqFold = \l -> ( freqArr (l :: [[Char]]) == (freqFold l))
--test1 = quickCheck prop_unchanged_by_reverse
--test2 = quickCheck prop_freqArr_eq_freqFold

--- generate test data:
genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) =
  rgen ( genBndLoL (minStrLen, maxStrLen) (minCols,maxCols) )

gen gen = do
  sg <- newStdGen
  return $ generate 1 sg gen

-- generator for a list with length between min and max
genBndList :: Arbitrary a => (Int, Int) -> Gen [a]
genBndList (min,max) = do
  len <- choose (min,max)
  vector len


-- lists of lists
--genBndLoL :: (Int, Int) -> (Int, Int) -> Gen [[a]]
genBndLoL (min1,max1) (min2,max2) = do
  len1 <- choose (min1,max1)
  len2 <- choose (min2,max2)
  vec2 len1 len2

--vec2 :: Arbitrary a => Int -> Int -> Gen [[a]]
vec2 n m = sequence [ vector m | i <- [1..n] ]





Chad Scherrer [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
10/17/2007 01:35 PM

To
haskell-cafe@haskell.org
cc

Subject
[Haskell-cafe] Re: Suspected stupid Haskell Question






Big_Ham joymachine2001 at hotmail.com writes:

 
 
 Is there a library function to take a list of Strings and return a list 
of
 ints showing how many times each String occurs in the list.
 
 So for example:
 
 ["egg", "egg", "cheese"] would return [2,1]
 
 I couldn't find anything on a search, or anything in the libraries.
 
 Thanks BH.

Hi BH,

This might be overkill, but it works well for me. And it avoids stack 
overflows I
was originally getting for very large lists. Dean Herrington and I came up 
with
this:

ordTable :: (Ord a) => [a] -> [(a,Int)]
ordTable xs = Map.assocs $! foldl' f Map.empty xs
    where f m x = let m'     = Map.insertWith (+) x 1 m
                      Just v = Map.lookup x m'
                  in v `seq` m'

intTable :: [Int] -> [(Int,Int)]
intTable xs = IntMap.assocs $! foldl' f IntMap.empty xs
    where f m x = let m'     = IntMap.insertWith (+) x 1 m
                      Just v = IntMap.lookup x m'
                  in v `seq` m'

enumTable :: (Enum a) => [a] -> [(a,Int)]
enumTable = map fstToEnum . intTable . map fromEnum
    where fstToEnum (x,y) = (toEnum x, y)

If you like, it's easily wrapped in a Table class.

Chad









Re: [Haskell-cafe] Re: Suspected stupid Haskell Question

2007-10-17 Thread Chad Scherrer
Hmm, is insertWith' new? If I remember right, I think the stack overflows
were happening because Map.insertWith isn't strict enough. Otherwise I think
the code is the same. But I would expect intTable to be faster, since it
uses IntMap, and there's no IntMap.insertWith' as of 6.6.1 (though it may be
easy enough to add one).

Chad

On 10/17/07, Thomas Hartman [EMAIL PROTECTED] wrote:


 Since I'm interested in the stack overflow issue, and getting acquainted
 with quickcheck, I thought I would take this opportunity to compare your
 ordTable with some code Yitzchak Gale posted earlier, against Ham's original
 problem.

 As far as I can tell, they're the same. They work on lists up to 10
 element lists of strings, but on 10^6 size lists I lose patience waiting for
 them to finish.

 Is there a more scientific way of figuring out if one version is better
 than the other by using, say profiling tools?

 Or by reasoning about the code?

 t.

 

 import Data.List
 import qualified Data.Map as M
 import Control.Arrow
 import Test.QuickCheck
 import Test.GenTestData
 import System.Random

 {-
 Is there a library function to take a list of Strings and return a list of
 ints showing how many times each String occurs in the list.

 So for example:

 ["egg", "egg", "cheese"] would return [2,1]
 -}

 testYitzGale n = do
   l <- rgenBndStrRow (10,10) (10^n,10^n)
   -- 10 strings, strings are 10 chars long, works. craps out on 10^6.
   m <- return $ freqFold l
   putStrLn $ "map items: " ++ ( show $ M.size m )

 testCScherer n = do
   l <- rgenBndStrRow (10,10) (10^n,10^n)  -- same limitations as yitz gale code.
   m <- return $ ordTable l
   putStrLn $ "items: " ++ ( show $ length m )


 -- slow for big lists
 --freqArr = Prelude.map ( last &&& length ) . group . sort

 -- yitz gale code. same as chad scherer code? it's simpler to understand, but is it as fast?
 freqFold :: [[Char]] -> M.Map [Char] Int
 freqFold = foldl' g M.empty
   where g accum x = M.insertWith' (+) x 1 accum

 -- c scherer code. insists on ord. far as I can tell, same speed as yitz.
 ordTable :: (Ord a) => [a] -> [(a,Int)]
 ordTable xs = M.assocs $! foldl' f M.empty xs
     where f m x = let m'     = M.insertWith (+) x 1 m
                       Just v = M.lookup x m'
                   in v `seq` m'


 l = ["egg","egg","cheese"]

 -- other quickcheck stuff
 --prop_unchanged_by_reverse = \l -> ( freqArr (l :: [[Char]]) ) == ( freqArr $ reverse l )
 --prop_freqArr_eq_freqFold = \l -> ( freqArr (l :: [[Char]]) == (freqFold l))
 --test1 = quickCheck prop_unchanged_by_reverse
 --test2 = quickCheck prop_freqArr_eq_freqFold

 --- generate test data:
 genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) =
   rgen ( genBndLoL (minStrLen, maxStrLen) (minCols,maxCols) )

 gen gen = do
   sg <- newStdGen
   return $ generate 1 sg gen

 -- generator for a list with length between min and max
 genBndList :: Arbitrary a => (Int, Int) -> Gen [a]
 genBndList (min,max) = do
   len <- choose (min,max)
   vector len


 -- lists of lists
 --genBndLoL :: (Int, Int) -> (Int, Int) -> Gen [[a]]
 genBndLoL (min1,max1) (min2,max2) = do
   len1 <- choose (min1,max1)
   len2 <- choose (min2,max2)
   vec2 len1 len2

 --vec2 :: Arbitrary a => Int -> Int -> Gen [[a]]
 vec2 n m = sequence [ vector m | i <- [1..n] ]






Re: Stupid Haskell question

1993-02-26 Thread kh

 I also think its neat that you seem to have found a use for cyclic
 unification.  This is definitely an impetus to extend the language to
 include cyclic types.  (I don't expect we'll do this for a while
 though.  You might consider modifying the Glasgow Haskell compiler to
 include this yourself -- it may not be too difficult.)

I'm not sure whether it's possible to do this with the substitution
algorithm used in our compiler; I thought the occurs check helped
preserve the idempotency of substitutions?  It's certainly possible in
the graph-based algorithm I describe in my 1991 Glasgow FP workshop
paper (in fact, you get this "for free" and end up adding an occurs
check just to match the normal algorithm!).  The paper describes
[informally] how to transform a substitution algorithm written using
monads (such as the one in our compiler) into the graph version.  You
can also use a parallel algorithm if you use the right monad.  Apart
from the changes to the underlying monad, the differences are quite
minor.  The paper is (still) a draft version, so may be buggy, but
please ask if you'd like a copy.

Oh, if you did this, you'd also need to change the type output
routines, and you might like to allow explicitly cyclic types in type
signatures [may as well add existential types while you're there, too
:-) -- it's a pain when not all expressions can be given a type
signature].

Apart from the implementation (which doesn't seem to be a problem if
the right alg. is used[*]) does anyone know of more subtle problems with
cyclic types [such as not being able to define the type system using
the traditional sequent style]?  Is this a well-studied area?

Kevin

[*] Though if substitutions couldn't be used, I wouldn't like to have
to reimplement the type checkers in all existing Haskell compilers.





Re: Stupid Haskell question

1993-02-23 Thread Guy M. Argo


Guy S., Phil W.,...
I ran into exactly this problem in two different applications.
The first was the same that Guy S. points out, namely adding
arbtrirary but well-typed annotations to a parse-tree. The solution I
eventually ended up using (after discussing it with John Hughes) was
the following:

data FooTree a b = Leaf b |
   Node (AnnotFooTree a b) (AnnotFooTree a b)
type AnnotFooTree a b = (a, FooTree a b)

originally I'd tried:

data FooTree a b = Leaf a | Node (a (FooTree b)) (a (FooTree b))

which was of course rejected by the Hindley-Milner typechecker.
I recall David Murphy saying that I could have done that in Isabelle
but it involved a much more powerful type-checker. BTW, Simon PJ and
David Lester hit on exactly the same solution in their paper on
fully lazy lambda-lifting so presumably they'd given this problem
some thought too.

The other application was Tries. Imagine we have some type:

data Assoc a b = ...

which maps objects of type a to objects of type b. We don't care if
it's implemented as a tree, list, array or whatever provided there's
a standard interface. Now imagine that our application involves
keys which are variable-length lists/strings of a's. Using the above
type, instantiating a as [a] will cause increasingly costly comparisons
as the search gets closer to its goal. Trie structures avoid this by
matching one segment of the key at a time. For instance rather than
storing the following

"guy argo"
|
"guy steele"

it's stored as:

'g' -> 'u' -> 'y' -> ' ' -> 'a' -> 'r' -> 'g' -> 'o'
                             |
                            's' -> 't' -> 'e' -> 'e' -> 'l' -> 'e'

(n.b. Compressed tries go one step further by collapsing the one-way
 branching to give:

"guy " - "argo"
 |
 "steele")

Anyway, given the above Assoc type a very natural encoding of tries
is:

type Trie a b = Assoc a (Trie a b)

The operations on Tries can now be built out of the Assoc operations
which allows the lookup structure used at each level to be altered
without painful recoding merely by substituting a different Assoc ADT.
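
(For what it's worth, in Haskell as it stands the cycle in that synonym has to
be broken by a data declaration -- the "algebraic datatype intervenes" rule
discussed in the other thread.  A rough sketch of the same idea in modern
notation, with the abstract Assoc interface reduced to two made-up operations
and a Maybe added to mark where keys end:)

-- Stand-in for the abstract Assoc type; only the interface matters.
data Assoc a b = Assoc [(a, b)]

emptyA  :: Assoc a b
emptyA  = Assoc []

lookupA :: Eq a => a -> Assoc a b -> Maybe b
lookupA k (Assoc kvs) = lookup k kvs

-- The data constructor breaks the type-level cycle that
--   type Trie a b = Assoc a (Trie a b)
-- would introduce.
data Trie a b = Trie (Maybe b) (Assoc a (Trie a b))

lookupT :: Eq a => [a] -> Trie a b -> Maybe b
lookupT []     (Trie v _)  = v
lookupT (x:xs) (Trie _ as) = lookupA x as >>= lookupT xs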

I mention this to support Guy S.'s point that there might be gains
from going to a more expressive type system than Hindley-Milner.
I don't know what that might be - I just know that the things that I
mentioned aren't handled as elegantly as I'd like.
My $0.02
Guy Argo





Re: Stupid Haskell question

1993-02-23 Thread wadler

Given your correction, I think that the type declaration

  data FooTree a b  =  Leaf b | Node a (FooTree a b) a (FooTree a b)

will handle things only a little less neatly than the use of
wrappers, and will allow you to use type inference in much the
way you wish.  What do you think?  Cheers,  -- P
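
(A small illustration, added here for reference, of how ordinary recursion and
inference go through with that declaration; the function name is made up:)

data FooTree a b = Leaf b | Node a (FooTree a b) a (FooTree a b)

-- Collect every annotation, left to right.  The recursive calls are at
-- exactly the type being defined, so Hindley-Milner inference handles it.
annotations :: FooTree a b -> [a]
annotations (Leaf _)       = []
annotations (Node x l y r) = x : annotations l ++ [y] ++ annotations r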





Stupid Haskell question

1993-02-23 Thread Guy Steele

   Date: Tue, 23 Feb 93 17:18:19 GMT
   From: wadler [EMAIL PROTECTED]
   ...
   I don't understand.  Can't you handle this as follows?

data FooTree a b  =  Leaf b | Node a (FooTree a b) a (FooTree a b)
data Annote a b c =  MkAnnote Info (FooTree (Annote a b c) b) c

   and then infer that what you are acting on is a

   FooTree (Annote a b c) b.

   The c field can be omitted, or it can be instantiated to additional
   annotation information later on.

   I'm sure you can hack around the problem, but it's not yet clear
   to me that there isn't an elegant solution.  Cheers,  -- P

This looks promising.   I'll have to ponder it some.
(Don't worry if you don't hear more from me for a while--
I'm out of town for a couple of weeks soon.)

--Guy




Stupid Haskell question

1993-02-23 Thread Guy Steele

   Date: Tue, 23 Feb 93 10:16:58 GMT
   From: wadler [EMAIL PROTECTED]

   Given your correction, I think that the type declaration

 data FooTree a b  =  Leaf b | Node a (FooTree a b) a (FooTree a b)

   will handle things only a little less neatly than the use of
   wrappers, and will allow you to use type inference in much the
   way you wish.  What do you think?  Cheers,  -- P

You're right!  That does solve the problem as I have stated it.

But now suppose that some of the annotations refer to FooTree items
(another thing I forgot to say).  I suppose I could do

 data FooTree a b  =  Leaf b | Node a (FooTree a b)
a (FooTree a b)
[FooTree a b]

including a list of goodies that the annotations could then refer to,
but I'm quickly losing my abstractions--the annotations are spread
out rather than being single items.

Well, I'll manage to hack around it somehow.  Thanks for helping!

--Guy




Re: Stupid Haskell question

1993-02-23 Thread wadler

Guy,  You write,

   But now suppose that some of the annotations refer to FooTree items
   (another thing I forgot to say).  I suppose I could do
   
data FooTree a b  =  Leaf b | Node a (FooTree a b)
a (FooTree a b)
[FooTree a b]
   
   including a list of goodies that the annotations could then refer to,
   but I'm quickly losing my abstractions--the annotations are spread
   out rather than being single items.

I don't understand.  Can't you handle this as follows?

 data FooTree a b  =  Leaf b | Node a (FooTree a b) a (FooTree a b)
 data Annote a b c =  MkAnnote Info (FooTree (Annote a b c) b) c

and then infer that what you are acting on is a

FooTree (Annote a b c) b.

The c field can be omitted, or it can be instantiated to additional
annotation information later on.

I'm sure you can hack around the problem, but it's not yet clear
to me that there isn't an elegant solution.  Cheers,  -- P






Stupid Haskell question

1993-02-22 Thread Guy Steele

   Date: Mon, 22 Feb 93 14:28:47 GMT
   From: wadler [EMAIL PROTECTED]

   Guy asks the following (non-stupid) Haskell question, which I reply to
   below.  The question points out an area in the Haskell report that
   seems to be unclear; and a place where it might be worthwhile to change
   the design to be less conservative but more uniform.

   Guy's question:


   - Begin Included Message -

   From [EMAIL PROTECTED] Thu Feb 18 17:21:56 1993
   From: Guy Steele [EMAIL PROTECTED]
   Date: Thu, 18 Feb 93 12:20:44 EST
   To: wadler [EMAIL PROTECTED]
   Cc: [EMAIL PROTECTED]
   Subject: Re: Stupid Haskell question
   Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], 
   [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]


   Haskell theoretically allows recursive datatypes.  But the following
   example does not work (he said innocently).

   module Rec where

   data Unary a = Zero | Successor a

   f :: Unary z -> [Unary z]
   f x = [x, Successor x]

   I think that the compiler ought to deduce the restriction
   x::q  where  q = Unary q.  It ought to be okay for q to be Unary q
   because "an algebraic datatype intervenes" (Haskell report, 4.2.2).

   But the Glasgow compiler says

   "/users/lang1/gls/Haskell/monads/Rec.hs", line 6:
   Type variable "a" occurs within the type "Unary a".
   In a list expression: [x, Successor x]

   and the Chalmers compiler says

   Errors:
   "/users/lang1/gls/Haskell/monads/Rec.hs", line 6, [63] unify1 (occurence)
   a
   and Unary a
in  (:) A1_f ((:) (Successor A1_f) ([]))
in f

   Now everything is okay if I write

   module Rec where

   data Unary a = Zero | Successor (Unary a)

   f :: Unary z -> [Unary z]
   f x = [x, Successor x]

   but I have reasons in my actual code (which is hairy--this is a
   stripped-down example) not to force the data type to be recursive,
   but to let the type analysis deduce it where necessary.  Am I foolish
   to expect this?

   --Guy


   - End Included Message -

   Phil's response:

   Guy,

   Haskell requires that `an algebraic datatype intervenes' in order that
   all types can be written as a finite tree.  The type you refer to,

   q where q = Unary q,

   is an infinite tree (though a finite graph).  If we intended to allow
   such infinite solutions, we wouldn't need the restriction to algebraic
   datatypes at all.  This suggests we should clear up the wording in the
   Haskell report, so I've forwarded your question and this response to
   the Haskell mailing list.

   Why not allow cyclic types (i.e., any type expressible as a
   finite graph)?  It turns out there is a unification algorithm
   that works for finite graphs, so this is in theory possible.
   But the intent of Haskell was to be a conservative design, so
   we stuck with what we were familiar with.  Your example of
   a place where cyclic types are useful provides an impetus
   to step into the less familiar but more uniform territory.

Phil,

  Thanks for the reply.  I do think some clarification in the
report is called for because I suspect that there are some
implicit implementation-motivated assumptions about how types are
processed.  For example, I suspect there is an assumption that
type synonyms may be handled simply by substitution (inline
expansion), so to speak, whereas algebraic data types are not.
(If algebraic data types were substitutive, then recursive
algebraic types would be expressed as infinite trees after all.)

  So it is not so much that cyclic graphs are disallowed, as that
the user is notationally required to indicate where the graph is
to be cut so as to render it acyclic (and therefore notatable as a
tree, since sharing in an acyclic dag is irrelevant here).  This,
plus the inability to explicit construct type declarations
dynamically, means that all cycles in a type graph must be of
fixed size, determined by the static program text.  And this is
what is getting in my way.

  Here is a less stripped-down version of what I am trying
to accomplish.  I want a data structure that is a tree:

  data FooTree a = Leaf a | Node (FooTree a) (FooTree a)

But I want to be able to annotate these trees in various ways.
So I wrap an annotation structure around the recursion:

  data FooTree a = Leaf a | Node (FooTree (Wrapper a)) (FooTree (Wrapper a))

  data Wrapper a = Annotation a Int

Now sometimes I need different kinds of trees that have different
annotations.  But the basic tree structure is common to them all.
So I want to abstract over the annotation structures.  This leads me
to write:

  data FooTree a = Leaf a | Node t t

where my intent is that "t" is some complex type that may
eventually be discovered to involve "FooTree a".  Unfortunately,
"t" is not allowed on the right-hand side unless it appears on the
left-hand side.

Re: Stupid Haskell question

1993-02-22 Thread wadler

Guy,

I agree that the report should be updated to express the restriction we
really have in mind.  Simon: as editor, this is your bailiwick!

I also think it's neat that you seem to have found a use for cyclic
unification.  This is definitely an impetus to extend the language to
include cyclic types.  (I don't expect we'll do this for a while
though.  You might consider modifying the Glasgow Haskell compiler to
include this yourself -- it may not be too difficult.)

However, I am confused by some of your example.  You want to use a data
structure like

  data FooTree a = Leaf a | Node (FooTree (Wrapper a)) (FooTree (Wrapper a))
  data Wrapper a = Annotation a Int

This seems to add an additional level of annotation at every
Node.  Something zero nodes deep has zero annotations, something
five nodes deep has five annotations.  Is this really what you
want?

Because of the way type inference works in the Hindley-Milner system,
it is impossible to write a function that will act on values of the
type (FooTree a) as defined above.  (This is independent of the
additional complications you mention.) The reason is that every
instance of a function in a recursive definition must have the same
type as the function being defined.  But to define a function on
(FooTree a) you need a recursive instance of type (FooTree (Wrapper
a)).  Mycroft first suggested a type system that would allow such
functions to be typed, but I think it is still an open question as to
whether an inference algorithm exists for the type system.  (There was
a paper published that claimed an algorithm, but it was later withdrawn
as incorrect.)

So I hope you don't really need the types above, because then
we can't help you, even in the monomorphic case!

But would the simpler type perhaps work?

  data FooTree a b  =  Leaf b | Node a (FooTree a b) a (FooTree a b)

where a ranges over annotation, and b over leaf values?  Cheers,  -- P





Re: Stupid Haskell question

1993-02-22 Thread wadler

Guy asks the following (non-stupid) Haskell question, which I reply to
below.  The question points out an area in the Haskell report that
seems to be unclear; and a place where it might be worthwhile to change
the design to be less conservative but more uniform.

Guy's question:


- Begin Included Message -

From [EMAIL PROTECTED] Thu Feb 18 17:21:56 1993
From: Guy Steele [EMAIL PROTECTED]
Date: Thu, 18 Feb 93 12:20:44 EST
To: wadler [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Stupid Haskell question
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], 
[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]


Haskell theoretically allows recursive datatypes.  But the following
example does not work (he said innocently).

module Rec where

data Unary a = Zero | Successor a

f :: Unary z -> [Unary z]
f x = [x, Successor x]

I think that the compiler ought to deduce the restriction
x::q  where  q = Unary q.  It ought to be okay for q to be Unary q
because "an algebraic datatype intervenes" (Haskell report, 4.2.2).

But the Glasgow compiler says

"/users/lang1/gls/Haskell/monads/Rec.hs", line 6:
Type variable "a" occurs within the type "Unary a".
In a list expression: [x, Successor x]

and the Chalmers compiler says

Errors:
"/users/lang1/gls/Haskell/monads/Rec.hs", line 6, [63] unify1 (occurence)
a
and Unary a
 in  (:) A1_f ((:) (Successor A1_f) ([]))
 in f

Now everything is okay if I write

module Rec where

data Unary a = Zero | Successor (Unary a)

f :: Unary z -> [Unary z]
f x = [x, Successor x]

but I have reasons in my actual code (which is hairy--this is a
stripped-down example) not to force the data type to be recursive,
but to let the type analysis deduce it where necessary.  Am I foolish
to expect this?

--Guy


- End Included Message -

Phil's response:

Guy,

Haskell requires that `an algebraic datatype intervenes' in order that
all types can be written as a finite tree.  The type you refer to,

q where q = Unary q,

is an infinite tree (though a finite graph).  If we intended to allow
such infinite solutions, we wouldn't need the restriction to algebraic
datatypes at all.  This suggests we should clear up the wording in the
Haskell report, so I've forwarded your question and this response to
the Haskell mailing list.

Why not allow cyclic types (i.e., any type expressible as a
finite graph)?  It turns out there is a unification algorithm
that works for finite graphs, so this is in theory possible.
But the intent of Haskell was to be a conservative design, so
we stuck with what we were familiar with.  Your example of
a place where cyclic types are useful provides an impetus
to step into the less familiar but more uniform territory.

Cheers,  -- P
  




Stupid Haskell question

1993-02-22 Thread Guy Steele


I blew it!  My example had a bad flaw.  See below.

   Date: Mon, 22 Feb 93 19:30:19 GMT
   From: wadler [EMAIL PROTECTED]

   Guy,

   I agree that the report should be updated to express the restriction we
   really have in mind.  Simon: as editor, this is your bailiwick!

   I also think it's neat that you seem to have found a use for cyclic
   unification.  This is definitely an impetus to extend the language to
   include cyclic types.  (I don't expect we'll do this for a while
   though.  You might consider modifying the Glasgow Haskell compiler to
   include this yourself -- it may not be too difficult.)

   However, I am confused by some of your example.  You want to use a data
   structure like

 data FooTree a = Leaf a | Node (FooTree (Wrapper a)) (FooTree (Wrapper a))
 data Wrapper a = Annotation a Int

   This seems to add an additional level of annotation at every
   Node.  Something zero nodes deep has zero annotations, something
   five nodes deep has five annotations.  Is this really what you
   want?

Yes.  I want to annotate every non-root item of the tree.
Again, this is only a stripped-down example of what I'm
really trying to do.

   Because of the way type inference works in the Hindley-Milner system,
   it is impossible to write a function that will act on values of the
   type (FooTree a) as defined above.

Ooops!  My example has a bug in it.  What I meant to say was

  data FooTree a = Leaf a | Node (Wrapper (FooTree a)) (Wrapper (FooTree a))
  data Wrapper a = Annotation a Int

and so I want to be polymorphic over various tree types such as

  data FooTree a = Leaf a | Node (Wrapper1 (FooTree a))
 (Wrapper1 (FooTree a))

  data FooTree a = Leaf a | Node (Wrapper2 (FooTree a))
 (Wrapper2 (FooTree a))

  data FooTree a = Leaf a | Node (Wrapper1 (Wrapper2 (FooTree a)))
 (Wrapper1 (Wrapper2 (FooTree a)))

  data FooTree a = Leaf a | Node (Wrapper3 (Wrapper2 (FooTree a)))
 (Wrapper3 (Wrapper2 (FooTree a)))

and so on.  Roughly, think of the type of "f" typically looking like

f :: WrapperA (WrapperB (... (WrapperZ (FooTree a)) ...)) -> FooTree a

and there are a whole bunch of different "f" functions of different types,
each intended to operate on trees with different sets of annotations.
I want treewalk to look very roughly like

treewalk f (Leaf x) = hackleaf x
treewalk f (Node p q) = hacknode (f p) (f q)

(actually it's much more complicated because f needs to process
annotations as well as extracting the subnode of type FooTree).
Many apologies for giving you a confusingly wrong example!
Thanks for taking me seriously.

--Guy