Re: [Haskell-cafe] Re: Suspected stupid Haskell Question
On 10/17/07, Thomas Hartman [EMAIL PROTECTED] wrote: Is there a more scientific way of figuring out if one version is better than the other by using, say profiling tools? Profiling Haskell programs is black magic, but of the sort you learn by having a problem to solve. I don't think it requires special genius, just enough motivation. Profiling the interpreter my team created during the ICFP contest did that to me. The GHC heap profiler looked weird to me at first because you are required to convert its output to a PostScript file. However, it's well worth doing. There are several types of profiles, but you would probably be most interested in the biographical profile. The GHC documentation is pretty good and the article Heap Profiling for Space Efficiency[1] may also help. The article was written for the nhc compiler but the tools look the same. Performance profiling is easier - it's just dumped as text output when your program runs. GHC's documentation is really good here. One thing I've learned is to look for two things: 1) Functions that do allocations and execute many times 2) Functions that run lots of times #2 is pretty much universal for profiling, but #1 is unique to Haskell (and probably any pure functional language). Sadly none of these technique work for stack overflows. Or, more likely, I haven't learned how to spot them. Luckily the culprit is usually a fold that isn't strict enough. Albert's post about the Bird book is a good pointer. I just read that chapter myself last night, and he gives a very clear explanation of how lazy evaluation works (he calls it 'outermost reduction') and how strictness interacts with laziness. Hope that helps! Justin [1] http://citeseer.ist.psu.edu/runciman96heap.html ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Suspected stupid Haskell Question
Thomas Hartman wrote: Since I'm interested in the stack overflow issue, and getting acquainted with quickcheck, I thought I would take this opportunity to compare your ordTable with some code Yitzchak Gale posted earlier, against Ham's original problem. As far as I can tell, they're the same. They work on lists up to 10 element lists of strings, but on 10^6 size lists I lose patience waiting for them to finish. Is there a more scientific way of figuring out if one version is better than the other by using, say profiling tools? Or by reasoning about the code? No, measuring actual performance is the only way. t. import Data.List import qualified Data.Map as M import Control.Arrow import Test.QuickCheck import Test.GenTestData I couldn't find this one, so I was unable to test anything. [snip] Some ideas: - Your code doesn't contain a main function. Did you compile it? - Strings are lists; storing a string of n characters needs 12*n bytes on 32 bit architectures, and 24*n bytes with 64 bits. 1 million strings with 10 characters each will consume 120MB (or 240MB), without accounting for any overhead for the copying garbage collector. I expect that you can save some memory (and thus garbage collection time) by using Data.ByteString.ByteString (which uses about 32 + n (32 bits) or 64 + n bytes (64 bits), if I remember correctly). - On using Data.Map vs. sort and group: The main advantage of the first approach is that duplicates are eliminated as they are found. So in fact, if you have a list of only 100 different strings, your code will run in constant memory, assuming the list is built lazily. Your test case looks like there are only few duplicates, so I'd expect the Data.Map code to perform a bit worse than the Data.List one. But as I wrote above, profiling is the only way to find out for sure. Bertram ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Suspected stupid Haskell Question
Hi Chad, Chad Scherrer wrote: I think the stack overflows were happening because Map.insertWith isn't strict enough. Otherwise I think the code is the same. They are visibly almost identical - except that you do an extra lookup to get your strictness, while insertWith' has internal access and can do it in the same pass. So using insertWith' should be faster. But I would expect intTable to be faster, since it uses IntMap, I'm not sure if that's strictly true. I have never done any testing, but I get the feeling that the performance of IntMap may depend on the distribution of the keys. After some not so good experiences, I abandonned IntMap and just use Map. Anyone know? Also - I don't have insertWith' for IntMap. and there's no IntMap.insertWith' as of 6.6.1 The mtl package is now independent of GHC. Regards, Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Suspected stupid Haskell Question
But I would expect intTable to be faster, But if I understand correctly, intTable can only deal with integer keys, whereas BH's original question would have wanted string keys, and I can't see a way to convert string to int and back. t. Chad Scherrer [EMAIL PROTECTED] 10/17/2007 11:38 PM To Thomas Hartman/ext/[EMAIL PROTECTED] cc haskell-cafe@haskell.org, [EMAIL PROTECTED] Subject Re: [Haskell-cafe] Re: Suspected stupid Haskell Question Hmm, is insertWith' new? If I remember right, I think the stack overflows were happening because Map.insertWith isn't strict enough. Otherwise I think the code is the same. But I would expect intTable to be faster, since it uses IntMap, and there's no IntMap.insertWith' as of 6.6.1 (though it may be easy enough to add one). Chad On 10/17/07, Thomas Hartman [EMAIL PROTECTED] wrote: Since I'm interested in the stack overflow issue, and getting acquainted with quickcheck, I thought I would take this opportunity to compare your ordTable with some code Yitzchak Gale posted earlier, against Ham's original problem. As far as I can tell, they're the same. They work on lists up to 10 element lists of strings, but on 10^6 size lists I lose patience waiting for them to finish. Is there a more scientific way of figuring out if one version is better than the other by using, say profiling tools? Or by reasoning about the code? t. import Data.List import qualified Data.Map as M import Control.Arrow import Test.QuickCheck import Test.GenTestData import System.Random {- Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] -} testYitzGale n = do l - rgenBndStrRow (10,10) (10^n,10^n) -- 10 strings, strings are 10 chars long, works. craps out on 10^6. m - return $ freqFold l putStrLn $ map items: ++ ( show $ M.size m ) testCScherer n = do l - rgenBndStrRow (10,10) (10^n,10^n) -- same limitations as yitz gale code. m - return $ ordTable l putStrLn $ items: ++ ( show $ length m ) -- slow for big lists --freqArr = Prelude.map ( last length ) . group . sort -- yitz gale code. same as chad scherer code? it's simpler to understand, but is it as fast? freqFold :: [[Char]] - M.Map [Char] Int freqFold = foldl' g M.empty where g accum x = M.insertWith' (+) x 1 accum -- c scherer code. insists on ord. far as I can tell, same speed as yitz. ordTable :: (Ord a) = [a] - [(a,Int)] ordTable xs = M.assocs $! foldl' f M.empty xs where f m x = let m' = M.insertWith (+) x 1 m Just v = M.lookup x m' in v `seq` m' l = [egg,egg,cheese] -- other quickcheck stuff --prop_unchanged_by_reverse = \l - ( freqArr (l :: [[Char]]) ) == ( freqArr $ reverse l ) --prop_freqArr_eq_freqFold = \l - ( freqArr (l :: [[Char]]) == (freqFold l)) --test1 = quickCheck prop_unchanged_by_reverse --test2 = quickCheck prop_freqArr_eq_freqFold --- generate test data: genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) = rgen ( genBndLoL (minStrLen, maxStrLen) (minCols,maxCols) ) gen gen = do sg - newStdGen return $ generate 1 sg gen -- generator for a list with length between min and max genBndList :: Arbitrary a = (Int, Int) - Gen [a] genBndList (min,max) = do len - choose (min,max) vector len -- lists of lists --genBndLoL :: (Int, Int) - (Int, Int) - Gen [[a]] genBndLoL (min1,max1) (min2,max2) = do len1 - choose (min1,max1) len2 - choose (min2,max2) vec2 len1 len2 --vec2 :: Arbitrary a = Int - Int - Gen [[a]] vec2 n m = sequence [ vector m | i - [1..n] ] --- This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Suspected stupid Haskell Question
Thomas Hartman wrote: Since I'm interested in the stack overflow issue, and getting acquainted with quickcheck, I thought I would take this opportunity to compare your ordTable with some code Yitzchak Gale posted earlier, against Ham's original problem. As far as I can tell, they're the same. They work on lists up to 10 element lists of strings, but on 10^6 size lists I lose patience waiting for them to finish. Is there a more scientific way of figuring out if one version is better than the other by using, say profiling tools? Or by reasoning about the code? It can be reasoned. Some people know how to do it. No one has written up the method and theory properly. It is currently rather ad hoc. I want to write one in the future. Some of the knowledge is in: http://www.haskell.org/haskellwiki/Stack_overflow http://en.wikibooks.org/wiki/Haskell (Advanced Track, Haskell Performance) Richard Bird's Introduction to Functional Programming using Haskell, second edition (chapter 7 Efficiency, but also other chapters contain efficiency discussions) anything that adequately defines lazy evaluation (or whatever evaluation your favourite executor seems to use) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Suspected stupid Haskell Question
Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] I couldn't find anything on a search, or anything in the librarys. Thanks BH. -- View this message in context: http://www.nabble.com/Suspected-stupid-Haskell-Question-tf4639170.html#a13250044 Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
BH, Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] freq xs = map length (group xs) HTH, Stefan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
On 17/10/2007, Big_Ham [EMAIL PROTECTED] wrote: Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] I couldn't find anything on a search, or anything in the librarys. Thanks BH. No, but it is also trivial to create, with the 'group' function in Data.List. I'll stop there though, cos this could be a homework question. Cheers, D. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
On 17/10/2007, Dougal Stanton [EMAIL PROTECTED] wrote: No, but it is also trivial to create, with the 'group' function in Data.List. I'll stop there though, cos this could be a homework question. It's just occurred to me that answering questions like these is a bit like the prisoner's dilemma. - If I give the full answer and no-one else does, then maybe I'm doing someone's homework for them? - If I just give clues and someone else gives the answer, it makes me look mean. :-( - If we all give the answer, everybody's happy and the blame (if it was a set question) is spread around a bit. - If we all answer with vague hints then it makes the list as a whole less useful and seem a bit arrogant. There's no way to win! :-) D. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
you are indeed right Peter, that's what I was after, the frequency regardless of elements. It also doesn't matter if it outputs them as tuples, or as a separate list on their own because each value would belong to the first occurance of that element if you seem what I mean, so you could still tell what came from what. Peter Verswyvelen wrote: I'm a newbie here, so I'm not sure about my reply, but I think this is not the answer to his question. freq [egg, egg, cheese] indeed returns [2,1] but freq [egg, cheese, egg] returns [1,1,1] BH just mentioned he needed the frequenty of elements in the list, independent of their order. So in that case, the result should be a list of ordered pairs like: [(egg, 2), (cheese, 1)]. Or a pair of two lists, like ([egg, cheese), (2,1)]. Otherwise you would not know which frequency belongs to which element? I can't write this concisely nor efficient yet, but the following does the job: import Data.List freq xs = zip e f where s = sort xs e = nub s f = map length (group s) However, I suspect the experts here will be able to make that much shorter and more efficient (maybe using Data.Map?) Peter Stefan Holdermans wrote: BH, Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] freq xs = map length (group xs) HTH, Stefan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- View this message in context: http://www.nabble.com/Suspected-stupid-Haskell-Question-tf4639170.html#a13251343 Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
Dougal Stanton wrote: It's just occurred to me that answering questions like these is a bit like the prisoner's dilemma... There's no way to win! :-) Yes there is. Just mention the following wiki page as part of your answer: http://haskell.org/haskellwiki/Homework_help -Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
On 10/17/07, Peter Verswyvelen [EMAIL PROTECTED] wrote: So in that case, the result should be a list of ordered pairs like: [(egg, 2), (cheese, 1)]. Or a pair of two lists, like ([egg, cheese), (2,1)]. Otherwise you would not know which frequency belongs to which element? However, I suspect the experts here will be able to make that much shorter and more efficient (maybe using Data.Map?) import Control.Arrow import Data.List freqs = map (head length) . group . sort I have used this function quite a few times already. Stuart ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
I'm a newbie here, so I'm not sure about my reply, but I think this is not the answer to his question. freq [egg, egg, cheese] indeed returns [2,1] but freq [egg, cheese, egg] returns [1,1,1] BH just mentioned he needed the frequenty of elements in the list, independent of their order. So in that case, the result should be a list of ordered pairs like: [(egg, 2), (cheese, 1)]. Or a pair of two lists, like ([egg, cheese), (2,1)]. Otherwise you would not know which frequency belongs to which element? I can't write this concisely nor efficient yet, but the following does the job: import Data.List freq xs = zip e f where s = sort xs e = nub s f = map length (group s) However, I suspect the experts here will be able to make that much shorter and more efficient (maybe using Data.Map?) Peter Stefan Holdermans wrote: BH, Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] freq xs = map length (group xs) HTH, Stefan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
Peter Verswyvelen wrote: However, I suspect the experts here will be able to make that much shorter and more efficient (maybe using Data.Map?) That makes it difficult to respond. I am definitely not claiming to be an expert. For one thing, my name is not Simon. But I'll say something anyway, fwiw. The problem there is that nub is O(n^2). You're stuck with that if your type is not an instance of Ord (but then you can't use sort, either). When you can assume Ord, the standard solution is, as you suggest, something like: import qualified Data.Map as M import Data.List histogram = M.toList . foldl' (\m x - M.insertWith' (+) x 1 m) M.empty . M.fromList This should work efficiently, with the right amount of laziness, even for very large lists. Regards, Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
Nice!!! As I'm learning Arrows now, this is really useful :-) Stuart Cook wrote: import Control.Arrow import Data.List freqs = map (head length) . group . sort I have used this function quite a few times already. Stuart ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
I wrote: When you can assume Ord, the standard solution is, as you suggest, something like... Oops, sorry, doesn't typecheck. Here it is corrected: import qualified Data.Map as M import Data.List histogram = M.toList . foldl' (\m x - M.insertWith' (+) x 1 m) M.empty This should work efficiently, with the right amount of laziness, even for very large lists. Stuart's Arrows thing is much nicer when your list is small enough to be held in memory all at once. -Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
The problem there is that nub is O(n^2). Is there a place where one can look up the complexity of Standard Libraries functions ? E.D ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
Is there a place where one can look up the complexity of Standard Libraries functions ? No. Some modules have it in their Haddock docs. Most don't. But the source code is available. :) -Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Suspected stupid Haskell Question
Big_Ham joymachine2001 at hotmail.com writes: Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] I couldn't find anything on a search, or anything in the librarys. Thanks BH. Hi BH, This might be overkill, but it works well for me. And it avoid stack overflows I was originally getting for very large lists. Dean Herrington and I came up with this: ordTable :: (Ord a) = [a] - [(a,Int)] ordTable xs = Map.assocs $! foldl' f Map.empty xs where f m x = let m' = Map.insertWith (+) x 1 m Just v = Map.lookup x m' in v `seq` m' intTable :: [Int] - [(Int,Int)] intTable xs = IntMap.assocs $! foldl' f IntMap.empty xs where f m x = let m' = IntMap.insertWith (+) x 1 m Just v = IntMap.lookup x m' in v `seq` m' enumTable :: (Enum a) = [a] - [(a,Int)] enumTable = map fstToEnum . intTable . map fromEnum where fstToEnum (x,y) = (toEnum x, y) If you like, it's easily wrapped in a Table class. Chad ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Suspected stupid Haskell Question
Oh, why didn't you say you were learning Arrows? Then why not freqs = sort group map (head length) So much more readable, don't you think? ;) Either way, if you run into the dreaded monomorphism restriction: Ambiguous type variable `a' in the constraint: `Ord a' arising from use of `sort' at A.hs:6:40-43 Possible cause: the monomorphism restriction applied to the following: freqs :: [a] - [(a, Int)] (bound at A.hs:6:0) Probable fix: give these definition(s) an explicit type signature or use -fno-monomorphism-restriction you'll have to either add an explicit type annotation: freqs :: (Ord a) = [a] - [(a, Int)] or else throw an arg onto it: freqs x = map (head length) . group . sort $ x The latter hurts too much to write, so I always add the type. Peter Verswyvelen wrote: Nice!!! As I'm learning Arrows now, this is really useful :-) Stuart Cook wrote: import Control.Arrow import Data.List freqs = map (head length) . group . sort I have used this function quite a few times already. Stuart ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Suspected stupid Haskell Question
Since I'm interested in the stack overflow issue, and getting acquainted with quickcheck, I thought I would take this opportunity to compare your ordTable with some code Yitzchak Gale posted earlier, against Ham's original problem. As far as I can tell, they're the same. They work on lists up to 10 element lists of strings, but on 10^6 size lists I lose patience waiting for them to finish. Is there a more scientific way of figuring out if one version is better than the other by using, say profiling tools? Or by reasoning about the code? t. import Data.List import qualified Data.Map as M import Control.Arrow import Test.QuickCheck import Test.GenTestData import System.Random {- Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] -} testYitzGale n = do l - rgenBndStrRow (10,10) (10^n,10^n) -- 10 strings, strings are 10 chars long, works. craps out on 10^6. m - return $ freqFold l putStrLn $ map items: ++ ( show $ M.size m ) testCScherer n = do l - rgenBndStrRow (10,10) (10^n,10^n) -- same limitations as yitz gale code. m - return $ ordTable l putStrLn $ items: ++ ( show $ length m ) -- slow for big lists --freqArr = Prelude.map ( last length ) . group . sort -- yitz gale code. same as chad scherer code? it's simpler to understand, but is it as fast? freqFold :: [[Char]] - M.Map [Char] Int freqFold = foldl' g M.empty where g accum x = M.insertWith' (+) x 1 accum -- c scherer code. insists on ord. far as I can tell, same speed as yitz. ordTable :: (Ord a) = [a] - [(a,Int)] ordTable xs = M.assocs $! foldl' f M.empty xs where f m x = let m' = M.insertWith (+) x 1 m Just v = M.lookup x m' in v `seq` m' l = [egg,egg,cheese] -- other quickcheck stuff --prop_unchanged_by_reverse = \l - ( freqArr (l :: [[Char]]) ) == ( freqArr $ reverse l ) --prop_freqArr_eq_freqFold = \l - ( freqArr (l :: [[Char]]) == (freqFold l)) --test1 = quickCheck prop_unchanged_by_reverse --test2 = quickCheck prop_freqArr_eq_freqFold --- generate test data: genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) = rgen ( genBndLoL (minStrLen, maxStrLen) (minCols,maxCols) ) gen gen = do sg - newStdGen return $ generate 1 sg gen -- generator for a list with length between min and max genBndList :: Arbitrary a = (Int, Int) - Gen [a] genBndList (min,max) = do len - choose (min,max) vector len -- lists of lists --genBndLoL :: (Int, Int) - (Int, Int) - Gen [[a]] genBndLoL (min1,max1) (min2,max2) = do len1 - choose (min1,max1) len2 - choose (min2,max2) vec2 len1 len2 --vec2 :: Arbitrary a = Int - Int - Gen [[a]] vec2 n m = sequence [ vector m | i - [1..n] ] Chad Scherrer [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 10/17/2007 01:35 PM To haskell-cafe@haskell.org cc Subject [Haskell-cafe] Re: Suspected stupid Haskell Question Big_Ham joymachine2001 at hotmail.com writes: Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] I couldn't find anything on a search, or anything in the librarys. Thanks BH. Hi BH, This might be overkill, but it works well for me. And it avoid stack overflows I was originally getting for very large lists. Dean Herrington and I came up with this: ordTable :: (Ord a) = [a] - [(a,Int)] ordTable xs = Map.assocs $! foldl' f Map.empty xs where f m x = let m' = Map.insertWith (+) x 1 m Just v = Map.lookup x m' in v `seq` m' intTable :: [Int] - [(Int,Int)] intTable xs = IntMap.assocs $! foldl' f IntMap.empty xs where f m x = let m' = IntMap.insertWith (+) x 1 m Just v = IntMap.lookup x m' in v `seq` m' enumTable :: (Enum a) = [a] - [(a,Int)] enumTable = map fstToEnum . intTable . map fromEnum where fstToEnum (x,y) = (toEnum x, y) If you like, it's easily wrapped in a Table class. Chad ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe --- This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Suspected stupid Haskell Question
Hmm, is insertWith' new? If I remember right, I think the stack overflows were happening because Map.insertWith isn't strict enough. Otherwise I think the code is the same. But I would expect intTable to be faster, since it uses IntMap, and there's no IntMap.insertWith' as of 6.6.1 (though it may be easy enough to add one). Chad On 10/17/07, Thomas Hartman [EMAIL PROTECTED] wrote: Since I'm interested in the stack overflow issue, and getting acquainted with quickcheck, I thought I would take this opportunity to compare your ordTable with some code Yitzchak Gale posted earlier, against Ham's original problem. As far as I can tell, they're the same. They work on lists up to 10 element lists of strings, but on 10^6 size lists I lose patience waiting for them to finish. Is there a more scientific way of figuring out if one version is better than the other by using, say profiling tools? Or by reasoning about the code? t. import Data.List import qualified Data.Map as M import Control.Arrow import Test.QuickCheck import Test.GenTestData import System.Random {- Is there a library function to take a list of Strings and return a list of ints showing how many times each String occurs in the list. So for example: [egg, egg, cheese] would return [2,1] -} testYitzGale n = do l - rgenBndStrRow (10,10) (10^n,10^n) -- 10 strings, strings are 10 chars long, works. craps out on 10^6. m - return $ freqFold l putStrLn $ map items: ++ ( show $ M.size m ) testCScherer n = do l - rgenBndStrRow (10,10) (10^n,10^n) -- same limitations as yitz gale code. m - return $ ordTable l putStrLn $ items: ++ ( show $ length m ) -- slow for big lists --freqArr = Prelude.map ( last length ) . group . sort -- yitz gale code. same as chad scherer code? it's simpler to understand, but is it as fast? freqFold :: [[Char]] - M.Map [Char] Int freqFold = foldl' g M.empty where g accum x = M.insertWith' (+) x 1 accum -- c scherer code. insists on ord. far as I can tell, same speed as yitz. ordTable :: (Ord a) = [a] - [(a,Int)] ordTable xs = M.assocs $! foldl' f M.empty xs where f m x = let m' = M.insertWith (+) x 1 m Just v = M.lookup x m' in v `seq` m' l = [egg,egg,cheese] -- other quickcheck stuff --prop_unchanged_by_reverse = \l - ( freqArr (l :: [[Char]]) ) == ( freqArr $ reverse l ) --prop_freqArr_eq_freqFold = \l - ( freqArr (l :: [[Char]]) == (freqFold l)) --test1 = quickCheck prop_unchanged_by_reverse --test2 = quickCheck prop_freqArr_eq_freqFold --- generate test data: genBndStrRow (minCols,maxCols) (minStrLen, maxStrLen) = rgen ( genBndLoL (minStrLen, maxStrLen) (minCols,maxCols) ) gen gen = do sg - newStdGen return $ generate 1 sg gen -- generator for a list with length between min and max genBndList :: Arbitrary a = (Int, Int) - Gen [a] genBndList (min,max) = do len - choose (min,max) vector len -- lists of lists --genBndLoL :: (Int, Int) - (Int, Int) - Gen [[a]] genBndLoL (min1,max1) (min2,max2) = do len1 - choose (min1,max1) len2 - choose (min2,max2) vec2 len1 len2 --vec2 :: Arbitrary a = Int - Int - Gen [[a]] vec2 n m = sequence [ vector m | i - [1..n] ] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Stupid Haskell question
I also think its neat that you seem to have found a use for cyclic unification. This is definitely an impetus to extend the language to include cyclic types. (I don't expect we'll do this for a while though. You might consider modifying the Glasgow Haskell compiler to include this yourself -- it may not be too difficult.) I'm not sure with it's possible to do this with the substitution algorithm used in our compiler, I thought the occurs check helped preserve the idempotency of substitions? It's certainly possible in the graph-based algorithm I describe in my 1991 Glasgow FP workshop paper (in fact, you get this "for free" and end up adding an occurs check just to match the normal algorithm!). The paper describes [informally] how to transform a substitution algorithm written using monads (such as the one in our compiler) into the graph version. You can also use a parallel algorithm if you use the right monad. Apart from the changes to the underlying monad, the differences are quite minor. The paper is (still) a draft version, so may be buggy, but please ask if you'd like a copy. Oh if you did this, you'd also need to change the type output routines, and you might like to allow explicitly cyclic types in type signatures [may as well as existential types while you're there, too :-) -- it's a pain when not all expressions can be given a type signature]. Apart from the implementation (which doesn't seem to be a problem if the right alg. is used[*]) does anyone know of more subtle problems with cyclic types [such as not being able to define the type system using the traditional sequent style]? Is this a well-studied area? Kevin [*] Though if substitions couldn't be used, I wouldn't like to have to reimplement the type checkers in all existing Haskell compilers.
Re: Stupid Haskell question
Guy S., Phil W.,... I ran into exactly this problem in two different applications. The first was the same that Guy S. points out, namely adding arbtrirary but well-typed annotations to a parse-tree. The solution I eventually ended up using (after discussing it with John Hughes) was the following: data FooTree a b = Leaf b | Node (AnnotFooTree a b) (AnnotFooTree a b) type AnnotFooTree a b = (a, FooTree a b) originally I'd tried: data FooTree a b = Leaf a | Node (a (FooTree b)) (a (FooTree b)) which was of course rejected by the Hindley-Milner typechecker. I recall David Murphy saying that I could have done that in Isabelle but it involved a much more powerful type-checker. BTW, Simon PJ and David Lester hit on exactly the same solution in their paper on fully lazy lambda-lifting so presumably they'd given this problem some thought too. The other application was Tries. Imagine we have some type: data Assoc a b = ... which maps objects of type a to objects of type b. We don't care if it's implemented as a tree, list, array or whatever provided there's a standard interface. Now imagine that our applications involves keys which are variable length lists/strings of as. Using the above type instantiating a as [a] will cause increasingly costly comparisons as the search gets closer to its goal. Trie structures avoid this by matching one segment of the key at a time. For instance rather than storing the following "guy argo" | "guy steele" it's stored as: 'g' - 'u' - 'y' - ' ' - 'a' - 'r' - 'g' - 'o' | 's' - 't' - 'e' - 'e' - 'l' - 'e' (n.b. Compressed tries go one step further by collapsing the one-way branching to give: "guy " - "argo" | "steele") Anyway, given the above Assoc type a very natural encoding of tries is: type Trie a b = Assoc a (Trie a b) The operations on Tries can now be built out of the Assoc operations which allows the lookup structure used at each level to be altered without painful recoding merely by substituting a different Assoc ADT. I mention this to support Guy S.'s point that there might be gains from going to a more expressive type system than Hindley-Milner. I don't know what that might be - I just know that the things that I mentioned aren't handled as elegantly as I'd like. My $0.02 Guy Argo
Re: Stupid Haskell question
Given your correction, I think that the type declaration data FooTree a b = Leaf b | Node a (FooTree a b) a (FooTree a b) will handle things only a little less neatly than the use of wrappers, and will allow you to use type inference in much the way you wish. What do you think? Cheers, -- P
Stupid Haskell question
Date: Tue, 23 Feb 93 17:18:19 GMT From: wadler [EMAIL PROTECTED] ... I don't understand. Can't you handle this is as follows? data FooTree a b = Leaf b | Node a (FooTree a b) a (FooTree a b) data Annote a b c = MkAnnote Info (FooTree (Annote a b c) b) c and then infer that what you are acting on is a FooTree (Annote a b c) b. The c field can be omitted, or it can be instantiated to additional annotation information later on. I'm sure you can hack around the problem, but it's not yet clear to me that there isn't an elegant solution. Cheers, -- P This looks promising. I'll have to ponder it some. (Don't worry if you don't hear more from me for a while-- I'm out of town for a couple of weeks soon.) --Guy
Stupid Haskell question
Date: Tue, 23 Feb 93 10:16:58 GMT From: wadler [EMAIL PROTECTED] Given your correction, I think that the type declaration data FooTree a b = Leaf b | Node a (FooTree a b) a (FooTree a b) will handle things only a little less neatly than the use of wrappers, and will allow you to use type inference in much the way you wish. What do you think? Cheers, -- P You're right! That does solve the problem as I have stated it. But now suppose that some of the annotations refer to FooTree items (another thing I forgot to say). I suppose I could do data FooTree a b = Leaf b | Node a (FooTree a b) a (FooTree a b) [FooTree a b] including a list of goodies that the annotations could then refer to, but I'm quickly losing my abstractions--the annotations are spread out rather than being single items. Well, I'll manage to hack around it somehow. Thanks for helping! --Guy
Re: Stupid Haskell question
Guy, You write, But now suppose that some of the annotations refer to FooTree items (another thing I forgot to say). I suppose I could do data FooTree a b = Leaf b | Node a (FooTree a b) a (FooTree a b) [FooTree a b] including a list of goodies that the annotations could then refer to, but I'm quickly losing my abstractions--the annotations are spread out rather than being single items. I don't understand. Can't you handle this is as follows? data FooTree a b = Leaf b | Node a (FooTree a b) a (FooTree a b) data Annote a b c = MkAnnote Info (FooTree (Annote a b c) b) c and then infer that what you are acting on is a FooTree (Annote a b c) b. The c field can be omitted, or it can be instantiated to additional annotation information later on. I'm sure you can hack around the problem, but it's not yet clear to me that there isn't an elegant solution. Cheers, -- P
Stupid Haskell question
Date: Mon, 22 Feb 93 14:28:47 GMT From: wadler [EMAIL PROTECTED] Guy asks the following (non-stupid) Haskell question, which I reply to below. The question points out an area in the Haskell report that seems to be unclear; and a place where it might be worthwhile to change the design to be less conservative but more uniform. Guy's question: - Begin Included Message - From [EMAIL PROTECTED] Thu Feb 18 17:21:56 1993 From: Guy Steele [EMAIL PROTECTED] Date: Thu, 18 Feb 93 12:20:44 EST To: wadler [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Stupid Haskell question Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Haskell theoretically allows recursive datatypes. But the following example does not work (he said innocently). module Rec where data Unary a = Zero | Successor a f :: Unary z - [Unary z] f x = [x, Successor x] I think that the compiler ought to deduce the restriction x::q where q = Unary q. It ought to be okay for q to be Unary q because "an algebraic datatype intervenes" (Haskell report, 4.2.2). But the Glasgow compiler says "/users/lang1/gls/Haskell/monads/Rec.hs", line 6: Type variable "a" occurs within the type "Unary a". In a list expression: [x, Successor x] and the Chalmers compiler says Errors: "/users/lang1/gls/Haskell/monads/Rec.hs", line 6, [63] unify1 (occurence) a and Unary a in (:) A1_f ((:) (Successor A1_f) ([])) in f Now everything is okay if I write module Rec where data Unary a = Zero | Successor (Unary a) f :: Unary z - [Unary z] f x = [x, Successor x] but I have reasons in my actual code (which is hairy--this is a stripped-down example) not to force the data type to be recursive, but to let the type analysis deduce it where necessary. Am I foolish to expect this? --Guy - End Included Message - Phil's response: Guy, Haskell requires that `an algebraic datatype intervenes' in order that all types can be written as a finite tree. The type you refer to, q where q = Unary q, is an infinite tree (though a finite graph). If we intended to allow such infinite solutions, we wouldn't need the restriction to algebraic datatypes at all. This suggests we should clear up the wording in the Haskell report, so I've forwarded your question and this response to the Haskell mailing list. Why not allow cyclic types (i.e., any type expressible as a finite graph)? It turns out there is a unification algorithm that works for finite graphs, so this is in theory possible. But the intent of Haskell was to be a conservative design, so we stuck with what we were familiar with. Your example of a place where cyclic types are useful provides an impetus to step into the less familiar but more uniform territory. Phil, Thanks for the reply. I do think some clarification in the report is called for because I suspect that there are some implicit implementation-motivated assumptions about how types are processed. For example, I suspect there is an assumption that type synonyms may be handled simply by substitution (inline expansion), so to speak, whereas algebraic data types are not. (If algebraic data types were substitutive, then recursive algebraic types would be expressed as infinite trees after all.) So it is not so much that cyclic graphs are disallowed, as that the user is notationally required to indicate where the graph is to be cut so as to render it acyclic (and therefore notatable as a tree, since sharing in a acyclic dag is irrelevant here). This, plus the inability to explicit construct type declarations dynamically, means that all cycles in a type graph must be of fixed size, determined by the static program text. And this is what is getting in my way. Here is a less stripped-down version of what I am trying to accomplish. I want a data structure that is a tree: data FooTree a = Leaf a | Node (FooTree a) (FooTree a) But I want to be able to annotate these trees in various ways. So I wrap an annotation structure around the recursion: data FooTree a = Leaf a | Node (FooTree (Wrapper a)) (FooTree (Wrapper a)) data Wrapper a = Annotation a Int Now sometimes I need different kinds of trees that have different annotations. But the basic tree structure is common to them all. So I want to abstract over the annotation structures. This leads me to write: data FooTree a = Leaf a | Node t t where my intent is that "t" is some complex type that may eventually be discovered to involve "FooTree a". Unfortunately, "t" is not allowed on the right-hand side unless it appears on the left-hand si
Re: Stupid Haskell question
Guy, I agree that the report should be updated to express the restriction we really have in mind. Simon: as editor, this is your bailiwick! I also think its neat that you seem to have found a use for cyclic unification. This is definitely an impetus to extend the language to include cyclic types. (I don't expect we'll do this for a while though. You might consider modifying the Glasgow Haskell compiler to include this yourself -- it may not be too difficult.) However, I am confused by some of your example. You want to use a data structure like data FooTree a = Leaf a | Node (FooTree (Wrapper a)) (FooTree (Wrapper a)) data Wrapper a = Annotation a Int This seems to add an additional level of annotation at every Node. Something zero nodes deep has zero annotations, something five nodes deep has five annotations. Is this really what you want? Because of the way type inference works in the Hindley-Milner system, it is impossible to write a function that will act on values of the type (FooTree a) as defined above. (This is independent of the additional complications you mention.) The reason is that every instance of a function in a recursive definition must have the same type as the function being defined. But to define a function on (FooTree a) you need a recursive instance of type (FooTree (Wrapper a)). Mycroft first suggested a type system that would allow such functions to be typed, but I think it is still an open question as to whether an inference algorithm exists for the type system. (There was a paper published that claimed an algorithm, but it was later withdrawn as incorrect.) So I hope you don't really need the types above, because then we can't help you, even in the monomorphic case! But would the simpler type perhaps work? data FooTree a b = Leaf b | Node a (FooTree a b) a (FooTree a b) where a ranges over annotation, and b over leaf values? Cheers, -- P
Re: Stupid Haskell question
Guy asks the following (non-stupid) Haskell question, which I reply to below. The question points out an area in the Haskell report that seems to be unclear; and a place where it might be worthwhile to change the design to be less conservative but more uniform. Guy's question: - Begin Included Message - From [EMAIL PROTECTED] Thu Feb 18 17:21:56 1993 From: Guy Steele [EMAIL PROTECTED] Date: Thu, 18 Feb 93 12:20:44 EST To: wadler [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: Stupid Haskell question Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Haskell theoretically allows recursive datatypes. But the following example does not work (he said innocently). module Rec where data Unary a = Zero | Successor a f :: Unary z - [Unary z] f x = [x, Successor x] I think that the compiler ought to deduce the restriction x::q where q = Unary q. It ought to be okay for q to be Unary q because "an algebraic datatype intervenes" (Haskell report, 4.2.2). But the Glasgow compiler says "/users/lang1/gls/Haskell/monads/Rec.hs", line 6: Type variable "a" occurs within the type "Unary a". In a list expression: [x, Successor x] and the Chalmers compiler says Errors: "/users/lang1/gls/Haskell/monads/Rec.hs", line 6, [63] unify1 (occurence) a and Unary a in (:) A1_f ((:) (Successor A1_f) ([])) in f Now everything is okay if I write module Rec where data Unary a = Zero | Successor (Unary a) f :: Unary z - [Unary z] f x = [x, Successor x] but I have reasons in my actual code (which is hairy--this is a stripped-down example) not to force the data type to be recursive, but to let the type analysis deduce it where necessary. Am I foolish to expect this? --Guy - End Included Message - Phil's response: Guy, Haskell requires that `an algebraic datatype intervenes' in order that all types can be written as a finite tree. The type you refer to, q where q = Unary q, is an infinite tree (though a finite graph). If we intended to allow such infinite solutions, we wouldn't need the restriction to algebraic datatypes at all. This suggests we should clear up the wording in the Haskell report, so I've forwarded your question and this response to the Haskell mailing list. Why not allow cyclic types (i.e., any type expressible as a finite graph)? It turns out there is a unification algorithm that works for finite graphs, so this is in theory possible. But the intent of Haskell was to be a conservative design, so we stuck with what we were familiar with. Your example of a place where cyclic types are useful provides an impetus to step into the less familiar but more uniform territory. Cheers, -- P
Stupid Haskell question
I blew it! My example had a bad flaw. See below. Date: Mon, 22 Feb 93 19:30:19 GMT From: wadler [EMAIL PROTECTED] Guy, I agree that the report should be updated to express the restriction we really have in mind. Simon: as editor, this is your bailiwick! I also think its neat that you seem to have found a use for cyclic unification. This is definitely an impetus to extend the language to include cyclic types. (I don't expect we'll do this for a while though. You might consider modifying the Glasgow Haskell compiler to include this yourself -- it may not be too difficult.) However, I am confused by some of your example. You want to use a data structure like data FooTree a = Leaf a | Node (FooTree (Wrapper a)) (FooTree (Wrapper a)) data Wrapper a = Annotation a Int This seems to add an additional level of annotation at every Node. Something zero nodes deep has zero annotations, something five nodes deep has five annotations. Is this really what you want? Yes. I want to annotate every non-root item of the tree. Again, this is only a stripped-down example of what I'm really trying to do. Because of the way type inference works in the Hindley-Milner system, it is impossible to write a function that will act on values of the type (FooTree a) as defined above. Ooops! My example has a bug in it. What I meant to say was data FooTree a = Leaf a | Node (Wrapper (FooTree a)) (Wrapper (FooTree a)) data Wrapper a = Annotation a Int and so I want to be polymorphic over various tree types such as data FooTree a = Leaf a | Node (Wrapper1 (FooTree a)) (Wrapper1 (FooTree a)) data FooTree a = Leaf a | Node (Wrapper2 (FooTree a)) (Wrapper2 (FooTree a)) data FooTree a = Leaf a | Node (Wrapper1 (Wrapper2 (FooTree a))) (Wrapper1 (Wrapper2 (FooTree a))) data FooTree a = Leaf a | Node (Wrapper3 (Wrapper2 (FooTree a))) (Wrapper3 (Wrapper2 (FooTree a))) and so on. Roughly, think of the type of "f" typically looking like f :: WrapperA (WrapperB (... (WrapperZ (FooTree a)) ...)) - FooTree a and there are a whole bunch of different "f" functions of different types, each intended to operate on trees with different sets of annotations. I want treewalk to look very roughly like treewalk f (Leaf x) = hackleaf x treewalk f (Node p q) = hacknode (f p) (f q) (actually it's much more complicated because f needs to process annotations as well as extracting the subnode of type FooTree). Many apologies for giving you a confusingly wrong example! Thanks for taking me seriously. --Guy