Re: [Haskell-cafe] [Newbie] What to improve in my code
Daniel Fischer daniel.is.fisc...@web.de writes: First of all: I'm not sure if this question is allowed here. If not, I apologize You might want to check out the haskell-beginners list, but IMO most questions are okay to post here. Just a couple of style issues Daniel didn't mention: process :: [Char] - [String] process str = words (map toLower (removePunctuation str)) It's a matter of taste, but I think this reads clearer if written: process = words . map toLower . removePunctuation unique :: (Eq a) = [a] - [a] unique [] = [] unique (x:xs) = [x] ++ unique (filter (\s - x /= s) xs) Also 'filter (\s - x /= s)' can be written as filter (x /=) import qualified Data.Map as Map import Data.List occurrenceCount'' :: Ord a = [a] - [(a,Int)] occurrenceCount'' xs = Map.toList $ foldl' (\mp x - Map.insertWith' (+) x 1 mp) Map.empty xs Note the primes here! This is perhaps my most common use of Map, and because of laziness, it is very easy to blow the stack. Although you really want to store an Int for each key, the default is to store an unevaluated computation, in this case a tower of (1+(1+(1+..))). The foldl' and insertWith' functions are stricter, and presumably Daniel gets this right (I'm never comfortable without testing this myself :). 3) The whole process as i'm doing it now feels pretty imperatively (been working for years as a Java / PHP programmer). I've got this feeling that the occurenceCount' function could be implemented using a mapping function. What ways are there to make this more functional? I don't think I agree with this sentiment - you're building a pipeline of functions, not setting variables or otherwise mixing state or other imperativeness. Why do you think it's imperative? -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [Newbie] What to improve in my code
On Tue, Jul 13, 2010 at 11:49 PM, Frank1981 frankdewe...@gmail.com wrote: First of all: I'm not sure if this question is allowed here. If not, I apologize I'm trying to solve the following problem: For each word in a text find the number of occurences for each unique word in the text. i've come up with the following steps to solve this: * remove all punctuation except for whitespace and make the text lowercase A minor point: instead of removing the punctuation, you maybe should convert it to whitespace. Otherwise in texts like there was a quick,brown fox (notice the missing space after the comma) you'll have the word quickbrown instead of 2 words quick and brown. David. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [Newbie] What to improve in my code
On Mon, Jul 19, 2010 at 9:24 AM, David Virebayre dav.vire+hask...@gmail.com wrote: A minor point: instead of removing the punctuation, you maybe should convert it to whitespace. Otherwise in texts like there was a quick,brown fox (notice the missing space after the comma) you'll have the word quickbrown instead of 2 words quick and brown. If you remove punctuation you - run the risk of joining two valid words into one invalid word: quick,brown - quickbrown - run the risk of converting one word into a different word: can't - cant won't - wont If you split at punctuation you create more semi-words: can't - can, t shouldn't - shouldn t It might be better regarding in-word apostrophes as letters in this case? -- Dougal Stanton dou...@dougalstanton.net // http://www.dougalstanton.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] [Newbie] What to improve in my code
First of all: I'm not sure if this question is allowed here. If not, I apologize I'm trying to solve the following problem: For each word in a text find the number of occurences for each unique word in the text. i've come up with the following steps to solve this: * remove all punctuation except for whitespace and make the text lowercase * find all unique words in the text * for each unique word, count the number of occurences. This has resulted in the following code: removePunctuation :: [Char] - [Char] removePunctuation str = filter (\c - elem c (['a'..'z'] ++ ['A'..'Z'] ++ ['\t', ' ', '\n'])) str process :: [Char] - [String] process str = words (map toLower (removePunctuation str)) unique :: (Eq a) = [a] - [a] unique [] = [] unique (x:xs) = [x] ++ unique (filter (\s - x /= s) xs) occurenceCount :: (Eq a) = a - [a] - Int occurenceCount _ [] = 0 occurenceCount x (y:ys) | x == y = 1 + occurenceCount x ys | otherwise = occurenceCount x ys occurenceCount' :: [String] - [String] - [(String, Int)] occurenceCount' [] _ = [(, 0)] occurenceCount' (u:us) xs = [(u, occurenceCount u xs)] ++ occurenceCount' us xs Please remember i've only been playing with Haskell for three afternoons now and i'm happy that the above code is working correctly. However i've got three questions: 1) occurenceCount' [] _ = [(, 0)] is plain ugly and also adds a useless tuple to the end result. Is there a better way to solve this? 2) I'm forcing elements into a singleton list on two occasions, both in my unique function and in my occurenceCount' function. Once again this seems ugly and I'm wondering if there is a better solution. 3) The whole process as i'm doing it now feels pretty imperatively (been working for years as a Java / PHP programmer). I've got this feeling that the occurenceCount' function could be implemented using a mapping function. What ways are there to make this more functional? -- View this message in context: http://old.nabble.com/-Newbie--What-to-improve-in-my-code-tp29156025p29156025.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] [Newbie] What to improve in my code
On Tuesday 13 July 2010 23:49:45, Frank1981 wrote: First of all: I'm not sure if this question is allowed here. If not, I apologize I'm trying to solve the following problem: For each word in a text find the number of occurences for each unique word in the text. i've come up with the following steps to solve this: * remove all punctuation except for whitespace and make the text lowercase * find all unique words in the text * for each unique word, count the number of occurences. This has resulted in the following code: removePunctuation :: [Char] - [Char] removePunctuation str = filter (\c - elem c (['a'..'z'] ++ ['A'..'Z'] ++ ['\t', ' ', '\n'])) str Depending on your criteria, maybe import Data.Char removePunctuation = filter (\c - isAlpha c || isSpace c) is better process :: [Char] - [String] process str = words (map toLower (removePunctuation str)) Or perhaps process = map (fiter isLower) . words . map toLower unique :: (Eq a) = [a] - [a] unique [] = [] unique (x:xs) = [x] ++ unique (filter (\s - x /= s) xs) import Data.List unique = nub but it's not particularly efficient. If you don't need to keep the order of first occurrence and have an Ord instance, you could take unique' = map head . group . sort or import qualified Data.Set as Set unique'' = Set.toList . Set.fromList occurenceCount :: (Eq a) = a - [a] - Int occurenceCount _ [] = 0 occurenceCount x (y:ys) | x == y = 1 + occurenceCount x ys | otherwise = occurenceCount x ys occurrenceCount a xs = length (filter (== a) xs) or occurrenceCount a = length . filter (== a) occurenceCount' :: [String] - [String] - [(String, Int)] occurenceCount' [] _ = [(, 0)] why not occurrenceCount' [] _ = [] ? occurenceCount' (u:us) xs = [(u, occurenceCount u xs)] ++ occurenceCount' us xs But it can be done shorter: import qualified Data.Map as Map import Data.List occurrenceCount'' :: Ord a = [a] - [(a,Int)] occurrenceCount'' xs = Map.toList $ foldl' (\mp x - Map.insertWith' (+) x 1 mp) Map.empty xs No need to get the unique elements up front. Please remember i've only been playing with Haskell for three afternoons now and i'm happy that the above code is working correctly. However i've got three questions: 1) occurenceCount' [] _ = [(, 0)] is plain ugly and also adds a useless tuple to the end result. Is there a better way to solve this? 2) I'm forcing elements into a singleton list on two occasions, both in my unique function and in my occurenceCount' function. Once again this seems ugly and I'm wondering if there is a better solution. Use (:), e.g. unique (x:xs) = x : unique (filter (/= x) xs) 3) The whole process as i'm doing it now feels pretty imperatively (been working for years as a Java / PHP programmer). I've got this feeling that the occurenceCount' function could be implemented using a mapping function. What ways are there to make this more functional? ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe