Re: [Haskell-cafe] [Newbie] What to improve in my code

2010-07-19 Thread Ketil Malde
Daniel Fischer daniel.is.fisc...@web.de writes:

 First of all: I'm not sure if this question is allowed here. If not, I
 apologize

You might want to check out the haskell-beginners list, but IMO most
questions are okay to post here.

Just a couple of style issues Daniel didn't mention:

 process :: [Char] - [String]
 process str = words (map toLower (removePunctuation str))

It's a matter of taste, but I think this reads clearer if written:

process = words . map toLower . removePunctuation

 unique :: (Eq a) = [a] - [a]
 unique [] = []
 unique (x:xs) = [x] ++ unique (filter (\s - x /= s) xs)

Also 'filter (\s - x /= s)' can be written as

 filter (x /=)

 import qualified Data.Map as Map
 import Data.List

 occurrenceCount'' :: Ord a = [a] - [(a,Int)]
 occurrenceCount'' xs = Map.toList $ 
   foldl' (\mp x - Map.insertWith' (+) x 1 mp) Map.empty xs

Note the primes here!  This is perhaps my most common use of Map, and
because of laziness, it is very easy to blow the stack.  Although you
really want to store an Int for each key, the default is to store an
unevaluated computation, in this case a tower of (1+(1+(1+..))).  The
foldl' and insertWith' functions are stricter, and presumably Daniel
gets this right (I'm never comfortable without testing this myself :).

 3) The whole process as i'm doing it now feels pretty imperatively (been
 working for years as a Java / PHP programmer). I've got this feeling
 that the occurenceCount' function could be implemented using a mapping
 function. What ways are there to make this more functional?

I don't think I agree with this sentiment - you're building a pipeline
of functions, not setting variables or otherwise mixing state or other
imperativeness.  Why do you think it's imperative? 

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [Newbie] What to improve in my code

2010-07-19 Thread David Virebayre
On Tue, Jul 13, 2010 at 11:49 PM, Frank1981 frankdewe...@gmail.com wrote:

 First of all: I'm not sure if this question is allowed here. If not, I
 apologize

 I'm trying to solve the following problem: For each word in a text find the
 number of occurences for each unique word in the text.

 i've come up with the following steps to solve this:
  * remove all punctuation except for whitespace and make the text lowercase

A minor point: instead of removing the punctuation, you maybe should
convert it to whitespace.

Otherwise in texts like there was a quick,brown fox (notice the
missing space after the comma) you'll have the word quickbrown
instead of 2 words quick and brown.

David.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [Newbie] What to improve in my code

2010-07-19 Thread Dougal Stanton
On Mon, Jul 19, 2010 at 9:24 AM, David Virebayre
dav.vire+hask...@gmail.com wrote:

 A minor point: instead of removing the punctuation, you maybe should
 convert it to whitespace.

 Otherwise in texts like there was a quick,brown fox (notice the
 missing space after the comma) you'll have the word quickbrown
 instead of 2 words quick and brown.

If you remove punctuation you

- run the risk of joining two valid words into one invalid word:
  quick,brown - quickbrown

- run the risk of converting one word into a different word:
 can't - cant
 won't - wont

If you split at punctuation you create more semi-words:
 can't - can, t
 shouldn't - shouldn t

It might be better regarding in-word apostrophes as letters in this case?

-- 
Dougal Stanton
dou...@dougalstanton.net // http://www.dougalstanton.net
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] [Newbie] What to improve in my code

2010-07-13 Thread Frank1981

First of all: I'm not sure if this question is allowed here. If not, I
apologize

I'm trying to solve the following problem: For each word in a text find the
number of occurences for each unique word in the text.

i've come up with the following steps to solve this:
 * remove all punctuation except for whitespace and make the text lowercase
 * find all unique words in the text
 * for each unique word, count the number of occurences.

This has resulted in the following code:
removePunctuation :: [Char] - [Char]
removePunctuation str = filter (\c - elem c (['a'..'z'] ++ ['A'..'Z'] ++
['\t', ' ', '\n'])) str


process :: [Char] - [String]
process str = words (map toLower (removePunctuation str))

unique :: (Eq a) = [a] - [a]
unique [] = []
unique (x:xs) = [x] ++ unique (filter (\s - x /= s) xs)

occurenceCount :: (Eq a) = a - [a] - Int
occurenceCount _ [] = 0
occurenceCount x (y:ys)
| x == y = 1 + occurenceCount x ys
| otherwise = occurenceCount x ys

occurenceCount' :: [String] - [String] - [(String, Int)]
occurenceCount' [] _ = [(, 0)]
occurenceCount' (u:us) xs = [(u, occurenceCount u xs)] ++ occurenceCount' us
xs

Please remember i've only been playing with Haskell for three afternoons now
and i'm happy that the above code is working correctly.

However i've got three questions:
1) occurenceCount' [] _ = [(, 0)] is plain ugly and also adds a useless
tuple to the end result. Is there a better way to solve this?
2) I'm forcing elements into a singleton list on two occasions, both in my
unique function and in my occurenceCount' function. Once again this seems
ugly and I'm wondering if there is a better solution.
3) The whole process as i'm doing it now feels pretty imperatively (been
working for years as a Java / PHP programmer). I've got this feeling that
the occurenceCount' function could be implemented using a mapping function.
What ways are there to make this more functional?
  
-- 
View this message in context: 
http://old.nabble.com/-Newbie--What-to-improve-in-my-code-tp29156025p29156025.html
Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [Newbie] What to improve in my code

2010-07-13 Thread Daniel Fischer
On Tuesday 13 July 2010 23:49:45, Frank1981 wrote:
 First of all: I'm not sure if this question is allowed here. If not, I
 apologize

 I'm trying to solve the following problem: For each word in a text find
 the number of occurences for each unique word in the text.

 i've come up with the following steps to solve this:
  * remove all punctuation except for whitespace and make the text
 lowercase * find all unique words in the text
  * for each unique word, count the number of occurences.

 This has resulted in the following code:
 removePunctuation :: [Char] - [Char]
 removePunctuation str = filter (\c - elem c (['a'..'z'] ++ ['A'..'Z']
 ++ ['\t', ' ', '\n'])) str

Depending on your criteria, maybe

import Data.Char

removePunctuation = filter (\c - isAlpha c || isSpace c)

is better


 process :: [Char] - [String]
 process str = words (map toLower (removePunctuation str))

Or perhaps

process = map (fiter isLower) . words . map toLower


 unique :: (Eq a) = [a] - [a]
 unique [] = []
 unique (x:xs) = [x] ++ unique (filter (\s - x /= s) xs)

import Data.List

unique = nub

but it's not particularly efficient.
If you don't need to keep the order of first occurrence and have an Ord 
instance, you could take

unique' = map head . group . sort

or

import qualified Data.Set as Set

unique'' = Set.toList . Set.fromList


 occurenceCount :: (Eq a) = a - [a] - Int
 occurenceCount _ [] = 0
 occurenceCount x (y:ys)

   | x == y = 1 + occurenceCount x ys
   | otherwise = occurenceCount x ys

occurrenceCount a xs = length (filter (== a) xs)

or

occurrenceCount a = length . filter (== a)


 occurenceCount' :: [String] - [String] - [(String, Int)]
 occurenceCount' [] _ = [(, 0)]

why not occurrenceCount' [] _ = [] ?

 occurenceCount' (u:us) xs = [(u, occurenceCount u xs)] ++
 occurenceCount' us xs

But it can be done shorter:

import qualified Data.Map as Map
import Data.List

occurrenceCount'' :: Ord a = [a] - [(a,Int)]
occurrenceCount'' xs = Map.toList $ 
  foldl' (\mp x - Map.insertWith' (+) x 1 mp) Map.empty xs

No need to get the unique elements up front.


 Please remember i've only been playing with Haskell for three afternoons
 now and i'm happy that the above code is working correctly.

 However i've got three questions:
 1) occurenceCount' [] _ = [(, 0)] is plain ugly and also adds a
 useless tuple to the end result. Is there a better way to solve this?
 2) I'm forcing elements into a singleton list on two occasions, both in
 my unique function and in my occurenceCount' function. Once again this
 seems ugly and I'm wondering if there is a better solution.

Use (:), e.g.

unique (x:xs) = x : unique (filter (/= x) xs)

 3) The whole process as i'm doing it now feels pretty imperatively (been
 working for years as a Java / PHP programmer). I've got this feeling
 that the occurenceCount' function could be implemented using a mapping
 function. What ways are there to make this more functional?

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe