Re: [Haskell-cafe] Function to detect duplicates

2010-02-24 Thread Gwern Branwen
2010/2/23 Jonas Almström Duregård jonas.dureg...@gmail.com:
 Hi Rafael,

 I assume you will perform this operation on some very large lists, or
 performance would not be an issue. Have you tested if your optimized
 version is better than your initial one?

 You should compare your implementation against something like this:

 import qualified Data.Set as Set
 noneRepeated :: (Ord a) = [a] - Bool
 noneRepeated = accum Set.empty where
  accum _ [] = True
  accum s (x:xs)
    | Set.member x s = False
    | otherwise      = accum (Set.insert x s) xs

 Also there is some discussion about the nub function that relates to
 this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/.

 /Jonas

Or better yet, 
http://www.haskell.org/pipermail/libraries/2008-October/010778.html
Much more thorough and practical w/r/t to actually getting faster nubs
in the libraries.

-- 
gwern
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Function to detect duplicates

2010-02-24 Thread Daniel Fischer
Am Mittwoch 24 Februar 2010 21:25:04 schrieb Gwern Branwen:
 2010/2/23 Jonas Almström Duregård jonas.dureg...@gmail.com:
  Hi Rafael,
 
  I assume you will perform this operation on some very large lists, or
  performance would not be an issue. Have you tested if your optimized
  version is better than your initial one?
 
  You should compare your implementation against something like this:
 
  import qualified Data.Set as Set
  noneRepeated :: (Ord a) = [a] - Bool
  noneRepeated = accum Set.empty where
   accum _ [] = True
   accum s (x:xs)
     | Set.member x s = False
     | otherwise      = accum (Set.insert x s) xs
 
  Also there is some discussion about the nub function that relates to
  this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/.
 
  /Jonas

 Or better yet,
 http://www.haskell.org/pipermail/libraries/2008-October/010778.html Much
 more thorough and practical w/r/t to actually getting faster nubs in the
 libraries.

Umm,

using the nubOrd' code to nub a 1 million long list of pseudo random 
numbers takes (here) about 2.5 times the time and twice space as the Set-
based ordNub. It does slightly better for 100,000 elements, but still takes 
more than twice the time (and 1.6 x the space).

In my book, that's a compelling reason to go with the set-based 
implementation - unless we're talking about code to include directly in 
Data.List, but then I'd still _use_ the set-based one.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Function to detect duplicates

2010-02-24 Thread Casey Hawthorne
On Tue, 23 Feb 2010 08:30:18 -0300, you wrote:

Hi folks,

While solving a puzzle, I was posed the problem of finding if there was no
duplicates on a list.

Must it be a list data structure(DS) or list ADT?

Mergesort can be parallelized.


Best regards,

Rafael

--
Regards,
Casey
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Function to detect duplicates

2010-02-24 Thread Casey Hawthorne
On Tue, 23 Feb 2010 08:30:18 -0300, you wrote:

Hi folks,

While solving a puzzle, I was posed the problem of finding if there was no
duplicates on a list.

Must it be a list data structure(DS) or list ADT?

Mergesort can be parallelized.


Best regards,

Rafael


If space is at a premium you might want to look at a Bloom Filter.

http://en.wikipedia.org/wiki/Bloom_filter

The Bloom filter, conceived by Burton Howard Bloom in 1970,[1] is a
space-efficient probabilistic data structure that is used to test
whether an element is a member of a set. False positives are possible,
but false negatives are not. Elements can be added to the set, but not
removed (though this can be addressed with a counting filter). The
more elements that are added to the set, the larger the probability of
false positives.


The book Real World Haskell has an implementation.
--
Regards,
Casey
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Function to detect duplicates

2010-02-23 Thread Rafael Gustavo da Cunha Pereira Pinto
Hi folks,

While solving a puzzle, I was posed the problem of finding if there was no
duplicates on a list.

First I used:

noneRepeated=null.(filter (1)).(map length).group.sort


But this seemed very unneficient, so I thought that I could detect the
duplicates while sorting, and devised this:

import Control.Monad
import Data.Maybe

noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs

pairs []=[]
pairs [x]=[[x]]
pairs (x:y:xs)=[x,y]:pairs xs

sort []=Just []
sort [x]=Just [x]
sort [x,y] | xy=Just [y,x]
   | yx=Just [x,y]
   | x==y=Nothing

merge::(Eq a, Ord a) = Maybe [a]-Maybe [a]-Maybe[a]
merge _ Nothing = Nothing
merge Nothing _ = Nothing
merge (Just []) (Just xs)=Just xs
merge (Just xs) (Just [])=Just xs
merge (Just (x:xs)) (Just (y:ys)) | x==y = Nothing
  | xy  = (Just y) +? (merge (Just (x:xs))
(Just ys))
  | xy  = (Just x) +? (merge (Just xs)
(Just (y:ys)))

(+?) = liftM2 (:)



My version of the merge sort returns Nothing whenever it finds two equal
entries, aborting all subsequent comparisons.

I have a few questions for the friendly people at this cafe:

1) Is there any improvement I can make?
2) Can it be parallelized (par, pseq)?


Best regards,

Rafael
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Function to detect duplicates

2010-02-23 Thread Jonas Almström Duregård
Hi Rafael,

I assume you will perform this operation on some very large lists, or
performance would not be an issue. Have you tested if your optimized
version is better than your initial one?

You should compare your implementation against something like this:

import qualified Data.Set as Set
noneRepeated :: (Ord a) = [a] - Bool
noneRepeated = accum Set.empty where
  accum _ [] = True
  accum s (x:xs)
| Set.member x s = False
| otherwise  = accum (Set.insert x s) xs

Also there is some discussion about the nub function that relates to
this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/.

/Jonas

On 23 February 2010 12:30, Rafael Gustavo da Cunha Pereira Pinto
rafaelgcpp.li...@gmail.com wrote:

 Hi folks,

 While solving a puzzle, I was posed the problem of finding if there was no
 duplicates on a list.

 First I used:

 noneRepeated=null.(filter (1)).(map length).group.sort


 But this seemed very unneficient, so I thought that I could detect the
 duplicates while sorting, and devised this:

 import Control.Monad
 import Data.Maybe

 noneRepeated=isNothing . (foldl merge (Just [])) . (map sort) . pairs

 pairs []=[]
 pairs [x]=[[x]]
 pairs (x:y:xs)=[x,y]:pairs xs

 sort []=Just []
 sort [x]=Just [x]
 sort [x,y] | xy=Just [y,x]
    | yx=Just [x,y]
    | x==y=Nothing

 merge::(Eq a, Ord a) = Maybe [a]-Maybe [a]-Maybe[a]
 merge _ Nothing = Nothing
 merge Nothing _ = Nothing
 merge (Just []) (Just xs)=Just xs
 merge (Just xs) (Just [])=Just xs
 merge (Just (x:xs)) (Just (y:ys)) | x==y = Nothing
   | xy  = (Just y) +? (merge (Just (x:xs))
 (Just ys))
   | xy  = (Just x) +? (merge (Just xs)
 (Just (y:ys)))

 (+?) = liftM2 (:)



 My version of the merge sort returns Nothing whenever it finds two equal
 entries, aborting all subsequent comparisons.

 I have a few questions for the friendly people at this cafe:

 1) Is there any improvement I can make?
 2) Can it be parallelized (par, pseq)?


 Best regards,

 Rafael

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Function to detect duplicates

2010-02-23 Thread Ketil Malde
Rafael Gustavo da Cunha Pereira Pinto rafaelgcpp.li...@gmail.com
writes:

 First I used:

 noneRepeated=null.(filter (1)).(map length).group.sort

 But this seemed very unneficient, so I thought that I could detect the
 duplicates while sorting, and devised this:
   [...]
 1) Is there any improvement I can make?

Well - it's a bit long, don't you think?  Long enough that from a
cursory glance, I'd say it's in the no obvious errors category.  How
about (inspired by quicksort, as you no doubt can tell):

   noneRepeated [] = True
   noneRepeated (x:xs) = noneRepeated lt  singleton eq  noneRepeated gt
where lt = filter (x) xs
  gt = filter (x) xs
  eq = x:filter (==x) xs
  singleton [_] = True
  singleton _   = False

 2) Can it be parallelized (par, pseq)?

You could force each of the sublists in parallel, but you might lose
some laziness properties, so I'd carefully benchmark it.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe