Re: [Haskell-cafe] ordNub

2013-10-13 Thread AntC
 Niklas Hambüchen mail at nh2.me writes:
 
 In sets, the order does not matter, while for nub it does.
 

Let's be careful here!. Niklas, when you say order, do you mean:
* the _ordering_ from the Ord instance? Or
* the relative sequence of elements in the list?

 ... the fact that Set is used inside my proposed 
 ordNub implementation is a detail not visible to the caller.

If you use the Set library, that fact may be very visible!
Because Set re-sequences the whole list, as per its Ord instance.

But List.nub preserves the list sequence (except for omitting duplicates).

Furthermore, the Ord instance might compare two elements as EQ, even 
though their Eq instance says they're not equal.

So a Set-based ordNub could end up returning:
* not the same elements as List.nub
* and/or not in the same list sequence

I'd call that very much *visible* to the caller.

 
 That's why it looks like a Data.List function to me.
 

[BTW I am still less than convinced that overall a Set-based ordNub is 
significantly more efficient. I suspect it depends on how big is your 
list.]


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-10-13 Thread Niklas Hambüchen
On 13/10/13 21:42, AntC wrote:
 Niklas Hambüchen mail at nh2.me writes:

 In sets, the order does not matter, while for nub it does.

 
 Let's be careful here!. Niklas, when you say order, do you mean:
 * the _ordering_ from the Ord instance? Or
 * the relative sequence of elements in the list?
 
 ... the fact that Set is used inside my proposed 
 ordNub implementation is a detail not visible to the caller.
 
 If you use the Set library, that fact may be very visible!
 Because Set re-sequences the whole list, as per its Ord instance.
 
 But List.nub preserves the list sequence (except for omitting duplicates).

I mean *exactly* what you say here.

ordNub behaves has the same behaviour as nub, while (Set.toList .
Set.fromList) doesn't.

 [BTW I am still less than convinced that overall a Set-based ordNub is 
 significantly more efficient. I suspect it depends on how big is your 
 list.]

What do you mean?

ordNub is clearly in a different complexity class, and the benchmarks
that I provided show not only this, but also that ordNub is *always*
faster than nub, even for singleton lists.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-10-13 Thread AntC
 Niklas Hambüchen mail at nh2.me writes:
 
  On 13/10/13 21:42, AntC wrote:
  ...
  If you use the Set library, that fact may be very visible!
  Because Set re-sequences the whole list, as per its Ord instance.
  
  But List.nub preserves the list sequence
  (except for omitting duplicates).
 
 I mean *exactly* what you say here.
 
 ordNub behaves has the same behaviour as nub, while (Set.toList .
 Set.fromList) doesn't.
 

That's great, thank you.

  [BTW I am still less than convinced that overall a Set-based ordNub is 
  significantly more efficient. I suspect it depends on how big is your 
  list.]
 
 What do you mean?
 
 ordNub is clearly in a different complexity class, ...

Yes, I'm not disputing that.

 ... and the benchmarks that I provided show not only this,
 but also that ordNub is *always* faster than nub,
 even for singleton lists.

Thanks Niklas, I hadn't spotted those benchmarks back in July.

I'm surprised at that result for singletons 
(and for very small numbers of elements which are in fact each different).

Especially because List's `nub` uses `filter` == fold, which should be 
tail-recursive.

It seems to me that for small numbers, the Set-based approach still 
requires comparing each element to each other. Plus there's the overhead 
for building the Set and inserting each element into it -- where `insert` 
again walks the Set to find the insertion point.

Then here's a further possible optimisation, instead of making separate 
calls to `member` and `insert`:
* Make a single call to
insert' :: (Ord a) = a - Set a - (Bool, Set a)
* The Bool returns True if already a member.
* Else returns an updated Set in the snd, with the element inserted.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-10-13 Thread Niklas Hambüchen
On 14/10/13 03:20, AntC wrote:
 Thanks Niklas, I hadn't spotted those benchmarks back in July.

No worries :)

 I'm surprised at that result for singletons 
 (and for very small numbers of elements which are in fact each different).

I think one of the main reasons for the performance difference is that a
list node and a Set binary tree node have pretty much the same
performance, with the difference that in


http://hackage.haskell.org/package/containers-0.5.2.1/docs/src/Data-Set-Base.html

   data Set a = Bin {-# UNPACK #-} !Size !a !(Set a) !(Set a) | Tip

there are strictness and unpack annotations, while for

   data [a] = [] | a : [a] -- pseudo syntax

there are not.

Good for us in this case, I guess.

 It seems to me that for small numbers, the Set-based approach still 
 requires comparing each element to each other.

This I don't understand.

 Then here's a further possible optimisation, instead of making separate 
 calls to `member` and `insert`:

This I understand again. Where do you get insert' from? containers
doesn't seem to have it. Do you suggest adding it?

Niklas
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-10-13 Thread AntC
 Niklas Hambüchen mail at nh2.me writes:
 
  On 14/10/13 03:20, AntC wrote:
  ... 
  Then here's a further possible optimisation, instead of making 
  separate calls to `member` and `insert`:
 
 This I understand again. Where do you get insert' from? containers
 doesn't seem to have it. Do you suggest adding it?
 

err, well I didn't have any specific library in mind.

More there's a kind of 'folk idiom' for managing data structures,
(this applies more for imperative code/update-in-situ than functional)
that if you know the next thing you're going to do after failing to find 
an element is insert it, you might as well get on with the insert there 
and then.

(It's a higher-level analogue of a machine instruction decrement-and-
branch-if-zero.)

I'm looking at all the remarks about managing libraries and dependencies.
Would it make sense to build a stand-alone version of Set purely to 
support ordNub? Then it needs only methods `empty` and `insertIfAbsent`.



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-10-12 Thread Niklas Hambüchen
I would like to come back to the original question:

How can ordNub be added to base?

I guess we agree that Data.List is the right module for a function of
type Ord a = [a] - [a], but this introduces

* a cyclic dependency between Data.List and Data.Set
* a base dependency on containers.

What is the right way to go with that?

Should ordNub be introduced as part of Data.Set, as Conrad suggested?

It does not really have anything to do with Set, apart from being
implemented with it.

On 14/07/13 14:12, Roman Cheplyaka wrote:
 Something like that should definitely be included in Data.List.
 Thanks for working on it.
 
 Roman
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-10-12 Thread Anthony Cowley

On Oct 12, 2013, at 2:47 PM, Niklas Hambüchen m...@nh2.me wrote:
 
 I would like to come back to the original question:
 
 How can ordNub be added to base?
 
 I guess we agree that Data.List is the right module for a function of
 type Ord a = [a] - [a], but this introduces
 
 * a cyclic dependency between Data.List and Data.Set
 * a base dependency on containers.
 
 What is the right way to go with that?
 
 Should ordNub be introduced as part of Data.Set, as Conrad suggested?
 
 It does not really have anything to do with Set, apart from being
 implemented with it.

I think nub's behavior is rather set-related, so I don't really understand the 
objection to putting it in Data.Set.

Anthony

 
 On 14/07/13 14:12, Roman Cheplyaka wrote:
 Something like that should definitely be included in Data.List.
 Thanks for working on it.
 
 Roman
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-10-12 Thread Niklas Hambüchen
On 12/10/13 20:43, Anthony Cowley wrote:
 I think nub's behavior is rather set-related, so I don't really understand 
 the objection to putting it in Data.Set.

In sets, the order does not matter, while for nub it does.

nub:: Eq a  = [a] - [a]
ordNub :: Ord a = [a] - [a]

both do not mention Set, and the fact that Set is used inside my
proposed ordNub implementation is a detail not visible to the caller.

That's why it looks like a Data.List function to me.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-10-12 Thread Roman Cheplyaka
* Anthony Cowley acow...@seas.upenn.edu [2013-10-12 15:43:57-0400]
 
 On Oct 12, 2013, at 2:47 PM, Niklas Hambüchen m...@nh2.me wrote:
  
  I would like to come back to the original question:
  
  How can ordNub be added to base?
  
  I guess we agree that Data.List is the right module for a function of
  type Ord a = [a] - [a], but this introduces
  
  * a cyclic dependency between Data.List and Data.Set
  * a base dependency on containers.
  
  What is the right way to go with that?
  
  Should ordNub be introduced as part of Data.Set, as Conrad suggested?
  
  It does not really have anything to do with Set, apart from being
  implemented with it.
 
 I think nub's behavior is rather set-related, so I don't really
 understand the objection to putting it in Data.Set.

It's not Set (in the data structure sense) related. It's list-related,
because it clearly acts on lists.

Therefore, it belongs to Data.List.

Besides, we already have the precedent of the slow nub being in
Data.List.

Roman


signature.asc
Description: Digital signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-08-26 Thread Niklas Hambüchen
On 14/07/13 20:20, Niklas Hambüchen wrote:
 As you might not know, almost *all* practical Haskell projects use it,
 and that in places where an Ord instance is given, e.g. happy, Xmonad,
 ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
 more (see https://github.com/nh2/haskell-ordnub).

GHC uses nub.

Also let me stress again that the n² case happens even if there are no
duplicates.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-08-19 Thread AntC
 Richard A. O'Keefe ok at cs.otago.ac.nz writes:
 
 There are at least four different things that an Ord version might
 mean:
 
  - first sort a list, then eliminate duplicates
  - sort a list eliminating duplicates stably as you go
(think 'merge sort', using 'union' instead of 'merge')
  - build a balanced tree set as you go
  - having a list that is already sorted, use that to
eliminated duplicates cheaply.
 
 These things have different costs.  For example, ...
 
 What I want is more often ordNubBy than ordNub, though.
 

(ordNubBy you can get via a suitable Ord instance for the element type?)

The bigger problem is that you might not have a suitable Ord instance. 
After all, sets are defined by equality/equivalence relation, not 
necessarily by Ord.

There are many other things you might want to do with a set/collection 
than just remove duplicates.

I notice that Data.Set.map = fromList . (map stuff) . toList
That is, build two lists (to be GC'd), as well as the set result. 

So does the performane cost of from/to List outrun the benefit of 
Data.Set.union? Depends how much you're mapping vs inserting and checking 
membership.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-16 Thread Ketil Malde

Francesco Mazzoli f...@mazzo.li writes:

 import qualified Data.HashSet as S
 
 nub :: Hashable a = [a] - [a]
 nub = S.toList . S.fromList

 Well, the above is not stable while Niklas’ is.  But I guess that’s not
 the point of your message :).

We could also implement Data.BloomFilter.nub, which removes equal
elements probabilistically (with a small but non-zero chance of removing
some unique elements) :-)

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-16 Thread Andreas Abel

On 14.07.2013 13:20, Niklas Hambüchen wrote:

I've taken the Ord-based O(n * log n) implementation from yi using a Set:

   ordNub :: (Ord a) = [a] - [a]
   ordNub l = go empty l
 where
   go _ [] = []
   go s (x:xs) = if x `member` s then go s xs
 else x : go (insert x s) xs

(The benchmark also shows some other potential problem: Using a state
monad to keep the set instead of a function argument can be up to 20
times slower. Should that happen?)


I cannot say whether this should happen, but your code about can be 
straight-forwardly refactored using a *Reader* monad.


import Control.Monad.Reader

import Data.Functor (($))
import qualified Data.Set as Set

-- ifM still not in Control.Monad
ifM mc md me = do { c - mc; if c then md else me }

ordNub :: (Ord a) = [a] - [a]
ordNub l = runReader (go l) Set.empty
  where
go [] = return []
go (x:xs) = ifM ((x `Set.member`) $ ask)
(go xs)
((x :) $ local (Set.insert x) (go xs))

test = ordNub [1,2,4,1,3,5,2,3,4,5,6,1]

Of course, this does not lend itself to an application of filterM.

In fact, your implementation is already in the (Set a -) reader monad, 
in normalized form.  It looks already optimal to me.


Cheers,
Andreas

--
Andreas AbelDu bist der geliebte Mensch.

Theoretical Computer Science, University of Munich
Oettingenstr. 67, D-80538 Munich, GERMANY

andreas.a...@ifi.lmu.de
http://www2.tcs.ifi.lmu.de/~abel/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Clark Gaebel
Apologies. I was being lazy. Here's a stable version:

  import qualified Data.HashSet as S

  hashNub :: (Ord a) = [a] - [a]
  hashNub l = go S.empty l
where
  go _ [] = []
  go s (x:xs) = if x `S.member` s then go s xs
else x : go (S.insert x s) xs

Which, again, will probably be faster than the one using Ord, and I
can't think of any cases where I'd want the one using Ord instead. I
may just not be creative enough, though.


  - Clark

On Mon, Jul 15, 2013 at 12:46 AM, Brandon Allbery allber...@gmail.com wrote:
 On Sun, Jul 14, 2013 at 7:54 AM, Clark Gaebel cgae...@uwaterloo.ca wrote:

 Oops sorry I guess my point wasn't clear.

 Why ord based when hashable is faster? Then there's no reason this has to
 be in base, it can just be a

 Did the point about stable fly overhead?

 --
 brandon s allbery kf8nh   sine nomine associates
 allber...@gmail.com  ballb...@sinenomine.net
 unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Niklas Hambüchen
Hey Jason,

would you mind giving a short idea of what the point of Bird's
implementation is / from what properties it is derived?

Also, running the QuickCheck tests you added, it doesn't give the same
output (order) as nub.

On 15/07/13 13:26, Jason Dagit wrote:
 Richard Bird has a book, Pearls of Functional Algorithm Design that
 is meant to teach a form of deriving algorithms from the properties we
 ask of them. In this book, he gives a possible derivation of ordNub,
 simply called nub in the book, following the methodology he is
 teaching. He notes in the text that this derivation feels more
 complicated than it ought.
 
 Here is his version: http://lpaste.net/87625
 
 I just sent you a pull request to add that one and S.toList .
 S.fromList that was suggested in this thread. I don't think those two
 implementations are faster than the others but it's nice to have them
 for completeness.
 
 Jason
 

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Brandon Allbery
On Sun, Jul 14, 2013 at 7:54 AM, Clark Gaebel cgae...@uwaterloo.ca wrote:

 Oops sorry I guess my point wasn't clear.

 Why ord based when hashable is faster? Then there's no reason this has to
 be in base, it can just be a

Did the point about stable fly overhead?

-- 
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Jason Dagit
On Sun, Jul 14, 2013 at 4:20 AM, Niklas Hambüchen m...@nh2.me wrote:
 tldr: nub is abnormally slow, we shouldn't use it, but we do.


 As you might know, Data.List.nub is O(n²). (*)

 As you might not know, almost *all* practical Haskell projects use it,
 and that in places where an Ord instance is given, e.g. happy, Xmonad,
 ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
 more (see https://github.com/nh2/haskell-ordnub).

 I've taken the Ord-based O(n * log n) implementation from yi using a Set:

   ordNub :: (Ord a) = [a] - [a]
   ordNub l = go empty l
 where
   go _ [] = []
   go s (x:xs) = if x `member` s then go s xs
 else x : go (insert x s) xs


 and put benchmarks on
 http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
 (compare `nub` vs `ordNub`).

Richard Bird has a book, Pearls of Functional Algorithm Design that
is meant to teach a form of deriving algorithms from the properties we
ask of them. In this book, he gives a possible derivation of ordNub,
simply called nub in the book, following the methodology he is
teaching. He notes in the text that this derivation feels more
complicated than it ought.

Here is his version: http://lpaste.net/87625

I just sent you a pull request to add that one and S.toList .
S.fromList that was suggested in this thread. I don't think those two
implementations are faster than the others but it's nice to have them
for completeness.

Jason

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread John Lato
In my tests, using unordered-containers was slightly slower than using Ord,
although as the number of repeated elements grows unordered-containers
appears to have an advantage.  I'm sure the relative costs of comparison vs
hashing would affect this also.  But both are dramatically better than the
current nub.

Has anyone looked at Bart's patches to see how difficult it would be to
apply them (or re-write them)?


On Mon, Jul 15, 2013 at 8:43 PM, Clark Gaebel cgae...@uwaterloo.ca wrote:

 Apologies. I was being lazy. Here's a stable version:

   import qualified Data.HashSet as S

   hashNub :: (Ord a) = [a] - [a]
   hashNub l = go S.empty l
 where
   go _ [] = []
   go s (x:xs) = if x `S.member` s then go s xs
 else x : go (S.insert x s) xs

 Which, again, will probably be faster than the one using Ord, and I
 can't think of any cases where I'd want the one using Ord instead. I
 may just not be creative enough, though.


   - Clark

 On Mon, Jul 15, 2013 at 12:46 AM, Brandon Allbery allber...@gmail.com
 wrote:
  On Sun, Jul 14, 2013 at 7:54 AM, Clark Gaebel cgae...@uwaterloo.ca
 wrote:
 
  Oops sorry I guess my point wasn't clear.
 
  Why ord based when hashable is faster? Then there's no reason this has
 to
  be in base, it can just be a
 
  Did the point about stable fly overhead?
 
  --
  brandon s allbery kf8nh   sine nomine
 associates
  allber...@gmail.com
 ballb...@sinenomine.net
  unix, openafs, kerberos, infrastructure, xmonad
 http://sinenomine.net

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Ivan Lazar Miljenovic
On 16 July 2013 11:46, John Lato jwl...@gmail.com wrote:
 In my tests, using unordered-containers was slightly slower than using Ord,
 although as the number of repeated elements grows unordered-containers
 appears to have an advantage.  I'm sure the relative costs of comparison vs
 hashing would affect this also.  But both are dramatically better than the
 current nub.

 Has anyone looked at Bart's patches to see how difficult it would be to
 apply them (or re-write them)?

If I understand correctly, this function is proposed to be added to
Data.List which lives in base... but the proposals here are about
using either Sets from containers or HashSet from
unordered-containers; I thought base wasn't supposed to depend on any
other package :/




 On Mon, Jul 15, 2013 at 8:43 PM, Clark Gaebel cgae...@uwaterloo.ca wrote:

 Apologies. I was being lazy. Here's a stable version:

   import qualified Data.HashSet as S

   hashNub :: (Ord a) = [a] - [a]
   hashNub l = go S.empty l
 where
   go _ [] = []
   go s (x:xs) = if x `S.member` s then go s xs
 else x : go (S.insert x s) xs

 Which, again, will probably be faster than the one using Ord, and I
 can't think of any cases where I'd want the one using Ord instead. I
 may just not be creative enough, though.


   - Clark

 On Mon, Jul 15, 2013 at 12:46 AM, Brandon Allbery allber...@gmail.com
 wrote:
  On Sun, Jul 14, 2013 at 7:54 AM, Clark Gaebel cgae...@uwaterloo.ca
  wrote:
 
  Oops sorry I guess my point wasn't clear.
 
  Why ord based when hashable is faster? Then there's no reason this has
  to
  be in base, it can just be a
 
  Did the point about stable fly overhead?
 
  --
  brandon s allbery kf8nh   sine nomine
  associates
  allber...@gmail.com
  ballb...@sinenomine.net
  unix, openafs, kerberos, infrastructure, xmonad
  http://sinenomine.net

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe




-- 
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
http://IvanMiljenovic.wordpress.com

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread John Lato
On Tue, Jul 16, 2013 at 10:31 AM, Ivan Lazar Miljenovic 
ivan.miljeno...@gmail.com wrote:

 On 16 July 2013 11:46, John Lato jwl...@gmail.com wrote:
  In my tests, using unordered-containers was slightly slower than using
 Ord,
  although as the number of repeated elements grows unordered-containers
  appears to have an advantage.  I'm sure the relative costs of comparison
 vs
  hashing would affect this also.  But both are dramatically better than
 the
  current nub.
 
  Has anyone looked at Bart's patches to see how difficult it would be to
  apply them (or re-write them)?

 If I understand correctly, this function is proposed to be added to
 Data.List which lives in base... but the proposals here are about
 using either Sets from containers or HashSet from
 unordered-containers; I thought base wasn't supposed to depend on any
 other package :/


That was one of the points up for discussion: is it worth including a
subset of Set functionality to enable a much better nub in base?  Is it
even worth having Data.List.nub if it has quadratic complexity?

As an alternative, Bart's proposal was for both including ordNub in
containers and an improved nub (with no dependencies outside base) in
Data.List.  Unfortunately the patches are quite old (darcs format), and I
don't know how they'd apply to the current situation.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Clark Gaebel
I'm procrastinating something else, so I wrote the patch to
unordered-containers. Feel free to comment on the github link:

https://github.com/tibbe/unordered-containers/pull/67

I'm still against having an Ord version, since my intuition tells me
that hash-based data structures are faster than ordered ones. Someone
else can write the patch, though!

As a tangent, can anyone think of a data structure for which you can
write an Ord instance but Hashable/Eq is impossible (or prove
otherwise)? How about the converse?

Regards,
  - Clark

On Mon, Jul 15, 2013 at 10:40 PM, John Lato jwl...@gmail.com wrote:
 On Tue, Jul 16, 2013 at 10:31 AM, Ivan Lazar Miljenovic
 ivan.miljeno...@gmail.com wrote:

 On 16 July 2013 11:46, John Lato jwl...@gmail.com wrote:
  In my tests, using unordered-containers was slightly slower than using
  Ord,
  although as the number of repeated elements grows unordered-containers
  appears to have an advantage.  I'm sure the relative costs of comparison
  vs
  hashing would affect this also.  But both are dramatically better than
  the
  current nub.
 
  Has anyone looked at Bart's patches to see how difficult it would be to
  apply them (or re-write them)?

 If I understand correctly, this function is proposed to be added to
 Data.List which lives in base... but the proposals here are about
 using either Sets from containers or HashSet from
 unordered-containers; I thought base wasn't supposed to depend on any
 other package :/


 That was one of the points up for discussion: is it worth including a subset
 of Set functionality to enable a much better nub in base?  Is it even worth
 having Data.List.nub if it has quadratic complexity?

 As an alternative, Bart's proposal was for both including ordNub in
 containers and an improved nub (with no dependencies outside base) in
 Data.List.  Unfortunately the patches are quite old (darcs format), and I
 don't know how they'd apply to the current situation.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Brandon Allbery
On Mon, Jul 15, 2013 at 10:31 PM, Ivan Lazar Miljenovic 
ivan.miljeno...@gmail.com wrote:

 If I understand correctly, this function is proposed to be added to
 Data.List which lives in base... but the proposals here are about
 using either Sets from containers or HashSet from
 unordered-containers; I thought base wasn't supposed to depend on any
 other package :/


As I understand it, currently we have a double proposal: simple ordNub in
Data.List without external dependencies, and the other one in containers
and/or unordered-containers as appropriate.

-- 
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonadhttp://sinenomine.net
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Richard A. O'Keefe

On 16/07/2013, at 3:21 PM, Clark Gaebel wrote:
 
 I'm still against having an Ord version, since my intuition tells me
 that hash-based data structures are faster than ordered ones.

There are at least four different things that an Ord version might
mean:

 - first sort a list, then eliminate duplicates
 - sort a list eliminating duplicates stably as you go
   (think 'merge sort', using 'union' instead of 'merge')
 - build a balanced tree set as you go
 - having a list that is already sorted, use that to
   eliminated duplicates cheaply.

These things have different costs.  For example, if there are N
elements of which U are unique, the first as O(N.log N) cost,
the third has O(N.log U) cost, and the fourth has O(N) cost.

What I want is more often ordNubBy than ordNub, though.

 Someone
 else can write the patch, though!
 
 As a tangent, can anyone think of a data structure for which you can
 write an Ord instance but Hashable/Eq is impossible (or prove
 otherwise)? How about the converse?

Since Ord has Eq as a superclass, and since 0 is a functionally
correct hash value for anything, if you can implement Ord you
can obviously implement Hashable/Eq.  Whether it is *useful* to
do so is another question.

It turns out that it _is_ possible to define good quality hash
functions on sets, but most code in the field to do so is pretty bad.
(Just a modular sum or exclusive or.)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Clark Gaebel
nubBy is a very good suggestion. Added!

Regarding good hash functions: if your data structure is algebraic,
you can derive generic and Hashable will give you a pretty good hash
function:

 data ADT a = C0 Int String | C1 [a]
   deriving Generic

 instance Hashable a = Hashable (ADT a)

It's magic!

  - Clark

On Mon, Jul 15, 2013 at 11:35 PM, Richard A. O'Keefe o...@cs.otago.ac.nz 
wrote:

 On 16/07/2013, at 3:21 PM, Clark Gaebel wrote:

 I'm still against having an Ord version, since my intuition tells me
 that hash-based data structures are faster than ordered ones.

 There are at least four different things that an Ord version might
 mean:

  - first sort a list, then eliminate duplicates
  - sort a list eliminating duplicates stably as you go
(think 'merge sort', using 'union' instead of 'merge')
  - build a balanced tree set as you go
  - having a list that is already sorted, use that to
eliminated duplicates cheaply.

 These things have different costs.  For example, if there are N
 elements of which U are unique, the first as O(N.log N) cost,
 the third has O(N.log U) cost, and the fourth has O(N) cost.

 What I want is more often ordNubBy than ordNub, though.

 Someone
 else can write the patch, though!

 As a tangent, can anyone think of a data structure for which you can
 write an Ord instance but Hashable/Eq is impossible (or prove
 otherwise)? How about the converse?

 Since Ord has Eq as a superclass, and since 0 is a functionally
 correct hash value for anything, if you can implement Ord you
 can obviously implement Hashable/Eq.  Whether it is *useful* to
 do so is another question.

 It turns out that it _is_ possible to define good quality hash
 functions on sets, but most code in the field to do so is pretty bad.
 (Just a modular sum or exclusive or.)

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-15 Thread Conrad Parker
On 16 July 2013 10:31, Ivan Lazar Miljenovic ivan.miljeno...@gmail.com wrote:
 On 16 July 2013 11:46, John Lato jwl...@gmail.com wrote:
 In my tests, using unordered-containers was slightly slower than using Ord,
 although as the number of repeated elements grows unordered-containers
 appears to have an advantage.  I'm sure the relative costs of comparison vs
 hashing would affect this also.  But both are dramatically better than the
 current nub.

 Has anyone looked at Bart's patches to see how difficult it would be to
 apply them (or re-write them)?

 If I understand correctly, this function is proposed to be added to
 Data.List which lives in base... but the proposals here are about
 using either Sets from containers or HashSet from
 unordered-containers; I thought base wasn't supposed to depend on any
 other package :/

This discussion (on -cafe@) is just about what course of action to
take; adding such functions to containers or unordered-containers
would not require a libraries@ proposal.

Conrad.





 On Mon, Jul 15, 2013 at 8:43 PM, Clark Gaebel cgae...@uwaterloo.ca wrote:

 Apologies. I was being lazy. Here's a stable version:

   import qualified Data.HashSet as S

   hashNub :: (Ord a) = [a] - [a]
   hashNub l = go S.empty l
 where
   go _ [] = []
   go s (x:xs) = if x `S.member` s then go s xs
 else x : go (S.insert x s) xs

 Which, again, will probably be faster than the one using Ord, and I
 can't think of any cases where I'd want the one using Ord instead. I
 may just not be creative enough, though.


   - Clark

 On Mon, Jul 15, 2013 at 12:46 AM, Brandon Allbery allber...@gmail.com
 wrote:
  On Sun, Jul 14, 2013 at 7:54 AM, Clark Gaebel cgae...@uwaterloo.ca
  wrote:
 
  Oops sorry I guess my point wasn't clear.
 
  Why ord based when hashable is faster? Then there's no reason this has
  to
  be in base, it can just be a
 
  Did the point about stable fly overhead?
 
  --
  brandon s allbery kf8nh   sine nomine
  associates
  allber...@gmail.com
  ballb...@sinenomine.net
  unix, openafs, kerberos, infrastructure, xmonad
  http://sinenomine.net

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe




 --
 Ivan Lazar Miljenovic
 ivan.miljeno...@gmail.com
 http://IvanMiljenovic.wordpress.com

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] ordNub

2013-07-14 Thread Niklas Hambüchen
tldr: nub is abnormally slow, we shouldn't use it, but we do.


As you might know, Data.List.nub is O(n²). (*)

As you might not know, almost *all* practical Haskell projects use it,
and that in places where an Ord instance is given, e.g. happy, Xmonad,
ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
more (see https://github.com/nh2/haskell-ordnub).

I've taken the Ord-based O(n * log n) implementation from yi using a Set:

  ordNub :: (Ord a) = [a] - [a]
  ordNub l = go empty l
where
  go _ [] = []
  go s (x:xs) = if x `member` s then go s xs
else x : go (insert x s) xs


and put benchmarks on
http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
(compare `nub` vs `ordNub`).

`ordNub` is not only in a different complexity class, but even seems to
perform better than nub for very small numbers of actually different
list elements (that's the numbers before the benchmark names).

(The benchmark also shows some other potential problem: Using a state
monad to keep the set instead of a function argument can be up to 20
times slower. Should that happen?)

What do you think about ordNub?

I've seen a proposal from 5 years ago about adding a *sort*Nub function
started by Neil, but it just died.


(*) The mentioned complexity is for the (very common) worst case, in
which the number of different elements in the list grows with the list
(alias you don't have an N element list with always only 5 different
things inside).

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-14 Thread Clark Gaebel
Similarly, I've always used:

import qualified Data.HashSet as S

nub :: Hashable a = [a] - [a]
nub = S.toList . S.fromList

And i can't think of any type which i can't write a Hashable instance, so
this is extremely practical.
On Jul 14, 2013 7:24 AM, Niklas Hambüchen m...@nh2.me wrote:

 tldr: nub is abnormally slow, we shouldn't use it, but we do.


 As you might know, Data.List.nub is O(n²). (*)

 As you might not know, almost *all* practical Haskell projects use it,
 and that in places where an Ord instance is given, e.g. happy, Xmonad,
 ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
 more (see https://github.com/nh2/haskell-ordnub).

 I've taken the Ord-based O(n * log n) implementation from yi using a Set:

   ordNub :: (Ord a) = [a] - [a]
   ordNub l = go empty l
 where
   go _ [] = []
   go s (x:xs) = if x `member` s then go s xs
 else x : go (insert x s) xs


 and put benchmarks on

 http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
 (compare `nub` vs `ordNub`).

 `ordNub` is not only in a different complexity class, but even seems to
 perform better than nub for very small numbers of actually different
 list elements (that's the numbers before the benchmark names).

 (The benchmark also shows some other potential problem: Using a state
 monad to keep the set instead of a function argument can be up to 20
 times slower. Should that happen?)

 What do you think about ordNub?

 I've seen a proposal from 5 years ago about adding a *sort*Nub function
 started by Neil, but it just died.


 (*) The mentioned complexity is for the (very common) worst case, in
 which the number of different elements in the list grows with the list
 (alias you don't have an N element list with always only 5 different
 things inside).

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-14 Thread Clark Gaebel
Oops sorry I guess my point wasn't clear.

Why ord based when hashable is faster? Then there's no reason this has to
be in base, it can just be a free function in Data.HashSet. If stability is
a concern then there's a way to easily account for that using HashMap.

  - Clark
On Jul 14, 2013 7:48 AM, Niklas Hambüchen m...@nh2.me wrote:

 One of my main points is:

 Should we not add such a function (ord-based, same output as nub,
 stable, no sorting) to base?

 As the package counting shows, if we don't offer an alternative, people
 obviously use it, and not to our benefit.

 (Not to say it this way:
 We could make the Haskell world fast with smarter fusion, strictness
 analysis and LLVM backends.
 Or we could stop using quadratic algorithms.)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-14 Thread Niklas Hambüchen
One of my main points is:

Should we not add such a function (ord-based, same output as nub, 
stable, no sorting) to base?

As the package counting shows, if we don't offer an alternative, people 
obviously use it, and not to our benefit.

(Not to say it this way:
We could make the Haskell world fast with smarter fusion, strictness 
analysis and LLVM backends.
Or we could stop using quadratic algorithms.)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-14 Thread Roman Cheplyaka
Something like that should definitely be included in Data.List.
Thanks for working on it.

Roman

* Niklas Hambüchen m...@nh2.me [2013-07-14 19:20:52+0800]
 tldr: nub is abnormally slow, we shouldn't use it, but we do.
 
 
 As you might know, Data.List.nub is O(n²). (*)
 
 As you might not know, almost *all* practical Haskell projects use it,
 and that in places where an Ord instance is given, e.g. happy, Xmonad,
 ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
 more (see https://github.com/nh2/haskell-ordnub).
 
 I've taken the Ord-based O(n * log n) implementation from yi using a Set:
 
   ordNub :: (Ord a) = [a] - [a]
   ordNub l = go empty l
 where
   go _ [] = []
   go s (x:xs) = if x `member` s then go s xs
 else x : go (insert x s) xs
 
 
 and put benchmarks on
 http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
 (compare `nub` vs `ordNub`).
 
 `ordNub` is not only in a different complexity class, but even seems to
 perform better than nub for very small numbers of actually different
 list elements (that's the numbers before the benchmark names).
 
 (The benchmark also shows some other potential problem: Using a state
 monad to keep the set instead of a function argument can be up to 20
 times slower. Should that happen?)
 
 What do you think about ordNub?
 
 I've seen a proposal from 5 years ago about adding a *sort*Nub function
 started by Neil, but it just died.
 
 
 (*) The mentioned complexity is for the (very common) worst case, in
 which the number of different elements in the list grows with the list
 (alias you don't have an N element list with always only 5 different
 things inside).
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-14 Thread Francesco Mazzoli
At Sun, 14 Jul 2013 07:31:05 -0400,
Clark Gaebel wrote:
 Similarly, I've always used:
 
 import qualified Data.HashSet as S
 
 nub :: Hashable a = [a] - [a]
 nub = S.toList . S.fromList
 
 And i can't think of any type which i can't write a Hashable instance, so
 this is extremely practical.

Well, the above is not stable while Niklas’ is.  But I guess that’s not
the point of your message :).

I’ve always avoided “nub” too, and FWIW I’d like a constrained version
too—maybe avoiding Data.Set so that it could live in Data.List.  I think
Ord would be much better than Hashable, since it is 1. in “base” 2. much
more established and understood.

Although if you find yourself using “nub” too much you’re probably doing
something wrong...

Francesco

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-14 Thread Joey Adams
On Sun, Jul 14, 2013 at 7:31 AM, Clark Gaebel cgae...@uwaterloo.ca wrote:

 Similarly, I've always used:

 import qualified Data.HashSet as S

 nub :: Hashable a = [a] - [a]
 nub = S.toList . S.fromList

 And i can't think of any type which i can't write a Hashable instance, so
 this is extremely practical.

This won't yield results lazily (e.g. nub (repeat 'x') = _|_ instead of 'x'
: _|_), but Niklas' ordNub will.  His ordNub can be translated directly to
HashSet and still have the stability and laziness properties.

A difficulty with putting ordNub in Data.List is that it depends on
containers, which is outside of the base package.  Some options:

 * Move the implementation of Set to base.

 * Implement a lean version of Set in base that only provides 'insert' and
'member'.

 * Define ordNub in Data.Set instead.

Adding a Hashable-based nub to base would be even more problematic, since
you'd need Hashable in base.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-14 Thread Conrad Parker
On 15 July 2013 09:54, Joey Adams joeyadams3.14...@gmail.com wrote:
 On Sun, Jul 14, 2013 at 7:31 AM, Clark Gaebel cgae...@uwaterloo.ca wrote:

 Similarly, I've always used:

 import qualified Data.HashSet as S

 nub :: Hashable a = [a] - [a]
 nub = S.toList . S.fromList

 And i can't think of any type which i can't write a Hashable instance, so
 this is extremely practical.

 This won't yield results lazily (e.g. nub (repeat 'x') = _|_ instead of 'x'
 : _|_), but Niklas' ordNub will.  His ordNub can be translated directly to
 HashSet and still have the stability and laziness properties.

 A difficulty with putting ordNub in Data.List is that it depends on
 containers, which is outside of the base package.  Some options:

  * Move the implementation of Set to base.

  * Implement a lean version of Set in base that only provides 'insert' and
 'member'.

  * Define ordNub in Data.Set instead.

 Adding a Hashable-based nub to base would be even more problematic, since
 you'd need Hashable in base.

Right, I suggest the following community course of action:

1a) add ordNub to Data.Set
1b) add ordNub to Data.Hashable
(1 day)

2) make a libraries@ proposal to include a stripped-down Data.Set-like
balanced binary tree implementation to base.
(2 weeks)

3) bikeshed about the name, eg.:
  * is nub really intuitive? how about uniq, like in
perl/ruby/underscore.js?
  * but uniq in unix only removes _adjacent_ duplicates, confusing!
  * how about distinct? sole? unique? azygous?
(7 weeks)

4) Failing consensus on technical grounds (that the stripped-down
Data.Set implementation is overkill for one library function), agree
that anyone who really cares should just use the version from
containers or hashable. Only newbs and textbook authors actually use
base anyway, and it's impossible to change the language definition.
Prelude will continue to fulfil its role of avoiding success at all
costs, quadratic or otherwise.

(Please, let's have both 1a and 1b :)

Conrad.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ordNub

2013-07-14 Thread Thomas DuBuisson
Just so people are aware - five years ago the notion of nubOrd and
nubWith was discussed and a consensus reached on including nubOrd.  I
think Bart got too busy, didn't submit a final patch, and no one with
commit access actually commited any code.

http://haskell.1045720.n5.nabble.com/GHC-2717-Add-nubWith-nubOrd-td3159919.html

I fully support an efficient nub implementation making its way into
base - it's far past time.  Using Set seems sensible.

Cheers,
Thomas



On Sun, Jul 14, 2013 at 4:20 AM, Niklas Hambüchen m...@nh2.me wrote:
 tldr: nub is abnormally slow, we shouldn't use it, but we do.


 As you might know, Data.List.nub is O(n²). (*)

 As you might not know, almost *all* practical Haskell projects use it,
 and that in places where an Ord instance is given, e.g. happy, Xmonad,
 ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
 more (see https://github.com/nh2/haskell-ordnub).

 I've taken the Ord-based O(n * log n) implementation from yi using a Set:

   ordNub :: (Ord a) = [a] - [a]
   ordNub l = go empty l
 where
   go _ [] = []
   go s (x:xs) = if x `member` s then go s xs
 else x : go (insert x s) xs


 and put benchmarks on
 http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
 (compare `nub` vs `ordNub`).

 `ordNub` is not only in a different complexity class, but even seems to
 perform better than nub for very small numbers of actually different
 list elements (that's the numbers before the benchmark names).

 (The benchmark also shows some other potential problem: Using a state
 monad to keep the set instead of a function argument can be up to 20
 times slower. Should that happen?)

 What do you think about ordNub?

 I've seen a proposal from 5 years ago about adding a *sort*Nub function
 started by Neil, but it just died.


 (*) The mentioned complexity is for the (very common) worst case, in
 which the number of different elements in the list grows with the list
 (alias you don't have an N element list with always only 5 different
 things inside).

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe