Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-03-14 Thread Malcolm Wallace


On 22 Feb 2011, at 22:21, Bryan O'Sullivan wrote:


for some code that's (b) faster than anything else currently available


I look forward to seeing some benchmarks against libraries other than  
containers, such as AVL trees, bytestring-trie, hamtmap, list-trie,  
etc.  Good comparisons of different API-usage patterns are hard to  
come by.


Regards,
Malcolm

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-03-14 Thread Johan Tibell
On Mar 14, 2011 6:23 PM, Malcolm Wallace malcolm.wall...@me.com wrote:


 On 22 Feb 2011, at 22:21, Bryan O'Sullivan wrote:

 for some code that's (b) faster than anything else currently available


 I look forward to seeing some benchmarks against libraries other than
containers, such as AVL trees, bytestring-trie, hamtmap, list-trie, etc.
 Good comparisons of different API-usage patterns are hard to come by.

Milan Straka compared containers to a number of other libraries (including
some of those you mentioned) and found them all to be slower. Since
unordered-containers is faster (or sometimes as fast) as containers I
haven't really bothered comparing it to libraries other than containers.

Johan
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-24 Thread Joachim Breitner
Hi,

Am Mittwoch, den 23.02.2011, 20:06 -0500 schrieb wren ng thornton:
 On 2/23/11 4:42 PM, Sterling Clover wrote:
  A quick grep of some of my own source reveals that I've used M.size and 
  S.size only to test for sizes equal to 1. So, for my purposes at least, an 
  O(1) isSingleton operation would be just as useful as an O(1) size.
 
 I agree, a fast isSingleton function would cover a very common use 
 case--- especially for set-like containers.

would ghc’s rule system be strong enough to replace size m == 1 by
isSingleton m? It would be nice if programmers get the advantage even
when they did not notice that a isSingleton function is provided.

Greetings,
Joachim

-- 
Joachim nomeata Breitner
  mail: m...@joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C
  JID: nome...@joachim-breitner.de | http://www.joachim-breitner.de/
  Debian Developer: nome...@debian.org


signature.asc
Description: This is a digitally signed message part
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-24 Thread Yitzchak Gale
Gwern Branwen wrote:
 You could look at them yourself; I attached the files.

Max Bolingbroke wrote:
 Frankly I am surprised how much size gets used.
 It seems that making it fast is more important than
 I thought.

Johan Tibell wrote:
 IntMap (which shares data structure with HashMap) only hash O(n) size.
 I wonder if people avoid using IntMap because of this.

Another common usage for Map is as a functional integer-indexed
random access array.

Once I implemented the standard algorithm for random shuffle
of a list using Data.Map Int. It was much nicer than the STArray
version, in my opinion. But when I tried switching to Data.IntMap,
hoping to make it faster, I was devastatingly disappointed. Now
I understand why.

-Yitz

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-24 Thread Ross Paterson
On Wed, Feb 23, 2011 at 08:45:47AM +, Max Bolingbroke wrote:
  3. Some map combining algorithms work more efficiently when one of
 their two arguments are smaller. For example, Data.Map.union is most
 efficient for (bigmap `union` smallmap). If you don't care about which
 of the two input maps wins when they contain the same keys, you can
 improve constant factors by testing the size of the map input to size
 (in O(1)) and flipping the arguments if you got (smallmap `union`
 bigmap) instead of the desirable way round.

This is a most unfortunate feature of Data.Map, forcing users to bend the
logic of their application to suit the library.  Ideally you'd want binary
operations that are symmetric and associative performance-wise, freeing
users to follow the structure of their problem domain.  The documentation
(and the original paper) doesn't quantify the gain for such unions,
but hopefully it achieves the optimum: O(m log(n/m)), where m and n are
the sizes of the smaller and larger trees respectively.  Then building
a tree costs O(n log(n)), regardless of the way you do it.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-24 Thread wren ng thornton

On 2/23/11 8:06 PM, wren ng thornton wrote:

On 2/23/11 4:42 PM, Sterling Clover wrote:

A quick grep of some of my own source reveals that I've used M.size
and S.size only to test for sizes equal to 1. So, for my purposes at
least, an O(1) isSingleton operation would be just as useful as an
O(1) size.


I agree, a fast isSingleton function would cover a very common use
case--- especially for set-like containers.


On reflection, in order to handle all the comparisons of size to some 
small constant, it would be better to have two functions: one for 
comparing the size of a collection against a boundary condition, and one 
for comparing the size of two collections. That is, you should add 
analogues of the functions in Data.List.Extras.LazyLength[1].


The lazy boundary condition function is O(min(m,n)) where m is the size 
of the boundary and n is the size of the collection. For detecting 
singletons, doubletons, etc, this reduces to O(1) though there may be a 
constant factor hit vs dedicated isSingleton/isDoubleton/et functions. 
It would be a slight difference that diminishes as m increases, but it 
could be worth benchmarking...


Similarly, the lazy comparative size function is O(min(m,n)) where m and 
n are the sizes of the two collections. This is always less than the 
O(m)+O(n) of taking the sizes individually[2], but it's remarkably less 
when one of the collections is known to be much smaller than the other 
(even if you don't know which is which).



[1] 
http://hackage.haskell.org/packages/archive/list-extras/0.4.0.1/doc/html/Data-List-Extras-LazyLength.html


[2] The O(m)+O(n) can be parallelized to yield O(max(m,n)) but that's 
still greater than O(min(m,n)). With some form of chunked-lazy natural 
numbers you may be able to get the parallel version to approach 
O(max(m,n)) and then get into a battle of constant factors.


--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-24 Thread wren ng thornton

On 2/24/11 3:05 AM, Joachim Breitner wrote:

Hi,

Am Mittwoch, den 23.02.2011, 20:06 -0500 schrieb wren ng thornton:

On 2/23/11 4:42 PM, Sterling Clover wrote:

A quick grep of some of my own source reveals that I've used M.size and S.size 
only to test for sizes equal to 1. So, for my purposes at least, an O(1) 
isSingleton operation would be just as useful as an O(1) size.


I agree, a fast isSingleton function would cover a very common use
case--- especially for set-like containers.


would ghc’s rule system be strong enough to replace size m == 1 by
isSingleton m? It would be nice if programmers get the advantage even
when they did not notice that a isSingleton function is provided.


Not really. I mean, yes, you could use rules for that; but it would be 
fragile and susceptible to breakage due to refactoring.


Since maps are finite, you shouldn't get any changes in the domain 
semantics (i.e., I believe that whether you hit bottom or not cannot 
change based on whether rules fire), but you could get asymptotic 
changes (since size is O(n) times whatever loop you're in) and you can 
get memory leak changes as well (depending on whether CSE fires or not). 
For me, allowing that much semantic variability based on whether rules 
fire or not is unacceptable. This is why I disabled the rewrite rules in 
Data.List.Extras.LazyLength[1] (which, granted, is even worse since 
infinite lists means that you can get domain semantic differences as well).


I'd rather advocate for having smart/lazy size comparison functions like 
Data.List.LazyLength offers, advertising them well, and then relying on 
users to say what they mean.



[1] 
http://hackage.haskell.org/packages/archive/list-extras/0.4.0.1/doc/html/Data-List-Extras-LazyLength.html


--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Stephen Tetley
On 23 February 2011 05:31, Johan Tibell johan.tib...@gmail.com wrote:

 Can someone come up with a real world example where O(1) size is important?


Tangentially - if you changed the API so the size function was called
'count' rather than 'size' or 'length', there would be no shame what's
so ever in not being O(1).

Problem solved :-)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Max Bolingbroke
On 23 February 2011 05:31, Johan Tibell johan.tib...@gmail.com wrote:
 On Tue, Feb 22, 2011 at 9:19 PM, Johan Tibell johan.tib...@gmail.com wrote:
 Initial numbers suggest that lookup gets 3% slower and insert/delete
 6% slower. The upside is O(1) size.

 Can someone come up with a real world example where O(1) size is important?

I'm a bit sceptical that it is (I was not convinced by the earlier
strict-set-inclusion argument, since that's another Data.Map feature
I've never used). I thought of some other possibilities though:
 1. If copying an unordered-collection to a flat array you can improve
the constant factors (not the asymptotics) with O(1) size to
pre-allocate the array
 2. If building a map in a fixed point loop (I personally do this a
lot) where you know that the key uniquely determines the element, you
can test if a fixed point is reached in O(1) by just comparing the
sizes. Depending what you are taking a fixed point of, this may change
the asymptotics
 3. Some map combining algorithms work more efficiently when one of
their two arguments are smaller. For example, Data.Map.union is most
efficient for (bigmap `union` smallmap). If you don't care about which
of the two input maps wins when they contain the same keys, you can
improve constant factors by testing the size of the map input to size
(in O(1)) and flipping the arguments if you got (smallmap `union`
bigmap) instead of the desirable way round.

Personally I don't find any of these *particularly* compelling. But a
~6% slowdown for this functionality is not too bad - have you had a
chance to look at the core to see if the cause of the slowdown
manifests itself at that level? Perhaps it is possible to tweak the
code to make this cheaper.

Also, what was the size of the collections you used in your benchmark?
I would expect the relative cost of maintaining the size to get lower
as you increased the size of the collection.

Max

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Alberto G. Corona
I took a rapid look and they seem a replacement for pure Map's, but not for
 mutable HashTable's. Sorry if it isn't the case. I don´t know if
Data.HashTable has improved, but the performance used to be very poor in
comparison with other languages.

The point is that pure data structures can not be used as shared data in
multithreaded environments. They must be encapsulated in mutable blocking
references, such is MVars, and the whole update process blocks any other
thread . (it is worst when using TVars)  So they are not a replacement for
mutable data structures such is Data.HashTable.

For this reason I think that an inprovement/mutable-replacement of
Data.HashTable is needed. If this hasn´t been done already. Are there
some improvements on it that I don't know?

2011/2/23 Max Bolingbroke batterseapo...@hotmail.com

 On 23 February 2011 05:31, Johan Tibell johan.tib...@gmail.com wrote:
  On Tue, Feb 22, 2011 at 9:19 PM, Johan Tibell johan.tib...@gmail.com
 wrote:
  Initial numbers suggest that lookup gets 3% slower and insert/delete
  6% slower. The upside is O(1) size.
 
  Can someone come up with a real world example where O(1) size is
 important?

 I'm a bit sceptical that it is (I was not convinced by the earlier
 strict-set-inclusion argument, since that's another Data.Map feature
 I've never used). I thought of some other possibilities though:
  1. If copying an unordered-collection to a flat array you can improve
 the constant factors (not the asymptotics) with O(1) size to
 pre-allocate the array
  2. If building a map in a fixed point loop (I personally do this a
 lot) where you know that the key uniquely determines the element, you
 can test if a fixed point is reached in O(1) by just comparing the
 sizes. Depending what you are taking a fixed point of, this may change
 the asymptotics
  3. Some map combining algorithms work more efficiently when one of
 their two arguments are smaller. For example, Data.Map.union is most
 efficient for (bigmap `union` smallmap). If you don't care about which
 of the two input maps wins when they contain the same keys, you can
 improve constant factors by testing the size of the map input to size
 (in O(1)) and flipping the arguments if you got (smallmap `union`
 bigmap) instead of the desirable way round.

 Personally I don't find any of these *particularly* compelling. But a
 ~6% slowdown for this functionality is not too bad - have you had a
 chance to look at the core to see if the cause of the slowdown
 manifests itself at that level? Perhaps it is possible to tweak the
 code to make this cheaper.

 Also, what was the size of the collections you used in your benchmark?
 I would expect the relative cost of maintaining the size to get lower
 as you increased the size of the collection.

 Max

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Gregory Collins
On Wed, Feb 23, 2011 at 12:30 PM, Alberto G. Corona agocor...@gmail.com wrote:
 For this reason I think that an inprovement/mutable-replacement of
 Data.HashTable is needed. If this hasn´t been done already. Are there
 some improvements on it that I don't know?

I've been working on one lately, some preliminary benchmarks:

https://gist.github.com/826935

It's probably a month or two away from a releasable state, but my
work-in-progress is substantially faster (4-6X) than Data.Hashtable
for inserts and lookups.

G
-- 
Gregory Collins g...@gregorycollins.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Roman Leshchinskiy
Johan Tibell wrote:

 I'm working on a patch that provides O(1) size right now. The trick is
 to define HashMap as:

 data HashMap k v = HM {-# UNPACK #-} !Int !(Tree k v)

Another possibility is:

data HashMap k v = HM Int !(Tree k v)

hashMap t = HM (treeSize t) t

That way size is O(n) on first use but O(1) afterwards. Then again, if
someone really needs this they can program it themselves. I've never
needed an O(1) size for maps.

Roman




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Victor Nazarov
On Sat, Feb 19, 2011 at 4:38 AM, Johan Tibell johan.tib...@gmail.com wrote:
 Hi all,

 I am delighted to announce the release of preview versions of two new 
 packages:

 unordered-containers 0.1
 Efficient hashing-based container types.
 http://hackage.haskell.org/package/unordered-containers

 hashable 1.1
 Efficient hashing of common types.
 http://hackage.haskell.org/package/hashable

 These packages address a need for faster container types in Haskell,
 especially for string types like ByteString and Text. The package
 achieves better performance by using hashing and a Patricia trie (also
 known as a radix tree), instead of a balanced tree, in its
 implementation.

 The provided HashMap type is three times faster than Data.Map for
 lookups and twice as fast for inserts, for all key types I've tried,
 including ByteString, String, and Int.

 I'm calling this a preview release, even though the code hash been
 well tested both for correctness and performance, as the API is quite
 basic and doesn't provide a HashSet type yet. One thing you'll notice
 is that the API is split into a value-strict and a value-lazy version,
 making it easier to use in a strict setting.

 If you want to contribute, please get copies of the source trees from here:

 * git clone https://github.com/tibbe/unordered-containers.git
 * git clone https://github.com/tibbe/hashable.git

 Alternatively, just fork the projects on GitHub:

 http://github.com/tibbe

 As I mentioned in my talk on hashing-based containers earlier this
 week, I'm working on an even faster version, based on hash-array
 mapped tries, but it won't be ready until GHC 7.2.

 Johan


What about making Hashable subclass of Eq

class Eq a = Hashable a where

I think it's is obvious from Hasable class properties that some kind
of equality is needed and I think it will reduce type class
constraints in actual hash table implementation in
unordered-containers package.

Also I think that value of hash functions is obviously a Monoid and it
will be convenient to have Monoid instance

newtype Hash = Hash Int
instance Monoid Hash where
  mempty = Hash 0
  Hash a `mappend` Hash b = Hash (a `combine` b)

class Eq a = Hashable a where
hash :: a - Hash
hashWithSalt :: Hash - a - Hash

hashWithSalt salt x = salt `mappend` hash x

-- 
Victor Nazarov

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Emil Axelsson

2011-02-23 13:56, Victor Nazarov skrev:

Also I think that value of hash functions is obviously a Monoid and it
will be convenient to have Monoid instance

 newtype Hash = Hash Int
 instance Monoid Hash where
   mempty = Hash 0
   Hash a `mappend` Hash b = Hash (a `combine` b)

 class Eq a =  Hashable a where
 hash :: a -  Hash
 hashWithSalt :: Hash -  a -  Hash

 hashWithSalt salt x = salt `mappend` hash x


Monoid would be a good idea if combine was associative :)

  Prelude Data.Hashable let a = hash a
  Prelude Data.Hashable let b = hash b
  Prelude Data.Hashable let c = hash c
  Prelude Data.Hashable (a `combine` b) `combine` c
  198573605
  Prelude Data.Hashable a `combine` (b `combine` c)
  177445

/ Emil


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Max Bolingbroke
On 23 February 2011 12:05, Gregory Collins g...@gregorycollins.net wrote:
 I've been working on one lately, some preliminary benchmarks:

    https://gist.github.com/826935

 It's probably a month or two away from a releasable state, but my
 work-in-progress is substantially faster (4-6X) than Data.Hashtable
 for inserts and lookups.

Sounds cool. Can you use your HashTable in the ST monad? I have often
found myself wanting a mutable hashtable as a subcomponent in an
externally-pure computation.

Max

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Gregory Collins
On Wed, Feb 23, 2011 at 3:49 PM, Max Bolingbroke
batterseapo...@hotmail.com wrote:
 On 23 February 2011 12:05, Gregory Collins g...@gregorycollins.net wrote:
 I've been working on one lately, some preliminary benchmarks:

    https://gist.github.com/826935

 It's probably a month or two away from a releasable state, but my
 work-in-progress is substantially faster (4-6X) than Data.Hashtable
 for inserts and lookups.

 Sounds cool. Can you use your HashTable in the ST monad? I have often
 found myself wanting a mutable hashtable as a subcomponent in an
 externally-pure computation.

Yep, all of the functions are currently in ST.

G
-- 
Gregory Collins g...@gregorycollins.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Johan Tibell
Thanks for the examples. Point 3 is interesting but most of the gain
there could probably be had by telling the user to use (bigmap `union`
smallmap). My guess is that the user has a good idea which argument is
larger/smaller.

On Wed, Feb 23, 2011 at 12:45 AM, Max Bolingbroke
batterseapo...@hotmail.com wrote:
 Personally I don't find any of these *particularly* compelling. But a
 ~6% slowdown for this functionality is not too bad - have you had a
 chance to look at the core to see if the cause of the slowdown
 manifests itself at that level? Perhaps it is possible to tweak the
 code to make this cheaper.

I did look at the core for

test :: HashMap Int Int - HashMap Int Int
test hm = insert 42 12 hm

And I didn't see anything that looked particularly bad. The core uses
unboxed values everywhere possible and the recursive function (i.e.
inserts) returns an unboxed pair (# Int#, Tree k v #). The code for
O(1) size i here:


https://github.com/tibbe/unordered-containers/commit/77bc43acad2eb6b29472cd6ea21c3efb677cae0d

 Also, what was the size of the collections you used in your benchmark?
 I would expect the relative cost of maintaining the size to get lower
 as you increased the size of the collection.

I tried with 2^12.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Johan Tibell
On Wed, Feb 23, 2011 at 3:30 AM, Alberto G. Corona agocor...@gmail.com wrote:
 The point is that pure data structures can not be used as shared data in
 multithreaded environments. They must be encapsulated in mutable blocking
 references, such is MVars, and the whole update process blocks any other
 thread . (it is worst when using TVars)  So they are not a replacement for
 mutable data structures such is Data.HashTable.

Depends on your use case. With one writer and multiple readers an
IORef containing a persistent data structure works well. Readers do
not block writers and can they execute concurrently.

HashTable is not a concurrent data structure. You need e.g. a lock
free mutable hash table.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Johan Tibell
On Wed, Feb 23, 2011 at 4:56 AM, Victor Nazarov
asviraspossi...@gmail.com wrote:
 What about making Hashable subclass of Eq

    class Eq a = Hashable a where

 I think it's is obvious from Hasable class properties that some kind
 of equality is needed and I think it will reduce type class
 constraints in actual hash table implementation in
 unordered-containers package.

The documentation could perhaps be worded better: if 'a' is also an
instance of 'Eq', then the stated property should hold. You can
imagine things that can be hashed by not compared for equality. Also,
I don't know if type classes are a good way to model sub-typing
relationships (see Oleg's email about the Functor/Applicative/Monad
proposal).

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Jan-Willem Maessen
On Wed, Feb 23, 2011 at 11:08 AM, Johan Tibell johan.tib...@gmail.com wrote:
 ...
 HashTable is not a concurrent data structure. You need e.g. a lock
 free mutable hash table.

Good implementations of which are *not* thick on the ground.  Even
java.util.concurrent isn't fully lock-free.

-Jan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Max Bolingbroke
On 23 February 2011 16:03, Johan Tibell johan.tib...@gmail.com wrote:
 Thanks for the examples. Point 3 is interesting but most of the gain
 there could probably be had by telling the user to use (bigmap `union`
 smallmap). My guess is that the user has a good idea which argument is
 larger/smaller.

Totally agreed - this matches my experience with Map/Set.

 And I didn't see anything that looked particularly bad. The core uses
 unboxed values everywhere possible and the recursive function (i.e.
 inserts) returns an unboxed pair (# Int#, Tree k v #).

Right. I was wondering if you were returning (Int, Tree k v), in which
case CPR wouldn't unbox the Int - but I see you already thought of
that issue.

By the way, do you plan to add a HashSet to complement HashMap?

Thanks for all the work on the library!
Max

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Johan Tibell
On Wed, Feb 23, 2011 at 12:46 PM, Max Bolingbroke
batterseapo...@hotmail.com wrote:
 By the way, do you plan to add a HashSet to complement HashMap?

Yes, definitely. I want to flesh out the HashMap API some more and
then add a HashSet implementation. The code is still moving around a
lot (e.g. between modules) and I like it to settle down a bit before I
create HashSet (to avoid having more code to move around).

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Johan Tibell
On Wed, Feb 23, 2011 at 12:57 PM, Gwern Branwen gwe...@gmail.com wrote:
 On Wed, Feb 23, 2011 at 12:45 AM, Max Bolingbroke
 batterseapo...@hotmail.com wrote:
 I'm a bit sceptical that it is (I was not convinced by the earlier
 strict-set-inclusion argument, since that's another Data.Map feature
 I've never used). I thought of some other possibilities though:
  1. If copying an unordered-collection to a flat array you can improve
 the constant factors (not the asymptotics) with O(1) size to
 pre-allocate the array

 For kicks, I grepped my local repos for users of Data.Set's size
 function. My results are imperfect (I have few Github repos, I didn't
 check .lhs files, I only grepped files for 'Set.size' or 'S.size'),
 but I found around 100 uses of Data.Set.size.

Could you manually look at some of them to see if you find something
interesting. In particular `Set.size s == 0` (a common use of size in
imperative languages) could be replaced by `Set.null s`.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Gwern Branwen
On Wed, Feb 23, 2011 at 1:18 PM, Johan Tibell johan.tib...@gmail.com wrote:

 Could you manually look at some of them to see if you find something
 interesting. In particular `Set.size s == 0` (a common use of size in
 imperative languages) could be replaced by `Set.null s`.

You could look at them yourself; I attached the files. I see 6 uses
out of ~100 which involve an == 0

-- 
gwern
http://www.gwern.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Johan Tibell
On Wed, Feb 23, 2011 at 1:27 PM, Gwern Branwen gwe...@gmail.com wrote:
 You could look at them yourself; I attached the files. I see 6 uses
 out of ~100 which involve an == 0

Looks like the mailing list gateway didn't let your attachements
through. No need to attach them though, I can just grep Hackage for
uses.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Sterling Clover
A quick grep of some of my own source reveals that I've used M.size and S.size 
only to test for sizes equal to 1. So, for my purposes at least, an O(1) 
isSingleton operation would be just as useful as an O(1) size.

Cheers,
Sterl

On Feb 23, 2011, at 4:32 PM, Johan Tibell wrote:

 On Wed, Feb 23, 2011 at 1:27 PM, Gwern Branwen gwe...@gmail.com wrote:
 You could look at them yourself; I attached the files. I see 6 uses
 out of ~100 which involve an == 0
 
 Looks like the mailing list gateway didn't let your attachements
 through. No need to attach them though, I can just grep Hackage for
 uses.
 
 Johan
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Max Bolingbroke
On 23 February 2011 21:27, Gwern Branwen gwe...@gmail.com wrote:
 On Wed, Feb 23, 2011 at 1:18 PM, Johan Tibell johan.tib...@gmail.com wrote:

 Could you manually look at some of them to see if you find something
 interesting. In particular `Set.size s == 0` (a common use of size in
 imperative languages) could be replaced by `Set.null s`.

 You could look at them yourself; I attached the files. I see 6 uses
 out of ~100 which involve an == 0

Thanks for bringing some data to the table. There are definitely some
common patterns in what you sent me:

1) For defining Binary instances, you need to write set size before
you write the elements: ~7 occurrences
2) Tests against small constants (typically = 0 or 1, but also 2 and
3!): ~15 occurrences
3) A surprise to me: generating fresh names! People keep a set of all
names generated so far, and then just take size+1 as their fresh name.
Nice trick. ~17 occurrences
4) Turning sizes into strings for human consumption: ~19 occurrences
5) Just reexporting the functions somehow. Uninformative. ~8 occurrences

There were ~38 occurrences over ~13 repos where it appeared to be
somehow fundamental to an algorithm (I didn't look into most of these
in detail). I've put those after my message.

Frankly I am surprised how much size gets used. It seems that making
it fast is more important than I thought.

Cheers,
Max

===

Fundamental-looking occurrences:

bin/folkung/folkung/Clausify.hs:  siz (And ps)= S.size ps
bin/folkung/folkung/Paradox/Flatten.hs:| S.size cl = 1 + S.size bs
bin/folkung/folkung/Paradox/Flatten.hs:   || S.size cl = S.size
cl' = largestClique cl gr'
bin/folkung/folkung/Paradox/Flatten.hs:  -- 
S.size (free xs) = 1
bin/folkung/folkung/Paradox/Flatten.hs:n   = S.size (free ls)
bin/folkung/folkung/Paradox/Flatten.hs:  , S.size ws  n-1
bin/folkung/folkung/Paradox/Flatten.hs:(S.size s1,tpsize
v2,inter s2) `compare` (S.size s2,tpsize v1,inter s1)
bin/folkung/folkung/Paradox/Flatten.hs:sum [ S.size (s
`S.intersection` vs) | (v,vs) - cons, v `S.member` s  ]
bin/folkung/folkung/Paradox/Flatten.hs:, S.size ws'  S.size freeRight-1
bin/folkung/folkung/Paradox/Solve.hs:  degree x = S.size . free $ x
bin/gf/src/compiler/GF/Compile/GeneratePMCFG.hs:  | product
(map Set.size ys) == count
bin/gf/src/compiler/GF/Speech/CFGToFA.hs:indeg (c,_) =
maybe 0 Set.size $ Map.lookup c ub
bin/gf/src/compiler/GF/Speech/CFGToFA.hs:  where (fa', ns) = newStates
(replicate (Set.size cs) ()) fa
bin/gf/src/GF/Compile/GeneratePMCFG.hs:  | product (map
Set.size ys) == count =
bin/gf/src/GF/Compile/GeneratePMCFGOld.hs:  | product (map
Set.size ys) == count =
bin/gf/src/GF/Data/MultiMap.hs:size = sum . Prelude.map Set.size . Map.elems
bin/gf/src/GF/Speech/CFGToFA.hs:indeg (c,_) = maybe 0
Set.size $ Map.lookup c ub
bin/gf/src/GF/Speech/CFGToFA.hs:  where (fa', ns) = newStates
(replicate (Set.size cs) ()) fa
bin/halfs/Halfs/FSRoot.hs: $
Set.size $ fsRootAllInodeNums fsroot
bin/halfs/Halfs.hs:when (Set.size freeSet /= length freeList) (
bin/hcorpus/hcorpus.hs:   let rank = 1 `fidiv` Set.size wrds,
bin/hoogle/src/Hoogle/DataBase/TypeSearch/Binding.hs:g
(l, vs) = Just $ [restrict|isJust l] ++ replicate (max 0 $ Set.size vs
- 1) var
bin/htab/src/Formula.hs:countNominals f = Set.size $ extractNominals f
bin/hylores-2.5/src/HyLoRes/Clause/BasicClause.hs:size  =
Set.size . toFormulaSet
bin/hylores-2.5/src/HyLoRes/Subsumption/ClausesByFormulaIndex.hs:
let sortedCandidates = sortBy
(compareUsing Set.size) subsCandidates
bin/hylores-diego/src/HyLoRes/Clause/BasicClause.hs:size
= Set.size . toFormulaSet
bin/hylores-diego/src/HyLoRes/Subsumption/ClausesByFormulaIndex.hs:
  let sortedCandidates = sortBy
(compareUsing Set.size) subsCandidates
bin/ipclib/Language/Pepa/Compile/States.hs:Just limit -
((stateSpaceSize seen) + (Set.size stack))  limit
bin/jhc/src/Grin/SSimplify.hs:v n | n `IS.member` s = v (1 + n
+ IS.size s)
bin/jhc/src/Ho/Build.hs:maxModules - Set.size `fmap` countNodes cn
bin/jhc/src/Ho/Build.hs:maxModules - Set.size `fmap` countNodes cn
bin/lhc/src/Grin/Optimize/CallPattern.hs:  nPatterns =
Set.size callPatterns
bin/proteinvis/Graphics/Visualization/Tree/Geometry.hs:where theta
= ((fromIntegral index - (fromIntegral (S.size . fst . S.split (Name
name) $ missingLeaves))) * 2 * 3.14157) / (num_leaves - fromIntegral
(S.size missingLeaves))
bin/proteinvis/ProgramState.hs:  c = S.size go
bin/proteinvis/Protein.hs: , term_count = S.size terms
bin/proteinvis/Protein.hs:
  , term_count = S.size terms
bin/protocol-buffers/hprotoc/Text/ProtocolBuffers/ProtoCompile/Resolve.hs:
 when (Set.size numbers /= Seq.length (D.DescriptorProto.field dp)) $

Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Johan Tibell
On Wed, Feb 23, 2011 at 1:55 PM, Max Bolingbroke
batterseapo...@hotmail.com wrote:
 Thanks for bringing some data to the table. There are definitely some
 common patterns in what you sent me:

 1) For defining Binary instances, you need to write set size before
 you write the elements: ~7 occurrences
 2) Tests against small constants (typically = 0 or 1, but also 2 and
 3!): ~15 occurrences
 3) A surprise to me: generating fresh names! People keep a set of all
 names generated so far, and then just take size+1 as their fresh name.
 Nice trick. ~17 occurrences
 4) Turning sizes into strings for human consumption: ~19 occurrences
 5) Just reexporting the functions somehow. Uninformative. ~8 occurrences

 There were ~38 occurrences over ~13 repos where it appeared to be
 somehow fundamental to an algorithm (I didn't look into most of these
 in detail). I've put those after my message.

 Frankly I am surprised how much size gets used. It seems that making
 it fast is more important than I thought.

Nice analysis. Does this apply to maps as well as sets or are sets use
differently than maps somehow?

IntMap (which shares data structure with HashMap) only hash O(n) size.
I wonder if people avoid using IntMap because of this.

I wonder if there's a way to implement size that doesn't mess up the
code so badly (see the commit on GitHub to see how badly it messes up
e.g. insert).

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread Johan Tibell
Attached are all the uses of S.size and Set.size from a semi-recent
snapshot of Hackage.

Johan
Combinatorrent/0.3.2/Combinatorrent-0.3.2/src/Process/Peer.hs:let sz = 
S.size q
Combinatorrent/0.3.2/Combinatorrent-0.3.2/src/Process/PieceMgr.hs:
ipHave = S.size . ipHaveBlocks
Combinatorrent/0.3.2/Combinatorrent-0.3.2/src/Process/PieceMgr.hs:when 
( (S.size $ ipHaveBlocks ipp) = ipDone ipp)
cpsa/2.2.1/cpsa-2.2.1/src/CPSA/Graph/Tree.hs:  if S.size new == S.size 
old then
darcswatch/0.4.3/darcswatch-0.4.3/src/HTML.hs:  show (S.size (r2p d  r)) 
+++ 
EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Coll/UnbalancedSet.hs:unsafeFromOrdSeq
 xs = fst (ins xs (S.size xs))
EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/RevSeq.hs:fromSeq xs 
= N (S.size xs - 1) xs
EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/RevSeq.hs:k = 
S.size ys
EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/RevSeq.hs:k = 
S.size ys
EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/RevSeq.hs:structuralInvariant
 (N i s) = i == ((S.size s) - 1)
EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/SizedSeq.hs:fromSeq 
xs = N (S.size xs) xs
EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/SizedSeq.hs:m 
= S.size ys
EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/SizedSeq.hs:m 
= S.size ys
EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/SizedSeq.hs:structuralInvariant
 (N i s) = i == S.size s
fountain/0.0.1/fountain-0.0.1/Codec/Fountain.hs:| S.size s == degree = (s, 
g)
fountain/0.0.1/fountain-0.0.1/Codec/Fountain.hs:  | S.size indices == 0 = 
decode' decoder newDroplets
fountain/0.0.1/fountain-0.0.1/Codec/Fountain.hs:  | S.size indices == 1 = 
decode' (Decoder messageLength extraSymbols (M.insert (head $ S.toList indices) 
symbol symbols) old) (new ++ newDroplets)
hashmap/1.1.0/hashmap-1.1.0/Data/HashSet.hs:some_size (More t) = S.size 
t
hashmap/1.1.0/hashmap-1.1.0/Data/HashSet.hs:some_norm s = case S.size s of 0 - 
Nothing
hashmap/1.1.0/hashmap-1.1.0/Data/HashSet.hs:some_norm' s = case S.size s of 1 
- Only $ S.findMin s
HaskellForMaths/0.3.1/HaskellForMaths-0.3.1/Math/Algebra/Group/PermutationGroup.hs:order
 gs = S.size $ eltsS gs -- length $ elts gs
HaskellForMaths/0.3.1/HaskellForMaths-0.3.1/Math/Combinatorics/LatinSquares.hs:isOneOfEach
 xs = length xs == S.size (S.fromList xs)
HaskellTorrent/0.1.1/HaskellTorrent-0.1.1/src/Process/Peer.hs:let 
sz = S.size q
HaskellTorrent/0.1.1/HaskellTorrent-0.1.1/src/Process/PieceMgr.hs:
ipHave = S.size . ipHaveBlocks
HaskellTorrent/0.1.1/HaskellTorrent-0.1.1/src/Process/PieceMgr.hs:when 
( (S.size $ ipHaveBlocks ipp) = ipDone ipp)
hburg/1.1.2/hburg-1.1.2/src/Csa/Csa.hs:  ' - expected 
type++ (if (S.size p  1) then s else ) ++
hgom/0.6/hgom-0.6/Gom/Checker.hs:f s = S.size s  1
Holumbus-MapReduce/0.1.1/Holumbus-MapReduce-0.1.1/Examples/MapReduce/Crawler/Crawl.hs:
   runX (traceMsg 1 (  Status: already processed:  ++ show 
(S.size $ cs_wereProcessed cs) ++ 
Holumbus-MapReduce/0.1.1/Holumbus-MapReduce-0.1.1/Examples/MapReduce/Crawler/Crawl.hs:
 , to be processed:++ show (S.size $ 
cs_toBeProcessed cs)))
hoopl/3.8.6.0/hoopl-3.8.6.0/Compiler/Hoopl/Unique.hs:  setSize (US s) = S.size s
hs2bf/0.6.2/hs2bf-0.6.2/SAM.hs:when (S.size rs/=length args) $ report 
duplicate arguments
ideas/0.7/ideas-0.7/src/Domain/Math/Polynomial/RationalExercises.hs:  
S.size (varSet expr)  1
ideas/0.7/ideas-0.7/src/Domain/Math/Polynomial/RationalExercises.hs:   
manyVars = S.size (varSet a `S.union` varSet b)  1
mahoro/0.1.2/mahoro-0.1.2/DB.hs:delJID (n, jids) = if S.size jids == 1
minesweeper/0.9/minesweeper-0.9/Data/ChangeSet.hs:size= S.size . toSet
minesweeper/0.9/minesweeper-0.9/State/Functions.hs:= mines (configuration 
st) - S.size (marked $ game st)
minesweeper/0.9/minesweeper-0.9/Table.hs:=  S.size (marked g) + M.size 
(revealResults g) == msize c
minesweeper/0.9/minesweeper-0.9/Table.hs:= mines c - S.size (marked g)
panda/2009.4.1/panda-2009.4.1/src/Panda/Model/Tag.hs:sorted xs= 
xs.sortBy(compare_by (resources  S.size)).reverse
phybin/0.1.2/phybin-0.1.2/Bio/Phylogeny/PhyBin/Main.hs:putStrLn$ \nTotal 
unique taxa (++ show (S.size taxa) ++):  ++ 
regex-tdfa/1.1.7/regex-tdfa-1.1.7/Data/IntSet/EnumSet2.hs:size (EnumSet s) = 
S.size s
relacion/0.1/relacion-0.1/Data/Relacion.hs:size r  =   M.fold ((+) . S.size) 0 
(dominio r)
repa/1.1.0.0/repa-1.1.0.0/Data/Array/Repa.hs:   $! U.enumFromTo 
(0 :: Int) (S.size sh - 1)
repa/1.1.0.0/repa-1.1.0.0/Data/Array/Repa.hs:   | U.length uarr /= S.size sh
repa/1.1.0.0/repa-1.1.0.0/Data/Array/Repa.hs:   , size of 
shape =  ++ (show $ S.size sh)  ++ \n
repa/1.1.0.0/repa-1.1.0.0/Data/Array/Repa.hs:   $ (flip reshape) (Z :. 
(S.size $ extent arr1)) 

Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-23 Thread wren ng thornton

On 2/23/11 4:42 PM, Sterling Clover wrote:

A quick grep of some of my own source reveals that I've used M.size and S.size 
only to test for sizes equal to 1. So, for my purposes at least, an O(1) 
isSingleton operation would be just as useful as an O(1) size.


I agree, a fast isSingleton function would cover a very common use 
case--- especially for set-like containers.


--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-22 Thread Bryan O'Sullivan
On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman wasserman.lo...@gmail.com
 wrote:


 size takes O(n).  That's just depressing.  Really.


That's rather thoughtless wording for some code that's (a) free (b) faster
than anything else currently available (c) in its very first release and (d)
available in source form for you to improve as you see fit. Just depressing.
Really.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-22 Thread Don Stewart
bos:
On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman
[1]wasserman.lo...@gmail.com wrote:
 
  size takes O(n).  That's just depressing.  Really.
 
That's rather thoughtless wording for some code that's (a) free (b) faster
than anything else currently available (c) in its very first release and
(d) available in source form for you to improve as you see fit. Just
depressing. Really.

Agreed. This is open source: patches or it stays at O(n).

-- Don

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-22 Thread Louis Wasserman
Apologies!

I was, in point of fact, working on such a patch, but I think I've been
convinced that it's a legitimate position for a package to take.  (I'm also
curious, Johann, about the approach you figured out that didn't cost a word
per node!)

That said, I've been working with somewhat similar structures for my TrieMap
package, and I hadn't honestly considered the possibility of having size run
in anything but O(1).  (When I was comparing benchmarks with
bytestring-trie, I didn't realize for a few hours that all of my benchmarks
called size, so O(W) operations were being benchmarked at O(n).  That threw
me off..)

Louis Wasserman
wasserman.lo...@gmail.com
http://profiles.google.com/wasserman.louis


On Tue, Feb 22, 2011 at 4:28 PM, Don Stewart d...@galois.com wrote:

 bos:
 On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman
 [1]wasserman.lo...@gmail.com wrote:
 
   size takes O(n).  That's just depressing.  Really.
 
 That's rather thoughtless wording for some code that's (a) free (b)
 faster
 than anything else currently available (c) in its very first release
 and
 (d) available in source form for you to improve as you see fit. Just
 depressing. Really.

 Agreed. This is open source: patches or it stays at O(n).

 -- Don

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-22 Thread Johan Tibell
On Tue, Feb 22, 2011 at 6:28 PM, Louis Wasserman
wasserman.lo...@gmail.com wrote:
 Apologies!

Accepted. :)

 I was, in point of fact, working on such a patch, but I think I've been
 convinced that it's a legitimate position for a package to take.  (I'm also
 curious, Johann, about the approach you figured out that didn't cost a word
 per node!)

I'm working on a patch that provides O(1) size right now. The trick is
to define HashMap as:

data HashMap k v = HM {-# UNPACK #-} !Int !(Tree k v)

data Tree k v
= Nil
| Tip {-# UNPACK #-} !Hash
  {-# UNPACK #-} !(FL.FullList k v)
| Bin {-# UNPACK #-} !Prefix
  {-# UNPACK #-} !Mask
  !(Tree k v)
  !(Tree k v)

And have insert, delete, and other operations that change the size
return both the updated tree and the size delta (-1, 0, or 1 in the
case of insert/delete). The size delta is then applied to the stored
size.

At the moment I'm trying to figure out the performance impact of
making this change.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-22 Thread Louis Wasserman
Here's the approach I was attempting: a while back there was a proposal to
modify Data.Map to support a sort of zipper, including the following API:

data Location k a -- represents a map with a hole at a particular key

search :: k - Map k a - (Maybe a, Location k a)

key :: Location k a - k

assign :: a - Location k a - Map k a
clear :: Location k a - Map k a

and from here you might define

insert :: k - a - Map k a - Map k a
insert k a m = case search k m of
   (_, loc) - assign a loc

The surprising thing was that this approach *increased* the speed of the Map
implementation by something like 7%.  I don't know what happened to the
original proposal, but I was going to try something similar, with something
like

data HashMap k v = HM !Int !(Tree k v)

insert k v (HM sz tree) = case search k tree of
(Nothing, loc) - HM (sz + 1) (assign v loc)
(Just _, loc) - HM sz (assign v loc)

Louis Wasserman
wasserman.lo...@gmail.com
http://profiles.google.com/wasserman.louis


On Tue, Feb 22, 2011 at 8:45 PM, Johan Tibell johan.tib...@gmail.comwrote:

 On Tue, Feb 22, 2011 at 6:28 PM, Louis Wasserman
 wasserman.lo...@gmail.com wrote:
  Apologies!

 Accepted. :)

  I was, in point of fact, working on such a patch, but I think I've been
  convinced that it's a legitimate position for a package to take.  (I'm
 also
  curious, Johann, about the approach you figured out that didn't cost a
 word
  per node!)

 I'm working on a patch that provides O(1) size right now. The trick is
 to define HashMap as:

 data HashMap k v = HM {-# UNPACK #-} !Int !(Tree k v)

 data Tree k v
= Nil
| Tip {-# UNPACK #-} !Hash
  {-# UNPACK #-} !(FL.FullList k v)
| Bin {-# UNPACK #-} !Prefix
  {-# UNPACK #-} !Mask
  !(Tree k v)
  !(Tree k v)

 And have insert, delete, and other operations that change the size
 return both the updated tree and the size delta (-1, 0, or 1 in the
 case of insert/delete). The size delta is then applied to the stored
 size.

 At the moment I'm trying to figure out the performance impact of
 making this change.

 Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-22 Thread Johan Tibell
On Tue, Feb 22, 2011 at 6:53 PM, Louis Wasserman
wasserman.lo...@gmail.com wrote:
 Here's the approach I was attempting: a while back there was a proposal to
 modify Data.Map to support a sort of zipper, including the following API:

 data Location k a -- represents a map with a hole at a particular key

 search :: k - Map k a - (Maybe a, Location k a)

 key :: Location k a - k

 assign :: a - Location k a - Map k a
 clear :: Location k a - Map k a

 and from here you might define

 insert :: k - a - Map k a - Map k a
 insert k a m = case search k m of
    (_, loc) - assign a loc

 The surprising thing was that this approach *increased* the speed of the Map
 implementation by something like 7%.

I'm pretty sure it was a decrease overall. The function under
discussion was something like insertWith.

The direct implementation of insertWith was the fastest. The zipper
approach was 7% faster than using a naive composition of lookup and
insert to implement insertWith. The slowdown was most likely caused by
the extra O(log n) allocations required to create the zipper (in
essence that means reify the call stack of lookup).

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-22 Thread Louis Wasserman
Hmmmkay.  I'm not entirely convinced, 'cause I've seen genuine (and
benchmarked) speedups in applying some of these ideas to my TrieMap library,
but I don't have any examples handy.

Louis Wasserman
wasserman.lo...@gmail.com
http://profiles.google.com/wasserman.louis


On Tue, Feb 22, 2011 at 9:10 PM, Johan Tibell johan.tib...@gmail.comwrote:

 On Tue, Feb 22, 2011 at 6:53 PM, Louis Wasserman
 wasserman.lo...@gmail.com wrote:
  Here's the approach I was attempting: a while back there was a proposal
 to
  modify Data.Map to support a sort of zipper, including the following API:
 
  data Location k a -- represents a map with a hole at a particular key
 
  search :: k - Map k a - (Maybe a, Location k a)
 
  key :: Location k a - k
 
  assign :: a - Location k a - Map k a
  clear :: Location k a - Map k a
 
  and from here you might define
 
  insert :: k - a - Map k a - Map k a
  insert k a m = case search k m of
 (_, loc) - assign a loc
 
  The surprising thing was that this approach *increased* the speed of the
 Map
  implementation by something like 7%.

 I'm pretty sure it was a decrease overall. The function under
 discussion was something like insertWith.

 The direct implementation of insertWith was the fastest. The zipper
 approach was 7% faster than using a naive composition of lookup and
 insert to implement insertWith. The slowdown was most likely caused by
 the extra O(log n) allocations required to create the zipper (in
 essence that means reify the call stack of lookup).

 Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-22 Thread Johan Tibell
On Tue, Feb 22, 2011 at 6:45 PM, Johan Tibell johan.tib...@gmail.com wrote:
 I'm working on a patch that provides O(1) size right now. The trick is
 to define HashMap as:

 data HashMap k v = HM {-# UNPACK #-} !Int !(Tree k v)

 data Tree k v
    = Nil
    | Tip {-# UNPACK #-} !Hash
          {-# UNPACK #-} !(FL.FullList k v)
    | Bin {-# UNPACK #-} !Prefix
          {-# UNPACK #-} !Mask
          !(Tree k v)
          !(Tree k v)

 And have insert, delete, and other operations that change the size
 return both the updated tree and the size delta (-1, 0, or 1 in the
 case of insert/delete). The size delta is then applied to the stored
 size.

 At the moment I'm trying to figure out the performance impact of
 making this change.

 Johan

Initial numbers suggest that lookup gets 3% slower and insert/delete
6% slower. The upside is O(1) size.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-22 Thread Johan Tibell
On Tue, Feb 22, 2011 at 9:19 PM, Johan Tibell johan.tib...@gmail.com wrote:
 Initial numbers suggest that lookup gets 3% slower and insert/delete
 6% slower. The upside is O(1) size.

Can someone come up with a real world example where O(1) size is important?

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-21 Thread Max Cantor
If you want to use the library and need a short term fix, just write a small 
wrapper type/module

newtype SizedMap = SizedMap (Int, HashMap) and track the size yourself.  only 
complication is that on inserts and deletes you'll need to check if the key 
existed.  other than that, it shouldn't be too difficult.  

This way, the library stays super optimized but, if you need, you can track the 
size. As Johan said, it would slow down insert and delete a bit.  shouldn't 
affect lookup though..

max

On Feb 20, 2011, at 11:40 AM, Louis Wasserman wrote:

 I'd like to complain about that, too ;)
 
 Louis Wasserman
 wasserman.lo...@gmail.com
 http://profiles.google.com/wasserman.louis
 
 
 On Sat, Feb 19, 2011 at 9:02 PM, Edward Kmett ekm...@gmail.com wrote:
 On Sat, Feb 19, 2011 at 7:27 PM, Sterling Clover s.clo...@gmail.com wrote:
 On Sat, Feb 19, 2011 at 3:04 PM, Johan Tibell johan.tib...@gmail.com wrote:
  On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman
  wasserman.lo...@gmail.com wrote:
  A couple thoughts:
  size takes O(n).  That's just depressing.  Really.
 
  This applies to all the container types. We could support O(1) size at
  the cost of slowing down e.g lookup, insert, and delete a little bit.
  I haven't measure how much yet. Would it be worth it?
 
 Getting a bit picky, but for the record, Data.Map and Data.Sequence
 provide O(1) size, and Data.HashTable I believe stores the information
 but doesn't expose it from its tiny API. That's not an argument either
 way for what a HashMap should do, however :-)
 
 NB: Data.IntMap, which Data.HashMap is based on, actually only provides O(n) 
 size.
 
 -Edward
 
 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-21 Thread Felipe Almeida Lessa
On Mon, Feb 21, 2011 at 12:58 PM, Max Cantor mxcan...@gmail.com wrote:
 If you want to use the library and need a short term fix, just write a small 
 wrapper type/module

 newtype SizedMap = SizedMap (Int, HashMap) and track the size yourself.  only 
 complication is that on inserts and deletes you'll need to check if the key 
 existed.  other than that, it shouldn't be too difficult.

 This way, the library stays super optimized but, if you need, you can track 
 the size. As Johan said, it would slow down insert and delete a bit.  
 shouldn't affect lookup though..

This isn't sufficient in all cases.  How would you know the resulting
size of a union or intersection?

Cheers,

-- 
Felipe.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-21 Thread Jan-Willem Maessen
On Mon, Feb 21, 2011 at 8:07 AM, Felipe Almeida Lessa
felipe.le...@gmail.com wrote:
 On Mon, Feb 21, 2011 at 12:58 PM, Max Cantor mxcan...@gmail.com wrote:
 If you want to use the library and need a short term fix, just write a small 
 wrapper type/module

 newtype SizedMap = SizedMap (Int, HashMap) and track the size yourself.  
 only complication is that on inserts and deletes you'll need to check if the 
 key existed.  other than that, it shouldn't be too difficult.

 This way, the library stays super optimized but, if you need, you can track 
 the size. As Johan said, it would slow down insert and delete a bit.  
 shouldn't affect lookup though..

 This isn't sufficient in all cases.  How would you know the resulting
 size of a union or intersection?

Note that the library does not at present support union or
intersection!  There are various ways to deal with the problem without
caching sizes at nodes, one of which is to count overlap during union
or intersection operations (since this involves counting leaves that
are visited during these operations).

-Jan-Willem Maessen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-21 Thread Max Bolingbroke
On 20 February 2011 03:40, Louis Wasserman wasserman.lo...@gmail.com wrote:
 I'd like to complain about that, too ;)

I'm curious, when do people find that they need a really fast way to
get map size? I use them quite a lot, and almost never call length -
and when I do, it is typically only to get some statistics about the
map for user display - certainly not in an inner loop.

I find it is still useful to have a way to quickly test if a map is
empty (for e.g. fixed points that slowly grow a map by unioning in a
new bit every time) but null works OK for that purpose.

Cheers,
Max

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-21 Thread Roman Cheplyaka
* Max Bolingbroke batterseapo...@hotmail.com [2011-02-21 14:57:08+]
 On 20 February 2011 03:40, Louis Wasserman wasserman.lo...@gmail.com wrote:
  I'd like to complain about that, too ;)
 
 I'm curious, when do people find that they need a really fast way to
 get map size? I use them quite a lot, and almost never call length -
 and when I do, it is typically only to get some statistics about the
 map for user display - certainly not in an inner loop.
 
 I find it is still useful to have a way to quickly test if a map is
 empty (for e.g. fixed points that slowly grow a map by unioning in a
 new bit every time) but null works OK for that purpose.

For example, consider a situation where you have two sets (they can be
maps as well), 'a' and 'b', such that by construction you know that 'a'
is a subset (submap) of 'b'. Now,

  size a  size b

is a convenient O(1) way to test whether 'a' is a proper subset (submap)
of 'b'.

-- 
Roman I. Cheplyaka :: http://ro-che.info/
Don't worry what people think, they don't do it very often.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-19 Thread Roman Cheplyaka
Hi Johan,

this work is highly appreciated, thanks!

* Johan Tibell johan.tib...@gmail.com [2011-02-18 17:38:51-0800]
 Hi all,
 
 I am delighted to announce the release of preview versions of two new 
 packages:
 
 unordered-containers 0.1
 Efficient hashing-based container types.
 http://hackage.haskell.org/package/unordered-containers
 
 hashable 1.1
 Efficient hashing of common types.
 http://hackage.haskell.org/package/hashable
 
 These packages address a need for faster container types in Haskell,
 especially for string types like ByteString and Text. The package
 achieves better performance by using hashing and a Patricia trie (also
 known as a radix tree), instead of a balanced tree, in its
 implementation.

-- 
Roman I. Cheplyaka :: http://ro-che.info/
Don't worry what people think, they don't do it very often.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-19 Thread Louis Wasserman
A couple thoughts:

size takes O(n).  That's just depressing.  Really.

Do you think union, intersection, etc. could be supported?

Louis Wasserman
wasserman.lo...@gmail.com
http://profiles.google.com/wasserman.louis
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-19 Thread Johan Tibell
On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman
wasserman.lo...@gmail.com wrote:
 A couple thoughts:
 size takes O(n).  That's just depressing.  Really.

This applies to all the container types. We could support O(1) size at
the cost of slowing down e.g lookup, insert, and delete a little bit.
I haven't measure how much yet. Would it be worth it?

 Do you think union, intersection, etc. could be supported?
 Louis Wasserman

Definitely. I plan to add most of the Data.Map API. We cannot support
the ordered operations (such as min) but we should be able to support
the rest.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-19 Thread Sterling Clover
On Sat, Feb 19, 2011 at 3:04 PM, Johan Tibell johan.tib...@gmail.com wrote:
 On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman
 wasserman.lo...@gmail.com wrote:
 A couple thoughts:
 size takes O(n).  That's just depressing.  Really.

 This applies to all the container types. We could support O(1) size at
 the cost of slowing down e.g lookup, insert, and delete a little bit.
 I haven't measure how much yet. Would it be worth it?

Getting a bit picky, but for the record, Data.Map and Data.Sequence
provide O(1) size, and Data.HashTable I believe stores the information
but doesn't expose it from its tiny API. That's not an argument either
way for what a HashMap should do, however :-)

Cheers,
Sterl.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-19 Thread wren ng thornton

On 2/18/11 8:38 PM, Johan Tibell wrote:

Hi all,

I am delighted to announce the release of preview versions of two new packages:

unordered-containers 0.1
Efficient hashing-based container types.
http://hackage.haskell.org/package/unordered-containers


How does, or will, this package differ from Milan Straka's 
http://hackage.haskell.org/package/hashmap ?


--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-19 Thread Johan Tibell
On Sat, Feb 19, 2011 at 4:27 PM, Sterling Clover s.clo...@gmail.com wrote:
 Getting a bit picky, but for the record, Data.Map and Data.Sequence
 provide O(1) size, and Data.HashTable I believe stores the information
 but doesn't expose it from its tiny API. That's not an argument either
 way for what a HashMap should do, however :-)

We could support O(1) size. I have a design that doesn't waste one
word per node to store the size. I'll look into it as soon as I have
time.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-19 Thread Johan Tibell
On Sat, Feb 19, 2011 at 4:49 PM, wren ng thornton w...@freegeek.org wrote:
 How does, or will, this package differ from Milan Straka's
 http://hackage.haskell.org/package/hashmap ?

It's a continuation of that work. The unordered-containers
implementation is more space efficient and a bit faster. Milan also
uses a Patricia trie in his implementation. I plan to switch to a
hash-array mapped trie (HAMT) in the future. It's about 25% faster for
lookups but at the moment quite a bit slower for inserts. Once we got
some new faster array copy primops in GHC we can hopefully improve the
insert time a lot. A HAMT also has much lower space overhead per
entry.

Johan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-19 Thread Edward Kmett
On Sat, Feb 19, 2011 at 7:27 PM, Sterling Clover s.clo...@gmail.com wrote:

 On Sat, Feb 19, 2011 at 3:04 PM, Johan Tibell johan.tib...@gmail.com
 wrote:
  On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman
  wasserman.lo...@gmail.com wrote:
  A couple thoughts:
  size takes O(n).  That's just depressing.  Really.
 
  This applies to all the container types. We could support O(1) size at
  the cost of slowing down e.g lookup, insert, and delete a little bit.
  I haven't measure how much yet. Would it be worth it?

 Getting a bit picky, but for the record, Data.Map and Data.Sequence
 provide O(1) size, and Data.HashTable I believe stores the information
 but doesn't expose it from its tiny API. That's not an argument either
 way for what a HashMap should do, however :-)


NB: Data.IntMap, which Data.HashMap is based on, actually only provides O(n)
size.

-Edward
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-19 Thread Louis Wasserman
I'd like to complain about that, too ;)

Louis Wasserman
wasserman.lo...@gmail.com
http://profiles.google.com/wasserman.louis


On Sat, Feb 19, 2011 at 9:02 PM, Edward Kmett ekm...@gmail.com wrote:

 On Sat, Feb 19, 2011 at 7:27 PM, Sterling Clover s.clo...@gmail.comwrote:

 On Sat, Feb 19, 2011 at 3:04 PM, Johan Tibell johan.tib...@gmail.com
 wrote:
  On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman
  wasserman.lo...@gmail.com wrote:
  A couple thoughts:
  size takes O(n).  That's just depressing.  Really.
 
  This applies to all the container types. We could support O(1) size at
  the cost of slowing down e.g lookup, insert, and delete a little bit.
  I haven't measure how much yet. Would it be worth it?

 Getting a bit picky, but for the record, Data.Map and Data.Sequence
 provide O(1) size, and Data.HashTable I believe stores the information
 but doesn't expose it from its tiny API. That's not an argument either
 way for what a HashMap should do, however :-)


 NB: Data.IntMap, which Data.HashMap is based on, actually only provides
 O(n) size.

 -Edward

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library

2011-02-18 Thread Jason Dusek
  What are your thoughts on iterative construction of maplike
  datastructures? Could something like builder work for maps,
  too? In the project I'm working on, I have a function that
  receives a bunch of YAML fragments and builds a big YAML map
  out of them.

--
Jason Dusek
Linux User #510144 | http://counter.li.org/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe