Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 22 Feb 2011, at 22:21, Bryan O'Sullivan wrote: for some code that's (b) faster than anything else currently available I look forward to seeing some benchmarks against libraries other than containers, such as AVL trees, bytestring-trie, hamtmap, list-trie, etc. Good comparisons of different API-usage patterns are hard to come by. Regards, Malcolm ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Mar 14, 2011 6:23 PM, Malcolm Wallace malcolm.wall...@me.com wrote: On 22 Feb 2011, at 22:21, Bryan O'Sullivan wrote: for some code that's (b) faster than anything else currently available I look forward to seeing some benchmarks against libraries other than containers, such as AVL trees, bytestring-trie, hamtmap, list-trie, etc. Good comparisons of different API-usage patterns are hard to come by. Milan Straka compared containers to a number of other libraries (including some of those you mentioned) and found them all to be slower. Since unordered-containers is faster (or sometimes as fast) as containers I haven't really bothered comparing it to libraries other than containers. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
Hi, Am Mittwoch, den 23.02.2011, 20:06 -0500 schrieb wren ng thornton: On 2/23/11 4:42 PM, Sterling Clover wrote: A quick grep of some of my own source reveals that I've used M.size and S.size only to test for sizes equal to 1. So, for my purposes at least, an O(1) isSingleton operation would be just as useful as an O(1) size. I agree, a fast isSingleton function would cover a very common use case--- especially for set-like containers. would ghc’s rule system be strong enough to replace size m == 1 by isSingleton m? It would be nice if programmers get the advantage even when they did not notice that a isSingleton function is provided. Greetings, Joachim -- Joachim nomeata Breitner mail: m...@joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C JID: nome...@joachim-breitner.de | http://www.joachim-breitner.de/ Debian Developer: nome...@debian.org signature.asc Description: This is a digitally signed message part ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
Gwern Branwen wrote: You could look at them yourself; I attached the files. Max Bolingbroke wrote: Frankly I am surprised how much size gets used. It seems that making it fast is more important than I thought. Johan Tibell wrote: IntMap (which shares data structure with HashMap) only hash O(n) size. I wonder if people avoid using IntMap because of this. Another common usage for Map is as a functional integer-indexed random access array. Once I implemented the standard algorithm for random shuffle of a list using Data.Map Int. It was much nicer than the STArray version, in my opinion. But when I tried switching to Data.IntMap, hoping to make it faster, I was devastatingly disappointed. Now I understand why. -Yitz ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 08:45:47AM +, Max Bolingbroke wrote: 3. Some map combining algorithms work more efficiently when one of their two arguments are smaller. For example, Data.Map.union is most efficient for (bigmap `union` smallmap). If you don't care about which of the two input maps wins when they contain the same keys, you can improve constant factors by testing the size of the map input to size (in O(1)) and flipping the arguments if you got (smallmap `union` bigmap) instead of the desirable way round. This is a most unfortunate feature of Data.Map, forcing users to bend the logic of their application to suit the library. Ideally you'd want binary operations that are symmetric and associative performance-wise, freeing users to follow the structure of their problem domain. The documentation (and the original paper) doesn't quantify the gain for such unions, but hopefully it achieves the optimum: O(m log(n/m)), where m and n are the sizes of the smaller and larger trees respectively. Then building a tree costs O(n log(n)), regardless of the way you do it. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 2/23/11 8:06 PM, wren ng thornton wrote: On 2/23/11 4:42 PM, Sterling Clover wrote: A quick grep of some of my own source reveals that I've used M.size and S.size only to test for sizes equal to 1. So, for my purposes at least, an O(1) isSingleton operation would be just as useful as an O(1) size. I agree, a fast isSingleton function would cover a very common use case--- especially for set-like containers. On reflection, in order to handle all the comparisons of size to some small constant, it would be better to have two functions: one for comparing the size of a collection against a boundary condition, and one for comparing the size of two collections. That is, you should add analogues of the functions in Data.List.Extras.LazyLength[1]. The lazy boundary condition function is O(min(m,n)) where m is the size of the boundary and n is the size of the collection. For detecting singletons, doubletons, etc, this reduces to O(1) though there may be a constant factor hit vs dedicated isSingleton/isDoubleton/et functions. It would be a slight difference that diminishes as m increases, but it could be worth benchmarking... Similarly, the lazy comparative size function is O(min(m,n)) where m and n are the sizes of the two collections. This is always less than the O(m)+O(n) of taking the sizes individually[2], but it's remarkably less when one of the collections is known to be much smaller than the other (even if you don't know which is which). [1] http://hackage.haskell.org/packages/archive/list-extras/0.4.0.1/doc/html/Data-List-Extras-LazyLength.html [2] The O(m)+O(n) can be parallelized to yield O(max(m,n)) but that's still greater than O(min(m,n)). With some form of chunked-lazy natural numbers you may be able to get the parallel version to approach O(max(m,n)) and then get into a battle of constant factors. -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 2/24/11 3:05 AM, Joachim Breitner wrote: Hi, Am Mittwoch, den 23.02.2011, 20:06 -0500 schrieb wren ng thornton: On 2/23/11 4:42 PM, Sterling Clover wrote: A quick grep of some of my own source reveals that I've used M.size and S.size only to test for sizes equal to 1. So, for my purposes at least, an O(1) isSingleton operation would be just as useful as an O(1) size. I agree, a fast isSingleton function would cover a very common use case--- especially for set-like containers. would ghc’s rule system be strong enough to replace size m == 1 by isSingleton m? It would be nice if programmers get the advantage even when they did not notice that a isSingleton function is provided. Not really. I mean, yes, you could use rules for that; but it would be fragile and susceptible to breakage due to refactoring. Since maps are finite, you shouldn't get any changes in the domain semantics (i.e., I believe that whether you hit bottom or not cannot change based on whether rules fire), but you could get asymptotic changes (since size is O(n) times whatever loop you're in) and you can get memory leak changes as well (depending on whether CSE fires or not). For me, allowing that much semantic variability based on whether rules fire or not is unacceptable. This is why I disabled the rewrite rules in Data.List.Extras.LazyLength[1] (which, granted, is even worse since infinite lists means that you can get domain semantic differences as well). I'd rather advocate for having smart/lazy size comparison functions like Data.List.LazyLength offers, advertising them well, and then relying on users to say what they mean. [1] http://hackage.haskell.org/packages/archive/list-extras/0.4.0.1/doc/html/Data-List-Extras-LazyLength.html -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 23 February 2011 05:31, Johan Tibell johan.tib...@gmail.com wrote: Can someone come up with a real world example where O(1) size is important? Tangentially - if you changed the API so the size function was called 'count' rather than 'size' or 'length', there would be no shame what's so ever in not being O(1). Problem solved :-) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 23 February 2011 05:31, Johan Tibell johan.tib...@gmail.com wrote: On Tue, Feb 22, 2011 at 9:19 PM, Johan Tibell johan.tib...@gmail.com wrote: Initial numbers suggest that lookup gets 3% slower and insert/delete 6% slower. The upside is O(1) size. Can someone come up with a real world example where O(1) size is important? I'm a bit sceptical that it is (I was not convinced by the earlier strict-set-inclusion argument, since that's another Data.Map feature I've never used). I thought of some other possibilities though: 1. If copying an unordered-collection to a flat array you can improve the constant factors (not the asymptotics) with O(1) size to pre-allocate the array 2. If building a map in a fixed point loop (I personally do this a lot) where you know that the key uniquely determines the element, you can test if a fixed point is reached in O(1) by just comparing the sizes. Depending what you are taking a fixed point of, this may change the asymptotics 3. Some map combining algorithms work more efficiently when one of their two arguments are smaller. For example, Data.Map.union is most efficient for (bigmap `union` smallmap). If you don't care about which of the two input maps wins when they contain the same keys, you can improve constant factors by testing the size of the map input to size (in O(1)) and flipping the arguments if you got (smallmap `union` bigmap) instead of the desirable way round. Personally I don't find any of these *particularly* compelling. But a ~6% slowdown for this functionality is not too bad - have you had a chance to look at the core to see if the cause of the slowdown manifests itself at that level? Perhaps it is possible to tweak the code to make this cheaper. Also, what was the size of the collections you used in your benchmark? I would expect the relative cost of maintaining the size to get lower as you increased the size of the collection. Max ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
I took a rapid look and they seem a replacement for pure Map's, but not for mutable HashTable's. Sorry if it isn't the case. I don´t know if Data.HashTable has improved, but the performance used to be very poor in comparison with other languages. The point is that pure data structures can not be used as shared data in multithreaded environments. They must be encapsulated in mutable blocking references, such is MVars, and the whole update process blocks any other thread . (it is worst when using TVars) So they are not a replacement for mutable data structures such is Data.HashTable. For this reason I think that an inprovement/mutable-replacement of Data.HashTable is needed. If this hasn´t been done already. Are there some improvements on it that I don't know? 2011/2/23 Max Bolingbroke batterseapo...@hotmail.com On 23 February 2011 05:31, Johan Tibell johan.tib...@gmail.com wrote: On Tue, Feb 22, 2011 at 9:19 PM, Johan Tibell johan.tib...@gmail.com wrote: Initial numbers suggest that lookup gets 3% slower and insert/delete 6% slower. The upside is O(1) size. Can someone come up with a real world example where O(1) size is important? I'm a bit sceptical that it is (I was not convinced by the earlier strict-set-inclusion argument, since that's another Data.Map feature I've never used). I thought of some other possibilities though: 1. If copying an unordered-collection to a flat array you can improve the constant factors (not the asymptotics) with O(1) size to pre-allocate the array 2. If building a map in a fixed point loop (I personally do this a lot) where you know that the key uniquely determines the element, you can test if a fixed point is reached in O(1) by just comparing the sizes. Depending what you are taking a fixed point of, this may change the asymptotics 3. Some map combining algorithms work more efficiently when one of their two arguments are smaller. For example, Data.Map.union is most efficient for (bigmap `union` smallmap). If you don't care about which of the two input maps wins when they contain the same keys, you can improve constant factors by testing the size of the map input to size (in O(1)) and flipping the arguments if you got (smallmap `union` bigmap) instead of the desirable way round. Personally I don't find any of these *particularly* compelling. But a ~6% slowdown for this functionality is not too bad - have you had a chance to look at the core to see if the cause of the slowdown manifests itself at that level? Perhaps it is possible to tweak the code to make this cheaper. Also, what was the size of the collections you used in your benchmark? I would expect the relative cost of maintaining the size to get lower as you increased the size of the collection. Max ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 12:30 PM, Alberto G. Corona agocor...@gmail.com wrote: For this reason I think that an inprovement/mutable-replacement of Data.HashTable is needed. If this hasn´t been done already. Are there some improvements on it that I don't know? I've been working on one lately, some preliminary benchmarks: https://gist.github.com/826935 It's probably a month or two away from a releasable state, but my work-in-progress is substantially faster (4-6X) than Data.Hashtable for inserts and lookups. G -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
Johan Tibell wrote: I'm working on a patch that provides O(1) size right now. The trick is to define HashMap as: data HashMap k v = HM {-# UNPACK #-} !Int !(Tree k v) Another possibility is: data HashMap k v = HM Int !(Tree k v) hashMap t = HM (treeSize t) t That way size is O(n) on first use but O(1) afterwards. Then again, if someone really needs this they can program it themselves. I've never needed an O(1) size for maps. Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Sat, Feb 19, 2011 at 4:38 AM, Johan Tibell johan.tib...@gmail.com wrote: Hi all, I am delighted to announce the release of preview versions of two new packages: unordered-containers 0.1 Efficient hashing-based container types. http://hackage.haskell.org/package/unordered-containers hashable 1.1 Efficient hashing of common types. http://hackage.haskell.org/package/hashable These packages address a need for faster container types in Haskell, especially for string types like ByteString and Text. The package achieves better performance by using hashing and a Patricia trie (also known as a radix tree), instead of a balanced tree, in its implementation. The provided HashMap type is three times faster than Data.Map for lookups and twice as fast for inserts, for all key types I've tried, including ByteString, String, and Int. I'm calling this a preview release, even though the code hash been well tested both for correctness and performance, as the API is quite basic and doesn't provide a HashSet type yet. One thing you'll notice is that the API is split into a value-strict and a value-lazy version, making it easier to use in a strict setting. If you want to contribute, please get copies of the source trees from here: * git clone https://github.com/tibbe/unordered-containers.git * git clone https://github.com/tibbe/hashable.git Alternatively, just fork the projects on GitHub: http://github.com/tibbe As I mentioned in my talk on hashing-based containers earlier this week, I'm working on an even faster version, based on hash-array mapped tries, but it won't be ready until GHC 7.2. Johan What about making Hashable subclass of Eq class Eq a = Hashable a where I think it's is obvious from Hasable class properties that some kind of equality is needed and I think it will reduce type class constraints in actual hash table implementation in unordered-containers package. Also I think that value of hash functions is obviously a Monoid and it will be convenient to have Monoid instance newtype Hash = Hash Int instance Monoid Hash where mempty = Hash 0 Hash a `mappend` Hash b = Hash (a `combine` b) class Eq a = Hashable a where hash :: a - Hash hashWithSalt :: Hash - a - Hash hashWithSalt salt x = salt `mappend` hash x -- Victor Nazarov ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
2011-02-23 13:56, Victor Nazarov skrev: Also I think that value of hash functions is obviously a Monoid and it will be convenient to have Monoid instance newtype Hash = Hash Int instance Monoid Hash where mempty = Hash 0 Hash a `mappend` Hash b = Hash (a `combine` b) class Eq a = Hashable a where hash :: a - Hash hashWithSalt :: Hash - a - Hash hashWithSalt salt x = salt `mappend` hash x Monoid would be a good idea if combine was associative :) Prelude Data.Hashable let a = hash a Prelude Data.Hashable let b = hash b Prelude Data.Hashable let c = hash c Prelude Data.Hashable (a `combine` b) `combine` c 198573605 Prelude Data.Hashable a `combine` (b `combine` c) 177445 / Emil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 23 February 2011 12:05, Gregory Collins g...@gregorycollins.net wrote: I've been working on one lately, some preliminary benchmarks: https://gist.github.com/826935 It's probably a month or two away from a releasable state, but my work-in-progress is substantially faster (4-6X) than Data.Hashtable for inserts and lookups. Sounds cool. Can you use your HashTable in the ST monad? I have often found myself wanting a mutable hashtable as a subcomponent in an externally-pure computation. Max ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 3:49 PM, Max Bolingbroke batterseapo...@hotmail.com wrote: On 23 February 2011 12:05, Gregory Collins g...@gregorycollins.net wrote: I've been working on one lately, some preliminary benchmarks: https://gist.github.com/826935 It's probably a month or two away from a releasable state, but my work-in-progress is substantially faster (4-6X) than Data.Hashtable for inserts and lookups. Sounds cool. Can you use your HashTable in the ST monad? I have often found myself wanting a mutable hashtable as a subcomponent in an externally-pure computation. Yep, all of the functions are currently in ST. G -- Gregory Collins g...@gregorycollins.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
Thanks for the examples. Point 3 is interesting but most of the gain there could probably be had by telling the user to use (bigmap `union` smallmap). My guess is that the user has a good idea which argument is larger/smaller. On Wed, Feb 23, 2011 at 12:45 AM, Max Bolingbroke batterseapo...@hotmail.com wrote: Personally I don't find any of these *particularly* compelling. But a ~6% slowdown for this functionality is not too bad - have you had a chance to look at the core to see if the cause of the slowdown manifests itself at that level? Perhaps it is possible to tweak the code to make this cheaper. I did look at the core for test :: HashMap Int Int - HashMap Int Int test hm = insert 42 12 hm And I didn't see anything that looked particularly bad. The core uses unboxed values everywhere possible and the recursive function (i.e. inserts) returns an unboxed pair (# Int#, Tree k v #). The code for O(1) size i here: https://github.com/tibbe/unordered-containers/commit/77bc43acad2eb6b29472cd6ea21c3efb677cae0d Also, what was the size of the collections you used in your benchmark? I would expect the relative cost of maintaining the size to get lower as you increased the size of the collection. I tried with 2^12. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 3:30 AM, Alberto G. Corona agocor...@gmail.com wrote: The point is that pure data structures can not be used as shared data in multithreaded environments. They must be encapsulated in mutable blocking references, such is MVars, and the whole update process blocks any other thread . (it is worst when using TVars) So they are not a replacement for mutable data structures such is Data.HashTable. Depends on your use case. With one writer and multiple readers an IORef containing a persistent data structure works well. Readers do not block writers and can they execute concurrently. HashTable is not a concurrent data structure. You need e.g. a lock free mutable hash table. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 4:56 AM, Victor Nazarov asviraspossi...@gmail.com wrote: What about making Hashable subclass of Eq class Eq a = Hashable a where I think it's is obvious from Hasable class properties that some kind of equality is needed and I think it will reduce type class constraints in actual hash table implementation in unordered-containers package. The documentation could perhaps be worded better: if 'a' is also an instance of 'Eq', then the stated property should hold. You can imagine things that can be hashed by not compared for equality. Also, I don't know if type classes are a good way to model sub-typing relationships (see Oleg's email about the Functor/Applicative/Monad proposal). Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 11:08 AM, Johan Tibell johan.tib...@gmail.com wrote: ... HashTable is not a concurrent data structure. You need e.g. a lock free mutable hash table. Good implementations of which are *not* thick on the ground. Even java.util.concurrent isn't fully lock-free. -Jan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 23 February 2011 16:03, Johan Tibell johan.tib...@gmail.com wrote: Thanks for the examples. Point 3 is interesting but most of the gain there could probably be had by telling the user to use (bigmap `union` smallmap). My guess is that the user has a good idea which argument is larger/smaller. Totally agreed - this matches my experience with Map/Set. And I didn't see anything that looked particularly bad. The core uses unboxed values everywhere possible and the recursive function (i.e. inserts) returns an unboxed pair (# Int#, Tree k v #). Right. I was wondering if you were returning (Int, Tree k v), in which case CPR wouldn't unbox the Int - but I see you already thought of that issue. By the way, do you plan to add a HashSet to complement HashMap? Thanks for all the work on the library! Max ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 12:46 PM, Max Bolingbroke batterseapo...@hotmail.com wrote: By the way, do you plan to add a HashSet to complement HashMap? Yes, definitely. I want to flesh out the HashMap API some more and then add a HashSet implementation. The code is still moving around a lot (e.g. between modules) and I like it to settle down a bit before I create HashSet (to avoid having more code to move around). Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 12:57 PM, Gwern Branwen gwe...@gmail.com wrote: On Wed, Feb 23, 2011 at 12:45 AM, Max Bolingbroke batterseapo...@hotmail.com wrote: I'm a bit sceptical that it is (I was not convinced by the earlier strict-set-inclusion argument, since that's another Data.Map feature I've never used). I thought of some other possibilities though: 1. If copying an unordered-collection to a flat array you can improve the constant factors (not the asymptotics) with O(1) size to pre-allocate the array For kicks, I grepped my local repos for users of Data.Set's size function. My results are imperfect (I have few Github repos, I didn't check .lhs files, I only grepped files for 'Set.size' or 'S.size'), but I found around 100 uses of Data.Set.size. Could you manually look at some of them to see if you find something interesting. In particular `Set.size s == 0` (a common use of size in imperative languages) could be replaced by `Set.null s`. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 1:18 PM, Johan Tibell johan.tib...@gmail.com wrote: Could you manually look at some of them to see if you find something interesting. In particular `Set.size s == 0` (a common use of size in imperative languages) could be replaced by `Set.null s`. You could look at them yourself; I attached the files. I see 6 uses out of ~100 which involve an == 0 -- gwern http://www.gwern.net ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 1:27 PM, Gwern Branwen gwe...@gmail.com wrote: You could look at them yourself; I attached the files. I see 6 uses out of ~100 which involve an == 0 Looks like the mailing list gateway didn't let your attachements through. No need to attach them though, I can just grep Hackage for uses. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
A quick grep of some of my own source reveals that I've used M.size and S.size only to test for sizes equal to 1. So, for my purposes at least, an O(1) isSingleton operation would be just as useful as an O(1) size. Cheers, Sterl On Feb 23, 2011, at 4:32 PM, Johan Tibell wrote: On Wed, Feb 23, 2011 at 1:27 PM, Gwern Branwen gwe...@gmail.com wrote: You could look at them yourself; I attached the files. I see 6 uses out of ~100 which involve an == 0 Looks like the mailing list gateway didn't let your attachements through. No need to attach them though, I can just grep Hackage for uses. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 23 February 2011 21:27, Gwern Branwen gwe...@gmail.com wrote: On Wed, Feb 23, 2011 at 1:18 PM, Johan Tibell johan.tib...@gmail.com wrote: Could you manually look at some of them to see if you find something interesting. In particular `Set.size s == 0` (a common use of size in imperative languages) could be replaced by `Set.null s`. You could look at them yourself; I attached the files. I see 6 uses out of ~100 which involve an == 0 Thanks for bringing some data to the table. There are definitely some common patterns in what you sent me: 1) For defining Binary instances, you need to write set size before you write the elements: ~7 occurrences 2) Tests against small constants (typically = 0 or 1, but also 2 and 3!): ~15 occurrences 3) A surprise to me: generating fresh names! People keep a set of all names generated so far, and then just take size+1 as their fresh name. Nice trick. ~17 occurrences 4) Turning sizes into strings for human consumption: ~19 occurrences 5) Just reexporting the functions somehow. Uninformative. ~8 occurrences There were ~38 occurrences over ~13 repos where it appeared to be somehow fundamental to an algorithm (I didn't look into most of these in detail). I've put those after my message. Frankly I am surprised how much size gets used. It seems that making it fast is more important than I thought. Cheers, Max === Fundamental-looking occurrences: bin/folkung/folkung/Clausify.hs: siz (And ps)= S.size ps bin/folkung/folkung/Paradox/Flatten.hs:| S.size cl = 1 + S.size bs bin/folkung/folkung/Paradox/Flatten.hs: || S.size cl = S.size cl' = largestClique cl gr' bin/folkung/folkung/Paradox/Flatten.hs: -- S.size (free xs) = 1 bin/folkung/folkung/Paradox/Flatten.hs:n = S.size (free ls) bin/folkung/folkung/Paradox/Flatten.hs: , S.size ws n-1 bin/folkung/folkung/Paradox/Flatten.hs:(S.size s1,tpsize v2,inter s2) `compare` (S.size s2,tpsize v1,inter s1) bin/folkung/folkung/Paradox/Flatten.hs:sum [ S.size (s `S.intersection` vs) | (v,vs) - cons, v `S.member` s ] bin/folkung/folkung/Paradox/Flatten.hs:, S.size ws' S.size freeRight-1 bin/folkung/folkung/Paradox/Solve.hs: degree x = S.size . free $ x bin/gf/src/compiler/GF/Compile/GeneratePMCFG.hs: | product (map Set.size ys) == count bin/gf/src/compiler/GF/Speech/CFGToFA.hs:indeg (c,_) = maybe 0 Set.size $ Map.lookup c ub bin/gf/src/compiler/GF/Speech/CFGToFA.hs: where (fa', ns) = newStates (replicate (Set.size cs) ()) fa bin/gf/src/GF/Compile/GeneratePMCFG.hs: | product (map Set.size ys) == count = bin/gf/src/GF/Compile/GeneratePMCFGOld.hs: | product (map Set.size ys) == count = bin/gf/src/GF/Data/MultiMap.hs:size = sum . Prelude.map Set.size . Map.elems bin/gf/src/GF/Speech/CFGToFA.hs:indeg (c,_) = maybe 0 Set.size $ Map.lookup c ub bin/gf/src/GF/Speech/CFGToFA.hs: where (fa', ns) = newStates (replicate (Set.size cs) ()) fa bin/halfs/Halfs/FSRoot.hs: $ Set.size $ fsRootAllInodeNums fsroot bin/halfs/Halfs.hs:when (Set.size freeSet /= length freeList) ( bin/hcorpus/hcorpus.hs: let rank = 1 `fidiv` Set.size wrds, bin/hoogle/src/Hoogle/DataBase/TypeSearch/Binding.hs:g (l, vs) = Just $ [restrict|isJust l] ++ replicate (max 0 $ Set.size vs - 1) var bin/htab/src/Formula.hs:countNominals f = Set.size $ extractNominals f bin/hylores-2.5/src/HyLoRes/Clause/BasicClause.hs:size = Set.size . toFormulaSet bin/hylores-2.5/src/HyLoRes/Subsumption/ClausesByFormulaIndex.hs: let sortedCandidates = sortBy (compareUsing Set.size) subsCandidates bin/hylores-diego/src/HyLoRes/Clause/BasicClause.hs:size = Set.size . toFormulaSet bin/hylores-diego/src/HyLoRes/Subsumption/ClausesByFormulaIndex.hs: let sortedCandidates = sortBy (compareUsing Set.size) subsCandidates bin/ipclib/Language/Pepa/Compile/States.hs:Just limit - ((stateSpaceSize seen) + (Set.size stack)) limit bin/jhc/src/Grin/SSimplify.hs:v n | n `IS.member` s = v (1 + n + IS.size s) bin/jhc/src/Ho/Build.hs:maxModules - Set.size `fmap` countNodes cn bin/jhc/src/Ho/Build.hs:maxModules - Set.size `fmap` countNodes cn bin/lhc/src/Grin/Optimize/CallPattern.hs: nPatterns = Set.size callPatterns bin/proteinvis/Graphics/Visualization/Tree/Geometry.hs:where theta = ((fromIntegral index - (fromIntegral (S.size . fst . S.split (Name name) $ missingLeaves))) * 2 * 3.14157) / (num_leaves - fromIntegral (S.size missingLeaves)) bin/proteinvis/ProgramState.hs: c = S.size go bin/proteinvis/Protein.hs: , term_count = S.size terms bin/proteinvis/Protein.hs: , term_count = S.size terms bin/protocol-buffers/hprotoc/Text/ProtocolBuffers/ProtoCompile/Resolve.hs: when (Set.size numbers /= Seq.length (D.DescriptorProto.field dp)) $
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Wed, Feb 23, 2011 at 1:55 PM, Max Bolingbroke batterseapo...@hotmail.com wrote: Thanks for bringing some data to the table. There are definitely some common patterns in what you sent me: 1) For defining Binary instances, you need to write set size before you write the elements: ~7 occurrences 2) Tests against small constants (typically = 0 or 1, but also 2 and 3!): ~15 occurrences 3) A surprise to me: generating fresh names! People keep a set of all names generated so far, and then just take size+1 as their fresh name. Nice trick. ~17 occurrences 4) Turning sizes into strings for human consumption: ~19 occurrences 5) Just reexporting the functions somehow. Uninformative. ~8 occurrences There were ~38 occurrences over ~13 repos where it appeared to be somehow fundamental to an algorithm (I didn't look into most of these in detail). I've put those after my message. Frankly I am surprised how much size gets used. It seems that making it fast is more important than I thought. Nice analysis. Does this apply to maps as well as sets or are sets use differently than maps somehow? IntMap (which shares data structure with HashMap) only hash O(n) size. I wonder if people avoid using IntMap because of this. I wonder if there's a way to implement size that doesn't mess up the code so badly (see the commit on GitHub to see how badly it messes up e.g. insert). Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
Attached are all the uses of S.size and Set.size from a semi-recent snapshot of Hackage. Johan Combinatorrent/0.3.2/Combinatorrent-0.3.2/src/Process/Peer.hs:let sz = S.size q Combinatorrent/0.3.2/Combinatorrent-0.3.2/src/Process/PieceMgr.hs: ipHave = S.size . ipHaveBlocks Combinatorrent/0.3.2/Combinatorrent-0.3.2/src/Process/PieceMgr.hs:when ( (S.size $ ipHaveBlocks ipp) = ipDone ipp) cpsa/2.2.1/cpsa-2.2.1/src/CPSA/Graph/Tree.hs: if S.size new == S.size old then darcswatch/0.4.3/darcswatch-0.4.3/src/HTML.hs: show (S.size (r2p d r)) +++ EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Coll/UnbalancedSet.hs:unsafeFromOrdSeq xs = fst (ins xs (S.size xs)) EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/RevSeq.hs:fromSeq xs = N (S.size xs - 1) xs EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/RevSeq.hs:k = S.size ys EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/RevSeq.hs:k = S.size ys EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/RevSeq.hs:structuralInvariant (N i s) = i == ((S.size s) - 1) EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/SizedSeq.hs:fromSeq xs = N (S.size xs) xs EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/SizedSeq.hs:m = S.size ys EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/SizedSeq.hs:m = S.size ys EdisonCore/1.2.1.3/EdisonCore-1.2.1.3/src/Data/Edison/Seq/SizedSeq.hs:structuralInvariant (N i s) = i == S.size s fountain/0.0.1/fountain-0.0.1/Codec/Fountain.hs:| S.size s == degree = (s, g) fountain/0.0.1/fountain-0.0.1/Codec/Fountain.hs: | S.size indices == 0 = decode' decoder newDroplets fountain/0.0.1/fountain-0.0.1/Codec/Fountain.hs: | S.size indices == 1 = decode' (Decoder messageLength extraSymbols (M.insert (head $ S.toList indices) symbol symbols) old) (new ++ newDroplets) hashmap/1.1.0/hashmap-1.1.0/Data/HashSet.hs:some_size (More t) = S.size t hashmap/1.1.0/hashmap-1.1.0/Data/HashSet.hs:some_norm s = case S.size s of 0 - Nothing hashmap/1.1.0/hashmap-1.1.0/Data/HashSet.hs:some_norm' s = case S.size s of 1 - Only $ S.findMin s HaskellForMaths/0.3.1/HaskellForMaths-0.3.1/Math/Algebra/Group/PermutationGroup.hs:order gs = S.size $ eltsS gs -- length $ elts gs HaskellForMaths/0.3.1/HaskellForMaths-0.3.1/Math/Combinatorics/LatinSquares.hs:isOneOfEach xs = length xs == S.size (S.fromList xs) HaskellTorrent/0.1.1/HaskellTorrent-0.1.1/src/Process/Peer.hs:let sz = S.size q HaskellTorrent/0.1.1/HaskellTorrent-0.1.1/src/Process/PieceMgr.hs: ipHave = S.size . ipHaveBlocks HaskellTorrent/0.1.1/HaskellTorrent-0.1.1/src/Process/PieceMgr.hs:when ( (S.size $ ipHaveBlocks ipp) = ipDone ipp) hburg/1.1.2/hburg-1.1.2/src/Csa/Csa.hs: ' - expected type++ (if (S.size p 1) then s else ) ++ hgom/0.6/hgom-0.6/Gom/Checker.hs:f s = S.size s 1 Holumbus-MapReduce/0.1.1/Holumbus-MapReduce-0.1.1/Examples/MapReduce/Crawler/Crawl.hs: runX (traceMsg 1 ( Status: already processed: ++ show (S.size $ cs_wereProcessed cs) ++ Holumbus-MapReduce/0.1.1/Holumbus-MapReduce-0.1.1/Examples/MapReduce/Crawler/Crawl.hs: , to be processed:++ show (S.size $ cs_toBeProcessed cs))) hoopl/3.8.6.0/hoopl-3.8.6.0/Compiler/Hoopl/Unique.hs: setSize (US s) = S.size s hs2bf/0.6.2/hs2bf-0.6.2/SAM.hs:when (S.size rs/=length args) $ report duplicate arguments ideas/0.7/ideas-0.7/src/Domain/Math/Polynomial/RationalExercises.hs: S.size (varSet expr) 1 ideas/0.7/ideas-0.7/src/Domain/Math/Polynomial/RationalExercises.hs: manyVars = S.size (varSet a `S.union` varSet b) 1 mahoro/0.1.2/mahoro-0.1.2/DB.hs:delJID (n, jids) = if S.size jids == 1 minesweeper/0.9/minesweeper-0.9/Data/ChangeSet.hs:size= S.size . toSet minesweeper/0.9/minesweeper-0.9/State/Functions.hs:= mines (configuration st) - S.size (marked $ game st) minesweeper/0.9/minesweeper-0.9/Table.hs:= S.size (marked g) + M.size (revealResults g) == msize c minesweeper/0.9/minesweeper-0.9/Table.hs:= mines c - S.size (marked g) panda/2009.4.1/panda-2009.4.1/src/Panda/Model/Tag.hs:sorted xs= xs.sortBy(compare_by (resources S.size)).reverse phybin/0.1.2/phybin-0.1.2/Bio/Phylogeny/PhyBin/Main.hs:putStrLn$ \nTotal unique taxa (++ show (S.size taxa) ++): ++ regex-tdfa/1.1.7/regex-tdfa-1.1.7/Data/IntSet/EnumSet2.hs:size (EnumSet s) = S.size s relacion/0.1/relacion-0.1/Data/Relacion.hs:size r = M.fold ((+) . S.size) 0 (dominio r) repa/1.1.0.0/repa-1.1.0.0/Data/Array/Repa.hs: $! U.enumFromTo (0 :: Int) (S.size sh - 1) repa/1.1.0.0/repa-1.1.0.0/Data/Array/Repa.hs: | U.length uarr /= S.size sh repa/1.1.0.0/repa-1.1.0.0/Data/Array/Repa.hs: , size of shape = ++ (show $ S.size sh) ++ \n repa/1.1.0.0/repa-1.1.0.0/Data/Array/Repa.hs: $ (flip reshape) (Z :. (S.size $ extent arr1))
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 2/23/11 4:42 PM, Sterling Clover wrote: A quick grep of some of my own source reveals that I've used M.size and S.size only to test for sizes equal to 1. So, for my purposes at least, an O(1) isSingleton operation would be just as useful as an O(1) size. I agree, a fast isSingleton function would cover a very common use case--- especially for set-like containers. -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman wasserman.lo...@gmail.com wrote: size takes O(n). That's just depressing. Really. That's rather thoughtless wording for some code that's (a) free (b) faster than anything else currently available (c) in its very first release and (d) available in source form for you to improve as you see fit. Just depressing. Really. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
bos: On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman [1]wasserman.lo...@gmail.com wrote: size takes O(n). That's just depressing. Really. That's rather thoughtless wording for some code that's (a) free (b) faster than anything else currently available (c) in its very first release and (d) available in source form for you to improve as you see fit. Just depressing. Really. Agreed. This is open source: patches or it stays at O(n). -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
Apologies! I was, in point of fact, working on such a patch, but I think I've been convinced that it's a legitimate position for a package to take. (I'm also curious, Johann, about the approach you figured out that didn't cost a word per node!) That said, I've been working with somewhat similar structures for my TrieMap package, and I hadn't honestly considered the possibility of having size run in anything but O(1). (When I was comparing benchmarks with bytestring-trie, I didn't realize for a few hours that all of my benchmarks called size, so O(W) operations were being benchmarked at O(n). That threw me off..) Louis Wasserman wasserman.lo...@gmail.com http://profiles.google.com/wasserman.louis On Tue, Feb 22, 2011 at 4:28 PM, Don Stewart d...@galois.com wrote: bos: On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman [1]wasserman.lo...@gmail.com wrote: size takes O(n). That's just depressing. Really. That's rather thoughtless wording for some code that's (a) free (b) faster than anything else currently available (c) in its very first release and (d) available in source form for you to improve as you see fit. Just depressing. Really. Agreed. This is open source: patches or it stays at O(n). -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Tue, Feb 22, 2011 at 6:28 PM, Louis Wasserman wasserman.lo...@gmail.com wrote: Apologies! Accepted. :) I was, in point of fact, working on such a patch, but I think I've been convinced that it's a legitimate position for a package to take. (I'm also curious, Johann, about the approach you figured out that didn't cost a word per node!) I'm working on a patch that provides O(1) size right now. The trick is to define HashMap as: data HashMap k v = HM {-# UNPACK #-} !Int !(Tree k v) data Tree k v = Nil | Tip {-# UNPACK #-} !Hash {-# UNPACK #-} !(FL.FullList k v) | Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(Tree k v) !(Tree k v) And have insert, delete, and other operations that change the size return both the updated tree and the size delta (-1, 0, or 1 in the case of insert/delete). The size delta is then applied to the stored size. At the moment I'm trying to figure out the performance impact of making this change. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
Here's the approach I was attempting: a while back there was a proposal to modify Data.Map to support a sort of zipper, including the following API: data Location k a -- represents a map with a hole at a particular key search :: k - Map k a - (Maybe a, Location k a) key :: Location k a - k assign :: a - Location k a - Map k a clear :: Location k a - Map k a and from here you might define insert :: k - a - Map k a - Map k a insert k a m = case search k m of (_, loc) - assign a loc The surprising thing was that this approach *increased* the speed of the Map implementation by something like 7%. I don't know what happened to the original proposal, but I was going to try something similar, with something like data HashMap k v = HM !Int !(Tree k v) insert k v (HM sz tree) = case search k tree of (Nothing, loc) - HM (sz + 1) (assign v loc) (Just _, loc) - HM sz (assign v loc) Louis Wasserman wasserman.lo...@gmail.com http://profiles.google.com/wasserman.louis On Tue, Feb 22, 2011 at 8:45 PM, Johan Tibell johan.tib...@gmail.comwrote: On Tue, Feb 22, 2011 at 6:28 PM, Louis Wasserman wasserman.lo...@gmail.com wrote: Apologies! Accepted. :) I was, in point of fact, working on such a patch, but I think I've been convinced that it's a legitimate position for a package to take. (I'm also curious, Johann, about the approach you figured out that didn't cost a word per node!) I'm working on a patch that provides O(1) size right now. The trick is to define HashMap as: data HashMap k v = HM {-# UNPACK #-} !Int !(Tree k v) data Tree k v = Nil | Tip {-# UNPACK #-} !Hash {-# UNPACK #-} !(FL.FullList k v) | Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(Tree k v) !(Tree k v) And have insert, delete, and other operations that change the size return both the updated tree and the size delta (-1, 0, or 1 in the case of insert/delete). The size delta is then applied to the stored size. At the moment I'm trying to figure out the performance impact of making this change. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Tue, Feb 22, 2011 at 6:53 PM, Louis Wasserman wasserman.lo...@gmail.com wrote: Here's the approach I was attempting: a while back there was a proposal to modify Data.Map to support a sort of zipper, including the following API: data Location k a -- represents a map with a hole at a particular key search :: k - Map k a - (Maybe a, Location k a) key :: Location k a - k assign :: a - Location k a - Map k a clear :: Location k a - Map k a and from here you might define insert :: k - a - Map k a - Map k a insert k a m = case search k m of (_, loc) - assign a loc The surprising thing was that this approach *increased* the speed of the Map implementation by something like 7%. I'm pretty sure it was a decrease overall. The function under discussion was something like insertWith. The direct implementation of insertWith was the fastest. The zipper approach was 7% faster than using a naive composition of lookup and insert to implement insertWith. The slowdown was most likely caused by the extra O(log n) allocations required to create the zipper (in essence that means reify the call stack of lookup). Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
Hmmmkay. I'm not entirely convinced, 'cause I've seen genuine (and benchmarked) speedups in applying some of these ideas to my TrieMap library, but I don't have any examples handy. Louis Wasserman wasserman.lo...@gmail.com http://profiles.google.com/wasserman.louis On Tue, Feb 22, 2011 at 9:10 PM, Johan Tibell johan.tib...@gmail.comwrote: On Tue, Feb 22, 2011 at 6:53 PM, Louis Wasserman wasserman.lo...@gmail.com wrote: Here's the approach I was attempting: a while back there was a proposal to modify Data.Map to support a sort of zipper, including the following API: data Location k a -- represents a map with a hole at a particular key search :: k - Map k a - (Maybe a, Location k a) key :: Location k a - k assign :: a - Location k a - Map k a clear :: Location k a - Map k a and from here you might define insert :: k - a - Map k a - Map k a insert k a m = case search k m of (_, loc) - assign a loc The surprising thing was that this approach *increased* the speed of the Map implementation by something like 7%. I'm pretty sure it was a decrease overall. The function under discussion was something like insertWith. The direct implementation of insertWith was the fastest. The zipper approach was 7% faster than using a naive composition of lookup and insert to implement insertWith. The slowdown was most likely caused by the extra O(log n) allocations required to create the zipper (in essence that means reify the call stack of lookup). Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Tue, Feb 22, 2011 at 6:45 PM, Johan Tibell johan.tib...@gmail.com wrote: I'm working on a patch that provides O(1) size right now. The trick is to define HashMap as: data HashMap k v = HM {-# UNPACK #-} !Int !(Tree k v) data Tree k v = Nil | Tip {-# UNPACK #-} !Hash {-# UNPACK #-} !(FL.FullList k v) | Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask !(Tree k v) !(Tree k v) And have insert, delete, and other operations that change the size return both the updated tree and the size delta (-1, 0, or 1 in the case of insert/delete). The size delta is then applied to the stored size. At the moment I'm trying to figure out the performance impact of making this change. Johan Initial numbers suggest that lookup gets 3% slower and insert/delete 6% slower. The upside is O(1) size. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Tue, Feb 22, 2011 at 9:19 PM, Johan Tibell johan.tib...@gmail.com wrote: Initial numbers suggest that lookup gets 3% slower and insert/delete 6% slower. The upside is O(1) size. Can someone come up with a real world example where O(1) size is important? Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
If you want to use the library and need a short term fix, just write a small wrapper type/module newtype SizedMap = SizedMap (Int, HashMap) and track the size yourself. only complication is that on inserts and deletes you'll need to check if the key existed. other than that, it shouldn't be too difficult. This way, the library stays super optimized but, if you need, you can track the size. As Johan said, it would slow down insert and delete a bit. shouldn't affect lookup though.. max On Feb 20, 2011, at 11:40 AM, Louis Wasserman wrote: I'd like to complain about that, too ;) Louis Wasserman wasserman.lo...@gmail.com http://profiles.google.com/wasserman.louis On Sat, Feb 19, 2011 at 9:02 PM, Edward Kmett ekm...@gmail.com wrote: On Sat, Feb 19, 2011 at 7:27 PM, Sterling Clover s.clo...@gmail.com wrote: On Sat, Feb 19, 2011 at 3:04 PM, Johan Tibell johan.tib...@gmail.com wrote: On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman wasserman.lo...@gmail.com wrote: A couple thoughts: size takes O(n). That's just depressing. Really. This applies to all the container types. We could support O(1) size at the cost of slowing down e.g lookup, insert, and delete a little bit. I haven't measure how much yet. Would it be worth it? Getting a bit picky, but for the record, Data.Map and Data.Sequence provide O(1) size, and Data.HashTable I believe stores the information but doesn't expose it from its tiny API. That's not an argument either way for what a HashMap should do, however :-) NB: Data.IntMap, which Data.HashMap is based on, actually only provides O(n) size. -Edward ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Mon, Feb 21, 2011 at 12:58 PM, Max Cantor mxcan...@gmail.com wrote: If you want to use the library and need a short term fix, just write a small wrapper type/module newtype SizedMap = SizedMap (Int, HashMap) and track the size yourself. only complication is that on inserts and deletes you'll need to check if the key existed. other than that, it shouldn't be too difficult. This way, the library stays super optimized but, if you need, you can track the size. As Johan said, it would slow down insert and delete a bit. shouldn't affect lookup though.. This isn't sufficient in all cases. How would you know the resulting size of a union or intersection? Cheers, -- Felipe. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Mon, Feb 21, 2011 at 8:07 AM, Felipe Almeida Lessa felipe.le...@gmail.com wrote: On Mon, Feb 21, 2011 at 12:58 PM, Max Cantor mxcan...@gmail.com wrote: If you want to use the library and need a short term fix, just write a small wrapper type/module newtype SizedMap = SizedMap (Int, HashMap) and track the size yourself. only complication is that on inserts and deletes you'll need to check if the key existed. other than that, it shouldn't be too difficult. This way, the library stays super optimized but, if you need, you can track the size. As Johan said, it would slow down insert and delete a bit. shouldn't affect lookup though.. This isn't sufficient in all cases. How would you know the resulting size of a union or intersection? Note that the library does not at present support union or intersection! There are various ways to deal with the problem without caching sizes at nodes, one of which is to count overlap during union or intersection operations (since this involves counting leaves that are visited during these operations). -Jan-Willem Maessen ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 20 February 2011 03:40, Louis Wasserman wasserman.lo...@gmail.com wrote: I'd like to complain about that, too ;) I'm curious, when do people find that they need a really fast way to get map size? I use them quite a lot, and almost never call length - and when I do, it is typically only to get some statistics about the map for user display - certainly not in an inner loop. I find it is still useful to have a way to quickly test if a map is empty (for e.g. fixed points that slowly grow a map by unioning in a new bit every time) but null works OK for that purpose. Cheers, Max ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
* Max Bolingbroke batterseapo...@hotmail.com [2011-02-21 14:57:08+] On 20 February 2011 03:40, Louis Wasserman wasserman.lo...@gmail.com wrote: I'd like to complain about that, too ;) I'm curious, when do people find that they need a really fast way to get map size? I use them quite a lot, and almost never call length - and when I do, it is typically only to get some statistics about the map for user display - certainly not in an inner loop. I find it is still useful to have a way to quickly test if a map is empty (for e.g. fixed points that slowly grow a map by unioning in a new bit every time) but null works OK for that purpose. For example, consider a situation where you have two sets (they can be maps as well), 'a' and 'b', such that by construction you know that 'a' is a subset (submap) of 'b'. Now, size a size b is a convenient O(1) way to test whether 'a' is a proper subset (submap) of 'b'. -- Roman I. Cheplyaka :: http://ro-che.info/ Don't worry what people think, they don't do it very often. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
Hi Johan, this work is highly appreciated, thanks! * Johan Tibell johan.tib...@gmail.com [2011-02-18 17:38:51-0800] Hi all, I am delighted to announce the release of preview versions of two new packages: unordered-containers 0.1 Efficient hashing-based container types. http://hackage.haskell.org/package/unordered-containers hashable 1.1 Efficient hashing of common types. http://hackage.haskell.org/package/hashable These packages address a need for faster container types in Haskell, especially for string types like ByteString and Text. The package achieves better performance by using hashing and a Patricia trie (also known as a radix tree), instead of a balanced tree, in its implementation. -- Roman I. Cheplyaka :: http://ro-che.info/ Don't worry what people think, they don't do it very often. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
A couple thoughts: size takes O(n). That's just depressing. Really. Do you think union, intersection, etc. could be supported? Louis Wasserman wasserman.lo...@gmail.com http://profiles.google.com/wasserman.louis ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman wasserman.lo...@gmail.com wrote: A couple thoughts: size takes O(n). That's just depressing. Really. This applies to all the container types. We could support O(1) size at the cost of slowing down e.g lookup, insert, and delete a little bit. I haven't measure how much yet. Would it be worth it? Do you think union, intersection, etc. could be supported? Louis Wasserman Definitely. I plan to add most of the Data.Map API. We cannot support the ordered operations (such as min) but we should be able to support the rest. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Sat, Feb 19, 2011 at 3:04 PM, Johan Tibell johan.tib...@gmail.com wrote: On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman wasserman.lo...@gmail.com wrote: A couple thoughts: size takes O(n). That's just depressing. Really. This applies to all the container types. We could support O(1) size at the cost of slowing down e.g lookup, insert, and delete a little bit. I haven't measure how much yet. Would it be worth it? Getting a bit picky, but for the record, Data.Map and Data.Sequence provide O(1) size, and Data.HashTable I believe stores the information but doesn't expose it from its tiny API. That's not an argument either way for what a HashMap should do, however :-) Cheers, Sterl. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On 2/18/11 8:38 PM, Johan Tibell wrote: Hi all, I am delighted to announce the release of preview versions of two new packages: unordered-containers 0.1 Efficient hashing-based container types. http://hackage.haskell.org/package/unordered-containers How does, or will, this package differ from Milan Straka's http://hackage.haskell.org/package/hashmap ? -- Live well, ~wren ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Sat, Feb 19, 2011 at 4:27 PM, Sterling Clover s.clo...@gmail.com wrote: Getting a bit picky, but for the record, Data.Map and Data.Sequence provide O(1) size, and Data.HashTable I believe stores the information but doesn't expose it from its tiny API. That's not an argument either way for what a HashMap should do, however :-) We could support O(1) size. I have a design that doesn't waste one word per node to store the size. I'll look into it as soon as I have time. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Sat, Feb 19, 2011 at 4:49 PM, wren ng thornton w...@freegeek.org wrote: How does, or will, this package differ from Milan Straka's http://hackage.haskell.org/package/hashmap ? It's a continuation of that work. The unordered-containers implementation is more space efficient and a bit faster. Milan also uses a Patricia trie in his implementation. I plan to switch to a hash-array mapped trie (HAMT) in the future. It's about 25% faster for lookups but at the moment quite a bit slower for inserts. Once we got some new faster array copy primops in GHC we can hopefully improve the insert time a lot. A HAMT also has much lower space overhead per entry. Johan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
On Sat, Feb 19, 2011 at 7:27 PM, Sterling Clover s.clo...@gmail.com wrote: On Sat, Feb 19, 2011 at 3:04 PM, Johan Tibell johan.tib...@gmail.com wrote: On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman wasserman.lo...@gmail.com wrote: A couple thoughts: size takes O(n). That's just depressing. Really. This applies to all the container types. We could support O(1) size at the cost of slowing down e.g lookup, insert, and delete a little bit. I haven't measure how much yet. Would it be worth it? Getting a bit picky, but for the record, Data.Map and Data.Sequence provide O(1) size, and Data.HashTable I believe stores the information but doesn't expose it from its tiny API. That's not an argument either way for what a HashMap should do, however :-) NB: Data.IntMap, which Data.HashMap is based on, actually only provides O(n) size. -Edward ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
I'd like to complain about that, too ;) Louis Wasserman wasserman.lo...@gmail.com http://profiles.google.com/wasserman.louis On Sat, Feb 19, 2011 at 9:02 PM, Edward Kmett ekm...@gmail.com wrote: On Sat, Feb 19, 2011 at 7:27 PM, Sterling Clover s.clo...@gmail.comwrote: On Sat, Feb 19, 2011 at 3:04 PM, Johan Tibell johan.tib...@gmail.com wrote: On Sat, Feb 19, 2011 at 11:58 AM, Louis Wasserman wasserman.lo...@gmail.com wrote: A couple thoughts: size takes O(n). That's just depressing. Really. This applies to all the container types. We could support O(1) size at the cost of slowing down e.g lookup, insert, and delete a little bit. I haven't measure how much yet. Would it be worth it? Getting a bit picky, but for the record, Data.Map and Data.Sequence provide O(1) size, and Data.HashTable I believe stores the information but doesn't expose it from its tiny API. That's not an argument either way for what a HashMap should do, however :-) NB: Data.IntMap, which Data.HashMap is based on, actually only provides O(n) size. -Edward ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANN: unordered-containers - a new, faster hashing-based containers library
What are your thoughts on iterative construction of maplike datastructures? Could something like builder work for maps, too? In the project I'm working on, I have a function that receives a bunch of YAML fragments and builds a big YAML map out of them. -- Jason Dusek Linux User #510144 | http://counter.li.org/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe