Re: [Chicken-users] ANN: persistent-hash-map 0.0.1
Hi Evan, Evan Hanson ev...@foldling.org writes: And also, thanks for the write-up about the library -- it's nice to have an idea of the motivation behind it, and the other tips (Chicken/performance-related) are helpful as well. Taking the time to do that is much appreciated. thanks a lot for your reply (same for Daniel and John, of course). This encourages me to do more write-ups like that in the future! Moritz ___ Chicken-users mailing list Chicken-users@nongnu.org https://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] ANN: persistent-hash-map 0.0.1
John Cowan co...@mercury.ccil.org writes: This looks brilliant. I have written some sketchy stuff that searches an a-list in the normal way, but if the a-list lookup fails and the tail is not the empty list but a SRFI 69 hash table, then search the hash table, on the assumption that there's a certain amount of stuff that has to be functional and then there are lots of global keys that are maybe mutable, maybe immutable, but anyway should be searched fast. This looks like a *much* better approach. Right, hacky work-arounds like the one you describe are what encouraged me to port this thing :-) Moritz ___ Chicken-users mailing list Chicken-users@nongnu.org https://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] ANN: persistent-hash-map 0.0.1
Fellow Chickeneers, [... snip great write-up... ] Cheers all around Moritz This looks really useful, thanks Moritz. And also, thanks for the write-up about the library -- it's nice to have an idea of the motivation behind it, and the other tips (Chicken/performance-related) are helpful as well. Taking the time to do that is much appreciated. Evan ___ Chicken-users mailing list Chicken-users@nongnu.org https://lists.nongnu.org/mailman/listinfo/chicken-users
Re: [Chicken-users] ANN: persistent-hash-map 0.0.1
Very much appreciated, thanks! On Sun, Feb 10, 2013 at 3:31 PM, Moritz Heidkamp mor...@twoticketsplease.de wrote: Fellow Chickeneers, I have finally gotten around to finishing my long standing plan of porting the useful persistent hash map data structure from Clojure to Chicken. Thankfully, ClojureScript grew its own implementation of it in the meantime which written (mostly) in ClojureScript itself so porting was way easier than with the original Java implementation. But first, let me explain why this data structure is so nice. In a nutshell it allows you to program in a functional style like you are used to with lists in Scheme. As you know, SRFI 69 style hash tables don't lend themselves very well to this style because they are mutable. Thus we Schemers often resort to alists when we need associative data structures which is fine in many cases but has (at least) two drawbacks: 1) Lookup, deletion, and insertion are fairly expensive operations of O(n) complexity. This can be circumvented by converting the alist into a hash table once it is finished but this requires a fair amount of copying. Also, from this point on you lose the (de facto) immutability of the alist which allowed you to program in a functional way. So now you have to deal with all the challenges of mutable state again. In conclusion you can't have your cake and eat it, too. 2) Alists are not of a distinct type so you can't for example dispatch or specialize on them. The closest approximation of a type predicate is of O(n) complexity, too, as it would require to walk the whole list to check whether all elements are indeed pairs. This is also an issue with serialization and deserialization as their external representation is indistinguishable from a list of pairs. In practice these drawbacks are usually not much of an issue but it strikes me as a kludgy work-aroundish state of affairs. I want to use a proper data structure when dealing with associative data without compromising programming convenience! And this is where persistent hash maps come into play. In essence they work like alists without the drawbacks mentioned above: 1) Lookup complexity is practically constant and contrived benchmarks show that performance is comparable to SRFI 69 hash tables. Insertion and deletion is around O(log(n)). As an optimization the implementation allows to turn a persistent map into a so called transient map which can be mutated. This allows for efficient batch modification. Later it can be turned back into a persistent map without any copying. 2) Maps are implemented as a distinct type. What's currently missing is an external representation but this can easily be added, e.g. by means of a SRFI 10 reader extension. In a way, persistent hash maps give you the best of alists and hash tables combined which is why I highly recommend you to check them out. The egg's documentation can be found at the usual place (http://wiki.call-cc.org/eggref/4/persistent-hash-map), the source code lives at https://bitbucket.org/DerGuteMoritz/persistent-hash-map. There are almost certainly still some bugs in the implementation, especially in edge cases. I'm happy to receive bug reports if you happen to run into those! With regards to the implementation I would like share a few tidbits. Since I wanted to achieve decently performing code I took this project as an opportunity to give Felix' programming for performance guide a try (http://wiki.call-cc.org/programming-for-performance). It was really helpful in reaching this goal, so thanks Felix for taking the time to write it up! My findings are as follows: The original ClojureScript (and Clojure) implementation uses types to represent different kinds of nodes. For those types several generic functions (protocol methods in Clojure lingo, see http://clojure.org/protocols) are defined. I wanted to keep this structure in my port so as to not diverge all too far from the original implementation. That way I hope to be able to integrate upstream patches in the future more swiftly. To that end I tried coops and fast-generic (as recommended in the guide) and found that coops yields slightly smaller compiled code (about 210K versus 244K with fast-generic) while fast-generic's dispatch is about an order of magnitude faster. The catch is that fast-generic types are only visible inside the compiled module but in this case this doesn't matter because the generic functions are completely internal to the implementation so I decided to go with fast-generic. The types themselves are represented as Scheme records. I tried both the typed-records and the record-variants extensions. It turned out that typed-records yielded slightly better performance than record-variants with all declarations enabled (unsafe, unchecked, inline). The typed-records may have additionally benefited from pervasive type
Re: [Chicken-users] ANN: persistent-hash-map 0.0.1
Moritz Heidkamp scripsit: In a way, persistent hash maps give you the best of alists and hash tables combined which is why I highly recommend you to check them out. This looks brilliant. I have written some sketchy stuff that searches an a-list in the normal way, but if the a-list lookup fails and the tail is not the empty list but a SRFI 69 hash table, then search the hash table, on the assumption that there's a certain amount of stuff that has to be functional and then there are lots of global keys that are maybe mutable, maybe immutable, but anyway should be searched fast. This looks like a *much* better approach. -- John Cowan co...@ccil.org You need a change: try Canada You need a change: try China --fortune cookies opened by a couple that I know ___ Chicken-users mailing list Chicken-users@nongnu.org https://lists.nongnu.org/mailman/listinfo/chicken-users
[Chicken-users] ANN: persistent-hash-map 0.0.1
Fellow Chickeneers, I have finally gotten around to finishing my long standing plan of porting the useful persistent hash map data structure from Clojure to Chicken. Thankfully, ClojureScript grew its own implementation of it in the meantime which written (mostly) in ClojureScript itself so porting was way easier than with the original Java implementation. But first, let me explain why this data structure is so nice. In a nutshell it allows you to program in a functional style like you are used to with lists in Scheme. As you know, SRFI 69 style hash tables don't lend themselves very well to this style because they are mutable. Thus we Schemers often resort to alists when we need associative data structures which is fine in many cases but has (at least) two drawbacks: 1) Lookup, deletion, and insertion are fairly expensive operations of O(n) complexity. This can be circumvented by converting the alist into a hash table once it is finished but this requires a fair amount of copying. Also, from this point on you lose the (de facto) immutability of the alist which allowed you to program in a functional way. So now you have to deal with all the challenges of mutable state again. In conclusion you can't have your cake and eat it, too. 2) Alists are not of a distinct type so you can't for example dispatch or specialize on them. The closest approximation of a type predicate is of O(n) complexity, too, as it would require to walk the whole list to check whether all elements are indeed pairs. This is also an issue with serialization and deserialization as their external representation is indistinguishable from a list of pairs. In practice these drawbacks are usually not much of an issue but it strikes me as a kludgy work-aroundish state of affairs. I want to use a proper data structure when dealing with associative data without compromising programming convenience! And this is where persistent hash maps come into play. In essence they work like alists without the drawbacks mentioned above: 1) Lookup complexity is practically constant and contrived benchmarks show that performance is comparable to SRFI 69 hash tables. Insertion and deletion is around O(log(n)). As an optimization the implementation allows to turn a persistent map into a so called transient map which can be mutated. This allows for efficient batch modification. Later it can be turned back into a persistent map without any copying. 2) Maps are implemented as a distinct type. What's currently missing is an external representation but this can easily be added, e.g. by means of a SRFI 10 reader extension. In a way, persistent hash maps give you the best of alists and hash tables combined which is why I highly recommend you to check them out. The egg's documentation can be found at the usual place (http://wiki.call-cc.org/eggref/4/persistent-hash-map), the source code lives at https://bitbucket.org/DerGuteMoritz/persistent-hash-map. There are almost certainly still some bugs in the implementation, especially in edge cases. I'm happy to receive bug reports if you happen to run into those! With regards to the implementation I would like share a few tidbits. Since I wanted to achieve decently performing code I took this project as an opportunity to give Felix' programming for performance guide a try (http://wiki.call-cc.org/programming-for-performance). It was really helpful in reaching this goal, so thanks Felix for taking the time to write it up! My findings are as follows: The original ClojureScript (and Clojure) implementation uses types to represent different kinds of nodes. For those types several generic functions (protocol methods in Clojure lingo, see http://clojure.org/protocols) are defined. I wanted to keep this structure in my port so as to not diverge all too far from the original implementation. That way I hope to be able to integrate upstream patches in the future more swiftly. To that end I tried coops and fast-generic (as recommended in the guide) and found that coops yields slightly smaller compiled code (about 210K versus 244K with fast-generic) while fast-generic's dispatch is about an order of magnitude faster. The catch is that fast-generic types are only visible inside the compiled module but in this case this doesn't matter because the generic functions are completely internal to the implementation so I decided to go with fast-generic. The types themselves are represented as Scheme records. I tried both the typed-records and the record-variants extensions. It turned out that typed-records yielded slightly better performance than record-variants with all declarations enabled (unsafe, unchecked, inline). The typed-records may have additionally benefited from pervasive type declarations on all procedures defined in the module (a .types file is provided by the egg, too). With these things I managed to achieve quite good performance characteristics without resorting to