Re: [Chicken-users] ANN: persistent-hash-map 0.0.1

2013-02-16 Thread Moritz Heidkamp
Hi Evan,

Evan Hanson ev...@foldling.org writes:
 And also, thanks for the write-up about the library -- it's nice to have
 an idea of the motivation behind it, and the other tips
 (Chicken/performance-related) are helpful as well. Taking the time to do
 that is much appreciated.

thanks a lot for your reply (same for Daniel and John, of course). This
encourages me to do more write-ups like that in the future!

Moritz

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] ANN: persistent-hash-map 0.0.1

2013-02-16 Thread Moritz Heidkamp
John Cowan co...@mercury.ccil.org writes:
 This looks brilliant.  I have written some sketchy stuff that searches
 an a-list in the normal way, but if the a-list lookup fails and the tail
 is not the empty list but a SRFI 69 hash table, then search the hash table,
 on the assumption that there's a certain amount of stuff that has to be
 functional and then there are lots of global keys that are maybe mutable,
 maybe immutable, but anyway should be searched fast.  This looks like a 
 *much* better approach.

Right, hacky work-arounds like the one you describe are what encouraged
me to port this thing :-)

Moritz

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] ANN: persistent-hash-map 0.0.1

2013-02-12 Thread Evan Hanson
 Fellow Chickeneers,

 [... snip great write-up... ]

 Cheers all around
 Moritz

This looks really useful, thanks Moritz.

And also, thanks for the write-up about the library -- it's nice to have
an idea of the motivation behind it, and the other tips
(Chicken/performance-related) are helpful as well. Taking the time to do
that is much appreciated.

Evan

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] ANN: persistent-hash-map 0.0.1

2013-02-12 Thread Daniel Leslie
Very much appreciated, thanks!


On Sun, Feb 10, 2013 at 3:31 PM, Moritz Heidkamp mor...@twoticketsplease.de
 wrote:

 Fellow Chickeneers,

 I have finally gotten around to finishing my long standing plan of
 porting the useful persistent hash map data structure from Clojure to
 Chicken. Thankfully, ClojureScript grew its own implementation of it in
 the meantime which written (mostly) in ClojureScript itself so porting
 was way easier than with the original Java implementation. But first,
 let me explain why this data structure is so nice. In a nutshell it
 allows you to program in a functional style like you are used to with
 lists in Scheme. As you know, SRFI 69 style hash tables don't lend
 themselves very well to this style because they are mutable. Thus we
 Schemers often resort to alists when we need associative data structures
 which is fine in many cases but has (at least) two drawbacks:

 1) Lookup, deletion, and insertion are fairly expensive operations of
O(n) complexity. This can be circumvented by converting the alist
into a hash table once it is finished but this requires a fair
amount of copying. Also, from this point on you lose the (de facto)
immutability of the alist which allowed you to program in a
functional way. So now you have to deal with all the challenges of
mutable state again. In conclusion you can't have your cake and eat
it, too.

 2) Alists are not of a distinct type so you can't for example dispatch
or specialize on them. The closest approximation of a type predicate
is of O(n) complexity, too, as it would require to walk the whole
list to check whether all elements are indeed pairs. This is also an
issue with serialization and deserialization as their external
representation is indistinguishable from a list of pairs.

 In practice these drawbacks are usually not much of an issue but it
 strikes me as a kludgy work-aroundish state of affairs. I want to use a
 proper data structure when dealing with associative data without
 compromising programming convenience! And this is where persistent hash
 maps come into play. In essence they work like alists without the
 drawbacks mentioned above:

 1) Lookup complexity is practically constant and contrived benchmarks
show that performance is comparable to SRFI 69 hash tables. Insertion
and deletion is around O(log(n)). As an optimization the
implementation allows to turn a persistent map into a so called
transient map which can be mutated. This allows for efficient batch
modification. Later it can be turned back into a persistent map
without any copying.

 2) Maps are implemented as a distinct type. What's currently missing is
an external representation but this can easily be added, e.g. by
means of a SRFI 10 reader extension.

 In a way, persistent hash maps give you the best of alists and hash
 tables combined which is why I highly recommend you to check them
 out. The egg's documentation can be found at the usual place
 (http://wiki.call-cc.org/eggref/4/persistent-hash-map), the source code
 lives at https://bitbucket.org/DerGuteMoritz/persistent-hash-map.  There
 are almost certainly still some bugs in the implementation, especially
 in edge cases. I'm happy to receive bug reports if you happen to run
 into those!

 With regards to the implementation I would like share a few
 tidbits. Since I wanted to achieve decently performing code I took this
 project as an opportunity to give Felix' programming for performance
 guide a try (http://wiki.call-cc.org/programming-for-performance). It
 was really helpful in reaching this goal, so thanks Felix for taking the
 time to write it up! My findings are as follows:

 The original ClojureScript (and Clojure) implementation uses types to
 represent different kinds of nodes. For those types several generic
 functions (protocol methods in Clojure lingo, see
 http://clojure.org/protocols) are defined. I wanted to keep this
 structure in my port so as to not diverge all too far from the original
 implementation. That way I hope to be able to integrate upstream patches
 in the future more swiftly. To that end I tried coops and fast-generic
 (as recommended in the guide) and found that coops yields slightly
 smaller compiled code (about 210K versus 244K with fast-generic) while
 fast-generic's dispatch is about an order of magnitude faster. The catch
 is that fast-generic types are only visible inside the compiled module
 but in this case this doesn't matter because the generic functions are
 completely internal to the implementation so I decided to go with
 fast-generic. The types themselves are represented as Scheme records. I
 tried both the typed-records and the record-variants extensions. It
 turned out that typed-records yielded slightly better performance than
 record-variants with all declarations enabled (unsafe, unchecked,
 inline). The typed-records may have additionally benefited from
 pervasive type 

Re: [Chicken-users] ANN: persistent-hash-map 0.0.1

2013-02-11 Thread John Cowan
Moritz Heidkamp scripsit:

 In a way, persistent hash maps give you the best of alists and hash
 tables combined which is why I highly recommend you to check them
 out. 

This looks brilliant.  I have written some sketchy stuff that searches
an a-list in the normal way, but if the a-list lookup fails and the tail
is not the empty list but a SRFI 69 hash table, then search the hash table,
on the assumption that there's a certain amount of stuff that has to be
functional and then there are lots of global keys that are maybe mutable,
maybe immutable, but anyway should be searched fast.  This looks like a 
*much* better approach.

-- 
John Cowan   co...@ccil.org
You need a change: try Canada  You need a change: try China
--fortune cookies opened by a couple that I know

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


[Chicken-users] ANN: persistent-hash-map 0.0.1

2013-02-10 Thread Moritz Heidkamp
Fellow Chickeneers,

I have finally gotten around to finishing my long standing plan of
porting the useful persistent hash map data structure from Clojure to
Chicken. Thankfully, ClojureScript grew its own implementation of it in
the meantime which written (mostly) in ClojureScript itself so porting
was way easier than with the original Java implementation. But first,
let me explain why this data structure is so nice. In a nutshell it
allows you to program in a functional style like you are used to with
lists in Scheme. As you know, SRFI 69 style hash tables don't lend
themselves very well to this style because they are mutable. Thus we
Schemers often resort to alists when we need associative data structures
which is fine in many cases but has (at least) two drawbacks:

1) Lookup, deletion, and insertion are fairly expensive operations of
   O(n) complexity. This can be circumvented by converting the alist
   into a hash table once it is finished but this requires a fair
   amount of copying. Also, from this point on you lose the (de facto)
   immutability of the alist which allowed you to program in a
   functional way. So now you have to deal with all the challenges of
   mutable state again. In conclusion you can't have your cake and eat
   it, too.

2) Alists are not of a distinct type so you can't for example dispatch
   or specialize on them. The closest approximation of a type predicate
   is of O(n) complexity, too, as it would require to walk the whole
   list to check whether all elements are indeed pairs. This is also an
   issue with serialization and deserialization as their external
   representation is indistinguishable from a list of pairs.

In practice these drawbacks are usually not much of an issue but it
strikes me as a kludgy work-aroundish state of affairs. I want to use a
proper data structure when dealing with associative data without
compromising programming convenience! And this is where persistent hash
maps come into play. In essence they work like alists without the
drawbacks mentioned above:

1) Lookup complexity is practically constant and contrived benchmarks
   show that performance is comparable to SRFI 69 hash tables. Insertion
   and deletion is around O(log(n)). As an optimization the
   implementation allows to turn a persistent map into a so called
   transient map which can be mutated. This allows for efficient batch
   modification. Later it can be turned back into a persistent map
   without any copying.

2) Maps are implemented as a distinct type. What's currently missing is
   an external representation but this can easily be added, e.g. by
   means of a SRFI 10 reader extension.

In a way, persistent hash maps give you the best of alists and hash
tables combined which is why I highly recommend you to check them
out. The egg's documentation can be found at the usual place
(http://wiki.call-cc.org/eggref/4/persistent-hash-map), the source code
lives at https://bitbucket.org/DerGuteMoritz/persistent-hash-map.  There
are almost certainly still some bugs in the implementation, especially
in edge cases. I'm happy to receive bug reports if you happen to run
into those!

With regards to the implementation I would like share a few
tidbits. Since I wanted to achieve decently performing code I took this
project as an opportunity to give Felix' programming for performance
guide a try (http://wiki.call-cc.org/programming-for-performance). It
was really helpful in reaching this goal, so thanks Felix for taking the
time to write it up! My findings are as follows:

The original ClojureScript (and Clojure) implementation uses types to
represent different kinds of nodes. For those types several generic
functions (protocol methods in Clojure lingo, see
http://clojure.org/protocols) are defined. I wanted to keep this
structure in my port so as to not diverge all too far from the original
implementation. That way I hope to be able to integrate upstream patches
in the future more swiftly. To that end I tried coops and fast-generic
(as recommended in the guide) and found that coops yields slightly
smaller compiled code (about 210K versus 244K with fast-generic) while
fast-generic's dispatch is about an order of magnitude faster. The catch
is that fast-generic types are only visible inside the compiled module
but in this case this doesn't matter because the generic functions are
completely internal to the implementation so I decided to go with
fast-generic. The types themselves are represented as Scheme records. I
tried both the typed-records and the record-variants extensions. It
turned out that typed-records yielded slightly better performance than
record-variants with all declarations enabled (unsafe, unchecked,
inline). The typed-records may have additionally benefited from
pervasive type declarations on all procedures defined in the module (a
.types file is provided by the egg, too). With these things I managed to
achieve quite good performance characteristics without resorting to