Re: [Chicken-users] ANN: persistent-hash-map 0.0.1

2013-02-12 Thread Evan Hanson
 Fellow Chickeneers,

 [... snip great write-up... ]

 Cheers all around
 Moritz

This looks really useful, thanks Moritz.

And also, thanks for the write-up about the library -- it's nice to have
an idea of the motivation behind it, and the other tips
(Chicken/performance-related) are helpful as well. Taking the time to do
that is much appreciated.

Evan

___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] bind egg and strings

2013-02-12 Thread Daniel Leslie
Can someone with a little more core knowledge please comment on this?

I'm also curious as to whether this test case will eventually result in an
OOB error due to foreign reliance on GC-controlled memory.

-Dan


On Sun, Feb 10, 2013 at 11:04 PM, Andrei Barbu and...@0xab.com wrote:

 Attached is a trivial patch that does the strdup.


 Andrei


 On Sat, Feb 9, 2013 at 6:07 PM, Andrei Barbu and...@0xab.com wrote:
  Hi,
 
 
  I've been using the bind egg and encountered some strange behaviour.
  I have:
 
  struct a {
char *b;
  };
 
  Bind generates:
 
  (begin
(define a-b
  (foreign-lambda* c-string (((c-pointer (struct a)) s))
 return(s-b);))
(define make-a
  (foreign-lambda*
(c-pointer (struct a))
((c-string b))
struct a *tmp_ = (struct a *)C_malloc(sizeof(struct
  a));\ntmp_-b = b;\n\nC_return(tmp_);)))
 
 
  It seems to me that make-a is guaranteed to eventually lead to an out
  of bounds memory access because of:
   tmp_-b = b
  b is a c-string and will be GCed as soon a the foreign-lambda* returns.
  This is further exacerbated when using -mutable-fields making it
  impossible to set any char* member.
  Shouldn't the bind egg be doing an strdup here? Is there a way to get
  it do so? Or am I missing something?
 
 
  Thanks!
  Andrei

 ___
 Chicken-users mailing list
 Chicken-users@nongnu.org
 https://lists.nongnu.org/mailman/listinfo/chicken-users


___
Chicken-users mailing list
Chicken-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] ANN: persistent-hash-map 0.0.1

2013-02-12 Thread Daniel Leslie
Very much appreciated, thanks!


On Sun, Feb 10, 2013 at 3:31 PM, Moritz Heidkamp mor...@twoticketsplease.de
 wrote:

 Fellow Chickeneers,

 I have finally gotten around to finishing my long standing plan of
 porting the useful persistent hash map data structure from Clojure to
 Chicken. Thankfully, ClojureScript grew its own implementation of it in
 the meantime which written (mostly) in ClojureScript itself so porting
 was way easier than with the original Java implementation. But first,
 let me explain why this data structure is so nice. In a nutshell it
 allows you to program in a functional style like you are used to with
 lists in Scheme. As you know, SRFI 69 style hash tables don't lend
 themselves very well to this style because they are mutable. Thus we
 Schemers often resort to alists when we need associative data structures
 which is fine in many cases but has (at least) two drawbacks:

 1) Lookup, deletion, and insertion are fairly expensive operations of
O(n) complexity. This can be circumvented by converting the alist
into a hash table once it is finished but this requires a fair
amount of copying. Also, from this point on you lose the (de facto)
immutability of the alist which allowed you to program in a
functional way. So now you have to deal with all the challenges of
mutable state again. In conclusion you can't have your cake and eat
it, too.

 2) Alists are not of a distinct type so you can't for example dispatch
or specialize on them. The closest approximation of a type predicate
is of O(n) complexity, too, as it would require to walk the whole
list to check whether all elements are indeed pairs. This is also an
issue with serialization and deserialization as their external
representation is indistinguishable from a list of pairs.

 In practice these drawbacks are usually not much of an issue but it
 strikes me as a kludgy work-aroundish state of affairs. I want to use a
 proper data structure when dealing with associative data without
 compromising programming convenience! And this is where persistent hash
 maps come into play. In essence they work like alists without the
 drawbacks mentioned above:

 1) Lookup complexity is practically constant and contrived benchmarks
show that performance is comparable to SRFI 69 hash tables. Insertion
and deletion is around O(log(n)). As an optimization the
implementation allows to turn a persistent map into a so called
transient map which can be mutated. This allows for efficient batch
modification. Later it can be turned back into a persistent map
without any copying.

 2) Maps are implemented as a distinct type. What's currently missing is
an external representation but this can easily be added, e.g. by
means of a SRFI 10 reader extension.

 In a way, persistent hash maps give you the best of alists and hash
 tables combined which is why I highly recommend you to check them
 out. The egg's documentation can be found at the usual place
 (http://wiki.call-cc.org/eggref/4/persistent-hash-map), the source code
 lives at https://bitbucket.org/DerGuteMoritz/persistent-hash-map.  There
 are almost certainly still some bugs in the implementation, especially
 in edge cases. I'm happy to receive bug reports if you happen to run
 into those!

 With regards to the implementation I would like share a few
 tidbits. Since I wanted to achieve decently performing code I took this
 project as an opportunity to give Felix' programming for performance
 guide a try (http://wiki.call-cc.org/programming-for-performance). It
 was really helpful in reaching this goal, so thanks Felix for taking the
 time to write it up! My findings are as follows:

 The original ClojureScript (and Clojure) implementation uses types to
 represent different kinds of nodes. For those types several generic
 functions (protocol methods in Clojure lingo, see
 http://clojure.org/protocols) are defined. I wanted to keep this
 structure in my port so as to not diverge all too far from the original
 implementation. That way I hope to be able to integrate upstream patches
 in the future more swiftly. To that end I tried coops and fast-generic
 (as recommended in the guide) and found that coops yields slightly
 smaller compiled code (about 210K versus 244K with fast-generic) while
 fast-generic's dispatch is about an order of magnitude faster. The catch
 is that fast-generic types are only visible inside the compiled module
 but in this case this doesn't matter because the generic functions are
 completely internal to the implementation so I decided to go with
 fast-generic. The types themselves are represented as Scheme records. I
 tried both the typed-records and the record-variants extensions. It
 turned out that typed-records yielded slightly better performance than
 record-variants with all declarations enabled (unsafe, unchecked,
 inline). The typed-records may have additionally benefited from
 pervasive type