Re: fast parallel reduction into hash-set/map

2014-03-18 Thread Jules
Re: GPU and memory is not contiguous - you need to take a look at e.g. http://www.anandtech.com/show/7677/amd-kaveri-review-a8-7600-a10-7850k/6 - GPU can now access the entire CPU address space without any copies. Heterogeneous System Architecture (HSA) is already here. The h/w and libraries are

Re: fast parallel reduction into hash-set/map

2014-03-17 Thread Jean Niklas L'orange
Hello, On Saturday, March 15, 2014 11:37:11 PM UTC+1, Jules wrote: 2. I've had a look at rrb-vector - very fast for catvec - agreed, but there does appear to be a performance penalty in terms of conj[!]-ing - I can post my results if you are interested. I read the Bagwell paper on which

Re: fast parallel reduction into hash-set/map

2014-03-15 Thread Jules
Sorry that it has taken me so long to reply to this - I wanted to make sure that I had a look at rrb-vector and properly digest the points that you have made. 1. I've dumped the supervector idea :-) I ran some timings and as the number of levels of supervector increased, iteration times

Re: fast parallel reduction into hash-set/map

2014-03-15 Thread Jules
Ghadi, Thanks for your posting - please see the reply I have just posted. I'll give foldcat a look - sounds interesting. With regards to my code for splicing maps/sets together efficiently, I spent some time looking at how that could be integrated with core.reducers and did some timings:

Re: fast parallel reduction into hash-set/map

2014-02-20 Thread Jules
So, having broken the back of fast re-combination of hash sets and maps, I wanted to take a look at doing a similar sort of thing for vectors - another type of seq that I use very heavily in this sort of situation. Let's see the cake first: seqspert.vector=> (def a (into [] (range 1000)))

Re: fast parallel reduction into hash-set/map

2014-02-20 Thread Andy Fingerhut
Have you looked at core.rrb-vector, which implements Relaxed Radix Trees for vectors that implement subvec and concatenation of two arbitrary vectors in O(log n) time? https://github.com/clojure/core.rrb-vector Unless I am missing something, the fast concatenation is what you are trying to

Re: fast parallel reduction into hash-set/map

2014-02-20 Thread Jules
Hi, Andy. No, I haven't - thanks for the pointer - I'll take a look. I have a very specific use case - the speeding up of the re-combination of the results of parallel reductions. I've found that the cost of this recombination often impacts badly on the overall timing of such a reduction, but

Re: fast parallel reduction into hash-set/map

2014-02-20 Thread shlomivaknin
Hey Jules, Really nice stuff you're making! One note about your SuperVecs idea though: it seems that using that approach we could get a deeply nested tree to represent a vector, and I think this is the exact thing vectors are trying to avoid. On Friday, February 21, 2014 1:39:56 AM

Re: fast parallel reduction into hash-set/map

2014-02-20 Thread Jules
I did give this some thought :-) Reading the code for subvec, it looks like if you try to take a subvec of a subvec the top subvec cuts the intermediate subvec out of the process by recalculating a new projection on the underlying vector, relevant to the offsets imposed by the intermediate
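The observable side of this can be checked at the REPL. The following is an editor's sketch, not code from the thread: it only confirms that a subvec of a subvec selects the same elements as one subvec taken directly on the underlying vector, and it does not inspect subvec's internals. The names v, inner, nested and direct are hypothetical.

```clojure
;; Editor's sketch (hypothetical names): nested subvecs behave like a
;; single projection onto the underlying vector.
(def v (vec (range 100)))

(def inner  (subvec v 10 90))      ; elements 10..89 of v
(def nested (subvec inner 5 15))   ; elements 5..14 of inner
(def direct (subvec v 15 25))      ; the same window, expressed on v itself

;; Both views contain the same ten elements in the same order.
(= nested direct)
```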

Re: fast parallel reduction into hash-set/map

2014-02-20 Thread Alan Busby
On Fri, Feb 21, 2014 at 8:51 AM, shlomivak...@gmail.com wrote: One note about your SuperVecs idea though: it seems that using that approach we could get a deeply nested tree to represent a vector, and I think this is the exact thing vectors are trying to avoid. In general I think this

Re: fast parallel reduction into hash-set/map

2014-02-20 Thread Jules
All true. Also, if you look at the internal structure of PersistentVector (perhaps using my seqspert lib - announced this evening), you would see that PersistentVector is actually implemented as a tree. Recombination through several levels of supervec could simply be thought of as extending

Re: fast parallel reduction into hash-set/map

2014-02-20 Thread Jean Niklas L'orange
Hi Jules, On Thursday, February 20, 2014 11:59:03 PM UTC+1, Jules wrote: Subvec provides a view on a subseq of a vector that behaves like a full vector. Supervec provides a similar view that makes two vectors behave like a single one This data structure (supervec) is usually known as a

Re: fast parallel reduction into hash-set/map

2014-02-20 Thread Ghadi Shayban
Jules, For recombination of parallel reductions into a vector, have you looked at foldcat (https://github.com/clojure/clojure/blob/master/src/clj/clojure/core/reducers.clj#L314-L318)? It works really well, and it's one of those wonderful gems in clojure.core waiting to be noticed. It gives you
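For readers who have not used it, here is a minimal editor's sketch of foldcat (not code from the thread). It assumes only clojure.core.reducers, which ships with Clojure 1.5 and later; the names v and evens are hypothetical. The result of foldcat is a counted, seqable, reducible collection rather than a PersistentVector, so the sketch pours it into a vector at the end.

```clojure
;; Editor's sketch (hypothetical names): parallel filter collected with foldcat.
(require '[clojure.core.reducers :as r])

(def v (vec (range 100000)))

;; r/filter builds a foldable recipe; r/foldcat runs the fold and
;; concatenates the per-partition results in order.
(def evens (r/foldcat (r/filter even? v)))

(count evens)      ; number of even values found in v
(into [] evens)    ; realise the result as an ordinary vector
```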

Re: fast parallel reduction into hash-set/map

2014-02-17 Thread Jean Niklas L'orange
On Sunday, February 16, 2014 11:49:38 AM UTC+1, Mikera wrote: Wow - that's a pretty big win. I think we should try and get this into Clojure ASAP. Are we too late for 1.6? Yeah, this is probably too late for 1.6 =/ Anyway, cool stuff you got going on here. I'm playing around with

Re: fast parallel reduction into hash-set/map

2014-02-17 Thread Alex Miller
It is too late, but an enhancement jira would be appropriate. I would highly encourage some generative tests in such a patch and perhaps looking at https://github.com/ztellman/collection-check. With simple.check moving into contrib as test.check, we expect to be able to use test.check within

Re: fast parallel reduction into hash-set/map

2014-02-17 Thread Glen Mailer
Is there a specific part of this implementation which means it needs to live in core? It would be cool to have this as a library that could be used with existing versions of Clojure (I have no idea if enough of the internals are exposed to make this viable). Glen On Saturday, 15 February 2014

Re: fast parallel reduction into hash-set/map

2014-02-17 Thread Jules
I've started doing some more serious testing and have not encountered any problems so far. I was a bit worried about the interaction of splice and transient/persistent!, but have not encountered any problems yet. I am going to read through the code again to satisfy myself that there is no

Re: fast parallel reduction into hash-set/map

2014-02-17 Thread Jules
Alex, thanks for the suggestion - I'll look at collection-check and raise the appropriate JIRA when I am happier with the code / idea. Jules On Monday, 17 February 2014 13:21:28 UTC, Alex Miller wrote: It is too late, but an enhancement jira would be appropriate. I would highly encourage

Re: fast parallel reduction into hash-set/map

2014-02-17 Thread Jules
Glen, I did start the implementation in Clojure, but had to move it under the skin of PersistentHashMap to achieve what I needed, so it is now written in Java and is part of PersistentHashMap... I don't think it would be practical to make it an add-on - but it would be nice :-). I'll keep it

Re: fast parallel reduction into hash-set/map

2014-02-16 Thread Jules
Thanks, Mikera. You are right about merge: user=> (def m1 (apply hash-map (range 1000))) #'user/m1 user=> (def m2 (apply hash-map (range 500 1500))) #'user/m2 user=> (time (def m3 (merge m1 m2))) Elapsed time: 5432.184582 msecs #'user/m3 user=> (time (def m4

Re: fast parallel reduction into hash-set/map

2014-02-16 Thread Mikera
Wow - that's a pretty big win. I think we should try and get this into Clojure ASAP. Are we too late for 1.6? On Sunday, 16 February 2014 18:48:09 UTC+8, Jules wrote: Thanks, Mikera. You are right about merge: user=> (def m1 (apply hash-map (range 1000))) #'user/m1 user=> (def m2

Re: fast parallel reduction into hash-set/map

2014-02-16 Thread Jules
I would have thought so - it's only my first cut - seems to work but I wouldn't like to stake my life on it. It really needs a developer who is familiar with PersistentHashMap to look it over and give it the thumbs up... Still, I guess if it was marked experimental... :-) Jules On Sunday, 16

Re: fast parallel reduction into hash-set/map

2014-02-16 Thread Jules
Thinking about it a bit more, it would be good to have an interface, e.g. Spliceable, which e.g. 'into' could take advantage of when it found itself concatenating two seqs of the same implementation... Further digging might demonstrate that a similar trick could be used with other seq types?

fast parallel reduction into hash-set/map

2014-02-15 Thread Jules
Guys, I've been playing with reducers on and off for a while but have been frustrated because they don't seem to fit a particular use case that I have in mind... specifically: getting as many associations into a hash-map as I can in as short a time as possible. My understanding of the

Re: fast parallel reduction into hash-set/map

2014-02-15 Thread Alex Miller
You should try transients if you're looking to quickly fill collections - you might not even need to split up the work this way. On Saturday, February 15, 2014 5:06:24 PM UTC-6, Jules wrote: Guys, I've been playing with reducers on and off for a while but have been frustrated because they
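A minimal editor's sketch of the transient approach suggested here (not code from the thread; fill-map is a hypothetical name). It uses only transient, assoc! and persistent! from clojure.core, and it threads the value returned by each assoc! through the reduce rather than mutating the transient in place.

```clojure
;; Editor's sketch (hypothetical helper): fill a hash-map via a transient.
(defn fill-map
  "Builds the map {0 0, 1 1, ...} with n entries using a transient."
  [n]
  (persistent!
    (reduce (fn [m i] (assoc! m i i))   ; always use the returned transient
            (transient {})
            (range n))))

(count (fill-map 100000))
```

This runs single-threaded; whether it is fast enough to avoid splitting the work depends on the workload, which the thread goes on to discuss.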

Re: fast parallel reduction into hash-set/map

2014-02-15 Thread Jules
from src/clj/clojure/core.clj: (defn into "Returns a new coll consisting of to-coll with all of the items of from-coll conjoined." {:added "1.0" :static true} [to from] (if (instance? clojure.lang.IEditableCollection to) (with-meta (persistent! (reduce conj! (transient to) from))