[ANN] data.avl 0.0.13 – fast sorted collections with O(log n) slices and nth
Hi, I am pleased to announce the 0.0.13 release of data.avl, a Clojure Contrib library providing highly performant drop-in replacements for Clojure(Script)'s built-in sorted maps and sets with the following headline features: 0. performance often superior to the built-ins (owing to the smaller height of AVL trees), see end of this email for some benchmarks 1. logarithmic-time slicing and splits: (avl/split-key 3 (avl/sorted-set 0 1 2 3 4 5)) ;= [#{0 1 2} 3 #{4 5 6}] (avl/split-at 2 (avl/sorted-set 0 1 2 3 4 5)) ;= [#{0 1} #{2 3 4 5}] (avl/subrange (avl/sorted-set 0 1 2 3 4 5) >= 2 < 5) ;= #{2 3 4} All returned collections are independent from the originals for GC purposes (no holding on to out-of-range keys present in the original collections), all operations run in O(log n) time in asymptotic terms, and actually fast in real-world terms. 2. logarithmic-time access by rank and rank queries: (nth (avl/sorted-set 0 1 2) 1) ;= 1 (avl/rank-of (avl/sorted-map-by > 0 0 1 1 2 2) 0) ;= 2 3. nearest-neighbour lookups: (avl/nearest (avl/sorted-set 0 1 2) < 1) ;= 0 The complete clojure.core sorted collections API is also supported, as is the transient API. See the README for more detailed examples and feature descriptions. Fixed in this release: 1. IMeta and IFn implementations now use correct method names in the ClojureScript version. 2. sorted-map/sorted-map-by now throw exceptions if no value is provided for the final key (in other words, if an odd number of arguments is passed in) – patch by Andy Fingerhut, thanks! Changed in this release: 1. Seqs over data.avl maps now use node objects themselves as map entries. This is in line with the behaviour of Clojure's built-in sorted maps. Previously seqs over data.avl maps used freshly-allocated map entry objects; this had the benefit of not holding on to subtrees in some scenarios, however not without some performance and GC pressure cost to regular traversals. Note that the new behaviour allows the user to rewrap key/value pairs as map entries if they so choose in a separate map step. 2. Because of 1., AVL nodes now implement vector and map entry interfaces. Cheers, Michał ;;; A handful of benchmarks ; CIDER 0.9.1 (Java 1.8.0_45-internal, Clojure 1.8.0-RC2, nREPL 0.2.10) user> (def ks (range 1)) #'user/ks user> (def ksks (doall (interleave ks ks))) #'user/ksks user> (def m1 (apply sorted-map ksks)) #'user/m1 user> (def m2 (apply avl/sorted-map ksks)) #'user/m2 user> (.tree m1) [4095 4095] user> (.getTree m2) [4095 4095] user> (let [] (prn :> 0) (c/quick-bench (get m1 0)) (c/quick-bench (get m2 0)) (prn :> 4095) (c/quick-bench (get m1 4095)) (c/quick-bench (get m2 4095)) (prn :> ) (c/quick-bench (get m1 )) (c/quick-bench (get m2 ))) :> 0 WARNING: Final GC required 9.525315094951035 % of runtime WARNING: Final GC required 88.3127760569387 % of runtime Evaluation count : 2280474 in 6 samples of 380079 calls. Execution time mean : 262.138936 ns Execution time std-deviation : 56.938518 ns Execution time lower quantile : 237.779848 ns ( 2.5%) Execution time upper quantile : 360.756510 ns (97.5%) Overhead used : 20.503990 ns Found 1 outliers in 6 samples (16.6667 %) low-severe 1 (16.6667 %) Variance from outliers : 64.2134 % Variance is severely inflated by outliers WARNING: Final GC required 78.56747149813818 % of runtime Evaluation count : 2280498 in 6 samples of 380083 calls. Execution time mean : 261.626921 ns Execution time std-deviation : 42.454179 ns Execution time lower quantile : 241.444705 ns ( 2.5%) Execution time upper quantile : 335.120524 ns (97.5%) Overhead used : 20.503990 ns Found 1 outliers in 6 samples (16.6667 %) low-severe 1 (16.6667 %) Variance from outliers : 47.5275 % Variance is moderately inflated by outliers :> 4095 WARNING: Final GC required 168.0614206986717 % of runtime Evaluation count : 1056 in 6 samples of 176 calls. Execution time mean : 25.939625 ns Execution time std-deviation : 4.135726 ns Execution time lower quantile : 22.648015 ns ( 2.5%) Execution time upper quantile : 32.134865 ns (97.5%) Overhead used : 20.503990 ns WARNING: Final GC required 293.8046791844393 % of runtime Evaluation count : 1056 in 6 samples of 176 calls. Execution time mean : 24.609377 ns Execution time std-deviation : 1.015680 ns Execution time lower quantile : 23.720800 ns ( 2.5%) Execution time upper quantile : 26.283825 ns (97.5%) Overhead used : 20.503990 ns Found 1 outliers in 6 samples (16.6667 %) low-severe 1 (16.6667 %) Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
Re: [ANN] data.avl 0.0.13 – fast sorted collections with O(log n) slices and nth
Very awesome :D! Cheers, Jan > On 09 Dec 2015, at 10:39, Michał Marczykwrote: > > Hi, > > I am pleased to announce the 0.0.13 release of data.avl, a Clojure > Contrib library providing highly performant drop-in replacements for > Clojure(Script)'s built-in sorted maps and sets with the following > headline features: > > 0. performance often superior to the built-ins (owing to the smaller > height of AVL trees), see end of this email for some benchmarks > > 1. logarithmic-time slicing and splits: > > (avl/split-key 3 (avl/sorted-set 0 1 2 3 4 5)) > ;= [#{0 1 2} 3 #{4 5 6}] > > (avl/split-at 2 (avl/sorted-set 0 1 2 3 4 5)) > ;= [#{0 1} #{2 3 4 5}] > > (avl/subrange (avl/sorted-set 0 1 2 3 4 5) >= 2 < 5) > ;= #{2 3 4} > > All returned collections are independent from the originals for GC > purposes (no holding on to out-of-range keys present in the > original collections), all operations run in O(log n) time in > asymptotic terms, and actually fast in real-world terms. > > 2. logarithmic-time access by rank and rank queries: > > (nth (avl/sorted-set 0 1 2) 1) > ;= 1 > > (avl/rank-of (avl/sorted-map-by > 0 0 1 1 2 2) 0) > ;= 2 > > 3. nearest-neighbour lookups: > > (avl/nearest (avl/sorted-set 0 1 2) < 1) > ;= 0 > > The complete clojure.core sorted collections API is also supported, as > is the transient API. See the README for more detailed examples and > feature descriptions. > > Fixed in this release: > > 1. IMeta and IFn implementations now use correct method names in the > ClojureScript version. > > 2. sorted-map/sorted-map-by now throw exceptions if no value is > provided for the final key (in other words, if an odd number of > arguments is passed in) – patch by Andy Fingerhut, thanks! > > Changed in this release: > > 1. Seqs over data.avl maps now use node objects themselves as map > entries. This is in line with the behaviour of Clojure's built-in > sorted maps. Previously seqs over data.avl maps used > freshly-allocated map entry objects; this had the benefit of not > holding on to subtrees in some scenarios, however not without some > performance and GC pressure cost to regular traversals. Note that > the new behaviour allows the user to rewrap key/value pairs as map > entries if they so choose in a separate map step. > > 2. Because of 1., AVL nodes now implement vector and map entry > interfaces. > > Cheers, > Michał > > > ;;; A handful of benchmarks > > ; CIDER 0.9.1 (Java 1.8.0_45-internal, Clojure 1.8.0-RC2, nREPL 0.2.10) > > user> (def ks (range 1)) > #'user/ks > user> (def ksks (doall (interleave ks ks))) > #'user/ksks > user> (def m1 (apply sorted-map ksks)) > #'user/m1 > user> (def m2 (apply avl/sorted-map ksks)) > #'user/m2 > user> (.tree m1) > [4095 4095] > user> (.getTree m2) > [4095 4095] > user> (let [] > (prn :> 0) > (c/quick-bench (get m1 0)) > (c/quick-bench (get m2 0)) > (prn :> 4095) > (c/quick-bench (get m1 4095)) > (c/quick-bench (get m2 4095)) > (prn :> ) > (c/quick-bench (get m1 )) > (c/quick-bench (get m2 ))) > :> 0 > WARNING: Final GC required 9.525315094951035 % of runtime > WARNING: Final GC required 88.3127760569387 % of runtime > Evaluation count : 2280474 in 6 samples of 380079 calls. > Execution time mean : 262.138936 ns > Execution time std-deviation : 56.938518 ns >Execution time lower quantile : 237.779848 ns ( 2.5%) >Execution time upper quantile : 360.756510 ns (97.5%) >Overhead used : 20.503990 ns > > Found 1 outliers in 6 samples (16.6667 %) > low-severe 1 (16.6667 %) > Variance from outliers : 64.2134 % Variance is severely inflated by outliers > WARNING: Final GC required 78.56747149813818 % of runtime > Evaluation count : 2280498 in 6 samples of 380083 calls. > Execution time mean : 261.626921 ns > Execution time std-deviation : 42.454179 ns >Execution time lower quantile : 241.444705 ns ( 2.5%) >Execution time upper quantile : 335.120524 ns (97.5%) >Overhead used : 20.503990 ns > > Found 1 outliers in 6 samples (16.6667 %) > low-severe 1 (16.6667 %) > Variance from outliers : 47.5275 % Variance is moderately inflated by > outliers > :> 4095 > WARNING: Final GC required 168.0614206986717 % of runtime > Evaluation count : 1056 in 6 samples of 176 calls. > Execution time mean : 25.939625 ns > Execution time std-deviation : 4.135726 ns >Execution time lower quantile : 22.648015 ns ( 2.5%) >Execution time upper quantile : 32.134865 ns (97.5%) >Overhead used : 20.503990 ns > WARNING: Final GC required 293.8046791844393 % of runtime > Evaluation count : 1056 in 6 samples of 176 calls. >
Re: [ANN] data.avl 0.0.13 – fast sorted collections with O(log n) slices and nth
Just noticed that I forgot to include dependency info and links… Here they are: https://github.com/clojure/data.avl [org.clojure/data.avl "0.0.13"] org.clojure data.avl 0.0.13 compile "org.clojure:data.avl:0.0.13" Cheers, Michał On 9 December 2015 at 10:39, Michał Marczykwrote: > Hi, > > I am pleased to announce the 0.0.13 release of data.avl, a Clojure > Contrib library providing highly performant drop-in replacements for > Clojure(Script)'s built-in sorted maps and sets with the following > headline features: > > 0. performance often superior to the built-ins (owing to the smaller > height of AVL trees), see end of this email for some benchmarks > > 1. logarithmic-time slicing and splits: > > (avl/split-key 3 (avl/sorted-set 0 1 2 3 4 5)) > ;= [#{0 1 2} 3 #{4 5 6}] > > (avl/split-at 2 (avl/sorted-set 0 1 2 3 4 5)) > ;= [#{0 1} #{2 3 4 5}] > > (avl/subrange (avl/sorted-set 0 1 2 3 4 5) >= 2 < 5) > ;= #{2 3 4} > > All returned collections are independent from the originals for GC > purposes (no holding on to out-of-range keys present in the > original collections), all operations run in O(log n) time in > asymptotic terms, and actually fast in real-world terms. > > 2. logarithmic-time access by rank and rank queries: > > (nth (avl/sorted-set 0 1 2) 1) > ;= 1 > > (avl/rank-of (avl/sorted-map-by > 0 0 1 1 2 2) 0) > ;= 2 > > 3. nearest-neighbour lookups: > > (avl/nearest (avl/sorted-set 0 1 2) < 1) > ;= 0 > > The complete clojure.core sorted collections API is also supported, as > is the transient API. See the README for more detailed examples and > feature descriptions. > > Fixed in this release: > > 1. IMeta and IFn implementations now use correct method names in the > ClojureScript version. > > 2. sorted-map/sorted-map-by now throw exceptions if no value is > provided for the final key (in other words, if an odd number of > arguments is passed in) – patch by Andy Fingerhut, thanks! > > Changed in this release: > > 1. Seqs over data.avl maps now use node objects themselves as map > entries. This is in line with the behaviour of Clojure's built-in > sorted maps. Previously seqs over data.avl maps used > freshly-allocated map entry objects; this had the benefit of not > holding on to subtrees in some scenarios, however not without some > performance and GC pressure cost to regular traversals. Note that > the new behaviour allows the user to rewrap key/value pairs as map > entries if they so choose in a separate map step. > > 2. Because of 1., AVL nodes now implement vector and map entry > interfaces. > > Cheers, > Michał > > > ;;; A handful of benchmarks > > ; CIDER 0.9.1 (Java 1.8.0_45-internal, Clojure 1.8.0-RC2, nREPL 0.2.10) > > user> (def ks (range 1)) > #'user/ks > user> (def ksks (doall (interleave ks ks))) > #'user/ksks > user> (def m1 (apply sorted-map ksks)) > #'user/m1 > user> (def m2 (apply avl/sorted-map ksks)) > #'user/m2 > user> (.tree m1) > [4095 4095] > user> (.getTree m2) > [4095 4095] > user> (let [] > (prn :> 0) > (c/quick-bench (get m1 0)) > (c/quick-bench (get m2 0)) > (prn :> 4095) > (c/quick-bench (get m1 4095)) > (c/quick-bench (get m2 4095)) > (prn :> ) > (c/quick-bench (get m1 )) > (c/quick-bench (get m2 ))) > :> 0 > WARNING: Final GC required 9.525315094951035 % of runtime > WARNING: Final GC required 88.3127760569387 % of runtime > Evaluation count : 2280474 in 6 samples of 380079 calls. > Execution time mean : 262.138936 ns > Execution time std-deviation : 56.938518 ns >Execution time lower quantile : 237.779848 ns ( 2.5%) >Execution time upper quantile : 360.756510 ns (97.5%) >Overhead used : 20.503990 ns > > Found 1 outliers in 6 samples (16.6667 %) > low-severe 1 (16.6667 %) > Variance from outliers : 64.2134 % Variance is severely inflated by > outliers > WARNING: Final GC required 78.56747149813818 % of runtime > Evaluation count : 2280498 in 6 samples of 380083 calls. > Execution time mean : 261.626921 ns > Execution time std-deviation : 42.454179 ns >Execution time lower quantile : 241.444705 ns ( 2.5%) >Execution time upper quantile : 335.120524 ns (97.5%) >Overhead used : 20.503990 ns > > Found 1 outliers in 6 samples (16.6667 %) > low-severe 1 (16.6667 %) > Variance from outliers : 47.5275 % Variance is moderately inflated by > outliers > :> 4095 > WARNING: Final GC required 168.0614206986717 % of runtime > Evaluation count : 1056 in 6 samples of 176 calls. > Execution time mean : 25.939625 ns > Execution time std-deviation : 4.135726 ns >Execution time lower quantile : 22.648015 ns ( 2.5%) >Execution time upper quantile : 32.134865 ns (97.5%)