[ANN] data.avl 0.0.13 – fast sorted collections with O(log n) slices and nth

2015-12-09 Thread Michał Marczyk
Hi,

I am pleased to announce the 0.0.13 release of data.avl, a Clojure
Contrib library providing highly performant drop-in replacements for
Clojure(Script)'s built-in sorted maps and sets with the following
headline features:

 0. performance often superior to the built-ins (owing to the smaller
height of AVL trees), see end of this email for some benchmarks

 1. logarithmic-time slicing and splits:

(avl/split-key 3 (avl/sorted-set 0 1 2 3 4 5))
;= [#{0 1 2} 3 #{4 5 6}]

(avl/split-at 2 (avl/sorted-set 0 1 2 3 4 5))
;= [#{0 1} #{2 3 4 5}]

(avl/subrange (avl/sorted-set 0 1 2 3 4 5) >= 2 < 5)
;= #{2 3 4}

All returned collections are independent from the originals for GC
purposes (no holding on to out-of-range keys present in the
original collections), all operations run in O(log n) time in
asymptotic terms, and actually fast in real-world terms.

 2. logarithmic-time access by rank and rank queries:

(nth (avl/sorted-set 0 1 2) 1)
;= 1

(avl/rank-of (avl/sorted-map-by > 0 0 1 1 2 2) 0)
;= 2

 3. nearest-neighbour lookups:

(avl/nearest (avl/sorted-set 0 1 2) < 1)
;= 0

The complete clojure.core sorted collections API is also supported, as
is the transient API. See the README for more detailed examples and
feature descriptions.

Fixed in this release:

 1. IMeta and IFn implementations now use correct method names in the
ClojureScript version.

 2. sorted-map/sorted-map-by now throw exceptions if no value is
provided for the final key (in other words, if an odd number of
arguments is passed in) – patch by Andy Fingerhut, thanks!

Changed in this release:

 1. Seqs over data.avl maps now use node objects themselves as map
entries. This is in line with the behaviour of Clojure's built-in
sorted maps. Previously seqs over data.avl maps used
freshly-allocated map entry objects; this had the benefit of not
holding on to subtrees in some scenarios, however not without some
performance and GC pressure cost to regular traversals. Note that
the new behaviour allows the user to rewrap key/value pairs as map
entries if they so choose in a separate map step.

 2. Because of 1., AVL nodes now implement vector and map entry
interfaces.

Cheers,
Michał


;;; A handful of benchmarks

; CIDER 0.9.1 (Java 1.8.0_45-internal, Clojure 1.8.0-RC2, nREPL 0.2.10)

user> (def ks (range 1))
#'user/ks
user> (def ksks (doall (interleave ks ks)))
#'user/ksks
user> (def m1 (apply sorted-map ksks))
#'user/m1
user> (def m2 (apply avl/sorted-map ksks))
#'user/m2
user> (.tree m1)
[4095 4095]
user> (.getTree m2)
[4095 4095]
user> (let []
(prn :> 0)
(c/quick-bench (get m1 0))
(c/quick-bench (get m2 0))
(prn :> 4095)
(c/quick-bench (get m1 4095))
(c/quick-bench (get m2 4095))
(prn :> )
(c/quick-bench (get m1 ))
(c/quick-bench (get m2 )))
:> 0
WARNING: Final GC required 9.525315094951035 % of runtime
WARNING: Final GC required 88.3127760569387 % of runtime
Evaluation count : 2280474 in 6 samples of 380079 calls.
 Execution time mean : 262.138936 ns
Execution time std-deviation : 56.938518 ns
   Execution time lower quantile : 237.779848 ns ( 2.5%)
   Execution time upper quantile : 360.756510 ns (97.5%)
   Overhead used : 20.503990 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
 Variance from outliers : 64.2134 % Variance is severely inflated by
outliers
WARNING: Final GC required 78.56747149813818 % of runtime
Evaluation count : 2280498 in 6 samples of 380083 calls.
 Execution time mean : 261.626921 ns
Execution time std-deviation : 42.454179 ns
   Execution time lower quantile : 241.444705 ns ( 2.5%)
   Execution time upper quantile : 335.120524 ns (97.5%)
   Overhead used : 20.503990 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
 Variance from outliers : 47.5275 % Variance is moderately inflated by
outliers
:> 4095
WARNING: Final GC required 168.0614206986717 % of runtime
Evaluation count : 1056 in 6 samples of 176 calls.
 Execution time mean : 25.939625 ns
Execution time std-deviation : 4.135726 ns
   Execution time lower quantile : 22.648015 ns ( 2.5%)
   Execution time upper quantile : 32.134865 ns (97.5%)
   Overhead used : 20.503990 ns
WARNING: Final GC required 293.8046791844393 % of runtime
Evaluation count : 1056 in 6 samples of 176 calls.
 Execution time mean : 24.609377 ns
Execution time std-deviation : 1.015680 ns
   Execution time lower quantile : 23.720800 ns ( 2.5%)
   Execution time upper quantile : 26.283825 ns (97.5%)
   Overhead used : 20.503990 ns

Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by
outliers

Re: [ANN] data.avl 0.0.13 – fast sorted collections with O(log n) slices and nth

2015-12-09 Thread 'Jan-Paul Bultmann' via Clojure
Very awesome :D!

Cheers, Jan

> On 09 Dec 2015, at 10:39, Michał Marczyk  wrote:
> 
> Hi,
> 
> I am pleased to announce the 0.0.13 release of data.avl, a Clojure
> Contrib library providing highly performant drop-in replacements for
> Clojure(Script)'s built-in sorted maps and sets with the following
> headline features:
> 
>  0. performance often superior to the built-ins (owing to the smaller
> height of AVL trees), see end of this email for some benchmarks
> 
>  1. logarithmic-time slicing and splits:
> 
> (avl/split-key 3 (avl/sorted-set 0 1 2 3 4 5))
> ;= [#{0 1 2} 3 #{4 5 6}]
> 
> (avl/split-at 2 (avl/sorted-set 0 1 2 3 4 5))
> ;= [#{0 1} #{2 3 4 5}]
> 
> (avl/subrange (avl/sorted-set 0 1 2 3 4 5) >= 2 < 5)
> ;= #{2 3 4}
> 
> All returned collections are independent from the originals for GC
> purposes (no holding on to out-of-range keys present in the
> original collections), all operations run in O(log n) time in
> asymptotic terms, and actually fast in real-world terms.
> 
>  2. logarithmic-time access by rank and rank queries:
> 
> (nth (avl/sorted-set 0 1 2) 1)
> ;= 1
> 
> (avl/rank-of (avl/sorted-map-by > 0 0 1 1 2 2) 0)
> ;= 2
> 
>  3. nearest-neighbour lookups:
> 
> (avl/nearest (avl/sorted-set 0 1 2) < 1)
> ;= 0
> 
> The complete clojure.core sorted collections API is also supported, as
> is the transient API. See the README for more detailed examples and
> feature descriptions.
> 
> Fixed in this release:
> 
>  1. IMeta and IFn implementations now use correct method names in the
> ClojureScript version.
> 
>  2. sorted-map/sorted-map-by now throw exceptions if no value is
> provided for the final key (in other words, if an odd number of
> arguments is passed in) – patch by Andy Fingerhut, thanks!
> 
> Changed in this release:
> 
>  1. Seqs over data.avl maps now use node objects themselves as map
> entries. This is in line with the behaviour of Clojure's built-in
> sorted maps. Previously seqs over data.avl maps used
> freshly-allocated map entry objects; this had the benefit of not
> holding on to subtrees in some scenarios, however not without some
> performance and GC pressure cost to regular traversals. Note that
> the new behaviour allows the user to rewrap key/value pairs as map
> entries if they so choose in a separate map step.
> 
>  2. Because of 1., AVL nodes now implement vector and map entry
> interfaces.
> 
> Cheers,
> Michał
> 
> 
> ;;; A handful of benchmarks
> 
> ; CIDER 0.9.1 (Java 1.8.0_45-internal, Clojure 1.8.0-RC2, nREPL 0.2.10)
> 
> user> (def ks (range 1))
> #'user/ks
> user> (def ksks (doall (interleave ks ks)))
> #'user/ksks
> user> (def m1 (apply sorted-map ksks))
> #'user/m1
> user> (def m2 (apply avl/sorted-map ksks))
> #'user/m2
> user> (.tree m1)
> [4095 4095]
> user> (.getTree m2)
> [4095 4095]
> user> (let []
> (prn :> 0)
> (c/quick-bench (get m1 0))
> (c/quick-bench (get m2 0))
> (prn :> 4095)
> (c/quick-bench (get m1 4095))
> (c/quick-bench (get m2 4095))
> (prn :> )
> (c/quick-bench (get m1 ))
> (c/quick-bench (get m2 )))
> :> 0
> WARNING: Final GC required 9.525315094951035 % of runtime
> WARNING: Final GC required 88.3127760569387 % of runtime
> Evaluation count : 2280474 in 6 samples of 380079 calls.
>  Execution time mean : 262.138936 ns
> Execution time std-deviation : 56.938518 ns
>Execution time lower quantile : 237.779848 ns ( 2.5%)
>Execution time upper quantile : 360.756510 ns (97.5%)
>Overhead used : 20.503990 ns
> 
> Found 1 outliers in 6 samples (16.6667 %)
>   low-severe   1 (16.6667 %)
>  Variance from outliers : 64.2134 % Variance is severely inflated by outliers
> WARNING: Final GC required 78.56747149813818 % of runtime
> Evaluation count : 2280498 in 6 samples of 380083 calls.
>  Execution time mean : 261.626921 ns
> Execution time std-deviation : 42.454179 ns
>Execution time lower quantile : 241.444705 ns ( 2.5%)
>Execution time upper quantile : 335.120524 ns (97.5%)
>Overhead used : 20.503990 ns
> 
> Found 1 outliers in 6 samples (16.6667 %)
>   low-severe   1 (16.6667 %)
>  Variance from outliers : 47.5275 % Variance is moderately inflated by 
> outliers
> :> 4095
> WARNING: Final GC required 168.0614206986717 % of runtime
> Evaluation count : 1056 in 6 samples of 176 calls.
>  Execution time mean : 25.939625 ns
> Execution time std-deviation : 4.135726 ns
>Execution time lower quantile : 22.648015 ns ( 2.5%)
>Execution time upper quantile : 32.134865 ns (97.5%)
>Overhead used : 20.503990 ns
> WARNING: Final GC required 293.8046791844393 % of runtime
> Evaluation count : 1056 in 6 samples of 176 calls.
>

Re: [ANN] data.avl 0.0.13 – fast sorted collections with O(log n) slices and nth

2015-12-09 Thread Michał Marczyk
Just noticed that I forgot to include dependency info and links… Here they
are:

  https://github.com/clojure/data.avl

  [org.clojure/data.avl "0.0.13"]

  
org.clojure
data.avl
0.0.13
  

  compile "org.clojure:data.avl:0.0.13"

Cheers,
Michał


On 9 December 2015 at 10:39, Michał Marczyk 
wrote:

> Hi,
>
> I am pleased to announce the 0.0.13 release of data.avl, a Clojure
> Contrib library providing highly performant drop-in replacements for
> Clojure(Script)'s built-in sorted maps and sets with the following
> headline features:
>
>  0. performance often superior to the built-ins (owing to the smaller
> height of AVL trees), see end of this email for some benchmarks
>
>  1. logarithmic-time slicing and splits:
>
> (avl/split-key 3 (avl/sorted-set 0 1 2 3 4 5))
> ;= [#{0 1 2} 3 #{4 5 6}]
>
> (avl/split-at 2 (avl/sorted-set 0 1 2 3 4 5))
> ;= [#{0 1} #{2 3 4 5}]
>
> (avl/subrange (avl/sorted-set 0 1 2 3 4 5) >= 2 < 5)
> ;= #{2 3 4}
>
> All returned collections are independent from the originals for GC
> purposes (no holding on to out-of-range keys present in the
> original collections), all operations run in O(log n) time in
> asymptotic terms, and actually fast in real-world terms.
>
>  2. logarithmic-time access by rank and rank queries:
>
> (nth (avl/sorted-set 0 1 2) 1)
> ;= 1
>
> (avl/rank-of (avl/sorted-map-by > 0 0 1 1 2 2) 0)
> ;= 2
>
>  3. nearest-neighbour lookups:
>
> (avl/nearest (avl/sorted-set 0 1 2) < 1)
> ;= 0
>
> The complete clojure.core sorted collections API is also supported, as
> is the transient API. See the README for more detailed examples and
> feature descriptions.
>
> Fixed in this release:
>
>  1. IMeta and IFn implementations now use correct method names in the
> ClojureScript version.
>
>  2. sorted-map/sorted-map-by now throw exceptions if no value is
> provided for the final key (in other words, if an odd number of
> arguments is passed in) – patch by Andy Fingerhut, thanks!
>
> Changed in this release:
>
>  1. Seqs over data.avl maps now use node objects themselves as map
> entries. This is in line with the behaviour of Clojure's built-in
> sorted maps. Previously seqs over data.avl maps used
> freshly-allocated map entry objects; this had the benefit of not
> holding on to subtrees in some scenarios, however not without some
> performance and GC pressure cost to regular traversals. Note that
> the new behaviour allows the user to rewrap key/value pairs as map
> entries if they so choose in a separate map step.
>
>  2. Because of 1., AVL nodes now implement vector and map entry
> interfaces.
>
> Cheers,
> Michał
>
>
> ;;; A handful of benchmarks
>
> ; CIDER 0.9.1 (Java 1.8.0_45-internal, Clojure 1.8.0-RC2, nREPL 0.2.10)
>
> user> (def ks (range 1))
> #'user/ks
> user> (def ksks (doall (interleave ks ks)))
> #'user/ksks
> user> (def m1 (apply sorted-map ksks))
> #'user/m1
> user> (def m2 (apply avl/sorted-map ksks))
> #'user/m2
> user> (.tree m1)
> [4095 4095]
> user> (.getTree m2)
> [4095 4095]
> user> (let []
> (prn :> 0)
> (c/quick-bench (get m1 0))
> (c/quick-bench (get m2 0))
> (prn :> 4095)
> (c/quick-bench (get m1 4095))
> (c/quick-bench (get m2 4095))
> (prn :> )
> (c/quick-bench (get m1 ))
> (c/quick-bench (get m2 )))
> :> 0
> WARNING: Final GC required 9.525315094951035 % of runtime
> WARNING: Final GC required 88.3127760569387 % of runtime
> Evaluation count : 2280474 in 6 samples of 380079 calls.
>  Execution time mean : 262.138936 ns
> Execution time std-deviation : 56.938518 ns
>Execution time lower quantile : 237.779848 ns ( 2.5%)
>Execution time upper quantile : 360.756510 ns (97.5%)
>Overhead used : 20.503990 ns
>
> Found 1 outliers in 6 samples (16.6667 %)
> low-severe 1 (16.6667 %)
>  Variance from outliers : 64.2134 % Variance is severely inflated by
> outliers
> WARNING: Final GC required 78.56747149813818 % of runtime
> Evaluation count : 2280498 in 6 samples of 380083 calls.
>  Execution time mean : 261.626921 ns
> Execution time std-deviation : 42.454179 ns
>Execution time lower quantile : 241.444705 ns ( 2.5%)
>Execution time upper quantile : 335.120524 ns (97.5%)
>Overhead used : 20.503990 ns
>
> Found 1 outliers in 6 samples (16.6667 %)
> low-severe 1 (16.6667 %)
>  Variance from outliers : 47.5275 % Variance is moderately inflated by
> outliers
> :> 4095
> WARNING: Final GC required 168.0614206986717 % of runtime
> Evaluation count : 1056 in 6 samples of 176 calls.
>  Execution time mean : 25.939625 ns
> Execution time std-deviation : 4.135726 ns
>Execution time lower quantile : 22.648015 ns ( 2.5%)
>Execution time upper quantile : 32.134865 ns (97.5%)