Hi everyone,
I was really impressed by the speedups provided by LSHForest compared to
brute-force search. Out of curiosity, I compared LSHForest to the existing ball
tree implementation. The approximate algorithm is consistently slower at every
index size tested (see below). Is this normal, and should it be mentioned in
the documentation? Does approximate search offer any benefits in terms of
memory usage?
I ran the same example
<http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
with algorithm='ball_tree'. I also had to set metric='euclidean' (this may
affect results). The output is:
Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
With n_candidates=100, the output is:
Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
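
For anyone reading the "speedup" column: as far as I can tell it is the exact
query time divided by the LSHF query time, so values below 1.0 mean LSHForest
is the slower of the two. That reading is my assumption, but it matches the
figures above. A minimal sketch that recomputes it from two of the printed
lines (the helper names parse_timings and speedup are my own, not part of
scikit-learn, and the printed times are already rounded to 3 decimals):

```python
import re

# Two lines copied from the first run above (default n_candidates).
OUTPUT = """\
Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
"""

# Captures the index size and the two timings (in seconds) from one line.
LINE_RE = re.compile(r"Index size: (\d+), exact: ([\d.]+)s, LSHF: ([\d.]+)s")


def parse_timings(text):
    """Return a list of (index_size, exact_seconds, lshf_seconds) tuples."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in LINE_RE.finditer(text)
    ]


def speedup(exact_s, lshf_s):
    """Assumed definition: exact time / approximate time (< 1 means slower)."""
    return exact_s / lshf_s


for size, exact_s, lshf_s in parse_timings(OUTPUT):
    print("size=%d speedup=%.1f" % (size, speedup(exact_s, lshf_s)))
```

On these two lines the ratio is 0.5, i.e. the ball tree answers queries about
twice as fast as LSHForest at those index sizes.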
---
Miroslav Batchkarov
PhD Student,
Text Analysis Group,
Department of Informatics,
University of Sussex