Hi everyone,

I was really impressed by the speedups provided by LSHForest compared to 
brute-force search. Out of curiosity, I compared LSHForest to the existing ball 
tree implementation. The approximate algorithm is consistently slower (see 
below). Is this normal, and should it be mentioned in the documentation? Does 
approximate search offer any benefits in terms of memory usage?


I ran the same example 
<http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
with algorithm='ball_tree' for the exact search. I also had to set 
metric='euclidean' (this may affect results). The output is:

Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06

With n_candidates=100, the output is:

Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
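
To make the speedup column easier to read: as far as I can tell it is the exact query time divided by the LSHF query time, so any value below 1 means LSHForest was the slower of the two. Here is a minimal sketch that recomputes that ratio from a few lines of the first run. The helper name `speedup_rows` and the regular expression are mine, not part of the example script, and because the printed times are rounded to the millisecond the recomputed ratios will not always match the printed column exactly.

```python
import re

# Three lines copied from the first run (times are as printed, i.e. rounded).
lines = [
    "Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00",
    "Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10",
    "Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06",
]

# Hypothetical pattern for the three fields needed here.
PATTERN = re.compile(r"Index size: (\d+), exact: ([\d.]+)s, LSHF: ([\d.]+)s")


def speedup_rows(output_lines):
    """Return (index_size, exact_time / lshf_time) for each matching line."""
    rows = []
    for line in output_lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # skip anything that is not a benchmark line
        size = int(match.group(1))
        exact = float(match.group(2))
        lshf = float(match.group(3))
        rows.append((size, exact / lshf))
    return rows


rows = speedup_rows(lines)
for size, ratio in rows:
    verdict = "LSHF slower" if ratio < 1 else "LSHF faster or equal"
    print(f"{size:>7}: speedup {ratio:.2f} ({verdict})")
```

Every ratio stays below 1 for these lines, which is the "consistently slower" observation stated at the top of this message.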



---
Miroslav Batchkarov
PhD Student,
Text Analysis Group,
Department of Informatics,
University of Sussex



_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
