LSHForest is not intended for dimensions at which exact methods work well,
nor for tiny datasets. Try d > 500 and n_points > 100000; I don't remember
the exact switchover point.
The documentation should make this clear, but unfortunately I don't see
that it does.
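
To illustrate why there is a switchover at all, here is a rough toy sketch.
It is not how LSHForest is implemented in scikit-learn: it is a plain-Python
random-hyperplane LSH with flat hash tables, and every name in it
(make_hyperplanes, build_table, lsh_nn, ...) is made up for this example.
The point is only that an approximate query pays a fixed hashing cost up
front and then compares against a handful of candidates, whereas the exact
query compares against everything. That trade can only pay off once the
index is large and exact methods have become expensive.

```python
import math
import random


def make_hyperplanes(n_bits, dim, rng):
    """Draw n_bits random Gaussian hyperplanes (hypothetical helper)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_point(point, hyperplanes):
    """One bit per hyperplane: which side of it the point falls on."""
    return tuple(
        sum(x * w for x, w in zip(point, plane)) >= 0.0 for plane in hyperplanes
    )


def build_table(points, hyperplanes):
    """Group point indices by their hash code."""
    buckets = {}
    for i, point in enumerate(points):
        buckets.setdefault(hash_point(point, hyperplanes), []).append(i)
    return buckets


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_nn(query, points):
    """Exact search: compare the query against every indexed point."""
    return min(range(len(points)), key=lambda i: cosine_distance(query, points[i]))


def lsh_nn(query, points, tables):
    """Approximate search: only compare against points sharing a bucket."""
    candidates = set()
    for hyperplanes, buckets in tables:
        candidates.update(buckets.get(hash_point(query, hyperplanes), []))
    if not candidates:
        return None, 0  # no bucket matched; a real index would back off
    best = min(candidates, key=lambda i: cosine_distance(query, points[i]))
    return best, len(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy sizes so this runs quickly in pure Python.
    dim, n_points, n_bits, n_tables = 32, 500, 8, 3
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    tables = []
    for _ in range(n_tables):
        planes = make_hyperplanes(n_bits, dim, rng)
        tables.append((planes, build_table(points, planes)))
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    approx, n_checked = lsh_nn(query, points, tables)
    print("exact:", brute_force_nn(query, points))
    print("approx:", approx, "after checking", n_checked, "of", n_points)
```

At small sizes like these the hashing overhead dominates, so nothing useful
can be read off the timings; the sketch only shows where the work goes.
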
On Apr 15, 2015 7:08 PM, "Joel Nothman" <joel.noth...@gmail.com> wrote:
> I agree this is disappointing, and we need to work on making LSHForest
> faster. Portions should probably be coded in Cython, for instance, as the
> current implementation is a bit circuitous in order to work in numpy. PRs
> are welcome.
>
> LSHForest could use parallelism to be faster, but so can (and will) the
> exact neighbors methods. In theory in LSHForest, each "tree" could be
> stored on entirely different machines, providing memory benefits, but
> scikit-learn can't really take advantage of this.
>
> Having said that, I would also try with higher n_features and n_queries.
> We have to limit the scale of our examples in order to limit the overall
> document compilation time.
>
> On 16 April 2015 at 01:12, Miroslav Batchkarov <mbatchka...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I was really impressed by the speedups provided by LSHForest compared to
>> brute-force search. Out of curiosity, I compared LSHForest to the existing
>> ball tree implementation. The approximate algorithm is consistently slower
>> (see below). Is this normal and should it be mentioned in the
>> documentation? Does approximate search offer any benefits in terms of
>> memory usage?
>>
>>
>> I ran the same example
>> <http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
>> with algorithm='ball_tree'. I also had to set metric='euclidean' (this may
>> affect results). The output is:
>>
>> Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
>> Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
>> Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
>> Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
>> Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
>> Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
>>
>> With n_candidates=100, the output is
>>
>> Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
>> Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
>> Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
>> Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
>> Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
>> Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
>>
>>
>>
>> ---
>> Miroslav Batchkarov
>> PhD Student,
>> Text Analysis Group,
>> Department of Informatics,
>> University of Sussex
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
>> Develop your own process in accordance with the BPMN 2 standard
>> Learn Process modeling best practices with Bonita BPM through live
>> exercises
>> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
>> event?utm_
>> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>