Moreover, this drawback occurs because LSHForest does not vectorize
multiple queries the way 'ball_tree' and the other methods do. This
significantly slows down the exact neighbor distance calculation that
follows the approximation step.
This is not a problem if queries are made for individual points.
Unfortunately, the former (multiple queries) is the more useful way to
use LSHForest.
Are you running individual queries or multiple queries (for n_samples)?
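
As a rough illustration of the difference (a sketch in plain NumPy, not
the actual LSHForest or ball_tree code; the array sizes and the function
names knn_batched and knn_one_by_one are made up for this example), the
following compares computing exact distances for all queries in one
vectorized call against looping over the queries one at a time:

```python
import time

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(2000, 50)   # indexed points (sizes chosen arbitrarily)
Q = rng.randn(100, 50)    # a batch of query points
k = 5                     # number of neighbors to return


def knn_batched(X, Q, k):
    """Exact k nearest neighbors for all queries in one vectorized step."""
    # Squared Euclidean distances via ||q||^2 - 2 q.x + ||x||^2.
    d2 = ((Q ** 2).sum(axis=1)[:, None]
          - 2.0 * Q @ X.T
          + (X ** 2).sum(axis=1)[None, :])
    return np.argsort(d2, axis=1)[:, :k]


def knn_one_by_one(X, Q, k):
    """Same result, but with one Python-level iteration per query."""
    out = []
    for q in Q:
        d2 = ((X - q) ** 2).sum(axis=1)
        out.append(np.argsort(d2)[:k])
    return np.array(out)


for name, fn in [("batched", knn_batched), ("one by one", knn_one_by_one)]:
    start = time.perf_counter()
    fn(X, Q, k)
    print("%s: %.4fs" % (name, time.perf_counter() - start))
```

Both functions return the same neighbors; they differ only in whether
the per-query work happens inside NumPy or in a Python loop.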

On Thu, Apr 16, 2015 at 6:14 AM, Daniel Vainsencher <
daniel.vainsenc...@gmail.com> wrote:

> LSHForest is not intended for dimensions at which exact methods work well,
> nor for tiny datasets. Try d>500, n_points>100000, I don't remember the
> switchover point.
>
> The documentation should make this clear, but unfortunately I don't see
> that it does.
> On Apr 15, 2015 7:08 PM, "Joel Nothman" <joel.noth...@gmail.com> wrote:
>
>> I agree this is disappointing, and we need to work on making LSHForest
>> faster. Portions should probably be coded in Cython, for instance, as the
>> current implementation is a bit circuitous in order to work in numpy. PRs
>> are welcome.
>>
>> LSHForest could use parallelism to be faster, but so can (and will) the
>> exact neighbors methods. In theory in LSHForest, each "tree" could be
>> stored on entirely different machines, providing memory benefits, but
>> scikit-learn can't really take advantage of this.
>>
>> Having said that, I would also try with higher n_features and n_queries.
>> We have to limit the scale of our examples in order to limit the overall
>> document compilation time.
>>
>> On 16 April 2015 at 01:12, Miroslav Batchkarov <mbatchka...@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I was really impressed by the speedups provided by LSHForest compared to
>>> brute-force search. Out of curiosity, I compared LSHForest to the existing
>>> ball tree implementation. The approximate algorithm is consistently slower
>>> (see below). Is this normal and should it be mentioned in the
>>> documentation? Does approximate search offer any benefits in terms of
>>> memory usage?
>>>
>>>
>>> I ran the same example
>>> <http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
>>> with algorithm='ball_tree'. I also had to set metric='euclidean' (this may
>>> affect results). The output is:
>>>
>>> Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
>>> Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
>>> Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
>>> Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
>>> Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
>>> Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
>>>
>>> With n_candidates=100, the output is
>>>
>>> Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
>>> Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
>>> Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
>>> Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
>>> Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
>>> Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
>>>
>>>
>>>
>>> ---
>>> Miroslav Batchkarov
>>> PhD Student,
>>> Text Analysis Group,
>>> Department of Informatics,
>>> University of Sussex
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
>>> Develop your own process in accordance with the BPMN 2 standard
>>> Learn Process modeling best practices with Bonita BPM through live
>>> exercises
>>> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
>>> event?utm_
>>> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>


-- 

*Maheshakya Wijewardena,Undergraduate,*
*Department of Computer Science and Engineering,*
*Faculty of Engineering.*
*University of Moratuwa,*
*Sri Lanka*