I agree with Joel. Profiling indicated that 69.8% of the total kneighbors time is spent in _find_matching_indices and 22.9% in _compute_distances, so I'll give priority to working on _find_matching_indices with the method you suggested.
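(For anyone who wants to reproduce this kind of breakdown, a minimal cProfile sketch follows; the data sizes and parameters here are illustrative only, not the exact settings behind the numbers above.)

import cProfile
import pstats

import numpy as np
from sklearn.neighbors import LSHForest  # available in scikit-learn at the time of this thread

# Illustrative sizes only; the quoted figures came from a larger run
# (100000 samples, 100 features, 1000 queries).
rng = np.random.RandomState(42)
X = rng.rand(10000, 100)
queries = rng.rand(100, 100)

lshf = LSHForest(n_estimators=15, n_candidates=100, random_state=42).fit(X)

profiler = cProfile.Profile()
profiler.enable()
lshf.kneighbors(queries, n_neighbors=10)
profiler.disable()

# Shows how time splits across _find_matching_indices, _compute_distances, etc.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(15)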
On Sun, Aug 2, 2015 at 10:51 AM, Maheshakya Wijewardena <pmaheshak...@gmail.com> wrote:

> Hi Joel,
>
> I was on vacation during the past 3 days. I'll look into this asap and let you all know.
>
> I also did some profiling, but only of the `pairwise_distances` method. The brute force technique uses it directly on the entire query array, whereas LSH calls it in a loop, and I noticed a huge lag there. I'll first confirm your claims. I can start working on this, but I think I'll need reviewing from you or some other contributors as well. I'll do this if it's possible.
>
> On Sun, Aug 2, 2015 at 3:50 AM, Joel Nothman <joel.noth...@gmail.com> wrote:
>
>> @Maheshakya, will you be able to do work in the near future on speeding up the ascending phase instead? Or should another contributor take that baton? Not only does it seem to be a major contributor to runtime, but it is independent of metric and hashing mechanism (within binary hashes), and hence the most fundamental component of LSHForest.
>>
>> On 30 July 2015 at 22:28, Joel Nothman <joel.noth...@gmail.com> wrote:
>>
>>> (Sorry, I should have said the first b layers, not 2**b layers, producing a memoization of 2**b offsets.)
>>>
>>> On 30 July 2015 at 22:22, Joel Nothman <joel.noth...@gmail.com> wrote:
>>>
>>>> One approach to fixing the ascending phase would be to ensure that _find_matching_indices only searches over parts of the tree that have not yet been explored; currently it searches over the entire index at each depth.
>>>>
>>>> My preferred, but more experimental, solution is to memoize where the first 2**b layers of the tree begin and end in the index, for small b. So if our index stored:
>>>>
>>>> [[0, 0, 0, 1, 1, 0, 0, 0],
>>>>  [0, 0, 1, 0, 1, 0, 1, 0],
>>>>  [0, 0, 1, 0, 1, 0, 1, 0],
>>>>  [0, 1, 0, 0, 0, 0, 0, 0],
>>>>  [0, 1, 0, 1, 1, 0, 0, 0],
>>>>  [0, 1, 1, 0, 0, 1, 1, 0],
>>>>  [1, 0, 0, 0, 0, 1, 0, 1],
>>>>  [1, 0, 0, 1, 0, 1, 0, 1],
>>>>  [1, 1, 0, 0, 0, 0, 0, 0],
>>>>  [1, 1, 1, 1, 1, 0, 0, 0]]
>>>>
>>>> and b=2, we'd memoize offsets for prefixes of size 2:
>>>>
>>>> [0,  # 00
>>>>  3,  # 01
>>>>  6,  # 10
>>>>  8]  # 11
>>>>
>>>> Given a query like [0, 1, 1, 0, 0, 0, 0, 0], it's easy to shift down to leave the first b bits [0, 1] remaining, and look them up in the array just defined to identify a much narrower search space, [3, 6), matching that prefix in the overall index.
>>>>
>>>> Indeed, given the min_hash_match constraint, not having this sort of thing for b >= min_hash_match seems wasteful.
>>>>
>>>> This provides us O(1) access to the top layers of the tree when ascending, and makes the searchsorted calls run in log(n / (2**b)) time rather than log(n). It is also much more like traditional LSH. However, it complicates the code, as we now have to consider two strategies for descent/ascent.
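To make the proposal above concrete, here is a rough NumPy sketch of the prefix-offset memoization; memoize_prefix_offsets and prefix_range are illustrative names, not existing LSHForest internals. It reproduces the [0, 3, 6, 8] offsets and the [3, 6) range for the example query.

import numpy as np

# `index` is assumed to be the lexicographically sorted array of binary
# hashes from the example above.
index = np.array([[0, 0, 0, 1, 1, 0, 0, 0],
                  [0, 0, 1, 0, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0, 1, 0],
                  [0, 1, 0, 0, 0, 0, 0, 0],
                  [0, 1, 0, 1, 1, 0, 0, 0],
                  [0, 1, 1, 0, 0, 1, 1, 0],
                  [1, 0, 0, 0, 0, 1, 0, 1],
                  [1, 0, 0, 1, 0, 1, 0, 1],
                  [1, 1, 0, 0, 0, 0, 0, 0],
                  [1, 1, 1, 1, 1, 0, 0, 0]])

def memoize_prefix_offsets(index, b):
    # Integer value of the first b bits of every hash; the index is sorted,
    # so searchsorted gives the position where each prefix block starts.
    prefixes = index[:, :b].dot(1 << np.arange(b - 1, -1, -1))
    return np.searchsorted(prefixes, np.arange(2 ** b), side='left')

def prefix_range(query, b, offsets, n_samples):
    # Narrow the search space to the block sharing the query's first b bits.
    p = query[:b].dot(1 << np.arange(b - 1, -1, -1))
    stop = offsets[p + 1] if p + 1 < len(offsets) else n_samples
    return int(offsets[p]), int(stop)

offsets = memoize_prefix_offsets(index, b=2)
print(offsets)                                      # [0 3 6 8] -> blocks 00, 01, 10, 11
query = np.array([0, 1, 1, 0, 0, 0, 0, 0])
print(prefix_range(query, 2, offsets, len(index)))  # (3, 6)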
>>>> On 30 July 2015 at 21:46, Joel Nothman <joel.noth...@gmail.com> wrote:
>>>>
>>>>> What makes you think this is the main bottleneck? While it is not an insignificant consumer of time, I really doubt this is what's making scikit-learn's LSH implementation severely underperform with respect to other implementations.
>>>>>
>>>>> We need to profile. In order to do that, we need some sensible parameters that users might actually want, e.g. number of features for the {dense, sparse} cases, index size, and target 10NN precision and recall (selecting corresponding n_estimators and n_candidates). Ideally we'd consider real-world datasets. And of course, these should be sensible for whichever metric we're operating over, and for whether we're doing kNN or radius searches.
>>>>>
>>>>> I don't know if it's realistic, but I've profiled the following bench_plot_approximate_neighbors settings:
>>>>>
>>>>> Building NearestNeighbors for 100000 samples in 100 dimensions
>>>>> LSHF parameters: n_estimators = 15, n_candidates = 100
>>>>> Building LSHForest for 100000 samples in 100 dimensions
>>>>> Done in 1.492s
>>>>> Average time for lshf neighbor queries: 0.005s
>>>>> Average time for exact neighbor queries: 0.002s
>>>>> Average Accuracy : 0.88
>>>>> Speed up: 0.5x
>>>>>
>>>>> Of the 4.77s total time spent in LSHForest.kneighbors for a 1000-query matrix, we have:
>>>>>
>>>>> - 0.03s spent in _query (hashing and descending)
>>>>> - 0.91s spent in _compute_distances (exact distance calculation)
>>>>> - 3.80s remaining in _get_candidates (ascending phase), almost all of which is spent in _find_matching_indices
>>>>>
>>>>> Cutting the exact distance calculation to zero would still not make this faster than the exact approach. Of course, your mileage may vary, but this suggests to me you're barking up the wrong tree (no pun intended).
>>>>>
>>>>> On 30 July 2015 at 19:43, Maheshakya Wijewardena <pmaheshak...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've started to look into improving the performance of LSHForest. As we discussed some time ago (quite a long time ago, in fact), the main concern is to Cythonize the distance calculations. Currently, this is done by iterating over all the query vectors in a Python loop when the `kneighbors` method is called for a set of query vectors, and that per-query Python loop has been identified as a huge overhead. I have implemented a few Cython hacks to demonstrate the distance calculation in LSHForest, and I was able to get roughly a 10x speedup over the current distance calculation with a Python loop. However, I came across some blockers while trying to do this and need some clarifications.
>>>>>>
>>>>>> What I need to know is: do we use a mechanism to release the GIL when we want to parallelize? One of my observations is that `pairwise_distances` uses all the cores even when I don't specify the `n_jobs` parameter, which defaults to 1. Is this expected behavior?
>>>>>>
>>>>>> If I want to release the GIL, can I use the OpenMP module in Cython? Or is that a task for joblib?
>>>>>>
>>>>>> Any input on this is highly appreciated.
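On the OpenMP question, the sketch below is a hypothetical Cython fragment (not scikit-learn code; the file and function names are made up, and it must be compiled with OpenMP flags) showing how a per-query distance loop can release the GIL with cython.parallel.prange:

# distance_loop.pyx -- hypothetical sketch, not part of scikit-learn.
# Build with OpenMP enabled, e.g. extra_compile_args=['-fopenmp'] and
# extra_link_args=['-fopenmp'] in setup.py.
cimport cython
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
def squared_euclidean(double[:, ::1] queries, double[:, ::1] candidates,
                      double[:, ::1] out):
    """Fill out[i, j] (preallocated, shape (n_queries, n_candidates)) with
    the squared distance from query i to candidate j."""
    cdef Py_ssize_t n_queries = queries.shape[0]
    cdef Py_ssize_t n_candidates = candidates.shape[0]
    cdef Py_ssize_t n_features = queries.shape[1]
    cdef Py_ssize_t i, j, k
    cdef double acc, diff

    # prange releases the GIL (nogil=True) and spreads the outer loop over
    # OpenMP threads; typed memoryview access is legal without the GIL.
    for i in prange(n_queries, nogil=True, schedule='static'):
        for j in range(n_candidates):
            acc = 0.0
            for k in range(n_features):
                diff = queries[i, k] - candidates[j, k]
                acc = acc + diff * diff  # plain assignment keeps acc thread-private
            out[i, j] = acc

For what it's worth, joblib parallelizes whole Python calls across processes or threads, so it sits at a different level than a nogil inner loop like this.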
--
Maheshakya Wijewardena
Undergraduate, Department of Computer Science and Engineering
Faculty of Engineering, University of Moratuwa, Sri Lanka