It's nice to see some decent speed-up factors, though the accuracy tradeoff
is still not so great. Still, I'd like to see the code and where we can go
from here. Great work so far!
On 7 August 2015 at 07:50, Maheshakya Wijewardena
wrote:
> I did a rough implementation, setting b = min_hash_match
I did a rough implementation, setting b = min_hash_match. The result I got
from running the benchmark is attached. It was able to roughly triple the
speed-up of kneighbors function for large index sizes. However, this
implementation adds some overhead to fitting time as there are 2**b *
n_estimator
Thanks, I look forward to this being improved, while I have little
availability to help myself atm.
On 2 August 2015 at 22:58, Maheshakya Wijewardena
wrote:
> I agree with Joel. Profiling indicated that 69.8% of total time of
> kneighbors is spent on _find_matching_indices and 22.9% is spent on
I agree with Joel. Profiling indicated that 69.8% of total time of
kneighbors is spent on _find_matching_indices and 22.9% is spent on
_compute_distances. So I'll give priority to work on _find_matching_indices
with the method you suggested.
On Sun, Aug 2, 2015 at 10:51 AM, Maheshakya Wijewardena
Hi Joel,
I was on vacation during past 3 days. I''ll look into this asap and let you
all know.
I also did some profiling, but only with the usage of `pairwise_distance`
method. Brute force technique directly uses that for the entire query
array, but LSH uses that in a loop and I noticed there is a
@Maheshakya, will you be able to do work in the near future on speeding up
the ascending phase instead? Or should another contributor take that baton?
Not only does it seem to be a major contributor to runtime, but it is
independent of metric and hashing mechanism (within binary hashes), and
hence
(sorry, I should have said the first b layers, not 2**b layers, producing a
memoization of 2**b offsets)
On 30 July 2015 at 22:22, Joel Nothman wrote:
> One approach to fixing the ascending phase would ensure that
> _find_matching_indices is only searching over parts of the tree that have
> not
One approach to fixing the ascending phase would ensure that
_find_matching_indices is only searching over parts of the tree that have
not yet been explored, while currently it searches over the entire index at
each depth.
My preferred, but more experimental, solution is to memoize where the first
What makes you think this is the main bottleneck? While it is not an
insignificant consumer of time, I really doubt this is what's making
scikit-learn's LSH implementation severely underperform with respect to
other implementations.
We need to profile. In order to do that, we need some sensible pa
Hi,
This is absolutely great news. Thanks a lot. Please do open a WIP PR. We
(at INRIA) were planning to allocate time from someone this summer to
work on this. So you'll have someone reviewing / advicing.
With regards to releasing the gil, you need to use the 'with nogil'
statement in cython.
H
10 matches
Mail list logo