Hi,

I've started looking into improving the performance of LSHForest.
As we discussed some time ago (quite a long time ago, in fact), the main
concern is to Cythonize the distance calculations. Currently, when the
`kneighbors` method is called for a set of query vectors, this is done by
iterating over the query vectors one at a time. It has been identified
that iterating over each query with a Python loop is a huge overhead. I
have implemented a few Cython hacks to demonstrate the distance
calculation in LSHForest, and I was able to get an approximate 10x
speedup over the current distance calculation with a Python loop.
However, I came across some blockers while trying to do this and need
some clarification.
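
To illustrate the overhead I mean, here is a minimal NumPy sketch (not the
actual LSHForest code). It assumes Euclidean distance and a single candidate
set shared by all queries, which is a simplification, and the function names
`distances_loop` and `distances_batched` are hypothetical. The first function
pays one Python-level iteration per query; the second computes the same
matrix in a few array operations.

```python
import numpy as np


def distances_loop(queries, candidates):
    """One Python-level iteration per query vector (the slow pattern)."""
    out = np.empty((queries.shape[0], candidates.shape[0]))
    for i, q in enumerate(queries):
        diff = candidates - q  # broadcast q against every candidate row
        out[i] = np.sqrt((diff * diff).sum(axis=1))
    return out


def distances_batched(queries, candidates):
    """All queries at once, using ||q - c||^2 = ||q||^2 - 2 q.c + ||c||^2."""
    q_sq = (queries * queries).sum(axis=1)[:, np.newaxis]
    c_sq = (candidates * candidates).sum(axis=1)[np.newaxis, :]
    sq = q_sq - 2.0 * np.dot(queries, candidates.T) + c_sq
    np.maximum(sq, 0.0, out=sq)  # clip tiny negatives caused by rounding
    return np.sqrt(sq)
```

Both functions return an array of shape (n_queries, n_candidates). In
LSHForest each query has its own candidate set, so the batched form does not
apply directly, which is part of why I went for Cython.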

What I need to know is: do we use a mechanism to release the GIL when we
want to parallelize? One of my observations is that `pairwise_distances`
uses all the cores even when I don't specify the `n_jobs` parameter, which
defaults to 1. Is this expected behavior?
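
To make the question concrete, the pattern I have in mind is splitting the
query set into chunks and handing each chunk to a worker thread. The sketch
below uses the standard library's `ThreadPoolExecutor` purely as an
illustration; it is not what scikit-learn does internally, and
`euclidean_block` and `chunked_distances` are hypothetical helpers. Threads
would only give a real speedup here if the distance routine releases the GIL
while it runs.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def euclidean_block(queries, candidates):
    """Distances from every query row to every candidate row."""
    diff = queries[:, np.newaxis, :] - candidates[np.newaxis, :, :]
    return np.sqrt((diff * diff).sum(axis=2))


def chunked_distances(queries, candidates, n_workers=2):
    """Split the queries into chunks and compute each chunk in a thread."""
    chunks = np.array_split(queries, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in the same order as the input chunks
        blocks = list(pool.map(lambda c: euclidean_block(c, candidates), chunks))
    return np.vstack(blocks)
```

The result is identical to calling `euclidean_block` on the full query set;
only the scheduling differs.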

If I want to release the GIL, can I use Cython's OpenMP support? Or is
that a task for Joblib?
Any input on this is highly appreciated.

Best regards,
-- 

*Maheshakya Wijewardena, Undergraduate,*
*Department of Computer Science and Engineering,*
*Faculty of Engineering,*
*University of Moratuwa,*
*Sri Lanka*