Hi Deb.
We don't really expose low-level functions like these, and only include them if there is a particular use-case.
Why not create a pull request for scipy?

Cheers,
Andy

On 02/01/2016 01:14 AM, Debanjan Bhattacharyya wrote:
Hi

I have written a method, pairwise_distances_argmin_min_n, in my "develop" mode. Its functionality is similar to pairwise_distances_argmin_min, but it returns the n smallest distances rather than only one (both the indices and the minima). It works in chunks (in parallel) on sparse matrices, which needed some extra code for stacking and combining the results.

This is particularly useful in word vector models where you need to find the n closest documents against an input document given clustered vectors of the documents.
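To make the idea concrete, here is a minimal sketch of a chunked "n closest" search. It is not the implementation discussed in this thread: the name argmin_min_n_sketch and the parameters n and chunk_size are hypothetical, it runs serially rather than in parallel, and it assumes n is no larger than the number of rows in Y.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def argmin_min_n_sketch(X, Y, n=3, chunk_size=1000, metric="euclidean"):
    """For each row of X, return indices and distances of its n closest rows in Y.

    Hypothetical helper for illustration only; assumes n <= Y.shape[0].
    """
    n_rows = X.shape[0]
    indices = np.empty((n_rows, n), dtype=np.intp)
    distances = np.empty((n_rows, n), dtype=np.float64)
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Only a (chunk rows, Y rows) block of distances exists at any time.
        D = pairwise_distances(X[start:stop], Y, metric=metric)
        # argpartition moves the n smallest entries of each row to the front,
        # in no particular order, without sorting the whole row.
        part = np.argpartition(D, n - 1, axis=1)[:, :n]
        part_d = np.take_along_axis(D, part, axis=1)
        # Sort just those n candidates so column 0 holds the closest match.
        order = np.argsort(part_d, axis=1)
        indices[start:stop] = np.take_along_axis(part, order, axis=1)
        distances[start:stop] = np.take_along_axis(part_d, order, axis=1)
    return indices, distances
```

Calling argmin_min_n_sketch(X, Y, n=100) returns two arrays of shape (X.shape[0], 100), so the full distance matrix never has to be held in memory or written to disk.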

I had a 40 GB numpy array of size 483858*21058 (where 21058 is the number of clusters), and I was trying to find the pairwise distances between the first 250,000 documents and the rest. A chunk of only 2500 rows of the resulting distance array from pairwise_distances results in a 2 GB file. The total distance file would have been 200 GB in size! That does not make sense when I only need the top 100 or 200 closest matches.
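The sizes above can be checked with a quick back-of-the-envelope calculation. The 4-byte item size (float32) is an assumption; it is the value that makes the quoted 40 GB figure work out, and the int64 index size in the last line is likewise assumed.

```python
# Rough memory estimate; float32 (4 bytes per value) is an assumption.
n_docs, n_clusters = 483858, 21058
n_query = 250000                 # first 250,000 documents
n_rest = n_docs - n_query        # documents they are compared against
chunk_rows = 2500
itemsize = 4                     # bytes per float32 value (assumed)

array_gb = n_docs * n_clusters * itemsize / 1e9   # input array, about 40 GB
chunk_gb = chunk_rows * n_rest * itemsize / 1e9   # one distance chunk, about 2 GB
total_gb = n_query * n_rest * itemsize / 1e9      # all distances, over 200 GB

# Keeping only the 200 closest matches per document:
# one float32 distance plus one int64 index (assumed) per match.
top_n_gb = n_query * 200 * (4 + 8) / 1e9          # well under 1 GB

print(round(array_gb, 1), round(chunk_gb, 1), round(total_gb, 1), round(top_n_gb, 1))
```

Under these assumptions, storing only the top matches is a few hundred times smaller than the full distance array.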

Hence I implemented this function. I have tested its performance. It's good.

Please let me know whether I should create a pull request for this and contribute.

Thanks

Regards
Deb



_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general