Hi Deb.
We don't really expose low-level functions like these, and only include
them if there is a particular use-case.
Why not create a pull request for scipy?
Cheers,
Andy
On 02/01/2016 01:14 AM, Debanjan Bhattacharyya wrote:
Hi,
I have written a method, pairwise_distances_argmin_min_n, in my
"develop" mode.
Its functionality is similar to pairwise_distances_argmin_min, but it
returns the n smallest distances rather than only one (both the indices
and the minima). It works in chunks (in parallel) on sparse matrices,
which needed some extra code for stacking and combining the results.
This is particularly useful in word vector models, where you need to
find the n closest documents to an input document, given clustered
vectors of the documents.
I had a 40 GB numpy array of shape 483858 x 21058 (where 21058 is the
number of clusters), and I was trying to compute the pairwise distances
between the first 250,000 documents and the rest. A chunk of only 2,500
rows of the resulting distance array from pairwise_distances produces
a 2 GB file, so the full distance file would have been 200 GB in size.
That makes no sense when I only need the top 100 or 200 closest
matches.
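
A rough check of the figures above can be sketched as follows. The dtype is not stated in the message, so the 4-byte (float32) entry size is an assumption, and the variable names are purely illustrative.

```python
# Back-of-the-envelope memory estimate for the arrays described above.
# ASSUMPTION: 4 bytes per value (float32); the message does not state the dtype.
BYTES_PER_VALUE = 4

n_docs, n_clusters = 483858, 21058   # shape of the input array
n_query = 250000                     # first 250,000 documents
n_rest = n_docs - n_query            # the remaining documents
chunk_rows = 2500                    # rows per distance chunk
top_n = 200                          # closest matches actually needed


def gigabytes(n_values):
    """Size in GB (10**9 bytes) of n_values entries."""
    return n_values * BYTES_PER_VALUE / 1e9


array_gb = gigabytes(n_docs * n_clusters)   # the input array
chunk_gb = gigabytes(chunk_rows * n_rest)   # one chunk of distances
full_gb = gigabytes(n_query * n_rest)       # the full distance matrix
top_gb = gigabytes(n_query * top_n)         # distances of the top_n matches only

print("input array: %.1f GB" % array_gb)
print("one chunk:   %.1f GB" % chunk_gb)
print("all chunks:  %.1f GB" % full_gb)
print("top %d only: %.1f GB" % (top_n, top_gb))
```

Under that assumption the numbers land in the same range as the ones quoted above, while keeping only the top matches needs well under 1 GB for the distances.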
Hence I implemented this function. I have tested its performance, and
it is good.
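
The idea described above can be illustrated with a minimal sketch. This is not the implementation from the message: the name argmin_min_n_sketch and its parameters are hypothetical, it runs serially rather than in parallel, and it assumes n is no larger than the number of rows in Y. It only shows how keeping the n smallest distances per chunk avoids storing the full distance matrix.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def argmin_min_n_sketch(X, Y, n, chunk_size=1000, metric="euclidean"):
    """Hypothetical helper: for each row of X, return the indices and
    distances of its n closest rows in Y, sorted by increasing distance.

    Only one (chunk_size x Y.shape[0]) block of distances is held in
    memory at a time. Assumes n <= Y.shape[0].
    """
    n_rows = X.shape[0]
    indices = np.empty((n_rows, n), dtype=np.intp)
    distances = np.empty((n_rows, n), dtype=np.float64)

    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Distances from this chunk of X to all of Y (dense result).
        D = pairwise_distances(X[start:stop], Y, metric=metric)
        # Pick the n smallest entries per row, in arbitrary order.
        part = np.argpartition(D, n - 1, axis=1)[:, :n]
        part_d = np.take_along_axis(D, part, axis=1)
        # Sort just those n candidates by distance.
        order = np.argsort(part_d, axis=1)
        indices[start:stop] = np.take_along_axis(part, order, axis=1)
        distances[start:stop] = np.take_along_axis(part_d, order, axis=1)

    return indices, distances


# Small usage example on random data.
rng = np.random.RandomState(0)
X_demo = rng.rand(7, 5)
Y_demo = rng.rand(11, 5)
idx, dist = argmin_min_n_sketch(X_demo, Y_demo, n=3, chunk_size=2)
print(idx.shape, dist.shape)
```

Using np.argpartition before sorting keeps the per-chunk cost close to linear in the number of columns, since only the n selected candidates are fully sorted.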
Please let me know whether I should create a pull request for this and
contribute.
Thanks
Regards
Deb
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general