Actually I think the KNeighborsClassifier implementation in
scikit-learn has a real memory usage issue in "brute" mode (which
is the one selected here).

I suspect it is materializing the whole (n_samples_train,
n_samples_predict) distance array in memory before computing the
(n_samples_predict, k) array of minimum values.

When both n_samples_train and n_samples_predict are big, this is an issue.
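
For a rough sense of scale, here is a back-of-the-envelope sketch. The
sample counts are made up for illustration, and it assumes the
distances are stored as 8-byte floats, which I have not checked:

```python
def distance_matrix_bytes(n_samples_train, n_samples_predict, itemsize=8):
    """Bytes needed for a dense (n_samples_train, n_samples_predict) array.

    itemsize=8 assumes float64 entries (an assumption, not verified
    against the scikit-learn source).
    """
    return n_samples_train * n_samples_predict * itemsize


# Hypothetical sizes, not taken from any real dataset.
n_bytes = distance_matrix_bytes(100_000, 100_000)
print(n_bytes / 1e9, "GB")
```

The size grows with the product of the two sample counts, so it gets
out of hand quickly.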

This could be worked around by passing the data to the predict calls
in chunks rather than all at once.
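
A minimal sketch of that workaround follows. predict_in_chunks and its
chunk_size argument are hypothetical names of mine, not part of
scikit-learn; the helper only relies on the classifier having a
predict method and on the data supporting len() and row slicing (so a
dense array, not a sparse matrix):

```python
def predict_in_chunks(clf, X, chunk_size=1000):
    """Call clf.predict on successive row slices of X.

    Hypothetical helper, not a scikit-learn function. Only chunk_size
    rows are passed to predict at a time, so any intermediate array
    sized by the number of query rows is bounded by chunk_size.
    """
    predictions = []
    for start in range(0, len(X), chunk_size):
        # Slicing past the end is safe: the last chunk is just shorter.
        predictions.extend(clf.predict(X[start:start + chunk_size]))
    return predictions
```

With a fitted KNeighborsClassifier this would be called as
predict_in_chunks(clf, X_test, chunk_size=1000). If my suspicion above
is right, each call should then only need about (n_samples_train,
chunk_size) distances at once.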

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
