Actually, I think the KNeighborsClassifier implementation in scikit-learn has a real memory usage issue in "brute" mode (which is the mode selected here).
I suspect it materializes the whole (n_samples_train, n_samples_predict) distance array in memory before computing the (n_samples_predict * k) minimum values. When both n_samples_train and n_samples_predict are large, that array becomes a problem. This could be worked around by chunking the data argument of the predict calls instead.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
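
PS: a minimal sketch of the chunking workaround suggested above, in case it helps. The helper name predict_in_chunks, the chunk_size value and the random data are all made up for illustration; only KNeighborsClassifier, its n_neighbors and algorithm parameters, and predict come from scikit-learn. If the suspicion about the dense distance array is right, each call should only need roughly chunk_size * n_samples_train distances at once (500 * 2000 float64 values, about 8 MB, instead of 5000 * 2000, about 80 MB, in this toy setup).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def predict_in_chunks(clf, X, chunk_size=1000):
    """Hypothetical helper: call clf.predict on successive row slices of X.

    Only chunk_size query rows are passed to predict at a time, so any
    intermediate array sized by the number of query rows stays bounded.
    """
    parts = [
        clf.predict(X[start:start + chunk_size])
        for start in range(0, X.shape[0], chunk_size)
    ]
    return np.concatenate(parts)


# Toy data, only for illustration.
rng = np.random.RandomState(0)
X_train = rng.rand(2000, 10)
y_train = rng.randint(0, 3, size=2000)
X_test = rng.rand(5000, 10)

clf = KNeighborsClassifier(n_neighbors=5, algorithm="brute")
clf.fit(X_train, y_train)

# Predict 500 query rows at a time instead of all 5000 in one call.
y_chunked = predict_in_chunks(clf, X_test, chunk_size=500)
```

The chunk size is a trade-off: smaller chunks lower the peak memory per call but add more Python-level calls, so it is probably worth tuning against the available RAM.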
