Hi everyone!

I'm successfully using scikit-learn on a 384-core machine. I'm playing with two deployments:

The first is an Anaconda installation of Python 3.6, which uses MKL as the NumPy backend.
The second is a "native" installation of scikit-learn and NumPy on Python 3.4.5, so the backend is based on OpenBLAS.
Both installations work, and I can see a large number of threads with high CPU load (for instance when I'm doing PCA). The problem, which I don't know how to debug, is that k-nearest neighbours (KNeighborsClassifier) uses only one core. This puzzles me, because the PR adding parallel KNN was accepted into the main branch for version 0.17:

https://github.com/scikit-learn/scikit-learn/pull/4009

scikit-learn should have merged these changes about a year ago, and my version of sklearn is:

    > print('The scikit-learn version is {}.'.format(sklearn.__version__))
    > The scikit-learn version is 0.18.1.

Do you have any hints on how to use parallel KNN?

I'm classifying a high-dimensional representation of the MNIST dataset (images of digits). I first run PCA to get vectors of dimension 35-50, and then apply a nonlinear expansion, which gives vectors of dimension 600-100. That's why I need parallelism so badly.

    clf = KNeighborsClassifier(algorithm='ball_tree')
    clf = clf.fit(train, train_labels)

Thanks for all your amazing work.

Ale
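
Note: one thing that may be worth checking is the n_jobs argument of KNeighborsClassifier, which as far as I know is what the parallel KNN PR exposed. It defaults to a single worker, and the documentation describes it as affecting the neighbour search at query time (predict, kneighbors) rather than fit. The sketch below is only an illustration under those assumptions: the random arrays and their sizes are hypothetical stand-ins for the expanded MNIST features, not the real data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in data; the real features come from PCA plus the
# nonlinear expansion described above.
rng = np.random.RandomState(0)
train = rng.rand(500, 60)
train_labels = rng.randint(0, 10, size=500)
test = rng.rand(100, 60)

# n_jobs defaults to a single worker; -1 requests all available cores.
clf = KNeighborsClassifier(algorithm='ball_tree', n_jobs=-1)
clf.fit(train, train_labels)

# The neighbour search performed here is the step that n_jobs parallelises.
predicted = clf.predict(test)
print(predicted.shape)
```

If CPU load still stays on one core with n_jobs set, the query batch may simply be too small for the work to be split usefully, so testing with a larger number of query points would be a reasonable next step.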
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn