Hi everyone!

I'm successfully using scikit-learn on a 384-core machine. I'm experimenting
with two deployments:
The first is an Anaconda installation of Python 3.6, which uses MKL as the
backend of numpy.
The second is a "native" installation of scikit-learn and numpy on Python
3.4.5, and thus the backend is based on OpenBLAS.

Both installations work, and I can see a high number of threads with high
CPU load (for instance when I'm doing PCA).

The problem, which I don't know how to debug, is that the k-nearest-neighbours
classifier is using only one core.
This puzzles me, since as far as I can see the PR with the parallel KNN was
merged into the main branch for version 0.17:
https://github.com/scikit-learn/scikit-learn/pull/4009

These changes should have been merged about a year ago, and my version of
sklearn is:

> print('The scikit-learn version is {}.'.format(sklearn.__version__))
> The scikit-learn version is 0.18.1.

Do you have any hints on how to use parallel KNN?
I'm classifying a high-dimensional dataset, MNIST (images of digits). So I'm
doing PCA to get vectors of dimension 35-50, and then I'm doing a nonlinear
expansion, which gives vectors of dimension 600-100. That's why I need
parallelism so badly.

    from sklearn.neighbors import KNeighborsClassifier

    clf = KNeighborsClassifier(algorithm='ball_tree')
    clf = clf.fit(train, train_labels)
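
For comparison, here is a minimal sketch of how the parallel neighbour search
is usually requested. It assumes the `n_jobs` constructor parameter is what
controls the parallelism, and it uses small synthetic arrays as a stand-in for
the real features; the shapes, the random labels and the `test` array are
hypothetical and not taken from my setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the expanded features (hypothetical shapes and labels).
rng = np.random.RandomState(0)
train = rng.rand(500, 60)
train_labels = rng.randint(0, 10, size=500)
test = rng.rand(50, 60)

# Assumption: n_jobs controls the parallel search; -1 asks for all cores,
# while leaving it unset keeps the search on a single core.
clf = KNeighborsClassifier(algorithm='ball_tree', n_jobs=-1)
clf = clf.fit(train, train_labels)

# The neighbour search runs at query time, so predict() is the step that
# n_jobs affects; fit() only builds the tree.
predictions = clf.predict(test)
```

If this is right, extra threads should only show up while `predict` (or
`kneighbors`) is running, not during `fit`.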

Thanks for all your amazing work.

Ale
