Hi scikit-learn folks, I am building a stacked generalization classifier that uses the multilayer perceptron classifier as one of its submodels. All data have been preprocessed appropriately, and I am tuning each submodel's hyperparameters with a customized randomized search protocol (very similar to sklearn's RandomizedSearchCV). Importantly, I am using Python's multiprocessing.Pool() to parallelize this search.
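For reference, here is a minimal sketch of the kind of search I am running (the dataset, parameter ranges, worker count, and helper names are illustrative, not my exact code):

```python
import numpy as np
from multiprocessing import Pool
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Illustrative data; my real features are already preprocessed/scaled.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

def sample_params(rng):
    """Draw one random hyperparameter setting (ranges are illustrative)."""
    return {
        "hidden_layer_sizes": (int(rng.choice([50, 100, 200])),),
        "alpha": 10.0 ** rng.uniform(-5, -1),
        "learning_rate_init": 10.0 ** rng.uniform(-4, -1),
    }

def score_candidate(params):
    """Fit and score one candidate setting with cross-validation."""
    clf = MLPClassifier(max_iter=500, random_state=0, **params)
    return params, cross_val_score(clf, X, y, cv=5).mean()

if __name__ == "__main__":
    rng = np.random.RandomState(42)
    candidates = [sample_params(rng) for _ in range(20)]
    with Pool(processes=8) as pool:  # one worker process per candidate
        results = pool.map(score_candidate, candidates)
    print(max(results, key=lambda r: r[1]))
```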
When I start the hyperparameter search, jobs/threads do indeed spawn appropriately. Tuning the other submodels (RandomForestClassifier, SVC, GradientBoostingClassifier, SGDClassifier) works perfectly, with each job (a model with a particular randomized parameter setting) being scored with cross_val_score and returning when the Pool of workers is complete.

All is well until I reach the MLPClassifier model. Jobs spawn as with the other models; however, system CPU (Linux kernel) usage surges and overwhelms my server. Approximately 20% of the CPUs are running user processes, while the other 80% of CPUs are running system/kernel processes, causing an immense slow-down. Again, this only happens with the MLPClassifier; all other models run appropriately with ~98% user processes and ~2% system/kernel processes.

Is there something unique to the MLPClassifier/MLPRegressor models that causes increased system/kernel activity compared to the other models? In an attempt to troubleshoot, I used sklearn's RandomizedSearchCV instead of my custom implementation, and the same problem occurs (with n_jobs specified in the same way).

Any help with why the MLP models behave this way during multiprocessing is much appreciated.

Best,
Taylor Keding