[scikit-learn] Smoke and Metamorphic Testing of scikit-learn

Steffen Herbold Wed, 22 Aug 2018 04:34:09 -0700

Dear developers,

I am writing to you because I applied an approach for the automated testing of classification algorithms to scikit-learn and would like to forward the results to you.

The approach is a combination of smoke testing and metamorphic testing. The smoke tests try to find problems by executing the training and prediction functions of classifiers with different data. These smoke tests should ensure the basic functioning of classifiers. I defined 20 different data sets, some very simple (uniform features in [0,1]), some with extreme distributions, e.g., data close to machine precision. The metamorphic tests determine if classification results change as expected if the training data is modified, e.g., by reordering features, flipping class labels, or reordering instances.
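To make the three metamorphic relations named above concrete, here is a minimal sketch. It is not the generated atoml test code: the NearestCentroid class is a hypothetical pure-Python stand-in, and all function names and data values are assumptions chosen for illustration. The checks only rely on fit and predict, so a scikit-learn classifier class could presumably be passed in the same way.

```python
import random


class NearestCentroid:
    """Hypothetical stand-in classifier, used only to keep the sketch self-contained."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        # One centroid (feature-wise mean) per class label.
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def dist2(row, centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))

        # Assign each row the label of the closest centroid.
        return [
            min(sorted(self.centroids_), key=lambda c: dist2(row, self.centroids_[c]))
            for row in X
        ]


def same_after_feature_reordering(make_clf, X, y, X_test, seed=0):
    """Predictions should not change if all feature columns are permuted alike."""
    order = list(range(len(X[0])))
    random.Random(seed).shuffle(order)

    def permute(rows):
        return [[row[j] for j in order] for row in rows]

    expected = list(make_clf().fit(X, y).predict(X_test))
    actual = list(make_clf().fit(permute(X), y).predict(permute(X_test)))
    return expected == actual


def same_after_instance_reordering(make_clf, X, y, X_test, seed=0):
    """Predictions should not change if the training instances are shuffled."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    expected = list(make_clf().fit(X, y).predict(X_test))
    shuffled = make_clf().fit([X[i] for i in idx], [y[i] for i in idx])
    return expected == list(shuffled.predict(X_test))


def inverted_after_label_flip(make_clf, X, y, X_test):
    """With binary labels 0/1, flipping the training labels should flip predictions."""
    expected = [1 - label for label in make_clf().fit(X, y).predict(X_test)]
    actual = list(make_clf().fit(X, [1 - label for label in y]).predict(X_test))
    return expected == actual


# Illustrative data: two well-separated clusters (values are assumptions).
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
X_test = [[0.5, 0.5], [5.5, 5.5], [1, 2], [4, 6]]

results = {
    "features reordered": same_after_feature_reordering(NearestCentroid, X, y, X_test),
    "instances reordered": same_after_instance_reordering(NearestCentroid, X, y, X_test),
    "labels flipped": inverted_after_label_flip(NearestCentroid, X, y, X_test),
}
print(results)
```

Each check trains a fresh classifier on the original and on the modified data and compares the two sets of predictions, so a False value signals a violated relation rather than a crash.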

I generated 70 different Python unittest tests for eleven different scikit-learn classifiers. In summary, I found the following potential problems:

- Two errors due to possibly infinite loops for LogisticRegression for data that approaches MAXDOUBLE.
- The classification of LogisticRegression, MLPClassifier, QuadraticDiscriminantAnalysis, and SVM with a polynomial kernel changed if one is added to each feature value.
- The classification of DecisionTreeClassifier, LogisticRegression, MLPClassifier, QuadraticDiscriminantAnalysis, RandomForestClassifier, and SVM with a linear and a polynomial kernel was not inverted when all binary class labels are flipped.
- The classification of LogisticRegression, MLPClassifier, QuadraticDiscriminantAnalysis, and RandomForestClassifier sometimes changed when the features are reordered.
- The classification of KNeighborsClassifier, MLPClassifier, QuadraticDiscriminantAnalysis, RandomForestClassifier, and SVM with a linear kernel sometimes changed when the instances are reordered.
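The "one is added to each feature value" relation from the list above could be checked roughly as follows. This is a sketch under assumptions, not the generated test code: OneNearestNeighbor is a hypothetical pure-Python stand-in classifier, and the function name and data values are made up for illustration.

```python
class OneNearestNeighbor:
    """Hypothetical stand-in classifier: predicts the label of the closest training row."""

    def fit(self, X, y):
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda i: dist2(row, self.X_[i]))]
            for row in X
        ]


def same_after_adding_one(make_clf, X, y, X_test):
    """Predictions should not change if 1 is added to every feature value."""

    def shift(rows):
        return [[value + 1 for value in row] for row in rows]

    expected = list(make_clf().fit(X, y).predict(X_test))
    actual = list(make_clf().fit(shift(X), y).predict(shift(X_test)))
    return expected == actual


# Illustrative data (values are assumptions).
X = [[0.0, 0.0], [1.0, 0.0], [4.0, 4.0], [5.0, 4.0]]
y = [0, 0, 1, 1]
X_test = [[0.25, 0.25], [4.75, 3.75], [2.0, 1.0]]

unchanged = same_after_adding_one(OneNearestNeighbor, X, y, X_test)
print(unchanged)
```

A violation of this relation is not automatically a defect. For example, a polynomial kernel is not invariant under adding a constant to the features, so whether a changed classification is expected has to be judged per algorithm.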

You can find details of our results online [1]. The provided resources include the current draft of the paper that describes the tests as well as the detailed results. Moreover, we provide an executable test suite with all tests we executed, as well as an export of our test results as an XML file that contains all details of the test execution, including stack traces in case of exceptions. The preprint and online materials also contain the results for two other machine learning libraries, i.e., Weka and Spark MLlib. Additionally, you can find the atoml tool used to generate the tests on GitHub [2].

I hope that these tests may help with the future development of scikit-learn. You could help me a lot by answering the following questions:

- Do you consider the tests helpful?
- Are you considering any source code or documentation changes due to our findings?
- Would you be interested in a pull request or any other type of integration of (a subset of) the tests into your project?
- Would you be interested in more such tests, e.g., for the consideration of hyperparameters, other algorithm types like clustering, or more complex algorithm-specific metamorphic tests?


I am looking forward to your feedback.

Best regards,
Steffen Herbold

[1] http://user.informatik.uni-goettingen.de/~sherbold/atoml-results/
[2] https://github.com/sherbold/atoml

--
Dr. Steffen Herbold
Institute of Computer Science
University of Goettingen
Goldschmidtstraße 7
37077 Göttingen, Germany
mailto. herb...@cs.uni-goettingen.de
tel. +49 551 39-172037

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
