This is a non-parametric (aka brute force) way to check that a model has a predictive performance significantly higher than chance. For a model with 90% accuracy this is useless, as we already know for sure that the model is better than predicting at random. The method is only useful if you have very little data or very noisy data and you are not even sure that your predictive method is able to pick up anything predictive from the data, e.g. a balanced binary classification problem where you only reach ~52% accuracy.
It proceeds as follows: first it runs a single cross-validation round with the true labels to compute a reference score. Then it does the same 100 times, each time with an independently, randomly permuted copy of the labels (the y array). Finally it returns the reference score, the permuted scores, and a p-value: the fraction of permutations whose CV score was at least as good as the reference score.

Here is an example: http://scikit-learn.org/stable/auto_examples/feature_selection/plot_permutation_test_for_classification.html

Note that you should not use that method to select the best model from a collection of possible models and then report its permutation test p-value without correcting for multiple comparisons.

-- Olivier
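For reference, here is a minimal sketch of what such a call can look like with sklearn.model_selection.permutation_test_score; the synthetic dataset, estimator and parameter values below are only illustrative assumptions, not taken from the linked example:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

# Small, noisy, balanced binary problem where ~52% CV accuracy is plausible.
X, y = make_classification(n_samples=100, n_features=20, n_informative=2,
                           n_redundant=0, flip_y=0.4, random_state=0)

clf = LogisticRegression()
cv = StratifiedKFold(n_splits=5)

# score: reference CV accuracy on the true labels
# perm_scores: 100 CV accuracies on independently permuted labels
# pvalue: how often permuted labels score at least as well as the reference
score, perm_scores, pvalue = permutation_test_score(
    clf, X, y, cv=cv, n_permutations=100, scoring="accuracy",
    random_state=0)

print("reference CV accuracy: %.3f, permutation p-value: %.3f"
      % (score, pvalue))

A small p-value (e.g. below 0.05) means that a score as high as the reference one is rarely reached once the link between X and y has been destroyed by permuting the labels.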
