Hello list! I want to use parallel cross-validation and still get reproducible results. In my code, I do

    if __name__ == '__main__':  # This is necessary to use n_jobs > 1.
        [...]
        clf = DecisionTreeClassifier(max_depth=5)
        cross_validation = StratifiedKFold(y, n_folds=10, shuffle=True, random_state=0)
        cross_val_prediction = cross_val_predict(clf, X, y, cv=cross_validation, n_jobs=6)

However, this gives different results than with n_jobs=1! Could it be that there is a race condition between the jobs for access to the RNG?

I noticed that when I set shuffle=False, the number of jobs does not matter. But isn't the RNG only used for the shuffling? And doesn't the shuffling happen _before_ launching the parallel jobs?

So: How can I get reproducible results with shuffling and parallel processing?

Best regards,
Robert

P.S.: I am using:
Windows-7-6.1.7601-SP1
Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)]
NumPy 1.10.4
SciPy 0.17.0
Scikit-Learn 0.17
(all from WinPython-64bit-3.5.1.2).

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
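
A minimal sketch of how the two possible sources of randomness in the question (the fold shuffling and the classifier itself) might be checked separately. It rests on several assumptions that are not in the post: a recent scikit-learn, where the splitters live in sklearn.model_selection and StratifiedKFold takes n_splits (0.17 used sklearn.cross_validation with a different signature); synthetic data from make_classification standing in for the elided X and y; and the hypothetical helper names fold_indices and predict. It also pins random_state on the DecisionTreeClassifier, which the original code did not do, because the tree draws random numbers of its own during fitting. Whether that is the cause of the difference reported above is only a guess.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the X and y elided in the post (assumption).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)


def fold_indices(seed):
    """Return the test indices of each fold for a seeded, shuffled splitter."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return [test.copy() for _, test in cv.split(X, y)]


# Check 1: the shuffled folds depend only on the seed of the splitter.
splits_identical = all(
    np.array_equal(a, b) for a, b in zip(fold_indices(0), fold_indices(0))
)


def predict(n_jobs):
    """Cross-validated predictions with both the splitter and the tree seeded."""
    # random_state on the classifier is the addition compared to the post.
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_val_predict(clf, X, y, cv=cv, n_jobs=n_jobs)


if __name__ == "__main__":  # Guard kept from the post for n_jobs > 1.
    print("same folds for same seed:", splits_identical)
    # Check 2: compare serial and parallel predictions directly.
    print("n_jobs=1 equals n_jobs=2:", np.array_equal(predict(1), predict(2)))
```

If the first check passes but the second fails when random_state is removed from the classifier, that would point at the estimator rather than at the shuffling.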