[Scikit-learn-general] Reproducible results of parallel cross-validation

Robert Pollak Thu, 03 Mar 2016 05:04:25 -0800

Hello list!

I want to use parallel cross-validation and still get reproducible results.
In my code, I do:


if __name__ == '__main__': # This is necessary to use n_jobs > 1.
    [...]
    clf = DecisionTreeClassifier(max_depth=5)
    cross_validation = StratifiedKFold(y, n_folds=10, shuffle=True,
                                       random_state=0)
    cross_val_prediction = cross_val_predict(clf, X, y, cv=cross_validation,
                                             n_jobs=6)

However, this gives different results than with n_jobs=1!

Could it be that there is a race condition between the jobs for access to the
RNG?
I noticed that when I set shuffle=False, the number of jobs does not matter.

But isn't the RNG only used for the shuffling?
And doesn't the shuffling happen _before_ launching the parallel jobs?
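
One way to check the second question is to build the shuffled folds twice with the same seed and compare them, with no parallel jobs involved. The following is only a minimal sketch under assumptions: it uses the `sklearn.model_selection` API of scikit-learn releases newer than the 0.17 listed below (in 0.17 the splitter lives in `sklearn.cross_validation` and takes `y` and `n_folds` in its constructor), and `shuffled_test_folds` and `y_demo` are hypothetical names standing in for the data that is not shown here.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def shuffled_test_folds(y, seed=0):
    """Return the test indices of each shuffled stratified fold."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    # The splitter only needs the number of samples from X, so a
    # placeholder array of the right length is enough for this check.
    X_placeholder = np.zeros((len(y), 1))
    return [test_idx for _, test_idx in cv.split(X_placeholder, y)]


# Hypothetical labels: 100 samples, two balanced classes.
y_demo = np.array([0, 1] * 50)
first = shuffled_test_folds(y_demo)
second = shuffled_test_folds(y_demo)

# If this prints True, the fold assignment depends only on the seed,
# which would point to a different source of randomness.
print(all(np.array_equal(a, b) for a, b in zip(first, second)))
```

If the folds match, the shuffling itself is probably not what differs between n_jobs=1 and n_jobs=6.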

So: How can I get reproducible results with shuffling and parallel processing?
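
A possible approach, not confirmed anywhere in this post, is to seed every random source explicitly, including the classifier. DecisionTreeClassifier accepts a random_state parameter, and its split search permutes the features randomly, so an unseeded tree may break ties between equally good splits differently in each worker process. The sketch below assumes the newer `sklearn.model_selection` API rather than the 0.17 one used above; `predictions` is a hypothetical helper, and the synthetic data from `make_classification` stands in for the real X and y.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def predictions(n_jobs, seed=0):
    """Run shuffled 10-fold CV with all random sources seeded."""
    # Synthetic stand-in data (an assumption; the real data is not shown).
    X, y = make_classification(n_samples=300, n_features=20,
                               random_state=seed)
    # Seeding the tree makes its tie-breaking between splits repeatable.
    clf = DecisionTreeClassifier(max_depth=5, random_state=seed)
    # Seeding the splitter makes the shuffled folds repeatable.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return cross_val_predict(clf, X, y, cv=cv, n_jobs=n_jobs)


if __name__ == '__main__':  # Kept as in the original code for n_jobs > 1.
    serial = predictions(n_jobs=1)
    parallel = predictions(n_jobs=2)
    # Compare the serial and the parallel run element by element.
    print(np.array_equal(serial, parallel))
```

If the two runs agree once the classifier is seeded, the difference seen with n_jobs=6 would come from the unseeded tree rather than from the fold shuffling.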

Best regards,
Robert

P.S.:
I am using:
Windows-7-6.1.7601-SP1
Python 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)]
NumPy 1.10.4
SciPy 0.17.0
Scikit-Learn 0.17
(all from WinPython-64bit-3.5.1.2).

