2012/2/23 Matthias Ekman <[email protected]>:
> Hi,
>
> I only recently started using sklearn and it's an impressive and well
> documented library. Thanks!
>
> I ran into some strange behavior while using the function
> 'permutation_test_score'.
>
> When using permutation_test_score with n_permutations = 50, everything
> looks all right:
>
> In [4]: cv_scores, permutation_scores, pval = permutation_test_score(
>     clf, X, Y, zero_one_score, cv=cv, n_permutations=50, n_jobs=4,
>     verbose=1, random_state=0)
> [Parallel(n_jobs=4)]: Done   1 out of  50 | elapsed:    0.0s remaining:    1.5s
> [Parallel(n_jobs=4)]: Done  50 out of  50 | elapsed:    0.2s finished
>
> However, with the exact same data but n_permutations = 200, I never
> get a result: the call runs forever.
>
> In [6]: cv_scores, permutation_scores, pval = permutation_test_score(
>     clf, X, Y, zero_one_score, cv=cv, n_permutations=200, n_jobs=4,
>     verbose=1, random_state=0)
> [Parallel(n_jobs=4)]: Done   1 out of  54 | elapsed:    0.0s remaining:    2.0s # <-- stops here
>
> My code is here: https://gist.github.com/1884451 and the data to
> reproduce the problem is here:
> http://dl.dropbox.com/u/38470419/wired_data.dat # sample x feature matrix
> http://dl.dropbox.com/u/38470419/Y.txt # binary labels
>
> I am using sklearn 0.10 and joblib 0.6.1.
>
> I am not sure whether this could be caused by some irregularities in
> my data. I would be grateful for any pointers.

Strange. It might be related to a problem Vlad is investigating on Mac
OS X Lion:

https://github.com/scikit-learn/scikit-learn/issues/636

Which platform / OS are you using?
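
If it helps when answering, a small sketch like the following collects the
relevant details in one place. Note that environment_report is a
hypothetical helper written for this message, not part of scikit-learn or
joblib, and the default package list is only an assumption about what is
worth reporting.

```python
import importlib
import platform
import sys


def environment_report(packages=("sklearn", "joblib", "numpy")):
    """Collect OS, Python, and package versions as a dict of strings.

    Hypothetical helper for bug reports; not part of any library.
    """
    report = {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__, but not all of them do.
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Pasting the printed lines into the reply should be enough to tell whether
the setup matches the one in the issue above.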

> As a related question: as far as I can see, permutation_test_score
> does not guarantee that the shuffled labels differ from the true
> labels, right? Couldn't
>
> pvalue = (np.sum(permutation_scores >= score) + 1.0) / (n_permutations + 1)
>
> be too conservative in some cases? I would add the +1 _only_ when the
> true labels are _not_ already included in the permutation set.

No idea. Maybe @agramfort or @GaelVaroquaux have an opinion?
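
To make the question concrete, here is a minimal sketch of the quoted
formula in isolation, plus a way to check how often a shuffle reproduces
the original labeling. Both permutation_pvalue and count_identity_shuffles
are hypothetical names used only for illustration, the scores are made up,
and the shuffling shown here is an assumption, not necessarily how
permutation_test_score shuffles internally.

```python
import numpy as np


def permutation_pvalue(score, permutation_scores):
    """p-value with the +1 correction, following the quoted formula."""
    permutation_scores = np.asarray(permutation_scores, dtype=float)
    n_permutations = permutation_scores.shape[0]
    # The +1 in numerator and denominator counts the unpermuted labeling
    # as one more member of the null distribution.
    return (np.sum(permutation_scores >= score) + 1.0) / (n_permutations + 1)


def count_identity_shuffles(y, n_permutations, random_state=0):
    """Count random shuffles that reproduce the label vector y exactly.

    Hypothetical helper; the shuffling is an assumption for illustration.
    """
    rng = np.random.RandomState(random_state)
    y = np.asarray(y)
    return sum(
        np.array_equal(y[rng.permutation(len(y))], y)
        for _ in range(n_permutations)
    )


# Made-up scores, for illustration only.
null_scores = [0.48, 0.52, 0.55, 0.61, 0.50]
print(permutation_pvalue(0.60, null_scores))  # one null score is >= 0.60
print(permutation_pvalue(0.90, null_scores))  # no null score is >= 0.90
```

With the +1 in place the p-value can never fall below
1 / (n_permutations + 1), even when no permuted score reaches the true
score. With binary labels, many different permutations map to the same
label vector, so a shuffle that matches the original labeling is more
likely on small samples than on large ones.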

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

