hi all,
what would be the theoretical or practical implications of computing the CV
score by pooling the test predictions from all folds into one long vector and
scoring once (option 1) vs. scoring each fold separately and averaging the
per-fold scores (option 2), especially when the N's are small?
for example:
fold1: true[a,b,c,d] pred[x,y,z,w] s1 = score(true, pred)
fold2: true[e,f,g,h] pred[p,q,r,s] s2 = score(true, pred)
option 1. cv_score = score([a,b,c,d,e,f,g,h], [x,y,z,w,p,q,r,s])
option 2. cv_score = mean([s1, s2])  # currently sklearn implements this
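
to make the two options concrete, here is a minimal sketch in plain python. the labels are made-up toy data, and accuracy, f1, pooled_score and averaged_score are hypothetical helpers written only for this illustration (they are not sklearn functions). with equal fold sizes, a per-sample average such as accuracy comes out the same under both options, while a score that is not a simple per-sample average (f1 here) can differ:

```python
from statistics import mean


def accuracy(true, pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def f1(true, pred):
    """Binary F1 for the positive class 1; defined as 0.0 when undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(true, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(true, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(true, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pooled_score(score, folds):
    """Option 1: concatenate all folds, then score once."""
    all_true = [t for true, _ in folds for t in true]
    all_pred = [p for _, pred in folds for p in pred]
    return score(all_true, all_pred)


def averaged_score(score, folds):
    """Option 2: score each fold, then take the unweighted mean."""
    return mean(score(true, pred) for true, pred in folds)


# Toy (true, pred) pairs for two test folds of equal size; made-up data.
folds = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),  # fold1: every prediction correct
    ([1, 0, 0, 0], [0, 1, 0, 0]),  # fold2: the only positive is missed
]

for name, score in [("accuracy", accuracy), ("f1", f1)]:
    print(name,
          "option 1:", pooled_score(score, folds),
          "option 2:", averaged_score(score, folds))
```

in this toy case fold2 has a single positive, so its f1 hinges on one sample. that is the small-N situation i am asking about.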
cheers,
satra
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general