hi all,

what would be the theoretical or practical implications of computing the cv
score by pooling the test outcomes from all folds into one long vector and
scoring once (option 1) vs scoring each fold separately and averaging the
per-fold scores (option 2), especially when the sample sizes (N) are small?

for example:

fold1: true[a,b,c,d] pred[x,y,z,w] s1=score(true, pred)
fold2: true[e,f,g,h] pred[p,q,r,s] s2=score(true, pred)

option 1.  cv_score = score([a,b,c,d,e,f,g,h], [x,y,z,w,p,q,r,s])
option 2.  cv_score = mean([s1, s2]) # currently sklearn implements this
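
to make the two options concrete, here is a minimal sketch in plain python. the
labels, the two toy folds and the helper names (accuracy, f1, pooled_score,
mean_fold_score) are made up for illustration and are not sklearn code. it
suggests that for a per-sample average such as accuracy with equal-sized folds
the two options coincide, while for a score such as F1 they can differ:

```python
def accuracy(true, pred):
    # fraction of samples where the prediction matches the label
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def f1(true, pred, positive=1):
    # F1 for the positive class, computed from raw counts
    tp = sum(t == positive and p == positive for t, p in zip(true, pred))
    fp = sum(t != positive and p == positive for t, p in zip(true, pred))
    fn = sum(t == positive and p != positive for t, p in zip(true, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pooled_score(score, folds):
    # option 1: concatenate all test outcomes, then score once
    all_true = [t for true, _ in folds for t in true]
    all_pred = [p for _, pred in folds for p in pred]
    return score(all_true, all_pred)


def mean_fold_score(score, folds):
    # option 2: score each fold, then average the per-fold scores
    scores = [score(true, pred) for true, pred in folds]
    return sum(scores) / len(scores)


# hypothetical toy data: (true, pred) for two folds of four samples each
folds = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 0], [1, 1, 1, 0]),
]

pooled_acc = pooled_score(accuracy, folds)
mean_acc = mean_fold_score(accuracy, folds)
pooled_f1 = pooled_score(f1, folds)
mean_f1 = mean_fold_score(f1, folds)

print("accuracy: pooled=%.4f mean=%.4f" % (pooled_acc, mean_acc))
print("f1:       pooled=%.4f mean=%.4f" % (pooled_f1, mean_f1))
```

with unequal fold sizes even accuracy would differ between the two options,
since option 2 weights every fold equally regardless of how many samples it
holds.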

cheers,

satra
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
