>> Also another way to circumvent the n_samples change issue when doing
>> CV-based model selection of sparse models might be to use the
>> Bootstrap (sampling with replacement) and make the training size of
>> the folds artificially fixed to the size of the total training set (by having
>> redundant samples): I wonder if this is a good idea or not (having the
>> same sample show up several times in the training set might be a bad
>> idea).
> Good remark. The scaling is valid under independence of the samples,
> which breaks if you use replacement. I have to admit I don't know but
> I know who to ask :)
But you're not breaking independence here: you're drawing iid from a
finite population. However, as pointed out by Olivier, this may create
some artefacts when used with some classifiers.
(The scores across folds are not independent, but that is true for most
CV techniques, and it is another matter anyway.)

B

