>> Also another way to circumvent the n_samples change issue when doing
>> CV-based model selection of sparse models might be to use the
>> Bootstrap (sampling with replacement) and make the training size of
>> the folds artificially fixed to the total training set size (by having
>> redundant samples): I wonder if this is a good idea or not (having the
>> same sample show up several times in the training set might be a bad
>> idea).

> Good remark. The scaling is valid under independence of the samples,
> which breaks if you use replacement. I have to admit I don't know, but
> I know who to ask :)

But you're not breaking independence here: you're drawing i.i.d. from a
finite population. However, as pointed out by Olivier, this may create
artefacts when used with some classifiers. (The scores across folds are
not independent, but that is true for most CV techniques, and it is
another matter anyway.)
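For what it's worth, a minimal sketch of the idea above (not an existing scikit-learn API, just an illustration with NumPy): draw the training indices with replacement so the training set always has exactly n_samples entries, and test on the out-of-bag samples. The function name `bootstrap_splits` is made up for this example.

```python
import numpy as np

def bootstrap_splits(n_samples, n_iter=10, random_state=0):
    """Yield (train, test) index arrays. The training set is sampled
    with replacement and has exactly n_samples entries (so its size is
    fixed, possibly with duplicates); the test set is the out-of-bag
    samples that never appeared in the training draw."""
    rng = np.random.RandomState(random_state)
    for _ in range(n_iter):
        # sample n_samples indices uniformly with replacement
        train = rng.randint(0, n_samples, size=n_samples)
        # out-of-bag: indices not drawn at all in this iteration
        oob = np.setdiff1d(np.arange(n_samples), train)
        yield train, oob
```

Whether duplicated training samples confuse a given estimator (as Olivier worries) would still have to be checked empirically per classifier.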
B
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
