You can just do this via a CV object. For example, use StratifiedShuffleSplit(train_size=.1, test_size=.1, n_splits=5) (n_iter=5 in releases before 0.18) and your training and test sets will be randomly sampled, disjoint 10% subsets of the data, repeated 5 times.
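
A minimal sketch of what I mean, assuming scikit-learn 0.18 or later (where the splitter lives in sklearn.model_selection and takes n_splits). The synthetic data, the LogisticRegression estimator, and the C grid are placeholders for illustration only, not anything from the original proposal:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# Synthetic stand-in for a large dataset (an assumption for illustration).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Each of the 5 splits trains on a random 10% of the rows and tests on a
# different, disjoint 10%; class proportions are preserved in both parts.
cv = StratifiedShuffleSplit(
    n_splits=5, train_size=0.1, test_size=0.1, random_state=0
)

# The estimator and parameter grid are placeholders.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=cv,
)
search.fit(X, y)

# Inspect the size of each (train, test) pair produced by the splitter.
split_sizes = [(len(train), len(test)) for train, test in cv.split(X, y)]
print(split_sizes)
print(search.best_params_)
```

Note that with the default refit=True, GridSearchCV refits the best parameter setting on the full dataset at the end; pass refit=False if that final fit is too expensive.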

On 02/19/2016 11:42 AM, Gael Varoquaux wrote:
> That won't work, as it is modifying the number of samples, which breaks
> the scikit-learn pipeline.
>
> Please add this usecase in the PR on the scikit-learn enhancement
> proposal that discusses a possible modification to scikit-learn:
> https://github.com/scikit-learn/enhancement_proposals/pull/2
>
> Cheers,
>
> Gaël
>
> On Fri, Feb 19, 2016 at 11:36:29AM -0500, Sebastian Raschka wrote:
>> Hi, Stelios,
>> I am wondering, how did you implement this tweak? Just a thought, but
>> instead of adding extra functionality inside the GridSearch class, what
>> about using a random training data selector (transformer) as a pipeline
>> object? Something along the lines of
>> class RandomRowSelector(object):
>>     def __init__(self):
>>         pass
>>     def _some_random_sampling_function(self, X, y):
>>         pass  # sampling logic not specified
>>     def transform(self, X, y):
>>         sampled_rows = self._some_random_sampling_function(X, y)
>>         return X[sampled_rows, :], y[sampled_rows]
>>     def fit(self, X, y=None):
>>         return self
>> Best,
>> Sebastian
>>> On Feb 19, 2016, at 7:56 AM, Stylianos Kampakis
>>> <stylianos.kampa...@gmail.com> wrote:
>>> Hi everyone,
>>> I was thinking to implement a tweak where it is possible to sample randomly
>>> from a dataset when using grid search. This would be particularly useful for
>>> big datasets. The sampling takes place during each round of grid search.
>>> Does anyone think this would be worth submitting to scikit-learn?
>>> Best regards,
>>> Stelios

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general