On Mon, Dec 05, 2011 at 01:41:53PM -0500, Alexandre Passos wrote:
> On Mon, Dec 5, 2011 at 13:31, James Bergstra <[email protected]> wrote:
> > I should probably not have scared ppl off speaking of a 250-job
> > budget. My intuition would be that with 2-8 hyper-parameters, and 1-3
> > "significant" hyper-parameters, randomly sampling around 10-30 points
> > should be pretty reliable.
>
> So perhaps the best implementation of this is to first generate a grid
> (from the usual arguments to sklearn's GridSearch), randomly sort it,
> and iterate over these points until the budget is exhausted?

That does sound reasonable.

When doing grid searches, I find that an important aspect is that some
grid points take a fraction of the time of others. This is actually a
big motivation for doing things in parallel: with enough CPUs (8), the
wall-clock time of a grid search can end up entirely determined by the
time needed to fit the different folds of a single slow grid point.
Thus the notion of budget is relevant, but the right budget is not
simply the number of grid points fitted.

That said, taking this into account will probably make the code much
more complex, so I suggest that we put it on hold.

G

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
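For illustration, a minimal sketch of the shuffled-grid idea discussed above, in plain Python. `shuffled_grid` is a hypothetical helper, not part of scikit-learn, and the dict-of-lists parameter format is assumed to mirror what the grid search accepts. The budget here is a simple count of fits, not the time-based budget mentioned in the reply.

```python
import itertools
import random


def shuffled_grid(param_grid, budget, seed=None):
    """Return at most `budget` points of the full grid, in random order.

    `param_grid` maps each parameter name to a list of candidate values
    (an assumed format, similar to the dict given to a grid search).
    """
    names = sorted(param_grid)
    # Enumerate the full Cartesian product as a list of dicts.
    grid = [dict(zip(names, values))
            for values in itertools.product(*(param_grid[n] for n in names))]
    # Randomly permute the grid so that any prefix is a random subsample
    # drawn without replacement.
    rng = random.Random(seed)
    rng.shuffle(grid)
    # Keep only as many points as the budget (number of fits) allows.
    return grid[:budget]


# Usage: 4 x 3 = 12 grid points, but only 5 fits are allowed.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
for params in shuffled_grid(param_grid, budget=5, seed=0):
    # Here one would fit and cross-validate a model with `params`.
    print(params)
```

Shuffling the materialized grid keeps the code trivial and guarantees no point is evaluated twice; for very large grids one would rather sample indices into the product than build the whole list.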
