On Mon, Dec 5, 2011 at 1:41 PM, Alexandre Passos <[email protected]> wrote:
> On Mon, Dec 5, 2011 at 13:31, James Bergstra <[email protected]> wrote:
>> I should probably not have scared ppl off speaking of a 250-job
>> budget. My intuition would be that with 2-8 hyper-parameters, and 1-3
>> "significant" hyper-parameters, randomly sampling around 10-30 points
>> should be pretty reliable.
>
> So perhaps the best implementation of this is to first generate a grid
> (from the usual arguments to sklearn's GridSearch), randomly sort it,
> and iterate over these points until the budget is exhausted?
>
> Presented like this I can easily see why this is better than (a) going
> over the grid in order until the budget is exhausted or (b) using a
> coarser grid to match the budget. This would also be very easy to
> implement in sklearn.
>
> Do I make sense?
> --
> - Alexandre

+1
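For concreteness, the shuffled-grid idea could look roughly like the sketch
below. This is only an illustration, not sklearn code: the evaluate() function
is a hypothetical stand-in for the fit-and-cross-validate step, and the grid
values are made up.

    import itertools
    import random

    def evaluate(params):
        # Hypothetical stand-in for fitting an estimator and
        # cross-validating it with the given hyper-parameters.
        return -abs(params["C"] - 1.0) - abs(params["gamma"] - 0.01)

    param_grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [1e-3, 1e-2, 1e-1]}

    # Enumerate the full grid as a list of parameter dicts.
    keys = sorted(param_grid)
    grid = [dict(zip(keys, values))
            for values in itertools.product(*(param_grid[k] for k in keys))]

    # Shuffle, then evaluate points only until the budget is exhausted.
    random.shuffle(grid)
    budget = 10
    best_params = max(grid[:budget], key=evaluate)

The point is just that shuffling the enumerated grid and truncating it is all
the machinery needed on top of the existing grid iteration.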
This is definitely a good idea. I think random sampling is still useful,
though: it is not hard to get into settings where the grid is, in theory,
very large and the user's budget is a tiny fraction of it. Within the
existing grid implementation, the option to shuffle points and stop early
would be great.

- James
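For the case where the full grid is far too large to enumerate, a rough
sketch of sampling points directly instead of building the grid first; the
log-uniform ranges here are purely illustrative, not anything from the
thread:

    import random

    def sample_point(rng):
        # Draw each hyper-parameter independently; log-uniform draws are a
        # common choice for scale-like parameters. Ranges are made up here.
        return {
            "C": 10.0 ** rng.uniform(-2, 3),
            "gamma": 10.0 ** rng.uniform(-4, 0),
        }

    rng = random.Random(0)
    budget = 30
    candidates = [sample_point(rng) for _ in range(budget)]
    # Each candidate would then go through the same fit-and-score step as
    # in the grid-based sketch above.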
