I probably should not have scared people off by talking about a 250-job budget. My intuition is that with 2-8 hyper-parameters, of which 1-3 are "significant", randomly sampling around 10-30 points should be pretty reliable.
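
To make that intuition concrete, here is a minimal sketch rather than a benchmark. Everything in it is an assumption made for illustration: the toy loss (only the first of 4 hyper-parameters matters, with its optimum at 0.37), the unit-cube search space, the 16-job budget, and the helper names `loss`, `grid_points`, `random_points` and `best_loss`. It compares a 2-values-per-axis grid against the same number of uniform random points.

```python
import itertools
import random


def loss(params):
    """Toy validation loss: only the first hyper-parameter matters (an assumption)."""
    return (params[0] - 0.37) ** 2


def grid_points(n_dims, values_per_dim):
    """Full grid on [0, 1]^n_dims, using evenly spaced cell midpoints on each axis."""
    axis = [(i + 0.5) / values_per_dim for i in range(values_per_dim)]
    return list(itertools.product(axis, repeat=n_dims))


def random_points(n_dims, n_points, rng):
    """Independent uniform samples from [0, 1]^n_dims."""
    return [tuple(rng.random() for _ in range(n_dims)) for _ in range(n_points)]


def best_loss(points):
    """Lowest loss found among the evaluated points."""
    return min(loss(p) for p in points)


n_dims, values_per_dim = 4, 2
grid = grid_points(n_dims, values_per_dim)  # 2**4 = 16 jobs
budget = len(grid)

# Over many seeds, count how often random search matches or beats the grid
# when both are given the same number of jobs.
n_trials = 200
wins = 0
for seed in range(n_trials):
    sample = random_points(n_dims, budget, random.Random(seed))
    if best_loss(sample) <= best_loss(grid):
        wins += 1

# The grid only ever tries 2 distinct values of the parameter that matters;
# the random sample tries a fresh value in every job.
distinct_grid = len({p[0] for p in grid})
distinct_random = len({p[0] for p in random_points(n_dims, budget, random.Random(0))})

print("distinct values of the important parameter:",
      distinct_grid, "(grid) vs", distinct_random, "(random)")
print("fraction of seeds where random <= grid:", wins / n_trials)
```

In this toy setup the grid spends its whole budget on only two values of the one parameter that matters, while every random job tries a new value. With real models the loss surface is not this simple, so treat the numbers as an illustration of the argument only.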

- James

On Mon, Dec 5, 2011 at 1:28 PM, James Bergstra <[email protected]> wrote:
> On Sat, Dec 3, 2011 at 6:32 AM, Olivier Grisel <[email protected]>
> wrote:
>>> With regards to the random sampling, I am a bit worried that the results
>>> hold for a fair amount of points, and with a small amount of points
>>> (which is typically the situation in which many of us hide) it becomes
>>> very sensitive to the seed.
>>
>> I guess you should monitor the improvement before deciding to stop the
>> search.
>
> My experience has been
>
> 1. that you start from an idea of a grid you'd like to try (ranges for
> hyper-parameters, intervals for each hyper-parameter that might make a
> difference),
>
> 2. you realize there's a huge number of points in the ideal grid, and
> you have a budget for like 250
>
> 3a. you pick a good grid that still gets "the most important part" , vs.
>
> 3b. you sample randomly in the original (huge) space.
>
> If you sample randomly in a space that is close to the grid you were
> going to try, but includes some of the finer resolution that you had
> to throw out to get down to 250 grid points, you should do better with
> 250 random points (3b) than your grid (3a).
>
> You're right that with just a few (i.e. < 10) random samples, mileage
> will vary greatly... but that's not really the regime in which you can
> do a grid search anyway.
>
> I can hopefully offer more convincing evidence soon... I have a
> journal paper on this that has been accepted, but I still need to
> polish it up for publication.
>
> - James
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
