Roberto,

Check this link out.

https://www.youtube.com/watch?v=0wUF_Ov8b0A

Andy actually did a workshop a few weeks ago where he talked about grid search 
and random search. I watched it just a few days ago; there's really great stuff 
in there that you'll be able to use.

Here is the paper that covers random search vs. grid search:

http://www.jmlr.org/papers/v13/bergstra12a.html

Much of the math in the paper is above my pay grade, but the concepts are 
explained well, and the rationale made intuitive sense as I was reading it.

A similar question came up last week, and I got these links from Andy's 
response (thanks, Andy!).
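
In scikit-learn terms, the comparison looks roughly like the sketch below. 
This is only an illustration on a synthetic make_classification dataset with 
placeholder parameter ranges for an AdaBoostClassifier, not a recommendation; 
note that in the scikit-learn of this era the search classes live in 
sklearn.grid_search, while newer releases have them in sklearn.model_selection.

    # Rough sketch only: synthetic data and placeholder parameter ranges.
    from scipy.stats import randint, uniform
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.grid_search import GridSearchCV, RandomizedSearchCV

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Grid search tries every combination (4 x 4 = 16 fits per CV fold).
    grid = GridSearchCV(
        AdaBoostClassifier(),
        param_grid={"n_estimators": [50, 100, 200, 400],
                    "learning_rate": [0.01, 0.1, 0.5, 1.0]},
        cv=3)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)

    # Random search draws the same number of candidates from distributions,
    # which the Bergstra & Bengio paper argues usually covers the important
    # dimensions more efficiently.
    rand = RandomizedSearchCV(
        AdaBoostClassifier(),
        param_distributions={"n_estimators": randint(50, 400),
                             "learning_rate": uniform(0.01, 0.99)},
        n_iter=16, cv=3, random_state=0)
    rand.fit(X, y)
    print(rand.best_params_, rand.best_score_)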

-Jason



From: Pagliari, Roberto [mailto:rpagli...@appcomsci.com]
Sent: Tuesday, April 14, 2015 3:08 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] adaboost parameters

Hi Jason/Andreas,
I watched the videos and they definitely helped. AdaBoost seems to be a special 
case of GBR.

It seems to be working very well with my dataset.

If you guys have any suggestions about the grid over which grid search should 
be run, that'd be great. Some suggestions are provided in the video, but I 
don't know whether other tips or rules of thumb are available.

Thanks,


________________________________
From: Jason Wolosonovich [jmwol...@asu.edu]
Sent: Monday, April 13, 2015 10:47 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] adaboost parameters
Roberto,

Sorry it took so long to respond; I was traveling and wasn't able to check my 
email for a couple of days.

Andy brings up a great point. The Gradient Boosting Classifier (or Regressor, 
depending on which you're trying to do) might be what you're looking for. I 
actually watched the video he linked just last week for a project I was working 
on, and I definitely recommend it. Check that video out and then let us know if 
you have any further questions, but I think it will start you in the right 
direction.

Regarding your learning rate question: smaller learning rates require more 
trees (n_estimators), which increases run time and computational requirements, 
but also (usually) improves your model. So it's a judgement call on your part, 
since you know how much time and compute you have for your project.
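
To make that trade-off concrete, here's a rough sketch on a synthetic dataset 
(make_hastie_10_2); the learning rates and tree counts are placeholders rather 
than recommendations, and staged_predict just shows how many trees each 
learning rate ends up needing:

    # Rough illustration of the learning-rate vs. n_estimators trade-off.
    import numpy as np
    from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer releases
    from sklearn.datasets import make_hastie_10_2
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_hastie_10_2(n_samples=4000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for lr in (1.0, 0.1, 0.01):
        clf = GradientBoostingClassifier(learning_rate=lr, n_estimators=500,
                                         max_depth=1, random_state=0)
        clf.fit(X_train, y_train)
        # staged_predict yields predictions after each added tree, so we can
        # see how many trees each learning rate needs for its best test score.
        acc = [np.mean(pred == y_test) for pred in clf.staged_predict(X_test)]
        print("learning_rate=%.2f  best accuracy=%.3f  at n_estimators=%d"
              % (lr, max(acc), int(np.argmax(acc)) + 1))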

-Jason

From: Andreas Mueller [mailto:t3k...@gmail.com]
Sent: Monday, April 13, 2015 3:31 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] adaboost parameters

You might consider using gradient boosting instead. See 
https://www.youtube.com/watch?v=IXZKgIsZRm0
On 04/12/2015 03:45 AM, Pagliari, Roberto wrote:
Right now I'm using the default values, which means a decision tree as the base 
estimator and a learning rate of 1.0.

I should probably change the learning rate, at the very least, because I'm not 
getting good performance.

Does it make sense to use a random forest instead of a decision tree?

Thanks,


From: Jason Wolosonovich [mailto:jmwol...@asu.edu]
Sent: Saturday, April 11, 2015 9:13 AM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] adaboost parameters

What is your dataset like? How are you building the individual classifiers that 
you're ensembling with AdaBoost? A common use case is boosted decision stumps 
(one-level decision trees).

http://en.wikipedia.org/wiki/Decision_stump

http://lyonesse.stanford.edu/~langley/papers/stump.ml92.pdf

So with decision stumps and/or a very small learning rate, you would, in 
general, need more (relatively speaking) estimators. Whether your dataset has 
10 features or 100 features (or more...or less) will matter, as will the depth 
of each tree (assuming you're boosting decision trees). Boosting is an 
iterative process, so you'd like as many trees as you can get and a small-ish 
learning rate to get the best results, with the limiting factor (as always) 
being your computational and time budgets.
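
For reference, here's a minimal sketch of boosted stumps in scikit-learn; the 
make_classification dataset and the parameter values are placeholders only, not 
recommendations:

    # Minimal sketch of boosting decision stumps with AdaBoost.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    stump = DecisionTreeClassifier(max_depth=1)   # a one-level decision stump
    clf = AdaBoostClassifier(base_estimator=stump,
                             n_estimators=400,    # weak learners, so use many
                             learning_rate=0.5)
    clf.fit(X, y)
    print(clf.score(X, y))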

My 2 cents. :D

-Jason

From: Pagliari, Roberto [mailto:rpagli...@appcomsci.com]
Sent: Friday, April 10, 2015 1:18 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: [Scikit-learn-general] adaboost parameters

When using AdaBoost, what range of values for n_estimators and learning_rate 
makes sense to optimize over?

Thank you,


