Roberto,
Check this link out.
https://www.youtube.com/watch?v=0wUF_Ov8b0A
Andy actually gave a workshop a few weeks ago where he talked about grid search
and random search. I watched this one a few days ago too; there's really great
stuff in there that you'll be able to use as well.
Here is the paper that covers random search vs. grid search
http://www.jmlr.org/papers/v13/bergstra12a.html
Much of the math in the paper is above my pay grade, but the concepts are
explained well, and the rationale made intuitive sense as I read through it.
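If it helps, here's a rough sketch of what the two approaches look like in scikit-learn, using AdaBoost as the model. This is just my own illustration; the parameter ranges and dataset are made up for the example, not recommendations.

# Minimal sketch: grid search vs. random search over AdaBoost hyperparameters.
# Ranges/dataset here are purely illustrative.
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# Exhaustive grid: every combination is evaluated (4 x 4 = 16 candidates).
grid = GridSearchCV(
    AdaBoostClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100, 200, 400],
                "learning_rate": [0.01, 0.1, 0.5, 1.0]},
    cv=3,
)
grid.fit(X, y)

# Random search: a fixed budget of draws from distributions, which is the
# approach the Bergstra & Bengio paper argues scales better with more
# hyperparameters.
rand = RandomizedSearchCV(
    AdaBoostClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 500),
                         "learning_rate": uniform(0.01, 1.0)},
    n_iter=16, cv=3, random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)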
A similar question came up last week, and I got these links from Andy's response
(thanks, Andy!).
-Jason
From: Pagliari, Roberto [mailto:rpagli...@appcomsci.com]
Sent: Tuesday, April 14, 2015 3:08 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] adaboost parameters
hi Jason/Andreas,
I watched the videos and they definitely helped. AdaBoost seems to be a special
case of GBR.
It seems to be working very well with my dataset.
If you guys have any suggestions about the grid over which grid search should
be run, that would be great. Some suggestions are provided in the video, but I
don't know whether other tips or rules of thumb are available.
Thanks,
________________________________
From: Jason Wolosonovich [jmwol...@asu.edu]
Sent: Monday, April 13, 2015 10:47 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] adaboost parameters
Roberto,
Sorry it took so long to respond; I was traveling and wasn't able to check my
email for a couple of days.
Andy brings up a great point. The Gradient Boosting Classifier (or Regressor,
depending on which task you're doing) might be what you're looking for. I
actually watched the video he linked just last week for a project I was working
on, and I definitely recommend it. Check that video out and then let us know if
you have any further questions, but I think it will start you in the right
direction.
Regarding your learning rate question: smaller learning rates require more
trees (n_estimators), which increases run time and computational requirements
but also (usually) improves your model. So it's a judgment call on your part,
since you know how much time and compute you have for your project.
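Here's a quick sketch of that trade-off with scikit-learn's gradient boosting, just to make it concrete. The dataset and the learning_rate/n_estimators pairs are made up for illustration, not tuned values.

# Illustrative only: a smaller learning_rate usually needs more trees to reach
# similar accuracy, at a higher computational cost.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for lr, n in [(1.0, 100), (0.1, 100), (0.1, 1000)]:
    clf = GradientBoostingClassifier(learning_rate=lr, n_estimators=n,
                                     random_state=0).fit(X_tr, y_tr)
    print(f"learning_rate={lr}, n_estimators={n}: "
          f"test accuracy={clf.score(X_te, y_te):.3f}")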
-Jason
From: Andreas Mueller [mailto:t3k...@gmail.com]
Sent: Monday, April 13, 2015 3:31 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] adaboost parameters
You might consider using gradient boosting instead.
See https://www.youtube.com/watch?v=IXZKgIsZRm0
On 04/12/2015 03:45 AM, Pagliari, Roberto wrote:
Right now I'm using the default values, which means a decision tree as the base
estimator and a learning rate of 1.0.
I should probably change the learning rate, at the very least, because I'm not
getting good performance.
Does it make sense to use a random forest instead of a decision tree?
Thanks,
From: Jason Wolosonovich [mailto:jmwol...@asu.edu]
Sent: Saturday, April 11, 2015 9:13 AM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] adaboost parameters
What is your dataset like? How are you building the individual classifier that
you are ensembling with AdaBoost? A common use case would be boosted decision
stumps (one-level decision trees).
http://en.wikipedia.org/wiki/Decision_stump
http://lyonesse.stanford.edu/~langley/papers/stump.ml92.pdf
So with decision stumps and/or a very high learning rate, you would generally
need more estimators (relatively speaking). Whether your dataset has 10 features
or 100 (or more, or fewer) will matter, as will the depth of each tree (assuming
you're boosting decision trees). Boosting is an iterative process, so you'd like
as many trees as you can get and a small-ish learning rate to get the best
results, with the limiting factor (as always) being your computational and time
budgets. A minimal sketch is below.
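A minimal sketch of boosted stumps versus slightly deeper trees in scikit-learn; the dataset and parameter values are invented for illustration. Note that AdaBoostClassifier's default base estimator is already a depth-1 decision tree (a stump).

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Default base estimator is a decision stump (max_depth=1).
stumps = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=0)

# Base estimator passed positionally, since its keyword name differs across
# scikit-learn versions (base_estimator vs. estimator).
deeper = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                            n_estimators=200, learning_rate=0.5, random_state=0)

print("stumps :", cross_val_score(stumps, X, y, cv=5).mean())
print("depth-3:", cross_val_score(deeper, X, y, cv=5).mean())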
My 2 cents. :D
-Jason
From: Pagliari, Roberto [mailto:rpagli...@appcomsci.com]
Sent: Friday, April 10, 2015 1:18 PM
To:
scikit-learn-general@lists.sourceforge.net<mailto:scikit-learn-general@lists.sourceforge.net>
Subject: [Scikit-learn-general] adaboost parameters
When using AdaBoost, what is a sensible range of values for n_estimators and
learning_rate to optimize over?
Thank you,