Oh, I see. I think random forest is just a different approach… I would say that xgboost is a kind of hybrid algorithm, borrowing ideas from both random forests and boosting. Random forests, AdaBoost, xgboost, etc. are just different algorithms (just as logistic regression, SVMs, and multi-layer perceptrons are different). What I was trying to say is that I wouldn't fundamentally change the random forest algorithm in scikit-learn using ideas from xgboost, since then it wouldn't be a random forest anymore. Please don't get me wrong: I'd also like to see a more efficient alternative (in terms of predictive and/or computational performance), but I think it should be a separate estimator, not a modification of the random forest itself.
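To make the bagging-versus-boosting distinction a bit more concrete, here is a toy pure-Python sketch. It is not scikit-learn's or xgboost's code. Assumptions: 1-D regression, depth-1 trees ("stumps") as base learners, squared loss, and hypothetical helper names (fit_stump, bagging, gradient_boosting, mse). Real random forests usually grow deep trees and also subsample features at each split (impossible with a single feature, so omitted here), and xgboost adds regularization and other refinements on top of plain gradient boosting.

```python
import math
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (a "stump") on 1-D inputs by trying
    every midpoint threshold and keeping the one with the lowest squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        if not left or not right:
            continue  # threshold leaves one side empty (happens with tied x values)
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:
        m = mean(ys)  # no valid split: fall back to a constant prediction
        return lambda x: m
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def bagging(xs, ys, n_estimators=50, seed=0):
    """Bagging: each stump is trained independently on a bootstrap sample,
    and the ensemble prediction is the plain average."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(s(x) for s in stumps)


def gradient_boosting(xs, ys, n_estimators=50, learning_rate=0.3):
    """Gradient boosting with squared loss: each stump is fit to the residuals
    of the ensemble built so far, then added with a shrinkage factor."""
    base = mean(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_estimators):
        # For squared loss the negative gradient is (up to a constant factor)
        # simply the residual y - F(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [i / 10 for i in range(60)]
    ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]
    print("constant baseline MSE:", mse(lambda x: mean(ys), xs, ys))
    print("bagged stumps MSE:    ", mse(bagging(xs, ys), xs, ys))
    print("boosted stumps MSE:   ", mse(gradient_boosting(xs, ys), xs, ys))
```

On data like this, one would expect the boosted stumps to reach a lower training error than the bagged ones: averaging independently trained stumps mainly reduces variance, while sequentially fitting residuals reduces bias. This compares training fit only and says nothing about generalization, which is what the linked benchmark measures.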
> On Mar 26, 2016, at 4:52 PM, Raphael C <drr...@gmail.com> wrote:
>
> On Friday, 25 March 2016, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>
> > > wondering what changes are needed to make
> > > RandomForestClassifier competitive with xgboost and H2O at
> >
> > Do you mean in terms of predictive performance (not computational
> > efficiency)? Not sure what others think, but I wouldn't change the core
> > algorithm, since otherwise it's not really a "Random forest" anymore as it is
> > described in the literature -- and that would be very confusing for users and
> > researchers.
>
> I really meant just to ask the question: what is preventing the scikit-learn
> random forest implementation from a) scaling as well as xgboost and H2O and
> b) getting as good an AUC?
>
> If the answer is that this is fundamentally the limit of bagging random
> forests (and that xgboost and H2O both implement boosting or something else
> that scales and performs better), then that is already very interesting.
>
> Raphael
>
> > On Mar 22, 2016, at 7:52 AM, Raphael C <drr...@gmail.com> wrote:
> >
> > >> - In tree-based models, not handling categorical variables as such hurts us a lot.
> > >> There's a PR to fix that, it still needs a bit of love:
> > >> https://github.com/scikit-learn/scikit-learn/pull/4899
> > >
> > > This is a conversation moved from
> > > https://github.com/scikit-learn/scikit-learn/pull/4899 .
> > >
> > > In light of the comment above and comments in the PR, I was
> > > wondering what changes are needed to make
> > > RandomForestClassifier competitive with xgboost and H2O at
> > > http://datascience.la/benchmarking-random-forest-implementations/ .
> > >
> > > Raphael

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general