Oh, I see. 

I think random forest is just a different approach … I would say that xgboost 
is kind of a hybrid algorithm, borrowing ideas from both random forests and 
boosting. Random forests, AdaBoost, xgboost, etc. are simply different 
algorithms (just as logistic regression, SVMs, and multi-layer perceptrons are 
different). What I was trying to say is that I wouldn’t fundamentally change 
the random forest algorithm in scikit-learn using ideas from xgboost, since it 
wouldn’t be a random forest anymore. Please don’t get me wrong, I’d also like 
to see a more efficient estimator (in terms of predictive and/or computational 
performance), but I think that it should be a separate estimator, not a 
modification of the random forest itself. 
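
To make the "separate estimator" point concrete, here is a minimal sketch of how 
scikit-learn already exposes bagging-style and boosting-style tree ensembles as 
distinct classes behind the same fit/predict interface. The synthetic dataset, 
the sample sizes, and all parameter values are my own assumptions for 
illustration; this is not the benchmark discussed in this thread, and the 
numbers it prints say nothing about xgboost or H2O.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative assumption only).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Two different algorithms, kept as two separate estimators.
estimators = {
    "random forest (bagging)": RandomForestClassifier(
        n_estimators=100, random_state=0
    ),
    "gradient boosting": GradientBoostingClassifier(
        n_estimators=100, random_state=0
    ),
}

auc = {}
for name, est in estimators.items():
    est.fit(X_train, y_train)
    # AUC is computed from the predicted probability of the positive class.
    auc[name] = roc_auc_score(y_test, est.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc[name]:.3f}")
```

Because both classes share one interface, a new and faster boosted-tree 
estimator could sit next to RandomForestClassifier without changing what 
"random forest" means to users.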

> On Mar 26, 2016, at 4:52 PM, Raphael C <drr...@gmail.com> wrote:
> 
> 
> 
> On Friday, 25 March 2016, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> > wondering what changes are needed to make
> > RandomForestClassifier competitive with xgboost and H2O at
> 
> Do you mean in terms of predictive performance (not computational 
> efficiency)? Not sure what others think, but I wouldn't change the core 
> algorithm, since otherwise it's not really a "random forest" anymore as it is 
> described in the literature -- and that would be very confusing for users and 
> researchers.
> 
> 
>  
> I really meant just to ask the question: what is preventing the scikit-learn 
> random forest implementation from a) scaling as well as xgboost and H2O and 
> b) getting as good an AUC?
> 
> If the answer is that this is fundamentally the limit of bagging random 
> forests (and that xgboost and H2O both implement boosting or something else 
> that scales and performs better), then that is already very interesting.
> Raphael
> > On Mar 22, 2016, at 7:52 AM, Raphael C <drr...@gmail.com> wrote:
> >
> >>
> >> - In tree-based models, not handling categorical variables as such hurts us a lot
> >>  There's a PR to fix that, it still needs a bit of love:
> >>  https://github.com/scikit-learn/scikit-learn/pull/4899
> >>
> >
> > This is a conversation moved from
> > https://github.com/scikit-learn/scikit-learn/pull/4899 .
> >
> > In the light of the comment above and comments in the PR, I was
> > wondering what changes are needed to make
> > RandomForestClassifier competitive with xgboost and H2O at
> > http://datascience.la/benchmarking-random-forest-implementations/ .
> >
> > Raphael
> >
> > ------------------------------------------------------------------------------
> > Transform Data into Opportunity.
> > Accelerate data analysis in your applications with
> > Intel Data Analytics Acceleration Library.
> > Click to learn more.
> > http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> 
> 

