It's not that "partial_fit" is unique to stochastic gradient descent; it's more 
that some other algorithms are generally not well suited for it. For SGD, 
partial_fit is a natural thing to do, since you estimate the training loss from 
minibatches anyway -- i.e., SGD proceeds step by step regardless of whether you 
have all the data up front.
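
For example, here is a minimal sketch of what that looks like with
scikit-learn's SGDRegressor (the synthetic data and batch size below are
made up purely for illustration):

    import numpy as np
    from sklearn.linear_model import SGDRegressor

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 5)
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.randn(1000)

    est = SGDRegressor(random_state=0)
    batch_size = 100
    for start in range(0, len(X), batch_size):
        # Each call runs SGD updates on just this minibatch; the fixed
        # set of coefficients is adjusted in place.
        est.partial_fit(X[start:start + batch_size],
                        y[start:start + batch_size])

    print(est.coef_)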

Also, think about it this way: models trained via SGD are typically parametric, 
so the number of parameters is fixed and you simply adjust their values 
iteratively during training. For nonparametric models, such as random forests 
(RF), the number of parameters (e.g., if you think of each node in a decision 
tree as a parameter) depends on the examples in the training set -- i.e., how 
deep each individual decision tree eventually grows depends on the training 
data. So it doesn't make much sense to build a decision tree on a few training 
examples and then update it later by feeding it more examples; either way, you 
would probably end up throwing the tree away and building a new one once you 
get additional data. I'm sure solutions for "updating" decision trees exist 
that produce somewhat reasonable results efficiently, but it's less natural and 
not a common thing to do, which is probably why it's not implemented in 
scikit-learn.
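
In practice that "rebuild from scratch" workflow looks something like the
sketch below (again with made-up synthetic data):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.RandomState(0)
    X_old, y_old = rng.randn(500, 5), rng.randn(500)
    X_new, y_new = rng.randn(100, 5), rng.randn(100)

    rf = RandomForestRegressor(random_state=0)
    rf.fit(X_old, y_old)  # initial model

    # There is no rf.partial_fit(X_new, y_new); since the tree structure
    # depends on the full training set, you discard the old forest and
    # refit on the combined data instead.
    rf = RandomForestRegressor(random_state=0)
    rf.fit(np.concatenate([X_old, X_new]),
           np.concatenate([y_old, y_new]))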

Best,
Sebastian


> On Mar 13, 2019, at 10:45 PM, lampahome <pahome.c...@mirlab.org> wrote:
> 
> As in the title, I'm confused about why some algorithms can partial_fit and 
> some can't.
> 
> For regression models, I found that SGD can but RF can't.
> 
> Is this due to a difference in the algorithms? I thought partial_fit is 
> possible because of gradient descent, or is there another reason?
> 
> thx
