It's not that partial_fit is unique to stochastic gradient descent; it's more that some other algorithms are simply not well suited to it. For SGD, partial_fit is a natural thing to do, since you estimate the training loss from minibatches anyway -- i.e., SGD already proceeds step by step.
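For instance, here's a minimal sketch of incremental learning with scikit-learn's SGDRegressor; the data and minibatch size are made up just for illustration:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)

# Toy regression data, purely for illustration
X = rng.randn(1000, 10)
y = X @ rng.randn(10) + 0.1 * rng.randn(1000)

model = SGDRegressor()

# Feed the data in minibatches; each partial_fit call performs
# SGD updates on the same fixed set of coefficients
for start in range(0, len(X), 100):
    model.partial_fit(X[start:start + 100], y[start:start + 100])

print(model.coef_)
```

Each call just continues the same iterative optimization, which is why incremental training falls out of SGD essentially for free.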
Also, think about it this way: models trained via SGD are typically parametric, so the number of parameters is fixed and you just adjust their values iteratively during training. For nonparametric models such as random forests, the number of parameters (e.g., if you think of each node in a decision tree as a parameter) depends on the examples in the training set -- how deep each individual tree eventually grows depends on the training data. So it doesn't make sense to build a decision tree on a few training examples and then update it later by feeding in more examples; either way, you would probably end up throwing the tree away and building a new one once you get additional data. I'm sure solutions for "updating" decision trees exist that produce somewhat reasonable results efficiently, but it's less natural and not a common thing to do, which is probably why it's not implemented in scikit-learn. (See the short check after the quoted message below.)

Best,
Sebastian

> On Mar 13, 2019, at 10:45 PM, lampahome <pahome.c...@mirlab.org> wrote:
>
> As title, I'm confused about why some algorithms can partial_fit and some can't.
>
> For regression models, I found that SGD can but RF can't.
>
> Is it about the difference between the algorithms? I thought partial_fit was possible because of gradient descent, or is there another reason?
>
> thx
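P.S. A small sketch of the point above: RandomForestRegressor has no partial_fit, and the closest built-in option, warm_start, only grows the ensemble with additional trees rather than revising the trees that were already built. The toy data here is made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

print(hasattr(RandomForestRegressor(), "partial_fit"))  # False

rng = np.random.RandomState(0)
X, y = rng.randn(200, 5), rng.randn(200)

# warm_start=True adds new trees on a later fit() call,
# but leaves the existing trees untouched
rf = RandomForestRegressor(n_estimators=10, warm_start=True)
rf.fit(X, y)                # builds the first 10 trees
rf.n_estimators = 20
rf.fit(X, y)                # adds 10 more trees; old ones are not updated
print(len(rf.estimators_))  # 20
```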