Hi!

On Wed, Apr 27, 2016 at 2:46 PM, Li Aodong <stamd...@outlook.com> wrote:
>
> First, it is about the subsample parameter description. As the picture
> below shows, it says that "Choosing subsample < 1.0 leads to *a reduction
> of variance and an increase in bias*". But I think choosing subsample < 1.0
> actually *increases the variance and decreases the bias*.
>
Subsampling in GBM can be seen as a form of bagging, and it indeed reduces
variance at the expense of increased bias.
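
For what it's worth, here is a minimal sketch of how to see this with the
estimator API (the dataset, parameter values, and train/test split are
just illustrative, not a benchmark):

from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

# Toy dataset; first half for training, second half for testing.
X, y = make_hastie_10_2(n_samples=4000, random_state=0)
X_train, y_train = X[:2000], y[:2000]
X_test, y_test = X[2000:], y[2000:]

# subsample=1.0 (default) vs. stochastic gradient boosting (subsample < 1.0).
full = GradientBoostingClassifier(n_estimators=100, random_state=0)
sgb = GradientBoostingClassifier(n_estimators=100, subsample=0.5,
                                 random_state=0)

print("subsample=1.0:", full.fit(X_train, y_train).score(X_test, y_test))
print("subsample=0.5:", sgb.fit(X_train, y_train).score(X_test, y_test))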

>
> Second, if subsample is smaller than 1.0, it is indeed a necessary
> condition for Stochastic Gradient Boosting (SGB). However, SGB is not that
> simple. According to [1], SGB requires that "at each iteration a
> subsample of the training data is drawn at random (*without replacement*)
> from the full training data". I have tried your implementation, and I think
> you generate the sample mask *with replacement* at each iteration.
>
I haven't checked the implementation, but I can reasonably state that
sampling with or without replacement has only a minimal effect on the
performance of the model.
See http://www.stat.washington.edu/wxs/Learning-papers/paper-bag.pdf
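
For concreteness, here is a toy sketch of the two sampling schemes in
plain NumPy (this is not the actual scikit-learn code, just an
illustration of the difference being discussed):

import numpy as np

rng = np.random.RandomState(0)
n_samples = 10
n_draw = int(0.5 * n_samples)  # i.e. subsample = 0.5

# Without replacement (Friedman's SGB): each index appears at most once.
no_rep = rng.choice(n_samples, size=n_draw, replace=False)

# With replacement (bootstrap-style): indices may repeat, so some samples
# are effectively weighted more and others left out of the iteration.
with_rep = rng.choice(n_samples, size=n_draw, replace=True)

print(sorted(no_rep))    # distinct indices
print(sorted(with_rep))  # may contain duplicates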

Paolo