Dear all,

Recently I have been using scikit-learn as my toolbox for traditional machine 
learning, and I am impressed by how capable it is. However, I have run into 
two questions about the gradient boosting module that I hope you can help me 
clarify.


First, about the description of the subsample parameter. The docstring says 
that "Choosing subsample < 1.0 leads to a reduction of variance and an 
increase in bias". My understanding, however, is that choosing 
subsample < 1.0 actually increases the variance and decreases the bias.


Second, subsample < 1.0 is indeed a necessary ingredient of Stochastic 
Gradient Boosting (SGB), but SGB requires more than that. According to [1], 
SGB requires that "at each iteration a subsample of the training data is 
drawn at random (without replacement) from the full training data". I have 
looked at your implementation, and it appears to me that the sample mask is 
generated with replacement at each iteration.
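
To make the distinction concrete, here is a small numpy snippet (purely 
illustrative, not your code) showing the two sampling schemes I mean:

    import numpy as np

    rng = np.random.RandomState(0)
    n_samples, subsample = 10, 0.5
    n_in_bag = int(subsample * n_samples)

    # Friedman's scheme: each index appears at most once per iteration.
    without_repl = rng.choice(n_samples, size=n_in_bag, replace=False)
    # With replacement: the same index can be drawn several times.
    with_repl = rng.choice(n_samples, size=n_in_bag, replace=True)

    print(sorted(without_repl))  # always n_in_bag distinct indices
    print(sorted(with_repl))     # duplicates can occur

Friedman's definition requires the first scheme at every iteration.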


I really like scikit-learn and would like to use it in my research, so I hope 
to find out whether I am mistaken or whether these are genuine issues.

[1] J. H. Friedman, "Stochastic Gradient Boosting", Computational Statistics 
& Data Analysis, 38(4):367-378, 2002.


Best,

Aodong