Re: [Scikit-learn-general] Gradient boosting complexity

2013-01-14 Thread Peter Prettenhofer
2013/1/14 Andreas Mueller:
> Hi Peter.
> I only skimmed your mail, but I understood you said that the problem is
> the use of a boolean mask.
> Wouldn't it be possible to do the subsampling explicitly before training
> the tree if the sample_fraction is low?

Absolutely, when I wrote the code I ha

2013-01-14 Thread Andreas Mueller
Hi Peter. I only skimmed your mail, but I understood you said that the problem is the use of a boolean mask. Wouldn't it be possible to do the subsampling explicitly before training the tree if the sample_fraction is low? Or is the complexity of applying the sample mask higher than training the
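A minimal illustration of the cost Andreas is asking about. It assumes, as the thread describes, that the tree code sees the subsample only through a boolean mask over all n rows. The helpers `node_sum_masked` and `node_sum_indexed` are hypothetical, not scikit-learn code. Both compute the sum of in-bag targets, one of the per-node statistics a regression tree needs. The masked version visits every row, and the indexed version visits only the drawn ones.

```python
import random


def node_sum_masked(y, mask):
    """Sum of in-bag targets, reached through a boolean mask over all rows.

    Returns (sum, rows_visited). Every row is visited, in-bag or not,
    so the cost is O(n) regardless of how many rows are in-bag.
    """
    total, visited = 0.0, 0
    for i in range(len(y)):
        visited += 1
        if mask[i]:
            total += y[i]
    return total, visited


def node_sum_indexed(y, idx):
    """Same sum, reached through an explicit list of in-bag row indices.

    Only the rows listed in idx are visited, so the cost is O(len(idx)).
    """
    total, visited = 0.0, 0
    for i in idx:
        visited += 1
        total += y[i]
    return total, visited


n, subsample = 10_000, 0.1
rng = random.Random(0)
y = [rng.random() for _ in range(n)]

# Draw the subsample once. Sorting keeps accesses in row order, so both
# helpers add the same values in the same order.
idx = sorted(rng.sample(range(n), int(subsample * n)))
mask = [False] * n
for i in idx:
    mask[i] = True

s_mask, v_mask = node_sum_masked(y, mask)
s_idx, v_idx = node_sum_indexed(y, idx)
print(v_mask, v_idx)  # 10000 1000
```

Both helpers return the same sum. The masked pass still costs O(n) per node however small `subsample` is, which is consistent with the linear scaling Erik reports. The indexed pass costs O(k) for k drawn rows.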

2013-01-14 Thread Peter Prettenhofer
2013/1/13 Erik Bernhardsson:
> Just a quick question about the gradient boosting in scikit-learn. We have
> tons of data to regress on (like 100M data points), but the running time of
> the algorithm is linear in the size of X no matter what subsample is set to.

Hi Erik, the problem pertains not
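Erik's observation is easier to reason about when the stochastic gradient boosting loop is written out. Below is a minimal sketch under these assumptions:

- squared loss and a single feature;
- depth-1 trees (stumps) in place of full regression trees;
- `fit_stump` and `boost` are hypothetical helpers, not scikit-learn's implementation.

It subsamples explicitly, as Andreas suggests in his reply. Each stage fits on `k = subsample * n` drawn rows. Updating the running predictions still touches all n rows.

```python
import numpy as np


def fit_stump(x, r):
    """Best single-threshold split of 1-D feature x for residuals r (squared error).

    Returns (threshold, left_value, right_value); rows with x <= threshold
    get left_value. Sorting makes this O(m log m) in the number of rows m.
    """
    m = len(r)
    if m < 2:                                   # nothing to split
        v = float(r.mean()) if m else 0.0
        return np.inf, v, v
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    left_sum = np.cumsum(rs)[:-1]               # sums of the first 1..m-1 sorted rows
    n_left = np.arange(1, m)
    right_sum = rs.sum() - left_sum
    n_right = m - n_left
    # Maximising this quantity minimises the total squared error of the two leaves.
    gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right
    gain[xs[:-1] == xs[1:]] = -np.inf           # cannot split between tied values
    i = int(np.argmax(gain))
    if not np.isfinite(gain[i]):                # all x equal: constant prediction
        v = float(rs.mean())
        return np.inf, v, v
    thr = 0.5 * (xs[i] + xs[i + 1])
    return thr, left_sum[i] / n_left[i], right_sum[i] / n_right[i]


def boost(x, y, n_stages=50, learning_rate=0.1, subsample=0.1, seed=0):
    """Stochastic gradient boosting with squared loss and explicit subsampling."""
    rng = np.random.default_rng(seed)
    n = len(y)
    k = min(n, max(2, int(subsample * n)))
    f0 = float(y.mean())                        # initial constant model
    pred = np.full(n, f0)
    stumps = []
    for _ in range(n_stages):
        idx = rng.choice(n, size=k, replace=False)      # draw k distinct rows
        resid = y[idx] - pred[idx]                      # negative gradient, in-bag rows only
        thr, lv, rv = fit_stump(x[idx], resid)          # the fit sees only k rows
        pred += learning_rate * np.where(x <= thr, lv, rv)  # this update is O(n)
        stumps.append((thr, lv, rv))
    return f0, stumps, pred


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 2000)
y = np.where(x > 0.5, 1.0, 0.0) + 0.05 * rng.standard_normal(2000)
f0, stumps, pred = boost(x, y, n_stages=50, subsample=0.1)
print(np.mean((y - pred) ** 2))
```

In this sketch each stage costs roughly O(k log k) for the fit plus O(n) for the prediction update. Shrinking `subsample` therefore reduces the fitting work, but the per-stage cost does not drop below linear in n unless the in-sample predictions are also kept only for drawn rows.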