2013/1/14 Andreas Mueller <[email protected]>:
> Hi Peter.
> I only skimmed your mail, but I understood you said that the problem is
> the use of a boolean mask.
> Wouldn't it be possible to do the subsampling explicitly before training
> the tree if the sample_fraction is low?

Absolutely. When I wrote the code I hadn't thought about very low values
of ``subsample`` (<< 0.5). But again, this would incur high memory costs,
because we would need to fancy index, which copies the selected rows.

> Or is the complexity of applying the sample mask higher than training
> the tree?

By applying the sample mask, do you mean fancy indexing with the sample
mask? In general, if you build deep trees, the cost of fancy indexing is
amortized by the subsequent split computations; if you build shallow
trees, it often cannot be amortized, so you're better off with the sparse
sample_mask. I had the impression that fancy indexing didn't help much
for GBRT and shallow trees (below depth 6).

> Also: would it be possible to speed this up using the recently
> introduced sample weights?
> That helped for the random forests, right?

No, unfortunately not: in GBRT we sample without replacement (RF samples
with replacement).

> Best,
> Andy

--
Peter Prettenhofer
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
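
To make the trade-off discussed above concrete, here is a minimal NumPy sketch. It is not the actual scikit-learn tree code; the array sizes and the names n_inbag, X_sub and counts are assumptions chosen for illustration. It draws a subsample without replacement, then compares the boolean sample mask (no copy of X) with fancy indexing (a copy of the selected rows). The last lines show a bootstrap draw summarized as per-sample counts, which is presumably the kind of weighting that helped for random forests; without replacement the equivalent weights are just 0/1, i.e. the mask itself.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 1000, 20   # illustrative sizes (assumption)
subsample = 0.1                    # a "very low" fraction

X = rng.rand(n_samples, n_features)

# Sampling without replacement: each row is drawn at most once.
n_inbag = int(subsample * n_samples)
idx = rng.permutation(n_samples)[:n_inbag]

# Option 1: boolean sample mask. One byte per sample and no copy of X,
# but code that consumes the mask still has to visit all n_samples entries.
sample_mask = np.zeros(n_samples, dtype=bool)
sample_mask[idx] = True

# Option 2: fancy indexing. NumPy materializes a copy of the selected
# rows, so later passes touch only n_inbag rows at the price of memory.
X_sub = X[np.sort(idx)]

mask_bytes = sample_mask.nbytes    # extra memory for the mask
copy_bytes = X_sub.nbytes          # extra memory for the copied rows

# With replacement (bootstrap), a draw can be summarized as counts per
# sample, which can then be used as sample weights.
boot_idx = rng.randint(0, n_samples, n_samples)
counts = np.bincount(boot_idx, minlength=n_samples)

print(mask_bytes, copy_bytes)
```

The copy cost is paid once per tree, so whether option 2 wins depends on how many passes over the data follow it, which is the amortization argument for deep versus shallow trees.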
