2012/3/27 Paolo Losi <[email protected]>:
> Gilles,
>
> thank you very much for having checked.
>
> If everyone agrees I'll:
>
> - uncomment extratrees and randomforest benchmark (@pprett is there
> any valid reason to leave them out?)

no, absolutely not - I just forgot to uncomment them - thx

> - explicitly config max_features=None for RandomForest and ExtraTrees

+1

>
> Thanks again
>
> Paolo
>
> On Tue, Mar 27, 2012 at 2:13 PM, Gilles Louppe <[email protected]> wrote:
>>
>> Hi,
>>
>> Using max_features="auto" (default setting) indeed yields the results
>> that Paolo reports.
>>
>> When setting max_features=None (i.e., using all features as in our
>> earlier code), I got the following on my machine:
>>
>> RandomForest   778.1471s   1.2830s   0.0248
>> Extra-Trees   1325.2397s   1.3544s   0.0199
>>
>> which is consistent with what is mentioned in the doc.
>>
>> @pprett: Since max_features=sqrt(n_features) now by default on
>> classification problems, the trees are usually more randomized, hence
>> with a higher bias. To compensate for that, more trees usually need to
>> be built whereas we only use 20 trees in the benchmark (which is low
>> in my opinion). The effect of max_features is very dataset specific
>> though. On some problems, decreasing max_features does not impair
>> performance as much as here on covertype. I am not sure whether
>> one-hot-encoding is causing this.
>>
>> Best,
>>
>> Gilles
>>
>> On 27 March 2012 13:38, Peter Prettenhofer <[email protected]>
>> wrote:
>> > Interesting - covtype involves a number of categorical attributes
>> > which are represented via a one-hot encoding - do you think that such
>> > a representation has a significant effect on feature sampling and thus
>> > the performance of random forests?
>> >
>> > 2012/3/27 Gilles Louppe <[email protected]>:
>> >> Hi,
>> >>
>> >> I am running the tests again, but indeed I think the difference in the
>> >> results comes from the fact that max_features=sqrt(n_features) now by
>> >> default whereas it was max_features=n_features before.
>> >>
>> >> Gilles
>> >>
>> >> On 27 March 2012 11:56, Paolo Losi <[email protected]> wrote:
>> >>> Thanks Peter,
>> >>>
>> >>> On Tue, Mar 27, 2012 at 11:34 AM, Peter Prettenhofer
>> >>> <[email protected]> wrote:
>> >>>>
>> >>>> Paolo,
>> >>>>
>> >>>> I noticed that too - maybe @glouppe can comment on this - I think the
>> >>>> reason was a change in the ``n_features`` heuristic but I might be
>> >>>> mistaken.
>> >>>
>> >>>
>> >>> Gilles, can you give a quick look to it? If it's not anything obvious
>> >>> just ping back and I'll try to git bisect the issue...
>> >>>
>> >>>>
>> >>>> Concerning the GaussianNB - there's a PR [1] addressing a critical bug
>> >>>> in the estimator - it should be merged ASAP.
>> >>>
>> >>>
>> >>> Thanks. I've commented on the PR (the performance regression seems
>> >>> not to be connected with the PR)
>> >>>
>> >>>>
>> >>>> Furthermore, test time is
>> >>>> quite low - this might be due to memory layout issues - SGDClassifier
>> >>>> converts ``coef_`` to fortran-style for increased test-time
>> >>>> performance.
>> >>>
>> >>>
>> >>> Clear.
>> >>>
>> >>> Thanks again
>> >>>
>> >>> Paolo
>> >>>
>> >>> ------------------------------------------------------------------------------
>> >>> This SF email is sponsored by:
>> >>> Try Windows Azure free for 90 days Click Here
>> >>> http://p.sf.net/sfu/sfd2d-msazure
>> >>> _______________________________________________
>> >>> Scikit-learn-general mailing list
>> >>> [email protected]
>> >>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>>
>> >
>> > --
>> > Peter Prettenhofer
>
> --
> Paolo Losi
> e-mail: [email protected]
> mob: +39 348 7705261
>
> ENUAN Srl
> Via XX Settembre, 12 - 29100 Piacenza

--
Peter Prettenhofer
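
To make the max_features point in this thread concrete, here is a minimal
sketch of how the two settings discussed translate into the number of
candidate features examined at each split. The helper name
resolve_max_features is hypothetical (it is not a scikit-learn function),
and the figure of 54 input features for covertype comes from the dataset
description, not from this thread. "auto" is used as in the 2012 messages
above; later scikit-learn releases spell the same setting "sqrt".

```python
import math


def resolve_max_features(max_features, n_features):
    """Hypothetical helper: candidate features examined per split.

    Mirrors the behaviour described in the thread for classification:
    None means all features, "auto"/"sqrt" means sqrt(n_features).
    """
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        # Truncate to an integer, but always keep at least one feature.
        return max(1, int(math.sqrt(n_features)))
    raise ValueError("unsupported max_features: %r" % (max_features,))


# Assumption: covertype has 54 input features once the categorical
# attributes are one-hot encoded (per the dataset description).
N_FEATURES_COVTYPE = 54

for setting in (None, "auto"):
    k = resolve_max_features(setting, N_FEATURES_COVTYPE)
    print("max_features=%r -> %d of %d features per split"
          % (setting, k, N_FEATURES_COVTYPE))
```

Under these assumptions each split sees 7 of the 54 columns with "auto",
compared with all 54 with None, which is the extra randomization Gilles
describes. Whether the one-hot columns are what makes this hurt on
covertype is left open in the thread, and this sketch does not settle it.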
