Gilles,
thank you very much for checking.
If everyone agrees, I'll:
- uncomment the ExtraTrees and RandomForest benchmarks (@pprett, is there
any valid reason to leave them out?)
- explicitly set max_features=None for RandomForest and ExtraTrees
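
To make the second point concrete, here is a rough sketch in plain Python of how many features each split gets to look at under the two settings Gilles compared. This is not the scikit-learn code: the helper name is made up for illustration, and I am assuming covertype's 54 columns and the sqrt(n_features) rule Gilles describes for classification.

```python
from math import sqrt


def n_candidate_features(max_features, n_features):
    """Hypothetical helper: features examined per split (classification).

    Mirrors the behaviour described in this thread only:
    None -> all features, "auto"/"sqrt" -> sqrt(n_features).
    """
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        return max(1, int(sqrt(n_features)))
    raise ValueError("unsupported max_features: %r" % (max_features,))


# Assumption: covertype has 54 input columns (one-hot columns included).
N_FEATURES = 54

for setting in ("auto", None):
    print(setting, n_candidate_features(setting, N_FEATURES))
# auto 7
# None 54
```

So with the default each split only sees about 7 of the 54 columns, which would fit with needing more than 20 trees to recover the old numbers.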
Thanks again
Paolo
On Tue, Mar 27, 2012 at 2:13 PM, Gilles Louppe <[email protected]> wrote:
> Hi,
>
> Using max_features="auto" (default setting) indeed yields the results
> that Paolo reports.
>
> When setting max_features=None (i.e., using all features as in our
> earlier code), I got the following on my machine:
>
> RandomForest 778.1471s 1.2830s 0.0248
> Extra-Trees 1325.2397s 1.3544s 0.0199
>
> which is consistent with what is mentioned in the doc.
>
> @pprett: Since max_features=sqrt(n_features) is now the default on
> classification problems, the trees are usually more randomized, hence
> with a higher bias. To compensate for that, more trees usually need to
> be built, whereas we only use 20 trees in the benchmark (which is low
> in my opinion). The effect of max_features is very dataset-specific
> though. On some problems, decreasing max_features does not impair
> performance as much as here on covertype. I am not sure whether
> one-hot-encoding is causing this.
>
> Best,
>
> Gilles
>
> On 27 March 2012 13:38, Peter Prettenhofer <[email protected]> wrote:
> > Interesting - covtype involves a number of categorical attributes
> > which are represented via a one-hot encoding - do you think that such
> > a representation has a significant effect on feature sampling and thus
> > the performance of random forests?
> >
> > 2012/3/27 Gilles Louppe <[email protected]>:
> >> Hi,
> >>
> >> I am running the tests again, but indeed I think the difference in the
> >> results comes from the fact that max_features=sqrt(n_features) is now
> >> the default, whereas it was max_features=n_features before.
> >>
> >> Gilles
> >>
> >> On 27 March 2012 11:56, Paolo Losi <[email protected]> wrote:
> >>> Thanks Peter,
> >>>
> >>> On Tue, Mar 27, 2012 at 11:34 AM, Peter Prettenhofer
> >>> <[email protected]> wrote:
> >>>>
> >>>> Paolo,
> >>>>
> >>>> I noticed that too - maybe @glouppe can comment on this - I think the
> >>>> reason was a change in the ``n_features`` heuristic but I might be
> >>>> mistaken.
> >>>
> >>>
> >>> Gilles, can you take a quick look at it? If it's not anything obvious,
> >>> just ping back and I'll try to git bisect the issue...
> >>>
> >>>>
> >>>> Concerning the GaussianNB - there's a PR [1] addressing a critical bug
> >>>> in the estimator - it should be merged ASAP.
> >>>
> >>>
> >>> Thanks. I've commented on the PR (the performance regression seems
> >>> not to be connected with the PR)
> >>>
> >>>>
> >>>> Furthermore, test time is
> >>>> quite low - this might be due to memory layout issues - SGDClassifier
> >>>> converts ``coef_`` to fortran-style for increased test-time
> >>>> performance.
> >>>
> >>>
> >>> Clear.
> >>>
> >>> Thanks again
> >>>
> >>> Paolo
> >>>
> >>>
> >>>
> ------------------------------------------------------------------------------
> >>> This SF email is sponsored by:
> >>> Try Windows Azure free for 90 days Click Here
> >>> http://p.sf.net/sfu/sfd2d-msazure
> >>> _______________________________________________
> >>> Scikit-learn-general mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >>>
> >
> >
> >
> > --
> > Peter Prettenhofer
--
Paolo Losi
e-mail: [email protected]
mob: +39 348 7705261
ENUAN Srl
Via XX Settembre, 12 - 29100 Piacenza