Paolo, I noticed that too - maybe @glouppe can comment on this - I think the reason was a change in the ``n_features`` heuristic but I might be mistaken.
Concerning the GaussianNB - there's a PR [1] addressing a critical bug in the estimator - it should be merged ASAP. Furthermore, test time is quite low - this might be due to memory layout issues - SGDClassifier converts ``coef_`` to fortran-style for increased test-time performance.

best,
 Peter

[1] https://github.com/scikit-learn/scikit-learn/pull/731

2012/3/27 Paolo Losi <[email protected]>:
> Hi all,
>
> I've just run bench_covertype on today's master.
> I needed to uncomment the ExtraTrees and RandomForest benchs.
>
> The results are quite unexpected:
>
> Classifier   train-time test-time error-rate
> --------------------------------------------
> Liblinear      13.5609s   0.0683s     0.2307
> GaussianNB      3.6565s   0.1753s     0.6367
> SGD             0.4522s   0.0170s     0.2300
> CART           35.3378s   0.0375s     0.0476
> RandomForest  246.8737s   0.6908s     0.0807
> Extra-Trees   182.0412s   0.6269s     0.1986
>
>
> with respect to what is reported in bench_covertype.py:
>
> Classifier   train-time test-time error-rate
> --------------------------------------------
> Liblinear      11.8977s   0.0285s     0.2305
> GaussianNB      3.5931s   0.6645s     0.3633
> SGD             0.2924s   0.0114s     0.2300
> CART           39.9829s   0.0345s     0.0476
> RandomForest  794.6232s   1.0526s     0.0249
> Extra-Trees  1401.7051s   1.1181s     0.0230
>
> Unless I'm missing something obvious I'll open a ticket
> and try to give git bisect a run ...
>
> Thanks!
> Paolo
>
> PS: I just noticed that also GaussianNB results are worse...

--
Peter Prettenhofer
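
As a footnote to the memory-layout remark about ``coef_``: here is a minimal sketch, assuming NumPy, of what converting a coefficient matrix to fortran-style (column-major) order means. The array names and the random data are made up for illustration. This is not the actual SGDClassifier code, and it makes no claim about timings; it only shows that the layout changes while the values and the computed scores stay the same.

```python
import numpy as np

rng = np.random.RandomState(0)

# Shapes chosen to resemble covertype (7 classes, 54 features);
# the values themselves are random and purely illustrative.
n_classes, n_features = 7, 54
coef_c = rng.randn(n_classes, n_features)  # NumPy default: C order (row-major)
coef_f = np.asfortranarray(coef_c)         # same values, column-major layout

X = rng.randn(1000, n_features)            # hypothetical test samples

# Decision scores are identical regardless of the memory layout.
scores_c = np.dot(X, coef_c.T)
scores_f = np.dot(X, coef_f.T)

# Inspect the layout flags: (C_CONTIGUOUS, F_CONTIGUOUS)
layout_c = (coef_c.flags['C_CONTIGUOUS'], coef_c.flags['F_CONTIGUOUS'])
layout_f = (coef_f.flags['C_CONTIGUOUS'], coef_f.flags['F_CONTIGUOUS'])
same_scores = np.allclose(scores_c, scores_f)

print("C-ordered coef flags:", layout_c)
print("F-ordered coef flags:", layout_f)
print("scores match:", same_scores)
```

Whether the fortran-style layout actually helps at test time depends on the BLAS in use and on the array shapes, so it would need to be measured (e.g. with ``timeit``) before drawing conclusions for GaussianNB.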
