On Thu, Nov 05, 2015 at 07:05:11AM +0000, Raphael C wrote:
> https://github.com/szilard/benchm-ml
> The upshot is that in some cases it seems that the scikit-learn
> versions have room for improvement.

The main lessons that I can see from those results are:

* Linear models (aka LogisticRegression) don't scale very well:

  - The page benches the default solver, which is liblinear. I would be
    very curious to see how the other solvers (Newton and SAG) fare on
    this dataset. It would be useful to introduce a 'solver="auto"' for
    logistic regression, based on heavy benchmarks and heuristics. I have
    created an issue to discuss whether we want to do this:
    https://github.com/scikit-learn/scikit-learn/issues/5736

  - Having fused types to avoid increased memory use would be useful. For
    this we first need to finish adding Cython as a build dependency:
    https://github.com/scikit-learn/scikit-learn/pull/5492

* In tree-based models, not handling categorical variables as such hurts
  us a lot. There's a PR to fix that; it still needs a bit of love:
  https://github.com/scikit-learn/scikit-learn/pull/4899

Gaël

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
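
A rough sketch of how the solver comparison mentioned above could be run, for anyone who wants to try it. This is only an illustration under stated assumptions: the data is synthetic (from make_classification), not the benchm-ml dataset, and the function name benchmark_solvers, the sample sizes, and max_iter=200 are arbitrary choices made for the example. Timings on the real data may look quite different.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def benchmark_solvers(solvers=("liblinear", "newton-cg", "sag"),
                      n_samples=20000, n_features=50, seed=0):
    """Time LogisticRegression.fit for each solver on synthetic data.

    The dataset is a stand-in (an assumption), not the benchm-ml data.
    Returns {solver: (fit_seconds, test_accuracy)}.
    """
    X, y = make_classification(n_samples=n_samples, n_features=n_features,
                               n_informative=min(10, n_features),
                               n_redundant=0, n_clusters_per_class=1,
                               random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)

    # SAG converges much faster when features share a similar scale.
    scaler = StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    results = {}
    for solver in solvers:
        clf = LogisticRegression(solver=solver, max_iter=200,
                                 random_state=seed)
        start = time.perf_counter()
        clf.fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        results[solver] = (elapsed, clf.score(X_test, y_test))
    return results


if __name__ == "__main__":
    for solver, (seconds, accuracy) in benchmark_solvers().items():
        print(f"{solver:10s} fit: {seconds:6.3f}s  accuracy: {accuracy:.3f}")
```

Growing n_samples and n_features toward the sizes used on the benchmark page would show where each solver starts to struggle, which is the kind of evidence a 'solver="auto"' heuristic would need.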