On Thu, Nov 05, 2015 at 07:05:11AM +0000, Raphael C wrote:
> https://github.com/szilard/benchm-ml

> The upshot is that in some cases it seems that the scikit-learn
> versions have room for improvement.

The main lessons that I can see from those results are:

* Linear models (aka LogisticRegression) don't scale very well:

  - The page benches the default, which is liblinear.
    I would be very curious to see how the other solvers (Newton and
    SAG) fare on this dataset.
    It would be useful to introduce a 'solver="auto"' for logistic
    regression, based on heavy benchmarks and heuristics.
    I have created an issue to discuss whether we want to do this:
    https://github.com/scikit-learn/scikit-learn/issues/5736

  - Having fused types in our Cython code, to avoid increased memory
    usage, would be useful.
    For this we first need to finish adding Cython as a build dependency:
    https://github.com/scikit-learn/scikit-learn/pull/5492
  
* In tree-based models, not handling categorical variables as such hurts
  us a lot. There's a PR to fix that; it still needs a bit of love:
  https://github.com/scikit-learn/scikit-learn/pull/4899
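
To make the 'solver="auto"' idea for logistic regression a bit more
concrete, here is a minimal sketch of what such a dispatch rule could
look like. Everything in it is an assumption for illustration: the
function name choose_solver, the large_n_samples parameter and its
default value are hypothetical placeholders, not benchmarked numbers and
not an existing scikit-learn API. Only the solver names ("liblinear",
"newton-cg", "sag") are real values of LogisticRegression's solver
parameter.

```python
def choose_solver(n_samples, multinomial=False, large_n_samples=100_000):
    """Hypothetical heuristic for picking a LogisticRegression solver.

    The threshold is a placeholder; a real rule would have to come
    out of benchmarks on many datasets.
    """
    if multinomial:
        # liblinear only fits one-vs-rest problems, so a multinomial
        # loss needs another solver such as newton-cg.
        return "newton-cg"
    if n_samples >= large_n_samples:
        # Assumption: a stochastic solver pays off once there are
        # many samples.
        return "sag"
    # Otherwise keep the current default.
    return "liblinear"


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, choose_solver(n))
```

The result could then be passed along as
LogisticRegression(solver=choose_solver(X.shape[0])), with the actual
rule left to whatever the benchmarks end up showing.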

Gaël

------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
