2012/9/7 Mathieu Blondel <[email protected]>:
> On my box, RidgeClassifier finishes in 136 seconds but kNN dies with
> MemoryError.
That's incredibly slow compared to all the other classifiers. Also,
without --all_categories, it's slower than everything else.
I've already found out where things go wrong, though I still have no
idea why this didn't occur before.
Anyway, old-fashioned debugging with print statements points to line
103 of sklearn/linear_model/ridge.py as the culprit:
c = _solve(X * X.T + I, y, solver, tol)
(X * X.T) allocates a 90% dense (!) CSR matrix of size 11314**2.
That's close to a GB for the .data member alone, and the matrix as a
whole is bigger than a plain old np.array. Adding the diagonal matrix
is enough to kill the process (and all the PDF viewers I had open,
stupid Linux overcommit...).
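
A back-of-the-envelope sketch of that estimate, for anyone who wants to
check the arithmetic. It assumes float64 values and int32 indices (an
assumption, not something verified against the actual dtypes), and
csr_nbytes is a hypothetical helper written for this message, not part of
scipy or scikit-learn:

```python
def csr_nbytes(n_rows, n_cols, density, data_bytes=8, index_bytes=4):
    """Rough memory footprint of a CSR matrix, in bytes.

    Assumes 8-byte values and 4-byte indices (float64 / int32);
    both are assumptions, adjust them for other dtypes.
    Returns (bytes for .data alone, bytes for the whole structure).
    """
    nnz = int(n_rows * n_cols * density)
    data = nnz * data_bytes              # .data: one value per stored entry
    indices = nnz * index_bytes          # .indices: one column index per stored entry
    indptr = (n_rows + 1) * index_bytes  # .indptr: one offset per row, plus one
    return data, data + indices + indptr


n = 11314                                # number of samples, so X * X.T is n x n
data_only, total = csr_nbytes(n, n, 0.9)
dense = n * n * 8                        # the same matrix as a dense float64 array

for label, nbytes in [(".data only", data_only),
                      ("CSR total", total),
                      ("dense array", dense)]:
    print("%-12s %.2f GB" % (label, nbytes / 1e9))
```

Under those assumptions .data comes to roughly 0.92 GB and the complete
CSR structure to roughly 1.38 GB, against roughly 1.02 GB for the dense
array, so at 90% density the sparse format costs more than it saves.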
> Could you try to identify which commit introduced the regression with git
> bisect?
I could try that, but I was actually working on something else when I
encountered this. I'm currently inclined to remove RidgeClassifier
from the example and open an issue.
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam