On Fri, Sep 7, 2012 at 11:41 PM, Lars Buitinck <[email protected]> wrote:
> 2012/9/7 Mathieu Blondel <[email protected]>:
> > On my box, RidgeClassifier finishes in 136 seconds but kNN dies with
> > MemoryError.
>
> That's incredibly slow compared to all the other classifiers. Also
> without --all_categories, it's slower than everything else.
>
> I've already found out where things go wrong, though I still have no
> idea why this didn't occur before.
>
> Anyway, old-fashioned debugging with print statements points to line
> 103 of sklearn/linear_model/ridge.py as the culprit:
>
> c = _solve(X * X.T + I, y, solver, tol)
>
> (X * X.T) allocates a 90% dense (!) CSR matrix of size 11314**2.
> That's close to a GB for the .data member alone and bigger than a
> plain old np.array. Adding the diagonal matrix is enough to kill the
> process (and all the PDF viewers I had open, stupid Linux
> overcommit...).
>
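For reference, a back-of-the-envelope check of those numbers. This is only a
sketch: it assumes float64 values and int32 indices (neither is stated above)
and takes the 90% density figure at face value.

```python
# Rough memory estimate for the Gram matrix X * X.T described above.
# Assumptions (not from the original message): float64 data, int32 indices.
n = 11314                      # n_samples, as quoted above
density = 0.9                  # "90% dense", taken at face value
nnz = int(n * n * density)     # estimated number of stored entries

dense_bytes = n * n * 8                  # plain float64 np.array
csr_data_bytes = nnz * 8                 # CSR .data member
csr_indices_bytes = nnz * 4              # CSR .indices member
csr_indptr_bytes = (n + 1) * 4           # CSR .indptr member
csr_total_bytes = csr_data_bytes + csr_indices_bytes + csr_indptr_bytes

for name, size in [("dense array", dense_bytes),
                   ("CSR .data", csr_data_bytes),
                   ("CSR total", csr_total_bytes)]:
    print("%-12s %.2f GB" % (name, size / 1e9))
```

Under these assumptions the .data member alone is a bit under 1 GB, and the
CSR matrix as a whole is larger than the equivalent dense array, which
matches the description above.
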
I just tried computing A = safe_sparse_dot(X, X.T, dense_output=True) and
then forcing the solver to be sparse_cg even though A is dense
(scipy.sparse.linalg.cg works with dense matrices). The training time went
from 136 to 74 seconds. (The dense_cholesky solver doesn't have a tol
parameter, so you cannot control the speed/accuracy trade-off.)

Unfortunately, we currently don't have an efficient way to compute the
product of two sparse matrices and write the result directly into a dense
array (n_samples x n_samples here).

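One possible way around the missing sparse-times-sparse-to-dense product is
to never form X * X.T at all and hand CG a matrix-free operator instead.
This is not what I benchmarked above, just a minimal sketch: the function
name ridge_dual_cg, the alpha parameter and the toy data are all made up for
illustration, and I'm assuming the "I" in the line from ridge.py stands for
alpha times the identity.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg


def ridge_dual_cg(X, y, alpha=1.0):
    """Solve (X X^T + alpha * I) c = y with CG, without forming X X^T.

    Hypothetical helper, not part of scikit-learn.
    """
    n_samples = X.shape[0]
    XT = X.T.tocsr()

    def matvec(v):
        # Two sparse matrix-vector products instead of one huge Gram matrix.
        v = np.ravel(v)
        return X @ (XT @ v) + alpha * v

    A = LinearOperator((n_samples, n_samples), matvec=matvec,
                       dtype=np.float64)
    c, info = cg(A, y)  # info == 0 means CG reported convergence
    return c, info


# Toy data, small enough that a dense reference solve is cheap.
rng = np.random.RandomState(0)
X = sp.random(50, 200, density=0.05, format="csr", random_state=rng)
y = rng.randn(50)

c, info = ridge_dual_cg(X, y, alpha=1.0)

# Dense reference solution, only feasible because the toy problem is tiny.
A_dense = (X @ X.T).toarray() + 1.0 * np.eye(50)
c_ref = np.linalg.solve(A_dense, y)
```

Each CG iteration then costs two sparse matrix-vector products, and memory
stays proportional to the number of nonzeros in X. Whether that is actually
faster than the dense route on the 20 newsgroups data would need to be
measured.
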
> > Could you try to identify which commit introduced the regression with git
> > bisect?
>
> I could try that, but I was actually working on something else when I
> encountered this. I'm currently inclined to remove RidgeClassifier
> from the example and opening an issue.
>
What about removing it in the all_categories case only?

Mathieu