I've just tried scipy.sparse.linalg.lsqr [*] on the full news20 dataset. On
my box it takes 8 seconds to run with tol=1e-3 and 5 seconds with tol=1e-2,
with no loss of accuracy in either case. It also solves the memory problem
mentioned by Lars, as it works directly with X and y.
Unlike scipy.linalg.lstsq, scipy.sparse.linalg.lsqr supports a
regularization term so it can actually be used to implement Ridge. Also,
despite the name, it supports dense arrays too so it may be worth comparing
it with solver="dense_cholesky" in the dense case. It cannot be used if
sample_weight != 1.0 though.
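
A minimal sketch of how the regularization term can be used for Ridge,
not the actual patch. lsqr minimizes ||Ax - b||^2 + damp^2 ||x||^2, so
the Ridge penalty alpha corresponds to damp = sqrt(alpha). The helper
name ridge_lsqr, the random test data, and the mapping of tol onto both
atol and btol are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr


def ridge_lsqr(X, y, alpha=1.0, tol=1e-3):
    """Hypothetical helper: solve min ||Xw - y||^2 + alpha * ||w||^2 with lsqr."""
    # lsqr penalizes damp**2 * ||w||**2, hence the square root.
    damp = np.sqrt(alpha)
    # Assumption: a single tol is passed as both stopping tolerances.
    result = lsqr(X, y, damp=damp, atol=tol, btol=tol, iter_lim=1000)
    return result[0]  # first element of the returned tuple is the solution


# Small synthetic problem (illustrative data, not news20).
rng = np.random.RandomState(0)
X = sp.random(200, 50, density=0.1, format="csr", random_state=rng)
w_true = rng.randn(50)
y = X @ w_true

alpha = 1.0
w_lsqr = ridge_lsqr(X, y, alpha=alpha, tol=1e-10)

# Closed-form Ridge solution on the dense copy, for comparison only.
Xd = X.toarray()
w_closed = np.linalg.solve(Xd.T @ Xd + alpha * np.eye(Xd.shape[1]), Xd.T @ y)

# The same helper also accepts a dense array.
w_dense = ridge_lsqr(Xd, y, alpha=alpha, tol=1e-10)
```

Since X is only accessed through matrix-vector products, X.T * X is
never formed, which is why the memory problem goes away.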
I'll try to send a PR some time this week.
Mathieu
[*]
http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lsqr.html#scipy.sparse.linalg.lsqr