Hey guys,
I have a couple of questions about decomposition.nmf with respect to sparse
matrices:
nmf.py@527:
    if not sp.issparse(X):
        self.reconstruction_err_ = norm(X - np.dot(W, H))
    else:
        norm2X = np.sum(X.data ** 2)  # Ok because X is CSR
        normWHT = np.trace(np.dot(np.dot(H.T, np.dot(W.T, W)), H))
        cross_prod = np.trace(np.dot((X * H.T).T, W))
        self.reconstruction_err_ = sqrt(norm2X + normWHT
                                        - 2. * cross_prod)
So, for a dense matrix X this is relatively straightforward. For a sparse
matrix, though, it's a massively expensive operation, at least from a memory
standpoint: as far as I can tell, the np.dot(np.dot(H.T, np.dot(W.T, W)), H)
term builds a dense (n_features x n_features) array just to take its trace.
Is there any reason we can't implement norm() for CSR matrices and just write

    self.reconstruction_err_ = safe_sparse_norm(X - safe_sparse_dot(W, H))?

Additionally, is np.sum(X.data ** 2) a typo? Should it be np.sum(X.data * 2)?
If it's not a typo, the variable seems misnamed and should be "normSquared"
or something, not norm2X, right? Surely the current approach could also be
made more memory-efficient, but a "sparse-safe norm" sounds better to me...
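To make the memory-efficiency point concrete, here's a rough sketch of the
kind of thing I'm imagining (the helper name frobenius_error_sparse is just
mine for illustration, nothing that exists in the codebase). It uses the
cyclic property of the trace so the ||WH||^2 term only ever needs
(n_components x n_components) intermediates, and nothing of size
(n_features x n_features) or (n_samples x n_features) is formed:

    import numpy as np

    def frobenius_error_sparse(X, W, H):
        """||X - W H||_F for a CSR matrix X, without materializing W H."""
        # ||X||_F^2: sum of the squared stored entries
        norm_X_sq = np.sum(X.data ** 2)
        # ||W H||_F^2 = trace(H^T W^T W H) = trace((W^T W)(H H^T)),
        # so only (k x k) products are needed
        norm_WH_sq = np.sum(np.dot(W.T, W) * np.dot(H, H.T))
        # <X, W H>_F = trace((X H^T)^T W); X.dot(H.T) is (n_samples x k)
        cross = np.sum(np.asarray(X.dot(H.T)) * W)
        return np.sqrt(norm_X_sq + norm_WH_sq - 2.0 * cross)

That should give the same value as the current else-branch, just without the
big intermediate, if I haven't messed up the algebra.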
Finally, is there any reason to use math.sqrt() as above instead of
np.sqrt()?
I'm more than glad to fix this, but I'm hoping someone more familiar could
give me a bit of direction :)
Thanks!
-Will