Hey guys,

I have a couple of questions about decomposition.nmf with respect to sparse
matrices:

nmf.py@527:


            if not sp.issparse(X):
                self.reconstruction_err_ = norm(X - np.dot(W, H))
            else:
                norm2X = np.sum(X.data ** 2)  # Ok because X is CSR
                normWHT = np.trace(np.dot(np.dot(H.T, np.dot(W.T, W)), H))
                cross_prod = np.trace(np.dot((X * H.T).T, W))
                self.reconstruction_err_ = sqrt(norm2X + normWHT
                                                - 2. * cross_prod)


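For context, if I'm reading it right, the sparse branch is just expanding
||X - W*H||_F^2 = ||X||_F^2 + trace(H' W' W H) - 2 * trace((X H')' W).
A quick sanity check with made-up shapes (the names and sizes below are mine,
not from nmf.py):

    import numpy as np
    import scipy.sparse as sp

    rng = np.random.RandomState(0)
    X = sp.rand(50, 40, density=0.1, format='csr', random_state=0)
    W, H = rng.rand(50, 5), rng.rand(5, 40)

    # dense branch: plain Frobenius norm of the residual
    dense_err = np.linalg.norm(X.toarray() - np.dot(W, H))

    # sparse branch, following the code above (np.asarray is just defensive)
    norm2X = np.sum(X.data ** 2)                                # ||X||_F^2
    normWHT = np.trace(np.dot(np.dot(H.T, np.dot(W.T, W)), H))  # ||W H||_F^2
    cross_prod = np.trace(np.dot(np.asarray(X * H.T).T, W))     # <X, W H>_F
    sparse_err = np.sqrt(norm2X + normWHT - 2. * cross_prod)

    print(dense_err, sparse_err)  # both give essentially the same value
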
For a dense matrix X, this is relatively straightforward. For a sparse matrix,
though, it looks expensive, at least from a memory standpoint: as far as I can
tell, the normWHT line materialises a dense n_features x n_features
intermediate just to take its trace. Is there any reason we can't implement
norm() for CSR and just write
self.reconstruction_err_ = safe_sparse_norm(X - safe_sparse_dot(W, H))?
Additionally, is np.sum(X.data ** 2) a typo? Should it be np.sum(X.data * 2)?
If it isn't a typo, the variable seems misnamed; it should be something like
"normSquared" rather than norm2X, right? The current trace-based approach could
surely be made more memory-efficient as well (something like the sketch below),
but a "sparse-safe norm" sounds better...
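
For instance, something along these lines would keep everything at
n_samples x k or k x k (just a rough sketch, not profiled; the helper name
and the np.asarray call are my own, not anything from nmf.py):

    import numpy as np

    def sparse_frobenius_err(X, W, H):
        """||X - W H||_F for sparse (CSR/CSC) X, without forming W H or an
        n_features x n_features intermediate."""
        norm_X_sq = np.sum(X.data ** 2)                      # ||X||_F^2
        # trace(H' W' W H) == <W' W, H H'>_F, so only k x k products
        norm_WH_sq = np.sum(np.dot(W.T, W) * np.dot(H, H.T))
        # <X, W H>_F == sum((X H') * W); X.dot(H.T) is dense n_samples x k
        cross = np.sum(np.asarray(X.dot(H.T)) * W)
        return np.sqrt(norm_X_sq + norm_WH_sq - 2. * cross)
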


Additionally, is there any reason to use math.sqrt() as above instead of
np.sqrt()?


I'm more than glad to fix this, but I'm hoping someone more familiar with the
code could give me a bit of direction :)


Thanks!

-Will