Hi Will,

>             if not sp.issparse(X):
>                 self.reconstruction_err_ = norm(X - np.dot(W, H))
>             else:
>                 norm2X = np.sum(X.data ** 2)  # Ok because X is CSR
>                 normWHT = np.trace(np.dot(np.dot(H.T, np.dot(W.T, W)), H))
>                 cross_prod = np.trace(np.dot((X * H.T).T, W))
>                 self.reconstruction_err_ = sqrt(norm2X + normWHT
>                                                 - 2. * cross_prod)
>
> So, for a dense matrix X, this is relatively straightforward. For a sparse
> matrix, this is a massively expensive operation, at least from a memory
> standpoint. Is there any reason we can't implement norm() for CSR and just use
> self.reconstruction_err_ = safe_sparse_norm(X - safe_sparse_dot(W, H))?

The motivation for these lines is that even if X is sparse, safe_sparse_dot(W, H)
will not be. So you would allocate a dense matrix the size of X, which is
unacceptable in many cases.
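A minimal sketch of the idea, assuming X is a CSR matrix without duplicate stored entries: expand ||X - WH||_F^2 = ||X||_F^2 + ||WH||_F^2 - 2<X, WH> so that only small dense intermediates (k x k and n_samples x k) are ever allocated. The helper name `sparse_reconstruction_err` is hypothetical. Computing ||WH||_F^2 as trace((W^T W)(H H^T)) is a variant of the quoted `normWHT` line, not the code above: it is algebraically equal, but it avoids the n_features x n_features product that `np.dot(np.dot(H.T, np.dot(W.T, W)), H)` forms. Clipping at zero before the square root is also an added assumption.

```python
import math

import numpy as np
import scipy.sparse as sp


def sparse_reconstruction_err(X, W, H):
    """Return ||X - W H||_F without forming the dense product W H.

    Hypothetical helper, for illustration only. Relies on
    ||X - WH||^2 = ||X||^2 + ||WH||^2 - 2 <X, WH>.
    """
    X = sp.csr_matrix(X)

    # ||X||_F^2: unstored zeros contribute nothing (assumes no duplicate entries)
    norm2X = np.sum(X.data ** 2)

    # ||WH||_F^2 = trace(H^T W^T W H) = trace((W^T W)(H H^T)).
    # Both factors are k x k and symmetric, so trace(A B) = sum(A * B).
    normWH = np.sum(np.dot(W.T, W) * np.dot(H, H.T))

    # <X, WH> = sum_ij X_ij (WH)_ij = sum((X H^T) * W); X H^T is n_samples x k
    cross_prod = np.sum(np.asarray(X.dot(H.T)) * W)

    # Assumption: rounding may push the difference slightly below zero, so clip it
    err2 = norm2X + normWH - 2.0 * cross_prod
    return math.sqrt(max(err2, 0.0))


# Usage: compare against the dense computation on a small random problem
rng = np.random.RandomState(0)
X = sp.random(50, 40, density=0.1, format="csr", random_state=rng)
W = rng.rand(50, 5)
H = rng.rand(5, 40)

sparse_err = sparse_reconstruction_err(X, W, H)
dense_err = np.linalg.norm(X.toarray() - np.dot(W, H))
print(sparse_err, dense_err)
```

The dense comparison is only there to check the result; on a real problem the point is that `X.toarray()` and `np.dot(W, H)` are never built.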

> Additionally, is np.sum(X.data ** 2) a typo? Should it be np.sum(X.data *
> 2)? If not a typo, the variable seems misnamed and should be "normSquared",
> or something, not norm2X, right?

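It is not a typo: norm2 is a common name for the L2 norm, and the sum of the squared stored values is the squared L2 (Frobenius) norm of X. Indeed, I could have added "squared" to the name.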
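A tiny check of why `X.data ** 2` is correct, assuming a CSR matrix without duplicate stored entries: the zeros that CSR does not store contribute nothing to the sum of squares, so squaring the stored values is enough, whereas `X.data * 2` just doubles them.

```python
import numpy as np
import scipy.sparse as sp

# Small CSR matrix; building it from a dense array stores only the nonzeros
X = sp.csr_matrix(np.array([[1.0, 0.0, 2.0],
                            [0.0, 3.0, 0.0]]))

norm2X = np.sum(X.data ** 2)               # squared Frobenius norm: 1 + 4 + 9 = 14
doubled = np.sum(X.data * 2)               # the suggested "fix": 2 * (1 + 2 + 3) = 12
dense = np.linalg.norm(X.toarray()) ** 2   # reference computed on the dense copy

print(norm2X, doubled, dense)
```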

> Surely the current approach could be done
> more memory-efficiently also, but a "sparse safe norm" sounds better...
>
>
> Additionally, is there any reason to use math.sqrt() as above instead of
> np.sqrt()?

Yes. math.sqrt is faster than np.sqrt on plain Python floats; np.sqrt is only
needed for arrays.
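A quick way to compare the two on a scalar, as a rough sketch: timings depend on the machine and NumPy version, so no particular numbers are claimed here. Both functions return the same value for a non-negative float.

```python
import math
import timeit

import numpy as np

x = 12345.678  # a plain Python float, like the scalar passed to sqrt in the NMF code

same_value = math.sqrt(x) == np.sqrt(x)  # both compute the correctly rounded square root

n = 100_000
t_math = timeit.timeit(lambda: math.sqrt(x), number=n)
t_np = timeit.timeit(lambda: np.sqrt(x), number=n)
print(f"same value: {same_value}; math.sqrt: {t_math:.4f}s, np.sqrt: {t_np:.4f}s for {n} calls")
```

One trade-off worth knowing: if rounding makes norm2X + normWHT - 2. * cross_prod slightly negative, math.sqrt raises ValueError, whereas np.sqrt returns nan with a RuntimeWarning.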

Let me know if you have any questions.

Best,
Alex

> I'm more than glad to fix this, but I'm hoping someone more familiar could
> give me a bit of direction :)
>
>
> Thanks!
>
> -Will
>
>
> ------------------------------------------------------------------------------
> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
> Discover the easy way to master current and previous Microsoft technologies
> and advance your career. Get an incredible 1,500+ hours of step-by-step
> tutorial videos with LearnDevNow. Subscribe today and save!
> http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
