> The motivation for these lines is that even if X is sparse,
> safe_sparse_dot(W, H) will not be. So you would allocate a dense
> matrix the size of X, which is unacceptable in many cases.
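
A minimal sketch of the point quoted above. It assumes a scikit-learn release that provides `sklearn.utils.extmath.safe_sparse_dot`. When both W and H are dense NumPy arrays, as they are in NMF, the product comes back as a dense `ndarray` whatever `dense_output` is set to. Only a product of two sparse operands stays sparse. The helper `dense_nbytes` is hypothetical and only does the size arithmetic for a dense float64 array with the 40k x 220k shape mentioned below.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils.extmath import safe_sparse_dot


def dense_nbytes(shape, dtype=np.float64):
    """Hypothetical helper: bytes needed to hold a dense array of `shape`."""
    n_rows, n_cols = shape
    return n_rows * n_cols * np.dtype(dtype).itemsize


# Toy NMF-style factors: W is (n_samples, k), H is (k, n_features), both dense.
rng = np.random.RandomState(0)
W = rng.rand(6, 2)
H = rng.rand(2, 5)

WH = safe_sparse_dot(W, H)
print(type(WH).__name__, WH.shape)  # ndarray (6, 5): dense, full size of X

# Two sparse operands give a sparse result unless dense_output=True.
S = sp.csr_matrix(np.eye(3))
print(sp.issparse(safe_sparse_dot(S, S)))                     # True
print(sp.issparse(safe_sparse_dot(S, S, dense_output=True)))  # False

# Size of a dense float64 copy of a 40k x 220k matrix.
print(dense_nbytes((40_000, 220_000)) / 1e9, "GB")  # 70.4 GB
```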

Er, it looks like safe_sparse_dot() returns sparse unless dense_output=True.
And I'm confused as to how this would use more memory. Aren't we
allocating more in the lines above for the issparse(X) case? I'm stuck
right now because my 40k x 220k CSR matrix can't make it past computing
reconstruction_err_ without a MemoryError, with 200GB of RAM free. Any ideas
on how to reduce the memory requirements of that calculation?
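
A possible direction, sketched under stated assumptions. In the sparse branch quoted below, `np.dot(np.dot(H.T, np.dot(W.T, W)), H)` materializes an n_features x n_features dense array before its trace is taken. Suppose X is laid out as n_samples x n_features = 40k x 220k in float64. Then that temporary alone needs about 220,000 x 220,000 x 8 bytes ≈ 387 GB, which would be consistent with a MemoryError even with 200GB free. The trace is invariant under cyclic permutations, so tr(H^T (W^T W) H) = tr((W^T W)(H H^T)). Both factors are symmetric k x k matrices, so that trace is just the sum of their elementwise product. The cross term tr((X H^T)^T W) likewise becomes a sum over an n_samples x k array. The helper name `sparse_reconstruction_err` is hypothetical. This is a sketch of the identity rather than scikit-learn's implementation. It uses `@` instead of `*` because `*` is elementwise for SciPy's newer sparse array classes.

```python
from math import sqrt

import numpy as np
import scipy.sparse as sp


def sparse_reconstruction_err(X, W, H):
    """Hypothetical helper: ||X - W H||_F for sparse X without forming W H.

    Assumes X is sparse with shape (n_samples, n_features), W is dense
    (n_samples, k) and H is dense (k, n_features).
    """
    X = sp.csr_matrix(X)
    if not X.has_canonical_format:
        # Work on a copy so duplicate entries are merged without
        # modifying the caller's matrix.
        X = X.copy()
        X.sum_duplicates()
    norm2X = np.sum(X.data ** 2)  # ||X||_F ** 2

    # tr(H^T W^T W H) = tr((W^T W)(H H^T)); both factors are k x k and
    # symmetric, so the trace equals the sum of their elementwise product.
    normWHT = np.sum(np.dot(W.T, W) * np.dot(H, H.T))

    # tr((X H^T)^T W) = sum of the elementwise product of X H^T and W;
    # X @ H.T has shape (n_samples, k) only.
    cross_prod = np.sum(np.asarray(X @ H.T) * W)

    # Clip tiny negative values caused by floating-point cancellation.
    return sqrt(max(norm2X + normWHT - 2.0 * cross_prod, 0.0))
```

With this arrangement the largest temporaries are the n_samples x k product X @ H.T and, only when X has duplicate entries, a copy of X. Nothing of size n_features x n_features, and nothing the dense size of X, is allocated.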

Thanks,
Will


On Wed, Aug 28, 2013 at 11:49 PM, Alexandre Gramfort <
alexandre.gramf...@telecom-paristech.fr> wrote:

> hi Will,
>
> >             if not sp.issparse(X):
> >                 self.reconstruction_err_ = norm(X - np.dot(W, H))
> >             else:
> >                 norm2X = np.sum(X.data ** 2)  # Ok because X is CSR
> >                 normWHT = np.trace(np.dot(np.dot(H.T, np.dot(W.T, W)), H))
> >                 cross_prod = np.trace(np.dot((X * H.T).T, W))
> >                 self.reconstruction_err_ = sqrt(norm2X + normWHT
> >                                                 - 2. * cross_prod)
> >
> > So, for a dense matrix X, this is relatively straightforward. For a sparse
> > matrix, this is a massively expensive operation, at least from a memory
> > standpoint. Is there any reason we can't implement norm() for CSR and just
> > compute self.reconstruction_err_ = safe_sparse_norm(X - safe_sparse_dot(W, H))?
>
> The motivation for these lines is that even if X is sparse,
> safe_sparse_dot(W, H) will not be. So you would allocate a dense
> matrix the size of X, which is unacceptable in many cases.
>
> > Additionally, is np.sum(X.data ** 2) a typo? Should it be
> > np.sum(X.data * 2)? If not a typo, the variable seems misnamed and should
> > be "normSquared", or something, not norm2X, right?
>
> norm2 is a common name for the L2 norm. Indeed, I could have added "squared" to the name.
>
> > Surely the current approach could also be done more memory-efficiently,
> > but a "sparse safe norm" sounds better...
> >
> >
> > Additionally, is there any reason to use math.sqrt() as above instead of
> > np.sqrt()?
>
> Yes. math.sqrt is faster than np.sqrt on a scalar float; np.sqrt is only
> needed for arrays.
>
> Let me know if you have any questions.
>
> Best,
> Alex
>
> > I'm more than glad to fix this, but I'm hoping someone more familiar
> > could give me a bit of direction :)
> >
> >
> > Thanks!
> >
> > -Will
> >
> >
> >
> ------------------------------------------------------------------------------
> > Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
> > Discover the easy way to master current and previous Microsoft
> technologies
> > and advance your career. Get an incredible 1,500+ hours of step-by-step
> > tutorial videos with LearnDevNow. Subscribe today and save!
> >
> http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >
>
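
On the `norm2X` question in the thread: for a CSR matrix whose entries are each stored once (canonical format), `np.sum(X.data ** 2)` is the squared Frobenius norm of X. By contrast, `np.sum(X.data * 2)` would be twice the sum of the entries. The minimal check below uses SciPy's `scipy.sparse.linalg.norm`, which returns the Frobenius norm by default. It also shows one caveat worth keeping in mind alongside the "Ok because X is CSR" comment: a CSR matrix assembled directly from (data, indices, indptr) may hold duplicate entries, which `sum_duplicates()` merges.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import norm as sparse_norm

# X = [[0, 3, 0], [4, 0, 0]] stored as CSR; X.data holds [3., 4.].
X = sp.csr_matrix(np.array([[0.0, 3.0, 0.0],
                            [4.0, 0.0, 0.0]]))

norm2X = np.sum(X.data ** 2)   # 3**2 + 4**2 = 25.0, i.e. ||X||_F ** 2
normX = sparse_norm(X)         # Frobenius norm by default: 5.0
print(norm2X, normX)           # 25.0 5.0

# Caveat: a CSR matrix built directly from (data, indices, indptr) may store
# duplicates. Here row 0 holds 1.0 and 2.0 at column 0, so Y[0, 0] == 3.0.
Y = sp.csr_matrix((np.array([1.0, 2.0]), np.array([0, 0]), np.array([0, 2, 2])),
                  shape=(2, 2))
before = np.sum(Y.data ** 2)   # 1 + 4 = 5.0, not ||Y||_F ** 2
Y.sum_duplicates()             # merge duplicates in place
after = np.sum(Y.data ** 2)    # 3**2 = 9.0
print(before, after)           # 5.0 9.0
```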