This looks great, and I'll talk to my lead scientist about incorporating it
and evaluating, thanks! I must warn you all, I'm not an algorithms guy; I'm
on the software/performance/"make this shit work" side of things. For the
task at hand, we're using NMF for a reason, and I've gotta make this work.
safe_sparse_dot() returning dense when its inputs are dense makes sense, sorry;
the idea is to make W and H sparse, as Lars suggested.
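
For reference, a 40k x 220k float64 array takes roughly 70 GB when dense
(40,000 * 220,000 * 8 bytes), so a few temporaries of that size are enough to
exhaust 200GB. Below is a minimal sketch of one way to sidestep that,
assuming W and H stay dense and only X is sparse: compute the Frobenius
reconstruction error one row block at a time. The function name
blocked_reconstruction_error and the block_rows parameter are made up for
illustration; this is not what scikit-learn's NMF does internally.

```python
import numpy as np
import scipy.sparse as sp


def blocked_reconstruction_error(X, W, H, block_rows=1024):
    """Frobenius norm of X - W @ H, computed one row block at a time.

    X is a scipy.sparse matrix of shape (m, n); W is (m, k) and H is (k, n),
    both dense ndarrays. Only a (block_rows, n) dense slab is alive at once.
    (Hypothetical helper, not part of scikit-learn.)
    """
    X = sp.csr_matrix(X)  # CSR makes row slicing cheap
    total = 0.0
    for start in range(0, X.shape[0], block_rows):
        stop = min(start + block_rows, X.shape[0])
        # Dense reconstruction for these rows only.
        residual = W[start:stop] @ H
        # Subtract the matching rows of X in place (densified per block).
        residual -= X[start:stop].toarray()
        # Accumulate the squared Frobenius norm of the block.
        total += float(np.vdot(residual, residual))
    return np.sqrt(total)


# Small synthetic example; the sizes are arbitrary.
rng = np.random.default_rng(0)
mask = rng.random((200, 300)) < 0.01          # about 1% non-zeros
X = sp.csr_matrix(rng.random((200, 300)) * mask)
W = rng.random((200, 5))
H = rng.random((5, 300))
err = blocked_reconstruction_error(X, W, H, block_rows=64)
```

Peak extra memory is about block_rows * n_features * 8 bytes, so block_rows
can be tuned to whatever fits; the result should match the all-at-once
computation up to floating-point rounding.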

Thanks,
Will


On Thu, Aug 29, 2013 at 1:58 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:

> 2013/8/29 Will Buckner <wbuck...@beatsmusic.com>:
> >> the motivation for these lines is that even if X is sparse
> >> safe_sparse_dot(W, H) will not be. So you will allocate a matrix of
> >> size X but dense which is unacceptable in many cases.
> >
> > Er, it looks like safe_sparse_dot() returns sparse unless
> > dense_output=True. And, I'm confused as to how this would result in
> > more memory. Aren't we allocating more in the lines above for the
> > issparse(X) case? I'm stuck right now because my 40k x 220k CSR matrix
> > can't make it past computing the reconstruction_err without a
> > MemoryError--with 200GB of RAM free. Any ideas of how to reduce the
> > memory constraints of that calculation?
>
> You probably need an online (aka out-of-core, streaming, incremental)
> algorithm instead (e.g. SGD on the least square reconstruction error
> with positivity constraints possibly implemented as projections).
> Mathieu Blondel knows even better algorithms but AFAIK his paper might
> still be pending review so I am not sure he would like to speak about
> it in more detail.
>
> Here are code samples for partially observed (sparse) data that
> implement matrix factorization without the positivity constraints:
>
> https://github.com/scikit-learn/scikit-learn/pull/2387
> http://code.google.com/p/pyrsvd/
>
> If you have time it would be interesting to experiment with adding
> positivity projections and report your results on this mailing list.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
> ------------------------------------------------------------------------------
> Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
> Discover the easy way to master current and previous Microsoft technologies
> and advance your career. Get an incredible 1,500+ hours of step-by-step
> tutorial videos with LearnDevNow. Subscribe today and save!
> http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>