Re: Mahout SSVD is too slow for highly dimensional data

Ted Dunning Tue, 11 Jun 2013 00:49:59 -0700

Don't do that.

Why do you think you need 1000 singular values?


Have you tried with k=100, p=15?

Quite seriously, I would expect that you would literally get just as good
results for almost any real application with 100 singular vectors and 900
orthogonal noise vectors.
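
To make the role of k and p concrete, here is a minimal NumPy sketch of the
basic randomized SVD idea. It is an illustration only, not Mahout's
implementation: the function name randomized_svd, the seed argument, and the
absence of power iterations are all assumptions made for the example. The
point it shows is that every intermediate matrix is k + p columns wide, so
k=1000, p=1 means working with 1001 columns per row where k=100, p=15 needs
only 115.

```python
import numpy as np


def randomized_svd(A, k, p, seed=0):
    """Hypothetical helper: rank-k randomized SVD with oversampling p.

    A sketch of the general technique (no power iterations),
    not the code Mahout SSVD actually runs.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]

    # Random test matrix with k + p columns; this width drives the cost.
    omega = rng.standard_normal((n, k + p))

    # Project A onto the random subspace: Y has shape (m, k + p).
    Y = A @ omega

    # Orthonormal basis for the range of Y (the analogue of the Q step).
    Q, _ = np.linalg.qr(Y)

    # Small (k + p) x n problem that is cheap to decompose exactly.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)

    # Map back to the original row space and keep only the leading k.
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic matrix of exact rank 5, so a small k is enough.
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
    U, s, Vt = randomized_svd(A, k=5, p=10)
    print(U.shape, s.shape, Vt.shape)
```

The oversampling columns are discarded at the end; they exist only to make
the random basis capture the leading subspace more reliably.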


On Tue, Jun 11, 2013 at 9:39 AM, Yehia Zakaria <[email protected]> wrote:

> Hi
>
> The requested rank (k) is 1000 and p is 1. The input size is 1.2 gigabyte.
>
> Thanks
>
>
>
> On Mon, Jun 10, 2013 at 9:28 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > What is the requested rank? This guy will not scale w.r.t. rank, only
> > w.r.t. input size. Realistically you don't need k > 100, p > 15.
> >
> > What is the input size (A, in GB)?
> >
> >
> > On Mon, Jun 10, 2013 at 5:31 AM, Yahia Zakaria <[email protected]> wrote:
> >
> > > Hi All
> > >
> > > I am running Mahout SSVD (trunk version) with the pca option on the Bag
> > > of Words dataset (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words).
> > > This dataset has 8000000 instances (rows) and 100000 attributes
> > > (columns). Mahout SSVD is too slow; it may take days to finish the
> > > first phase of SSVD (Q-Job). I am running the code on a cluster of 16
> > > machines, each with 8 cores and 32 GB of memory. Moreover, the CPU and
> > > memory of the workers are not utilized at all. By contrast, on a
> > > smaller dataset (12500 rows and 5000 columns) it runs very fast; the
> > > job finished in 2 minutes. Do you have any idea why Mahout SSVD is so
> > > slow for high-dimensional data? And to what extent can SSVD work
> > > efficiently (with respect to the number of rows and columns of the
> > > input matrix)?
> > >
> > > Thanks
> > > Yehia
> > >
> >
>
