Re: Mahout SSVD is too slow for highly dimensional data

Sebastian Schelter Mon, 10 Jun 2013 08:44:53 -0700

Did you check your memory settings? Make sure the machines don't swap.
On 10.06.2013 14:48, "Yahia Zakaria" <[email protected]> wrote:


> Yes, I have tuned the number of reducers; the best choice for my
> cluster is 56 reducers.
>
>
> On Mon, Jun 10, 2013 at 3:39 PM, Sebastian Schelter <[email protected]>
> wrote:
>
> > Did you tune the number of reducers? I successfully applied ssvd to a
> > dataset with 3B nonzeros on 6 machines in a few hours.
> > On 10.06.2013 14:32, "Yahia Zakaria" <[email protected]> wrote:
> >
> > > Hi All
> > >
> > > I am running Mahout SSVD (trunk version) with the pca option on the Bag
> > > of Words dataset (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words).
> > > This dataset has 8000000 instances (rows) and 100000 attributes
> > > (columns). Mahout SSVD is too slow: it may take days to finish the first
> > > phase of SSVD (Q-Job). I am running the code on a cluster of 16 machines,
> > > each with 8 cores and 32 GB of memory. Moreover, the CPU and memory of
> > > the workers are not utilized at all. On a smaller dataset (12500 rows
> > > and 5000 columns), Mahout SSVD runs very fast: the job finished in
> > > 2 minutes. Do you have any idea why Mahout SSVD is so slow for
> > > high-dimensional data? And to what extent can SSVD work efficiently
> > > (with respect to the number of rows and columns of the input matrix)?
> > >
> > > Thanks
> > > Yehia
> > >
> >
>
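
The swap check suggested at the top of this thread could be done per worker
with something like the sketch below. It assumes Linux workers that expose
/proc/meminfo; the function name swap_used_kib is made up for illustration and
is not part of Mahout or Hadoop. A non-zero result only shows that swap space
is occupied, not that the machine is actively swapping during the Q-Job.

```python
def swap_used_kib(meminfo_text):
    """Return swap space in use, in the kB units reported by /proc/meminfo.

    Assumption: the text follows the usual "Key:   value kB" layout and
    contains both SwapTotal and SwapFree entries.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the numeric value.
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]
```

On a worker node this could be called as
swap_used_kib(open("/proc/meminfo").read()) before and during a job run.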
