Thanks a lot for the detailed explanation; it was very helpful.
I will write a CSV-to-sequence-file converter. I just need some clarity on the
key/value pairs in the sequence file.

Suppose my CSV file contains the following values:
11,22,33,44,55
13,23,34,45,56

I assume that the sequence file would look like this, where 12, 1, 14, 8, and
15 are the indices that hold the values:
Key: 1, Value: {12:11,1:22,14:33,8:44,15:55}
Key: 2, Value: {12:13,1:23,14:34,8:45,15:56}

Please confirm if my understanding is correct.
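
To make the question concrete, here is a minimal Python sketch of step (1)
only, i.e. turning CSV rows into (unique vector key, n-vector) tuples. It is
an illustration under my own assumptions, not Mahout code: the function name
csv_rows_to_keyed_vectors is hypothetical, the key is simply the 1-based row
number, and the indices are the positional column indices 0..4 rather than
the arbitrary indices in my example above. Writing the tuples out as
VectorWritable values in a sequence file (step 2) is not shown.

```python
import csv
import io


def csv_rows_to_keyed_vectors(lines):
    """Yield (key, vector) tuples from CSV lines.

    Assumption: the key is the 1-based row number (blank lines are skipped),
    and the vector is the row parsed as a list of floats.
    """
    key = 0
    for row in csv.reader(lines):
        if not row:  # csv.reader yields [] for blank lines
            continue
        key += 1
        yield key, [float(cell) for cell in row]


# The two sample rows from this message.
sample = io.StringIO("11,22,33,44,55\n13,23,34,45,56\n")
tuples = list(csv_rows_to_keyed_vectors(sample))

for key, vector in tuples:
    # Show each vector as index:value pairs, using positional indices.
    print(key, dict(enumerate(vector)))
```

Each tuple produced this way would become one key/value record in the
sequence file, if my understanding of the layout is right.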

Thanks,
Vijay


On Wed, Mar 19, 2014 at 11:02 PM, Dmitriy Lyubimov <[email protected]> wrote:

> I am not sure if we have direct CSV converters to do that; CSV is not that
> expressive anyway. But it is not difficult to write such a converter on
> your own, I suppose.
>
> The steps you need to do are these:
>
> (1) prepare a set of data points in the form of (unique vector key, n-vector)
> tuples. The vector key can be anything that can be adapted into a
> WritableComparable, notably a Long or a String. The vector key also has to
> be unique to make sense for you.
> (2) save the above tuples into a set of sequence files so that the sequence
> file key is the unique vector key and the sequence file value is an
> o.a.m.math.VectorWritable.
> (3) decide how many dimensions there will be in the reduced space. The key
> word is "reduced", i.e. you don't need too many. Say 50.
> (4) run mahout ssvd --pca true --us true --v false -k <k> .... . The
> reduced-dimensionality output will be in the folder USigma. The output will
> have the same keys bound to vectors in the reduced space of k dimensions.
>
>
> On Wed, Mar 19, 2014 at 9:45 AM, Vijay B <[email protected]> wrote:
>
> > Hi All,
> > I have a CSV file on which I have to perform dimensionality reduction. I'm
> > new to Mahout; after some searching I understood that SSVD can be used for
> > performing dimensionality reduction. I'm not sure of the steps that have to
> > be executed before SSVD. Please help me.
> >
> > Thanks,
> > Vijay
> >
>
