I am not sure whether we have a direct CSV converter for that; CSV is not that expressive anyway. But it is not difficult to write such a converter on your own, I suppose.
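For the parsing half of such a converter, something like the Python sketch below might do. It is only an illustration, not Mahout code: the function name `read_keyed_vectors` and its parameters are made up here, and it assumes a purely numeric CSV with an optional id column. It turns each row into a (unique key, vector) pair; writing those pairs out as a sequence file of VectorWritable values would still have to be done with the Hadoop/Mahout Java API.

```python
import csv
import io


def read_keyed_vectors(csv_file, key_column=None, has_header=True):
    """Yield (key, vector) pairs from an open CSV file object.

    Hypothetical helper for illustration only.
    key_column: index of the column holding the unique key (kept as a string),
                or None to use the 0-based data row number as the key.
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row, if any

    seen = set()
    for row_number, row in enumerate(reader):
        if not row:
            continue  # ignore blank lines
        if key_column is None:
            key = row_number
            values = row
        else:
            key = row[key_column]
            values = row[:key_column] + row[key_column + 1:]
        if key in seen:
            # keys must be unique, otherwise the reduced output is ambiguous
            raise ValueError("duplicate vector key: %r" % (key,))
        seen.add(key)
        yield key, [float(v) for v in values]


# Usage with an in-memory CSV; a real run would pass open("data.csv", newline="").
sample = io.StringIO("id,x,y\na,1,2\nb,3,4.5\n")
for key, vector in read_keyed_vectors(sample, key_column=0):
    print(key, vector)
```

Using the row number as the key is the simplest choice when the CSV has no natural id column; it maps onto a Long key on the Hadoop side.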
The steps you need are these:

(1) Prepare a set of data points in the form of (unique vector key, n-vector) tuples. The vector key can be anything that can be adapted into a WritableComparable, notably a Long or a String. The vector key also has to be unique for the result to make sense.

(2) Save the above tuples into a set of sequence files so that the sequence file key is the unique vector key and the sequence file value is an o.a.m.math.VectorWritable.

(3) Decide how many dimensions there will be in the reduced space. The key word is "reduced", i.e. you don't need too many. Say 50.

(4) Run mahout ssvd --pca true --us true --v false -k <k> .... . The reduced-dimensionality output will be in the folder USigma. The output will have the same keys bound to vectors in the reduced space of k dimensions.

On Wed, Mar 19, 2014 at 9:45 AM, Vijay B <[email protected]> wrote:

> Hi All,
> I have a CSV file on which I've to perform dimensionality reduction. I'm
> new to Mahout, on doing some search I understood that SSVD can be used for
> performing dimensionality reduction. I'm not sure of the steps that have to
> be executed before SSVD, please help me.
>
> Thanks,
> Vijay
