If the rows of the SSVD input are the data points you want to project into a reduced space, then the rows of USigma represent those same points in the PCA (reduced) space. Input rows are mapped to output rows by identical keys in the sequence files. However, your input does not appear to use distinct keys (every row shown has key 1), which is not recommended.
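To illustrate why distinct keys matter, here is a minimal sketch in plain Python. It does not use any Mahout API: the function name join_by_key and the sample values are hypothetical, and the lists of (key, vector) tuples merely stand in for the records of the input and USigma sequence files.

```python
from collections import defaultdict


def join_by_key(input_rows, usigma_rows):
    """Pair each input row with its USigma row through the shared key.

    Both arguments are lists of (key, vector) tuples (hypothetical stand-ins
    for sequence file records). Returns (pairs, unresolved), where unresolved
    lists the keys that cannot be matched one-to-one.
    """
    inputs = defaultdict(list)
    for key, vec in input_rows:
        inputs[key].append(vec)

    outputs = defaultdict(list)
    for key, vec in usigma_rows:
        outputs[key].append(vec)

    pairs = {}
    unresolved = []
    for key, vecs in inputs.items():
        # A key is usable only if it identifies exactly one row on each side.
        if len(vecs) == 1 and len(outputs.get(key, [])) == 1:
            pairs[key] = (vecs[0], outputs[key][0])
        else:
            unresolved.append(key)
    return pairs, sorted(unresolved)


# Every row shares key "1", as in the output quoted below: nothing can be paired.
same_key = join_by_key(
    [("1", [22.0, 2.0]), ("1", [25.0, 1.0])],
    [("1", [190.8, 350.3]), ("1", [1295.9, -698.6])],
)

# Distinct keys (illustrative): each input row finds its reduced-space row,
# regardless of the order in which the output rows arrive.
distinct_keys = join_by_key(
    [("row-a", [22.0, 2.0]), ("row-b", [25.0, 1.0])],
    [("row-b", [1295.9, -698.6]), ("row-a", [190.8, 350.3])],
)
```

With a shared key, the only remaining link between an input row and its USigma row is record order, which is presumably not something to rely on in a distributed job.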
SSVD will also propagate names if NamedVector is used for rows of the input. That's possibly another way to map input rows to PCA space rows in USigma. However, it doesn't look like the input is using Named vectors in this case.

On Mon, Mar 17, 2014 at 10:22 PM, Vijaya Pratap <bvprat...@gmail.com> wrote:

> Hi,
>
> I am trying to use SSVD for dimensionality reduction on Mahout, the input
> is a sample data in CSV format. Below is a snippet of the input
>
> 22,2,44,36,5,9,2824,2,4,733,285,169
> 25,1,150,175,3,9,4037,2,18,1822,254,171
>
> I have executed the below steps.
>
> 1. Loaded the csv file and Vectorized the data by following the steps
> mentioned at https://github.com/tdunning/pig-vector with key as
> TextConverter and value as VectorWritable. Listed below is the output of
> this step. I believe the values 420468, 279945 are indices, please correct
> me if I am wrong.
> Key: 1: Value:
>
> {420468:733.0,279945:2.0,607618:285.0,107323:4.0,88330:2.0,263605:9.0,975378:169.0,796003:2824.0,899937:44.0,422862:5.0,723271:22.0,508675:36.0}
> Key: 1: Value:
>
> {420468:1822.0,279945:2.0,607618:254.0,107323:18.0,88330:1.0,263605:9.0,975378:171.0,796003:4037.0,899937:150.0,422862:3.0,723271:25.0,508675:175.0}
>
> 2. Passed the output of the above command to SSVD as follows
> bin/mahout ssvd -i /user/cloudera/vectorized_data/ -o
> /user/cloudera/reduced_dimensions --rank 7 -us true -V false -U false -pca
> true -ow -t 1
>
> Below is a snippet of the output in USigma folder
> Key: 1: Value:
>
> {0:190.78376981262613,1:350.30406212052424,2:78.24932121461198,3:98.67283686605012,4:-122.95056058078157,5:-4.201436498582381,6:-1.4370820809434337}
> Key: 1: Value:
>
> {0:1295.933786837574,1:-698.5629072274602,2:-24.15996813349674,3:60.936737740013946,4:11.859426028893711,5:-6.379057682687426,6:0.9356299409590896}
>
> Please let me know if my approach is correct and help me in interpreting
> the output in USigma folder
>
>
> Thanks in advance
> Pratap
>