The input needs to be converted to a SequenceFile of vectors before Mahout's pipeline can process it. This has come up a few times recently; search the mail archives for Kevin Moulart's recent posts on how to do this.
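As a rough illustration of the first half of that conversion, here is a minimal sketch that only parses the comma-separated rows into dense double arrays, using plain Java with no Mahout or Hadoop dependencies. The class and method names (CsvVectorParser, parseLine, parseLines) are made up for this example, and the check that every row has the same width is my own assumption about what you would want. Writing the result out is only described in a comment: as far as I recall, RowIdJob reads a SequenceFile with Text keys and VectorWritable values, so please verify that against the Mahout version you are running.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses comma-separated rows into dense double arrays. */
final class CsvVectorParser {

    private CsvVectorParser() {}

    /** Parses one line such as "0.129,0.932,0.123" into a double[]. */
    static double[] parseLine(String line) {
        String[] fields = line.trim().split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            // Throws NumberFormatException on a field that is not a number.
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    /** Parses all non-blank lines; rejects rows whose width differs from the first row. */
    static List<double[]> parseLines(Iterable<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            double[] row = parseLine(line);
            if (!rows.isEmpty() && rows.get(0).length != row.length) {
                throw new IllegalArgumentException(
                        "Expected " + rows.get(0).length + " columns but got "
                        + row.length + ": " + line);
            }
            rows.add(row);
            // Assumption, not shown here: in the real conversion each row would be
            // wrapped as new VectorWritable(new DenseVector(row)) and appended to a
            // Hadoop SequenceFile.Writer under a Text key such as the row number.
        }
        return rows;
    }
}
```

Each parsed row corresponds to one vector in the SequenceFile; the keys you choose when writing are what should end up in the docIndex after the RowIdJob step.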
The converted vectors are then fed to RowIdJob, which outputs a matrix and a docIndex. Feed that matrix (which is a DRM) to RowSimilarityJob.

On Fri, May 23, 2014 at 1:31 AM, jamal sasha <[email protected]> wrote:
> Hi,
> I have data where each row is comma seperated vector...
> And these are bunch of text files...like
> 0.123,01433,0.932
> 0.129,0.932,0.123
> And I want to run's mahout rowIdSimilarity module on it.. butI am guessing
> the input requirement is different.
> How do I convert this csv vectors into format consumed by mahout
> rowIdSimilarity module?
> Thanks
>
