The input needs to be converted to a SequenceFile of vectors before Mahout's
pipeline can process it. This has come up a few times recently; search the
mail archives for Kevin Moulart's recent posts on how to do the conversion.
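A minimal, stdlib-only sketch of the parsing half of that conversion. It turns comma-separated lines into (key, double[]) pairs. The class name CsvVectors and the "<file>:<lineNumber>" key scheme are hypothetical. Writing each pair out as a SequenceFile with a Text key and a Mahout VectorWritable value (for example wrapping a DenseVector) is not shown, because it needs Hadoop and Mahout on the classpath. The assumption that the downstream job expects Text keys should be checked against your Mahout version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses CSV rows into dense vectors keyed by a row label.
// Serializing to SequenceFile<Text, VectorWritable> is assumed to happen elsewhere.
class CsvVectors {

    // Parses one comma-separated line into a dense vector of doubles.
    static double[] parseRow(String line) {
        String[] fields = line.trim().split(",");
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    // Keys each non-blank line as "<source>:<lineNumber>" (hypothetical scheme),
    // preserving input order so row ids assigned later are deterministic.
    static Map<String, double[]> parseLines(String source, List<String> lines) {
        Map<String, double[]> rows = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isBlank()) {
                continue; // skip empty lines
            }
            rows.put(source + ":" + (i + 1), parseRow(line));
        }
        return rows;
    }
}
```

The map preserves insertion order so that any later step which numbers the rows sequentially produces a stable mapping back to the original lines.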

The converted vectors are then fed to RowIdJob, which outputs a matrix and a
docIndex. Then feed the matrix (a DRM, i.e. a distributed row matrix) to
RowSimilarityJob.
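To illustrate what those two steps do, here is an in-memory sketch. It is not Mahout's distributed implementation. It assumes rows have already been parsed into an ordered map of (key, vector). In the sketch, assignRowIds mimics the idea behind RowIdJob: assign integer row ids and keep a docIndex from id back to the original key. topK mimics the idea behind RowSimilarityJob, ranking the other rows by similarity. Cosine similarity is used here as one example measure. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// In-memory illustration only; RowIdJob/RowSimilarityJob run on Hadoop.
class RowSimilaritySketch {

    // Idea of RowIdJob: row ids 0..n-1 in iteration order.
    // docIndex.get(id) gives the original key; matrix.get(id) gives the vector.
    static void assignRowIds(Map<String, double[]> rows, List<String> docIndex, List<double[]> matrix) {
        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            docIndex.add(e.getKey());
            matrix.add(e.getValue());
        }
    }

    // Cosine similarity; assumes both vectors have the same length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) {
            return 0.0; // treat an all-zero row as similar to nothing
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Idea of RowSimilarityJob for one row: ids of the k most similar other rows, best first.
    static List<Integer> topK(List<double[]> matrix, int row, int k) {
        List<Integer> others = new ArrayList<>();
        for (int j = 0; j < matrix.size(); j++) {
            if (j != row) {
                others.add(j);
            }
        }
        double[] r = matrix.get(row);
        others.sort(Comparator.comparingDouble((Integer j) -> cosine(r, matrix.get(j))).reversed());
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }
}
```

The docIndex is what lets you translate the integer row ids in the similarity output back to your original rows.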

On Fri, May 23, 2014 at 1:31 AM, jamal sasha <[email protected]> wrote:

> Hi,
>    I have data where each row is a comma-separated vector.
> These are a bunch of text files, like:
> 0.123,01433,0.932
> 0.129,0.932,0.123
> I want to run Mahout's rowIdSimilarity module on it, but I am guessing
> the input requirement is different.
> How do I convert these CSV vectors into the format consumed by Mahout's
> rowIdSimilarity module?
> Thanks
>
