Simply write a Java program that builds one vector per user or per item
(depending on which you want to cluster) and writes them out with a
SequenceFile.Writer.
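
A minimal sketch of the first half of that, grouping the rating lines into
one sparse vector per user. It uses only the JDK so it runs on its own; the
class and method names (RatingVectors, byUser) are made up for illustration,
and it assumes the fields are separated by whitespace or commas. Swap the
first two fields to cluster items instead of users.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Mahout: groups MovieLens rating lines
// into one sparse vector per user, stored as userId -> (itemId -> rating).
class RatingVectors {

    static Map<Integer, Map<Integer, Double>> byUser(List<String> lines) {
        Map<Integer, Map<Integer, Double>> vectors = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Assumed layout: userid, itemid, rating, unix time,
            // separated by tabs, spaces, or commas.
            String[] fields = trimmed.split("[\\s,]+");
            int userId = Integer.parseInt(fields[0]);
            int itemId = Integer.parseInt(fields[1]);
            double rating = Double.parseDouble(fields[2]);
            // The timestamp in fields[3] is ignored for clustering.
            vectors.computeIfAbsent(userId, k -> new TreeMap<>())
                   .put(itemId, rating);
        }
        return vectors;
    }
}
```

For the Mahout side, each inner map would then presumably be copied into a
RandomAccessSparseVector (one set(itemId, rating) call per entry), wrapped
in a VectorWritable, and appended to the SequenceFile.Writer under a key
such as the user id. Check the exact constructors against the Mahout and
Hadoop versions you are running.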

On 01.07.2013 02:29, Carlos Seminario wrote:
> Hi: I want to vectorize the movielens 100K dataset as a
> RandomAccessSparseVector and use it to run Mahout k-means clustering. Has
> anyone done this before? If not, any ideas on how this can be done? (BTW,
> movielens dataset contains ~100K records/lines with this format: userid,
> itemid, rating, unix time.)
> 
> Thanks .. Carlos
> 
