Simply write a Java program that creates one vector per user or per item (I don't know which of the two you want to cluster) and writes them out with a SequenceFile writer (Hadoop's SequenceFile.Writer).
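Here is a minimal sketch of the first half of that program. It assumes the input has one "userid, itemid, rating, unix time" record per line, separated by tabs, commas or spaces. It groups the ratings into one sparse vector per user, kept here as a plain itemId -> rating map so it compiles with only the JDK. The class and method names (MovieLensVectors, buildUserVectors, cardinality) are hypothetical. To cluster items instead of users, swap the roles of fields[0] and fields[1].

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: groups rating lines into one sparse "vector" per user,
 * stored as itemId -> rating. Only rated items are kept, which is the role a
 * Mahout RandomAccessSparseVector would play in the real job.
 */
final class MovieLensVectors {

    /** Parses "userid, itemid, rating, unix time" lines (tab, comma or space separated; an assumption). */
    static Map<Integer, Map<Integer, Double>> buildUserVectors(List<String> lines) {
        Map<Integer, Map<Integer, Double>> byUser = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("[,\\s]+");
            int userId = Integer.parseInt(fields[0]);
            int itemId = Integer.parseInt(fields[1]);
            double rating = Double.parseDouble(fields[2]);
            // fields[3] (the unix timestamp) is not used for clustering.
            byUser.computeIfAbsent(userId, k -> new TreeMap<>()).put(itemId, rating);
        }
        return byUser;
    }

    /** Vector size: largest item id seen plus one, so item ids can be used directly as indices. */
    static int cardinality(Map<Integer, Map<Integer, Double>> byUser) {
        int maxItem = 0;
        for (Map<Integer, Double> row : byUser.values()) {
            for (int itemId : row.keySet()) {
                maxItem = Math.max(maxItem, itemId);
            }
        }
        return maxItem + 1;
    }

    public static void main(String[] args) throws IOException {
        // Read the ratings file given on the command line, or fall back to made-up sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : List.of("1\t10\t4\t1000000000", "1\t25\t2\t1000000100", "2\t10\t5\t1000000200");
        Map<Integer, Map<Integer, Double>> byUser = buildUserVectors(lines);
        System.out.println("users=" + byUser.size() + " cardinality=" + cardinality(byUser));
    }
}
```

Run it with the path to the ratings file as the argument; without an argument it uses the three made-up lines above and prints "users=2 cardinality=26". The second half, which needs the Mahout and Hadoop jars and is not shown, would copy each inner map into a RandomAccessSparseVector of that cardinality via setQuick(itemId, rating), wrap it in a VectorWritable, and append it to the SequenceFile.Writer under a key such as the user id.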
On 01.07.2013 02:29, Carlos Seminario wrote:
> Hi: I want to vectorize the movielens 100K dataset as a
> RandomAccessSparseVector and use it to run Mahout k-means clustering. Has
> anyone done this before? If not, any ideas on how this can be done? (BTW,
> movielens dataset contains ~100K records/lines with this format: userid,
> itemid, rating, unix time.)
>
> Thanks .. Carlos
