The Synthetic Control examples use a similar, but space-delimited, input format, and there is an InputDriver in integration/ that converts those files into Mahout Vector sequence files. You could easily modify the InputMapper to split on commas, or convert your own files to use spaces instead.
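
If you go the second route, something along these lines might do as a starting point. It is only a rough sketch in plain Python, not part of Mahout: the function name csv_to_space_delimited and the sample data are made up for illustration, and it assumes every row holds plain numeric fields with no header line.

```python
import csv
import io


def csv_to_space_delimited(src, dst):
    """Copy rows from a comma-delimited stream to a space-delimited one.

    Hypothetical helper, not a Mahout API. Assumes no header row.
    """
    for row in csv.reader(src):
        if not row:
            # csv.reader yields an empty list for a blank line; skip it.
            continue
        fields = [field.strip() for field in row]
        if any(not field for field in fields):
            # An empty field would silently shift the vector's dimensions.
            raise ValueError("empty field in row: %r" % (row,))
        dst.write(" ".join(fields) + "\n")


# Small in-memory demonstration with made-up values.
sample = io.StringIO("1.0,2.5,3.0\n4.0, 5.5 ,6.0\n\n")
out = io.StringIO()
csv_to_space_delimited(sample, out)
converted = out.getvalue()
print(converted, end="")
```

For real data you would pass open file objects instead of the in-memory streams, then hand the space-delimited output to the InputDriver as with the Synthetic Control data.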

On 1/9/12 12:50 PM, Daniel Quach wrote:
I have a file of vectors I formulated in csv format, and I want to use mahout 
to perform k-means clustering on the vectors in this file.

However, it seems mahout expects the input data to be formatted in a 
SequenceFile format, and I'm not sure if there's a way to easily do this (are 
there existing tools?)

