Even better, you might figure out how to pass the desired delimiter into
the InputDriver as an argument and submit a patch to make that a
permanent Mahout feature. It should be straightforward, and it would
start you down the path to becoming a committer.
On 1/9/12 2:52 PM, Jeff Eastman wrote:
The Synthetic Control examples use a similar (but space-delimited)
input format, and there is an InputDriver in integration/ which can
convert those files into Mahout Vector sequence files. You could
easily modify the InputMapper to be comma-delimited, or convert your own
files to use spaces.
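
To make the delimiter idea concrete, here is a minimal sketch of the parsing step only, in plain Java with no Mahout dependencies. The class and method names (DelimitedVectorParser, parseLine) are hypothetical and are not taken from the Mahout source; the real InputMapper may be structured differently. The delimiter is assumed to be passed as a regular expression, so "," handles CSV and "\\s+" handles space-delimited files.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; NOT part of Mahout.
public class DelimitedVectorParser {

    // Splits one line of text on the given delimiter regex and parses
    // each non-empty token as a double. Blank tokens are skipped.
    public static double[] parseLine(String line, String delimiterRegex) {
        String[] tokens = line.trim().split(delimiterRegex);
        double[] values = new double[tokens.length];
        int count = 0;
        for (String token : tokens) {
            String t = token.trim();
            if (!t.isEmpty()) {
                values[count++] = Double.parseDouble(t);
            }
        }
        // Shrink the array if any blank tokens were skipped.
        return Arrays.copyOf(values, count);
    }
}
```

The resulting double[] is what you would then wrap in a Mahout vector and write out to the sequence file; that part is omitted here because it depends on the Mahout version you are using.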
On 1/9/12 12:50 PM, Daniel Quach wrote:
I have a file of vectors that I wrote out in CSV format, and I want to
use Mahout to perform k-means clustering on the vectors in this file.
However, it seems Mahout expects the input data to be in SequenceFile
format, and I'm not sure if there's an easy way to do the conversion.
Are there existing tools for this?