+1, I would love to see that feature. Sadly, I am no Java guy myself.

On 09.01.2012 at 22:59, Jeff Eastman <[email protected]> wrote:

> Even better, you might figure out how to pass the desired delimiter into the
> InputDriver as an argument and submit a patch to make that a permanent Mahout
> feature. It should be straightforward and it would start you down the path to
> become a committer.
>
> On 1/9/12 2:52 PM, Jeff Eastman wrote:
>> The Synthetic Control examples use a similar (but space delimited) input
>> format and there is an InputDriver in integration/ which can convert those
>> files into Mahout Vector sequence files. You could easily modify the
>> InputMapper to be comma delimited or modify your own file formats to use
>> spaces.
>>
>> On 1/9/12 12:50 PM, Daniel Quach wrote:
>>> I have a file of vectors I formulated in csv format, and I want to use
>>> mahout to perform k-means clustering on the vectors in this file.
>>>
>>> However, it seems mahout expects the input data to be formatted in a
>>> SequenceFile format, and I'm not sure if there's a way to easily do this
>>> (are there existing tools?)
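
For anyone who would rather not touch the Java side, the second option Jeff mentions (changing your own file format to use spaces) could be done with a small preprocessing script before the files are handed to the InputDriver. Below is a minimal sketch in Python, not anything that ships with Mahout. The function name `csv_to_space_delimited` is made up for illustration, and the output layout (one vector per line, values separated by single spaces) is only my reading of "space delimited" in the thread, so check it against the Synthetic Control sample data before relying on it.

```python
import csv
import io


def csv_to_space_delimited(src, dst):
    """Copy numeric rows from a comma-delimited stream to a space-delimited one.

    Hypothetical helper for illustration only. `src` and `dst` are text
    streams, for example open file objects.
    """
    for row in csv.reader(src):
        values = [cell.strip() for cell in row]
        # Skip lines that are empty or contain only whitespace.
        if not any(values):
            continue
        # Fail loudly on a non-numeric or missing cell rather than
        # silently writing a vector with the wrong dimension.
        for value in values:
            float(value)
        dst.write(" ".join(values) + "\n")


if __name__ == "__main__":
    # Small in-memory demo; replace with open("vectors.csv") and
    # open("vectors.txt", "w") for real files (both names are placeholders).
    sample = io.StringIO("1.0,2.5,3.0\n4.0, 5.5 ,6.0\n")
    out = io.StringIO()
    csv_to_space_delimited(sample, out)
    print(out.getvalue(), end="")
```

The script raises a `ValueError` on any cell that is not a number, which seemed safer to me than dropping the cell, since every vector passed to k-means should have the same number of dimensions.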
