The job is kmeans; the class is KMeansDriver. The input there is supposed to be an
org.apache.hadoop.fs.Path pointing to a sequence file.
I see a .txt file being passed:
./mahout kmeans -i output.txt -o kmeans -x 10 -k 100 -ow \
    --clusters clusters -cl
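The file extension alone does not settle whether the input is a sequence file. Hadoop SequenceFiles begin with the three ASCII bytes SEQ followed by a one-byte version number, so the header can be checked directly. A minimal sketch, assuming the file is on the local filesystem (or has been copied there from HDFS); the function name looks_like_sequence_file is made up for illustration and is not part of Mahout or Hadoop:

```python
def looks_like_sequence_file(path):
    """Return True if the file starts with the Hadoop SequenceFile magic bytes."""
    with open(path, "rb") as f:
        header = f.read(4)
    # A SequenceFile header is b"SEQ" plus one version byte.
    # This only inspects the header; it does not validate the rest of the file.
    return len(header) == 4 and header[:3] == b"SEQ"
```

Calling looks_like_sequence_file("output.txt") on a local copy of the kmeans input would show whether the .txt name is just a misleading label.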
On 22-04-2012 04:55, Lance Norskog wrote:
Nothing should require a local or HDFS path. Which job/class is this?
On Fri, Apr 20, 2012 at 3:17 AM, Paritosh Ranjan wrote:
I am not sure about this, but I see a text file (-i output.txt) as the input
to kmeans. KMeans is supposed to take an HDFS path as input.
Ignore this if it is an HDFS path.
On 19-04-2012 03:23, Jeff Eastman wrote:
Are you running seq2sparse in there somewhere? It has a -nv option that
will produce NamedVectors in its vector output. These will pass through the
clustering and be evident in the clusterdump output.
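If the clustered points do carry NamedVectors, one way to get a document-to-cluster table is to dump the clusteredPoints directory to text (for example with Mahout's seqdumper utility) and parse the result. The sketch below is only an illustration under assumptions: it assumes each dumped line has the shape "Key: <clusterId>: Value: <weight>: <name> = [...]", which is not confirmed anywhere in this thread and may differ between Mahout versions. The function name doc_to_cluster is hypothetical.

```python
import re

# Assumed line shape (an assumption, check it against your own dump):
#   Key: 42: Value: 1.0: doc-17 = [3:0.25, 9:0.5]
LINE_RE = re.compile(
    r"^Key: (?P<cluster>\d+): Value: (?P<weight>[^:]+): (?P<name>.+?) = \["
)

def doc_to_cluster(lines):
    """Map each vector name (document ID) to the cluster ID it was assigned to."""
    mapping = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # Lines without a vector name (unnamed vectors, summaries) are skipped.
            mapping[m.group("name")] = int(m.group("cluster"))
    return mapping
```

Given a text dump saved to a local file, doc_to_cluster(open(path)) would return a dictionary keyed by document ID. If the dump shows no names before the vector contents, the vectors were not written as NamedVectors and this approach will return an empty mapping.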
On 4/18/12 3:08 PM, Robert Stewart wrote:
I am running kmeans clustering on vectors extracted from a lucene index.
What I want as my end result is a mapping of document ID to the cluster
for each document. How can I get that output? I see that many other people
want this too, but none of the solutions I have found has enough detail for
me to get it working.
So far I do this:
./mahout lucene.vector -d ~/clusterdemo/solr/data/index/ -f text \
    --idField id --output output.txt --dictOut dict.txt
./mahout kmeans -i output.txt -o kmeans -x 10 -k 100 -ow \
    --clusters clusters -cl
./mahout clusterdump --dictionary dict.txt \
    --seqFileDir kmeans/clusters-10-final --dictionaryType text \
    --pointsDir kmeans/clusteredPoints --output dump
But what I see inside the "dump" file does not contain any mapping from
document ID to cluster. How can I get that? It should not be this hard
to get the most obvious/useful output, IMO ;)
Thanks
Bob