Nothing should require a local or HDFS path. Which job/class is this?

On Fri, Apr 20, 2012 at 3:17 AM, Paritosh Ranjan <[email protected]> wrote:
> I am not sure about this, however I see a txt ( -i output.txt ) as the input
> to kmeans. KMeans input is supposed to take a hdfs Path as input.
> Ignore if its a hdfs path.
>
>
> On 19-04-2012 03:23, Jeff Eastman wrote:
>>
>> Are you running seq2sparse in there somewhere? It has a -nv option that
>> will produce NamedVectors in its vector output. These will pass through the
>> clustering and be evident in the clusterdump output.
>>
>> On 4/18/12 3:08 PM, Robert Stewart wrote:
>>>
>>> I am running kmeans clustering on vectors extracted from a lucene index.
>>>
>>> What I want as my end result is a mapping of document ID to the cluster
>>> for each document. How can I get that output? I see many other people also
>>> want this but I dont see enough detail in any solution that helps me enough
>>> to get it.
>>>
>>> So far I do this:
>>>
>>> ./mahout lucene.vector -d ~/clusterdemo/solr/data/index/ -f text
>>> --idField id --output output.txt --dictOut dict.txt
>>>
>>> ./mahout kmeans -i output.txt -o kmeans -x 10 -k 100 -ow --clusters
>>> clusters -cl
>>>
>>> ./mahout clusterdump --dictionary dict.txt --seqFileDir
>>> kmeans/clusters-10-final --dictionaryType text --pointsDir
>>> kmeans/clusteredPoints --output dump
>>>
>>> But what I see inside "dump" file does not contain any mapping from
>>> document ID to each cluster. How can I get that? Should not be this hard
>>> to get the most obvious/useful output IMO ;)
>>>
>>> Thanks
>>> Bob
>>>
>>>
>>>
>>
>

--
Lance Norskog
[email protected]
